Discussion:
[Wikimetrics] Wikimetrics timeouts
Dan Andreescu
2015-01-23 17:31:06 UTC
Wikimetrics has been having serious connectivity problems for a few days.
It turned out to be solvable by using some new hostnames
(labsdb1002.eqiad.wmnet). I fixed it just now; please retry your reports
and let me know if anything is still wrong.
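For anyone who wants to double-check the fix from their end, one quick
sanity check is to confirm which replica host a session actually lands on
and that it answers promptly. This is only a minimal sketch, and it
assumes you have direct MySQL access to the labs replica, which most
Wikimetrics users will not:

    -- run against labsdb1002.eqiad.wmnet; reports back the hostname of
    -- the server actually handling the connection and the current time
    select @@hostname as replica_host, now() as server_time;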
Hi everyone. I will work on this as soon as I get into the office, in
about an hour from now. Yuvi suggested one thing that I wasn't aware of
that might make this a simple fix.
Hi Kevin,
Sorry to be a pest, but do you have any update on sorting out the
Wikimetrics issues? It seems to have gotten worse since we last spoke to
you, with only around 1 in 10 reports going through.
Thanks,
Dan
All the developers are in transit to SF today. Dan said he'd be in the
office this afternoon. I'll notify the first dev I see of the problems in
wikimetrics.
On Tue, Jan 20, 2015 at 11:10 AM, Amanda Bittaker <
Hello again gentlemen,
I think Dan might have already pinged you, but just in case, I wanted
to let you know that we are getting these failures again. It's kind
of crunch time for getting this data, so we're just banging our heads
against the wall and retrying the reports until they work (1 out of 4
times for me). Is there any way you all could work your magic again?
Many thanks once again,
Amanda
It's good to hear it's working again. Don't hesitate to reach out to us
in case of trouble again.
On Wed, Dec 10, 2014 at 3:37 PM, Amanda Bittaker <
It's working perfectly now--a thousand thank yous, Dan and Marcel.
On Wed, Dec 10, 2014 at 3:24 PM, Edward Galvez <
Thanks so much Dan and Marcel!
-E
On Wed, Dec 10, 2014 at 3:08 PM, Dan Andreescu <
Forgot Marcel - my fault. Jaime & folks, in general Marcel rules and he's
probably going to help you out faster / better than I can.
On Wed, Dec 10, 2014 at 5:57 PM, Dan Andreescu
Ok, Amanda and anyone else who had problems. Please try again. I think
I've cleared up some gunk and that might have helped things. We'll be
looking at performance more closely soon.

Steps taken, logging mostly for post-mortem purposes:

* delete from report where recurrent_parent_id is null and recurrent = 0
  and created < date('2014-12-01');
** This deleted records that are not visible in the system anymore. They
are recoverable from the wikimetrics database backups, but we don't need
them in the database. These probably slowed some things down; in total
the statement deleted 1623628 rows.

* alter table report add column old_recurrent tinyint(1);
  update report set recurrent = 0, old_recurrent = 1
  where user_id = 461 and recurrent = 1;
** This disables WikimetricsBot recurrent reports, but preserves the data
so we can deal with them later. When labs is done re-synchronizing, we
will be re-running these reports. They feed data to Vital Signs, in case
someone's curious about what they are.

* Stopped and rebooted the system. The backup system seems to be hanging
or taking a really long time. I'd like to take a look at this in more
depth, but my guess is the amount it's transferring has gone beyond what
we expected.
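The second step above implies a follow-up once labs finishes
re-synchronizing: turning the bot's reports back on. A minimal sketch of
that reversal, assuming old_recurrent is used only as this temporary
marker, might look like:

    -- re-enable WikimetricsBot's recurrent reports that were parked
    -- above, then clear the temporary marker so it isn't reused
    update report
       set recurrent = 1,
           old_recurrent = null
     where user_id = 461
       and old_recurrent = 1;

    -- once everything is confirmed running, the helper column could be
    -- dropped:
    -- alter table report drop column old_recurrent;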
On Wed, Dec 10, 2014 at 5:23 PM, Dan Andreescu
We're sorry - the problems we were facing last week have probably
festered. I'm going to turn off some things and reset the system. I'll
report back.
On Wed, Dec 10, 2014 at 4:47 PM, Amanda Bittaker
Oh yes, and Jaime did have me restart my browser and clear the cache, but
it did not help.
Thanks again,
Amanda
On Wed, Dec 10, 2014 at 1:45 PM, Amanda Bittaker
Hello Kevin,

Jaime asked me to email you about some trouble I've been having with
Wikimetrics. The whole team has been experiencing a pretty high rate of
failures in both report creation and cohort uploads. Almost nothing has
gotten through for me today: of the last 13 reports I've run, 3 were
successful. Of the failures, I would say maybe only two or three "pended"
at all before becoming failures. I've been experiencing the same problem
with cohort uploads.

The reports have been: Newly Registered, Edits, and Rolling Active Editor
using expanded cohorts. Please find attached an example of one of the
reports. I tried uploading cohorts using text files of user names and
pasting user names from Notepad into the "Paste Usernames" field. I do
expand the cohorts every time.

Do you know why the failure rate is so high, especially this morning, and
is there a way to eliminate or mitigate this problem in the future?

Many thanks for the assistance, and please do let me know if you need any
more information from me on this.

Best,
Amanda
--
Edward Galvez
Program Evaluation Associate
Wikimedia Foundation
Edward Galvez
2015-01-23 17:44:44 UTC
Thank you so much!!! We really appreciate it!

-Edward
--
Edward Galvez
Program Evaluation Associate
Wikimedia Foundation
Anna Koval
2015-01-23 18:39:15 UTC
+1 :)

Thanks, Dan, et al.

Worked great for me this morning. I'm a happy camper.

Anna :)
--
Sent from Gmail Mobile
Amanda Bittaker
2015-01-23 18:48:10 UTC
Everything is working great--thanks so much for the early-morning
save, Mr. Andreescu! We really appreciate it!
charles andrès (WMCH)
2015-01-24 10:33:16 UTC
Hello all,

From my side I cannot even connect; I receive the following message when trying to log in with the WMCH Gmail account.

Internal Server Error

The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.

Thanks for your help


Charles
Dan Andreescu
2015-01-24 15:54:22 UTC
Charles, it looks like Google authentication is broken; I was able to
reproduce the error. I will look at it on Monday, but in the meantime you
can use MediaWiki authentication. It will create a separate user for you
and you would have to re-create your cohorts, but it would let you work if
you have urgent reports.

I would appreciate it if you filed a bug in Phabricator about the Google
auth problem.

Charles Andrès
2015-01-24 17:48:47 UTC
Thanks Dan,

No worries, we have all of our cohorts saved in a Google sheet :-D

I will file a bug report.

charles