Dan Andreescu
2015-01-23 17:31:06 UTC
Wikimetrics has been having serious connectivity problems for a few days.
It turned out to be solvable by using some new hostnames
(labsdb1002.eqiad.wmnet). I fixed it just now; please retry your reports
and let me know if anything is still wrong.
Hi everyone. I will work on this as soon as I get into the office, in
about an hour from now. Yuvi suggested one thing that I wasn't aware of
that might make this a simple fix.
Hi Kevin,
Sorry to be a pest, but do you have any update on sorting out the
Wikimetrics issues? It seems to have gotten worse since we last spoke to
you, with around 1 in 10 reports going through.
Thanks,
Dan
All the developers are in transit to SF today. Dan said he'd be in the
office this afternoon. I'll notify the first dev I see about the problems
in Wikimetrics.
On Tue, Jan 20, 2015 at 11:10 AM, Amanda Bittaker <
Hello again gentlemen,
I think Dan might have already pinged you, but just in case, I wanted
to let you know that we are getting these failures again. It's kind
of crunch time for getting this data, so we're just banging our heads
against the wall and retrying the reports until they work (1 out of 4
times for me). Is there any way you all could work your magic again?
Many thanks once again,
Amanda
--
Edward Galvez
Program Evaluation Associate
Wikimedia Foundation
It's good to hear it's working again. Don't hesitate to reach out to
us if you run into trouble again.
On Wed, Dec 10, 2014 at 3:37 PM, Amanda Bittaker <
It's working perfectly now--a thousand thank yous, Dan and Marcel.
On Wed, Dec 10, 2014 at 3:24 PM, Edward Galvez <
Thanks so much Dan and Marcel!
-E
On Wed, Dec 10, 2014 at 3:08 PM, Dan Andreescu <
Forgot Marcel, my fault. Jaime & folks, in general Marcel rules; he's
probably going to help you out faster/better than I can.
On Wed, Dec 10, 2014 at 5:57 PM, Dan Andreescu
Ok, Amanda and anyone else who had problems: please try again. I
think I've cleared up some gunk and that might have helped things. We'll be
looking at performance more closely soon.
Steps taken, logging mostly for post-mortem purposes:
* delete from report where recurrent_parent_id is null and recurrent = 0 and created < date('2014-12-01');
** This deleted records that are not visible in the system anymore.
They are recoverable from the wikimetrics database backups, but we don't
need them in the database. These probably slowed some things down; in
total the statement deleted 1623628 rows.
* alter table report add column old_recurrent tinyint(1);
  update report set recurrent = 0, old_recurrent = 1 where user_id = 461 and recurrent = 1;
** This disables WikimetricsBot recurrent reports, but preserves the
data so we can deal with them later. When labs is done re-synchronizing,
we will be re-running these reports. They feed data to Vital Signs, in
case someone's curious about what they are.
* Stopped and rebooted the system. The backup system seems to be
hanging or taking a really long time. I'd like to take a look at this in
more depth, but my guess is the amount it's transferring has gone beyond
what we expected.
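For anyone reading this post-mortem later, the two SQL statements above can be replayed against a toy table to see exactly which rows each one touches. This is a minimal sketch, not what was run in production: it uses Python's built-in sqlite3 instead of the Wikimetrics database, assumes a hypothetical report table with only the columns the statements name (plus an id), uses made-up sample rows, and the restore_recurrent helper is a hypothetical way to "deal with them later" that the thread does not describe.

```python
import sqlite3

# Hypothetical minimal schema: only the columns named in the cleanup
# statements, plus an id. The real Wikimetrics report table has more.
SCHEMA = """
CREATE TABLE report (
    id INTEGER PRIMARY KEY,
    user_id INTEGER,
    recurrent tinyint(1),
    recurrent_parent_id INTEGER,
    created TEXT
)
"""

BOT_USER_ID = 461  # user_id targeted by the second statement in the thread


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory table with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO report VALUES (?, ?, ?, ?, ?)",
        [
            (1, 10, 0, None, "2014-11-15"),  # old one-off report: deleted
            (2, 10, 0, None, "2014-12-05"),  # newer one-off report: kept
            (3, 10, 0, 5, "2014-11-20"),     # child of report 5: kept
            (4, BOT_USER_ID, 1, None, "2014-10-01"),  # bot recurrent: disabled
            (5, BOT_USER_ID, 1, None, "2014-11-01"),  # bot recurrent: disabled
        ],
    )
    conn.commit()
    return conn


def run_cleanup(conn: sqlite3.Connection) -> tuple:
    """Replay both statements; return (rows deleted, rows disabled)."""
    cur = conn.cursor()
    # Step 1: remove old, non-recurrent, top-level reports.
    cur.execute(
        "DELETE FROM report WHERE recurrent_parent_id IS NULL "
        "AND recurrent = 0 AND created < date('2014-12-01')"
    )
    deleted = cur.rowcount
    # Step 2: remember which bot reports were recurrent, then switch them off.
    cur.execute("ALTER TABLE report ADD COLUMN old_recurrent tinyint(1)")
    cur.execute(
        "UPDATE report SET recurrent = 0, old_recurrent = 1 "
        "WHERE user_id = ? AND recurrent = 1",
        (BOT_USER_ID,),
    )
    disabled = cur.rowcount
    conn.commit()
    return deleted, disabled


def restore_recurrent(conn: sqlite3.Connection) -> int:
    """Hypothetical follow-up: re-enable the reports flagged in step 2."""
    cur = conn.execute(
        "UPDATE report SET recurrent = 1, old_recurrent = NULL "
        "WHERE old_recurrent = 1"
    )
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    db = make_demo_db()
    print(run_cleanup(db))        # (1, 2)
    print(restore_recurrent(db))  # 2
```

Adding old_recurrent before flipping recurrent to 0 is what keeps the change reversible: a later restore only has to match on the flag, without needing to know which reports were recurrent beforehand.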
On Wed, Dec 10, 2014 at 5:23 PM, Dan Andreescu
We're sorry - the problems we were facing last week have probably
festered. I'm going to turn off some things and reset the system. I'll
report back.
On Wed, Dec 10, 2014 at 4:47 PM, Amanda Bittaker
Oh yes, and Jaime did have me restart my browser and clear the cache,
but it did not help.
Thanks again,
Amanda
On Wed, Dec 10, 2014 at 1:45 PM, Amanda Bittaker
Hello Kevin,
Jaime asked me to email you about some trouble I've been having with
Wikimetrics. The whole team has been experiencing a pretty high rate of
failures in both report creation and cohort uploads. Almost nothing has
gotten through for me today: of the last 13 reports I've run, 3 were
successful. Of the failures, I would say maybe only two or three "pended"
at all before becoming failures. I've been experiencing the same problem
with cohort uploads.
The reports have been: Newly Registered, Edits, and Rolling Active
Editor using expanded cohorts. Please find attached an example of one of
the reports. I tried uploading cohorts using text files of user names and
pasting user names from Notepad into the "Paste Usernames" field. I do
expand the cohorts every time.
Do you know why the failure rate is so high, especially this
morning, and is there a way to eliminate or mitigate this problem in the
future?
Many thanks for the assistance, and please do let me know if you
need any more information from me on this.
Best,
Amanda