Discussion:
[Wikimetrics] cohort upload and report failures
Amanda Bittaker
2015-02-12 20:16:10 UTC
Hello all,

I am getting failures again, both when uploading cohorts and running
reports. Strangely, it seems the more reports you try to run in one batch
the less likely it is any report will succeed.

Is anyone else having these problems again? Wonderful Analytics people,
could you please work your magic again?

Many thanks,
Amanda
Jonathan Morgan
2015-02-12 20:20:06 UTC
(ping Kevin and Dan A.)

Hi Amanda, I've had some problems with report failures recently when I ran
a few test cohorts. On the same cohort, when I ran multiple concurrent
reports (say, bytes added, edits, and pages created), some would fail and
others succeed. It wasn't clear what the issue was.

- J
--
Jonathan T. Morgan
Community Research Lead
Wikimedia Foundation
User:Jmorgan (WMF) <https://meta.wikimedia.org/wiki/User:Jmorgan_(WMF)>
***@wikimedia.org
Dan Andreescu
2015-02-12 20:35:41 UTC
Recently there was a restart of the labsdb cluster. I'm sorry but I don't
have time to check on it, but I bet that's the problem. I'm off tomorrow
unfortunately but I'll try to check tomorrow night :( I hope someone else
beats me to it.
Nuria Ruiz
2015-02-12 20:57:19 UTC
Amanda:

I have restarted Wikimetrics; please try again and see if you still find issues.

If so a cohort + report to repro will be most useful.

Thanks,

Nuria
Jonathan Morgan
2015-02-12 21:09:48 UTC
Thanks Nuria!
Post by Nuria Ruiz
If so a cohort + report to repro will be most useful.
Translation:* try to run the exact same reports on the same cohort again,
to see if the same metrics fail. Let us know what you find. ;)

Same goes for anyone else who experiences these issues: the more details we
(users) can provide the engineers, the more effective they can be at
diagnosing and addressing the problems.

Cheers,
- J

*for anyone who is not 100% familiar with that hip, new software
engineering lingo



Amanda Bittaker
2015-02-12 21:32:50 UTC
Thanks so much for the quick response, Nuria.

I ran the exact same reports on the same cohort as one of the last batches
that were failing. Last time, 2/4 of the reports failed; when I reran them
individually, they succeeded. (But they don't always: I reran one report 3
times this morning before it worked.) This time, my failure rate got
worse: 4/4 failed, although they said "PENDING" for a few seconds first,
which is new.

Is that useful information? Please do let me know what else I can do to
help solve this.

Thanks again,
Amanda
Nuria Ruiz
2015-02-12 21:36:09 UTC
DB connections in labs look to be failing. Unfortunately, I think that besides
asking for help on the labs list there is not much we can do there. I will
start a thread about this.

Thanks,

Nuria
Amanda Bittaker
2015-02-12 21:42:11 UTC
Alright, thanks so much for your help once again, Nuria.

If there's anything I can do or any information I can contribute, please
don't hesitate to ping me.

Best,
Amanda
Nuria Ruiz
2015-02-13 22:19:57 UTC
Amanda,

Looks like Wikimetrics was able to run automatic reports last night w/o big
issues. Are your reports still failing?

Thanks,

Nuria
Nuria Ruiz
2015-02-16 18:15:33 UTC
Ping ....
Amanda Bittaker
2015-02-16 20:49:36 UTC
Oops, thanks for the ping, Nuria. Wikimetrics seems to be working better
now. I still get failures, especially when running three or four reports
in one batch, but the reports work if you rerun them (sometimes a couple
of times).

I'm still getting "PENDING"s that turn into "FAILURE"s sometimes, which I
just noticed for the first time last Thursday. Also, sometimes the
"FAILURE"s change position in the Current Report Inbox list, moving up or
down a spot. Not sure if that helps diagnose what might be happening...

In any case, Wikimetrics is mostly functioning but seems to be having
recurring troubles that sometimes blow up and freeze the whole tool. It
would be great to resolve the troubles before the next explosion. Is there
anything I can do to help? Dan H and I still have plenty of reports to
run, so we can keep you updated on the reports run and the failure rate
while you are fixing it, if that would be useful.

Many thanks,
Amanda
Dan Andreescu
2015-02-17 15:20:14 UTC
Sorry for the trouble, Amanda. The problem is solely with the underlying
database, which we don't maintain. It's a sanitized replica of all the
changes being made to all the wikis, so it's a fairly complicated piece of
infrastructure that sometimes has problems. The folks who maintain it are
aware of the issues, but we'll keep raising them until they're solved.
Amanda Bittaker
2015-02-17 15:53:29 UTC
Permalink
Good morning Dan,

Thanks very much for the explanation. Is there a Phabricator task we can
upvote (award a token?) to make this issue more visible?

As always, we really appreciate your help with this.

Best,
Amanda
Dan Andreescu
2015-02-17 15:58:06 UTC
Permalink
I can't find a specific ticket; Nuria may know of one. In general, this is
the project that LabsDB tickets are tagged with:
https://phabricator.wikimedia.org/tag/wikimedia-labs-infrastructure/
Jonathan Morgan
2015-02-17 17:25:23 UTC
Permalink
Hi Amanda,

Here's a ticket you can upvote: https://phabricator.wikimedia.org/T87596

I added a link to this thread to the task. I also added an "Evil Spooky
Haunted Tree" token to the task. Because... well it just felt like the
right thing to do.

- J
Amanda Bittaker
2015-02-17 17:35:41 UTC
Permalink
Sweet, thanks Jonathan. I added a "Heartbreak" token, because at this
point I am really far too emotionally attached to Wikimetrics.
Amanda Bittaker
2015-02-17 17:40:53 UTC
Permalink
On that note, Jonathan, do you have SQL queries that return the same
results as the Wikimetrics reports? I tried writing my own for bytes
added, but it's pretty janky and takes forever to return anything in
Quarry. Would you share your wisdom with a poor wayward amateur?
Dan Andreescu
2015-02-17 17:43:38 UTC
Permalink
Amanda, did you base your query on the pseudo-SQL listed on the metric
page? In case you haven't seen it:

https://github.com/wikimedia/analytics-wikimetrics/blob/master/wikimetrics/metrics/bytes_added.py#L26

(Sorry for linking to GitHub; Labs seems to be suffering some serious DNS
problems right now, and all my attempts to load Wikimetrics are failing.)
Post by Amanda Bittaker
On that note, Jonathan, do you have SQL queries that return the same
results as the Wikimetrics reports? I tried writing my own for bytes
added, but it's pretty janky and takes forever to return anything in
Quarry. Would you share your wisdom with a poor wayward amateur?
Post by Amanda Bittaker
Sweet, thanks Jonathan. I added a "Heartbreak" token, because at this
point I am really far too emotionally attached to Wikimetrics.
Post by Jonathan Morgan
Hi Amanda,
Here's a ticket you can upvote: https://phabricator.wikimedia.org/T87596
I added a link to this thread to the task. I also added an "Evil Spooky
Haunted Tree" token to the task. Because... well it just felt like the
right thing to do.
- J
Post by Dan Andreescu
I can't find a specific ticket, Nuria may know of one. In general,
https://phabricator.wikimedia.org/tag/wikimedia-labs-infrastructure/
On Tue, Feb 17, 2015 at 10:53 AM, Amanda Bittaker <
Post by Amanda Bittaker
Good morning Dan,
Thanks very much for the explanation. Is there a Phabricator task we
can upvote (award a token?) to make this issue more visible?
As always, we really appreciate your help with this.
Best,
Amanda
On Tue, Feb 17, 2015 at 7:20 AM, Dan Andreescu <
Post by Dan Andreescu
Sorry for the trouble, Amanda. The problem is solely with the
underlying database, which we don't maintain. It's a sanitized replica of
all the changes being made to all the wikis so it's a fairly complicated
piece of infrastructure that sometimes has problems. The folks who
maintain it are aware of the issues, but we'll continue representing them
until they're solved.
On Mon, Feb 16, 2015 at 3:49 PM, Amanda Bittaker <
Post by Amanda Bittaker
Oop, thanks for the ping, Nuria. Wikimetrics seems to be working
better now. I still get failures, especially when running three or four
reports in one batch, but the reports work if you rerun them (sometimes a
couple times.)
I'm still getting "PENDING"s that turn into "FAILURE"s sometimes,
which I just noticed for the first time last Thursday. Also, sometimes the
"FAILURE"s change position in the Current Report Inbox list, moving up or
down a spot. Not sure if that helps diagnose what might be happening...
In any case, Wikimetrics is mostly functioning but seems to be
having recurring troubles that sometimes blow up to freeze the whole tool.
It would be great to resolve the troubles before the next explosion--is
there anything I can do to help? Dan H and I still have plenty of reports
to run; we can keep you updated on the reports run and the failure rate while
you are fixing, if that would be useful.
Many thanks,
Amanda
Post by Nuria Ruiz
Ping ....
Post by Jonathan Morgan
Amanda,
Looks like wikimetrics was able to run automatic reports last
night w/o big issues, are your reports still failing?
Thanks,
Nuria
On Thu, Feb 12, 2015 at 1:42 PM, Amanda Bittaker <
Post by Amanda Bittaker
Alright, thanks so much for your help once again, Nuria.
If there's anything I can do or any information I can contribute,
please don't hesitate to ping me.
Best,
Amanda
Post by Nuria Ruiz
DB connections in labs look to be failing, unfortunately I
think besides asking for help on the labs list there is not much we can do
there. I will start a thread in this regard.
Thanks,
Nuria
On Thu, Feb 12, 2015 at 1:32 PM, Amanda Bittaker <
Post by Amanda Bittaker
Thanks so much for the quick response, Nuria.
I ran the exact same reports on the same cohort as one of the
last batches that were failing. Last time 2/4 of the reports failed; when
I reran them individually they succeeded. (But they don't always, I reran
one report 3 times this morning before it worked.) This time, my failure
rate got worse: 4/4 failed, although they said "PENDING" for a few seconds
first, which is new.
Is that useful information? Please do let me know what else I
can do to help solve this.
Thanks again,
Amanda
On Thu, Feb 12, 2015 at 1:09 PM, Jonathan Morgan <
Post by Jonathan Morgan
Thanks Nuria!
On Thu, Feb 12, 2015 at 12:57 PM, Nuria Ruiz <
Post by Nuria Ruiz
If so a cohort + report to repro will be most useful.
Translation:* try to run the exact same reports on the same
cohort again, to see if the same metrics fail. Let us know what you find. ;)
Same goes for anyone else who experiences these issues: the
more details we (users) can provide the engineers, the more effective they
can be at diagnosing and addressing the problems.
Cheers,
- J
*for anyone who is not 100% familiar with that hip, new
software engineering lingo
Thanks,
Post by Nuria Ruiz
Nuria
On Thu, Feb 12, 2015 at 12:35 PM, Dan Andreescu <
Post by Dan Andreescu
Recently there was a restart of the labsdb cluster. I'm
sorry but I don't have time to check on it, but I bet that's the problem.
I'm off tomorrow unfortunately but I'll try to check tomorrow night :( I
hope someone else beats me to it.
On Thu, Feb 12, 2015 at 3:20 PM, Jonathan Morgan <
Post by Jonathan Morgan
(ping Kevin and Dan A.)
Hi Amanda, I've had some problems with report failures
recently when I ran a few test cohorts. On the same cohort, when I ran
multiple concurrent reports (say, bytes added, edits, and pages created),
some would fail and others succeed. It wasn't clear what the issue was.
- J
On Thu, Feb 12, 2015 at 12:16 PM, Amanda Bittaker <
Post by Amanda Bittaker
Hello all,
I am getting failures again, both when uploading cohorts
and running reports. Strangely, it seems the more reports you try to run
in one batch the less likely it is any report will succeed.
Is anyone else having these problems again? Wonderful
Analytics people, could you please work your magic again?
Many thanks,
Amanda
Dan Andreescu
2015-02-17 17:45:16 UTC
Permalink
Yep, labs is currently experiencing a disk failure which will affect our
instance. The thread subject on the labs list:

[Labs-l] Partial outage in progress
Jonathan Morgan
2015-02-17 17:50:37 UTC
Permalink
If Wikimetrics is down because of DB issues, then methinks Quarry will also
be down. :(
--
Jonathan T. Morgan
Community Research Lead
Wikimedia Foundation
User:Jmorgan (WMF) <https://meta.wikimedia.org/wiki/User:Jmorgan_(WMF)>
***@wikimedia.org
Nuria Ruiz
2015-02-25 16:36:14 UTC
Permalink
Amanda,

Following up on this, are you still having trouble with Wikimetrics?

Thanks,

Nuria
Edward Galvez
2015-02-25 18:12:17 UTC
Permalink
Hi Nuria,

We have been working with Dan A. to use the SQL queries from Wikimetrics
instead of the Wikimetrics interface. We have so many cohorts and reports
to run that we can't rely on Wikimetrics, given the time it takes. It's also
faster for us to build the queries in a spreadsheet than to upload cohorts,
validate them, and then build the reports.

Thanks,
Edward
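
For anyone who wants to try the same approach, here is a minimal sketch of building one query per cohort from a list of user names exported from a spreadsheet. It is not the query Wikimetrics itself runs: the table and column names assume the MediaWiki revision table as it looked at the time, and QUERY_TEMPLATE, read_cohort and build_query are hypothetical names made up for this example. The "%s" placeholders assume a database driver that uses that parameter style.

```python
import csv
import io

# Hypothetical template. Table and column names are assumptions based on the
# MediaWiki `revision` table, not the exact SQL used by Wikimetrics.
QUERY_TEMPLATE = (
    "SELECT r.rev_user_text,\n"
    "       SUM(CAST(r.rev_len AS SIGNED)\n"
    "           - CAST(COALESCE(p.rev_len, 0) AS SIGNED)) AS net_bytes\n"
    "  FROM revision r\n"
    "  LEFT JOIN revision p ON p.rev_id = r.rev_parent_id\n"
    " WHERE r.rev_timestamp BETWEEN %s AND %s\n"
    "   AND r.rev_user_text IN ({placeholders})\n"
    " GROUP BY r.rev_user_text"
)


def read_cohort(csv_text):
    """Return the user names in the first CSV column, skipping blank rows."""
    rows = csv.reader(io.StringIO(csv_text))
    return [row[0].strip() for row in rows if row and row[0].strip()]


def build_query(usernames, start, end):
    """Return (sql, params) with one placeholder per cohort member."""
    if not usernames:
        raise ValueError("cohort is empty")
    placeholders = ", ".join(["%s"] * len(usernames))
    sql = QUERY_TEMPLATE.format(placeholders=placeholders)
    # Parameters are passed separately so user names are never pasted
    # into the SQL text by hand.
    return sql, [start, end] + list(usernames)
```

Calling build_query(read_cohort(text), "20150101000000", "20150201000000") returns the SQL text plus a parameter list that can be handed to a driver's execute call. Keeping the names out of the SQL string avoids quoting problems with user names that contain apostrophes.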




Edward Galvez
2015-02-25 18:13:18 UTC
Permalink
And thanks for checking in again about this, Nuria :)

-E
--
Edward Galvez
Program Evaluation Associate
Wikimedia Foundation
Tighe Flanagan
2015-02-25 21:44:07 UTC
Permalink
I had success running bytes, edits and pages created reports for a new
cohort today (only 49 users).

Hope the tool is working well for others, too!

Tighe

--
Tighe Flanagan
Manager, Wikipedia Education Program
Wikimedia Foundation
+1.415.839.6885 x6880
***@wikimedia.org
education.wikimedia.org