Discussion:
[Wikimetrics] Using Wikimetrics to generate cohorts
Dario Taraborelli
2014-10-02 23:00:38 UTC
Abbey asked a question during today’s research group that I wanted to relay to the wikimetrics devs.

Would it be possible to allow people to use Wikimetrics’ project-level reports to generate cohorts, in other words, obtain lists of user_ids or user_names matching specific criteria, for example:

• registered users on a given date or period
• newly active editors on a given date
• unique editors on a given date or period

UX research as well as LCA would die to have such functionality (the fallback is to do this via Quarry or to post a request to Research & Data or to someone in Grantmaking).
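To make the criteria concrete, here is a minimal Python sketch of two of them (registered users and unique editors in a period) run against an in-memory SQLite stand-in for MediaWiki's `user` and `revision` tables. The trimmed-down schema, the 2014-era `rev_user` column, and the helper names `registered_users` and `unique_editors` are assumptions for illustration, not how Wikimetrics queries its data. "Newly active editors" is left out because it depends on an activity threshold the thread does not specify.

```python
import sqlite3

# In-memory stand-in for two MediaWiki tables (assumption: only the columns
# used below are modelled; the real tables have many more).
SCHEMA = """
CREATE TABLE user (
    user_id INTEGER PRIMARY KEY,
    user_name TEXT NOT NULL,
    user_registration TEXT          -- 14-digit timestamp, YYYYMMDDHHMMSS
);
CREATE TABLE revision (
    rev_id INTEGER PRIMARY KEY,
    rev_user INTEGER NOT NULL,      -- user_id of the editor (2014-era column)
    rev_timestamp TEXT NOT NULL     -- 14-digit timestamp, YYYYMMDDHHMMSS
);
"""


def registered_users(conn, start, end):
    """Hypothetical helper: (user_id, user_name) of accounts registered in [start, end)."""
    return conn.execute(
        "SELECT user_id, user_name FROM user "
        "WHERE user_registration >= ? AND user_registration < ? "
        "ORDER BY user_id",
        (start, end),
    ).fetchall()


def unique_editors(conn, start, end):
    """Hypothetical helper: distinct users with at least one revision in [start, end)."""
    return conn.execute(
        "SELECT DISTINCT u.user_id, u.user_name FROM revision r "
        "JOIN user u ON u.user_id = r.rev_user "
        "WHERE r.rev_timestamp >= ? AND r.rev_timestamp < ? "
        "ORDER BY u.user_id",
        (start, end),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO user VALUES (?, ?, ?)", [
    (1, "Alice", "20140930120000"),
    (2, "Bob", "20141001090000"),
    (3, "Carol", "20141002150000"),
])
conn.executemany("INSERT INTO revision VALUES (?, ?, ?)", [
    (10, 1, "20141001100000"),
    (11, 1, "20141001110000"),
    (12, 3, "20141002160000"),
])

print(registered_users(conn, "20141001000000", "20141003000000"))
# [(2, 'Bob'), (3, 'Carol')]
print(unique_editors(conn, "20141001000000", "20141003000000"))
# [(1, 'Alice'), (3, 'Carol')]
```

Timestamps are compared as plain strings, which works because the fixed-width 14-digit `YYYYMMDDHHMMSS` format sorts chronologically.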

Dario
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.wikimedia.org/pipermail/wikimetrics/attachments/20141002/4eeed91f/attachment.html>
Nuria Ruiz
2014-10-02 23:16:38 UTC
Dario:

There are no technical blockers to generating that data. Product-wise,
though, it does not seem like a fit, as Wikimetrics' purpose is to run
metrics and produce data, and all Wikimetrics computations are pre-canned.

It seems to me the use cases you passed along are better served by a tool
that can freely query the db, like Quarry.

Thanks,

Nuria





_______________________________________________
Wikimetrics mailing list
Wikimetrics at lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikimetrics
Dario Taraborelli
2014-10-03 00:43:16 UTC
Nuria,

thanks for the feedback – for context, the reason why I am asking is that:

• I was under the impression that this data was already being stored somewhere in temporary tables by Wikimetrics when generating project-level reports
• this is quite similar to one of the earliest feature requests that we had for UserMetrics (the predecessor of Wikimetrics) under the notion of “generated cohorts”:

1) take the non-aggregate output of a report (say all registered users or new active editors from foowiki in a given time period)
2) save the output as a cohort
3) re-run that cohort through a different metric
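The three steps above could be sketched roughly as follows. `Cohort`, `save_as_cohort`, `run_metric`, and the edit-count data are hypothetical names invented for illustration; nothing here reflects the actual UserMetrics or Wikimetrics code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cohort:
    """Hypothetical type: a named, reusable list of users on one wiki."""
    name: str
    wiki: str
    user_ids: tuple


def save_as_cohort(name, wiki, report_rows):
    """Step 2: keep only the (deduplicated, sorted) user ids from a non-aggregate report."""
    return Cohort(name, wiki, tuple(sorted({row["user_id"] for row in report_rows})))


def run_metric(metric, cohort):
    """Step 3: apply a per-user metric function to every member of the cohort."""
    return {uid: metric(uid) for uid in cohort.user_ids}


# Step 1: non-aggregate report output, one row per matching user
report_rows = [
    {"user_id": 2, "user_name": "Bob"},
    {"user_id": 3, "user_name": "Carol"},
]
cohort = save_as_cohort("foowiki new registrations", "foowiki", report_rows)

edit_counts = {2: 0, 3: 7}  # hypothetical data source for the second metric
print(run_metric(lambda uid: edit_counts.get(uid, 0), cohort))
# {2: 0, 3: 7}
```

The point of the intermediate `Cohort` object is that step 3 no longer needs to know which report produced the users, so any metric can be re-run against the same list.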

Using Quarry still relies on the end user’s ability to understand how to turn a research question into a query. Having a curated query library is a good step in that direction, but that still requires some basic knowledge of SQL.

D
Nuria Ruiz
2014-10-03 00:56:07 UTC
Post by Dario Taraborelli
I was under the impression that this data was already being stored
somewhere in temporary tables by Wikimetrics when generating project-level
reports
The data is not stored anywhere at this time; we just query a table
for users that match the criteria.
Post by Dario Taraborelli
this is quite similar to one of the earliest feature requests that we had
for UserMetrics (the predecessor of Wikimetrics) under the notion of
“generated cohorts”:
1) take the non-aggregate output of a report (say all registered users or
new active editors from foowiki in a given time period)
2) save the output as a cohort
3) re-run that cohort through a different metric
I see. At this time Wikimetrics has no way to store and reuse intermediate
results of one metric in other metrics, which, as you may notice, is also a
performance concern, since we recompute things. We have talked about doing
this in the past (we called it "chaining metrics") and we have some backlog
items to that effect. You can talk to Kevin about this use case (which is
slightly different from the original one you described) and he can add it
to the backlog.
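One way to picture "chaining metrics" is to cache an intermediate result (here, a cohort) so that a second metric reuses it instead of recomputing it. This minimal sketch assumes a hypothetical `cohort_for` query function and uses Python's `functools.lru_cache` as a simple in-process cache; a real Wikimetrics implementation would presumably need persistent storage, which is not shown.

```python
from functools import lru_cache

calls = {"cohort": 0}  # counts how often the (hypothetical) query actually runs


@lru_cache(maxsize=None)
def cohort_for(wiki, start, end):
    """Hypothetical intermediate result: user ids matching a criterion.

    lru_cache stores one result per (wiki, start, end), so later metrics
    chained off the same cohort reuse it instead of re-querying.
    """
    calls["cohort"] += 1
    # Stand-in for the real database query; returns an immutable tuple
    return (2, 3) if wiki == "foowiki" else ()


def cohort_size(wiki, start, end):
    return len(cohort_for(wiki, start, end))


def edits_by_cohort(wiki, start, end, edit_counts):
    return sum(edit_counts.get(uid, 0) for uid in cohort_for(wiki, start, end))


print(cohort_size("foowiki", "20141001", "20141003"))              # 2
print(edits_by_cohort("foowiki", "20141001", "20141003", {3: 7}))  # 7
print(calls["cohort"])  # 1: the second metric reused the cached cohort
```

Returning a tuple rather than a list matters here: cached values are shared between callers, so they should not be mutable.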




Kevin Leduc
2014-10-03 18:01:42 UTC
I added the story to the backlog so we don't lose it:
https://bugzilla.wikimedia.org/show_bug.cgi?id=71614
Jonathan Morgan
2014-10-06 18:37:41 UTC
On Thu, Oct 2, 2014 at 5:43 PM, Dario Taraborelli <
Post by Dario Taraborelli
Using Quarry still relies on the end user’s ability to understand how to
turn a research question into a query. Having a curated query library is a
good step in that direction, but that still requires some basic knowledge
of SQL.
I don't think it does, actually. Can't we just create a canned Quarry
query that generates a list of all accounts created in the past 24 hours in
an output format that is Wikimetrics-ready, and then direct people to the
persistent URL? The user pastes that verbatim into their own "New query" box
and downloads the resulting data. Bada-bing!
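A rough sketch of that idea, assuming the canned query runs against the enwiki replica in Quarry and that a Wikimetrics-ready file is header-less CSV with one `user,project` pair per line (both are assumptions). `CANNED_QUERY` and `to_cohort_csv` are hypothetical names; the query is MySQL and is only held as a string here, while the Python part turns downloaded rows into that CSV shape.

```python
import csv
import io

# Hypothetical canned Quarry query (MySQL dialect, not executed here):
# accounts created in the past 24 hours, with a constant project column
# so every output row already has the two fields a cohort file needs.
CANNED_QUERY = """
SELECT user_name, 'enwiki' AS project
FROM user
WHERE user_registration >= DATE_FORMAT(NOW() - INTERVAL 1 DAY, '%Y%m%d%H%i%S');
"""


def to_cohort_csv(rows):
    """Turn (user_name, project) rows into header-less CSV text,
    one user per line (assumed Wikimetrics upload format)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for user_name, project in rows:
        writer.writerow([user_name, project])
    return buf.getvalue()


rows = [("Bob", "enwiki"), ("Carol", "enwiki")]
print(to_cohort_csv(rows), end="")
# Bob,enwiki
# Carol,enwiki
```

Using the `csv` module rather than joining strings by hand means user names that contain commas or quotes are escaped consistently.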

- J
_______________________________________________
Research-Internal mailing list
Research-Internal at lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/research-internal
--
Jonathan T. Morgan
Learning Strategist
Wikimedia Foundation
User:Jmorgan (WMF) <https://meta.wikimedia.org/wiki/User:Jmorgan_(WMF)>
jmorgan at wikimedia.org
Dan Andreescu
2014-10-07 18:25:31 UTC
+1, I'm a fan of bada-bing. But I'm not sure how it fits in the bigger
picture.

Kevin Leduc
2014-10-07 18:38:10 UTC
Yeah, relative to the other things we need for Wikimetrics, I think this is
lower priority. Jonathan, if you hear of more and more need for it, you can
prioritize it sooner. This would be something for Marcel to implement
once he's up to speed on coding Wikimetrics.
