Dan Andreescu
2013-11-22 22:27:39 UTC
Hi everyone,
A quick note about something that just messed me up. When uploading a
cohort to wikimetrics, you are told you can use either user_name, user_id,
or a mixture in the first column. However, this can produce unexpected
results if you don't know how validation resolves each entry. I think it
needs to change, but until then, this is how it works and how it can bite you:
Let's say I have a list of users:
1,en
2,en
3,en
When it validates, it first looks up user_name == 1 and, if it doesn't find
anything, falls back to user_id == 1. It then does the same for 2 and 3. If
what you meant with the above cohort was the users with ids 1, 2, and 3,
then you might be very confused later when you see user id 234215 in your
output results. This can happen if some account's user_name is literally
"2"! So, for now, until I figure out how to fix this, validation will
always prefer user_names over user_ids.
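
To make the lookup order concrete, here is a minimal Python sketch. It is
not the actual wikimetrics code: the USERS list, its "Alice" entry, and the
resolve_user function are hypothetical stand-ins, and the only thing taken
from the description above is the order of the two lookups (user_name
first, user_id as the fallback).

```python
# Hypothetical stand-in for one wiki's user table (not real data, apart
# from the id 234215 / name "2" pairing used as the example above).
USERS = [
    {"user_id": 2, "user_name": "Alice"},
    {"user_id": 234215, "user_name": "2"},
]


def resolve_user(value, users):
    """Resolve one cohort entry to a user_id, preferring user_name."""
    # First pass: treat the value as a user_name.
    for user in users:
        if user["user_name"] == value:
            return user["user_id"]
    # Fallback: treat the value as a user_id, if it looks like a number.
    if value.isdecimal():
        for user in users:
            if user["user_id"] == int(value):
                return user["user_id"]
    return None


# The entry "2" was meant as user_id 2, but the name lookup wins.
print(resolve_user("2", USERS))  # prints 234215
```

In this sketch, user_id 2 ("Alice") can never be selected by uploading "2",
because the account named "2" shadows it.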
Please let me know if this is confusing. Also, the whole problem stems
from needing to accept both user_id and user_name in the *same* upload. If
everyone agrees, I'd much rather just allow people to toggle between one or
the other. This would speed up validation and make it much clearer what is
going on.
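
As a rough illustration of the toggle idea, the sketch below resolves every
entry against a single, explicitly chosen field. Everything in it is an
assumption made for illustration (the USERS list, the resolve_user_by name,
and the "user_name" / "user_id" mode strings); it is not an existing
wikimetrics API.

```python
# Hypothetical stand-in for one wiki's user table.
USERS = [
    {"user_id": 2, "user_name": "Alice"},
    {"user_id": 234215, "user_name": "2"},
]


def resolve_user_by(field, value, users):
    """Resolve one cohort entry using only the field the uploader chose."""
    if field not in ("user_name", "user_id"):
        raise ValueError("field must be 'user_name' or 'user_id'")
    if field == "user_id":
        # In id mode, a non-numeric entry simply fails to validate.
        if not value.isdecimal():
            return None
        key = int(value)
    else:
        key = value
    # One lookup per entry, with no fallback to the other field.
    for user in users:
        if user[field] == key:
            return user["user_id"]
    return None


print(resolve_user_by("user_id", "2", USERS))    # prints 2
print(resolve_user_by("user_name", "2", USERS))  # prints 234215
```

With the mode stated up front, the same entry "2" is no longer ambiguous,
and each entry needs at most one lookup instead of two.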
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.wikimedia.org/pipermail/wikimetrics/attachments/20131122/07841e67/attachment.html>