Dan Andreescu
2013-11-02 04:00:15 UTC
Hi,
I just noticed someone ran a query from 2012 to 2013 as a timeseries by
hour. This... creates a *lot* of data: a year of hourly buckets is
roughly 8,760 values per user, so for the cohort they used it works out
to about 1.8 million data points. Should we cap report sizes somehow? It
doesn't pose any immediate danger other than taking up a lot of
resources and computation time, as well as IO time spent logging the
results (the log is currently acting as a rudimentary backup - perhaps
this is ill-conceived). In this case it looks like it may have been a
mistake, so one idea is to warn users when they are about to generate a
lot of data and ask them to confirm.
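
For concreteness, here is a minimal sketch of such a check (hypothetical
names and threshold throughout - this is not actual Wikimetrics code).
It estimates the report size up front and flags anything over a limit
for confirmation:

    from datetime import datetime

    # Rough bucket sizes, in hours, per timeseries granularity.
    BUCKET_HOURS = {'hour': 1, 'day': 24, 'month': 720}

    # Hypothetical cutoff; anything above it would prompt the user.
    CONFIRM_THRESHOLD = 100000

    def estimate_data_points(start, end, granularity, cohort_size):
        """Upper bound on how many values a timeseries report produces."""
        total_hours = (end - start).total_seconds() / 3600
        buckets = total_hours / BUCKET_HOURS[granularity]
        return int(buckets * cohort_size)

    def needs_confirmation(start, end, granularity, cohort_size):
        return estimate_data_points(start, end, granularity,
                                    cohort_size) > CONFIRM_THRESHOLD

    # A year of hourly data for a ~205-user cohort: ~1.8 million points.
    print(estimate_data_points(datetime(2012, 1, 1), datetime(2013, 1, 1),
                               'hour', 205))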
Thoughts?
Dan