[00:31:17] Analytics-Cluster: Hive User can specify webrequest date range in query more easily - https://phabricator.wikimedia.org/T76531#1016592 (kevinator)
[00:41:31] Analytics-EventLogging: nail down what EL process looks like and understand impact on system - https://phabricator.wikimedia.org/T75912#1016639 (kevinator)
[00:41:33] Analytics-Engineering, Analytics-EventLogging: Analytics Eng has capacity & monitoring for EventLogging - https://phabricator.wikimedia.org/T76803#1016640 (kevinator)
[00:42:17] Analytics-Engineering, Analytics-Cluster: Browser report for mobile folks from hadoop data in refined tables (Weekly or Daily) - https://phabricator.wikimedia.org/T88504#1016642 (Nuria)
[00:43:47] Analytics-Engineering: EPIC: Prepare and host Event Logging hackathon at MWDS - https://phabricator.wikimedia.org/T86212#1016651 (Nuria) Open>Resolved a:Nuria
[00:44:55] Analytics-Engineering, Analytics-EventLogging: EPIC: Prepare and host Event Logging hackathon at MWDS - https://phabricator.wikimedia.org/T86212#1016657 (kevinator)
[00:50:04] Analytics-Kanban, Analytics-Cluster: Browser report for mobile folks from hadoop data in refined tables (Weekly or Daily) - https://phabricator.wikimedia.org/T88504#1016678 (kevinator)
[00:51:04] Analytics-Kanban, Analytics-Cluster: Browser report for mobile folks from hadoop data in refined tables (Weekly or Daily) - https://phabricator.wikimedia.org/T88504#1016683 (kevinator) p:Triage>High
[01:01:29] Analytics-Kanban, Analytics-Cluster: Mobile Apps PM has monthly report from oozie about apps uniques - https://phabricator.wikimedia.org/T88308#1016720 (kevinator)
[01:02:12] Analytics-Kanban, Analytics-Cluster: Mobile Apps PM has monthly report from oozie about apps uniques - https://phabricator.wikimedia.org/T88308#1016723 (kevinator) p:Triage>High
[01:03:31] Analytics-Kanban, Analytics-EventLogging: Drop clientValidated and isTruncated fields from event capsule - https://phabricator.wikimedia.org/T88595#1016732 (kevinator) p:Triage>Normal a:Nuria
[01:07:08] Analytics-Kanban, Analytics-EventLogging: Remove autoincrement id from tables [5 pts] - https://phabricator.wikimedia.org/T87661#1016744 (kevinator) This data warehouse requirement was discussed here: https://wikitech.wikimedia.org/wiki/Analytics/DataWarehouse/Requirements#02.2F03.2F2015
[01:13:40] Analytics-Kanban: Icinga Monitoring should detect cluster running out of space - https://phabricator.wikimedia.org/T88640#1016773 (Nuria) NEW
[01:13:51] Analytics-Kanban: Icinga Monitoring should detect cluster running out of HEAP space - https://phabricator.wikimedia.org/T88640#1016780 (Nuria)
[01:15:11] Analytics-Kanban: Develop Verify Merge scripts into Data Warehouse repo {vole} - https://phabricator.wikimedia.org/T88641#1016782 (kevinator) NEW
[01:17:57] Analytics-Kanban, Analytics-EventLogging: Script adding indices to Edit schema's EL table for Data Warehouse {vole} - https://phabricator.wikimedia.org/T88642#1016793 (kevinator) NEW a:Milimetric
[01:19:46] Analytics-Kanban, Analytics-Cluster: Icinga Monitoring should detect cluster running out of HEAP space - https://phabricator.wikimedia.org/T88640#1016802 (kevinator)
[01:21:25] Analytics-Kanban, Analytics-Cluster: Monitor (Icinga) cluster running out of HEAP space - https://phabricator.wikimedia.org/T88640#1016805 (kevinator)
[01:21:51] Analytics-Kanban, Analytics-Cluster: Monitor cluster running out of HEAP space with Icinga - https://phabricator.wikimedia.org/T88640#1016806 (kevinator)
[01:26:10] for those who weren't there: https://meta.wikimedia.org/wiki/WMF_Metrics_and_activities_meetings/Quarterly_reviews/Analytics/January_2015
[01:26:21] Analytics-Kanban: Troubleshoot data warehouse importing process {vole} - https://phabricator.wikimedia.org/T88583#1016825 (kevinator)
[01:35:07] Analytics-Kanban: Develop Verify Merge scripts into Data Warehouse repo {mole} - https://phabricator.wikimedia.org/T88641#1016855 (kevinator)
[01:35:16] Analytics-Kanban, Analytics-EventLogging: Script adding indices to Edit schema's EL table for Data Warehouse {mole} - https://phabricator.wikimedia.org/T88642#1016857 (kevinator)
[01:35:26] Analytics-Kanban: Troubleshoot data warehouse importing process {mole} - https://phabricator.wikimedia.org/T88583#1016859 (kevinator)
[01:44:49] Analytics-Kanban, Analytics-EventLogging: {Spike} Design where to specify purge schedule for schema {oryx} - https://phabricator.wikimedia.org/T88646#1016864 (kevinator) NEW
[01:46:00] Analytics-EventLogging: Make DB logging configurable in the schema - https://phabricator.wikimedia.org/T87177#1016872 (kevinator)
[01:46:01] Analytics-Engineering, Analytics-EventLogging: Product Instrumentation and Visualization - https://phabricator.wikimedia.org/T76795#1016871 (kevinator)
[01:46:37] Analytics-Kanban, Analytics-EventLogging: Make DB logging configurable in the schema {oryx} - https://phabricator.wikimedia.org/T87177#1016873 (kevinator) p:Triage>Low
[01:46:53] Analytics-Kanban, Analytics-EventLogging: Make DB logging configurable in the schema {oryx} - https://phabricator.wikimedia.org/T87177#985432 (kevinator) p:Low>Normal
[01:49:57] Analytics-Kanban, Analytics-Cluster: Build component for Oozie jobs to sends e-mails - https://phabricator.wikimedia.org/T88433#1016880 (kevinator)
[01:55:24] Analytics-Kanban, Analytics-Cluster: Mobile PM sees reports on browsers (Weekly or Daily) - https://phabricator.wikimedia.org/T88504#1016884 (kevinator)
[02:01:41] hey halfak
[02:02:23] hey. in meeting. might be intermittent.
[02:03:02] k, was wondering when you'd be likely to have a chance to take a look at the bot; no rush.
[02:12:36] Analytics-Kanban, Analytics-Cluster: {epic} WMF has UC report per project per month & day {bear} - https://phabricator.wikimedia.org/T88647#1016919 (kevinator) NEW
[02:18:11] Analytics-Engineering, Analytics-EventLogging: Researchers access log of events failing validation - https://phabricator.wikimedia.org/T85028#1016929 (kevinator)
[02:18:13] Analytics-EventLogging: find a better way to identify events that fail validation as early as possible - https://phabricator.wikimedia.org/T78355#1016930 (kevinator)
[02:19:58] Analytics-Kanban, Analytics-EventLogging: Engineer finds events that fail validation as early as possible {oryx} - https://phabricator.wikimedia.org/T78355#1016935 (kevinator)
[02:20:39] Analytics-Engineering, Analytics-EventLogging: Product Instrumentation and Visualization - https://phabricator.wikimedia.org/T76795#1016938 (kevinator)
[02:20:41] Analytics-Kanban, Analytics-EventLogging: Engineer finds events that fail validation as early as possible {oryx} - https://phabricator.wikimedia.org/T78355#1016939 (kevinator)
[02:20:55] Analytics-Kanban, Analytics-EventLogging: Engineer finds events that fail validation as early as possible {oryx} - https://phabricator.wikimedia.org/T78355#1016940 (kevinator) a:kevinator>None
[02:22:11] Analytics-Kanban, Analytics-EventLogging: {epic} Product Instrumentation and Visualization {oryx} - https://phabricator.wikimedia.org/T76795#1016944 (kevinator)
[02:23:38] Analytics-Kanban, Analytics-EventLogging: {epic} Product Instrumentation and Visualization {oryx} - https://phabricator.wikimedia.org/T76795#1016948 (Aklapper)
[02:36:58] Analytics-Kanban, Analytics-EventLogging: Analytics Eng has capacity & monitoring for EventLogging {oryx} - https://phabricator.wikimedia.org/T76803#1016963 (kevinator)
[03:41:58] MediaWiki-Special-pages, Analytics, MediaWiki-User-blocking: Make a SpecialPage to show stats on blocked IP (ranges) that attempt to edit - https://phabricator.wikimedia.org/T78840#1017013 (Glaisher)
[10:14:44] Analytics-Tech-community-metrics, Wikimedia-Git-or-Gerrit: Active code review users on a monthly basis - https://phabricator.wikimedia.org/T86152#1017467 (Qgil) Let's show a graph with the number of active users on a monthly basis. Then let's have a table with a list of active users in the past month (usernam...
[10:26:19] Analytics-Tech-community-metrics, Wikimedia-Git-or-Gerrit: Basic metrics about contributors exercising +2/-2 permissions in Gerrit - https://phabricator.wikimedia.org/T59038#1017504 (Qgil) Let's start with this: * A table with the full list of contributors who have exercised +2 OR -2 in Gerrit (name, primary...
[10:42:42] Analytics-Tech-community-metrics: "Contributors new and gone" in korma is stalled - https://phabricator.wikimedia.org/T88278#1017529 (Qgil) a:Dicortazar
[10:49:47] Wikimedia-Git-or-Gerrit, Analytics-Tech-community-metrics: Basic metrics about contributors exercising +2/-2 permissions in Gerrit - https://phabricator.wikimedia.org/T59038#1017534 (Qgil) a:Acs>Dicortazar
[10:50:24] Analytics-Tech-community-metrics: Tech metrics missing IRC channels - https://phabricator.wikimedia.org/T56230#1017535 (Qgil) a:Acs>None
[10:50:27] Analytics-Tech-community-metrics: Graphs for median/average should report absolute numbers - https://phabricator.wikimedia.org/T68266#1017539 (Qgil) a:Acs>None
[10:50:29] Analytics-Tech-community-metrics: "Volume of open changesets" graph should show reviews pending every month - https://phabricator.wikimedia.org/T72278#1017537 (Qgil) a:Acs>None
[10:50:30] Analytics-Tech-community-metrics: Tech metrics should talk about "Affiliation" instead of organizations or companies - https://phabricator.wikimedia.org/T62091#1017541 (Qgil) a:Acs>None
[11:16:41] Analytics-Tech-community-metrics: Consolidating time ranges across tech community metrics - https://phabricator.wikimedia.org/T86630#1017568 (Qgil) After a chat today with @Dicortazar, and after he showed the reports they are preparing for OpenStack ([[ http://activity.openstack.org/dash/reports/2014-q4/pdf/p...
[11:27:11] Analytics-Tech-community-metrics: KPI pages in korma need horizontal margins - https://phabricator.wikimedia.org/T88670#1017574 (Qgil) NEW
[11:31:32] Analytics-Tech-community-metrics: KPI pages in korma need horizontal margins - https://phabricator.wikimedia.org/T88670#1017584 (Qgil)
[11:34:48] Analytics: Metrics about the Wikimedia APIs usage - https://phabricator.wikimedia.org/T88267#1017588 (Qgil)
[12:29:46] Project-Creators, Phabricator, Engineering-Community: Analytics-Volunteering and Wikidata's Need-Volunteer tags - https://phabricator.wikimedia.org/T88266#1017709 (Aklapper)
[12:34:48] (CR) QChris: [C: -1] "Mostly nits. Looks great already!" (10 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/187651 (owner: Ananthrk)
[12:57:16] Wikimedia-Git-or-Gerrit, Analytics-Tech-community-metrics: Active code review users on a monthly basis - https://phabricator.wikimedia.org/T86152#1017815 (Dicortazar)
[12:58:59] Wikimedia-Git-or-Gerrit, Analytics-Tech-community-metrics: Active code review users on a monthly basis - https://phabricator.wikimedia.org/T86152#962385 (Dicortazar) We finally defined to have it in a quarter basis as far as I remember. Although it wouldn't be an issue to have it using a month period. I modi...
[13:48:03] Multimedia, Analytics: Track image context and pass information onto X-Analytics - https://phabricator.wikimedia.org/T85922#1017907 (ezachte) I can't comment on #4. As for #3 seems not highest priority, but mostly useful for WMF itself, to see via which path 'intentional views' mostly happen. These images h...
[14:11:53] Analytics-Tech-community-metrics, Wikimedia-Git-or-Gerrit: Basic metrics about contributors exercising +2/-2 permissions in Gerrit - https://phabricator.wikimedia.org/T59038#1017932 (Nemo_bis)
[16:20:35] Analytics: Backfill event logging data after 02/05 outage - https://phabricator.wikimedia.org/T88692#1018153 (Krenair)
[16:27:42] Analytics: Backfill event logging data after 02/05 outage - https://phabricator.wikimedia.org/T88692#1018175 (ggellerman)
[16:30:34] Analytics, Analytics-Kanban: Backfill event logging data after 02/05 outage - https://phabricator.wikimedia.org/T88692#1018181 (ggellerman)
[18:34:34] (Abandoned) Milimetric: Add Maven Checkstyle with default settings and disable Storm ETL module [analytics/kraken] - https://gerrit.wikimedia.org/r/78960 (owner: Diederik)
[18:56:44] milimetric: Please look at https://gerrit.wikimedia.org/r/#/c/188270/, have incorporated your suggestion
[18:58:07] merged
[19:08:40] wikimedia/mediawiki-extensions-EventLogging#348 (wmf/1.25wmf16 - 0cb444e : Reedy): The build passed.
[19:08:40] Change view : https://github.com/wikimedia/mediawiki-extensions-EventLogging/commit/0cb444e83e28
[19:08:40] Build details : http://travis-ci.org/wikimedia/mediawiki-extensions-EventLogging/builds/49641712
[19:10:39] Ellery?
[20:31:57] milimetric, got a minute for a celery question?
[20:32:03] yes
[20:32:16] So, I want to have a work queue to process diffs in pages.
[20:32:35] I'll need to make sure that if I get two revisions temporally close, that they'll be processed in order. Is that guaranteed?
[20:32:45] *two revisions to the same page*
[20:34:20] halfak: you have the freedom to structure your tasks however you want
[20:34:29] celery gives you some abstractions
[20:34:37] "chain" is one that might make sense here
[20:34:46] Gotcha.
[20:34:47] a chain is a set of tasks that must be executed in a specific order
[20:35:09] and you can pass the result from one to the next
[20:35:55] halfak: do you have a set of servers you're going to run this on?
[20:36:25] because celery won't gain you much if you're on only one machine
[20:36:28] milimetric, I haven't set it up yet, but I was planning to stand up a second processing server in the labs "persistence" project
[20:36:41] ok, so you'd make a celery cluster in labs
[20:36:48] Yeah.
[20:36:49] Bad idea?
[20:37:15] i'm hesitant that it's worth learning celery's idiosyncrasies
[20:37:37] but maybe keep that in mind as you read through the docs. It's not "easy"
[20:37:58] sorry, btw, chains: http://celery.readthedocs.org/en/latest/userguide/canvas.html#chains
[20:38:07] No doubt. Few interesting things are :) Anything in particular you'd like to warn me against?
[20:50:26] milimetric, it looks like you need to know the tasks in advance for a chain. Optimally, I'd like to create a queue of revisions for every page so that if a new revision for that page shows up while I am processing, I can just grab it off the queue and keep working. Does that make sense?
[20:51:02] I suppose that, in practice, I'll end up batching ~50 revs at a time anyway.
[20:51:48] So I could just have a task take a list of revisions that I group beforehand.
[20:52:30] hm... i'm going to read up a bit
[20:52:40] basically in-order stream processing implemented on top of celery
[20:52:58] milimetric, indeed.
[20:53:20] I'm trying to figure out how round of a hole celery is for my square peg :)
[20:57:02] ok, eesh... i'm more and more hesitant, so halfak the problems:
[20:57:06] serialization
[20:57:49] celery needs to serialize task logic and parameters across the "wire" so you'll have to think about this if your situation doesn't happen to "just work"
[20:58:19] as far as I can tell, no, you can't mess with a chain after it starts
[20:58:25] Indeed. Most everything of importance that I work with is trivially serializable because I work with multiprocessing a lot.
[20:58:31] but you could take the result of a chain as a dependency for another chain you start later
[20:58:34] so that's fine
[20:58:48] milimetric, ooh. That could work
[20:59:25] halfak: basically, celery tasks are stored in a backend of your choice
[20:59:46] so when you load them from that backend, you can inquire about their status or use them as async tasks that may or may not be "ready"
[20:59:54] either way, they're fine as a candidate for a chain
[21:00:03] however, we have run into bugs when we nested chains too far
[21:00:33] Now that I've thought about batch processing a bit, I think that'll be fine.
[21:00:40] Let's say that I want to work with redis within a task function. Good idea/bad idea?
[21:00:47] there are "max recursion limit hit" bugs, serialization problems (for stuff that serializes fine in other contexts), and mind-bendy "am I killing my connection pool or am I closing everything properly" types of questions
[21:01:04] Yeah... connection pool.
[21:01:10] That's what I'm worried about now.
[21:01:12] your task wouldn't need to know about redis
[21:01:32] OK, so it's important that the task *doesn't* know about redis.
[21:01:35] you would just write python and you'd manage your input/output schema from each python chunk
[21:02:19] yeah, the result of the task could be stored into redis or wherever you want, but it wouldn't do that directly, a task is any python function with @your_celery_instance.task decorated on it
[21:03:34] when your celery instance starts, it finds all the places you used it to decorate functions, and it serializes the code for those functions to the workers that need to be able to process those tasks
[21:04:01] Can workers manage state?
[21:04:01] then when you add a task, it picks a worker from its pool (could be distributed) and serializes the parameters you're passing to the function
[21:04:13] e.g. a database connection.
[21:04:16] you don't know about workers as the programmer
[21:04:20] you only know about your task
[21:04:36] Totally, but you know that your task is within *some* worker
[21:04:39] which, yes, can use or create connections, state, etc.
[21:05:04] right, and you could use some celery inspection stuff to figure out more details but that is hopefully never needed
[21:05:26] * halfak is going to write some psuedo-code
[21:07:12] milimetric, http://pairjam.com/#kyfg6h
[21:07:16] I'll ping when I have something ready
[21:07:22] k
[21:11:50] not very "pseudo" ;)
[21:13:13] yeah. I like my python :)
[21:13:17] best psuedo-code ever
[21:13:19] :)
[21:15:52] milimetric, OK. I think it is ready when you have a sec. No rush.
[21:29:59] (CR) Nuria: (WIP) project class/variant extraction UDF (4 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/188588 (owner: OliverKeyes)
[21:34:08] halfak: sorry I gotta work on a few other things I'm way behind on
[21:34:14] but we can chat more tomorrow?
[21:34:20] No worries. Thanks so much for the notes :)
[21:34:23] <3!!!
[21:44:08] Analytics, Wikimedia-Fundraising: Public dashboards for CentralNotice and Fundraising - https://phabricator.wikimedia.org/T88744#1019258 (awight) NEW
[22:05:33] (PS1) QChris: Re-run after refined datasets+projectcounts had delayed data added [analytics/aggregator/data] - https://gerrit.wikimedia.org/r/188904
[22:06:28] (CR) QChris: [C: 2 V: 2] "Self-merging as the team felt that such data correcting commits" [analytics/aggregator/data] - https://gerrit.wikimedia.org/r/188904 (owner: QChris)
[22:13:26] Project-Creators, Phabricator, Engineering-Community: Analytics-Volunteering and Wikidata's Need-Volunteer tags - https://phabricator.wikimedia.org/T88266#1019422 (Aklapper) @kevinator: Okay with Lydia's proposal above? (I'm also wondering where the creation of the Analytics-Volunteering project was discussed...
[23:18:00] Analytics-Wikistats: Provide total active editors for December 2014 - https://phabricator.wikimedia.org/T88403#1019667 (ezachte) I can't comment on SQL approach. As for dump based data: I'm not totally pessimistic about how long it will take to generate the missing dumps. Most dumps for December are still mi...
[23:25:45] milimetric: it's good to see another opinion that celery is not terribly straightforward
[23:27:02] Analytics-Wikistats: Provide total active editors for December 2014 - https://phabricator.wikimedia.org/T88403#1019694 (ezachte) Caveat on previous comment. This is assuming the new cycle picks up oldest dumps in a particular queue first, which is how it is supposed to work. I'm not sure why nlwiki is current...
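
Note on the 20:50-20:59 celery thread: the batching idea (group revisions per page beforehand, hand each task a list, and feed one batch's result into the next) can be sketched in plain Python. This is only an illustration, not the pairjam code and not celery itself. The names batch_revisions, process_batch and process_page are hypothetical, and the sketch assumes that sorting by revision id gives the temporal order for a page.

```python
from collections import defaultdict


def batch_revisions(revisions, batch_size=50):
    """Group (page_id, rev_id) pairs by page and cut each page's
    revisions into ordered batches of at most batch_size."""
    by_page = defaultdict(list)
    for page_id, rev_id in revisions:
        by_page[page_id].append(rev_id)

    batches = {}
    for page_id, rev_ids in by_page.items():
        # Assumption: a larger rev_id means a later revision of the page.
        rev_ids.sort()
        batches[page_id] = [
            rev_ids[i:i + batch_size]
            for i in range(0, len(rev_ids), batch_size)
        ]
    return batches


def process_batch(page_id, rev_ids, previous=None):
    """Hypothetical task body: pair each revision with the one before it.
    A real task would compute a diff; here we only record the pairing."""
    last = previous
    pairs = []
    for rev_id in rev_ids:
        pairs.append((last, rev_id))
        last = rev_id
    # Return the last revision seen so the next batch can continue from it.
    return last, pairs


def process_page(page_id, page_batches):
    """Run one page's batches strictly in order, passing the last
    revision of each batch on to the next one."""
    last = None
    all_pairs = []
    for rev_ids in page_batches:
        last, pairs = process_batch(page_id, rev_ids, previous=last)
        all_pairs.extend(pairs)
    return all_pairs


if __name__ == "__main__":
    incoming = [(1, 103), (2, 201), (1, 101), (1, 102), (2, 200)]
    batches = batch_revisions(incoming, batch_size=2)
    for page_id in sorted(batches):
        print(page_id, process_page(page_id, batches[page_id]))
```

If this were moved onto celery, process_batch would presumably be the decorated task, and the batches of one page would be linked so each starts only after the previous one finishes, which is the role milimetric describes for chains. Different pages share no state here, so they could run on separate workers.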