[10:55:11] hello team :]
[10:58:23] o/
[11:04:46] Analytics: Add webrequest_stats to Druid in order to explore it with Pivot - https://phabricator.wikimedia.org/T150844#2798536 (elukey)
[11:04:53] Analytics: Add webrequest_stats to Druid in order to explore it with Pivot - https://phabricator.wikimedia.org/T150844#2798551 (elukey) p:Triage>High
[11:53:21] Analytics, Editing-Analysis: Move contents of ee-dashboards to edit-analysis.wmflabs.org - https://phabricator.wikimedia.org/T135174#2798699 (mforns) @HJiang-WMF Awesome! Feel free to set up a meeting to discuss Dashiki. Thanks!
[11:57:35] Analytics, Editing-Analysis: Move contents of ee-dashboards to edit-analysis.wmflabs.org - https://phabricator.wikimedia.org/T135174#2798704 (mforns) @HJiang-WMF Oh! Sorry. First we Analytics should add a reportupdater trigger in puppet to run your queries. Please, add me as a reviewer to the gerrit pat...
[15:38:27] Analytics, Research-and-Data: Hash IPs on webrequest table - https://phabricator.wikimedia.org/T150545#2789216 (Milimetric) strong +1 on this, given recent reflection
[16:00:54] mforns: helloooooo
[16:01:00] standup?
[16:01:05] nuria, trying!
[16:20:09] Analytics-Kanban: Spamy - User-like pages distort our pageview metrics (they return 200 when they should return 404) - https://phabricator.wikimedia.org/T145922#2799461 (Nuria)
[16:33:40] Analytics, Research-and-Data: Hash IPs on webrequest table - https://phabricator.wikimedia.org/T150545#2799535 (leila) @Milimetric I will provide a list of use-cases I will find next week. In the mean time, I want to point out that there is at least once use-case that will need raw IP and that's any wor...
[16:36:25] Analytics, Research-and-Data: Hash IPs on webrequest table - https://phabricator.wikimedia.org/T150545#2799549 (Milimetric) @leila: for those cases, it would be good to know if 30 days of data are good enough.
We could use the raw data in special cases if they're the minority. But yes, if enough other...
[16:40:06] Analytics, Research-and-Data: Hash IPs on webrequest table - https://phabricator.wikimedia.org/T150545#2799555 (leila) @Milimetric sure. I'll keep an eye on the length needed as well (of course for the ISP case and the specific case of censorship research, the longer will be the better or the research wi...
[16:46:03] mforns: when you got time?
[16:46:17] milimetric, hey :] was going to ping you
[16:46:29] how about 30 mins before the meeting?
[16:46:49] I kinda have to prepare the interview now and then SoS
[16:47:48] no prob, I'll look at druid in the meantime and prepare, just ping me when you're around
[16:48:29] milimetric, do you plan to show druid queries to them? or just pivot?
[16:52:21] not sure if the data's loaded properly. If it is, we can show Hive, compare the query complexity, explain the thought behind the schema and the new types of things available like revert, delete, etc.
[16:52:47] and if we have time after that, go into Pivot a bit, with maybe Druid queries. I don't want to emphasize Druid because it's not very friendly to have to learn a new language
[16:53:00] I'll see how nicely plywood works for some of our more complex queries
[16:54:20] milimetric, I agree with not showing plywood, that would scare them away
[16:54:36] plywood should be the same as clickhouse
[16:54:47] it's pure druid json query that is scary to me
[16:55:36] milimetric, and also we can describe the current/latest fields
[16:56:16] that's what I was thinking with "explain the thought behind the schema and the new types of things available like revert, delete, etc."
[16:56:25] milimetric, ok :]
[16:56:54] lemme know if you think of anything else we could show, I'll just jot a rough outline on paper and we can keep it casual
[18:24:33] (CR) Ottomata: Use kafka-python instead of pykafka (2 comments) [analytics/statsv] - https://gerrit.wikimedia.org/r/321550 (owner: Ottomata)
[18:24:46] (PS2) Ottomata: Use kafka-python instead of pykafka [analytics/statsv] - https://gerrit.wikimedia.org/r/321550
[18:53:40] milimetric, hey done with SoS, wanna prepare the meeting?
[18:54:20] mforns: yes, goin to the cave
[18:58:42] (PS1) Ottomata: Use argparse to make statsv configurable [analytics/statsv] - https://gerrit.wikimedia.org/r/321911 (https://phabricator.wikimedia.org/T150765)
[19:33:55] niceee --^
[19:40:01] (CR) Elukey: "I like it but I'd add one last thing, namely a clear program structure like:" (2 comments) [analytics/statsv] - https://gerrit.wikimedia.org/r/321911 (https://phabricator.wikimedia.org/T150765) (owner: Ottomata)
[20:03:18] mforns_brb: cave?
[20:08:48] ottomata: : can you make your homedir on kafka401 world readable so i can get our work from yesterday?
[20:09:34] ottomata: thank youuuuuu
[20:19:10] ah,i thought i did?
[20:19:13] oh the home dir isn't
[20:19:13] ah
[20:19:43] nuria: done, sorr
[20:19:44] y
[20:19:54] i made ~otto/rdkafka-test readable, but not ~otto
[20:19:56] done
[20:20:05] didn't realize they weren't readable on labs
[20:20:13] ottomata: np, will try to do code and testing today, just want to add some unit tests for char replacement
[20:43:26] milimetric, back
[20:43:57] cave?
[20:49:49] (PS1) Joal: [WIP] Example flink job validating event-logging events [analytics/refinery/source] - https://gerrit.wikimedia.org/r/321936
[20:53:38] (CR) Ottomata: "Yeah, but to do so would require more restructuring than I am willing to do right now.
The process_queue and other methods use globals de" (2 comments) [analytics/statsv] - https://gerrit.wikimedia.org/r/321911 (https://phabricator.wikimedia.org/T150765) (owner: Ottomata)
[20:53:53] (PS2) Ottomata: Use argparse to make statsv configurable [analytics/statsv] - https://gerrit.wikimedia.org/r/321911 (https://phabricator.wikimedia.org/T150765)
[20:55:07] joal: :p
[20:55:39] ottomata: ;)
[20:55:48] ottomata: just for fun :D
[20:58:01] :)
[21:01:18] ottomata: but fighting with different things (https proxy first, no guava versions in shaded jars ... mwarf ;)
[21:01:41] aye :)
[21:05:19] (CR) Elukey: "Yeah agreed, but we are interleaving scripting stuff with Class declarations, readability is not super good. I don't think it would be tha" [analytics/statsv] - https://gerrit.wikimedia.org/r/321911 (https://phabricator.wikimedia.org/T150765) (owner: Ottomata)
[21:34:07] hey, so you have to list all the wikis in reportupdater config explicitly? no way to just say "I want everything"?
[21:40:28] ping milimetric ^
[21:48:00] MaxSem: there is, one sec lemme find example
[21:48:24] MaxSem: but usually people don't really want everything - nobody ever looks at the output and it means 800 extra queries that probably don't need to happen
[21:49:13] heh
[21:49:22] MaxSem: so this is using a file (in the same directory) to list out the needed wikis: https://github.com/wikimedia/analytics-limn-edit-data/blob/master/edit/config.yaml#L47
[21:54:28] MaxSem: yeah, so we could add an all-wikis option, but we decided we'd like people to manage their own list so we can be sure they really need everything. But you can just copy paste files like this https://github.com/wikimedia/analytics-limn-edit-data/blob/master/edit/wikis.txt or output "show databases;" from analytics-store.eqiad.wmnet and pick what you
[21:54:28] need from there.
[21:55:47] milimetric, thanks - I have food for thoughts :)
[21:57:49] cool
[22:07:36] hmm, ERROR - Report "event" could not be written because of error: 'NoneType' object has no attribute 'strftime'
[22:09:30] so it created bogus TSVs until I fixed the queries, but then couldn't digest these files trying to update
[23:00:29] (PS1) MaxSem: WIP: reportupdater queries for EventLogging [analytics/discovery-stats] - https://gerrit.wikimedia.org/r/322007 (https://phabricator.wikimedia.org/T147034)
[23:53:41] (CR) Ori.livneh: [C: 2] Use kafka-python instead of pykafka [analytics/statsv] - https://gerrit.wikimedia.org/r/321550 (owner: Ottomata)