[08:10:03] morning from Seville! :)
[08:10:53] Hi !
[10:31:37] https://nifi.apache.org/ was called out in a talk about using apache tools in banking environment
[10:31:57] (moving data from legacy stuff to the cloud in this case, before ETL)
[10:33:09] also interesting that they are using Spark on Mesos, if I got it correctly, to read data from Kafka
[10:34:00] (CR) Gilles: Use systemd's process watchdog to trigger restarts (1 comment) [analytics/statsv] - https://gerrit.wikimedia.org/r/321230 (https://phabricator.wikimedia.org/T150359) (owner: Ori.livneh)
[11:37:43] Talk on hive optimisations: Hive 2.0 + calcite is very cool !
[12:37:13] https://eng.uber.com/ureplicator/ is Uber's mirror maker like solution
[12:37:27] (not sure if Andrew already looked at it)
[12:38:31] and they also use a variation of https://github.com/confluentinc/kafka-rest to get events from apps before batching them to kafka brokers
[12:40:06] (In reality we are just making stuff up while eating tapas :P)
[12:43:54] (CR) Elukey: "Would it make sense to take a more radical approach like exploring other kafka consumers? Or maybe packaging the last https://github.com/P" [analytics/statsv] - https://gerrit.wikimedia.org/r/321230 (https://phabricator.wikimedia.org/T150359) (owner: Ori.livneh)
[12:54:02] and they also use XFS
[12:54:14] (they noticed improvements from ext4)
[13:35:33] joal: I'm going to try and remember how to test https://gerrit.wikimedia.org/r/#/c/305989/ now :)
[14:05:52] morning
[14:15:02] (CR) Ottomata: "I'm fine with this idea! But it might also be worth trying an updated or different kafka client. We've switched to kafka-python in event" [analytics/statsv] - https://gerrit.wikimedia.org/r/321230 (https://phabricator.wikimedia.org/T150359) (owner: Ori.livneh)
[14:20:53] ottomata: I want Flink!
:P
[14:24:03] Analytics, ChangeProp, Citoid, ContentTranslation-CXserver, and 10 others: Node 6 upgrade planning - https://phabricator.wikimedia.org/T149331#2748980 (MoritzMuehlenhoff) Another nodejs-based service we're running is etherpad-lite (running etherpad.wikimedia.org), I've added @akosiaris to the tic...
[14:28:07] elukey: you do?! hahaa
[14:28:14] what is your desire?
[14:33:34] Analytics, ChangeProp, Citoid, ContentTranslation-CXserver, and 10 others: Node 6 upgrade planning - https://phabricator.wikimedia.org/T149331#2792476 (akosiaris) That's gonna be a problem. Etherpad is practically unmaintained these days. The leading developer has moved on to other projects (http...
[14:35:02] (PS3) Ori.livneh: Use systemd's process watchdog to trigger restarts [analytics/statsv] - https://gerrit.wikimedia.org/r/321230 (https://phabricator.wikimedia.org/T150359)
[14:39:44] (CR) Ori.livneh: "There really ought to be one canonical Python client that we use and it ought to be robust, so I am in favor of switching to kafka-python " [analytics/statsv] - https://gerrit.wikimedia.org/r/321230 (https://phabricator.wikimedia.org/T150359) (owner: Ori.livneh)
[14:43:22] (CR) Ottomata: "Yeah, but 'canonical' here is hard, as they change so rapidly. At this moment, I think pykafka is going to get left in the dust, but if y" [analytics/statsv] - https://gerrit.wikimedia.org/r/321230 (https://phabricator.wikimedia.org/T150359) (owner: Ori.livneh)
[14:48:54] (CR) Ori.livneh: "Ottomata, sounds good -- thanks."
[analytics/statsv] - https://gerrit.wikimedia.org/r/321230 (https://phabricator.wikimedia.org/T150359) (owner: Ori.livneh)
[14:49:11] (CR) Ori.livneh: [C: 2 V: 2] Use systemd's process watchdog to trigger restarts [analytics/statsv] - https://gerrit.wikimedia.org/r/321230 (https://phabricator.wikimedia.org/T150359) (owner: Ori.livneh)
[15:02:28] ottomata: Joking, just seen the keynote this morning from one of the devs and it looks promising
[15:02:42] especially the snapshots
[15:03:23] Joseph and I were chatting about a future with EL in Flink (and maybe no camus and oozie! :)
[15:05:36] ahhh awesome
[15:11:05] Analytics, ChangeProp, Citoid, ContentTranslation-CXserver, and 10 others: Node 6 upgrade planning - https://phabricator.wikimedia.org/T149331#2792535 (GWicke) @akosiaris, in the short term, I would propose to test if it works with node 6. The level of compatibility between 4 & 6 has been general...
[15:19:09] Analytics, ChangeProp, Citoid, ContentTranslation-CXserver, and 10 others: Node 6 upgrade planning - https://phabricator.wikimedia.org/T149331#2792545 (MoritzMuehlenhoff) There's at least anecdotal evidence that it works/worked with 6.2: https://github.com/ether/etherpad-lite/issues/2956
[15:53:06] (PS10) Addshore: WikidataArticlePlaceholderMetrics also send search referral data [analytics/refinery/source] - https://gerrit.wikimedia.org/r/305989 (https://phabricator.wikimedia.org/T142955)
[15:53:16] (PS11) Addshore: WikidataArticlePlaceholderMetrics also send search referral data [analytics/refinery/source] - https://gerrit.wikimedia.org/r/305989 (https://phabricator.wikimedia.org/T142955)
[16:00:43] trying to join team
[16:01:09] a-team: standduppp
[16:02:00] GRR trying
[16:03:05] https://myriad.apache.org/ - niceee
[16:17:17] Analytics: Update Refinery's restart documentation and Oozie alarms - https://phabricator.wikimedia.org/T150661#2792629 (elukey)
[16:18:35] nuria: --^
[16:18:43] document all
the things!
[16:18:46] :)
[16:22:05] (PS12) Addshore: WikidataArticlePlaceholderMetrics also send search referral data [analytics/refinery/source] - https://gerrit.wikimedia.org/r/305989 (https://phabricator.wikimedia.org/T142955)
[16:34:45] Analytics: Update Refinery's restart documentation and Oozie alarms - https://phabricator.wikimedia.org/T150661#2792680 (Milimetric) p:Triage>Normal
[16:35:27] Analytics-Kanban, Datasets-General-or-Unknown: pageviews files missing since yesterday 10th November - https://phabricator.wikimedia.org/T150524#2792684 (Milimetric)
[16:36:11] Analytics-Kanban, Datasets-General-or-Unknown: pageviews files missing since yesterday 10th November - https://phabricator.wikimedia.org/T150524#2788564 (Milimetric) a:elukey
[16:36:44] Analytics, Beta-Cluster-Infrastructure: Set up a fake Pageview API endpoint for the beta cluster - https://phabricator.wikimedia.org/T150483#2792693 (Milimetric) p:Triage>Normal
[16:38:13] ottomata: if you don't mind I'd like to work on statsv since I am super ignorant about the produce part of kafka
[16:38:30] (err sorry consume part in this case, but same thing)
[16:39:43] Analytics-Kanban: Document "-" page title being a special value for "no title found" in pageview API. - https://phabricator.wikimedia.org/T150249#2792712 (Milimetric) p:Triage>Normal a:Nuria
[16:41:01] Analytics, Pageviews-API: Provide weekly top pageviews stats - https://phabricator.wikimedia.org/T133575#2792717 (Milimetric) p:Triage>Normal
[16:41:59] Analytics, Analytics-Dashiki, Easy, Patch-For-Review: Dashiki breakdown layout problems.
UI - https://phabricator.wikimedia.org/T133312#2792720 (Milimetric) Open>Resolved a:Milimetric solved in the meantime
[16:42:30] Analytics: Evaluate whether to rewrite varnishkafka in python - https://phabricator.wikimedia.org/T131938#2183609 (Milimetric) p:Triage>Normal
[16:45:32] Analytics, Documentation: Document a proposal for bundling other than load-refine jobs together (see refine/diagram) - https://phabricator.wikimedia.org/T130734#2144849 (Milimetric) @JAllemandou can you explain more so we know how to prioritize?
[16:46:40] Analytics-Kanban: Fix dropdowns in metric selector in dashiki - https://phabricator.wikimedia.org/T150664#2792748 (Milimetric)
[16:48:11] Analytics: Check if we can merge maps partition into misc partition at varnishkafka level - https://phabricator.wikimedia.org/T130733#2792765 (Milimetric) p:Triage>Normal
[16:48:13] Analytics: Check if we can merge maps partition into misc partition at varnishkafka level - https://phabricator.wikimedia.org/T130733#2144836 (Milimetric) p:Triage>Normal
[16:48:41] Analytics: Check if we can merge maps partition into misc partition at varnishkafka level - https://phabricator.wikimedia.org/T130733#2144836 (Milimetric) @JAllemandou / @Ottomata : talk about this and decide if we should do this.
[16:50:00] Analytics: Run browser reports on hive monthly - https://phabricator.wikimedia.org/T118330#2792773 (Milimetric) @Krinkle : can you tell us why it would be useful to have this monthly and weekly?
[16:51:54] Analytics: Remove eventlogging code from blog. Use piwik to count pageviews - https://phabricator.wikimedia.org/T129558#2792778 (Milimetric) Open>declined metrics counted in other ways on the blog.
[16:58:23] (PS13) Addshore: WikidataArticlePlaceholderMetrics also send search referral data [analytics/refinery/source] - https://gerrit.wikimedia.org/r/305989 (https://phabricator.wikimedia.org/T142955)
[16:58:36] (CR) Addshore: [C: 1] "Verified" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/305989 (https://phabricator.wikimedia.org/T142955) (owner: Addshore)
[16:58:37] Analytics: Better redirect handling for pageview API - https://phabricator.wikimedia.org/T121912#2792803 (Milimetric) I think we can look at this more closely after we get a handle on redirects as part of the wikistats 2.0 data pipeline. Redirects are complicated on mediawiki.
[16:58:47] joal: ^^ all verified tested and working!
[17:01:42] (CR) Addshore: "Once merged it would be great to have this running for all available data in the webrequest table!" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/305989 (https://phabricator.wikimedia.org/T142955) (owner: Addshore)
[18:48:37] mforns: gonna finish lunch and ping you? You working late today?
[19:24:54] milimetric, I also was having dinner, ping me when you want, I'll work for 2 more hours
[19:24:58] (PS3) MaxSem: Count pages with geo tags [analytics/discovery-stats] - https://gerrit.wikimedia.org/r/319262 (https://phabricator.wikimedia.org/T149722)
[19:25:01] (PS1) MaxSem: Add script for hourly cronjobs [analytics/discovery-stats] - https://gerrit.wikimedia.org/r/321473 (https://phabricator.wikimedia.org/T149722)
[19:25:46] (CR) MaxSem: [C: 2 V: 2] Add script for hourly cronjobs [analytics/discovery-stats] - https://gerrit.wikimedia.org/r/321473 (https://phabricator.wikimedia.org/T149722) (owner: MaxSem)
[19:26:06] (CR) MaxSem: [C: 2 V: 2] Count pages with geo tags [analytics/discovery-stats] - https://gerrit.wikimedia.org/r/319262 (https://phabricator.wikimedia.org/T149722) (owner: MaxSem)
[20:16:28] sorry mforns you still working?
[20:16:34] milimetric, yes :]
[20:16:40] batcave?
[20:16:56] omw
[20:43:33] nuria: have you been able to test the node-rdkafka-statsd stuff with node-rdkafka + kafka cluster yet?
[20:43:53] ottomata: no, i did not try but about statsd call i asked Pchelolo
[20:44:01] nuria: i just commented about statsd call.
[20:45:19] ottomata: ok, thank you. will look in a sec, let me finish writing interview feedback
[20:46:28] k cool
[21:10:08] ottomata: ok, corrected nits
[21:10:53] ottomata: statsd stuff still remains, just let me test how does that work without kafka or anything, the docs for client are most confusing
[21:11:49] nuria: yeah, you might just need to read more about different statsd metric types
[21:11:51] not the actual node client itself
[21:13:20] not sure about my comment about set() for strings,
[21:14:14] this reads well https://blog.pkhamre.com/understanding-statsd-and-graphite/
[21:33:12] ottomata: will do
[21:35:51] ottomata : ok gauges all around, gauges are counters for which metrics like average and percentiles are of no value, correct? if they were we will be using timers
[21:38:35] ?
[21:43:19] nuria: i don't understand the q
[21:43:20] :)
[21:43:26] ottomata: nah, np
[21:43:48] yeah, librdkafka keeps quite a few of its own 'averages' etc.
[21:43:49] for some stats
[21:43:50] but not all
[21:43:54] but i think that's ok
[21:44:08] it's easier to just use gauges for numbers, instead of trying to be smart about which ones timers might be good for
[21:44:17] ottomata: right, sorry.
what i was saying is that according to graphite gauges are metrics for which averages are not useful
[21:44:38] ottomata: that is metrics that are not arround an interval of values
[21:44:39] aye, ja
[21:44:42] *around
[21:51:26] ottomata: will add couple more tests to account for NAN and we should be ready to go with my next patch, thanks for your reviews
[21:51:37] ok cool
[21:51:46] ya, then let's test in mw vagrant or in labs or something
[21:51:49] probably in labs
[21:51:53] so we can actually see stats in graphite
[22:19:59] ottomata: how do we make ezachte a db in hive?
[22:20:49] create database ezachte
[22:20:50] ;
[22:20:50] :)
[22:25:19] (PS1) Ottomata: Use kafka-python instead of pykafka [analytics/statsv] - https://gerrit.wikimedia.org/r/321550
[22:27:38] laters all!
[22:49:09] yo, analyticians: where should I look for our "top level numbers" (pageviews/mo, active editors, unique devices/mo, etc)
[22:57:19] greg-g: we're still converging those, that's our wikistats 2.0 project, but for now:
[22:57:32] pageviews: https://analytics.wikimedia.org/dashboards/vital-signs/
[22:58:08] unique devices: https://analytics.wikimedia.org/dashboards/vital-signs/#projects=eswiki,itwiki,enwiki,jawiki,dewiki,ruwiki,frwiki/metrics=UniqueDevices
[22:58:33] active editors: wikistats: https://stats.wikimedia.org/EN/TablesWikipediaEN.htm (by wiki or otherwise, it's hard to navigate)
[22:58:59] active editors in reportcard: http://reportcard.wmflabs.org/graphs/active_editors http://reportcard.wmflabs.org/
[22:59:18] (wikistats 2.0 will replace all those)
[22:59:47] greg-g: is there anything else I can assist you with today?
[23:06:52] milimetric: the pageviews one, we don't have a total number in that graph?
[23:07:18] the unique devices link didn't work :/
[23:07:32] brought me here: https://analytics.wikimedia.org/dashboards/vital-signs/#projects=eswiki,itwiki,enwiki,jawiki,dewiki,ruwiki,frwiki/metrics= which tells me to select a metric
[23:10:16] generalized question: the analytics.wikimedia.org graphs don't have totals? just broken out by project only?
[23:14:38] i got the unique devices number
[23:14:58] milimetric: I think that's it, other than needing to do basic addition in my head for the total number :P