[05:12:04] Analytics: Making geowiki data public - https://phabricator.wikimedia.org/T131280#2929889 (Ijon) I believe we discussed this verbally a few months ago, but for the record, let me record answers to Nuria's questions here as well: 1. No measure I can think of is as robust a measure of the kind of impact we ar...
[05:34:40] (CR) Gergő Tisza: "@Nuria I haven't seen similar date arithmetics for most other projects, even though they also have to deal with the same problem of inform" [analytics/refinery] - https://gerrit.wikimedia.org/r/331100 (https://phabricator.wikimedia.org/T137321) (owner: Gergő Tisza)
[15:28:44] Analytics, ChangeProp, EventBus, MediaWiki-JobQueue, and 5 others: Asynchronous processing in production: one queue to rule them all - https://phabricator.wikimedia.org/T149408#2751310 (MelodyKramer) Notes from this session: https://etherpad.wikimedia.org/p/devsummit17-asynchronous-processing
[15:30:22] !log Restart 0024519-160420145651441-oozie-oozi-C for day 2017-01-09 to see if it fails again
[15:30:23] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[16:01:57] Analytics, Pageviews-API: Yearly endpoint for the /pageviews/top API - https://phabricator.wikimedia.org/T154381#2909045 (Nuria) The monthly code still needs to be load tested to make sure we can still deliver data within SLA, once we do that we can see whether that idea can be used elsewhere.
[16:02:41] Analytics-Kanban, Easy, Google-Code-In-2016, Patch-For-Review: Add monthly request stats per article title to pageview api - https://phabricator.wikimedia.org/T139934#2930816 (Nuria) We need to load test this code to make sure results are being delivered within SLAs before making data available for...
[16:11:27] tgr|away: let's talk about this at some point this week https://gerrit.wikimedia.org/r/#/c/331100/
[17:00:36] (PS13) Nuria: [WIP] POC of loading tile data into pivot [analytics/refinery] - https://gerrit.wikimedia.org/r/327845 (https://phabricator.wikimedia.org/T151832)
[17:04:37] (PS14) Nuria: [WIP] POC of loading tile data into pivot [analytics/refinery] - https://gerrit.wikimedia.org/r/327845 (https://phabricator.wikimedia.org/T151832)
[17:06:26] (PS15) Nuria: [WIP] POC of loading tile data into pivot [analytics/refinery] - https://gerrit.wikimedia.org/r/327845 (https://phabricator.wikimedia.org/T151832)
[17:16:38] Analytics: Import 2001 wikipedia data - https://phabricator.wikimedia.org/T155014#2931068 (Milimetric)
[18:01:29] Analytics, Pageviews-API: Yearly endpoint for the /pageviews/top API - https://phabricator.wikimedia.org/T154381#2931222 (MusikAnimal)
[18:18:44] (PS16) Nuria: POC of loading tile data into pivot [analytics/refinery] - https://gerrit.wikimedia.org/r/327845 (https://phabricator.wikimedia.org/T151832)
[18:23:40] Analytics-Kanban, Discovery, Discovery-Analysis (Current work), Interactive-Sprint, Patch-For-Review: Add Maps tile usage counts as a Data Cube in Pivot - https://phabricator.wikimedia.org/T151832#2931278 (Nuria) Data is loaded (cc @JAllemandou) : https://pivot.wikimedia.org/#tiles-poc/lin...
[18:24:55] Analytics: Import 2001 wikipedia data - https://phabricator.wikimedia.org/T155014#2931291 (tstarling) In 2010, I found a backup of Wikipedia from August 2001. Aside from the usual UseModWiki database, I discovered the files diff_log and rclog which contained a previously unknown copy of every revision to the...
[18:35:26] Testing out various reportupdater reports and had one that yielded the following error: "could not be executed because of error: object of type 'NoneType' has no len()" have you ever seen that, milimetric & nuria?
[18:36:44] bearloga: I think (given that most people are at mw summit) that if you send an e-mail with your command and repro scenario to analytics-internal you might get a better response
[18:37:11] nuria: fair! forgot that was happening. will do, thanks!
[18:37:59] nuria: o/
[18:38:10] elukey: welcome to america
[18:38:16] elukey: man ...
[18:38:18] thank you!
[18:44:29] bearloga: do take a look at data on pivot, i think original data is missing project most of the time
[18:44:53] bearloga: plus also looks a bit strange, which might indicate a data issue or a bug on feature end
[18:45:20] bearloga: it's the dataset with no label in pivot
[18:48:59] nuria: taking a look now :) it's very cool! the missing project is because the most prolific user of maps right now is a pokemon go website called pkget, which is responsible for 30M+ tiles a day
[18:49:33] but only after nov 8, which is when they got blocked from using openstreetmap tiles
[18:49:42] and switched to us instead
[18:52:29] bearloga: ok, on my end i consider ticket done. Once you decide whether you want to use pivot to explore your data just remember that work needs to happen to productionize workflow
[18:53:10] bearloga: it is not a matter of just merging that changeset as there is no pipeline from hive to fill in the kartotherian table neither one
[18:53:11] nuria: got it! thank you so much for making this happen! this is so very cool!
[18:54:13] bearloga: yw, moving ticket to done, we can catch up on sync up as to next steps (just think that if you want to proceed with pivot there is a bit of work in your end)
[18:54:51] bearloga: rather I will move ticket to paused until i hear from you
[21:31:13] nuria: are you in SF this week? maybe we could talk about it tomorrow?
[22:40:28] tgr|away: nuria is coming to SF tomorrow night I think
[22:41:40] Who is the best person in the analytics team currently to review quotes for new hadoop hardware?
(I've always assumed otto but just checking ;)
[22:41:43] bearloga: sounds like a problem with your config.yaml syntax
[22:41:52] robh: yep, otto
[22:41:55] and luca
[22:42:27] Analytics, EventBus: EventBus produces non-canonical page urls - https://phabricator.wikimedia.org/T155066#2932286 (Krinkle)
[22:42:33] cool, added luca to task as well
[22:45:12] there's planned spend for hadoop nodes this month, hence the question. mainly the task is asking if the increase in core count due to cpu version change will require a corresponding increase in memory
[22:45:32] and if the current memory to core ratio is ideal or if we've learned that we need to shift the specification since the last order