[07:28:05] (PS3) Joal: Make mediawiki-history-reduced data permanent [analytics/refinery] - https://gerrit.wikimedia.org/r/427948 (https://phabricator.wikimedia.org/T192482)
[08:10:10] Analytics, Patch-For-Review, User-Elukey: Update druid to latest release (0.11) - https://phabricator.wikimedia.org/T164008#4148842 (elukey) Pivot deployed on d-1, usable via: ``` ssh -L 9090:localhost:9090 d-1.analytics.eqiad.wmflabs -N ``` Data cubes need probably to be tuned/fixed?
[08:11:28] Cluster 'druid' encountered and error during SourceListRefresh: only druid versions >= 0.8.0 are supported
[08:11:34] good morning :)
[08:17:20] Morning elukey
[08:17:37] mwarf mwarf - Pivot not happy
[08:20:00] (CR) Joal: [V: 1] "Tested on cluster." [analytics/refinery] - https://gerrit.wikimedia.org/r/427948 (https://phabricator.wikimedia.org/T192482) (owner: Joal)
[08:21:51] elukey: this diff of version is weird ... shouldn't 0.10.0 be higher than 0.8.0?
[08:23:18] joal: I was brewing coffee sorry :)
[08:23:23] o/
[08:24:10] so I didn't check pivot's code but I am pretty sure that there is a weird check like if version <= 0.9.something print error bla bla bla
[08:29:23] my js fu is horrible :D
[08:29:46] in cluster-manager.ts (on d-1) I can see
[08:29:47] logger.error(`Cluster '${cluster.name}' encountered and error during SourceReintrospect: ${e.message}`);
[08:30:13] line 231
[08:33:48] ufff it is difficult to find where the error comes from
[08:35:42] joal: https://groups.google.com/forum/#!topic/imply-user-group/M5xsGIqMmBo
[08:40:22] ah ./build/public/pivot-main.295b2ca1b5938e9d94fe.js line 5780 originates the error
[08:40:58] 5779 if (this.version && External.versionLessThan(this.version, minVersion)) {
[08:41:01] 5780 throw new Error("only " + this.engine + " versions >= " + minVersion + " are supported");
[08:41:04] 5781 }
[08:43:20] ahahhah it workkksssss
[08:43:25] I commented those lines
[08:43:42] ah noooooooo
[08:43:48] same error again
[08:43:51] * 
elukey cries
[08:44:13] but I probably didn't fix it properly
[08:48:58] I'd need the help of our JS masters :D
[08:53:45] fdans: hola! are you around?
[08:54:15] elukey: hello!!
[08:55:52] o/
[08:55:56] morning :)
[08:56:07] so if you have time I'd need a JS consult/hacking from you
[08:58:10] wanna cave elukey ?
[08:58:14] sure
[09:04:32] fdans, elukey : any chance?
[09:18:10] Analytics, Analytics-Wikistats: Please install JSON.pm at stat1005 for Wikistats_1 - https://phabricator.wikimedia.org/T192760#4149025 (ezachte)
[09:42:42] joal: it seems to work now!
[09:43:46] joal: basically what we need to do is comment a line in node_modules/plywood/build/plywood.js
[09:44:03] if you want to check it now: ssh -L 9090:localhost:9090 d-1.analytics.eqiad.wmflabs -N
[09:44:41] you guys rock :)
[09:44:44] Testing now
[09:49:54] Analytics, Patch-For-Review, User-Elukey: Update druid to latest release (0.11) - https://phabricator.wikimedia.org/T164008#4149110 (elukey) Me and the JS master @fdans checked the Pivot's code this morning, and after a lot of tests we identified what returns the error: node_modules/plywood/build/pl...
[09:51:15] elukey, fdans - Works for me !
[09:51:19] Yay
[09:51:30] \o/
[09:51:48] joal: I put the explanation in the task ==^
[09:53:32] Analytics, Patch-For-Review, User-Elukey: Update druid to latest release (0.11) - https://phabricator.wikimedia.org/T164008#4149115 (JAllemandou) +1 for commenting the global check :)
[09:55:15] so now I am nerd sniped
[09:55:29] I am running the function on a nodejs online compiler
[09:55:36] and with 0.9.2 it returns "false"
[09:55:43] with 0.10.0 "true"
[09:59:51] elukey: hm - this is super weirdoh!
[10:01:03] console.log("10" < "8"); --> true
[10:01:09] * elukey cries
[10:01:29] console.log(parseInt("10",10) < parseInt("8",10)); --> false
[10:01:33] lolololololo
[10:02:03] Of course :)
[10:02:10] Strings vs integers
[10:02:38] Analytics, Patch-For-Review, User-Elukey: Update druid to latest release (0.11) - https://phabricator.wikimedia.org/T164008#4149131 (elukey) ``` console.log("10" < "8"); console.log(parseInt("10",10) < parseInt("8",10)); console.log("9" < "8"); console.log(parseInt("9",10) < parseInt("8",10)); true...
[10:02:49] ok now I feel better :D
[10:03:13] ooook so pivot works, last one is checking the monitoring metrics but it should be fine (and probably not a big blockers)
[10:03:16] *blocker
[10:04:32] elukey: Do I let you go for that ? You'll surely be faster than me
[10:09:25] joal: so I'd need to check the prometheus agent for metrics, but it should be fine.. do you have other ideas?
[10:10:20] elukey: Do you wish me to start a realtime job for you to check, or we'll assume it'll work?
[10:10:35] elukey: except from that, no real other idea
[10:11:23] mmm so I am using curl on d-1 and I don't see druid's metrics
[10:11:25] uff
[10:13:26] of course I don't see them among the jmx ones
[10:13:31] hahaha
[10:13:34] slow monday :)
[10:32:15] joal: I am adding the druid exporter's config in hiera and restarting the labs cluster
[10:32:39] elukey: ack!
[10:32:43] after that if you could test queries, indexation, realtime etc.. it would be great (anytime this afternoon)
[10:35:12] yess metrics are flowing
[10:35:13] goood
[10:35:50] so now I am going to lunch + errand, will bb in ~2h
[10:35:51] elukey: I can launch some more stuff
[10:35:55] sure!
[10:35:57] elukey: shall I go for that now?
[10:36:04] whenever you want, all ready
[10:36:12] what are you expected us to test? metrics checks?
[10:36:52] joal: I'll triple check that the prometheus metrics reported will be ok (like no duplicates, no strange numbers etc..)
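Note on the "Strings vs integers" exchange at 10:01-10:02: the snippet below is a minimal sketch of why a string-based comparison rejects 0.10.0 while accepting 0.9.2, and of one way to compare dotted versions numerically. The `versionLessThan` function here is a hypothetical stand-in written for illustration; it is not Plywood's or Pivot's actual implementation, and it is not necessarily the fix that was merged in the Gerrit change.

```javascript
// Hypothetical helper for illustration only (NOT the Plywood/Pivot source):
// compares two dotted version strings segment by segment as integers.
function versionLessThan(a, b) {
  const pa = a.split(".").map((s) => parseInt(s, 10));
  const pb = b.split(".").map((s) => parseInt(s, 10));
  const len = Math.max(pa.length, pb.length);
  for (let i = 0; i < len; i++) {
    // Missing or non-numeric segments are treated as 0 (an assumption).
    const x = pa[i] || 0;
    const y = pb[i] || 0;
    if (x !== y) return x < y;
  }
  return false; // equal versions are not "less than"
}

// Strings are compared character by character, so "1" sorts before "8".
console.log("10" < "8");
// Numeric comparison gives the intended ordering.
console.log(versionLessThan("0.10.0", "0.8.0"));
console.log(versionLessThan("0.9.2", "0.8.0"));
```

With a numeric comparison, both 0.9.2 and 0.10.0 pass a `>= 0.8.0` check, which matches what the chat expected from the version guard.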
[10:37:02] if you could just trigger events that generate metrics
[10:37:10] especially stuff like realtime
[10:37:16] so we'll test them all
[10:37:26] but afaics all seems good
[10:37:35] elukey: they'll be sparse, but I'll launch 1 hadoop indexation, and 1 real time job for some time
[10:37:39] ok for you?
[10:37:44] I'll note the times :)
[10:38:18] sure :)
[10:38:34] ok ! Later ;)
[10:39:54] o/
[12:52:07] joal: metrics look fine afaics, we should be ok!
[12:52:29] if we manage to remove the pivot faulty check today we can think about migrating tomorrow
[12:56:21] (PS1) Elukey: Comment Druid version check not compatible with 0.10.0+ [analytics/pivot/deploy] - https://gerrit.wikimedia.org/r/428331 (https://phabricator.wikimedia.org/T164008)
[12:57:43] (CR) Elukey: "JS Masters: not sure if this is the right way, let me know otherwise how to best fix it :)" [analytics/pivot/deploy] - https://gerrit.wikimedia.org/r/428331 (https://phabricator.wikimedia.org/T164008) (owner: Elukey)
[13:02:51] elukey: I wanted to wait for you to start realtime
[13:02:54] We do now
[13:03:13] Analytics, Patch-For-Review, User-Elukey: Update druid to 0.10 - https://phabricator.wikimedia.org/T164008#4149631 (elukey) a:Ottomata>elukey
[13:03:28] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Update druid to 0.10 - https://phabricator.wikimedia.org/T164008#3218113 (elukey)
[13:04:59] joal: ah thanks! You can start anytime
[13:05:25] elukey: job started, metrics should start flowing for dataset webrequest_live anytime
[13:09:01] (CR) Mforns: [WIP] Add jobs for druid indexing of virtualpageviews (2 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/427696 (https://phabricator.wikimedia.org/T192305) (owner: Mforns)
[13:12:47] elukey: the job that was working yesterday seems to fail today :(
[13:12:57] elukey: the realtime job, meaning
[13:17:15] joal: argh
[13:17:19] what does it say?
[13:17:28] elukey: anything changed on teh druid side?
[13:17:40] elukey: as if the job didn't manage to connect to druid
[13:17:47] elukey: will try to shake it
[13:18:57] all the daemons got restarted to pick up monitoring settings this morning, nothing more
[13:19:23] let me know what endpoint:port is not working
[13:21:26] (going to grab a quick coffee)
[13:25:58] I can see banner logs though
[13:27:38] druid_realtime_ingest_events_processed_count{datasource="webrequest_live"} 17.0
[13:27:39] elukey: Sorry - It was as usual a PBKAC
[13:27:44] yessir
[13:27:44] \o/
[13:27:50] :)
[13:27:54] joal: we are readyz
[13:27:55] elukey: ready to try
[13:27:59] indeed
[13:28:13] elukey: how do you wish to proceed?
[13:28:36] joal: so first it would be great to review/merge/etc.. the "fix" for pivot (https://gerrit.wikimedia.org/r/428331)
[13:29:18] then http://druid.io/docs/0.10.0/operations/rolling-updates.html
[13:29:36] but we should do one host every hour
[13:29:47] since we need to disable the middlemanagers gracefully
[13:30:44] (CR) Joal: [C: 1] "LGTM !" [analytics/pivot/deploy] - https://gerrit.wikimedia.org/r/428331 (https://phabricator.wikimedia.org/T164008) (owner: Elukey)
[13:31:07] works for me elukey
[13:32:23] joal: I'd also need to work on https://gerrit.wikimedia.org/r/#/c/355471/, since at the time we had only one cluster. I re-adapted it for our current use case buuuut we'd need to upgrade both clusters at the same time
[13:32:29] that is probably not the best
[13:32:42] I think analytics goes first, then we leave a couple of days to find horrible things
[13:32:49] and then we migrate public
[13:32:53] what do you think?
[13:33:54] elukey: doing the 2 clusters separately seems good to me :)
[13:44:52] Analytics, User-Elukey: Reimage stat1004 with Debian Stretch - https://phabricator.wikimedia.org/T192640#4149757 (Ottomata) Into it, but I think I'd like to wait until we get all the workers upgraded to stretch before we do stat1004.
[13:47:12] Analytics, Analytics-Wikistats: Please install JSON.pm at stat1005 for Wikistats_1 - https://phabricator.wikimedia.org/T192760#4149760 (Ottomata) Ah great! Was going to ask for a phab task. https://gerrit.wikimedia.org/r/#/c/428337/
[13:47:31] Analytics-Kanban, Analytics-Wikistats: Please install JSON.pm at stat1005 for Wikistats_1 - https://phabricator.wikimedia.org/T192760#4149761 (Ottomata)
[13:48:24] (CR) Fdans: [V: 2 C: 1] Comment Druid version check not compatible with 0.10.0+ [analytics/pivot/deploy] - https://gerrit.wikimedia.org/r/428331 (https://phabricator.wikimedia.org/T164008) (owner: Elukey)
[13:48:43] Analytics, User-Elukey: Reimage stat1004 with Debian Stretch - https://phabricator.wikimedia.org/T192640#4149763 (elukey)
[13:50:41] milimetric: this looks promising https://twitter.com/sarah_edo/status/988414671232339970
[13:50:56] what a stupidly huge preview
[13:54:46] !log reimage analytics1067 to debian stretch
[13:54:49] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[13:57:15] joal: on an1067 there are two spark containers
[13:57:27] are you going to scream and hit me if I shutdown the node? :D
[13:57:28] elukey: kiling thelm
[13:57:40] Should be gone elukey
[13:57:44] thanks a lot!
[13:57:46] :)
[13:57:53] fdans: that's sweet, yeah, make a phab task like "Improve responsive experience" and link it
[13:58:17] joal: during the next couple of months, let me know if me or Andrew should stop reimaging to avoid hitting important stuff
[13:58:46] elukey: how fast will we reimage?
[13:59:19] Analytics, Analytics-Kanban, Patch-For-Review: Reimage the Debian Jessie Analytics worker nodes to Stretch. - https://phabricator.wikimedia.org/T192557#4149804 (ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on neodymium.eqiad.wmnet for hosts: ``` ['analytics1067.eqiad.wmnet'] ```...
[13:59:40] joal: it takes ~1h for me to complete one host (not actively working on it all the time, most of it is waiting)
[14:00:06] Andrew might take half of it probably, so say between 30/60 mins :D
[14:00:25] and this is of course keeping data
[14:00:30] on the datanode partitions
[14:00:33] o/
[14:00:36] ottomata: o/
[14:01:00] elukey: we can probably do 2 at at time, ya?
[14:02:24] ottomata: sure sure :)
[14:03:41] elukey: my question was somehow broader: you're coouple of months -- Are we planning for the thing to last months?
[14:08:16] joal: so me and Andrew decided to reimage some hosts this quarter when we have some "spare time"
[14:08:27] it is not part of the goals so very low priority
[14:08:36] and since most of it is just background waitin
[14:08:39] makes sense elukey (sorry to have not noticed)
[14:08:41] so yeah, could be a month or two
[14:08:45] no prob :)
[14:09:01] elukey: i'm going to switch the main -> analytics mirrormaker to --new.consumer.
[14:09:10] ottomata: go go go go
[14:09:22] i'm going to leave one instance of old consumer running until one instance of new consumer is up. that will cause a few duplicatesa while they are both running
[14:09:29] but that will be better than a few missing messages
[14:10:20] !log switching main -> analytics MirrorMaker to --new.consumer (temporarily stopping puppet on kafka101[234]) https://phabricator.wikimedia.org/T192387
[14:10:29] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:35:47] Analytics, Analytics-Kanban, Patch-For-Review: Reimage the Debian Jessie Analytics worker nodes to Stretch. - https://phabricator.wikimedia.org/T192557#4149936 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['analytics1067.eqiad.wmnet'] ``` and were **ALL** successful.
[14:51:35] Analytics-Kanban, WMDE-Analytics-Engineering, Patch-For-Review, User-GoranSMilovanovic: SparkR on Spark 2.3.0 - Testing on Large Data Sets - https://phabricator.wikimedia.org/T192348#4150007 (JAllemandou) Hi @GoranSMilovanovic , The problem I see in your code is that you instanciate the dataframe...
[14:52:22] (PS3) Mforns: Add jobs for druid indexing of virtualpageviews [analytics/refinery] - https://gerrit.wikimedia.org/r/427696 (https://phabricator.wikimedia.org/T192305)
[14:52:57] elukey: cool, we got it good https://grafana-admin.wikimedia.org/dashboard/db/kafka-mirrormaker-new-consumer?refresh=5m&orgId=1&var-datasource=eqiad%20prometheus%2Fops&var-lag_datasource=eqiad%20prometheus%2Fops&var-mirror_name=main-eqiad_to_eqiad
[14:53:04] elukey: I also had a reminder for us to discuss T190213 for druid
[14:53:07] FYI the destination cluster name here is 'eqiad' for those historial 'reasons' :)
[14:53:12] instead of analytics
[14:54:41] Analytics-Kanban: Vet new geo wiki data - https://phabricator.wikimedia.org/T191343#4102098 (Milimetric) First notes about vetting are up here: https://wikitech.wikimedia.org/wiki/Analytics/Systems/Geowiki
[14:55:05] ottomata: \o/
[14:55:16] Analytics-Kanban, Analytics-Wikistats: Please install JSON.pm at stat1005 for Wikistats_1 - https://phabricator.wikimedia.org/T192760#4150019 (Ottomata)
[14:55:39] elukey: i'm going to rename that dashboarda nd get rid of the old one
[14:56:41] I was about to say that, thanks
[14:57:01] let's also make sure afterwards that the dashboard_liks template for the alarms is good
[14:58:38] (PS4) Mforns: Add jobs for druid indexing of virtualpageviews [analytics/refinery] - https://gerrit.wikimedia.org/r/427696 (https://phabricator.wikimedia.org/T192305)
[15:10:40] (PS5) Mforns: Add jobs for druid indexing of virtualpageviews [analytics/refinery] - https://gerrit.wikimedia.org/r/427696 (https://phabricator.wikimedia.org/T192305)
[15:10:55] (CR) 
Ottomata: [C: 1] Add all changes to repo since Feb 2016 [analytics/ua-parser/uap-core] - https://gerrit.wikimedia.org/r/427415 (https://phabricator.wikimedia.org/T192465) (owner: Fdans)
[15:15:03] (CR) Milimetric: Comment Druid version check not compatible with 0.10.0+ (1 comment) [analytics/pivot/deploy] - https://gerrit.wikimedia.org/r/428331 (https://phabricator.wikimedia.org/T164008) (owner: Elukey)
[15:22:42] (PS2) Elukey: Fix Druid version check not compatible with 0.10.0+ [analytics/pivot/deploy] - https://gerrit.wikimedia.org/r/428331 (https://phabricator.wikimedia.org/T164008)
[15:29:05] (CR) Elukey: "Still testing it in labs, I hate JS :D" [analytics/pivot/deploy] - https://gerrit.wikimedia.org/r/428331 (https://phabricator.wikimedia.org/T164008) (owner: Elukey)
[15:34:16] (PS3) Elukey: Fix Druid version check not compatible with 0.10.0+ [analytics/pivot/deploy] - https://gerrit.wikimedia.org/r/428331 (https://phabricator.wikimedia.org/T164008)
[15:39:12] (PS4) Elukey: Fix Druid version check not compatible with 0.10.0+ [analytics/pivot/deploy] - https://gerrit.wikimedia.org/r/428331 (https://phabricator.wikimedia.org/T164008)
[15:42:37] (PS5) Elukey: Fix Druid version check not compatible with 0.10.0+ [analytics/pivot/deploy] - https://gerrit.wikimedia.org/r/428331 (https://phabricator.wikimedia.org/T164008)
[15:42:59] milimetric: --^ :D
[15:45:50] ping ottomata milimetric fdans joal
[15:45:53] grossskinnn
[15:47:07] OO
[15:53:11] Analytics: Upgrade Analytics infrastructure to Debian Stretch - https://phabricator.wikimedia.org/T192642#4150270 (fdans) p:Normal>High
[15:53:32] Analytics: Reimage thorium to Debian Stretch - https://phabricator.wikimedia.org/T192641#4150273 (fdans) p:Triage>Normal
[15:53:52] Analytics, Analytics-Kanban: Upgrade Analytics infrastructure to Debian Stretch - https://phabricator.wikimedia.org/T192642#4145595 (fdans)
[15:53:59] Analytics, User-Elukey: 
Reimage stat1004 with Debian Stretch - https://phabricator.wikimedia.org/T192640#4150276 (fdans) p:Triage>Normal
[15:54:09] Analytics, User-Elukey: Reimage stat1004 with Debian Stretch - https://phabricator.wikimedia.org/T192640#4145568 (fdans) p:Normal>High
[15:54:15] Analytics: Reimage thorium to Debian Stretch - https://phabricator.wikimedia.org/T192641#4145584 (fdans) p:Normal>High
[15:54:21] Analytics, User-Elukey: Upgrade Druid nodes (1001->1006) to Debian Stretch - https://phabricator.wikimedia.org/T192636#4150283 (fdans) p:Triage>Normal
[15:54:26] Analytics, User-Elukey: Upgrade Druid nodes (1001->1006) to Debian Stretch - https://phabricator.wikimedia.org/T192636#4145484 (fdans) p:Normal>High
[15:54:42] Analytics, User-Elukey: Upgrade Archiva (meitnerium) to Debian Stretch - https://phabricator.wikimedia.org/T192639#4150289 (fdans) p:Triage>Normal
[15:54:55] Analytics, Reading List Service, Reading-Infrastructure-Team-Backlog, Wikipedia-Android-App-Backlog, and 2 others: Enable Reading List Syncing usage stats - https://phabricator.wikimedia.org/T191859#4150290 (fdans) p:Normal>High
[15:55:13] Analytics, Analytics-Kanban: Upgrade Analytics infrastructure to Debian Stretch - https://phabricator.wikimedia.org/T192642#4150294 (fdans) p:High>Normal
[15:55:46] Analytics, User-Elukey: Reimage stat1004 with Debian Stretch - https://phabricator.wikimedia.org/T192640#4145568 (fdans) p:High>Normal
[15:55:48] Analytics: Reimage thorium to Debian Stretch - https://phabricator.wikimedia.org/T192641#4150299 (fdans) p:High>Normal
[15:55:52] Analytics, User-Elukey: Upgrade Druid nodes (1001->1006) to Debian Stretch - https://phabricator.wikimedia.org/T192636#4145484 (fdans) p:High>Normal
[15:56:55] Analytics, Analytics-Wikistats: Adding ranks to the map tooltip - https://phabricator.wikimedia.org/T191141#4150313 (Nuria) Open>declined
[16:04:20] Analytics: 
Add a --dry-run option to the sqoop script - https://phabricator.wikimedia.org/T188556#4150347 (fdans)
[16:07:50] Analytics: Archive old geowiki data (editors per country) and make it easily available at WMF - https://phabricator.wikimedia.org/T190856#4085654 (fdans) For this to happen we need to : - Sqoop the old geowiki tables into hadoop - Creating job to ingest data into druid - Remove old data.
[16:10:53] Analytics: Archive old geowiki data (editors per country) and make it easily available at WMF - https://phabricator.wikimedia.org/T190856#4150421 (fdans)
[16:11:16] Analytics, Analytics-Kanban: Archive old geowiki data (editors per country) and make it easily available at WMF - https://phabricator.wikimedia.org/T190856#4085654 (fdans)
[16:12:37] Analytics, Analytics-Kanban: Archive old geowiki data (editors per country) and make it easily available at WMF - https://phabricator.wikimedia.org/T190856#4085654 (fdans)
[16:12:39] Analytics: Turn off old geowiki jobs - https://phabricator.wikimedia.org/T190059#4150437 (fdans)
[16:12:41] Analytics-Kanban: Private geo wiki data in new analytics stack - https://phabricator.wikimedia.org/T176996#4150435 (fdans)
[16:14:22] Analytics: Turn off old geowiki jobs - https://phabricator.wikimedia.org/T190059#4061513 (fdans)
[16:17:15] Analytics, Easy: Productionize job for Global Innovation Index from Hadoop Geowiki data - https://phabricator.wikimedia.org/T190535#4150456 (fdans)
[16:19:14] Analytics, Analytics-Data-Quality, Datasets-Webstatscollector, Language-Team: Add alarms for high volume of views to pages with replacement characters - https://phabricator.wikimedia.org/T117945#4150471 (mforns) p:High>Normal
[16:19:26] Analytics: Alarm on data quality issues - https://phabricator.wikimedia.org/T159840#4150472 (mforns) p:High>Normal
[16:19:49] Analytics, Analytics-Kanban, EventBus, Services (watching): Upgrade Kafka on main cluster with security features 
- https://phabricator.wikimedia.org/T167039#4150473 (mforns) p:Normal>Triage
[16:19:52] Analytics, Analytics-Kanban, EventBus, Services (watching): Upgrade Kafka on main cluster with security features - https://phabricator.wikimedia.org/T167039#3315367 (mforns) p:Triage>Normal
[16:20:06] Analytics, Analytics-Cluster, Patch-For-Review: Port Kafka clients to new jumbo cluster - https://phabricator.wikimedia.org/T175461#4150475 (mforns) p:Normal>Triage
[16:20:09] Analytics, Analytics-Cluster, Patch-For-Review: Port Kafka clients to new jumbo cluster - https://phabricator.wikimedia.org/T175461#3594088 (mforns) p:Triage>Normal
[16:20:52] Analytics, Analytics-Kanban: Add a --dry-run option to the sqoop script - https://phabricator.wikimedia.org/T188556#4150480 (Nuria)
[16:21:39] Analytics, Reading List Service, Reading-Infrastructure-Team-Backlog, Wikipedia-Android-App-Backlog, and 2 others: Enable Reading List Syncing usage stats - https://phabricator.wikimedia.org/T191859#4150481 (mpopov) >>! In T191859#4146991, @Nuria wrote: > Note retention of this data is subjected...
[16:22:16] Analytics: Clickstream dataset for Persian Wikipedia only includes external values - https://phabricator.wikimedia.org/T191964#4150484 (Nuria) a:Nuria
[16:41:42] Analytics, Pageviews-API: Filter top pages by namespace/category - https://phabricator.wikimedia.org/T182975#3840107 (Milimetric) p:Triage>Low
[16:41:50] Analytics-Tech-community-metrics, Code-Health, Release-Engineering-Team (Kanban): Develop canonical/single record of origin, machine readable list of all repos deployed to WMF sites. 
- https://phabricator.wikimedia.org/T190891#4150561 (greg) p:Triage>Normal
[16:42:27] Analytics, Analytics-Wikistats: Allow namespace selection on Top Viewed Articles - https://phabricator.wikimedia.org/T182964#4150565 (Milimetric)
[16:42:34] Analytics, Pageviews-API: Filter top pages by namespace/category - https://phabricator.wikimedia.org/T182975#3840107 (Milimetric)
[16:42:47] Analytics, Analytics-Wikistats: Allow namespace selection on Top Viewed Articles - https://phabricator.wikimedia.org/T182964#3839881 (Milimetric) p:Triage>Low
[16:43:21] Analytics, Analytics-Wikistats: Use line charts when breaking down a column chart in Wikistats2 - https://phabricator.wikimedia.org/T189200#4150571 (Milimetric) p:Triage>Normal
[16:43:29] Analytics, Analytics-Wikistats: Use line charts when breaking down a column chart in Wikistats2 - https://phabricator.wikimedia.org/T189200#4034779 (Milimetric) p:Normal>High
[16:45:40] ottomata: https://gerrit.wikimedia.org/r/428372 seems nice, afaics we have NUMA hw on the Hadoop worker nodes
[16:47:01] Analytics, Analytics-Dashiki: publish mediawiki deployments as a metric tsv - https://phabricator.wikimedia.org/T189156#4033271 (Milimetric) p:Triage>Normal
[16:47:05] Analytics, Analytics-Dashiki: Optionally do not sort columns in table-timeseries alphabetically - https://phabricator.wikimedia.org/T189125#4032172 (Milimetric) p:Triage>Normal
[16:47:07] Analytics, Analytics-Dashiki: Add a legend for annotations - https://phabricator.wikimedia.org/T189164#4033525 (Milimetric) p:Triage>Normal
[16:47:10] Analytics, Analytics-Dashiki: Make it possible to suppress the box in the bottom left of dygraphs-timeseries graphs - https://phabricator.wikimedia.org/T189069#4030449 (Milimetric) p:Triage>Normal
[16:47:12] Analytics, Analytics-Dashiki, Easy: Add annotationsMetric option to tabs layout - https://phabricator.wikimedia.org/T189159#4033434 
(Milimetric) p:Triage>Normal
[16:47:36] Analytics: wikistats , move to webpack 4 - https://phabricator.wikimedia.org/T188759#4150607 (Milimetric) p:Triage>Normal
[16:53:14] Analytics, Analytics-EventLogging, Performance-Team (Radar), Readers-Web-Backlog (Tracking): Make it easier to enable EventLogging's debug mode - https://phabricator.wikimedia.org/T188640#4150625 (Milimetric) p:Triage>Low
[16:54:05] Analytics, Analytics-Wikistats: Wikistats 2.0: allow to view stats for all language versions (a.k.a. Project families) - https://phabricator.wikimedia.org/T188550#4150630 (Milimetric) p:Triage>Normal
[16:54:47] Analytics: Jupyter Notebooks TLC 2018-2019 - https://phabricator.wikimedia.org/T188275#4150647 (Milimetric) p:Triage>Normal
[16:57:52] Analytics: RStudio web version on SWAP - https://phabricator.wikimedia.org/T180270#4150674 (Milimetric) p:Triage>Normal
[16:58:06] Analytics: RStudio web version on SWAP - https://phabricator.wikimedia.org/T180270#3751938 (Milimetric) p:Normal>Low
[16:58:36] Analytics, Analytics-Wikistats: Active Editors metric for "all-projects" - https://phabricator.wikimedia.org/T188265#4150682 (Milimetric)
[16:58:43] Analytics, Analytics-Wikistats: Active Editors metric for "all-projects" - https://phabricator.wikimedia.org/T188265#4001846 (Milimetric) p:Triage>Normal
[16:58:52] ottomata: if interested: https://siliconangle.com/blog/2018/04/16/apache-flink-helps-netflix-process-three-trillion-events-every-day-flinkforward/
[16:59:28] Analytics: Generate pagecounts-ez data back to 2008 - https://phabricator.wikimedia.org/T188041#4150686 (Milimetric) p:Triage>Low
[16:59:40] Analytics: Generate pagecounts-ez data back to 2008 - https://phabricator.wikimedia.org/T188041#3994247 (Milimetric)
[16:59:44] Analytics, Research: Refactor pagecounts-ez generation - https://phabricator.wikimedia.org/T192474#4150689 (Milimetric)
[17:00:07] Analytics, 
Analytics-Wikistats: Beta: Provide easier way of accessing metrics such as active editors as defined in Wikistats 1 - https://phabricator.wikimedia.org/T187806#4150692 (Milimetric) p:Triage>Normal
[17:01:47] joal: https://gerrit.wikimedia.org/r/428372 for you too as a thought for the future
[17:04:18] all right people, logging off!
[17:04:22] talk with you tomorrow
[17:05:05] if you guys could review the Pivot change https://gerrit.wikimedia.org/r/428331 before tomorrow that'd be great, so me and Joseph will be able to move forward with Druid 0.10 tomorrow
[17:05:35] (I tested it in labs and works fine)
[17:08:53] Bye elukey ! Thanks for the link :)
[17:44:13] Analytics, Analytics-Kanban, EventBus, Services (watching): Upgrade Kafka on main cluster with security features - https://phabricator.wikimedia.org/T167039#4150838 (Ottomata)
[17:50:08] Analytics, Reading List Service, Reading-Infrastructure-Team-Backlog, Wikipedia-Android-App-Backlog, and 2 others: Enable Reading List Syncing usage stats - https://phabricator.wikimedia.org/T191859#4150847 (mpopov)
[17:50:41] Analytics, Reading List Service, Reading-Infrastructure-Team-Backlog, Wikipedia-Android-App-Backlog, and 2 others: Enable Reading List Syncing usage stats - https://phabricator.wikimedia.org/T191859#4118961 (mpopov)
[17:51:49] Analytics, Reading List Service, Reading-Infrastructure-Team-Backlog, Wikipedia-Android-App-Backlog, and 2 others: Enable Reading List Syncing usage stats - https://phabricator.wikimedia.org/T191859#4118961 (mpopov) **Update**: @chelsyx @Nuria and I met to discuss this and have come up Proposal 4...
[17:54:36] Analytics, Reading List Service, Reading-Infrastructure-Team-Backlog, Wikipedia-Android-App-Backlog, and 2 others: Enable Reading List Syncing usage stats - https://phabricator.wikimedia.org/T191859#4150880 (mpopov)
[18:40:01] fdans: idea, not sure if this is better or worse
[18:40:02] yt?
[18:44:31] bearloga: (New³ alternative) jajajaja
[18:48:36] Analytics, Analytics-Kanban, EventBus, Services (watching): Use profile and prometheus for role::kafka::main::broker - https://phabricator.wikimedia.org/T192831#4151005 (Ottomata) p:Triage>Normal
[18:50:39] Analytics, Analytics-Kanban, EventBus, Services (watching): Upgrade to Stretch and Java 8 for Kafka main cluster - https://phabricator.wikimedia.org/T192832#4151019 (Ottomata) p:Triage>Normal
[19:06:55] (CR) Milimetric: Fix Druid version check not compatible with 0.10.0+ (2 comments) [analytics/pivot/deploy] - https://gerrit.wikimedia.org/r/428331 (https://phabricator.wikimedia.org/T164008) (owner: Elukey)
[19:15:57] (CR) Milimetric: [C: 2] "This all looks great. Let's merge and deploy." (1 comment) [analytics/wikistats2] - https://gerrit.wikimedia.org/r/427042 (https://phabricator.wikimedia.org/T182718) (owner: Nuria)
[19:16:58] nuria_: one thing I kind of lapsed on is that we should be adding more tests to Wikistats. I'm making a phab task now and adding it to the beta column with high priority to take a look at test coverage and come up with a consistent policy for how we should go forward and what kinds of tests we should add.
[19:18:17] milimetric: o/
[19:18:44] hi elukey
[19:18:52] I agree about the !== but the try/catch seems a little bit too much for this fix, would it be ok just to s/!=/!== ?
[19:18:52] woah, you're on late
[19:19:03] nah I am watching tv :)
[19:19:05] elukey: np, just mentioning for completeness
[19:19:08] Analytics, Analytics-Wikistats: Audit Wikistats unit testing - https://phabricator.wikimedia.org/T192836#4151128 (Milimetric)
[19:19:10] Analytics, Analytics-Wikistats: Audit Wikistats unit testing - https://phabricator.wikimedia.org/T192836#4151139 (Milimetric) p:Triage>High
[19:19:16] the only way to do JS without losing your mind :)
[19:19:19] (Merged) jenkins-bot: SEO oriented changes for Wikistats2 Pages [analytics/wikistats2] - https://gerrit.wikimedia.org/r/427042 (https://phabricator.wikimedia.org/T182718) (owner: Nuria)
[19:20:03] (PS6) Elukey: Fix Druid version check not compatible with 0.10.0+ [analytics/pivot/deploy] - https://gerrit.wikimedia.org/r/428331 (https://phabricator.wikimedia.org/T164008)
[19:20:04] milimetric: ahahha
[19:21:44] patch updated!
[19:22:50] (CR) Milimetric: [V: 2 C: 2] Fix Druid version check not compatible with 0.10.0+ [analytics/pivot/deploy] - https://gerrit.wikimedia.org/r/428331 (https://phabricator.wikimedia.org/T164008) (owner: Elukey)
[19:23:48] \o/
[19:23:50] thanks a lot!
[19:24:27] milimetric: ok if I deploy pivot?
[19:24:43] yeah, definitely elukey, thank you
[19:27:05] error: unable to unlink old 'node_modules/plywood/build/plywood.js': Permission denied
[19:27:08] lovely :D
[19:30:59] ok seems fixed
[19:31:22] elukey@tin:/srv/deployment/analytics/pivot/deploy$ cat scap/targets
[19:31:23] stat1001.eqiad.wmnet
[19:32:14] and thorium
[19:32:45] (PS1) Elukey: Remove stat1001 from scap targets [analytics/pivot/deploy] - https://gerrit.wikimedia.org/r/428413
[19:33:06] (CR) Elukey: [V: 2 C: 2] Remove stat1001 from scap targets [analytics/pivot/deploy] - https://gerrit.wikimedia.org/r/428413 (owner: Elukey)
[19:33:39] Analytics, Analytics-Wikistats: Wikistats: Data Regression - https://phabricator.wikimedia.org/T192840#4151211 (Milimetric)
[19:33:52] Analytics, Analytics-Wikistats: Wikistats: Data Regression - https://phabricator.wikimedia.org/T192840#4151222 (Milimetric) p:Triage>Unbreak! a:Milimetric
[19:34:19] Analytics, Analytics-Kanban, Analytics-Wikistats: Wikistats: Data Regression - https://phabricator.wikimedia.org/T192840#4151211 (Milimetric)
[19:34:48] !log deploy https://gerrit.wikimedia.org/r/428331 for Pivot
[19:34:49] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[19:35:44] pivot seems to work :)
[19:36:51] ooook we are ready for druid 0.10 then :)
[19:36:56] thanks a lot milimetric!
[19:36:57] awesome, thanks elukey
[19:37:17] * elukey afk again :)
[19:37:30] nuria_: you see the data regression task? It looks like we have billions of edits a month since January :) Something went wrong somewhere...
[19:38:36] Analytics, Analytics-Kanban, Analytics-Wikistats: Wikistats: Data Regression - https://phabricator.wikimedia.org/T192840#4151245 (Milimetric)
[19:39:32] (CR) Milimetric: [C: 2] "Looks really nice and polished, well done." 
[analytics/wikistats2] - https://gerrit.wikimedia.org/r/423904 (https://phabricator.wikimedia.org/T188277) (owner: Fdans) [19:41:15] Analytics, Analytics-Wikistats: Wikistats Bug: all but 2018 data missing? - https://phabricator.wikimedia.org/T192841#4151252 (jmatazzoni) [19:56:55] milimetric: what? cc joal [19:57:17] Analytics, Analytics-EventLogging, Readers-Web-Backlog: Should it be possible for a schema to override DNT in exceptional circumstances? - https://phabricator.wikimedia.org/T187277#4151340 (Tbayer) [19:57:48] nuria_: I'm thinking maybe we hide the last couple of months that are bad from the UI [19:57:53] because in any case the fix will be slow [19:58:56] milimetric, joal: this is new as of today, we were looking at this data just last week [19:59:45] Analytics, Analytics-Wikistats: Wikistats Bug: all but 2018 data missing? - https://phabricator.wikimedia.org/T192841#4151252 (Nuria) working on this. [20:01:12] one sec, was distracted, ok, on this now [20:01:21] nuria_: wanna take a look together in the cave [20:01:22] ? [20:01:57] yes milimetric let's do that cc joal please come to cave if you may [20:07:58] ottomata: yt? [20:08:16] yaa [20:08:17] heya [20:08:32] ottomata: do we have permits to access public druid cluster [20:08:49] access in what way? [20:08:51] probably? [20:09:08] ottomata: all of a sudden all data looks bad [20:09:17] what do you mean access? [20:09:18] ssh? [20:09:27] yes, ok, i see i can edit [20:09:33] ottomata: i can ssh [20:09:35] k [20:09:35] ya [20:09:46] ottomata: ok I guess we can also look at admin interface [20:13:32] ottomata: can you come to cave?
[20:18:24] Analytics, Analytics-EventLogging, Performance-Team (Radar): Spin out a tiny EventLogging RL module for lightweight logging - https://phabricator.wikimedia.org/T187207#4151444 (Imarlier) [20:22:57] nuria_: ok [20:25:03] Analytics: Under construction page in wikistats to take site down - https://phabricator.wikimedia.org/T192847#4151477 (Nuria) [20:31:47] (PS1) Milimetric: Disable metrics temporarily [analytics/wikistats2] - https://gerrit.wikimedia.org/r/428506 (https://phabricator.wikimedia.org/T192840) [20:34:51] Analytics, Analytics-Wikistats: Wikistats Bug: all but 2018 data missing? - https://phabricator.wikimedia.org/T192841#4151514 (Nuria) Disabling editing metrics. [20:35:56] (PS2) Milimetric: Disable metrics temporarily [analytics/wikistats2] - https://gerrit.wikimedia.org/r/428506 (https://phabricator.wikimedia.org/T192840) [20:36:12] (CR) Milimetric: [V: 2 C: 2] Disable metrics temporarily [analytics/wikistats2] - https://gerrit.wikimedia.org/r/428506 (https://phabricator.wikimedia.org/T192840) (owner: Milimetric) [20:38:47] (PS1) Milimetric: Prepare for release 2.2.3 [analytics/wikistats2] - https://gerrit.wikimedia.org/r/428509 [20:39:08] (CR) Milimetric: [V: 2 C: 2] "turning off metrics" [analytics/wikistats2] - https://gerrit.wikimedia.org/r/428509 (owner: Milimetric) [20:39:59] (PS1) Milimetric: Release 2.2.3 [analytics/wikistats2] (release) - https://gerrit.wikimedia.org/r/428510 [20:40:50] (CR) Milimetric: [V: 2 C: 2] "release hotfix version of wikistats with metrics turned off to temporarily take offline the metrics with bad data (basically all editing m" [analytics/wikistats2] (release) - https://gerrit.wikimedia.org/r/428510 (owner: Milimetric) [20:41:26] !log deployed a version of wikistats with all but reading metrics disabled to stop showing bad data [20:41:27] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [20:48:31]
(PS1) Milimetric: Disable editing metrics [analytics/wikistats2] - https://gerrit.wikimedia.org/r/428516 [20:48:39] nuria_: ^ [20:52:27] (PS1) Milimetric: Prepare for release 2.2.4 [analytics/wikistats2] - https://gerrit.wikimedia.org/r/428518 [20:52:41] (CR) Milimetric: [V: 2 C: 2] Prepare for release 2.2.4 [analytics/wikistats2] - https://gerrit.wikimedia.org/r/428518 (owner: Milimetric) [20:52:53] (CR) Milimetric: [V: 2 C: 2] Disable editing metrics [analytics/wikistats2] - https://gerrit.wikimedia.org/r/428516 (owner: Milimetric) [20:56:11] PROBLEM - Kafka MirrorMaker main-eqiad_to_eqiad max lag in last 10 minutes on einsteinium is CRITICAL: 1.333e+05 gt 1e+05 https://grafana.wikimedia.org/dashboard/db/kafka-mirrormaker?var-datasource=eqiad%2520prometheus%252Fops&var-lag_datasource=eqiad%2520prometheus%252Fops&var-mirror_name=main-eqiad_to_eqiad [20:57:51] (PS1) Milimetric: Release 2.2.4 [analytics/wikistats2] (release) - https://gerrit.wikimedia.org/r/428520 [20:58:06] (CR) Milimetric: [V: 2 C: 2] Release 2.2.4 [analytics/wikistats2] (release) - https://gerrit.wikimedia.org/r/428520 (owner: Milimetric) [20:59:58] hm! [21:08:03] (PS1) Milimetric: [WIP] Enabling metrics again [analytics/wikistats2] - https://gerrit.wikimedia.org/r/428523 [21:21:20] RECOVERY - Kafka MirrorMaker main-eqiad_to_eqiad max lag in last 10 minutes on einsteinium is OK: (C)1e+05 gt (W)1e+04 gt 2137 https://grafana.wikimedia.org/dashboard/db/kafka-mirrormaker?var-datasource=eqiad%2520prometheus%252Fops&var-lag_datasource=eqiad%2520prometheus%252Fops&var-mirror_name=main-eqiad_to_eqiad [21:35:02] Analytics, Analytics-Kanban, Analytics-Wikistats, Patch-For-Review: Wikistats: Data Regression - https://phabricator.wikimedia.org/T192840#4151699 (Nuria) [21:35:04] Analytics, Analytics-Wikistats: Wikistats Bug: all but 2018 data missing?
- https://phabricator.wikimedia.org/T192841#4151698 (Nuria) [21:35:21] Analytics, Analytics-Wikistats: Wikistats Bug: all but 2018 data missing? - https://phabricator.wikimedia.org/T192841#4151252 (Nuria) a:JAllemandou [21:35:27] Analytics, Analytics-Wikistats: Wikistats Bug: all but 2018 data missing? - https://phabricator.wikimedia.org/T192841#4151252 (Nuria) Rerunning indexing for 2018-02 snapshot [21:46:50] Analytics, Analytics-Kanban, EventBus, Patch-For-Review, Services (watching): Upgrade to Stretch and Java 8 for Kafka main cluster - https://phabricator.wikimedia.org/T192832#4151709 (mobrovac) Neat! [21:59:31] PROBLEM - Kafka MirrorMaker main-eqiad_to_eqiad max lag in last 10 minutes on einsteinium is CRITICAL: 1.329e+05 gt 1e+05 https://grafana.wikimedia.org/dashboard/db/kafka-mirrormaker?var-datasource=eqiad%2520prometheus%252Fops&var-lag_datasource=eqiad%2520prometheus%252Fops&var-mirror_name=main-eqiad_to_eqiad [22:18:31] RECOVERY - Kafka MirrorMaker main-eqiad_to_eqiad max lag in last 10 minutes on einsteinium is OK: (C)1e+05 gt (W)1e+04 gt 798 https://grafana.wikimedia.org/dashboard/db/kafka-mirrormaker?var-datasource=eqiad%2520prometheus%252Fops&var-lag_datasource=eqiad%2520prometheus%252Fops&var-mirror_name=main-eqiad_to_eqiad [22:29:40] PROBLEM - Kafka MirrorMaker main-eqiad_to_eqiad max lag in last 10 minutes on einsteinium is CRITICAL: 1.142e+05 gt 1e+05 https://grafana.wikimedia.org/dashboard/db/kafka-mirrormaker?var-datasource=eqiad%2520prometheus%252Fops&var-lag_datasource=eqiad%2520prometheus%252Fops&var-mirror_name=main-eqiad_to_eqiad [22:48:41] RECOVERY - Kafka MirrorMaker main-eqiad_to_eqiad max lag in last 10 minutes on einsteinium is OK: (C)1e+05 gt (W)1e+04 gt 0 https://grafana.wikimedia.org/dashboard/db/kafka-mirrormaker?var-datasource=eqiad%2520prometheus%252Fops&var-lag_datasource=eqiad%2520prometheus%252Fops&var-mirror_name=main-eqiad_to_eqiad [23:19:51] PROBLEM - Kafka MirrorMaker main-eqiad_to_eqiad max
lag in last 10 minutes on einsteinium is CRITICAL: 1.147e+05 gt 1e+05 https://grafana.wikimedia.org/dashboard/db/kafka-mirrormaker?var-datasource=eqiad%2520prometheus%252Fops&var-lag_datasource=eqiad%2520prometheus%252Fops&var-mirror_name=main-eqiad_to_eqiad [23:28:15] milimetric: i think here: https://wikitech.wikimedia.org/wiki/Analytics/Systems/Geowiki#Comparing_Data_with_Previous_System [23:28:35] yea? [23:28:37] milimetric: it will be helpful to quantify, percentage-wise, how different the old and newer data are [23:28:56] milimetric: so asaf has a quick go-to number, maybe we need to do it per wiki [23:29:25] milimetric: the detailed explanation can delve into where the differences come from. [23:29:30] makes sense nuria_, but the percent is not at all consistent, I'll give an average and a range, and try to explain a bit [23:29:53] nuria_: feel free to leave comments on the talk page, I should go do bedtime now [23:30:05] (for baby, I'm not a chicken and so I don't go to bed at 7 :)) [23:30:15] milimetric: see here: https://wikitech.wikimedia.org/wiki/Analytics/AQS/Wikistats_2/Data_Quality [23:30:47] milimetric: the quantification helps to gauge differences. [23:31:11] milimetric: those are going to become even more apparent when we compute the index data that leila and yourself extracted from here [23:31:44] milimetric: computation of reduced data still ongoing [23:31:48] will check tonight [23:34:51] RECOVERY - Kafka MirrorMaker main-eqiad_to_eqiad max lag in last 10 minutes on einsteinium is OK: (C)1e+05 gt (W)1e+04 gt 0 https://grafana.wikimedia.org/dashboard/db/kafka-mirrormaker?var-datasource=eqiad%2520prometheus%252Fops&var-lag_datasource=eqiad%2520prometheus%252Fops&var-mirror_name=main-eqiad_to_eqiad