[06:09:10] Analytics, Operations, User-Elukey: Investigate if a Prometheus exporter for the AMD GPU(s) can be easily created - https://phabricator.wikimedia.org/T220784 (elukey) p:Triage→Normal
[07:09:02] Analytics, EventBus, Services (watching): EventGate service runner worker occasionally killed, usually during higher load - https://phabricator.wikimedia.org/T220661 (akosiaris) p:Triage→Normal
[07:09:15] Analytics, EventBus, Operations, monitoring, and 3 others: Upgrade statsd_exporter to 0.9 - https://phabricator.wikimedia.org/T220709 (akosiaris) p:Triage→Normal
[07:31:18] Morning elukey - Shall we spark together a bit today?
[07:31:29] elukey: looks like we have subnets to match :)
[07:41:55] joal: morning! that thing can probably wait don't worry :)
[07:42:08] but if you want to test spark2 on the test cluster I'd be super happy :D
[07:42:19] 5.16.1 seems to be running fine
[07:42:24] just finished the last checks
[07:42:43] (also found something horrible about oozie-setup, basically it does su -s /bin/bash inside, sigh)
[07:44:15] :(
[07:44:45] elukey: Will test in a bit, in scala for now and I don't want to let it go :)
[07:44:46] I am going to override the file via puppet
[07:44:56] oh yes even next week!
[07:44:57] no rush
[08:55:34] Analytics, Analytics-Kanban, User-Elukey: Change permissions for daily traffic anomaly reports on stat1007 - https://phabricator.wikimedia.org/T219546 (elukey) @ssingh late reply as well, apologies :) I did the following: * copied over anaconda3, project_monitoring and heka from /home/jdcc to yours...
[09:02:14] Analytics, Patch-For-Review, User-Elukey: Upgrade analytics cluster to Cloudera CDH 5.16.1 - https://phabricator.wikimedia.org/T218343 (elukey) I have tested https://etherpad.wikimedia.org/p/analytics-cdh5.16.1 upgrading the Hadoop test cluster, all good! https://gerrit.wikimedia.org/r/503266 needs...
[09:02:33] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Upgrade analytics cluster to Cloudera CDH 5.16.1 - https://phabricator.wikimedia.org/T218343 (elukey)
[09:23:09] (PS6) Fdans: Replace time range selector [analytics/wikistats2] - https://gerrit.wikimedia.org/r/499968 (https://phabricator.wikimedia.org/T219112)
[09:25:29] (CR) jerkins-bot: [V: -1] Replace time range selector [analytics/wikistats2] - https://gerrit.wikimedia.org/r/499968 (https://phabricator.wikimedia.org/T219112) (owner: Fdans)
[09:46:23] Analytics, Operations, User-Elukey: Investigate if a Prometheus exporter for the AMD GPU(s) can be easily created - https://phabricator.wikimedia.org/T220784 (fgiunchedi) +1, something that parses the json and writes metrics in text format for node-exporter to pick up sounds good to me
[09:48:52] heya elukey - pausing on scala - Can you remind me how I should access the test cluster?
[09:51:17] joal: sure! so masters are analytics1028-29
[09:51:20] coord: 1030
[09:51:25] hue: 1039
[09:51:36] and then the rest up to 1041 are workers
[09:52:26] elukey: I assume I should use any worker as edge node (IIRC coord has special firewall rules not making spark jobs easy)
[09:55:31] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Upgrade matomo1001 to latest upstream - https://phabricator.wikimedia.org/T218037 (elukey) Packages uploaded by upstream. Tested the update in labs, it requires some db alters: ` elukey@matomo:~$ sudo -u www-data /usr/bin/php /usr/share...
[09:56:03] joal: I think so yes
[09:56:11] Sounds good elukey :)
[10:07:34] elukey: error with spark2-shell: File does not exist hdfs://analytics-test-hadoop/user/spark/share/lib/spark2-assembly.zip
[10:07:47] elukey: I know Andrew created this file manually
[10:07:59] elukey: maybe it could be good to have its creation puppetized?
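Editor's note on the 09:46:23 suggestion for T220784 (parse the JSON and write metrics in text format for node-exporter): a minimal sketch of that idea, not the exporter that was actually written. The JSON shape, the metric names, the `amd_gpu` prefix and both function names are assumptions made for illustration; the log does not show what the GPU tooling actually emits. The node-exporter textfile collector reads `*.prom` files from the directory given by `--collector.textfile.directory`, so the file is written atomically to avoid a half-written scrape.

```python
import json
import os


def render_metrics(raw_json: str, prefix: str = "amd_gpu") -> str:
    """Render {"card0": {"metric": value, ...}, ...} as Prometheus text format.

    The input layout is hypothetical; adapt the parsing to the real JSON.
    """
    data = json.loads(raw_json)
    lines = []
    for card in sorted(data):
        for name in sorted(data[card]):
            value = float(data[card][name])
            # One sample per line: metric_name{label="value"} sample_value
            lines.append(f'{prefix}_{name}{{card="{card}"}} {value}')
    return "\n".join(lines) + "\n"


def write_atomically(path: str, text: str) -> None:
    """Write to a temp file, then rename, so readers never see a partial file."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as handle:
        handle.write(text)
    os.replace(tmp_path, path)


# Hypothetical sample input, for illustration only.
sample = '{"card0": {"temperature_celsius": 41.0, "power_watts": 12.5}}'
print(render_metrics(sample), end="")
```

A cron job or systemd timer could call `write_atomically()` with a path inside the textfile directory; that scheduling choice is also an assumption.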
[10:08:17] joal: it is, but on stat1004, which is not in the testing cluster -.-
[10:08:30] Ahhhhh
[10:09:40] elukey: shall I create it manually then?
[10:10:35] ah no wait I am misremembering, the thing that I was talking about is an oozie lib
[10:10:42] not that one
[10:10:48] I can't find it in puppet indeed
[10:10:57] yeah joal please do it manually
[10:11:01] we'll have to puppetize it
[10:11:24] wasn't aware of it :(
[10:12:28] !log matomo upgraded to 3.9.1 to fix some security vulns
[10:12:30] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[10:13:01] elukey: I actually think it's just a matter of copying it to HDFS - /usr/lib/spark2/spark2-assembly.zip
[10:14:42] elukey: I did `sudo -u hdfs hdfs dfs -mkdir -p /user/spark/share/lib` and `sudo -u hdfs hdfs dfs -put /usr/lib/spark2/spark2-assembly.zip /user/spark/share/lib` and no error anymore
[10:16:09] ack thanks!
[10:16:21] I have just upgraded matomo to its latest version
[10:16:23] seems all good
[10:17:24] elukey: spark2-shell works like a charm
[10:17:36] elukey: will quick-try pyspark
[10:18:39] lovely
[10:20:11] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Upgrade matomo1001 to latest upstream - https://phabricator.wikimedia.org/T218037 (elukey)
[10:20:21] pyspark2 works great as well
[10:21:09] \o/
[10:21:45] all right joal, if you have patience/time next week I think that we could upgrade prod
[10:22:03] just need to schedule downtime
[10:22:14] elukey: I'll give all I can :)
[10:23:37] any preference for the day?
[10:24:37] I'd say Wednesday afternoon please - I have an electricity downtime planned on Tuesday, and need to take care of the kids on Wednesday morning - Wednesday afternoon looks promising :)
[10:24:45] elukey: ok for you?
[10:28:26] sure!
[10:41:21] * elukey lunch!
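Editor's note on the manual fix at 10:14:42: until the step is puppetized, the two commands joal ran can be wrapped in a small helper. This is only a sketch; the function names and the dry-run switch are made up for illustration, while the paths and the `hdfs` user are the ones quoted in the log.

```python
import subprocess


def assembly_upload_commands(
    local_zip="/usr/lib/spark2/spark2-assembly.zip",
    hdfs_dir="/user/spark/share/lib",
    hdfs_user="hdfs",
):
    """Return the two commands from the log as argv lists, in order."""
    base = ["sudo", "-u", hdfs_user, "hdfs", "dfs"]
    return [
        base + ["-mkdir", "-p", hdfs_dir],       # create the target directory
        base + ["-put", local_zip, hdfs_dir],    # copy the assembly into HDFS
    ]


def upload_assembly(dry_run=True):
    """Print the commands by default; run them only when dry_run is False."""
    for cmd in assembly_upload_commands():
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


upload_assembly(dry_run=True)
```

Note that a plain `-put` fails when the destination file already exists; `hdfs dfs -put -f` overwrites it, which may be preferable if this runs again after a Spark package upgrade.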
[11:24:56] Analytics: Decide: start_timestamp for mediawiki history - https://phabricator.wikimedia.org/T220507 (JAllemandou) Thanks for your comment @nettrom_WMF - I should have explained the plan more thoroughly. In the next changes for mediawiki-history, we will add fields for pages and users, ending up in having...
[13:17:40] joal: Wed 17th 15:00 CET for 5.16.1's upgrade. How does it sound?
[13:18:02] we should be done in ~1h
[13:34:13] hey teamm
[14:04:23] Analytics, Operations, Research-management, Patch-For-Review, User-Elukey: Remove computational bottlenecks in stats machine via adding a GPU that can be used to train ML models - https://phabricator.wikimedia.org/T148843 (elukey) Opened https://github.com/RadeonOpenCompute/ROCm/issues/761 to...
[14:47:21] Analytics, EventBus, Services (watching): EventGate service runner worker occasionally killed, usually during higher load - https://phabricator.wikimedia.org/T220661 (Pchelolo) @akosiaris no, spanning up a new worker takes no time, the problem here is actually killing old worker. The heartbeat limit...
[15:01:03] elukey: timing is good for me :)
[15:15:41] (CR) Milimetric: "I love this direction. Like any big change, I think there are a bunch of details to work out." [analytics/wikistats2] - https://gerrit.wikimedia.org/r/499968 (https://phabricator.wikimedia.org/T219112) (owner: Fdans)
[15:17:27] thank youu milimetric, this is useful
[15:18:12] fdans: sorry so late, got a little wrapped up with turnilo yesterday
[15:18:37] no it's just in time as I finish losing my mind with router tests
[15:18:39] I figured it was a bit premature for code review though, we should probably solidify the design more first
[15:18:58] oh cool, yeah, part of my thoughts are on routing
[15:41:52] Analytics: Decide: start_timestamp for mediawiki history - https://phabricator.wikimedia.org/T220507 (Nuria) Let's please make sure this clarification/explanation appears on the docs.
[15:47:07] Analytics, Operations, hardware-requests, netops, and 2 others: Upgrade kafka-jumbo100[1-6] to 10G NICs (if possible) - https://phabricator.wikimedia.org/T220700 (ayounsi) From: https://netbox.wikimedia.org/dcim/devices/?q=kafka-jumbo&status=1 kafka-jumbo1002 kafka-jumbo1004 kafka-jumbo1005 Are...
[15:47:11] Analytics, Analytics-EventLogging, QuickSurveys: QuickSurveys EventLogging missing ~10% of interactions - https://phabricator.wikimedia.org/T220627 (mforns) Hey! :] I've been looking into this for a bit. Is there any documentation I can read on the flow of the surveys? Does the user click on a link...
[15:51:47] mforns: let's talk about this in standup https://phabricator.wikimedia.org/T220627
[15:57:25] Analytics, Analytics-EventLogging, QuickSurveys: QuickSurveys EventLogging missing ~10% of interactions - https://phabricator.wikimedia.org/T220627 (mforns) I found https://www.mediawiki.org/wiki/Extension:QuickSurveys, and it explains the code for the survey is loaded dynamically, so JS disabled is...
[16:07:37] (CR) Milimetric: [V: +2] Refactor python util.py into smaller files [analytics/refinery] - https://gerrit.wikimedia.org/r/502469 (https://phabricator.wikimedia.org/T220111) (owner: Joal)
[16:07:48] Thanks milimetric :) --^
[16:17:33] (PS1) Bmansurov: Oozie: wait for new Wikidata dumps before generating article recommendations [analytics/refinery] - https://gerrit.wikimedia.org/r/503393 (https://phabricator.wikimedia.org/T210844)
[16:28:57] Analytics, Analytics-EventLogging, QuickSurveys: QuickSurveys EventLogging missing ~10% of interactions - https://phabricator.wikimedia.org/T220627 (phuedx) @ovasileva: This might benefit from some investigation on our side too.
[16:29:51] Analytics, Analytics-EventLogging, QuickSurveys: QuickSurveys EventLogging missing ~10% of interactions - https://phabricator.wikimedia.org/T220627 (Isaac) > Is there any documentation I can read on the flow of the surveys? Does the user click on a link on-wiki, that opens a Google/Qualtrics form? U...
[16:43:03] Analytics, WMF-NDA-Requests: Check PPI leftovers - awight - https://phabricator.wikimedia.org/T220377 (awight) Thanks for asking! Everything in those home directories can be safely deleted.
[16:56:15] * elukey off!
[17:03:49] milimetric: about lineage-first-event-timestamps, are you ok with me moving forward with the solution we advertised on the task?
[17:05:53] Analytics, Analytics-EventLogging, QuickSurveys: QuickSurveys EventLogging missing ~10% of interactions - https://phabricator.wikimedia.org/T220627 (phuedx) >>! In T220627#5107552, @Isaac wrote: >> And when do events for QuickSurveyInitiation and QuickSurveysResponses trigger? > * We are supposed to...
[17:08:41] joal: yes
[17:08:45] definitely
[17:09:01] great milimetric :)
[17:38:26] Analytics, EventBus, Services (watching): EventGate service runner worker occasionally killed, usually during higher load - https://phabricator.wikimedia.org/T220661 (mobrovac) >>! In T220661#5107329, @Pchelolo wrote: > Having said that the shutdown approach is incorrect for 1 worker situation I stil...
[17:44:54] Analytics: Decide: start_timestamp for mediawiki history - https://phabricator.wikimedia.org/T220507 (nettrom_WMF) @JAllemandou : Thanks for clarifying that, very much appreciated! The description for how this will be handled for pages looks good to me. When it comes to users, I would expect `userCreationT...
[18:00:18] hey a-team! The HelpPanel schema whitelist patch was merged yesterday. I don't see any events in the event_sanitized table, though. When should I be expecting to see them start showing up?
[18:01:22] Nettrom, the EL sanitization whitelist is merged, but refinery (the repo it belongs to) is not deployed yet. The deployment train is on Wednesdays
[18:01:50] Nettrom, is that a problem for you?
[18:02:08] are you going to lose data because of that?
[18:02:33] mforns: nah, I'll take a snapshot to make sure we have everything
[18:03:06] Nettrom, I'm working right now on adding a second pass to the sanitization process, that will resanitize after 45 days of collection
[18:03:33] this will help in cases like this, where the data starts flowing in before the schema whitelist is deployed
[18:03:52] mforns: that is awesome! it'll definitely be helpful for us in this case
[18:04:11] I hope I can finish this in the next week or 2
[18:08:43] mforns: that sounds great! thanks for following up on this!
[18:45:19] Analytics, Analytics-Cluster: Requesting account expiration extension - https://phabricator.wikimedia.org/T183291 (Jdcc-berkman) Just to close this out, our developer put together something that could work as a good starting place for integrating this work into the production stack: https://github.com/be...
[18:46:18] Analytics, Analytics-Cluster: Requesting account expiration extension - https://phabricator.wikimedia.org/T183291 (Nuria) Nice, thank you.
[21:23:35] Analytics, Analytics-EventLogging, QuickSurveys: QuickSurveys EventLogging missing ~10% of interactions - https://phabricator.wikimedia.org/T220627 (Isaac) > i.e. it's unlikely but not impossible that QuickSurveys could be loaded and executed before EventLogging. Hmm...that would explain a lack of i...
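Editor's note on the 18:03 exchange: the reason a second sanitization pass helps can be shown with a toy model. This is not the refinery implementation, which the log does not show. The event fields, the whitelist layout (schema name mapped to a set of allowed fields) and both function names are hypothetical; only the 45-day delay comes from the conversation.

```python
from datetime import date, timedelta


def sanitize(event, whitelist):
    """Keep only whitelisted fields; drop the event if its schema is unknown."""
    allowed = whitelist.get(event["schema"])
    if allowed is None:
        return None  # schema not whitelisted (yet): nothing is kept
    return {k: v for k, v in event.items() if k == "schema" or k in allowed}


def second_pass_day(today, delay_days=45):
    """Day whose raw data gets re-sanitized today, assuming a fixed delay."""
    return today - timedelta(days=delay_days)


# Hypothetical event and whitelists, for illustration only.
event = {"schema": "HelpPanel", "action": "open", "user_ip": "192.0.2.1"}
whitelist_before_deploy = {}
whitelist_after_deploy = {"HelpPanel": {"action"}}

print(sanitize(event, whitelist_before_deploy))  # first pass: event dropped
print(sanitize(event, whitelist_after_deploy))   # second pass: event kept
print(second_pass_day(date(2019, 6, 1)))
```

In this model the first pass drops events that arrive before the whitelist is deployed, and the later pass over the same raw data recovers them, as long as the raw data is still retained when the second pass runs.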
[23:06:36] Analytics, ChangeProp, EventBus, MediaWiki-JobQueue, and 2 others: Generalized rate limiting, deduplication, and job scheduling module - https://phabricator.wikimedia.org/T173447 (mobrovac) p:High→Low
[23:08:20] Analytics, EventBus, Multi-Content-Revisions, Core Platform Team Backlog (Next), Services (next): Redesign revision-related event schemas for MCR - https://phabricator.wikimedia.org/T186371 (mobrovac)
[23:08:29] Analytics, EventBus, Multi-Content-Revisions, Core Platform Team Backlog (Next), Services (next): Redesign revision-related event schemas for MCR - https://phabricator.wikimedia.org/T186371 (mobrovac) p:Triage→Normal