[02:47:52] PROBLEM - Check the last execution of monitor_refine_mediawiki_job_events_failure_flags on an-launcher1001 is CRITICAL: CRITICAL: Status of the systemd unit monitor_refine_mediawiki_job_events_failure_flags https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[05:42:30] (PS31) Fdans: Add Pageviews Complete dumps backfilling job [analytics/refinery] - https://gerrit.wikimedia.org/r/597541 (https://phabricator.wikimedia.org/T252857)
[05:54:52] (PS32) Fdans: Add Pageviews Complete dumps backfilling job [analytics/refinery] - https://gerrit.wikimedia.org/r/597541 (https://phabricator.wikimedia.org/T252857)
[06:18:46] Analytics, Pageviews-API, User-Elukey: Improve user management for AQS Cassandra - https://phabricator.wikimedia.org/T142073 (elukey) a:elukey→None
[06:19:14] Analytics, User-Elukey: Alarms on pageview API latency increase - https://phabricator.wikimedia.org/T164243 (elukey) Open→Declined
[06:21:55] RECOVERY - Check the last execution of monitor_refine_mediawiki_job_events_failure_flags on an-launcher1001 is OK: OK: Status of the systemd unit monitor_refine_mediawiki_job_events_failure_flags https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[06:26:55] Analytics, Analytics-Kanban: reportupdater should run with python3 - https://phabricator.wikimedia.org/T253418 (elukey) Open→Invalid Marcel and I checked the error that Nuria encountered (IIUC the usage of `iteritems()` that is not supported anymore in py3) but there is no occurrence in the curre...
[06:34:56] Analytics: Fix TLS certificate location and expire for Hadoop/Presto/etc.. and add alarms on TLS cert expiry - https://phabricator.wikimedia.org/T253957 (elukey)
[06:38:10] (PS33) Fdans: Add Pageviews Complete dumps backfilling job [analytics/refinery] - https://gerrit.wikimedia.org/r/597541 (https://phabricator.wikimedia.org/T252857)
[08:00:06] (PS34) Fdans: Add Pageviews Complete dumps backfilling job [analytics/refinery] - https://gerrit.wikimedia.org/r/597541 (https://phabricator.wikimedia.org/T252857)
[08:05:58] Analytics, Operations, Traffic: Spammy events coming our way for sites such us https://ru.wikipedia.kim - https://phabricator.wikimedia.org/T190843 (King77001) thats serious [[ https://uniprojectmaterials.com | project topics ]]
[08:30:19] (PS35) Fdans: Add Pageviews Complete dumps backfilling job [analytics/refinery] - https://gerrit.wikimedia.org/r/597541 (https://phabricator.wikimedia.org/T252857)
[08:39:03] Analytics: Decomission notebook hosts - https://phabricator.wikimedia.org/T249752 (elukey) Thanks a lot everybody for your work! Really appreciated :) Reminder for everybody that the deprecation will happen next week. I will remove access to the nodes and send an email, leaving the hosts as they are for ano...
[08:47:11] (PS36) Fdans: Add Pageviews Complete dumps backfilling job [analytics/refinery] - https://gerrit.wikimedia.org/r/597541 (https://phabricator.wikimedia.org/T252857)
[08:49:43] joal: o/
[08:56:53] (PS37) Fdans: Add Pageviews Complete dumps backfilling job [analytics/refinery] - https://gerrit.wikimedia.org/r/597541 (https://phabricator.wikimedia.org/T252857)
[09:05:57] Analytics, User-Elukey: Upgrade Druid to its latest upstream version (currently 0.18.1) - https://phabricator.wikimedia.org/T244482 (elukey) Highlights for release 0.14: https://github.com/apache/druid/releases/tag/druid-0.14.0-incubating * New console (merges coordinator and historical ones) https://dr...
[09:07:46] Analytics, User-Elukey: Upgrade Druid to its latest upstream version (currently 0.18.1) - https://phabricator.wikimedia.org/T244482 (elukey) Highlights for 0.15: https://github.com/apache/druid/releases/tag/druid-0.15.0-incubating * new Druid data load UI, seems interesting for spec ingestion.
[09:11:50] (PS38) Fdans: Add Pageviews Complete dumps backfilling job [analytics/refinery] - https://gerrit.wikimedia.org/r/597541 (https://phabricator.wikimedia.org/T252857)
[09:20:02] (PS39) Fdans: Add Pageviews Complete dumps backfilling job [analytics/refinery] - https://gerrit.wikimedia.org/r/597541 (https://phabricator.wikimedia.org/T252857)
[09:21:58] Analytics, ChangeProp, Event-Platform, MediaWiki-JobQueue, and 5 others: [EPIC] Develop a JobQueue backend based on EventBus - https://phabricator.wikimedia.org/T157088 (Naike)
[09:31:11] Analytics, MediaWiki-extensions-WikimediaEvents, Core Platform Team Workboards (Clinic Duty Team), Performance-Team (Radar): Remove usage of MEDIAWIKI_JOB_RUNNER from WikimediaEvents extension - https://phabricator.wikimedia.org/T247130 (Aklapper) @Naike: Could you explain what/who exactly this t...
[09:31:46] joal: o/
[09:32:19] in labs I had a 3-node zk cluster on stretch, so I tried to upgrade one of the vms to buster
[09:32:25] and it worked nicely
[09:32:47] now what I am wondering is if we should just try a reimage of, say, druid1001
[09:33:28] there are some difficulties atm in preserving the /srv partition, but I am thinking that we don't really *need* to preserve the druid cache with 5 nodes
[09:33:46] we could do it very gently, one node at a time
[09:34:29] (PS40) Fdans: Add Pageviews Complete dumps backfilling job [analytics/refinery] - https://gerrit.wikimedia.org/r/597541 (https://phabricator.wikimedia.org/T252857)
[09:52:29] (PS41) Fdans: Add Pageviews Complete dumps backfilling job [analytics/refinery] - https://gerrit.wikimedia.org/r/597541 (https://phabricator.wikimedia.org/T252857)
[10:00:58] Analytics, Analytics-Kanban: Upgrade Druid to Debian Buster - https://phabricator.wikimedia.org/T253980 (elukey)
[10:01:00] Analytics, Analytics-Kanban: Upgrade Druid to Debian Buster - https://phabricator.wikimedia.org/T253980 (elukey) p:Triage→Medium a:elukey
[10:05:01] !log move el2druid config from druid1001 to an-druid1001
[10:05:03] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[10:24:58] (PS42) Fdans: Add Pageviews Complete dumps backfilling job [analytics/refinery] - https://gerrit.wikimedia.org/r/597541 (https://phabricator.wikimedia.org/T252857)
[10:31:48] (PS43) Fdans: Add Pageviews Complete dumps backfilling job [analytics/refinery] - https://gerrit.wikimedia.org/r/597541 (https://phabricator.wikimedia.org/T252857)
[11:01:44] (PS44) Fdans: Add Pageviews Complete dumps backfilling job [analytics/refinery] - https://gerrit.wikimedia.org/r/597541 (https://phabricator.wikimedia.org/T252857)
[11:02:51] * elukey lunch!
[11:06:19] Hi folks
[11:06:53] djellel: hello :) I guess you wish to talk about producing revision diffs
[11:08:07] joal: I am working on something using diff-match-patch
[11:09:38] the last wikitext history snapshot is from 2020-01, when is the next snapshot?
[11:10:45] djellel: the latest wikitext history snapshot is actually more recent, but hive has not been updated
[11:10:54] djellel: I'll do it later today
[11:13:00] that'd be great. If you have any thoughts about diffs, especially on what to store, let's talk at some point.
[11:14:05] djellel: a few details about history-dumps - regular jobs converting XML to avro run every month, but mostly fail, due to the hadoop version of the bzip2 codec not supporting special cases (see T243241)
[11:14:06] T243241: Some xml-dumps files don't follow BZ2 'correct' definition - https://phabricator.wikimedia.org/T243241
[11:14:36] djellel: I manually tweak the files so that the computation can happen, but this involves manual steps, so it's not done on a regular basis
[11:14:59] djellel: also - the mediawiki_wikitext_history table is avro - not XML
[11:15:18] djellel: I think that's all I can think of :)
[11:17:15] djellel: about dumps-processing I mean
[11:17:25] (PS45) Fdans: Add Pageviews Complete dumps backfilling job [analytics/refinery] - https://gerrit.wikimedia.org/r/597541 (https://phabricator.wikimedia.org/T252857)
[11:18:24] djellel: about diffs... There is an interesting aspect, which is when consecutive revisions are not linked together with parent_id - that usually means something bizarre (deletion in the middle, insertion in the middle, etc.) - what strategy do we want to take when that happens?
[11:20:25] joal: I don't know about corner cases.... I am going with groupby page_id, order by revision_timestamp, then compute on a 1-lag window
[11:22:20] djellel: that works :) I suggest you do your group-computation using mediawiki-history (smaller) - you could also take advantage of getting the historical page-title, revision number etc. in addition to rev_id pairs
[11:22:52] And when you have your pairs, you join with wikitext
[11:23:47] Another solution is to use dump-order and to have an XML reader providing revision-pairs
[11:24:21] this latter would be less IO-intensive and would probably be correct in terms of ordering in 99.9% of cases
[11:25:16] djellel: you have wikitext_history up to 2020-04
[11:27:39] strange, it doesn't show in Hue
[11:28:14] joal: what's "dump-order and to have an XML" ?
[11:28:31] XML-reader
[11:29:53] djellel: To provide text as avro, we parse the XML using a dedicated hadoop-mediawiki-dumps-XML-reader (see https://github.com/wikimedia/analytics-wikihadoop/tree/master/src/main/scala/org/wikimedia/wikihadoop)
[11:29:54] (PS46) Fdans: Add Pageviews Complete dumps backfilling job [analytics/refinery] - https://gerrit.wikimedia.org/r/597541 (https://phabricator.wikimedia.org/T252857)
[11:30:35] joal: I was doing the diff computation, using the XML data and MapReduce code, yesterday. My jobs failed to finish.
[11:30:56] djellel: In order not to have to shuffle the text and reorder it, we could use the same approach, using a dedicated hadoop-xml-reader that would give you 2 consecutive revisions
[11:31:43] joal: Ok, I will look into that.
[11:34:09] djellel: I don't know how your map-reduce job is done, so I can't really help with understanding why it failed
[11:50:02] (PS47) Fdans: Add Pageviews Complete dumps backfilling job [analytics/refinery] - https://gerrit.wikimedia.org/r/597541 (https://phabricator.wikimedia.org/T252857)
[11:53:22] (PS48) Fdans: Add Pageviews Complete dumps backfilling job [analytics/refinery] - https://gerrit.wikimedia.org/r/597541 (https://phabricator.wikimedia.org/T252857)
[11:54:55] (PS49) Fdans: Add Pageviews Complete dumps backfilling job [analytics/refinery] - https://gerrit.wikimedia.org/r/597541 (https://phabricator.wikimedia.org/T252857)
[11:58:45] joal: around by any chance?
[11:58:50] yessir elukey
[11:59:00] bonjour :)
[11:59:04] did you see my msg about druid?
[11:59:06] elukey: I was waiting for you to get back so as not to disturb :)
[11:59:09] I have
[11:59:10] (the reimage of 1001)
[11:59:17] but would like to be sure of the understanding
[11:59:22] da cave?
[11:59:36] sure
[12:02:38] (PS50) Fdans: Add Pageviews Complete dumps backfilling job [analytics/refinery] - https://gerrit.wikimedia.org/r/597541 (https://phabricator.wikimedia.org/T252857)
[12:03:17] Achievement Unlocked: push 50 patch sets to a code review
[12:09:48] * joal bows to fdans - gerrit master by the numbers
[12:10:27] joal: I wish I was oozie master :(
[12:10:49] fdans: oozie masters us all - we all are puppets in its hands
[12:14:04] joal: the pagecounts_ez dataset finally works :_)
[12:14:11] https://hue.wikimedia.org/oozie/list_oozie_coordinator/0028841-200507064132789-oozie-oozi-C/
[12:14:23] frozen elukey :()
[12:19:46] Analytics, Analytics-Kanban: Upgrade Druid to Debian Buster - https://phabricator.wikimedia.org/T253980 (ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts: ` druid1001.eqiad.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/202005291219_elukey_...
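A minimal sketch of the lag-window pairing discussed above at 11:20-11:23, assuming Spark with Hive support on the cluster: build (parent, child) revision-id pairs per page from the smaller mediawiki_history table, then join each side with mediawiki_wikitext_history to fetch the text. The table names match the discussion, but the snapshot value, wiki, and column names are assumptions and may differ from the actual schemas.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.lag

object RevisionPairsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("revision-pairs-sketch")
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._

    // Revision-create events only: small enough to window comfortably.
    // Column names here (rev_id, event_timestamp, ...) are assumed.
    val revs = spark.table("wmf.mediawiki_history")
      .where($"snapshot" === "2020-04" && $"wiki_db" === "simplewiki" &&
        $"event_entity" === "revision" && $"event_type" === "create")
      .select($"page_id", $"rev_id", $"event_timestamp", $"event_user_id")

    // groupby page_id, order by timestamp, 1-lag window, as described above.
    val byPage = Window.partitionBy($"page_id")
      .orderBy($"event_timestamp", $"rev_id")
    val pairs = revs
      .withColumn("parent_rev_id", lag($"rev_id", 1).over(byPage))

    // Only now touch the (much bigger) wikitext table, once per pair side.
    val text = spark.table("wmf.mediawiki_wikitext_history")
      .where($"snapshot" === "2020-04" && $"wiki_db" === "simplewiki")
      .select($"revision_id", $"revision_text")
    val childText = text.toDF("rev_id", "child_text")
    val parentText = text.toDF("parent_rev_id", "parent_text")

    val withText = pairs
      .join(childText, Seq("rev_id"))
      .join(parentText, Seq("parent_rev_id"), "left_outer") // first revision of a page has no parent

    withText.write.parquet("/tmp/revision_pairs_sketch") // placeholder output path
    spark.stop()
  }
}
```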
[12:19:58] !log reimage druid1001 to Debian Buster
[12:20:00] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[12:41:41] Analytics, Analytics-Kanban: Upgrade Druid to Debian Buster - https://phabricator.wikimedia.org/T253980 (elukey) ` 12:39:38 | druid1001.eqiad.wmnet | WARNING: unable to verify that BIOS boot parameters are back to normal, got: Boot parameter version: 1 Boot parameter 5 is valid/unlocked Boot parameter da...
[12:43:49] djellel: heya - question for you - from a reader providing you revision pairs, how would you like the thing to work? First is (empty, 1st-rev), or first is (1st-rev, 2nd-rev); similarly, last is (last-rev, empty), or last is (prev-last-rev, last-rev)?
[12:44:25] Analytics: Web publication doesn't work - https://phabricator.wikimedia.org/T253661 (mpopov) >> a perplexing permissions issue > Hm! Weird, the docs directory didn't have group executable perms, which means you can't read contents of the directory. > ` > $ sudo chmod g+x /srv/published/dashboards/Wikipedia_...
[12:47:16] djellel: I suggest using: (empty, 1st-rev) ... (prev-last-rev, last-rev)
[12:48:06] joal: as long as the order is consistent, no difference from my perspective.
[12:48:49] djellel: it matters for pages having single revs, as if we use (1st, 2nd) and (prev-last, last), pages with a single rev are ignored
[12:48:58] I'll go with the suggested :)
[12:49:43] I prefer (empty, first)
[12:50:18] makes more sense to me as well, djellel - with that, the first diff is actually the full creation
[12:50:23] that preserves the semantics
[13:03:58] Analytics, Analytics-EventLogging, Analytics-Kanban, Event-Platform, MW-1.35-notes (1.35.0-wmf.35; 2020-06-02): All EventGate instances should use EventStreamConfig - https://phabricator.wikimedia.org/T251935 (mpopov) >>! In T251935#6172791, @Ottomata wrote: > Hm, we should settle on a config...
[13:05:30] Analytics, Analytics-Kanban, Patch-For-Review: Upgrade Druid to Debian Buster - https://phabricator.wikimedia.org/T253980 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['druid1001.eqiad.wmnet'] ` and were **ALL** successful.
[13:05:53] (PS1) Joal: Add XMLToJSONRevisionPair input format [analytics/wikihadoop] - https://gerrit.wikimedia.org/r/599837
[13:08:19] (PS2) Joal: Add XMLToJSONRevisionPair input format [analytics/wikihadoop] - https://gerrit.wikimedia.org/r/599837
[13:16:36] Quarry, DBA, Data-Services: Quarry: Lost connection to MySQL server during query - https://phabricator.wikimedia.org/T246970 (Marostegui) So this query takes around 25 minutes to execute on an idle host, so on a normal loaded hosts it is perfectly possible that it would take more, so the reason it is...
[13:16:43] Quarry, Data-Services: Quarry: Lost connection to MySQL server during query - https://phabricator.wikimedia.org/T246970 (Marostegui)
[13:19:45] !log re-run druid webrequest hourly 29/05T11 (failed due to a host reimage in progress)
[13:19:47] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[13:26:46] joal: new node up!
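A tiny illustration (not the wikihadoop implementation) of the pairing convention agreed above at 12:47-12:50: prepending an empty "revision" makes the first pair (empty, 1st-rev), so the first diff is the full page creation and a page with a single revision still yields one pair instead of being dropped.

```scala
object PairingConventionSketch {
  // Turn an ordered list of revision texts into (previous, current) pairs,
  // with an empty sentinel in front so single-revision pages are kept.
  def toPairs(revisions: Seq[String]): Seq[(String, String)] =
    ("" +: revisions).sliding(2).collect { case Seq(prev, next) => (prev, next) }.toSeq

  def main(args: Array[String]): Unit = {
    println(toPairs(Seq("r1 text")))            // List(("", "r1 text"))
    println(toPairs(Seq("r1 text", "r2 text"))) // List(("", "r1 text"), ("r1 text", "r2 text"))
  }
}
```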
[13:33:17] I see it in the coord's UI, it is already getting segments
[13:34:25] * elukey afk for a coffee
[13:34:27] \o/ elukey :)
[13:34:33] enjoy your coffee :)
[13:34:39] djellel: I have a gift for you :)
[13:34:57] djellel: https://gerrit.wikimedia.org/r/#/c/analytics/wikihadoop/+/599837/
[13:40:22] joal: beautiful, thank you :D
[13:40:39] djellel: I'm writing a manual to use it in spark easily
[13:41:52] joal: lol, you read my mind
[13:48:15] djellel: https://gist.github.com/jobar/d8b9a0cb80d27e2f6fdd092b140a3e33
[13:49:15] Ah! Forgot the imports
[13:54:21] Ok here we are, gist updated with tested code djellel
[13:54:30] have fun :)
[14:12:58] joal: awesome, I'll try and let you know :)
[14:28:47] hey teammmm :]
[14:31:24] o/
[14:51:55] Analytics, ChangeProp, Event-Platform, MediaWiki-JobQueue, and 5 others: [EPIC] Develop a JobQueue backend based on EventBus - https://phabricator.wikimedia.org/T157088 (Pchelolo) Ok, this has been in production for 2 years now, I think it's time to resolve the ticket. All the subtasks are not re...
[15:00:23] Analytics, Operations, Traffic, Readers-Web-Backlog (Tracking): Mobile redirects drop provenance parameters - https://phabricator.wikimedia.org/T252227 (Jdlrobson)
[15:02:26] i forgot it's silent day!
[15:02:37] I was in standup waiting like a total dingdong
[15:11:12] :]
[15:13:12] fdans-ding-dong - nothing about silence here :)
[15:13:51] fdans: I plan to review your patch on Tuesday (Monday off in France, the last day of an incredible days-off month of May)
[15:38:24] (CR) Mforns: [C: +2] "LGTM!" [analytics/reportupdater] - https://gerrit.wikimedia.org/r/599159 (owner: Elukey)
[15:38:51] (CR) Mforns: [C: +2] "LGTM!" [analytics/reportupdater] - https://gerrit.wikimedia.org/r/599160 (owner: Elukey)
[15:39:35] (CR) Mforns: [C: +2] "LGTM!" [analytics/reportupdater] - https://gerrit.wikimedia.org/r/599161 (owner: Elukey)
[15:40:12] (CR) Mforns: [V: +2 C: +2] reportupdater.py: add verbose log exceptions [analytics/reportupdater] - https://gerrit.wikimedia.org/r/599159 (owner: Elukey)
[15:40:18] (Merged) jenkins-bot: reportupdater.py: fix flake8 error E741 ambiguous variable name 'l' [analytics/reportupdater] - https://gerrit.wikimedia.org/r/599161 (owner: Elukey)
[15:40:20] (CR) Mforns: [V: +2 C: +2] tox: run tests also with python 3.7 [analytics/reportupdater] - https://gerrit.wikimedia.org/r/599160 (owner: Elukey)
[15:40:22] (CR) Mforns: [V: +2 C: +2] reportupdater.py: fix flake8 error E741 ambiguous variable name 'l' [analytics/reportupdater] - https://gerrit.wikimedia.org/r/599161 (owner: Elukey)
[16:10:54] dcausse: Heya - would you have a minute for me?
[16:23:22] Analytics: Web publication doesn't work - https://phabricator.wikimedia.org/T253661 (jwang) Open→Resolved a:jwang Close the ticket as issues were solved. Thank you! @Ottomata
[16:31:56] joal: the '2020-04' snapshot is only available for enwiki?
[16:32:06] it shouldn't be, djellel
[16:33:49] djellel: however there are a lot of missing projects
[16:34:25] I am referring to the mediawiki_wikitext_history table
[16:34:40] * elukey off!
[16:34:44] o/
[16:34:52] have a good weekend elukey
[16:35:04] Have a good weekend
[16:36:31] Some are missing for 2020-03 (fewer), djellel, and it seems 2020-02 is complete
[16:36:39] I'll start jobs to fix that
[16:36:57] cool!
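A hedged sketch of the diff step being discussed, in the spirit of the gist shared at 13:48: run the vendored diff_match_patch (the Java port, package name.fraser.neil.plaintext, which joal says was copied into the project) on one (previous, current) text pair with a hard Diff_Timeout, so a pathological revision pair degrades to a coarser diff instead of stalling an executor. The helper name and the returned metrics are illustrative, not the actual job.

```scala
import name.fraser.neil.plaintext.diff_match_patch
import scala.collection.JavaConverters._

object DiffSketch {
  // Returns (charsAdded, charsRemoved) for one revision pair; names are illustrative.
  def diffSizes(previous: String, current: String, timeoutSeconds: Float = 1.0f): (Int, Int) = {
    val dmp = new diff_match_patch()
    dmp.Diff_Timeout = timeoutSeconds            // give up and return a coarser diff after this long
    val diffs = dmp.diff_main(previous, current) // java.util.LinkedList[diff_match_patch.Diff]
    dmp.diff_cleanupSemantic(diffs)              // merge char-level noise into more readable chunks
    val ops = diffs.asScala
    val added = ops.filter(_.operation == diff_match_patch.Operation.INSERT).map(_.text.length).sum
    val removed = ops.filter(_.operation == diff_match_patch.Operation.DELETE).map(_.text.length).sum
    (added, removed)
  }

  def main(args: Array[String]): Unit = {
    // (empty, first-revision) pair: the whole text counts as an insert.
    println(diffSizes("", "first revision text"))
    println(diffSizes("first revision text", "first rev. text, edited"))
  }
}
```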
[16:37:12] for 2020-04 if possible
[16:37:28] since it's cumulative
[16:49:22] djellel: it should be ok for Monday (plenty of stuff to convert, it will take some time)
[16:49:48] no rush :)
[16:50:33] the script you created above returns all the parent/child revisions for a page, correct?
[16:51:11] djellel: by script you mean?
[16:51:16] the CR with the reader?
[16:51:25] or the example?
[16:51:37] the example
[16:52:05] I am not proficient in scala, but ideally I need to get the user_id and the timestamp of the revision
[16:52:10] the example applies the reader to a dump file (simplewiki), and shows 10 first/second examples
[16:53:39] I have another script that'll help you with that - I have tested diff_match_patch with simplewiki (so I have another example) - it fails with workers of 16GB and a timeout of 10s - I think you'll need to devise heuristics to filter out cases
[16:55:21] re diff_match_patch, did you package it locally, or use mvn?
[16:55:49] I copied the src folder into my project
[16:56:50] ok, cool!
[16:58:05] the wiki diff code would be best, but it's in PHP I think :/
[16:58:33] I need to modify diff_match_patch to return word-level differences
[16:59:37] I guess the wiki-diff code is in PHP - but I don't know
[16:59:48] it is
[17:20:41] djellel: here you are - https://gist.github.com/jobar/b265683f352d4e7425be165b65524d85
[17:20:48] djellel: this has failed for me so far
[17:21:15] djellel: I have it currently running (lowering the timeout to 1s), I'll let you know
[17:48:48] puf, after the changes suggested by ottomata, airflow-pyarrow is still not working
[17:48:52] futex issues
[17:49:38] joal: thumbs-up
[18:11:01] (CR) Nuria: "nice, thanks for doing this" [analytics/refinery] - https://gerrit.wikimedia.org/r/599302 (https://phabricator.wikimedia.org/T253847) (owner: Joal)
[18:18:43] Analytics, Operations, Traffic: Spammy events coming our way for sites such us https://ru.wikipedia.kim - https://phabricator.wikimedia.org/T190843 (Nuria) Open→Declined
[18:20:56] Analytics, User-Elukey: Upgrade Druid to its latest upstream version (currently 0.18.1) - https://phabricator.wikimedia.org/T244482 (Nuria) given note about automatic compaction let's skip trying to do that: ` Note: This is the initial implementation and has limitations on interoperability with realtime...
[19:22:16] Analytics: Resting a Kerberos access for sguebo - https://phabricator.wikimedia.org/T254035 (sguebo_WMF)
[19:22:33] Analytics: Reseting a Kerberos access for sguebo - https://phabricator.wikimedia.org/T254035 (sguebo_WMF)
[19:22:49] Analytics: Resetting a Kerberos access for sguebo - https://phabricator.wikimedia.org/T254035 (sguebo_WMF)
[19:23:22] Analytics: Resetting Kerberos access for sguebo - https://phabricator.wikimedia.org/T254035 (sguebo_WMF)
[23:18:57] Analytics, Analytics-EventLogging, Analytics-Kanban, Event-Platform, MW-1.35-notes (1.35.0-wmf.35; 2020-06-02): All EventGate instances should use EventStreamConfig - https://phabricator.wikimedia.org/T251935 (Ottomata) Hm. I don't love the name `config`, which would mean: Stream Config conf...
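Picking up the word-level diff idea from 16:58 above: one way to get word-level output without modifying diff_match_patch itself is to encode each distinct word as a single character, diff the encoded strings, and map the result back. A rough sketch under the same vendored-class assumption as the previous snippet; it breaks down if a revision pair contains more than roughly 6,000 distinct words (the private-use-area range used here).

```scala
import name.fraser.neil.plaintext.diff_match_patch
import scala.collection.JavaConverters._
import scala.collection.mutable

object WordLevelDiffSketch {
  def wordDiff(previous: String, current: String): Seq[(diff_match_patch.Operation, String)] = {
    // Assign each distinct word (with its trailing whitespace) a private-use-area char.
    val wordToChar = mutable.LinkedHashMap.empty[String, Char]
    def encode(text: String): String =
      if (text.isEmpty) ""
      else text.split("(?<=\\s)").map { word =>
        wordToChar.getOrElseUpdate(word, (0xE000 + wordToChar.size).toChar)
      }.mkString

    val encodedPrev = encode(previous)
    val encodedCurr = encode(current)
    val charToWord = wordToChar.map { case (w, c) => c -> w }

    val dmp = new diff_match_patch()
    dmp.diff_main(encodedPrev, encodedCurr, false).asScala.toSeq.map { d =>
      // Decode each diff chunk back from the one-char-per-word encoding.
      (d.operation, d.text.map(charToWord).mkString)
    }
  }

  def main(args: Array[String]): Unit =
    wordDiff("the quick brown fox", "the slow brown fox jumps").foreach(println)
}
```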