[00:57:55] Analytics, Fundraising-Backlog, fundraising-tech-ops: Bring Banner History data into Fundraising infrastructure - https://phabricator.wikimedia.org/T253050 (DStrine)
[02:35:21] Analytics, Analytics-Cluster: rev_user and rev_user_text == NULL in wmf_raw.mediawiki_revision & - https://phabricator.wikimedia.org/T254835 (diego)
[03:04:46] (CR) Nuria: [C: +1] "Leaving joseph to +2" (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/595189 (https://phabricator.wikimedia.org/T251542) (owner: Mforns)
[03:17:39] Analytics, Analytics-Cluster: rev_user and rev_user_text == NULL in wmf_raw.mediawiki_revision - https://phabricator.wikimedia.org/T254835 (diego)
[06:04:50] goood morning
[06:08:39] Analytics, Scoring-platform-team: [Discuss] ORES model development and deployment processes - https://phabricator.wikimedia.org/T216246 (Aklapper) Stalled→Open The previous comments don't explain who or what (task?) exactly this task is stalled on (["If a report is waiting for further input (e.g....
[07:32:37] Analytics, Analytics-Kanban, Patch-For-Review: Upgrade AMD ROCm to latest upstream - https://phabricator.wikimedia.org/T247082 (elukey) stat1005 upgraded! Docs updated as well https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/AMD_GPU
[07:32:45] !log upgrade ROCm to 3.3 on stat1005
[07:32:46] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[07:41:19] Analytics, Analytics-Kanban, Patch-For-Review: Upgrade AMD ROCm to latest upstream - https://phabricator.wikimedia.org/T247082 (elukey)
[07:48:42] Quarry: Update document title on title change - https://phabricator.wikimedia.org/T254847 (BrandonXLF)
[08:22:56] Analytics, Analytics-Cluster: rev_user and rev_user_text == NULL in wmf_raw.mediawiki_revision - https://phabricator.wikimedia.org/T254835 (ashley) Can't speak for the WMF's analytics setup etc. but in general [[https://www.mediawiki.org/wiki/Actor_migration|mw:Actor migration]] is a thing, so yes, attem...
[08:34:57] Analytics, Analytics-Kanban: Purge old files on Archiva to free some space - https://phabricator.wikimedia.org/T254849 (elukey)
[08:41:16] Analytics, Performance-Team: Invalid navigation timing events - https://phabricator.wikimedia.org/T254606 (Gilles) @Nuria when did those last happen? I can't find recent examples in logstash. The validation errors I see in there come from bots or people using ancient browsers (Firefox from 2006 or Chrome...
[08:45:58] Analytics, Analytics-Kanban, Patch-For-Review: Move the Analytics infrastructure to Debian Buster - https://phabricator.wikimedia.org/T234629 (elukey)
[10:22:22] Analytics: reset of burrow metrics for consumer group - https://phabricator.wikimedia.org/T254498 (hnowlan) Great, thanks!
[11:25:53] Analytics, Analytics-Cluster: rev_user and rev_user_text == NULL in wmf_raw.mediawiki_revision - https://phabricator.wikimedia.org/T254835 (JAllemandou) I confirm @ashley's point. You should use `actor_id` to join to the `actor` table, or use `wmf.mediawiki_history` where that join is already done.
[12:34:12] Analytics, DBA: Upgrade analytics dbstore databases to Buster and Mariadb 10.4 - https://phabricator.wikimedia.org/T254870 (Marostegui)
[12:34:25] Analytics, DBA: Upgrade analytics dbstore databases to Buster and Mariadb 10.4 - https://phabricator.wikimedia.org/T254870 (Marostegui) p:Triage→Medium
[12:45:36] Analytics, DBA: Upgrade analytics dbstore databases to Buster and Mariadb 10.4 - https://phabricator.wikimedia.org/T254870 (elukey) No preference! I think that given the limited downtime it can be done anytime, no problems from my side. Do you require help in doing anything?
[12:46:57] Analytics, DBA: Upgrade analytics dbstore databases to Buster and Mariadb 10.4 - https://phabricator.wikimedia.org/T254870 (Marostegui) >>! In T254870#6205693, @elukey wrote: > No preference! I think that given the limited downtime it can be done anytime, no problems from my side. Do you require help in...
[12:48:09] (PS2) Joal: Add a corrected bzip2 codec for spark [analytics/refinery/source] - https://gerrit.wikimedia.org/r/603590 (https://phabricator.wikimedia.org/T243241)
[12:48:34] (CR) Joal: "Tested on cluster" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/603590 (https://phabricator.wikimedia.org/T243241) (owner: Joal)
[12:58:46] (CR) Joal: "2 comments/idea" (4 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/595189 (https://phabricator.wikimedia.org/T251542) (owner: Mforns)
[12:59:00] thx joal :]
[12:59:19] np mforns - Please don't hesitate to discuss my ideas :)
[13:04:17] joal: no no, they are cool tips, will do!
[13:04:51] ack mforns - I think me not doing a lot of code these days makes me feel not confident in my comments :)
[13:06:51] Analytics, Analytics-Kanban: Purge old files on Archiva to free some space - https://phabricator.wikimedia.org/T254849 (Ottomata) Sounds good!
[13:14:30] * elukey errand for a bit!
[13:20:02] (CR) Ottomata: [C: +1] "Just so I understand, we intend to use commons-compress 1.2.0 when it is released, which would allow us to then remove this code?" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/603590 (https://phabricator.wikimedia.org/T243241) (owner: Joal)
[13:24:02] (CR) Joal: "> we intend to use commons-compress 1.2.0 when it is released, which would allow us to then remove this code?" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/603590 (https://phabricator.wikimedia.org/T243241) (owner: Joal)
[13:50:49] addshore: Heya - did minor changes on your doc about our talk - I hope it was helpful :)
[13:57:49] gone for kids - back for standup
[14:06:09] Analytics, Analytics-Kanban, EventStreams, Operations, and 2 others: EventStreams drops the connection after 15 minutes, which makes it unreliable - https://phabricator.wikimedia.org/T242767 (BBlack) Looking at that other ticket T250912 - would an in-band service ping or NOP event of some kind ad...
[14:20:55] Analytics, Analytics-Kanban, Operations, vm-requests: Create archiva1002 as replacement of archiva1001 - https://phabricator.wikimedia.org/T254890 (elukey)
[14:31:27] Analytics, Performance-Team: Invalid navigation timing events - https://phabricator.wikimedia.org/T254606 (Nuria) These errors are from last week, it might be a total one-off and you can ignore, ticket was just an FYI.
[14:56:44] joal: just one comment on your comments: your second comment, isn't it the same concept we discussed in your last comment in the previous pass?
[14:57:21] uou, lots of comments :P
[15:20:55] Analytics, Performance-Team: Invalid navigation timing events - https://phabricator.wikimedia.org/T254606 (Gilles) I can't find that particular error in logstash at that date, though: https://logstash.wikimedia.org/goto/9dd45df06d68b8f928681dfdc89946e5 Did you pull it from elsewhere?
[15:23:32] Analytics, Analytics-Kanban: Purge old files on Archiva to free some space - https://phabricator.wikimedia.org/T254849 (elukey) Enabled the repository-purge consumer in the "Repository Scanning" tab, from the documentation it should help in purging data.
[15:26:54] Analytics, Performance-Team: Invalid navigation timing events - https://phabricator.wikimedia.org/T254606 (Nuria) Logstash would only have errors where records do not validate, this one does validate, issues are around the values of the given keys (integers out of range)
[15:32:26] Analytics, Better Use Of Data, Product-Analytics, Epic, Product-Infrastructure-Team-Backlog (Kanban): Session Length Metric. Web implementation - https://phabricator.wikimedia.org/T248987 (mpopov)
[15:32:39] Analytics, Better Use Of Data, Product-Analytics, Epic, Product-Infrastructure-Team-Backlog (Kanban): Session Length Metric. Web implementation - https://phabricator.wikimedia.org/T248987 (mpopov)
[15:35:02] Analytics, Performance-Team: Invalid navigation timing events - https://phabricator.wikimedia.org/T254606 (Gilles) So, it's valid as per the schema, but fails to be inserted into Hive because the values are out of range?
[15:37:19] https://grafana.wikimedia.org/d/000000377/host-overview?panelId=12&fullscreen&orgId=1&refresh=5m&var-server=archiva1001&var-datasource=eqiad%20prometheus%2Fops&var-cluster=misc
[15:37:23] \o/
[15:38:06] (PS5) Mforns: Make anomaly detection correctly handle holes in time-series [analytics/refinery/source] - https://gerrit.wikimedia.org/r/595189 (https://phabricator.wikimedia.org/T251542)
[15:38:28] (CR) Mforns: [V: +2] Make anomaly detection correctly handle holes in time-series (4 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/595189 (https://phabricator.wikimedia.org/T251542) (owner: Mforns)
[15:38:38] the archiva git fat script complained a bit
[15:38:54] (CR) Mforns: [V: +2] "Tested again with real data, seems good to go!" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/595189 (https://phabricator.wikimedia.org/T251542) (owner: Mforns)
[15:42:03] Analytics, Performance-Team: Invalid navigation timing events - https://phabricator.wikimedia.org/T254606 (Nuria) >Where can I find the list of these? There is no list I can point you to, these events are dropped when processed but exist on the raw data. To be clear there are very few.
[15:48:26] joal awesome!
[16:01:38] nuria: standup?
[16:02:32] (CR) Joal: [C: +2] "Looks good to me :) Thanks for the changes mforns" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/595189 (https://phabricator.wikimedia.org/T251542) (owner: Mforns)
[16:02:52] thx joal :]
[16:02:55] :)
[16:04:29] (CR) Mforns: [V: +2] Make anomaly detection correctly handle holes in time-series [analytics/refinery/source] - https://gerrit.wikimedia.org/r/595189 (https://phabricator.wikimedia.org/T251542) (owner: Mforns)
[16:20:43] Analytics, Performance-Team: Invalid navigation timing events - https://phabricator.wikimedia.org/T254606 (Gilles) How can I inspect the raw data? Looking at the raw files in hdfs as mentioned in the docs (eg. /mnt/hdfs/wmf/data/raw/eventlogging/eventlogging_NavigationTiming/...), they're not human-read...
[16:38:47] (PS1) Milimetric: Show languages in dropdown [analytics/wikistats2] - https://gerrit.wikimedia.org/r/604065 (https://phabricator.wikimedia.org/T246971)
[16:42:20] (CR) Milimetric: "@fdans, you referenced selectableLanguages here: https://gerrit.wikimedia.org/r/#/c/analytics/wikistats2/+/589606/4/src/components/SiteLan" [analytics/wikistats2] - https://gerrit.wikimedia.org/r/604065 (https://phabricator.wikimedia.org/T246971) (owner: Milimetric)
[17:03:43] Analytics, Product-Analytics (Kanban), Readers-Web-Backlog (Needs Product Owner Decisions), covid-19: Weekly updates on editors - https://phabricator.wikimedia.org/T248427 (kzimmerman) Open→Resolved
[17:03:45] Analytics, Product-Analytics (Kanban), Readers-Web-Backlog (Needs Product Owner Decisions), covid-19: Weekly updates on editors & readers - https://phabricator.wikimedia.org/T247873 (kzimmerman)
[17:21:06] Analytics, Analytics-Kanban: Purge old files on Archiva to free some space - https://phabricator.wikimedia.org/T254849 (elukey) Better but I still see a lot of old files: ` elukey@archiva1001:/var/lib/archiva/repositories$ sudo du -hs * 68K internal 16G mirrored 665M python 34G releases 9.1G snapshots `
[17:21:56] Analytics, Product-Analytics (Kanban), Readers-Web-Backlog (Needs Product Owner Decisions), covid-19: Weekly updates on editors - https://phabricator.wikimedia.org/T248427 (MMiller_WMF) @jwang @kzimmerman -- thank you for working on this. Is this weekly edits data available in Turnilo/Superset?...
[17:43:23] * elukey off!
[18:10:03] Analytics, Product-Analytics (Kanban), Readers-Web-Backlog (Needs Product Owner Decisions), covid-19: Weekly updates on editors - https://phabricator.wikimedia.org/T248427 (jwang) @MMiller_WMF, it's just through the dashboard I posted. The database, which Turnilo/Superset can access to, is scoop...
[18:18:21] Analytics, Product-Analytics (Kanban), Readers-Web-Backlog (Needs Product Owner Decisions), covid-19: Weekly updates on editors - https://phabricator.wikimedia.org/T248427 (Nuria) @MMiller_WMF the tables @jwang is using are available in superset so you can run selects like these through superse...
[18:35:57] ottomata: ping me whenever you want to discuss your task! I might leave for 20 minutes during the next hour, but will be back
[18:54:02] Analytics, Analytics-Kanban, EventStreams, Operations, and 2 others: EventStreams drops the connection after 15 minutes, which makes it unreliable - https://phabricator.wikimedia.org/T242767 (Ottomata) Hm, I'm pretty sure the connection is terminated even when there are events being sent. ` ti...
[19:07:56] milimetric: i think deleting schemas is a bad idea :)
[19:07:56] https://meta.wikimedia.org/wiki/Schema:KaiOSAppConsent
[19:08:02] mforns: oh! sorry ok!
[19:08:12] umm, i'm here, want to talk at :30 after?
[19:10:30] ottomata: yes :]
[19:24:26] (PS3) Ottomata: [WIP] Add EventStreamConfig and EventStreamHelper classes [analytics/refinery/source] - https://gerrit.wikimedia.org/r/603582 (https://phabricator.wikimedia.org/T251609)
[19:27:37] mforns: fyi, i put up my current patch here
[19:27:38] https://gerrit.wikimedia.org/r/c/analytics/refinery/source/+/603582
[19:27:44] we will brain bounce that stuff
[19:27:51] you might have better ideas of how to structure, dunno
[19:28:16] (CR) jerkins-bot: [V: -1] [WIP] Add EventStreamConfig and EventStreamHelper classes [analytics/refinery/source] - https://gerrit.wikimedia.org/r/603582 (https://phabricator.wikimedia.org/T251609) (owner: Ottomata)
[19:36:38] ottomata: just sent patch to blacklist schema
[20:23:48] ottomata: https://app.greenhouse.io/people/143520869?application_id=160182136
[20:23:53] ottomata: sorry!
[20:24:07] ottomata: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/604130
[20:25:36] Analytics, Event-Platform, MediaWiki-Maintenance-system, WMF-JobQueue, and 3 others: showJobs.php maintenance script useless and misleading in production - https://phabricator.wikimedia.org/T221224 (eprodromou) a:Clarakosi→None We want to let someone else on Clinic Duty take this.
[20:28:46] ah nuria that is the wrong place, sorry just noticed
[20:28:50] that is the mw job exclusion
[20:29:25] fixing
[20:56:15] Analytics, Readers-Web-Backlog (Needs Product Owner Decisions): % of "none" referers seems too high - https://phabricator.wikimedia.org/T195880 (Isaac) Another data point that is interesting in this discussion: Youtube provides Wikipedia articles as fact-checks / context for a variety of conspiracy theor...
[21:02:54] (PS4) Ottomata: [WIP] Add EventStreamConfig and EventStream classes [analytics/refinery/source] - https://gerrit.wikimedia.org/r/603582 (https://phabricator.wikimedia.org/T251609)
[21:03:23] Analytics, Better Use Of Data, Product-Analytics, Epic, and 2 others: Session Length Metric. Web implementation - https://phabricator.wikimedia.org/T248987 (Krinkle) @jlinehan If you haven't already, you may want to look into ReadingDepth and its instrumentation. It seems closely related to this....
[21:03:54] (CR) jerkins-bot: [V: -1] [WIP] Add EventStreamConfig and EventStream classes [analytics/refinery/source] - https://gerrit.wikimedia.org/r/603582 (https://phabricator.wikimedia.org/T251609) (owner: Ottomata)
[21:05:00] mforns: thanks for the EventStream class idea, I think that is much cleaner. I might still need a helper or factory class to pull some things together when working with multiple EventStreams, but the code's home makes much more sense now
[21:06:03] (PS5) Ottomata: [WIP] Add EventStreamConfig and EventStream classes [analytics/refinery/source] - https://gerrit.wikimedia.org/r/603582 (https://phabricator.wikimedia.org/T251609)
[21:59:54] Analytics, Analytics-Cluster: rev_user and rev_user_text == NULL in wmf_raw.mediawiki_revision - https://phabricator.wikimedia.org/T254835 (diego) Thanks @ashley and @JAllemandou . I've solved the problem using wmf.mediawiki_history, so no emergencies from my side. However, is that the expected behavior...
[22:20:33] (CR) Nuria: "Are jobs that use unzipping going to use this class explicitly? If so maybe we can add docs of how it is used?" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/603590 (https://phabricator.wikimedia.org/T243241) (owner: Joal)