[00:51:39] Analytics: Superset Presto LIMIT <10000 error - https://phabricator.wikimedia.org/T282632 (SNowick_WMF)
[00:52:29] Analytics: Superset Presto LIMIT >10000 error - https://phabricator.wikimedia.org/T282632 (SNowick_WMF)
[05:46:54] Analytics, DBA, Event-Platform, WMF-Architecture-Team, Services (later): Consistent MediaWiki state change events | MediaWiki events as source of truth - https://phabricator.wikimedia.org/T120242 (Marostegui) >>! In T120242#7069904, @Ottomata wrote: > Interesting thanks! So brainstorming how...
[08:42:54] Analytics, Research: Add global locks to mediawiki_history - https://phabricator.wikimedia.org/T282657 (Pablo)
[08:57:15] Analytics-Clusters, Product-Analytics: Can't re-run failed Oozie workflows in Hue/Hue-Next (as non-admin) - https://phabricator.wikimedia.org/T275212 (nshahquinn-wmf) Open→Declined Hmm, no, that didn't work either. Given what @elukey said earlier, I think it makes sense not to invest more time in...
[09:06:19] Analytics-Clusters, Analytics-Kanban: Upgrade the Hadoop masters to Debian Buster - https://phabricator.wikimedia.org/T278423 (elukey) >>! In T278423#7079951, @razzi wrote: > Ok, here's my new plan, including draining the cluster and using safemode to take a stable fsimage. If this looks good to you @elu...
[09:37:50] joal2: how did the job last night go?
[09:38:23] the repair is still going - it might take an awfully long time, to the point that I'm beginning to wonder whether removing that node from the list we write to might be a good idea short-term
[10:03:15] hm
[10:03:38] Hi hnowlan - jobs failed last night - I have not checked the errors though, will do
[10:05:25] Amir1: You're live for reliability data, with data starting the first of May :)
[10:05:42] joal2: AMAZING.
[10:05:50] Did the job run and finish?
[10:07:42] Analytics, Research: Add global locks to mediawiki_history - https://phabricator.wikimedia.org/T282657 (JAllemandou) Hi @Pablo - Do you know in which DB the data is stored? if it is in the `centralauth` one we don't have it. This task should then become adding data from centralauth to the lake and the `m...
[10:09:56] Amir1: I have not checked if data is available, but jobs ran successfully from an oozie perspective :)
[10:10:08] Coool
[10:10:12] Thanks. I'll check
[10:10:33] hnowlan: Just looked at the logs: same first host for failure
[10:14:16] argh
[10:14:42] hnowlan: are we running a full or incremental repair on the instance?
[10:18:14] elukey: it's a full repair - I believe a full repair is required before moving to sequential repairs
[10:18:29] er incremental repairs
[10:36:40] hnowlan: ahh okok, but only for that instance right?
[10:37:43] (bbiab)
[10:39:17] joal2: I checked and it works. The problem is that it works too well. It has data for previous days, is that intentional? https://grafana.wikimedia.org/d/79S1Hq9Mz/wikidata-reliability-metrics?viewPanel=15&orgId=1&refresh=1d&from=now-7d&to=now
[10:40:24] Amir1: It is expected :) We started the job on May 1st (https://grafana.wikimedia.org/d/79S1Hq9Mz/wikidata-reliability-metrics?viewPanel=15&orgId=1&refresh=1d&from=now-20d&to=now)
[10:40:45] Amazing
[10:40:49] Cool
[10:40:50] Thanks
[10:41:02] I just don't want it to do it several times
[10:41:06] Let me rewrite that correctly Amir1 - We started the job yesterday, with a start-date of May 1st, leading to past days being backfilled :)
[10:41:20] cool
[10:41:55] And from now on the job will run day by day
[10:42:53] Awesome
[10:46:12] elukey: yeah I think so.
That said, I'm not sure if we even *can* do incremental repairs with this version of cassandra, the help for nodetool doesn't mention it
[10:50:34] https://cassandra.apache.org/doc/3.11.3/operating/repair.html# ;_;
[10:59:40] I won't be able to make standup today, my availability is a bit limited due to irl stuff
[11:05:53] Analytics, Research: Add global locks to mediawiki_history - https://phabricator.wikimedia.org/T282657 (Pablo) Thanks @JAllemandou! Data might be stored in the `centralauth` DB, so I am happy to update the title of this task accordingly :) In addition, I should mention that for many of these users with...
[11:06:23] Analytics, Research: Adding data from centralauth to the lake and the mediawiki_history dataset - https://phabricator.wikimedia.org/T282657 (Pablo)
[11:20:52] Moving spot in the next hours - back tonight
[12:33:22] Analytics, GrowthExperiments, Growth-Team (Current Sprint), MW-1.36-notes (1.36.0-wmf.30; 2021-02-09): eventgate_validation_error for NewcomerTask, HomepageTask, and HomepageVisit schemas - https://phabricator.wikimedia.org/T273700 (kostajh) Resolved→Open `'.event.user_editcount' should b...
[12:36:30] Analytics, GrowthExperiments, Growth-Team (Current Sprint), MW-1.36-notes (1.36.0-wmf.30; 2021-02-09): eventgate_validation_error for NewcomerTask, HomepageTask, and HomepageVisit schemas - https://phabricator.wikimedia.org/T273700 (kostajh) p:Triage→Medium
[13:01:40] hi all, we have a few services using the new pki infrastructure, specifically debmonitor client auth traffic, debmonitor.wikimedia.org (between ats and origin) and dbtree.wikimedia.org (between ats and origin).
I think there are some analytics services that make use of the sslcert module which may benefit from moving over to the new pki service and wondered if you had any services that may make
[13:01:46] good test candidates
[13:01:47] https://wikitech.wikimedia.org/wiki/PKI
[13:09:19] jbond42: We are basically using TLS puppet host certs on hadoop and presto, so we could replace everything with the new PKI in theory (in practice we'd need to study a way to do it gracefully, but we could think about a temporary hadoop cluster jobs stop for it)
[13:09:50] in ML we'll have a similar problem, Kubeflow wants a CA to get certificates for microservices on kubernetes, so I'll ping you when the time comes :)
[13:10:16] maybe for the analytics use case the best could be to open a task to investigate what needs to be done
[13:12:41] elukey: sounds good to me, if you open a task I'll happily comment with some ideas. not too familiar with how things are implemented but it's possible there may be seamless migration options
[13:14:11] razzi, ottomata - For Hadoop we can open a task to investigate what needs to be done, I can assist if you open one adding what I have done with puppet TLS certs :)
[13:19:44] Analytics, DBA, Event-Platform, WMF-Architecture-Team, Services (later): Consistent MediaWiki state change events | MediaWiki events as source of truth - https://phabricator.wikimedia.org/T120242 (Ottomata) Right, I understand that the extra maintenance this would cause could be too onerous t...
[13:43:43] huh, we will also need to think about Kafka too
[13:44:43] also possibly cassandra, we never added tls encryption between nodes
[13:53:32] Analytics: Include Global blocks in mediawiki_history - https://phabricator.wikimedia.org/T282684 (Milimetric)
[13:56:17] !log removing refine_mediawiki_job Refine jobs - T281605
[13:56:20] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[13:56:21] T281605: Stop Refining mediawiki_job events in Hive - https://phabricator.wikimedia.org/T281605
[13:57:34] Analytics, Research: Adding data from centralauth to the lake and the mediawiki_history dataset - https://phabricator.wikimedia.org/T282657 (Milimetric)
[13:57:36] Analytics: Include Global blocks in mediawiki_history - https://phabricator.wikimedia.org/T282684 (Milimetric)
[14:00:51] mforns: mornin!
[14:01:15] lemme know if you want to do any delete and/or virtualpageview stuff before PA sync in 1.5 hrs
[14:05:09] Analytics, Analytics-EventLogging, Analytics-Kanban, Better Use Of Data, and 5 others: Migrate legacy metawiki schemas to Event Platform - https://phabricator.wikimedia.org/T259163 (Ottomata)
[14:13:07] Analytics, Analytics-EventLogging, Analytics-Kanban, Better Use Of Data, and 5 others: Migrate legacy metawiki schemas to Event Platform - https://phabricator.wikimedia.org/T259163 (Ottomata)
[14:24:14] (PS8) Jason Linehan: [WIP] Metrics Platform schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/676392 (https://phabricator.wikimedia.org/T276379)
[14:24:52] (CR) jerkins-bot: [V: -1] [WIP] Metrics Platform schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/676392 (https://phabricator.wikimedia.org/T276379) (owner: Jason Linehan)
[14:25:31] Analytics, Event-Platform, Fundraising-Backlog, MW-1.37-notes (1.37.0-wmf.4; 2021-05-04): CentralNoticeBannerHistory and CentralNoticeImpression Event Platform Migration -
https://phabricator.wikimedia.org/T271168 (Ottomata) @AndyRussG is CentralNotice special for deployment? We merged the chang...
[14:28:12] Analytics, Event-Platform, Patch-For-Review: WikipediaPortal Event Platform Migration - https://phabricator.wikimedia.org/T282012 (Ottomata) Hi @eyener just verifying: Can we expect this code to work with EventGate as EventLogging does now by the end of Q1 and schedule the migration for just after th...
[14:30:15] Analytics, Analytics-Kanban: Stop Refining mediawiki_job events in Hive - https://phabricator.wikimedia.org/T281605 (Ottomata) Should we delete all mediawiki_job tables and data now? I think so.
[14:34:35] (PS9) Jason Linehan: [WIP] Metrics Platform schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/676392 (https://phabricator.wikimedia.org/T276379)
[14:35:07] (CR) jerkins-bot: [V: -1] [WIP] Metrics Platform schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/676392 (https://phabricator.wikimedia.org/T276379) (owner: Jason Linehan)
[14:47:26] ottomata: uops, missed your ping, sup?
[14:47:35] oh, I read it
[14:47:57] ottomata: yes, let's drop data :]
[14:48:06] ok!
[14:48:22] k
[14:48:47] bc?
[14:48:51] ya
[14:49:13] (PS10) Jason Linehan: [WIP] Metrics Platform schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/676392 (https://phabricator.wikimedia.org/T276379)
[14:49:46] (CR) jerkins-bot: [V: -1] [WIP] Metrics Platform schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/676392 (https://phabricator.wikimedia.org/T276379) (owner: Jason Linehan)
[14:55:26] (CR) Ottomata: [V: +2 C: +2] Remove unused imports [analytics/refinery/source] - https://gerrit.wikimedia.org/r/681683 (owner: Awight)
[14:56:41] joal / milimetric there are a few hive UDF cleanup changes from awight in refinery-source
[15:06:11] Analytics, Analytics-EventLogging, Analytics-Kanban, Better Use Of Data, and 3 others: Determine which remaining legacy EventLogging schemas need to be migrated or decommissioned - https://phabricator.wikimedia.org/T282131 (phuedx)
[15:17:38] !log dropped event.mediawiki_job_* tables and data directories with mforns - T273789
[15:17:41] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[15:17:42] T273789: Sanitize and ingest all event tables into the event_sanitized database - https://phabricator.wikimedia.org/T273789
[15:27:12] Analytics, Event-Platform, Fundraising-Backlog, MW-1.37-notes (1.37.0-wmf.4; 2021-05-04): CentralNoticeBannerHistory and CentralNoticeImpression Event Platform Migration - https://phabricator.wikimedia.org/T271168 (AndyRussG) >>! In T271168#7082147, @Ottomata wrote: > @AndyRussG is CentralNotice...
[15:34:06] Analytics, Analytics-EventLogging, Analytics-Kanban, Better Use Of Data, and 3 others: Determine which remaining legacy EventLogging schemas need to be migrated or decommissioned - https://phabricator.wikimedia.org/T282131 (AndyRussG) Hi! So, LandingPageImpression should be migrated, please, and...
[15:34:34] Analytics, Analytics-EventLogging, Analytics-Kanban, Better Use Of Data, and 4 others: Determine which remaining legacy EventLogging schemas need to be migrated or decommissioned - https://phabricator.wikimedia.org/T282131 (AndyRussG)
[15:57:26] Analytics-Clusters, Analytics-Kanban: Upgrade the Hadoop masters to Debian Buster - https://phabricator.wikimedia.org/T278423 (JAllemandou) The plan looks great @razzi , and the comments as well! My nits on some small things. >> - Stop oozie coordinators >> - Navigate to https://hue.wikimedia.org/hue/...
[16:52:29] Analytics, LDAP-Access-Requests, SRE, CommRel-Specialists-Support (Apr-Jun-2021): Please grant CRS access to Superset/Turnilo (deadline EOD Monday 17) - https://phabricator.wikimedia.org/T282589 (elukey)
[17:00:36] Analytics: Missing data in virtualpageview_hourly table since April 15, 2021 - https://phabricator.wikimedia.org/T282710 (cchen)
[17:20:57] joal: milimetric want to talk gobblin before buod meeting in 40 mins?
[17:22:08] maybe at :50 (10 minutes before meeting)?
[17:22:15] Can do ottomata - We can also do it during the meeting (architectural change)
[17:22:19] as you wish
[17:22:59] if it only takes 10 mins ok! i may miss the camus/gobblin meeting
[17:24:03] mforns: yt? shall I do vpv group0+1?
[17:46:18] ottomata: back, we have 15 mins until BUOD, is that enough?
[17:47:09] mforns: let's chill :)
[17:47:15] could go wrong and then we'd be frantic
[17:47:23] ok
[17:48:17] ottomata: I'm here if you wish to talk gobblin for 10 mins
[17:48:47] k!
[17:48:52] bc
[18:04:56] Analytics: Missing data in virtualpageview_hourly table since April 15, 2021 - https://phabricator.wikimedia.org/T282710 (cchen)
[18:15:54] Analytics, Analytics-Kanban: Consolidate labs / production sqoop lists to a single list - https://phabricator.wikimedia.org/T280549 (razzi) Open→Resolved
[18:54:16] Analytics, Analytics-EventLogging, Vector (Vector (Tracking)): EventLogging revision popup gets hidden behind content in Vector - https://phabricator.wikimedia.org/T282550 (Jdlrobson) Let's fix this in the eventlogging extension
[18:56:07] (CR) Milimetric: "for convenience, testing with custom log4j:" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/686629 (owner: Milimetric)
[19:55:50] Gone for this week team - see you on Monday
[19:59:51] Analytics, LDAP-Access-Requests, SRE, CommRel-Specialists-Support (Apr-Jun-2021): Please grant CRS access to Superset/Turnilo (deadline EOD Monday 17) - https://phabricator.wikimedia.org/T282589 (Aklapper) >>! In T282589#7082462, @Elitre wrote: > Are you requesting this to happen as the person in...
[20:15:13] ottomata: lmk if you want to pair on deletion or migration, I'll be working for another hour.
[20:17:47] mforns: let's do migration, i can deploy that for ya
[20:17:58] ottomata: ok
[20:18:31] ottomata: BTW, dropping of mediawiki_job_* partitions ended successfully
[20:18:36] great!
[20:18:44] mforns: actually, a deploy window is happening right now
[20:18:46] not sure if we can deploy
[20:18:47] checking
[20:20:37] mforns: let's do delete then
[20:20:39] bc?
[20:20:47] ok omw
[21:03:46] Analytics, Analytics-Kanban, Event-Platform, Patch-For-Review: Sanitize and ingest all event tables into the event_sanitized database - https://phabricator.wikimedia.org/T273789 (Ottomata)