[04:42:59] Analytics, Event-Platform, Research: TranslationRecommendation* Schemas Event Platform Migration - https://phabricator.wikimedia.org/T271163 (bmansurov) @Ottomata thanks for reviewing the patch. I've merged it and deployed it.
[06:02:25] Analytics-Clusters, DBA, Patch-For-Review: Convert labsdb1012 from multi-source to multi-instance - https://phabricator.wikimedia.org/T269211 (Marostegui) @razzi is this host ready for getting data on it?
[07:01:36] Analytics-Clusters, DBA, Patch-For-Review: Convert labsdb1012 from multi-source to multi-instance - https://phabricator.wikimedia.org/T269211 (elukey) >>! In T269211#6890919, @Marostegui wrote: > @razzi is this host ready for getting data on it? @Marostegui we only quickly checked that the /srv part...
[07:06:49] good morning
[07:07:59] Analytics-Clusters, DBA, Patch-For-Review: Convert labsdb1012 from multi-source to multi-instance - https://phabricator.wikimedia.org/T269211 (Marostegui) It should be fine to add instances while I do the first copies. All that will happen is that puppet will attempt to create /srv/sqldata.sX and if...
[07:23:12] !log drain + reimage analytics107[4,5] to Buster
[07:23:14] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[07:32:52] Analytics-Clusters, Patch-For-Review: Install Debian Buster on Hadoop - https://phabricator.wikimedia.org/T231067 (ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts: ` ['analytics1074.eqiad.wmnet', 'analytics1075.eqiad.wmnet'] ` The log can be found in...
[07:36:17] Analytics-Clusters, DBA, Patch-For-Review: Convert labsdb1012 from multi-source to multi-instance - https://phabricator.wikimedia.org/T269211 (Marostegui) For the record I just ran: ` root@clouddb1021:/srv# pvs PV VG Fmt Attr PSize PFree /dev/sda3 tank lvm2 a-- 13.92t <4.83t root@cl...
[07:37:00] Analytics-Clusters, DBA, Patch-For-Review: Convert labsdb1012 from multi-source to multi-instance - https://phabricator.wikimedia.org/T269211 (Marostegui) Transfer from clouddb1013 (s1 and s3) to clouddb1021 is now on-going
[07:38:53] Analytics-Clusters, DBA, Patch-For-Review: Convert labsdb1012 from multi-source to multi-instance - https://phabricator.wikimedia.org/T269211 (Marostegui) Removed labsdb1012 from tendril and zarcillo
[07:44:30] PROBLEM - HDFS missing blocks on an-master1001 is CRITICAL: 6 ge 5 https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts%23HDFS_missing_blocks https://grafana.wikimedia.org/dashboard/db/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=40&fullscreen
[07:45:08] this is a little weird, I do 2 hosts at the time
[07:45:25] (it is getting difficult to find nodes in the same rack)
[07:46:01] (for the analytics* naming I mean)
[07:46:11] anyway, should resolve in max 10 mins
[07:49:41] Hi
[07:55:18] bonjour
[07:56:46] Analytics-Clusters, Patch-For-Review: Install Debian Buster on Hadoop - https://phabricator.wikimedia.org/T231067 (elukey) Remaining nodes to reimage and their racking: ` an-worker1080.eqiad.wmnet: /eqiad/A/4 an-worker1081.eqiad.wmnet: /eqiad/A/7 an-worker1082.eqiad.wmnet: /eqiad/A/7 an-worker1103.e...
[07:57:07] elukey: less and less hosts to reimage!
[07:57:38] joal: 33 to finish!
[07:57:47] \o/
[07:57:49] currently 44 on buster
[08:04:36] RECOVERY - HDFS missing blocks on an-master1001 is OK: (C)5 ge (W)2 ge 0 https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts%23HDFS_missing_blocks https://grafana.wikimedia.org/dashboard/db/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=40&fullscreen
[08:05:55] joal: to start the week in the optimal way, https://issues.apache.org/jira/browse/YARN-2497
[08:06:15] IIUC labels are only supported for Capacity :(
[08:06:36] Analytics-Clusters, Patch-For-Review: Install Debian Buster on Hadoop - https://phabricator.wikimedia.org/T231067 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['analytics1074.eqiad.wmnet', 'analytics1075.eqiad.wmnet'] ` and were **ALL** successful.
[08:12:48] Arf elukey :(
[08:16:12] Analytics: Check home/HDFS leftovers of dedcode - https://phabricator.wikimedia.org/T276748 (MoritzMuehlenhoff)
[08:16:34] joal: this is very frustrating sigh
[08:17:07] I here you elukey :)
[08:17:13] s/here/hear
[08:17:19] pff monday
[08:17:32] do you wish we start a move toward capacity scheduler?
[08:18:46] joal: no idea, it would be nice if we want to have a cluster with some gpus, but no idea about the effort to migrate to capacity
[08:20:02] Analytics-Clusters: Configure Yarn to be able to locate nodes with a GPU - https://phabricator.wikimedia.org/T264401 (elukey) Sadly it seems that due to https://issues.apache.org/jira/browse/YARN-6636 (and other related issues), the Fair scheduler (that we use) doesn't support/respect node labels. If we want...
[08:20:21] elukey: given we don't have many queues, I assume it wouldn't be a huge effort up-front, but the long-tail of tweaks might be endless
[08:20:28] !log drain + reimage an-worker108[1,2] to Buster
[08:20:30] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[08:21:14] joal: on the plus side, we could add hard limits to what a single job can allocate
[08:21:36] correct sir - we could do proper resource-limitation
[08:24:51] Analytics-Radar: Presto error in Superest - only when grouping - https://phabricator.wikimedia.org/T270503 (JAllemandou) AFAIK there is no defined ordering for execution of where clauses. Theoretically the engine (in our case, presto) can reorder the clauses as it see fits, usually for performance optimizati...
[08:25:52] (CR) Joal: [C: +2] "LGTM - If test works we can merge :)" [analytics/refinery] - https://gerrit.wikimedia.org/r/668236 (https://phabricator.wikimedia.org/T207171) (owner: Lex Nasser)
[08:27:53] Analytics-Clusters, Patch-For-Review: Install Debian Buster on Hadoop - https://phabricator.wikimedia.org/T231067 (ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts: ` ['an-worker1081.eqiad.wmnet', 'an-worker1082.eqiad.wmnet'] ` The log can be found in...
[09:12:11] Analytics-Clusters, Patch-For-Review: Install Debian Buster on Hadoop - https://phabricator.wikimedia.org/T231067 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['an-worker1081.eqiad.wmnet', 'an-worker1082.eqiad.wmnet'] ` and were **ALL** successful.
[09:19:11] !log drain + reimage an-worker108[3,4] to Buster
[09:19:13] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[09:32:52] Analytics-Clusters, Patch-For-Review: Install Debian Buster on Hadoop - https://phabricator.wikimedia.org/T231067 (ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts: ` ['an-worker1083.eqiad.wmnet', 'an-worker1084.eqiad.wmnet'] ` The log can be found in...
[09:46:52] Analytics-Clusters, DBA, Patch-For-Review: Convert labsdb1012 from multi-source to multi-instance - https://phabricator.wikimedia.org/T269211 (Marostegui) @elukey @razzi the data for s1 and s3 has been transferred and I have moved their data directories to their final location. I am not going to copy...
[10:34:01] Analytics-Clusters, Patch-For-Review: Install Debian Buster on Hadoop - https://phabricator.wikimedia.org/T231067 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['an-worker1083.eqiad.wmnet', 'an-worker1084.eqiad.wmnet'] ` and were **ALL** successful.
[10:41:24] !log drain + reimage an-worker1104/1089 to Debian Buster
[10:41:26] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[10:42:49] Analytics, Product-Infrastructure-Team-Backlog, Wikimedia Taiwan, Chinese-Sites, Pageviews-Anomaly: Top read is showing one page that had fake traffic in zhwiki - https://phabricator.wikimedia.org/T274605 (Htchien) Hi @JAllemandou - CHT wants to discuss this issue privately, could I give your...
[11:03:25] Analytics-Clusters, Patch-For-Review: Install Debian Buster on Hadoop - https://phabricator.wikimedia.org/T231067 (ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts: ` ['an-worker1088.eqiad.wmnet', 'an-worker1104.eqiad.wmnet'] ` The log can be found in...
[11:45:48] Analytics-Clusters, Patch-For-Review: Install Debian Buster on Hadoop - https://phabricator.wikimedia.org/T231067 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['an-worker1088.eqiad.wmnet'] ` Of which those **FAILED**: ` ['an-worker1104.eqiad.wmnet'] `
[11:52:38] 27 worker nodes remaining :)
[12:02:50] TIL https://docs.feast.dev/
[12:03:32] Nice elukey --^
[12:03:56] it seems included in Kubeflow but in alpha
[12:04:02] ack
[12:13:43] all right going afk for lunch :)
[13:05:35] (PS2) Phuedx: Add new properties to UniversalLanguageSelector schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/668743 (https://phabricator.wikimedia.org/T275766)
[13:36:45] Analytics, Better Use Of Data, Product-Analytics, Product-Data-Infrastructure: [Metrics Platform] Define stream configuration syntax relevant to v1 release - https://phabricator.wikimedia.org/T273235 (jlinehan) p:Triage→Medium
[13:51:02] !log drain + reimage an-worker110[4,5] to Buster
[13:51:04] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[13:59:33] Analytics-Clusters, Patch-For-Review: Install Debian Buster on Hadoop - https://phabricator.wikimedia.org/T231067 (ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts: ` ['an-worker1105.eqiad.wmnet', 'an-worker1106.eqiad.wmnet'] ` The log can be found in...
[14:28:17] Analytics, Event-Platform, Research: TranslationRecommendation* Schemas Event Platform Migration - https://phabricator.wikimedia.org/T271163 (Ottomata) Great! Is it working?!
[14:30:14] Analytics, FR-Tech-Analytics, Fundraising-Backlog: Whitelist Portal and WikipediaApp event data for (sanitized) long-term storage - https://phabricator.wikimedia.org/T273246 (Pcoombe) Thanks @EYener. That's correct that the 2020 portal banners ran from Nov 30 to Jan 1 inclusive. I had honestly given...
[14:34:16] joal: do you happen to have a few minutes for me?
[14:34:35] I sure do milimetric
[14:34:39] Analytics-Clusters, Patch-For-Review: Install Debian Buster on Hadoop - https://phabricator.wikimedia.org/T231067 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['an-worker1105.eqiad.wmnet', 'an-worker1106.eqiad.wmnet'] ` and were **ALL** successful.
[14:34:40] batcave!
[14:34:50] <3
[14:52:01] !log altered topics (eqiad|codfw).mediawiki.client.session_tick to have 2 partitions - T276502
[14:52:06] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:52:07] T276502: [SessionLength] Change sampling rate to 5% - https://phabricator.wikimedia.org/T276502
[14:52:17] mforns: we good to go for sampling rate increase
[14:52:49] ottomata: \o/ thanks a lot!
[14:54:04] !log drain + reimage an-worker110[7,8] to Buster
[14:54:07] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[15:00:05] Analytics-Clusters, Patch-For-Review: Install Debian Buster on Hadoop - https://phabricator.wikimedia.org/T231067 (ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts: ` ['an-worker1107.eqiad.wmnet', 'an-worker1108.eqiad.wmnet'] ` The log can be found in...
[15:05:54] Analytics-Clusters, observability, User-fgiunchedi: Setup Analytics team in VO/splunk oncall - https://phabricator.wikimedia.org/T273064 (fgiunchedi) hi @Ottomata @razzi @elukey, did you had a chance to look into setting things up in VO? Please let us know if you need assistance.
[15:10:36] Analytics, Analytics-EventLogging, Analytics-Kanban, Better Use Of Data, and 5 others: Migrate legacy metawiki schemas to Event Platform - https://phabricator.wikimedia.org/T259163 (Ottomata)
[15:10:44] Analytics, Analytics-EventLogging, Analytics-Kanban, Better Use Of Data, and 5 others: Migrate legacy metawiki schemas to Event Platform - https://phabricator.wikimedia.org/T259163 (Ottomata)
[15:10:49] Analytics, Machine-Learning-Team: Configure the Hadoop cluster to use the GPUs available on some workers - https://phabricator.wikimedia.org/T276791 (elukey)
[15:10:56] Analytics, Event-Platform, WMDE-TechWish, Patch-For-Review: ReferencePreviewsBaseline Event Platform Migration - https://phabricator.wikimedia.org/T275007 (Ottomata) Open→Resolved a:Ottomata
[15:11:02] Analytics, Analytics-EventLogging, Community-Tech, Event-Platform, and 4 others: CodeMirrorUsage Event Platform Migration - https://phabricator.wikimedia.org/T275005 (Ottomata) Open→Resolved a:Ottomata
[15:11:04] Analytics, Analytics-EventLogging, Analytics-Kanban, Better Use Of Data, and 5 others: Migrate legacy metawiki schemas to Event Platform - https://phabricator.wikimedia.org/T259163 (Ottomata)
[15:11:16] Analytics, Analytics-EventLogging, Analytics-Kanban, Better Use Of Data, and 5 others: Migrate legacy metawiki schemas to Event Platform - https://phabricator.wikimedia.org/T259163 (Ottomata)
[15:11:25] Analytics, Event-Platform, WMDE-TechWish, MW-1.36-notes (1.36.0-wmf.33; 2021-03-02), Patch-For-Review: ReferencePreviewsCite Event Platform Migration - https://phabricator.wikimedia.org/T275008 (Ottomata) Open→Resolved a:Ottomata
[15:11:29] Analytics, Analytics-EventLogging, Analytics-Kanban, Better Use Of Data, and 5 others: Migrate legacy metawiki schemas to Event Platform - https://phabricator.wikimedia.org/T259163 (Ottomata)
[15:11:32] Analytics, Event-Platform, WMDE-TechWish, MW-1.36-notes (1.36.0-wmf.33; 2021-03-02): ReferencePreviewsPopups Event Platform Migration - https://phabricator.wikimedia.org/T275009 (Ottomata) Open→Resolved a:Ottomata
[15:11:41] Analytics, Analytics-EventLogging, Analytics-Kanban, Better Use Of Data, and 5 others: Migrate legacy metawiki schemas to Event Platform - https://phabricator.wikimedia.org/T259163 (Ottomata)
[15:11:46] Analytics, Analytics-EventLogging, Analytics-Kanban, Better Use Of Data, and 5 others: Migrate legacy metawiki schemas to Event Platform - https://phabricator.wikimedia.org/T259163 (Ottomata)
[15:11:51] Analytics, Event-Platform, WMDE-TechWish, MW-1.36-notes (1.36.0-wmf.33; 2021-03-02): TemplateDataEditor Event Platform Migration - https://phabricator.wikimedia.org/T275012 (Ottomata) Open→Resolved a:Ottomata
[15:11:54] Analytics, Event-Platform, WMDE-TechWish, Patch-For-Review: TwoColConflictConflict Event Platform Migration - https://phabricator.wikimedia.org/T275013 (Ottomata) Open→Resolved a:Ottomata
[15:11:56] Analytics, Analytics-EventLogging, Analytics-Kanban, Better Use Of Data, and 5 others: Migrate legacy metawiki schemas to Event Platform - https://phabricator.wikimedia.org/T259163 (Ottomata)
[15:11:58] Analytics-Clusters: Configure Yarn to be able to locate nodes with a GPU - https://phabricator.wikimedia.org/T264401 (elukey)
[15:12:00] Analytics, Machine-Learning-Team: Configure the Hadoop cluster to use the GPUs available on some workers - https://phabricator.wikimedia.org/T276791 (elukey)
[15:12:04] Analytics, Event-Platform, WMDE-TechWish, Patch-For-Review: TemplateDataApi Event Platform Migration - https://phabricator.wikimedia.org/T275011 (Ottomata) Open→Resolved a:Ottomata
[15:12:07] Analytics, Event-Platform, WMDE-TechWish, MW-1.36-notes (1.36.0-wmf.33; 2021-03-02), Patch-For-Review: TwoColConflictExit Event Platform Migration - https://phabricator.wikimedia.org/T275014 (Ottomata) Open→Resolved a:Ottomata
[15:12:09] Analytics, Analytics-EventLogging, Analytics-Kanban, Better Use Of Data, and 5 others: Migrate legacy metawiki schemas to Event Platform - https://phabricator.wikimedia.org/T259163 (Ottomata)
[15:12:14] Analytics, Analytics-Kanban, Growth-Team, Product-Analytics, and 2 others: Migrate Growth EventLogging schemas to Event Platform - https://phabricator.wikimedia.org/T267333 (Ottomata) Open→Resolved
[15:12:16] Analytics, Analytics-EventLogging, Analytics-Kanban, Better Use Of Data, and 5 others: Migrate legacy metawiki schemas to Event Platform - https://phabricator.wikimedia.org/T259163 (Ottomata)
[15:12:24] Analytics, Analytics-Kanban, Growth-Team, Product-Analytics, and 2 others: Migrate Growth EventLogging schemas to Event Platform - https://phabricator.wikimedia.org/T267333 (Ottomata)
[15:12:33] Analytics, Event-Platform, WMDE-TechWish, MW-1.36-notes (1.36.0-wmf.33; 2021-03-02), Patch-For-Review: VisualEditorTemplateDialogUse Event Platform Migration - https://phabricator.wikimedia.org/T275015 (Ottomata) Open→Resolved a:Ottomata
[15:12:38] Analytics, Analytics-EventLogging, Community-Tech, Event-Platform, and 3 others: CodeMirrorUsage Event Platform Migration - https://phabricator.wikimedia.org/T275005 (Ottomata)
[15:12:44] Analytics, Event-Platform, WMDE-TechWish, Patch-For-Review: ReferencePreviewsBaseline Event Platform Migration - https://phabricator.wikimedia.org/T275007 (Ottomata)
[15:12:49] Analytics, Event-Platform, WMDE-TechWish, MW-1.36-notes (1.36.0-wmf.33; 2021-03-02), Patch-For-Review: ReferencePreviewsCite Event Platform Migration - https://phabricator.wikimedia.org/T275008 (Ottomata)
[15:12:52] Analytics, Event-Platform, WMDE-TechWish, MW-1.36-notes (1.36.0-wmf.33; 2021-03-02): ReferencePreviewsPopups Event Platform Migration - https://phabricator.wikimedia.org/T275009 (Ottomata)
[15:12:55] Analytics, Event-Platform, WMDE-TechWish, Patch-For-Review: TemplateDataApi Event Platform Migration - https://phabricator.wikimedia.org/T275011 (Ottomata)
[15:13:00] Analytics, Event-Platform, WMDE-TechWish, MW-1.36-notes (1.36.0-wmf.33; 2021-03-02): TemplateDataEditor Event Platform Migration - https://phabricator.wikimedia.org/T275012 (Ottomata)
[15:13:06] Analytics, Event-Platform, WMDE-TechWish, Patch-For-Review: TwoColConflictConflict Event Platform Migration - https://phabricator.wikimedia.org/T275013 (Ottomata)
[15:13:13] Analytics, Event-Platform, WMDE-TechWish, MW-1.36-notes (1.36.0-wmf.33; 2021-03-02), Patch-For-Review: TwoColConflictExit Event Platform Migration - https://phabricator.wikimedia.org/T275014 (Ottomata)
[15:13:31] Analytics, Event-Platform, WMDE-TechWish, MW-1.36-notes (1.36.0-wmf.33; 2021-03-02), Patch-For-Review: VisualEditorTemplateDialogUse Event Platform Migration - https://phabricator.wikimedia.org/T275015 (Ottomata)
[15:16:04] Analytics, Analytics-EventLogging, Analytics-Kanban, Better Use Of Data, and 5 others: Migrate legacy metawiki schemas to Event Platform - https://phabricator.wikimedia.org/T259163 (Ottomata)
[15:16:46] I was so looking forward to yarn labels
[15:17:15] so I opened a task to encourage people to think about the Yarn capacity scheduler :P
[15:18:34] the crazy alternative could be to remove the gpu nodes from the main cluster, and form a separate gpu-only cluster
[15:18:44] and use something like Alluxio to cache hdfs
[15:18:52] (like we'd do with presto)
[15:19:07] too crazy? :P
[15:32:26] Analytics, Machine-Learning-Team: Configure the Hadoop cluster to use the GPUs available on some workers - https://phabricator.wikimedia.org/T276791 (elukey) The alternative - more crazy - idea would be to remove the GPU nodes from the hadoop cluster, and create another dedicated one without HDFS. We'd...
[15:32:58] you know elukey maybe not so crazy...especially if they could use the same hdfs?
[15:32:59] dunno
[15:33:44] ottomata: hello :) I think that alluxio can be instructed to use say RAM + HDD (like a tiered cache) and write back to HDFS (the source) when needed
[15:34:09] OH and USE something like alluxio
[15:34:12] interesting
[15:34:13] yeahhH!
[15:34:16] exactly yes
[15:34:23] if that works that is a very cool idea
[15:34:37] Analytics-Clusters, Patch-For-Review: Install Debian Buster on Hadoop - https://phabricator.wikimedia.org/T231067 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['an-worker1107.eqiad.wmnet', 'an-worker1108.eqiad.wmnet'] ` and were **ALL** successful.
[15:34:59] I am not sure if the stat boxes would be ok in having multiple hadoop cluster configs, probably puppet would need to be adjusted :D
[15:36:04] elukey: i think they would, there is a 'conf dir' setting in puppet iirc, which uses update-alternatives
[15:36:13] so, we could make an /etc/hadoop/conf-gpu
[15:36:16] and just set
[15:36:25] HADOOP_HOME or HADOOP_CONF_DIR (or whatever it is?) when we want to use that
[15:36:40] /etc/hadoop/conf is a symlink
[15:36:43] yes could be an option
[15:36:45] and just the default
[15:42:35] I am wondering if this could be ok or not in light of the new k8s/ML cluster, but the training one is far from now
[15:44:00] I can ask later on to the ML team during the weekly meeting
[15:47:22] aye
[15:59:23] (CR) Ottomata: [C: +2] Migrate legacy EL schemas EditAttemptStep and VisualEditorFeatureUse [schemas/event/secondary] - https://gerrit.wikimedia.org/r/668773 (https://phabricator.wikimedia.org/T267343) (owner: Ottomata)
[16:08:49] Analytics, Event-Platform, Inuka-Team: InukaPageView Event Platform Migration - https://phabricator.wikimedia.org/T267344 (Ottomata) @nshahquinn-wmf @SBisson Can we decline this task and not migrate InukaPageView, because {T265921} ?
[16:17:37] !log drain + reimage an-worker1109/1110 to Buster
[16:17:40] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[16:27:29] Analytics-Clusters, Patch-For-Review: Install Debian Buster on Hadoop - https://phabricator.wikimedia.org/T231067 (ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts: ` ['an-worker1109.eqiad.wmnet', 'an-worker1110.eqiad.wmnet'] ` The log can be found in...
[16:28:10] Analytics, Analytics-EventLogging, Analytics-Kanban, Better Use Of Data, and 5 others: Migrate legacy metawiki schemas to Event Platform - https://phabricator.wikimedia.org/T259163 (Ottomata)
[16:29:11] Analytics, Analytics-Kanban, Event-Platform, Structured-Data-Backlog, Patch-For-Review: SuggestedTagsAction Event Platform Migration - https://phabricator.wikimedia.org/T267351 (mforns)
[16:37:06] Analytics, CirrusSearch, SRE, Wikidata, and 3 others: Upgrade prometheus-jmx-exporter - https://phabricator.wikimedia.org/T276595 (MPhamWMF)
[16:53:33] Analytics, CirrusSearch, SRE, Wikidata, and 3 others: Upgrade prometheus-jmx-exporter - https://phabricator.wikimedia.org/T276595 (colewhite)
[16:53:37] Analytics-Radar, Cassandra, observability, Puppet, and 2 others: Upgrade prometheus-jmx-exporter on all services using it - https://phabricator.wikimedia.org/T192948 (colewhite)
[17:01:33] Analytics-Clusters, Patch-For-Review: Install Debian Buster on Hadoop - https://phabricator.wikimedia.org/T231067 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['an-worker1109.eqiad.wmnet', 'an-worker1110.eqiad.wmnet'] ` and were **ALL** successful.
[17:08:15] Analytics-Radar, Better Use Of Data, Product-Analytics, Product-Data-Infrastructure, and 2 others: prefUpdate schema contains multiple identical events for the same preference update - https://phabricator.wikimedia.org/T218835 (polishdeveloper) a:polishdeveloper→Edtadros
[17:09:24] Analytics-Radar, Better Use Of Data, Product-Analytics, Product-Data-Infrastructure, and 2 others: prefUpdate schema contains multiple identical events for the same preference update - https://phabricator.wikimedia.org/T218835 (polishdeveloper)
[17:10:53] Analytics-Radar, Better Use Of Data, Product-Analytics, Product-Data-Infrastructure, and 2 others: prefUpdate schema contains multiple identical events for the same preference update - https://phabricator.wikimedia.org/T218835 (polishdeveloper)
[17:12:48] !log drain + reimage an-worker11[13,14] to Buster
[17:12:50] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[17:18:57] Analytics, Product-Analytics, Structured Data Engineering, MW-1.36-notes (1.36.0-wmf.22; 2020-12-15), and 2 others: [L] Instrument MediaSearch results page - https://phabricator.wikimedia.org/T258183 (CBogen) @egardner can this ticket be closed?
[17:21:01] Analytics-Clusters, Patch-For-Review: Install Debian Buster on Hadoop - https://phabricator.wikimedia.org/T231067 (ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts: ` ['an-worker1113.eqiad.wmnet', 'an-worker1114.eqiad.wmnet'] ` The log can be found in...
[17:21:41] elukey: quick q: have you deleted data from Thorium on sta1006?
[17:23:43] Analytics, Analytics-Kanban, Patch-For-Review: Clean up issues with jobs after Hadoop Upgrade - https://phabricator.wikimedia.org/T274322 (Milimetric)
[17:25:21] Analytics-Radar, Better Use Of Data, Product-Analytics, Product-Data-Infrastructure: Roll-up raw sessionTick data into distribution - https://phabricator.wikimedia.org/T271455 (Milimetric)
[17:29:47] Analytics, Analytics-Kanban: Test hudi and Iceberg as an incremental update system using 2 mediawiki-history snapshots - https://phabricator.wikimedia.org/T262256 (Milimetric) Initial test is done, calling this done, we can either upgrade to Spark 3 using Iceberg's snapshot feature, even if data mutation...
[17:30:50] Analytics, Analytics-Kanban, Patch-For-Review: Productionize HDFS fsimage data analysis job - https://phabricator.wikimedia.org/T261283 (Milimetric)
[17:37:36] !log rebalance kafka partitions for webrequest_upload partition 10
[17:52:08] joal: nope still haven't done t
[17:52:09] *it
[17:52:12] will do it tomorrow
[17:52:23] ack elukey np, we were cleaning the kanban :)
[17:55:34] Analytics-Clusters, Patch-For-Review: Install Debian Buster on Hadoop - https://phabricator.wikimedia.org/T231067 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['an-worker1113.eqiad.wmnet', 'an-worker1114.eqiad.wmnet'] ` and were **ALL** successful.
[17:58:21] Analytics, Product-Analytics, Structured Data Engineering, MW-1.36-notes (1.36.0-wmf.22; 2020-12-15), and 2 others: [L] Instrument MediaSearch results page - https://phabricator.wikimedia.org/T258183 (egardner) >>! In T258183#6893310, @CBogen wrote: > @egardner can this ticket be closed? The Aud...
[18:11:04] !log drain + reimage an-worker11[15,16] to Buster
[18:11:15] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[18:18:00] Analytics-Clusters, Patch-For-Review: Install Debian Buster on Hadoop - https://phabricator.wikimedia.org/T231067 (ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts: ` ['an-worker1115.eqiad.wmnet', 'an-worker1116.eqiad.wmnet'] ` The log can be found in...
[18:18:12] Analytics, Event-Platform, Inuka-Team: InukaPageView Event Platform Migration - https://phabricator.wikimedia.org/T267344 (nshahquinn-wmf) >>! In T267344#6892799, @Ottomata wrote: > @nshahquinn-wmf @SBisson Can we decline this task and not migrate InukaPageView, because {T265921} ? No, we still need...
[18:35:17] Analytics, Analytics-EventLogging, Analytics-Kanban, Better Use Of Data, and 5 others: KaiOS / Inuka Event Platform client - https://phabricator.wikimedia.org/T273219 (nshahquinn-wmf) >>! In T273219#6834149, @SBisson wrote: > @Ottomata Sure, will let you know when all 3 are settled. @Ottomata I'...
[18:35:30] 19 hadoop workers left :D
[18:38:33] elukey: you have a steady fast pace :)
[18:41:35] joal: hopefully I'll be done in a couple of days, happy about the pace, but masters + coords will be more challenging :D
[18:41:45] ack! :)
[18:42:05] razzi: when you have a moment start to think about reimaging the master to buster, and the coordinators, we'll have fun in planning those :D
[18:42:28] elukey: sounds good
[18:42:29] (what pitfalls, things to take care, procedure, etc.. so we can discuss it)
[18:43:05] razzi: also for clouddb1021, do you want to do anything today? Otherwise I can follow up tomorrow with Manuel about what procedure to use to bootstrap say the s1 and s3 replica
[18:43:26] (and then let you do the procedure when you join)
[18:43:53] I was just about to ask about that; will it be more complex than what you have at https://wikitech.wikimedia.org/wiki/User:Elukey/Analytics/DBs#First_run_of_MariaDB?
[18:44:49] razzi: in theory no, in practice I need to ask to Manuel or Stevie to double check.. But if you want you can follow up and ask if they are still online
[18:45:43] ok, I can ask around #wikimedia-databases, it'll be a good learning experience
[18:45:45] razzi: one thing that I am wondering is if we should only list, for the moment, s1 and s3 in hiera for the new role, so when you'll apply it we'll get only two new instances (as opposed to 10)
[18:46:03] and we'll add the other ones as Manuel copies over
[18:46:21] elukey: that would make sense, to roll things out more gradually
[18:49:42] !log rebalance kafka partitions for webrequest_upload partition 11
[18:49:44] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[18:50:30] razzi: ack I'll be back in 10/15 mins :)
[18:53:07] Analytics-Clusters, Patch-For-Review: Install Debian Buster on Hadoop - https://phabricator.wikimedia.org/T231067 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['an-worker1115.eqiad.wmnet', 'an-worker1116.eqiad.wmnet'] ` and were **ALL** successful.
[18:54:07] Analytics, Event-Platform, Inuka-Team: InukaPageView Event Platform Migration - https://phabricator.wikimedia.org/T267344 (Ottomata) Ah ok, got it!
[18:57:11] Analytics, Analytics-EventLogging, Analytics-Kanban, Better Use Of Data, and 5 others: KaiOS / Inuka Event Platform client - https://phabricator.wikimedia.org/T273219 (Ottomata) Thanks, if I don't get to this this week, will do the next.
[18:58:16] hey all
[18:58:23] I have annual planning questions
[18:58:31] should I bring them up here?
[19:00:07] Hi tltaylor - if the questions are 'formal', or require more team-thoughts, I think an email to analytics-internal@wikimedia.org is better
[19:00:31] tltaylor: if questions are quick turn around then here is fine :)
[19:04:58] hmm... let me go read a bit before I formulate them
[19:10:55] tltaylor: i think the correct email is Internal team communication for the Analytics team
[19:13:17] ok. I trust you all read your email on a regular basis then
[19:13:52] razzi: +1ed the code change, pcc looks good!
[19:14:22] cool elukey, merging!
[19:14:42] razzi: maybe as precautionary step add one day of downtime to the node so it will not alert until tomorrow morning
[19:14:51] good idea
[19:15:09] ack then, looking forward to see the instances up tomorrow!
[19:15:28] going afk, have a good rest of the day folks :)
[19:18:08] byeeee
[19:24:06] * razzi afk for lunch
[19:29:14] PROBLEM - Check the last execution of eventlogging_to_druid_prefupdate_hourly on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit eventlogging_to_druid_prefupdate_hourly https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[19:46:21] Analytics, Analytics-Kanban, Event-Platform, Patch-For-Review: MobileWebUIActionsTracking Event Platform Migration - https://phabricator.wikimedia.org/T267347 (mforns)
[19:46:30] Analytics, Analytics-Kanban, Event-Platform, Patch-For-Review: DesktopWebUIActionsTracking Event Platform Migration - https://phabricator.wikimedia.org/T271164 (mforns)
[19:58:26] ottomata: I missed the backport window today for session length sampling rate increase. Would it be too much to ask, if you could deploy that? 😬
[19:59:28] hmm, i have a meeting now mforns , would it be ok to do tomorrow morning? so we don't deploy that at end of (my) day?
[19:59:46] ottomata: of course!
[20:00:40] no worries, this way we have more time to follow up
[20:02:08] RECOVERY - Check the last execution of eventlogging_to_druid_prefupdate_hourly on an-launcher1002 is OK: OK: Status of the systemd unit eventlogging_to_druid_prefupdate_hourly https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[20:12:04] Analytics, CirrusSearch, SRE, Wikidata, and 4 others: Upgrade prometheus-jmx-exporter - https://phabricator.wikimedia.org/T276595 (crusnov) p:Triage→Medium
[20:47:25] (CR) Sharvaniharan: "This change is ready for review." [schemas/event/secondary] - https://gerrit.wikimedia.org/r/668244 (owner: Sharvaniharan)
[20:55:04] Analytics-Radar, SRE, ops-eqiad: Try to move some new analytics worker nodes to different racks - https://phabricator.wikimedia.org/T276239 (wiki_willy)
[21:03:31] Analytics-Radar, SRE, ops-eqiad: Try to move some new analytics worker nodes to different racks - https://phabricator.wikimedia.org/T276239 (wiki_willy) Some of the mw servers in rack A7 should be decom'd, after T273915 is installed for the refresh. Since the power in A7 is maxing out, I think we sh...
[21:23:01] mforns: shall I merge https://gerrit.wikimedia.org/r/c/operations/puppet/+/669962?
[21:23:14] those schemas have already been fully migrated for a while?
[21:23:34] ottomata: yes, that is their next step
[21:23:40] k merging
[21:23:46] they have been fully migrated for a couple weeks
[21:23:48] will run puppet etc. too
[21:23:50] perfect
[21:23:52] ok, thanks!
[21:24:04] I'll proceed to pushing changes for the extension
[21:24:23] great yeah! i mean, if we can let's merge that today so it gets in the train this week
[21:29:58] Analytics, Event-Platform, Product-Data-Infrastructure, Patch-For-Review: PrefUpdate Event Platform Migration - https://phabricator.wikimedia.org/T267348 (Ottomata)
[21:31:00] Analytics, Editing-team, Event-Platform, Patch-For-Review: EditAttemptStep Event Platform Migration - https://phabricator.wikimedia.org/T267343 (Ottomata)
[21:31:06] Analytics, Editing-team, Event-Platform, Patch-For-Review: VisualEditorFeatureUse Event Platform Migration - https://phabricator.wikimedia.org/T267353 (Ottomata)
[21:33:19] ottomata: Done deploying? If so, I've got a config change that needs to go out
[21:36:38] mholloway: yes done
[21:36:45] awesome, thanks!
[21:37:00] mforns: gimme extension.json patch! i will merge it :)
[21:37:19] ottomata: ok, on it
[21:47:06] ottomata: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikimediaEvents/+/669989
[21:47:43] gr8 will merge after ci
[21:47:44] ty
[21:49:58] (CR) Lex Nasser: "> Patch Set 3: Code-Review+2" [analytics/refinery] - https://gerrit.wikimedia.org/r/668236 (https://phabricator.wikimedia.org/T207171) (owner: Lex Nasser)
[23:22:08] !log rebalance kafka partitions for webrequest_upload partition 12
[23:22:10] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log