[02:28:17] FIRING: HdfsCapacityRemainingPercent: Alarmingly low free space on the analytics-hadoop HDFS cluster. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_Capacity_Remaining - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=106&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsCapacityRemainingPercent
[06:28:17] FIRING: HdfsCapacityRemainingPercent: Alarmingly low free space on the analytics-hadoop HDFS cluster. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_Capacity_Remaining - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=106&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsCapacityRemainingPercent
[08:12:46] (PS1) Aqu: Improve Spark Logger Quietness [analytics/refinery] - https://gerrit.wikimedia.org/r/1100394 (https://phabricator.wikimedia.org/T381074)
[09:05:44] (CR) Aqu: [C:+1] build: introduce .sdkmanrc to document JDK version to use for this project [analytics/gobblin-wmf] - https://gerrit.wikimedia.org/r/1062712 (https://phabricator.wikimedia.org/T346611) (owner: Gehel)
[09:05:59] (CR) Gehel: [C:+2] build: introduce .sdkmanrc to document JDK version to use for this project [analytics/gobblin-wmf] - https://gerrit.wikimedia.org/r/1062712 (https://phabricator.wikimedia.org/T346611) (owner: Gehel)
[09:09:29] (CR) Aqu: [C:+1] "Cool. We may have to think about mac users not using open jdk." [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1098908 (https://phabricator.wikimedia.org/T346611) (owner: Gehel)
[09:10:18] (CR) Aqu: [C:+2] build: add sdkman configuration [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1098908 (https://phabricator.wikimedia.org/T346611) (owner: Gehel)
[09:22:43] (Merged) jenkins-bot: build: add sdkman configuration [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1098908 (https://phabricator.wikimedia.org/T346611) (owner: Gehel)
[09:30:15] (PS10) Aqu: Extraction of RefineHelper [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1080706 (owner: Gehel)
[09:44:43] (PS11) Aqu: Extraction of RefineHelper [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1080706 (owner: Gehel)
[10:00:54] Data-Engineering (Q2 2024 October 1st - December 31th), Epic, Patch-For-Review: [Maintenance] Safeguard VarnishKafka to HAProxy analytics transition - https://phabricator.wikimedia.org/T354694#10378733 (gmodena)
[10:20:47] Data-Engineering, CirrusSearch, Structured Data Engineering, Structured-Data-Backlog, and 2 others: Migrate image recommendation to use page_weighted_tags_changed stream - https://phabricator.wikimedia.org/T372912#10378796 (Gehel)
[10:22:31] (PS12) Aqu: Extraction of RefineHelper [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1080706 (owner: Gehel)
[10:28:17] FIRING: HdfsCapacityRemainingPercent: Alarmingly low free space on the analytics-hadoop HDFS cluster. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_Capacity_Remaining - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=106&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsCapacityRemainingPercent
[11:07:13] Data-Engineering, Data-Platform-SRE: Airflow UI sometimes shows no response for a DAG run task with many mapped tasks - https://phabricator.wikimedia.org/T381479 (BTullis) NEW
[11:07:50] Data-Engineering, Data-Platform-SRE (2024.11.30 - 2024.12.20): Airflow UI sometimes shows no response for a DAG run task with many mapped tasks - https://phabricator.wikimedia.org/T381479#10378973 (BTullis)
[11:20:06] Data-Engineering, CirrusSearch, Structured Data Engineering, Structured-Data-Backlog, and 2 others: Migrate image recommendation to use page_weighted_tags_changed stream - https://phabricator.wikimedia.org/T372912#10379023 (BTullis) >>! In T372912#10316874, @pfischer wrote: > @BTullis, I would ap...
[11:20:28] Data-Engineering, Data-Platform, DBA, Schema-change-in-production: Change page.page_links_updated to fixed-length timestamp in wmf wikis - https://phabricator.wikimedia.org/T371742#10379024 (Ladsgroup)
[11:35:54] Data-Engineering (Q2 2024 October 1st - December 31th), Dumps 2.0 (Kanban Board), Patch-For-Review: Enable HA for the mw-dump-rev-content-reconcile-enrich flink application - https://phabricator.wikimedia.org/T375176#10379044 (gmodena) @tchin we'll need to update this patch to target {T381322}
[11:36:15] Data-Engineering, Data-Platform, DBA, Schema-change-in-production: Change page.page_links_updated to fixed-length timestamp in wmf wikis - https://phabricator.wikimedia.org/T371742#10379054 (Ladsgroup)
[11:36:20] Data-Engineering (Q2 2024 October 1st - December 31th), Dumps 2.0 (Kanban Board), Patch-For-Review: Enable HA for the mw-dump-rev-content-reconcile-enrich flink application - https://phabricator.wikimedia.org/T375176#10379055 (gmodena) @tchin actually - do we already have the s3 buckets?
[12:02:22] (Abandoned) Milimetric: GDI Equity Landscape Tables [analytics/refinery] - https://gerrit.wikimedia.org/r/941911 (owner: Nmaphophe)
[12:23:30] Data-Engineering, Data-Platform-SRE (2024.11.30 - 2024.12.20): Airflow UI sometimes shows no response for a DAG run task with many mapped tasks - https://phabricator.wikimedia.org/T381479#10379301 (BTullis) p:Triage→High
[12:47:44] Data-Engineering, Data-Platform-SRE (2024.11.30 - 2024.12.20): Airflow UI sometimes shows no response for a DAG run task with many mapped tasks - https://phabricator.wikimedia.org/T381479#10379451 (brouberol) These are the error messages I could see in the console: {F57778202} {F57778203} {F57778205}...
[12:52:07] Data-Engineering, Data-Platform-SRE (2024.11.30 - 2024.12.20): Airflow UI sometimes shows no response for a DAG run task with many mapped tasks - https://phabricator.wikimedia.org/T381479#10379470 (brouberol) https://github.com/apache/airflow/blob/a238d06b8a1e631dfee58e53e9349f7a0c0fa880/airflow/api_conn...
[13:00:54] Data-Engineering, Data-Platform-SRE (2024.11.30 - 2024.12.20): Airflow UI sometimes shows no response for a DAG run task with many mapped tasks - https://phabricator.wikimedia.org/T381479#10379505 (brouberol) ` airflow_analytics=# select map_index from task_instance where dag_id='refine_to_hive_hourly' a...
[13:12:40] FYI, a fleet-wide software update failed on an-test-worker1001 due to no disk space left
[13:12:57] there's 57G of *_resources files in /tmp
[13:19:12] Data-Engineering (Q2 2024 October 1st - December 31th), Data Products, DBA, Schema-change-in-production: Cleanup revision table schema - https://phabricator.wikimedia.org/T367856#10379578 (Marostegui) ` cumin2024@db1167.eqiad.wmnet[wikidatawiki]> stop slave; ALTER TABLE /*_*/revision Query OK, 0...
[13:19:35] Data-Engineering (Q2 2024 October 1st - December 31th), Data Products, DBA, Schema-change-in-production: Cleanup revision table schema - https://phabricator.wikimedia.org/T367856#10379594 (Marostegui)
[14:08:13] Data-Engineering (Q2 2024 October 1st - December 31th), Dumps 2.0 (Kanban Board), Patch-For-Review: Enable HA for the mw-dump-rev-content-reconcile-enrich flink application - https://phabricator.wikimedia.org/T375176#10379785 (tchin) @gmodena We don't, although I guess now is a good time to do it. Wh...
[14:13:36] (CR) Milimetric: "we just kind of left this up in the air, sorry about that. What is the status with this project? What should we do? I can see arguments" [analytics/refinery] - https://gerrit.wikimedia.org/r/958518 (https://phabricator.wikimedia.org/T345446) (owner: Fabian Kaelin)
[14:14:44] (CR) Milimetric: [C:+2] Update links to docs and repo [analytics/wikistats2] - https://gerrit.wikimedia.org/r/1049265 (https://phabricator.wikimedia.org/T357327) (owner: Triciaburmeister)
[14:16:01] (Merged) jenkins-bot: Update links to docs and repo [analytics/wikistats2] - https://gerrit.wikimedia.org/r/1049265 (https://phabricator.wikimedia.org/T357327) (owner: Triciaburmeister)
[14:28:17] FIRING: HdfsCapacityRemainingPercent: Alarmingly low free space on the analytics-hadoop HDFS cluster. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_Capacity_Remaining - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=106&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsCapacityRemainingPercent
[15:03:48] (PS13) Gehel: Extraction of RefineHelper [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1080706
[15:09:27] (CR) Gehel: Extraction of RefineHelper (2 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1080706 (owner: Gehel)
[15:09:46] (PS14) Gehel: Extraction of RefineHelper [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1080706
[15:14:58] (CR) CI reject: [V:-1] Extraction of RefineHelper [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1080706 (owner: Gehel)
[15:16:32] Data-Engineering (Q2 2024 October 1st - December 31th), Dumps 2.0 (Kanban Board), Patch-For-Review: Enable HA for the mw-dump-rev-content-reconcile-enrich flink application - https://phabricator.wikimedia.org/T375176#10380070 (brouberol) I can create the buckets. I'll need to generate S3 users as wel...
[15:26:35] Data-Engineering (Q2 2024 October 1st - December 31th), Dumps 2.0 (Kanban Board), Patch-For-Review: Enable HA for the mw-dump-rev-content-reconcile-enrich flink application - https://phabricator.wikimedia.org/T375176#10380125 (brouberol) Scratch that, found it. I need to put any private values under...
[15:32:50] Data-Engineering (Q2 2024 October 1st - December 31th), Dumps 2.0 (Kanban Board), Patch-For-Review: Enable HA for the mw-dump-rev-content-reconcile-enrich flink application - https://phabricator.wikimedia.org/T375176#10380157 (brouberol) ` brouberol@cephosd1001:~$ sudo radosgw-admin user create --uid...
[15:38:40] Data-Engineering (Q2 2024 October 1st - December 31th), Dumps 2.0 (Kanban Board), Patch-For-Review: Enable HA for the mw-dump-rev-content-reconcile-enrich flink application - https://phabricator.wikimedia.org/T375176#10380183 (brouberol) I've done the same thing for `mw-content-history-reconcile-enri...
[15:40:12] Data-Engineering (Q2 2024 October 1st - December 31th), Dumps 2.0 (Kanban Board), Patch-For-Review: Enable HA for the mw-dump-rev-content-reconcile-enrich flink application - https://phabricator.wikimedia.org/T375176#10380196 (brouberol)
[16:47:55] Data-Engineering, Data-Platform-SRE: Do performance testing of a big Hadoop Table hosted by Ceph - https://phabricator.wikimedia.org/T381416#10380406 (JAllemandou) Let's try that. The network load is indeed a known impact of decoupling compute and storage in big-data world.
[17:20:38] (CR) Joal: [V:+2 C:+2] "Merging for deploy" [analytics/refinery] - https://gerrit.wikimedia.org/r/1100067 (https://phabricator.wikimedia.org/T377257) (owner: Joal)
[17:26:48] (CR) Ottomata: [C:+1] Improve Spark Logger Quietness [analytics/refinery] - https://gerrit.wikimedia.org/r/1100394 (https://phabricator.wikimedia.org/T381074) (owner: Aqu)
[17:35:16] (PS1) Joal: Update load cassandra top-pageview monthly job [analytics/refinery] - https://gerrit.wikimedia.org/r/1100505
[17:44:37] (CR) Btullis: [C:+1] Update load cassandra top-pageview monthly job [analytics/refinery] - https://gerrit.wikimedia.org/r/1100505 (owner: Joal)
[17:46:54] (CR) Joal: [V:+2 C:+2] "Merging for deploy" [analytics/refinery] - https://gerrit.wikimedia.org/r/1100505 (owner: Joal)
[17:50:23] !log Deploying refinery with scap
[17:50:25] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[17:56:19] !log Deploying refinery onto HDFS
[17:56:20] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[17:59:13] !log Rerun cassandra_load_pageview_top_articles_monthly after refinery patch deployed
[17:59:15] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[18:28:18] FIRING: HdfsCapacityRemainingPercent: Alarmingly low free space on the analytics-hadoop HDFS cluster. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_Capacity_Remaining - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=106&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsCapacityRemainingPercent
[18:32:05] Data-Engineering, Data Products, DBA, GlobalBlocking, Schema-change-in-production: Update type of gbw_expiry on the global_block_whitelist table - https://phabricator.wikimedia.org/T381521 (Dreamy_Jazz) NEW
[18:35:03] Data-Engineering, Data Products, DBA, GlobalBlocking, Schema-change-in-production: Update type of gbw_expiry on the global_block_whitelist table - https://phabricator.wikimedia.org/T381521#10380922 (Dreamy_Jazz)
[19:06:21] !log Alter wmf_raw.mediawiki_user adding user_is_temp field for temp_account project
[19:06:23] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[19:07:08] !log Alter webrequest_actor 3 tables adding actor_signature_per_project_family field for automated-traffic detection heuristic change
[19:07:09] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[19:07:40] !log Recompute 1 day of webrequest_actor_metrics for rollup to work as expected
[19:07:42] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[19:18:45] (PS1) Joal: Fix typo in webrequest_actor previous patch [analytics/refinery] - https://gerrit.wikimedia.org/r/1100521
[19:19:05] (CR) Joal: [V:+2 C:+2] "Merging for deploy" [analytics/refinery] - https://gerrit.wikimedia.org/r/1100521 (owner: Joal)
[19:39:19] !log deploy Airflow
[19:39:20] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[19:41:49] Data-Engineering, Movement-Insights, Wmfdata-Python: Support creating a Spark session with a GitLab-built Conda environment - https://phabricator.wikimedia.org/T378253#10381138 (nshahquinn-wmf) a:nshahquinn-wmf @fkaelin has prepared an [MR implementing this](https://gitlab.wikimedia.org/repos/dat...
[20:18:33] Data-Engineering, MediaWiki-extensions-WikimediaMaintenance, Wikimedia-production-error: PHP Deprecated: Deprecated cross-wiki access to MediaWiki\Revision\RevisionRecord. Expected: the local wiki, Actual: 'guwwiki'. Pass expected $wikiId. [Called from... - https://phabricator.wikimedia.org/T304528#10381263
[20:20:07] Data-Engineering, Data Products, DBA, GlobalBlocking, Schema-change-in-production: Update type of gbw_expiry on the global_block_whitelist table - https://phabricator.wikimedia.org/T381521#10381267 (Ladsgroup) Open→Resolved a:Ladsgroup Ran with replication on the master of eac...
[20:55:55] (PS1) Joal: Fix pageview_actor bug from previous patch [analytics/refinery] - https://gerrit.wikimedia.org/r/1100543
[20:56:13] (CR) Joal: [V:+2 C:+2] "Merging for hotfix deploy" [analytics/refinery] - https://gerrit.wikimedia.org/r/1100543 (owner: Joal)
[20:58:11] Data-Engineering: Airflow skips canary-event tasks - https://phabricator.wikimedia.org/T380836#10381381 (Ottomata) > The biggest issue I see here is that there is no other way for us to be sure this has not happened that checking each dag... Shouldn't we get emails if the dag run or task fails?
[21:08:59] Data-Engineering (Q2 2024 October 1st - December 31th), MediaWiki-Core-Hooks, MW-Interfaces-Team, FY2024-25 KR 5.2 Simplify feature development, and 2 others: Design and document new Domain Events feature in MediaWiki core - https://phabricator.wikimedia.org/T379959#10381434 (Ahoelzl)
[21:09:42] Data-Engineering (Q2 2024 October 1st - December 31th), MediaWiki-Core-Hooks, MW-Interfaces-Team, FY2024-25 KR 5.2 Simplify feature development, and 2 others: Design and document new Domain Events feature in MediaWiki core - https://phabricator.wikimedia.org/T379959#10381432 (Ahoelzl)
[21:09:53] Data-Engineering (Q2 2024 October 1st - December 31th), MW-Interfaces-Team: Make DomainEvents serializable - https://phabricator.wikimedia.org/T379936#10381453 (Ahoelzl)
[21:10:23] Data-Engineering (Q2 2024 October 1st - December 31th), MW-Interfaces-Team: Explore a mechanism for publishing domain events to an event bus - https://phabricator.wikimedia.org/T379935#10381454 (Ahoelzl)
[21:11:03] Data-Engineering (Q2 2024 October 1st - December 31th), CirrusSearch, Discovery-Search (Current work), Patch-For-Review: Move oozie/util/swift/upload/ out of the refinery oozie folder - https://phabricator.wikimedia.org/T380343#10381455 (Ahoelzl)
[21:11:06] Data-Engineering (Q2 2024 October 1st - December 31th), CirrusSearch, Discovery-Search (Current work), Patch-For-Review: Move oozie/util/swift/upload/ out of the refinery oozie folder - https://phabricator.wikimedia.org/T380343#10381468 (Ottomata) a:dcausse
[21:13:48] Data-Engineering, Data-Engineering-Wikistats, Data Products: slider bar for period in wikistats is becoming unusable - https://phabricator.wikimedia.org/T380565#10381471 (Ottomata)
[21:14:07] Data-Engineering (Q2 2024 October 1st - December 31th): Airflow skips canary-event tasks - https://phabricator.wikimedia.org/T380836#10381495 (Ahoelzl)
[21:14:20] Data-Engineering (Q2 2024 October 1st - December 31th): Reduce `refine_to_hive_hourly` airflow task number - https://phabricator.wikimedia.org/T380856#10381496 (Ahoelzl)
[21:14:22] Data-Engineering, Data-Engineering-Wikistats, Data Products: slider bar for period in wikistats is becoming unusable - https://phabricator.wikimedia.org/T380565#10381478 (Ottomata)
[21:18:22] Data-Engineering (Q2 2024 October 1st - December 31th), Research: Incremental HTML dataset to support "Who are moderators" SDS 1.2.3 - https://phabricator.wikimedia.org/T380874#10381510 (Ahoelzl)
[21:22:11] Data-Engineering: Warning of mismatch in declarations of Webrequest schema - https://phabricator.wikimedia.org/T380916#10381518 (Ahoelzl) cc @Antoine_Quhen
[21:22:16] Data-Engineering (Q2 2024 October 1st - December 31th): Warning of mismatch in declarations of Webrequest schema - https://phabricator.wikimedia.org/T380916#10381519 (Ahoelzl)
[21:25:14] Data-Engineering (Q2 2024 October 1st - December 31th), Data-Platform-SRE (2024.11.30 - 2024.12.20): Airflow UI sometimes shows no response for a DAG run task with many mapped tasks - https://phabricator.wikimedia.org/T381479#10381539 (Ahoelzl)
[22:05:52] Data-Engineering, MediaWiki-General, Event-Platform, MediaWiki-Platform-Team (Radar), Patch-For-Review: Create legacy EventLogging proxy HTTP intake (for MediaWikiPingback) endpoint to EventGate - https://phabricator.wikimedia.org/T353817#10381650 (Ottomata) I just manually copied the [[ http...
[22:28:18] FIRING: HdfsCapacityRemainingPercent: Alarmingly low free space on the analytics-hadoop HDFS cluster. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_Capacity_Remaining - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=106&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsCapacityRemainingPercent