[00:01:54] Data-Engineering-Roadmap, Epic: [Spike] Evaluate AI tooling for data engineering - https://phabricator.wikimedia.org/T391422 (Ahoelzl) NEW
[00:03:12] Data-Engineering (Q4 2025 April 1st - June 30th): Analyze impact for webrequest and unique devices pipelines to derive access_method without m-dot domain - https://phabricator.wikimedia.org/T389696#10724327 (Ahoelzl)
[00:50:37] Data-Engineering (Q4 2025 April 1st - June 30th), Data-Platform-SRE (2025.03.22 - 2025.04.11): Canary failure on airflow platform_eng intsance after migrating to Kubernetes - https://phabricator.wikimedia.org/T390727#10724345 (xcollazo) >>! In T390727#10723216, @brouberol wrote: > Indeed! I had a reminde...
[00:50:53] Data-Engineering (Q4 2025 April 1st - June 30th), Data-Platform-SRE (2025.03.22 - 2025.04.11): Canary failure on airflow platform_eng intsance after migrating to Kubernetes - https://phabricator.wikimedia.org/T390727#10724346 (xcollazo)
[12:24:04] Data-Engineering, Data-Platform-SRE, Java-Scala-Standardization, Discovery-Search (2025.03.22 - 2025.04.11), Patch-For-Review: Migrate existing Java packages to deploying to Gitlab, including new version of parent pom, validation that all depen... - https://phabricator.wikimedia.org/T367405#10725748
[13:05:40] (CR) Xcollazo: [C:+2] "Quite a subtle bug, nice find!" [analytics/refinery] - https://gerrit.wikimedia.org/r/1135077 (https://phabricator.wikimedia.org/T370470) (owner: Mforns)
[13:16:32] Data-Engineering (Q4 2025 April 1st - June 30th), DPE-Mediawiki-Content: Figure root cause of silent failures when computing metrics for mediawiki_content_history_v1 - https://phabricator.wikimedia.org/T387033#10726011 (xcollazo) > I wonder if [[ https://gitlab.wikimedia.org/repos/data-engineering/du...
[15:19:25] Data-Engineering, Data-Engineering-Radar, Beta-Cluster-Infrastructure: beta/deployment-prep Kafka clusters do not support TLS client connections - https://phabricator.wikimedia.org/T346402#10726551 (Ottomata)
[15:23:41] Data-Engineering, Data-Platform-SRE, Java-Scala-Standardization, Discovery-Search (2025.03.22 - 2025.04.11), Patch-For-Review: Migrate existing Java packages to deploying to Gitlab, including new version of parent pom, validation that all depen... - https://phabricator.wikimedia.org/T367405#10726582
[15:34:32] Data-Engineering (Q4 2025 April 1st - June 30th), DPE-Mediawiki-Content: Modify table maintenance mechanism to support Iceberg's rewrite_position_delete_files() - https://phabricator.wikimedia.org/T391280#10726632 (Ottomata) @xcollazo hm! Cool! Q: Is pyspark needed to run Spark SQL? You want to use...
[16:01:48] Data-Engineering (Q4 2025 April 1st - June 30th), DPE-Mediawiki-Content: Modify table maintenance mechanism to support Iceberg's rewrite_position_delete_files() - https://phabricator.wikimedia.org/T391280#10726729 (xcollazo) > but do you need / want to launch with custom pyspark stuff? >Could you just la...
[16:04:15] Data-Engineering, Data-Platform-SRE, Java-Scala-Standardization, Discovery-Search (2025.03.22 - 2025.04.11), Patch-For-Review: Migrate existing Java packages to deploying to Gitlab, including new version of parent pom, validation that all depen... - https://phabricator.wikimedia.org/T367405#10726731
[16:37:21] Data-Engineering, Data-Engineering-Radar, DBA, Schema-change-in-production: Drop afl_patrolled_by from abuse_filter_log in production - https://phabricator.wikimedia.org/T391056#10726833 (FCeratto-WMF)
[16:37:45] Data-Engineering, Data-Engineering-Radar, DBA, Schema-change-in-production: Drop afl_patrolled_by from abuse_filter_log in production - https://phabricator.wikimedia.org/T391056#10726834 (FCeratto-WMF) Section s8 finished excluding DC masters
[17:39:28] Data-Engineering (Q4 2025 April 1st - June 30th), DPE-Mediawiki-Content: Modify table maintenance mechanism to support Iceberg's rewrite_position_delete_files() - https://phabricator.wikimedia.org/T391280#10727200 (Ottomata) Ah! TIL about that feature! Okay! Then the next question is: Maybe you just n...
[17:56:23] Data-Engineering, Documentation: https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log should be on Wikitech - https://phabricator.wikimedia.org/T387878#10727272 (kzimmerman) These server admin logs are coming from Data Platform Engineering updates. Removing Product Analytics and adding Data Engineer...
[17:57:42] Data-Engineering, Data-Platform-SRE, Java-Scala-Standardization, Discovery-Search (2025.03.22 - 2025.04.11), Patch-For-Review: Migrate existing Java packages to deploying to Gitlab, including new version of parent pom, validation that all depen... - https://phabricator.wikimedia.org/T367405#10727274
[19:02:12] Data-Engineering (Q4 2025 April 1st - June 30th), DPE-Mediawiki-Content: Modify table maintenance mechanism to support Iceberg's rewrite_position_delete_files() - https://phabricator.wikimedia.org/T391280#10727405 (xcollazo) > Could you achieve this by adding `for_virtualenv` support to our SparkSqlOpera...
[20:18:23] Data-Engineering (Q4 2025 April 1st - June 30th), DPE-Mediawiki-Content, Patch-For-Review: Modify table maintenance mechanism to support Iceberg's rewrite_position_delete_files() - https://phabricator.wikimedia.org/T391280#10727636 (xcollazo) The current Spark 3.1.2 compiled `WMFSparkSQLCLIDriver` is...
[20:50:42] Data-Engineering (Q4 2025 April 1st - June 30th): Assess impact of schema changes of categorylinks, metadata, imagelinks on data pipelines - https://phabricator.wikimedia.org/T391527 (Ahoelzl) NEW
[20:50:52] Data-Engineering (Q4 2025 April 1st - June 30th), Essential-Work: Assess impact of schema changes of categorylinks, metadata, imagelinks on data pipelines - https://phabricator.wikimedia.org/T391527#10727734 (Ahoelzl)
[20:51:27] Data-Engineering (Q4 2025 April 1st - June 30th): Enable Spark data lineage for all Airflow instances - https://phabricator.wikimedia.org/T386862#10727737 (Ahoelzl) a:Ahoelzl→mforns
[21:48:03] FIRING: GobblinKafkaRecordsExtractedNotEqualRecordsExpected: Gobblin job webrequest ingested an unexpected number of records for a Kafka topic partition. ...
[21:48:08] - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Gobblin - https://grafana.wikimedia.org/d/pAQaJwEnk/gobblin?orgId=1&var-gobblin_job_name=webrequest&var-kafka_topic=webrequest_text&viewPanel=4 - https://alerts.wikimedia.org/?q=alertname%3DGobblinKafkaRecordsExtractedNotEqualRecordsExpected
[22:28:03] FIRING: [2x] GobblinKafkaRecordsExtractedNotEqualRecordsExpected: Gobblin job webrequest ingested an unexpected number of records for a Kafka topic partition. ...
[22:28:03] - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Gobblin - https://grafana.wikimedia.org/d/pAQaJwEnk/gobblin?orgId=1&var-gobblin_job_name=webrequest&var-kafka_topic=webrequest_text&viewPanel=4 - https://alerts.wikimedia.org/?q=alertname%3DGobblinKafkaRecordsExtractedNotEqualRecordsExpected
[23:18:03] FIRING: [2x] GobblinKafkaRecordsExtractedNotEqualRecordsExpected: Gobblin job webrequest ingested an unexpected number of records for a Kafka topic partition. ...
[23:18:03] - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Gobblin - https://grafana.wikimedia.org/d/pAQaJwEnk/gobblin?orgId=1&var-gobblin_job_name=webrequest&var-kafka_topic=webrequest_text&viewPanel=4 - https://alerts.wikimedia.org/?q=alertname%3DGobblinKafkaRecordsExtractedNotEqualRecordsExpected
[23:33:03] FIRING: [3x] GobblinKafkaRecordsExtractedNotEqualRecordsExpected: Gobblin job webrequest ingested an unexpected number of records for a Kafka topic partition. ...
[23:33:03] - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Gobblin - https://grafana.wikimedia.org/d/pAQaJwEnk/gobblin?orgId=1&var-gobblin_job_name=webrequest&var-kafka_topic=webrequest_text&viewPanel=4 - https://alerts.wikimedia.org/?q=alertname%3DGobblinKafkaRecordsExtractedNotEqualRecordsExpected
[23:50:55] Data-Engineering, Data-Engineering-Radar, Beta-Cluster-Infrastructure: beta/deployment-prep Kafka clusters do not support TLS client connections - https://phabricator.wikimedia.org/T346402#10728268 (bd808) I think the TLS listener problems were fixed by @elukey in {T383096}. `lang=shell-session bd80...
[23:57:15] Data-Engineering, Data-Engineering-Radar, Beta-Cluster-Infrastructure: beta/deployment-prep Kafka clusters do not support TLS client connections - https://phabricator.wikimedia.org/T346402#10728275 (bd808) Open→Resolved a:elukey I'm giving @elukey credit for this no longer being an ac...
[23:58:03] FIRING: [3x] GobblinKafkaRecordsExtractedNotEqualRecordsExpected: Gobblin job webrequest ingested an unexpected number of records for a Kafka topic partition. ...
[23:58:03] - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Gobblin - https://grafana.wikimedia.org/d/pAQaJwEnk/gobblin?orgId=1&var-gobblin_job_name=webrequest&var-kafka_topic=webrequest_text&viewPanel=4 - https://alerts.wikimedia.org/?q=alertname%3DGobblinKafkaRecordsExtractedNotEqualRecordsExpected