[06:56:17] Data-Engineering, Java-Scala-Standardization, Discovery-Search (2025.05.02 - 2025.05.23): Create Gitlab CI templates for JVM packages - https://phabricator.wikimedia.org/T386406#10798916 (pfischer) a:amastilovic→pfischer
[08:14:35] (CR) Gehel: Create a new module to isolate lightweight jobs (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1120953 (owner: Aqu)
[08:17:04] (PS9) Gehel: Create a new module to isolate lightweight jobs [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1120953 (owner: Aqu)
[08:19:21] (PS10) Gehel: Create a new module to isolate lightweight jobs [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1120953 (owner: Aqu)
[08:32:42] (CR) Gehel: [C:+1] "LGTM" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1120953 (owner: Aqu)
[08:35:20] (PS11) Aqu: Create a new module to isolate lightweight jobs [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1120953
[08:37:21] (CR) Gehel: [C:+1] "LGTM" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1120953 (owner: Aqu)
[08:37:33] (CR) Aqu: [C:+2] Create a new module to isolate lightweight jobs [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1120953 (owner: Aqu)
[08:49:24] (Merged) jenkins-bot: Create a new module to isolate lightweight jobs [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1120953 (owner: Aqu)
[09:02:39] Data-Engineering, Product-Analytics, Data-Platform-SRE (2025.05.02 - 2025.05.23): Allow curl commands from Airflow BashOperator - https://phabricator.wikimedia.org/T392288#10799360 (brouberol) I haven't added anything to wikitech, but we have real life examples to get inspiration from: - https:/...
[09:22:41] (PS1) Aqu: Add refinery-job-lite to list of deployed artifacts [analytics/refinery] - https://gerrit.wikimedia.org/r/1143022
[09:24:28] (PS2) Aqu: Add refinery-job-lite jars for non-Spark jobs [analytics/refinery] - https://gerrit.wikimedia.org/r/1143022
[09:51:11] Starting build #33 for job analytics-refinery-maven-release
[09:52:00] Project analytics-refinery-maven-release build #33: FAILURE in 49 sec: https://integration.wikimedia.org/ci/job/analytics-refinery-maven-release/33/
[10:01:49] (PS1) Aqu: Fix refinery-job-lite pom.xml [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1143027
[13:48:55] Data-Engineering, Data-Platform-SRE, Data-Services: Create wiki replicas views for globaljsonlinks tables - https://phabricator.wikimedia.org/T387419#10800408 (BTullis) I think that this is likely to be uncontentious and relatively easy to achieve. The three tables mentioned appear to be related in s...
[13:53:59] Data-Engineering, Data-Platform-SRE, Data-Services: Create wiki replicas views for globaljsonlinks tables - https://phabricator.wikimedia.org/T387419#10800442 (BTullis) From an eyeball of the schemas, I think that there is unlikely to be any need for redaction of data. We can also see another table...
[14:04:40] hey hey, how do I move T392468 forward? it's apparently been moved to 'Needs Clarification' without any further explanation on what needs to happen
[14:04:40] T392468: Add new WMCS IP ranges to analytics - https://phabricator.wikimedia.org/T392468
[14:08:15] Data-Engineering, Data-Engineering-Radar, SRE, SRE-Access-Requests, and 2 others: Requesting access to for  - https://phabricator.wikimedia.org/T393066#10800522 (BTullis) In progress→Resolved This should be working now @SCampos-WMF - Please feel free to let me...
[14:24:43] Data-Engineering, Technical-blog-posts: Write a blog post about the recent Airflow migration to Kubernetes - https://phabricator.wikimedia.org/T393603 (BTullis) NEW
[14:29:42] Data-Engineering, Technical-blog-posts: Write a blog post about the recent Airflow migration to Kubernetes - https://phabricator.wikimedia.org/T393603#10800640 (BTullis) p:Triage→Medium
[14:43:32] Data-Engineering (Q4 2025 April 1st - June 30th), Experimentation Lab (Experiment Platform Sprint 6), Patch-For-Review: FY 24-25 SDS 2.4.9 CDN Synthetic Beacon: EventGate & Varnish: update to receive events from beacon event v2 - https://phabricator.wikimedia.org/T391959#10800698 (BBlack) >>! In T391...
[16:23:17] Data-Engineering (Q4 2025 April 1st - June 30th): Avoid Gobblin SimpleSkein DAG runs queuing up by disabling "catchup" - https://phabricator.wikimedia.org/T393618 (amastilovic) NEW
[16:24:44] Data-Engineering (Q4 2025 April 1st - June 30th): Avoid Gobblin SimpleSkein DAG runs queuing up by disabling "catchup" - https://phabricator.wikimedia.org/T393618#10801276 (amastilovic)
[17:46:09] Data-Engineering, Data Pipelines: Provide an easy way for MediaWiki to fetch aggregate statistics from the data lake - https://phabricator.wikimedia.org/T341649#10801700 (KStoller-WMF)
[17:52:38] Data-Engineering (Q4 2025 April 1st - June 30th): Avoid Gobblin SimpleSkein DAG runs queuing up by disabling "catchup" - https://phabricator.wikimedia.org/T393618#10801715 (xcollazo) > Fix this issue so that Airflow knows to skip scheduled DAG runs if the previous DAG run is still running. [[ https://gitlab...
[17:57:41] Data-Engineering (Q4 2025 April 1st - June 30th): Avoid Gobblin SimpleSkein DAG runs queuing up by disabling "catchup" - https://phabricator.wikimedia.org/T393618#10801759 (amastilovic) > Seems like to mimic the old systemd behavior of "If there is an instance of myself already running, do nothing", we will...
[18:04:59] Data-Engineering (Q4 2025 April 1st - June 30th), DPE-Mediawiki-Content: Enable merge-on-read for wmf_content.mediawiki_content_history_v1 - https://phabricator.wikimedia.org/T393012#10801787 (xcollazo) Total deletes now in a much better position compared to T393012#10797777: ` spark.sql(""" SELECT coun...
[18:09:36] Data-Engineering (Q4 2025 April 1st - June 30th), DPE-Mediawiki-Content: Enable merge-on-read for wmf_content.mediawiki_content_history_v1 - https://phabricator.wikimedia.org/T393012#10801817 (xcollazo) [[ https://grafana.wikimedia.org/d/000000585/hadoop?orgId=1&from=1746554732678&to=1746641132678 | Some...
[18:15:41] Data-Engineering, MediaWiki-DomainEvents, Event-Platform: PageDeleted event should contain outgoing redirect target information - https://phabricator.wikimedia.org/T393633 (gmodena) NEW
[18:33:31] Data-Engineering (Q4 2025 April 1st - June 30th): Avoid Gobblin SimpleSkein DAG runs queuing up by disabling "catchup" - https://phabricator.wikimedia.org/T393618#10801933 (amastilovic) All I could find so far is a trick of using an `ExternalTaskSensor` to point to a task in a previous `DagRun`, with `mode="...
[18:34:37] Data-Engineering, MediaWiki-DomainEvents, Event-Platform: PageDeleted event should contain outgoing redirect target information - https://phabricator.wikimedia.org/T393633#10801937 (daniel) > perform a database lookup. That wouldn't work reliably, the redirect entry should already gone from the data...
[18:34:54] Data-Engineering, MediaWiki-DomainEvents, MW-Interfaces-Team, Event-Platform: PageDeleted event should contain outgoing redirect target information - https://phabricator.wikimedia.org/T393633#10801940 (daniel)
[18:49:24] Data-Engineering (Q4 2025 April 1st - June 30th): Avoid Gobblin SimpleSkein DAG runs queuing up by disabling "catchup" - https://phabricator.wikimedia.org/T393618#10801980 (xcollazo) > Airflow will not start running the next DagRun in the first place if the previous DagRun is still running It will if you set...
[18:51:41] Data-Engineering (Q4 2025 April 1st - June 30th): Avoid Gobblin SimpleSkein DAG runs queuing up by disabling "catchup" - https://phabricator.wikimedia.org/T393618#10801991 (amastilovic) > It will if you set max_active_runs > 1. Correct, but we have it set to `max_active_runs = 1` - I don't see why we would...
[18:56:47] Data-Engineering (Q4 2025 April 1st - June 30th), DPE-Mediawiki-Content, Dumps-Generation, Patch-For-Review: Decomission dumps job `download_enterprise_htmldumps` - https://phabricator.wikimedia.org/T390556#10802001 (amastilovic) > Please also delete empty dirs which are currently presented there...
[19:05:09] Data-Engineering (Q4 2025 April 1st - June 30th): Avoid Gobblin SimpleSkein DAG runs queuing up by disabling "catchup" - https://phabricator.wikimedia.org/T393618#10802029 (xcollazo) How about: Set `max_active_runs = 2`. Then, similar to @brouberol's [[ https://gitlab.wikimedia.org/repos/data-engineering/a...
[20:01:42] Data-Engineering, MediaWiki-DomainEvents, MW-Interfaces-Team, Event-Platform: PageDeleted event should contain outgoing redirect target information - https://phabricator.wikimedia.org/T393633#10802261 (gmodena) >>! In T393633#10801937, @daniel wrote: >> perform a database lookup. > > That wouldn...
[20:03:39] Data-Engineering, MediaWiki-DomainEvents, MW-Interfaces-Team, Event-Platform: PageDeleted event should contain outgoing redirect target information - https://phabricator.wikimedia.org/T393633#10802271 (gmodena)
[20:29:10] Data-Engineering (Q4 2025 April 1st - June 30th): Gobblin test jobs for event and eventlogging_legacy are misconfigured - https://phabricator.wikimedia.org/T393645 (amastilovic) NEW
[20:37:02] Starting build #34 for job analytics-refinery-maven-release
[20:41:01] Data-Engineering (Q4 2025 April 1st - June 30th), DPE-Mediawiki-Content: Enable merge-on-read for wmf_content.mediawiki_content_history_v1 - https://phabricator.wikimedia.org/T393012#10802411 (Ahoelzl) @xcollazo regarding the suggest increase of `dfs.namenode.handler.count`, what's the name nodes load in...
[20:41:39] Data-Engineering (Q4 2025 April 1st - June 30th), DPE-Mediawiki-Content: [OpsWeek] Monthly reconcile pipeline not triggering - https://phabricator.wikimedia.org/T393108#10802412 (xcollazo)
[20:51:57] Data-Engineering (Q4 2025 April 1st - June 30th), DPE-Mediawiki-Content: Enable merge-on-read for wmf_content.mediawiki_content_history_v1 - https://phabricator.wikimedia.org/T393012#10802461 (xcollazo) >>! In T393012#10802411, @Ahoelzl wrote: > @xcollazo regarding the suggested increase of `dfs.namenode...
[20:56:44] Project analytics-refinery-maven-release build #34: STILL FAILING in 19 min: https://integration.wikimedia.org/ci/job/analytics-refinery-maven-release/34/
[21:34:00] (PS1) Aleksandar Mastilovic: Remove all configuration files for Gobblin [analytics/refinery] - https://gerrit.wikimedia.org/r/1143193
[21:35:04] (PS2) Aleksandar Mastilovic: Remove all configuration files for Gobblin [analytics/refinery] - https://gerrit.wikimedia.org/r/1143193 (https://phabricator.wikimedia.org/T390249)
[23:32:37] Data-Engineering, Data-Engineering-Icebox, Data-Engineering-Wikistats, Data Pipelines, and 3 others: Merge ks-Arab and ks-Deva to ks - https://phabricator.wikimedia.org/T314476#10802850 (srishakatux) a:srishakatux
[23:33:24] Data-Engineering, Data-Engineering-Icebox, Data-Engineering-Wikistats, Data Pipelines, and 3 others: Merge ks-Arab and ks-Deva to ks - https://phabricator.wikimedia.org/T314476#10802855 (srishakatux) p:Medium→High
[23:41:52] Data-Engineering, Data-Engineering-Icebox, Data-Engineering-Wikistats, Data Pipelines, and 3 others: Merge ks-Arab and ks-Deva to ks - https://phabricator.wikimedia.org/T314476#10802865 (srishakatux) Updated task description based on suggestions from @Amire80