[00:33:37] (update) jforrester: build: Update node engine to 20.20.2+ [repos/ci-tools/grunt-stylelint] - https://gitlab.wikimedia.org/repos/ci-tools/grunt-stylelint/-/merge_requests/8 (owner: volker-e)
[00:33:47] (merge) jforrester: build: Update node engine to 20.20.2+ [repos/ci-tools/grunt-stylelint] - https://gitlab.wikimedia.org/repos/ci-tools/grunt-stylelint/-/merge_requests/8 (owner: volker-e)
[01:26:37] (PS1) Pwangai: phpunit: log project breakdown and split-group timings [integration/quibble] - https://gerrit.wikimedia.org/r/1272025 (https://phabricator.wikimedia.org/T423059)
[02:43:54] (CR) Pwangai: "recheck" [integration/quibble] - https://gerrit.wikimedia.org/r/1268686 (https://phabricator.wikimedia.org/T422108) (owner: Pwangai)
[05:45:23] Phabricator, Documentation: Batch edit silencing instructions seem to be missing some information - https://phabricator.wikimedia.org/T423526#11828131 (Aklapper) Docs are currently at https://wikitech.wikimedia.org/wiki/Phabricator#Run_a_bulk_job_silently_(suppressing_notification_spam) like all the othe...
[05:46:02] (PS1) Phedenskog: submodule-update: Fetch nested submodules in parallel [integration/quibble] - https://gerrit.wikimedia.org/r/1272373
[07:03:50] Phabricator, Documentation: Batch edit silencing instructions seem to be missing some information - https://phabricator.wikimedia.org/T423526#11828222 (Aklapper) Ah, no, that output is all expected behavior, per https://we.phorge.it/source/arcanist/browse/master/src/platform/PlatformSymbols.php . I'm pre...
[07:43:38] Continuous-Integration-Infrastructure, Jenkins, Castor: Waiting for the completion of castor-save-workspace-cache sometimes takes almost 4 minutes for core tests - https://phabricator.wikimedia.org/T418974#11828327 (Peter) Here's another example. The jobs takes 1 minute and the waiting time adds 37 s...
[08:38:23] Continuous-Integration-Infrastructure, Jenkins, Castor, Spike, Test Platform (Plovdiv 25): Investigate if there's a way to make castor wait time smaller - https://phabricator.wikimedia.org/T423557 (Peter) NEW
[08:39:32] Release-Engineering-Team (Doing 😎), Catalyst (Luka Ijo Pimeja Jan), Essential-Work: Leftover schemas in shared DB for envs - https://phabricator.wikimedia.org/T422938#11828499 (jnuche)
[09:13:48] Beta-Cluster-Infrastructure, Data-Engineering, WMF-JobQueue, Data-Platform-SRE (2026-03-27 - 2026-04-17), Event-Platform: Kafka-topics broken in beta: "zookeeper is not a recognized option" - https://phabricator.wikimedia.org/T422842#11828609 (brouberol) a:brouberol
[09:13:51] Beta-Cluster-Infrastructure, Data-Engineering, WMF-JobQueue, Data-Platform-SRE (2026-03-27 - 2026-04-17), Event-Platform: Kafka-topics broken in beta: "zookeeper is not a recognized option" - https://phabricator.wikimedia.org/T422842#11828612 (brouberol) Open→In progress
[09:13:54] Beta-Cluster-Infrastructure, Data-Engineering, WMF-JobQueue, Data-Platform-SRE (2026-03-27 - 2026-04-17), Event-Platform: Kafka-topics broken in beta: "zookeeper is not a recognized option" - https://phabricator.wikimedia.org/T422842#11828611 (brouberol)
[10:07:30] Beta-Cluster-Infrastructure, Data-Engineering, WMF-JobQueue, Data-Platform-SRE (2026-03-27 - 2026-04-17), and 2 others: Kafka-topics broken in beta: "zookeeper is not a recognized option" - https://phabricator.wikimedia.org/T422842#11828772 (brouberol) Fixed on kafka-test, which was also migrated...
[10:08:16] PROBLEM - gerrit process on gerrit2002 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/lib/jvm/java-17-openjdk-amd64/bin/java .*-jar /var/lib/gerrit/review_site/bin/gerrit.war daemon -d /var/lib/gerrit/review_site https://wikitech.wikimedia.org/wiki/Gerrit
[10:12:16] RECOVERY - gerrit process on gerrit2002 is OK: PROCS OK: 1 process with regex args ^/usr/lib/jvm/java-17-openjdk-amd64/bin/java .*-jar /var/lib/gerrit/review_site/bin/gerrit.war daemon -d /var/lib/gerrit/review_site https://wikitech.wikimedia.org/wiki/Gerrit
[10:17:16] PROBLEM - gerrit process on gerrit2002 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/lib/jvm/java-17-openjdk-amd64/bin/java .*-jar /var/lib/gerrit/review_site/bin/gerrit.war daemon -d /var/lib/gerrit/review_site https://wikitech.wikimedia.org/wiki/Gerrit
[10:20:16] RECOVERY - gerrit process on gerrit2002 is OK: PROCS OK: 1 process with regex args ^/usr/lib/jvm/java-17-openjdk-amd64/bin/java .*-jar /var/lib/gerrit/review_site/bin/gerrit.war daemon -d /var/lib/gerrit/review_site https://wikitech.wikimedia.org/wiki/Gerrit
[10:29:45] Gerrit, Release-Engineering-Team, collaboration-services, Patch-For-Review, Sustainability (Incident Followup): Move Gerrit data out of root partition - https://phabricator.wikimedia.org/T333143#11828854 (Jelto) I've tested the migration snippet on the spare instance `gerrit2002`. Beside fixi...
[11:07:50] !log sudo -u jenkins-deploy rm -fR /srv/castor/castor-mw-ext-and-skins/master/mediawiki-node24/ # run on integration-castor06.integration.eqiad1.wikimedia.cloud
[11:07:51] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[11:52:10] (CR) Phedenskog: [C:+1] "What about also use one of the test instances running through these changes for all the jobs that will be affected and document that in th" [integration/quibble] - https://gerrit.wikimedia.org/r/1268686 (https://phabricator.wikimedia.org/T422108) (owner: Pwangai)
[11:59:18] (CR) Phedenskog: [C:+1] "If we combine the instances to one 8CPU we can also get some more accurate numbers of what some of the wins will look like." [integration/quibble] - https://gerrit.wikimedia.org/r/1268686 (https://phabricator.wikimedia.org/T422108) (owner: Pwangai)
[12:36:16] PROBLEM - gerrit process on gerrit2002 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/lib/jvm/java-17-openjdk-amd64/bin/java .*-jar /var/lib/gerrit/review_site/bin/gerrit.war daemon -d /var/lib/gerrit/review_site https://wikitech.wikimedia.org/wiki/Gerrit
[13:08:51] Gerrit, Release-Engineering-Team, collaboration-services, Traffic, and 2 others: gerrit: Adapt timeouts to avoid 502 errors in CI jobs - https://phabricator.wikimedia.org/T421827#11829467 (ABran-WMF) I have tried to increase `per_connection_buffer_limit_bytes` first to 4MB, then to 16MB to see if...
[13:17:03] (PS1) Xqt: jjb/zuul: remove fasttest-py39 due to pytest vulnerability [integration/config] - https://gerrit.wikimedia.org/r/1272711 (https://phabricator.wikimedia.org/T423568)
[13:35:30] (CR) Reedy: [C:+2] jjb/zuul: remove fasttest-py39 due to pytest vulnerability [integration/config] - https://gerrit.wikimedia.org/r/1272711 (https://phabricator.wikimedia.org/T423568) (owner: Xqt)
[13:37:18] (Merged) jenkins-bot: jjb/zuul: remove fasttest-py39 due to pytest vulnerability [integration/config] - https://gerrit.wikimedia.org/r/1272711 (https://phabricator.wikimedia.org/T423568) (owner: Xqt)
[13:38:06] !log Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1272711 T423568
[13:38:07] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[13:43:33] Project-Admins, WMDE-TechWish: #WMDE-TechWish-Maintenance and #WMDE-Blueprint-tickets archived under #German-Community-Wishlist - https://phabricator.wikimedia.org/T324929#11829646 (A_smart_kitten) For the record, #wmde-techwish-maintenance got re-activated [[https://phabricator.wikimedia.org/project/man...
[13:45:50] Gerrit, Release-Engineering-Team, collaboration-services, Patch-For-Review, Sustainability (Incident Followup): Move Gerrit data out of root partition - https://phabricator.wikimedia.org/T333143#11829653 (Jelto) I did another migration attempt and puppet keeps re-creating the `/var/lib/gerrit...
[13:49:11] Release-Engineering-Team (Doing 😎), Catalyst (Luka Ijo Pimeja Jan), Essential-Work: patchdemo staging: Creation of new demos with catalyst backend fails on connecting to db - https://phabricator.wikimedia.org/T421181#11829661 (jnuche)
[13:49:31] Release-Engineering-Team (Doing 😎), Catalyst (Luka Ijo Pimeja Jan), Essential-Work: Upgrade K3s cluster to most recent stable version - https://phabricator.wikimedia.org/T400077#11829663 (jnuche)
[13:49:43] Release-Engineering-Team (Doing 😎), Catalyst (Luka Ijo Pimeja Jan), Essential-Work: Leftover schemas in shared DB for envs - https://phabricator.wikimedia.org/T422938#11829665 (jnuche)
[14:15:44] Gerrit, Release-Engineering-Team, collaboration-services, Patch-For-Review, Sustainability (Incident Followup): Move Gerrit data out of root partition - https://phabricator.wikimedia.org/T333143#11829791 (ABran-WMF)
[14:19:47] Gerrit, collaboration-services, Wikimedia-Incident: Update and improve operation runbooks and documentation for Gerrit - https://phabricator.wikimedia.org/T423601 (Jelto) NEW
[14:20:26] Gerrit, collaboration-services, Patch-For-Review, Wikimedia-Incident: 2026-04-12 Gerrit Outage (was: DiskSpace) - https://phabricator.wikimedia.org/T423027#11829840 (Jelto)
[14:20:57] Gerrit, collaboration-services, Wikimedia-Incident: Update and improve operation runbooks and documentation for Gerrit - https://phabricator.wikimedia.org/T423601#11829847 (Jelto)
[14:31:52] Release-Engineering-Team (Radar), Infrastructure-Foundations, SRE: Sunsetting mirrors.wikimedia.org - https://phabricator.wikimedia.org/T416707#11829949 (thcipriani) Checking my understanding of "sunsetting" here: - We're no longer hosting a mirror? vs. - the `mirrors.wikimedia.org` url will cease t...
[14:35:23] Continuous-Integration-Infrastructure, Release-Engineering-Team: Add PyPy > 3.9 to Wikimedia CI - https://phabricator.wikimedia.org/T423607 (Xqt) NEW
[14:46:07] Gerrit, Release-Engineering-Team, collaboration-services, Patch-For-Review, Sustainability (Incident Followup): Move Gerrit data out of root partition - https://phabricator.wikimedia.org/T333143#11830101 (ops-monitoring-bot) Host gerrit2002.wikimedia.org rebooted by jelto@cumin1003 with reaso...
[14:57:23] PROBLEM - gerrit process on gerrit2002 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/lib/jvm/java-17-openjdk-amd64/bin/java .*-jar /srv/gerrit/site_path/review_site/bin/gerrit.war daemon -d /srv/gerrit/site_path/review_site https://wikitech.wikimedia.org/wiki/Gerrit
[15:02:08] (PS1) Xqt: jjb/zuul: remove fasttest-pypy due to pytest vulnerability [integration/config] - https://gerrit.wikimedia.org/r/1272752 (https://phabricator.wikimedia.org/T423568)
[15:03:59] Gerrit, Release-Engineering-Team, collaboration-services, Patch-For-Review, Sustainability (Incident Followup): Move Gerrit data out of root partition - https://phabricator.wikimedia.org/T333143#11830214 (Jelto) a:Jelto I successfully migrated the spare host `gitlab2002` to `/srv/gerrit`....
[15:07:15] Release-Engineering-Team (Radar), Infrastructure-Foundations, SRE: Sunsetting mirrors.wikimedia.org - https://phabricator.wikimedia.org/T416707#11830231 (MoritzMuehlenhoff) >>! In T416707#11829949, @thcipriani wrote: > Checking my understanding of "sunsetting" here: > > - We're no longer hosting a m...
[15:08:44] who's coming to train log triage today?
[15:09:24] Fresh, Gerrit, collaboration-services: Wikimedia gerrit load management 429s break fresh-install - https://phabricator.wikimedia.org/T421726#11830243 (sbassett) Duplicate→Open I think this is still broken. I'd note that on the other, related bug (T421680#11828411), the added a new UA to requ...
[15:15:29] giving up on the meeting since no one else came, I guess the triage was done earlier or out of band?
[15:18:07] (CR) Pwangai: "I will combine the instances today, then we can see how to get this working with all test scenarios." [integration/quibble] - https://gerrit.wikimedia.org/r/1268686 (https://phabricator.wikimedia.org/T422108) (owner: Pwangai)
[15:18:46] (CR) Pwangai: "recheck" [integration/quibble] - https://gerrit.wikimedia.org/r/1268686 (https://phabricator.wikimedia.org/T422108) (owner: Pwangai)
[15:21:41] Beta-Cluster-Infrastructure, Data-Engineering, WMF-JobQueue, Event-Platform: Jobs are not being processed in beta, April 2026 edition - https://phabricator.wikimedia.org/T423615 (Daimona) NEW
[15:22:47] Beta-Cluster-Infrastructure, Data-Engineering, WMF-JobQueue, Data-Platform-SRE (2026-03-27 - 2026-04-17), Event-Platform: Kafka-topics broken in beta: "zookeeper is not a recognized option" - https://phabricator.wikimedia.org/T422842#11830331 (Daimona) >>! In T422842#11828772, @brouberol wrot...
[15:25:11] Beta-Cluster-Infrastructure, Data-Engineering, WMF-JobQueue, Data-Platform-SRE (2026-03-27 - 2026-04-17), Event-Platform: Kafka-topics broken in beta: "zookeeper is not a recognized option" - https://phabricator.wikimedia.org/T422842#11830343 (brouberol) In progress→Resolved
[16:10:17] Release-Engineering-Team (Radar), Infrastructure-Foundations, SRE: New base images without mirrors.wikimedia.org - https://phabricator.wikimedia.org/T423622 (thcipriani) NEW
[16:21:13] Release-Engineering-Team (Radar), Infrastructure-Foundations, SRE: New base images without mirrors.wikimedia.org - https://phabricator.wikimedia.org/T423622#11830659 (Jdforrester-WMF) To confirm, specifically is this task talking about SRE's base Debian distro images, whose sourceslist is configured...
[16:23:34] Release-Engineering-Team, dev-images, Catalyst (PatchDemo): Upgrade Patchdemo's PHP image to 8.3, ensure the node matches what's used in CI - https://phabricator.wikimedia.org/T423359#11830680 (thcipriani)
[16:24:22] Release-Engineering-Team, dev-images, Catalyst (Luka Ijo Pimeja Jan): Upgrade Patchdemo's PHP image to 8.3, ensure the node matches what's used in CI - https://phabricator.wikimedia.org/T423359#11830703 (thcipriani)
[16:36:59] Gerrit, collaboration-services, Documentation, Sustainability (Incident Followup): Update and improve operation runbooks and documentation for Gerrit - https://phabricator.wikimedia.org/T423601#11830739 (A_smart_kitten)
[16:37:00] Gerrit, collaboration-services, Patch-For-Review, Wikimedia-Incident: 2026-04-12 Gerrit Outage (was: DiskSpace) - https://phabricator.wikimedia.org/T423027#11830740 (A_smart_kitten)
[16:37:03] Gerrit, collaboration-services, Documentation, Sustainability (Incident Followup): Update and improve operation runbooks and documentation for Gerrit - https://phabricator.wikimedia.org/T423601#11830742 (A_smart_kitten) (swapping incident tag for incident-followup; also boldly removing as a [[htt...
[16:52:07] Phabricator, Documentation: Batch edit silencing instructions seem to be missing some information - https://phabricator.wikimedia.org/T423526#11830774 (bd808) >>! In T423526#11828222, @Aklapper wrote: > I'm pretty sure upstream does not expose server details like paths by default, e.g. in stacktraces of...
[17:03:12] Continuous-Integration-Infrastructure, Jenkins, Castor, Spike, Test Platform (Plovdiv 25): Investigate if there's a way to make castor wait time smaller - https://phabricator.wikimedia.org/T423557#11830813 (bd808) Speculative idea: have jobs wanting castor caching/storage fire a dependent job...
[17:21:26] (PS1) SBassett: Use allow-listed User Agent for fresh gerrit downloads [fresh] - https://gerrit.wikimedia.org/r/1272800 (https://phabricator.wikimedia.org/T421726)
[17:23:42] (CR) Jforrester: "Do we want a bespoke one just for fresh?" [fresh] - https://gerrit.wikimedia.org/r/1272800 (https://phabricator.wikimedia.org/T421726) (owner: SBassett)
[17:25:56] Release-Engineering-Team (Radar), Infrastructure-Foundations, SRE: New base images without mirrors.wikimedia.org - https://phabricator.wikimedia.org/T423622#11830878 (thcipriani)
[17:26:37] (CR) SBassett: "I feel like this one is generally usable for stuff like this? But maybe one for each service makes more sense. Let me add an SRE or two " [fresh] - https://gerrit.wikimedia.org/r/1272800 (https://phabricator.wikimedia.org/T421726) (owner: SBassett)
[17:28:34] (CR) SBassett: "Hey arnaudb - would you prefer we create a bespoke UA for fresh? Or would you prefer we used the generalized UA mentioned in T421680#1182" [fresh] - https://gerrit.wikimedia.org/r/1272800 (https://phabricator.wikimedia.org/T421726) (owner: SBassett)
[17:30:09] Phabricator, Documentation: Batch edit silencing instructions seem to be missing some information - https://phabricator.wikimedia.org/T423526#11830894 (Aklapper) Looks like I misremembered what `adjustFilePath()` does (not) do from https://we.phorge.it/source/arcanist/browse/master/src/error/PhutilErrorH...
[17:30:18] Release-Engineering-Team (Radar), Infrastructure-Foundations, SRE: New base images without mirrors.wikimedia.org - https://phabricator.wikimedia.org/T423622#11830895 (thcipriani) >>! In T423622#11830659, @Jdforrester-WMF wrote: > To confirm, specifically is this task talking about SRE's base Debian d...
[17:37:15] Release-Engineering-Team (Radar), Infrastructure-Foundations, SRE: New base images without mirrors.wikimedia.org - https://phabricator.wikimedia.org/T423622#11830927 (MoritzMuehlenhoff) I'll take care of this tomorrow.
[18:22:00] Beta-Cluster-Infrastructure, Data-Engineering, MW-Interfaces-Team, WMF-JobQueue, Event-Platform: Jobs are not being processed in beta, April 2026 edition - https://phabricator.wikimedia.org/T423615#11831127 (bd808) The cert I see from outside the cluster for https://en.wikipedia.beta.wmcloud....
[19:21:27] (update) dancy: sync-world: Offer to rollback k8s deployments [repos/releng/scap] - https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/1129 (https://phabricator.wikimedia.org/T225207 https://phabricator.wikimedia.org/T375497 https://phabricator.wikimedia.org/T394858 https://phabricator.wikimedia.org/T396106)
[19:24:56] Release-Engineering-Team, Scap, serviceops-deprecated, SRE-OnFire, Sustainability (Incident Followup): Should scap be able to update helmfile-defaults when -Dbuild_mw_container_image:False ? - https://phabricator.wikimedia.org/T390531#11831389 (dancy) Open→Resolved a:dancy >>!...
[19:31:54] (update) dancy: sync-world: Offer to rollback k8s deployments [repos/releng/scap] - https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/1129 (https://phabricator.wikimedia.org/T225207 https://phabricator.wikimedia.org/T375497 https://phabricator.wikimedia.org/T394858 https://phabricator.wikimedia.org/T396106)
[19:37:51] Continuous-Integration-Infrastructure (Zuul upgrade), Release-Engineering-Team (Priority Backlog 📥), Essential-Work: Add a zuul tenant config on the zuul scheduler host (zuul1001) - https://phabricator.wikimedia.org/T406384#11831437 (Dzahn) ` [zuul1001:~] $ grep -i proxy /lib/systemd/system/zuul-* /...
[20:19:11] Release-Engineering-Team (Radar), Infrastructure-Foundations, SRE: Sunsetting mirrors.wikimedia.org - https://phabricator.wikimedia.org/T416707#11831547 (bd808) >>! In T416707#11830231, @MoritzMuehlenhoff wrote: > These are both true. We will no longer operate a mirror (which is running under mirrors...
[20:49:55] !log creating integration/zuul-jobs repo to serve as a mirror of opendev.org/zuul/zuul-jobs (T406384)
[20:49:57] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[20:49:58] T406384: Add a zuul tenant config on the zuul scheduler host (zuul1001) - https://phabricator.wikimedia.org/T406384
[21:24:55] Release-Engineering-Team (Radar), Infrastructure-Foundations, SRE: Sunsetting mirrors.wikimedia.org - https://phabricator.wikimedia.org/T416707#11831821 (A_smart_kitten) >>! In T416707#11831547, @bd808 wrote: > And going backward? Is there a way we can help [[https://www.w3.org/Provider/Style/URI|"co...
[21:29:58] Continuous-Integration-Infrastructure, Release-Engineering-Team, collaboration-services, Patch-For-Review: setup 2 contint machines for jenkins - https://phabricator.wikimedia.org/T418521#11831827 (Dzahn) @jnuche I checked and the /var/lib/jenkins/plugins had 501 files on contint1002 and only 94...
[21:47:40] Release-Engineering-Team (Radar), Infrastructure-Foundations, SRE: Sunsetting mirrors.wikimedia.org - https://phabricator.wikimedia.org/T416707#11831873 (bd808) >>! In T416707#11831821, @A_smart_kitten wrote: > But maybe e.g. #collaboration-services could host a microsite at the `mirrors.wikimedia.or...