[00:09:58] <wikibugs>	 (03open) 10bd808: bastion: Apply profile::kubernetes::client [repos/releng/zuul/tofu-provisioning] - 10https://gitlab.wikimedia.org/repos/releng/zuul/tofu-provisioning/-/merge_requests/26
[00:12:21] <wikibugs>	 (03merge) 10bd808: bastion: Apply profile::kubernetes::client [repos/releng/zuul/tofu-provisioning] - 10https://gitlab.wikimedia.org/repos/releng/zuul/tofu-provisioning/-/merge_requests/26
[00:26:35] <wikibugs>	 (03open) 10bd808: bastion: Match expected YAML formatting [repos/releng/zuul/tofu-provisioning] - 10https://gitlab.wikimedia.org/repos/releng/zuul/tofu-provisioning/-/merge_requests/27 (https://phabricator.wikimedia.org/T398643)
[00:27:24] <wikibugs>	 (03merge) 10bd808: bastion: Match expected YAML formatting [repos/releng/zuul/tofu-provisioning] - 10https://gitlab.wikimedia.org/repos/releng/zuul/tofu-provisioning/-/merge_requests/27 (https://phabricator.wikimedia.org/T398643)
[00:42:05] <wikibugs>	 (03open) 10bd808: Revert "ci: Switch back to Digital Ocean runners by default" [repos/releng/zuul/tofu-provisioning] - 10https://gitlab.wikimedia.org/repos/releng/zuul/tofu-provisioning/-/merge_requests/28
[01:51:04] <Krinkle>	 [wikitech-l] Beta Cluster now lives on beta.wmcloud.org.  https://lists.wikimedia.org/hyperkitty/list/wikitech-l@lists.wikimedia.org/thread/YDABPV75LADRQCXMJAFWUP256N4EQ25B/
[02:06:47] <wikibugs>	 10Beta-Cluster-Infrastructure, 10MW-1.45-notes (1.45.0-wmf.9; 2025-07-08), 13Patch-For-Review: Move *.beta.wmflabs.org to *.beta.wmcloud.org - https://phabricator.wikimedia.org/T289318#10993916 (10Krinkle)
[02:26:50] <wikibugs>	 10GitLab (Infrastructure), 06Release-Engineering-Team, 06collaboration-services: Upgrade GitLab to major version 18 - https://phabricator.wikimedia.org/T394382#10993948 (10Dzahn) >>! In T394382#10988305, @Jelto wrote: > I'll resolve the task, please re-open if there are any issues.  We got this one: T399...
[02:49:04] <wikibugs>	 10GitLab (Auth & Access), 06Release-Engineering-Team, 06collaboration-services, 13Patch-For-Review: Create bot to sync LDAP groups with related GitLab groups - https://phabricator.wikimedia.org/T319211#10993969 (10Dzahn) This bot currently has a problem talking to gitlab (T394382#10993948 T399250).  It...
[02:52:02] <wikibugs>	 10GitLab, 06collaboration-services: SystemdUnitFailed - sync-gitlab-group-with-ldap.service on gitlab2002:9100 - https://phabricator.wikimedia.org/T399250#10993972 (10Dzahn)
[05:15:55] <wikibugs>	 (03PS2) 10Hashar: Zuul, jjb: [mediawiki/services/poolcounter] rm tox-poolcounter job [integration/config] - 10https://gerrit.wikimedia.org/r/1167649 (https://phabricator.wikimedia.org/T399075)
[05:16:31] <wikibugs>	 (03PS2) 10Hashar: dockerfiles: remove tox-poolcounter [integration/config] - 10https://gerrit.wikimedia.org/r/1167650 (https://phabricator.wikimedia.org/T399075)
[05:20:25] <wikibugs>	 (03CR) 10Hashar: [C:03+2] Zuul, jjb: [mediawiki/services/poolcounter] rm tox-poolcounter job [integration/config] - 10https://gerrit.wikimedia.org/r/1167649 (https://phabricator.wikimedia.org/T399075) (owner: 10Hashar)
[05:21:49] <wikibugs>	 (03Merged) 10jenkins-bot: Zuul, jjb: [mediawiki/services/poolcounter] rm tox-poolcounter job [integration/config] - 10https://gerrit.wikimedia.org/r/1167649 (https://phabricator.wikimedia.org/T399075) (owner: 10Hashar)
[05:49:08] <wikibugs>	 10Gerrit: Unable to push commit whose parent is not the latest commit for the parent patch - https://phabricator.wikimedia.org/T399241#10994084 (10hashar) The message comes from `java/com/google/gerrit/server/git/receive/ReceiveCommits.java`, in `stable-3.10`: ` lang=java         } else if (revisions.containsKey...
[05:52:15] <wikibugs>	 (03CR) 10Hashar: [C:03+2] dockerfiles: remove tox-poolcounter [integration/config] - 10https://gerrit.wikimedia.org/r/1167650 (https://phabricator.wikimedia.org/T399075) (owner: 10Hashar)
[05:53:35] <wikibugs>	 (03Merged) 10jenkins-bot: dockerfiles: remove tox-poolcounter [integration/config] - 10https://gerrit.wikimedia.org/r/1167650 (https://phabricator.wikimedia.org/T399075) (owner: 10Hashar)
[05:55:43] <wikibugs>	 10Continuous-Integration-Config, 06MediaWiki-Platform-Team, 10PoolCounter, 13Patch-For-Review, 10Release Pipeline (Blubber): Migrate poolcounter to Blubber - https://phabricator.wikimedia.org/T399075#10994089 (10hashar) 05Open→03Resolved a:03hashar
[05:56:08] <wikibugs>	 10Continuous-Integration-Config, 10Continuous-Integration-Infrastructure: Migrate all CI jobs from buster to bullseye or later and drop buster testing support - https://phabricator.wikimedia.org/T335765#10994093 (10hashar)
[06:28:11] <wikibugs>	 10GitLab, 06collaboration-services: SystemdUnitFailed - sync-gitlab-group-with-ldap.service on gitlab2002:9100 - https://phabricator.wikimedia.org/T399250#10994122 (10Jelto)
[06:42:20] <wikibugs>	 10GitLab, 06collaboration-services: SystemdUnitFailed - sync-gitlab-group-with-ldap.service on gitlab2002:9100 - https://phabricator.wikimedia.org/T399250#10994134 (10Jelto) >>! In T399250#10993963, @Dzahn wrote: > https://gitlab.wikimedia.org/admin/users/ldap-sync-bot/impersonation_tokens says `This user has...
[06:55:46] <wikibugs>	 10GitLab, 06collaboration-services: SystemdUnitFailed - sync-gitlab-group-with-ldap.service on gitlab2002:9100 - https://phabricator.wikimedia.org/T399250#10994142 (10Jelto) 05Open→03Resolved p:05Triage→03Medium a:03Jelto I rotated the token and triggered the `sync-gitlab-group-with-ldap.service`...
[07:18:42] <wikibugs>	 10Diffusion, 10Phabricator, 06collaboration-services: Drop our mirroring of code to Diffusion and empty the repos - https://phabricator.wikimedia.org/T359549#10994160 (10A_smart_kitten) >>! In T359549#10993837, @Dzahn wrote: > Are there any vetoes / hard concerns about making it visible only to logged in use...
[07:19:28] <wmcs-alerts>	 FIRING: WidespreadInstanceDown: Widespread instances down in project deployment-prep   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DWidespreadInstanceDown
[07:19:35] <wmcs-alerts>	 FIRING: [7x] InstanceDown: Project deployment-prep instance deployment-acme-chief05 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[07:19:39] <wikibugs>	 10Beta-Cluster-Infrastructure: Widespread instances down in project deployment-prep - https://phabricator.wikimedia.org/T399261 (10wmcs-alerts) 03NEW
[07:24:32] <wmcs-alerts>	 RESOLVED: WidespreadInstanceDown: Widespread instances down in project deployment-prep   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DWidespreadInstanceDown
[07:24:33] <wikibugs>	 10Beta-Cluster-Infrastructure: Widespread instances down in project deployment-prep - https://phabricator.wikimedia.org/T399261#10994185 (10wmcs-alerts)
[07:24:35] <wmcs-alerts>	 RESOLVED: [7x] InstanceDown: Project deployment-prep instance deployment-acme-chief05 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[08:14:28] <wmcs-alerts>	 RESOLVED: PuppetAgentNoResources: No Puppet resources found on instance deployment-urldownloader04 on project deployment-prep   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[08:18:05] <wmcs-alerts>	 FIRING: [2x] InstanceDown: Project deployment-prep instance deployment-mediawiki13 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[08:18:28] <wmcs-alerts>	 FIRING: WidespreadInstanceDown: Widespread instances down in project deployment-prep   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DWidespreadInstanceDown
[08:21:28] <wmcs-alerts>	 RESOLVED: [18x] InstanceDown: Project deployment-prep instance deployment-acme-chief05 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[08:23:32] <wmcs-alerts>	 RESOLVED: WidespreadInstanceDown: Widespread instances down in project deployment-prep   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DWidespreadInstanceDown
[08:23:35] <wikibugs>	 10Beta-Cluster-Infrastructure: Widespread instances down in project deployment-prep - https://phabricator.wikimedia.org/T399261#10994329 (10wmcs-alerts)
[08:55:39] <wikibugs>	 10Continuous-Integration-Infrastructure (Zuul upgrade), 10Release-Engineering-Team (Priority Backlog 📥): Plan for porting PipelineLib to Zuul Ansible - https://phabricator.wikimedia.org/T390119#10994511 (10hashar)
[08:55:52] <wikibugs>	 10Continuous-Integration-Infrastructure (Zuul upgrade), 10Release-Engineering-Team (Priority Backlog 📥): Plan for porting PipelineLib to Zuul Ansible - https://phabricator.wikimedia.org/T390119#10994514 (10hashar)
[09:03:01] <wikibugs>	 (03PS1) 10Hashar: Archive wikibase/release-prototype [integration/config] - 10https://gerrit.wikimedia.org/r/1168126 (https://phabricator.wikimedia.org/T399279)
[09:03:26] <wikibugs>	 (03CR) 10Hashar: [C:03+2] Archive wikibase/release-prototype [integration/config] - 10https://gerrit.wikimedia.org/r/1168126 (https://phabricator.wikimedia.org/T399279) (owner: 10Hashar)
[09:04:28] <wmcs-alerts>	 FIRING: PuppetAgentNoResources: No Puppet resources found on instance deployment-urldownloader04 on project deployment-prep   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[09:04:34] <wikibugs>	 10Beta-Cluster-Infrastructure: No Puppet resources found on instance deployment-urldownloader04 on project deployment-prep - https://phabricator.wikimedia.org/T399216#10994544 (10wmcs-alerts)
[09:04:53] <wikibugs>	 (03Merged) 10jenkins-bot: Archive wikibase/release-prototype [integration/config] - 10https://gerrit.wikimedia.org/r/1168126 (https://phabricator.wikimedia.org/T399279) (owner: 10Hashar)
[09:09:38] <wikibugs>	 10Beta-Cluster-Infrastructure: Widespread instances down in project deployment-prep - https://phabricator.wikimedia.org/T399261#10994566 (10fnegri)
[09:10:25] <wikibugs>	 10Beta-Cluster-Infrastructure: Widespread instances down in project deployment-prep - https://phabricator.wikimedia.org/T399261#10994570 (10fnegri) Possibly related to {T399281}
[09:12:45] <wikibugs>	 10Beta-Cluster-Infrastructure: Widespread instances down in project deployment-prep - https://phabricator.wikimedia.org/T399261#10994574 (10fnegri) 05Open→03Resolved a:03fnegri This is now resolved, but there were two separate spikes of unreachable instances:  {F63886612}
[09:18:52] <wikibugs>	 (03PS1) 10Hashar: dockerfiles: create tox-v3-mysqld [integration/config] - 10https://gerrit.wikimedia.org/r/1168128 (https://phabricator.wikimedia.org/T335765)
[09:24:09] <wikibugs>	 10Diffusion, 10Phabricator, 06collaboration-services: Drop our mirroring of code to Diffusion and empty the repos - https://phabricator.wikimedia.org/T359549#10994611 (10Aklapper) >>! In T359549#10146373, @valerio.bozzolan wrote: > I put my details directly in the original description.  I'd appreciate commen...
[09:30:25] <wikibugs>	 10Diffusion, 10Phabricator, 06collaboration-services: Drop our mirroring of code to Diffusion and empty the repos - https://phabricator.wikimedia.org/T359549#10994639 (10Aklapper) >>! In T359549#10895207, @Jelto wrote: > A reason against diffusion is the poor performance. Everything above a few dozen RPS ove...
[09:34:25] <wikibugs>	 10Continuous-Integration-Infrastructure (Zuul upgrade), 10Quibble: Remove reliance on EXECUTOR_NUMBER environment variable in CI - https://phabricator.wikimedia.org/T399283 (10hashar) 03NEW
[09:36:52] <wikibugs>	 10Continuous-Integration-Infrastructure (Zuul upgrade), 06Fundraising-Backlog, 10Quibble: Remove reliance on EXECUTOR_NUMBER environment variable in CI - https://phabricator.wikimedia.org/T399283#10994671 (10hashar) wikimedia/fundraising/tools has: ` lang=python,name=silverpop_export/tests/test_update.py def...
[09:39:28] <wmcs-alerts>	 RESOLVED: PuppetAgentNoResources: No Puppet resources found on instance deployment-urldownloader04 on project deployment-prep   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[09:39:36] <wikibugs>	 10Beta-Cluster-Infrastructure: No Puppet resources found on instance deployment-urldownloader04 on project deployment-prep - https://phabricator.wikimedia.org/T399216#10994680 (10wmcs-alerts)
[10:17:34] <wikibugs>	 (03PS2) 10Hashar: dockerfiles: create tox-v3-mysqld [integration/config] - 10https://gerrit.wikimedia.org/r/1168128 (https://phabricator.wikimedia.org/T335765)
[10:24:17] <wikibugs>	 (03CR) 10CI reject: [V:04-1] dockerfiles: create tox-v3-mysqld [integration/config] - 10https://gerrit.wikimedia.org/r/1168128 (https://phabricator.wikimedia.org/T335765) (owner: 10Hashar)
[10:24:28] <wmcs-alerts>	 FIRING: PuppetAgentNoResources: No Puppet resources found on instance deployment-urldownloader04 on project deployment-prep   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[10:24:39] <wikibugs>	 10Beta-Cluster-Infrastructure: No Puppet resources found on instance deployment-urldownloader04 on project deployment-prep - https://phabricator.wikimedia.org/T399216#10994790 (10wmcs-alerts)
[10:29:28] <wmcs-alerts>	 RESOLVED: PuppetAgentNoResources: No Puppet resources found on instance deployment-urldownloader04 on project deployment-prep   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[10:29:35] <wikibugs>	 10Beta-Cluster-Infrastructure: No Puppet resources found on instance deployment-urldownloader04 on project deployment-prep - https://phabricator.wikimedia.org/T399216#10994806 (10wmcs-alerts)
[11:13:28] <wmcs-alerts>	 FIRING: PuppetAgentNoResources: No Puppet resources found on instance deployment-urldownloader04 on project deployment-prep   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[11:13:40] <wikibugs>	 10Beta-Cluster-Infrastructure: No Puppet resources found on instance deployment-urldownloader04 on project deployment-prep - https://phabricator.wikimedia.org/T399216#10994986 (10wmcs-alerts)
[11:30:05] <wmf-insecte>	 maintenance-disconnect-full-disks build 718244 integration-agent-docker-1041 (/: 26%, /srv: 95%, /var/lib/docker: 46%): OFFLINE due to disk space
[11:40:32] <wmcs-alerts>	 FIRING: [7x] InstanceDown: Project deployment-prep instance deployment-cirrussearch12 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[11:41:28] <wmcs-alerts>	 FIRING: WidespreadInstanceDown: Widespread instances down in project deployment-prep   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DWidespreadInstanceDown
[11:45:09] <wmf-insecte>	 maintenance-disconnect-full-disks build 718247 integration-agent-docker-1041 (/: 26%, /srv: 35%, /var/lib/docker: 46%): RECOVERY disk space OK
[11:45:28] <wmcs-alerts>	 RESOLVED: [14x] InstanceDown: Project deployment-prep instance deployment-acme-chief05 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[11:46:28] <wmcs-alerts>	 RESOLVED: WidespreadInstanceDown: Widespread instances down in project deployment-prep   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DWidespreadInstanceDown
[11:46:35] <wikibugs>	 10Beta-Cluster-Infrastructure: Widespread instances down in project deployment-prep - https://phabricator.wikimedia.org/T399297#10995075 (10wmcs-alerts)
[11:48:57] <wikibugs>	 (03CR) 10Hashar: "recheck" [integration/config] - 10https://gerrit.wikimedia.org/r/1168128 (https://phabricator.wikimedia.org/T335765) (owner: 10Hashar)
[11:53:47] <wikibugs>	 (03CR) 10Hashar: [C:03+2] dockerfiles: create tox-v3-mysqld [integration/config] - 10https://gerrit.wikimedia.org/r/1168128 (https://phabricator.wikimedia.org/T335765) (owner: 10Hashar)
[11:55:31] <wikibugs>	 (03Merged) 10jenkins-bot: dockerfiles: create tox-v3-mysqld [integration/config] - 10https://gerrit.wikimedia.org/r/1168128 (https://phabricator.wikimedia.org/T335765) (owner: 10Hashar)
[12:11:34] <wikibugs>	 (03CR) 10Hashar: [C:03+2] "I have built the image right now as part of building another image." [integration/config] - 10https://gerrit.wikimedia.org/r/1167537 (https://phabricator.wikimedia.org/T335765) (owner: 10Hashar)
[12:18:32] <wmcs-alerts>	 FIRING: [6x] InstanceDown: Project deployment-prep instance deployment-cache-text08 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[12:19:28] <wmcs-alerts>	 FIRING: WidespreadInstanceDown: Widespread instances down in project deployment-prep   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DWidespreadInstanceDown
[12:23:28] <wmcs-alerts>	 RESOLVED: [12x] InstanceDown: Project deployment-prep instance deployment-cache-text08 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[12:24:28] <wmcs-alerts>	 RESOLVED: WidespreadInstanceDown: Widespread instances down in project deployment-prep   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DWidespreadInstanceDown
[12:24:37] <wikibugs>	 10Beta-Cluster-Infrastructure: Widespread instances down in project deployment-prep - https://phabricator.wikimedia.org/T399297#10995157 (10wmcs-alerts)
[12:42:24] <wikibugs>	 (03PS2) 10Hashar: jjb: change tox-mysqld jobs to Bookworm and tox v3 [integration/config] - 10https://gerrit.wikimedia.org/r/1168168 (https://phabricator.wikimedia.org/T335765)
[12:42:24] <wikibugs>	 (03PS1) 10Hashar: dockerfiles: add pkg-config to tox-v3-mysqld [integration/config] - 10https://gerrit.wikimedia.org/r/1168170
[12:44:23] <wikibugs>	 (03CR) 10Hashar: [C:03+2] dockerfiles: add pkg-config to tox-v3-mysqld [integration/config] - 10https://gerrit.wikimedia.org/r/1168170 (owner: 10Hashar)
[12:45:45] <wikibugs>	 (03Merged) 10jenkins-bot: dockerfiles: add pkg-config to tox-v3-mysqld [integration/config] - 10https://gerrit.wikimedia.org/r/1168170 (owner: 10Hashar)
[12:47:58] <Lucas_WMDE>	 re <jakob_WMDE> hello, is T399297 the task to keep an eye on regarding the beta sites being down?
[12:47:59] <stashbot>	 T399297: Widespread instances down in project deployment-prep - https://phabricator.wikimedia.org/T399297
[12:48:08] <Lucas_WMDE>	 (moving to this channel which IIUC is more appropriate for beta stuff than -operations)
[12:48:18] <Lucas_WMDE>	 there’s been some activity in T399281 but it’s not clear to me if that would affect the beta cluster as well or only toolforge
[12:48:18] <stashbot>	 T399281: 2025-07-11 Toolforge tools not responding - https://phabricator.wikimedia.org/T399281
[12:48:55] <Lucas_WMDE>	 it feels like we could use a general “beta down” style task to attach tasks like T399261, T399297, maybe T399216 to
[12:48:55] <stashbot>	 T399261: Widespread instances down in project deployment-prep - https://phabricator.wikimedia.org/T399261
[12:48:56] <stashbot>	 T399216: No Puppet resources found on instance deployment-urldownloader04 on project deployment-prep - https://phabricator.wikimedia.org/T399216
[12:54:19] <jakob_WMDE>	 Lucas_WMDE: thanks! I agree, a general task would make the situation a bit clearer
[12:57:29] <wikibugs>	 10Continuous-Integration-Infrastructure (Zuul upgrade): Provision user in Cloud VPS hosted Kubernetes cluster for use by nodepool - https://phabricator.wikimedia.org/T398367#10995240 (10jnuche) a:03jnuche
[12:57:32] <wikibugs>	 10Continuous-Integration-Infrastructure (Zuul upgrade): Setup RBAC for Cloud VPS hosted kubernetes cluster - https://phabricator.wikimedia.org/T398365#10995241 (10jnuche) a:03jnuche
[13:05:30] <wikibugs>	 (03CR) 10Hashar: [C:03+2] jjb: change tox-mysqld jobs to Bookworm and tox v3 [integration/config] - 10https://gerrit.wikimedia.org/r/1168168 (https://phabricator.wikimedia.org/T335765) (owner: 10Hashar)
[13:07:06] <wikibugs>	 (03Merged) 10jenkins-bot: jjb: change tox-mysqld jobs to Bookworm and tox v3 [integration/config] - 10https://gerrit.wikimedia.org/r/1168168 (https://phabricator.wikimedia.org/T335765) (owner: 10Hashar)
[13:07:46] <wikibugs>	 (03PS1) 10Hashar: dockerfiles: remove tox-mysqld [integration/config] - 10https://gerrit.wikimedia.org/r/1168175 (https://phabricator.wikimedia.org/T335765)
[13:09:44] <wikibugs>	 10Beta-Cluster-Infrastructure: Beta cluster down 2025-07-11 - https://phabricator.wikimedia.org/T399303 (10Lucas_Werkmeister_WMDE) 03NEW
[13:10:04] <Lucas_WMDE>	 created ^
[13:10:05] <wikibugs>	 10Beta-Cluster-Infrastructure: Beta cluster down 2025-07-11 - https://phabricator.wikimedia.org/T399303#10995331 (10Lucas_Werkmeister_WMDE)
[13:10:08] <wikibugs>	 10Beta-Cluster-Infrastructure: Widespread instances down in project deployment-prep - https://phabricator.wikimedia.org/T399261#10995333 (10Lucas_Werkmeister_WMDE)
[13:10:09] <wikibugs>	 10Beta-Cluster-Infrastructure: No Puppet resources found on instance deployment-urldownloader04 on project deployment-prep - https://phabricator.wikimedia.org/T399216#10995334 (10Lucas_Werkmeister_WMDE)
[13:12:57] <wikibugs>	 10Beta-Cluster-Infrastructure: Beta cluster down 2025-07-11 - https://phabricator.wikimedia.org/T399303#10995344 (10Lucas_Werkmeister_WMDE) The [beta-code-update-eqiad](https://integration.wikimedia.org/ci/view/Beta/job/beta-code-update-eqiad/), [beta-scap-sync-world](https://integration.wikimedia.org/ci/view/Be...
[13:17:44] <wikibugs>	 10Continuous-Integration-Config, 10Continuous-Integration-Infrastructure, 13Patch-For-Review: Migrate all CI jobs from buster to bullseye or later and drop buster testing support - https://phabricator.wikimedia.org/T335765#10995358 (10hashar)
[13:20:20] <wikibugs>	 10Continuous-Integration-Config, 10Continuous-Integration-Infrastructure, 13Patch-For-Review: Migrate all CI jobs from buster to bullseye or later and drop buster testing support - https://phabricator.wikimedia.org/T335765#10995377 (10hashar) We can't remove tox-java8 which is used by Cergen. That is still used...
[13:20:54] <wikibugs>	 10Continuous-Integration-Config, 10Continuous-Integration-Infrastructure, 13Patch-For-Review: Migrate all CI jobs from buster to bullseye or later and drop buster testing support - https://phabricator.wikimedia.org/T335765#10995382 (10hashar)
[13:21:48] <wikibugs>	 10Continuous-Integration-Config, 10Continuous-Integration-Infrastructure, 13Patch-For-Review: Migrate all CI jobs from buster to bullseye or later and drop buster testing support - https://phabricator.wikimedia.org/T335765#10995385 (10hashar)
[13:22:14] <wikibugs>	 (03CR) 10Hashar: [C:03+2] dockerfiles: remove tox-mysqld [integration/config] - 10https://gerrit.wikimedia.org/r/1168175 (https://phabricator.wikimedia.org/T335765) (owner: 10Hashar)
[13:23:47] <wikibugs>	 (03Merged) 10jenkins-bot: dockerfiles: remove tox-mysqld [integration/config] - 10https://gerrit.wikimedia.org/r/1168175 (https://phabricator.wikimedia.org/T335765) (owner: 10Hashar)
[13:24:10] <James_F>	 So just zuul-cloner and then waiting for Cergen? Nice!
[13:24:36] <hashar>	 yeah
[13:24:48] <hashar>	 zuul-cloner is no longer needed with the new Zuul
[13:24:56] <James_F>	 But still needed until we migrate.
[13:25:03] <James_F>	 When is that happening? Years? 
[13:25:06] <hashar>	 cause that is Zuul pushing the prepared repositories to the worker nodes
[13:25:15] <hashar>	 so essentially
[13:25:22] <hashar>	 Buster is gone now
[13:25:32] <James_F>	 Now, do bullseye.
[13:25:53] <hashar>	 I so want us to migrate to RedHat or Oracle Linux with 5 / 10 years of support
[13:26:02] <hashar>	 so the next upgrade would be 2035 :]
[13:26:21] <hashar>	 I am going to look at aggregating images so we end up with a few fewer
[13:26:23] <James_F>	 And being able to test PHP 8.4 would come in ~2035.
[13:26:28] <hashar>	 na
[13:26:31] <hashar>	 well
[13:26:48] <hashar>	 we could well compile php8.4 against a different OS version
[13:27:05] <hashar>	 well that is pretty much what we do for prod I think
[13:27:16] <hashar>	 though how it is done without CI is an entire mystery to me
[13:27:32] <James_F>	 Yes, but then we'd be avoiding the whole point of paying RedHat/etc. to do the OS/package management for us.
[13:28:03] <hashar>	 and that makes me wonder why Oracle does not sell an OS
[13:28:18] <hashar>	 (you'd have a licence that makes you pay per keystroke)
[13:28:21] <James_F>	 I have my Solaris CDs (not DVDs!) somewhere.
[13:28:26] <hashar>	 or the amount of time you stare at the screen
[13:29:25] <hashar>	 I think next week I will work on making MediaWiki PHPUnit not run every single test
[13:30:01] <hashar>	 oh and jsduck will be gone :)
[13:30:53] <hashar>	 or maybe I look at T397429
[13:30:53] <stashbot>	 T397429: Reduce the number of CI images - https://phabricator.wikimedia.org/T397429
[13:30:58] <hashar>	 to aggregate similar images
[13:31:12] <James_F>	 For jsduck we're waiting for REL1_39 EOL, I think?
[13:33:22] <hashar>	 depends on whether we move the Quibble bullseye image to bookworm :)
[13:33:40] <James_F>	 That's blocked on SRE packaging PHP 8.1 for bookworm.
[13:33:58] <hashar>	 then I have seen a patch to jump us to php 8.3
[13:34:02] <James_F>	 T397075 -> T362705
[13:34:03] <stashbot>	 T397075: Package Wikimedia's PHP 8.1 component for bookworm - https://phabricator.wikimedia.org/T397075
[13:34:03] <stashbot>	 T362705: Migrate Quibble images from bullseye to bookworm - https://phabricator.wikimedia.org/T362705
[13:34:29] <hashar>	 that is what I was wondering during the Product & Tech presentation about MediaWiki + php upgrade
[13:34:45] <hashar>	 it seems like the MediaWiki team owns the process and I guess they have a roadmap somewhere
[13:34:51] <hashar>	 and if not, they should
[13:34:52] <wikibugs>	 10Beta-Cluster-Infrastructure: Beta cluster down 2025-07-11 - https://phabricator.wikimedia.org/T399303#10995434 (10Lucas_Werkmeister_WMDE)
[13:34:57] <hashar>	 and we can then be kept informed about the process
[13:34:59] <hashar>	 but
[13:35:11] <wikibugs>	 10Beta-Cluster-Infrastructure: Beta cluster down 2025-07-11 - https://phabricator.wikimedia.org/T399303#10995437 (10Lucas_Werkmeister_WMDE)
[13:35:12] <hashar>	 then I can just rely on James :]
[13:35:40] * James_F grins.
[13:36:18] <hashar>	 in an ideal world the roadmap would be presented every month along with the progress
[13:36:28] <hashar>	 but well hmm
[13:36:32] <hashar>	 different company :]
[13:40:44] <wikibugs>	 10GitLab (Infrastructure), 06collaboration-services: upgrade gitlab hosts to bookworm - https://phabricator.wikimedia.org/T399306 (10Jelto) 03NEW
[14:07:28] <wmcs-alerts>	 FIRING: PuppetStaleCertificates: Found non-revoked Puppet certificates for 1 deleted instances on deployment-puppetserver-1 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates
[14:11:28] <wmcs-alerts>	 FIRING: [4x] InstanceDown: Project deployment-prep instance deployment-kafka-logging01 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[14:13:11] <wikibugs>	 10Beta-Cluster-Infrastructure: Found non-revoked Puppet certificates for 1 deleted instances on deployment-puppetserver-1 - https://phabricator.wikimedia.org/T399307 (10wmcs-alerts) 03NEW
[14:13:28] <wmcs-alerts>	 FIRING: WidespreadInstanceDown: Widespread instances down in project deployment-prep   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DWidespreadInstanceDown
[14:16:28] <wmcs-alerts>	 FIRING: [10x] InstanceDown: Project deployment-prep instance deployment-cirrussearch13 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[14:18:28] <wmcs-alerts>	 RESOLVED: WidespreadInstanceDown: Widespread instances down in project deployment-prep   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DWidespreadInstanceDown
[14:21:28] <wmcs-alerts>	 FIRING: [13x] InstanceDown: Project deployment-prep instance deployment-acme-chief05 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[14:23:28] <wmcs-alerts>	 FIRING: WidespreadInstanceDown: Widespread instances down in project deployment-prep   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DWidespreadInstanceDown
[14:23:28] <wmcs-alerts>	 FIRING: [2x] PuppetAgentNoResources: No Puppet resources found on instance deployment-schema-3 on project deployment-prep   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[14:26:28] <wmcs-alerts>	 FIRING: [19x] InstanceDown: Project deployment-prep instance deployment-acme-chief05 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[14:32:43] <wmcs-alerts>	 FIRING: [23x] InstanceDown: Project deployment-prep instance deployment-acme-chief05 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[14:33:02] <wmf-insecte>	 Project beta-code-update-eqiad build #556221: 04FAILURE in 30 min: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/556221/
[14:33:02] <wmcs-alerts>	 FIRING: [24x] InstanceDown: Project deployment-prep instance deployment-acme-chief05 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[14:37:58] <wmcs-alerts>	 FIRING: [24x] InstanceDown: Project deployment-prep instance deployment-acme-chief05 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[14:38:02] <wmcs-alerts>	 FIRING: [25x] InstanceDown: Project deployment-prep instance deployment-acme-chief05 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[14:43:02] <wmcs-alerts>	 FIRING: [26x] InstanceDown: Project deployment-prep instance deployment-acme-chief05 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[14:47:50] <hashar>	 poor beta
[14:48:02] <wmcs-alerts>	 FIRING: [26x] InstanceDown: Project deployment-prep instance deployment-acme-chief05 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[14:48:28] <wmcs-alerts>	 FIRING: [26x] InstanceDown: Project deployment-prep instance deployment-acme-chief05 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[14:53:02] <wmcs-alerts>	 FIRING: [27x] InstanceDown: Project deployment-prep instance deployment-acme-chief05 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[14:53:28] <wmcs-alerts>	 FIRING: [4x] PuppetAgentNoResources: No Puppet resources found on instance deployment-etcd02 on project deployment-prep   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[14:58:02] <wmcs-alerts>	 FIRING: [29x] InstanceDown: Project deployment-prep instance deployment-acme-chief05 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[15:00:28] <wmcs-alerts>	 FIRING: [7x] PuppetAgentNoResources: No Puppet resources found on instance deployment-docker-mathoid02 on project deployment-prep   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[15:03:01] <wmf-insecte>	 Project beta-code-update-eqiad build #556222: 04STILL FAILING in 30 min: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/556222/
[15:03:02] <wmcs-alerts>	 FIRING: [30x] InstanceDown: Project deployment-prep instance deployment-acme-chief05 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[15:04:50] <bd808>	 !log Hard reboot of deployment-acme-chief05 (T399281)
[15:04:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[15:04:53] <stashbot>	 T399281: 2025-07-11 Ceph issues causing Toolforge and Cloud VPS failures - https://phabricator.wikimedia.org/T399281
[15:05:01] <wmf-insecte>	 Project beta-update-databases-eqiad build #86100: 15ABORTED in 45 min: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/86100/
[15:08:02] <wmcs-alerts>	 FIRING: [30x] InstanceDown: Project deployment-prep instance deployment-acme-chief05 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[15:08:28] <wmcs-alerts>	 FIRING: [8x] PuppetAgentNoResources: No Puppet resources found on instance deployment-docker-mathoid02 on project deployment-prep   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[15:09:28] <wmcs-alerts>	 FIRING: [30x] InstanceDown: Project deployment-prep instance deployment-acme-chief05 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[15:13:02] <wmcs-alerts>	 FIRING: [30x] InstanceDown: Project deployment-prep instance deployment-acme-chief05 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[15:17:10] <wmf-insecte>	 Yippee, build fixed!
[15:17:10] <wmf-insecte>	 Project beta-code-update-eqiad build #556223: 09FIXED in 14 min: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/556223/
[15:18:02] <wmcs-alerts>	 FIRING: [30x] InstanceDown: Project deployment-prep instance deployment-acme-chief05 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[15:18:28] <wmcs-alerts>	 RESOLVED: WidespreadInstanceDown: Widespread instances down in project deployment-prep   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DWidespreadInstanceDown
[15:18:28] <wmcs-alerts>	 FIRING: [8x] PuppetAgentNoResources: No Puppet resources found on instance deployment-docker-mathoid02 on project deployment-prep   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[15:18:35] <wikibugs>	 10Beta-Cluster-Infrastructure: Widespread instances down in project deployment-prep - https://phabricator.wikimedia.org/T399297#10995799 (10wmcs-alerts)
[15:21:28] <wmcs-alerts>	 FIRING: [8x] PuppetAgentNoResources: No Puppet resources found on instance deployment-docker-mathoid02 on project deployment-prep   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[15:23:11] <wikibugs>	 10Beta-Cluster-Infrastructure: Beta cluster down 2025-07-11 - https://phabricator.wikimedia.org/T399303#10995817 (10doctaxon) As noted: https://phabricator.wikimedia.org/T289318#10995040
[15:24:37] <wikibugs>	 06Project-Admins, 06Security-Team, 07SecTeam-Processed, 07Security: Modify security-related Phabricator projects related to incidents and audits - https://phabricator.wikimedia.org/T398840#10995822 (10A_smart_kitten) >>! In T398840#10995587, @mmartorana wrote: > Maybe do we want to also have a general proj...
[15:25:35] <wikibugs>	 10GitLab, 06collaboration-services: SystemdUnitFailed - sync-gitlab-group-with-ldap.service on gitlab2002:9100 - https://phabricator.wikimedia.org/T399250#10995823 (10Dzahn) Great! Glad to hear it was a pretty easy fix then.
[15:26:14] <wikibugs>	 10Diffusion, 10Phabricator, 06collaboration-services: Drop our mirroring of code to Diffusion and empty the repos - https://phabricator.wikimedia.org/T359549#10995824 (10Dzahn) 05Open→03Stalled Alright, sounds good to me. Let's call it stalled then for the moment.
[15:28:09] <wikibugs>	 10Gerrit, 06collaboration-services: Gerrit: Hiera lookup for monitoring - https://phabricator.wikimedia.org/T399282#10995826 (10Dzahn)
[15:28:28] <wmcs-alerts>	 RESOLVED: [8x] PuppetAgentNoResources: No Puppet resources found on instance deployment-docker-mathoid02 on project deployment-prep   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[15:29:35] <bd808>	 Things in the Beta Cluster are coming back to life. The IRC bot seems to be behind the web ui in reporting recoveries.
[15:34:58] <bd808>	 !log Hard reboot of deployment-webperf21 (T399281)
[15:35:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[15:35:00] <stashbot>	 T399281: 2025-07-11 Ceph issues causing Toolforge and Cloud VPS failures - https://phabricator.wikimedia.org/T399281
[15:38:02] <wmcs-alerts>	 RESOLVED: [30x] InstanceDown: Project deployment-prep instance deployment-acme-chief05 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[15:59:45] <bd808>	 I think beta is technically back up and just overwhelmed by crawlers again.
[16:06:50] <bd808>	 !log Added 7 Class A networks to block list
[16:06:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[16:18:28] <wmcs-alerts>	 FIRING: [8x] PuppetAgentNoResources: No Puppet resources found on instance deployment-docker-mathoid02 on project deployment-prep   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[16:40:08] <wikibugs>	 (03update) 10dduvall: spiderpig: Improve train deployment UI for small screens [repos/releng/scap] - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/886
[16:40:10] <wikibugs>	 (03open) 10dduvall: spiderpig: Improve train deployment UI for small screens [repos/releng/scap] - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/886
[16:40:13] <wikibugs>	 (03update) 10dduvall: spiderpig: Improve train deployment UI for small screens [repos/releng/scap] - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/886
[16:40:15] <wikibugs>	 (03update) 10dduvall: spiderpig: Improve train deployment UI for small screens [repos/releng/scap] - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/886
[16:40:57] <wikibugs>	 (03open) 10dduvall: spiderpig: Fix obscured "..." menu items [repos/releng/scap] - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/887
[16:41:10] <wikibugs>	 (03update) 10dduvall: spiderpig: Fix obscured "..." menu items [repos/releng/scap] - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/887
[16:41:11] <wikibugs>	 (03update) 10dduvall: spiderpig: Fix obscured "..." menu items [repos/releng/scap] - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/887
[16:41:13] <wmf-insecte>	 Project beta-scap-sync-world build #214735: 04FAILURE in 16 min: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/214735/
[16:43:08] <wikibugs>	 (03merge) 10dduvall: spiderpig: Avoid race condition when refreshing train status [repos/releng/scap] - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/885 (https://phabricator.wikimedia.org/T399120)
[16:43:41] <wikibugs>	 (03update) 10dduvall: spiderpig: Fix obscured "..." menu items [repos/releng/scap] - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/887
[16:48:11] <wikibugs>	 10Beta-Cluster-Infrastructure: 2025-07-11 traffic overload - https://phabricator.wikimedia.org/T399329 (10bd808) 03NEW
[16:48:33] <wikibugs>	 10Beta-Cluster-Infrastructure: 2025-07-11 traffic overload - https://phabricator.wikimedia.org/T399329#10996173 (10bd808) 05Open→03In progress p:05Triage→03High
[16:50:40] <wikibugs>	 10Beta-Cluster-Infrastructure: 2025-07-11 traffic overload - https://phabricator.wikimedia.org/T399329#10996177 (10bd808) https://gerrit.wikimedia.org/r/plugins/gitiles/cloud/instance-puppet/+/f45bc3ef01fcc1afc18343694a188b1656814d6b%5E%21/#F0 ` diff --git a/deployment-prep/_.yaml b/deployment-prep/_.yaml index...
[16:52:55] <bd808>	 !log Reboot deployment-mediawiki14 to clear all open connections (T399329)
[16:52:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[16:52:59] <stashbot>	 T399329: 2025-07-11 traffic overload - https://phabricator.wikimedia.org/T399329
[16:54:30] <bd808>	 https://meta.wikimedia.beta.wmcloud.org/wiki/Special:MyLanguage/Main_Page is alive! Let's see how long it lasts...
[16:56:30] <wikibugs>	 (03merge) 10dduvall: image: Refactor `kokkuri image build` image ref exports [repos/releng/kokkuri] - 10https://gitlab.wikimedia.org/repos/releng/kokkuri/-/merge_requests/139 (https://phabricator.wikimedia.org/T399120)
[17:01:15] <wikibugs>	 10Beta-Cluster-Infrastructure, 10Testing Support, 07Epic: Run Selenium tests targeting Beta cluster - https://phabricator.wikimedia.org/T373680#10996227 (10zeljkofilipin)
[17:02:39] <wmf-insecte>	 Yippee, build fixed!
[17:02:40] <wmf-insecte>	 Project beta-scap-sync-world build #214736: 09FIXED in 15 min: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/214736/
[17:10:23] <wikibugs>	 10Beta-Cluster-Infrastructure: 2025-07-11 traffic overload - https://phabricator.wikimedia.org/T399329#10996263 (10bd808) 05In progress→03Resolved Things look stable for now. I'm back to having blocked way, way too much of the internet to get here. :/
[17:11:38] <wikibugs>	 10Beta-Cluster-Infrastructure: 2025-07-11 traffic overload - https://phabricator.wikimedia.org/T399329#10996272 (10bd808)
[17:22:30] <wikibugs>	 (03open) 10dduvall: version: 2.8.0 [repos/releng/kokkuri] - 10https://gitlab.wikimedia.org/repos/releng/kokkuri/-/merge_requests/140
[17:25:56] <wikibugs>	 (03merge) 10dduvall: version: 2.8.0 [repos/releng/kokkuri] - 10https://gitlab.wikimedia.org/repos/releng/kokkuri/-/merge_requests/140
[17:44:40] <wmf-insecte>	 Project beta-scap-sync-world build #214739: 04FAILURE in 8 min 55 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/214739/
[17:52:58] <wikibugs>	 10Gerrit, 06collaboration-services: Rename of a repository is not replicated - https://phabricator.wikimedia.org/T398401#10996367 (10Dzahn) >>! In T398401#10966583, @hashar wrote: > Easiest: we can allow port 29418 when the source is the active host (gerrit1003).  All gerrit servers now have the new firewall r...
[18:02:59] <wmf-insecte>	 Yippee, build fixed!
[18:02:59] <wmf-insecte>	 Project beta-scap-sync-world build #214740: 09FIXED in 15 min: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/214740/
[18:21:33] <wikibugs>	 (03approved) 10dancy: spiderpig: Improve train deployment UI for small screens [repos/releng/scap] - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/886 (owner: 10dduvall)
[18:21:58] <wikibugs>	 (03PS1) 10Subramanya Sastry: Add prefixes for more wikis [integration/visualdiff] - 10https://gerrit.wikimedia.org/r/1168229
[18:21:58] <wikibugs>	 (03PS1) 10Subramanya Sastry: Hide footers since we now have different footer text for Parsoid [integration/visualdiff] - 10https://gerrit.wikimedia.org/r/1168230
[18:22:48] <wikibugs>	 (03CR) 10Subramanya Sastry: "Maybe worth adding all missing prefixes in one shot by looking at the spreadsheet and updating this.  But, next time I look at this code." [integration/visualdiff] - 10https://gerrit.wikimedia.org/r/1168229 (owner: 10Subramanya Sastry)
[18:23:15] <wikibugs>	 (03approved) 10dancy: spiderpig: Fix obscured "..." menu items [repos/releng/scap] - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/887 (owner: 10dduvall)
[18:44:44] <wikibugs>	 (03merge) 10dduvall: spiderpig: Fix obscured "..." menu items [repos/releng/scap] - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/887
[18:45:00] <wikibugs>	 (03update) 10dduvall: spiderpig: Improve train deployment UI for small screens [repos/releng/scap] - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/886
[18:47:29] <wikibugs>	 (03merge) 10dduvall: spiderpig: Improve train deployment UI for small screens [repos/releng/scap] - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/886
[19:02:30] <wikibugs>	 (03update) 10dancy: Add debug_logstash config flag [repos/releng/scap] - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/884
[19:03:40] <wikibugs>	 (03merge) 10dancy: Add debug_logstash config flag [repos/releng/scap] - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/884
[19:04:22] <wikibugs>	 (03update) 10dancy: utils.py: Make select_latest_patches treat /srv/patches/next as latest [repos/releng/scap] - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/888 (https://phabricator.wikimedia.org/T295925 https://phabricator.wikimedia.org/T398873)
[19:04:23] <wikibugs>	 (03open) 10dancy: utils.py: Make select_latest_patches treat /srv/patches/next as latest [repos/releng/scap] - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/888 (https://phabricator.wikimedia.org/T295925 https://phabricator.wikimedia.org/T398873)
[19:07:20] <wikibugs>	 (03update) 10dancy: utils.py: Make select_latest_patches treat /srv/patches/next as latest [repos/releng/scap] - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/888 (https://phabricator.wikimedia.org/T295925 https://phabricator.wikimedia.org/T398873)
[19:08:10] <wikibugs>	 (03update) 10dancy: utils.py: Make select_latest_patches treat /srv/patches/next as latest [repos/releng/scap] - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/888 (https://phabricator.wikimedia.org/T295925 https://phabricator.wikimedia.org/T398873)
[19:08:15] <wikibugs>	 (03update) 10dancy: utils.py: Make select_latest_patches treat /srv/patches/next as latest [repos/releng/scap] - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/888 (https://phabricator.wikimedia.org/T295925 https://phabricator.wikimedia.org/T398873)
[19:09:12] <wikibugs>	 (03approved) 10dduvall: build-images.py: Don't sleep after full build in train-dev [repos/releng/release] - 10https://gitlab.wikimedia.org/repos/releng/release/-/merge_requests/189 (https://phabricator.wikimedia.org/T390251) (owner: 10dancy)
[19:12:14] <wikibugs>	 (03open) 10dancy: spiderpig.py: Remove leftover debugging [repos/releng/scap] - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/889
[19:12:17] <wikibugs>	 (03update) 10dancy: spiderpig.py: Remove leftover debugging [repos/releng/scap] - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/889
[19:12:38] <wikibugs>	 (03approved) 10dduvall: utils.py: Make select_latest_patches treat /srv/patches/next as latest [repos/releng/scap] - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/888 (https://phabricator.wikimedia.org/T295925 https://phabricator.wikimedia.org/T398873) (owner: 10dancy)
[19:12:53] <wikibugs>	 (03merge) 10dancy: build-images.py: Don't sleep after full build in train-dev [repos/releng/release] - 10https://gitlab.wikimedia.org/repos/releng/release/-/merge_requests/189 (https://phabricator.wikimedia.org/T390251)
[19:13:28] <wikibugs>	 (03merge) 10dancy: spiderpig.py: Remove leftover debugging [repos/releng/scap] - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/889
[19:13:55] <wikibugs>	 (03update) 10dancy: utils.py: Make select_latest_patches treat /srv/patches/next as latest [repos/releng/scap] - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/888 (https://phabricator.wikimedia.org/T295925 https://phabricator.wikimedia.org/T398873)
[19:59:14] <wikibugs>	 10Gerrit: Unable to push commit whose parent is not the latest commit for the parent patch - https://phabricator.wikimedia.org/T399241#10996513 (10Daimona) Would that mean this was caused by a gerrit upgrade? The last one was in T390666 a few months ago AFAICT, but I've only had this issue for the last few days....
[20:41:45] <wikibugs>	 10Beta-Cluster-Infrastructure: 2025-07-11 traffic overload - https://phabricator.wikimedia.org/T399329#10996571 (10bd808) 05Resolved→03In progress Well that period of 1-2 load average didn't last long. :/ {F63964490, size=full}
[20:54:07] <wikibugs>	 10Beta-Cluster-Infrastructure: 2025-07-11 traffic overload - https://phabricator.wikimedia.org/T399329#10996606 (10bd808) `lang=shell-session root@deployment-mediawiki14:~# ./big-ban-hammer.sh     - 5.0.0.0/8         # 1142 hits     - 13.0.0.0/8        # 1724 hits     - 17.0.0.0/8        # 2032 hits     - 27.0.0...
[20:55:04] <bd808>	 !log blocked even more wide IP ranges in an attempt to get the load on deployment-mediawiki14 consistently below 3. (T399329)
[20:55:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[20:55:07] <stashbot>	 T399329: 2025-07-11 traffic overload - https://phabricator.wikimedia.org/T399329
[20:55:59] <bd808>	 "Error: 403, Requests from your IP have been blocked" -- heh. I managed to block myself. :)
[20:57:58] <wikibugs>	 10Beta-Cluster-Infrastructure: 2025-07-11 traffic overload - https://phabricator.wikimedia.org/T399329#10996624 (10bd808) https://gerrit.wikimedia.org/r/plugins/gitiles/cloud/instance-puppet/+/ff29b596475e04e2647d6c9ed4468c704f30ff0d%5E%21/#F0 ` diff --git a/deployment-prep/_.yaml b/deployment-prep/_.yaml index...
[20:59:32] <bd808>	 !log Rebooted deployment-mediawiki14 to clear active load (T399329)
[20:59:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[21:10:57] <wikibugs>	 10Beta-Cluster-Infrastructure: 2025-07-11 traffic overload - https://phabricator.wikimedia.org/T399329#10996643 (10bd808)
[21:10:58] <wikibugs>	 10Beta-Cluster-Infrastructure: Beta cluster down 2025-07-11 - https://phabricator.wikimedia.org/T399303#10996646 (10bd808) →14Duplicate dup:03T399329
[21:12:03] <wikibugs>	 10Beta-Cluster-Infrastructure: Widespread instances down in project deployment-prep - https://phabricator.wikimedia.org/T399297#10996651 (10bd808) →14Duplicate dup:03T399281
[21:13:43] <wikibugs>	 10Beta-Cluster-Infrastructure: No Puppet resources found on instance deployment-urldownloader04 on project deployment-prep - https://phabricator.wikimedia.org/T399216#10996653 (10bd808) `COUNTEREXAMPLE Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Evaluation Error: Erro...
[21:17:04] <wikibugs>	 10Beta-Cluster-Infrastructure: No Puppet resources found on instance deployment-urldownloader04 on project deployment-prep - https://phabricator.wikimedia.org/T399216#10996654 (10bd808) Caused by https://gerrit.wikimedia.org/r/c/operations/puppet/+/1164432. https://gerrit.wikimedia.org/r/c/labs/private/+/1155221...
[21:21:55] <wikibugs>	 10Beta-Cluster-Infrastructure: No Puppet resources found on instance deployment-urldownloader04 on project deployment-prep - https://phabricator.wikimedia.org/T399216#10996660 (10bd808) https://gerrit.wikimedia.org/r/plugins/gitiles/cloud/instance-puppet/+/fac41347f20004b55c8e8f05d435fc0ec8284612%5E%21/#F0 ` dif...
[21:22:30] <wikibugs>	 10Beta-Cluster-Infrastructure: No Puppet resources found on instance deployment-urldownloader04 on project deployment-prep - https://phabricator.wikimedia.org/T399216#10996661 (10bd808) `counterexample Jul 11 21:21:28 deployment-urldownloader04 systemd[1]: Starting nginx.service - A high performance web server a...
[21:27:37] <wikibugs>	 10Beta-Cluster-Infrastructure: No Puppet resources found on instance deployment-urldownloader04 on project deployment-prep - https://phabricator.wikimedia.org/T399216#10996664 (10bd808) https://gerrit.wikimedia.org/r/plugins/gitiles/cloud/instance-puppet/+/bb3cfaba98865308262c9755f7d80da72f793072%5E%21/#F0 ` dif...
[21:27:47] <wikibugs>	 10Beta-Cluster-Infrastructure: No Puppet resources found on instance deployment-urldownloader04 on project deployment-prep - https://phabricator.wikimedia.org/T399216#10996665 (10bd808)
[21:27:48] <wikibugs>	 10Beta-Cluster-Infrastructure: Beta cluster down 2025-07-11 - https://phabricator.wikimedia.org/T399303#10996666 (10bd808)
[21:28:28] <wmcs-alerts>	 RESOLVED: [8x] PuppetAgentNoResources: No Puppet resources found on instance deployment-docker-mathoid02 on project deployment-prep   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[21:29:34] <wikibugs>	 10Beta-Cluster-Infrastructure: No Puppet resources found on instance deployment-urldownloader04 on project deployment-prep - https://phabricator.wikimedia.org/T399216#10996668 (10bd808) 05Open→03Resolved a:03bd808
[21:32:31] <wikibugs>	 10Beta-Cluster-Infrastructure: Found non-revoked Puppet certificates for 1 deleted instances on deployment-puppetserver-1 - https://phabricator.wikimedia.org/T399307#10996673 (10bd808) 05Open→03Resolved a:03bd808 `lang=shell-session bd808@mbp03:~$ ssh deployment-puppetserver-1.deployment-prep.eqiad1.wi...
[21:37:28] <wmcs-alerts>	 RESOLVED: PuppetStaleCertificates: Found non-revoked Puppet certificates for 1 deleted instances on deployment-puppetserver-1 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates
[21:37:36] <wikibugs>	 10Beta-Cluster-Infrastructure: Found non-revoked Puppet certificates for 1 deleted instances on deployment-puppetserver-1 - https://phabricator.wikimedia.org/T399342 (10wmcs-alerts) 03NEW
[21:41:49] <bd808>	 !log Unblock 37.114.160.0/19 (T399236)
[21:41:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[21:46:57] <wikibugs>	 10Beta-Cluster-Infrastructure: Found non-revoked Puppet certificates for 1 deleted instances on deployment-puppetserver-1 - https://phabricator.wikimedia.org/T399307#10996710 (10bd808)
[21:46:58] <wikibugs>	 10Beta-Cluster-Infrastructure: Found non-revoked Puppet certificates for 1 deleted instances on deployment-puppetserver-1 - https://phabricator.wikimedia.org/T399342#10996712 (10bd808) →14Duplicate dup:03T399307
[21:58:51] <wikibugs>	 10Beta-Cluster-Infrastructure, 10MW-1.45-notes (1.45.0-wmf.9; 2025-07-08), 13Patch-For-Review: Move *.beta.wmflabs.org to *.beta.wmcloud.org - https://phabricator.wikimedia.org/T289318#10996739 (10bd808) >>! In T289318#10995040, @doctaxon wrote: > But https://en.wikipedia.beta.wmcloud.org/wiki/Special:Blankp...
[22:00:55] <wikibugs>	 10GitLab (CI & Job Runners), 13Patch-For-Review: [kokuri] Use a unique per CI run tag by default - https://phabricator.wikimedia.org/T399120#10996742 (10dduvall) @bd808 Kokkuri 2.8.0 will include the digest in the image ref. See if that solves your issue.
[22:07:57] <wikibugs>	 10GitLab (CI & Job Runners), 10Release-Engineering-Team (Doing 😎), 13Patch-For-Review: [kokuri] Use a unique per CI run tag by default - https://phabricator.wikimedia.org/T399120#10996746 (10bd808) p:05Triage→03Medium a:03dduvall I updated https://gitlab.wikimedia.org/repos/releng/zuul/tofu-provisionin...
[22:09:43] <wikibugs>	 (03CR) 10Arlolra: [C:03+2] Hide footers since we now have different footer text for Parsoid [integration/visualdiff] - 10https://gerrit.wikimedia.org/r/1168230 (owner: 10Subramanya Sastry)
[22:10:55] <wikibugs>	 (03CR) 10Arlolra: [C:03+2] Add prefixes for more wikis [integration/visualdiff] - 10https://gerrit.wikimedia.org/r/1168229 (owner: 10Subramanya Sastry)
[22:11:28] <wikibugs>	 (03Merged) 10jenkins-bot: Add prefixes for more wikis [integration/visualdiff] - 10https://gerrit.wikimedia.org/r/1168229 (owner: 10Subramanya Sastry)
[22:11:29] <wikibugs>	 (03Merged) 10jenkins-bot: Hide footers since we now have different footer text for Parsoid [integration/visualdiff] - 10https://gerrit.wikimedia.org/r/1168230 (owner: 10Subramanya Sastry)
[22:17:55] <wikibugs>	 (03open) 10dduvall: spiderpig: Annotate optional fields in TrainStatus [repos/releng/scap] - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/890
[22:17:56] <wikibugs>	 (03update) 10dduvall: spiderpig: Annotate optional fields in TrainStatus [repos/releng/scap] - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/890
[22:17:59] <wikibugs>	 (03update) 10dduvall: spiderpig: Annotate optional fields in TrainStatus [repos/releng/scap] - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/890
[22:29:57] <wikibugs>	 (03open) 10dduvall: spiderpig: Display errors and warnings in their own row [repos/releng/scap] - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/891
[22:30:14] <wikibugs>	 (03update) 10dduvall: spiderpig: Display errors and warnings in their own row [repos/releng/scap] - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/891
[22:30:16] <wikibugs>	 (03update) 10dduvall: spiderpig: Display errors and warnings in their own row [repos/releng/scap] - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/891
[22:39:23] <wikibugs>	 (03close) 10bd808: Revert "ci: Switch back to Digital Ocean runners by default" [repos/releng/zuul/tofu-provisioning] - 10https://gitlab.wikimedia.org/repos/releng/zuul/tofu-provisioning/-/merge_requests/28
[23:07:13] <wmf-insecte>	 Project beta-scap-sync-world build #214770: 04FAILURE in 2 min 0 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/214770/
[23:12:04] <wmf-insecte>	 Yippee, build fixed!
[23:12:04] <wmf-insecte>	 Project beta-scap-sync-world build #214771: 09FIXED in 2 min 16 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/214771/