[00:09:58] (03open) 10bd808: bastion: Apply profile::kubernetes::client [repos/releng/zuul/tofu-provisioning] - 10https://gitlab.wikimedia.org/repos/releng/zuul/tofu-provisioning/-/merge_requests/26 [00:12:21] (03merge) 10bd808: bastion: Apply profile::kubernetes::client [repos/releng/zuul/tofu-provisioning] - 10https://gitlab.wikimedia.org/repos/releng/zuul/tofu-provisioning/-/merge_requests/26 [00:26:35] (03open) 10bd808: bastion: Match expected YAML formatting [repos/releng/zuul/tofu-provisioning] - 10https://gitlab.wikimedia.org/repos/releng/zuul/tofu-provisioning/-/merge_requests/27 (https://phabricator.wikimedia.org/T398643) [00:27:24] (03merge) 10bd808: bastion: Match expected YAML formatting [repos/releng/zuul/tofu-provisioning] - 10https://gitlab.wikimedia.org/repos/releng/zuul/tofu-provisioning/-/merge_requests/27 (https://phabricator.wikimedia.org/T398643) [00:42:05] (03open) 10bd808: Revert "ci: Switch back to Digital Ocean runners by default" [repos/releng/zuul/tofu-provisioning] - 10https://gitlab.wikimedia.org/repos/releng/zuul/tofu-provisioning/-/merge_requests/28 [01:51:04] [wikitech-l] Beta Cluster now lives on beta.wmcloud.org. https://lists.wikimedia.org/hyperkitty/list/wikitech-l@lists.wikimedia.org/thread/YDABPV75LADRQCXMJAFWUP256N4EQ25B/ [02:06:47] 10Beta-Cluster-Infrastructure, 10MW-1.45-notes (1.45.0-wmf.9; 2025-07-08), 13Patch-For-Review: Move *.beta.wmflabs.org to *.beta.wmcloud.org - https://phabricator.wikimedia.org/T289318#10993916 (10Krinkle) [02:26:50] 10GitLab (Infrastructure), 06Release-Engineering-Team, 06collaboration-services: Upgrade GitLab to major version 18 - https://phabricator.wikimedia.org/T394382#10993948 (10Dzahn) >>! In T394382#10988305, @Jelto wrote: > I'll resolve the task, please re-open if there are any issues. We got this one: T399... 
[02:49:04] 10GitLab (Auth & Access), 06Release-Engineering-Team, 06collaboration-services, 13Patch-For-Review: Create bot to sync LDAP groups with related GitLab groups - https://phabricator.wikimedia.org/T319211#10993969 (10Dzahn) This bot currently has a problem talking to gitlab (T394382#10993948 T399250). It... [02:52:02] 10GitLab, 06collaboration-services: SystemdUnitFailed - sync-gitlab-group-with-ldap.service on gitlab2002:9100 - https://phabricator.wikimedia.org/T399250#10993972 (10Dzahn) [05:15:55] (03PS2) 10Hashar: Zuul, jjb: [mediawiki/services/poolcounter] rm tox-poolcounter job [integration/config] - 10https://gerrit.wikimedia.org/r/1167649 (https://phabricator.wikimedia.org/T399075) [05:16:31] (03PS2) 10Hashar: dockerfiles: remove tox-poolcounter [integration/config] - 10https://gerrit.wikimedia.org/r/1167650 (https://phabricator.wikimedia.org/T399075) [05:20:25] (03CR) 10Hashar: [C:03+2] Zuul, jjb: [mediawiki/services/poolcounter] rm tox-poolcounter job [integration/config] - 10https://gerrit.wikimedia.org/r/1167649 (https://phabricator.wikimedia.org/T399075) (owner: 10Hashar) [05:21:49] (03Merged) 10jenkins-bot: Zuul, jjb: [mediawiki/services/poolcounter] rm tox-poolcounter job [integration/config] - 10https://gerrit.wikimedia.org/r/1167649 (https://phabricator.wikimedia.org/T399075) (owner: 10Hashar) [05:49:08] 10Gerrit: Unable to push commit whose parent is not the latest commit for the parent patch - https://phabricator.wikimedia.org/T399241#10994084 (10hashar) The message comes from `java/com/google/gerrit/server/git/receive/ReceiveCommits.java`, in `stable-3.10`: ` lang=java } else if (revisions.containsKey... 
[05:52:15] (03CR) 10Hashar: [C:03+2] dockerfiles: remove tox-poolcounter [integration/config] - 10https://gerrit.wikimedia.org/r/1167650 (https://phabricator.wikimedia.org/T399075) (owner: 10Hashar) [05:53:35] (03Merged) 10jenkins-bot: dockerfiles: remove tox-poolcounter [integration/config] - 10https://gerrit.wikimedia.org/r/1167650 (https://phabricator.wikimedia.org/T399075) (owner: 10Hashar) [05:55:43] 10Continuous-Integration-Config, 06MediaWiki-Platform-Team, 10PoolCounter, 13Patch-For-Review, 10Release Pipeline (Blubber): Migrate poolcounter to Blubber - https://phabricator.wikimedia.org/T399075#10994089 (10hashar) 05Open→03Resolved a:03hashar [05:56:08] 10Continuous-Integration-Config, 10Continuous-Integration-Infrastructure: Migrate all CI jobs from buster to bullseye or later and drop buster testing support - https://phabricator.wikimedia.org/T335765#10994093 (10hashar) [06:28:11] 10GitLab, 06collaboration-services: SystemdUnitFailed - sync-gitlab-group-with-ldap.service on gitlab2002:9100 - https://phabricator.wikimedia.org/T399250#10994122 (10Jelto) [06:42:20] 10GitLab, 06collaboration-services: SystemdUnitFailed - sync-gitlab-group-with-ldap.service on gitlab2002:9100 - https://phabricator.wikimedia.org/T399250#10994134 (10Jelto) >>! In T399250#10993963, @Dzahn wrote: > https://gitlab.wikimedia.org/admin/users/ldap-sync-bot/impersonation_tokens says `This user has... [06:55:46] 10GitLab, 06collaboration-services: SystemdUnitFailed - sync-gitlab-group-with-ldap.service on gitlab2002:9100 - https://phabricator.wikimedia.org/T399250#10994142 (10Jelto) 05Open→03Resolved p:05Triage→03Medium a:03Jelto I rotated the token and triggered the `sync-gitlab-group-with-ldap.service`... [07:18:42] 10Diffusion, 10Phabricator, 06collaboration-services: Drop our mirroring of code to Diffusion and empty the repos - https://phabricator.wikimedia.org/T359549#10994160 (10A_smart_kitten) >>! 
In T359549#10993837, @Dzahn wrote: > Are there any vetoes / hard concerns about making it visible only to logged in use... [07:19:28] FIRING: WidespreadInstanceDown: Widespread instances down in project deployment-prep - https://prometheus-alerts.wmcloud.org/?q=alertname%3DWidespreadInstanceDown [07:19:35] FIRING: [7x] InstanceDown: Project deployment-prep instance deployment-acme-chief05 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [07:19:39] 10Beta-Cluster-Infrastructure: Widespread instances down in project deployment-prep - https://phabricator.wikimedia.org/T399261 (10wmcs-alerts) 03NEW [07:24:32] RESOLVED: WidespreadInstanceDown: Widespread instances down in project deployment-prep - https://prometheus-alerts.wmcloud.org/?q=alertname%3DWidespreadInstanceDown [07:24:33] 10Beta-Cluster-Infrastructure: Widespread instances down in project deployment-prep - https://phabricator.wikimedia.org/T399261#10994185 (10wmcs-alerts) [07:24:35] RESOLVED: [7x] InstanceDown: Project deployment-prep instance deployment-acme-chief05 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [08:14:28] RESOLVED: PuppetAgentNoResources: No Puppet resources found on instance deployment-urldownloader04 on project deployment-prep - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [08:18:05] FIRING: [2x] InstanceDown: Project deployment-prep instance deployment-mediawiki13 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [08:18:28] FIRING: WidespreadInstanceDown: Widespread instances down in project deployment-prep - https://prometheus-alerts.wmcloud.org/?q=alertname%3DWidespreadInstanceDown [08:21:28] RESOLVED: [18x] InstanceDown: Project deployment-prep instance deployment-acme-chief05 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [08:23:32] RESOLVED: WidespreadInstanceDown: Widespread instances down in project deployment-prep - 
https://prometheus-alerts.wmcloud.org/?q=alertname%3DWidespreadInstanceDown [08:23:35] 10Beta-Cluster-Infrastructure: Widespread instances down in project deployment-prep - https://phabricator.wikimedia.org/T399261#10994329 (10wmcs-alerts) [08:55:39] 10Continuous-Integration-Infrastructure (Zuul upgrade), 10Release-Engineering-Team (Priority Backlog 📥): Plan for porting PipelineLib to Zuul Ansible - https://phabricator.wikimedia.org/T390119#10994511 (10hashar) [08:55:52] 10Continuous-Integration-Infrastructure (Zuul upgrade), 10Release-Engineering-Team (Priority Backlog 📥): Plan for porting PipelineLib to Zuul Ansible - https://phabricator.wikimedia.org/T390119#10994514 (10hashar) [09:03:01] (03PS1) 10Hashar: Archive wikibase/release-prototype [integration/config] - 10https://gerrit.wikimedia.org/r/1168126 (https://phabricator.wikimedia.org/T399279) [09:03:26] (03CR) 10Hashar: [C:03+2] Archive wikibase/release-prototype [integration/config] - 10https://gerrit.wikimedia.org/r/1168126 (https://phabricator.wikimedia.org/T399279) (owner: 10Hashar) [09:04:28] FIRING: PuppetAgentNoResources: No Puppet resources found on instance deployment-urldownloader04 on project deployment-prep - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [09:04:34] 10Beta-Cluster-Infrastructure: No Puppet resources found on instance deployment-urldownloader04 on project deployment-prep - https://phabricator.wikimedia.org/T399216#10994544 (10wmcs-alerts) [09:04:53] (03Merged) 10jenkins-bot: Archive wikibase/release-prototype [integration/config] - 10https://gerrit.wikimedia.org/r/1168126 (https://phabricator.wikimedia.org/T399279) (owner: 10Hashar) [09:09:38] 10Beta-Cluster-Infrastructure: Widespread instances down in project deployment-prep - https://phabricator.wikimedia.org/T399261#10994566 (10fnegri) [09:10:25] 10Beta-Cluster-Infrastructure: Widespread instances down in project deployment-prep - https://phabricator.wikimedia.org/T399261#10994570 (10fnegri) 
Possibly related to {T399281} [09:12:45] 10Beta-Cluster-Infrastructure: Widespread instances down in project deployment-prep - https://phabricator.wikimedia.org/T399261#10994574 (10fnegri) 05Open→03Resolved a:03fnegri This is now resolved, but there were two separate spikes of unreachable instances: {F63886612} [09:18:52] (03PS1) 10Hashar: dockerfiles: create tox-v3-mysqld [integration/config] - 10https://gerrit.wikimedia.org/r/1168128 (https://phabricator.wikimedia.org/T335765) [09:24:09] 10Diffusion, 10Phabricator, 06collaboration-services: Drop our mirroring of code to Diffusion and empty the repos - https://phabricator.wikimedia.org/T359549#10994611 (10Aklapper) >>! In T359549#10146373, @valerio.bozzolan wrote: > I put my details directly in the original description. I'd appreciate commen... [09:30:25] 10Diffusion, 10Phabricator, 06collaboration-services: Drop our mirroring of code to Diffusion and empty the repos - https://phabricator.wikimedia.org/T359549#10994639 (10Aklapper) >>! In T359549#10895207, @Jelto wrote: > A reason against diffusion is the poor performance. Everything above a few dozen RPS ove... [09:34:25] 10Continuous-Integration-Infrastructure (Zuul upgrade), 10Quibble: Remove reliance on EXECUTOR_NUMBER environment variable in CI - https://phabricator.wikimedia.org/T399283 (10hashar) 03NEW [09:36:52] 10Continuous-Integration-Infrastructure (Zuul upgrade), 06Fundraising-Backlog, 10Quibble: Remove reliance on EXECUTOR_NUMBER environment variable in CI - https://phabricator.wikimedia.org/T399283#10994671 (10hashar) wikimedia/fundraising/tools has: ` lang=python,name=silverpop_export/tests/test_update.py def... 
[09:39:28] RESOLVED: PuppetAgentNoResources: No Puppet resources found on instance deployment-urldownloader04 on project deployment-prep - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [09:39:36] 10Beta-Cluster-Infrastructure: No Puppet resources found on instance deployment-urldownloader04 on project deployment-prep - https://phabricator.wikimedia.org/T399216#10994680 (10wmcs-alerts) [10:17:34] (03PS2) 10Hashar: dockerfiles: create tox-v3-mysqld [integration/config] - 10https://gerrit.wikimedia.org/r/1168128 (https://phabricator.wikimedia.org/T335765) [10:24:17] (03CR) 10CI reject: [V:04-1] dockerfiles: create tox-v3-mysqld [integration/config] - 10https://gerrit.wikimedia.org/r/1168128 (https://phabricator.wikimedia.org/T335765) (owner: 10Hashar) [10:24:28] FIRING: PuppetAgentNoResources: No Puppet resources found on instance deployment-urldownloader04 on project deployment-prep - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [10:24:39] 10Beta-Cluster-Infrastructure: No Puppet resources found on instance deployment-urldownloader04 on project deployment-prep - https://phabricator.wikimedia.org/T399216#10994790 (10wmcs-alerts) [10:29:28] RESOLVED: PuppetAgentNoResources: No Puppet resources found on instance deployment-urldownloader04 on project deployment-prep - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [10:29:35] 10Beta-Cluster-Infrastructure: No Puppet resources found on instance deployment-urldownloader04 on project deployment-prep - https://phabricator.wikimedia.org/T399216#10994806 (10wmcs-alerts) [11:13:28] FIRING: PuppetAgentNoResources: No Puppet resources found on instance deployment-urldownloader04 on project deployment-prep - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [11:13:40] 10Beta-Cluster-Infrastructure: No Puppet resources found on instance deployment-urldownloader04 on project deployment-prep - 
https://phabricator.wikimedia.org/T399216#10994986 (10wmcs-alerts) [11:30:05] maintenance-disconnect-full-disks build 718244 integration-agent-docker-1041 (/: 26%, /srv: 95%, /var/lib/docker: 46%): OFFLINE due to disk space [11:40:32] FIRING: [7x] InstanceDown: Project deployment-prep instance deployment-cirrussearch12 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [11:41:28] FIRING: WidespreadInstanceDown: Widespread instances down in project deployment-prep - https://prometheus-alerts.wmcloud.org/?q=alertname%3DWidespreadInstanceDown [11:45:09] maintenance-disconnect-full-disks build 718247 integration-agent-docker-1041 (/: 26%, /srv: 35%, /var/lib/docker: 46%): RECOVERY disk space OK [11:45:28] RESOLVED: [14x] InstanceDown: Project deployment-prep instance deployment-acme-chief05 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [11:46:28] RESOLVED: WidespreadInstanceDown: Widespread instances down in project deployment-prep - https://prometheus-alerts.wmcloud.org/?q=alertname%3DWidespreadInstanceDown [11:46:35] 10Beta-Cluster-Infrastructure: Widespread instances down in project deployment-prep - https://phabricator.wikimedia.org/T399297#10995075 (10wmcs-alerts) [11:48:57] (03CR) 10Hashar: "recheck" [integration/config] - 10https://gerrit.wikimedia.org/r/1168128 (https://phabricator.wikimedia.org/T335765) (owner: 10Hashar) [11:53:47] (03CR) 10Hashar: [C:03+2] dockerfiles: create tox-v3-mysqld [integration/config] - 10https://gerrit.wikimedia.org/r/1168128 (https://phabricator.wikimedia.org/T335765) (owner: 10Hashar) [11:55:31] (03Merged) 10jenkins-bot: dockerfiles: create tox-v3-mysqld [integration/config] - 10https://gerrit.wikimedia.org/r/1168128 (https://phabricator.wikimedia.org/T335765) (owner: 10Hashar) [12:11:34] (03CR) 10Hashar: [C:03+2] "I have built the image right now as part of building another image." 
[integration/config] - 10https://gerrit.wikimedia.org/r/1167537 (https://phabricator.wikimedia.org/T335765) (owner: 10Hashar) [12:18:32] FIRING: [6x] InstanceDown: Project deployment-prep instance deployment-cache-text08 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [12:19:28] FIRING: WidespreadInstanceDown: Widespread instances down in project deployment-prep - https://prometheus-alerts.wmcloud.org/?q=alertname%3DWidespreadInstanceDown [12:23:28] RESOLVED: [12x] InstanceDown: Project deployment-prep instance deployment-cache-text08 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [12:24:28] RESOLVED: WidespreadInstanceDown: Widespread instances down in project deployment-prep - https://prometheus-alerts.wmcloud.org/?q=alertname%3DWidespreadInstanceDown [12:24:37] 10Beta-Cluster-Infrastructure: Widespread instances down in project deployment-prep - https://phabricator.wikimedia.org/T399297#10995157 (10wmcs-alerts) [12:42:24] (03PS2) 10Hashar: jjb: change tox-mysqld jobs to Bookworm and tox v3 [integration/config] - 10https://gerrit.wikimedia.org/r/1168168 (https://phabricator.wikimedia.org/T335765) [12:42:24] (03PS1) 10Hashar: dockerfiles: add pkg-config to tox-v3-mysqld [integration/config] - 10https://gerrit.wikimedia.org/r/1168170 [12:44:23] (03CR) 10Hashar: [C:03+2] dockerfiles: add pkg-config to tox-v3-mysqld [integration/config] - 10https://gerrit.wikimedia.org/r/1168170 (owner: 10Hashar) [12:45:45] (03Merged) 10jenkins-bot: dockerfiles: add pkg-config to tox-v3-mysqld [integration/config] - 10https://gerrit.wikimedia.org/r/1168170 (owner: 10Hashar) [12:47:58] re hello, is T399297 the task to keep an eye on regarding the beta sites being down? 
[12:47:59] T399297: Widespread instances down in project deployment-prep - https://phabricator.wikimedia.org/T399297 [12:48:08] (moving to this channel which IIUC is more appropriate for beta stuff than -operations) [12:48:18] there’s been some activity in T399281 but it’s not clear to me if that would affect the beta cluster as well or only toolforge [12:48:18] T399281: 2025-07-11 Toolforge tools not responding - https://phabricator.wikimedia.org/T399281 [12:48:55] it feels like we could use a general “beta down” style task to attach tasks like T399261, T399297, maybe T399216 to [12:48:55] T399261: Widespread instances down in project deployment-prep - https://phabricator.wikimedia.org/T399261 [12:48:56] T399216: No Puppet resources found on instance deployment-urldownloader04 on project deployment-prep - https://phabricator.wikimedia.org/T399216 [12:54:19] Lucas_WMDE: thanks! I agree, a general task would make the situation a bit clearer [12:57:29] 10Continuous-Integration-Infrastructure (Zuul upgrade): Provision user in Cloud VPS hosted Kubernetes cluster for use by nodepool - https://phabricator.wikimedia.org/T398367#10995240 (10jnuche) a:03jnuche [12:57:32] 10Continuous-Integration-Infrastructure (Zuul upgrade): Setup RBAC for Cloud VPS hosted kubernetes cluster - https://phabricator.wikimedia.org/T398365#10995241 (10jnuche) a:03jnuche [13:05:30] (03CR) 10Hashar: [C:03+2] jjb: change tox-mysqld jobs to Bookworm and tox v3 [integration/config] - 10https://gerrit.wikimedia.org/r/1168168 (https://phabricator.wikimedia.org/T335765) (owner: 10Hashar) [13:07:06] (03Merged) 10jenkins-bot: jjb: change tox-mysqld jobs to Bookworm and tox v3 [integration/config] - 10https://gerrit.wikimedia.org/r/1168168 (https://phabricator.wikimedia.org/T335765) (owner: 10Hashar) [13:07:46] (03PS1) 10Hashar: dockerfiles: remove tox-mysqld [integration/config] - 10https://gerrit.wikimedia.org/r/1168175 (https://phabricator.wikimedia.org/T335765) [13:09:44] 
10Beta-Cluster-Infrastructure: Beta cluster down 2025-07-11 - https://phabricator.wikimedia.org/T399303 (10Lucas_Werkmeister_WMDE) 03NEW [13:10:04] created ^ [13:10:05] 10Beta-Cluster-Infrastructure: Beta cluster down 2025-07-11 - https://phabricator.wikimedia.org/T399303#10995331 (10Lucas_Werkmeister_WMDE) [13:10:08] 10Beta-Cluster-Infrastructure: Widespread instances down in project deployment-prep - https://phabricator.wikimedia.org/T399261#10995333 (10Lucas_Werkmeister_WMDE) [13:10:09] 10Beta-Cluster-Infrastructure: No Puppet resources found on instance deployment-urldownloader04 on project deployment-prep - https://phabricator.wikimedia.org/T399216#10995334 (10Lucas_Werkmeister_WMDE) [13:12:57] 10Beta-Cluster-Infrastructure: Beta cluster down 2025-07-11 - https://phabricator.wikimedia.org/T399303#10995344 (10Lucas_Werkmeister_WMDE) The [beta-code-update-eqiad](https://integration.wikimedia.org/ci/view/Beta/job/beta-code-update-eqiad/), [beta-scap-sync-world](https://integration.wikimedia.org/ci/view/Be... [13:17:44] 10Continuous-Integration-Config, 10Continuous-Integration-Infrastructure, 13Patch-For-Review: Migrate all CI jobs from buster to bullseye or later and drop buster testing support - https://phabricator.wikimedia.org/T335765#10995358 (10hashar) [13:20:20] 10Continuous-Integration-Config, 10Continuous-Integration-Infrastructure, 13Patch-For-Review: Migrate all CI jobs from buster to bullseye or later and drop buster testing support - https://phabricator.wikimedia.org/T335765#10995377 (10hashar) We can't remove tox-java8 which used by Cergen. That is still used... 
[13:20:54] 10Continuous-Integration-Config, 10Continuous-Integration-Infrastructure, 13Patch-For-Review: Migrate all CI jobs from buster to bullseye or later and drop buster testing support - https://phabricator.wikimedia.org/T335765#10995382 (10hashar) [13:21:48] 10Continuous-Integration-Config, 10Continuous-Integration-Infrastructure, 13Patch-For-Review: Migrate all CI jobs from buster to bullseye or later and drop buster testing support - https://phabricator.wikimedia.org/T335765#10995385 (10hashar) [13:22:14] (03CR) 10Hashar: [C:03+2] dockerfiles: remove tox-mysqld [integration/config] - 10https://gerrit.wikimedia.org/r/1168175 (https://phabricator.wikimedia.org/T335765) (owner: 10Hashar) [13:23:47] (03Merged) 10jenkins-bot: dockerfiles: remove tox-mysqld [integration/config] - 10https://gerrit.wikimedia.org/r/1168175 (https://phabricator.wikimedia.org/T335765) (owner: 10Hashar) [13:24:10] So just zuul-cloner and then waiting for Cergen? Nice! [13:24:36] yeah [13:24:48] zuul-cloner is no more needed with the new Zuul [13:24:56] But still needed until we migrate. [13:25:03] When is that happening? Years? [13:25:06] cause that is Zuul pushing the prepared repositories to the worker nodes [13:25:15] so essentially [13:25:22] Buster is gone now [13:25:32] Now, do bullseye. [13:25:53] I so want us to migrate to RedHat or Oracle Linux with 5 / 10 years services [13:26:02] so the next upgrade would be 2035 :] [13:26:21] I am going to look at aggregating images to get a few less [13:26:23] And being able to test PHP 8.4 would come in ~2035. [13:26:28] na [13:26:31] well [13:26:48] we could well compile php8.4 against a different OS version [13:27:05] well that is pretty much what we do for prod I think [13:27:16] though how it is done without CI is an entire mystery to me [13:27:32] Yes, but then we'd be avoiding the whole point of paying RedHat/etc. to do the OS/package management for us. 
[13:28:03] and that makes me wonder why Oracle does not sell an OS [13:28:18] (you'd have a licence that makes you pay per keystroke) [13:28:21] i have my Solaris CDs (not DVDs!) somewhere. [13:28:26] or the amount of time you stare at the screen [13:29:25] I think next week I will work on making MediaWiki PHPUnit not run every single test [13:30:01] oh and jsduck will be gone :) [13:30:53] or maybe I look at T397429 [13:30:53] T397429: Reduce the number of CI images - https://phabricator.wikimedia.org/T397429 [13:30:58] to aggregate similar images [13:31:12] For jsduck we're waiting for REL1_39 EOL, I think? [13:33:22] depends on whether we move the Quibble bullseye image to bookworm :) [13:33:40] That's blocked on SRE packaging PHP 8.1 for bookworm. [13:33:58] then I have seen a patch to jump us to php 8.3 [13:34:02] T397075 -> T362705 [13:34:03] T397075: Package Wikimedia's PHP 8.1 component for bookworm - https://phabricator.wikimedia.org/T397075 [13:34:03] T362705: Migrate Quibble images from bullseye to bookworm - https://phabricator.wikimedia.org/T362705 [13:34:29] that is what I was wondering during the Product & Tech presentation about MediaWiki + php upgrade [13:34:45] it seems like MediaWiki team is owning the process and I guess they have a roadmap somewhere [13:34:51] and if not, they should [13:34:52] 10Beta-Cluster-Infrastructure: Beta cluster down 2025-07-11 - https://phabricator.wikimedia.org/T399303#10995434 (10Lucas_Werkmeister_WMDE) [13:34:57] and we can then be kept informed about the process [13:34:59] but [13:35:11] 10Beta-Cluster-Infrastructure: Beta cluster down 2025-07-11 - https://phabricator.wikimedia.org/T399303#10995437 (10Lucas_Werkmeister_WMDE) [13:35:12] then I can just rely on James :] [13:35:40] * James_F grins. 
[13:36:18] in the ideal world the road map would be presented every month with the progress [13:36:28] but well hmm [13:36:32] different company :] [13:40:44] 10GitLab (Infrastructure), 06collaboration-services: upgrade gitlab hosts to bookworm - https://phabricator.wikimedia.org/T399306 (10Jelto) 03NEW [14:07:28] FIRING: PuppetStaleCertificates: Found non-revoked Puppet certificates for 1 deleted instances on deployment-puppetserver-1 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates [14:11:28] FIRING: [4x] InstanceDown: Project deployment-prep instance deployment-kafka-logging01 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [14:13:11] 10Beta-Cluster-Infrastructure: Found non-revoked Puppet certificates for 1 deleted instances on deployment-puppetserver-1 - https://phabricator.wikimedia.org/T399307 (10wmcs-alerts) 03NEW [14:13:28] FIRING: WidespreadInstanceDown: Widespread instances down in project deployment-prep - https://prometheus-alerts.wmcloud.org/?q=alertname%3DWidespreadInstanceDown [14:16:28] FIRING: [10x] InstanceDown: Project deployment-prep instance deployment-cirrussearch13 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [14:18:28] RESOLVED: WidespreadInstanceDown: Widespread instances down in project deployment-prep - https://prometheus-alerts.wmcloud.org/?q=alertname%3DWidespreadInstanceDown [14:21:28] FIRING: [13x] InstanceDown: Project deployment-prep instance deployment-acme-chief05 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [14:23:28] FIRING: WidespreadInstanceDown: Widespread instances down in project deployment-prep - https://prometheus-alerts.wmcloud.org/?q=alertname%3DWidespreadInstanceDown [14:23:28] FIRING: [2x] PuppetAgentNoResources: No Puppet resources found on instance deployment-schema-3 on project deployment-prep - 
https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [14:26:28] FIRING: [19x] InstanceDown: Project deployment-prep instance deployment-acme-chief05 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [14:32:43] FIRING: [23x] InstanceDown: Project deployment-prep instance deployment-acme-chief05 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [14:33:02] Project beta-code-update-eqiad build #556221: 04FAILURE in 30 min: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/556221/ [14:33:02] FIRING: [24x] InstanceDown: Project deployment-prep instance deployment-acme-chief05 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [14:37:58] FIRING: [24x] InstanceDown: Project deployment-prep instance deployment-acme-chief05 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [14:38:02] FIRING: [25x] InstanceDown: Project deployment-prep instance deployment-acme-chief05 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [14:43:02] FIRING: [26x] InstanceDown: Project deployment-prep instance deployment-acme-chief05 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [14:47:50] poor beta [14:48:02] FIRING: [26x] InstanceDown: Project deployment-prep instance deployment-acme-chief05 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [14:48:28] FIRING: [26x] InstanceDown: Project deployment-prep instance deployment-acme-chief05 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [14:53:02] FIRING: [27x] InstanceDown: Project deployment-prep instance deployment-acme-chief05 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [14:53:28] FIRING: [4x] PuppetAgentNoResources: No Puppet resources found on instance deployment-etcd02 on project deployment-prep - 
https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [14:58:02] FIRING: [29x] InstanceDown: Project deployment-prep instance deployment-acme-chief05 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [15:00:28] FIRING: [7x] PuppetAgentNoResources: No Puppet resources found on instance deployment-docker-mathoid02 on project deployment-prep - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [15:03:01] Project beta-code-update-eqiad build #556222: 04STILL FAILING in 30 min: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/556222/ [15:03:02] FIRING: [30x] InstanceDown: Project deployment-prep instance deployment-acme-chief05 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [15:04:50] !log Hard reboot of deployment-acme-chief05 (T399281) [15:04:53] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [15:04:53] T399281: 2025-07-11 Ceph issues causing Toolforge and Cloud VPS failures - https://phabricator.wikimedia.org/T399281 [15:05:01] Project beta-update-databases-eqiad build #86100: 15ABORTED in 45 min: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/86100/ [15:08:02] FIRING: [30x] InstanceDown: Project deployment-prep instance deployment-acme-chief05 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [15:08:28] FIRING: [8x] PuppetAgentNoResources: No Puppet resources found on instance deployment-docker-mathoid02 on project deployment-prep - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [15:09:28] FIRING: [30x] InstanceDown: Project deployment-prep instance deployment-acme-chief05 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [15:13:02] FIRING: [30x] InstanceDown: Project deployment-prep instance deployment-acme-chief05 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [15:17:10] Yippee, build 
fixed! [15:17:10] Project beta-code-update-eqiad build #556223: 09FIXED in 14 min: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/556223/ [15:18:02] FIRING: [30x] InstanceDown: Project deployment-prep instance deployment-acme-chief05 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [15:18:28] RESOLVED: WidespreadInstanceDown: Widespread instances down in project deployment-prep - https://prometheus-alerts.wmcloud.org/?q=alertname%3DWidespreadInstanceDown [15:18:28] FIRING: [8x] PuppetAgentNoResources: No Puppet resources found on instance deployment-docker-mathoid02 on project deployment-prep - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [15:18:35] 10Beta-Cluster-Infrastructure: Widespread instances down in project deployment-prep - https://phabricator.wikimedia.org/T399297#10995799 (10wmcs-alerts) [15:21:28] FIRING: [8x] PuppetAgentNoResources: No Puppet resources found on instance deployment-docker-mathoid02 on project deployment-prep - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [15:23:11] 10Beta-Cluster-Infrastructure: Beta cluster down 2025-07-11 - https://phabricator.wikimedia.org/T399303#10995817 (10doctaxon) As noted: https://phabricator.wikimedia.org/T289318#10995040 [15:24:37] 06Project-Admins, 06Security-Team, 07SecTeam-Processed, 07Security: Modify security-related Phabricator projects related to incidents and audits - https://phabricator.wikimedia.org/T398840#10995822 (10A_smart_kitten) >>! In T398840#10995587, @mmartorana wrote: > Maybe do we want to also have a general proj... [15:25:35] 10GitLab, 06collaboration-services: SystemdUnitFailed - sync-gitlab-group-with-ldap.service on gitlab2002:9100 - https://phabricator.wikimedia.org/T399250#10995823 (10Dzahn) Great! Glad to hear it was a pretty easy fix then. 
[15:26:14] 10Diffusion, 10Phabricator, 06collaboration-services: Drop our mirroring of code to Diffusion and empty the repos - https://phabricator.wikimedia.org/T359549#10995824 (10Dzahn) 05Open→03Stalled Alright, sounds good to me. Let's call it stalled then for the moment. [15:28:09] 10Gerrit, 06collaboration-services: Gerrit: Hiera lookup for monitoring - https://phabricator.wikimedia.org/T399282#10995826 (10Dzahn) [15:28:28] RESOLVED: [8x] PuppetAgentNoResources: No Puppet resources found on instance deployment-docker-mathoid02 on project deployment-prep - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [15:29:35] Things in the Beta Cluster are coming back to life. The IRC bot seems to be behind the web ui in reporting recoveries. [15:34:58] !log Hard reboot of deployment-webperf21 (T399281) [15:35:00] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [15:35:00] T399281: 2025-07-11 Ceph issues causing Toolforge and Cloud VPS failures - https://phabricator.wikimedia.org/T399281 [15:38:02] RESOLVED: [30x] InstanceDown: Project deployment-prep instance deployment-acme-chief05 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [15:59:45] I think beta is technically back up and just overwhelmed by crawlers again. 
[16:06:50] !log Added 7 Class A networks to block list [16:06:51] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [16:18:28] FIRING: [8x] PuppetAgentNoResources: No Puppet resources found on instance deployment-docker-mathoid02 on project deployment-prep - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [16:40:08] (03update) 10dduvall: spiderpig: Improve train deployment UI for small screens [repos/releng/scap] - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/886 [16:40:10] (03open) 10dduvall: spiderpig: Improve train deployment UI for small screens [repos/releng/scap] - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/886 [16:40:13] (03update) 10dduvall: spiderpig: Improve train deployment UI for small screens [repos/releng/scap] - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/886 [16:40:15] (03update) 10dduvall: spiderpig: Improve train deployment UI for small screens [repos/releng/scap] - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/886 [16:40:57] (03open) 10dduvall: spiderpig: Fix obscured "..." menu items [repos/releng/scap] - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/887 [16:41:10] (03update) 10dduvall: spiderpig: Fix obscured "..." menu items [repos/releng/scap] - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/887 [16:41:11] (03update) 10dduvall: spiderpig: Fix obscured "..." 
menu items [repos/releng/scap] - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/887 [16:41:13] Project beta-scap-sync-world build #214735: 04FAILURE in 16 min: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/214735/ [16:43:08] (03merge) 10dduvall: spiderpig: Avoid race condition when refreshing train status [repos/releng/scap] - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/885 (https://phabricator.wikimedia.org/T399120) [16:43:41] (03update) 10dduvall: spiderpig: Fix obscured "..." menu items [repos/releng/scap] - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/887 [16:48:11] 10Beta-Cluster-Infrastructure: 2025-07-11 traffic overload - https://phabricator.wikimedia.org/T399329 (10bd808) 03NEW [16:48:33] 10Beta-Cluster-Infrastructure: 2025-07-11 traffic overload - https://phabricator.wikimedia.org/T399329#10996173 (10bd808) 05Open→03In progress p:05Triage→03High [16:50:40] 10Beta-Cluster-Infrastructure: 2025-07-11 traffic overload - https://phabricator.wikimedia.org/T399329#10996177 (10bd808) https://gerrit.wikimedia.org/r/plugins/gitiles/cloud/instance-puppet/+/f45bc3ef01fcc1afc18343694a188b1656814d6b%5E%21/#F0 ` diff --git a/deployment-prep/_.yaml b/deployment-prep/_.yaml index... [16:52:55] !log Reboot deployment-mediawiki14 to clear all open connections (T399329) [16:52:58] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [16:52:59] T399329: 2025-07-11 traffic overload - https://phabricator.wikimedia.org/T399329 [16:54:30] https://meta.wikimedia.beta.wmcloud.org/wiki/Special:MyLanguage/Main_Page is alive! Let's see how long it lasts...
[16:56:30] (03merge) 10dduvall: image: Refactor `kokkuri image build` image ref exports [repos/releng/kokkuri] - 10https://gitlab.wikimedia.org/repos/releng/kokkuri/-/merge_requests/139 (https://phabricator.wikimedia.org/T399120) [17:01:15] 10Beta-Cluster-Infrastructure, 10Testing Support, 07Epic: Run Selenium tests targeting Beta cluster - https://phabricator.wikimedia.org/T373680#10996227 (10zeljkofilipin) [17:02:39] Yippee, build fixed! [17:02:40] Project beta-scap-sync-world build #214736: 09FIXED in 15 min: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/214736/ [17:10:23] 10Beta-Cluster-Infrastructure: 2025-07-11 traffic overload - https://phabricator.wikimedia.org/T399329#10996263 (10bd808) 05In progress→03Resolved Things look stable for now. I'm back to having blocked way, way too much of the internet to get here. :/ [17:11:38] 10Beta-Cluster-Infrastructure: 2025-07-11 traffic overload - https://phabricator.wikimedia.org/T399329#10996272 (10bd808) [17:22:30] (03open) 10dduvall: version: 2.8.0 [repos/releng/kokkuri] - 10https://gitlab.wikimedia.org/repos/releng/kokkuri/-/merge_requests/140 [17:25:56] (03merge) 10dduvall: version: 2.8.0 [repos/releng/kokkuri] - 10https://gitlab.wikimedia.org/repos/releng/kokkuri/-/merge_requests/140 [17:44:40] Project beta-scap-sync-world build #214739: 04FAILURE in 8 min 55 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/214739/ [17:52:58] 10Gerrit, 06collaboration-services: Rename of a repository is not replicated - https://phabricator.wikimedia.org/T398401#10996367 (10Dzahn) >>! In T398401#10966583, @hashar wrote: > Easiest: we can allow port 29418 when the source is the active host (gerrit1003). All gerrit servers now have the new firewall r... [18:02:59] Yippee, build fixed!
[18:02:59] Project beta-scap-sync-world build #214740: 09FIXED in 15 min: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/214740/ [18:21:33] (03approved) 10dancy: spiderpig: Improve train deployment UI for small screens [repos/releng/scap] - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/886 (owner: 10dduvall) [18:21:58] (03PS1) 10Subramanya Sastry: Add prefixes for more wikis [integration/visualdiff] - 10https://gerrit.wikimedia.org/r/1168229 [18:21:58] (03PS1) 10Subramanya Sastry: Hide footers since we now have different footer text for Parsoid [integration/visualdiff] - 10https://gerrit.wikimedia.org/r/1168230 [18:22:48] (03CR) 10Subramanya Sastry: "Maybe worth adding all missing prefixes in one shot by looking at the spreadsheet and updating this. But, next time I look at this code." [integration/visualdiff] - 10https://gerrit.wikimedia.org/r/1168229 (owner: 10Subramanya Sastry) [18:23:15] (03approved) 10dancy: spiderpig: Fix obscured "..." menu items [repos/releng/scap] - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/887 (owner: 10dduvall) [18:44:44] (03merge) 10dduvall: spiderpig: Fix obscured "..." 
menu items [repos/releng/scap] - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/887 [18:45:00] (03update) 10dduvall: spiderpig: Improve train deployment UI for small screens [repos/releng/scap] - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/886 [18:47:29] (03merge) 10dduvall: spiderpig: Improve train deployment UI for small screens [repos/releng/scap] - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/886 [19:02:30] (03update) 10dancy: Add debug_logstash config flag [repos/releng/scap] - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/884 [19:03:40] (03merge) 10dancy: Add debug_logstash config flag [repos/releng/scap] - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/884 [19:04:22] (03update) 10dancy: utils.py: Make select_latest_patches treat /srv/patches/next as latest [repos/releng/scap] - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/888 (https://phabricator.wikimedia.org/T295925 https://phabricator.wikimedia.org/T398873) [19:04:23] (03open) 10dancy: utils.py: Make select_latest_patches treat /srv/patches/next as latest [repos/releng/scap] - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/888 (https://phabricator.wikimedia.org/T295925 https://phabricator.wikimedia.org/T398873) [19:07:20] (03update) 10dancy: utils.py: Make select_latest_patches treat /srv/patches/next as latest [repos/releng/scap] - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/888 (https://phabricator.wikimedia.org/T295925 https://phabricator.wikimedia.org/T398873) [19:08:10] (03update) 10dancy: utils.py: Make select_latest_patches treat /srv/patches/next as latest [repos/releng/scap] - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/888 (https://phabricator.wikimedia.org/T295925 https://phabricator.wikimedia.org/T398873) [19:08:15] (03update) 10dancy: utils.py: Make select_latest_patches treat /srv/patches/next as latest 
[repos/releng/scap] - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/888 (https://phabricator.wikimedia.org/T295925 https://phabricator.wikimedia.org/T398873) [19:09:12] (03approved) 10dduvall: build-images.py: Don't sleep after full build in train-dev [repos/releng/release] - 10https://gitlab.wikimedia.org/repos/releng/release/-/merge_requests/189 (https://phabricator.wikimedia.org/T390251) (owner: 10dancy) [19:12:14] (03open) 10dancy: spiderpig.py: Remove leftover debugging [repos/releng/scap] - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/889 [19:12:17] (03update) 10dancy: spiderpig.py: Remove leftover debugging [repos/releng/scap] - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/889 [19:12:38] (03approved) 10dduvall: utils.py: Make select_latest_patches treat /srv/patches/next as latest [repos/releng/scap] - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/888 (https://phabricator.wikimedia.org/T295925 https://phabricator.wikimedia.org/T398873) (owner: 10dancy) [19:12:53] (03merge) 10dancy: build-images.py: Don't sleep after full build in train-dev [repos/releng/release] - 10https://gitlab.wikimedia.org/repos/releng/release/-/merge_requests/189 (https://phabricator.wikimedia.org/T390251) [19:13:28] (03merge) 10dancy: spiderpig.py: Remove leftover debugging [repos/releng/scap] - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/889 [19:13:55] (03update) 10dancy: utils.py: Make select_latest_patches treat /srv/patches/next as latest [repos/releng/scap] - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/888 (https://phabricator.wikimedia.org/T295925 https://phabricator.wikimedia.org/T398873) [19:59:14] 10Gerrit: Unable to push commit whose parent is not the latest commit for the parent patch - https://phabricator.wikimedia.org/T399241#10996513 (10Daimona) Would that mean this was caused by a gerrit upgrade? 
The last one was in T390666 a few months ago AFAICT, but I've only had this issue for the last few days.... [20:41:45] 10Beta-Cluster-Infrastructure: 2025-07-11 traffic overload - https://phabricator.wikimedia.org/T399329#10996571 (10bd808) 05Resolved→03In progress Well that period of 1-2 load average didn't last long. :/ {F63964490, size=full} [20:54:07] 10Beta-Cluster-Infrastructure: 2025-07-11 traffic overload - https://phabricator.wikimedia.org/T399329#10996606 (10bd808) `lang=shell-session root@deployment-mediawiki14:~# ./big-ban-hammer.sh - 5.0.0.0/8 # 1142 hits - 13.0.0.0/8 # 1724 hits - 17.0.0.0/8 # 2032 hits - 27.0.0... [20:55:04] !log blocked even more wide IP ranges in an attempt to get the load on deployment-mediawiki14 consistently below 3. (T399329) [20:55:07] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [20:55:07] T399329: 2025-07-11 traffic overload - https://phabricator.wikimedia.org/T399329 [20:55:59] "Error: 403, Requests from your IP have been blocked" -- heh. I managed to block myself. :) [20:57:58] 10Beta-Cluster-Infrastructure: 2025-07-11 traffic overload - https://phabricator.wikimedia.org/T399329#10996624 (10bd808) https://gerrit.wikimedia.org/r/plugins/gitiles/cloud/instance-puppet/+/ff29b596475e04e2647d6c9ed4468c704f30ff0d%5E%21/#F0 ` diff --git a/deployment-prep/_.yaml b/deployment-prep/_.yaml index...
[20:59:32] !log Rebooted deployment-mediawiki14 to clear active load (T399329) [20:59:35] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [21:10:57] 10Beta-Cluster-Infrastructure: 2025-07-11 traffic overload - https://phabricator.wikimedia.org/T399329#10996643 (10bd808) [21:10:58] 10Beta-Cluster-Infrastructure: Beta cluster down 2025-07-11 - https://phabricator.wikimedia.org/T399303#10996646 (10bd808) →14Duplicate dup:03T399329 [21:12:03] 10Beta-Cluster-Infrastructure: Widespread instances down in project deployment-prep - https://phabricator.wikimedia.org/T399297#10996651 (10bd808) →14Duplicate dup:03T399281 [21:13:43] 10Beta-Cluster-Infrastructure: No Puppet resources found on instance deployment-urldownloader04 on project deployment-prep - https://phabricator.wikimedia.org/T399216#10996653 (10bd808) `COUNTEREXAMPLE Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Evaluation Error: Erro... [21:17:04] 10Beta-Cluster-Infrastructure: No Puppet resources found on instance deployment-urldownloader04 on project deployment-prep - https://phabricator.wikimedia.org/T399216#10996654 (10bd808) Caused by https://gerrit.wikimedia.org/r/c/operations/puppet/+/1164432. https://gerrit.wikimedia.org/r/c/labs/private/+/1155221... [21:21:55] 10Beta-Cluster-Infrastructure: No Puppet resources found on instance deployment-urldownloader04 on project deployment-prep - https://phabricator.wikimedia.org/T399216#10996660 (10bd808) https://gerrit.wikimedia.org/r/plugins/gitiles/cloud/instance-puppet/+/fac41347f20004b55c8e8f05d435fc0ec8284612%5E%21/#F0 ` dif... [21:22:30] 10Beta-Cluster-Infrastructure: No Puppet resources found on instance deployment-urldownloader04 on project deployment-prep - https://phabricator.wikimedia.org/T399216#10996661 (10bd808) `counterexample Jul 11 21:21:28 deployment-urldownloader04 systemd[1]: Starting nginx.service - A high performance web server a...
[21:27:37] 10Beta-Cluster-Infrastructure: No Puppet resources found on instance deployment-urldownloader04 on project deployment-prep - https://phabricator.wikimedia.org/T399216#10996664 (10bd808) https://gerrit.wikimedia.org/r/plugins/gitiles/cloud/instance-puppet/+/bb3cfaba98865308262c9755f7d80da72f793072%5E%21/#F0 ` dif... [21:27:47] 10Beta-Cluster-Infrastructure: No Puppet resources found on instance deployment-urldownloader04 on project deployment-prep - https://phabricator.wikimedia.org/T399216#10996665 (10bd808) [21:27:48] 10Beta-Cluster-Infrastructure: Beta cluster down 2025-07-11 - https://phabricator.wikimedia.org/T399303#10996666 (10bd808) [21:28:28] RESOLVED: [8x] PuppetAgentNoResources: No Puppet resources found on instance deployment-docker-mathoid02 on project deployment-prep - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [21:29:34] 10Beta-Cluster-Infrastructure: No Puppet resources found on instance deployment-urldownloader04 on project deployment-prep - https://phabricator.wikimedia.org/T399216#10996668 (10bd808) 05Open→03Resolved a:03bd808 [21:32:31] 10Beta-Cluster-Infrastructure: Found non-revoked Puppet certificates for 1 deleted instances on deployment-puppetserver-1 - https://phabricator.wikimedia.org/T399307#10996673 (10bd808) 05Open→03Resolved a:03bd808 `lang=shell-session bd808@mbp03:~$ ssh deployment-puppetserver-1.deployment-prep.eqiad1.wi...
[21:37:28] RESOLVED: PuppetStaleCertificates: Found non-revoked Puppet certificates for 1 deleted instances on deployment-puppetserver-1 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates [21:37:36] 10Beta-Cluster-Infrastructure: Found non-revoked Puppet certificates for 1 deleted instances on deployment-puppetserver-1 - https://phabricator.wikimedia.org/T399342 (10wmcs-alerts) 03NEW [21:41:49] !log Unblock 37.114.160.0/19 (T399236) [21:41:50] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [21:46:57] 10Beta-Cluster-Infrastructure: Found non-revoked Puppet certificates for 1 deleted instances on deployment-puppetserver-1 - https://phabricator.wikimedia.org/T399307#10996710 (10bd808) [21:46:58] 10Beta-Cluster-Infrastructure: Found non-revoked Puppet certificates for 1 deleted instances on deployment-puppetserver-1 - https://phabricator.wikimedia.org/T399342#10996712 (10bd808) →14Duplicate dup:03T399307 [21:58:51] 10Beta-Cluster-Infrastructure, 10MW-1.45-notes (1.45.0-wmf.9; 2025-07-08), 13Patch-For-Review: Move *.beta.wmflabs.org to *.beta.wmcloud.org - https://phabricator.wikimedia.org/T289318#10996739 (10bd808) >>! In T289318#10995040, @doctaxon wrote: > But https://en.wikipedia.beta.wmcloud.org/wiki/Special:Blankp... [22:00:55] 10GitLab (CI & Job Runners), 13Patch-For-Review: [kokuri] Use a unique per CI run tag by default - https://phabricator.wikimedia.org/T399120#10996742 (10dduvall) @bd808 Kokkuri 2.8.0 will include the digest in the image ref. See if that solves your issue. [22:07:57] 10GitLab (CI & Job Runners), 10Release-Engineering-Team (Doing 😎), 13Patch-For-Review: [kokuri] Use a unique per CI run tag by default - https://phabricator.wikimedia.org/T399120#10996746 (10bd808) p:05Triage→03Medium a:03dduvall I updated https://gitlab.wikimedia.org/repos/releng/zuul/tofu-provisionin...
[22:09:43] (03CR) 10Arlolra: [C:03+2] Hide footers since we now have different footer text for Parsoid [integration/visualdiff] - 10https://gerrit.wikimedia.org/r/1168230 (owner: 10Subramanya Sastry) [22:10:55] (03CR) 10Arlolra: [C:03+2] Add prefixes for more wikis [integration/visualdiff] - 10https://gerrit.wikimedia.org/r/1168229 (owner: 10Subramanya Sastry) [22:11:28] (03Merged) 10jenkins-bot: Add prefixes for more wikis [integration/visualdiff] - 10https://gerrit.wikimedia.org/r/1168229 (owner: 10Subramanya Sastry) [22:11:29] (03Merged) 10jenkins-bot: Hide footers since we now have different footer text for Parsoid [integration/visualdiff] - 10https://gerrit.wikimedia.org/r/1168230 (owner: 10Subramanya Sastry) [22:17:55] (03open) 10dduvall: spiderpig: Annotate optional fields in TrainStatus [repos/releng/scap] - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/890 [22:17:56] (03update) 10dduvall: spiderpig: Annotate optional fields in TrainStatus [repos/releng/scap] - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/890 [22:17:59] (03update) 10dduvall: spiderpig: Annotate optional fields in TrainStatus [repos/releng/scap] - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/890 [22:29:57] (03open) 10dduvall: spiderpig: Display errors and warnings in their own row [repos/releng/scap] - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/891 [22:30:14] (03update) 10dduvall: spiderpig: Display errors and warnings in their own row [repos/releng/scap] - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/891 [22:30:16] (03update) 10dduvall: spiderpig: Display errors and warnings in their own row [repos/releng/scap] - 10https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/891 [22:39:23] (03close) 10bd808: Revert "ci: Switch back to Digital Ocean runners by default" [repos/releng/zuul/tofu-provisioning] - 
10https://gitlab.wikimedia.org/repos/releng/zuul/tofu-provisioning/-/merge_requests/28 [23:07:13] Project beta-scap-sync-world build #214770: 04FAILURE in 2 min 0 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/214770/ [23:12:04] Yippee, build fixed! [23:12:04] Project beta-scap-sync-world build #214771: 09FIXED in 2 min 16 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/214771/