[00:09:37] FIRING: ProbeDown: Service toolsbeta-test-k8s-haproxy-7:443 has failed probes (http_admin_beta_toolforge_org_ip6) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[00:14:37] RESOLVED: ProbeDown: Service toolsbeta-test-k8s-haproxy-7:443 has failed probes (http_admin_beta_toolforge_org_ip6) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[01:12:37] FIRING: ProbeDown: Service toolsbeta-test-k8s-haproxy-7:443 has failed probes (http_admin_beta_toolforge_org_ip6) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[01:17:37] RESOLVED: ProbeDown: Service toolsbeta-test-k8s-haproxy-7:443 has failed probes (http_admin_beta_toolforge_org_ip6) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[03:19:37] FIRING: ProbeDown: Service toolsbeta-test-k8s-haproxy-7:443 has failed probes (http_admin_beta_toolforge_org_ip6) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[03:24:37] RESOLVED: ProbeDown: Service toolsbeta-test-k8s-haproxy-7:443 has failed probes (http_admin_beta_toolforge_org_ip6) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[05:39:37] FIRING: ProbeDown: Service toolsbeta-test-k8s-haproxy-7:443 has failed probes (http_admin_beta_toolforge_org_ip6) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[06:09:37] RESOLVED: ProbeDown: Service toolsbeta-test-k8s-haproxy-7:443 has failed probes (http_admin_beta_toolforge_org_ip6) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[06:57:29] cloud-services-team, Toolforge: Investigate toolforge failure to schedule pods due to insufficient cpu - https://phabricator.wikimedia.org/T427204#11953753 (Sebastian_Berlin-WMSE) I had the same thing happening again now when I tried starting another job. This one started after about half an hour, I thin...
[07:02:37] FIRING: ProbeDown: Service toolsbeta-test-k8s-haproxy-7:443 has failed probes (http_admin_beta_toolforge_org_ip6) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[07:37:37] RESOLVED: ProbeDown: Service toolsbeta-test-k8s-haproxy-7:443 has failed probes (http_admin_beta_toolforge_org_ip6) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[07:49:11] cloud-services-team, Cloud-VPS, Infrastructure-Foundations, netops: bird bfd session with 172.20.1.1 down - Bad packet from 172.20.1.1 - unknown session id - https://phabricator.wikimedia.org/T427202#11953849 (cmooney) Yeah not really sure what happened there @fgiunchedi, a sync issue with the se...
[07:53:41] cloud-services-team, Toolforge: Investigate toolforge failure to schedule pods due to insufficient cpu - https://phabricator.wikimedia.org/T427204#11953858 (fgiunchedi) Thank you for the detailed report @Sebastian_Berlin-WMSE ! I checked `commonsdb-registry` usage patterns here https://grafana.wmcloud.o...
[08:10:31] cloud-services-team, Toolforge: Collect toolforge k8s scheduler metrics on port 10259 - https://phabricator.wikimedia.org/T427249 (fgiunchedi) NEW
[08:17:55] cloud-services-team, Toolforge: [infra,k8s] Scrape Kubernetes controller-manager and scheduler metrics into Prometheus - https://phabricator.wikimedia.org/T308381#11953924 (taavi)
[08:18:07] cloud-services-team, Toolforge: Collect toolforge k8s scheduler metrics on port 10259 - https://phabricator.wikimedia.org/T427249#11953926 (taavi) →Duplicate dup:T308381
[08:18:09] cloud-services-team, Toolforge: [infra,k8s] Scrape Kubernetes controller-manager and scheduler metrics into Prometheus - https://phabricator.wikimedia.org/T308381#11953928 (taavi)
[08:32:24] cloud-services-team, Cloud-VPS, Infrastructure-Foundations, netops: bird bfd session with 172.20.1.1 down - Bad packet from 172.20.1.1 - unknown session id - https://phabricator.wikimedia.org/T427202#11953989 (fgiunchedi) Thank you for the detailed explanation @cmooney, definitely TIL things abou...
[08:39:10] (approved) taavi: bandaid high RLIMIT_NOFILE in gitlab runners [repos/cloud/cicd/gitlab-ci] - https://gitlab.wikimedia.org/repos/cloud/cicd/gitlab-ci/-/merge_requests/85 (https://phabricator.wikimedia.org/T426827) (owner: filippo)
[08:40:30] (PS1) MusikAnimal: Authorship: add support for 28 more lanugages [labs/xtools] - https://gerrit.wikimedia.org/r/1293652 (https://phabricator.wikimedia.org/T372340)
[08:40:30] (merge) filippo: bandaid high RLIMIT_NOFILE in gitlab runners [repos/cloud/cicd/gitlab-ci] - https://gitlab.wikimedia.org/repos/cloud/cicd/gitlab-ci/-/merge_requests/85 (https://phabricator.wikimedia.org/T426827)
[08:40:50] (update) filippo: Use deb.d.o mirror [repos/cloud/cicd/gitlab-ci] - https://gitlab.wikimedia.org/repos/cloud/cicd/gitlab-ci/-/merge_requests/86 (https://phabricator.wikimedia.org/T423596)
[08:41:01] (merge) filippo: Use deb.d.o mirror [repos/cloud/cicd/gitlab-ci] - https://gitlab.wikimedia.org/repos/cloud/cicd/gitlab-ci/-/merge_requests/86 (https://phabricator.wikimedia.org/T423596)
[08:41:22] (CR) MusikAnimal: [C:+2] Authorship: add support for 28 more lanugages [labs/xtools] - https://gerrit.wikimedia.org/r/1293652 (https://phabricator.wikimedia.org/T372340) (owner: MusikAnimal)
[08:42:15] (Merged) jenkins-bot: Authorship: add support for 28 more lanugages [labs/xtools] - https://gerrit.wikimedia.org/r/1293652 (https://phabricator.wikimedia.org/T372340) (owner: MusikAnimal)
[08:44:06] (PS1) MusikAnimal: 3.24.8 version bump [labs/xtools] - https://gerrit.wikimedia.org/r/1293654
[08:45:27] (CR) MusikAnimal: [C:+2] 3.24.8 version bump [labs/xtools] - https://gerrit.wikimedia.org/r/1293654 (owner: MusikAnimal)
[08:46:52] (Merged) jenkins-bot: 3.24.8 version bump [labs/xtools] - https://gerrit.wikimedia.org/r/1293654 (owner: MusikAnimal)
[08:59:33] cloud-services-team, Cloud-VPS, Infrastructure-Foundations, netops: bird bfd session with 172.20.1.1 down - Bad packet from 172.20.1.1 - unknown session id - https://phabricator.wikimedia.org/T427202#11954086 (cmooney) >>! In T427202#11953989, @fgiunchedi wrote: > Thank you for the detailed expla...
[09:15:01] (open) filippo: debian-builder: update images [repos/cloud/cicd/gitlab-ci] - https://gitlab.wikimedia.org/repos/cloud/cicd/gitlab-ci/-/merge_requests/87 (https://phabricator.wikimedia.org/T423596 https://phabricator.wikimedia.org/T426827)
[09:15:43] (approved) fnegri: debian-builder: update images [repos/cloud/cicd/gitlab-ci] - https://gitlab.wikimedia.org/repos/cloud/cicd/gitlab-ci/-/merge_requests/87 (https://phabricator.wikimedia.org/T423596 https://phabricator.wikimedia.org/T426827) (owner: filippo)
[09:16:29] (merge) filippo: debian-builder: update images [repos/cloud/cicd/gitlab-ci] - https://gitlab.wikimedia.org/repos/cloud/cicd/gitlab-ci/-/merge_requests/87 (https://phabricator.wikimedia.org/T423596 https://phabricator.wikimedia.org/T426827)
[09:18:45] Tool-global-search: "500: Internal Server Error" when global-searching with a regex that includes a less-than symbol (<) - https://phabricator.wikimedia.org/T426204#11954205 (Tacsipacsi) Same for the regex `\b(tright|tleft)\b` (https://global-search.toolforge.org/?q=%5Cb%28tright%7Ctleft%29%5Cb&regex=1&names...
[09:22:38] (open) filippo: Default to 64mb memory requests [repos/cloud/toolforge/webservice-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/webservice-cli/-/merge_requests/110 (https://phabricator.wikimedia.org/T420565)
[09:26:08] cloud-services-team, Toolforge, Infrastructure-Foundations, SRE, Patch-For-Review: Adjust WMCS Gitlab CI/CD repo to stop using mirrors.wikimedia.org - https://phabricator.wikimedia.org/T423596#11954234 (fgiunchedi)
[09:26:15] (open) vriaa: feat: support auto sizing for banner elements [toolforge-repos/centralnotice-banner-editor] - https://gitlab.wikimedia.org/toolforge-repos/centralnotice-banner-editor/-/merge_requests/70 (https://phabricator.wikimedia.org/T423936)
[09:26:45] Tool-centralnotice-banner-editor: Extend Banner Editor to support more complex banner templates - https://phabricator.wikimedia.org/T420918#11954240 (Oyelola_Victoria) Open→Resolved
[09:26:57] Tool-centralnotice-banner-editor: Add image lookup within the editor - https://phabricator.wikimedia.org/T421072#11954243 (Oyelola_Victoria) Open→Resolved
[09:27:08] Tool-global-search: Missing URL escaping in Phabricator bug report link - https://phabricator.wikimedia.org/T427257 (Tacsipacsi) NEW
[09:27:11] Tool-centralnotice-banner-editor: Wikimedia Hackathon 2026: Wikimedia Commons image lookup in the CentralNotice Banner Editor - https://phabricator.wikimedia.org/T424078#11954256 (Oyelola_Victoria) Open→Resolved
[09:27:24] Tool-centralnotice-banner-editor: Add support for dark mode in edited banners - https://phabricator.wikimedia.org/T420941#11954258 (Oyelola_Victoria) Open→Resolved
[09:27:32] Tool-centralnotice-banner-editor: Add support for codex design color tokens - https://phabricator.wikimedia.org/T426315#11954259 (Oyelola_Victoria) Open→Resolved
[09:27:41] Tool-centralnotice-banner-editor: Add dark mode support to the editor interface - https://phabricator.wikimedia.org/T421746#11954260 (Oyelola_Victoria) Open→Resolved
[09:27:52] Tool-centralnotice-banner-editor: Add option to set border, margin, and padding for all sides at once - https://phabricator.wikimedia.org/T421063#11954261 (Oyelola_Victoria) Open→Resolved
[09:28:17] Tool-global-search: Missing URL escaping in Phabricator bug report link - https://phabricator.wikimedia.org/T427257#11954264 (Tacsipacsi) Side note: you may want to link to the form I used (https://phabricator.wikimedia.org/maniphest/task/edit/form/43/) rather than the generic https://phabricator.wikimedia.o...
[09:31:04] (open) vriaa: fix: add subtle background to image buttons for white transparent-image visibility [toolforge-repos/centralnotice-banner-editor] - https://gitlab.wikimedia.org/toolforge-repos/centralnotice-banner-editor/-/merge_requests/71
[09:31:35] (open) vriaa: refactor: remove obsolete default-value props from BasicDropdown callers [toolforge-repos/centralnotice-banner-editor] - https://gitlab.wikimedia.org/toolforge-repos/centralnotice-banner-editor/-/merge_requests/72
[09:31:50] (open) vriaa: fix: clean up id assignments across the editor [toolforge-repos/centralnotice-banner-editor] - https://gitlab.wikimedia.org/toolforge-repos/centralnotice-banner-editor/-/merge_requests/73
[09:32:15] (open) vriaa: chore: remove unused list property and list icons [toolforge-repos/centralnotice-banner-editor] - https://gitlab.wikimedia.org/toolforge-repos/centralnotice-banner-editor/-/merge_requests/74
[09:57:18] cloud-services-team, Cloud-VPS: sso failure in codfw1dev (labtesthorizon.wikimedia.org) - https://phabricator.wikimedia.org/T409328#11954367 (LSobanski) Untagging IF / CAS as I don't believe there's nothing for us to do here.
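The T426204 and T427257 reports above both involve links built from user-supplied regexes. A minimal Python sketch of query-string escaping, assuming a link is assembled from the `q` and `regex` parameters seen in the global-search URL quoted at 09:18:45; `build_search_url` is a hypothetical helper, not the tool's actual code. `urllib.parse.urlencode` percent-encodes reserved characters such as `\`, `(`, `|`, `)` and `<`, so the pattern survives intact inside the URL:

```python
from urllib.parse import urlencode

# Base URL as it appears in the log; the parameter names q and regex
# mirror the ?q=...&regex=1 link quoted in the T426204 comment.
BASE_URL = "https://global-search.toolforge.org/"


def build_search_url(pattern: str, use_regex: bool = True) -> str:
    """Hypothetical helper: build a search link with the pattern escaped.

    urlencode() percent-encodes every reserved character in each value
    (backslash, parentheses, pipe, less-than, ...) and turns spaces into '+'.
    """
    params = {"q": pattern}
    if use_regex:
        params["regex"] = "1"
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_search_url(r"\b(tright|tleft)\b"))
    # → https://global-search.toolforge.org/?q=%5Cb%28tright%7Ctleft%29%5Cb&regex=1
    print(build_search_url("<ref"))
    # → https://global-search.toolforge.org/?q=%3Cref&regex=1
```

The first call reproduces the encoded query string from the 09:18:45 report, which suggests the same escaping is what an unescaped bug-report link would be missing.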
[09:57:37] (update) fnegri: Add --diff-mode and remove --dry-run [repos/cloud/wikireplicas-utils] - https://gitlab.wikimedia.org/repos/cloud/wikireplicas-utils/-/merge_requests/10 (https://phabricator.wikimedia.org/T351637)
[09:57:37] (update) fnegri: Add summary with counts [repos/cloud/wikireplicas-utils] - https://gitlab.wikimedia.org/repos/cloud/wikireplicas-utils/-/merge_requests/11 (https://phabricator.wikimedia.org/T351637)
[09:57:50] (update) fnegri: Catch SQL errors [repos/cloud/wikireplicas-utils] - https://gitlab.wikimedia.org/repos/cloud/wikireplicas-utils/-/merge_requests/12 (https://phabricator.wikimedia.org/T351637)
[09:57:59] (update) fnegri: Replace only views that need updating [repos/cloud/wikireplicas-utils] - https://gitlab.wikimedia.org/repos/cloud/wikireplicas-utils/-/merge_requests/9 (https://phabricator.wikimedia.org/T351637)
[10:09:37] FIRING: ProbeDown: Service toolsbeta-test-k8s-haproxy-7:443 has failed probes (http_admin_beta_toolforge_org_ip6) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[10:09:44] (open) filippo: utils: add build_images.py [repos/cloud/cicd/gitlab-ci] - https://gitlab.wikimedia.org/repos/cloud/cicd/gitlab-ci/-/merge_requests/88
[10:11:07] FIRING: ToolsDBAlmostFull: ToolsDB is almost full - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBAlmostFull - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsDBAlmostFull
[10:12:39] (update) filippo: utils: add build_images.py [repos/cloud/cicd/gitlab-ci] - https://gitlab.wikimedia.org/repos/cloud/cicd/gitlab-ci/-/merge_requests/88
[10:13:22] (open) vriaa: fix: use codex link style in translation help text links [toolforge-repos/centralnotice-banner-editor] - https://gitlab.wikimedia.org/toolforge-repos/centralnotice-banner-editor/-/merge_requests/75
[10:13:36] (open) vriaa: refactor: hide share design button [toolforge-repos/centralnotice-banner-editor] - https://gitlab.wikimedia.org/toolforge-repos/centralnotice-banner-editor/-/merge_requests/76
[10:19:29] cloud-services-team, Toolforge: Investigate toolforge failure to schedule pods due to insufficient cpu - https://phabricator.wikimedia.org/T427204#11954411 (Sebastian_Berlin-WMSE) That's quite possible. I haven't really checked CPU usage. I was just hoping more CPUs means more faster 🙂 I'd guess that the...
[10:24:37] RESOLVED: ProbeDown: Service toolsbeta-test-k8s-haproxy-7:443 has failed probes (http_admin_beta_toolforge_org_ip6) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[10:28:44] FIRING: MaintainDBUsersManyErrors: Maintain-dbusers is having sustained errors - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/MaintainDBUsersManyErrors - https://grafana.wikimedia.org/d/ae240a06-c13e-49f3-b12c-58432c551e85/wmcs-maintain-dbusers - https://alerts.wikimedia.org/?q=alertname%3DMaintainDBUsersManyErrors
[10:33:44] RESOLVED: MaintainDBUsersManyErrors: Maintain-dbusers is having sustained errors - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/MaintainDBUsersManyErrors - https://grafana.wikimedia.org/d/ae240a06-c13e-49f3-b12c-58432c551e85/wmcs-maintain-dbusers - https://alerts.wikimedia.org/?q=alertname%3DMaintainDBUsersManyErrors
[10:34:37] FIRING: ProbeDown: Service toolsbeta-test-k8s-haproxy-7:443 has failed probes (http_admin_beta_toolforge_org_ip6) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[10:44:37] RESOLVED: ProbeDown: Service toolsbeta-test-k8s-haproxy-7:443 has failed probes (http_admin_beta_toolforge_org_ip6) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[10:47:37] FIRING: ProbeDown: Service toolsbeta-test-k8s-haproxy-7:443 has failed probes (http_admin_beta_toolforge_org_ip6) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[10:49:33] Tool-paulina, Patch-For-Review: Link to create new item if no results - https://phabricator.wikimedia.org/T370223#11954535 (Ademola) Added links to Paulina's native add-author and add-work forms in three contexts: - No results page — when a search returns no results, users now see "Not in Wikidata yet? A...
[10:57:37] RESOLVED: ProbeDown: Service toolsbeta-test-k8s-haproxy-7:443 has failed probes (http_admin_beta_toolforge_org_ip6) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[11:13:52] Tool-schedule-deployment, Release-Engineering-Team, ServiceOps new, Patch-For-Review: Extend functionality to support MediaWiki infrastructure Windows and related repos - https://phabricator.wikimedia.org/T385007#11954672 (jijiki)
[12:16:50] Tool-centralnotice-banner-editor: Add undo functionality to the editor - https://phabricator.wikimedia.org/T421061#11955015 (Oyelola_Victoria) Open→Resolved
[12:18:35] (open) vriaa: feat: show type select dropdown on focus [toolforge-repos/centralnotice-banner-editor] - https://gitlab.wikimedia.org/toolforge-repos/centralnotice-banner-editor/-/merge_requests/77
[12:37:54] cloud-services-team, Toolforge: Investigate toolforge failure to schedule pods due to insufficient cpu - https://phabricator.wikimedia.org/T427204#11955061 (fgiunchedi) >>! In T427204#11954411, @Sebastian_Berlin-WMSE wrote: > That's quite possible. I haven't really checked CPU usage. I was just hoping mo...
[12:46:39] (CR) Filippo Giunchedi: [C:+1] wmcs_libs: Add class to interact with BIRD [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1289911 (owner: Majavah)
[12:47:07] (CR) Filippo Giunchedi: [C:+1] wmcs_libs: inventory: Add data for cloudlb hosts [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1289912 (owner: Majavah)
[12:47:48] (CR) Filippo Giunchedi: [C:+1] wmcs_libs: Add batch class for interacting with cloudlb nodes [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1289913 (owner: Majavah)
[12:48:05] cloud-services-team, Toolforge: Investigate toolforge failure to schedule pods due to insufficient cpu - https://phabricator.wikimedia.org/T427204#11955072 (Sebastian_Berlin-WMSE) Looks like you linked to the "secret" panel again 🙂 I took a shot at removing the "-rw" at the start and that worked. You're...
[12:49:01] (CR) Filippo Giunchedi: [C:+1] openstack: Add cookbook to reboot a cloudlb node [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1289914 (https://phabricator.wikimedia.org/T348841) (owner: Majavah)
[12:49:11] (update) fnegri: Catch SQL errors [repos/cloud/wikireplicas-utils] - https://gitlab.wikimedia.org/repos/cloud/wikireplicas-utils/-/merge_requests/12 (https://phabricator.wikimedia.org/T351637)
[12:49:52] (CR) Filippo Giunchedi: [C:+1] wmcs_libs: bird: Downtime BGP alerts when needed [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1289934 (https://phabricator.wikimedia.org/T348841) (owner: Majavah)
[12:52:55] (update) fnegri: Catch SQL errors [repos/cloud/wikireplicas-utils] - https://gitlab.wikimedia.org/repos/cloud/wikireplicas-utils/-/merge_requests/12 (https://phabricator.wikimedia.org/T351637)
[13:18:31] (CR) Majavah: [C:+2] wmcs_libs: Add class to interact with BIRD [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1289911 (owner: Majavah)
[13:18:36] (CR) Majavah: [C:+2] wmcs_libs: inventory: Add data for cloudlb hosts [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1289912 (owner: Majavah)
[13:18:44] (CR) Majavah: [C:+2] wmcs_libs: Add batch class for interacting with cloudlb nodes [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1289913 (owner: Majavah)
[13:18:49] (CR) Majavah: [C:+2] openstack: Add cookbook to reboot a cloudlb node [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1289914 (https://phabricator.wikimedia.org/T348841) (owner: Majavah)
[13:18:54] (CR) Majavah: [C:+2] wmcs_libs: bird: Downtime BGP alerts when needed [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1289934 (https://phabricator.wikimedia.org/T348841) (owner: Majavah)
[13:21:45] (Merged) jenkins-bot: wmcs_libs: Add class to interact with BIRD [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1289911 (owner: Majavah)
[13:22:20] (Merged) jenkins-bot: wmcs_libs: inventory: Add data for cloudlb hosts [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1289912 (owner: Majavah)
[13:22:21] (Merged) jenkins-bot: wmcs_libs: Add batch class for interacting with cloudlb nodes [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1289913 (owner: Majavah)
[13:22:39] (Merged) jenkins-bot: openstack: Add cookbook to reboot a cloudlb node [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1289914 (https://phabricator.wikimedia.org/T348841) (owner: Majavah)
[13:22:43] (Merged) jenkins-bot: wmcs_libs: bird: Downtime BGP alerts when needed [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1289934 (https://phabricator.wikimedia.org/T348841) (owner: Majavah)
[13:24:02] cloud-services-team, Toolforge: Investigate toolforge failure to schedule pods due to insufficient cpu - https://phabricator.wikimedia.org/T427204#11955220 (fgiunchedi) Open→Resolved a:fgiunchedi >>! In T427204#11955072, @Sebastian_Berlin-WMSE wrote: > Looks like you linked to the "secret...
[14:02:06] wikitech.wikimedia.org: Wikitech static is down with a 502 HTTP error (Bad Gateway) - https://phabricator.wikimedia.org/T427081#11955347 (Andrew) The site is back up; the VM that hosts it had run out of disk space. I am pretty sure that the pipeline issue is the same as T427250 and unrelated to the site hav...
[14:05:37] FIRING: ProbeDown: Service toolsbeta-test-k8s-haproxy-7:443 has failed probes (http_admin_beta_toolforge_org_ip6) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[14:16:07] Tool-wmf-openapi-linter, MW-Interfaces-Team (MWI-Sprint-34 (2026-05-19 to 2026-06-02)): Exclude link objects from wikimedia-paths-parameter-example-exists - https://phabricator.wikimedia.org/T425920#11955412 (MGoncalves-WMF) Open→In progress
[14:18:50] wikitech.wikimedia.org: Wikitech static is down with a 502 HTTP error (Bad Gateway) - https://phabricator.wikimedia.org/T427081#11955438 (A_smart_kitten) >>! In T427081#11955347, @Andrew wrote: > The site is back up; the VM that hosts it had run out of disk space. I still personally experience the 301 to `1...
[14:20:37] RESOLVED: ProbeDown: Service toolsbeta-test-k8s-haproxy-7:443 has failed probes (http_admin_beta_toolforge_org_ip6) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[14:40:35] (PS1) Majavah: openstack: Add cookbook to reboot a clouservices node [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1293740 (https://phabricator.wikimedia.org/T348841)
[14:43:39] cloud-services-team, Cloud-VPS: sso failure in codfw1dev (labtesthorizon.wikimedia.org) - https://phabricator.wikimedia.org/T409328#11955584 (Andrew) Open→Resolved a:Andrew
[14:49:28] wikitech.wikimedia.org: Wikitech static is down with a 502 HTTP error (Bad Gateway) - https://phabricator.wikimedia.org/T427081#11955615 (Koavf) >>! In T427081#11955438, @A_smart_kitten wrote: >>>! In T427081#11955347, @Andrew wrote: >> The site is back up; the VM that hosts it had run out of disk space. >...
[15:11:37] FIRING: ProbeDown: Service toolsbeta-test-k8s-haproxy-7:443 has failed probes (http_admin_beta_toolforge_org_ip6) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[15:16:37] RESOLVED: ProbeDown: Service toolsbeta-test-k8s-haproxy-7:443 has failed probes (http_admin_beta_toolforge_org_ip6) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[15:33:22] wikitech.wikimedia.org: Wikitech static is down with a 502 HTTP error (Bad Gateway) - https://phabricator.wikimedia.org/T427081#11955867 (Andrew) > I also still get a redirect to localhost. You're right, sorry. There are actually three things going on: 1) The whole VM was down for a while. That produced t...
[15:35:44] FIRING: MaintainDBUsersManyErrors: Maintain-dbusers is having sustained errors - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/MaintainDBUsersManyErrors - https://grafana.wikimedia.org/d/ae240a06-c13e-49f3-b12c-58432c551e85/wmcs-maintain-dbusers - https://alerts.wikimedia.org/?q=alertname%3DMaintainDBUsersManyErrors
[15:40:44] RESOLVED: MaintainDBUsersManyErrors: Maintain-dbusers is having sustained errors - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/MaintainDBUsersManyErrors - https://grafana.wikimedia.org/d/ae240a06-c13e-49f3-b12c-58432c551e85/wmcs-maintain-dbusers - https://alerts.wikimedia.org/?q=alertname%3DMaintainDBUsersManyErrors
[15:43:17] (open) kharlan: frontend: Add dark mode support [toolforge-repos/microtask-generator] - https://gitlab.wikimedia.org/toolforge-repos/microtask-generator/-/merge_requests/5
[15:44:18] (update) kharlan: frontend: Add dark mode support [toolforge-repos/microtask-generator] - https://gitlab.wikimedia.org/toolforge-repos/microtask-generator/-/merge_requests/5
[15:48:42] cloud-services-team, Cloud-VPS, Ceph, Infrastructure-Foundations, SRE-tools: Enhacements to wmcs.ceph.roll_reboot_osds - https://phabricator.wikimedia.org/T427295 (Andrew) NEW
[16:00:34] FIRING: DiskSpace: Disk space cloudbackup1004:9100:/srv 6.984% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[16:05:34] RESOLVED: DiskSpace: Disk space cloudbackup1004:9100:/srv 6.984% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[16:27:34] (open) komla: rename bastion script [toolforge-repos/komla-apps] - https://gitlab.wikimedia.org/toolforge-repos/komla-apps/-/merge_requests/3
[16:27:37] FIRING: ProbeDown: Service toolsbeta-test-k8s-haproxy-7:443 has failed probes (http_admin_beta_toolforge_org_ip6) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[16:27:38] (approved) komla: rename bastion script [toolforge-repos/komla-apps] - https://gitlab.wikimedia.org/toolforge-repos/komla-apps/-/merge_requests/3
[16:27:59] (merge) komla: rename bastion script [toolforge-repos/komla-apps] - https://gitlab.wikimedia.org/toolforge-repos/komla-apps/-/merge_requests/3
[16:30:50] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.safe_reboot on hosts matched by 'P{P:openstack::codfw1dev::nova::compute::service} AND NOT P{F:kernelversion = 6.12.88}'
[16:30:50] !log andrew@cloudcumin1001 admin END (ERROR) - Cookbook wmcs.openstack.cloudvirt.safe_reboot (exit_code=97) on hosts matched by 'P{P:openstack::codfw1dev::nova::compute::service} AND NOT P{F:kernelversion = 6.12.88}'
[16:32:37] RESOLVED: ProbeDown: Service toolsbeta-test-k8s-haproxy-7:443 has failed probes (http_admin_beta_toolforge_org_ip6) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[16:40:37] FIRING: PuppetCertificateAboutToExpire: Puppet CA certificate Puppet CA: metricsinfra-puppetmaster-1.metricsinfra.eqiad1.wikimedia.cloud is about to expire in 18d 23h 58m 10s - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetCertificateAboutToExpire - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetCertificateAboutToExpire
[16:42:15] cloud-services-team, Data-Services, Privacy Engineering, SecTeam-Processed: filerevision view should not filter out deleted file revisions - https://phabricator.wikimedia.org/T426804#11956304 (sbassett) >>! In T426804#11941691, @Ladsgroup wrote: > I'd say if #privacy_engineering team is happy, I...
[16:51:32] cloud-services-team, Data-Services, Privacy Engineering, SecTeam-Processed: filerevision view should not filter out deleted file revisions - https://phabricator.wikimedia.org/T426804#11956359 (Catrope) What does `fr_deleted=1` mean? It's not super clear to me from reading the code, but it seems t...
[17:06:46] cloud-services-team, Data-Services, Privacy Engineering, SecTeam-Processed: filerevision view should not filter out deleted file revisions - https://phabricator.wikimedia.org/T426804#11956420 (Ladsgroup) >>! In T426804#11956359, @Catrope wrote: > What does `fr_deleted=1` mean? It's not super clea...
[17:29:03] (CR) Lokal Profil: [C:+2] simplify post-merge job [labs/tools/heritage] - https://gerrit.wikimedia.org/r/1276870 (https://phabricator.wikimedia.org/T316627) (owner: Lokal Profil)
[17:34:15] (Merged) jenkins-bot: simplify post-merge job [labs/tools/heritage] - https://gerrit.wikimedia.org/r/1276870 (https://phabricator.wikimedia.org/T316627) (owner: Lokal Profil)
[17:59:37] FIRING: ProbeDown: Service toolsbeta-test-k8s-haproxy-7:443 has failed probes (http_admin_beta_toolforge_org_ip6) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[18:04:37] RESOLVED: ProbeDown: Service toolsbeta-test-k8s-haproxy-7:443 has failed probes (http_admin_beta_toolforge_org_ip6) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[18:16:09] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.roll_reboot_mons
[18:16:40] !log andrew@cloudcumin1001 admin END (ERROR) - Cookbook wmcs.ceph.roll_reboot_mons (exit_code=97)
[18:21:15] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.roll_reboot_mons
[18:21:56] !log andrew@cloudcumin1001 admin END (ERROR) - Cookbook wmcs.ceph.roll_reboot_mons (exit_code=97)
[18:22:28] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.roll_reboot_mons
[18:33:38] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.roll_reboot_mons (exit_code=0)
[18:43:46] (PS1) Andrew Bogott: roll_reboot_osds: set maintenance per-osd rather than for the whole run [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1293792 (https://phabricator.wikimedia.org/T427295)
[18:46:05] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.roll_reboot_mons
[18:47:34] !log andrew@cloudcumin1001 admin END (ERROR) - Cookbook wmcs.ceph.roll_reboot_mons (exit_code=97)
[18:50:51] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.roll_reboot_osds
[19:07:55] Tool-wmf-openapi-linter, MediaWiki-REST-API, MW-Interfaces-Team (MWI-Sprint-34 (2026-05-19 to 2026-06-02)), OKR-Work: [SPIKE] Identify OAD properties not supported by MediaWiki REST Framework - https://phabricator.wikimedia.org/T425942#11956890 (AGhirelli-WMF) ##Spike report Ran the Wikimedia Sp...
[19:11:38] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.roll_reboot_osds (exit_code=0)
[19:20:53] (CR) Andrew Bogott: "This seems to work fine in codfw1dev" [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1293792 (https://phabricator.wikimedia.org/T427295) (owner: Andrew Bogott)
[19:33:37] FIRING: ProbeDown: Service toolsbeta-test-k8s-haproxy-7:443 has failed probes (http_admin_beta_toolforge_org_ip6) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[19:35:27] cloud-services-team, Cloud-VPS, Ceph, Infrastructure-Foundations, and 2 others: Enhacements to wmcs.ceph.roll_reboot_osds - https://phabricator.wikimedia.org/T427295#11956936 (Andrew) Part 1 would involve a fair bit of refactoring since we currently use 'ceph node' calls to enumerate osd nodes ra...
[19:48:37] RESOLVED: ProbeDown: Service toolsbeta-test-k8s-haproxy-7:443 has failed probes (http_admin_beta_toolforge_org_ip6) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[20:27:37] FIRING: ProbeDown: Service toolsbeta-test-k8s-haproxy-7:443 has failed probes (http_admin_beta_toolforge_org_ip6) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[20:32:37] RESOLVED: ProbeDown: Service toolsbeta-test-k8s-haproxy-7:443 has failed probes (http_admin_beta_toolforge_org_ip6) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[21:05:37] FIRING: ProbeDown: Service toolsbeta-test-k8s-haproxy-7:443 has failed probes (http_admin_beta_toolforge_org_ip6) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[21:10:37] RESOLVED: ProbeDown: Service toolsbeta-test-k8s-haproxy-7:443 has failed probes (http_admin_beta_toolforge_org_ip6) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[21:35:53] cloud-services-team, Cloud-VPS, Ceph, Infrastructure-Foundations, and 2 others: Enhancements to wmcs.ceph.roll_reboot_osds - https://phabricator.wikimedia.org/T427295#11957363 (Aklapper)
[21:39:44] FIRING: MaintainDBUsersManyErrors: Maintain-dbusers is having sustained errors - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/MaintainDBUsersManyErrors - https://grafana.wikimedia.org/d/ae240a06-c13e-49f3-b12c-58432c551e85/wmcs-maintain-dbusers - https://alerts.wikimedia.org/?q=alertname%3DMaintainDBUsersManyErrors
[21:44:44] RESOLVED: MaintainDBUsersManyErrors: Maintain-dbusers is having sustained errors - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/MaintainDBUsersManyErrors - https://grafana.wikimedia.org/d/ae240a06-c13e-49f3-b12c-58432c551e85/wmcs-maintain-dbusers - https://alerts.wikimedia.org/?q=alertname%3DMaintainDBUsersManyErrors
[22:52:58] Cloud-Services: Streamline cloud services reboots to minimize admin impact - https://phabricator.wikimedia.org/T427338 (BLiviero-WMF) NEW The #Cloud-Services project tag is not intended to have any tasks. Please check the list on https://phabricator.wikimedia.org/project/profile/832/ and replace it with...
[23:46:37] FIRING: ProbeDown: Service toolsbeta-test-k8s-haproxy-7:443 has failed probes (http_admin_beta_toolforge_org_ip6) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[23:51:37] RESOLVED: ProbeDown: Service toolsbeta-test-k8s-haproxy-7:443 has failed probes (http_admin_beta_toolforge_org_ip6) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown