[00:00:38] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
[00:04:50] FIRING: NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirt1061 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown
[00:09:50] RESOLVED: NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirt1061 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown
[00:10:08] !log raymond-ndibe@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api
[00:20:18] (approved) raymond-ndibe: jobs-api: use webservice image variants in one-off job tests [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1115 (https://phabricator.wikimedia.org/T415322)
[00:20:24] (update) raymond-ndibe: jobs-api: use webservice image variants in one-off job tests [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1115 (https://phabricator.wikimedia.org/T415322)
[00:21:33] andrew@cloudcumin1001 safe_reboot (PID 92056) is awaiting input
[00:21:54] (merge) raymond-ndibe: jobs-api: use webservice image variants in one-off job tests [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1115 (https://phabricator.wikimedia.org/T415322)
[00:21:56] (update) raymond-ndibe: jobs-api: test continuous job publish [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1116 (https://phabricator.wikimedia.org/T388092)
[00:22:28] (update) raymond-ndibe: jobs-api: bump to 0.0.494-20260519232221-965db235 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1261 (https://phabricator.wikimedia.org/T415322) (owner: group_203_bot_3c0afd0d9fd9529f3b7bc7e69a4a3bce)
[00:23:22] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
[00:34:13] !log raymond-ndibe@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
[00:34:25] !log raymond-ndibe@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
[00:41:22] (open) komla: add wiki link to emailer [repos/cloud/wmcs/cloud-survey-ops] - https://gitlab.wikimedia.org/repos/cloud/wmcs/cloud-survey-ops/-/merge_requests/5
[00:41:23] (approved) komla: add wiki link to emailer [repos/cloud/wmcs/cloud-survey-ops] - https://gitlab.wikimedia.org/repos/cloud/wmcs/cloud-survey-ops/-/merge_requests/5
[00:41:26] (merge) komla: add wiki link to emailer [repos/cloud/wmcs/cloud-survey-ops] - https://gitlab.wikimedia.org/repos/cloud/wmcs/cloud-survey-ops/-/merge_requests/5
[00:43:46] Toolforge, tools-platform-team: jobs-api: avoid using "runtime"/"storage" in jobs load response message - https://phabricator.wikimedia.org/T423891#11938827 (Raymond_Ndibe) In progress→Resolved
[00:46:33] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component webservice-cli
[00:47:13] !log raymond-ndibe@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
[00:47:28] (update) raymond-ndibe: jobs-api: bump to 0.0.494-20260519232221-965db235 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1261 (https://phabricator.wikimedia.org/T415322) (owner: group_203_bot_3c0afd0d9fd9529f3b7bc7e69a4a3bce)
[00:47:29] (approved) raymond-ndibe: jobs-api: bump to 0.0.494-20260519232221-965db235 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1261 (https://phabricator.wikimedia.org/T415322) (owner: group_203_bot_3c0afd0d9fd9529f3b7bc7e69a4a3bce)
[00:47:39] (merge) raymond-ndibe: jobs-api: bump to 0.0.494-20260519232221-965db235 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1261 (https://phabricator.wikimedia.org/T415322) (owner: group_203_bot_3c0afd0d9fd9529f3b7bc7e69a4a3bce)
[00:47:43] !log raymond-ndibe@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component webservice-cli
[00:56:40] Toolforge, tools-platform-team: [jobs-api] Create storage layer, and save business models in persistent storage - https://phabricator.wikimedia.org/T359650#11938849 (Raymond_Ndibe) ` [Cloud-announce] Jobs created via k8s CLIs/APIs will stop showing in toolforge jobs after June 20 The Toolforge Jobs Fram...
[00:57:18] !log raymond-ndibe@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component webservice-cli
[00:58:50] !log raymond-ndibe@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component webservice-cli
[00:59:06] (approved) raymond-ndibe: d/changelog: bump to 0.103.22 [repos/cloud/toolforge/webservice-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/webservice-cli/-/merge_requests/106 (https://phabricator.wikimedia.org/T423005)
[00:59:07] (update) raymond-ndibe: d/changelog: bump to 0.103.22 [repos/cloud/toolforge/webservice-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/webservice-cli/-/merge_requests/106 (https://phabricator.wikimedia.org/T423005)
[01:02:19] (merge) raymond-ndibe: d/changelog: bump to 0.103.22 [repos/cloud/toolforge/webservice-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/webservice-cli/-/merge_requests/106 (https://phabricator.wikimedia.org/T423005)
[01:24:16] andrew@cloudcumin1001 safe_reboot (PID 92056) is awaiting input
[01:26:18] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.safe_reboot (exit_code=99) on hosts matched by 'P{O:wmcs::openstack::eqiad1::virt_ceph} AND NOT P{F:kernelversion = 6.12.88}'
[01:33:51] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.safe_reboot on hosts matched by 'P{O:wmcs::openstack::eqiad1::virt_ceph} AND NOT P{F:kernelversion = 6.12.88}'
[01:37:37] FIRING: ProbeDown: Service toolsbeta-test-k8s-haproxy-7:443 has failed probes (http_admin_beta_toolforge_org_ip6) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[01:38:19] Toolforge, tools-platform-team: webservice start should give a better error message when a conflicting job exists - https://phabricator.wikimedia.org/T423005#11938902 (Raymond_Ndibe) Open→Resolved
[01:38:58] Toolforge, tools-platform-team, Patch-For-Review: [jobs-api] Use the same images as webservice - https://phabricator.wikimedia.org/T415322#11938904 (Raymond_Ndibe) In progress→Resolved
[01:47:37] RESOLVED: ProbeDown: Service toolsbeta-test-k8s-haproxy-7:443 has failed probes (http_admin_beta_toolforge_org_ip6) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[01:48:49] FIRING: NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirt1063 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown
[01:53:49] RESOLVED: NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirt1063 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown
[02:08:49] FIRING: NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirt1064 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown
[02:13:49] RESOLVED: NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirt1064 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown
[03:01:04] cloud-services-team, Data-Services: filerevision view should not filter out deleted file revisions - https://phabricator.wikimedia.org/T426804 (JJMC89) NEW
[03:11:51] cloud-services-team, Data-Services: filerevision view should not filter out deleted file revisions - https://phabricator.wikimedia.org/T426804#11938967 (JJMC89)
[03:11:53] cloud-services-team, Data-Services, Patch-For-Review: Drop old image tables from wikireplicas - https://phabricator.wikimedia.org/T425191#11938968 (JJMC89)
[03:15:48] (PS1) ArielGlenn: Include the mobile app repos by default when MW and services is searched [labs/codesearch] - https://gerrit.wikimedia.org/r/1289462 (https://phabricator.wikimedia.org/T426627)
[03:46:37] FIRING: ProbeDown: Service toolsbeta-test-k8s-haproxy-7:443 has failed probes (http_admin_beta_toolforge_org_ip6) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[03:56:37] RESOLVED: ProbeDown: Service toolsbeta-test-k8s-haproxy-7:443 has failed probes (http_admin_beta_toolforge_org_ip6) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[03:56:49] FIRING: NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirt1071 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown
[04:01:49] RESOLVED: [2x] NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirt1070 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown
[07:05:37] FIRING: ProbeDown: Service toolsbeta-test-k8s-haproxy-7:443 has failed probes (http_admin_beta_toolforge_org_ip6) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[07:06:10] (open) filippo: k8s: adjust default memory requests to 64Mb [repos/cloud/toolforge/webservice-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/webservice-cli/-/merge_requests/107 (https://phabricator.wikimedia.org/T420565)
[07:15:37] RESOLVED: ProbeDown: Service toolsbeta-test-k8s-haproxy-7:443 has failed probes (http_admin_beta_toolforge_org_ip6) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[07:41:47] (merge) filippo: k8s: adjust default memory requests to 64Mb [repos/cloud/toolforge/webservice-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/webservice-cli/-/merge_requests/107 (https://phabricator.wikimedia.org/T420565)
[07:45:22] (open) filippo: d/changelog: bump to 0.103.23 [repos/cloud/toolforge/webservice-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/webservice-cli/-/merge_requests/108 (https://phabricator.wikimedia.org/T420565)
[07:53:01] cloud-services-team (Hardware), Cloud-VPS: wmcs codfw hardware changes proposal - https://phabricator.wikimedia.org/T377568#11939368 (fgiunchedi) >>! In T377568#11931844, @Andrew wrote: > After some discussion today, I propose that we just switch off and decom cloudnet200[78]-dev. SGTM
[08:11:37] FIRING: ProbeDown: Service toolsbeta-test-k8s-haproxy-7:443 has failed probes (http_admin_beta_toolforge_org_ip6) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[08:11:54] (open) samwilson: Update GitHub URLs to GitLab ones [toolforge-repos/ocr] - https://gitlab.wikimedia.org/toolforge-repos/ocr/-/merge_requests/8 (https://phabricator.wikimedia.org/T420317)
[08:16:37] RESOLVED: ProbeDown: Service toolsbeta-test-k8s-haproxy-7:443 has failed probes (http_admin_beta_toolforge_org_ip6) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[08:18:37] FIRING: ProbeDown: Service toolsbeta-test-k8s-haproxy-7:443 has failed probes (http_admin_beta_toolforge_org_ip6) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[08:23:37] RESOLVED: ProbeDown: Service toolsbeta-test-k8s-haproxy-7:443 has failed probes (http_admin_beta_toolforge_org_ip6) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[08:24:07] Tool-wmf-openapi-linter, MW-Interfaces-Team, OKR-Work: Improve and standardize linter messages - https://phabricator.wikimedia.org/T426821 (KBach) NEW
[08:24:49] Tool-wmf-openapi-linter, MW-Interfaces-Team, OKR-Work: Improve and standardize linter messages - https://phabricator.wikimedia.org/T426821#11939433 (KBach) Open→In progress p:Triage→Medium
[08:26:10] !log filippo@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component webservice-cli
[08:26:11] !log filippo@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component webservice-cli
[08:26:22] !log filippo@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component webservice-cli
[08:27:38] !log filippo@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component webservice-cli
[08:36:33] Tool-wikimedia-attribution: Explore analytics options for the Wikimedia Attribution Framework site - https://phabricator.wikimedia.org/T426738#11939472 (Sarai-WMF) a:Sarai-WMF
[08:36:54] Tool-wikimedia-attribution: Explore analytics options for the Wikimedia Attribution Framework site - https://phabricator.wikimedia.org/T426738#11939473 (Sarai-WMF)
[08:41:59] Tool-wikimedia-attribution: Explore analytics options for the Wikimedia Attribution Framework site - https://phabricator.wikimedia.org/T426738#11939494 (Sarai-WMF) Status update: We validated, with help from the Data platform team, that the Attribution Framework site can be added to Wikimedia's Matomo instan...
[08:47:12] (close) filippo: d/changelog: bump to 0.103.23 [repos/cloud/toolforge/webservice-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/webservice-cli/-/merge_requests/108 (https://phabricator.wikimedia.org/T420565)
[08:51:34] Tool-wmf-openapi-linter, MW-Interfaces-Team, OKR-Work: Fix linter rule duplicates - https://phabricator.wikimedia.org/T426825 (KBach) NEW
[08:52:04] Tool-wmf-openapi-linter, MW-Interfaces-Team, OKR-Work: Fix linter rule duplicates - https://phabricator.wikimedia.org/T426825#11939536 (KBach) p:Triage→Low
[08:56:37] FIRING: ProbeDown: Service toolsbeta-test-k8s-haproxy-7:443 has failed probes (http_admin_beta_toolforge_org_ip6) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[08:58:53] (open) filippo: d/changelog: bump to 0.103.23 [repos/cloud/toolforge/webservice-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/webservice-cli/-/merge_requests/109 (https://phabricator.wikimedia.org/T420565)
[09:08:54] cloud-services-team, collaboration-services, GitLab (CI & Job Runners): webservice-cli package deb gitlab CI job went from 9 minutes to 27 minutes - https://phabricator.wikimedia.org/T426827 (fgiunchedi) NEW
[09:15:08] Toolforge, tools-platform-team: [jobs-api] Create storage layer, and save business models in persistent storage - https://phabricator.wikimedia.org/T359650#11939627 (fnegri) @Raymond_Ndibe the draft looks good, I'm ok with sending it today. A few minor fixes: * with toolforge -> with Toolforge * toolforg...
[09:16:37] RESOLVED: ProbeDown: Service toolsbeta-test-k8s-haproxy-7:443 has failed probes (http_admin_beta_toolforge_org_ip6) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[09:40:22] (merge) samwilson: Update GitHub URLs to GitLab ones [toolforge-repos/ocr] - https://gitlab.wikimedia.org/toolforge-repos/ocr/-/merge_requests/8 (https://phabricator.wikimedia.org/T420317)
[09:55:31] cloud-services-team (FY2025/2026-Q3-Q4), Data-Services, tools-platform-team, Data-Persistence: Install a clouddb hosts with Debian Trixie - https://phabricator.wikimedia.org/T415165#11939752 (fnegri) I'm reimaging clouddb1015 to trixie today, sorry for the delay.
[09:55:52] cloud-services-team (FY2025/2026-Q3-Q4), Data-Services, tools-platform-team, Data-Persistence: Install a clouddb hosts with Debian Trixie - https://phabricator.wikimedia.org/T415165#11939754 (fnegri) Open→In progress
[10:03:07] cloud-services-team (FY2025/2026-Q3-Q4), Data-Services, tools-platform-team, Data-Persistence: Install a clouddb hosts with Debian Trixie - https://phabricator.wikimedia.org/T415165#11939797 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by fnegri@cumin1003 for host clouddb...
[10:04:11] cloud-services-team, Data-Services, tools-platform-team: [wikireplicas] Upgrade clouddbs to 10.11.16 - https://phabricator.wikimedia.org/T422527#11939798 (fnegri)
[10:27:30] cloud-services-team (FY2025/2026-Q3-Q4), Data-Services, tools-platform-team, Data-Persistence: Install a clouddb hosts with Debian Trixie - https://phabricator.wikimedia.org/T415165#11939876 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by fnegri@cumin1003 for host clouddb1015...
[10:27:34] FIRING: TargetDown: Job app is unreachable in project quarry instance quarry.wmcloud.org:443 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTargetDown
[10:27:45] FIRING: QuarryDown: Quarry application is unreachable - https://prometheus-alerts.wmcloud.org/?q=alertname%3DQuarryDown
[10:28:41] cloud-services-team (FY2025/2026-Q3-Q4), Data-Services, tools-platform-team, Data-Persistence: Install a clouddb hosts with Debian Trixie - https://phabricator.wikimedia.org/T415165#11939877 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by fnegri@cumin1003 for host clouddb...
[10:32:34] RESOLVED: TargetDown: Job app is unreachable in project quarry instance quarry.wmcloud.org:443 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTargetDown
[10:32:45] RESOLVED: QuarryDown: Quarry application is unreachable - https://prometheus-alerts.wmcloud.org/?q=alertname%3DQuarryDown
[10:37:32] FIRING: TargetDown: Job flask is unreachable in project quarry instance quarry.wmcloud.org:443 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTargetDown
[10:40:45] FIRING: QuarryDown: Quarry application is unreachable - https://prometheus-alerts.wmcloud.org/?q=alertname%3DQuarryDown
[10:42:32] FIRING: [2x] TargetDown: Job app is unreachable in project quarry instance quarry.wmcloud.org:443 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTargetDown
[10:45:26] (PS1) Majavah: wmcs_libs: Add class to interact with BIRD [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1289911
[10:45:26] (PS1) Majavah: wmcs_libs: inventory: Add data for cloudlb hosts [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1289912
[10:45:26] (PS1) Majavah: wmcs_libs: Add batch class for interacting with cloudlb nodes [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1289913
[10:45:27] (PS1) Majavah: openstack: Add cookbook to reboot a cloudlb node [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1289914 (https://phabricator.wikimedia.org/T348841)
[10:46:08] !log taavi@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudlb.safe_reboot on hosts matched by 'A:cloudlb AND A:CODFW'
[10:46:08] !log taavi@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudlb.safe_reboot (exit_code=99) on hosts matched by 'A:cloudlb AND A:CODFW'
[10:46:46] (PS2) Majavah: wmcs_libs: Add batch class for interacting with cloudlb nodes [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1289913
[10:46:46] (PS2) Majavah: openstack: Add cookbook to reboot a cloudlb node [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1289914 (https://phabricator.wikimedia.org/T348841)
[10:46:54] !log taavi@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudlb.safe_reboot on hosts matched by 'A:cloudlb AND A:codfw'
[10:47:32] RESOLVED: [2x] TargetDown: Job app is unreachable in project quarry instance quarry.wmcloud.org:443 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTargetDown
[10:49:43] (CR) CI reject: [V:-1] wmcs_libs: inventory: Add data for cloudlb hosts [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1289912 (owner: Majavah)
[10:50:12] (CR) CI reject: [V:-1] wmcs_libs: Add batch class for interacting with cloudlb nodes [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1289913 (owner: Majavah)
[10:50:20] (CR) CI reject: [V:-1] wmcs_libs: Add class to interact with BIRD [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1289911 (owner: Majavah)
[10:50:36] (CR) CI reject: [V:-1] wmcs_libs: Add batch class for interacting with cloudlb nodes [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1289913 (owner: Majavah)
[10:50:45] RESOLVED: QuarryDown: Quarry application is unreachable - https://prometheus-alerts.wmcloud.org/?q=alertname%3DQuarryDown
[10:50:49] (CR) CI reject: [V:-1] openstack: Add cookbook to reboot a cloudlb node [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1289914 (https://phabricator.wikimedia.org/T348841) (owner: Majavah)
[10:53:40] !log taavi@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudlb.safe_reboot (exit_code=99) on hosts matched by 'A:cloudlb AND A:codfw'
[10:56:00] (PS2) Majavah: wmcs_libs: Add class to interact with BIRD [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1289911
[10:56:00] (PS2) Majavah: wmcs_libs: inventory: Add data for cloudlb hosts [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1289912
[10:56:01] (PS3) Majavah: wmcs_libs: Add batch class for interacting with cloudlb nodes [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1289913
[10:56:01] (PS3) Majavah: openstack: Add cookbook to reboot a cloudlb node [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1289914 (https://phabricator.wikimedia.org/T348841)
[10:56:13] !log taavi@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudlb.safe_reboot on hosts matched by 'A:cloudlb AND A:codfw'
[10:57:05] Tool-wmf-openapi-linter, MW-Interfaces-Team, OKR-Work: Fix property example validation - https://phabricator.wikimedia.org/T426836 (KBach) NEW
[10:57:09] (update) fnegri: loki.py: fix logs ordering [repos/cloud/toolforge/logs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/logs-api/-/merge_requests/18 (https://phabricator.wikimedia.org/T401552) (owner: raymond-ndibe)
[10:57:10] Tool-wmf-openapi-linter, MW-Interfaces-Team, OKR-Work: Fix property example validation - https://phabricator.wikimedia.org/T426836#11939961 (KBach) p:Triage→Medium
[10:59:25] (CR) CI reject: [V:-1] wmcs_libs: Add batch class for interacting with cloudlb nodes [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1289913 (owner: Majavah)
[10:59:30] (CR) CI reject: [V:-1] wmcs_libs: inventory: Add data for cloudlb hosts [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1289912 (owner: Majavah)
[10:59:47] (CR) CI reject: [V:-1] wmcs_libs: Add class to interact with BIRD [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1289911 (owner: Majavah)
[11:00:11] (CR) CI reject: [V:-1] openstack: Add cookbook to reboot a cloudlb node [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1289914 (https://phabricator.wikimedia.org/T348841) (owner: Majavah)
[11:02:01] cloud-services-team, Cloud-VPS: anycast-healthchecker fails to start on boot - https://phabricator.wikimedia.org/T426837 (taavi) NEW
[11:02:30] (approved) fnegri: loki.py: fix logs ordering [repos/cloud/toolforge/logs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/logs-api/-/merge_requests/18 (https://phabricator.wikimedia.org/T401552) (owner: raymond-ndibe)
[11:02:31] (update) fnegri: loki.py: fix logs ordering [repos/cloud/toolforge/logs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/logs-api/-/merge_requests/18 (https://phabricator.wikimedia.org/T401552) (owner: raymond-ndibe)
[11:02:55] Tool-wmf-openapi-linter, MW-Interfaces-Team, OKR-Work: Fix property example validation - https://phabricator.wikimedia.org/T426836#11939995 (KBach)
[11:05:32] FIRING: TargetDown: Job app is unreachable in project quarry instance quarry.wmcloud.org:443 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTargetDown
[11:05:43] FIRING: QuarryDown: Quarry application is unreachable - https://prometheus-alerts.wmcloud.org/?q=alertname%3DQuarryDown
[11:09:56] cloud-services-team (FY2025/2026-Q3-Q4), Data-Services, tools-platform-team, Data-Persistence: Install a clouddb hosts with Debian Trixie - https://phabricator.wikimedia.org/T415165#11940016 (fnegri) Reimage completed, Mariadb is running, but puppet is failing with: ` E: Unable to locate package...
[11:10:18] Tool-wmf-openapi-linter, MW-Interfaces-Team, OKR-Work: Fix property example validation - https://phabricator.wikimedia.org/T426836#11940017 (KBach)
[11:10:32] RESOLVED: TargetDown: Job app is unreachable in project quarry instance quarry.wmcloud.org:443 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTargetDown
[11:10:43] RESOLVED: QuarryDown: Quarry application is unreachable - https://prometheus-alerts.wmcloud.org/?q=alertname%3DQuarryDown
[11:12:37] cloud-services-team (FY2025/2026-Q3-Q4), Data-Services, tools-platform-team, Data-Persistence: Install a clouddb hosts with Debian Trixie - https://phabricator.wikimedia.org/T415165#11940026 (fnegri) ` fnegri@apt1002:~$ sudo -i reprepro copy trixie-wikimedia bookworm-wikimedia wmf-pt-kill `
[11:15:40] !log taavi@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudlb.safe_reboot (exit_code=0) on hosts matched by 'A:cloudlb AND A:codfw'
[11:15:50] cloud-services-team (FY2025/2026-Q3-Q4), Data-Services, tools-platform-team, Data-Persistence: Install a clouddb hosts with Debian Trixie - https://phabricator.wikimedia.org/T415165#11940095 (fnegri) There's a broken dependency: ` The following packages have unmet dependencies: wmf-pt-kill : De...
[11:16:25] !log taavi@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudlb.safe_reboot on hosts matched by 'A:cloudlb AND A:eqiad'
[11:16:37] FIRING: ProbeDown: Service toolsbeta-test-k8s-haproxy-7:443 has failed probes (http_admin_beta_toolforge_org_ip6) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[11:18:17] FIRING: JobUnavailable: Reduced availability for job cloudlb-haproxy in cloud@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[11:21:37] RESOLVED: ProbeDown: Service toolsbeta-test-k8s-haproxy-7:443 has failed probes (http_admin_beta_toolforge_org_ip6) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[11:21:45] cloud-services-team (FY2025/2026-Q3-Q4), Data-Services, tools-platform-team, Data-Persistence: Install a clouddb hosts with Debian Trixie - https://phabricator.wikimedia.org/T415165#11940114 (jcrespo) Percona Toolkit, which this package is a fork of, doesn't depend on that for trixie: https://pac...
[11:21:57] (PS3) Majavah: wmcs_libs: Add class to interact with BIRD [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1289911
[11:21:57] (PS3) Majavah: wmcs_libs: inventory: Add data for cloudlb hosts [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1289912
[11:21:57] (PS4) Majavah: wmcs_libs: Add batch class for interacting with cloudlb nodes [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1289913
[11:21:57] (PS4) Majavah: openstack: Add cookbook to reboot a cloudlb node [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1289914 (https://phabricator.wikimedia.org/T348841)
[11:27:32] FIRING: [2x] TargetDown: Job app is unreachable in project quarry instance quarry.wmcloud.org:443 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTargetDown
[11:27:43] FIRING: QuarryDown: Quarry application is unreachable - https://prometheus-alerts.wmcloud.org/?q=alertname%3DQuarryDown
[11:28:07] !log taavi@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudlb.safe_reboot (exit_code=0) on hosts matched by 'A:cloudlb AND A:eqiad'
[11:28:17] RESOLVED: JobUnavailable: Reduced availability for job cloudlb-haproxy in cloud@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[11:29:05] cloud-services-team (FY2025/2026-Q3-Q4), Data-Services, tools-platform-team, Data-Persistence: Install a clouddb hosts with Debian Trixie - https://phabricator.wikimedia.org/T415165#11940128 (fnegri) I found the source at https://gerrit.wikimedia.org/r/q/project:operations/debs/wmf-pt-kill but I...
[11:32:32] RESOLVED: [2x] TargetDown: Job app is unreachable in project quarry instance quarry.wmcloud.org:443 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTargetDown
[11:32:43] RESOLVED: QuarryDown: Quarry application is unreachable - https://prometheus-alerts.wmcloud.org/?q=alertname%3DQuarryDown
[11:36:33] cloud-services-team (FY2025/2026-Q3-Q4), Data-Services, tools-platform-team, Data-Persistence: Install a clouddb hosts with Debian Trixie - https://phabricator.wikimedia.org/T415165#11940136 (Marostegui) I will take a look
[11:39:36] (update) filippo: d/changelog: bump to 0.103.23 [repos/cloud/toolforge/webservice-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/webservice-cli/-/merge_requests/109 (https://phabricator.wikimedia.org/T420565)
[11:40:11] (update) filippo: d/changelog: bump to 0.103.24 [repos/cloud/toolforge/webservice-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/webservice-cli/-/merge_requests/109 (https://phabricator.wikimedia.org/T420565)
[11:48:22] (PS1) Majavah: wmcs_libs: bird: Downtime BGP alerts when needed [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1289934
[11:49:33] (PS2) Majavah: wmcs_libs: bird: Downtime BGP alerts when needed [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1289934
[11:49:48] (PS3) Majavah: wmcs_libs: bird: Downtime BGP alerts when needed [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1289934 (https://phabricator.wikimedia.org/T348841)
[11:49:53] !log taavi@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudlb.safe_reboot on hosts matched by 'D{cloudlb2002-dev.codfw.wmnet}'
[11:52:31] Cloud-VPS (Quota-requests), WMIT-Infrastructure: Quota increase request for project osmit - https://phabricator.wikimedia.org/T426790#11940251 (Danysan1)
[11:53:31] (CR) CI reject: [V:-1] wmcs_libs: bird: Downtime BGP alerts when needed [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1289934 (https://phabricator.wikimedia.org/T348841) (owner: Majavah)
[11:54:32] FIRING: [2x] TargetDown: Job app is unreachable in project quarry instance quarry.wmcloud.org:443 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTargetDown
[11:54:43] FIRING: QuarryDown: Quarry application is unreachable - https://prometheus-alerts.wmcloud.org/?q=alertname%3DQuarryDown
[11:55:15] cloud-services-team, Toolforge, tools-platform-team, Patch-For-Review: Audit tools memory requests vs actual usage - https://phabricator.wikimedia.org/T420565#11940263 (fgiunchedi) Status update: I tried deploying the memory request 64mb change though toolsbeta said no due to limitrange ` toolsb...
[11:56:12] (PS4) Majavah: wmcs_libs: bird: Downtime BGP alerts when needed [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1289934 (https://phabricator.wikimedia.org/T348841)
[11:56:26] !log taavi@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudlb.safe_reboot (exit_code=0) on hosts matched by 'D{cloudlb2002-dev.codfw.wmnet}'
[11:57:20] cloud-services-team, Cloud-VPS, Cloud-Services-Origin-Alert, Cloud-Services-Worktype-Unplanned, Patch-For-Review: [wmcs-cookbooks] add a cookbook to reboot a cloudservices/cloudlb host - https://phabricator.wikimedia.org/T348841#11940266 (taavi) a:taavi
[11:59:32] FIRING: [2x] TargetDown: Job app is unreachable in project quarry instance quarry.wmcloud.org:443 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTargetDown
[12:00:11] (CR) CI reject: [V:-1] wmcs_libs: bird: Downtime BGP alerts when needed [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1289934 (https://phabricator.wikimedia.org/T348841) (owner: Majavah)
[12:02:48] (PS5) Majavah: wmcs_libs: bird: Downtime BGP alerts when needed [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1289934 (https://phabricator.wikimedia.org/T348841)
[12:04:32] RESOLVED: TargetDown: Job app is unreachable in project quarry instance quarry.wmcloud.org:443 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTargetDown
[12:04:43] RESOLVED: QuarryDown: Quarry application is unreachable - https://prometheus-alerts.wmcloud.org/?q=alertname%3DQuarryDown
[12:13:23] (update) filippo: Lower LimitRange for cpu/memory requests [repos/cloud/toolforge/maintain-kubeusers] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-kubeusers/-/merge_requests/88 (https://phabricator.wikimedia.org/T420565)
[12:13:57] (open) filippo: Lower LimitRange for cpu/memory requests [repos/cloud/toolforge/maintain-kubeusers] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-kubeusers/-/merge_requests/88 (https://phabricator.wikimedia.org/T420565)
[12:34:22] (merge) filippo: d/changelog: bump to 0.103.24 [repos/cloud/toolforge/webservice-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/webservice-cli/-/merge_requests/109 (https://phabricator.wikimedia.org/T420565)
[12:36:10] (update) filippo: Lower LimitRange for cpu/memory requests [repos/cloud/toolforge/maintain-kubeusers] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-kubeusers/-/merge_requests/88 (https://phabricator.wikimedia.org/T420565)
[12:36:25] (update) filippo: Lower LimitRange for cpu/memory requests [repos/cloud/toolforge/maintain-kubeusers] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-kubeusers/-/merge_requests/88 (https://phabricator.wikimedia.org/T420565)
[12:37:57] FIRING: ProbeDown: Service dumps-https:443 has failed probes (http_dumps-https_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#dumps-https:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[12:40:05] PROBLEM - toolschecker: Make sure enwiki dumps are not empty on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 504 Gateway Time-out - string OK not found on http://checker.tools.wmflabs.org:80/nfs/dumps - 324 bytes in 60.007 second response time https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Toolschecker
[12:41:14] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.safe_reboot (exit_code=99) on hosts matched by 'P{O:wmcs::openstack::eqiad1::virt_ceph} AND NOT P{F:kernelversion = 6.12.88}'
[12:41:17] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.safe_reboot on hosts matched by 'P{O:wmcs::openstack::eqiad1::virt_ceph} AND NOT P{F:kernelversion = 6.12.88}'
[12:42:31] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.safe_reboot (exit_code=99) on hosts matched by 'P{O:wmcs::openstack::eqiad1::virt_ceph} AND NOT P{F:kernelversion = 6.12.88}'
[12:42:57] RESOLVED: ProbeDown: Service dumps-https:443 has failed probes (http_dumps-https_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#dumps-https:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[12:43:09] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.safe_reboot on hosts matched by 'P{O:wmcs::openstack::eqiad1::virt_ceph} AND NOT P{F:kernelversion = 6.12.88}'
[12:43:45] RECOVERY - toolschecker: Make sure enwiki dumps are not empty on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 158 bytes in 32.730 second response time https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Toolschecker
[12:48:38] Tool-wmf-openapi-linter, MW-Interfaces-Team, OKR-Work: Improve and standardize linter messages - https://phabricator.wikimedia.org/T426821#11940524 (KBach)
[12:49:26] cloud-services-team, Cloud-VPS, Patch-For-Review: anycast-healthchecker fails to start on boot - https://phabricator.wikimedia.org/T426837#11940528 (fgiunchedi) Prior art at {T314457} FWIW over time my opinion has changed: services that need to be up at reboot and serve traffic from addresses advert...
[12:53:32] cloud-services-team (FY2025/2026-Q3-Q4), Data-Services, tools-platform-team, Data-Persistence: Install a clouddb hosts with Debian Trixie - https://phabricator.wikimedia.org/T415165#11940562 (fnegri) `wikireplicas-utils` was also missing in trixie, in this case a simple copy worked: `lang=shell-...
[12:55:59] cloud-services-team (FY2025/2026-Q3-Q4), Data-Services, tools-platform-team, Data-Persistence: Install a clouddb hosts with Debian Trixie - https://phabricator.wikimedia.org/T415165#11940569 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by fnegri@cumin1003 for host clouddb1015...
[12:56:00] cloud-services-team (FY2025/2026-Q3-Q4), Data-Services, tools-platform-team, Data-Persistence: Install a clouddb hosts with Debian Trixie - https://phabricator.wikimedia.org/T415165#11940570 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by fnegri@cumin1003 for host clouddb1015...
[13:05:35] FIRING: ProbeDown: Service toolsbeta-test-k8s-haproxy-7:443 has failed probes (http_admin_beta_toolforge_org_ip6) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[13:06:52] cloud-services-team (FY2025/2026-Q3-Q4), Data-Services, tools-platform-team, Data-Persistence: Install a clouddb hosts with Debian Trixie - https://phabricator.wikimedia.org/T415165#11940620 (fnegri) The cookbook did actually PASS, but an exception was raised while writing the PASS comment (that...
[13:17:30] FIRING: TargetDown: Job app is unreachable in project quarry instance quarry.wmcloud.org:443 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTargetDown
[13:17:41] FIRING: QuarryDown: Quarry application is unreachable - https://prometheus-alerts.wmcloud.org/?q=alertname%3DQuarryDown
[13:19:38] Cloud-VPS (Quota-requests), WMIT-Infrastructure: Quota increase request for project osmit - https://phabricator.wikimedia.org/T426790#11940705 (taavi) I'm not finding the initial project creation task, so asking here instead. How are these services relevant to the Wikimedia projects?
[13:22:32] RESOLVED: TargetDown: Job app is unreachable in project quarry instance quarry.wmcloud.org:443 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTargetDown
[13:22:43] RESOLVED: QuarryDown: Quarry application is unreachable - https://prometheus-alerts.wmcloud.org/?q=alertname%3DQuarryDown
[13:23:51] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.safe_reboot (exit_code=0) on hosts matched by 'P{O:wmcs::openstack::eqiad1::virt_ceph} AND NOT P{F:kernelversion = 6.12.88}'
[13:24:50] FIRING: NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirt1076 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown
[13:25:37] RESOLVED: ProbeDown: Service toolsbeta-test-k8s-haproxy-7:443 has failed probes (http_admin_beta_toolforge_org_ip6) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[13:29:49] RESOLVED: NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirt1076 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown
[13:40:34] FIRING: TargetDown: Job flask is unreachable in project quarry instance quarry.wmcloud.org:443 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTargetDown
[13:41:43] Data-Services, tools-platform-team, Data-Persistence: Install a clouddb hosts with Debian Trixie - https://phabricator.wikimedia.org/T415165#11940898 (fnegri) In progress→Resolved clouddb1015 is running on Trixie and repooled.
[13:42:00] Data-Services, tools-platform-team, Data-Persistence: Install a clouddb host with Debian Trixie - https://phabricator.wikimedia.org/T415165#11940904 (fnegri)
[13:42:22] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q2:rack/setup/install clouddb1026-1033 - https://phabricator.wikimedia.org/T409162#11940907 (Marostegui)
[13:43:18] (update) fnegri: Lower LimitRange for cpu/memory requests [repos/cloud/toolforge/maintain-kubeusers] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-kubeusers/-/merge_requests/88 (https://phabricator.wikimedia.org/T420565) (owner: filippo)
[13:43:19] (approved) fnegri: Lower LimitRange for cpu/memory requests [repos/cloud/toolforge/maintain-kubeusers] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-kubeusers/-/merge_requests/88 (https://phabricator.wikimedia.org/T420565) (owner: filippo)
[13:45:34] RESOLVED: TargetDown: Job flask is unreachable in project quarry instance quarry.wmcloud.org:443 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTargetDown
[13:47:32] FIRING: TargetDown: Job app is unreachable in project quarry instance quarry.wmcloud.org:443 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTargetDown
[13:47:43] FIRING: QuarryDown: Quarry application is unreachable - https://prometheus-alerts.wmcloud.org/?q=alertname%3DQuarryDown
[13:52:32] RESOLVED: TargetDown: Job app is unreachable in project quarry instance quarry.wmcloud.org:443 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTargetDown
[13:52:43] RESOLVED: QuarryDown: Quarry application is unreachable - https://prometheus-alerts.wmcloud.org/?q=alertname%3DQuarryDown
[13:59:00] Cloud-VPS (Quota-requests), WMIT-Infrastructure: Quota increase request for project osmit - https://phabricator.wikimedia.org/T426790#11941013 (Aklapper) > I'm not finding the initial project creation task I'd assume that was https://wikitech.wikimedia.org/wiki/Obsolete:New_Project_Request/osmit_or_osmi...
[14:36:19] cloud-services-team, PAWS, tools-platform-team: New upstream release for Pywikibot - https://phabricator.wikimedia.org/T426546#11941165 (aputhin)
[14:37:32] cloud-services-team, Toolforge, tools-platform-team: New upstream release for Pywikibot - https://phabricator.wikimedia.org/T426547#11941176 (aputhin)
[14:38:41] cloud-services-team, Toolforge: [toolsbeta] probe flapping on ipv6 only - https://phabricator.wikimedia.org/T426584#11941181 (aputhin) p:Triage→Low
[14:44:00] cloud-services-team, Toolforge, tools-platform-team: Tools may not allow non-interactive commands via 'become' due to dotfile configuration - https://phabricator.wikimedia.org/T426378#11941225 (aputhin) p:Triage→Low We can take a look at a better fix for this edge case, but it doesn't seem li...
[14:47:48] cloud-services-team, Toolforge, tools-platform-team, Patch-For-Review: [builds-api] expose supported versions - https://phabricator.wikimedia.org/T422046#11941249 (aputhin) p:Triage→Low
[15:20:35] (open) raymond-ndibe: status: add StatusShort enum and basic job status detection [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/299 (https://phabricator.wikimedia.org/T401172)
[15:21:16] (update) raymond-ndibe: status: add StatusShort enum and basic job status detection [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/299 (https://phabricator.wikimedia.org/T401172)
[15:24:15] (update) raymond-ndibe: status: add StatusShort enum and basic job status detection [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/299 (https://phabricator.wikimedia.org/T401172)
[15:26:09] (close) raymond-ndibe: [status] make job status an enum, with clearly defined states [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/208 (https://phabricator.wikimedia.org/T401172)
[15:30:06] (open) countcount: Remove top-level wiki URL mappings from responses [toolforge-repos/multiuserinfo] - https://gitlab.wikimedia.org/toolforge-repos/multiuserinfo/-/merge_requests/1
[15:54:43] cloud-services-team, Cloud-VPS, Patch-For-Review: anycast-healthchecker fails to start on boot - https://phabricator.wikimedia.org/T426837#11941565 (taavi) a:taavi
[15:59:07] Toolforge, tools-platform-team: [jobs-api] Create storage layer, and save business models in persistent storage - https://phabricator.wikimedia.org/T359650#11941587 (Raymond_Ndibe)
[16:11:50] VPS-project-Phabricator, collaboration-services, Infrastructure-Foundations, Mail: @wikimedia.org email addresses don't seem to be receiving emails sent by the test Phabricator instance - https://phabricator.wikimedia.org/T422559#11941650 (A_smart_kitten) >>! In T422559#11938383, @jhathaway wrote...
[16:21:21] cloud-services-team, Data-Services: filerevision view should not filter out deleted file revisions - https://phabricator.wikimedia.org/T426804#11941691 (Ladsgroup) I'd say if #privacy_engineering team is happy, I have no objections.
[16:21:43] cloud-services-team, Data-Services, Privacy Engineering: filerevision view should not filter out deleted file revisions - https://phabricator.wikimedia.org/T426804#11941692 (Ladsgroup)
[16:40:35] FIRING: PuppetCertificateAboutToExpire: Puppet CA certificate Puppet CA: metricsinfra-puppetmaster-1.metricsinfra.eqiad1.wikimedia.cloud is about to expire in 24d 23h 58m 10s - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetCertificateAboutToExpire - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetCertificateAboutToExpire
[16:44:37] FIRING: ProbeDown: Service toolsbeta-test-k8s-haproxy-7:443 has failed probes (http_admin_beta_toolforge_org_ip6) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[16:46:37] Tool-wmf-openapi-linter, MediaWiki-REST-API, MW-Interfaces-Team (MWI-Sprint-34 (2026-05-19 to 2026-06-02)), OKR-Work: [SPIKE] Identify OAD properties not supported by MediaWiki REST Framework - https://phabricator.wikimedia.org/T425942#11941804 (AGhirelli-WMF) a:AGhirelli-WMF
[16:49:37] RESOLVED: ProbeDown: Service toolsbeta-test-k8s-haproxy-7:443 has failed probes (http_admin_beta_toolforge_org_ip6) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[18:13:02] (update) danyya: Draft: Resolve T422343 "Integrate Denelezh and merge stats tables" [toolforge-repos/humaniki] - https://gitlab.wikimedia.org/toolforge-repos/humaniki/-/merge_requests/56
[19:05:17] VPS-project-Phabricator, collaboration-services, Infrastructure-Foundations, Mail: @wikimedia.org email addresses don't seem to be receiving emails sent by the test Phabricator instance - https://phabricator.wikimedia.org/T422559#11942353 (Dzahn) Thanks for the root cause @jhathaway! Well, I gu...
[19:34:26] (open) countcount: Add GitLab CI validation [toolforge-repos/multiuserinfo] - https://gitlab.wikimedia.org/toolforge-repos/multiuserinfo/-/merge_requests/2
[19:36:27] (update) countcount: Add GitLab CI validation [toolforge-repos/multiuserinfo] - https://gitlab.wikimedia.org/toolforge-repos/multiuserinfo/-/merge_requests/2
[19:37:38] (update) countcount: Add GitLab CI validation [toolforge-repos/multiuserinfo] - https://gitlab.wikimedia.org/toolforge-repos/multiuserinfo/-/merge_requests/2
[19:39:31] (update) countcount: Add GitLab CI validation [toolforge-repos/multiuserinfo] - https://gitlab.wikimedia.org/toolforge-repos/multiuserinfo/-/merge_requests/2
[19:42:00] (update) countcount: Add GitLab CI validation [toolforge-repos/multiuserinfo] - https://gitlab.wikimedia.org/toolforge-repos/multiuserinfo/-/merge_requests/2
[19:43:05] (update) countcount: Add GitLab CI validation [toolforge-repos/multiuserinfo] - https://gitlab.wikimedia.org/toolforge-repos/multiuserinfo/-/merge_requests/2
[19:43:12] (merge) countcount: Add GitLab CI validation [toolforge-repos/multiuserinfo] - https://gitlab.wikimedia.org/toolforge-repos/multiuserinfo/-/merge_requests/2
[19:43:37] (update) countcount: Remove top-level wiki URL mappings from responses [toolforge-repos/multiuserinfo] - https://gitlab.wikimedia.org/toolforge-repos/multiuserinfo/-/merge_requests/1
[19:53:13] (merge) countcount: Remove top-level wiki URL mappings from responses [toolforge-repos/multiuserinfo] - https://gitlab.wikimedia.org/toolforge-repos/multiuserinfo/-/merge_requests/1
[20:29:36] tool-wdlocator: No data from wikidata is loaded - https://phabricator.wikimedia.org/T426903#11942605 (Strubbl)
[20:57:25] tool-wdlocator: No data from wikidata is loaded - https://phabricator.wikimedia.org/T426903#11942696 (Strubbl) Open→Resolved a:Strubbl It works again.
[21:45:20] VPS-project-Phabricator, collaboration-services, Infrastructure-Foundations, Mail: @wikimedia.org email addresses don't seem to be receiving emails sent by the test Phabricator instance - https://phabricator.wikimedia.org/T422559#11942880 (A_smart_kitten) >>! In T422559#11942353, @Dzahn wrote: >...
[22:48:52] (PS2) Krinkle: write_config: Include mobile apps in preset for MediaWiki & services [labs/codesearch] - https://gerrit.wikimedia.org/r/1289462 (https://phabricator.wikimedia.org/T426627) (owner: ArielGlenn)
[22:49:00] (CR) Krinkle: [C:+2] write_config: Include mobile apps in preset for MediaWiki & services [labs/codesearch] - https://gerrit.wikimedia.org/r/1289462 (https://phabricator.wikimedia.org/T426627) (owner: ArielGlenn)
[22:50:05] (Merged) jenkins-bot: write_config: Include mobile apps in preset for MediaWiki & services [labs/codesearch] - https://gerrit.wikimedia.org/r/1289462 (https://phabricator.wikimedia.org/T426627) (owner: ArielGlenn)
[23:18:41] VPS-project-Codesearch: Allow searching for plain text in CodeSearch - https://phabricator.wikimedia.org/T381325#11943202 (SomeRandomDeveloper) https://gerrit.wikimedia.org/r/c/labs/codesearch/+/1267343 will allow implementing this option.
[23:28:56] (update) raymond-ndibe: tests: fix wrong test data [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/290 (https://phabricator.wikimedia.org/T423880)
[23:33:55] (approved) raymond-ndibe: loki.py: fix logs ordering [repos/cloud/toolforge/logs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/logs-api/-/merge_requests/18 (https://phabricator.wikimedia.org/T401552)
[23:34:01] (merge) raymond-ndibe: loki.py: fix logs ordering [repos/cloud/toolforge/logs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/logs-api/-/merge_requests/18 (https://phabricator.wikimedia.org/T401552)
[23:35:27] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component logs-api
[23:35:31] !log raymond-ndibe@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component logs-api
[23:41:38] (open) group_203_bot_3c0afd0d9fd9529f3b7bc7e69a4a3bce: logs-api: bump to 0.0.24-20260520233418-6d096df8 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1262 (https://phabricator.wikimedia.org/T401552)
[23:42:05] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component logs-api
[23:43:28] (update) raymond-ndibe: core.py: skip one-off jobs when updating storage [repos/cloud/toolforge/jobs-api] (fix_data_handling_tests) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/288 (https://phabricator.wikimedia.org/T423544)
[23:43:41] (update) raymond-ndibe: core.py: skip one-off jobs when updating storage [repos/cloud/toolforge/jobs-api] (fix_data_handling_tests) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/288 (https://phabricator.wikimedia.org/T423544)
[23:47:24] (update) raymond-ndibe: fix jobs load bug [repos/cloud/toolforge/jobs-api] (fix_oneoff_storage_error) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/287 (https://phabricator.wikimedia.org/T423544 https://phabricator.wikimedia.org/T423891)
[23:50:14] !log raymond-ndibe@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component logs-api
[23:54:00] !log raymond-ndibe@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component logs-api
[23:55:02] (update) raymond-ndibe: tests: fix wrong test data [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/290 (https://phabricator.wikimedia.org/T423880)
[23:56:02] (update) raymond-ndibe: core.py: skip one-off jobs when updating storage [repos/cloud/toolforge/jobs-api] (fix_data_handling_tests) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/288 (https://phabricator.wikimedia.org/T423544)
[23:57:06] (update) raymond-ndibe: fix jobs load bug [repos/cloud/toolforge/jobs-api] (fix_oneoff_storage_error) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/287 (https://phabricator.wikimedia.org/T423544)