[00:11:40] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.roll_reboot_osds (T426563)
[00:11:41] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.ceph.roll_reboot_osds (exit_code=99) (T426563)
[00:11:54] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.roll_reboot_osds (T426563)
[00:14:37] FIRING: ProbeDown: Service toolsbeta-test-k8s-haproxy-7:443 has failed probes (http_admin_beta_toolforge_org_ip6) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[00:17:53] !log tools.cluebotng Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/26068076424 (https://github.com/cluebotng/component-configs/commits/f7db7f6fff0d4d6dd451b5f92e75ba755a74129c)
[00:17:55] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng/SAL
[00:18:30] !log tools.cluebotng-review Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/26068076396 (https://github.com/cluebotng/component-configs/commits/f7db7f6fff0d4d6dd451b5f92e75ba755a74129c)
[00:18:34] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-review/SAL
[00:19:37] RESOLVED: ProbeDown: Service toolsbeta-test-k8s-haproxy-7:443 has failed probes (http_admin_beta_toolforge_org_ip6) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[00:34:27] cloud-services-team, Toolforge: Account disabled - https://phabricator.wikimedia.org/T426544#11934066 (Hawkeye7) The tool is milhistbot-stubs running against the English language Wikipedia.
The Bot runs under the Bot account AussieBot, and the user and password have been verified as valid. It always log...
[01:01:45] (PS2) Andrew Bogott: inventory: replace cloudcephmon2004-dev with cloudcephmon2007-dev [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1288982
[01:05:43] (CR) CI reject: [V:-1] inventory: replace cloudcephmon2004-dev with cloudcephmon2007-dev [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1288982 (owner: Andrew Bogott)
[01:16:18] cloud-services-team, Toolforge: Account disabled - https://phabricator.wikimedia.org/T426544#11934106 (Reedy) >>! In T426544#11934066, @Hawkeye7 wrote: > The tool is milhistbot-stubs running against the English language Wikipedia. The Bot runs under the Bot account AussieBot, and the user and password ha...
[01:22:37] (CR) Ladsgroup: [C:+2] "Tested it locally, it looks good." [labs/codesearch] - https://gerrit.wikimedia.org/r/1230251 (owner: Neriah)
[01:24:07] (Merged) jenkins-bot: frontend: Add dark mode support via prefers-color-scheme [labs/codesearch] - https://gerrit.wikimedia.org/r/1230251 (owner: Neriah)
[01:33:51] !log tools.cluebotng-monitoring Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/26070598819 (https://github.com/cluebotng/component-configs/commits/c5378a5646858d02f07ababdb2db2955000ab187)
[01:33:53] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-monitoring/SAL
[01:47:37] !log tools.cluebotng-trainer Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/26071084518 (https://github.com/cluebotng/component-configs/commits/6831fe17857c84db1afb1023198b7b11d444f69f)
[01:47:39] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-trainer/SAL
[02:08:47] !log tools.cluebotng-monitoring Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/26071765151
(https://github.com/cluebotng/component-configs/commits/5184d69b34b8d2442df1fbbd652535f5890b4804)
[02:08:50] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-monitoring/SAL
[02:28:30] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.reboot_node
[02:28:30] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.ceph.reboot_node (exit_code=99)
[02:28:57] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.roll_reboot_osds (exit_code=0) (T426563)
[02:37:26] !log tools.cluebotng-monitoring Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/26072649347 (https://github.com/cluebotng/component-configs/commits/94b7da488f5c23df52fa2e750438a3a80f589f3c)
[02:37:28] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-monitoring/SAL
[02:40:39] RESOLVED: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[03:26:37] FIRING: ProbeDown: Service toolsbeta-test-k8s-haproxy-7:443 has failed probes (http_admin_beta_toolforge_org_ip6) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[03:31:37] RESOLVED: ProbeDown: Service toolsbeta-test-k8s-haproxy-7:443 has failed probes (http_admin_beta_toolforge_org_ip6) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[08:57:58]
cloud-services-team: Automate cluster reboots with cookbooks - https://phabricator.wikimedia.org/T426727 (Volans) NEW
[09:44:17] RESOLVED: JobUnavailable: Reduced availability for job openstack in cloud@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[09:45:29] (update) raymond-ndibe: jobs-api: test for proper handling of the diff variations of the --image argument [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1113 (https://phabricator.wikimedia.org/T414978 https://phabricator.wikimedia.org/T415322)
[09:53:56] PROBLEM - Host cloudvirt1040 is DOWN: PING CRITICAL - Packet loss = 100%
[09:55:15] !log volans@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.safe_reboot (exit_code=0) on hosts matched by 'D{cloudvirt1040.eqiad.wmnet}'
[09:55:26] RECOVERY - Host cloudvirt1040 is UP: PING OK - Packet loss = 0%, RTA = 0.29 ms
[09:56:49] FIRING: NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirt1040 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown
[10:01:49] RESOLVED: NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirt1040 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown
[10:06:52] !log volans@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.safe_reboot on hosts matched by 'D{cloudvirt1041.eqiad.wmnet}'
[10:20:00] PROBLEM - Host cloudvirt1041 is DOWN: PING CRITICAL - Packet loss = 100%
[10:21:26] RECOVERY -
Host cloudvirt1041 is UP: PING OK - Packet loss = 0%, RTA = 0.28 ms
[10:21:27] !log volans@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.safe_reboot (exit_code=0) on hosts matched by 'D{cloudvirt1041.eqiad.wmnet}'
[10:22:40] !log volans@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.safe_reboot on hosts matched by 'D{cloudvirt1042.eqiad.wmnet}'
[10:41:54] PROBLEM - Host cloudvirt1042 is DOWN: PING CRITICAL - Packet loss = 100%
[10:43:04] !log volans@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.safe_reboot (exit_code=0) on hosts matched by 'D{cloudvirt1042.eqiad.wmnet}'
[10:43:26] RECOVERY - Host cloudvirt1042 is UP: PING OK - Packet loss = 0%, RTA = 0.30 ms
[10:50:44] Tool-wikimedia-attribution: Explore analytics options for the Wikimedia Attribution Framework site - https://phabricator.wikimedia.org/T426738 (Sarai-WMF) NEW
[10:54:34] Tool-wikimedia-attribution: Explore analytics options for the Wikimedia Attribution Framework site - https://phabricator.wikimedia.org/T426738#11935580 (Sarai-WMF)
[10:57:17] !log volans@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.safe_reboot on hosts matched by 'D{cloudvirt1043.eqiad.wmnet}'
[11:00:41] FIRING: PrometheusRestarted: Prometheus/cloud restarted: beware monitoring artifacts.
- https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_was_restarted - https://grafana.wikimedia.org/d/GWvEXWDZk/prometheus-server?var-datasource=eqiad%20prometheus%2Fcloud - https://alerts.wikimedia.org/?q=alertname%3DPrometheusRestarted
[11:10:38] Tool-wmf-openapi-linter, MediaWiki-REST-API, MW-Interfaces-Team, OKR-Work: [SPIKE] Identify OAD properties not supported by MediaWiki REST Framework - https://phabricator.wikimedia.org/T425942#11935667 (KBach) p:Medium→High
[11:11:01] (merge) taavi: istio-gateway: Make gateway node binding a hard rule [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1259 (https://phabricator.wikimedia.org/T426321)
[11:11:08] Cloud-VPS, tools-infrastructure-team: Web proxy DNS A record failure when creating development-metrics.wmcloud.org - https://phabricator.wikimedia.org/T426675#11935668 (taavi) Most likely you tried to resolve that name just before the DNS entries were provisioned, and were hit by NXDOMAIN caching?
[11:11:36] !log taavi@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component istio-gateway
[11:11:48] !log taavi@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component istio-gateway
[11:11:50] PROBLEM - Host cloudvirt1043 is DOWN: PING CRITICAL - Packet loss = 100%
[11:11:51] Tool-wmf-openapi-linter, MediaWiki-REST-API, MW-Interfaces-Team (MWI-Sprint-34 (2026-05-19 to 2026-06-02)), OKR-Work: [SPIKE] Identify OAD properties not supported by MediaWiki REST Framework - https://phabricator.wikimedia.org/T425942#11935671 (KBach)
[11:13:14] !log volans@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.safe_reboot (exit_code=0) on hosts matched by 'D{cloudvirt1043.eqiad.wmnet}'
[11:13:26] RECOVERY - Host cloudvirt1043 is UP: PING OK - Packet loss = 0%, RTA = 0.28 ms
[11:14:20] (open) taavi: istio-gateway: Fix affinity syntax [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1260
[11:14:23] (update) taavi: istio-gateway: Fix affinity syntax [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1260
[11:14:36] !log taavi@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component istio-gateway
[11:14:48] !log taavi@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component istio-gateway
[11:15:44] !log taavi@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component istio-gateway
[11:15:44] (update) taavi: istio-gateway: Fix affinity syntax [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1260
[11:15:55] !log taavi@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for
component istio-gateway
[11:16:24] (merge) taavi: istio-gateway: Fix affinity syntax [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1260
[11:16:44] FIRING: MaintainDBUsersManyErrors: Maintain-dbusers is having sustained errors - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/MaintainDBUsersManyErrors - https://grafana.wikimedia.org/d/ae240a06-c13e-49f3-b12c-58432c551e85/wmcs-maintain-dbusers - https://alerts.wikimedia.org/?q=alertname%3DMaintainDBUsersManyErrors
[11:17:06] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component istio-gateway
[11:17:19] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component istio-gateway
[11:17:52] cloud-services-team, Toolforge, Patch-For-Review: [istio-gateway] Deploying the component can cause an outage - https://phabricator.wikimedia.org/T426321#11935711 (taavi) Open→Resolved
[11:21:44] RESOLVED: MaintainDBUsersManyErrors: Maintain-dbusers is having sustained errors - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/MaintainDBUsersManyErrors - https://grafana.wikimedia.org/d/ae240a06-c13e-49f3-b12c-58432c551e85/wmcs-maintain-dbusers - https://alerts.wikimedia.org/?q=alertname%3DMaintainDBUsersManyErrors
[11:25:17] FIRING: JobUnavailable: Reduced availability for job blackbox_https in cloud@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[11:25:41] RESOLVED: PrometheusRestarted: Prometheus/cloud restarted: beware monitoring artifacts.
- https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_was_restarted - https://grafana.wikimedia.org/d/GWvEXWDZk/prometheus-server?var-datasource=eqiad%20prometheus%2Fcloud - https://alerts.wikimedia.org/?q=alertname%3DPrometheusRestarted
[11:25:56] FIRING: PrometheusRestarted: Prometheus/cloud restarted: beware monitoring artifacts. - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_was_restarted - https://grafana.wikimedia.org/d/GWvEXWDZk/prometheus-server?var-datasource=eqiad%20prometheus%2Fcloud - https://alerts.wikimedia.org/?q=alertname%3DPrometheusRestarted
[11:26:11] RESOLVED: PrometheusRestarted: Prometheus/cloud restarted: beware monitoring artifacts. - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_was_restarted - https://grafana.wikimedia.org/d/GWvEXWDZk/prometheus-server?var-datasource=eqiad%20prometheus%2Fcloud - https://alerts.wikimedia.org/?q=alertname%3DPrometheusRestarted
[11:30:11] FIRING: [2x] PrometheusRestarted: Prometheus/cloud restarted: beware monitoring artifacts. - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_was_restarted - https://grafana.wikimedia.org/d/GWvEXWDZk/prometheus-server?var-datasource=eqiad%20prometheus%2Fcloud - https://alerts.wikimedia.org/?q=alertname%3DPrometheusRestarted
[11:30:17] RESOLVED: JobUnavailable: Reduced availability for job blackbox_https in cloud@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[11:41:15] cloud-services-team, Toolforge: [toolsbeta] probe flapping on ipv6 only - https://phabricator.wikimedia.org/T426584#11935794 (taavi) HAProxy has different sockets for listening on v4 and v6, so at first I thought this was going to be something weird with the queue on that socket. However, it doesn't even...
[11:48:52] !log taavi@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudweb.safe_reboot on hosts matched by 'P{O:wmcs::openstack::codfw1dev::cloudweb}'
[11:51:48] !log taavi@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudweb.safe_reboot (exit_code=0) on hosts matched by 'P{O:wmcs::openstack::codfw1dev::cloudweb}'
[11:52:44] !log taavi@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudweb.safe_reboot on hosts matched by 'P{O:wmcs::openstack::eqiad1::cloudweb}'
[11:55:11] RESOLVED: PrometheusRestarted: Prometheus/cloud restarted: beware monitoring artifacts. - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_was_restarted - https://grafana.wikimedia.org/d/GWvEXWDZk/prometheus-server?var-datasource=eqiad%20prometheus%2Fcloud - https://alerts.wikimedia.org/?q=alertname%3DPrometheusRestarted
[11:55:56] FIRING: PrometheusRestarted: Prometheus/cloud restarted: beware monitoring artifacts. - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_was_restarted - https://grafana.wikimedia.org/d/GWvEXWDZk/prometheus-server?var-datasource=eqiad%20prometheus%2Fcloud - https://alerts.wikimedia.org/?q=alertname%3DPrometheusRestarted
[11:56:11] RESOLVED: PrometheusRestarted: Prometheus/cloud restarted: beware monitoring artifacts. - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_was_restarted - https://grafana.wikimedia.org/d/GWvEXWDZk/prometheus-server?var-datasource=eqiad%20prometheus%2Fcloud - https://alerts.wikimedia.org/?q=alertname%3DPrometheusRestarted
[11:57:12] !log taavi@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudweb.safe_reboot (exit_code=0) on hosts matched by 'P{O:wmcs::openstack::eqiad1::cloudweb}'
[11:58:11] FIRING: PrometheusRestarted: Prometheus/cloud restarted: beware monitoring artifacts.
- https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_was_restarted - https://grafana.wikimedia.org/d/GWvEXWDZk/prometheus-server?var-datasource=codfw%20prometheus%2Fcloud - https://alerts.wikimedia.org/?q=alertname%3DPrometheusRestarted
[12:00:56] FIRING: [2x] PrometheusRestarted: Prometheus/cloud restarted: beware monitoring artifacts. - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_was_restarted - https://alerts.wikimedia.org/?q=alertname%3DPrometheusRestarted
[12:02:31] (PS1) Majavah: inventory: Refresh codfw1dev cloudgw hosts [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1289323
[12:03:42] (CR) Majavah: [C:+2] inventory: Refresh codfw1dev cloudgw hosts [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1289323 (owner: Majavah)
[12:08:32] (Merged) jenkins-bot: inventory: Refresh codfw1dev cloudgw hosts [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1289323 (owner: Majavah)
[12:23:11] RESOLVED: PrometheusRestarted: Prometheus/cloud restarted: beware monitoring artifacts. - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_was_restarted - https://grafana.wikimedia.org/d/GWvEXWDZk/prometheus-server?var-datasource=codfw%20prometheus%2Fcloud - https://alerts.wikimedia.org/?q=alertname%3DPrometheusRestarted
[12:29:41] FIRING: PrometheusRestarted: Prometheus/cloud restarted: beware monitoring artifacts. - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_was_restarted - https://grafana.wikimedia.org/d/GWvEXWDZk/prometheus-server?var-datasource=codfw%20prometheus%2Fcloud - https://alerts.wikimedia.org/?q=alertname%3DPrometheusRestarted
[12:30:56] FIRING: [2x] PrometheusRestarted: Prometheus/cloud restarted: beware monitoring artifacts.
- https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_was_restarted - https://grafana.wikimedia.org/d/GWvEXWDZk/prometheus-server?var-datasource=codfw%20prometheus%2Fcloud - https://alerts.wikimedia.org/?q=alertname%3DPrometheusRestarted
[12:42:25] (PS1) Majavah: wmcs_libs: Move version validation to common [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1289334
[12:42:25] (PS1) Majavah: openstack: cloudvirt: Add option to exclude upgraded Kernel versions [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1289335
[12:45:31] (CR) CI reject: [V:-1] wmcs_libs: Move version validation to common [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1289334 (owner: Majavah)
[12:45:44] (CR) CI reject: [V:-1] openstack: cloudvirt: Add option to exclude upgraded Kernel versions [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1289335 (owner: Majavah)
[12:46:29] (PS2) Majavah: wmcs_libs: Move version validation to common [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1289334
[12:46:29] (PS2) Majavah: openstack: cloudvirt: Add option to exclude upgraded Kernel versions [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1289335
[12:50:11] (update) fnegri: jobs-api: test for proper handling of the diff variations of the --image argument [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1113 (https://phabricator.wikimedia.org/T414978 https://phabricator.wikimedia.org/T415322) (owner: raymond-ndibe)
[12:50:22] (CR) CI reject: [V:-1] openstack: cloudvirt: Add option to exclude upgraded Kernel versions [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1289335 (owner: Majavah)
[12:50:27] (CR) CI reject: [V:-1] wmcs_libs: Move version validation to common [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1289334 (owner: Majavah)
[12:52:03] (PS3) Majavah: wmcs_libs: Move version
validation to common [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1289334
[12:52:03] (PS3) Majavah: openstack: cloudvirt: Add option to exclude upgraded Kernel versions [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1289335
[12:54:41] RESOLVED: PrometheusRestarted: Prometheus/cloud restarted: beware monitoring artifacts. - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_was_restarted - https://grafana.wikimedia.org/d/GWvEXWDZk/prometheus-server?var-datasource=codfw%20prometheus%2Fcloud - https://alerts.wikimedia.org/?q=alertname%3DPrometheusRestarted
[12:56:39] (CR) CI reject: [V:-1] openstack: cloudvirt: Add option to exclude upgraded Kernel versions [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1289335 (owner: Majavah)
[12:57:51] (PS4) Majavah: wmcs_libs: Move version validation to common [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1289334
[12:57:51] (PS4) Majavah: openstack: cloudvirt: Add option to exclude upgraded Kernel versions [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1289335
[13:11:19] (CR) Filippo Giunchedi: [C:+1] wmcs_libs: Move version validation to common [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1289334 (owner: Majavah)
[13:12:26] (CR) Filippo Giunchedi: [C:+1] openstack: cloudvirt: Add option to exclude upgraded Kernel versions [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1289335 (owner: Majavah)
[13:13:02] (CR) Majavah: [C:+2] wmcs_libs: Move version validation to common [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1289334 (owner: Majavah)
[13:13:46] (CR) Majavah: [C:+2] openstack: cloudvirt: Add option to exclude upgraded Kernel versions [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1289335 (owner: Majavah)
[13:16:28] (Merged) jenkins-bot: wmcs_libs: Move version validation to common [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1289334 (owner:
Majavah)
[13:17:15] (Merged) jenkins-bot: openstack: cloudvirt: Add option to exclude upgraded Kernel versions [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1289335 (owner: Majavah)
[13:19:37] FIRING: ProbeDown: Service toolsbeta-test-k8s-haproxy-7:443 has failed probes (http_admin_beta_toolforge_org_ip6) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[13:24:17] (update) raymond-ndibe: jobs-api: test for proper handling of the diff variations of the --image argument [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1113 (https://phabricator.wikimedia.org/T414978 https://phabricator.wikimedia.org/T415322)
[13:24:37] RESOLVED: ProbeDown: Service toolsbeta-test-k8s-haproxy-7:443 has failed probes (http_admin_beta_toolforge_org_ip6) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[13:24:46] (update) raymond-ndibe: jobs-api: test for proper handling of the diff variations of the --image argument [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1113 (https://phabricator.wikimedia.org/T414978 https://phabricator.wikimedia.org/T415322)
[13:42:07] Tool-wikimedia-attribution: Define attribution recommendations for Kahoot!
- https://phabricator.wikimedia.org/T423571#11936350 (Sarai-WMF) a:Pginer-WMF
[13:44:18] Tool-wmf-openapi-linter, MW-Interfaces-Team (MWI-Sprint-34 (2026-05-19 to 2026-06-02)), OKR-Work: Fix issue line highlighting in the linter - https://phabricator.wikimedia.org/T425935#11936356 (HCoplin-WMF)
[13:50:12] cloud-services-team, Openstack-Magnum: Investigate new Magnum drivers - https://phabricator.wikimedia.org/T393782#11936365 (Andrew) >>! In T393782#11933819, @bd808 wrote: > > That should make testing and switching mush easier. Yay upstream! :) Yeah! I just flip-flopped between the two drivers in codfw1...
[14:10:26] Tool-wikimedia-attribution: Document referrer recommendations for Participation CTA - https://phabricator.wikimedia.org/T423567#11936439 (Sarai-WMF)
[14:10:33] Tool-wikimedia-attribution: Document referrer recommendations for Participation CTA - https://phabricator.wikimedia.org/T423567#11936441 (Sarai-WMF) Open→Resolved
[14:28:00] Tool-wikimedia-attribution: [WE5.3.1c] Publish attribution guidelines for the Social media reuse scenario - https://phabricator.wikimedia.org/T416993#11936538 (Sarai-WMF) a:Sarai-WMF→None
[14:28:09] Tool-wikimedia-attribution: [WE5.3.1c] Publish attribution guidelines for Games and Rich media experiences - https://phabricator.wikimedia.org/T416992#11936539 (Sarai-WMF) a:Sarai-WMF→None
[14:31:16] Tool-wikimedia-attribution, MediaWiki-REST-API, MW-1.47-notes (1.47.0-wmf.3; 2026-05-19), MW-Interfaces-Team (MWI-Sprint-33 (2026-05-05 to 2026-05-19)): Attribution API: Include wprov parameters in response URLs - https://phabricator.wikimedia.org/T425576#11936551 (pmiazga)
[14:32:35] Tool-wikimedia-attribution, MediaWiki-REST-API, MW-1.47-notes (1.47.0-wmf.3; 2026-05-19), MW-Interfaces-Team (MWI-Sprint-33 (2026-05-05 to 2026-05-19)): Attribution API: Include wprov parameters in response URLs - https://phabricator.wikimedia.org/T425576#11936557 (pmiazga) In
progress→0...
[14:36:16] Tool-humaniki-2: Update contrib guide to forbid AI-generated code - https://phabricator.wikimedia.org/T426758 (Danya) NEW
[14:36:19] Tool-humaniki-2: Update contrib guide to forbid AI-generated code - https://phabricator.wikimedia.org/T426758#11936612 (Danya) p:Triage→Low
[14:38:37] FIRING: ProbeDown: Service toolsbeta-test-k8s-haproxy-7:443 has failed probes (http_admin_beta_toolforge_org_ip6) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[14:40:03] VPS-project-Wikistats, Code-Health-Help-Wanted, Performance Issue: wikistats needs improved data and presentation for fandom - https://phabricator.wikimedia.org/T215534#11936627 (Xqt) Open→Resolved Looks like this is solved, isn't it?
[14:46:36] Tool-wmf-openapi-linter, MW-Interfaces-Team (MWI-Sprint-34 (2026-05-19 to 2026-06-02)), OKR-Work: [SPIKE] Add security scheme information to MediaWiki REST API description - https://phabricator.wikimedia.org/T423552#11936666 (KineticPelagic)
[14:47:37] VPS-project-Wikistats: Rename List of Wikia wikis to List of Fandom wikis - https://phabricator.wikimedia.org/T426760 (Xqt) NEW
[14:49:36] Tool-wmf-openapi-linter, MW-Interfaces-Team (MWI-Sprint-34 (2026-05-19 to 2026-06-02)), Patch-For-Review: Fix or disable oas3-valid-media-example date validation - https://phabricator.wikimedia.org/T425952#11936718 (HCoplin-WMF)
[14:50:44] Tool-wikimedia-attribution, MediaWiki-REST-API, MW-1.47-notes (1.47.0-wmf.3; 2026-05-19), MW-Interfaces-Team (MWI-Sprint-33 (2026-05-05 to 2026-05-19)): Attribution API: Include wprov parameters in response URLs - https://phabricator.wikimedia.org/T425576#11936729 (HCoplin-WMF) Marking as res...
[14:50:51] Tool-wmf-openapi-linter, MW-Interfaces-Team (MWI-Sprint-33 (2026-05-05 to 2026-05-19)): Add tests to ensure consistency between OAD example and OpenAPI linter - https://phabricator.wikimedia.org/T419576#11936732 (HCoplin-WMF) Open→Resolved Marking as resolved as part of sprint close out.
[14:50:59] Tool-wmf-openapi-linter, MW-Interfaces-Team (MWI-Sprint-33 (2026-05-05 to 2026-05-19)), OKR-Work: Connect the new ruleset repository to the linter tool - https://phabricator.wikimedia.org/T425626#11936738 (HCoplin-WMF) Open→Resolved Marking as resolved as part of sprint close out.
[14:51:07] Tool-wmf-openapi-linter, MW-Interfaces-Team (MWI-Sprint-33 (2026-05-05 to 2026-05-19)), OKR-Work: Move linter rules to a separate repository - https://phabricator.wikimedia.org/T425625#11936742 (HCoplin-WMF) Open→Resolved Marking as resolved as part of sprint close out.
[14:51:18] Tool-wmf-openapi-linter, MediaWiki-REST-API, MW-Interfaces-Team (MWI-Sprint-33 (2026-05-05 to 2026-05-19)), OKR-Work: Add the remaining linting rules - https://phabricator.wikimedia.org/T422600#11936754 (HCoplin-WMF) Marking as resolved as part of sprint close out.
[14:54:45] Tool-wmf-openapi-linter, MediaWiki-REST-API, MW-Interfaces-Team (MWI-Sprint-34 (2026-05-19 to 2026-06-02)), OKR-Work: [SPIKE] Identify OAD properties not supported by MediaWiki REST Framework - https://phabricator.wikimedia.org/T425942#11936784 (HCoplin-WMF) a:KineticPelagic
[14:55:31] Tool-wmf-openapi-linter, MW-Interfaces-Team (MWI-Sprint-34 (2026-05-19 to 2026-06-02)), OKR-Work: Fix issue line highlighting in the linter - https://phabricator.wikimedia.org/T425935#11936786 (HCoplin-WMF) a:KineticPelagic
[14:56:37] Tool-wmf-openapi-linter, MW-Interfaces-Team (MWI-Sprint-34 (2026-05-19 to 2026-06-02)): Exclude link objects from wikimedia-paths-parameter-example-exists - https://phabricator.wikimedia.org/T425920#11936790 (HCoplin-WMF) a:MGoncalves-WMF
[14:57:18] Tool-wmf-openapi-linter, MediaWiki-REST-API, MW-Interfaces-Team (MWI-Sprint-34 (2026-05-19 to 2026-06-02)), OKR-Work: [SPIKE] Identify OAD properties not supported by MediaWiki REST Framework - https://phabricator.wikimedia.org/T425942#11936798 (HCoplin-WMF) a:KineticPelagic→None
[14:57:25] Tool-wmf-openapi-linter, MW-Interfaces-Team (MWI-Sprint-34 (2026-05-19 to 2026-06-02)), OKR-Work: Fix issue line highlighting in the linter - https://phabricator.wikimedia.org/T425935#11936799 (HCoplin-WMF) a:KineticPelagic→None
[14:58:32] FIRING: TargetDown: Job app is unreachable in project quarry instance quarry.wmcloud.org:443 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTargetDown
[14:58:43] FIRING: QuarryDown: Quarry application is unreachable - https://prometheus-alerts.wmcloud.org/?q=alertname%3DQuarryDown
[15:03:32] RESOLVED: [2x] TargetDown: Job app is unreachable in project quarry instance quarry.wmcloud.org:443 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTargetDown
[15:03:37] RESOLVED: ProbeDown: Service toolsbeta-test-k8s-haproxy-7:443 has failed probes (http_admin_beta_toolforge_org_ip6) -
https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [15:03:43] RESOLVED: QuarryDown: Quarry application is unreachable - https://prometheus-alerts.wmcloud.org/?q=alertname%3DQuarryDown [15:18:23] (PS1) Majavah: hiddenparma: Add cwilliams [labs/private] - https://gerrit.wikimedia.org/r/1289376 [15:33:14] Tool-centralnotice-banner-editor: Add more font options for text elements - https://phabricator.wikimedia.org/T426767 (Oyelola_Victoria) NEW [15:37:50] Tool-centralnotice-banner-editor: Add templates with more content combinations - https://phabricator.wikimedia.org/T426768 (Oyelola_Victoria) NEW [15:38:27] Tool-centralnotice-banner-editor: Add templates with more content combinations - https://phabricator.wikimedia.org/T426768#11937175 (Oyelola_Victoria) [15:41:44] Tool-wmf-openapi-linter, MW-Interfaces-Team (MWI-Sprint-34 (2026-05-19 to 2026-06-02)), OKR-Work: [SPIKE] Add security scheme information to MediaWiki REST API description - https://phabricator.wikimedia.org/T423552#11937198 (KineticPelagic) I shared with the whole team this [[ https://docs.google.co... 
[15:41:50] Tool-centralnotice-banner-editor: Add templates with more content combinations - https://phabricator.wikimedia.org/T426768#11937201 (Oyelola_Victoria) [15:48:34] (update) fnegri: replace job images with web images [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/263 (https://phabricator.wikimedia.org/T415322) (owner: raymond-ndibe) [15:52:20] (update) fnegri: jobs-api: test for proper handling of the diff variations of the --image argument [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1113 (https://phabricator.wikimedia.org/T414978 https://phabricator.wikimedia.org/T415322) (owner: raymond-ndibe) [15:52:23] (update) fnegri: jobs-api: test for proper handling of the diff variations of the --image argument [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1113 (https://phabricator.wikimedia.org/T414978 https://phabricator.wikimedia.org/T415322) (owner: raymond-ndibe) [15:52:23] (approved) fnegri: jobs-api: test for proper handling of the diff variations of the --image argument [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1113 (https://phabricator.wikimedia.org/T414978 https://phabricator.wikimedia.org/T415322) (owner: raymond-ndibe) [15:52:25] (update) fnegri: jobs-api: test for proper handling of the diff variations of the --image argument [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1113 (https://phabricator.wikimedia.org/T414978 https://phabricator.wikimedia.org/T415322) (owner: raymond-ndibe) [15:53:58] (update) fnegri: jobs-api: use webservice image variants in one-off job tests [repos/cloud/toolforge/toolforge-deploy] 
(test_for_image_argument_handling_in_jobs) - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1115 (https://phabricator.wikimedia.org/T415322) (owner: raymond-ndibe) [15:57:02] cloud-services-team, Toolforge, tools-platform-team, Patch-For-Review: add more logs tests to toolforge-deploy - https://phabricator.wikimedia.org/T418326#11937272 (Raymond_Ndibe) In progress→Resolved [16:01:30] VPS-project-Codesearch, MediaWiki-Platform-Team (Kanban Board): Include "Mobile Apps" in "MediaWiki & services at WMF" preset for Codesearch - https://phabricator.wikimedia.org/T426627#11937278 (Tgr) The Commons app isn't really "at WMF", it's a third-party app (although a very prominent one). [16:01:32] VPS-project-Codesearch, dark-mode: Codesearch: Adding Dark Mode - https://phabricator.wikimedia.org/T415460#11937275 (neriah) the change is merged. [16:09:43] (update) raymond-ndibe: jobs-api: test for proper handling of the diff variations of the --image argument [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1113 (https://phabricator.wikimedia.org/T414978 https://phabricator.wikimedia.org/T415322) [16:09:55] (update) raymond-ndibe: jobs-api: test for proper handling of the diff variations of the --image argument [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1113 (https://phabricator.wikimedia.org/T414978 https://phabricator.wikimedia.org/T415322) [16:14:29] (update) fnegri: kubernetes.py: handle 409 errors [repos/cloud/toolforge/webservice-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/webservice-cli/-/merge_requests/104 (https://phabricator.wikimedia.org/T423005) (owner: raymond-ndibe) [16:14:36] (approved) fnegri: kubernetes.py: handle 409 errors [repos/cloud/toolforge/webservice-cli] - 
https://gitlab.wikimedia.org/repos/cloud/toolforge/webservice-cli/-/merge_requests/104 (https://phabricator.wikimedia.org/T423005) (owner: raymond-ndibe) [16:14:40] (update) fnegri: kubernetes.py: handle 409 errors [repos/cloud/toolforge/webservice-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/webservice-cli/-/merge_requests/104 (https://phabricator.wikimedia.org/T423005) (owner: raymond-ndibe) [16:15:21] (update) raymond-ndibe: jobs-api: use webservice image variants in one-off job tests [repos/cloud/toolforge/toolforge-deploy] (test_for_image_argument_handling_in_jobs) - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1115 (https://phabricator.wikimedia.org/T415322) [16:15:28] (update) raymond-ndibe: jobs-api: use webservice image variants in one-off job tests [repos/cloud/toolforge/toolforge-deploy] (test_for_image_argument_handling_in_jobs) - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1115 (https://phabricator.wikimedia.org/T415322) [16:15:35] (update) raymond-ndibe: jobs-api: use webservice image variants in one-off job tests [repos/cloud/toolforge/toolforge-deploy] (test_for_image_argument_handling_in_jobs) - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1115 (https://phabricator.wikimedia.org/T415322) [16:18:47] (update) raymond-ndibe: replace job images with web images [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/263 (https://phabricator.wikimedia.org/T415322) [16:19:47] (update) raymond-ndibe: replace job images with web images [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/263 (https://phabricator.wikimedia.org/T415322) [16:19:48] (update) raymond-ndibe: replace job images with web images [repos/cloud/toolforge/jobs-api] - 
https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/263 (https://phabricator.wikimedia.org/T415322) [16:20:00] (update) raymond-ndibe: replace job images with web images [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/263 (https://phabricator.wikimedia.org/T415322) [16:20:00] (update) raymond-ndibe: replace job images with web images [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/263 (https://phabricator.wikimedia.org/T415322) [16:22:28] (CR) CDanis: [C:+1] hiddenparma: Add cwilliams [labs/private] - https://gerrit.wikimedia.org/r/1289376 (owner: Majavah) [16:23:21] (CR) Majavah: [V:+2 C:+2] hiddenparma: Add cwilliams [labs/private] - https://gerrit.wikimedia.org/r/1289376 (owner: Majavah) [16:26:16] (update) fnegri: core.py: fix jobs loading messaging ux issue [repos/cloud/toolforge/jobs-api] (fix_jobs_load_bug) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/291 (https://phabricator.wikimedia.org/T423891) (owner: raymond-ndibe) [16:26:20] (approved) fnegri: core.py: fix jobs loading messaging ux issue [repos/cloud/toolforge/jobs-api] (fix_jobs_load_bug) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/291 (https://phabricator.wikimedia.org/T423891) (owner: raymond-ndibe) [16:26:21] (update) fnegri: core.py: fix jobs loading messaging ux issue [repos/cloud/toolforge/jobs-api] (fix_jobs_load_bug) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/291 (https://phabricator.wikimedia.org/T423891) (owner: raymond-ndibe) [16:28:52] (update) fnegri: replace job images with web images [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/263 (https://phabricator.wikimedia.org/T415322) (owner: raymond-ndibe) 
[16:28:53] (approved) fnegri: replace job images with web images [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/263 (https://phabricator.wikimedia.org/T415322) (owner: raymond-ndibe) [16:40:32] FIRING: PuppetCertificateAboutToExpire: Puppet CA certificate Puppet CA: metricsinfra-puppetmaster-1.metricsinfra.eqiad1.wikimedia.cloud is about to expire in 25d 23h 58m 10s - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetCertificateAboutToExpire - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetCertificateAboutToExpire [17:27:37] FIRING: ProbeDown: Service toolsbeta-test-k8s-haproxy-7:443 has failed probes (http_admin_beta_toolforge_org_ip6) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [17:37:37] RESOLVED: ProbeDown: Service toolsbeta-test-k8s-haproxy-7:443 has failed probes (http_admin_beta_toolforge_org_ip6) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [17:47:46] (PS1) Majavah: openstack: cloudvirt: Fix argparse specification [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1289400 [17:51:25] (CR) Andrew Bogott: [C:+2] openstack: cloudvirt: Fix argparse specification [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1289400 (owner: Majavah) [17:52:07] (PS1) BryanDavis: dev(docker-compose): Change to `restart: unless-stopped` [labs/striker] - https://gerrit.wikimedia.org/r/1289401 [17:52:07] (PS1) BryanDavis: Add note about logout/login cycle to refresh LDAP data [labs/striker] - 
https://gerrit.wikimedia.org/r/1289402 (https://phabricator.wikimedia.org/T426394) [17:54:35] (Merged) jenkins-bot: openstack: cloudvirt: Fix argparse specification [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1289400 (owner: Majavah) [17:54:46] (CR) Majavah: [C:+2] dev(docker-compose): Change to `restart: unless-stopped` [labs/striker] - https://gerrit.wikimedia.org/r/1289401 (owner: BryanDavis) [17:55:07] (CR) CI reject: [V:-1] Add note about logout/login cycle to refresh LDAP data [labs/striker] - https://gerrit.wikimedia.org/r/1289402 (https://phabricator.wikimedia.org/T426394) (owner: BryanDavis) [17:57:49] (Merged) jenkins-bot: dev(docker-compose): Change to `restart: unless-stopped` [labs/striker] - https://gerrit.wikimedia.org/r/1289401 (owner: BryanDavis) [17:58:00] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.safe_reboot on hosts matched by 'P{O:wmcs::openstack::eqiad1::virt_ceph} AND NOT P{F:kernelversion = 6.12.88}' [17:58:01] !log andrew@cloudcumin1001 admin END (ERROR) - Cookbook wmcs.openstack.cloudvirt.safe_reboot (exit_code=97) on hosts matched by 'P{O:wmcs::openstack::eqiad1::virt_ceph} AND NOT P{F:kernelversion = 6.12.88}' [17:58:08] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.safe_reboot on hosts matched by 'P{O:wmcs::openstack::eqiad1::virt_ceph} AND NOT P{F:kernelversion = 6.12.88}' [17:58:54] (PS2) BryanDavis: Add note about logout/login cycle to refresh LDAP data [labs/striker] - https://gerrit.wikimedia.org/r/1289402 (https://phabricator.wikimedia.org/T426394) [18:10:21] PROBLEM - Host cloudvirt1044 is DOWN: PING CRITICAL - Packet loss = 100% [18:12:43] RECOVERY - Host cloudvirt1044 is UP: PING OK - Packet loss = 0%, RTA = 0.48 ms [18:30:05] PROBLEM - Host cloudvirt1045 is DOWN: PING CRITICAL - Packet loss = 100% [18:31:37] RECOVERY - Host cloudvirt1045 is UP: PING OK - Packet loss = 0%, RTA = 0.38 ms [18:39:30] (CR) A 
smart kitten: "(for the record, this was for T415460)" [labs/codesearch] - https://gerrit.wikimedia.org/r/1230251 (owner: Neriah) [18:42:41] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.safe_reboot (exit_code=99) on hosts matched by 'P{O:wmcs::openstack::eqiad1::virt_ceph} AND NOT P{F:kernelversion = 6.12.88}' [18:48:09] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.safe_reboot on hosts matched by 'P{O:wmcs::openstack::eqiad1::virt_ceph} AND NOT P{F:kernelversion = 6.12.88}' [18:51:53] andrew@cloudcumin1001 safe_reboot (PID 92056) is awaiting input [19:01:49] FIRING: NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirt1046 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown [19:06:49] RESOLVED: NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirt1046 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown [19:11:32] cloud-services-team, Openstack-Magnum: Investigate new Magnum drivers - https://phabricator.wikimedia.org/T393782#11938102 (Andrew) Time for a new update: Things are back to pretty much working in codfw1dev. The only serious blocker to deployment in eqiad1 is the issue with docker_volume_size[0] which u... 
[19:20:50] FIRING: NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirt1047 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown [19:21:37] FIRING: ProbeDown: Service toolsbeta-test-k8s-haproxy-7:443 has failed probes (http_admin_beta_toolforge_org_ip6) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [19:25:49] RESOLVED: NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirt1047 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown [19:28:00] VPS-project-Codesearch, dark-mode: Codesearch: Adding Dark Mode - https://phabricator.wikimedia.org/T415460#11938132 (Ladsgroup) In progress→Resolved I just deployed it. Thanks! 
[19:30:19] PROBLEM - SSH on cloudbackup1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/SSH/monitoring [19:30:45] PROBLEM - Host cloudbackup2003 is DOWN: PING CRITICAL - Packet loss = 100% [19:30:53] PROBLEM - Host cloudbackup2004 is DOWN: PING CRITICAL - Packet loss = 100% [19:31:11] PROBLEM - SSH on cloudbackup1004 is CRITICAL: connect to address 10.64.20.23 and port 22: Connection refused https://wikitech.wikimedia.org/wiki/SSH/monitoring [19:31:41] PROBLEM - Host cloudbackup1003 is DOWN: PING CRITICAL - Packet loss = 100% [19:32:13] RECOVERY - Host cloudbackup2003 is UP: PING OK - Packet loss = 0%, RTA = 31.61 ms [19:33:09] PROBLEM - Host cloudvirtlocal1003 is DOWN: PING CRITICAL - Packet loss = 100% [19:33:13] RECOVERY - Host cloudbackup2004 is UP: PING OK - Packet loss = 0%, RTA = 31.56 ms [19:34:11] RECOVERY - SSH on cloudbackup1003 is OK: SSH OK - OpenSSH_10.0p2 Debian-7 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring [19:34:13] RECOVERY - Host cloudbackup1003 is UP: PING OK - Packet loss = 0%, RTA = 0.41 ms [19:34:37] RECOVERY - Host cloudvirtlocal1003 is UP: PING OK - Packet loss = 0%, RTA = 0.35 ms [19:34:47] FIRING: NodeDown: Node cloudbackup1004 is down. 
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/NodeDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DNodeDown [19:35:11] RECOVERY - SSH on cloudbackup1004 is OK: SSH OK - OpenSSH_10.0p2 Debian-7 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring [19:36:37] RESOLVED: ProbeDown: Service toolsbeta-test-k8s-haproxy-7:443 has failed probes (http_admin_beta_toolforge_org_ip6) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [19:39:47] RESOLVED: NodeDown: Node cloudbackup1004 is down. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/NodeDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DNodeDown [19:40:57] PROBLEM - Host cloudvirtlocal1001 is DOWN: PING CRITICAL - Packet loss = 100% [19:42:37] RECOVERY - Host cloudvirtlocal1001 is UP: PING OK - Packet loss = 0%, RTA = 0.31 ms [19:43:19] FIRING: [3x] NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirt1047 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown [19:44:55] PROBLEM - toolschecker: All k8s etcd nodes are healthy on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/etcd/k8s - 488 bytes in 3.011 second response time https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Toolschecker [19:47:09] PROBLEM - Host cloudvirtlocal1002 is DOWN: PING CRITICAL - Packet loss = 100% [19:48:20] 
RESOLVED: [2x] NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirt1048 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown [19:48:34] FIRING: [3x] NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirt1048 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown [19:48:37] RECOVERY - Host cloudvirtlocal1002 is UP: PING OK - Packet loss = 0%, RTA = 0.28 ms [19:48:46] FIRING: JobUnavailable: Reduced availability for job pdns_rec in cloud@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [19:49:53] RECOVERY - toolschecker: All k8s etcd nodes are healthy on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 158 bytes in 0.325 second response time https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Toolschecker [19:50:13] PROBLEM - Host cloudservices1005 is DOWN: PING CRITICAL - Packet loss = 100% [19:50:37] FIRING: ProbeDown: Service toolsbeta-test-k8s-haproxy-7:443 has failed probes (http_admin_beta_toolforge_org_ip6) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [19:51:35] RECOVERY - Host cloudservices1005 is UP: PING OK - Packet loss = 0%, RTA = 0.28 ms [19:52:05] FIRING: [4x] HostBGPDown: BGP session for cloudservices1005 (172.20.2.4) is down - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status - 
https://alerts.wikimedia.org/?q=alertname%3DHostBGPDown [19:52:41] PROBLEM - Check DNS auth via TCP of www.wmcloud.org on server ns0.openstack.eqiad1.wikimediacloud.org on cloudservices1005 is CRITICAL: DNS CRITICAL - 7.063 seconds response time (www.wmcloud.org. 3600 IN CNAME wmcloud.org.) https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [19:52:41] PROBLEM - Check DNS auth via TCP of login.toolforge.org on server ns0.openstack.eqiad1.wikimediacloud.org on cloudservices1005 is CRITICAL: DNS CRITICAL - 7.067 seconds response time (login.toolforge.org. 3600 IN CNAME bastion.toolforge.org.) https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [19:52:41] PROBLEM - Check DNS auth via UDP of k8s.svc.tools.eqiad1.wikimedia.cloud on server ns0.openstack.eqiad1.wikimediacloud.org on cloudservices1005 is CRITICAL: DNS CRITICAL - 7.075 seconds response time (k8s.svc.tools.eqiad1.wikimedia.cloud. 300 IN A 172.16.18.169) https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [19:52:41] PROBLEM - Check DNS auth via UDP of login.toolforge.org on server ns0.openstack.eqiad1.wikimediacloud.org on cloudservices1005 is CRITICAL: DNS CRITICAL - 7.084 seconds response time (login.toolforge.org. 3600 IN CNAME bastion.toolforge.org.) https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [19:52:41] PROBLEM - Check DNS auth via UDP of tools-puppetserver-01.tools.eqiad1.wikimedia.cloud on server ns0.openstack.eqiad1.wikimediacloud.org on cloudservices1005 is CRITICAL: DNS CRITICAL - 7.060 seconds response time (tools-puppetserver-01.tools.eqiad1.wikimedia.cloud. 60 IN A 172.16.3.13) https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [19:52:42] PROBLEM - Check DNS auth via UDP of www.wmcloud.org on server ns0.openstack.eqiad1.wikimediacloud.org on cloudservices1005 is CRITICAL: DNS CRITICAL - 7.081 seconds response time (www.wmcloud.org. 3600 IN CNAME wmcloud.org.) 
https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [19:52:42] PROBLEM - Check DNS auth via TCP of k8s.svc.tools.eqiad1.wikimedia.cloud on server ns0.openstack.eqiad1.wikimediacloud.org on cloudservices1005 is CRITICAL: DNS CRITICAL - 7.062 seconds response time (k8s.svc.tools.eqiad1.wikimedia.cloud. 300 IN A 172.16.18.169) https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [19:52:42] PROBLEM - Check DNS auth via TCP of tools-puppetserver-01.tools.eqiad1.wikimedia.cloud on server ns0.openstack.eqiad1.wikimediacloud.org on cloudservices1005 is CRITICAL: DNS CRITICAL - 7.083 seconds response time (tools-puppetserver-01.tools.eqiad1.wikimedia.cloud. 60 IN A 172.16.3.13) https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [19:53:19] RESOLVED: [3x] NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirt1048 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown [19:53:33] RECOVERY - Check DNS auth via UDP of login.toolforge.org on server ns0.openstack.eqiad1.wikimediacloud.org on cloudservices1005 is OK: DNS OK - 0.075 seconds response time (login.toolforge.org. 3600 IN CNAME bastion.toolforge.org.) https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [19:53:33] RECOVERY - Check DNS auth via TCP of login.toolforge.org on server ns0.openstack.eqiad1.wikimediacloud.org on cloudservices1005 is OK: DNS OK - 0.050 seconds response time (login.toolforge.org. 3600 IN CNAME bastion.toolforge.org.) 
https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [19:53:33] RECOVERY - Check DNS auth via UDP of tools-puppetserver-01.tools.eqiad1.wikimedia.cloud on server ns0.openstack.eqiad1.wikimediacloud.org on cloudservices1005 is OK: DNS OK - 0.074 seconds response time (tools-puppetserver-01.tools.eqiad1.wikimedia.cloud. 60 IN A 172.16.3.13) https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [19:53:34] RECOVERY - Check DNS auth via UDP of www.wmcloud.org on server ns0.openstack.eqiad1.wikimediacloud.org on cloudservices1005 is OK: DNS OK - 0.054 seconds response time (www.wmcloud.org. 3600 IN CNAME wmcloud.org.) https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [19:53:34] RECOVERY - Check DNS auth via TCP of tools-puppetserver-01.tools.eqiad1.wikimedia.cloud on server ns0.openstack.eqiad1.wikimediacloud.org on cloudservices1005 is OK: DNS OK - 0.070 seconds response time (tools-puppetserver-01.tools.eqiad1.wikimedia.cloud. 60 IN A 172.16.3.13) https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [19:53:34] RECOVERY - Check DNS auth via TCP of k8s.svc.tools.eqiad1.wikimedia.cloud on server ns0.openstack.eqiad1.wikimediacloud.org on cloudservices1005 is OK: DNS OK - 0.050 seconds response time (k8s.svc.tools.eqiad1.wikimedia.cloud. 300 IN A 172.16.18.169) https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [19:53:34] RECOVERY - Check DNS auth via UDP of k8s.svc.tools.eqiad1.wikimedia.cloud on server ns0.openstack.eqiad1.wikimediacloud.org on cloudservices1005 is OK: DNS OK - 0.120 seconds response time (k8s.svc.tools.eqiad1.wikimedia.cloud. 300 IN A 172.16.18.169) https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [19:53:34] RECOVERY - Check DNS auth via TCP of www.wmcloud.org on server ns0.openstack.eqiad1.wikimediacloud.org on cloudservices1005 is OK: DNS OK - 0.098 seconds response time (www.wmcloud.org. 
3600 IN CNAME wmcloud.org.) https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [19:54:20] FIRING: [5x] JobUnavailable: Reduced availability for job openstack in cloud@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [19:55:37] RESOLVED: ProbeDown: Service toolsbeta-test-k8s-haproxy-7:443 has failed probes (http_admin_beta_toolforge_org_ip6) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [19:56:05] PROBLEM - Host cloudservices1006 is DOWN: PING CRITICAL - Packet loss = 100% [19:56:37] FIRING: ProbeDown: Service toolsbeta-test-k8s-haproxy-7:443 has failed probes (http_admin_beta_toolforge_org_ip6) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [19:57:04] (update) danyya: Draft: Resolve T422343 "Integrate Denelezh and merge stats tables" [toolforge-repos/humaniki] - https://gitlab.wikimedia.org/toolforge-repos/humaniki/-/merge_requests/56 [19:57:05] FIRING: [6x] HostBGPDown: BGP session for cloudservices1005 (172.20.2.4) is down - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status - https://alerts.wikimedia.org/?q=alertname%3DHostBGPDown [19:57:35] RECOVERY - Host cloudservices1006 is UP: PING OK - Packet loss = 0%, RTA = 0.40 ms [19:58:43] PROBLEM - Check DNS auth via TCP of www.wmcloud.org on server ns1.openstack.eqiad1.wikimediacloud.org on cloudservices1006 is CRITICAL: CRITICAL - Plugin timed out while executing system call 
https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [19:58:43] PROBLEM - Check DNS auth via UDP of tools-puppetserver-01.tools.eqiad1.wikimedia.cloud on server ns1.openstack.eqiad1.wikimediacloud.org on cloudservices1006 is CRITICAL: CRITICAL - Plugin timed out while executing system call https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [19:58:43] PROBLEM - Check DNS auth via UDP of www.wmcloud.org on server ns1.openstack.eqiad1.wikimediacloud.org on cloudservices1006 is CRITICAL: CRITICAL - Plugin timed out while executing system call https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [19:58:44] PROBLEM - Check DNS auth via TCP of tools-puppetserver-01.tools.eqiad1.wikimedia.cloud on server ns1.openstack.eqiad1.wikimediacloud.org on cloudservices1006 is CRITICAL: CRITICAL - Plugin timed out while executing system call https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [19:58:44] PROBLEM - Check DNS auth via UDP of login.toolforge.org on server ns1.openstack.eqiad1.wikimediacloud.org on cloudservices1006 is CRITICAL: CRITICAL - Plugin timed out while executing system call https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [19:58:44] PROBLEM - Check DNS auth via TCP of login.toolforge.org on server ns1.openstack.eqiad1.wikimediacloud.org on cloudservices1006 is CRITICAL: CRITICAL - Plugin timed out while executing system call https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [19:58:44] PROBLEM - Check DNS auth via UDP of k8s.svc.tools.eqiad1.wikimedia.cloud on server ns1.openstack.eqiad1.wikimediacloud.org on cloudservices1006 is CRITICAL: CRITICAL - Plugin timed out while executing system call https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [19:58:44] PROBLEM - Check DNS auth via TCP of k8s.svc.tools.eqiad1.wikimedia.cloud on server ns1.openstack.eqiad1.wikimediacloud.org on cloudservices1006 is 
CRITICAL: CRITICAL - Plugin timed out while executing system call https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [19:59:11] FIRING: [5x] JobUnavailable: Reduced availability for job openstack in cloud@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [19:59:33] PROBLEM - Bird Internet Routing Daemon on cloudservices1006 is CRITICAL: PROCS CRITICAL: 0 processes with command name bird https://wikitech.wikimedia.org/wiki/Anycast%23Bird_daemon_not_running [20:00:33] RECOVERY - Bird Internet Routing Daemon on cloudservices1006 is OK: PROCS OK: 1 process with command name bird https://wikitech.wikimedia.org/wiki/Anycast%23Bird_daemon_not_running [20:00:33] RECOVERY - Check DNS auth via TCP of www.wmcloud.org on server ns1.openstack.eqiad1.wikimediacloud.org on cloudservices1006 is OK: DNS OK - 0.049 seconds response time (www.wmcloud.org. 3600 IN CNAME wmcloud.org.) https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [20:00:33] RECOVERY - Check DNS auth via UDP of tools-puppetserver-01.tools.eqiad1.wikimedia.cloud on server ns1.openstack.eqiad1.wikimediacloud.org on cloudservices1006 is OK: DNS OK - 0.050 seconds response time (tools-puppetserver-01.tools.eqiad1.wikimedia.cloud. 60 IN A 172.16.3.13) https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [20:00:33] RECOVERY - Check DNS auth via UDP of login.toolforge.org on server ns1.openstack.eqiad1.wikimediacloud.org on cloudservices1006 is OK: DNS OK - 0.079 seconds response time (login.toolforge.org. 3600 IN CNAME bastion.toolforge.org.) 
https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [20:00:34] RECOVERY - Check DNS auth via UDP of k8s.svc.tools.eqiad1.wikimedia.cloud on server ns1.openstack.eqiad1.wikimediacloud.org on cloudservices1006 is OK: DNS OK - 0.074 seconds response time (k8s.svc.tools.eqiad1.wikimedia.cloud. 300 IN A 172.16.18.169) https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [20:00:34] RECOVERY - Check DNS auth via UDP of www.wmcloud.org on server ns1.openstack.eqiad1.wikimediacloud.org on cloudservices1006 is OK: DNS OK - 0.048 seconds response time (www.wmcloud.org. 3600 IN CNAME wmcloud.org.) https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [20:00:34] RECOVERY - Check DNS auth via TCP of login.toolforge.org on server ns1.openstack.eqiad1.wikimediacloud.org on cloudservices1006 is OK: DNS OK - 0.042 seconds response time (login.toolforge.org. 3600 IN CNAME bastion.toolforge.org.) https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [20:00:34] RECOVERY - Check DNS auth via TCP of tools-puppetserver-01.tools.eqiad1.wikimedia.cloud on server ns1.openstack.eqiad1.wikimediacloud.org on cloudservices1006 is OK: DNS OK - 0.044 seconds response time (tools-puppetserver-01.tools.eqiad1.wikimedia.cloud. 60 IN A 172.16.3.13) https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [20:00:34] FIRING: [2x] NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirt1049 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown [20:00:34] RECOVERY - Check DNS auth via TCP of k8s.svc.tools.eqiad1.wikimedia.cloud on server ns1.openstack.eqiad1.wikimediacloud.org on cloudservices1006 is OK: DNS OK - 0.065 seconds response time (k8s.svc.tools.eqiad1.wikimedia.cloud. 
300 IN A 172.16.18.169) https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [20:01:37] RESOLVED: ProbeDown: Service toolsbeta-test-k8s-haproxy-7:443 has failed probes (http_admin_beta_toolforge_org_ip6) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [20:02:05] RESOLVED: [8x] HostBGPDown: BGP session for cloudservices1005 (172.20.2.4) is down - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status - https://alerts.wikimedia.org/?q=alertname%3DHostBGPDown [20:03:46] RESOLVED: [5x] JobUnavailable: Reduced availability for job openstack in cloud@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [20:05:19] RESOLVED: [2x] NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirt1049 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown [20:07:09] (03update) 10danyya: Draft: Resolve T422343 "Integrate Denelezh and merge stats tables" [toolforge-repos/humaniki] - 10https://gitlab.wikimedia.org/toolforge-repos/humaniki/-/merge_requests/56 [21:09:54] (03open) 10lucaswerkmeister: Add CI for Python 3.14 [toolforge-repos/python-toolforge] - 10https://gitlab.wikimedia.org/toolforge-repos/python-toolforge/-/merge_requests/29 [21:45:07] 10Cloud-VPS (Quota-requests): Quota increase request for project osmit - https://phabricator.wikimedia.org/T426790 (10Danysan1) 03NEW [21:51:27] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudcontrol.reboot_node on hosts matched by 
'P{O:wmcs::openstack::codfw1dev::control}' [21:54:46] FIRING: JobUnavailable: Reduced availability for job openstack in cloud@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [21:57:33] 10VPS-project-Phabricator, 06collaboration-services, 06Infrastructure-Foundations, 10Mail: @wikimedia.org email addresses don't seem to be receiving emails sent by the test Phabricator instance - https://phabricator.wikimedia.org/T422559#11938383 (10jhathaway) The issue is that on `mx-in{1001,2001}.wikimed... [21:59:46] RESOLVED: [2x] JobUnavailable: Reduced availability for job mysql-galera in cloud@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [22:01:01] FIRING: JobUnavailable: Reduced availability for job mysql-galera in cloud@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [22:03:16] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudcontrol.reboot_node (exit_code=0) on hosts matched by 'P{O:wmcs::openstack::codfw1dev::control}' [22:04:46] RESOLVED: [2x] JobUnavailable: Reduced availability for job mysql-galera in cloud@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [22:09:48] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.safe_reboot on hosts matched by 'P{P:openstack::codfw1dev::nova::compute::service} AND NOT P{F:kernelversion = 6.12.88}' [22:27:46] 10Tool-k8s-status: k8s-status can't show information 
about one tool - https://phabricator.wikimedia.org/T405150#11938575 (10bd808) 05Open→03Invalid https://k8s-status.toolforge.org/namespaces/tool-jimmy/ is rendering today. It is very difficult to say what API crawling failure was breaking this particula... [22:30:18] 10VPS-project-Codesearch, 13Patch-For-Review: Codesearch stuck at Feb 12th? - https://phabricator.wikimedia.org/T421147#11938583 (10Dzahn) I deployed the code change above. This created a new systemd service and timer: ` root@codesearch9:~# systemctl list-units | grep zombie codesearch-delete-zombie-locks.... [22:39:36] (03approved) 10bd808: Add CI for Python 3.14 [toolforge-repos/python-toolforge] - 10https://gitlab.wikimedia.org/toolforge-repos/python-toolforge/-/merge_requests/29 (owner: 10lucaswerkmeister) [22:40:00] (03merge) 10bd808: Add CI for Python 3.14 [toolforge-repos/python-toolforge] - 10https://gitlab.wikimedia.org/toolforge-repos/python-toolforge/-/merge_requests/29 (owner: 10lucaswerkmeister) [22:51:34] 10VPS-project-Codesearch, 13Patch-For-Review: Codesearch stuck at Feb 12th? - https://phabricator.wikimedia.org/T421147#11938609 (10Dzahn) Yea, so the puppet code and find command work as intended.. just that it took 13 minutes to run one time. So could not leave the timer at "every 10 minutes" and increased... 
[22:56:47] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.safe_reboot (exit_code=0) on hosts matched by 'P{P:openstack::codfw1dev::nova::compute::service} AND NOT P{F:kernelversion = 6.12.88}' [22:57:37] FIRING: ProbeDown: Service toolsbeta-test-k8s-haproxy-7:443 has failed probes (http_admin_beta_toolforge_org_ip6) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [23:04:50] FIRING: NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirt1058 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown [23:09:50] RESOLVED: NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirt1058 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown [23:17:37] RESOLVED: ProbeDown: Service toolsbeta-test-k8s-haproxy-7:443 has failed probes (http_admin_beta_toolforge_org_ip6) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [23:18:11] (03update) 10raymond-ndibe: kubernetes.py: handle 409 errors [repos/cloud/toolforge/webservice-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/webservice-cli/-/merge_requests/104 (https://phabricator.wikimedia.org/T423005) [23:18:45] (03update) 10raymond-ndibe: kubernetes.py: handle 409 errors [repos/cloud/toolforge/webservice-cli] - 
10https://gitlab.wikimedia.org/repos/cloud/toolforge/webservice-cli/-/merge_requests/104 (https://phabricator.wikimedia.org/T423005) [23:18:56] (03update) 10raymond-ndibe: kubernetes.py: handle 409 errors [repos/cloud/toolforge/webservice-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/webservice-cli/-/merge_requests/104 (https://phabricator.wikimedia.org/T423005) [23:18:59] (03update) 10raymond-ndibe: kubernetes.py: handle 409 errors [repos/cloud/toolforge/webservice-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/webservice-cli/-/merge_requests/104 (https://phabricator.wikimedia.org/T423005) [23:19:01] (03approved) 10raymond-ndibe: kubernetes.py: handle 409 errors [repos/cloud/toolforge/webservice-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/webservice-cli/-/merge_requests/104 (https://phabricator.wikimedia.org/T423005) [23:19:42] (03update) 10raymond-ndibe: jobs-api: test for proper handling of the diff variations of the --image argument [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1113 (https://phabricator.wikimedia.org/T414978 https://phabricator.wikimedia.org/T415322) [23:19:43] (03approved) 10raymond-ndibe: jobs-api: test for proper handling of the diff variations of the --image argument [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1113 (https://phabricator.wikimedia.org/T414978 https://phabricator.wikimedia.org/T415322) [23:19:50] (03update) 10raymond-ndibe: jobs-api: use webservice image variants in one-off job tests [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1115 (https://phabricator.wikimedia.org/T415322) [23:19:51] (03merge) 10raymond-ndibe: jobs-api: test for proper handling of the diff variations of the --image argument [repos/cloud/toolforge/toolforge-deploy] 
- 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1113 (https://phabricator.wikimedia.org/T414978 https://phabricator.wikimedia.org/T415322) [23:20:08] (03update) 10raymond-ndibe: replace job images with web images [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/263 (https://phabricator.wikimedia.org/T415322) [23:20:09] (03approved) 10raymond-ndibe: replace job images with web images [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/263 (https://phabricator.wikimedia.org/T415322) [23:21:33] (03update) 10raymond-ndibe: replace job images with web images [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/263 (https://phabricator.wikimedia.org/T415322) [23:21:40] (03merge) 10raymond-ndibe: replace job images with web images [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/263 (https://phabricator.wikimedia.org/T415322) [23:21:42] (03update) 10raymond-ndibe: support exposing continuous jobs to the internet [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/262 (https://phabricator.wikimedia.org/T388092) [23:22:12] (03update) 10raymond-ndibe: core.py: fix jobs loading messaging ux issue [repos/cloud/toolforge/jobs-api] (fix_jobs_load_bug) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/291 (https://phabricator.wikimedia.org/T423891) [23:22:38] (03approved) 10raymond-ndibe: core.py: fix jobs loading messaging ux issue [repos/cloud/toolforge/jobs-api] (fix_jobs_load_bug) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/291 (https://phabricator.wikimedia.org/T423891) [23:27:13] (03update) 10raymond-ndibe: kubernetes.py: handle 409 errors 
[repos/cloud/toolforge/webservice-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/webservice-cli/-/merge_requests/104 (https://phabricator.wikimedia.org/T423005) [23:27:28] (03update) 10raymond-ndibe: kubernetes.py: handle 409 errors [repos/cloud/toolforge/webservice-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/webservice-cli/-/merge_requests/104 (https://phabricator.wikimedia.org/T423005) [23:31:40] (03merge) 10raymond-ndibe: core.py: fix jobs loading messaging ux issue [repos/cloud/toolforge/jobs-api] (fix_jobs_load_bug) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/291 (https://phabricator.wikimedia.org/T423891) [23:31:42] (03update) 10raymond-ndibe: fix jobs load bug [repos/cloud/toolforge/jobs-api] (fix_oneoff_storage_error) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/287 (https://phabricator.wikimedia.org/T423544 https://phabricator.wikimedia.org/T423891) [23:32:07] (03update) 10group_203_bot_3c0afd0d9fd9529f3b7bc7e69a4a3bce: jobs-api: bump to 0.0.494-20260519232221-965db235 [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1261 (https://phabricator.wikimedia.org/T415322) [23:32:18] (03open) 10group_203_bot_3c0afd0d9fd9529f3b7bc7e69a4a3bce: jobs-api: bump to 0.0.494-20260519232221-965db235 [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1261 (https://phabricator.wikimedia.org/T415322) [23:34:28] (03merge) 10raymond-ndibe: kubernetes.py: handle 409 errors [repos/cloud/toolforge/webservice-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/webservice-cli/-/merge_requests/104 (https://phabricator.wikimedia.org/T423005) [23:35:37] FIRING: ProbeDown: Service toolsbeta-test-k8s-haproxy-7:443 has failed probes (http_admin_beta_toolforge_org_ip6) - 
https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [23:40:52] RESOLVED: ProbeDown: Service toolsbeta-test-k8s-haproxy-7:443 has failed probes (http_admin_beta_toolforge_org_ip6) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [23:44:22] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component jobs-api [23:46:44] FIRING: MaintainDBUsersManyErrors: Maintain-dbusers is having sustained errors - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/MaintainDBUsersManyErrors - https://grafana.wikimedia.org/d/ae240a06-c13e-49f3-b12c-58432c551e85/wmcs-maintain-dbusers - https://alerts.wikimedia.org/?q=alertname%3DMaintainDBUsersManyErrors [23:50:27] (03open) 10raymond-ndibe: d/changelog: bump to 0.103.22 [repos/cloud/toolforge/webservice-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/webservice-cli/-/merge_requests/106 (https://phabricator.wikimedia.org/T423005) [23:51:44] RESOLVED: MaintainDBUsersManyErrors: Maintain-dbusers is having sustained errors - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/MaintainDBUsersManyErrors - https://grafana.wikimedia.org/d/ae240a06-c13e-49f3-b12c-58432c551e85/wmcs-maintain-dbusers - https://alerts.wikimedia.org/?q=alertname%3DMaintainDBUsersManyErrors [23:54:13] !log raymond-ndibe@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api