[00:01:31] (PS1) Dwisehaupt: community_civicrm: add stub for dovecot_passwd [labs/private] - https://gerrit.wikimedia.org/r/1124204 (https://phabricator.wikimedia.org/T383715)
[00:20:56] FIRING: MaxConntrack: Max conntrack at 80.15% on cloudvirt1039:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[00:25:56] RESOLVED: MaxConntrack: Max conntrack at 80.12% on cloudvirt1039:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[00:42:56] FIRING: MaxConntrack: Max conntrack at 80.41% on cloudvirt1039:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[00:52:56] RESOLVED: MaxConntrack: Max conntrack at 80.44% on cloudvirt1039:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[01:02:48] Horizon, Cloud-Services-Origin-User: Interfaces tab of deployment-deploy04.deployment-prep VM never finishes loading - https://phabricator.wikimedia.org/T380531#10599026 (bd808) The HTTP 500 error for `/project/instances/{instance id}/?tab=instance_details__interfaces` seems to happen for all instances i...
[01:03:11] cloud-services-team, Horizon, Cloud-Services-Origin-User: Interfaces tab of deployment-deploy04.deployment-prep VM never finishes loading - https://phabricator.wikimedia.org/T380531#10599040 (bd808)
[01:04:26] cloud-services-team, Horizon, Cloud-Services-Origin-User: /project/instances/{instance id}/?tab=instance_details__interfaces returns HTTP 500; Interfaces tab never finishes loading - https://phabricator.wikimedia.org/T380531#10599043 (bd808) p:Low→Medium
[01:26:38] cloud-services-team, Horizon, Cloud-Services-Origin-User: /project/instances/{instance id}/?tab=instance_details__interfaces returns HTTP 500; Interfaces tab never finishes loading - https://phabricator.wikimedia.org/T380531#10599084 (bd808) From `docker logs -f openstack-dashboard.service` on cloudw...
[03:19:10] FIRING: [2x] GaleraNoRecentWrites: Galera node cloudcontrol1005:9104 has no writes - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/GaleraNoRecentWrites - https://grafana.wikimedia.org/d/galera-cluster-summary/wmcs-openstack-eqiad-galera-cluster-summary - https://alerts.wikimedia.org/?q=alertname%3DGaleraNoRecentWrites
[03:24:10] FIRING: [3x] GaleraNoRecentWrites: Galera node cloudcontrol1005:9104 has no writes - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/GaleraNoRecentWrites - https://grafana.wikimedia.org/d/galera-cluster-summary/wmcs-openstack-eqiad-galera-cluster-summary - https://alerts.wikimedia.org/?q=alertname%3DGaleraNoRecentWrites
[03:52:46] RESOLVED: [4x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[05:59:03] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-55 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[07:24:10] FIRING: [3x] GaleraNoRecentWrites: Galera node cloudcontrol1005:9104 has no writes - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/GaleraNoRecentWrites - https://grafana.wikimedia.org/d/galera-cluster-summary/wmcs-openstack-eqiad-galera-cluster-summary - https://alerts.wikimedia.org/?q=alertname%3DGaleraNoRecentWrites
[08:19:00] FIRING: NovafullstackSustainedFailures: Novafullstack tests have been failing for more than 5hours in eqiad - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/NovafullstackSustainedFailures - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-nova-fullstack?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DNovafullstackSustainedFailures
[08:46:54] cloud-services-team, Cloud-VPS: openstack galera no recent writes 2025-03-04 - https://phabricator.wikimedia.org/T387828 (aborrero) NEW
[08:49:10] FIRING: [3x] GaleraNoRecentWrites: Galera node cloudcontrol1005:9104 has no writes - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/GaleraNoRecentWrites - https://grafana.wikimedia.org/d/galera-cluster-summary/wmcs-openstack-eqiad-galera-cluster-summary - https://alerts.wikimedia.org/?q=alertname%3DGaleraNoRecentWrites
[08:50:10] FIRING: GaleraClusterSizeMismatch: Galera in eqiad1 has 2 nodes - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/GaleraClusterSizeMismatch - https://grafana.wikimedia.org/d/galera-cluster-summary/wmcs-openstack-eqiad-galera-cluster-summary - https://alerts.wikimedia.org/?q=alertname%3DGaleraClusterSizeMismatch
[08:50:22] FIRING: HAProxyBackendUnavailable: HAProxy service mysql backend cloudcontrol1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[08:54:10] RESOLVED: [2x] GaleraNoRecentWrites: Galera node cloudcontrol1006:9104 has no writes - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/GaleraNoRecentWrites - https://grafana.wikimedia.org/d/galera-cluster-summary/wmcs-openstack-eqiad-galera-cluster-summary - https://alerts.wikimedia.org/?q=alertname%3DGaleraNoRecentWrites
[08:58:19] !log dcaro@urcuchillay tools START - Cookbook wmcs.toolforge.component.deploy for component api-gateway
[08:58:23] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[08:58:30] cloud-services-team, Cloud-VPS: openstack galera no recent writes 2025-03-04 - https://phabricator.wikimedia.org/T387828#10599765 (ops-monitoring-bot) Host rebooted by aborrero@cumin1002 with reason: galera problem
[08:59:17] cloud-services-team, Cloud-VPS: openstack galera no recent writes 2025-03-04 - https://phabricator.wikimedia.org/T387828#10599766 (aborrero) Open→In progress p:Triage→High
[09:00:22] FIRING: [9x] HAProxyBackendUnavailable: HAProxy service glance-api_backend backend cloudcontrol1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[09:05:22] FIRING: [4x] HAProxyBackendUnavailable: HAProxy service cinder-api_backend backend cloudcontrol1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[09:07:43] !log dcaro@urcuchillay tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway
[09:07:46] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:14:38] (approved) dcaro: api-gateway: bump to 0.0.64-20250303171648-bd834b88 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/688 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[09:14:41] (merge) dcaro: api-gateway: bump to 0.0.64-20250303171648-bd834b88 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/688 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[09:15:34] !log dcaro@urcuchillay tools START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-55
[09:15:37] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:18:27] PROBLEM - Memcached on cloudcontrol1005 is CRITICAL: connect to address 10.64.151.3 and port 11211: Connection refused https://wikitech.wikimedia.org/wiki/Memcached
[09:18:33] PROBLEM - SSH on cloudcontrol1005 is CRITICAL: connect to address 10.64.151.3 and port 22: Connection refused https://wikitech.wikimedia.org/wiki/SSH/monitoring
[09:20:17] cloud-services-team, Cloud-VPS: openstack galera no recent writes 2025-03-04 - https://phabricator.wikimedia.org/T387828#10599859 (aborrero) server wont shutdown for reboot, so we are force-rebooting it
[09:23:28] FIRING: InstanceDown: Project tools instance tools-k8s-worker-nfs-55 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[09:23:45] !log dcaro@urcuchillay tools END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-55
[09:23:48] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:27:13] PROBLEM - Host cloudcontrol1005 is DOWN: PING CRITICAL - Packet loss = 100%
[09:28:28] RESOLVED: InstanceDown: Project tools instance tools-k8s-worker-nfs-55 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[09:36:07] cloud-services-team, Cloud-VPS: openstack galera no recent writes 2025-03-04 - https://phabricator.wikimedia.org/T387828#10599925 (aborrero) when force-rebooted, the server did not have the correct network configuration
[09:37:21] (approved) dcaro: pre-commit: Autoupdate [repos/cloud/toolforge/envvars-admission] - https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-admission/-/merge_requests/18 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[09:37:24] (merge) dcaro: pre-commit: Autoupdate [repos/cloud/toolforge/envvars-admission] - https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-admission/-/merge_requests/18 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[09:41:02] (open) group_203_bot_4866fc124f4b41659f667468a6115cf3: envvars-admission: bump to 0.0.25-20250304093736-6bb51ac3 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/691
[09:45:23] Toolforge (Toolforge iteration 18): [components-api] add basic prometheus instrumentation - https://phabricator.wikimedia.org/T381249#10599973 (dcaro) In progress→Resolved
[09:46:46] cloud-services-team, Cloud-VPS: openstack galera no recent writes 2025-03-04 - https://phabricator.wikimedia.org/T387828#10599976 (aborrero) the switch port the server is connected to is somehow down: {F58592794} Not sure if there is a problem in the switch port, the cable or similar.
[09:47:33] !log dcaro@urcuchillay toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission
[09:47:37] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[09:49:11] cloud-services-team, Cloud-VPS: openstack galera no recent writes 2025-03-04, suspected network hardware problem - https://phabricator.wikimedia.org/T387828#10599983 (aborrero)
[09:51:13] cloud-services-team (Hardware), Cloud-VPS, DC-Ops, ops-eqiad: openstack galera no recent writes 2025-03-04, suspected network hardware problem - https://phabricator.wikimedia.org/T387828#10599984 (aborrero) Hey @VRiley-WMF and/or @Jclark-ctr, Could you please check on-site if there is a loose cab...
[09:51:42] cloud-services-team (Hardware), Cloud-VPS, DC-Ops, ops-eqiad: openstack galera no recent writes 2025-03-04, suspected network hardware problem - https://phabricator.wikimedia.org/T387828#10599987 (aborrero) a:VRiley-WMF
[09:51:58] !log dcaro@urcuchillay toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-admission
[09:52:00] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[10:06:29] Tool-schedule-deployment: Allow scheduling for current backport window - https://phabricator.wikimedia.org/T381237#10600003 (Lucas_Werkmeister_WMDE) Thank you both!
[10:11:15] (PS1) David Caro: toolforge.component.deploy: fail if the tests failed [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1124377
[10:14:03] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-55 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[10:19:49] !log dcaro@urcuchillay toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission
[10:19:52] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[10:20:43] (update) dcaro: deployment: update the deployment state during deploy [repos/cloud/toolforge/components-api] (deploy_source_build_too) - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/54
[10:27:11] !log dcaro@urcuchillay toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-admission
[10:27:14] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[10:27:44] !log dcaro@urcuchillay tools START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission
[10:27:46] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[10:34:09] (open) dcaro: deployment: add the status and long_status fields [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/20
[10:35:32] cloud-services-team, Cloud-VPS, IPv6: CloudVPS: IPv6 in eqiad1 - https://phabricator.wikimedia.org/T380174#10600096 (aborrero) Given we have a theory for why {T380728} happened, we (@cmooney and me) are trying to target end of March for a rollout of this.
[10:37:53] !log dcaro@urcuchillay tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-admission
[10:37:56] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[10:40:25] (approved) dcaro: envvars-admission: bump to 0.0.25-20250304093736-6bb51ac3 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/691 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[10:40:28] (merge) dcaro: envvars-admission: bump to 0.0.25-20250304093736-6bb51ac3 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/691 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[10:42:44] (update) dcaro: deployment: add the status and long_status fields [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/20
[10:49:09] (approved) dcaro: pre-commit: Autoupdate [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/146 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[10:49:13] (merge) dcaro: pre-commit: Autoupdate [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/146 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[10:52:31] (open) group_203_bot_4866fc124f4b41659f667468a6115cf3: jobs-api: bump to 0.0.356-20250304104944-40142c5d [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/692
[10:59:47] FIRING: NodeDown: Node cloudcontrol1005 has been down for long. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/NodeDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1005 - https://alerts.wikimedia.org/?q=alertname%3DNodeDown
[10:59:52] cloud-services-team: NodeDown Node cloudcontrol1005 has been down for long. - https://phabricator.wikimedia.org/T387844 (phaultfinder) NEW
[11:08:23] !log dcaro@urcuchillay toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
[11:08:26] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[11:12:49] !log dcaro@urcuchillay toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api
[11:12:51] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[11:14:00] FIRING: OpenstackAPIResponse: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[11:16:33] cloud-services-team: NodeDown Node cloudcontrol1005 has been down for long. - https://phabricator.wikimedia.org/T387844#10600192 (aborrero)
[11:16:40] cloud-services-team (Hardware), Cloud-VPS, DC-Ops, ops-eqiad, SRE: openstack galera no recent writes 2025-03-04, suspected network hardware problem - https://phabricator.wikimedia.org/T387828#10600193 (aborrero)
[11:16:42] cloud-services-team: NodeDown Node cloudcontrol1005 has been down for long. - https://phabricator.wikimedia.org/T387844#10600194 (aborrero) Open→Resolved a:aborrero
[11:18:25] Toolforge (Toolforge iteration 18): [maintain-dbusers] move to another cloudcontrol node - https://phabricator.wikimedia.org/T387845 (dcaro) NEW
[11:19:00] FIRING: [2x] OpenstackAPIResponse: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[11:20:59] cloud-services-team (Hardware), Cloud-VPS, DC-Ops, ops-eqiad, SRE: openstack galera no recent writes 2025-03-04, suspected network hardware problem - https://phabricator.wikimedia.org/T387828#10600211 (aborrero) In progress→Open
[11:25:32] cloud-services-team, Cloud-VPS, IPv6, Patch-For-Review: openstack: network problems when introducing new networks - https://phabricator.wikimedia.org/T380728#10600217 (aborrero) In progress→Open
[11:34:09] Toolforge (Toolforge iteration 18): [maintain-dbusers] move to another cloudcontrol node - https://phabricator.wikimedia.org/T387845#10600256 (dcaro) Open→Resolved p:Triage→High
[11:35:48] !log dcaro@urcuchillay toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
[11:35:52] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[11:38:01] (PS2) David Caro: toolforge.component.deploy: fail if the tests failed [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1124377
[11:39:38] !log dcaro@urcuchillay toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api
[11:39:39] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[11:59:26] cloud-services-team, DC-Ops, Ganeti, Infrastructure-Foundations, and 2 others: Temperature Inlet Temp issue on clouddumps1001:9290 - https://phabricator.wikimedia.org/T383723#10600332 (fnegri) Resolved→Open The alert has not fired since Feb 27. The alert is based on `ipmi_temperature_stat...
[12:02:08] (update) raymond-ndibe: [toolforge-deploy] add maintain-harbor quota config [repos/cloud/toolforge/toolforge-deploy] (refactor_maintain_harbor_config) - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/674 (https://phabricator.wikimedia.org/T352417)
[12:10:39] FIRING: ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[12:15:39] RESOLVED: ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[12:16:04] (update) raymond-ndibe: Upgrade Kubernetes to 1.29 [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/227 (https://phabricator.wikimedia.org/T362868) (owner: fnegri)
[12:23:06] FIRING: [2x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_tool_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[12:28:06] RESOLVED: [2x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_tool_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[12:43:31] (PS4) Arturo Borrero Gonzalez: wmcs.openstack.restart_openstack: add runtime_description() [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1088597
[12:45:09] (PS1) Arturo Borrero Gonzalez: inventory: place cloudcontrol1005 last on the inventory [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1124421 (https://phabricator.wikimedia.org/T387828)
[12:47:49] (CR) Arturo Borrero Gonzalez: [C:+2] wmcs.openstack.restart_openstack: add runtime_description() [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1088597 (owner: Arturo Borrero Gonzalez)
[12:50:37] !log dcaro@urcuchillay tools START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
[12:50:39] !log dcaro@urcuchillay tools END (ERROR) - Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component jobs-api
[12:50:40] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[12:50:43] !log dcaro@urcuchillay toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
[12:50:44] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[12:50:46] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[12:51:48] (CR) David Caro: [C:+1] "We might want to implement some fallback mechanism eventually" [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1124421 (https://phabricator.wikimedia.org/T387828) (owner: Arturo Borrero Gonzalez)
[12:52:13] (CR) Arturo Borrero Gonzalez: [C:+2] inventory: place cloudcontrol1005 last on the inventory [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1124421 (https://phabricator.wikimedia.org/T387828) (owner: Arturo Borrero Gonzalez)
[12:58:11] !log dcaro@urcuchillay toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
[12:58:14] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[13:00:48] (update) aborrero: Draft: test new project module [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/93 (https://phabricator.wikimedia.org/T375283) (owner: fnegri)
[13:05:22] FIRING: [4x] HAProxyBackendUnavailable: HAProxy service cinder-api_backend backend cloudcontrol1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[13:28:16] FIRING: ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_tool_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[13:33:16] FIRING: [4x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[13:37:27] cloud-services-team: Onboard Chuck Onwumelu - https://phabricator.wikimedia.org/T386715#10600688 (aborrero)
[13:38:16] RESOLVED: [2x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_tool_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[13:40:48] cloud-services-team: Onboard Chuck Onwumelu - https://phabricator.wikimedia.org/T386715#10600699 (aborrero)
[13:42:39] !log dcaro@urcuchillay tools START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
[13:42:41] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[13:43:00] (open) dcaro: add tabulate [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/21
[13:44:30] (update) dcaro: deployment.list: tabulate and add some colors [repos/cloud/toolforge/components-cli] (add_status) - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/21
[13:45:06] FIRING: [2x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_tool_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[13:48:16] (update) dcaro: deployment.list: tabulate and add some colors [repos/cloud/toolforge/components-cli] (add_status) - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/21
[13:50:06] RESOLVED: [4x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[13:51:20] (update) dcaro: deployment.list: tabulate and add some colors [repos/cloud/toolforge/components-cli] (add_status) - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/21
[13:51:56] !log dcaro@urcuchillay tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
[13:51:59] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[13:54:06] FIRING: [2x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_tool_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[13:56:12] (update) dcaro: deployment.list: tabulate and add some colors [repos/cloud/toolforge/components-cli] (add_status) - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/21
[13:56:24] RESOLVED: [3x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_tool_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[13:57:13] cloud-services-team: Onboard Chuck Onwumelu - https://phabricator.wikimedia.org/T386715#10600774 (Chuckonwumelu)
[13:57:15] (approved) dcaro: jobs-api: bump to 0.0.356-20250304104944-40142c5d [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/692 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[13:57:17] (merge) dcaro: jobs-api: bump to 0.0.356-20250304104944-40142c5d [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/692 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[13:57:31] (approved) dcaro: poetry: Autoupdate [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/145 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[13:57:35] (update) dcaro: poetry: Autoupdate [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/145 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[13:58:43] (update) dcaro: deployment.list: tabulate and add some colors [repos/cloud/toolforge/components-cli] (add_status) - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/21
[13:59:23] (update) dcaro: deployment: add the status and long_status fields [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/20
[13:59:40] (merge) dcaro: poetry: Autoupdate [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/145 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[14:00:04] cloud-services-team, Toolforge, Patch-For-Review: toolforge-legacy-redirector: constant failed probes by prometheus - https://phabricator.wikimedia.org/T385908#10600785 (fnegri) The probe has been failing more frequently in the past week: {F58595190} I rebooted the VM `reboot tools-legacy-redirecto...
[14:00:29] (update) dcaro: deployment.list: tabulate and add some colors [repos/cloud/toolforge/components-cli] (add_status) - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/21
[14:01:21] !log fnegri@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (T362868)
[14:01:22] !log fnegri@cloudcumin1001 tools Updating container image docker-registry.tools.wmflabs.org/kube-state-metrics:v2.12.0 (T362868)
[14:01:25] T362868: [infra,k8s] Upgrade Toolforge Kubernetes to version 1.29 - https://phabricator.wikimedia.org/T362868
[14:01:33] !log fnegri@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) (T362868)
[14:02:16] (update) group_203_bot_4866fc124f4b41659f667468a6115cf3: jobs-api: bump to 0.0.357-20250304135950-937d3d54 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/693
[14:02:19] (open) group_203_bot_4866fc124f4b41659f667468a6115cf3: jobs-api: bump to 0.0.357-20250304135950-937d3d54 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/693
[14:03:06] FIRING: [2x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_tool_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[14:04:00] FIRING: NovafullstackSustainedFailures: Novafullstack tests have been failing for more than 5hours in eqiad - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/NovafullstackSustainedFailures - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-nova-fullstack?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DNovafullstackSustainedFailures
[14:07:51] RESOLVED: [4x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[14:15:45] FIRING: ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_toolserver_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[14:17:51] RESOLVED: [2x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_tool_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[14:19:50] !log dcaro@urcuchillay toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
[14:19:54] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[14:28:28] !log dcaro@urcuchillay toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
[14:34:06] FIRING: ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_toolserver_org_redirects_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[14:39:06] RESOLVED: ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_toolserver_org_redirects_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[14:39:21] (CR) MVernon: [C:+1] cassandra: obsolete secrets [labs/private] - https://gerrit.wikimedia.org/r/1123703 (https://phabricator.wikimedia.org/T387586) (owner: Eevans)
[14:44:39] FIRING: ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[14:49:39] RESOLVED: ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[15:22:53] cloud-services-team, Data-Services, Data-Engineering, Data-Engineering-Icebox, Datasets-General-or-Unknown: Provide dumps using bittorrent - https://phabricator.wikimedia.org/T29653#10601174 (valerio.bozzolan) So this seems not possible as long as the only way to access/mount dumps is through...
[15:29:32] (open) dcaro: deploy_task: build the source based components [repos/cloud/toolforge/components-api] (add_deployment_status_updates) - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/57
[15:33:41] !log dcaro@urcuchillay tools START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
[15:33:43] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[15:34:45] FIRING: [2x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_toolserver_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[15:39:45] RESOLVED: [3x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_tool_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[15:42:05] !log dcaro@urcuchillay tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
[15:42:08] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[15:44:29] (approved) dcaro: jobs-api: bump to 0.0.357-20250304135950-937d3d54 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/693 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[15:44:32] (merge) dcaro: jobs-api: bump to 0.0.357-20250304135950-937d3d54 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/693 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[15:44:48] (approved) dcaro: poetry: Autoupdate [repos/cloud/toolforge/jobs-emailer] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-emailer/-/merge_requests/18 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[15:44:48] (merge) dcaro: poetry: Autoupdate [repos/cloud/toolforge/jobs-emailer] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-emailer/-/merge_requests/18 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[15:47:01] (open) group_203_bot_4866fc124f4b41659f667468a6115cf3: jobs-emailer: bump to 0.0.53-20250304154500-21345a23 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/694
[15:52:36] !log dcaro@urcuchillay toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer
[15:52:39] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[15:55:06] FIRING: ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_toolserver_org_redirects_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[16:00:06] RESOLVED: ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_toolserver_org_redirects_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[16:00:13] !log dcaro@urcuchillay toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer
[16:00:16] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[16:05:16] FIRING: [3x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[16:10:16] RESOLVED: [4x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[16:15:16] FIRING: [4x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[16:20:16] RESOLVED: [4x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[16:25:16] FIRING: [4x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[16:30:16] RESOLVED: [4x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[16:35:12] (update) raymond-ndibe: volume-admission: add first test for disallowed path [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/675 (owner: dcaro)
[16:35:16] FIRING: [4x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[16:35:27] (update) raymond-ndibe: volume-admission: add first test for disallowed path [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/675 (owner: dcaro)
[16:35:29] (approved) raymond-ndibe: volume-admission: add first test for disallowed path [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/675 (owner: dcaro)
[16:40:16] RESOLVED: [4x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[16:45:16] FIRING: [4x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[16:50:16] RESOLVED: [4x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[16:55:16] FIRING: [4x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[17:00:16] RESOLVED: [4x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[17:05:22] FIRING: [4x] HAProxyBackendUnavailable: HAProxy service cinder-api_backend backend cloudcontrol1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[17:05:46] FIRING: [4x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[17:06:01] FIRING: [4x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[17:10:46] RESOLVED: [4x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[17:15:46] FIRING: [3x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[17:17:24] RESOLVED: [2x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_tool_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[17:25:24] cloud-services-team, Toolforge, Patch-For-Review: toolforge-legacy-redirector: constant failed probes by prometheus - https://phabricator.wikimedia.org/T385908#10601977 (fnegri) After the reboot, it started flapping at an almost regular interval: {F58596967} Grafana link (requires login): https://g...
[17:40:11] FIRING: Temperature: Inlet Temp issue on clouddumps1001:9290 - https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook - https://grafana.wikimedia.org/d/ZA1I-IB4z/ipmi-sensor-state?orgId=1&viewPanel=92&var-server=clouddumps1001 - https://alerts.wikimedia.org/?q=alertname%3DTemperature
[18:06:06] FIRING: [4x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[18:11:06] FIRING: [3x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[18:16:06] FIRING: [4x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[18:21:06] RESOLVED: [4x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[18:26:06] FIRING: [4x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[18:31:06] RESOLVED: [4x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[18:36:16] FIRING: [3x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[18:41:16] RESOLVED: [4x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[18:46:16] FIRING: [4x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[18:51:16] RESOLVED: [2x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_tool_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[18:57:16] FIRING: [3x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[19:02:16] RESOLVED: [4x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[19:36:06] FIRING: ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_toolserver_org_redirects_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[19:41:06] RESOLVED: ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_toolserver_org_redirects_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[20:06:36] (update) raymond-ndibe: jobs: continuous: set strategy based on number of replicas [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/124 (https://phabricator.wikimedia.org/T375366) (owner: aborrero)
[20:14:26] cloud-services-team: Chuck Onwumelu internship: experiments with sample Toolforge tools - https://phabricator.wikimedia.org/T386805#10602775 (Chuckonwumelu)
[20:19:51] !log dcaro@urcuchillay tools START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer
[20:19:54] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[20:28:49] !log dcaro@urcuchillay tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer
[20:28:52] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[20:31:13] (approved) dcaro: jobs-emailer: bump to 0.0.53-20250304154500-21345a23 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/694 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[20:31:17] (merge) dcaro: jobs-emailer: bump to 0.0.53-20250304154500-21345a23 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/694 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[20:31:42] (update) dcaro: pre-commit: Autoupdate [repos/cloud/toolforge/jobs-emailer] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-emailer/-/merge_requests/17 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[20:31:52] (approved) dcaro: pre-commit: Autoupdate [repos/cloud/toolforge/jobs-emailer] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-emailer/-/merge_requests/17 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[20:33:29] (merge) dcaro: pre-commit: Autoupdate [repos/cloud/toolforge/jobs-emailer] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-emailer/-/merge_requests/17 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[20:35:43] (open) group_203_bot_4866fc124f4b41659f667468a6115cf3: jobs-emailer: bump to 0.0.54-20250304203340-d628ac53 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/695
[20:38:59] !log dcaro@urcuchillay toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer
[20:39:02] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[20:47:29] !log dcaro@urcuchillay toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer
[20:47:32] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[20:47:51] !log dcaro@urcuchillay tools START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer
[20:47:53] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[20:56:53] !log dcaro@urcuchillay tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer
[20:56:55] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[21:02:50] (approved) dcaro: [toolforge-deploy] test maintain-harbor quota management [repos/cloud/toolforge/toolforge-deploy] (add_maintain_harbor_quota_config) - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/690 (https://phabricator.wikimedia.org/T352417) (owner: raymond-ndibe)
[21:03:51] (approved) dcaro: jobs-emailer: bump to 0.0.54-20250304203340-d628ac53 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/695 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[21:03:53] (merge) dcaro: jobs-emailer: bump to 0.0.54-20250304203340-d628ac53 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/695 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[21:04:05] (approved) dcaro: pre-commit: Autoupdate [repos/cloud/toolforge/ingress-admission] - https://gitlab.wikimedia.org/repos/cloud/toolforge/ingress-admission/-/merge_requests/17 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[21:04:08] (merge) dcaro: pre-commit: Autoupdate [repos/cloud/toolforge/ingress-admission] - https://gitlab.wikimedia.org/repos/cloud/toolforge/ingress-admission/-/merge_requests/17 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[21:07:31] (open) group_203_bot_4866fc124f4b41659f667468a6115cf3: ingress-admission: bump to 0.0.57-20250304210417-f60db3c7 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/696
[21:21:52] (approved) dcaro: [toolforge-deploy] add maintain-harbor quota config [repos/cloud/toolforge/toolforge-deploy] (refactor_maintain_harbor_config) - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/674 (https://phabricator.wikimedia.org/T352417) (owner: raymond-ndibe)
[21:22:08] (update) dcaro: [toolforge-deploy] add maintain-harbor quota config [repos/cloud/toolforge/toolforge-deploy] (refactor_maintain_harbor_config) - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/674 (https://phabricator.wikimedia.org/T352417) (owner: raymond-ndibe)
[21:23:09] !log dcaro@urcuchillay toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission
[21:23:13] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[21:31:40] !log dcaro@urcuchillay toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission
[21:31:43] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[22:01:16] FIRING: ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_tool_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[22:06:16] RESOLVED: ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_tool_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[23:58:10] VPS-project-Phabricator, collaboration-services: phabricator.wmcloud.org says "Access denied for user 'app_user'@'localhost'" - https://phabricator.wikimedia.org/T387619#10603554 (Dzahn) The `phabricator-bullseye` instance uses (as it should) the regular `role::phabricator` puppet class like in productio...