[00:02:55] FIRING: MaxConntrack: Max conntrack at 82.46% on cloudvirt1040:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[00:03:55] FIRING: MaxConntrack: Max conntrack at 91.19% on cloudvirt1040:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[00:04:02] cloud-services-team: MaxConntrack Netfilter: Maximum number of allowed connection tracking entries alert on cloudvirt1040:9100 - https://phabricator.wikimedia.org/T371445 (phaultfinder) NEW
[00:18:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[00:28:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[00:48:55] RESOLVED: MaxConntrack: Max conntrack at 92.78% on cloudvirt1040:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[00:49:55] RESOLVED: MaxConntrack: Max conntrack at 82.55% on cloudvirt1040:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[01:19:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[01:29:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[02:19:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[02:29:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[03:16:03] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-22 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[03:16:33] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-22 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[03:21:48] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-22 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[03:22:48] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-22 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[03:27:48] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-22 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[03:28:48] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-22 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[04:41:33] FIRING: [2x] ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-22 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[06:12:51] (update) sstefanova: backends: add components-api [repos/cloud/toolforge/api-gateway] - https://gitlab.wikimedia.org/repos/cloud/toolforge/api-gateway/-/merge_requests/28 (https://phabricator.wikimedia.org/T362069)
[06:49:10] cloud-services-team, Toolforge (Toolforge iteration 13): [infra,k8s] review kubelet flags before 1.26 upgrade - https://phabricator.wikimedia.org/T370245#10030651 (Slst2020) To recap, what needs to happen for this change is: 1. Edit the kubelet-config ConfigMap to include the `containerRuntimeEndpoint`...
[06:52:46] cloud-services-team, Toolforge (Toolforge iteration 13): [infra,k8s] remove deprecated kubelet flags before 1.27 upgrade - https://phabricator.wikimedia.org/T370245#10030653 (Slst2020)
[06:53:24] cloud-services-team, Toolforge (Toolforge iteration 13): [infra,k8s] remove deprecated kubelet flags before 1.27 upgrade - https://phabricator.wikimedia.org/T370245#10030654 (Slst2020)
[06:53:25] Toolforge: [infra,k8s] Upgrade Toolforge Kubernetes to version 1.27 - https://phabricator.wikimedia.org/T359641#10030655 (Slst2020)
[06:53:26] cloud-services-team, Toolforge: [infra,k8s] Upgrade Toolforge Kubernetes to version 1.26 - https://phabricator.wikimedia.org/T327025#10030656 (Slst2020)
[06:58:43] cloud-services-team, Toolforge (Toolforge iteration 13): [infra,k8s] remove deprecated kubelet flags before 1.27 upgrade - https://phabricator.wikimedia.org/T370245#10030659 (Slst2020) @dcaro not sure where to go from here. It seems we currently don't have an automated way to roll out cluster-wide config...
[07:00:53] (update) sstefanova: kind: upgrade k8s to 1.26 [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/177 (https://phabricator.wikimedia.org/T370244)
[07:01:06] (update) sstefanova: components: add components-api [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/173 (https://phabricator.wikimedia.org/T362069)
[07:18:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[07:26:26] cloud-services-team, Toolforge (Toolforge iteration 13): [infra,k8s] prepare deb packages for k8s 1.26 - https://phabricator.wikimedia.org/T370246#10030672 (Slst2020) Open→In progress
[07:28:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[07:29:29] cloud-services-team, Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure, Content-Transform-Team-WIP: Rebuild or delete deployment-docker-proton01 - https://phabricator.wikimedia.org/T369916#10030684 (Jgiannelos)
[07:33:39] cloud-services-team, Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure, Content-Transform-Team-WIP: Rebuild or delete deployment-docker-proton01 - https://phabricator.wikimedia.org/T369916#10030691 (Jgiannelos) I created a new bookworm instance for proton and turned off the old o...
[07:34:20] cloud-services-team, Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure, Content-Transform-Team-WIP: Rebuild or delete deployment-docker-proton01 - https://phabricator.wikimedia.org/T369916#10030696 (Jgiannelos) ` curl https://proton-beta.wmflabs.org/_info {"name":"proton","versio...
[07:36:34] cloud-services-team, Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure, Content-Transform-Team-WIP: Rebuild or delete deployment-docker-proton01 - https://phabricator.wikimedia.org/T369916#10030698 (Jgiannelos) Open→Resolved a:Jgiannelos
[07:56:29] cloud-services-team, Toolforge (Toolforge iteration 13): [infra,k8s] remove deprecated kubelet flags before 1.27 upgrade - https://phabricator.wikimedia.org/T370245#10030730 (dcaro) >>! In T370245#10030659, @Slst2020 wrote: > @dcaro not sure where to go from here. It seems we currently don't have an auto...
[08:23:33] (open) dcaro: bump openapi version [repos/cloud/toolforge/api-gateway] - https://gitlab.wikimedia.org/repos/cloud/toolforge/api-gateway/-/merge_requests/30 (https://phabricator.wikimedia.org/T356974)
[08:28:48] (approved) aborrero: calico: upgrade to 3.26.4 [repos/cloud/toolforge/calico] - https://gitlab.wikimedia.org/repos/cloud/toolforge/calico/-/merge_requests/8 (https://phabricator.wikimedia.org/T370046) (owner: sstefanova)
[08:44:41] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/27
[08:44:52] !log aborrero@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.tofu (exit_code=99) running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/27
[08:46:19] (update) sstefanova: calico: upgrade to 3.26.4 [repos/cloud/toolforge/calico] - https://gitlab.wikimedia.org/repos/cloud/toolforge/calico/-/merge_requests/8 (https://phabricator.wikimedia.org/T370046)
[08:46:20] (merge) sstefanova: calico: upgrade to 3.26.4 [repos/cloud/toolforge/calico] - https://gitlab.wikimedia.org/repos/cloud/toolforge/calico/-/merge_requests/8 (https://phabricator.wikimedia.org/T370046)
[08:47:58] (open) project_1317_bot_df3177307bed93c3f34e421e26c86e38: calico: bump to 0.0.8-20240731084636-9937ff2a [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/462 (https://phabricator.wikimedia.org/T370046)
[08:56:24] cloud-services-team, Toolforge (Toolforge iteration 13): [infra,k8s] remove deprecated kubelet flags before 1.27 upgrade - https://phabricator.wikimedia.org/T370245#10030832 (Slst2020) I tried this manually on my kubeadm test cluster (1.28) and it worked fine. However, in Toolsbeta we are getting this er...
[09:00:59] cloud-services-team, Toolforge (Toolforge iteration 13): [infra,k8s] remove deprecated kubelet flags before 1.27 upgrade - https://phabricator.wikimedia.org/T370245#10030838 (Slst2020) Another option would be to move `--container-runtime-endpoint` to `/etc/default/kubelet` for now, which would let us del...
[09:07:36] (update) sstefanova: calico: bump to 0.0.8-20240731084636-9937ff2a [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/462 (https://phabricator.wikimedia.org/T370046) (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[09:15:36] (update) sstefanova: backends: add components-api [repos/cloud/toolforge/api-gateway] - https://gitlab.wikimedia.org/repos/cloud/toolforge/api-gateway/-/merge_requests/28 (https://phabricator.wikimedia.org/T362069)
[09:26:33] FIRING: [3x] ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-22 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[09:38:22] (update) sstefanova: backends: add components-api [repos/cloud/toolforge/api-gateway] - https://gitlab.wikimedia.org/repos/cloud/toolforge/api-gateway/-/merge_requests/28 (https://phabricator.wikimedia.org/T362069)
[09:49:03] (PS1) Arturo Borrero Gonzalez: wmcs.openstack.tofu: write plan to gitlab MR as note [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1058573 (https://phabricator.wikimedia.org/T370414)
[09:49:46] !log dcaro@urcuchillay tools START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-22, tools-k8s-worker-nfs-26, tools-k8s-worker-nfs-43
[09:49:49] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:50:11] (CR) CI reject: [V:-1] wmcs.openstack.tofu: write plan to gitlab MR as note [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1058573 (https://phabricator.wikimedia.org/T370414) (owner: Arturo Borrero Gonzalez)
[09:50:59] (update) sstefanova: components-api: bump to 0.0.3-20240725133653-f188d2d0 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/450 (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[09:52:26] (PS2) Arturo Borrero Gonzalez: wmcs.openstack.tofu: write plan to gitlab MR as note [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1058573 (https://phabricator.wikimedia.org/T370414)
[09:53:05] (PS3) Arturo Borrero Gonzalez: wmcs.openstack.tofu: write plan to gitlab MR as note [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1058573 (https://phabricator.wikimedia.org/T370414)
[09:53:35] (merge) sstefanova: backends: add components-api [repos/cloud/toolforge/api-gateway] - https://gitlab.wikimedia.org/repos/cloud/toolforge/api-gateway/-/merge_requests/28 (https://phabricator.wikimedia.org/T362069)
[09:55:26] (update) sstefanova: components: add components-api [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/173 (https://phabricator.wikimedia.org/T362069)
[09:55:56] (CR) CI reject: [V:-1] wmcs.openstack.tofu: write plan to gitlab MR as note [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1058573 (https://phabricator.wikimedia.org/T370414) (owner: Arturo Borrero Gonzalez)
[09:56:00] (open) project_1317_bot_df3177307bed93c3f34e421e26c86e38: api-gateway: bump to 0.0.32-20240731095342-26fae004 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/463 (https://phabricator.wikimedia.org/T362069)
[10:02:26] (PS4) Arturo Borrero Gonzalez: wmcs.openstack.tofu: write plan to gitlab MR as note [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1058573 (https://phabricator.wikimedia.org/T370414)
[10:05:52] (CR) CI reject: [V:-1] wmcs.openstack.tofu: write plan to gitlab MR as note [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1058573 (https://phabricator.wikimedia.org/T370414) (owner: Arturo Borrero Gonzalez)
[10:07:19] !log dcaro@urcuchillay tools END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-22, tools-k8s-worker-nfs-26, tools-k8s-worker-nfs-43
[10:07:22] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[10:09:50] (PS5) Arturo Borrero Gonzalez: wmcs.openstack.tofu: write plan to gitlab MR as note [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1058573 (https://phabricator.wikimedia.org/T370414)
[10:12:36] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/27
[10:12:55] !log aborrero@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.tofu (exit_code=99) running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/27
[10:17:42] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/26
[10:18:18] !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/26
[10:21:09] (PS6) Arturo Borrero Gonzalez: wmcs.openstack.tofu: write plan to gitlab MR as note [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1058573 (https://phabricator.wikimedia.org/T370414)
[10:21:15] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/26
[10:21:40] !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/26
[10:22:35] (PS7) Arturo Borrero Gonzalez: wmcs.openstack.tofu: write plan to gitlab MR as note [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1058573 (https://phabricator.wikimedia.org/T370414)
[10:22:44] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/26
[10:23:07] !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/26
[10:24:18] (PS8) Arturo Borrero Gonzalez: wmcs.openstack.tofu: write plan to gitlab MR as note [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1058573 (https://phabricator.wikimedia.org/T370414)
[10:24:22] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/26
[10:24:54] !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/26
[10:28:11] (CR) CI reject: [V:-1] wmcs.openstack.tofu: write plan to gitlab MR as note [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1058573 (https://phabricator.wikimedia.org/T370414) (owner: Arturo Borrero Gonzalez)
[10:28:15] (PS9) Arturo Borrero Gonzalez: wmcs.openstack.tofu: write plan to gitlab MR as note [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1058573 (https://phabricator.wikimedia.org/T370414)
[10:28:20] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/26
[10:29:03] !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/26
[10:30:01] (PS10) Arturo Borrero Gonzalez: wmcs.openstack.tofu: write plan to gitlab MR as note [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1058573 (https://phabricator.wikimedia.org/T370414)
[10:30:04] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/26
[10:30:40] !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/26
[10:31:02] (PS11) Arturo Borrero Gonzalez: wmcs.openstack.tofu: write plan to gitlab MR as note [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1058573 (https://phabricator.wikimedia.org/T370414)
[10:31:13] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/26
[10:31:39] !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/26
[10:34:30] (CR) CI reject: [V:-1] wmcs.openstack.tofu: write plan to gitlab MR as note [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1058573 (https://phabricator.wikimedia.org/T370414) (owner: Arturo Borrero Gonzalez)
[10:36:50] (PS12) Arturo Borrero Gonzalez: wmcs.openstack.tofu: write plan to gitlab MR as note [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1058573 (https://phabricator.wikimedia.org/T370414)
[10:37:16] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/26
[10:37:52] !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/26
[10:39:32] (PS13) Arturo Borrero Gonzalez: wmcs.openstack.tofu: write plan to gitlab MR as note [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1058573 (https://phabricator.wikimedia.org/T370414)
[10:40:43] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/26
[10:41:12] !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/26
[10:53:33] RESOLVED: [2x] ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-26 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[11:16:49] !log sstefanova@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component components-api
[11:16:59] !log sstefanova@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component components-api
[11:22:07] !log sstefanova@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component api-gateway
[11:22:17] !log sstefanova@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component api-gateway
[11:29:41] cloud-services-team, Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure: Remove or replace deployment-restbase04.deployment-prep.eqiad1.wikimedia.cloud (Buster deprecation) - https://phabricator.wikimedia.org/T370460#10031454 (Jgiannelos) I don't think the bookworm instance is going t...
[11:31:11] cloud-services-team, Cloud-VPS: Cloud VPS: extend tofu-infra to cover projects, users and roles - https://phabricator.wikimedia.org/T371393#10031448 (aborrero) Open→In progress p:Triage→Medium a:aborrero
[11:33:07] !log sstefanova@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component components-api
[11:33:17] !log sstefanova@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component components-api
[11:38:07] !log sstefanova@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component api-gateway
[11:38:18] !log sstefanova@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component api-gateway
[11:45:21] Data-Services: View 'centralauth_p.globalblocks' references invalid table(s) or column(s) or function(s) or definer/invoker of view lack rights to use them - https://phabricator.wikimedia.org/T371437#10031515 (Dreamy_Jazz) @Ladsgroup might be able to help here?
[11:48:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[11:55:04] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/28
[11:55:05] (open) aborrero: tofu-infra: add projects [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/28 (https://phabricator.wikimedia.org/T371393)
[11:55:32] !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/28
[11:58:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[12:05:25] (update) sstefanova: api-gateway: bump to 0.0.32-20240731095342-26fae004 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/463 (https://phabricator.wikimedia.org/T362069) (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[12:05:33] (merge) sstefanova: api-gateway: bump to 0.0.32-20240731095342-26fae004 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/463 (https://phabricator.wikimedia.org/T362069) (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[12:05:39] (update) sstefanova: components-api: bump to 0.0.3-20240725133653-f188d2d0 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/450 (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[12:06:16] (update) aborrero: tofu-infra: add projects [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/28 (https://phabricator.wikimedia.org/T371393)
[12:06:17] cloud-services-team, Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure: Replace deployment-maps-master01 with a Bullseye or Bookworm instance - https://phabricator.wikimedia.org/T361381#10031578 (hnowlan) It appears there have been issues imaging a new host which is hampering reimagi...
[12:06:23] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/28
[12:06:30] !log aborrero@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.tofu (exit_code=99) running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/28
[12:06:42] (update) sstefanova: components-api: bump to 0.0.3-20240725133653-f188d2d0 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/450 (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[12:06:48] (approved) sstefanova: components-api: bump to 0.0.3-20240725133653-f188d2d0 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/450 (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[12:07:04] (merge) sstefanova: components-api: bump to 0.0.3-20240725133653-f188d2d0 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/450 (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[12:07:04] (update) aborrero: tofu-infra: add projects [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/28 (https://phabricator.wikimedia.org/T371393)
[12:07:11] Data-Services, DBA: View 'centralauth_p.globalblocks' references invalid table(s) or column(s) or function(s) or definer/invoker of view lack rights to use them - https://phabricator.wikimedia.org/T371437#10031573 (Ladsgroup) Open→Resolved a:Ladsgroup
[12:07:33] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/28
[12:07:43] !log aborrero@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.tofu (exit_code=99) running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/28
[12:10:28] (merge) sstefanova: components: add components-api [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/173 (https://phabricator.wikimedia.org/T362069)
[12:12:07] (PS14) Arturo Borrero Gonzalez: wmcs.openstack.tofu: write plan to gitlab MR as note [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1058573 (https://phabricator.wikimedia.org/T370414)
[12:12:58] (update) aborrero: tofu-infra: add projects [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/28 (https://phabricator.wikimedia.org/T371393)
[12:12:59] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/28
[12:13:23] !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/28
[12:13:55] cloud-services-team (FY2024/2025-Q1-Q2), Toolforge, Patch-For-Review: [components-api] Get a skeleton of API webservice and implement `/tool//deploy` with build-only features - https://phabricator.wikimedia.org/T362069#10031589 (Slst2020)
[12:14:06] (PS15) Arturo Borrero Gonzalez: wmcs.openstack.tofu: write plan to gitlab MR as note [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1058573 (https://phabricator.wikimedia.org/T370414)
[12:14:32] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/28
[12:15:08] !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/28
[12:15:34] cloud-services-team (FY2024/2025-Q1-Q2), Toolforge, Patch-For-Review: [components-api] Get a skeleton of API webservice and implement `/tool//deploy` with build-only features - https://phabricator.wikimedia.org/T362069#10031591 (Slst2020) The initial boilerplate is done and deployed everywh...
[12:16:04] Toolforge: [builds-builder, builds-api] upgrade tekton version - https://phabricator.wikimedia.org/T370869#10031592 (Slst2020) a:Slst2020→None
[12:17:03] (CR) CI reject: [V:-1] wmcs.openstack.tofu: write plan to gitlab MR as note [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1058573 (https://phabricator.wikimedia.org/T370414) (owner: Arturo Borrero Gonzalez)
[12:19:49] cloud-services-team (FY2024/2025-Q1-Q2), Toolforge (Toolforge iteration 13), Patch-For-Review: [components-api] Get a skeleton of API webservice and implement `/tool//deploy` with build-only features - https://phabricator.wikimedia.org/T362069#10031594 (Slst2020)
[12:19:57] cloud-services-team (FY2024/2025-Q1-Q2), Toolforge (Toolforge iteration 13), Patch-For-Review: [components-api] Get a skeleton of API webservice and implement `/tool//deploy` with build-only features - https://phabricator.wikimedia.org/T362069#10031596 (Slst2020) Open→In progress
[12:21:57] cloud-services-team, Data-Persistence, observability, Grafana: Grafana MySQL charts can be inconsistent when zooming out - https://phabricator.wikimedia.org/T371485 (fnegri) NEW
[12:24:45] (PS16) Arturo Borrero Gonzalez: wmcs.openstack.tofu: write plan to gitlab MR as note [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1058573 (https://phabricator.wikimedia.org/T370414)
[12:25:10] (update) sstefanova: calico: bump to 0.0.8-20240731084636-9937ff2a [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/462 (https://phabricator.wikimedia.org/T370046) (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[12:27:29] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/28
[12:27:55] !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu
(exit_code=0) running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/28
[12:28:30] (CR) CI reject: [V:-1] wmcs.openstack.tofu: write plan to gitlab MR as note [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1058573 (https://phabricator.wikimedia.org/T370414) (owner: Arturo Borrero Gonzalez)
[12:28:52] (PS17) Arturo Borrero Gonzalez: wmcs.openstack.tofu: write plan to gitlab MR as note [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1058573 (https://phabricator.wikimedia.org/T370414)
[12:28:54] !log sstefanova@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component calico
[12:29:07] !log sstefanova@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component calico
[12:33:14] FIRING: ToolforgeKubernetesHAproxyServerDown: Toolforge HAproxy server down: toolsbeta-test-k8s-ingress-8.toolsbeta.eqiad1.wikimedia.cloud - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesHAproxyServerDown - https://grafana.wmcloud.org/d/toolforge-k8s-haproxy/toolforge-k8s-haproxy?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesHAproxyServerDown
[12:38:14] FIRING: [5x] ToolforgeKubernetesHAproxyServerDown: Toolforge HAproxy server down: toolsbeta-test-k8s-ingress-6.toolsbeta.eqiad1.wikimedia.cloud - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesHAproxyServerDown - https://grafana.wmcloud.org/d/toolforge-k8s-haproxy/toolforge-k8s-haproxy?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesHAproxyServerDown
[12:40:38] FIRING: ProbeDown: Service toolsbeta-test-k8s-haproxy-6:30000 has failed probes (http_admin_beta_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#toolsbeta-test-k8s-haproxy-6:30000 -
https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[12:42:32] Data-Services, Trust and Safety Product Team: Hide rows in the globalblocks table when the associated globaluser row has gu_hidden_level as not 0 - https://phabricator.wikimedia.org/T371488#10031752 (Marostegui)
[12:42:36] Data-Services, Trust and Safety Product Team: Hide the value of gb_address column in public replicas if gb_autoblock_parent_id is not null - https://phabricator.wikimedia.org/T371486#10031754 (Marostegui)
[12:43:14] FIRING: [3x] ToolforgeKubernetesHAproxyServerDown: Toolforge HAproxy server down: toolsbeta-test-k8s-ingress-6.toolsbeta.eqiad1.wikimedia.cloud - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesHAproxyServerDown - https://grafana.wmcloud.org/d/toolforge-k8s-haproxy/toolforge-k8s-haproxy?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesHAproxyServerDown
[12:45:38] RESOLVED: ProbeDown: Service toolsbeta-test-k8s-haproxy-6:30000 has failed probes (http_admin_beta_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#toolsbeta-test-k8s-haproxy-6:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[12:48:14] FIRING: [4x] ToolforgeKubernetesHAproxyServerDown: Toolforge HAproxy server down: toolsbeta-test-k8s-ingress-6.toolsbeta.eqiad1.wikimedia.cloud - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesHAproxyServerDown - https://grafana.wmcloud.org/d/toolforge-k8s-haproxy/toolforge-k8s-haproxy?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesHAproxyServerDown
[12:49:38] FIRING: ProbeDown: Service toolsbeta-test-k8s-haproxy-6:30000 has failed probes
(http_admin_beta_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#toolsbeta-test-k8s-haproxy-6:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[12:52:27] !log sstefanova@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component calico
[12:52:41] !log sstefanova@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component calico
[12:53:14] FIRING: [3x] ToolforgeKubernetesHAproxyServerDown: Toolforge HAproxy server down: toolsbeta-test-k8s-ingress-7.toolsbeta.eqiad1.wikimedia.cloud - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesHAproxyServerDown - https://grafana.wmcloud.org/d/toolforge-k8s-haproxy/toolforge-k8s-haproxy?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesHAproxyServerDown
[12:54:38] RESOLVED: ProbeDown: Service toolsbeta-test-k8s-haproxy-6:30000 has failed probes (http_admin_beta_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#toolsbeta-test-k8s-haproxy-6:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[12:58:14] RESOLVED: ToolforgeKubernetesHAproxyServerDown: Toolforge HAproxy server down: toolsbeta-test-k8s-ingress-6.toolsbeta.eqiad1.wikimedia.cloud - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesHAproxyServerDown - https://grafana.wmcloud.org/d/toolforge-k8s-haproxy/toolforge-k8s-haproxy?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesHAproxyServerDown
[13:00:04] (approved) fnegri: tofu-infra: add projects [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/28
(https://phabricator.wikimedia.org/T371393) (owner: aborrero)
[13:14:33] cloud-services-team, Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure, Content-Transform-Team-WIP: Rebuild or delete deployment-docker-proton01 - https://phabricator.wikimedia.org/T369916#10031826 (Andrew) thank you!
[13:17:13] cloud-services-team, Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure: Replace deployment-maps-master01 with a Bullseye or Bookworm instance - https://phabricator.wikimedia.org/T361381#10031828 (Andrew) Puppet is failing on the new host because of ` Error: Could not retrieve catalo...
[13:23:56] FIRING: SystemdUnitDown: The service unit wikitech_run_jobs.service is in failed status on host cloudweb1003. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudweb1003 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[13:28:56] RESOLVED: SystemdUnitDown: The service unit wikitech_run_jobs.service is in failed status on host cloudweb1003. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudweb1003 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[13:30:51] cloud-services-team (FY2023/2024-Q3-Q4), Data-Services: [wikireplicas] frequent replag spikes in clouddb hosts - https://phabricator.wikimedia.org/T367778#10031872 (fnegri) I filed {T371485} to improve the Grafana charts. Zooming in around the date when I rebooted clouddb1019 the difference in traffic p...
[13:54:29] cloud-services-team (FY2024/2025-Q1-Q2), Data-Services, Data-Persistence: Upgrade clouddb* hosts to Bookworm - https://phabricator.wikimedia.org/T365424#10031985 (fnegri) @BTullis can I revert your patch and switch back the role for `clouddb1021` to `wmcs::db::wikireplicas::dedicated::analytics_multi...
[14:05:10] cloud-services-team (FY2023/2024-Q3-Q4), Data-Services: [wikireplicas] frequent replag spikes in clouddb hosts - https://phabricator.wikimedia.org/T367778#10032028 (Marostegui) Will you try first with clouddb1021 as we discussed a few days ago?
[14:09:56] cloud-services-team (FY2023/2024-Q3-Q4), Data-Services: [wikireplicas] frequent replag spikes in clouddb hosts - https://phabricator.wikimedia.org/T367778#10032034 (fnegri) > Will you try first with clouddb1021 as we discussed a few days ago? Yes, I will start with clouddb1021 and I will track all the r...
[14:11:13] cloud-services-team (FY2023/2024-Q3-Q4), Data-Services: [wikireplicas] frequent replag spikes in clouddb hosts - https://phabricator.wikimedia.org/T367778#10032038 (Marostegui) Great - thank you!
[14:35:49] (update) sstefanova: calico: bump to 0.0.8-20240731084636-9937ff2a [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/462 (https://phabricator.wikimedia.org/T370046) (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[14:36:04] Toolforge Jobs framework: `toolforge jobs` is broken - https://phabricator.wikimedia.org/T371505 (Magnus) NEW
[14:37:16] (PS1) XtexChooser: labsauth: Set OATH input autocomplete to one-time-code [labs/striker] - https://gerrit.wikimedia.org/r/1058616
[14:40:34] (PS2) XtexChooser: labsauth: Set OATH input autocomplete to one-time-code [labs/striker] - https://gerrit.wikimedia.org/r/1058616
[14:42:32] cloud-services-team, Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure, Data-Platform-SRE (2024.07.29 - 2024.08.16): Remove or replace deployment-snapshot03.deployment-prep.eqiad1.wikimedia.cloud (Buster deprecation) - https://phabricator.wikimedia.org/T370465#10032161 (bking)
[14:44:43] Toolforge: `toolforge jobs` is broken - https://phabricator.wikimedia.org/T371505#10032171 (JJMC89)
[14:53:12] Toolforge: `toolforge jobs` is broken - https://phabricator.wikimedia.org/T371505#10032215 (LucasWerkmeister) This looks like you’re still on a bastion with Python 3.7 (`tool-sgebastion-10`), while `toolforge-jobs` (unintentionally? no idea) started to rely on Python 3.9 functionality. Try connecting to `lo...
[14:57:13] (update) aborrero: tofu-infra: add projects [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/28 (https://phabricator.wikimedia.org/T371393)
[14:59:10] Toolforge: `toolforge jobs` is broken - https://phabricator.wikimedia.org/T371505#10032228 (Magnus) Open→Resolved a:Magnus Thnaks, that was it. Could have a better error message but if the old host is shut down soon anyway...
[15:00:02] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/28
[15:00:50] !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/28
[15:06:42] (PS18) Arturo Borrero Gonzalez: wmcs.openstack.tofu: write plan to gitlab MR as note [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1058573 (https://phabricator.wikimedia.org/T370414)
[15:08:43] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/28
[15:09:12] !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/28
[15:10:27] (CR) CI reject: [V:-1] wmcs.openstack.tofu: write plan to gitlab MR as note [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1058573 (https://phabricator.wikimedia.org/T370414) (owner: Arturo Borrero Gonzalez)
[15:13:21] (PS19) Arturo Borrero Gonzalez: wmcs.openstack.tofu: write plan to gitlab MR as note [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1058573 (https://phabricator.wikimedia.org/T370414)
[15:16:43] (CR) CI reject: [V:-1] wmcs.openstack.tofu: write plan to gitlab MR as note [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1058573 (https://phabricator.wikimedia.org/T370414) (owner: Arturo Borrero Gonzalez)
[15:29:44] cloud-services-team, Cloud-VPS, Epic: tofu-infra: introduce additional gitlab-ci automation - https://phabricator.wikimedia.org/T370652#10032408 (fnegri)
[15:37:41] cloud-services-team, Cloud-VPS: Support managing Cloud VPS project membership via OpenTofu -
https://phabricator.wikimedia.org/T320750#10032432 (fnegri)
[15:39:26] cloud-services-team (FY2023/2024-Q3-Q4), Cloud-VPS, Cloud-Services-Origin-Alert, Cloud-Services-Worktype-Maintenance: [cloudvps] puppetize the OpenTofu tests VM (tf-infra-test) - https://phabricator.wikimedia.org/T341814#10032441 (fnegri)
[15:40:01] cloud-services-team (FY2023/2024-Q3-Q4), Cloud-VPS, Cloud-Services-Origin-Alert, Cloud-Services-Worktype-Maintenance: [cloudvps] use a systemd timer for the OpenTofu tests to get logs - https://phabricator.wikimedia.org/T341769#10032437 (fnegri) p:Triage→Low
[15:41:46] (CR) Arturo Borrero Gonzalez: "recheck" [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1058573 (https://phabricator.wikimedia.org/T370414) (owner: Arturo Borrero Gonzalez)
[15:46:26] cloud-services-team, Cloud-VPS: openstack: consider reducing log pressure - https://phabricator.wikimedia.org/T371356#10032468 (Andrew) Open→Resolved Made the same change for the eqiad1 logs
[15:56:20] (CR) Arturo Borrero Gonzalez: "recheck" [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1058573 (https://phabricator.wikimedia.org/T370414) (owner: Arturo Borrero Gonzalez)
[15:59:45] (CR) Arturo Borrero Gonzalez: [C:+2] wmcs.openstack.tofu: write plan to gitlab MR as note [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1058573 (https://phabricator.wikimedia.org/T370414) (owner: Arturo Borrero Gonzalez)
[16:03:36] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/28
[16:04:04] !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/28
[16:17:20] Tools: SVG Check renders text (font: Liberation Sans) differently - https://phabricator.wikimedia.org/T335415#10032690
(Glrx) This looks like the text-chunk regression bug. librsvg erroneously calculates the width of the whole text chunk as the width of the last tspan or #text node. In this case, the last #...
[16:20:12] Tools: SVG Check renders text (font: Liberation Sans) differently - https://phabricator.wikimedia.org/T335415#10032725 (Glrx) Close a duplicate failed. trying again.
[16:21:46] Tools: SVG Check renders text (font: Liberation Sans) differently - https://phabricator.wikimedia.org/T335415#10032728 (Glrx) →Duplicate dup:T200443
[16:35:50] cloud-services-team, Data-Services, VPS-Projects: Request to /data/project(/statanalyser) - https://phabricator.wikimedia.org/T326904#10032809 (Andrew) Open→Resolved Your VM now has a /data/scratch mountpoint. New future VMs should have it as well, thanks to a project-wide hiera setting I...
[16:54:53] Tools, gitlab-settings, Release-Engineering-Team, GitLab (Administration, Settings & Policy): Mirroring from one Wikimedia GitLab repository to another one no longer works - https://phabricator.wikimedia.org/T364199#10032965 (thcipriani) p:Triage→Low @LucasWerkmeister is this mirroring a...
[17:32:50] Tools, gitlab-settings, Release-Engineering-Team, GitLab (Administration, Settings & Policy): Mirroring from one Wikimedia GitLab repository to another one no longer works - https://phabricator.wikimedia.org/T364199#10033102 (LucasWerkmeister) Yes – I initially put the repository on GitLab before...
[18:02:08] (CR) BryanDavis: labsauth: Set OATH input autocomplete to one-time-code (1 comment) [labs/striker] - https://gerrit.wikimedia.org/r/1058616 (owner: XtexChooser)
[18:23:52] wikitech.wikimedia.org, MW-on-K8s, serviceops: MVP: Privately server wiktech via mw-on-k8s - https://phabricator.wikimedia.org/T371537 (jijiki) NEW
[19:01:35] (update) raymond-ndibe: [toolforge-weld] add custom resources version to k8sclient [repos/cloud/toolforge/toolforge-weld] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/51 (https://phabricator.wikimedia.org/T359650)
[19:02:24] (update) raymond-ndibe: [toolforge-weld] add custom resources version to k8sclient [repos/cloud/toolforge/toolforge-weld] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/51 (https://phabricator.wikimedia.org/T359650)
[20:25:56] (open) dcaro: general: support python 3.7 types [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/59
[20:26:54] !log dcaro@urcuchillay tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-cli
[20:26:55] !log dcaro@urcuchillay tools END (ERROR) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=97) for component jobs-cli
[20:26:56] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[20:26:58] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[20:28:49] (approved) dcaro: general: support python 3.7 types [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/59
[20:28:52] (merge) dcaro: general: support python 3.7 types [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/59
[20:29:57] (open) dcaro: d/changelog: bump to 16.1.1 [repos/cloud/toolforge/jobs-cli] -
https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/60
[20:30:19] !log dcaro@urcuchillay toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-cli
[20:30:21] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[20:31:58] !log dcaro@urcuchillay toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-cli
[20:32:02] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[20:36:01] !log dcaro@urcuchillay tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-cli
[20:36:03] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[20:37:12] !log dcaro@urcuchillay tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-cli
[20:37:19] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[20:48:28] (approved) dcaro: d/changelog: bump to 16.1.1 [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/60
[20:48:31] (merge) dcaro: d/changelog: bump to 16.1.1 [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/60
[20:48:31] (update) dcaro: d/changelog: bump to 16.1.1 [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/60
[20:50:29] Toolforge: Please re-install "joe" - https://phabricator.wikimedia.org/T371556#10033703 (Peachey88)
[20:51:06] (update) dcaro: bump openapi version [repos/cloud/toolforge/api-gateway] - https://gitlab.wikimedia.org/repos/cloud/toolforge/api-gateway/-/merge_requests/30 (https://phabricator.wikimedia.org/T356974)
[21:02:16] wikitech.wikimedia.org, MW-on-K8s, serviceops: MVP: Privately server wiktech via mw-on-k8s - https://phabricator.wikimedia.org/T371537#10033727
(bd808) The config thing that most needs to be changed to use the multiversion images is `/etc/mediawiki/WikitechPrivateSettings.php` where a number of secre...
[21:54:09] cloud-services-team (FY2023/2024-Q3-Q4), Cloud-VPS: eqiad1: fix PTR delegations for 185.15.56.0/24 - https://phabricator.wikimedia.org/T341338#10033899 (Andrew) @cmooney can you advise what (if anything) needs doing here?
[22:07:42] Tools: 'cronout' dir in tb-dev - https://phabricator.wikimedia.org/T324172#10033910 (Andrew) Open→Resolved No response so I've emptied this directory.
[22:16:15] cloud-services-team, Documentation: Update list of upstream dependencies on mw:Upstream_projects - https://phabricator.wikimedia.org/T328345#10033932 (Andrew) a:Andrew→dcaro I added a bunch of openstack things, and removed the grid engine reference. Now I'm passing this over to @dcaro to add tool...
[22:44:01] (PS1) Andrew Bogott: Fake passwords for cinder rabbitmq user [labs/private] - https://gerrit.wikimedia.org/r/1058711
[22:48:44] (CR) Andrew Bogott: [V:+2 C:+2] Fake passwords for cinder rabbitmq user [labs/private] - https://gerrit.wikimedia.org/r/1058711 (owner: Andrew Bogott)
[22:53:29] Toolforge: Please re-install "joe" - https://phabricator.wikimedia.org/T371556#10034024 (Dzahn) The relevant change seems to be https://gerrit.wikimedia.org/r/c/operations/puppet/+/1058654 T371505#10032215 indicates there is still an old bastion (which probably is the one that had joe installed) at login.to...
[23:18:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[23:28:27] Toolforge: Please re-install "joe" - https://phabricator.wikimedia.org/T371556#10034113 (LucasWerkmeister) >>!
In T371556#10034024, @Dzahn wrote: > T371505#10032215 indicates there is still an old bastion (which probably is the one that had joe installed) at login.tools.wmflabs.org but it will be shut down s...
[23:33:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[23:34:55] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack
[23:35:06] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0)
[23:42:19] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack
[23:42:26] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0)