[00:16:29] Toolforge-standards-committee: Adoption request for image-metadata-viewer - https://phabricator.wikimedia.org/T338558#10043918 (mdaniels5757) Open→Resolved a:mdaniels5757 In that case, I'm not interested. Thanks for looking!
[00:16:29] FIRING: InstanceDown: Project tf-infra-test instance tf-infra-test is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[00:17:40] Toolforge-standards-committee: Adoption request for image-metadata-viewer - https://phabricator.wikimedia.org/T338558#10043921 (Pppery) Resolved→Declined
[00:18:05] (update) raymond-ndibe: [jobs-api] custom resource definition deployment templates [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/101 (https://phabricator.wikimedia.org/T359650)
[00:21:29] RESOLVED: InstanceDown: Project tf-infra-test instance tf-infra-test is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[00:53:02] (update) raymond-ndibe: [jobs-api] custom resource definition deployment templates [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/101 (https://phabricator.wikimedia.org/T359650)
[01:12:55] FIRING: MaxConntrack: Max conntrack at 82.15% on cloudvirt1040:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[01:17:55] RESOLVED: MaxConntrack: Max conntrack at 81.52% on cloudvirt1040:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[02:06:54] cloud-services-team, Cloud-VPS, Patch-For-Review: Update designate sink plugins to work with caracal - https://phabricator.wikimedia.org/T371707#10043968 (Andrew) The remaining case is in wmf_sink, the proxy cleanup code. That code relies on being able to look up the dns records (and, by extension, t...
[03:15:56] FIRING: SystemdUnitDown: The service unit opentofu-infra-diff.service is in failed status on host cloudcontrol1007. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[03:26:03] Toolforge-standards-committee (Maintainer needed): New maintainer needed for Phetools OCR for Wikisource - https://phabricator.wikimedia.org/T239353#10044000 (Soda) Open→Declined Boldly closing this as declined, phetools OCR hasn't been up for a while (since 5 years?) and with the Grid engine shu...
[03:49:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[03:59:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[04:49:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[04:59:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[05:10:56] FIRING: SystemdUnitDown: The systemd unit opentofu-infra-diff.service on node cloudcontrol1007 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[06:06:39] FIRING: ProbeDown: Service toolsbeta-test-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_beta_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#toolsbeta-test-k8s-haproxy-5:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[06:11:39] RESOLVED: ProbeDown: Service toolsbeta-test-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_beta_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#toolsbeta-test-k8s-haproxy-5:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[06:16:52] (update) raymond-ndibe: Draft: [jobs-api] custom resource definition deployment templates [repos/cloud/toolforge/jobs-api] (save_business_models_to_db) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/114 (https://phabricator.wikimedia.org/T359650)
[06:18:09] FIRING: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[06:19:14] (update) raymond-ndibe: Draft: [jobs-api] use job k8s custom resources in code [repos/cloud/toolforge/jobs-api] (save_business_models_to_db) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/114 (https://phabricator.wikimedia.org/T359650)
[06:29:27] FIRING: [2x] ProbeDown: Service virt.cloudgw.eqiad1.wikimediacloud.org:0 has failed probes (icmp_virt_cloudgw_eqiad1_wikimediacloud_org_ip4) - https://wikitech.wikimedia.org/wiki/Network_monitoring#ProbeDown - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[06:29:38] cloud-services-team: ProbeDown - https://phabricator.wikimedia.org/T371867 (phaultfinder) NEW
[06:34:27] RESOLVED: [2x] ProbeDown: Service virt.cloudgw.eqiad1.wikimediacloud.org:0 has failed probes (icmp_virt_cloudgw_eqiad1_wikimediacloud_org_ip4) - https://wikitech.wikimedia.org/wiki/Network_monitoring#ProbeDown - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[06:54:27] FIRING: [2x] ProbeDown: Service virt.cloudgw.eqiad1.wikimediacloud.org:0 has failed probes (icmp_virt_cloudgw_eqiad1_wikimediacloud_org_ip4) - https://wikitech.wikimedia.org/wiki/Network_monitoring#ProbeDown - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[06:59:27] RESOLVED: [2x] ProbeDown: Service virt.cloudgw.eqiad1.wikimediacloud.org:0 has failed probes (icmp_virt_cloudgw_eqiad1_wikimediacloud_org_ip4) - https://wikitech.wikimedia.org/wiki/Network_monitoring#ProbeDown - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[07:07:45] cloud-services-team, Cloud-VPS, collaboration-services, Patch-For-Review: puppet problems mounting cinder volumes (and suggested fixes) - https://phabricator.wikimedia.org/T371573#10044129 (LSobanski) p:Triage→Medium
[07:09:27] FIRING: [2x] ProbeDown: Service virt.cloudgw.eqiad1.wikimediacloud.org:0 has failed probes (icmp_virt_cloudgw_eqiad1_wikimediacloud_org_ip4) - https://wikitech.wikimedia.org/wiki/Network_monitoring#ProbeDown - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[07:19:27] RESOLVED: [2x] ProbeDown: Service virt.cloudgw.eqiad1.wikimediacloud.org:0 has failed probes (icmp_virt_cloudgw_eqiad1_wikimediacloud_org_ip4) - https://wikitech.wikimedia.org/wiki/Network_monitoring#ProbeDown - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[07:35:44] cloud-services-team, Data-Persistence, observability, Grafana: Grafana MySQL charts can be inconsistent when zooming out - https://phabricator.wikimedia.org/T371485#10044163 (fgiunchedi) If going with `rate[$__rate_interval]` is acceptable I'd recommend going for that as the simplest solution, se...
[08:10:49] FIRING: [2x] NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirt1045 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown
[08:17:28] FIRING: PuppetAgentNoResources: No Puppet resources found on instance tools-k8s-worker-nfs-52 on project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[08:24:27] FIRING: [2x] ProbeDown: Service virt.cloudgw.eqiad1.wikimediacloud.org:0 has failed probes (icmp_virt_cloudgw_eqiad1_wikimediacloud_org_ip4) - https://wikitech.wikimedia.org/wiki/Network_monitoring#ProbeDown - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[08:29:27] RESOLVED: [2x] ProbeDown: Service virt.cloudgw.eqiad1.wikimediacloud.org:0 has failed probes (icmp_virt_cloudgw_eqiad1_wikimediacloud_org_ip4) - https://wikitech.wikimedia.org/wiki/Network_monitoring#ProbeDown - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[08:30:49] RESOLVED: [2x] NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirt1045 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown
[08:32:28] RESOLVED: PuppetAgentNoResources: No Puppet resources found on instance tools-k8s-worker-nfs-52 on project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[08:35:57] FIRING: [2x] ProbeDown: Service virt.cloudgw.eqiad1.wikimediacloud.org:0 has failed probes (icmp_virt_cloudgw_eqiad1_wikimediacloud_org_ip4) - https://wikitech.wikimedia.org/wiki/Network_monitoring#ProbeDown - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[08:45:57] RESOLVED: [2x] ProbeDown: Service virt.cloudgw.eqiad1.wikimediacloud.org:0 has failed probes (icmp_virt_cloudgw_eqiad1_wikimediacloud_org_ip4) - https://wikitech.wikimedia.org/wiki/Network_monitoring#ProbeDown - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[08:54:09] FIRING: CephSlowOps: Ceph cluster in eqiad has 37 slow ops - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephSlowOps - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephSlowOps
[08:54:17] cloud-services-team: CephSlowOps Ceph cluster in eqiad has slow ops, which might be blocking some writes - https://phabricator.wikimedia.org/T370752#10044353 (phaultfinder)
[08:55:56] RESOLVED: SystemdUnitDown: The service unit opentofu-infra-diff.service is in failed status on host cloudcontrol1007. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[09:02:28] FIRING: PuppetSyncFailure: Failed to update Puppet repository /srv/git/operations/puppet on instance toolsbeta-puppetserver-1 in project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetSyncFailure
[09:02:58] RESOLVED: PuppetSyncFailure: Failed to update Puppet repository /srv/git/operations/puppet on instance metricsinfra-puppetserver-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetSyncFailure
[09:03:28] FIRING: InstanceDown: Project tools instance tools-prometheus-7 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[09:05:26] RESOLVED: SystemdUnitDown: The systemd unit opentofu-infra-diff.service on node cloudcontrol1007 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[09:09:27] FIRING: [2x] ProbeDown: Service virt.cloudgw.eqiad1.wikimediacloud.org:0 has failed probes (icmp_virt_cloudgw_eqiad1_wikimediacloud_org_ip4) - https://wikitech.wikimedia.org/wiki/Network_monitoring#ProbeDown - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[09:19:27] RESOLVED: [2x] ProbeDown: Service virt.cloudgw.eqiad1.wikimediacloud.org:0 has failed probes (icmp_virt_cloudgw_eqiad1_wikimediacloud_org_ip4) - https://wikitech.wikimedia.org/wiki/Network_monitoring#ProbeDown - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[09:19:33] !log dcaro@urcuchillay tools START - Cookbook wmcs.openstack.cloudvirt.vm_console
[09:19:36] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:19:41] !log dcaro@urcuchillay tools END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=255)
[09:19:43] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:19:57] !log dcaro@urcuchillay tools START - Cookbook wmcs.openstack.cloudvirt.vm_console
[09:19:59] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:20:02] !log dcaro@urcuchillay tools END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=255)
[09:20:05] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:20:08] !log dcaro@urcuchillay tools START - Cookbook wmcs.openstack.cloudvirt.vm_console
[09:20:10] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:20:34] !log dcaro@urcuchillay tools END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0)
[09:20:35] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:28:17] !log dcaro@urcuchillay tools START - Cookbook wmcs.openstack.cloudvirt.vm_console
[09:28:19] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:29:27] FIRING: [2x] ProbeDown: Service virt.cloudgw.eqiad1.wikimediacloud.org:0 has failed probes (icmp_virt_cloudgw_eqiad1_wikimediacloud_org_ip4) - https://wikitech.wikimedia.org/wiki/Network_monitoring#ProbeDown - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[09:31:54] RESOLVED: CephSlowOps: Ceph cluster in eqiad has 20 slow ops - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephSlowOps - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephSlowOps
[09:33:58] RESOLVED: PuppetSyncFailure: Failed to update Puppet repository /srv/git/operations/puppet on instance toolsbeta-puppetserver-1 in project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetSyncFailure
[09:44:28] FIRING: WidespreadPuppetAgentFailure: Widespread puppet agent failures in project gitlab-runners - https://prometheus-alerts.wmcloud.org/?q=alertname%3DWidespreadPuppetAgentFailure
[09:45:57] RESOLVED: [2x] ProbeDown: Service virt.cloudgw.eqiad1.wikimediacloud.org:0 has failed probes (icmp_virt_cloudgw_eqiad1_wikimediacloud_org_ip4) - https://wikitech.wikimedia.org/wiki/Network_monitoring#ProbeDown - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[09:50:07] !log dcaro@urcuchillay tools END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0)
[09:50:10] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:50:29] !log dcaro@urcuchillay tools START - Cookbook wmcs.openstack.cloudvirt.vm_console
[09:50:30] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:50:38] !log dcaro@urcuchillay tools END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=1)
[09:50:39] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:59:28] RESOLVED: WidespreadPuppetAgentFailure: Widespread puppet agent failures in project gitlab-runners - https://prometheus-alerts.wmcloud.org/?q=alertname%3DWidespreadPuppetAgentFailure
[09:59:39] RESOLVED: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[10:05:58] RESOLVED: InstanceDown: Project tools instance tools-prometheus-7 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[11:13:00] FIRING: NovafullstackSustainedFailures: Novafullstack tests have been failing for more than 5hours in eqiad - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/NovafullstackSustainedFailures - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-nova-fullstack?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DNovafullstackSustainedFailures
[11:18:41] FIRING: CloudVPSDesignateLeaks: Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[11:33:41] RESOLVED: CloudVPSDesignateLeaks: Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[11:57:54] Tools, Infrastructure-Foundations: Requested offboarding-to-volunteer of HTriedman // Transfer ownership of SpinachBot from HTriedman (WMF) to HTriedman - https://phabricator.wikimedia.org/T371644#10044740 (SLyngshede-WMF) @Htriedman I'm responsible for offboarding you from any systems you no longer requ...
[12:00:48] Tools, Infrastructure-Foundations: Requested offboarding-to-volunteer of HTriedman // Transfer ownership of SpinachBot from HTriedman (WMF) to HTriedman - https://phabricator.wikimedia.org/T371644#10044744 (SLyngshede-WMF) @Reedy Given that @Htriedman won't be needing access to security security and priv...
[12:16:35] cloud-services-team: CephSlowOps Ceph cluster in eqiad has slow ops, which might be blocking some writes - https://phabricator.wikimedia.org/T370752#10044786 (dcaro) Last round was caused by {T371879}
[12:18:40] cloud-services-team: ProbeDown - https://phabricator.wikimedia.org/T371867#10044793 (dcaro) Open→Resolved a:dcaro Caused by {T371879}
[12:34:00] (update) dcaro: show backend status [toolforge-repos/sample-complex-app-frontend] - https://gitlab.wikimedia.org/toolforge-repos/sample-complex-app-frontend/-/merge_requests/1 (https://phabricator.wikimedia.org/T370324)
[12:41:36] cloud-services-team (FY2024/2025-Q1-Q2): [network,D5] reboot cloudsw-d5 - https://phabricator.wikimedia.org/T371878#10044851 (dcaro)
[12:42:02] cloud-services-team (FY2024/2025-Q1-Q2): [ceph,network] Intermittent network packets lost - https://phabricator.wikimedia.org/T371869#10044853 (dcaro)
[12:42:39] cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS: [ceph,network] Intermittent network packets lost - https://phabricator.wikimedia.org/T371869#10044854 (dcaro)
[12:43:10] cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS: [network,D5] reboot cloudsw-d5 - https://phabricator.wikimedia.org/T371878#10044855 (dcaro)
[12:54:23] (PS5) David Caro: toolforge.deploy: run tests and add note to MR [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059921
[12:54:31] !log dcaro@urcuchillay toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component components-api
[12:54:33] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[12:55:34] !log dcaro@urcuchillay toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api
[12:55:35] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[12:57:16] (PS6) David Caro: toolforge.deploy: run tests and add note to MR [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059921
[13:00:32] (CR) CI reject: [V:-1] toolforge.deploy: run tests and add note to MR [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059921 (owner: David Caro)
[13:00:47] !log dcaro@urcuchillay toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component components-api
[13:00:49] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[13:01:41] !log dcaro@urcuchillay toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api
[13:01:42] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[13:03:35] !log dcaro@urcuchillay toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component components-api
[13:03:36] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[13:04:06] cloud-services-team (FY2024/2025-Q1-Q2), Data-Services, Goal: [toolsdb] Migrate mixnmatch db to Trove - https://phabricator.wikimedia.org/T350862#10044926 (Magnus) Hi @fnegri if this will take a while, could you throw me a few more DB connections for mix-n-match on ToolsDB in the meantime? I have a l...
[13:04:23] !log dcaro@urcuchillay toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api
[13:04:25] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[13:10:44] (PS5) David Caro: WMCSCookbookRunnerBase: load the wmcs config if it's there [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059920
[13:10:44] (PS3) David Caro: openstack.tofu: use gitlab token from wmcs config [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059925
[13:10:44] (PS5) David Caro: toolforge.component.deploy: remove the k8s prefix [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059890
[13:10:45] (PS5) David Caro: toolforge.component.deploy: use bump_ as default branch [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059905
[13:10:46] (PS8) David Caro: toolforge.run_tests: use the functional tests [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059907
[13:10:50] (PS5) David Caro: openstack.tofu: use run_script instead of reimplementing it [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059919
[13:10:54] (PS7) David Caro: toolforge.deploy: run tests and add note to MR [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059921
[13:13:43] (CR) David Caro: toolforge.component.deploy: use bump_ as default branch (1 comment) [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059905 (owner: David Caro)
[13:14:22] (CR) CI reject: [V:-1] openstack.tofu: use gitlab token from wmcs config [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059925 (owner: David Caro)
[13:14:26] (CR) CI reject: [V:-1] toolforge.component.deploy: use bump_ as default branch [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059905 (owner: David Caro)
[13:14:32] (CR) CI reject: [V:-1] openstack.tofu: use run_script instead of reimplementing it [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059919 (owner: David Caro)
[13:14:33] (CR) CI reject: [V:-1] toolforge.deploy: run tests and add note to MR [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059921 (owner: David Caro)
[13:14:42] (CR) CI reject: [V:-1] toolforge.run_tests: use the functional tests [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059907 (owner: David Caro)
[13:18:34] (PS6) David Caro: toolforge.component.deploy: use bump_ as default branch [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059905
[13:18:34] (PS9) David Caro: toolforge.run_tests: use the functional tests [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059907
[13:18:34] (PS6) David Caro: openstack.tofu: use run_script instead of reimplementing it [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059919
[13:18:35] (PS8) David Caro: toolforge.deploy: run tests and add note to MR [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059921
[13:18:35] (PS1) David Caro: wmcs_libs.common: add run_script [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060102
[13:20:14] (PS8) David Caro: wmcs_libs.common: add run_script [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059906
[13:20:14] (PS10) David Caro: toolforge.run_tests: use the functional tests [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059907
[13:20:14] (PS7) David Caro: openstack.tofu: use run_script instead of reimplementing it [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059919
[13:20:14] (PS9) David Caro: toolforge.deploy: run tests and add note to MR [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059921
[13:20:42] (Abandoned) David Caro: wmcs_libs.common: add run_script [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060102 (owner: David Caro)
[13:23:05] cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS: [network,D5] reboot cloudsw-d5 - https://phabricator.wikimedia.org/T371878#10044971 (dcaro)
[13:24:09] (CR) CI reject: [V:-1] openstack.tofu: use run_script instead of reimplementing it [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059919 (owner: David Caro)
[13:24:17] (CR) CI reject: [V:-1] wmcs_libs.common: add run_script [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059906 (owner: David Caro)
[13:24:22] (CR) CI reject: [V:-1] toolforge.run_tests: use the functional tests [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059907 (owner: David Caro)
[13:24:27] (CR) CI reject: [V:-1] toolforge.deploy: run tests and add note to MR [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059921 (owner: David Caro)
[13:29:28] (PS9) David Caro: wmcs_libs.common: add run_script [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059906
[13:29:29] (PS11) David Caro: toolforge.run_tests: use the functional tests [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059907
[13:29:29] (PS8) David Caro: openstack.tofu: use run_script instead of reimplementing it [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059919
[13:30:11] cloud-services-team (FY2023/2024-Q3-Q4), Data-Services: [wikireplicas] frequent replag spikes in clouddb hosts - https://phabricator.wikimedia.org/T367778#10044979 (fnegri) So far the upgrade to Bookworm and MariaDB 10.6.18 seems to have helped. There is no replication lag and traffic shapes in clouddb10...
[13:32:56] (CR) CI reject: [V:-1] toolforge.run_tests: use the functional tests [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059907 (owner: David Caro)
[13:33:12] (CR) CI reject: [V:-1] wmcs_libs.common: add run_script [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059906 (owner: David Caro)
[13:33:19] (CR) CI reject: [V:-1] openstack.tofu: use run_script instead of reimplementing it [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059919 (owner: David Caro)
[13:48:41] FIRING: CloudVPSDesignateLeaks: Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[13:52:26] (CR) FNegri: [C:+1] "LGTM!" [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059920 (owner: David Caro)
[14:03:41] RESOLVED: CloudVPSDesignateLeaks: Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[14:04:46] cloud-services-team (FY2024/2025-Q1-Q2), Data-Services, Goal: Upgrade clouddb* hosts to Bookworm - https://phabricator.wikimedia.org/T365424#10045162 (fnegri)
[14:15:47] Tool-toolviews: Statistics from toolviews are erratic for Scholia - https://phabricator.wikimedia.org/T320533#10045202 (Fnielsen) Since 15 March 2024 the statistics for the Scholia Toolforge application has not been erratic.
[14:17:36] cloud-services-team (FY2024/2025-Q1-Q2), Data-Services, Goal: Upgrade clouddb* hosts to Bookworm - https://phabricator.wikimedia.org/T365424#10045206 (BTullis) @fnegri, I forgot to mention something. I think there is a manual step that we need to carry out on these hosts after reimage. The reason is...
[14:20:47] Data-Services, DBA: Prepare and check storage layer for bdrwiki - https://phabricator.wikimedia.org/T371759#10045222 (Zabe) Wiki has been created
[14:25:08] cloud-services-team (FY2024/2025-Q1-Q2), Data-Services, Goal: Upgrade clouddb* hosts to Bookworm - https://phabricator.wikimedia.org/T365424#10045236 (fnegri) @BTullis thanks! That was actually in my checklist at https://wikitech.wikimedia.org/wiki/MariaDB/Rebooting_a_host but I somehow managed to mi...
[15:13:15] FIRING: NovafullstackSustainedFailures: Novafullstack tests have been failing for more than 5hours in eqiad - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/NovafullstackSustainedFailures - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-nova-fullstack?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DNovafullstackSustainedFailures
[15:14:04] cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS: [network,D5] reboot cloudsw-d5 - https://phabricator.wikimedia.org/T371878#10045456 (Andrew) Current plan: [] Cathal gets new cloudcephosd nodes online (T363344) [] David drains as many affected OSD nodes as possible [] Andrew depools all affected clou...
[15:20:57] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.set_maintenance [15:20:57] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.set_maintenance (exit_code=99) [15:21:39] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.set_maintenance [15:21:39] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.set_maintenance (exit_code=99) [15:22:13] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1036.eqiad.wmnet' [15:22:13] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=99) on host 'cloudvirt1036.eqiad.wmnet' [15:25:15] !log fnegri@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.set_maintenance [15:25:16] !log fnegri@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.set_maintenance (exit_code=99) [15:28:24] (03PS1) 10David Caro: proxy: skip the proxy if there's no proxy settings [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1060130 [15:29:31] (03CR) 10Andrew Bogott: [C:03+1] proxy: skip the proxy if there's no proxy settings [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1060130 (owner: 10David Caro) [15:29:45] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.set_maintenance [15:29:46] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.set_maintenance (exit_code=99) [15:30:35] (03PS2) 10David Caro: proxy: skip the proxy if there's no proxy settings [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1060130 [15:30:44] (03CR) 10David Caro: proxy: skip the proxy if there's no proxy settings (031 comment) [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1060130 (owner: 10David Caro) [15:30:50] (03CR) 10David Caro: proxy: skip the proxy if there's no proxy settings (031 comment) [cloud/wmcs-cookbooks] - 
10https://gerrit.wikimedia.org/r/1060130 (owner: 10David Caro) [15:35:50] (03CR) 10David Caro: [C:03+2] "Tested with test-cookbooks on cloudcumin1001" [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1059920 (owner: 10David Caro) [15:36:47] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.set_maintenance [15:36:47] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.set_maintenance (exit_code=99) [15:37:16] 10cloud-services-team (FY2024/2025-Q1-Q2), 10Cloud-VPS: [network,D5] reboot cloudsw-d5 - https://phabricator.wikimedia.org/T371878#10045627 (10dcaro) [15:37:39] 10cloud-services-team (FY2024/2025-Q1-Q2), 10Cloud-VPS: [network,D5] reboot cloudsw-d5 - https://phabricator.wikimedia.org/T371878#10045629 (10dcaro) >>! In T371878#10045456, @Andrew wrote: > Current plan: > Thanks! Moved it to the task description :) [15:39:06] (03CR) 10CI reject: [V:04-1] WMCSCookbookRunnerBase: load the wmcs config if it's there [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1059920 (owner: 10David Caro) [15:39:17] (03PS1) 10David Caro: ceph.osd.drain_node: fix the example [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1060134 [15:41:34] (03PS1) 10David Caro: ceph.osd.drain_node: use the osd from the node list only [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1060135 [15:41:49] (03CR) 10David Caro: [C:03+2] proxy: skip the proxy if there's no proxy settings [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1060130 (owner: 10David Caro) [15:42:03] (03CR) 10David Caro: [C:03+2] ceph.osd.drain_node: fix the example [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1060134 (owner: 10David Caro) [15:42:25] (03CR) 10CI reject: [V:04-1] ceph.osd.drain_node: fix the example [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1060134 (owner: 10David Caro) [15:43:23] !log dcaro@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.drain_node (T371878) [15:43:29] T371878: 
[network,D5] reboot cloudsw-d5 - https://phabricator.wikimedia.org/T371878 [15:44:04] !log dcaro@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.ceph.osd.drain_node (exit_code=99) (T371878) [15:44:59] (03CR) 10CI reject: [V:04-1] ceph.osd.drain_node: use the osd from the node list only [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1060135 (owner: 10David Caro) [15:45:01] (03CR) 10CI reject: [V:04-1] proxy: skip the proxy if there's no proxy settings [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1060130 (owner: 10David Caro) [15:45:52] (03CR) 10David Caro: [C:03+2] "recheck" [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1060134 (owner: 10David Caro) [15:46:05] 10cloud-services-team (Hardware), 06DC-Ops, 10ops-eqiad, 06SRE: Q4:rack/setup/install cloudcephosd10[35-38] - https://phabricator.wikimedia.org/T363344#10045661 (10cmooney) 05Open→03Resolved Thanks guys, the second ports are now configured on the switches. [15:49:06] (03CR) 10CI reject: [V:04-1] proxy: skip the proxy if there's no proxy settings [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1060130 (owner: 10David Caro) [15:49:06] (03CR) 10CI reject: [V:04-1] ceph.osd.drain_node: fix the example [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1060134 (owner: 10David Caro) [15:50:25] 10cloud-services-team (Hardware), 06DC-Ops, 10ops-eqiad, 06SRE: Q4:rack/setup/install cloudcephosd10[35-38] - https://phabricator.wikimedia.org/T363344#10045673 (10cmooney) I should say cloudcephosd1036 change I've not pushed to the switch - that will happen when we do a homer run after the planned reb... 
[15:51:22] (03PS3) 10David Caro: proxy: skip the proxy if there's no proxy settings [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1060130 [15:51:22] (03PS2) 10David Caro: ceph.osd.drain_node: fix the example [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1060134 [15:51:22] (03CR) 10David Caro: ceph.osd.drain_node: fix the example (031 comment) [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1060134 (owner: 10David Caro) [15:51:22] (03PS2) 10David Caro: ceph.osd.drain_node: use the osd from the node list only [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1060135 [15:51:23] (03PS1) 10David Caro: gitlab: fix no-member issue on ci [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1060138 [15:54:58] (03CR) 10Andrew Bogott: [C:03+1] gitlab: fix no-member issue on ci [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1060138 (owner: 10David Caro) [15:55:49] (03CR) 10David Caro: [C:03+2] gitlab: fix no-member issue on ci [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1060138 (owner: 10David Caro) [15:55:56] (03CR) 10David Caro: [C:03+1] proxy: skip the proxy if there's no proxy settings [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1060130 (owner: 10David Caro) [15:55:59] (03CR) 10David Caro: [C:03+2] proxy: skip the proxy if there's no proxy settings [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1060130 (owner: 10David Caro) [15:56:33] (03CR) 10David Caro: [C:03+2] "Tested in cloudcumin" [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1060135 (owner: 10David Caro) [15:58:51] (03Merged) 10jenkins-bot: gitlab: fix no-member issue on ci [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1060138 (owner: 10David Caro) [15:58:59] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.set_maintenance [15:58:59] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.set_maintenance (exit_code=99) [15:59:13] (03Merged) 
10jenkins-bot: proxy: skip the proxy if there's no proxy settings [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1060130 (owner: 10David Caro) [15:59:52] 10cloud-services-team (FY2024/2025-Q1-Q2), 10Cloud-VPS: [network,D5] reboot cloudsw-d5 - https://phabricator.wikimedia.org/T371878#10045719 (10cmooney) [16:01:03] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.set_maintenance [16:01:42] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.set_maintenance (exit_code=0) [16:02:23] (03Merged) 10jenkins-bot: ceph.osd.drain_node: fix the example [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1060134 (owner: 10David Caro) [16:02:24] (03Merged) 10jenkins-bot: ceph.osd.drain_node: use the osd from the node list only [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1060135 (owner: 10David Caro) [16:02:27] 10cloud-services-team (FY2024/2025-Q1-Q2), 10Cloud-VPS: [network,D5] reboot cloudsw-d5 - https://phabricator.wikimedia.org/T371878#10045741 (10cmooney) It's probably best to manually flip the HA on the cloudgw/cloudnet nodes to the ones in rack C8 before we start. I just checked and the two nodes in rack D5 (... 
[16:02:57] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.set_maintenance [16:03:25] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.set_maintenance (exit_code=99) [16:03:27] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.set_maintenance [16:03:55] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.set_maintenance (exit_code=99) [16:03:56] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.set_maintenance [16:04:25] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.set_maintenance (exit_code=99) [16:04:26] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.set_maintenance [16:04:55] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.set_maintenance (exit_code=99) [16:04:56] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.set_maintenance [16:05:25] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.set_maintenance (exit_code=99) [16:05:26] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.set_maintenance [16:05:55] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.set_maintenance (exit_code=99) [16:05:56] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.set_maintenance [16:06:25] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.set_maintenance (exit_code=99) [16:06:26] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.set_maintenance [16:06:54] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.set_maintenance (exit_code=99) [16:06:55] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.set_maintenance [16:07:24] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook 
wmcs.openstack.cloudvirt.set_maintenance (exit_code=99) [16:07:25] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.set_maintenance [16:07:34] (03PS1) 10David Caro: ceph.osd.drain_*: fix the help note for --no-wait [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1060140 [16:07:53] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.set_maintenance (exit_code=99) [16:07:54] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.set_maintenance [16:08:23] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.set_maintenance (exit_code=99) [16:08:57] 10cloud-services-team (FY2024/2025-Q1-Q2), 10Cloud-VPS: [network,D5] reboot cloudsw-d5 - https://phabricator.wikimedia.org/T371878#10045793 (10cmooney) >>! In T371878#10045741, @cmooney wrote: > It's probably best to manually flip the HA on the cloudgw/cloudnet nodes to the ones in rack C8 before we start. I... [16:08:59] 10cloud-services-team (FY2024/2025-Q1-Q2), 10Data-Services, 05Goal: Upgrade clouddb* hosts to Bookworm - https://phabricator.wikimedia.org/T365424#10045794 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by fnegri@cumin1002 for host clouddb1020.eqiad.wmnet with OS bookworm [16:09:56] (03CR) 10David Caro: [C:03+2] ceph.osd.drain_*: fix the help note for --no-wait [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1060140 (owner: 10David Caro) [16:10:17] !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.drain_node (T371878) [16:13:14] FIRING: ToolforgeKubernetesHAproxyServerDown: Toolforge HAproxy server down: tools-k8s-control-8.tools.eqiad1.wikimedia.cloud - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesHAproxyServerDown - https://grafana.wmcloud.org/d/toolforge-k8s-haproxy/toolforge-k8s-haproxy?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesHAproxyServerDown [16:13:51] FIRING: 
[2x] ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [16:16:31] FIRING: ToolsToolsDBReplicationMissing: ToolsDB replication is not running on tools-db-1 (errno 0) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBReplicationMissing [16:17:58] 10cloud-services-team (FY2024/2025-Q1-Q2), 10Cloud-VPS: [network,D5] reboot cloudsw-d5 - https://phabricator.wikimedia.org/T371878#10045845 (10Andrew) [16:18:14] RESOLVED: [6x] ToolforgeKubernetesHAproxyServerDown: Toolforge HAproxy server down: tools-k8s-control-8.tools.eqiad1.wikimedia.cloud - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesHAproxyServerDown - https://grafana.wmcloud.org/d/toolforge-k8s-haproxy/toolforge-k8s-haproxy?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesHAproxyServerDown [16:18:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [16:18:51] RESOLVED: [3x] ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_admin_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [16:31:31] RESOLVED: ToolsToolsDBReplicationMissing: ToolsDB replication is not running on tools-db-1 (errno 0) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication - 
https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBReplicationMissing [16:33:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [16:37:23] !log andrew@bullseye admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1036.eqiad.wmnet' [16:37:23] !log andrew@bullseye admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=99) on host 'cloudvirt1036.eqiad.wmnet' [16:37:27] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL [16:37:30] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL [16:39:11] 10Tools, 06Infrastructure-Foundations: Requested offboarding-to-volunteer of HTriedman // Transfer ownership of SpinachBot from HTriedman (WMF) to HTriedman - https://phabricator.wikimedia.org/T371644#10046012 (10Htriedman) @SLyngshede-WMF that sounds like a good plan! Can you link me to the volunteer NDA? I... [16:39:29] (03PS1) 10David Caro: readme: fix the configuration command [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1060143 [16:39:30] (03PS1) 10David Caro: alerts: don't fail if we can't reach icinga from cloudcumin [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1060144 [16:41:39] 10Tools, 06Infrastructure-Foundations: Requested offboarding-to-volunteer of HTriedman // Transfer ownership of SpinachBot from HTriedman (WMF) to HTriedman - https://phabricator.wikimedia.org/T371644#10046019 (10Dzahn) Hi @Htriedman I can see that you are not on the doc that lists people who signed the NDA. T... 
[16:42:29] (03CR) 10CI reject: [V:04-1] alerts: don't fail if we can't reach icinga from cloudcumin [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1060144 (owner: 10David Caro) [16:44:32] (03PS2) 10David Caro: alerts: don't fail if we can't reach icinga from cloudcumin [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1060144 [16:47:14] !log andrew@bullseye admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1036.eqiad.wmnet' [16:47:14] !log andrew@bullseye admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=99) on host 'cloudvirt1036.eqiad.wmnet' [16:47:18] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL [16:47:21] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL [16:48:01] !log andrew@bullseye admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1036.eqiad.wmnet' [16:48:04] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL [16:48:44] (03CR) 10Andrew Bogott: [C:03+1] "these worked for me!" [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1060143 (owner: 10David Caro) [16:48:54] (03CR) 10David Caro: [C:03+2] readme: fix the configuration command [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1060143 (owner: 10David Caro) [16:51:32] 10Tools, 06Infrastructure-Foundations: Requested offboarding-to-volunteer of HTriedman // Transfer ownership of SpinachBot from HTriedman (WMF) to HTriedman - https://phabricator.wikimedia.org/T371644#10046045 (10Htriedman) @Dzahn Got it — yeah, the one I recall signing a few years ago was through the phab UI.... 
[16:52:03] (03Merged) 10jenkins-bot: readme: fix the configuration command [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1060143 (owner: 10David Caro) [16:54:17] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.set_maintenance [16:54:57] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.set_maintenance (exit_code=0) [16:54:58] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.set_maintenance [16:55:36] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.set_maintenance (exit_code=0) [16:55:37] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.set_maintenance [16:56:15] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.set_maintenance (exit_code=0) [16:56:16] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.set_maintenance [16:56:52] 10cloud-services-team (FY2024/2025-Q1-Q2), 10Data-Services, 05Goal: Upgrade clouddb* hosts to Bookworm - https://phabricator.wikimedia.org/T365424#10046064 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by fnegri@cumin1002 for host clouddb1020.eqiad.wmnet with OS bookworm completed: - cl... [16:56:56] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.set_maintenance (exit_code=0) [16:56:57] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.set_maintenance [16:57:35] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.set_maintenance (exit_code=0) [16:57:36] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.set_maintenance [16:58:05] (03CR) 10FNegri: [C:03+1] "I think we can find nicer solutions but this one is good enough to unblock the current use case." 
[cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1060144 (owner: 10David Caro) [16:58:14] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.set_maintenance (exit_code=0) [16:58:15] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.set_maintenance [16:58:53] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.set_maintenance (exit_code=0) [16:58:54] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.set_maintenance [16:59:32] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.set_maintenance (exit_code=0) [16:59:33] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.set_maintenance [17:00:11] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.set_maintenance (exit_code=0) [17:00:12] !log andrew@bullseye admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1036.eqiad.wmnet' [17:00:12] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.set_maintenance [17:00:17] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL [17:00:51] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.set_maintenance (exit_code=0) [17:00:52] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.set_maintenance [17:01:22] !log andrew@bullseye admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1037.eqiad.wmnet' [17:01:25] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL [17:01:31] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.set_maintenance (exit_code=0) [17:03:02] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1047.eqiad.wmnet' [17:03:16] (03CR) 10David Caro: [C:03+2] alerts: don't fail if we can't reach icinga from 
cloudcumin [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1060144 (owner: 10David Caro) [17:04:03] (03PS6) 10David Caro: WMCSCookbookRunnerBase: load the wmcs config if it's there [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1059920 [17:04:03] (03PS4) 10David Caro: openstack.tofu: use gitlab token from wmcs config [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1059925 [17:04:03] (03PS6) 10David Caro: toolforge.component.deploy: remove the k8s prefix [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1059890 [17:04:03] (03PS7) 10David Caro: toolforge.component.deploy: use bump_ as default branch [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1059905 [17:04:04] (03PS10) 10David Caro: wmcs_libs.common: add run_script [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1059906 [17:04:06] (03PS12) 10David Caro: toolforge.run_tests: use the functional tests [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1059907 [17:04:10] (03PS9) 10David Caro: openstack.tofu: use run_script instead of reimplementing it [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1059919 [17:04:14] (03PS10) 10David Caro: toolforge.deploy: run tests and add note to MR [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1059921 [17:06:49] FIRING: NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirtlocal1001 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown [17:06:53] (03Merged) 10jenkins-bot: alerts: don't fail if we can't reach icinga from cloudcumin [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1060144 (owner: 10David Caro) [17:08:04] 10VPS-project-Wikistats: Add bdrwiki to wikistats - https://phabricator.wikimedia.org/T371764#10046090 (10Dzahn) 05Open→03Resolved p:05Triage→03Medium a:03Dzahn ` MariaDB [wikistats]> 
insert into wikipedias (prefix, lang, loclang, method) values ("bdr", "West Coast Bajau", "Ling Sama", 8); ` ` d... [17:08:15] (03CR) 10CI reject: [V:04-1] openstack.tofu: use run_script instead of reimplementing it [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1059919 (owner: 10David Caro) [17:08:32] (03CR) 10CI reject: [V:04-1] toolforge.deploy: run tests and add note to MR [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1059921 (owner: 10David Caro) [17:09:22] 10cloud-services-team (FY2024/2025-Q1-Q2), 10Data-Services, 05Goal: [toolsdb] Migrate mixnmatch db to Trove - https://phabricator.wikimedia.org/T350862#10046099 (10fnegri) @magnus the migration to Trove is still in my to-do list but I'm not planning to start it before next month at the earliest. Re: connect... [17:10:29] (03PS10) 10David Caro: openstack.tofu: use run_script instead of reimplementing it [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1059919 [17:10:29] (03PS11) 10David Caro: toolforge.deploy: run tests and add note to MR [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1059921 [17:13:55] (03CR) 10CI reject: [V:04-1] toolforge.deploy: run tests and add note to MR [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1059921 (owner: 10David Caro) [17:14:01] !log andrew@bullseye admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1037.eqiad.wmnet' [17:14:06] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL [17:14:44] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1047.eqiad.wmnet' [17:18:32] 10cloud-services-team (FY2024/2025-Q1-Q2), 10Data-Services, 05Goal: Upgrade clouddb* hosts to Bookworm - https://phabricator.wikimedia.org/T365424#10046125 (10fnegri) [17:20:59] !log andrew@bullseye admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1038.eqiad.wmnet' [17:21:03] Logged the message at 
https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL [17:21:07] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1046.eqiad.wmnet' [17:27:05] 10Tool-Global-user-contributions, 06Stewards-and-global-tools, 06Trust and Safety Product Team, 10Temporary accounts (Create/update essential tools/anti-abuse management): Use a time period field in Special:GlobalContributions instead of start/end date fields - https://phabricator.wikimedia.org/T371917 (10T... [17:29:21] 10Tool-Global-user-contributions, 06Stewards-and-global-tools, 06Trust and Safety Product Team, 10Temporary accounts (Create/update essential tools/anti-abuse management): Use a time period field in Special:GlobalContributions instead of start/end date fie... - https://phabricator.wikimedia.org/T371917#10046220 [17:30:12] 10Tool-Global-user-contributions, 06Stewards-and-global-tools, 07Epic, 10Temporary accounts (Create/update essential tools/anti-abuse management): [Epic] Implement global user contributions feature - https://phabricator.wikimedia.org/T337089#10046225 (10Tchanders) [17:31:04] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1046.eqiad.wmnet' [17:31:21] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1045.eqiad.wmnet' [17:32:52] !log andrew@bullseye admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1038.eqiad.wmnet' [17:32:55] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL [17:34:22] !log andrew@bullseye admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1039.eqiad.wmnet' [17:34:25] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL [17:36:50] RESOLVED: NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirtlocal1001 is down - 
https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown [17:40:12] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack [17:41:38] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0) [17:43:51] !log andrew@bullseye admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1039.eqiad.wmnet' [17:43:54] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL [17:45:03] !log andrew@bullseye admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1040.eqiad.wmnet' [17:45:06] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL [17:47:45] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1045.eqiad.wmnet' [17:48:00] RESOLVED: NovafullstackSustainedFailures: Novafullstack tests have been failing for more than 5hours in eqiad - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/NovafullstackSustainedFailures - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-nova-fullstack?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DNovafullstackSustainedFailures [17:48:20] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1044.eqiad.wmnet' [17:58:49] !log andrew@bullseye admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1040.eqiad.wmnet' [17:58:53] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL [17:59:04] !log andrew@bullseye admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1041.eqiad.wmnet' [17:59:07] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL [18:02:26] 
06Toolforge-standards-committee: Adoption request for Yapperbot - https://phabricator.wikimedia.org/T361426#10046303 (10bd808) >>! In T361426#9757455, @Soda wrote: > @taavi Would it be possible to unprotect the yml files in the yapper bot directory? Let's just publish the config here. The $HOME/pruner/config-pru... [18:03:16] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1044.eqiad.wmnet' [18:05:28] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1043.eqiad.wmnet' [18:07:18] 06Toolforge-standards-committee: Adoption request for Yapperbot - https://phabricator.wikimedia.org/T361426#10046351 (10bd808) @DavidTornheim I can turn the tool over to you, but only after breaking it by stopping the jobs and removing the bot password currently stored in $HOME/botpassword. Would you like me to... [18:17:02] !log andrew@bullseye admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1041.eqiad.wmnet' [18:17:06] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL [18:19:34] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1043.eqiad.wmnet' [18:21:28] !log andrew@bullseye admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1042.eqiad.wmnet' [18:21:31] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL [18:36:42] !log andrew@bullseye admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1042.eqiad.wmnet' [18:36:46] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL [18:41:37] 10cloud-services-team (FY2024/2025-Q1-Q2), 10Cloud-VPS: [network,D5] reboot cloudsw-d5 - https://phabricator.wikimedia.org/T371878#10046446 (10Andrew) [18:49:46] 10Tool-handle-commons-on-osm: HandleCommonsOnOSM: Add SDC directly - 
https://phabricator.wikimedia.org/T371776#10046486 (10FlSchmitt) Fixed by #4ff1124d [18:51:56] 10Tool-handle-commons-on-osm: HandleCommonsOnOSM: Add SDC directly - https://phabricator.wikimedia.org/T371776#10046488 (10FlSchmitt) 05Open→03Resolved a:03FlSchmitt fixed by 4ff1124d90e8b5d81db39a84b606f37901580243 [19:51:40] !log dcaro@urcuchillay admin END (FAIL) - Cookbook wmcs.ceph.osd.drain_node (exit_code=99) (T371878) [19:51:46] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL [19:51:46] T371878: [network,D5] reboot cloudsw-d5 - https://phabricator.wikimedia.org/T371878 [20:06:06] 06cloud-services-team, 10Beta-Cluster-Infrastructure, 10Continuous-Integration-Infrastructure: Wikitech system account and SUL for Jenkins agents? - https://phabricator.wikimedia.org/T371930 (10hashar) 03NEW [20:06:28] 06cloud-services-team, 10Beta-Cluster-Infrastructure, 10Continuous-Integration-Infrastructure: Wikitech system account and SUL for Jenkins agents? - https://phabricator.wikimedia.org/T371930#10046604 (10hashar) [20:11:41] 06cloud-services-team, 10Beta-Cluster-Infrastructure, 10Continuous-Integration-Infrastructure: Wikitech system account and SUL for Jenkins agents? - https://phabricator.wikimedia.org/T371930#10046610 (10bd808) The login for https://idm.wikimedia.org is via https://idp.wikimedia.org. The IdP service uses the... [20:12:03] 06cloud-services-team, 10Beta-Cluster-Infrastructure, 10Continuous-Integration-Infrastructure: Wikitech system account and SUL for Jenkins agents? - https://phabricator.wikimedia.org/T371930#10046614 (10hashar) [20:17:12] 06cloud-services-team, 10Beta-Cluster-Infrastructure, 10Bitu, 10CAS-SSO, and 2 others: Wikitech system account and SUL for Jenkins agents? - https://phabricator.wikimedia.org/T371930#10046623 (10bd808) Here is the developer account record: ` $ ldap uid=jenkins-deploy dn: uid=jenkins-deploy,ou=people,dc=wik... 
[20:41:31] 06cloud-services-team, 10Beta-Cluster-Infrastructure, 10Bitu, 10CAS-SSO, and 2 others: Wikitech system account and SUL for Jenkins agents? - https://phabricator.wikimedia.org/T371930#10046726 (10Dzahn) Given that users are always supposed to use different keys for prod vs cloud, should the system user also... [20:49:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [20:59:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [21:20:59] (03PS1) 10David Caro: ceph.{drain,undrain}: fix chunking [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1060173 [21:22:20] !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.drain_node (T371878) [21:22:25] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL [21:22:25] T371878: [network,D5] reboot cloudsw-d5 - https://phabricator.wikimedia.org/T371878 [22:03:44] 10Tool-extloc, 13Patch-For-Review, 10Release-Engineering-Team (Yak Shaving 🐃🪒): extloc: Move to Toolforge Build Service - https://phabricator.wikimedia.org/T365665#10046945 (10brennen) 05Open→03Resolved [23:12:13] (03PS5) 10Krinkle: frontend: Server-side rendering (take 2) [labs/codesearch] - 10https://gerrit.wikimedia.org/r/1056248