[00:20:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[00:30:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[01:50:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[02:00:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[02:34:51] FIRING: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-5:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[02:39:51] RESOLVED: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-5:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[02:55:03] FIRING: [2x] ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-14 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[02:55:51] FIRING: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-5:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[03:00:51] RESOLVED: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-5:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[03:15:03] FIRING: [2x] ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-14 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[03:46:51] FIRING: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-5:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[03:51:51] RESOLVED: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-5:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[04:05:03] FIRING: [2x] ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-14 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[04:44:51] FIRING: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-5:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[04:49:51] RESOLVED: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-5:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[04:50:42] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[05:00:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[05:48:51] FIRING: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-5:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[05:50:03] FIRING: [3x] ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-14 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[05:58:51] RESOLVED: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-5:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[06:10:51] FIRING: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-5:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[06:15:51] RESOLVED: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-5:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[06:26:50] (PS1) Stevemunene: Add new an worker keytabs [labs/private] - https://gerrit.wikimedia.org/r/1072655 (https://phabricator.wikimedia.org/T353788)
[06:37:39] Tools: wikishootme not working since a few hours ago - https://phabricator.wikimedia.org/T345388#10143201 (jeremyb-phone)
[06:41:35] Tools: wikishootme not working since a few hours ago - https://phabricator.wikimedia.org/T345388#10143204 (jeremyb-phone) @Jim.henderson has wikishootme been down at all recently?
[06:49:51] FIRING: ProbeDown: Service tools-k8s-haproxy-6:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-6:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[06:54:51] FIRING: [2x] ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[06:59:51] RESOLVED: [2x] ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[07:11:54] cloud-services-team (FY2024/2025-Q1-Q2), Toolforge (Toolforge iteration 14), Patch-For-Review: [infra,k8s] Upgrade Toolforge Kubernetes to version 1.27 - https://phabricator.wikimedia.org/T359641#10143249 (Raymond_Ndibe)
[07:35:03] FIRING: [4x] ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-14 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[09:01:52] FIRING: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-5:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[09:03:59] (open) aborrero: kubernetes/worker_stuck: clarify summary and include number [repos/cloud/toolforge/alerts] - https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/19
[09:06:10] (update) aborrero: kubernetes/worker_stuck: clarify summary and include number [repos/cloud/toolforge/alerts] - https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/19
[09:06:51] RESOLVED: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-5:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[09:07:09] (merge) aborrero: kubernetes/worker_stuck: clarify summary and include number [repos/cloud/toolforge/alerts] - https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/19
[09:09:43] cloud-services-team: toolforge: workers with many D procs (2024-09-13 edition) - https://phabricator.wikimedia.org/T374692 (aborrero) NEW
[09:10:51] FIRING: ProbeDown: Service tools-k8s-haproxy-6:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-6:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[09:12:10] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-55, tools-k8s-worker-nfs-5, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-14 (T374692)
[09:12:13] !log aborrero@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-worker-55, tools-k8s-worker-nfs-5, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-14 (T374692)
[09:12:14] T374692: toolforge: workers with many D procs (2024-09-13 edition) - https://phabricator.wikimedia.org/T374692
[09:15:51] RESOLVED: ProbeDown: Service tools-k8s-haproxy-6:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-6:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[09:20:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[09:20:55] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-55, tools-k8s-worker-nfs-5, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-14 (T374692)
[09:20:59] T374692: toolforge: workers with many D procs (2024-09-13 edition) - https://phabricator.wikimedia.org/T374692
[09:30:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[09:42:46] !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-55, tools-k8s-worker-nfs-5, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-14 (T374692)
[09:42:50] T374692: toolforge: workers with many D procs (2024-09-13 edition) - https://phabricator.wikimedia.org/T374692
[09:43:23] cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS: openstack: instrument VXLAN-based flat network - https://phabricator.wikimedia.org/T374020#10143535 (aborrero) About how openstack and MTU: https://docs.openstack.org/neutron/latest/admin/config-mtu.html
[09:48:59] cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS: openstack: instrument VXLAN-based flat network - https://phabricator.wikimedia.org/T374020#10143548 (cmooney) Good stuff! Looking at the VMs and the setup things seem to be working well. I'll need to dig more into the OVS stuff on the Linux side to fa...
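The ProbeDown alerts above flap between FIRING and RESOLVED every few minutes. A minimal sketch for measuring how long each episode lasted, assuming one bot message per line in the `[HH:MM:SS] FIRING|RESOLVED: Alertname: summary - https://...` shape seen here and assuming all timestamps fall on the same day (no midnight wrap handling). The function name `alert_intervals` is hypothetical and not part of any WMCS tooling.

```python
import re
from datetime import timedelta

# Matches alert lines such as
# "[02:34:51] FIRING: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (...) - https://..."
# An optional "[2x]" grouping prefix is skipped; the summary runs up to the first " - http(s)://".
ALERT_RE = re.compile(
    r"^\[(\d{2}):(\d{2}):(\d{2})\] (FIRING|RESOLVED): "
    r"(?:\[\d+x\] )?([A-Za-z]+): (.*?) - https?://"
)


def alert_intervals(lines):
    """Pair each FIRING line with the next RESOLVED line that has the same
    alert name and summary. Returns (alertname, summary, start, duration)
    tuples; timestamps are treated as offsets within a single day."""
    open_alerts = {}
    intervals = []
    for line in lines:
        m = ALERT_RE.match(line)
        if not m:
            continue  # gerrit, phabricator, !log and other non-alert lines
        hh, mm, ss, state, name, summary = m.groups()
        ts = timedelta(hours=int(hh), minutes=int(mm), seconds=int(ss))
        key = (name, summary)
        if state == "FIRING":
            # A repeated FIRING for an already-open alert keeps the earliest start.
            open_alerts.setdefault(key, ts)
        elif key in open_alerts:
            start = open_alerts.pop(key)
            intervals.append((name, summary, start, ts - start))
    return intervals
```

Fed the 02:34:51 FIRING and 02:39:51 RESOLVED lines for tools-k8s-haproxy-5, it returns a single `ProbeDown` interval with a duration of `timedelta(minutes=5)`. Keying on the summary text means an alert whose wording changes between FIRING and RESOLVED is not paired.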
[09:51:05] cloud-services-team: toolforge: workers with many D procs (2024-09-13 edition) - https://phabricator.wikimedia.org/T374692#10143550 (aborrero) Open→Resolved a:aborrero
[10:28:33] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-55 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[10:30:32] Cloud-VPS: Frequent radosgw 500 errors with OpenTofu - https://phabricator.wikimedia.org/T360626#10143661 (aborrero) p:Triage→Medium
[11:00:03] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-54 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[11:13:18] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-54 (T374692)
[11:13:22] T374692: toolforge: workers with many D procs (2024-09-13 edition) - https://phabricator.wikimedia.org/T374692
[11:18:51] !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-54 (T374692)
[11:18:55] T374692: toolforge: workers with many D procs (2024-09-13 edition) - https://phabricator.wikimedia.org/T374692
[11:25:03] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-54 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[11:25:33] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-54 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[11:30:33] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-54 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[11:36:17] wikitech.wikimedia.org, MW-on-K8s, serviceops: Communications support for Wikitech/Wikimedia Developer Account migration - https://phabricator.wikimedia.org/T373615#10143886 (jijiki)
[11:39:58] wikitech.wikimedia.org, MW-on-K8s, serviceops: Communication for Wikitech/Wikimedia Developer Account migration - https://phabricator.wikimedia.org/T373615#10143888 (jijiki) a:jijiki→None
[12:06:15] Cloud Services Proposals, cloud-services-team, Toolforge: Decision Request: To strictly enforce semantic versioning rules for toolforge services' APIs or not - https://phabricator.wikimedia.org/T373072#10143962 (aborrero) Open→In progress p:Triage→Medium I feel like we are mixing in t...
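The reworded alert reports a node with "at least 12 procs in D state" (uninterruptible sleep, typically waiting on I/O such as a hung NFS mount). As a rough manual cross-check on a worker, the D-state count can be read from `/proc/<pid>/stat`. This is a minimal sketch, not the Prometheus rule behind ToolforgeKubernetesWorkerTooManyDProcesses; the function names are hypothetical, and only the threshold of 12 comes from the alert text.

```python
import os

# Threshold quoted in the alert summary ("at least 12 procs in D state").
D_STATE_THRESHOLD = 12


def state_from_stat(stat_text):
    """Return the one-letter state (field 3) of a /proc/<pid>/stat line.

    Field 2 is the command name in parentheses and may itself contain
    spaces or ')', so split on the *last* ')' before reading the state.
    """
    after_comm = stat_text[stat_text.rindex(")") + 2:]
    return after_comm.split(" ", 1)[0]


def count_d_state(stat_texts):
    """Count entries whose state is 'D' (uninterruptible sleep)."""
    return sum(1 for text in stat_texts if state_from_stat(text) == "D")


def read_proc_stats(proc="/proc"):
    """Yield the stat line of every process currently visible under /proc."""
    for pid in os.listdir(proc):
        if not pid.isdigit():
            continue
        try:
            with open(os.path.join(proc, pid, "stat")) as f:
                yield f.read()
        except OSError:
            continue  # the process exited between listdir() and open()


if __name__ == "__main__" and os.path.isdir("/proc"):
    n = count_d_state(read_proc_stats())
    print(f"{n} processes in D state (alert threshold: {D_STATE_THRESHOLD})")
```

This counts processes, not threads, so it may read lower than tools that inspect every task. On the affected workers the log shows the remedy was the `wmcs.toolforge.k8s.reboot` cookbook rather than killing individual processes, which is expected since D-state processes generally cannot be killed.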
[12:32:41] cloud-services-team, wikitech.wikimedia.org, Infrastructure-Foundations, Epic: Make Wikitech an SUL wiki - https://phabricator.wikimedia.org/T161859#10144077 (Bugreporter) Some thoughts: (1) LDAP-SUL connection 1. Many people created LDAP accounts for Toolforge uses and has LDAP-SUL connection vi...
[12:34:48] wikitech.wikimedia.org, MW-on-K8s, serviceops: Communication for Wikitech/Wikimedia Developer Account migration - https://phabricator.wikimedia.org/T373615#10144104 (Bugreporter) We should also mention if you reset your Wikitech password in this transitional period, for LDAP users with known SUL conn...
[12:35:23] (PS1) Muehlenhoff: labs-private: Remove parsoid stub secrets [labs/private] - https://gerrit.wikimedia.org/r/1072738 (https://phabricator.wikimedia.org/T357750)
[12:40:51] Tools: CropTool does not work - https://phabricator.wikimedia.org/T374709#10144122 (Bugreporter)
[12:40:54] cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS: Migrate Cloud VPS instances to VXLAN based networks - https://phabricator.wikimedia.org/T364725#10144124 (aborrero)
[12:40:59] cloud-services-team, Cloud-VPS, Epic, IPv6: Enable IPv6 on CloudVPS - https://phabricator.wikimedia.org/T37947#10144123 (aborrero)
[12:43:56] FIRING: SystemdUnitDown: The service unit wikitech_run_jobs.service is in failed status on host cloudweb1004. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudweb1004 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[12:44:02] cloud-services-team, Cloud-VPS, Epic, IPv6: Enable IPv6 on CloudVPS - https://phabricator.wikimedia.org/T37947#10144130 (aborrero) It has been suggested by @cmooney that we introduce support for IPv6 while on the migration for {T364725}, which I agree, and I'll try to do.
[12:44:55] cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS: Migrate Cloud VPS instances to VXLAN based networks - https://phabricator.wikimedia.org/T364725#10144135 (aborrero) It has been mentioned by @cmooney that doing {T37947} while doing this VXLAN transition might be the perfect opportunity. I agree, and I...
[12:47:35] Tools: CropTool does not work - https://phabricator.wikimedia.org/T374709#10144157 (Aklapper) Open→Invalid CropTool manages its tasks on GitHub issues at https://github.com/danmichaelo/croptool/issues - please file a task there.
[12:47:52] cloud-services-team, Infrastructure-Foundations, netops, SRE: cloudsw: codfw: enable IPv6 - https://phabricator.wikimedia.org/T374713 (aborrero) NEW
[12:48:21] cloud-services-team, Infrastructure-Foundations, netops, SRE: openstack: verify security groups settings for IPv6 - https://phabricator.wikimedia.org/T374714 (aborrero) NEW
[12:48:56] RESOLVED: SystemdUnitDown: The service unit wikitech_run_jobs.service is in failed status on host cloudweb1004. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudweb1004 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[12:49:47] cloud-services-team, Infrastructure-Foundations, netops, SRE: openstack: work out IPv6 and designate integration - https://phabricator.wikimedia.org/T374715 (aborrero) NEW
[12:50:25] cloud-services-team, Infrastructure-Foundations, netops, SRE: openstack: work out IPv6 and designate integration - https://phabricator.wikimedia.org/T374715#10144206 (aborrero)
[12:50:28] cloud-services-team, Infrastructure-Foundations, netops, SRE: openstack: verify security groups settings for IPv6 - https://phabricator.wikimedia.org/T374714#10144207 (aborrero)
[12:50:31] cloud-services-team, Infrastructure-Foundations, netops, SRE: CloudVPS: IPv6 early PoC - https://phabricator.wikimedia.org/T245495#10144208 (aborrero)
[12:55:00] cloud-services-team, Infrastructure-Foundations, netops, SRE: CloudVPS: IPv6 in codfw1dev - https://phabricator.wikimedia.org/T245495#10144214 (aborrero)
[12:55:29] cloud-services-team, Infrastructure-Foundations, netops, SRE: CloudVPS: IPv6 in codfw1dev - https://phabricator.wikimedia.org/T245495#10144223 (aborrero)
[12:56:05] cloud-services-team, Infrastructure-Foundations, netops, SRE: cloudgw: add support and enable IPv6 - https://phabricator.wikimedia.org/T374716 (aborrero) NEW
[12:56:18] cloud-services-team, Infrastructure-Foundations, netops, SRE: cloudsw: codfw: enable IPv6 - https://phabricator.wikimedia.org/T374713#10144237 (aborrero)
[12:56:22] cloud-services-team, Infrastructure-Foundations, netops, SRE: cloudgw: add support and enable IPv6 - https://phabricator.wikimedia.org/T374716#10144238 (aborrero)
[12:56:28] cloud-services-team, Infrastructure-Foundations, netops, SRE: CloudVPS: IPv6 in codfw1dev - https://phabricator.wikimedia.org/T245495#10144239 (aborrero)
[12:57:10] cloud-services-team, Infrastructure-Foundations, netops, SRE: CloudVPS: IPv6 in codfw1dev - https://phabricator.wikimedia.org/T245495#10144245 (aborrero)
[12:57:32] cloud-services-team, Infrastructure-Foundations, netops, SRE: CloudVPS: IPv6 in codfw1dev - https://phabricator.wikimedia.org/T245495#10144241 (aborrero)
[12:57:35] cloud-services-team, Infrastructure-Foundations, netops, SRE: CloudVPS: IPv6 in codfw1dev - https://phabricator.wikimedia.org/T245495#10144246 (aborrero)
[13:50:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[14:00:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[14:25:34] cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS, Data-Services: cloudcumin: allow wmcs-admin to run wikireplicas cookbooks and scripts - https://phabricator.wikimedia.org/T347977#10144474 (fnegri) Open→Declined In {T344599} it was decided members of wmcs-roots should //not// have root...
[14:51:29] (CR) Alexandros Kosiaris: [C:+2] labs-private: Remove parsoid stub secrets [labs/private] - https://gerrit.wikimedia.org/r/1072738 (https://phabricator.wikimedia.org/T357750) (owner: Muehlenhoff)
[14:51:31] (CR) Alexandros Kosiaris: [V:+2 C:+2] labs-private: Remove parsoid stub secrets [labs/private] - https://gerrit.wikimedia.org/r/1072738 (https://phabricator.wikimedia.org/T357750) (owner: Muehlenhoff)
[15:19:56] FIRING: SystemdUnitDown: The service unit wikitech_run_jobs.service is in failed status on host cloudweb1004. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudweb1004 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[15:24:56] RESOLVED: SystemdUnitDown: The service unit wikitech_run_jobs.service is in failed status on host cloudweb1004. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudweb1004 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[15:31:46] cloud-services-team, Data-Engineering, Cloud-Services-Origin-User: WMCS-roots paging responsibilities - https://phabricator.wikimedia.org/T344608#10144604 (fnegri) Open→Declined In {T344599}, it was decided members of wmcs-roots should //not// have root access to wiki replicas hosts (clou...
[15:40:20] cloud-services-team (FY2024/2025-Q1-Q2), Data-Engineering, Cloud-Services-Origin-User: WMCS-roots paging responsibilities - https://phabricator.wikimedia.org/T344608#10144665 (fnegri)
[15:40:26] cloud-services-team (FY2024/2025-Q1-Q2), Data-Engineering, Cloud-Services-Origin-User: WMCS-roots paging responsibilities - https://phabricator.wikimedia.org/T344608#10144666 (fnegri) a:fnegri
[16:48:48] Toolforge: [builds-cli] No obvious way to delete individual `toolforge build` generated artifacts other than `toolforge clean` - https://phabricator.wikimedia.org/T368317#10144873 (bd808) >>! In T368317#10136747, @dcaro wrote: > This is half-intentional, in the sense that we decided to avoid exposing the con...
[18:50:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[19:00:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[19:05:50] Tool-schedule-deployment: ScheduleDeploymentBot edits being marked as bot means they break watchlisting of the Deployments page - https://phabricator.wikimedia.org/T374735 (Jdforrester-WMF) NEW
[23:50:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
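CloudVPSDesignateLeaks fires repeatedly through the day for "1 stray dns records". Conceptually, a stray record is a DNS name that Designate still serves although no live instance backs it. The sketch below models only that idea as a set difference, assuming you already have the record names and the instance FQDNs as plain lists. It is not the checker behind the alert, and the function name and example hostnames are hypothetical; the linked Designate_record_leaks runbook describes the actual procedure.

```python
def stray_records(dns_records, instance_fqdns):
    """Return record names with no matching live instance (hypothetical model).

    Names are compared case-insensitively and without the trailing dot that
    DNS zone data usually carries; the original record spelling is returned.
    """
    live = {name.rstrip(".").lower() for name in instance_fqdns}
    return sorted(r for r in dns_records if r.rstrip(".").lower() not in live)


# Example with made-up names: one record still points at a deleted VM.
records = [
    "vm1.myproject.eqiad1.wikimedia.cloud.",
    "deleted-vm.myproject.eqiad1.wikimedia.cloud.",
]
instances = ["vm1.myproject.eqiad1.wikimedia.cloud"]
leaks = stray_records(records, instances)
```

Here `leaks` is `["deleted-vm.myproject.eqiad1.wikimedia.cloud."]`, a count of 1 matching the shape of the alert text. A real check would also need to handle PTR records and records that are intentionally not tied to an instance.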