[00:06:02] cloud-services-team, Toolforge: cfdw-28928147-9qtjx stuck in Terminating state - https://phabricator.wikimedia.org/T382863 (JJMC89) NEW
[00:24:28] cloud-services-team, Toolforge: Toolforge joibs: increased exit code 137 rate since 2024-12-14 - https://phabricator.wikimedia.org/T382865 (JJMC89) NEW
[00:25:40] cloud-services-team, Toolforge: Toolforge jobs: increased exit code 137 rate since 2024-12-14 - https://phabricator.wikimedia.org/T382865#10426066 (JJMC89)
[00:30:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[00:51:35] cloud-services-team, Toolforge: [jobs-emailer] duplicate failure emails - https://phabricator.wikimedia.org/T382866 (JJMC89) NEW
[02:22:31] !log andrew@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-61
[02:28:00] !log andrew@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-61
[02:33:03] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-61 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[07:25:29] Cloud Services Proposals, cloud-services-team, Data-Persistence, Data-Platform-SRE: Decision request - Who runs wikireplicas cookbooks - https://phabricator.wikimedia.org/T382607#10426165 (Marostegui) I like option #4 too, but ideally hosts and scripts should be idempotent and should be able to r...
[12:22:24] (update) l10n-bot: Localisation updates from https://translatewiki.net. [toolforge-repos/wd-image-positions] - https://gitlab.wikimedia.org/toolforge-repos/wd-image-positions/-/merge_requests/27
[13:50:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[14:00:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[15:20:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[15:26:11] (open) marostegui: templates: Change order of heartbeat [toolforge-repos/switchmaster] - https://gitlab.wikimedia.org/toolforge-repos/switchmaster/-/merge_requests/7
[15:30:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[15:33:22] (approved) ladsgroup: templates: Change order of heartbeat [toolforge-repos/switchmaster] - https://gitlab.wikimedia.org/toolforge-repos/switchmaster/-/merge_requests/7 (owner: marostegui)
[15:33:30] (merge) marostegui: templates: Change order of heartbeat [toolforge-repos/switchmaster] - https://gitlab.wikimedia.org/toolforge-repos/switchmaster/-/merge_requests/7
[16:19:53] cloud-services-team, Cloud-VPS, Continuous-Integration-Infrastructure, ci-test-error (WMF-deployed Build Failure), and 2 others: Various CI jobs running in the integration Cloud VPS project failing due to transient DNS lookup failures, often for ou... - https://phabricator.wikimedia.org/T374830#10427083
[16:28:19] PAWS: update rstudio - https://phabricator.wikimedia.org/T382903 (rook) NEW
[16:49:33] cloud-services-team, Cloud-VPS, Continuous-Integration-Infrastructure, ci-test-error (WMF-deployed Build Failure), and 2 others: Various CI jobs running in the integration Cloud VPS project failing due to transient DNS lookup failures, often for ou... - https://phabricator.wikimedia.org/T374830#10427220
[17:44:31] PAWS: update rstudio - https://phabricator.wikimedia.org/T382903#10427407 (github-toolforge-bot) vivian-rook closed https://github.com/toolforge/paws/pull/474
[17:44:34] PAWS: update rstudio - https://phabricator.wikimedia.org/T382903#10427408 (rook) Open→Resolved
[18:38:59] cloud-services-team, Cloud-VPS, Continuous-Integration-Infrastructure, ci-test-error (WMF-deployed Build Failure), and 2 others: Various CI jobs running in the integration Cloud VPS project failing due to transient DNS lookup failures, often for ou... - https://phabricator.wikimedia.org/T374830#10427602
[18:44:40] wikitech.wikimedia.org, serviceops-radar, SRE, SRE-Unowned: Redesign wikitech-static - https://phabricator.wikimedia.org/T376400#10427613 (Andrew) I worked on this a bit over the break. I'm pretty happy with the [[ https://wts.wmcloud.org | static site that httrack produces ]]. It was generated l...
[18:53:02] cloud-services-team, Cloud-VPS, Continuous-Integration-Infrastructure, ci-test-error (WMF-deployed Build Failure), and 2 others: Various CI jobs running in the integration Cloud VPS project failing due to transient DNS lookup failures, often for ou... - https://phabricator.wikimedia.org/T374830#10427640