[00:06:50] cloud-services-team, Toolforge: Lock down tools-sgebastion-10 (login-buster.toolforge.org) to only members of tools with known dependencies on it - https://phabricator.wikimedia.org/T397459#10945437 (bd808) >>! In T397459#10937535, @taavi wrote: > Per the last WMCS team meeting, I've set `profile::ld...
[00:14:37] cloud-services-team, Striker: Striker should use ID instead of username to identify SUL accounts - https://phabricator.wikimedia.org/T359428#10945439 (bd808) >>! In T359428#10937627, @taavi wrote: > What's left is finding a way to set the new ID field for old accounts that only have an username set. It...
[00:22:20] cloud-services-team, Toolforge, Wikidata, Wikidata-Query-Service, Discovery-Search (2025.06.13 - 2025.07.04): Problem with SPARQL endpoint response and crawling on Toolforge - https://phabricator.wikimedia.org/T397570#10945443 (bd808) Very few web crawlers these days actually respect robots.t...
[00:29:49] cloud-services-team (Hardware), DC-Ops, ops-codfw, SRE: Q4:rack/setup/install cloudcephosd200[567]-dev - https://phabricator.wikimedia.org/T393614#10945465 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin1003 for host cloudcephosd2005-dev.codfw.wmnet with OS...
[00:43:29] cloud-services-team (Hardware), DC-Ops, ops-codfw, SRE: Q4:rack/setup/install cloudcephosd200[567]-dev - https://phabricator.wikimedia.org/T393614#10945492 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin1003 for host cloudcephosd2006-dev.codfw.wmnet with OS...
[00:44:31] cloud-services-team (Hardware), DC-Ops, ops-codfw, SRE: Q4:rack/setup/install cloudcephosd200[567]-dev - https://phabricator.wikimedia.org/T393614#10945493 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin1003 for host cloudcephosd2007-dev.codfw.wmnet with OS...
[00:59:37] cloud-services-team, Cloud-VPS: Recurring issues with cinder volumes attaching and detaching - https://phabricator.wikimedia.org/T397795 (Andrew) NEW
[01:00:22] cloud-services-team, Cloud-VPS: Recurring issues with cinder volumes attaching and detaching - https://phabricator.wikimedia.org/T397795#10945529 (Andrew)
[01:00:23] cloud-services-team, Cloud-VPS: Un-attachable volume in account-creation-assistance, 'app-www' - https://phabricator.wikimedia.org/T397517#10945530 (Andrew)
[01:00:24] cloud-services-team (FY2024/2025-Q3-Q4), Cloud-VPS, collaboration-services, GitLab (Infrastructure): Volume is stuck to deleted instance in devtools project - https://phabricator.wikimedia.org/T396739#10945531 (Andrew)
[01:01:10] cloud-services-team, Data-Services, Data-Engineering, Data-Persistence, Data-Platform-SRE (2025.06.13 - 2025.07.04): Create wiki replicas views for globaljsonlinks tables - https://phabricator.wikimedia.org/T387419#10945533 (aude) It would be very helpful to have access to these tables on too...
[01:01:20] cloud-services-team, Cloud-VPS: Recurring issues with cinder volumes attaching and detaching - https://phabricator.wikimedia.org/T397795#10945534 (Andrew)
[01:01:21] cloud-services-team, Cloud-VPS, VPS-Project-wikicommunityhealth: [cinder] Volume failing to attach/detach - https://phabricator.wikimedia.org/T392089#10945535 (Andrew)
[01:04:51] cloud-services-team, Cloud-VPS: 'mwcurator' volume in the 'mwoffliner' project cannot be attached to anything - https://phabricator.wikimedia.org/T397796 (Andrew) NEW
[01:05:58] !log andrew@cloudcumin1001 mwoffliner START - Cookbook wmcs.openstack.quota_increase (T397796)
[01:06:01] T397796: 'mwcurator' volume in the 'mwoffliner' project cannot be attached to anything - https://phabricator.wikimedia.org/T397796
[01:06:05] !log andrew@cloudcumin1001 mwoffliner END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0) (T397796)
[01:19:40] cloud-services-team, Cloud-VPS: 'mwcurator' volume in the 'mwoffliner' project cannot be attached to anything - https://phabricator.wikimedia.org/T397796#10945578 (Andrew) To get the users unstuck I've restored this volume to 'mwcurator1' and adjusted the project quota accordingly. Keeping the cursed vol...
[01:31:42] cloud-services-team (Hardware), DC-Ops, ops-codfw, SRE: Q4:rack/setup/install cloudcephosd200[567]-dev - https://phabricator.wikimedia.org/T393614#10945597 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin1003 for host cloudcephosd2005-dev.codfw.wmnet with OS bul...
[01:31:45] cloud-services-team (Hardware), DC-Ops, ops-codfw, SRE: Q4:rack/setup/install cloudcephosd200[567]-dev - https://phabricator.wikimedia.org/T393614#10945598 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin1003 for host cloudcephosd2006-dev.codfw.wmnet with OS bul...
[01:31:46] cloud-services-team (Hardware), DC-Ops, ops-codfw, SRE: Q4:rack/setup/install cloudcephosd200[567]-dev - https://phabricator.wikimedia.org/T393614#10945599 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin1003 for host cloudcephosd2007-dev.codfw.wmnet with OS bul...
[01:35:26] cloud-services-team (Hardware), DC-Ops, ops-codfw, SRE: Q4:rack/setup/install cloudcephosd200[567]-dev - https://phabricator.wikimedia.org/T393614#10945602 (Jhancock.wm) Open→Resolved
[01:35:47] cloud-services-team (Hardware), DC-Ops, ops-codfw, SRE: Q4:rack/setup/install cloudcephosd200[567]-dev - https://phabricator.wikimedia.org/T393614#10945605 (Jhancock.wm) @Andrew done!
[02:07:26] !log andrew@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-38
[02:18:03] !log andrew@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-38
[02:22:15] cloud-services-team, Cloud-VPS: Recurring issues with cinder volumes attaching and detaching - https://phabricator.wikimedia.org/T397795#10945636 (Andrew) Here is an example of what it looks like on a cloudvirt when I try to attach one of these volumes. I think what should happen is: 1) nova-api receive...
[02:24:56] cloud-services-team, Cloud-VPS: Recurring issues with cinder volumes attaching and detaching - https://phabricator.wikimedia.org/T397795#10945637 (Andrew) ...and here it is, in cinder-api logs on cloudcontrol1007: ` 2025-06-25 02:17:14.766 421948 ERROR cinder.api.v3.attachments [req-72f834c3-ab39-4c60-...
[02:25:44] cloud-services-team, Cloud-VPS: Recurring issues with cinder volumes attaching and detaching - https://phabricator.wikimedia.org/T397795#10945647 (Andrew) I'm attached to the idea that that timeout is happening because cinder-api thinks that a no-longer-existing cloudcontrol is in charge of the volume, a...
[02:40:11] cloud-services-team, Cloud-VPS: Recurring issues with cinder volumes attaching and detaching - https://phabricator.wikimedia.org/T397795#10945670 (Andrew) When I attach/detach an uncursed volume I see a message about it in the cinder-volume log. When I do the same with a cursed volume, just silence.
[02:44:37] cloud-services-team, Cloud-VPS: Recurring issues with cinder volumes attaching and detaching - https://phabricator.wikimedia.org/T397795#10945681 (Andrew) Suspicious fact: some of the IPs here are for cloudcephmon hosts that don't exist anymore: ` mysql:root@localhost [cinder]> select connection_info f...
[02:51:07] cloud-services-team, Cloud-VPS: Recurring issues with cinder volumes attaching and detaching - https://phabricator.wikimedia.org/T397795#10945683 (Andrew) ...I think I made a typo when moving some volumes from cloudcontrol1005 to cloudcontrol1011. I see invalid host files. Fixing with... update volumes...
[02:51:19] FIRING: HighIOWaitStalling: High iowait detected on clouddumps1002:9100. - https://wikitech.wikimedia.org/wiki/Portal:Data_Services/Admin/Shared_storage#Dumps - https://grafana.wikimedia.org/d/000000568/wmcs-dumps-general-view - https://alerts.wikimedia.org/?q=alertname%3DHighIOWaitStalling
[02:53:05] cloud-services-team, Cloud-VPS: 'mwcurator' volume in the 'mwoffliner' project cannot be attached to anything - https://phabricator.wikimedia.org/T397796#10945686 (Andrew) Ok, now I think I've also unstuck 'mwcurator'. So @Benoit74 please pick one, attach and confirm that it works, then delete the other...
[02:56:19] RESOLVED: HighIOWaitStalling: High iowait detected on clouddumps1002:9100. - https://wikitech.wikimedia.org/wiki/Portal:Data_Services/Admin/Shared_storage#Dumps - https://grafana.wikimedia.org/d/000000568/wmcs-dumps-general-view - https://alerts.wikimedia.org/?q=alertname%3DHighIOWaitStalling
[02:57:04] FIRING: [2x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-14 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess
[03:07:04] RESOLVED: [2x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-14 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProce
[04:36:14] (open) chuckonwumelu: bash-completion: Add autocomplete for creating a config [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/41 (https://phabricator.wikimedia.org/T395077)
[04:39:23] (update) chuckonwumelu: bash-completion: Add autocomplete for creating a config [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/41 (https://phabricator.wikimedia.org/T395077)
[06:37:21] cloud-services-team, Cloud-VPS: 'mwcurator' volume in the 'mwoffliner' project cannot be attached to anything - https://phabricator.wikimedia.org/T397796#10945862 (Benoit74) What is kinda weird is that volume is still available from inside the VM ` $ lsblk NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT sda...
[08:06:32] (update) taavi: logging: Fix path to get_secret.sh [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/823 (https://phabricator.wikimedia.org/T386480)
[08:06:32] (update) taavi: logging: loki: Add missing emptyDir mounts in toolsbeta [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/824 (https://phabricator.wikimedia.org/T386480)
[08:06:33] (update) taavi: logging: loki: Set nameOverride [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/825 (https://phabricator.wikimedia.org/T386480)
[08:06:34] (update) taavi: logging: alloy: Fix loki write service name [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/826 (https://phabricator.wikimedia.org/T386480)
[08:06:34] (update) taavi: logging: loki: Add network policy rule for object storage access [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/827 (https://phabricator.wikimedia.org/T386480)
[08:06:40] (update) taavi: logging: loki: Add second Loki instance for infrastructure logs [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/834 (https://phabricator.wikimedia.org/T386480 https://phabricator.wikimedia.org/T97861)
[08:06:44] (update) taavi: logging: alloy: Add routing for infrastructure logs [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/835 (https://phabricator.wikimedia.org/T386480 https://phabricator.wikimedia.org/T97861)
[08:06:48] (open) taavi: logging: loki: Add second Loki instance for infrastructure logs [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/834 (https://phabricator.wikimedia.org/T386480 https://phabricator.wikimedia.org/T97861)
[08:06:52] (open) taavi: logging: alloy: Add routing for infrastructure logs [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/835 (https://phabricator.wikimedia.org/T386480 https://phabricator.wikimedia.org/T97861)
[08:07:00] (update) taavi: logging: loki: Add missing emptyDir mounts in toolsbeta [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/824 (https://phabricator.wikimedia.org/T386480)
[08:07:08] (update) taavi: logging: loki: Set nameOverride [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/825 (https://phabricator.wikimedia.org/T386480)
[08:07:12] (update) taavi: logging: loki: Add network policy rule for object storage access [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/827 (https://phabricator.wikimedia.org/T386480)
[08:07:16] (update) taavi: logging: alloy: Fix loki write service name [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/826 (https://phabricator.wikimedia.org/T386480)
[08:07:21] (update) taavi: logging: loki: Add second Loki instance for infrastructure logs [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/834 (https://phabricator.wikimedia.org/T386480 https://phabricator.wikimedia.org/T97861)
[08:07:24] (update) taavi: logging: alloy: Add routing for infrastructure logs [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/835 (https://phabricator.wikimedia.org/T386480 https://phabricator.wikimedia.org/T97861)
[08:07:29] (update) taavi: logging: Fix path to get_secret.sh [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/823 (https://phabricator.wikimedia.org/T386480)
[08:15:52] cloud-services-team, Toolforge, Patch-For-Review: [toolforge,infra] Centralized logging for Toolforge infrastructure logs - https://phabricator.wikimedia.org/T97861#10946047 (dcaro)
[08:20:19] (merge) dcaro: generate: add new subcommand [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/38
[08:24:51] (open) dcaro: bump_version: use bookworm by default [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/42
[08:25:02] (approved) dcaro: bump_version: use bookworm by default [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/42
[08:25:07] (merge) dcaro: bump_version: use bookworm by default [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/42
[08:47:53] cloud-services-team, Data-Services, Data-Engineering, Data-Persistence, Privacy Engineering: Set up x1 replication to Wiki Replicas - https://phabricator.wikimedia.org/T395881#10946215 (fnegri) A use case for having x1 in Wiki Replicas was discussed in {T387419}: being able to "generate stati...
[09:12:28] FIRING: PuppetSyncFailure: Failed to update Puppet repository /srv/git/operations/puppet on instance tools-puppetserver-01 in project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetSyncFailure
[09:37:28] RESOLVED: PuppetSyncFailure: Failed to update Puppet repository /srv/git/operations/puppet on instance tools-puppetserver-01 in project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetSyncFailure
[09:50:22] cloud-services-team, Cloud-VPS, IPv6: Add IPv6 DNS recursor to v6-capable hosts - https://phabricator.wikimedia.org/T397822 (taavi) NEW
[09:50:45] (open) dcaro: d/changelog: bump to 0.0.9 [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/43 (https://phabricator.wikimedia.org/T394753)
[09:53:37] !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component components-cli
[09:55:59] !log dcaro@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-cli
[10:13:29] Toolforge (Toolforge iteration 21): [components-api] Add warning when keys of the tool config are not understood - https://phabricator.wikimedia.org/T397828 (dcaro) NEW
[10:13:38] Toolforge (Toolforge iteration 21): [components-api] Add warning when keys of the tool config are not understood - https://phabricator.wikimedia.org/T397828#10946559 (dcaro) p:Triage→High
[10:13:40] Toolforge (Toolforge iteration 21): [components-api] Add warning when keys of the tool config are not understood - https://phabricator.wikimedia.org/T397828#10946561 (dcaro) Open→In progress
[10:14:35] (open) dcaro: update_tool_config: Return a warning for each non-managed field [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/97 (https://phabricator.wikimedia.org/T395070)
[10:15:29] (update) taavi: logging: Fix path to get_secret.sh [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/823 (https://phabricator.wikimedia.org/T386480)
[10:15:30] (update) taavi: logging: loki: Add missing emptyDir mounts in toolsbeta [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/824 (https://phabricator.wikimedia.org/T386480)
[10:15:31] (update) taavi: logging: loki: Set nameOverride [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/825 (https://phabricator.wikimedia.org/T386480)
[10:15:31] (open) taavi: logging: alloy: Allow running on the entire cluster [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/836 (https://phabricator.wikimedia.org/T97861)
[10:15:31] (update) taavi: logging: alloy: Fix loki write service name [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/826 (https://phabricator.wikimedia.org/T386480)
[10:15:33] (update) taavi: logging: loki: Add network policy rule for object storage access [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/827 (https://phabricator.wikimedia.org/T386480)
[10:15:37] (update) taavi: logging: loki: Add second Loki instance for infrastructure logs [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/834 (https://phabricator.wikimedia.org/T386480 https://phabricator.wikimedia.org/T97861)
[10:15:41] (update) taavi: logging: alloy: Add routing for infrastructure logs [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/835 (https://phabricator.wikimedia.org/T386480 https://phabricator.wikimedia.org/T97861)
[10:15:45] (update) taavi: logging: alloy: Allow running on the entire cluster [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/836 (https://phabricator.wikimedia.org/T97861)
[10:15:53] (update) taavi: logging: alloy: Allow running on the entire cluster [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/836 (https://phabricator.wikimedia.org/T97861)
[10:30:45] (update) dcaro: update_tool_config: Return a warning for each non-managed field [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/97 (https://phabricator.wikimedia.org/T395070)
[10:52:48] (PS1) Klausman: hiera/k8s: Add missing :prod suffix to machinetranslation S3 credentials [labs/private] - https://gerrit.wikimedia.org/r/1163739 (https://phabricator.wikimedia.org/T335491)
[10:53:08] (CR) Klausman: [V:+2 C:+2] hiera/k8s: Add missing :prod suffix to machinetranslation S3 credentials [labs/private] - https://gerrit.wikimedia.org/r/1163739 (https://phabricator.wikimedia.org/T335491) (owner: Klausman)
[11:06:19] (update) fnegri: openapi: add examples and docs for the config model [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/91 (owner: dcaro)
[11:11:23] !log dcaro@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component components-cli
[11:13:17] (PS2) Arendpieter: Remove support for SUL 'realname' field. [labs/striker] - https://gerrit.wikimedia.org/r/1163331 (https://phabricator.wikimedia.org/T384206)
[11:14:00] !log dcaro@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-cli
[11:14:50] (CR) Arendpieter: Remove support for SUL 'realname' field. (1 comment) [labs/striker] - https://gerrit.wikimedia.org/r/1163331 (https://phabricator.wikimedia.org/T384206) (owner: Arendpieter)
[11:21:46] cloud-services-team, Toolforge: [components-api] Provide a standalone version of tool config schema - https://phabricator.wikimedia.org/T397724#10946714 (taavi) I [[ https://wikitech.wikimedia.org/w/index.php?diff=2317372&oldid=2316929 | changed the docs ]] to recommend placing the deployment config to a...
[11:39:19] (approved) dcaro: d/changelog: bump to 0.0.9 [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/43 (https://phabricator.wikimedia.org/T394753)
[11:39:23] (merge) dcaro: d/changelog: bump to 0.0.9 [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/43 (https://phabricator.wikimedia.org/T394753)
[11:45:30] (update) dcaro: components: add test for the generate feature [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/822
[11:59:24] (update) dcaro: openapi: add examples and docs for the config model [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/91
[12:00:54] (update) dcaro: openapi: add examples and docs for the config model [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/91
[12:01:41] (update) dcaro: openapi: add examples and docs for the config model [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/91
[12:35:03] (update) dcaro: openapi: add examples and docs for the config model [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/91
[12:36:10] (close) dcaro: [components.tool_router] add example config endpoint [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/82 (https://phabricator.wikimedia.org/T394753) (owner: raymond-ndibe)
[12:36:17] (update) dcaro: build: fail if ref failed to resolve [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/96
[12:36:22] (update) dcaro: deploy_task: store error when build fails [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/92
[12:36:29] (update) dcaro: deploy: add all the missing options for continuous job [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/93 (https://phabricator.wikimedia.org/T395070)
[12:36:38] (update) dcaro: update_tool_config: Return a warning for each non-managed field [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/97 (https://phabricator.wikimedia.org/T395070)
[12:37:11] (update) dcaro: builds: handle long_status [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/39
[12:38:17] (update) dcaro: builds: handle long_status [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/39
[12:38:23] (update) dcaro: builds: handle long_status [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/39
[12:39:29] (update) dcaro: builds: handle long_status [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/39
[12:39:59] (update) dcaro: deploy: add all the missing options for continuous job [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/93 (https://phabricator.wikimedia.org/T395070)
[12:40:06] (update) dcaro: deploy_task: store error when build fails [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/92
[12:40:11] (update) dcaro: build: fail if ref failed to resolve [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/96
[12:40:17] (update) dcaro: openapi: add examples and docs for the config model [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/91
[12:41:32] (close) dcaro: [components-cli] print example config [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/34 (https://phabricator.wikimedia.org/T394753) (owner: raymond-ndibe)
[12:47:11] (open) dcaro: tool-config: export the config schema [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/98 (https://phabricator.wikimedia.org/T397724)
[12:48:24] (update) dcaro: openapi: add examples and docs for the config model [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/91
[12:48:25] (update) dcaro: openapi: add examples and docs for the config model [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/91
[12:49:10] (approved) fnegri: openapi: add examples and docs for the config model [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/91 (owner: dcaro)
[12:50:05] (merge) dcaro: openapi: add examples and docs for the config model [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/91
[12:52:43] (open) group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: components-api: bump to 0.0.123-20250625125026-5a83e249 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/837
[13:04:12] (update) dcaro: deploy_task: store error when build fails [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/92
[13:05:33] (update) dcaro: update_tool_config: Return a warning for each non-managed field [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/97 (https://phabricator.wikimedia.org/T395070)
[13:06:11] (update) dcaro: tool-config: export the config schema [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/98 (https://phabricator.wikimedia.org/T397724)
[13:22:03] (update) dcaro: tool-config: export the config schema [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/98 (https://phabricator.wikimedia.org/T397724)
[13:23:27] cloud-services-team, Toolforge, Patch-For-Review: [components-api] Provide a standalone version of tool config schema - https://phabricator.wikimedia.org/T397724#10947040 (dcaro) @taavi is this what you meant? https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/98/diffs
[13:24:13] cloud-services-team, Toolforge, Patch-For-Review: [components-api] Provide a standalone version of tool config schema - https://phabricator.wikimedia.org/T397724#10947041 (dcaro) I would wait a bit for the schema to stabilize before uploading to the store though
[13:24:45] (update) dcaro: tool-config: export the config schema [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/98 (https://phabricator.wikimedia.org/T397724)
[13:25:36] (update) dcaro: deploy_task: store error when build fails [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/92
[13:27:57] Tool-translatetagger: Create Gadget to Simplify Workflow for Adding Translation Tag - https://phabricator.wikimedia.org/T393170#10947050 (TiagoLubiana) Hi @Gopavasanth ! I need to tag some page for translation and I'd love to use your script. There is nothing currently in https://www.mediawiki.org/wiki/User:...
[13:29:27] Tool-translatetagger: Create Gadget to Simplify Workflow for Adding Translation Tag - https://phabricator.wikimedia.org/T393170#10947070 (TiagoLubiana) I see https://translatetagger.toolforge.org too — I guess I'll use this now.
[13:38:56] (03update) 10chuckonwumelu: bash-completion: Add autocomplete for creating a config [repos/cloud/toolforge/components-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/41 (https://phabricator.wikimedia.org/T395077) [13:40:26] 06cloud-services-team, 10Toolforge (Toolforge iteration 21), 13Patch-For-Review: [components-api] Provide a standalone version of tool config schema - https://phabricator.wikimedia.org/T397724#10947132 (10dcaro) [13:40:35] 06cloud-services-team, 10Toolforge (Toolforge iteration 21), 13Patch-For-Review: [components-api] Provide a standalone version of tool config schema - https://phabricator.wikimedia.org/T397724#10947135 (10dcaro) a:03dcaro [13:40:51] 06cloud-services-team, 10Toolforge (Toolforge iteration 21), 13Patch-For-Review: [components-api] Provide a standalone version of tool config schema - https://phabricator.wikimedia.org/T397724#10947136 (10dcaro) p:05Triage→03Medium [13:41:02] 06cloud-services-team, 10Toolforge (Toolforge iteration 21), 13Patch-For-Review: [components-api] Provide a standalone version of tool config schema - https://phabricator.wikimedia.org/T397724#10947138 (10dcaro) 05Open→03In progress [13:42:14] (03approved) 10fnegri: bash-completion: Add autocomplete for creating a config [repos/cloud/toolforge/components-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/41 (https://phabricator.wikimedia.org/T395077) (owner: 10chuckonwumelu) [13:43:45] (03merge) 10chuckonwumelu: bash-completion: Add autocomplete for creating a config [repos/cloud/toolforge/components-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/41 (https://phabricator.wikimedia.org/T395077) [13:45:35] (03open) 10chuckonwumelu: d/changelog: bump to 0.0.10 [repos/cloud/toolforge/components-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/44 
(https://phabricator.wikimedia.org/T395077)
[13:46:47] !log chuckonwumelu@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component components-cli
[13:49:12] !log chuckonwumelu@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-cli
[13:50:20] !log chuckonwumelu@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component components-cli
[13:52:45] !log chuckonwumelu@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-cli
[14:03:16] (merge) chuckonwumelu: d/changelog: bump to 0.0.10 [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/44 (https://phabricator.wikimedia.org/T395077)
[14:15:12] Toolforge (Toolforge iteration 21), good first task, Patch-For-Review: [components-cli] bash autocomplete does not autocomplete file name when creating config - https://phabricator.wikimedia.org/T395077#10947361 (Chuckonwumelu) In progress→Resolved
[14:15:42] cloud-services-team, Data-Services, Tool-wikiloves: Frequent "Aborted connections" to Wiki Replicas - https://phabricator.wikimedia.org/T395952#10947362 (fnegri) p:Triage→Low
[14:16:30] cloud-services-team, Cloud-VPS: Import Fedora CoreOS 42 image for use with Magnum - https://phabricator.wikimedia.org/T396912#10947370 (joanna_borun) p:Triage→Low
[14:17:47] cloud-services-team, Cloud-VPS, Continuous-Integration-Infrastructure (Zuul upgrade): ZuulDevOpsBot user can create but not delete a cluster template - https://phabricator.wikimedia.org/T396932#10947384 (joanna_borun) p:Triage→Medium
[14:18:17] cloud-services-team, Toolforge: New upstream release for Pywikibot - https://phabricator.wikimedia.org/T396933#10947385 (joanna_borun) p:Triage→Medium
[14:19:15] cloud-services-team, Cloud-VPS,
Cloud-Services-Origin-Team, Cloud-Services-Worktype-Maintenance, Goal: cloudcephosd1xxxx.private.eqiad.wikimedia.cloud - https://phabricator.wikimedia.org/T396940#10947398 (dcaro) p:Triage→Medium
[14:20:43] cloud-services-team, Cloud-VPS, Epic, IPv6: Replace remaining IPv4 NAT exemptions by IPv6 adoption - https://phabricator.wikimedia.org/T396986#10947408 (joanna_borun) p:Triage→Low
[14:21:03] cloud-services-team, Striker: Striker LibUp runs failing due to weird handling of .dockerignore - https://phabricator.wikimedia.org/T397044#10947409 (taavi) p:Triage→Medium
[14:21:58] cloud-services-team, Cloud-VPS, IPv6: Add IPv6 DNS recursor to v6-capable hosts - https://phabricator.wikimedia.org/T397822#10947423 (taavi) p:Triage→Medium
[14:22:08] cloud-services-team: NovafullstackSustainedFailures Novafullstack tests have been failing for more than 5hours in eqiad - https://phabricator.wikimedia.org/T397557#10947424 (dcaro) Open→Resolved a:dcaro Fixed
[14:24:47] cloud-services-team, Toolforge (Toolforge iteration 21): [tools-static,infra] NFS issues should not bring tools-static down - https://phabricator.wikimedia.org/T397634#10947453 (joanna_borun) p:Triage→Medium
[14:25:32] cloud-services-team, Toolforge: [components-api] Deployment token should not be a GET param - https://phabricator.wikimedia.org/T397712#10947459 (joanna_borun) p:Triage→Medium
[14:26:18] cloud-services-team: MetricsinfraAlertmanagerDown Metricsinfra alertmanager is unreachable # page - https://phabricator.wikimedia.org/T397782#10947463 (joanna_borun) Open→Resolved
[14:26:59] cloud-services-team: MetricsinfraAlertmanagerDown Metricsinfra alertmanager is unreachable # page - https://phabricator.wikimedia.org/T397782#10947474 (fnegri) Not firing anymore. Likely caused by {T397563}.
[14:27:20] cloud-services-team, Cloud-VPS: 'mwcurator' volume in the 'mwoffliner' project cannot be attached to anything - https://phabricator.wikimedia.org/T397796#10947479 (joanna_borun) p:Triage→High
[14:28:19] cloud-services-team, Striker: Use IDP for authentication in Striker - https://phabricator.wikimedia.org/T359554#10947482 (joanna_borun) p:Triage→Medium
[14:28:23] cloud-services-team, Striker, Patch-For-Review: Stop trying to store MW real name in Striker - https://phabricator.wikimedia.org/T384206#10947483 (joanna_borun) p:Triage→Medium
[14:35:02] cloud-services-team, Toolforge, Wikidata, Wikidata-Query-Service, Discovery-Search (2025.06.13 - 2025.07.04): Problem with SPARQL endpoint response and crawling on Toolforge - https://phabricator.wikimedia.org/T397570#10947520 (Fnielsen) Open→Resolved a:Fnielsen Thanks for the...
[14:36:07] Toolforge (Toolforge iteration 21): [components-cli,toolforge-cli] add shortcuts to top-level cli for deploy/config - https://phabricator.wikimedia.org/T397725#10947527 (dcaro) p:Triage→Low
[14:43:28] cloud-services-team, Cloud-VPS, IPv6: Add IPv6 DNS recursor to v6-capable hosts - https://phabricator.wikimedia.org/T397822#10947570 (taavi) a:taavi→None
[14:52:06] cloud-services-team, Cloud-VPS, Continuous-Integration-Infrastructure (Zuul upgrade): ZuulDevOpsBot user can create but not delete a cluster template - https://phabricator.wikimedia.org/T396932#10947617 (bd808)
[15:12:53] (approved) dcaro: logging: Fix path to get_secret.sh [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/823 (https://phabricator.wikimedia.org/T386480) (owner: taavi)
[15:18:06] (approved) dcaro: logging: alloy: Fix loki write service name [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/826
(https://phabricator.wikimedia.org/T386480) (owner: taavi)
[15:28:10] cloud-services-team, Toolforge: Lock down tools-sgebastion-10 (login-buster.toolforge.org) to only members of tools with known dependencies on it - https://phabricator.wikimedia.org/T397459#10947740 (-jem-) Thanks for the ping. I'm migrating my automatic tasks that require direct shell access to PHP...
[16:10:14] cloud-services-team, Toolforge: [infra] Reports of slow connectivity from APAC - https://phabricator.wikimedia.org/T395135#10947873 (fnegri) p:Triage→Low I will keep this task open as "Low" priority, as I don't see any clearly actionable solutions. I agree with @cmooney that our US infra does no...
[16:31:25] cloud-services-team, Toolforge: [infra] Reports of slow connectivity from APAC - https://phabricator.wikimedia.org/T395135#10947976 (dcaro) @Nokib_Sarkar An interesting test could be to tunnel through your droplet to reach toolforge, like (untested): ` ssh -L 127.0.0.1:1234:login.toolforge.org:22 campwi...
[16:32:48] (open) dcaro: cancel: add endpoint to cancel an ongoing deployment [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/99 (https://phabricator.wikimedia.org/T395039)
[16:42:05] (update) taavi: logging: loki: Add missing emptyDir mounts in toolsbeta [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/824 (https://phabricator.wikimedia.org/T386480)
[16:48:19] (update) dcaro: cancel: add endpoint to cancel an ongoing deployment [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/99 (https://phabricator.wikimedia.org/T395039)
[16:53:18] (update) dcaro: cancel: add endpoint to cancel an ongoing deployment [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/99 (https://phabricator.wikimedia.org/T395039)
[16:57:26] cloud-services-team, Toolforge: [infra] Reports of slow connectivity from APAC - https://phabricator.wikimedia.org/T395135#10948050 (fnegri) > We have https://network-tests.toolforge.org/ TIL! That's perfect for testing. Example from my location (Milan, Italy): ` ~ $ curl -o /dev/null https://network-t...
[16:58:02] (update) dcaro: cancel: add endpoint to cancel an ongoing deployment [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/99 (https://phabricator.wikimedia.org/T395039)
[17:09:20] (update) dcaro: cancel: add endpoint to cancel an ongoing deployment [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/99 (https://phabricator.wikimedia.org/T395039)
[17:09:52] Cloud-VPS (Project-requests): Request creation of torrents VPS project - https://phabricator.wikimedia.org/T397861 (TheresNoTime) NEW
[17:15:28] (open) dcaro: cancel: add new subcommand to cancel a deployment [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/45
[17:16:00] (update) dcaro: cancel: add new subcommand to cancel a deployment [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/45
[17:17:18] (update) dcaro: cancel: add new subcommand to cancel a deployment [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/45
[17:21:53] (update) taavi: logging: Fix path to get_secret.sh [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/823 (https://phabricator.wikimedia.org/T386480)
[17:23:40] !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component components-api
[17:24:26] (update) taavi: logging: Fix path to get_secret.sh [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/823 (https://phabricator.wikimedia.org/T386480)
[17:25:06] (merge) taavi: logging: Fix path to get_secret.sh
[repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/823 (https://phabricator.wikimedia.org/T386480)
[17:25:07] (update) taavi: logging: loki: Add missing emptyDir mounts in toolsbeta [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/824 (https://phabricator.wikimedia.org/T386480)
[17:25:35] (update) taavi: logging: loki: Add missing emptyDir mounts in toolsbeta [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/824 (https://phabricator.wikimedia.org/T386480)
[17:25:36] (update) taavi: logging: loki: Set nameOverride [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/825 (https://phabricator.wikimedia.org/T386480)
[17:25:36] (update) taavi: logging: alloy: Fix loki write service name [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/826 (https://phabricator.wikimedia.org/T386480)
[17:25:37] (update) taavi: logging: loki: Add network policy rule for object storage access [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/827 (https://phabricator.wikimedia.org/T386480)
[17:25:37] (update) taavi: logging: loki: Add second Loki instance for infrastructure logs [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/834 (https://phabricator.wikimedia.org/T386480 https://phabricator.wikimedia.org/T97861)
[17:25:41] (update) taavi: logging: alloy: Add routing for infrastructure logs [repos/cloud/toolforge/toolforge-deploy] -
https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/835 (https://phabricator.wikimedia.org/T386480 https://phabricator.wikimedia.org/T97861)
[17:25:45] (update) taavi: logging: alloy: Allow running on the entire cluster [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/836 (https://phabricator.wikimedia.org/T97861)
[17:25:51] (update) taavi: logging: alloy: Add routing for infrastructure logs [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/835 (https://phabricator.wikimedia.org/T386480 https://phabricator.wikimedia.org/T97861)
[17:25:59] (update) taavi: logging: loki: Add missing emptyDir mounts in toolsbeta [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/824 (https://phabricator.wikimedia.org/T386480)
[17:26:04] (update) taavi: logging: loki: Set nameOverride [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/825 (https://phabricator.wikimedia.org/T386480)
[17:26:08] (update) taavi: logging: alloy: Allow running on the entire cluster [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/836 (https://phabricator.wikimedia.org/T97861)
[17:26:12] (update) taavi: logging: loki: Add network policy rule for object storage access [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/827 (https://phabricator.wikimedia.org/T386480)
[17:26:16] (update) taavi: logging: alloy: Fix loki write service name [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/826
(https://phabricator.wikimedia.org/T386480)
[17:26:20] (update) taavi: logging: loki: Add second Loki instance for infrastructure logs [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/834 (https://phabricator.wikimedia.org/T386480 https://phabricator.wikimedia.org/T97861)
[17:26:24] (merge) taavi: logging: loki: Add missing emptyDir mounts in toolsbeta [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/824 (https://phabricator.wikimedia.org/T386480)
[17:26:28] (update) taavi: logging: loki: Set nameOverride [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/825 (https://phabricator.wikimedia.org/T386480)
[17:27:26] !log dcaro@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api
[17:32:26] !log dcaro@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component components-api
[17:35:27] (approved) fnegri: deploy_task: store error when build fails [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/92 (owner: dcaro)
[17:36:31] !log dcaro@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api
[17:36:41] (update) fnegri: deploy_task: store error when build fails [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/92 (owner: dcaro)
[17:41:05] (approved) dcaro: components-api: bump to 0.0.123-20250625125026-5a83e249 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/837 (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[17:41:08] (update) dcaro: components-api: bump to 0.0.123-20250625125026-5a83e249 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/837 (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[17:41:23] (merge) dcaro: components-api: bump to 0.0.123-20250625125026-5a83e249 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/837 (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[17:41:27] (merge) dcaro: deploy_task: store error when build fails [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/92
[17:44:29] (update) group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: components-api: bump to 0.0.124-20250625174144-2a13cbb7 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/838
[17:44:30] (open) group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: components-api: bump to 0.0.124-20250625174144-2a13cbb7 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/838
[17:53:58] !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component components-api
[17:54:20] cloud-services-team, Toolforge: [builds-cli] Show "image_name" in build details - https://phabricator.wikimedia.org/T397863 (fnegri) NEW
[17:54:36] cloud-services-team, Toolforge: [builds-cli] Show "image_name" in build details - https://phabricator.wikimedia.org/T397863#10948188 (fnegri)
[17:54:38] Toolforge (Toolforge iteration 20), good first task: [builds-api] populate the `image_name` for the builds returned - https://phabricator.wikimedia.org/T395035#10948189 (fnegri)
[17:54:54] cloud-services-team, Toolforge: Lock down tools-sgebastion-10
(login-buster.toolforge.org) to only members of tools with known dependencies on it - https://phabricator.wikimedia.org/T397459#10948190 (bd808) >>! In T397459#10947740, @-jem- wrote: > Thanks for the ping. I'm migrating my automatic tasks...
[17:58:10] !log dcaro@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api
[18:07:05] (open) chuckonwumelu: bash-completion: Add file system recognition to autocomplete [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/46 (https://phabricator.wikimedia.org/T395077)
[18:07:11] !log dcaro@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component components-api
[18:09:12] (update) chuckonwumelu: bash-completion: Add file system recognition to autocomplete [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/46 (https://phabricator.wikimedia.org/T395077)
[18:10:59] !log dcaro@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api
[18:12:32] (approved) dcaro: components-api: bump to 0.0.124-20250625174144-2a13cbb7 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/838 (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[18:12:36] (merge) dcaro: components-api: bump to 0.0.124-20250625174144-2a13cbb7 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/838 (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[18:37:27] cloud-services-team, Cloud-VPS (Quota-requests): Quota increase required for Catalyst - https://phabricator.wikimedia.org/T397716#10948321 (bd808) +1
[19:31:32] cloud-services-team (FY2024/2025-Q3-Q4), Cloud-VPS,
collaboration-services, GitLab (Infrastructure): Volume is stuck to deleted instance in devtools project - https://phabricator.wikimedia.org/T396739#10948470 (Andrew) Open→Resolved I think that gitlab-prod-backup is now unstuck, so c...
[19:42:07] !log andrew@cloudcumin1001 mwoffliner START - Cookbook wmcs.openstack.quota_increase (T397796)
[19:42:11] T397796: 'mwcurator' volume in the 'mwoffliner' project cannot be attached to anything - https://phabricator.wikimedia.org/T397796
[19:42:14] !log andrew@cloudcumin1001 mwoffliner END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0) (T397796)
[19:43:43] cloud-services-team, Cloud-VPS: 'mwcurator' volume in the 'mwoffliner' project cannot be attached to anything - https://phabricator.wikimedia.org/T397796#10948495 (Andrew) Open→Resolved a:Andrew I've deleted mwcurator1 and reverted the quota increase, so I think this can be closed. I don...
[19:48:04] PROBLEM - Host cloudvirt1046 is DOWN: PING CRITICAL - Packet loss = 100%
[19:49:50] FIRING: NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirt1046 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown
[19:51:34] RECOVERY - Host cloudvirt1046 is UP: PING OK - Packet loss = 0%, RTA = 0.35 ms
[20:06:37] PROBLEM - Host cloudvirt1047 is DOWN: PING CRITICAL - Packet loss = 100%
[20:09:05] RECOVERY - Host cloudvirt1047 is UP: PING OK - Packet loss = 0%, RTA = 0.35 ms
[20:09:49] FIRING: [2x] NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirt1046 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown
[20:14:49] FIRING: [2x] NeutronAgentDown: Neutron
neutron-openvswitch-agent on cloudvirt1046 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown
[20:23:22] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.bootstrap_and_add
[20:25:31] FIRING: PuppetAgentStaleLastRun: Last Puppet run was over 24 hours ago on instance cvn-app10 in project cvn - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[20:27:47] !log andrew@cloudcumin1001 admin END (ERROR) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=97)
[20:28:12] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.undrain_node
[20:28:16] !log andrew@cloudcumin1001 admin END (ERROR) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=97)
[20:28:19] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.undrain_node
[20:29:35] !log andrew@cloudcumin1001 admin END (ERROR) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=97)
[20:29:37] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.undrain_node
[20:29:50] RESOLVED: [2x] NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirt1046 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown
[20:29:51] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=0)
[20:31:10] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.drain_node
[20:32:01] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.drain_node (exit_code=0)
[20:33:12] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.depool_and_destroy
[20:33:13] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.ceph.osd.depool_and_destroy (exit_code=99)
[20:33:20] FIRING: [2x] NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirt1047 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown
[20:33:27] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.depool_and_destroy
[20:33:50] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.depool_and_destroy (exit_code=0)
[20:34:30] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.bootstrap_and_add
[20:35:36] !log andrew@cloudcumin1001 admin END (ERROR) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=97)
[20:35:43] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.bootstrap_and_add
[20:35:47] !log andrew@cloudcumin1001 admin END (ERROR) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=97)
[20:36:03] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.bootstrap_and_add
[20:38:19] RESOLVED: NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirt1047 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown
[20:43:10] andrew@cloudcumin1001 bootstrap_and_add (PID 263498) is awaiting input
[20:46:32] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=0)
[21:02:19] FIRING: [2x] NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirt1048 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown
[21:10:23] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.bootstrap_and_add
[21:10:23] !log
andrew@cloudcumin1001 admin END (ERROR) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=97)
[21:10:28] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.bootstrap_and_add
[21:16:34] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=99)
[21:17:11] (CR) Arendpieter: [C:+1] labsauth: Write SUL account details to LDAP on registration [labs/striker] - https://gerrit.wikimedia.org/r/1076815 (https://phabricator.wikimedia.org/T148048) (owner: Majavah)
[21:17:19] FIRING: [2x] NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirt1049 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown
[21:17:36] (CR) Arendpieter: [C:+1] labsauth: Write SUL details to LDAP when updating linkage [labs/striker] - https://gerrit.wikimedia.org/r/1076816 (https://phabricator.wikimedia.org/T148048) (owner: Majavah)
[21:17:52] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.bootstrap_and_add
[21:22:53] !log andrew@cloudcumin1001 admin END (ERROR) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=97)
[21:32:20] RESOLVED: [2x] NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirt1049 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown
[21:37:00] FIRING: NovafullstackSustainedFailures: Novafullstack tests have been failing for more than 5hours in eqiad - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/NovafullstackSustainedFailures - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-nova-fullstack?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DNovafullstackSustainedFailures
[21:37:05]
cloud-services-team: NovafullstackSustainedFailures Novafullstack tests have been failing for more than 5hours in eqiad - https://phabricator.wikimedia.org/T397882 (phaultfinder) NEW
[21:37:49] FIRING: NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirt1051 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown
[21:42:49] FIRING: [2x] NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirt1050 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown
[21:57:48] cloud-services-team, Cloud-VPS: Un-attachable volume in account-creation-assistance, 'app-www' - https://phabricator.wikimedia.org/T397517#10948884 (Andrew) Open→Resolved I think I've now resolved the general case of this bug, so I'm going to delete app-www. You can delete accounts-appserver6...
[21:57:49] RESOLVED: NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirt1051 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown
[22:02:49] FIRING: [2x] NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirt1051 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown
[22:09:45] cloud-services-team, Cloud-VPS, VPS-Project-wikicommunityhealth: [cinder] Volume failing to attach/detach - https://phabricator.wikimedia.org/T392089#10948909 (Andrew) Open→Resolved I've deleted the 'frontdata' volume so now I'm returning this to the hands of its users :)
[22:11:31] cloud-services-team, Cloud-VPS: Recurring issues with cinder volumes attaching and detaching - https://phabricator.wikimedia.org/T397795#10948915 (Andrew) Open→Resolved a:Andrew Closing with an abundance of optimism
[22:17:49] RESOLVED: NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirt1052 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown
[22:18:28] FIRING: PuppetAgentNoResources: No Puppet resources found on instance syslog-server-audit02 on project cloudinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[22:21:54] FIRING: NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirt1053 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 -
https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown
[22:26:49] FIRING: [2x] NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirt1052 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown
[22:36:06] cloud-services-team, Cloud-VPS: Neutron policy does not allow the admin role to modify security groups - https://phabricator.wikimedia.org/T348582#10948955 (Andrew) a:Andrew→taavi @taavi, I suspect that this is resolved after multiple recent policy updates. Can you recheck and either close the ta...
[22:41:49] RESOLVED: NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirt1053 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown
[22:43:28] FIRING: [2x] PuppetAgentNoResources: No Puppet resources found on instance syslog-server-audit01 on project cloudinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[22:46:49] FIRING: [2x] NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirt1053 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown
[22:58:40] cloud-services-team (Hardware), DC-Ops, ops-codfw, SRE: Q4:rack/setup/install cloudcephosd200[567]-dev - https://phabricator.wikimedia.org/T393614#10948977 (Andrew) Currently we only have one NIC connected for each of these. Ports are scarce in that rack, so the plan (in too much detail) is:...
[23:00:15] 10cloud-services-team (Hardware), 06DC-Ops, 10ops-codfw, 06SRE: Q4:rack/setup/install cloudcephosd200[567]-dev - https://phabricator.wikimedia.org/T393614#10948980 (10Andrew) 05Resolved→03Open [23:01:49] FIRING: [2x] NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirt1054 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown [23:02:00] RESOLVED: NovafullstackSustainedFailures: Novafullstack tests have been failing for more than 5hours in eqiad - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/NovafullstackSustainedFailures - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-nova-fullstack?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DNovafullstackSustainedFailures [23:06:49] FIRING: [2x] NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirt1054 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown [23:21:49] FIRING: [3x] NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirt1054 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown [23:41:49] FIRING: [2x] NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirt1056 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown [23:56:49] RESOLVED: [2x] NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirt1056 is down - 
https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown [23:57:19] FIRING: [2x] NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirt1057 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown [23:58:28] FIRING: [2x] PuppetAgentNoResources: No Puppet resources found on instance syslog-server-audit01 on project cloudinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources