[00:03:07] (merge) bd808: Allow target path manipulation [toolforge-repos/containers-redirect] - https://gitlab.wikimedia.org/toolforge-repos/containers-redirect/-/merge_requests/4
[00:27:18] Tool-toolinfo-scraper: Port tool to build system and job framework - https://phabricator.wikimedia.org/T410472 (bd808) NEW
[00:40:29] Toolforge, tools-infrastructure-team: Reduce tool breakage over new ingress-nginx annotation validation rules - https://phabricator.wikimedia.org/T409474#11386777 (bd808) [x] jenkins-build-stats [x] toolforge-standards-committee [x] toolinfo-scraper [x] xn--9s9h I think that is all of them. @taavi would...
[00:44:56] (open) bd808: README: typo fix [toolforge-repos/containers-redirect] - https://gitlab.wikimedia.org/toolforge-repos/containers-redirect/-/merge_requests/5
[00:45:33] (merge) bd808: README: typo fix [toolforge-repos/containers-redirect] - https://gitlab.wikimedia.org/toolforge-repos/containers-redirect/-/merge_requests/5
[01:22:41] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1046.eqiad.wmnet'
[01:41:19] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=99) on host 'cloudvirt1046.eqiad.wmnet'
[01:45:49] FIRING: NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirt1074 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown
[01:48:49] FIRING: NeutronAgentDownForLong: Neutron neutron-openvswitch-agent on cloudvirt1074 has been down for more than 2h - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDownForLong
[01:50:56] !log andrew@cloudcumin1001 admin START
- Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1076.eqiad.wmnet'
[01:57:58] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1076.eqiad.wmnet'
[03:00:37] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1046.eqiad.wmnet'
[03:04:01] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=99) on host 'cloudvirt1046.eqiad.wmnet'
[03:04:10] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1050.eqiad.wmnet'
[03:21:47] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=99) on host 'cloudvirt1050.eqiad.wmnet'
[03:45:04] FIRING: ObjectStorageObjectQuotaFull: Object storage quota by 'objects' is 87.49% full for project tools - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/ObjectStorageObjectQuotaFull - https://grafana.wikimedia.org/d/7120b794-4638-49f5-bccd-9716efc60f24/wmcs-object-storage-quotas - https://alerts.wikimedia.org/?q=alertname%3DObjectStorageObjectQuotaFull
[07:45:04] FIRING: ObjectStorageObjectQuotaFull: Object storage quota by 'objects' is 88.2% full for project tools - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/ObjectStorageObjectQuotaFull - https://grafana.wikimedia.org/d/7120b794-4638-49f5-bccd-9716efc60f24/wmcs-object-storage-quotas - https://alerts.wikimedia.org/?q=alertname%3DObjectStorageObjectQuotaFull
[07:47:19] cloud-services-team, Cloud-VPS: cloudvirt1071 crash - https://phabricator.wikimedia.org/T410470#11387063 (taavi) SEL shows this, although the timing seems to be after the reboot: ` ------------------------------------------------------------------------------- Record: 7 Date/Time: 11/18/2025 23:26...
[07:55:13] Toolforge, tools-infrastructure-team: Reduce tool breakage over new ingress-nginx annotation validation rules - https://phabricator.wikimedia.org/T409474#11387070 (taavi) Open→Resolved It seems like you left some of the old ingress objects in place. I deleted those, and manually checked that...
[07:57:06] (open) taavi: Reapply "ingress-nginx: Update to chart-4.13.3 on tools" [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1085 (https://phabricator.wikimedia.org/T383516)
[07:57:10] (update) taavi: Reapply "ingress-nginx: Update to chart-4.13.3 on tools" [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1085 (https://phabricator.wikimedia.org/T383516)
[07:58:43] cloud-services-team, SRE: latest Trixie image (as of 2025-10-16) grub failure on R450 hardware - https://phabricator.wikimedia.org/T407586#11387076 (fgiunchedi)
[08:00:40] (approved) filippo: Reapply "ingress-nginx: Update to chart-4.13.3 on tools" [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1085 (https://phabricator.wikimedia.org/T383516) (owner: taavi)
[08:01:30] (merge) taavi: Reapply "ingress-nginx: Update to chart-4.13.3 on tools" [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1085 (https://phabricator.wikimedia.org/T383516)
[08:01:35] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component ingress-nginx
[08:04:09] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-nginx
[08:09:03] cloud-services-team (FY2025/26-Q1): [infra,haproxy,ingress] 2025-09-23 Ingress hitting the backend session limit and started replying with 5xxs -
https://phabricator.wikimedia.org/T405280#11387117 (taavi)
[08:09:04] cloud-services-team, Toolforge: HAProxy frontend session limit hit (repeat outage) - https://phabricator.wikimedia.org/T410463#11387116 (taavi)
[08:10:37] cloud-services-team, Toolforge: HAProxy frontend session limit hit (repeat outage) - https://phabricator.wikimedia.org/T410463#11387121 (taavi) Open→Resolved This outage is over and the particular issue was fixed with https://gerrit.wikimedia.org/r/c/operations/puppet/+/1206902. Follow-ups ar...
[08:30:25] cloud-services-team, Cloud-VPS: cloudvirt1071 crash - https://phabricator.wikimedia.org/T410470#11387157 (dcaro) p:Triage→High
[08:30:55] cloud-services-team (FY2025/26-Q1), Cloud-VPS: cloudvirt1071 crash - https://phabricator.wikimedia.org/T410470#11387159 (dcaro)
[08:30:57] cloud-services-team (FY2025/26-Q1), Cloud-VPS: cloudvirt1071 crash - https://phabricator.wikimedia.org/T410470#11387162 (dcaro)
[08:32:12] cloud-services-team, Cloud-VPS: Fix pacct rotation properly everywhere - https://phabricator.wikimedia.org/T410410#11387181 (dcaro) p:Triage→Low
[08:33:39] cloud-services-team, Toolforge, Patch-For-Review: Ensure ingress pods get scheduled on ingress nodes - https://phabricator.wikimedia.org/T410382#11387183 (dcaro) p:Triage→Medium
[08:34:50] cloud-services-team, Cloud-VPS, Infrastructure-Foundations, SRE, vm-requests: Site: codfw 1 VM request for codfw1dev CAS test/dev, hostname: cloudidp2001-dev - https://phabricator.wikimedia.org/T410294#11387197 (dcaro) p:Triage→Medium
[08:34:52] cloud-services-team, Cloud-VPS: [tofu-infra] tofu failing to retrieve DNS zones on codfw - https://phabricator.wikimedia.org/T410265#11387198 (dcaro) p:Triage→Medium
[08:35:14] cloud-services-team, Cloud-VPS: tofu-infra: add cinder volume types - https://phabricator.wikimedia.org/T410148#11387199 (dcaro) p:Triage→Medium
[08:35:35] cloud-services-team,
Toolforge: redis-cli is absent from tools bastion hosts - https://phabricator.wikimedia.org/T410102#11387200 (dcaro) p:Triage→Low
[08:35:52] cloud-services-team, Toolforge: [bastion] redis-cli is absent from tools bastion hosts - https://phabricator.wikimedia.org/T410102#11387201 (dcaro)
[08:36:10] cloud-services-team, Toolforge: [builds-api] support specifying tag in build - https://phabricator.wikimedia.org/T410058#11387203 (dcaro) p:Triage→Low
[08:36:26] cloud-services-team, Toolforge: [logs-api] `--follow` returns inconsistent/artificial log entries - https://phabricator.wikimedia.org/T410055#11387204 (dcaro) p:Triage→Medium
[08:36:39] cloud-services-team, Toolforge: [jobs-cli] provides no meaningful feedback for delete - https://phabricator.wikimedia.org/T410048#11387205 (dcaro) p:Triage→Low
[08:36:48] cloud-services-team, Toolforge: [jobs-cli] provides no meaningful feedback for restart - https://phabricator.wikimedia.org/T410046#11387206 (dcaro) p:Triage→Low
[08:37:18] cloud-services-team, Toolforge, Patch-Needs-Improvement: Use GitLab CI to upload packages to the toolsbeta repo - https://phabricator.wikimedia.org/T340180#11387207 (dcaro) p:Triage→Medium
[08:37:31] cloud-services-team, Cloud-VPS, CAS-SSO, Infrastructure-Foundations, Patch-For-Review: sso failure in codfw1dev (labtesthorizon.wikimedia.org) - https://phabricator.wikimedia.org/T409328#11387208 (dcaro) p:Triage→High
[08:37:52] cloud-services-team, Toolforge: Add paging alert if Toolforge HAProxy connection limit is reached - https://phabricator.wikimedia.org/T410421#11387209 (dcaro) p:Triage→High
[08:40:39] cloud-services-team, Toolforge: HAProxy frontend session limit hit (repeat outage) - https://phabricator.wikimedia.org/T410463#11387251 (DamianZaremba) I got another alert around 00:28 CET, will investigate when I have a moment, but had the same symptoms as this.
I can't subscribe to T410433, so rep...
[08:53:58] cloud-services-team, Toolforge: Add paging alert if Toolforge HAProxy connection limit is reached - https://phabricator.wikimedia.org/T410421#11387282 (dcaro) a:dcaro
[08:55:09] cloud-services-team, Toolforge: Add paging alert if Toolforge HAProxy connection limit is reached - https://phabricator.wikimedia.org/T410421#11387287 (dcaro)
[09:01:57] (update) taavi: kubernetes: Update HAProxy metrics for the new exporter [repos/cloud/toolforge/alerts] - https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/46 (https://phabricator.wikimedia.org/T343885)
[09:04:11] cloud-services-team, Toolforge: Add paging alert if Toolforge HAProxy connection limit is reached - https://phabricator.wikimedia.org/T410421#11387307 (dcaro) Related to {T343885}
[09:04:57] FIRING: PawsJupyterHubDown: PAWS JupyterHub is down https://wikitech.wikimedia.org/wiki/PAWS/Admin - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPawsJupyterHubDown
[09:05:28] FIRING: TargetDown: Job jupyterhub is unreachable in project paws instance hub-paws.wmcloud.org:443 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTargetDown
[09:29:32] !log dcaro@cloudcumin1001 paws START - Cookbook wmcs.vps.instance.force_reboot vm paws-127c-uwce57bvcgrt-node-1 (cluster eqiad1, project paws)
[09:29:35] !log dcaro@cloudcumin1001 paws END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm paws-127c-uwce57bvcgrt-node-1 (cluster eqiad1, project paws)
[09:31:23] !log dcaro@cloudcumin1001 paws START - Cookbook wmcs.vps.instance.force_reboot vm paws-127c-uwce57bvcgrt-node-2 (cluster eqiad1, project paws)
[09:31:25] !log dcaro@cloudcumin1001 paws END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm paws-127c-uwce57bvcgrt-node-2 (cluster eqiad1, project paws)
[09:32:33] !log dcaro@cloudcumin1001 paws START - Cookbook wmcs.vps.instance.force_reboot vm paws-127c-uwce57bvcgrt-node-4 (cluster eqiad1,
project paws)
[09:32:36] !log dcaro@cloudcumin1001 paws END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm paws-127c-uwce57bvcgrt-node-4 (cluster eqiad1, project paws)
[09:41:27] RESOLVED: PawsJupyterHubDown: PAWS JupyterHub is down https://wikitech.wikimedia.org/wiki/PAWS/Admin - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPawsJupyterHubDown
[09:41:58] RESOLVED: TargetDown: Job jupyterhub is unreachable in project paws instance hub-paws.wmcloud.org:443 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTargetDown
[10:15:25] dhinus opened https://github.com/toolforge/paws/pull/504
[10:16:56] cloud-services-team, PAWS: Update list of PAWS admins - https://phabricator.wikimedia.org/T397165#11387492 (fnegri) Open→In progress a:fnegri https://github.com/toolforge/paws/pull/504
[10:31:28] cloud-services-team (FY2025/26-Q1), PAWS: Update list of PAWS admins - https://phabricator.wikimedia.org/T397165#11387540 (fnegri) In progress→Resolved
[10:31:34] dhinus closed https://github.com/toolforge/paws/pull/504
[11:18:25] (approved) fnegri: kubernetes: Update HAProxy metrics for the new exporter [repos/cloud/toolforge/alerts] - https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/46 (https://phabricator.wikimedia.org/T343885) (owner: taavi)
[11:21:04] (update) taavi: kubernetes: Update HAProxy metrics for the new exporter [repos/cloud/toolforge/alerts] - https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/46 (https://phabricator.wikimedia.org/T343885)
[11:23:00] (merge) taavi: kubernetes: Update HAProxy metrics for the new exporter [repos/cloud/toolforge/alerts] - https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/46 (https://phabricator.wikimedia.org/T343885)
[11:28:14] FIRING: [9x] ToolforgeKubernetesHAproxyServerDown: Toolforge HAProxy server tools-elastic-4.tools.eqiad1.wikimedia.cloud is -
https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesHAproxyServerDown - https://grafana.wmcloud.org/d/toolforge-k8s-haproxy/toolforge-k8s-haproxy?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesHAproxyServerDown
[11:32:34] (open) taavi: kubernetes: haproxy: Specify job name [repos/cloud/toolforge/alerts] - https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/50 (https://phabricator.wikimedia.org/T343885)
[11:32:38] (update) taavi: kubernetes: haproxy: Specify job name [repos/cloud/toolforge/alerts] - https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/50 (https://phabricator.wikimedia.org/T343885)
[11:34:33] (update) taavi: kubernetes: haproxy: Specify job name [repos/cloud/toolforge/alerts] - https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/50 (https://phabricator.wikimedia.org/T343885)
[11:38:29] (approved) dcaro: kubernetes: haproxy: Specify job name [repos/cloud/toolforge/alerts] - https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/50 (https://phabricator.wikimedia.org/T343885) (owner: taavi)
[11:39:51] (merge) taavi: kubernetes: haproxy: Specify job name [repos/cloud/toolforge/alerts] - https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/50 (https://phabricator.wikimedia.org/T343885)
[11:45:04] FIRING: ObjectStorageObjectQuotaFull: Object storage quota by 'objects' is 88.97% full for project tools - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/ObjectStorageObjectQuotaFull - https://grafana.wikimedia.org/d/7120b794-4638-49f5-bccd-9716efc60f24/wmcs-object-storage-quotas - https://alerts.wikimedia.org/?q=alertname%3DObjectStorageObjectQuotaFull
[11:58:14] RESOLVED: [9x] ToolforgeKubernetesHAproxyServerDown: Toolforge HAProxy server tools-elastic-4.tools.eqiad1.wikimedia.cloud is -
https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesHAproxyServerDown - https://grafana.wmcloud.org/d/toolforge-k8s-haproxy/toolforge-k8s-haproxy?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesHAproxyServerDown
[12:05:49] Toolforge, tools-infrastructure-team: [infra,k8s] Upgrade ingress-nginx to v1.12.1+ - https://phabricator.wikimedia.org/T383516#11387762 (taavi) Open→Resolved
[13:04:35] cloud-services-team, Toolforge: Move all alerts to the toolforge/alerts git repo - https://phabricator.wikimedia.org/T410505 (fnegri) NEW
[13:05:31] cloud-services-team, Toolforge: Move all Toolforge alerts to the toolforge/alerts git repo - https://phabricator.wikimedia.org/T410505#11387984 (taavi)
[13:09:57] cloud-services-team, Toolforge: Move all Toolforge alerts to the toolforge/alerts git repo - https://phabricator.wikimedia.org/T410505#11387996 (fnegri)
[13:16:06] cloud-services-team, Cloud-VPS, VideoCutTool: [alerting] Create alerts for cloud-vps/VideoCutTool app - https://phabricator.wikimedia.org/T409668#11388009 (Gopavasanth) @Reputation22 I don't think we need to set alerts for the beta version, since it's mostly used for our internal testing. Please upda...
[13:17:37] cloud-services-team, Toolforge: Move all Toolforge alerts to the toolforge/alerts git repo - https://phabricator.wikimedia.org/T410505#11388014 (fnegri) The rules in the git repo are deployed to both tools prometheus and toolsbeta prometheus, but we can add `# deploy-tag: project-tools` (see [filtering i...
[13:24:05] cloud-services-team, Toolforge, Tools: Geohack tool frequently triggers the Toolforge front proxy's per-tool rate limit due to too much traffic - https://phabricator.wikimedia.org/T409185#11388043 (Magnus) p:Triage→Unbreak! I tried to restart the webservice as the Rust version (in `geohack` s...
[13:33:11] cloud-services-team, Toolforge, Tools: Geohack tool frequently triggers the Toolforge front proxy's per-tool rate limit due to too much traffic - https://phabricator.wikimedia.org/T409185#11388060 (taavi) p:Unbreak!→Triage
[13:34:32] cloud-services-team, Toolforge, Tools: Geohack tool frequently triggers the Toolforge front proxy's per-tool rate limit due to too much traffic - https://phabricator.wikimedia.org/T409185#11388063 (taavi) It seems like your `Procfile` is missing the actual command to execute? Try this: `lang=yaml web...
[13:47:41] cloud-services-team, Toolforge, Tools: Geohack tool frequently triggers the Toolforge front proxy's per-tool rate limit due to too much traffic - https://phabricator.wikimedia.org/T409185#11388109 (Magnus) Open→In progress thanks, found it myself :-)
[13:50:19] cloud-services-team, SRE, Patch-For-Review: latest Trixie image (as of 2025-10-16) grub failure on R450 hardware - https://phabricator.wikimedia.org/T407586#11388128 (fgiunchedi) Reported to Debian as https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1121006
[13:52:30] cloud-services-team, SRE, Patch-For-Review, Upstream: latest Trixie image (as of 2025-10-16) grub failure on R450 hardware - https://phabricator.wikimedia.org/T407586#11388143 (taavi)
[13:56:59] cloud-services-team, Cloud-VPS, VideoCutTool: [alerting] Create alerts for cloud-vps/VideoCutTool app - https://phabricator.wikimedia.org/T409668#11388172 (Reputation22) >>! In T409668#11388009, @Gopavasanth wrote: > @Reputation22 I don't think we need to set alerts for the beta version, since it's m...
[14:22:39] cloud-services-team, Toolforge, Tools: Geohack tool frequently triggers the Toolforge front proxy's per-tool rate limit due to too much traffic - https://phabricator.wikimedia.org/T409185#11388256 (taavi) It's too early to make full conclusions yet, but so far it's looking very promising. CPU usage i...
[14:41:35] cloud-services-team (FY2025/26-Q1), Toolforge, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Unplanned, Patch-For-Review: [promethus,haproxy] Move to haproxy internal metrics from haproxy_exporter - https://phabricator.wikimedia.org/T343885#11388361 (taavi) Open→Resolved
[14:41:52] cloud-services-team, Cloud-VPS, CAS-SSO, Infrastructure-Foundations, Patch-For-Review: sso failure in codfw1dev (labtesthorizon.wikimedia.org) - https://phabricator.wikimedia.org/T409328#11388363 (taavi) a:taavi→None
[15:11:43] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1050.eqiad.wmnet'
[15:12:56] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=99) on host 'cloudvirt1050.eqiad.wmnet'
[15:15:03] (PS1) Majavah: openstack: ensure_canary: Use Trixie for new instances [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1207176
[15:16:57] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.set_maintenance
[15:17:30] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.set_maintenance (exit_code=0)
[15:17:38] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.set_maintenance
[15:18:11] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.set_maintenance (exit_code=0)
[15:18:35] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.set_maintenance
[15:19:08] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.set_maintenance (exit_code=0)
[15:19:13] Tool-translatetagger: Add support for Wikitext Tables - https://phabricator.wikimedia.org/T374784#11388547 (Gopavasanth) Open→Resolved
[15:19:13] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.set_maintenance
[15:19:46] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook
wmcs.openstack.cloudvirt.set_maintenance (exit_code=0)
[15:20:07] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.set_maintenance
[15:20:08] Tools, PES1.3.3 WP25 Easter Eggs, Software-Licensing: wikipedia25-years-of-wikipedia tool loads and uses non-free JavaScript - https://phabricator.wikimedia.org/T410465#11388553 (Reedy)
[15:20:39] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.set_maintenance (exit_code=0)
[15:20:45] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1050.eqiad.wmnet'
[15:21:57] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=99) on host 'cloudvirt1050.eqiad.wmnet'
[15:22:03] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1046.eqiad.wmnet'
[15:25:23] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=99) on host 'cloudvirt1046.eqiad.wmnet'
[15:25:40] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1054.eqiad.wmnet'
[15:37:11] (update) volans: tracing: add tracing loki instance [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1040 (https://phabricator.wikimedia.org/T399313)
[15:40:30] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=99) on host 'cloudvirt1054.eqiad.wmnet'
[15:40:54] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1054.eqiad.wmnet'
[15:44:55] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=99) on host 'cloudvirt1054.eqiad.wmnet'
[15:45:01] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1061.eqiad.wmnet'
[15:45:04] FIRING:
ObjectStorageObjectQuotaFull: Object storage quota by 'objects' is 89.85% full for project tools - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/ObjectStorageObjectQuotaFull - https://grafana.wikimedia.org/d/7120b794-4638-49f5-bccd-9716efc60f24/wmcs-object-storage-quotas - https://alerts.wikimedia.org/?q=alertname%3DObjectStorageObjectQuotaFull
[15:46:18] !log volans@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component registry-admission
[15:54:18] !log volans@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission
[15:54:43] !log volans@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component logging
[16:03:10] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=99) on host 'cloudvirt1061.eqiad.wmnet'
[16:03:43] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1070.eqiad.wmnet'
[16:04:55] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=99) on host 'cloudvirt1070.eqiad.wmnet'
[16:09:57] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt2004-dev.codfw.wmnet'
[16:10:01] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=99) on host 'cloudvirt2004-dev.codfw.wmnet'
[16:10:43] !log volans@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component logging
[16:11:05] !log volans@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component infra-tracing
[16:11:11] !log volans@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component infra-tracing
[16:12:59] (update) volans: tracing: add tracing loki instance [repos/cloud/toolforge/toolforge-deploy] -
https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1040 (https://phabricator.wikimedia.org/T399313)
[16:13:23] (update) volans: tracing: add tracing loki instance [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1040 (https://phabricator.wikimedia.org/T399313)
[16:18:23] !log volans@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component infra-tracing
[16:23:36] !log volans@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component infra-tracing
[16:33:02] (update) volans: tracing: add tracing loki instance [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1040 (https://phabricator.wikimedia.org/T399313)
[16:33:36] !log volans@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component infra-tracing
[16:50:50] !log volans@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component infra-tracing
[17:15:19] RESOLVED: ObjectStorageObjectQuotaFull: Object storage quota by 'objects' is 90.02% full for project tools - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/ObjectStorageObjectQuotaFull - https://grafana.wikimedia.org/d/7120b794-4638-49f5-bccd-9716efc60f24/wmcs-object-storage-quotas - https://alerts.wikimedia.org/?q=alertname%3DObjectStorageObjectQuotaFull
[18:07:43] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1040.eqiad.wmnet'
[18:12:11] cloud-services-team, Toolforge, Tools: Geohack tool frequently triggers the Toolforge front proxy's per-tool rate limit due to too much traffic - https://phabricator.wikimedia.org/T409185#11389302 (Magnus) Fixed the CSS MIMEtype header. Seems to run smoothly on 1 CPU now.
Feel free to up that if it's...
[18:20:50] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=99) on host 'cloudvirt1040.eqiad.wmnet'
[18:26:03] Tool-containers: Document at containers.toolforge.org - https://phabricator.wikimedia.org/T368391#11389332 (bd808) Open→Resolved p:Triage→Medium a:bd808 Done using https://wikitech.wikimedia.org/wiki/Tool:Containers#Redirect_container (I'm also now watching this project so next time...
[18:31:59] Wikibugs: Wikibugs does not rejoin channels automatically following a BNC restart - https://phabricator.wikimedia.org/T410540 (bd808) NEW
[18:34:24] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1040.eqiad.wmnet'
[18:36:41] Wikibugs, MediaWiki-Platform-Team (Radar): Allow filtering of Gerrit events by file pattern - https://phabricator.wikimedia.org/T410103#11389406 (bd808) p:Triage→Medium
[18:37:03] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=99) on host 'cloudvirt1040.eqiad.wmnet'
[18:37:29] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1040.eqiad.wmnet'
[18:40:00] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=99) on host 'cloudvirt1040.eqiad.wmnet'
[18:40:33] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1040.eqiad.wmnet'
[18:43:10] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=99) on host 'cloudvirt1040.eqiad.wmnet'
[19:03:58] Wikibugs, MediaWiki-Platform-Team (Radar): Allow filtering of Gerrit events by file pattern - https://phabricator.wikimedia.org/T410103#11389565 (bd808) `ChannelFilter.channels_for` has a "selector" concept where callable can be used to apply additional tests to an event when deciding which channels to n...
[19:44:14] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm None (cluster eqiad1)
[19:44:14] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.vps.instance.stop_start (exit_code=99) vm None (cluster eqiad1)
[19:44:41] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm content-diff-index (cluster eqiad1)
[19:44:43] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.vps.instance.stop_start (exit_code=99) vm content-diff-index (cluster eqiad1)
[19:44:58] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.vps.instance.stop_start vm 764d92cd-09df-468a-9595-7bcfbc4a8841 (cluster eqiad1)
[19:45:34] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm 764d92cd-09df-468a-9595-7bcfbc4a8841 (cluster eqiad1)
[19:45:43] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1070.eqiad.wmnet'
[19:46:44] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1070.eqiad.wmnet'
[20:16:37] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt2004-dev.codfw.wmnet'
[20:27:43] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack on deployment codfw1dev for all services
[20:27:52] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.restart_openstack (exit_code=99) on deployment codfw1dev for all services
[20:28:06] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack on deployment codfw1dev for all services
[20:28:15] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.restart_openstack (exit_code=99) on deployment codfw1dev for all services
[20:28:46] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack on deployment codfw1dev for all services
[20:29:52] !log
andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=99) on host 'cloudvirt2004-dev.codfw.wmnet'
[20:32:16] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt2004-dev.codfw.wmnet'
[20:33:09] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0) on deployment codfw1dev for all services
[20:37:56] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt2004-dev.codfw.wmnet'
[20:38:56] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.safe_reboot on hosts matched by 'D{cloudvirt2004-dev.codfw.wmnet}'
[20:42:23] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.safe_reboot (exit_code=0) on hosts matched by 'D{cloudvirt2004-dev.codfw.wmnet}'
[20:44:26] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.safe_reboot on hosts matched by 'D{cloudvirt2005-dev.codfw.wmnet}'
[21:07:26] andrew@cloudcumin1001 safe_reboot (PID 777411) is awaiting input
[21:25:09] (PS1) Kamila Součková: hcaptcha_proxy: remove unused parameters [labs/private] - https://gerrit.wikimedia.org/r/1207265
[22:39:07] andrew@cloudcumin1001 safe_reboot (PID 777411) is awaiting input