[00:08:17] !log raymond-ndibe@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component logs-api
[00:11:09] (update) raymond-ndibe: logs-api: bump to 0.0.18-20260407233052-4b8ebdca [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1212 (https://phabricator.wikimedia.org/T422454) (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[00:11:12] (approved) raymond-ndibe: logs-api: bump to 0.0.18-20260407233052-4b8ebdca [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1212 (https://phabricator.wikimedia.org/T422454) (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[00:11:22] (merge) raymond-ndibe: logs-api: bump to 0.0.18-20260407233052-4b8ebdca [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1212 (https://phabricator.wikimedia.org/T422454) (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[00:26:22] VPS-project-Phabricator, collaboration-services, Infrastructure-Foundations, Mail: The test Phabricator instance doesn't seem to be successfully sending emails to @wikimedia.org addresses - https://phabricator.wikimedia.org/T422559#11797249 (Peachey88)
[00:46:48] (update) raymond-ndibe: common.yaml: set max_query_length [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1211 (https://phabricator.wikimedia.org/T422453)
[00:58:14] Tool-echo-chamber: Echo Chamber API error: You don't have permission - https://phabricator.wikimedia.org/T422318#11797276 (Legoktm) a:Legoktm The new OAuth consumer is pending approval: https://meta.wikimedia.org/wiki/Special:OAuthListConsumers/view/56797921b0a88d17f8ea5d338f04b11d (sorry for nerfing my...
[01:21:59] cloud-services-team, Toolforge, tools-platform-team: Running dotnet job fails on Toolforge because "24" builder stack changed the compiled binary output path - https://phabricator.wikimedia.org/T422224#11797303 (Hawkeye7) @dcaro Testing failed. This is not sufficient. My bots take command line...
[01:34:09] cloud-services-team, Toolforge, tools-platform-team: Running dotnet job fails on Toolforge because "24" builder stack changed the compiled binary output path - https://phabricator.wikimedia.org/T422224#11797309 (Hawkeye7) The entries in my jobs.yaml look like this: ` # announcements - name:...
[02:31:31] Tool-global-search: Global search fails with HTTP 500 - https://phabricator.wikimedia.org/T286388#11797347 (MusikAnimal) Open→Resolved a:MusikAnimal Okay, closing!
[02:41:00] Tool-refill: delete 33,000 unnecessary front end files from production - https://phabricator.wikimedia.org/T422441#11797362 (Novem_Linguae)
[02:46:24] cloud-services-team, Toolforge: [builds-builder,dotnet] migrate to Heroku buildpack for dotnet 10 - https://phabricator.wikimedia.org/T412653#11797370 (Hawkeye7) This has been provided with T380127 so it can be closed or merged now.
[03:05:29] Tool-refill: create a deploy.sh script for the front end - https://phabricator.wikimedia.org/T422570 (Novem_Linguae) NEW
[03:05:51] Tool-refill: figure out how to deploy the front end - https://phabricator.wikimedia.org/T422370#11797401 (Novem_Linguae)
[03:11:57] Tool-refill: refill continuous integration is broken - https://phabricator.wikimedia.org/T367026#11797407 (Novem_Linguae) I've disabled the 4 failing Python CI jobs because I don't like red X's that aren't caused by the patch itself. These should be fixed and re-enabled in the future when someone tackles the...
[03:23:48] cloud-services-team, Toolforge: Connection with `k8s.tools.eqiad1.wikimedia.cloud` hits SSL error - https://phabricator.wikimedia.org/T422538#11797419 (Nokib_Sarkar) Now it gave me ` Traceback (most recent call last): File "/usr/bin/toolforge-webservice", line 33, in sys.exit(load_entry_po...
[03:50:44] Tool-refill: apnews.com contains author "leer en espanol" - https://phabricator.wikimedia.org/T422572 (Novem_Linguae) NEW
[03:56:14] Tool-refill: reuters.com fails with confusing error message - https://phabricator.wikimedia.org/T422573 (Novem_Linguae) NEW
[04:15:46] Tool-refill: should use {{Cite news}} for bbc.co.uk - https://phabricator.wikimedia.org/T422574 (Novem_Linguae) NEW
[04:45:41] Tool-refill: figure out how to deploy the front end - https://phabricator.wikimedia.org/T422370#11797535 (Novem_Linguae)
[04:45:51] Tool-refill: figure out how to deploy the front end - https://phabricator.wikimedia.org/T422370#11797537 (Novem_Linguae) Open→Resolved a:Novem_Linguae
[05:03:47] !log tools.cluebotng-review Deployment failed: https://github.com/cluebotng/component-configs/actions/runs/24118645397 (https://github.com/cluebotng/component-configs/commits/908dd70b5972cca0c0dafbe50a0020547b833a4e)
[05:03:49] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-review/SAL
[06:38:00] Tool-refill: create a deploy.sh script for the front end - https://phabricator.wikimedia.org/T422570#11797704 (Alachuckthebuck) @Novem_Linguae, it shouldn’t be, @anticomposite said it was on the default python package. That might have been the backend though. Either way, the whole thing needs a serious updat...
[06:42:43] cloud-services-team, Cloud-VPS, Continuous-Integration-Infrastructure: Puppet fail to create volume group for ephemeral disk space when it is sda (instead of sdb) - https://phabricator.wikimedia.org/T422258#11797706 (hashar)
[06:43:54] Tool-refill: create a deploy.sh script for the front end - https://phabricator.wikimedia.org/T422570#11797707 (Novem_Linguae) Yeah, looks like (back end) refill-api's webservice uses python 3.7, and (front end) refill's webservice uses php 8.2. ` novemlinguae@tools-bastion-15:~$ become refill-api tools.refi...
[07:04:59] PROBLEM - Host cloudrabbit1001 is DOWN: PING CRITICAL - Packet loss = 100%
[07:08:41] FIRING: CloudVPSDesignateLeaks: Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[07:09:47] FIRING: NodeDown: Node cloudrabbit1001 is down. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/NodeDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudrabbit1001 - https://alerts.wikimedia.org/?q=alertname%3DNodeDown
[07:15:24] cloud-services-team, Cloud-VPS, Continuous-Integration-Infrastructure: Puppet fail to create volume group for ephemeral disk space when it is sda (instead of sdb) - https://phabricator.wikimedia.org/T422258#11797761 (hashar) I asked WMCS admins: ` lang=irc hashar: unfortunately the dri...
[07:18:56] FIRING: SystemdUnitDown: The service unit prometheus-node-textfile-wmcs-dnsleaks.service is in failed status on host cloudcontrol1007. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[07:27:29] RECOVERY - Host cloudrabbit1001 is UP: PING OK - Packet loss = 0%, RTA = 0.22 ms
[07:29:47] RESOLVED: NodeDown: Node cloudrabbit1001 is down. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/NodeDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudrabbit1001 - https://alerts.wikimedia.org/?q=alertname%3DNodeDown
[07:38:45] PROBLEM - Host cloudcontrol1011 is DOWN: PING CRITICAL - Packet loss = 100%
[07:39:10] FIRING: GaleraClusterSizeMismatch: Galera in eqiad1 has 2 nodes - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/GaleraClusterSizeMismatch - https://grafana.wikimedia.org/d/galera-cluster-summary/wmcs-openstack-eqiad-galera-cluster-summary - https://alerts.wikimedia.org/?q=alertname%3DGaleraClusterSizeMismatch
[07:39:22] FIRING: [15x] HAProxyBackendUnavailable: HAProxy service cinder-api_backend backend cloudcontrol1011.private.eqiad.wikimedia.cloud is DOWN - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[07:43:47] FIRING: NodeDown: Node cloudcontrol1011 is down. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/NodeDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1011 - https://alerts.wikimedia.org/?q=alertname%3DNodeDown
[07:43:56] RESOLVED: SystemdUnitDown: The service unit prometheus-node-textfile-wmcs-dnsleaks.service is in failed status on host cloudcontrol1007. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[07:44:10] FIRING: [2x] GaleraClusterSizeMismatch: Galera in eqiad1 has 2 nodes - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/GaleraClusterSizeMismatch - https://grafana.wikimedia.org/d/galera-cluster-summary/wmcs-openstack-eqiad-galera-cluster-summary - https://alerts.wikimedia.org/?q=alertname%3DGaleraClusterSizeMismatch
[07:48:56] FIRING: SystemdUnitDown: The service unit prometheus-node-textfile-wmcs-dnsleaks.service is in failed status on host cloudcontrol1007. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[07:50:17] FIRING: JobUnavailable: Reduced availability for job blackbox_http in cloud@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[07:55:17] RESOLVED: JobUnavailable: Reduced availability for job blackbox_http in cloud@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[07:58:17] RECOVERY - Host cloudcontrol1011 is UP: PING OK - Packet loss = 0%, RTA = 7.19 ms
[07:58:47] RESOLVED: NodeDown: Node cloudcontrol1011 is down. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/NodeDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1011 - https://alerts.wikimedia.org/?q=alertname%3DNodeDown
[07:59:10] RESOLVED: [2x] GaleraClusterSizeMismatch: Galera in eqiad1 has 2 nodes - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/GaleraClusterSizeMismatch - https://grafana.wikimedia.org/d/galera-cluster-summary/wmcs-openstack-eqiad-galera-cluster-summary - https://alerts.wikimedia.org/?q=alertname%3DGaleraClusterSizeMismatch
[07:59:22] RESOLVED: [15x] HAProxyBackendUnavailable: HAProxy service cinder-api_backend backend cloudcontrol1011.private.eqiad.wikimedia.cloud is DOWN - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[08:13:56] RESOLVED: SystemdUnitDown: The service unit prometheus-node-textfile-wmcs-dnsleaks.service is in failed status on host cloudcontrol1007. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[08:16:01] PROBLEM - Host cloudrabbit1001 is DOWN: PING CRITICAL - Packet loss = 100%
[08:19:22] cloud-services-team, Toolforge (Toolforge iteration 26), tools-platform-team: [builds-builder,dotnet] migrate to Heroku buildpack for dotnet 10 - https://phabricator.wikimedia.org/T412653#11797911 (dcaro) Open→Resolved a:dcaro
[08:20:29] RECOVERY - Host cloudrabbit1001 is UP: PING OK - Packet loss = 0%, RTA = 0.35 ms
[08:20:47] FIRING: NodeDown: Node cloudrabbit1001 is down. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/NodeDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudrabbit1001 - https://alerts.wikimedia.org/?q=alertname%3DNodeDown
[08:24:28] !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component api-gateway
[08:25:47] RESOLVED: NodeDown: Node cloudrabbit1001 is down. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/NodeDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudrabbit1001 - https://alerts.wikimedia.org/?q=alertname%3DNodeDown
[08:28:14] !log dcaro@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway
[08:31:13] !log dcaro@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component api-gateway
[08:35:40] !log dcaro@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway
[08:36:20] cloud-services-team, Toolforge: Connection with `k8s.tools.eqiad1.wikimedia.cloud` hits SSL error - https://phabricator.wikimedia.org/T422538#11797990 (dcaro) I have been trying to trigger this from my own tool, running `toolforge components config create config.yaml` in a loop for 1000 times, and was un...
[08:37:57] (approved) dcaro: api-gateway: bump to 0.0.91-20260407134903-c79d5988 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1209 (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[08:38:00] (update) dcaro: api-gateway: bump to 0.0.91-20260407134903-c79d5988 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1209 (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[08:38:27] (merge) dcaro: api-gateway: bump to 0.0.91-20260407134903-c79d5988 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1209 (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[08:38:32] (update) dcaro: build: Upgrade Poetry dependencies [repos/cloud/toolforge/logs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/logs-api/-/merge_requests/16 (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[08:48:56] FIRING: SystemdUnitDown: The service unit prometheus-node-textfile-wmcs-dnsleaks.service is in failed status on host cloudcontrol1007. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[09:13:56] RESOLVED: SystemdUnitDown: The service unit prometheus-node-textfile-wmcs-dnsleaks.service is in failed status on host cloudcontrol1007. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[09:15:56] FIRING: SystemdUnitDown: The service unit designate_floating_ip_ptr_records_updater.service is in failed status on host cloudcontrol1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1006 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[09:20:56] FIRING: [2x] SystemdUnitDown: The service unit designate_floating_ip_ptr_records_updater.service is in failed status on host cloudcontrol1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[09:30:56] FIRING: [2x] SystemdUnitDown: The service unit designate_floating_ip_ptr_records_updater.service is in failed status on host cloudcontrol1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[09:34:40] cloud-services-team, Cloud-VPS, Continuous-Integration-Infrastructure: Puppet fail to create volume group for ephemeral disk space when it is sda (instead of sdb) - https://phabricator.wikimedia.org/T422258#11798108 (hashar) > Stuff to find: where is `sdb` hardcoded? ;-] So we have in the labs_lvm mo...
[09:35:56] FIRING: [2x] SystemdUnitDown: The service unit designate_floating_ip_ptr_records_updater.service is in failed status on host cloudcontrol1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[09:37:21] Toolforge (Toolforge iteration 26), tools-platform-team, Upstream: [builds-builder,jobs-api,upstream] Calling nontrivial Procfile commands with arguments results in confusing error (“no such file or directory”) - https://phabricator.wikimedia.org/T356016#11798110 (dcaro) I suspect this has gotten wor...
[09:38:55] cloud-services-team (FY2025/2026-Q3-Q4), Cloud-VPS: Carry out controlled network switch down tests in cloud - https://phabricator.wikimedia.org/T417393#11798114 (fgiunchedi) Today cloudrabbit1001 and cloudcontrol1011 were tested: * rabbitmq itself performed as expected, i.e. all quorum queues were not l...
[09:45:56] FIRING: [2x] SystemdUnitDown: The service unit designate_floating_ip_ptr_records_updater.service is in failed status on host cloudcontrol1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[09:50:56] FIRING: [2x] SystemdUnitDown: The service unit designate_floating_ip_ptr_records_updater.service is in failed status on host cloudcontrol1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[10:08:09] cloud-services-team, Toolforge, tools-platform-team: Running dotnet job fails on Toolforge because "24" builder stack changed the compiled binary output path - https://phabricator.wikimedia.org/T422224#11798221 (dcaro) >>! In T422224#11797303, @Hawkeye7 wrote: > @dcaro Testing failed. This is not...
[10:08:47] Toolforge (Toolforge iteration 26), tools-platform-team, Upstream: [builds-builder,jobs-api,upstream] Calling nontrivial Procfile commands with arguments results in confusing error (“no such file or directory”) - https://phabricator.wikimedia.org/T356016#11798223 (dcaro) Sent a patch upstream (https:...
[10:14:08] Toolforge (Toolforge iteration 26), tools-platform-team, Upstream: [builds-builder,jobs-api,upstream] Calling nontrivial Procfile commands with arguments results in confusing error (“no such file or directory”) - https://phabricator.wikimedia.org/T356016#11798230 (dcaro) Open→In progress
[10:15:56] RESOLVED: SystemdUnitDown: The service unit prometheus-node-textfile-wmcs-dnsleaks.service is in failed status on host cloudcontrol1007. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[10:49:56] FIRING: SystemdUnitDown: The service unit prometheus-node-textfile-wmcs-dnsleaks.service is in failed status on host cloudcontrol1007. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[11:00:30] (update) fnegri: Add --diff-mode and remove --dry-run [repos/cloud/wikireplicas-utils] - https://gitlab.wikimedia.org/repos/cloud/wikireplicas-utils/-/merge_requests/10 (https://phabricator.wikimedia.org/T351637)
[11:00:30] (update) fnegri: Add summary with counts [repos/cloud/wikireplicas-utils] - https://gitlab.wikimedia.org/repos/cloud/wikireplicas-utils/-/merge_requests/11 (https://phabricator.wikimedia.org/T351637)
[11:00:51] (update) fnegri: Add summary with counts [repos/cloud/wikireplicas-utils] - https://gitlab.wikimedia.org/repos/cloud/wikireplicas-utils/-/merge_requests/11 (https://phabricator.wikimedia.org/T351637)
[11:00:54] (update) fnegri: Add --diff-mode and remove --dry-run [repos/cloud/wikireplicas-utils] - https://gitlab.wikimedia.org/repos/cloud/wikireplicas-utils/-/merge_requests/10 (https://phabricator.wikimedia.org/T351637)
[11:00:57] (update) fnegri: Replace only views that need updating [repos/cloud/wikireplicas-utils] - https://gitlab.wikimedia.org/repos/cloud/wikireplicas-utils/-/merge_requests/9 (https://phabricator.wikimedia.org/T351637)
[11:04:16] (update) fnegri: Add --diff-mode and remove --dry-run [repos/cloud/wikireplicas-utils] - https://gitlab.wikimedia.org/repos/cloud/wikireplicas-utils/-/merge_requests/10 (https://phabricator.wikimedia.org/T351637)
[11:08:56] FIRING: CloudVPSDesignateLeaks: Detected 29 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[11:14:56] RESOLVED: SystemdUnitDown: The service unit prometheus-node-textfile-wmcs-dnsleaks.service is in failed status on host cloudcontrol1007. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[11:18:14] VPS-project-Phabricator, collaboration-services, Infrastructure-Foundations, Mail: @wikimedia.org email addresses don't seem to be receiving emails sent by the test Phabricator instance - https://phabricator.wikimedia.org/T422559#11798466 (A_smart_kitten)
[11:18:56] FIRING: SystemdUnitDown: The service unit prometheus-node-textfile-wmcs-dnsleaks.service is in failed status on host cloudcontrol1007. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[11:33:56] FIRING: [2x] SystemdUnitDown: The service unit designate_floating_ip_ptr_records_updater.service is in failed status on host cloudcontrol1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[11:43:56] RESOLVED: [2x] SystemdUnitDown: The service unit designate_floating_ip_ptr_records_updater.service is in failed status on host cloudcontrol1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[11:44:15] VPS-project-Phabricator, collaboration-services, Infrastructure-Foundations, Mail: @wikimedia.org email addresses don't seem to be receiving emails sent by the test Phabricator instance - https://phabricator.wikimedia.org/T422559#11798615 (A_smart_kitten) Yeah, I guess it seems like this might po...
[11:45:56] FIRING: SystemdUnitDown: The service unit designate_floating_ip_ptr_records_updater.service is in failed status on host cloudcontrol1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1006 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[11:49:52] cloud-services-team, Toolforge, tools-platform-team: Toolforge Prometheus instance is unstable - https://phabricator.wikimedia.org/T422287#11798641 (taavi) Open→Resolved a:taavi Declaring this a success, Prometheus has stayed up without an OOM for a significantly longer time than it d...
[11:50:56] FIRING: [2x] SystemdUnitDown: The service unit designate_floating_ip_ptr_records_updater.service is in failed status on host cloudcontrol1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[11:53:01] !log filippo@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack on deployment eqiad1 for service: project,designate
[11:54:04] !log filippo@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.restart_openstack (exit_code=99) on deployment eqiad1 for service: project,designate
[11:58:34] !log filippo@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack on deployment eqiad1 for service: project,designate
[11:59:37] !log filippo@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.restart_openstack (exit_code=99) on deployment eqiad1 for service: project,designate
[12:00:56] FIRING: [2x] SystemdUnitDown: The service unit designate_floating_ip_ptr_records_updater.service is in failed status on host cloudcontrol1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[12:05:56] FIRING: [2x] SystemdUnitDown: The service unit designate_floating_ip_ptr_records_updater.service is in failed status on host cloudcontrol1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[12:15:34] (open) dcaro: procfile: fix the processes generated to allow args [repos/cloud/toolforge/builds-builder] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-builder/-/merge_requests/87
[12:15:56] RESOLVED: [2x] SystemdUnitDown: The service unit designate_floating_ip_ptr_records_updater.service is in failed status on host cloudcontrol1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[12:32:24] (update) dcaro: procfile: fix the processes generated to allow args [repos/cloud/toolforge/builds-builder] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-builder/-/merge_requests/87
[12:41:00] FIRING: NovafullstackSustainedFailures: Novafullstack tests have been failing for more than 5hours in eqiad - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/NovafullstackSustainedFailures - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-nova-fullstack?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DNovafullstackSustainedFailures
[13:01:11] Tool-refill: apnews.com contains author "leer en espanol" - https://phabricator.wikimedia.org/T422572#11799006 (Novem_Linguae) Able to reproduce in Visual Editor's generate citation. Guess this means it's an upstream bug with Zotero. {F75324253}
[13:02:50] Tool-refill: reuters.com fails with confusing error message - https://phabricator.wikimedia.org/T422573#11799015 (Novem_Linguae) Visual Editor generate citation also gives an error. An upstream bug may be involved, but there are parts of this ticket that are definitely not upstream (error wording, treating n...
[13:04:26] Tool-refill: should use {{Cite news}} for bbc.co.uk - https://phabricator.wikimedia.org/T422574#11799026 (Novem_Linguae) Visual Editor's generate citation tool also chooses `{{Cite web}}` instead of `{{Cite news}}`. Does that make this an upstream Citoid bug? {F75324396}
[13:18:56] FIRING: SystemdUnitDown: The service unit prometheus-node-textfile-wmcs-dnsleaks.service is in failed status on host cloudcontrol1007. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[13:32:38] cloud-services-team (FY2025/2026-Q3-Q4), Cloud-VPS: Designate API timing out - https://phabricator.wikimedia.org/T422646 (fgiunchedi) NEW
[13:37:22] FIRING: HAProxyBackendUnavailable: HAProxy service designate-api_backend backend cloudcontrol1011.private.eqiad.wikimedia.cloud is DOWN - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[13:42:22] FIRING: [2x] HAProxyBackendUnavailable: HAProxy service designate-api_backend backend cloudcontrol1007.private.eqiad.wikimedia.cloud is DOWN - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[13:43:56] FIRING: [2x] SystemdUnitDown: The service unit designate-api.service is in failed status on host cloudcontrol1007. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[13:48:56] FIRING: [3x] SystemdUnitDown: The service unit designate_floating_ip_ptr_records_updater.service is in failed status on host cloudcontrol1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[13:52:22] RESOLVED: HAProxyBackendUnavailable: HAProxy service designate-api_backend backend cloudcontrol1006.private.eqiad.wikimedia.cloud is DOWN - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[13:53:56] RESOLVED: [2x] SystemdUnitDown: The service unit designate_floating_ip_ptr_records_updater.service is in failed status on host cloudcontrol1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[13:54:17] FIRING: JobUnavailable: Reduced availability for job openstack in cloud@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[13:58:00] FIRING: OpenstackAPIResponse: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[13:59:17] RESOLVED: JobUnavailable: Reduced availability for job openstack in cloud@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[14:05:17] (update) raymond-ndibe: common.yaml: set max_query_length [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1211 (https://phabricator.wikimedia.org/T422453)
[14:05:19] Tool-wmf-openapi-linter, MW-Interfaces-Team (MWI-Sprint-31 (2026-04-07 to 2026-04-21)): Fix linter issues discovered during implementation of the OAD example - https://phabricator.wikimedia.org/T414974#11799767 (KBach)
[14:07:44] Tool-wmf-openapi-linter, MediaWiki-REST-API, MW-Interfaces-Team, Essential-Work: Linter: Split rules into more logical groupings. - https://phabricator.wikimedia.org/T415914#11799775 (KBach)
[14:08:00] FIRING: [2x] OpenstackAPIResponse: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[14:13:00] FIRING: [3x] OpenstackAPIResponse: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[14:15:30] RESOLVED: NovafullstackSustainedFailures: Novafullstack tests have been failing for more than 5hours in eqiad - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/NovafullstackSustainedFailures - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-nova-fullstack?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DNovafullstackSustainedFailures
[14:16:05] Tool-refill: reuters.com fails with confusing error message - https://phabricator.wikimedia.org/T422573#11799808 (Novem_Linguae) Probably related bug report in Citoid: {T359161}
[14:19:38] cloud-services-team (FY2025/2026-Q3-Q4), Toolforge: Support pre-built images on components-api - https://phabricator.wikimedia.org/T405262#11799849 (aputhin) p:High→Medium Changing to medium prio after refinement. We need a bit more clarity on impact of the use cases for this feature. @dcaro to l...
[14:21:17] cloud-services-team (FY2025/2026-Q3-Q4), Cloud-VPS: Carry out controlled network switch down tests in cloud - https://phabricator.wikimedia.org/T417393#11799854 (fgiunchedi) FWIW the oslo timeout issue looks like to me a whole lot like https://bugs.launchpad.net/oslo.messaging/+bug/2096926
[14:22:08] cloud-services-team (FY2025/2026-Q3-Q4), Toolforge: [components-api] Add webservice support - https://phabricator.wikimedia.org/T362077#11799858 (aputhin)
[14:22:11] cloud-services-team (FY2025/2026-Q3-Q4), Cloud-VPS: Designate API timing out - https://phabricator.wikimedia.org/T422646#11799859 (fgiunchedi) Following up from IRC: stopping memcached on all cloudcontrols, together with all designate servers, then restarting memcached and designate seems to have brought...
[14:23:06] cloud-services-team (FY2025/2026-Q3-Q4), Toolforge, tools-platform-team: [components-api] Add webservice support - https://phabricator.wikimedia.org/T362077#11799861 (aputhin)
[14:23:45] cloud-services-team (FY2025/2026-Q3-Q4), Toolforge, tools-platform-team: Support pre-built images on components-api - https://phabricator.wikimedia.org/T405262#11799863 (aputhin)
[14:25:02] cloud-services-team (FY2025/2026-Q3-Q4), Cloud-VPS: Carry out controlled network switch down tests in cloud - https://phabricator.wikimedia.org/T417393#11799875 (Andrew) > cloudcontrol nodes not in C8 (i.e. 1006/1007) though didn't seem to give up trying to connect to rabbitmq01.eqiad1.wikimediacloud.org...
[14:32:50] cloud-services-team, Data-Services, tools-platform-team: [wikireplicas] Upgrade clouddbs to 10.11.16 - https://phabricator.wikimedia.org/T422527#11799961 (aputhin) - This has to be done sporadically when there's a new version provided by Data Persistence - Should / Could possibly be the responsibilit...
[14:35:16] Data-Services, tools-platform-team, Datasets-General-or-Unknown: Stop serving dumps.wikimedia.org port 80 - https://phabricator.wikimedia.org/T422672 (taavi) NEW
[14:37:45] cloud-services-team, Toolforge, tools-infrastructure-team: Toolforge Prometheus instance is unstable - https://phabricator.wikimedia.org/T422287#11800008 (aputhin)
[14:39:18] Data-Services, tools-platform-team, Datasets-General-or-Unknown, Patch-For-Review: Stop serving dumps.wikimedia.org port 80 - https://phabricator.wikimedia.org/T422672#11800024 (aputhin) @taavi is this done by tools platform or tools infra?
[14:43:02] Toolforge, tools-platform-team: clis: only create tag on merge of the release patch - https://phabricator.wikimedia.org/T422452#11800046 (aputhin) p:Triage→Medium
[14:45:20] Toolforge (Toolforge iteration 26), tools-platform-team: [general] upgrade all python repos to python >=3.13 - https://phabricator.wikimedia.org/T422184#11800061 (aputhin) p:Triage→Medium
[14:47:28] Toolforge, tools-platform-team: clis: only create tag on merge of the release patch - https://phabricator.wikimedia.org/T422452#11800080 (taavi) Oppose. Pushing a tag should be the action that triggers the release pipeline. > If for any reason you push the tags and later need to modify the patch, you ca...
[14:50:24] (update) dcaro: procfile: fix the processes generated to allow args [repos/cloud/toolforge/builds-builder] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-builder/-/merge_requests/87
[14:52:53] cloud-services-team (FY2025/2026-Q3-Q4), Toolforge (Toolforge iteration 26), tools-platform-team, Epic, Patch-For-Review: [jobs-api] allow exposing continuous jobs to the internet via `toolname.toolforge.org`, just like webservice - https://phabricator.wikimedia.org/T388092#11800104 (aputhin)...
[14:54:45] cloud-services-team (FY2025/2026-Q3-Q4), Toolforge (Toolforge iteration 26), tools-platform-team, Epic, Patch-For-Review: [jobs-api] allow exposing continuous jobs to the internet via `toolname.toolforge.org`, just like webservice - https://phabricator.wikimedia.org/T388092#11800106 (aputhin)...
[14:58:11] RESOLVED: CloudVPSDesignateLeaks: Detected 64 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[14:59:29] Toolforge (Toolforge iteration 26), tools-platform-team, Upstream: [builds-builder] golang based images get infinite nested loops for procfile entries - https://phabricator.wikimedia.org/T363417#11800121 (dcaro) Stalled→In progress
[15:01:24] cloud-services-team, Toolforge, tools-platform-team: `toolforge jobs logs` misplaces my logs - https://phabricator.wikimedia.org/T421929#11800129 (dcaro) Open→Resolved I'll close this, but feel free to reopen if you still have issues.
[15:02:32] cloud-services-team (FY2025/2026-Q3-Q4), Toolforge (Toolforge iteration 26), tools-platform-team, Patch-For-Review: [jobs-api] Allow customizing time to request Loki logs for - https://phabricator.wikimedia.org/T400917#11800137 (dcaro) In progress→Resolved
[15:02:44] (update) raymond-ndibe: procfile: fix the processes generated to allow args [repos/cloud/toolforge/builds-builder] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-builder/-/merge_requests/87 (owner: dcaro)
[15:02:50] (approved) raymond-ndibe: procfile: fix the processes generated to allow args [repos/cloud/toolforge/builds-builder] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-builder/-/merge_requests/87 (owner: dcaro)
[15:05:50] (update) dcaro: procfile: fix the processes generated to allow args [repos/cloud/toolforge/builds-builder] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-builder/-/merge_requests/87
[15:08:47] (update) dcaro: procfile: fix the processes generated to allow args [repos/cloud/toolforge/builds-builder] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-builder/-/merge_requests/87
[15:15:30] (approved) dcaro: procfile: fix the processes generated to allow args [repos/cloud/toolforge/builds-builder] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-builder/-/merge_requests/87
[15:15:38] (merge) dcaro: procfile: fix the processes generated to allow args [repos/cloud/toolforge/builds-builder] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-builder/-/merge_requests/87
[15:16:54] (update) group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: builds-builder: bump to 0.0.147-20260408151547-25552732 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1213 (https://phabricator.wikimedia.org/T356016)
[15:17:01] (open) group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: builds-builder: bump to 0.0.147-20260408151547-25552732 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1213 (https://phabricator.wikimedia.org/T356016)
[15:19:58] !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component builds-builder
[15:24:37] !log dcaro@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder
[15:27:32] (update) raymond-ndibe: refactor image parsing and handling [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/273
[15:28:52] cloud-services-team, decommission-hardware: decommission cloudcephmon2004-dev - https://phabricator.wikimedia.org/T422437#11800219 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by andrew@cumin2002 for hosts: `cloudcephmon2004-dev.codfw.wmnet` - cloudcephmon2004-dev.codfw.wmnet (**PASS**)...
[15:45:35] !log dcaro@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component builds-builder
[15:47:10] Toolforge (Quota-requests): Elasticsearch credential request for techactivity - https://phabricator.wikimedia.org/T422462#11800310 (Andrew) This is fine but I have to figure out how to do it! Docs seem to be at https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin#Granting_a_tool_write_access_to_Elastic...
[15:50:53] !log dcaro@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder
[15:53:20] Tool-echo-chamber: Echo Chamber API error: You don't have permission - https://phabricator.wikimedia.org/T422318#11800334 (Legoktm) Open→Resolved Thanks @bd808 for approving, the tool should be usable again.
[15:58:07] (update) raymond-ndibe: refactor image parsing and handling [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/273
[15:59:08] (update) raymond-ndibe: refactor image parsing and handling [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/273 (https://phabricator.wikimedia.org/T415322)
[16:09:17] cloud-services-team, tools-platform-team, Surveys: 2025 Cloud Services Survey - https://phabricator.wikimedia.org/T411421#11800412 (aputhin)
[16:10:45] cloud-services-team, tools-platform-team, Surveys: 2025 Cloud Services Survey - https://phabricator.wikimedia.org/T411421#11800429 (aputhin) Open→In progress
[16:12:30] wikitech.wikimedia.org, serviceops-radar, SRE, Patch-For-Review, SRE-Unowned: Redesign wikitech-static - https://phabricator.wikimedia.org/T376400#11800430 (Andrew) The only thing left to do here (that I know if) is relative links being messed up in the initial wikitech-static landing pag...
[16:23:13] Toolforge (Quota-requests): Elasticsearch credential request for techactivity - https://phabricator.wikimedia.org/T422462#11800469 (dcaro) +1 from me, is a relatively big amount of data, so if you have a way to verify with less data if elastic works for you it would be appreciated (also, keep into account th...
[16:30:46] Data-Services, tools-infrastructure-team, Datasets-General-or-Unknown, Patch-For-Review: Stop serving dumps.wikimedia.org port 80 - https://phabricator.wikimedia.org/T422672#11800519 (taavi)
[16:30:51] Data-Services, tools-infrastructure-team, Datasets-General-or-Unknown, Patch-For-Review: Stop serving dumps.wikimedia.org port 80 - https://phabricator.wikimedia.org/T422672#11800520 (taavi) a:taavi
[16:31:07] Data-Services, tools-infrastructure-team, Datasets-General-or-Unknown, Patch-For-Review: Stop serving dumps.wikimedia.org port 80 - https://phabricator.wikimedia.org/T422672#11800523 (taavi) >>! In T422672#11800024, @aputhin wrote: > @taavi is this done by tools platform or tools infra? Bah, mis...
[16:31:37] tools-infrastructure-team, Datasets-General-or-Unknown: Reclaim public IPs from individual dumps distribution (clouddumps) hosts - https://phabricator.wikimedia.org/T417028#11800524 (taavi)
[16:36:15] tools-platform-team: [builds-builder] use yq instead of tomljson/jq/jsontoml - https://phabricator.wikimedia.org/T422691 (dcaro) NEW
[16:40:53] Toolforge (Toolforge iteration 26), tools-platform-team, Patch-For-Review, Upstream: [builds-builder,jobs-api,upstream] Calling nontrivial Procfile commands with arguments results in confusing error (“no such file or directory”) - https://phabricator.wikimedia.org/T356016#11800572 (dcaro) This is...
[16:41:18] Toolforge (Toolforge iteration 26), tools-platform-team, Patch-For-Review, Upstream: [builds-builder,jobs-api,upstream] Calling nontrivial Procfile commands with arguments results in confusing error (“no such file or directory”) - https://phabricator.wikimedia.org/T356016#11800574 (dcaro) ...
[16:42:05] Toolforge (Toolforge iteration 26), tools-platform-team, Patch-For-Review, Upstream: [builds-builder,jobs-api,upstream] Calling nontrivial Procfile commands with arguments results in confusing error (“no such file or directory”) - https://phabricator.wikimedia.org/T356016#11800576 (dcaro)
[16:42:06] cloud-services-team (FY2025/2026-Q3-Q4), Toolforge (Toolforge iteration 26), tools-platform-team, Patch-For-Review: [builds-builder] Add support for Heroku's "24" builder stack based on Ubuntu 2024.04 noble - https://phabricator.wikimedia.org/T380127#11800577 (dcaro)
[16:44:17] Toolforge (Toolforge iteration 26), tools-platform-team, Upstream: [builds-builder] golang based images get infinite nested loops for procfile entries - https://phabricator.wikimedia.org/T363417#11800579 (dcaro) In progress→Resolved This is fixed also
[16:45:12] Toolforge, tools-platform-team, Patch-For-Review: logs-api fails with cryptic error if query range is too far in the past e.g. --since 1000d - https://phabricator.wikimedia.org/T422453#11800584 (dcaro) a:Raymond_Ndibe
[16:47:12] cloud-services-team (FY2025/2026-Q3-Q4), Toolforge (Toolforge iteration 26), tools-platform-team, Patch-For-Review: [jobs-api] Allow customizing time to request Loki logs for - https://phabricator.wikimedia.org/T400917#11800585 (taavi) https://wikitech.wikimedia.org/wiki/Help:Toolforge/Runnin...
[16:49:51] cloud-services-team (FY2025/2026-Q3-Q4), Toolforge (Toolforge iteration 26), tools-platform-team, Patch-For-Review: [jobs-api] Allow customizing time to request Loki logs for - https://phabricator.wikimedia.org/T400917#11800610 (dcaro) >>! In T400917#11800585, @taavi wrote: > https://wikitech...
[16:53:49] (approved) dcaro: common.yaml: set max_query_length [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1211 (https://phabricator.wikimedia.org/T422453) (owner: raymond-ndibe)
[16:53:51] (update) dcaro: common.yaml: set max_query_length [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1211 (https://phabricator.wikimedia.org/T422453) (owner: raymond-ndibe)
[16:54:10] (approved) dcaro: builds-builder: bump to 0.0.147-20260408151547-25552732 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1213 (https://phabricator.wikimedia.org/T356016) (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[16:54:15] (merge) dcaro: builds-builder: bump to 0.0.147-20260408151547-25552732 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1213 (https://phabricator.wikimedia.org/T356016) (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[17:07:32] Toolforge (Quota-requests): Elasticsearch credential request for techactivity - https://phabricator.wikimedia.org/T422462#11800709 (Addshore) > is a relatively big amount of data I expect I can verify with a subset. I can also likely strip out some of the longer / larger parts of the data and not load those...
[17:15:46] (approved) dcaro: build: Upgrade Poetry dependencies [repos/cloud/toolforge/logs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/logs-api/-/merge_requests/16 (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[17:15:49] (merge) dcaro: build: Upgrade Poetry dependencies [repos/cloud/toolforge/logs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/logs-api/-/merge_requests/16 (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[17:18:15] (open) group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: logs-api: bump to 0.0.19-20260408171558-01a87ed3 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1214
[17:20:26] !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component logs-api
[17:24:01] Toolforge (Quota-requests): Elasticsearch credential request for techactivity - https://phabricator.wikimedia.org/T422462#11800973 (dcaro) >>! In T422462#11800709, @Addshore wrote: >> is a relatively big amount of data > > I expect I can verify with a subset. > I can also likely strip out some of the longer...
[17:24:43] Toolforge (Quota-requests): Elasticsearch credential request for techactivity - https://phabricator.wikimedia.org/T422462#11800999 (dcaro) Note that you'll be the biggest user ;)
[17:28:01] (approved) dcaro: build: Upgrade Poetry dependencies [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/162 (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[17:28:08] (merge) dcaro: build: Upgrade Poetry dependencies [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/162 (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[17:29:11] !log dcaro@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component logs-api
[17:31:44] !log dcaro@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component logs-api
[17:31:46] (open) group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: components-api: bump to 0.0.191-20260408172821-ba145624 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1215
[17:34:41] cloud-services-team, Toolforge, tools-platform-team: Running dotnet job fails on Toolforge because "24" builder stack changed the compiled binary output path - https://phabricator.wikimedia.org/T422224#11801030 (dcaro) >>! In T422224#11798221, @dcaro wrote: >>>! In T422224#11797303, @Hawkeye7 wro...
[17:35:09] (approved) dcaro: build: Upgrade Poetry dependencies [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/77 (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[17:35:12] (merge) dcaro: build: Upgrade Poetry dependencies [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/77 (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[17:41:06] !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component components-api
[17:41:17] !log dcaro@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component logs-api
[17:42:24] (approved) dcaro: logs-api: bump to 0.0.19-20260408171558-01a87ed3 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1214 (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[17:42:27] (merge) dcaro: logs-api: bump to 0.0.19-20260408171558-01a87ed3 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1214 (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[17:42:39] (update) dcaro: components-api: bump to 0.0.191-20260408172821-ba145624 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1215 (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[17:45:46] !log dcaro@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api
[17:47:09] Toolforge (Quota-requests): Elasticsearch credential request for techactivity - https://phabricator.wikimedia.org/T422462#11801114 (Addshore) Hopefully the outcome will be worth it :) Some of the data I have is also already in other people's public indexes there, (the real question will be can I reasonably r...
[17:50:26] !log dcaro@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component components-api
[17:55:18] (update) dcaro: istio-gateway: allow customizing the resources [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1189
[17:55:20] !log dcaro@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api
[17:56:20] (approved) dcaro: components-api: bump to 0.0.191-20260408172821-ba145624 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1215 (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[17:56:27] (merge) dcaro: components-api: bump to 0.0.191-20260408172821-ba145624 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1215 (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[17:56:34] Toolforge (Toolforge iteration 26), tools-platform-team: [global] First of the month automatic dependency upgrade - https://phabricator.wikimedia.org/T422191#11801158 (dcaro) In progress→Resolved
[17:58:15] tools-platform-team: [builds-builder] use yq instead of tomljson/jq/jsontoml - https://phabricator.wikimedia.org/T422691#11801163 (dcaro) It seems to work, but still trying to find the equivalent to `jq '.processes |= map(.command = [.command] + (.args | .[-1] += " \"$@\"") + ["--"] | .args = [])'`, not ther...
[18:02:19] tools-platform-team: [builds-builder] use yq instead of tomljson/jq/jsontoml - https://phabricator.wikimedia.org/T422691#11801182 (dcaro) Oop, I think this should do the trick: ` dcaro@acme$ yq -p toml -o toml ' .processes |= map( .command = ["bash"] + .args + [" \"$@\"", "--"] | .args = [] ) '...
[18:03:29] tools-platform-team: [builds-builder] use yq instead of tomljson/jq/jsontoml - https://phabricator.wikimedia.org/T422691#11801188 (dcaro) this :) ` dcaro@acme$ yq -p toml -o toml ' .processes |= map( .command = ["bash"] + [.args[-1] + " \"$@\"", "--"] | .args = [] ) ' /tmp/metadata.toml `
[18:08:40] (update) dcaro: add jobs-api backend to webservice-cli [repos/cloud/toolforge/webservice-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/webservice-cli/-/merge_requests/95 (https://phabricator.wikimedia.org/T348755) (owner: raymond-ndibe)
[18:10:07] (update) dcaro: add webservice endpoint [repos/cloud/toolforge/jobs-api] (add_ingress_option_to_continuous_job) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/264 (https://phabricator.wikimedia.org/T348755) (owner: raymond-ndibe)
[18:11:04] (update) dcaro: support exposing continuous jobs to the internet [repos/cloud/toolforge/jobs-api] (replace_job_images_with_web_images) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/262 (https://phabricator.wikimedia.org/T388092) (owner: raymond-ndibe)
[18:12:01] (update) dcaro: [jobs-cli] refactor job payload [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/98 (https://phabricator.wikimedia.org/T389118 https://phabricator.wikimedia.org/T390136) (owner: raymond-ndibe)
[18:13:00] FIRING: [3x] OpenstackAPIResponse: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[18:17:31] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack on deployment eqiad1 for all services
[18:18:29] VPS-project-Phabricator: 6.09 - https://phabricator.wikimedia.org/T422727 (Asantha9) NEW
[18:20:17] FIRING: JobUnavailable: Reduced availability for job openstack in cloud@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[18:25:17] RESOLVED: JobUnavailable: Reduced availability for job openstack in cloud@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[18:30:34] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0) on deployment eqiad1 for all services
[19:02:03] Tool-delintbot: Enhance regex to fix night-mode-unaware-background-colors errors - https://phabricator.wikimedia.org/T422013#11801440 (Redmin) p:High→Medium
[19:03:04] Tool-delintbot: Enhance regex to fix night-mode-unaware-background-colors errors - https://phabricator.wikimedia.org/T422013#11801443 (Redmin)
[19:03:20] Tool-delintbot: Enhance regex to fix night-mode-unaware-background-colors errors - https://phabricator.wikimedia.org/T422013#11801444 (Redmin) a:Redmin
[19:14:20] (merge) wikigit: Add Example Rulesets and Tests [toolforge-repos/wmf-openapi-linter] - https://gitlab.wikimedia.org/toolforge-repos/wmf-openapi-linter/-/merge_requests/11 (owner: jaredblumer)
[19:16:26] cloud-services-team, DC-Ops, ops-codfw, SRE: cloudcephmon2007-dev service implementation - https://phabricator.wikimedia.org/T420282#11801494 (Andrew) Open→Resolved
[19:18:01] FIRING: [3x] OpenstackAPIResponse: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[19:18:02] Toolforge (Quota-requests): Elasticsearch credential request for techactivity - https://phabricator.wikimedia.org/T422462#11801497 (Andrew) >> How much disk is there to play with? > > ~400T of unreplicated space, split in 3 nodes I believe that number is 400G not 400T
[19:24:10] (update) raymond-ndibe: refactor image parsing and handling [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/273 (https://phabricator.wikimedia.org/T415322)
[19:24:27] (update) raymond-ndibe: refactor image parsing and handling [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/273 (https://phabricator.wikimedia.org/T415322)
[19:26:13] (open) raymond-ndibe: improve image parsing tests [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/282 (https://phabricator.wikimedia.org/T415322)
[19:28:49] Toolforge (Quota-requests): Elasticsearch credential request for techactivity - https://phabricator.wikimedia.org/T422462#11801535 (Andrew) Open→Resolved a:Andrew This is done, and your creds should be in your envvars as TOOL_ELASTICSEARCH_USER and TOOL_ELASTICSEARCH_PASSWORD Please let me k...
[19:41:41] (update) raymond-ndibe: refactor image parsing and handling [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/273 (https://phabricator.wikimedia.org/T415322)
[19:42:16] (update) raymond-ndibe: refactor image parsing and handling [repos/cloud/toolforge/jobs-api] (improve_image_parsing_tests) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/273 (https://phabricator.wikimedia.org/T415322)
[19:55:32] (update) raymond-ndibe: refactor image parsing and handling [repos/cloud/toolforge/jobs-api] (improve_image_parsing_tests) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/273 (https://phabricator.wikimedia.org/T415322)
[20:05:38] (open) wikigit: Add 'path' column to spec result table [toolforge-repos/wmf-openapi-linter] - https://gitlab.wikimedia.org/toolforge-repos/wmf-openapi-linter/-/merge_requests/14
[20:19:55] FIRING: ToolforgeKubernetesCapacity: Kubernetes cluster k8s.tools.eqiad1.wikimedia.cloud:6443 in risk of running out of memory - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesCapacity - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesCapacity
[20:40:17] cloud-services-team, Cloud-VPS, Patch-For-Review: Cloud init and unattended upgrades while bootstrapping Trixie VMs - https://phabricator.wikimedia.org/T422509#11801761 (Andrew) p:Triage→Medium Do you have any theory (you being @elukey and @fgiunchedi) about why that happened on this exact in...
[20:42:11] cloud-services-team, Toolforge: Connection with `k8s.tools.eqiad1.wikimedia.cloud` hits SSL error - https://phabricator.wikimedia.org/T422538#11801767 (Andrew) @Nokib_Sarkar have you seen this happen on multiple occasions, or just several times on the 7th specifically? (I want to make sure it's not a sid...
[20:42:19] cloud-services-team, Toolforge: Connection with `k8s.tools.eqiad1.wikimedia.cloud` hits SSL error - https://phabricator.wikimedia.org/T422538#11801768 (Andrew) p:Triage→Medium
[20:44:13] cloud-services-team, Cloud-VPS: Handle project IDs with dash in cloud cookbooks / openstack API - https://phabricator.wikimedia.org/T422515#11801775 (Andrew) p:Triage→Medium For starters we should probably look for places that take a --project arg and convert them to either --project-id or --proj...
[20:45:33] cloud-services-team, Cloud-VPS: wmcs cookbook "--project" arg is ambiguous, could mean project id or project name - https://phabricator.wikimedia.org/T422515#11801779 (Andrew)
[20:47:22] cloud-services-team (FY2025/2026-Q3-Q4), Toolforge, tools-platform-team: Support pre-built images on components-api - https://phabricator.wikimedia.org/T405262#11801783 (bd808) > It would be quite useful to be able to execute these images via components and not have to execute them on NFS workers (wh...
[20:50:04] Tool-wmf-openapi-linter, MediaWiki-REST-API, MW-Interfaces-Team: OpenAPI linting: Add "examples" to generated schema - https://phabricator.wikimedia.org/T422739 (Mooeypoo) NEW
[20:55:22] Tool-wmf-openapi-linter, MediaWiki-REST-API, MW-Interfaces-Team: OpenAPI linting: Add missing OpenAPI spec elements to Response Components - https://phabricator.wikimedia.org/T422739#11801808 (Mooeypoo)
[20:58:56] cloud-services-team, Cloud-VPS, Patch-For-Review: Cloud init and unattended upgrades while bootstrapping Trixie VMs - https://phabricator.wikimedia.org/T422509#11801856 (taavi) This is not an unattended-upgrades problem. It instead seems to be a problem with how the Puppet agent packages are installe...
[21:36:30] RESOLVED: OpenstackAPIResponse: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[21:47:23] (PS2) Cwhite: add beta-logs pki key [labs/private] - https://gerrit.wikimedia.org/r/1268683 (https://phabricator.wikimedia.org/T350516)
[22:14:03] cloud-services-team, Toolforge, Release-Engineering-Team, GitLab (Integrations): gitlab-webhooks build fails on "The runtime.txt file isn't supported" - https://phabricator.wikimedia.org/T422734#11802149 (bd808) Adding the #toolforge tag in case someone searches for this general problem only...
[22:17:37] Toolforge (Toolforge iteration 26): [components-api] failing deployment 422 from jobs-api - https://phabricator.wikimedia.org/T422753 (DamianZaremba) NEW
[22:19:55] RESOLVED: ToolforgeKubernetesCapacity: Kubernetes cluster k8s.tools.eqiad1.wikimedia.cloud:6443 in risk of running out of memory - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesCapacity - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesCapacity