[02:56:14] cloud-services-team, Data-Services, BetaFeatures, Data-Engineering, and 5 others: Create view for betafeatures_user_counts table in wiki replicas - https://phabricator.wikimedia.org/T402145#12021914 (aranyap) Hi @SD0001 , is there any possible way for this data to be correlated back to an individ...
[07:49:58] !log volans@cloudcumin1001 eduwikihubstaging START - Cookbook wmcs.vps.create_project for project eduwikihubstaging in eqiad1 (T429032)
[07:49:59] volans@cloudcumin1001: Unknown project "eduwikihubstaging"
[07:49:59] T429032: Request creation of eduwikihubstaging VPS project - https://phabricator.wikimedia.org/T429032
[07:50:40] (open) group_199_bot_f98be072172e323ae6d1441939d3e461: projects: added project eduwikihubstaging [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/317 (https://phabricator.wikimedia.org/T429032)
[07:52:33] (approved) volans: projects: added project eduwikihubstaging [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/317 (https://phabricator.wikimedia.org/T429032) (owner: group_199_bot_f98be072172e323ae6d1441939d3e461)
[07:53:10] (merge) volans: projects: added project eduwikihubstaging [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/317 (https://phabricator.wikimedia.org/T429032) (owner: group_199_bot_f98be072172e323ae6d1441939d3e461)
[07:54:55] !log volans@cloudcumin1001 eduwikihubstaging END (PASS) - Cookbook wmcs.vps.create_project (exit_code=0) for project eduwikihubstaging in eqiad1 (T429032)
[07:54:55] volans@cloudcumin1001: Unknown project "eduwikihubstaging"
[07:58:10] Cloud-VPS (Project-requests), Patch-For-Review: Request creation of eduwikihubstaging VPS project - https://phabricator.wikimedia.org/T429032#12022306 (Volans) Open→Resolved a:Volans Project created. @Ederporto, @Ragesoss , @JGonzalez_EdWH, please verify that you have access and also make...
[07:59:17] Cloud-VPS (Quota-requests), Release-Engineering-Team (Radar): Quota increase request for zuul - https://phabricator.wikimedia.org/T428515#12022311 (Volans) @dduvall did you have a chance to test if it works without it?
[08:27:41] cloud-services-team (FY2025/2026-Q3-Q4), Toolforge, Patch-For-Review: [toolforge.infra] Replace Toolschecker alerts with Prometheus based ones - https://phabricator.wikimedia.org/T313030#12022399 (fgiunchedi) Open→Resolved a:fgiunchedi This is done -- toolschecker has been deprecated...
[09:00:08] cloud-services-team, Cloud-VPS, tools-infrastructure-team: puppet failing on metricsinfra-prometheus-2.metricsinfra due to expired crl - https://phabricator.wikimedia.org/T429298 (fgiunchedi) NEW
[09:03:43] cloud-services-team, Cloud-VPS, tools-infrastructure-team: puppet failing on metricsinfra-prometheus-2.metricsinfra due to expired crl - https://phabricator.wikimedia.org/T429298#12022559 (fgiunchedi) The fix is trivial (remove the cached crl and run puppet agent again) though it is silly we have to...
[09:08:39] Tool-wmf-openapi-linter, MW-Interfaces-Team, OKR-Work: Improve linting - info.license - https://phabricator.wikimedia.org/T422908#12022572 (KBach)
[09:08:41] Tool-wmf-openapi-linter, Tech-Docs-Team, [MWI] FY2025-26 Q4, OKR-Work: [Hypothesis] 5.2.5b: Productionalize API spec linting - https://phabricator.wikimedia.org/T422476#12022574 (KBach)
[09:08:42] Tool-wmf-openapi-linter, [MWI] FY2025-26 Q4, Epic, OKR-Work: [5.2.5b Epic] Implement and improve linter rules - https://phabricator.wikimedia.org/T422479#12022573 (KBach)
[09:14:19] cloud-services-team (FY2025/2026-Q3-Q4), Cloud-VPS, Toolforge, Observability-Alerting, and 3 others: Move WMCS off of Icinga and introduce alertmanager - https://phabricator.wikimedia.org/T328502#12022588 (fgiunchedi) Thank you for the comments / feedback ! >>! In T328502#12020109, @Andrew wrote...
[09:17:40] Toolforge (Push-to-Deploy), tools-platform-team: [jobs-api] Create storage layer, and save business models in persistent storage - https://phabricator.wikimedia.org/T359650#12022617 (fnegri) a:dcaro
[09:37:32] RESOLVED: PuppetAgentStaleLastRun: Last Puppet run was over 24 hours ago on instance metricsinfra-prometheus-2 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[09:44:27] Tool-wmf-openapi-linter, MediaWiki-REST-API, MW-1.47-notes (1.47.0-wmf.7; 2026-06-16), MW-Interfaces-Team (MWI-Sprint-35 (2026-06-02 to 2026-06-16)): Support root-level externalDocs in MediaWiki REST Framework OAD generation - https://phabricator.wikimedia.org/T427356#12022766 (KBach)
[09:55:17] cloud-services-team, Data-Services, BetaFeatures, Data-Engineering, and 5 others: Create view for betafeatures_user_counts table in wiki replicas - https://phabricator.wikimedia.org/T402145#12022802 (SD0001) The table doesn't have any user identifiers, so I don't see a way.
[11:11:57] Tool-wmf-openapi-linter, MW-Interfaces-Team (MWI-Sprint-36 (2026-06-16 to 2026-06-30)), OKR-Work: Publish the Wikimedia Spectral ruleset to NPM - https://phabricator.wikimedia.org/T425627#12023106 (HCoplin-WMF)
[11:18:00] Toolforge, tools-platform-team: [builds-cli] does not output valid json when there's no builds - https://phabricator.wikimedia.org/T429229#12023141 (aputhin)
[11:22:48] Tool-wmf-openapi-linter, MediaWiki-REST-API, MW-Interfaces-Team (MWI-Sprint-36 (2026-06-16 to 2026-06-30)): Generate a valid semver info.version for the default MediaWiki REST module - https://phabricator.wikimedia.org/T427359#12023157 (KBach)
[11:23:18] Tool-wmf-openapi-linter, MW-Interfaces-Team (MWI-Sprint-36 (2026-06-16 to 2026-06-30)), OKR-Work: Fix top-level issues in the MediaWiki REST API OAD - https://phabricator.wikimedia.org/T428147#12023159 (KBach)
[11:23:33] Tool-wmf-openapi-linter, MW-Interfaces-Team (MWI-Sprint-36 (2026-06-16 to 2026-06-30)), OKR-Work: Fix type, media type, and array issues in the MediaWiki REST API OAD - https://phabricator.wikimedia.org/T428149#12023160 (KBach)
[11:57:48] !log tools.cluebotng-monitoring Deployment failed: https://github.com/cluebotng/component-configs/actions/runs/27615771202 (https://github.com/cluebotng/component-configs/commits/38c93594afd6d57d84fde7b3d927c1c73e0a18f9)
[11:57:51] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-monitoring/SAL
[11:58:40] !log tools.cluebotng-staging Deployment failed: https://github.com/cluebotng/component-configs/actions/runs/27615770915 (https://github.com/cluebotng/component-configs/commits/38c93594afd6d57d84fde7b3d927c1c73e0a18f9)
[11:58:41] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-staging/SAL
[12:00:00] !log tools.toolforge-functional-runner Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/27615770896 (https://github.com/cluebotng/component-configs/commits/38c93594afd6d57d84fde7b3d927c1c73e0a18f9)
[12:00:01] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.toolforge-functional-runner/SAL
[12:00:21] !log tools.cluebotng-trainer Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/27615771226 (https://github.com/cluebotng/component-configs/commits/38c93594afd6d57d84fde7b3d927c1c73e0a18f9)
[12:00:22] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-trainer/SAL
[12:00:23] !log tools.cluebot3 Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/27615770905 (https://github.com/cluebotng/component-configs/commits/38c93594afd6d57d84fde7b3d927c1c73e0a18f9)
[12:00:23] !log tools.cluebotng-editsets Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/27615770932 (https://github.com/cluebotng/component-configs/commits/38c93594afd6d57d84fde7b3d927c1c73e0a18f9)
[12:00:24] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebot3/SAL
[12:00:25] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-editsets/SAL
[12:00:39] !log tools.cluebot-syncer Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/27615770917 (https://github.com/cluebotng/component-configs/commits/38c93594afd6d57d84fde7b3d927c1c73e0a18f9)
[12:00:40] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebot-syncer/SAL
[12:02:12] !log tools.cluebotng-monitoring Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/27615852687 (https://github.com/cluebotng/component-configs/commits/c9746412542e654e3ae7337a570727eb5d7195d6)
[12:02:13] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-monitoring/SAL
[12:02:14] !log tools.cluebotng-review Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/27615770870 (https://github.com/cluebotng/component-configs/commits/38c93594afd6d57d84fde7b3d927c1c73e0a18f9)
[12:02:14] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-review/SAL
[12:04:12] !log tools.cluebotng Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/27615770908 (https://github.com/cluebotng/component-configs/commits/38c93594afd6d57d84fde7b3d927c1c73e0a18f9)
[12:04:14] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng/SAL
[12:09:10] cloud-services-team, Toolforge: [jobs-cli] emits a warning to re-create valid jobs - https://phabricator.wikimedia.org/T429231#12023363 (dcaro) I have been able to reproduce, but not with a simple job (probably related to some of the options there): ` local.tf-test@toolslocal:~$ toolforge components dep...
[12:15:00] cloud-services-team, Toolforge: [jobs-cli] emits a warning to re-create valid jobs - https://phabricator.wikimedia.org/T429231#12023391 (DamianZaremba) Yes, I also noticed that with another account yesterday that doesn't emit the warning. One way to trigger it appears to be setting `replicas` to `1` (th...
[12:17:20] !log tools.toolforge-functional-runner Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/27616804754 (https://github.com/cluebotng/component-configs/commits/af2bc530f42e1d932c23f8bea9c8d3686c343d10)
[12:17:24] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.toolforge-functional-runner/SAL
[12:18:37] !log tools.cluebotng-staging Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/27616804682 (https://github.com/cluebotng/component-configs/commits/af2bc530f42e1d932c23f8bea9c8d3686c343d10)
[12:18:37] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-staging/SAL
[12:20:02] !log tools.cluebotng-monitoring Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/27616668922 (https://github.com/cluebotng/component-configs/commits/1bfb0487d9d8b3b799e0d9a30269ca314046feba)
[12:20:03] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-monitoring/SAL
[12:23:55] !log tools.cluebotng-monitoring Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/27616804749 (https://github.com/cluebotng/component-configs/commits/af2bc530f42e1d932c23f8bea9c8d3686c343d10)
[12:23:57] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-monitoring/SAL
[12:30:42] !log tools.cluebotng-monitoring Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/27617061512 (https://github.com/cluebotng/component-configs/commits/4505fa21f50c25710eaf644eecdc0233f726c946)
[12:30:44] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-monitoring/SAL
[12:32:47] Cloud-VPS (Project-requests): Request creation of eduwikihubstaging VPS project - https://phabricator.wikimedia.org/T429032#12023493 (JGonzalez_EdWH) Thank you @Volans :-)
[12:34:26] !log tools.cluebotng-monitoring Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/27617439342 (https://github.com/cluebotng/component-configs/commits/25d55050b7cd85801d40ba66cf804d6d4a3b51bd)
[12:34:27] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-monitoring/SAL
[12:40:40] Cloud-VPS, tools-infrastructure-team, Infrastructure-Foundations, netops, SRE: Upgrade cloudsw1-e4-eqiad - https://phabricator.wikimedia.org/T429013#12023550 (fgiunchedi) Indeed the recent rack redundancy testing has shown we are resilient to the loss of one rack, for all hosts but cloudvirts...
[12:41:13] Cloud-VPS, tools-infrastructure-team, Infrastructure-Foundations, netops, SRE: Upgrade cloudsw1-f4-eqiad - https://phabricator.wikimedia.org/T429014#12023556 (fgiunchedi) See my update at https://phabricator.wikimedia.org/T429013#12023550 since it applies equally here
[12:51:04] cloud-services-team, Cloud-VPS: Add/review linting of flavors in opentofu - https://phabricator.wikimedia.org/T429336 (Andrew) NEW
[12:51:34] !log tools.cluebotng-monitoring Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/27618465342 (https://github.com/cluebotng/component-configs/commits/9b3fcbbd479041d764ab53f0a74027e2df6df4f6)
[12:51:36] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-monitoring/SAL
[13:02:04] Toolforge, tools-platform-team: [jobs-cli] emits a warning to re-create valid jobs - https://phabricator.wikimedia.org/T429231#12023692 (aputhin)
[13:02:08] Toolforge (Push-to-Deploy), tools-platform-team: [jobs-api] Create storage layer, and save business models in persistent storage - https://phabricator.wikimedia.org/T359650#12023693 (aputhin)
[13:02:09] Toolforge, tools-platform-team: [jobs-cli] emits a warning to re-create valid jobs - https://phabricator.wikimedia.org/T429231#12023694 (aputhin)
[13:02:50] tools-platform-team: [toolforge-weld] Fails to publish to pypi - https://phabricator.wikimedia.org/T429241#12023696 (dcaro) p:Medium→Triage
[13:04:16] Toolforge, tools-platform-team: [jobs-cli] emits a warning to re-create valid jobs - https://phabricator.wikimedia.org/T429231#12023704 (dcaro) Open→In progress
[13:05:24] Toolforge, tools-platform-team, Patch-For-Review: [docs] update all readmes with the same deployment docs - https://phabricator.wikimedia.org/T407477#12023711 (dcaro) a:dcaro→None
[13:18:43] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudservices.safe_reboot on hosts matched by 'P{O:wmcs::openstack::codfw1dev::services}'
[13:20:17] FIRING: [2x] JobUnavailable: Reduced availability for job pdns in cloud@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[13:23:05] FIRING: [2x] HostBGPDown: BGP session for cloudservices2004-dev (172.20.5.8) is down - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status - https://alerts.wikimedia.org/?q=alertname%3DHostBGPDown
[13:24:18] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudservices.safe_reboot (exit_code=0) on hosts matched by 'P{O:wmcs::openstack::codfw1dev::services}'
[13:25:17] RESOLVED: [3x] JobUnavailable: Reduced availability for job openstack in cloud@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[13:28:05] RESOLVED: [2x] HostBGPDown: BGP session for cloudservices2004-dev (172.20.5.8) is down - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status - https://alerts.wikimedia.org/?q=alertname%3DHostBGPDown
[13:52:38] tools-platform-team: [lima-kilo] ansible deprecation error - https://phabricator.wikimedia.org/T429343 (dcaro) NEW
[13:52:54] Toolforge, tools-platform-team: [lima-kilo] ansible deprecation error - https://phabricator.wikimedia.org/T429343#12023917 (dcaro)
[13:53:07] Toolforge, tools-platform-team: [toolforge-weld] Fails to publish to pypi - https://phabricator.wikimedia.org/T429241#12023918 (dcaro)
[13:54:42] Toolforge, tools-platform-team: [lima-kilo] ansible deprecation errors - https://phabricator.wikimedia.org/T429343#12023936 (dcaro)
[14:22:27] tools-platform-team: [lima-kilo] increase the number of inotify watches - https://phabricator.wikimedia.org/T429347 (dcaro) NEW
[14:22:40] Toolforge, tools-platform-team: [lima-kilo] increase the number of inotify watches - https://phabricator.wikimedia.org/T429347#12024108 (dcaro)
[14:22:48] Toolforge, tools-platform-team: [lima-kilo] increase the number of inotify watches - https://phabricator.wikimedia.org/T429347#12024116 (dcaro) p:Triage→Medium
[14:23:53] (open) dcaro: basic_system: increase the inotify limits [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/326
[14:25:12] (update) dcaro: basic_system: increase the inotify limits [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/326
[14:35:07] Toolforge, tools-platform-team: [jobs-cli] emits a warning to re-create valid jobs - https://phabricator.wikimedia.org/T429231#12024181 (dcaro) It stopped happening locally :/, looking
[14:46:45] Toolforge, tools-platform-team: [jobs-cli] emits a warning to re-create valid jobs - https://phabricator.wikimedia.org/T429231#12024253 (dcaro) It comes and goes it seems :/, no deployments in between? ` tools.cluebotng-staging@tools-bastion-15:~$ toolforge jobs list +----------------------+------------...
[14:50:09] FIRING: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[14:50:13] Toolforge, tools-platform-team: [jobs-cli] emits a warning to re-create valid jobs - https://phabricator.wikimedia.org/T429231#12024309 (dcaro) And after restarting the jobs-api pod that was getting hit when the warning happened, made it not happen anymore :/, maybe there's some caching going on somewhere?
[14:52:18] Cloud-VPS, tools-infrastructure-team: Openstack uwsgi logging to '.log' - https://phabricator.wikimedia.org/T422830#12024314 (Andrew) March 17th was when we upgraded to Flamingo.
[14:53:56] Toolforge, tools-platform-team: [jobs-cli] emits a warning to re-create valid jobs - https://phabricator.wikimedia.org/T429231#12024317 (dcaro) >>! In T429231#12024309, @dcaro wrote: > And after restarting the jobs-api pod that was getting hit when the warning happened, made it not happen anymore :/, may...
[15:02:56] FIRING: SystemdUnitDown: The service unit ceph-osd@289.service is in failed status on host cloudcephosd1038. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcephosd1038 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[15:07:56] FIRING: [2x] SystemdUnitDown: The service unit ceph-osd@281.service is in failed status on host cloudcephosd1037. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[15:10:06] cloud-services-team, Cloud-VPS (Debian Bullseye Deprecation): Replace https://os-deprecation.toolforge.org/ with something that handles in-place upgraded hosts - https://phabricator.wikimedia.org/T428919#12024397 (aputhin) > So if a host was built off of bullseye but then upgraded in place to Bookworm it...
[15:14:02] cloud-services-team, Toolforge, Documentation, good first task: Create a new doc about managing and sharing files in Toolforge - https://phabricator.wikimedia.org/T347753#12024404 (Tejinderk.2004) Hey there ! here's an update on the task. Created an initial draft of the new page: https://wikitec...
[15:15:50] (update) renovatebot: Update Rust crate itertools to 0.15.0 [toolforge-repos/dewiki-rangeblock] - https://gitlab.wikimedia.org/toolforge-repos/dewiki-rangeblock/-/merge_requests/16
[15:15:59] (open) renovatebot: Update Rust crate itertools to 0.15.0 [toolforge-repos/dewiki-rangeblock] - https://gitlab.wikimedia.org/toolforge-repos/dewiki-rangeblock/-/merge_requests/16
[15:17:57] Cloud-VPS (Debian Bullseye Deprecation), tools-platform-team: Reach out to Cloud VPS project maintainers about Debian Bullseye deprecation - https://phabricator.wikimedia.org/T428196#12024452 (aputhin) Open→In progress
[15:18:20] Cloud-VPS (Debian Bullseye Deprecation), tools-platform-team: Reach out to Cloud VPS project maintainers about Debian Bullseye deprecation - https://phabricator.wikimedia.org/T428196#12024455 (aputhin) p:Triage→High
[15:20:17] Tool-wmf-openapi-linter, MW-Interfaces-Team (MWI-Sprint-36 (2026-06-16 to 2026-06-30)), OKR-Work: MediaWiki REST API appropriately defines array types for parameters - https://phabricator.wikimedia.org/T409113#12024472 (OWresch-WMF)
[15:21:46] Tool-wmf-openapi-linter, MediaWiki-REST-API, MW-Interfaces-Team (MWI-Sprint-36 (2026-06-16 to 2026-06-30)): Support requestBody content examples in MediaWiki REST Framework OAD generation - https://phabricator.wikimedia.org/T427360#12024496 (OWresch-WMF)
[15:30:59] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node on host 'cloudcontrol2006-dev.codfw.wmnet' (T429361)
[15:40:54] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (exit_code=0) on host 'cloudcontrol2006-dev.codfw.wmnet' (T429361)
[15:41:50] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node on host 'cloudcontrol2005-dev.codfw.wmnet' (T429361)
[15:45:09] RESOLVED: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[15:45:20] cloud-services-team, Data-Services, BetaFeatures, Data-Engineering, and 5 others: Create view for betafeatures_user_counts table in wiki replicas - https://phabricator.wikimedia.org/T402145#12024691 (aranyap) Thank you @SD0001 . The Product Safety & Integrity team has reviewed this table and is c...
[15:47:08] (open) fnegri: Release version 0.2.0 [repos/cloud/wikireplicas-utils] - https://gitlab.wikimedia.org/repos/cloud/wikireplicas-utils/-/merge_requests/13 (https://phabricator.wikimedia.org/T351637)
[15:51:27] Toolforge, tools-platform-team: [jobs-cli] emits a warning to re-create valid jobs - https://phabricator.wikimedia.org/T429231#12024726 (dcaro) Finally got it in lima-kilo :), and this time was able to fetch the logs: ` ... [2026-06-16 15:47:14,711] p17:t140338572805824 /app/tjf/core/core.py:50:_update_s...
[15:52:20] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (exit_code=0) on host 'cloudcontrol2005-dev.codfw.wmnet' (T429361) [15:52:33] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node on host 'cloudcontrol2011-dev.codfw.wmnet' (T429361) [15:52:33] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (exit_code=99) on host 'cloudcontrol2011-dev.codfw.wmnet' (T429361) [15:52:39] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node on host 'cloudcontrol2010-dev.codfw.wmnet' (T429361) [15:54:17] FIRING: JobUnavailable: Reduced availability for job openstack in cloud@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [15:58:09] FIRING: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning [15:58:40] (03update) 10fnegri: Release version 0.2.0 [repos/cloud/wikireplicas-utils] - 10https://gitlab.wikimedia.org/repos/cloud/wikireplicas-utils/-/merge_requests/13 (https://phabricator.wikimedia.org/T351637) [15:59:17] RESOLVED: JobUnavailable: Reduced availability for job openstack in cloud@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [16:01:45] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (exit_code=0) on host 
'cloudcontrol2010-dev.codfw.wmnet' (T429361) [16:03:05] 10Toolforge (Push-to-Deploy), 06tools-platform-team: [components-api] Queue builds when the build queue is full - https://phabricator.wikimedia.org/T402568#12024856 (10dcaro) a:05Raymond_Ndibe→03dcaro [16:03:09] RESOLVED: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning [16:08:47] 06cloud-services-team, 10Cloud-VPS: [ceph] Enable encrypted client traffic for the ceph clusters - https://phabricator.wikimedia.org/T294432#12024907 (10Volans) It seems that the current WMCS cluster does support encryption (`secure` mode) already. I guess probably a by-product of some cluster upgrades? ` $ s... [16:10:43] (03update) 10fnegri: basic_system: increase the inotify limits [repos/cloud/toolforge/lima-kilo] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/326 (owner: 10dcaro) [16:10:45] (03update) 10fnegri: basic_system: increase the inotify limits [repos/cloud/toolforge/lima-kilo] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/326 (owner: 10dcaro) [16:10:49] (03approved) 10fnegri: basic_system: increase the inotify limits [repos/cloud/toolforge/lima-kilo] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/326 (owner: 10dcaro) [16:12:11] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node on host 'cloudcontrol1011.eqiad.wmnet' (T429361) [16:14:07] 06cloud-services-team, 10Quarry, 07patch-welcome, 07Plural-Support: "1 rows" should be "1 row" - https://phabricator.wikimedia.org/T419564#12024934 (10Nihar_Chakravarti) a:03Nihar_Chakravarti [16:17:34] 06cloud-services-team, 10Toolforge, 
07patch-welcome: Toolforge 404 handler displays wrong tool name from ?url= query parameter - https://phabricator.wikimedia.org/T421542#12024961 (10Shadabgdg) a:03Shadabgdg [16:17:46] (03update) 10fnegri: dev: upgrade blubber buildkit and remove makefile [repos/cloud/toolforge/envvars-admission] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-admission/-/merge_requests/34 (https://phabricator.wikimedia.org/T321316 https://phabricator.wikimedia.org/T428698) (owner: 10raymond-ndibe) [16:17:47] (03update) 10fnegri: dev: upgrade blubber buildkit and remove makefile [repos/cloud/toolforge/envvars-admission] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-admission/-/merge_requests/34 (https://phabricator.wikimedia.org/T321316 https://phabricator.wikimedia.org/T428698) (owner: 10raymond-ndibe) [16:17:53] (03approved) 10fnegri: dev: upgrade blubber buildkit and remove makefile [repos/cloud/toolforge/envvars-admission] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-admission/-/merge_requests/34 (https://phabricator.wikimedia.org/T321316 https://phabricator.wikimedia.org/T428698) (owner: 10raymond-ndibe) [16:18:04] (03merge) 10fnegri: dev: upgrade blubber buildkit and remove makefile [repos/cloud/toolforge/envvars-admission] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-admission/-/merge_requests/34 (https://phabricator.wikimedia.org/T321316 https://phabricator.wikimedia.org/T428698) (owner: 10raymond-ndibe) [16:18:09] FIRING: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning [16:21:22] FIRING: [15x] HAProxyBackendUnavailable: HAProxy service cinder-api_backend backend cloudcontrol1011.private.eqiad.wikimedia.cloud is DOWN - 
https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable [16:21:27] PROBLEM - Host cloudcontrol1011 is DOWN: PING CRITICAL - Packet loss = 100% [16:22:10] FIRING: [2x] GaleraClusterSizeMismatch: Galera in eqiad1 has 2 nodes - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/GaleraClusterSizeMismatch - https://grafana.wikimedia.org/d/galera-cluster-summary/wmcs-openstack-eqiad-galera-cluster-summary - https://alerts.wikimedia.org/?q=alertname%3DGaleraClusterSizeMismatch [16:22:54] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (exit_code=0) on host 'cloudcontrol1011.eqiad.wmnet' (T429361) [16:22:55] RECOVERY - Host cloudcontrol1011 is UP: PING OK - Packet loss = 0%, RTA = 0.31 ms [16:23:25] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node on host 'cloudcontrol1007.eqiad.wmnet' (T429361) [16:23:59] 06cloud-services-team, 10Quarry, 07patch-welcome, 07Plural-Support: "1 rows" should be "1 row" - https://phabricator.wikimedia.org/T419564#12024990 (10Prakhar0804) a:05Nihar_Chakravarti→03Prakhar0804 [16:25:14] (03update) 10group_203_bot_3c0afd0d9fd9529f3b7bc7e69a4a3bce: envvars-admission: bump to 0.0.41-20260616161814-962b885d [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1301 (https://phabricator.wikimedia.org/T321316 https://phabricator.wikimedia.org/T428698) [16:25:18] (03open) 10group_203_bot_3c0afd0d9fd9529f3b7bc7e69a4a3bce: envvars-admission: bump to 0.0.41-20260616161814-962b885d [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1301 (https://phabricator.wikimedia.org/T321316 https://phabricator.wikimedia.org/T428698) [16:26:22] FIRING: [15x] HAProxyBackendUnavailable: HAProxy service cinder-api_backend backend 
cloudcontrol1011.private.eqiad.wikimedia.cloud is DOWN - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable [16:27:10] RESOLVED: GaleraClusterSizeMismatch: Galera in eqiad1 has 2 nodes - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/GaleraClusterSizeMismatch - https://grafana.wikimedia.org/d/galera-cluster-summary/wmcs-openstack-eqiad-galera-cluster-summary - https://alerts.wikimedia.org/?q=alertname%3DGaleraClusterSizeMismatch [16:29:08] 10Toolforge, 06tools-platform-team: [jobs-cli] emits a warning to re-create valid jobs - https://phabricator.wikimedia.org/T429231#12025036 (10dcaro) This looks very very suspicious: ` except requests.exceptions.HTTPError as e: if (not e.response) or e.response.status_code != 401: # You... [16:31:22] RESOLVED: [15x] HAProxyBackendUnavailable: HAProxy service cinder-api_backend backend cloudcontrol1011.private.eqiad.wikimedia.cloud is DOWN - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable [16:34:17] FIRING: JobUnavailable: Reduced availability for job maintain_dbusers_eqiad in cloud@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [16:34:45] PROBLEM - Host cloudcontrol1007 is DOWN: PING CRITICAL - Packet loss = 100% [16:35:40] FIRING: [2x] GaleraClusterSizeMismatch: Galera in eqiad1 has 2 nodes - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/GaleraClusterSizeMismatch - https://grafana.wikimedia.org/d/galera-cluster-summary/wmcs-openstack-eqiad-galera-cluster-summary - https://alerts.wikimedia.org/?q=alertname%3DGaleraClusterSizeMismatch [16:35:56] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (exit_code=0) on host 
'cloudcontrol1007.eqiad.wmnet' (T429361) [16:36:13] RECOVERY - Host cloudcontrol1007 is UP: PING OK - Packet loss = 0%, RTA = 0.23 ms [16:36:13] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node on host 'cloudcontrol1006.eqiad.wmnet' (T429361) [16:36:52] FIRING: [30x] HAProxyBackendUnavailable: HAProxy service cinder-api_backend backend cloudcontrol1007.private.eqiad.wikimedia.cloud is DOWN - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable [16:37:25] FIRING: [2x] GaleraClusterSizeMismatch: Galera in eqiad1 has 2 nodes - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/GaleraClusterSizeMismatch - https://grafana.wikimedia.org/d/galera-cluster-summary/wmcs-openstack-eqiad-galera-cluster-summary - https://alerts.wikimedia.org/?q=alertname%3DGaleraClusterSizeMismatch [16:40:40] RESOLVED: [2x] GaleraClusterSizeMismatch: Galera in eqiad1 has 2 nodes - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/GaleraClusterSizeMismatch - https://grafana.wikimedia.org/d/galera-cluster-summary/wmcs-openstack-eqiad-galera-cluster-summary - https://alerts.wikimedia.org/?q=alertname%3DGaleraClusterSizeMismatch [16:41:52] RESOLVED: [30x] HAProxyBackendUnavailable: HAProxy service cinder-api_backend backend cloudcontrol1007.private.eqiad.wikimedia.cloud is DOWN - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable [16:44:17] RESOLVED: JobUnavailable: Reduced availability for job maintain_dbusers_eqiad in cloud@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [16:46:03] PROBLEM - SSH on cloudcontrol1006 is CRITICAL: CRITICAL - Socket timeout after 10 seconds 
https://wikitech.wikimedia.org/wiki/SSH/monitoring [16:46:03] PROBLEM - Memcached on cloudcontrol1006 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Memcached [16:46:52] FIRING: [30x] HAProxyBackendUnavailable: HAProxy service cinder-api_backend backend cloudcontrol1006.private.eqiad.wikimedia.cloud is DOWN - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable [16:46:59] RECOVERY - SSH on cloudcontrol1006 is OK: SSH OK - OpenSSH_10.0p2 Debian-7+deb13u4 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring [16:47:00] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (exit_code=0) on host 'cloudcontrol1006.eqiad.wmnet' (T429361) [16:47:01] RECOVERY - Memcached on cloudcontrol1006 is OK: TCP OK - 7.191 second response time on 10.64.150.6 port 11211 https://wikitech.wikimedia.org/wiki/Memcached [16:47:25] FIRING: [3x] GaleraClusterSizeMismatch: Galera in eqiad1 has 2 nodes - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/GaleraClusterSizeMismatch - https://grafana.wikimedia.org/d/galera-cluster-summary/wmcs-openstack-eqiad-galera-cluster-summary - https://alerts.wikimedia.org/?q=alertname%3DGaleraClusterSizeMismatch [16:47:40] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.upgrade_osds (T428385) [16:47:41] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.ceph.upgrade_osds (exit_code=99) (T428385) [16:51:27] PROBLEM - Host cloudcephosd1038 is DOWN: PING CRITICAL - Packet loss = 100% [16:51:33] Toolforge, tools-platform-team: [jobs-cli] emits a warning to re-create valid jobs - https://phabricator.wikimedia.org/T429231#12025214 (dcaro) Found a couple issues (not sure this is the same exact case though): * If the harbor images are deleted, then the storage still has the `aliases`, but the resol...
[16:51:52] FIRING: [30x] HAProxyBackendUnavailable: HAProxy service cinder-api_backend backend cloudcontrol1006.private.eqiad.wikimedia.cloud is DOWN - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable [16:52:07] RESOLVED: [30x] HAProxyBackendUnavailable: HAProxy service cinder-api_backend backend cloudcontrol1006.private.eqiad.wikimedia.cloud is DOWN - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable [16:52:25] RESOLVED: [3x] GaleraClusterSizeMismatch: Galera in eqiad1 has 2 nodes - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/GaleraClusterSizeMismatch - https://grafana.wikimedia.org/d/galera-cluster-summary/wmcs-openstack-eqiad-galera-cluster-summary - https://alerts.wikimedia.org/?q=alertname%3DGaleraClusterSizeMismatch [16:52:56] FIRING: [2x] SystemdUnitDown: The service unit ceph-osd@281.service is in failed status on host cloudcephosd1037. 
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [16:54:55] RECOVERY - Host cloudcephosd1038 is UP: PING OK - Packet loss = 0%, RTA = 0.24 ms [16:58:26] !log tools.cluebotng-trainer Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/27634024663 (https://github.com/cluebotng/component-configs/commits/8a9410bd29ead90dcd183405a7c6ef35c0d898af) [16:58:28] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-trainer/SAL [16:59:39] cloud-services-team, Quarry, patch-welcome, Plural-Support: "1 rows" should be "1 row" - https://phabricator.wikimedia.org/T419564#12025301 (Nihar_Chakravarti) a:Prakhar0804→Nihar_Chakravarti [17:00:45] PROBLEM - Host cloudcephosd1037 is DOWN: PING CRITICAL - Packet loss = 100% [17:00:54] Cloud-VPS (Quota-requests), Release-Engineering-Team (Radar): Quota increase request for zuul - https://phabricator.wikimedia.org/T428515#12025314 (dduvall) Open→Resolved a:dduvall Sorry, yes! We have a working setup that uses CAPI and a custom SANs entry for haproxy to give us a publicly... [17:01:04] !log tools.cluebotng Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/27634022645 (https://github.com/cluebotng/component-configs/commits/8a9410bd29ead90dcd183405a7c6ef35c0d898af) [17:01:05] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng/SAL [17:01:14] !log tools.cluebotng-review Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/27634025367 (https://github.com/cluebotng/component-configs/commits/8a9410bd29ead90dcd183405a7c6ef35c0d898af) [17:01:14] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-review/SAL [17:02:56] RESOLVED: [2x] SystemdUnitDown: The service unit ceph-osd@281.service is in failed status on host cloudcephosd1037.
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [17:03:06] Toolforge, tools-platform-team: [jobs-cli] emits a warning to re-create valid jobs - https://phabricator.wikimedia.org/T429231#12025332 (DamianZaremba) This might also explain why some deployments are restarting jobs that have no changes e.g. ` Deployment ID: 20260616-165737-i4ix4154s0 Builds: cluebot... [17:03:07] Cloud-VPS (Quota-requests), Release-Engineering-Team (Radar): Quota increase request for zuul - https://phabricator.wikimedia.org/T428515#12025333 (Volans) Great, thanks for closing the loop. [17:04:13] RECOVERY - Host cloudcephosd1037 is UP: PING OK - Packet loss = 0%, RTA = 0.26 ms [17:10:33] !log tools.cluebotng-trainer Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/27634738411 (https://github.com/cluebotng/component-configs/commits/a1821e6d783f3a9822825a8325623592bd1754ea) [17:10:35] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-trainer/SAL [17:13:29] !log tools.cluebotng Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/27634737553 (https://github.com/cluebotng/component-configs/commits/a1821e6d783f3a9822825a8325623592bd1754ea) [17:13:30] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng/SAL [17:14:17] cloud-services-team, Toolforge: [logs-api] failing to return logs for job - https://phabricator.wikimedia.org/T429265#12025378 (DamianZaremba) Is not happening currently (logs are being returned), so there appears to be some transient error that was present overnight [17:14:39] !log tools.cluebotng-review Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/27634737527 (https://github.com/cluebotng/component-configs/commits/a1821e6d783f3a9822825a8325623592bd1754ea) [17:14:40] Logged the message at
https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-review/SAL [17:15:06] Toolforge, tools-platform-team: [logs-api] failing to return logs for job - https://phabricator.wikimedia.org/T429265#12025382 (aputhin) [17:19:12] (open) dcaro: core.images._get_harbor_images: fix wrong stale cache check [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/310 [17:19:25] (open) dcaro: core.images.from_short_name_or_url: add aliases for unknown image [repos/cloud/toolforge/jobs-api] (fix_harbor_cache) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/311 [17:19:41] (update) dcaro: core.images.from_short_name_or_url: add aliases for unknown image [repos/cloud/toolforge/jobs-api] (fix_harbor_cache) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/311 [17:29:57] !log tools.cluebotng-trainer Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/27635849962 (https://github.com/cluebotng/component-configs/commits/893b9730fafbbd5762e1473e55ad6dbe8cb9db24) [17:30:04] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-trainer/SAL [17:32:51] !log tools.cluebotng Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/27635849976 (https://github.com/cluebotng/component-configs/commits/893b9730fafbbd5762e1473e55ad6dbe8cb9db24) [17:32:53] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng/SAL [17:34:07] !log tools.cluebotng-review Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/27635849959 (https://github.com/cluebotng/component-configs/commits/893b9730fafbbd5762e1473e55ad6dbe8cb9db24) [17:34:08] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-review/SAL [17:42:41] cloud-services-team, Data-Services, Data-Engineering, Privacy
Engineering, Patch-For-Review: Add global_edit_count to wikireplicas - https://phabricator.wikimedia.org/T344108#12025675 (SD0001) Hi @aranyap, similar to T402145, could you review and approve this one too? For reference, this is... [17:55:20] cloud-services-team: cloudceph "HEALTH_WARN 17 OSD(s) experiencing slow operations in BlueStore" - https://phabricator.wikimedia.org/T429387 (Andrew) NEW [17:58:56] cloud-services-team: cloudceph "HEALTH_WARN 17 OSD(s) experiencing slow operations in BlueStore" - https://phabricator.wikimedia.org/T429387#12025829 (Andrew) I am trying compacting and restarting on 4 of the osds: 126, 127, 129, 131. ` ceph tell osd.126 compact systemctl restart ceph-osd@126.service ` i... [18:00:29] (open) dcaro: _update_storage_job_status_from_runtime: exclude image.exists/status [repos/cloud/toolforge/jobs-api] (fix_aliases) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/312 [18:01:43] cloud-services-team: cloudceph "HEALTH_WARN 17 OSD(s) experiencing slow operations in BlueStore" - https://phabricator.wikimedia.org/T429387#12025853 (Andrew) going to compact and restart four more: 119, 120, 121, 123. [18:04:48] cloud-services-team: cloudceph "HEALTH_WARN 17 OSD(s) experiencing slow operations in BlueStore" - https://phabricator.wikimedia.org/T429387#12025875 (Andrew) 131 is back in the list: ` osd.131 observed slow operation indications in BlueStore ` so compact/restart is at least not the 100% solution [18:09:56] FIRING: SystemdUnitDown: The service unit ceph-osd@119.service is in failed status on host cloudcephosd1019.
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcephosd1019 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [18:12:28] cloud-services-team, Toolforge, Documentation, good first task: Create a new doc about managing and sharing files in Toolforge - https://phabricator.wikimedia.org/T347753#12025899 (apaskulin) Thanks for your work on this, @Tejinderk.2004! The draft you created looks great! The structure does a gr... [18:45:06] Cloud-VPS (Project-requests): Request creation of eduwikihubstaging VPS project - https://phabricator.wikimedia.org/T429032#12026014 (Ragesoss) Confirming, I have access and I see the others on the list. Thanks @Volans ! [18:48:09] !log tools.cluebotng Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/27639941201 (https://github.com/cluebotng/component-configs/commits/893b9730fafbbd5762e1473e55ad6dbe8cb9db24) [18:48:12] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng/SAL [18:48:59] (update) dcaro: core.images._get_harbor_images: fix wrong stale cache check [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/310 [18:54:25] cloud-services-team: cloudceph "HEALTH_WARN 17 OSD(s) experiencing slow operations in BlueStore" - https://phabricator.wikimedia.org/T429387#12026048 (Andrew) Further evidence that this is a reporting issue: ` osd.128 observed slow operation indications in BlueStore ` ` ceph daemon osd.128 dump_ops_in_fl...
[18:55:53] (merge) dcaro: basic_system: increase the inotify limits [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/326 [19:06:27] Cloud-VPS (Debian Bullseye Deprecation), tools-platform-team: Reach out to Cloud VPS project maintainers about Debian Bullseye deprecation - https://phabricator.wikimedia.org/T428196#12026103 (komla) [19:22:04] cloud-services-team, Cloud-VPS (Debian Bullseye Deprecation): Replace https://os-deprecation.toolforge.org/ with something that handles in-place upgraded hosts - https://phabricator.wikimedia.org/T428919#12026243 (Andrew) >>! In T428919#12024397, @aputhin wrote: >> So if a host was built off of bullseye... [19:29:30] cloud-services-team, Quarry, patch-welcome, Plural-Support: "1 rows" should be "1 row" - https://phabricator.wikimedia.org/T419564#12026274 (Nihar_Chakravarti) I investigated the issue and submitted a fix: https://github.com/toolforge/quarry/pull/98 The change updates the result count label to... [19:36:57] cloud-services-team: cloudceph "HEALTH_WARN 17 OSD(s) experiencing slow operations in BlueStore" - https://phabricator.wikimedia.org/T429387#12026329 (Andrew) From https://forum.proxmox.com/threads/ceph-after-upgrade-to-18-2-6-observed-slow-operation-indications-in-bluestore.165741/ I propose we make this le... [19:42:36] cloud-services-team, Toolforge, Documentation, good first task: Create a new doc about managing and sharing files in Toolforge - https://phabricator.wikimedia.org/T347753#12026359 (BLiviero-WMF) Thank you for capturing this information! A small suggestion is, a section about backing up files el... [19:42:37] cloud-services-team, Toolforge: toolforge webservice logs output has encoding issues and gets truncated compared to kubectl logs - https://phabricator.wikimedia.org/T429028#12026360 (diegodlh) Open→Declined Oh sure, no worries. I can perfectly wait. Closing this then. Thanks!
[19:49:56] FIRING: [4x] SystemdUnitDown: The service unit ceph-osd@111.service is in failed status on host cloudcephosd1018. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [19:55:39] RESOLVED: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning [20:04:09] FIRING: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning [20:11:41] RESOLVED: [4x] SystemdUnitDown: The service unit ceph-osd@111.service is in failed status on host cloudcephosd1018. 
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [20:15:32] FIRING: NfsAlmostFull: The NFS drive is over 85% capacity (currently 85.28%) at host paws-nfs-1 in project paws - https://prometheus-alerts.wmcloud.org/?q=alertname%3DNfsAlmostFull [21:15:34] FIRING: DiskSpace: Disk space cloudbackup1004:9100:/srv 6.996% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace [21:56:42] Tool-wmf-openapi-linter, MediaWiki-REST-API, MW-Interfaces-Team (MWI-Sprint-36 (2026-06-16 to 2026-06-30)): Support requestBody content examples in MediaWiki REST Framework OAD generation - https://phabricator.wikimedia.org/T427360#12026952 (AGhirelli-WMF) a:AGhirelli-WMF