[02:56:14] <wikibugs>	 cloud-services-team, Data-Services, BetaFeatures, Data-Engineering, and 5 others: Create view for betafeatures_user_counts table in wiki replicas - https://phabricator.wikimedia.org/T402145#12021914 (aranyap) Hi @SD0001 , is there any possible way for this data to be correlated back to an individ...
[07:49:58] <logmsgbot_cloud>	 !log volans@cloudcumin1001 eduwikihubstaging START - Cookbook wmcs.vps.create_project for project eduwikihubstaging in eqiad1 (T429032)
[07:49:59] <stashbot>	 volans@cloudcumin1001: Unknown project "eduwikihubstaging"
[07:49:59] <stashbot>	 T429032: Request creation of eduwikihubstaging VPS project - https://phabricator.wikimedia.org/T429032
[07:50:40] <wikibugs>	 (open) group_199_bot_f98be072172e323ae6d1441939d3e461: projects: added project eduwikihubstaging [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/317 (https://phabricator.wikimedia.org/T429032)
[07:52:33] <wikibugs>	 (approved) volans: projects: added project eduwikihubstaging [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/317 (https://phabricator.wikimedia.org/T429032) (owner: group_199_bot_f98be072172e323ae6d1441939d3e461)
[07:53:10] <wikibugs>	 (merge) volans: projects: added project eduwikihubstaging [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/317 (https://phabricator.wikimedia.org/T429032) (owner: group_199_bot_f98be072172e323ae6d1441939d3e461)
[07:54:55] <logmsgbot_cloud>	 !log volans@cloudcumin1001 eduwikihubstaging END (PASS) - Cookbook wmcs.vps.create_project (exit_code=0) for project eduwikihubstaging in eqiad1 (T429032)
[07:54:55] <stashbot>	 volans@cloudcumin1001: Unknown project "eduwikihubstaging"
[07:58:10] <wikibugs>	 Cloud-VPS (Project-requests), Patch-For-Review: Request creation of eduwikihubstaging VPS project - https://phabricator.wikimedia.org/T429032#12022306 (Volans) Open→Resolved a:Volans Project created. @Ederporto, @Ragesoss , @JGonzalez_EdWH, please verify that you have access and also make...
[07:59:17] <wikibugs>	 Cloud-VPS (Quota-requests), Release-Engineering-Team (Radar): Quota increase request for zuul - https://phabricator.wikimedia.org/T428515#12022311 (Volans) @dduvall did you have a chance to test if it works without it?
[08:27:41] <wikibugs>	 cloud-services-team (FY2025/2026-Q3-Q4), Toolforge, Patch-For-Review: [toolforge.infra] Replace Toolschecker alerts with Prometheus based ones - https://phabricator.wikimedia.org/T313030#12022399 (fgiunchedi) Open→Resolved a:fgiunchedi This is done -- toolschecker has been deprecated...
[09:00:08] <wikibugs>	 cloud-services-team, Cloud-VPS, tools-infrastructure-team: puppet failing on metricsinfra-prometheus-2.metricsinfra due to expired crl - https://phabricator.wikimedia.org/T429298 (fgiunchedi) NEW
[09:03:43] <wikibugs>	 cloud-services-team, Cloud-VPS, tools-infrastructure-team: puppet failing on metricsinfra-prometheus-2.metricsinfra due to expired crl - https://phabricator.wikimedia.org/T429298#12022559 (fgiunchedi) The fix is trivial (remove the cached crl and run puppet agent again) though it is silly we have to...
[09:08:39] <wikibugs>	 Tool-wmf-openapi-linter, MW-Interfaces-Team, OKR-Work: Improve linting - info.license - https://phabricator.wikimedia.org/T422908#12022572 (KBach)
[09:08:41] <wikibugs>	 Tool-wmf-openapi-linter, Tech-Docs-Team, [MWI] FY2025-26 Q4, OKR-Work: [Hypothesis] 5.2.5b: Productionalize API spec linting - https://phabricator.wikimedia.org/T422476#12022574 (KBach)
[09:08:42] <wikibugs>	 Tool-wmf-openapi-linter, [MWI] FY2025-26 Q4, Epic, OKR-Work: [5.2.5b Epic] Implement and improve linter rules - https://phabricator.wikimedia.org/T422479#12022573 (KBach)
[09:14:19] <wikibugs>	 cloud-services-team (FY2025/2026-Q3-Q4), Cloud-VPS, Toolforge, Observability-Alerting, and 3 others: Move WMCS off of Icinga and introduce alertmanager - https://phabricator.wikimedia.org/T328502#12022588 (fgiunchedi) Thank you for the comments / feedback !  >>! In T328502#12020109, @Andrew wrote...
[09:17:40] <wikibugs>	 Toolforge (Push-to-Deploy), tools-platform-team: [jobs-api] Create storage layer, and save business models in persistent storage - https://phabricator.wikimedia.org/T359650#12022617 (fnegri) a:dcaro
[09:37:32] <wmcs-alerts>	 RESOLVED: PuppetAgentStaleLastRun: Last Puppet run was over 24 hours ago on instance metricsinfra-prometheus-2 in project metricsinfra   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[09:44:27] <wikibugs>	 Tool-wmf-openapi-linter, MediaWiki-REST-API, MW-1.47-notes (1.47.0-wmf.7; 2026-06-16), MW-Interfaces-Team (MWI-Sprint-35 (2026-06-02 to 2026-06-16)): Support root-level externalDocs in MediaWiki REST Framework OAD generation - https://phabricator.wikimedia.org/T427356#12022766 (KBach)
[09:55:17] <wikibugs>	 cloud-services-team, Data-Services, BetaFeatures, Data-Engineering, and 5 others: Create view for betafeatures_user_counts table in wiki replicas - https://phabricator.wikimedia.org/T402145#12022802 (SD0001) The table doesn't have any user identifiers, so I don't see a way.
[11:11:57] <wikibugs>	 Tool-wmf-openapi-linter, MW-Interfaces-Team (MWI-Sprint-36 (2026-06-16 to 2026-06-30)), OKR-Work: Publish the Wikimedia Spectral ruleset to NPM - https://phabricator.wikimedia.org/T425627#12023106 (HCoplin-WMF)
[11:18:00] <wikibugs>	 Toolforge, tools-platform-team: [builds-cli] does not output valid json when there's no builds - https://phabricator.wikimedia.org/T429229#12023141 (aputhin)
[11:22:48] <wikibugs>	 Tool-wmf-openapi-linter, MediaWiki-REST-API, MW-Interfaces-Team (MWI-Sprint-36 (2026-06-16 to 2026-06-30)): Generate a valid semver info.version for the default MediaWiki REST module - https://phabricator.wikimedia.org/T427359#12023157 (KBach)
[11:23:18] <wikibugs>	 Tool-wmf-openapi-linter, MW-Interfaces-Team (MWI-Sprint-36 (2026-06-16 to 2026-06-30)), OKR-Work: Fix top-level issues in the MediaWiki REST API OAD - https://phabricator.wikimedia.org/T428147#12023159 (KBach)
[11:23:33] <wikibugs>	 Tool-wmf-openapi-linter, MW-Interfaces-Team (MWI-Sprint-36 (2026-06-16 to 2026-06-30)), OKR-Work: Fix type, media type, and array issues in the MediaWiki REST API OAD - https://phabricator.wikimedia.org/T428149#12023160 (KBach)
[11:57:48] <wm-bot2>	 !log tools.cluebotng-monitoring Deployment failed: https://github.com/cluebotng/component-configs/actions/runs/27615771202 (https://github.com/cluebotng/component-configs/commits/38c93594afd6d57d84fde7b3d927c1c73e0a18f9)
[11:57:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-monitoring/SAL
[11:58:40] <wm-bot2>	 !log tools.cluebotng-staging Deployment failed: https://github.com/cluebotng/component-configs/actions/runs/27615770915 (https://github.com/cluebotng/component-configs/commits/38c93594afd6d57d84fde7b3d927c1c73e0a18f9)
[11:58:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-staging/SAL
[12:00:00] <wm-bot2>	 !log tools.toolforge-functional-runner Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/27615770896 (https://github.com/cluebotng/component-configs/commits/38c93594afd6d57d84fde7b3d927c1c73e0a18f9)
[12:00:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.toolforge-functional-runner/SAL
[12:00:21] <wm-bot2>	 !log tools.cluebotng-trainer Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/27615771226 (https://github.com/cluebotng/component-configs/commits/38c93594afd6d57d84fde7b3d927c1c73e0a18f9)
[12:00:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-trainer/SAL
[12:00:23] <wm-bot2>	 !log tools.cluebot3 Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/27615770905 (https://github.com/cluebotng/component-configs/commits/38c93594afd6d57d84fde7b3d927c1c73e0a18f9)
[12:00:23] <wm-bot2>	 !log tools.cluebotng-editsets Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/27615770932 (https://github.com/cluebotng/component-configs/commits/38c93594afd6d57d84fde7b3d927c1c73e0a18f9)
[12:00:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebot3/SAL
[12:00:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-editsets/SAL
[12:00:39] <wm-bot2>	 !log tools.cluebot-syncer Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/27615770917 (https://github.com/cluebotng/component-configs/commits/38c93594afd6d57d84fde7b3d927c1c73e0a18f9)
[12:00:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebot-syncer/SAL
[12:02:12] <wm-bot2>	 !log tools.cluebotng-monitoring Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/27615852687 (https://github.com/cluebotng/component-configs/commits/c9746412542e654e3ae7337a570727eb5d7195d6)
[12:02:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-monitoring/SAL
[12:02:14] <wm-bot2>	 !log tools.cluebotng-review Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/27615770870 (https://github.com/cluebotng/component-configs/commits/38c93594afd6d57d84fde7b3d927c1c73e0a18f9)
[12:02:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-review/SAL
[12:04:12] <wm-bot2>	 !log tools.cluebotng Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/27615770908 (https://github.com/cluebotng/component-configs/commits/38c93594afd6d57d84fde7b3d927c1c73e0a18f9)
[12:04:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng/SAL
[12:09:10] <wikibugs>	 cloud-services-team, Toolforge: [jobs-cli] emits a warning to re-create valid jobs - https://phabricator.wikimedia.org/T429231#12023363 (dcaro) I have been able to reproduce, but not with a simple job (probably related to some of the options there):  ` local.tf-test@toolslocal:~$ toolforge components dep...
[12:15:00] <wikibugs>	 cloud-services-team, Toolforge: [jobs-cli] emits a warning to re-create valid jobs - https://phabricator.wikimedia.org/T429231#12023391 (DamianZaremba) Yes, I also noticed that with another account yesterday that doesn't emit the warning.  One way to trigger it appears to be setting `replicas` to `1` (th...
[12:17:20] <wm-bot2>	 !log tools.toolforge-functional-runner Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/27616804754 (https://github.com/cluebotng/component-configs/commits/af2bc530f42e1d932c23f8bea9c8d3686c343d10)
[12:17:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.toolforge-functional-runner/SAL
[12:18:37] <wm-bot2>	 !log tools.cluebotng-staging Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/27616804682 (https://github.com/cluebotng/component-configs/commits/af2bc530f42e1d932c23f8bea9c8d3686c343d10)
[12:18:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-staging/SAL
[12:20:02] <wm-bot2>	 !log tools.cluebotng-monitoring Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/27616668922 (https://github.com/cluebotng/component-configs/commits/1bfb0487d9d8b3b799e0d9a30269ca314046feba)
[12:20:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-monitoring/SAL
[12:23:55] <wm-bot2>	 !log tools.cluebotng-monitoring Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/27616804749 (https://github.com/cluebotng/component-configs/commits/af2bc530f42e1d932c23f8bea9c8d3686c343d10)
[12:23:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-monitoring/SAL
[12:30:42] <wm-bot2>	 !log tools.cluebotng-monitoring Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/27617061512 (https://github.com/cluebotng/component-configs/commits/4505fa21f50c25710eaf644eecdc0233f726c946)
[12:30:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-monitoring/SAL
[12:32:47] <wikibugs>	 Cloud-VPS (Project-requests): Request creation of eduwikihubstaging VPS project - https://phabricator.wikimedia.org/T429032#12023493 (JGonzalez_EdWH) Thank you @Volans :-)
[12:34:26] <wm-bot2>	 !log tools.cluebotng-monitoring Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/27617439342 (https://github.com/cluebotng/component-configs/commits/25d55050b7cd85801d40ba66cf804d6d4a3b51bd)
[12:34:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-monitoring/SAL
[12:40:40] <wikibugs>	 Cloud-VPS, tools-infrastructure-team, Infrastructure-Foundations, netops, SRE: Upgrade cloudsw1-e4-eqiad - https://phabricator.wikimedia.org/T429013#12023550 (fgiunchedi) Indeed the recent rack redundancy testing has shown we are resilient to the loss of one rack, for all hosts but cloudvirts...
[12:41:13] <wikibugs>	 Cloud-VPS, tools-infrastructure-team, Infrastructure-Foundations, netops, SRE: Upgrade cloudsw1-f4-eqiad - https://phabricator.wikimedia.org/T429014#12023556 (fgiunchedi) See my update at https://phabricator.wikimedia.org/T429013#12023550 since it applies equally here
[12:51:04] <wikibugs>	 cloud-services-team, Cloud-VPS: Add/review linting of flavors in opentofu - https://phabricator.wikimedia.org/T429336 (Andrew) NEW
[12:51:34] <wm-bot2>	 !log tools.cluebotng-monitoring Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/27618465342 (https://github.com/cluebotng/component-configs/commits/9b3fcbbd479041d764ab53f0a74027e2df6df4f6)
[12:51:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-monitoring/SAL
[13:02:04] <wikibugs>	 Toolforge, tools-platform-team: [jobs-cli] emits a warning to re-create valid jobs - https://phabricator.wikimedia.org/T429231#12023692 (aputhin)
[13:02:08] <wikibugs>	 Toolforge (Push-to-Deploy), tools-platform-team: [jobs-api] Create storage layer, and save business models in persistent storage - https://phabricator.wikimedia.org/T359650#12023693 (aputhin)
[13:02:09] <wikibugs>	 Toolforge, tools-platform-team: [jobs-cli] emits a warning to re-create valid jobs - https://phabricator.wikimedia.org/T429231#12023694 (aputhin)
[13:02:50] <wikibugs>	 tools-platform-team: [toolforge-weld] Fails to publish to pypi - https://phabricator.wikimedia.org/T429241#12023696 (dcaro) p:Medium→Triage
[13:04:16] <wikibugs>	 Toolforge, tools-platform-team: [jobs-cli] emits a warning to re-create valid jobs - https://phabricator.wikimedia.org/T429231#12023704 (dcaro) Open→In progress
[13:05:24] <wikibugs>	 Toolforge, tools-platform-team, Patch-For-Review: [docs] update all readmes with the same deployment docs - https://phabricator.wikimedia.org/T407477#12023711 (dcaro) a:dcaro→None
[13:18:43] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudservices.safe_reboot on hosts matched by 'P{O:wmcs::openstack::codfw1dev::services}'
[13:20:17] <jinxer-wm>	 FIRING: [2x] JobUnavailable: Reduced availability for job pdns in cloud@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[13:23:05] <jinxer-wm>	 FIRING: [2x] HostBGPDown: BGP session for cloudservices2004-dev (172.20.5.8) is down - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status  - https://alerts.wikimedia.org/?q=alertname%3DHostBGPDown
[13:24:18] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudservices.safe_reboot (exit_code=0) on hosts matched by 'P{O:wmcs::openstack::codfw1dev::services}'
[13:25:17] <jinxer-wm>	 RESOLVED: [3x] JobUnavailable: Reduced availability for job openstack in cloud@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[13:28:05] <jinxer-wm>	 RESOLVED: [2x] HostBGPDown: BGP session for cloudservices2004-dev (172.20.5.8) is down - https://wikitech.wikimedia.org/wiki/Network_monitoring#BGP_status  - https://alerts.wikimedia.org/?q=alertname%3DHostBGPDown
[13:52:38] <wikibugs>	 tools-platform-team: [lima-kilo] ansible deprecation error - https://phabricator.wikimedia.org/T429343 (dcaro) NEW
[13:52:54] <wikibugs>	 Toolforge, tools-platform-team: [lima-kilo] ansible deprecation error - https://phabricator.wikimedia.org/T429343#12023917 (dcaro)
[13:53:07] <wikibugs>	 Toolforge, tools-platform-team: [toolforge-weld] Fails to publish to pypi - https://phabricator.wikimedia.org/T429241#12023918 (dcaro)
[13:54:42] <wikibugs>	 Toolforge, tools-platform-team: [lima-kilo] ansible deprecation errors - https://phabricator.wikimedia.org/T429343#12023936 (dcaro)
[14:22:27] <wikibugs>	 tools-platform-team: [lima-kilo] increase the number of inotify watches - https://phabricator.wikimedia.org/T429347 (dcaro) NEW
[14:22:40] <wikibugs>	 Toolforge, tools-platform-team: [lima-kilo] increase the number of inotify watches - https://phabricator.wikimedia.org/T429347#12024108 (dcaro)
[14:22:48] <wikibugs>	 Toolforge, tools-platform-team: [lima-kilo] increase the number of inotify watches - https://phabricator.wikimedia.org/T429347#12024116 (dcaro) p:Triage→Medium
[14:23:53] <wikibugs>	 (open) dcaro: basic_system: increase the inotify limits [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/326
[14:25:12] <wikibugs>	 (update) dcaro: basic_system: increase the inotify limits [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/326
[14:35:07] <wikibugs>	 Toolforge, tools-platform-team: [jobs-cli] emits a warning to re-create valid jobs - https://phabricator.wikimedia.org/T429231#12024181 (dcaro) It stopped happening locally :/, looking
[14:46:45] <wikibugs>	 Toolforge, tools-platform-team: [jobs-cli] emits a warning to re-create valid jobs - https://phabricator.wikimedia.org/T429231#12024253 (dcaro) It comes and goes it seems :/, no deployments in between?  ` tools.cluebotng-staging@tools-bastion-15:~$ toolforge jobs list +----------------------+------------...
[14:50:09] <jinxer-wm>	 FIRING: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[14:50:13] <wikibugs>	 Toolforge, tools-platform-team: [jobs-cli] emits a warning to re-create valid jobs - https://phabricator.wikimedia.org/T429231#12024309 (dcaro) And after restarting the jobs-api pod that was getting hit when the warning happened, made it not happen anymore :/, maybe there's some caching going on somewhere?
[14:52:18] <wikibugs>	 Cloud-VPS, tools-infrastructure-team: Openstack uwsgi logging to '<frozen importlib._bootstrap>.log' - https://phabricator.wikimedia.org/T422830#12024314 (Andrew) March 17th was when we upgraded to Flamingo.
[14:53:56] <wikibugs>	 Toolforge, tools-platform-team: [jobs-cli] emits a warning to re-create valid jobs - https://phabricator.wikimedia.org/T429231#12024317 (dcaro) >>! In T429231#12024309, @dcaro wrote: > And after restarting the jobs-api pod that was getting hit when the warning happened, made it not happen anymore :/, may...
[15:02:56] <jinxer-wm>	 FIRING: SystemdUnitDown: The service unit ceph-osd@289.service is in failed status on host cloudcephosd1038. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcephosd1038 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[15:07:56] <jinxer-wm>	 FIRING: [2x] SystemdUnitDown: The service unit ceph-osd@281.service is in failed status on host cloudcephosd1037. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown  - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[15:10:06] <wikibugs>	 cloud-services-team, Cloud-VPS (Debian Bullseye Deprecation): Replace https://os-deprecation.toolforge.org/ with something that handles in-place upgraded hosts - https://phabricator.wikimedia.org/T428919#12024397 (aputhin) > So if a host was built off of bullseye but then upgraded in place to Bookworm it...
[15:14:02] <wikibugs>	 cloud-services-team, Toolforge, Documentation, good first task: Create a new doc about managing and sharing files in Toolforge - https://phabricator.wikimedia.org/T347753#12024404 (Tejinderk.2004) Hey there ! here's an update on the task.  Created an initial draft of the new page: https://wikitec...
[15:15:50] <wikibugs>	 (update) renovatebot: Update Rust crate itertools to 0.15.0 [toolforge-repos/dewiki-rangeblock] - https://gitlab.wikimedia.org/toolforge-repos/dewiki-rangeblock/-/merge_requests/16
[15:15:59] <wikibugs>	 (open) renovatebot: Update Rust crate itertools to 0.15.0 [toolforge-repos/dewiki-rangeblock] - https://gitlab.wikimedia.org/toolforge-repos/dewiki-rangeblock/-/merge_requests/16
[15:17:57] <wikibugs>	 Cloud-VPS (Debian Bullseye Deprecation), tools-platform-team: Reach out to Cloud VPS project maintainers about Debian Bullseye deprecation - https://phabricator.wikimedia.org/T428196#12024452 (aputhin) Open→In progress
[15:18:20] <wikibugs>	 Cloud-VPS (Debian Bullseye Deprecation), tools-platform-team: Reach out to Cloud VPS project maintainers about Debian Bullseye deprecation - https://phabricator.wikimedia.org/T428196#12024455 (aputhin) p:Triage→High
[15:20:17] <wikibugs>	 Tool-wmf-openapi-linter, MW-Interfaces-Team (MWI-Sprint-36 (2026-06-16 to 2026-06-30)), OKR-Work: MediaWiki REST API appropriately defines array types for parameters - https://phabricator.wikimedia.org/T409113#12024472 (OWresch-WMF)
[15:21:46] <wikibugs>	 Tool-wmf-openapi-linter, MediaWiki-REST-API, MW-Interfaces-Team (MWI-Sprint-36 (2026-06-16 to 2026-06-30)): Support requestBody content examples in MediaWiki REST Framework OAD generation - https://phabricator.wikimedia.org/T427360#12024496 (OWresch-WMF)
[15:30:59] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node on host 'cloudcontrol2006-dev.codfw.wmnet' (T429361)
[15:40:54] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (exit_code=0) on host 'cloudcontrol2006-dev.codfw.wmnet' (T429361)
[15:41:50] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node on host 'cloudcontrol2005-dev.codfw.wmnet' (T429361)
[15:45:09] <jinxer-wm>	 RESOLVED: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[15:45:20] <wikibugs>	 cloud-services-team, Data-Services, BetaFeatures, Data-Engineering, and 5 others: Create view for betafeatures_user_counts table in wiki replicas - https://phabricator.wikimedia.org/T402145#12024691 (aranyap) Thank you @SD0001 . The Product Safety & Integrity team has reviewed this table and is c...
[15:47:08] <wikibugs>	 (open) fnegri: Release version 0.2.0 [repos/cloud/wikireplicas-utils] - https://gitlab.wikimedia.org/repos/cloud/wikireplicas-utils/-/merge_requests/13 (https://phabricator.wikimedia.org/T351637)
[15:51:27] <wikibugs>	 Toolforge, tools-platform-team: [jobs-cli] emits a warning to re-create valid jobs - https://phabricator.wikimedia.org/T429231#12024726 (dcaro) Finally got in in lima-kilo :), and this time was able to fetch the logs: ` ... [2026-06-16 15:47:14,711] p17:t140338572805824 /app/tjf/core/core.py:50:_update_s...
[15:52:20] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (exit_code=0) on host 'cloudcontrol2005-dev.codfw.wmnet' (T429361)
[15:52:33] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node on host 'cloudcontrol2011-dev.codfw.wmnet' (T429361)
[15:52:33] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (exit_code=99) on host 'cloudcontrol2011-dev.codfw.wmnet' (T429361)
[15:52:39] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node on host 'cloudcontrol2010-dev.codfw.wmnet' (T429361)
[15:54:17] <jinxer-wm>	 FIRING: JobUnavailable: Reduced availability for job openstack in cloud@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[15:58:09] <jinxer-wm>	 FIRING: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[15:58:40] <wikibugs>	 (update) fnegri: Release version 0.2.0 [repos/cloud/wikireplicas-utils] - https://gitlab.wikimedia.org/repos/cloud/wikireplicas-utils/-/merge_requests/13 (https://phabricator.wikimedia.org/T351637)
[15:59:17] <jinxer-wm>	 RESOLVED: JobUnavailable: Reduced availability for job openstack in cloud@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[16:01:45] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (exit_code=0) on host 'cloudcontrol2010-dev.codfw.wmnet' (T429361)
[16:03:05] <wikibugs>	 Toolforge (Push-to-Deploy), tools-platform-team: [components-api] Queue builds when the build queue is full - https://phabricator.wikimedia.org/T402568#12024856 (dcaro) a:Raymond_Ndibe→dcaro
[16:03:09] <jinxer-wm>	 RESOLVED: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[16:08:47] <wikibugs>	 cloud-services-team, Cloud-VPS: [ceph] Enable encrypted client traffic for the ceph clusters - https://phabricator.wikimedia.org/T294432#12024907 (Volans) It seems that the current WMCS cluster does support encryption (`secure` mode) already. I guess probably a by-product of some cluster upgrades?  ` $ s...
[16:10:43] <wikibugs>	 (update) fnegri: basic_system: increase the inotify limits [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/326 (owner: dcaro)
[16:10:45] <wikibugs>	 (update) fnegri: basic_system: increase the inotify limits [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/326 (owner: dcaro)
[16:10:49] <wikibugs>	 (approved) fnegri: basic_system: increase the inotify limits [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/326 (owner: dcaro)
[16:12:11] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node on host 'cloudcontrol1011.eqiad.wmnet' (T429361)
[16:14:07] <wikibugs>	 cloud-services-team, Quarry, patch-welcome, Plural-Support: "1 rows" should be "1 row" - https://phabricator.wikimedia.org/T419564#12024934 (Nihar_Chakravarti) a:Nihar_Chakravarti
[16:17:34] <wikibugs>	 cloud-services-team, Toolforge, patch-welcome: Toolforge 404 handler displays wrong tool name from ?url= query parameter - https://phabricator.wikimedia.org/T421542#12024961 (Shadabgdg) a:Shadabgdg
[16:17:46] <wikibugs>	 (update) fnegri: dev: upgrade blubber buildkit and remove makefile [repos/cloud/toolforge/envvars-admission] - https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-admission/-/merge_requests/34 (https://phabricator.wikimedia.org/T321316 https://phabricator.wikimedia.org/T428698) (owner: raymond-ndibe)
[16:17:47] <wikibugs>	 (update) fnegri: dev: upgrade blubber buildkit and remove makefile [repos/cloud/toolforge/envvars-admission] - https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-admission/-/merge_requests/34 (https://phabricator.wikimedia.org/T321316 https://phabricator.wikimedia.org/T428698) (owner: raymond-ndibe)
[16:17:53] <wikibugs>	 (approved) fnegri: dev: upgrade blubber buildkit and remove makefile [repos/cloud/toolforge/envvars-admission] - https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-admission/-/merge_requests/34 (https://phabricator.wikimedia.org/T321316 https://phabricator.wikimedia.org/T428698) (owner: raymond-ndibe)
[16:18:04] <wikibugs>	 (merge) fnegri: dev: upgrade blubber buildkit and remove makefile [repos/cloud/toolforge/envvars-admission] - https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-admission/-/merge_requests/34 (https://phabricator.wikimedia.org/T321316 https://phabricator.wikimedia.org/T428698) (owner: raymond-ndibe)
[16:18:09] <jinxer-wm>	 FIRING: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[16:21:22] <jinxer-wm>	 FIRING: [15x] HAProxyBackendUnavailable: HAProxy service cinder-api_backend backend cloudcontrol1011.private.eqiad.wikimedia.cloud is DOWN - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[16:21:27] <icinga-wm>	 PROBLEM - Host cloudcontrol1011 is DOWN: PING CRITICAL - Packet loss = 100%
[16:22:10] <jinxer-wm>	 FIRING: [2x] GaleraClusterSizeMismatch: Galera in eqiad1 has 2 nodes - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/GaleraClusterSizeMismatch - https://grafana.wikimedia.org/d/galera-cluster-summary/wmcs-openstack-eqiad-galera-cluster-summary - https://alerts.wikimedia.org/?q=alertname%3DGaleraClusterSizeMismatch
[16:22:54] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (exit_code=0) on host 'cloudcontrol1011.eqiad.wmnet' (T429361)
[16:22:55] <icinga-wm>	 RECOVERY - Host cloudcontrol1011 is UP: PING OK - Packet loss = 0%, RTA = 0.31 ms
[16:23:25] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node on host 'cloudcontrol1007.eqiad.wmnet' (T429361)
[16:23:59] <wikibugs>	 cloud-services-team, Quarry, patch-welcome, Plural-Support: "1 rows" should be "1 row" - https://phabricator.wikimedia.org/T419564#12024990 (Prakhar0804) a:Nihar_Chakravarti→Prakhar0804
[16:25:14] <wikibugs>	 (update) group_203_bot_3c0afd0d9fd9529f3b7bc7e69a4a3bce: envvars-admission: bump to 0.0.41-20260616161814-962b885d [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1301 (https://phabricator.wikimedia.org/T321316 https://phabricator.wikimedia.org/T428698)
[16:25:18] <wikibugs>	 (open) group_203_bot_3c0afd0d9fd9529f3b7bc7e69a4a3bce: envvars-admission: bump to 0.0.41-20260616161814-962b885d [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1301 (https://phabricator.wikimedia.org/T321316 https://phabricator.wikimedia.org/T428698)
[16:26:22] <jinxer-wm>	 FIRING: [15x] HAProxyBackendUnavailable: HAProxy service cinder-api_backend backend cloudcontrol1011.private.eqiad.wikimedia.cloud is DOWN - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[16:27:10] <jinxer-wm>	 RESOLVED: GaleraClusterSizeMismatch: Galera in eqiad1 has 2 nodes - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/GaleraClusterSizeMismatch - https://grafana.wikimedia.org/d/galera-cluster-summary/wmcs-openstack-eqiad-galera-cluster-summary - https://alerts.wikimedia.org/?q=alertname%3DGaleraClusterSizeMismatch
[16:29:08] <wikibugs>	 Toolforge, tools-platform-team: [jobs-cli] emits a warning to re-create valid jobs - https://phabricator.wikimedia.org/T429231#12025036 (dcaro) This looks very very suspicious: `     except requests.exceptions.HTTPError as e:         if (not e.response) or e.response.status_code != 401:             # You...
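The quoted comment is truncated, so the exact concern is not visible here. One plausible reading is that `requests.Response` objects are falsy for 4xx/5xx status codes (their truthiness follows `Response.ok`), so `not e.response` would be true even when a 401 response is present. The sketch below illustrates that reading with a stand-in class so it runs without the `requests` library; `FakeResponse`, `quoted_check`, and `explicit_check` are hypothetical names, not code from jobs-cli.

```python
class FakeResponse:
    """Hypothetical stand-in that mirrors how requests.Response evaluates as a bool."""

    def __init__(self, status_code):
        self.status_code = status_code

    def __bool__(self):
        # Assumption being illustrated: error responses (4xx/5xx) are falsy.
        return self.status_code < 400


def quoted_check(response):
    """The condition as quoted in the task comment."""
    return (not response) or response.status_code != 401


def explicit_check(response):
    """A variant that tests for a missing response explicitly."""
    return response is None or response.status_code != 401


unauthorized = FakeResponse(401)
print(quoted_check(unauthorized))    # the 401 is treated like "no response"
print(explicit_check(unauthorized))  # the 401 is recognised as a 401
```

Under this reading, the quoted condition can never single out a 401, because any error response already satisfies its first half.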
[16:31:22] <jinxer-wm>	 RESOLVED: [15x] HAProxyBackendUnavailable: HAProxy service cinder-api_backend backend cloudcontrol1011.private.eqiad.wikimedia.cloud is DOWN - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[16:34:17] <jinxer-wm>	 FIRING: JobUnavailable: Reduced availability for job maintain_dbusers_eqiad in cloud@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[16:34:45] <icinga-wm>	 PROBLEM - Host cloudcontrol1007 is DOWN: PING CRITICAL - Packet loss = 100%
[16:35:40] <jinxer-wm>	 FIRING: [2x] GaleraClusterSizeMismatch: Galera in eqiad1 has 2 nodes - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/GaleraClusterSizeMismatch - https://grafana.wikimedia.org/d/galera-cluster-summary/wmcs-openstack-eqiad-galera-cluster-summary - https://alerts.wikimedia.org/?q=alertname%3DGaleraClusterSizeMismatch
[16:35:56] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (exit_code=0) on host 'cloudcontrol1007.eqiad.wmnet' (T429361)
[16:36:13] <icinga-wm>	 RECOVERY - Host cloudcontrol1007 is UP: PING OK - Packet loss = 0%, RTA = 0.23 ms
[16:36:13] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node on host 'cloudcontrol1006.eqiad.wmnet' (T429361)
[16:36:52] <jinxer-wm>	 FIRING: [30x] HAProxyBackendUnavailable: HAProxy service cinder-api_backend backend cloudcontrol1007.private.eqiad.wikimedia.cloud is DOWN - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[16:37:25] <jinxer-wm>	 FIRING: [2x] GaleraClusterSizeMismatch: Galera in eqiad1 has 2 nodes - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/GaleraClusterSizeMismatch - https://grafana.wikimedia.org/d/galera-cluster-summary/wmcs-openstack-eqiad-galera-cluster-summary - https://alerts.wikimedia.org/?q=alertname%3DGaleraClusterSizeMismatch
[16:40:40] <jinxer-wm>	 RESOLVED: [2x] GaleraClusterSizeMismatch: Galera in eqiad1 has 2 nodes - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/GaleraClusterSizeMismatch - https://grafana.wikimedia.org/d/galera-cluster-summary/wmcs-openstack-eqiad-galera-cluster-summary - https://alerts.wikimedia.org/?q=alertname%3DGaleraClusterSizeMismatch
[16:41:52] <jinxer-wm>	 RESOLVED: [30x] HAProxyBackendUnavailable: HAProxy service cinder-api_backend backend cloudcontrol1007.private.eqiad.wikimedia.cloud is DOWN - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[16:44:17] <jinxer-wm>	 RESOLVED: JobUnavailable: Reduced availability for job maintain_dbusers_eqiad in cloud@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[16:46:03] <icinga-wm>	 PROBLEM - SSH on cloudcontrol1006 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/SSH/monitoring
[16:46:03] <icinga-wm>	 PROBLEM - Memcached on cloudcontrol1006 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Memcached
[16:46:52] <jinxer-wm>	 FIRING: [30x] HAProxyBackendUnavailable: HAProxy service cinder-api_backend backend cloudcontrol1006.private.eqiad.wikimedia.cloud is DOWN - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[16:46:59] <icinga-wm>	 RECOVERY - SSH on cloudcontrol1006 is OK: SSH OK - OpenSSH_10.0p2 Debian-7+deb13u4 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring
[16:47:00] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (exit_code=0) on host 'cloudcontrol1006.eqiad.wmnet' (T429361)
[16:47:01] <icinga-wm>	 RECOVERY - Memcached on cloudcontrol1006 is OK: TCP OK - 7.191 second response time on 10.64.150.6 port 11211 https://wikitech.wikimedia.org/wiki/Memcached
[16:47:25] <jinxer-wm>	 FIRING: [3x] GaleraClusterSizeMismatch: Galera in eqiad1 has 2 nodes - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/GaleraClusterSizeMismatch - https://grafana.wikimedia.org/d/galera-cluster-summary/wmcs-openstack-eqiad-galera-cluster-summary - https://alerts.wikimedia.org/?q=alertname%3DGaleraClusterSizeMismatch
[16:47:40] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.upgrade_osds (T428385)
[16:47:41] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.ceph.upgrade_osds (exit_code=99) (T428385)
[16:51:27] <icinga-wm>	 PROBLEM - Host cloudcephosd1038 is DOWN: PING CRITICAL - Packet loss = 100%
[16:51:33] <wikibugs>	 Toolforge, tools-platform-team: [jobs-cli] emits a warning to re-create valid jobs - https://phabricator.wikimedia.org/T429231#12025214 (dcaro) Found a couple issues (not sure this is the same exact case though):  * If the harbor images are deleted, then the storage still has the `aliases`, but the resol...
[16:51:52] <jinxer-wm>	 FIRING: [30x] HAProxyBackendUnavailable: HAProxy service cinder-api_backend backend cloudcontrol1006.private.eqiad.wikimedia.cloud is DOWN - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[16:52:07] <jinxer-wm>	 RESOLVED: [30x] HAProxyBackendUnavailable: HAProxy service cinder-api_backend backend cloudcontrol1006.private.eqiad.wikimedia.cloud is DOWN - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[16:52:25] <jinxer-wm>	 RESOLVED: [3x] GaleraClusterSizeMismatch: Galera in eqiad1 has 2 nodes - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/GaleraClusterSizeMismatch - https://grafana.wikimedia.org/d/galera-cluster-summary/wmcs-openstack-eqiad-galera-cluster-summary - https://alerts.wikimedia.org/?q=alertname%3DGaleraClusterSizeMismatch
[16:52:56] <jinxer-wm>	 FIRING: [2x] SystemdUnitDown: The service unit ceph-osd@281.service is in failed status on host cloudcephosd1037. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown  - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[16:54:55] <icinga-wm>	 RECOVERY - Host cloudcephosd1038 is UP: PING OK - Packet loss = 0%, RTA = 0.24 ms
[16:58:26] <wm-bot2>	 !log tools.cluebotng-trainer Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/27634024663 (https://github.com/cluebotng/component-configs/commits/8a9410bd29ead90dcd183405a7c6ef35c0d898af)
[16:58:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-trainer/SAL
[16:59:39] <wikibugs>	 cloud-services-team, Quarry, patch-welcome, Plural-Support: "1 rows" should be "1 row" - https://phabricator.wikimedia.org/T419564#12025301 (Nihar_Chakravarti) a:Prakhar0804→Nihar_Chakravarti
[17:00:45] <icinga-wm>	 PROBLEM - Host cloudcephosd1037 is DOWN: PING CRITICAL - Packet loss = 100%
[17:00:54] <wikibugs>	 Cloud-VPS (Quota-requests), Release-Engineering-Team (Radar): Quota increase request for zuul - https://phabricator.wikimedia.org/T428515#12025314 (dduvall) Open→Resolved a:dduvall Sorry, yes! We have a working setup that uses CAPI and a custom SANs entry for haproxy to give us a publicly...
[17:01:04] <wm-bot2>	 !log tools.cluebotng Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/27634022645 (https://github.com/cluebotng/component-configs/commits/8a9410bd29ead90dcd183405a7c6ef35c0d898af)
[17:01:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng/SAL
[17:01:14] <wm-bot2>	 !log tools.cluebotng-review Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/27634025367 (https://github.com/cluebotng/component-configs/commits/8a9410bd29ead90dcd183405a7c6ef35c0d898af)
[17:01:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-review/SAL
[17:02:56] <jinxer-wm>	 RESOLVED: [2x] SystemdUnitDown: The service unit ceph-osd@281.service is in failed status on host cloudcephosd1037. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown  - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[17:03:06] <wikibugs>	 Toolforge, tools-platform-team: [jobs-cli] emits a warning to re-create valid jobs - https://phabricator.wikimedia.org/T429231#12025332 (DamianZaremba) This might also explain why some deployments are restarting jobs that have no changes e.g. ` Deployment ID: 20260616-165737-i4ix4154s0  Builds:   cluebot...
[17:03:07] <wikibugs>	 Cloud-VPS (Quota-requests), Release-Engineering-Team (Radar): Quota increase request for zuul - https://phabricator.wikimedia.org/T428515#12025333 (Volans) Great, thanks for closing the loop.
[17:04:13] <icinga-wm>	 RECOVERY - Host cloudcephosd1037 is UP: PING OK - Packet loss = 0%, RTA = 0.26 ms
[17:10:33] <wm-bot2>	 !log tools.cluebotng-trainer Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/27634738411 (https://github.com/cluebotng/component-configs/commits/a1821e6d783f3a9822825a8325623592bd1754ea)
[17:10:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-trainer/SAL
[17:13:29] <wm-bot2>	 !log tools.cluebotng Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/27634737553 (https://github.com/cluebotng/component-configs/commits/a1821e6d783f3a9822825a8325623592bd1754ea)
[17:13:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng/SAL
[17:14:17] <wikibugs>	 cloud-services-team, Toolforge: [logs-api] failing to return logs for job - https://phabricator.wikimedia.org/T429265#12025378 (DamianZaremba) Is not happening currently (logs are being returned), so there appears to be some transient error that was present overnight
[17:14:39] <wm-bot2>	 !log tools.cluebotng-review Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/27634737527 (https://github.com/cluebotng/component-configs/commits/a1821e6d783f3a9822825a8325623592bd1754ea)
[17:14:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-review/SAL
[17:15:06] <wikibugs>	 Toolforge, tools-platform-team: [logs-api] failing to return logs for job - https://phabricator.wikimedia.org/T429265#12025382 (aputhin)
[17:19:12] <wikibugs>	 (open) dcaro: core.images._get_harbor_images: fix wrong stale cache check [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/310
[17:19:25] <wikibugs>	 (open) dcaro: core.images.from_short_name_or_url: add aliases for unknown image [repos/cloud/toolforge/jobs-api] (fix_harbor_cache) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/311
[17:19:41] <wikibugs>	 (update) dcaro: core.images.from_short_name_or_url: add aliases for unknown image [repos/cloud/toolforge/jobs-api] (fix_harbor_cache) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/311
[17:29:57] <wm-bot2>	 !log tools.cluebotng-trainer Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/27635849962 (https://github.com/cluebotng/component-configs/commits/893b9730fafbbd5762e1473e55ad6dbe8cb9db24)
[17:30:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-trainer/SAL
[17:32:51] <wm-bot2>	 !log tools.cluebotng Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/27635849976 (https://github.com/cluebotng/component-configs/commits/893b9730fafbbd5762e1473e55ad6dbe8cb9db24)
[17:32:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng/SAL
[17:34:07] <wm-bot2>	 !log tools.cluebotng-review Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/27635849959 (https://github.com/cluebotng/component-configs/commits/893b9730fafbbd5762e1473e55ad6dbe8cb9db24)
[17:34:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-review/SAL
[17:42:41] <wikibugs>	 cloud-services-team, Data-Services, Data-Engineering, Privacy Engineering, Patch-For-Review: Add global_edit_count to wikireplicas - https://phabricator.wikimedia.org/T344108#12025675 (SD0001) Hi @aranyap, similar to T402145, could you review and approve this one too?   For reference, this is...
[17:55:20] <wikibugs>	 cloud-services-team: cloudceph "HEALTH_WARN 17 OSD(s) experiencing slow operations in BlueStore" - https://phabricator.wikimedia.org/T429387 (Andrew) NEW
[17:58:56] <wikibugs>	 cloud-services-team: cloudceph "HEALTH_WARN 17 OSD(s) experiencing slow operations in BlueStore" - https://phabricator.wikimedia.org/T429387#12025829 (Andrew) I am trying compacting and restarting on 4 of the osds: 126, 127, 129, 131.   ` ceph tell osd.126 compact systemctl restart ceph-osd@126.service  `  i...
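The compact-then-restart pair quoted above for osd.126 can be written out for all four OSD ids named in the comment. This is a minimal dry-run sketch: it only builds and prints the command strings and executes nothing. `compact_and_restart_commands` is a hypothetical helper, and the sketch does not model which host each `systemctl restart` would have to run on.

```python
def compact_and_restart_commands(osd_ids):
    """Return the compact and restart command strings for each OSD id, in order."""
    commands = []
    for osd_id in osd_ids:
        # Same two commands as quoted in the task comment, with the id substituted.
        commands.append(f"ceph tell osd.{osd_id} compact")
        commands.append(f"systemctl restart ceph-osd@{osd_id}.service")
    return commands


# OSD ids taken from the comment above; nothing is executed here.
for command in compact_and_restart_commands([126, 127, 129, 131]):
    print(command)
```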
[18:00:29] <wikibugs>	 (open) dcaro: _update_storage_job_status_from_runtime: exclude image.exists/status [repos/cloud/toolforge/jobs-api] (fix_aliases) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/312
[18:01:43] <wikibugs>	 cloud-services-team: cloudceph "HEALTH_WARN 17 OSD(s) experiencing slow operations in BlueStore" - https://phabricator.wikimedia.org/T429387#12025853 (Andrew) going to compact and restart four more:  119, 120, 121, 123.
[18:04:48] <wikibugs>	 cloud-services-team: cloudceph "HEALTH_WARN 17 OSD(s) experiencing slow operations in BlueStore" - https://phabricator.wikimedia.org/T429387#12025875 (Andrew) 131 is back in the list:   ` osd.131 observed slow operation indications in BlueStore `  so compact/restart is at least not the 100% solution
[18:09:56] <jinxer-wm>	 FIRING: SystemdUnitDown: The service unit ceph-osd@119.service is in failed status on host cloudcephosd1019. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcephosd1019 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[18:12:28] <wikibugs>	 cloud-services-team, Toolforge, Documentation, good first task: Create a new doc about managing and sharing files in Toolforge - https://phabricator.wikimedia.org/T347753#12025899 (apaskulin) Thanks for your work on this, @Tejinderk.2004! The draft you created looks great! The structure does a gr...
[18:45:06] <wikibugs>	 Cloud-VPS (Project-requests): Request creation of eduwikihubstaging VPS project - https://phabricator.wikimedia.org/T429032#12026014 (Ragesoss) Confirming, I have access and I see the others on the list. Thanks @Volans !
[18:48:09] <wm-bot2>	 !log tools.cluebotng Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/27639941201 (https://github.com/cluebotng/component-configs/commits/893b9730fafbbd5762e1473e55ad6dbe8cb9db24)
[18:48:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng/SAL
[18:48:59] <wikibugs>	 (update) dcaro: core.images._get_harbor_images: fix wrong stale cache check [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/310
[18:54:25] <wikibugs>	 cloud-services-team: cloudceph "HEALTH_WARN 17 OSD(s) experiencing slow operations in BlueStore" - https://phabricator.wikimedia.org/T429387#12026048 (Andrew) Further evidence that this is a reporting issue:   ` osd.128 observed slow operation indications in BlueStore `   ` ceph daemon osd.128 dump_ops_in_fl...
[18:55:53] <wikibugs>	 (merge) dcaro: basic_system: increase the inotify limits [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/326
[19:06:27] <wikibugs>	 Cloud-VPS (Debian Bullseye Deprecation), tools-platform-team: Reach out to Cloud VPS project maintainers about Debian Bullseye deprecation - https://phabricator.wikimedia.org/T428196#12026103 (komla)
[19:22:04] <wikibugs>	 cloud-services-team, Cloud-VPS (Debian Bullseye Deprecation): Replace https://os-deprecation.toolforge.org/ with something that handles in-place upgraded hosts - https://phabricator.wikimedia.org/T428919#12026243 (Andrew) >>! In T428919#12024397, @aputhin wrote: >> So if a host was built off of bullseye...
[19:29:30] <wikibugs>	 cloud-services-team, Quarry, patch-welcome, Plural-Support: "1 rows" should be "1 row" - https://phabricator.wikimedia.org/T419564#12026274 (Nihar_Chakravarti) I investigated the issue and submitted a fix:  https://github.com/toolforge/quarry/pull/98  The change updates the result count label to...
[19:36:57] <wikibugs>	 cloud-services-team: cloudceph "HEALTH_WARN 17 OSD(s) experiencing slow operations in BlueStore" - https://phabricator.wikimedia.org/T429387#12026329 (Andrew) From https://forum.proxmox.com/threads/ceph-after-upgrade-to-18-2-6-observed-slow-operation-indications-in-bluestore.165741/ I propose we make this le...
[19:42:36] <wikibugs>	 cloud-services-team, Toolforge, Documentation, good first task: Create a new doc about managing and sharing files in Toolforge - https://phabricator.wikimedia.org/T347753#12026359 (BLiviero-WMF) Thank you for capturing this information!   A small suggestion is, a section about backing up files el...
[19:42:37] <wikibugs>	 cloud-services-team, Toolforge: toolforge webservice logs output has encoding issues and gets truncated compared to kubectl logs - https://phabricator.wikimedia.org/T429028#12026360 (diegodlh) Open→Declined Oh sure, no worries. I can perfectly wait. Closing this then. Thanks!
[19:49:56] <jinxer-wm>	 FIRING: [4x] SystemdUnitDown: The service unit ceph-osd@111.service is in failed status on host cloudcephosd1018. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown  - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[19:55:39] <jinxer-wm>	 RESOLVED: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[20:04:09] <jinxer-wm>	 FIRING: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[20:11:41] <jinxer-wm>	 RESOLVED: [4x] SystemdUnitDown: The service unit ceph-osd@111.service is in failed status on host cloudcephosd1018. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown  - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[20:15:32] <wmcs-alerts>	 FIRING: NfsAlmostFull: The NFS drive is over 85% capacity (currently 85.28%) at host paws-nfs-1 in project paws   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DNfsAlmostFull
[21:15:34] <jinxer-wm>	 FIRING: DiskSpace: Disk space cloudbackup1004:9100:/srv 6.996% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[21:56:42] <wikibugs>	 Tool-wmf-openapi-linter, MediaWiki-REST-API, MW-Interfaces-Team (MWI-Sprint-36 (2026-06-16 to 2026-06-30)): Support requestBody content examples in MediaWiki REST Framework OAD generation - https://phabricator.wikimedia.org/T427360#12026952 (AGhirelli-WMF) a:AGhirelli-WMF