[00:13:12] Tool-wmf-openapi-linter, MediaWiki-REST-API, MW-Interfaces-Team, Patch-For-Review: OpenAPI linting: Add missing OpenAPI spec elements to Response Components - https://phabricator.wikimedia.org/T422739#11802614 (Mooeypoo) I've added an example and descriptions to the response component. The OpenAP...
[00:31:12] cloud-services-team, Toolforge, tools-platform-team: Running dotnet job fails on Toolforge because "24" builder stack changed the compiled binary output path - https://phabricator.wikimedia.org/T422224#11802630 (Hawkeye7) Yes! It is working! Thank you!
[00:31:55] FIRING: ToolforgeKubernetesCapacity: Kubernetes cluster k8s.tools.eqiad1.wikimedia.cloud:6443 in risk of running out of memory - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesCapacity - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesCapacity
[00:50:31] Tool-refill: delete 33,000 unnecessary front end files from production - https://phabricator.wikimedia.org/T422441#11802664 (Novem_Linguae) Today I started redirecting absolutely all web traffic to /ng/index.html (the Vue app), using a URL rewrite rule in .lighttpd.conf. This should be a good "scream test" t...
[00:51:42] Tool-refill: refill back end (python) continuous integration is broken - https://phabricator.wikimedia.org/T367026#11802665 (Novem_Linguae)
[00:58:11] Tool-refill: add a language picker to preferences - https://phabricator.wikimedia.org/T422772 (Novem_Linguae) NEW
[00:59:35] Tool-refill: figure out if we're still using internationalization/localization - https://phabricator.wikimedia.org/T422440#11802690 (Novem_Linguae) Open→Resolved a:Novem_Linguae We completed "figure out if we're still using internationalization/localization", so marking this as resolved. I fi...
[01:56:55] RESOLVED: ToolforgeKubernetesCapacity: Kubernetes cluster k8s.tools.eqiad1.wikimedia.cloud:6443 in risk of running out of memory - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesCapacity - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesCapacity
[02:30:55] FIRING: ToolforgeKubernetesCapacity: Kubernetes cluster k8s.tools.eqiad1.wikimedia.cloud:6443 in risk of running out of memory - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesCapacity - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesCapacity
[04:40:55] RESOLVED: ToolforgeKubernetesCapacity: Kubernetes cluster k8s.tools.eqiad1.wikimedia.cloud:6443 in risk of running out of memory - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesCapacity - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesCapacity
[05:19:52] cloud-services-team (FY2025/2026-Q3-Q4), Data-Services, Data-Persistence: Install a clouddb hosts with Debian Trixie - https://phabricator.wikimedia.org/T415165#11802846 (Marostegui) @fnegri can we resume this and reimage one host with Trixie?
[05:20:27] cloud-services-team (FY2025/2026-Q3-Q4), Data-Services, Data-Persistence: Install a clouddb hosts with Debian Trixie - https://phabricator.wikimedia.org/T415165#11802860 (Marostegui) If possible clouddb1015 would be nice T422777
[05:21:26] (open) samwilson: Check temp file before adding picture to Zip file [toolforge-repos/wsexport] - https://gitlab.wikimedia.org/toolforge-repos/wsexport/-/merge_requests/8 (https://phabricator.wikimedia.org/T415479)
[05:49:20] cloud-services-team, Toolforge, Release-Engineering-Team, GitLab (Integrations): gitlab-webhooks build fails on "The runtime.txt file isn't supported" - https://phabricator.wikimedia.org/T422734#11802933 (dcaro) Made a note in https://wikitech.wikimedia.org/wiki/Help:Toolforge/Building_contai...
[06:02:17] (open) dcaro: igress-nginx: upgrade to 1.14.5 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1216
[06:03:29] !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component ingress-nginx
[06:03:34] !log dcaro@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component ingress-nginx
[06:04:29] !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component ingress-nginx
[06:06:44] !log dcaro@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-nginx
[06:06:52] !log dcaro@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component ingress-nginx
[06:08:18] FIRING: IngressPodMisplaced: ingress-nginx-gen2 pod misplaced - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/IstioGatewayPodMisplaced - https://prometheus-alerts.wmcloud.org/?q=alertname%3DIngressPodMisplaced
[06:12:05] !log dcaro@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component ingress-nginx
[06:12:51] (update) dcaro: igress-nginx: upgrade to 1.14.5 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1216
[06:15:14] (update) dcaro: igress-nginx: upgrade to 1.14.5 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1216
[06:16:28] !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component ingress-nginx
[06:18:54] !log dcaro@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-nginx
[06:18:59] !log dcaro@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component ingress-nginx
[06:23:18] FIRING: [3x] IngressPodMisplaced: ingress-nginx-gen2 pod misplaced - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/IstioGatewayPodMisplaced - https://prometheus-alerts.wmcloud.org/?q=alertname%3DIngressPodMisplaced
[06:24:13] !log dcaro@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component ingress-nginx
[06:28:18] !log dcaro@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component ingress-nginx
[06:28:18] FIRING: [6x] IngressPodMisplaced: ingress-nginx-gen2 pod misplaced - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/IstioGatewayPodMisplaced - https://prometheus-alerts.wmcloud.org/?q=alertname%3DIngressPodMisplaced
[06:29:55] !log dcaro@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-nginx
[06:30:57] (approved) dcaro: igress-nginx: upgrade to 1.14.5 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1216
[06:31:01] (merge) dcaro: igress-nginx: upgrade to 1.14.5 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1216
[06:33:18] RESOLVED: [6x] IngressPodMisplaced: ingress-nginx-gen2 pod misplaced - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/IstioGatewayPodMisplaced - https://prometheus-alerts.wmcloud.org/?q=alertname%3DIngressPodMisplaced
[07:25:18] Data-Services, tools-infrastructure-team, Datasets-General-or-Unknown, Patch-For-Review: Stop serving dumps.wikimedia.org port 80 - https://phabricator.wikimedia.org/T422672#11803016 (taavi) Open→Resolved
[07:28:27] VPS-project-Codesearch: Remove DarkVector from codesearch - https://phabricator.wikimedia.org/T407115#11803020 (lcawte) >>! In T407115#11276072, @jhsoby wrote: > The [[https://github.com/MWStake/nonwmf-skins/blob/master/README.mediawiki|README]] in the MWStake repo says "If the skin is no longer supported an...
[07:34:30] Tool-refill: create a deploy.sh script for the front end - https://phabricator.wikimedia.org/T422570#11803031 (Novem_Linguae) > you should probably not be installing things directly on the bastion -- instead, use a container like webservice node20 shell > I don't think Node has as many problems with it as Py...
[07:47:26] tools-platform-team: [Toolforge]: User Stats Baseline (Channels, Accounts, Activity - https://phabricator.wikimedia.org/T422783 (komla) NEW
[07:48:04] tools-platform-team: [Toolforge]: User Stats Baseline (Outreach Channels, Accounts, Activity) - https://phabricator.wikimedia.org/T422783#11803059 (komla)
[07:48:21] Toolforge, tools-platform-team: [Toolforge]: User Stats Baseline (Outreach Channels, Accounts, Activity) - https://phabricator.wikimedia.org/T422783#11803061 (fnegri)
[07:53:16] cloud-services-team (FY2025/2026-Q3-Q4), Data-Services, tools-platform-team, Patch-For-Review: [wikireplicas] add proper dry-run/diff mode to maintain-views - https://phabricator.wikimedia.org/T351637#11803071 (fnegri) I'm testing the patches above on `clouddb1017`, and this caused {T422779}. I d...
[08:30:21] cloud-services-team, Cloud-VPS, Patch-For-Review: Cloud init and unattended upgrades while bootstrapping Trixie VMs - https://phabricator.wikimedia.org/T422509#11803189 (fgiunchedi) Indeed under normal circumstances `cloud-init` will try to bring back puppet to 7 after the first puppet run (from modu...
[08:33:20] cloud-services-team, Cloud-VPS, Patch-For-Review: Cloud init and unattended upgrades while bootstrapping Trixie VMs - https://phabricator.wikimedia.org/T422509#11803196 (fgiunchedi) Alternatively we can ask unattended-upgrades to not do anything until cloud-init has finished, though I'd rather avoid...
[08:46:46] cloud-services-team, Cloud-VPS: wmcs cookbook "--project" arg is ambiguous, could mean project id or project name - https://phabricator.wikimedia.org/T422515#11803289 (fgiunchedi) The other aspect to consider, which was the culprit in this case, is OS_PROJECT_ID vs OS_PROJECT_NAME usage (+OS_PROJECT_DOMA...
[08:53:34] cloud-services-team (FY2025/2026-Q3-Q4), Data-Services, tools-platform-team, Data-Persistence: Install a clouddb hosts with Debian Trixie - https://phabricator.wikimedia.org/T415165#11803302 (fnegri) a:fnegri Sure, I can reimage clouddb1015 to Trixie.
[09:05:37] cloud-services-team, Cloud-VPS: wmcs cookbook "--project" arg is ambiguous, could mean project id or project name - https://phabricator.wikimedia.org/T422515#11803376 (dcaro) >>! In T422515#11803289, @fgiunchedi wrote: > The other aspect to consider, which was the culprit in this case, is OS_PROJECT_ID v...
[09:13:43] Tool-wmf-openapi-linter, MediaWiki-REST-API, MW-Interfaces-Team, Patch-For-Review: OpenAPI linting: Add missing OpenAPI spec elements to Response Components - https://phabricator.wikimedia.org/T422739#11803423 (KBach) It's not exactly a collision, but the example validation seems to need a bit mo...
[09:20:18] (update) dcaro: improve image parsing tests [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/282 (https://phabricator.wikimedia.org/T415322) (owner: raymond-ndibe)
[09:31:36] (update) fnegri: Add summary with counts [repos/cloud/wikireplicas-utils] - https://gitlab.wikimedia.org/repos/cloud/wikireplicas-utils/-/merge_requests/11 (https://phabricator.wikimedia.org/T351637)
[09:31:40] (update) fnegri: Add --diff-mode and remove --dry-run [repos/cloud/wikireplicas-utils] - https://gitlab.wikimedia.org/repos/cloud/wikireplicas-utils/-/merge_requests/10 (https://phabricator.wikimedia.org/T351637)
[09:31:42] (update) fnegri: Replace only views that need updating [repos/cloud/wikireplicas-utils] - https://gitlab.wikimedia.org/repos/cloud/wikireplicas-utils/-/merge_requests/9 (https://phabricator.wikimedia.org/T351637)
[09:35:41] (update) fnegri: Add summary with counts [repos/cloud/wikireplicas-utils] - https://gitlab.wikimedia.org/repos/cloud/wikireplicas-utils/-/merge_requests/11 (https://phabricator.wikimedia.org/T351637)
[09:35:45] (update) fnegri: Add --diff-mode and remove --dry-run [repos/cloud/wikireplicas-utils] - https://gitlab.wikimedia.org/repos/cloud/wikireplicas-utils/-/merge_requests/10 (https://phabricator.wikimedia.org/T351637)
[09:35:46] (update) fnegri: Replace only views that need updating [repos/cloud/wikireplicas-utils] - https://gitlab.wikimedia.org/repos/cloud/wikireplicas-utils/-/merge_requests/9 (https://phabricator.wikimedia.org/T351637)
[09:53:58] (update) dcaro: refactor image parsing and handling [repos/cloud/toolforge/jobs-api] (improve_image_parsing_tests) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/273 (https://phabricator.wikimedia.org/T415322) (owner: raymond-ndibe)
[09:57:59] (open) dcaro: build: add coverage report [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/283
[10:00:38] (update) dcaro: build: add coverage report [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/283
[10:04:55] (update) dcaro: build: add coverage report [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/283
[10:06:34] FIRING: DiskSpace: Disk space cloudbackup1004:9100:/srv 6.918% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[10:09:21] cloud-services-team, Cloud-VPS: Consider allowing cumin access to all Cloud VPS VMs - https://phabricator.wikimedia.org/T422801 (fgiunchedi) NEW
[10:15:57] cloud-services-team, Cloud-VPS: Consider allowing cumin access to all Cloud VPS VMs - https://phabricator.wikimedia.org/T422801#11803632 (fgiunchedi)
[10:20:46] Toolforge (Toolforge iteration 26), tools-platform-team, Patch-For-Review: [harbor,tools] Harbor object usage in S3 is steadily increasing - https://phabricator.wikimedia.org/T418528#11803637 (aputhin)
[10:21:02] Tool-clarity-tool: Add automatic title suggestions in Clarity Tool - https://phabricator.wikimedia.org/T415780#11803639 (Aklapper)
[10:21:25] (update) dcaro: build: add coverage report [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/283
[10:32:22] cloud-services-team (FY2025/2026-Q3-Q4), Data-Services, tools-platform-team, Patch-For-Review: [wikireplicas] add proper dry-run/diff mode to maintain-views - https://phabricator.wikimedia.org/T351637#11803689 (fnegri) I fixed the issue and it now completes successfully, and it does find some vie...
[10:47:51] Data-Services, tools-platform-team, Data-Persistence: [wikireplicas] Update grants for "maintainviews" user - https://phabricator.wikimedia.org/T422806 (fnegri) NEW
[10:48:02] Data-Services, tools-platform-team, Data-Persistence: [wikireplicas] Update grants for "maintainviews" user - https://phabricator.wikimedia.org/T422806#11803772 (fnegri) Open→In progress p:Triage→Medium
[10:58:22] Data-Services, tools-platform-team, Data-Persistence: [wikireplicas] Update grants for "maintainviews" user - https://phabricator.wikimedia.org/T422806#11803799 (Marostegui) We can probably start by running a pt-show-grants on each host, compare them and make sure they are all the same. Once done tha...
[11:00:28] (update) dcaro: build: add coverage report [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/283
[11:04:41] cloud-services-team (FY2025/2026-Q3-Q4), Toolforge, tools-platform-team: Support pre-built images on components-api - https://phabricator.wikimedia.org/T405262#11803814 (dcaro) >>! In T405262#11799849, @aputhin wrote: > Changing to medium prio after refinement. We need a bit more clarity on impact of...
[11:07:37] (open) dcaro: py3.13-trixie-tox: Add coverage gathering [repos/cloud/cicd/gitlab-ci] - https://gitlab.wikimedia.org/repos/cloud/cicd/gitlab-ci/-/merge_requests/83
[11:08:20] (update) dcaro: py3.13-trixie-tox: Add coverage gathering [repos/cloud/cicd/gitlab-ci] - https://gitlab.wikimedia.org/repos/cloud/cicd/gitlab-ci/-/merge_requests/83
[11:08:53] (update) dcaro: build: add coverage report [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/283
[11:10:17] (update) dcaro: build: add coverage report [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/283
[12:06:06] PROBLEM - Host clouddb1019 is DOWN: PING CRITICAL - Packet loss = 100%
[12:06:40] (update) dcaro: support --webservice option [repos/cloud/toolforge/jobs-cli] (publish_continuous_job_to_internet) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/143 (https://phabricator.wikimedia.org/T348755) (owner: raymond-ndibe)
[12:07:22] FIRING: [2x] HAProxyBackendUnavailable: HAProxy service wikireplica-db-analytics-s4 backend clouddb1019.eqiad.wmnet is DOWN - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[12:07:22] (update) dcaro: images: support harbor-based pre-built images [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/252 (https://phabricator.wikimedia.org/T409727)
[12:07:28] (approved) fnegri: build: add coverage report [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/283 (owner: dcaro)
[12:08:09] (update) dcaro: support publishing continuous jobs to the internet [repos/cloud/toolforge/jobs-cli] (refactor_job_payload) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/142 (https://phabricator.wikimedia.org/T388092) (owner: raymond-ndibe)
[12:09:24] cloud-services-team, Data-Services, Data-Persistence: clouddb1019 down - https://phabricator.wikimedia.org/T422813 (Marostegui) NEW
[12:10:36] cloud-services-team, Data-Services, Data-Persistence: clouddb1019 down - https://phabricator.wikimedia.org/T422813#11804002 (Marostegui) ` ------------------------------------------------------------------------------- Record: 8 Date/Time: 04/09/2026 12:05:09 Source: system Severity: C...
[12:11:19] cloud-services-team, Data-Services, DBA, DC-Ops, ops-eqiad: clouddb1019 down - https://phabricator.wikimedia.org/T422813#11804005 (Marostegui) #ops-eqiad can you check on site? The above errors seem HW related.
[12:17:22] RESOLVED: [2x] HAProxyBackendUnavailable: HAProxy service wikireplica-db-analytics-s4 backend clouddb1019.eqiad.wmnet is DOWN - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[12:22:11] (update) dcaro: [jobs-cli] refactor job payload [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/98 (https://phabricator.wikimedia.org/T389118 https://phabricator.wikimedia.org/T390136) (owner: raymond-ndibe)
[12:23:17] (approved) filippo: py3.13-trixie-tox: Add coverage gathering [repos/cloud/cicd/gitlab-ci] - https://gitlab.wikimedia.org/repos/cloud/cicd/gitlab-ci/-/merge_requests/83 (owner: dcaro)
[12:25:59] Cloud Services Proposals, cloud-services-team, Wikibase GraphQL: GraphQL frontend challenges: evaluate better integrations with PAWS (Jupyter) - https://phabricator.wikimedia.org/T422817 (valerio.bozzolan) NEW
[12:27:19] (merge) dcaro: py3.13-trixie-tox: Add coverage gathering [repos/cloud/cicd/gitlab-ci] - https://gitlab.wikimedia.org/repos/cloud/cicd/gitlab-ci/-/merge_requests/83
[12:27:28] (open) l10n-bot: Localisation updates from https://translatewiki.net. [toolforge-repos/wd-image-positions] - https://gitlab.wikimedia.org/toolforge-repos/wd-image-positions/-/merge_requests/67
[12:27:43] Cloud Services Proposals, cloud-services-team, Data-Engineering, Data-Engineering-Jupyter, Wikibase GraphQL: GraphQL frontend challenges: evaluate better integrations with PAWS (Jupyter) - https://phabricator.wikimedia.org/T422817#11804077 (valerio.bozzolan) Kindly adding under the attention...
[12:29:05] (approved) lucaswerkmeister: Localisation updates from https://translatewiki.net. [toolforge-repos/wd-image-positions] - https://gitlab.wikimedia.org/toolforge-repos/wd-image-positions/-/merge_requests/67 (owner: l10n-bot)
[12:29:10] (merge) lucaswerkmeister: Localisation updates from https://translatewiki.net. [toolforge-repos/wd-image-positions] - https://gitlab.wikimedia.org/toolforge-repos/wd-image-positions/-/merge_requests/67 (owner: l10n-bot)
[12:29:41] (CR) CI reject: [V:-1] Localisation updates from https://translatewiki.net. [labs/tools/weapon-of-mass-description] - https://gerrit.wikimedia.org/r/1269437 (owner: L10n-bot)
[12:31:44] Cloud Services Proposals, cloud-services-team, Data-Engineering, Data-Engineering-Jupyter, and 2 others: GraphQL frontend challenges: evaluate better integrations with PAWS (Jupyter) - https://phabricator.wikimedia.org/T422817#11804099 (valerio.bozzolan) If taken in consideration, a subtask could...
[12:35:07] Data-Services, tools-platform-team, Data-Persistence: [wikireplicas] Update grants for "maintainviews" user - https://phabricator.wikimedia.org/T422806#11804109 (fnegri) I did a cumin run: `lang=shell-session fnegri@cumin1003:~$ sudo cumin clouddb* 'for s in $(ls /run/mysqld/mysqld*sock); do pt-show...
[12:40:40] Data-Services, tools-platform-team, Data-Persistence: [wikireplicas] Update grants for "maintainviews" user - https://phabricator.wikimedia.org/T422806#11804121 (fnegri) Manual commands I'm running on the affected hosts to fix the double escape: `lang=mysql GRANT ALL PRIVILEGES ON `meta\_p`.* TO `ma...
[12:44:07] cloud-services-team, PAWS, Data-Engineering, Data-Engineering-Jupyter, and 2 others: GraphQL frontend challenges: evaluate better integrations with PAWS (Jupyter) - https://phabricator.wikimedia.org/T422817#11804131 (taavi)
[12:44:57] cloud-services-team, Data-Services, DBA, DC-Ops, and 2 others: clouddb1019 down - https://phabricator.wikimedia.org/T422813#11804137 (Jclark-ctr) a:Jclark-ctr
[12:51:26] (open) raymond-ndibe: images::from_url_or_name: match variants of the same image [repos/cloud/toolforge/jobs-api] (refactor_image_handling) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/284 (https://phabricator.wikimedia.org/T414978)
[12:52:57] Data-Services, tools-platform-team, Data-Persistence: [wikireplicas] Update grants for "maintainviews" user - https://phabricator.wikimedia.org/T422806#11804158 (fnegri) Done, and verified that now the grants for `maintainviews` are in sync across all clouddbs (except clouddb1019 that I could not ver...
[12:53:45] Data-Services, tools-platform-team, Data-Persistence: [wikireplicas] Update grants for "maintainviews" user - https://phabricator.wikimedia.org/T422806#11804163 (Marostegui)
[12:53:48] cloud-services-team, Data-Services, DBA, DC-Ops, and 2 others: clouddb1019 down - https://phabricator.wikimedia.org/T422813#11804164 (Marostegui)
[12:54:12] (update) raymond-ndibe: refactor image parsing and handling [repos/cloud/toolforge/jobs-api] (improve_image_parsing_tests) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/273 (https://phabricator.wikimedia.org/T415322)
[12:56:08] cloud-services-team (FY2025/2026-Q3-Q4), Cloud-VPS: oslo.messaging does not failover to the next rabbit host on traffic blackhole situations - https://phabricator.wikimedia.org/T422820 (fgiunchedi) NEW
[12:56:16] (update) raymond-ndibe: images::from_url_or_name: match variants of the same image [repos/cloud/toolforge/jobs-api] (refactor_image_handling) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/284 (https://phabricator.wikimedia.org/T414978)
[12:56:49] cloud-services-team (FY2025/2026-Q3-Q4), Cloud-VPS: Carry out controlled network switch down tests in cloud - https://phabricator.wikimedia.org/T417393#11804183 (fgiunchedi) >>! In T417393#11799875, @Andrew wrote: >> cloudcontrol nodes not in C8 (i.e. 1006/1007) though didn't seem to give up trying to co...
[12:56:51] cloud-services-team, Data-Services, DBA, DC-Ops, and 2 others: clouddb1019 down - https://phabricator.wikimedia.org/T422813#11804185 (Jclark-ctr) this server is out of warranty i performed flea power drain and did come up. i am updating firmwares right now you might see it reboot a few times
[12:57:11] (update) raymond-ndibe: images: match variants of the same image [repos/cloud/toolforge/jobs-api] (refactor_image_handling) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/284 (https://phabricator.wikimedia.org/T414978)
[12:57:25] (update) raymond-ndibe: images.py: match variants of the same image [repos/cloud/toolforge/jobs-api] (refactor_image_handling) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/284 (https://phabricator.wikimedia.org/T414978)
[12:57:54] RECOVERY - Host clouddb1019 is UP: PING OK - Packet loss = 0%, RTA = 0.22 ms
[12:58:32] PROBLEM - SSH on clouddb1019 is CRITICAL: connect to address 10.64.48.9 and port 22: Connection refused https://wikitech.wikimedia.org/wiki/SSH/monitoring
[12:58:38] (update) raymond-ndibe: images.py: match variants of the same image [repos/cloud/toolforge/jobs-api] (refactor_image_handling) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/284 (https://phabricator.wikimedia.org/T414978)
[13:00:16] cloud-services-team, Data-Services, DBA, DC-Ops, and 2 others: clouddb1019 down - https://phabricator.wikimedia.org/T422813#11804220 (Marostegui) Thank you @Jclark-ctr - let us know when we can take over. Thankfully its replacement will arrive soon (famous last words) (T405296)
[13:01:33] cloud-services-team (FY2025/2026-Q3-Q4), Cloud-VPS: oslo.messaging does not failover to the next rabbit host on traffic blackhole situations - https://phabricator.wikimedia.org/T422820#11804234 (fgiunchedi)
[13:09:20] PROBLEM - Host clouddb1019 is DOWN: PING CRITICAL - Packet loss = 100%
[13:24:13] Tools, LPL Onboarding and Development, Starter Kit, LPL Projects (Starter kit): Host and deploy starter kit tool on Toolforge - https://phabricator.wikimedia.org/T420074#11804350 (MaryMunyoki)
[13:27:54] cloud-services-team, Toolforge: Toolforge HTML head links sometimes are issued as http://.toolforge:443 - https://phabricator.wikimedia.org/T422829 (fgiunchedi) NEW
[13:29:58] RECOVERY - Host clouddb1019 is UP: PING OK - Packet loss = 0%, RTA = 0.18 ms
[13:36:42] (approved) dcaro: build: add coverage report [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/283
[13:36:47] (merge) dcaro: build: add coverage report [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/283
[13:38:02] cloud-services-team, Toolforge: Toolforge HTML head links sometimes are issued as http://.toolforge:443 - https://phabricator.wikimedia.org/T422829#11804439 (taavi) a:taavi This is an issue with our Istio configuration: {P90341}
[13:39:29] (update) group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: jobs-api: bump to 0.0.482-20260409133657-c1e22033 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1217
[13:39:34] (open) group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: jobs-api: bump to 0.0.482-20260409133657-c1e22033 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1217
[13:40:29] cloud-services-team, Data-Services, DBA, DC-Ops, and 2 others: clouddb1019 down - https://phabricator.wikimedia.org/T422813#11804450 (Jclark-ctr) @Marostegui the hardware error has cleared for now, but the system is reporting filesystem corruption and will need to be reimaged. Let’s keep the tic...
[13:41:40] !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
[13:44:19] cloud-services-team, Data-Services, DBA, DC-Ops, and 2 others: clouddb1019 down - https://phabricator.wikimedia.org/T422813#11804456 (Marostegui) Thanks John - let me reimage it now
[13:46:51] (update) dcaro: [jobs-api] refactor quota models [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/164 (https://phabricator.wikimedia.org/T389118) (owner: raymond-ndibe)
[13:52:02] cloud-services-team (FY2025/2026-Q3-Q4), Data-Services, tools-platform-team, Data-Persistence: Install a clouddb hosts with Debian Trixie - https://phabricator.wikimedia.org/T415165#11804490 (fnegri) a:fnegri→Marostegui Change of plans, @Marostegui will reimage clouddb1019 instead, as tha...
[13:52:04] cloud-services-team, Cloud-VPS: Openstack uwsgi logging to '.log' - https://phabricator.wikimedia.org/T422830 (fgiunchedi) NEW
[13:52:18] cloud-services-team (FY2025/2026-Q3-Q4), Data-Services, tools-platform-team, Data-Persistence: Install a clouddb hosts with Debian Trixie - https://phabricator.wikimedia.org/T415165#11804506 (fnegri)
[13:52:23] cloud-services-team, Data-Services, DBA, DC-Ops, and 3 others: clouddb1019 down - https://phabricator.wikimedia.org/T422813#11804507 (fnegri)
[13:53:12] !log dcaro@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
[13:54:11] !log dcaro@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
[13:56:25] (update) dcaro: improve image parsing tests [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/282 (https://phabricator.wikimedia.org/T415322) (owner: raymond-ndibe)
[13:56:35] VPS-project-Phabricator, collaboration-services, Infrastructure-Foundations, Mail: @wikimedia.org email addresses don't seem to be receiving emails sent by the test Phabricator instance - https://phabricator.wikimedia.org/T422559#11804524 (Nintendofan885)
[14:01:33] PROBLEM - Host clouddb1019 is DOWN: PING CRITICAL - Packet loss = 100%
[14:01:41] RESOLVED: DiskSpace: Disk space cloudbackup1004:9100:/srv 6.592% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[14:03:00] (update) raymond-ndibe: refactor image parsing and handling [repos/cloud/toolforge/jobs-api] (improve_image_parsing_tests) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/273 (https://phabricator.wikimedia.org/T415322)
[14:03:57] (open) raymond-ndibe: values.yaml: hoist web image variants to top of config [repos/cloud/toolforge/image-config] - https://gitlab.wikimedia.org/repos/cloud/toolforge/image-config/-/merge_requests/21 (https://phabricator.wikimedia.org/T415322)
[14:04:25] (update) raymond-ndibe: values.yaml: hoist web image variants to top of config [repos/cloud/toolforge/image-config] (replace_job_with_webservice_image_variants) - https://gitlab.wikimedia.org/repos/cloud/toolforge/image-config/-/merge_requests/21 (https://phabricator.wikimedia.org/T415322)
[14:08:44] FIRING: [2x] IstioGatewayPodMisplaced: istio-gateway pod misplaced - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/IstioGatewayPodMisplaced - https://prometheus-alerts.wmcloud.org/?q=alertname%3DIstioGatewayPodMisplaced
[14:09:11] !log dcaro@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
[14:10:50] Tool-wmf-openapi-linter, MW-Interfaces-Team (MWI-Sprint-31 (2026-04-07 to 2026-04-21)): Fix linter issues discovered during implementation of the OAD example - https://phabricator.wikimedia.org/T414974#11804600 (AGhirelli-WMF) a:Atieno→AGhirelli-WMF
[14:13:44] RESOLVED: [3x] IstioGatewayPodMisplaced: istio-gateway pod misplaced - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/IstioGatewayPodMisplaced - https://prometheus-alerts.wmcloud.org/?q=alertname%3DIstioGatewayPodMisplaced
[14:15:14] FIRING: [5x] IstioGatewayPodMisplaced: istio-gateway pod misplaced - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/IstioGatewayPodMisplaced - https://prometheus-alerts.wmcloud.org/?q=alertname%3DIstioGatewayPodMisplaced
[14:16:59] (open) taavi: istio-gateway: Trust X-Forwarded-Proto from previous proxy [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1218 (https://phabricator.wikimedia.org/T422829)
[14:17:01] (update) taavi: istio-gateway: Trust X-Forwarded-Proto from previous proxy [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1218 (https://phabricator.wikimedia.org/T422829)
[14:17:31] cloud-services-team (FY2025/2026-Q3-Q4), Toolforge (Toolforge iteration 26): Replace ingress-nginx before upstream EOL date - https://phabricator.wikimedia.org/T392356#11804647 (taavi)
[14:17:34] cloud-services-team, Toolforge, Patch-For-Review: Toolforge HTML head links sometimes are issued as http://.toolforge:443 - https://phabricator.wikimedia.org/T422829#11804646 (taavi)
[14:17:47] cloud-services-team, Toolforge, Patch-For-Review: Toolforge HTML head links sometimes are issued as http://.toolforge:443 - https://phabricator.wikimedia.org/T422829#11804649 (taavi) p:Triage→High
[14:20:14] RESOLVED: [5x] IstioGatewayPodMisplaced: istio-gateway pod misplaced - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/IstioGatewayPodMisplaced - https://prometheus-alerts.wmcloud.org/?q=alertname%3DIstioGatewayPodMisplaced
[14:20:14] (update) taavi: istio-gateway: Trust X-Forwarded-Proto from previous proxy [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1218 (https://phabricator.wikimedia.org/T422829)
[14:20:40] !log tools.cluebotng-review Deployment failed: https://github.com/cluebotng/component-configs/actions/runs/24195131449 (https://github.com/cluebotng/component-configs/commits/6b512f6db7cc4e49078b135e437185906821ae81)
[14:20:42] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-review/SAL
[14:22:40] cloud-services-team, Cloud-VPS, Patch-For-Review: Cloud init and unattended upgrades while bootstrapping Trixie VMs - https://phabricator.wikimedia.org/T422509#11804688 (Andrew) The base image is based on a trixie VM with our puppet classes already applied
(that happens at build time). So shouldn't /... [14:25:18] 06cloud-services-team, 10Cloud-VPS, 13Patch-For-Review: Cloud init and unattended upgrades while bootstrapping Trixie VMs - https://phabricator.wikimedia.org/T422509#11804703 (10taavi) >>! In T422509#11804688, @Andrew wrote: > The base image is based on a trixie VM with our puppet classes already applied (th... [14:36:12] (03approved) 10dcaro: jobs-api: bump to 0.0.482-20260409133657-c1e22033 [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1217 (owner: 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620) [14:36:15] (03merge) 10dcaro: jobs-api: bump to 0.0.482-20260409133657-c1e22033 [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1217 (owner: 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620) [15:08:16] 06cloud-services-team, 10Cloud-VPS: Consider allowing cumin access to all Cloud VPS VMs - https://phabricator.wikimedia.org/T422801#11804970 (10Volans) +1 to add the keys from my pov. Cumin aliases can help to group/exclude those hosts from common selections. 
[15:09:42] 10Tool-wmf-openapi-linter, 06MW-Interfaces-Team, 07OKR-Work: Add linting rules for operations and paths - https://phabricator.wikimedia.org/T422504#11804980 (10aaron) [15:11:34] 10Tool-wmf-openapi-linter, 10MediaWiki-REST-API, 06MW-Interfaces-Team, 07OKR-Work: Add linting rules for operations and paths - https://phabricator.wikimedia.org/T422504#11805003 (10HCoplin-WMF) [15:15:31] 10Tool-wmf-openapi-linter, 10MediaWiki-REST-API, 06MW-Interfaces-Team, 07OKR-Work: Add the remaining linting rules - https://phabricator.wikimedia.org/T422600#11805071 (10HCoplin-WMF) [15:23:59] 10Tool-wmf-openapi-linter, 03[MWI] FY2025-26 Q4, 07Epic, 07OKR-Work: [5.2.5b Epic] Implement and improve linter rules - https://phabricator.wikimedia.org/T422479#11805162 (10KBach) [15:24:21] 10Tool-wmf-openapi-linter, 03[MWI] FY2025-26 Q4, 07OKR-Work: [Hypothesis] 5.2.5b: Productionalize API spec linting - https://phabricator.wikimedia.org/T422476#11805163 (10KBach) [15:25:24] 10Tool-wmf-openapi-linter, 03[MWI] FY2025-26 Q4, 07Epic, 07OKR-Work: [5.2.5b Epic] Create a proof of concept of a CI workflow with the spec linter - https://phabricator.wikimedia.org/T422483#11805166 (10KBach) [15:26:55] (03PS1) 10Urbanecm: deletedEverywhere: update to linktarget [labs/tools/urbanecmbot] - 10https://gerrit.wikimedia.org/r/1269503 [15:27:04] (03CR) 10Urbanecm: [C:03+2] deletedEverywhere: update to linktarget [labs/tools/urbanecmbot] - 10https://gerrit.wikimedia.org/r/1269503 (owner: 10Urbanecm) [15:27:14] (03update) 10raymond-ndibe: refactor image parsing and handling [repos/cloud/toolforge/jobs-api] (improve_image_parsing_tests) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/273 (https://phabricator.wikimedia.org/T415322) [15:27:36] (03Merged) 10jenkins-bot: deletedEverywhere: update to linktarget [labs/tools/urbanecmbot] - 10https://gerrit.wikimedia.org/r/1269503 (owner: 10Urbanecm) [15:30:40] 06cloud-services-team (FY2025/2026-Q3-Q4), 10Toolforge: Replace 
ingress-nginx before upstream EOL date - https://phabricator.wikimedia.org/T392356#11805205 (10taavi) [15:30:42] 06cloud-services-team, 10Toolforge: [components-api] failing deployment 422 from jobs-api - https://phabricator.wikimedia.org/T422753#11805208 (10taavi) [15:30:45] 10Toolforge, 06tools-platform-team, 13Patch-For-Review: [jobs-api] Use the same images as webservice - https://phabricator.wikimedia.org/T415322#11805206 (10taavi) [15:30:46] 10Toolforge, 06tools-platform-team, 13Patch-For-Review: [harbor,tools] Harbor object usage in S3 is steadily increasing - https://phabricator.wikimedia.org/T418528#11805207 (10taavi) [15:30:59] 10Toolforge, 06tools-platform-team: [general] upgrade all python repos to python >=3.13 - https://phabricator.wikimedia.org/T422184#11805214 (10taavi) [15:31:03] 06cloud-services-team (FY2025/2026-Q3-Q4), 10Toolforge, 06tools-platform-team, 07Epic, 13Patch-For-Review: [jobs-api] allow exposing continuous jobs to the internet via `toolname.toolforge.org`, just like webservice - https://phabricator.wikimedia.org/T388092#11805212 (10taavi) [15:31:06] 06cloud-services-team (FY2025/2026-Q3-Q4), 10Toolforge, 06tools-platform-team, 13Patch-For-Review: [builds-builder] Add support for Heroku's "24" builder stack based on Ubuntu 2024.04 noble - https://phabricator.wikimedia.org/T380127#11805211 (10taavi) [15:31:07] 06cloud-services-team (FY2025/2026-Q3-Q4), 10Toolforge, 06tools-platform-team, 07Epic, 13Patch-For-Review: [jobs-api,webservice] Run webservices via the jobs framework - https://phabricator.wikimedia.org/T348755#11805213 (10taavi) [15:31:25] 06cloud-services-team (FY2025/2026-Q3-Q4), 10Toolforge, 06tools-platform-team, 13Patch-For-Review: [jobs-api] make job status an enum, with clearly defined states - https://phabricator.wikimedia.org/T401172#11805217 (10taavi) [15:31:28] 06cloud-services-team, 10Toolforge, 13Patch-For-Review: add more logs tests to toolforge-deploy - https://phabricator.wikimedia.org/T418326#11805219 (10taavi) 
[15:31:32] 10Cloud Services Proposals, 06cloud-services-team (FY2025/2026-Q3-Q4), 10Toolforge, 05Cloud-Services-Origin-Team, and 3 others: [builds-api,components-api,webservice,jobs-api] Make Toolforge a proper platform as a service with push-to-deploy and build pack... - https://phabricator.wikimedia.org/T194332#11805218 [15:31:43] 10Toolforge, 06tools-platform-team: [Toolforge Sustainability Framework] Create an inventory of Toolforge actions - https://phabricator.wikimedia.org/T420559#11805229 (10taavi) [15:31:47] 06cloud-services-team, 10Toolforge: [Toolforge Sustainability Framework]Percentage scoring of framework subcategories - https://phabricator.wikimedia.org/T420425#11805230 (10taavi) [15:31:48] 06cloud-services-team (FY2025/2026-Q3-Q4), 10Toolforge, 06tools-platform-team: [jobs-api] Create storage layer, and save business models in persistent storage - https://phabricator.wikimedia.org/T359650#11805228 (10taavi) [15:31:57] 06cloud-services-team (FY2025/2026-Q3-Q4), 10Toolforge, 07Epic: [KR] WE6.3 Introduce a sustainability scoring system for the Toolforge platform - https://phabricator.wikimedia.org/T368600#11805232 (10taavi) [15:32:01] 06cloud-services-team, 10Toolforge: [docs] enable docs linter in one of the repos - https://phabricator.wikimedia.org/T397949#11805234 (10taavi) [15:32:05] 06cloud-services-team, 10Toolforge, 13Patch-For-Review: [components-api] allow specifying `source_repo`+`ref` for the config - https://phabricator.wikimedia.org/T402764#11805235 (10taavi) [15:32:11] 06cloud-services-team (FY2025/2026-Q3-Q4), 10Toolforge: [infra,k8s] Upgrade Toolforge Kubernetes to version 1.32 - https://phabricator.wikimedia.org/T379047#11805239 (10taavi) [15:32:14] 06cloud-services-team, 10Data-Services, 06DBA, 07SecTeam-Processed: clouddb1017 reports a new database created - https://phabricator.wikimedia.org/T422779#11805238 (10sbassett) [15:32:18] (03update) 10raymond-ndibe: refactor image parsing and handling [repos/cloud/toolforge/jobs-api] 
(improve_image_parsing_tests) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/273 (https://phabricator.wikimedia.org/T415322) [15:32:24] 06cloud-services-team, 10Toolforge, 13Patch-For-Review: [builds-api, maintain-harbor] fix build/image cleanup - https://phabricator.wikimedia.org/T404157#11805241 (10taavi) [15:32:27] (03update) 10raymond-ndibe: refactor image parsing and handling [repos/cloud/toolforge/jobs-api] (improve_image_parsing_tests) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/273 (https://phabricator.wikimedia.org/T415322) [15:32:45] 06cloud-services-team, 10Toolforge, 13Patch-For-Review: [docs] update all readmes with the same deployment docs - https://phabricator.wikimedia.org/T407477#11805246 (10taavi) [15:32:46] 06cloud-services-team, 10Toolforge, 13Patch-For-Review: [components-api] Queue builds when the build queue is full - https://phabricator.wikimedia.org/T402568#11805247 (10taavi) [15:33:03] 06cloud-services-team, 10Toolforge, 13Patch-For-Review: [jobs-api] Refactor before webservice support - https://phabricator.wikimedia.org/T359804#11805254 (10taavi) [15:33:05] 06cloud-services-team (FY2025/2026-Q3-Q4), 10Toolforge, 13Patch-For-Review: [builds-api,harbor,image-config] Move pre-built images to harbor - https://phabricator.wikimedia.org/T409727#11805255 (10taavi) [16:01:59] (03approved) 10filippo: istio-gateway: Trust X-Forwarded-Proto from previous proxy [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1218 (https://phabricator.wikimedia.org/T422829) (owner: 10taavi) [16:23:35] 06cloud-services-team, 10Cloud-VPS, 10Tool-spacemedia, 10video2commons, 07Upstream: Cloud Services shared IP (static NAT for external communications) often rate limited by YouTube for video downloads - https://phabricator.wikimedia.org/T236446#11805584 (10LWyatt) >>! In T236446#10727517, @bvibber wrote:... 
[16:41:30] (03update) 10dcaro: refactor image parsing and handling [repos/cloud/toolforge/jobs-api] (improve_image_parsing_tests) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/273 (https://phabricator.wikimedia.org/T415322) (owner: 10raymond-ndibe) [16:41:56] (03update) 10dcaro: refactor image parsing and handling [repos/cloud/toolforge/jobs-api] (improve_image_parsing_tests) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/273 (https://phabricator.wikimedia.org/T415322) (owner: 10raymond-ndibe) [17:04:21] (03update) 10raymond-ndibe: improve image parsing tests [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/282 (https://phabricator.wikimedia.org/T415322) [17:11:56] 06cloud-services-team, 10Toolforge, 06tools-platform-team: [components-api] failing deployment 422 from jobs-api - https://phabricator.wikimedia.org/T422753#11805837 (10dcaro) [17:13:08] 06cloud-services-team, 10Toolforge, 06tools-platform-team: [components-api] failing deployment 422 from jobs-api - https://phabricator.wikimedia.org/T422753#11805841 (10dcaro) I'm able to reproduce in lima-kilo with: ` $ toolforge components config generate | toolforge components config create $ toolforge... [17:14:43] 06cloud-services-team, 10Toolforge, 06tools-platform-team: [components-api] failing deployment 422 from jobs-api - https://phabricator.wikimedia.org/T422753#11805854 (10dcaro) The jobs api does not really give any extra info: ` │ INFO: 127.0.0.1:42416 - "PATCH /v1/tool/tf-test2/jobs/ HTTP/1.0" 422 Unproc... [17:31:01] 06cloud-services-team, 10Toolforge, 06tools-platform-team: [components-api] failing deployment 422 from jobs-api - https://phabricator.wikimedia.org/T422753#11805914 (10dcaro) Found something using restish to try to patch the same way components did: ` msg: "Input tag 'path' found using 'health_check_t... 
[17:38:03] 06cloud-services-team, 10Toolforge, 06tools-platform-team: [components-api] failing deployment 422 from jobs-api - https://phabricator.wikimedia.org/T422753#11805936 (10dcaro) Hmmm... I think that there might be some issue when generating the models from the toolforge openapi spec on components-api, as it ge... [17:42:09] (03open) 10dcaro: models: fixed the right string for the JobsHttpHealtheckCheck [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/163 [17:43:10] (03update) 10dcaro: models: fixed the right string for the JobsHttpHealtheckCheck [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/163 [17:46:32] (03update) 10dcaro: models: fixed the right string for the JobsHttpHealtheckCheck [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/163 [17:50:36] (03update) 10dcaro: models: fixed the right string for the JobsHttpHealtheckCheck [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/163 [17:50:48] (03update) 10dcaro: models: fixed the right string for the JobsHttpHealtheckCheck [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/163 [18:09:24] 06cloud-services-team, 10Toolforge, 06tools-platform-team: [components-api] failing deployment 422 from jobs-api - https://phabricator.wikimedia.org/T422753#11806004 (10DamianZaremba) That was changed in https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/commit/1b5c88374b0986e9b2a260b7ed55a6... 
[18:13:48] 06cloud-services-team, 10Toolforge, 06tools-platform-team: [components-api] failing deployment 422 from jobs-api - https://phabricator.wikimedia.org/T422753#11806006 (10DamianZaremba) > I was going to try and re-produce this on staging (removing the health check), but it now seems to be stuck waiting for bui... [18:14:24] (03update) 10raymond-ndibe: improve image parsing tests [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/282 (https://phabricator.wikimedia.org/T415322) [18:16:10] 06cloud-services-team, 10Cloud-VPS, 13Patch-For-Review: Cloud init and unattended upgrades while bootstrapping Trixie VMs - https://phabricator.wikimedia.org/T422509#11806015 (10Andrew) We are attempting to only get the puppet package from the wikimedia repo (this is set by cloud-init at creation time) `... [18:16:19] !log tools.cluebotng-staging Deployment failed: https://github.com/cluebotng/component-configs/actions/runs/24206093212 (https://github.com/cluebotng/component-configs/commits/a97bfe791582e24f1c696f1bd89b965ea233c253) [18:16:22] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-staging/SAL [18:17:08] !log tools.cluebotng Deployment failed: https://github.com/cluebotng/component-configs/actions/runs/24206093216 (https://github.com/cluebotng/component-configs/commits/a97bfe791582e24f1c696f1bd89b965ea233c253) [18:17:08] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng/SAL [18:17:30] !log tools.cluebotng-trainer Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/24206093268 (https://github.com/cluebotng/component-configs/commits/a97bfe791582e24f1c696f1bd89b965ea233c253) [18:17:31] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-trainer/SAL [18:17:55] !log tools.cluebotng-monitoring Deployment failed: 
https://github.com/cluebotng/component-configs/actions/runs/24206093215 (https://github.com/cluebotng/component-configs/commits/a97bfe791582e24f1c696f1bd89b965ea233c253) [18:17:55] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-monitoring/SAL [18:18:34] !log tools.cluebotng-editsets Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/24206093239 (https://github.com/cluebotng/component-configs/commits/a97bfe791582e24f1c696f1bd89b965ea233c253) [18:18:36] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-editsets/SAL [18:18:49] !log tools.toolforge-functional-runner Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/24206093210 (https://github.com/cluebotng/component-configs/commits/a97bfe791582e24f1c696f1bd89b965ea233c253) [18:18:50] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.toolforge-functional-runner/SAL [18:19:40] !log tools.cluebotng-review Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/24206093220 (https://github.com/cluebotng/component-configs/commits/a97bfe791582e24f1c696f1bd89b965ea233c253) [18:19:41] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-review/SAL [18:26:23] (03update) 10raymond-ndibe: improve image parsing tests [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/282 (https://phabricator.wikimedia.org/T415322) [18:26:26] (03update) 10raymond-ndibe: improve image parsing tests [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/282 (https://phabricator.wikimedia.org/T415322) [18:32:13] (03update) 10raymond-ndibe: refactor image parsing and handling [repos/cloud/toolforge/jobs-api] (improve_image_parsing_tests) - 
10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/273 (https://phabricator.wikimedia.org/T415322) [18:32:41] !log tools.cluebotng-review Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/24206712296 (https://github.com/cluebotng/component-configs/commits/e7d5ec988541b9d441a5c565f624b7e88e11204f) [18:32:44] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-review/SAL [18:36:09] !log tools.cluebotng-monitoring Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/24206712245 (https://github.com/cluebotng/component-configs/commits/e7d5ec988541b9d441a5c565f624b7e88e11204f) [18:36:10] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-monitoring/SAL [18:39:51] (03open) 10damian: DefinedContinuousJob - Correct health check discriminator [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/285 (https://phabricator.wikimedia.org/T422753) [18:40:38] 06cloud-services-team, 10Toolforge, 06tools-platform-team, 13Patch-For-Review: [components-api] failing deployment 422 from jobs-api - https://phabricator.wikimedia.org/T422753#11806078 (10DamianZaremba) The culprit seems to be https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/blob/main/openapi... 
[18:44:55] FIRING: ToolforgeKubernetesCapacity: Kubernetes cluster k8s.tools.eqiad1.wikimedia.cloud:6443 in risk of running out of memory - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesCapacity - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesCapacity [18:47:22] (03update) 10damian: DefinedContinuousJob - Correct health check discriminator [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/285 (https://phabricator.wikimedia.org/T422753) [18:47:29] (03update) 10raymond-ndibe: images.py: match variants of the same image [repos/cloud/toolforge/jobs-api] (refactor_image_handling) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/284 (https://phabricator.wikimedia.org/T414978) [18:52:15] (03update) 10raymond-ndibe: images.py: match variants of the same image [repos/cloud/toolforge/jobs-api] (refactor_image_handling) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/284 (https://phabricator.wikimedia.org/T414978) [18:53:44] (03update) 10damian: DefinedContinuousJob - Correct health check discriminator [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/285 (https://phabricator.wikimedia.org/T422753) [18:56:18] (03update) 10damian: DefinedContinuousJob - Correct health check discriminator [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/285 (https://phabricator.wikimedia.org/T422753) [18:56:25] (03update) 10damian: openapi - fix health check discriminator [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/285 (https://phabricator.wikimedia.org/T422753) [19:01:12] 06cloud-services-team, 10Cloud-VPS, 13Patch-For-Review: 
Cloud init and unattended upgrades while bootstrapping Trixie VMs - https://phabricator.wikimedia.org/T422509#11806157 (10taavi) >>! In T422509#11806015, @Andrew wrote: > We are attempting to only get the puppet package from the wikimedia repo (this is... [19:08:01] (03update) 10raymond-ndibe: values.yaml: add image variant name to aliases [repos/cloud/toolforge/image-config] (replace_job_with_webservice_image_variants) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/image-config/-/merge_requests/21 (https://phabricator.wikimedia.org/T415322) [19:13:31] !log tools.cluebotng-review Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/24208407251 (https://github.com/cluebotng/component-configs/commits/6cd680dd209bf7fbb01cf24cb6cca82f0fab716d) [19:13:34] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-review/SAL [19:19:55] RESOLVED: ToolforgeKubernetesCapacity: Kubernetes cluster k8s.tools.eqiad1.wikimedia.cloud:6443 in risk of running out of memory - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesCapacity - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesCapacity [19:38:55] (03update) 10raymond-ndibe: values.yaml: add image variant name to aliases [repos/cloud/toolforge/image-config] (replace_job_with_webservice_image_variants) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/image-config/-/merge_requests/21 (https://phabricator.wikimedia.org/T415322) [19:51:16] (03open) 10raymond-ndibe: images.py: add tests for image variant matching [repos/cloud/toolforge/jobs-api] (refactor_image_handling) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/286 (https://phabricator.wikimedia.org/T415322) [19:54:13] (03update) 10raymond-ndibe: images.py: add tests for image variant matching [repos/cloud/toolforge/jobs-api] 
(refactor_image_handling) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/286 (https://phabricator.wikimedia.org/T415322) [19:54:51] (03update) 10raymond-ndibe: images.py: match variants of the same image [repos/cloud/toolforge/jobs-api] (refactor_image_handling) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/284 (https://phabricator.wikimedia.org/T414978) [19:55:15] (03update) 10raymond-ndibe: images.py: match variants of the same image [repos/cloud/toolforge/jobs-api] (refactor_image_handling) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/284 (https://phabricator.wikimedia.org/T414978 https://phabricator.wikimedia.org/T415322) [20:30:55] FIRING: ToolforgeKubernetesCapacity: Kubernetes cluster k8s.tools.eqiad1.wikimedia.cloud:6443 in risk of running out of memory - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesCapacity - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesCapacity [20:44:14] !log tools.cluebotng-review Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/24212235180 (https://github.com/cluebotng/component-configs/commits/1cd21afab7312bd0122c0e735f8f4dca03019011) [20:44:16] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-review/SAL [21:55:55] RESOLVED: ToolforgeKubernetesCapacity: Kubernetes cluster k8s.tools.eqiad1.wikimedia.cloud:6443 in risk of running out of memory - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesCapacity - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesCapacity [21:56:25] FIRING: ToolforgeKubernetesCapacity: Kubernetes cluster k8s.tools.eqiad1.wikimedia.cloud:6443 in risk of running 
out of memory - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesCapacity - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesCapacity [22:04:40] (03update) 10samwilson: Check temp file before adding picture to Zip file [toolforge-repos/wsexport] - 10https://gitlab.wikimedia.org/toolforge-repos/wsexport/-/merge_requests/8 (https://phabricator.wikimedia.org/T415479) [22:08:32] (03update) 10samwilson: Draft: Make header logo and title into a link [toolforge-repos/wsexport] - 10https://gitlab.wikimedia.org/toolforge-repos/wsexport/-/merge_requests/5 [22:09:08] (03open) 10samwilson: Draft: Remove phpunit-bridge [toolforge-repos/wsexport] - 10https://gitlab.wikimedia.org/toolforge-repos/wsexport/-/merge_requests/9 [22:41:10] RESOLVED: ToolforgeKubernetesCapacity: Kubernetes cluster k8s.tools.eqiad1.wikimedia.cloud:6443 in risk of running out of memory - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesCapacity - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesCapacity [23:31:55] FIRING: ToolforgeKubernetesCapacity: Kubernetes cluster k8s.tools.eqiad1.wikimedia.cloud:6443 in risk of running out of memory - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesCapacity - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesCapacity [23:41:55] RESOLVED: ToolforgeKubernetesCapacity: Kubernetes cluster k8s.tools.eqiad1.wikimedia.cloud:6443 in risk of running out of memory - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesCapacity - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - 
https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesCapacity