[00:07:00] (update) raymond-ndibe: Draft: [components-smoke-test] add components-api conditional build and run tests [repos/cloud/toolforge/toolforge-deploy] (refactor_components_smoke_test) - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/797 (https://phabricator.wikimedia.org/T389044)
[00:12:24] cloud-services-team (Hardware), DC-Ops, ops-codfw, SRE: Q4:rack/setup/install cloudcephosd200[567] - https://phabricator.wikimedia.org/T393614#10866283 (Jhancock.wm) a:Jhancock.wm
[00:15:17] (update) raymond-ndibe: Draft: [components-smoke-test] add components-api conditional build and run tests [repos/cloud/toolforge/toolforge-deploy] (refactor_components_smoke_test) - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/797 (https://phabricator.wikimedia.org/T389044)
[00:34:18] (update) raymond-ndibe: Draft: [components-smoke-test] add components-api conditional build and run tests [repos/cloud/toolforge/toolforge-deploy] (refactor_components_smoke_test) - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/797 (https://phabricator.wikimedia.org/T389044)
[00:43:36] (update) raymond-ndibe: Draft: [components-smoke-test] add components-api conditional build and run tests [repos/cloud/toolforge/toolforge-deploy] (refactor_components_smoke_test) - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/797 (https://phabricator.wikimedia.org/T389044)
[00:55:35] (update) raymond-ndibe: Draft: [components-smoke-test] add components-api conditional build and run tests [repos/cloud/toolforge/toolforge-deploy] (refactor_components_smoke_test) - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/797 (https://phabricator.wikimedia.org/T389044)
[01:44:33] cloud-services-team, Cloud-VPS: Upgrade cloud-vps openstack to version 'Epoxy' - https://phabricator.wikimedia.org/T390914#10866398 (Andrew) Stalled→Resolved
[02:07:39] (update) raymond-ndibe: Draft: [components-smoke-test] add components-api conditional build and run tests [repos/cloud/toolforge/toolforge-deploy] (refactor_components_smoke_test) - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/797 (https://phabricator.wikimedia.org/T389044)
[02:09:32] (update) raymond-ndibe: Draft: [components-smoke-test] add components-api conditional build and run tests [repos/cloud/toolforge/toolforge-deploy] (refactor_components_smoke_test) - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/797 (https://phabricator.wikimedia.org/T389044)
[02:27:50] (update) raymond-ndibe: Draft: [components-smoke-test] add components-api conditional build and run tests [repos/cloud/toolforge/toolforge-deploy] (refactor_components_smoke_test) - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/797 (https://phabricator.wikimedia.org/T389044)
[02:42:33] (update) raymond-ndibe: Draft: [components-smoke-test] add components-api conditional build and run tests [repos/cloud/toolforge/toolforge-deploy] (refactor_components_smoke_test) - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/797 (https://phabricator.wikimedia.org/T389044)
[03:15:18] (update) raymond-ndibe: Draft: [components-smoke-test] add components-api conditional build and run tests [repos/cloud/toolforge/toolforge-deploy] (refactor_components_smoke_test) - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/797 (https://phabricator.wikimedia.org/T389044)
[03:35:08] (update) raymond-ndibe: Draft: [components-smoke-test] add components-api conditional build and run tests [repos/cloud/toolforge/toolforge-deploy] (refactor_components_smoke_test) - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/797 (https://phabricator.wikimedia.org/T389044)
[04:06:06] (update) raymond-ndibe: Draft: [components-smoke-test] add components-api conditional build and run tests [repos/cloud/toolforge/toolforge-deploy] (refactor_components_smoke_test) - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/797 (https://phabricator.wikimedia.org/T389044)
[04:19:09] Tools, Wikimedia-Medicine: Bring back `mdwiki` tool from the demise of GridEngine (transition to Toolforge Kubernetes) - https://phabricator.wikimedia.org/T319887#10866475 (Mr.Ibrahem) Open→Resolved a:Mr.Ibrahem
[04:26:17] (update) raymond-ndibe: [deploy] add force-build and force-run query params [repos/cloud/toolforge/components-api] (skip_build_if_refs_are_same) - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/80 (https://phabricator.wikimedia.org/T389044)
[05:02:14] Toolforge (Toolforge iteration 20): [components-api] "No such image" error when running with prebuilt image - https://phabricator.wikimedia.org/T395533 (Raymond_Ndibe) NEW
[05:02:29] Toolforge (Toolforge iteration 20): [components-api] "No such image" error when running with prebuilt image - https://phabricator.wikimedia.org/T395533#10866503 (Raymond_Ndibe) p:Triage→High a:Raymond_Ndibe
[05:03:49] Toolforge (Toolforge iteration 20): [components-api] "No such image" error when running with prebuilt image - https://phabricator.wikimedia.org/T395533#10866505 (Raymond_Ndibe) additional steps to fix this should probably involve some tests in `toolforge-deploy` to check specifically for this kind of issue a...
[05:06:22] (open) raymond-ndibe: [components.deploy_task] fix do_run image_name bug [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/81 (https://phabricator.wikimedia.org/T395533)
[05:25:47] (update) raymond-ndibe: [components-smoke-test] reduce testcase dep on each other [repos/cloud/toolforge/toolforge-deploy] (run_specific_tests_on_deploy) - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/798 (https://phabricator.wikimedia.org/T389044 https://phabricator.wikimedia.org/T395533)
[05:28:30] (update) raymond-ndibe: [components-smoke-test] reduce testcase dep on each other [repos/cloud/toolforge/toolforge-deploy] (run_specific_tests_on_deploy) - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/798 (https://phabricator.wikimedia.org/T389044 https://phabricator.wikimedia.org/T395533)
[05:30:37] (update) raymond-ndibe: [components-smoke-test] reduce testcase dep on each other [repos/cloud/toolforge/toolforge-deploy] (run_specific_tests_on_deploy) - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/798 (https://phabricator.wikimedia.org/T389044 https://phabricator.wikimedia.org/T395533)
[05:34:36] (update) raymond-ndibe: [components-smoke-test] reduce testcase dep on each other [repos/cloud/toolforge/toolforge-deploy] (run_specific_tests_on_deploy) - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/798 (https://phabricator.wikimedia.org/T389044 https://phabricator.wikimedia.org/T395533)
[05:37:02] (update) raymond-ndibe: [components.deploy_task] fix do_run image_name bug [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/81 (https://phabricator.wikimedia.org/T395533)
[05:40:21] (update) raymond-ndibe: [components-smoke-test] reduce testcase dep on each other [repos/cloud/toolforge/toolforge-deploy] (run_specific_tests_on_deploy) - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/798 (https://phabricator.wikimedia.org/T389044 https://phabricator.wikimedia.org/T395533)
[05:41:32] (update) raymond-ndibe: [components-smoke-test] reduce testcase dep on each other [repos/cloud/toolforge/toolforge-deploy] (run_specific_tests_on_deploy) - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/798 (https://phabricator.wikimedia.org/T389044 https://phabricator.wikimedia.org/T395533)
[05:44:17] (open) chuckonwumelu: [builds-api] Return image_name when retrieving build info [repos/cloud/toolforge/builds-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-api/-/merge_requests/135 (https://phabricator.wikimedia.org/T395035)
[05:45:14] (update) raymond-ndibe: [components-smoke-test] reduce testcase dep on each other [repos/cloud/toolforge/toolforge-deploy] (run_specific_tests_on_deploy) - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/798 (https://phabricator.wikimedia.org/T389044 https://phabricator.wikimedia.org/T395533)
[05:45:38] (update) raymond-ndibe: [components-smoke-test] reduce testcase dep on each other [repos/cloud/toolforge/toolforge-deploy] (run_specific_tests_on_deploy) - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/798 (https://phabricator.wikimedia.org/T389044 https://phabricator.wikimedia.org/T395533)
[05:48:59] Toolforge (Toolforge iteration 20), Patch-For-Review: [components-api] "No such image" error when running with prebuilt image - https://phabricator.wikimedia.org/T395533#10866550 (Raymond_Ndibe) Open→In progress
[05:54:28] (update) raymond-ndibe: Draft: [components-smoke-test] add components-api conditional build and run tests [repos/cloud/toolforge/toolforge-deploy] (refactor_components_smoke_test) - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/797 (https://phabricator.wikimedia.org/T389044)
[05:54:42] (update) raymond-ndibe: [components-smoke-test] reduce testcase dep on each other [repos/cloud/toolforge/toolforge-deploy] (run_specific_tests_on_deploy) - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/798 (https://phabricator.wikimedia.org/T389044 https://phabricator.wikimedia.org/T395533)
[05:56:31] (update) raymond-ndibe: [deploy] skip build if refs are same [repos/cloud/toolforge/components-api] (fix_do_run_image_name_bug) - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/77 (https://phabricator.wikimedia.org/T389044)
[05:58:31] (update) raymond-ndibe: [deploy] skip build if refs are same [repos/cloud/toolforge/components-api] (fix_do_run_image_name_bug) - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/77 (https://phabricator.wikimedia.org/T389044)
[06:01:07] (update) raymond-ndibe: [deploy] add force-build and force-run query params [repos/cloud/toolforge/components-api] (skip_build_if_refs_are_same) - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/80 (https://phabricator.wikimedia.org/T389044)
[06:05:54] (update) raymond-ndibe: Draft: [components-smoke-test] add components-api conditional build and run tests [repos/cloud/toolforge/toolforge-deploy] (refactor_components_smoke_test) - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/797 (https://phabricator.wikimedia.org/T389044)
[06:08:49] (update) raymond-ndibe: [builds-api] dummy PR to force-create a new release [repos/cloud/toolforge/builds-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-api/-/merge_requests/134
[06:09:17] (close) raymond-ndibe: [builds-api] dummy PR to force-create a new release [repos/cloud/toolforge/builds-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-api/-/merge_requests/134
[06:13:26] FIRING: SystemdUnitDown: The service unit prometheus-node-textfile-wmcs-bastionless.service is in failed status on host cloudcontrol1007. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[06:18:15] (update) raymond-ndibe: Draft: [components-smoke-test] add components-api conditional build and run tests [repos/cloud/toolforge/toolforge-deploy] (refactor_components_smoke_test) - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/797 (https://phabricator.wikimedia.org/T389044)
[06:31:34] supertassu opened https://github.com/toolforge/quarry/pull/83
[06:49:22] (update) raymond-ndibe: Draft: [components-smoke-test] add components-api conditional build and run tests [repos/cloud/toolforge/toolforge-deploy] (refactor_components_smoke_test) - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/797 (https://phabricator.wikimedia.org/T389044)
[06:50:02] (update) raymond-ndibe: Draft: [components-smoke-test] add components-api conditional build and run tests [repos/cloud/toolforge/toolforge-deploy] (refactor_components_smoke_test) - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/797 (https://phabricator.wikimedia.org/T389044)
[06:59:20] (update) raymond-ndibe: Draft: [components-smoke-test] add components-api conditional build and run tests [repos/cloud/toolforge/toolforge-deploy] (refactor_components_smoke_test) - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/797 (https://phabricator.wikimedia.org/T389044)
[07:13:19] (update) raymond-ndibe: Draft: [components-smoke-test] add components-api conditional build and run tests [repos/cloud/toolforge/toolforge-deploy] (refactor_components_smoke_test) - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/797 (https://phabricator.wikimedia.org/T389044)
[07:24:03] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-16 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[07:34:03] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-16 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[07:40:27] cloud-services-team, Striker, Bitu, Infrastructure-Foundations, Patch-For-Review: Move Striker to Bitu username validation API - https://phabricator.wikimedia.org/T364605#10866628 (taavi)
[07:40:35] cloud-services-team, Striker, Patch-For-Review: Add Bitu container to Striker development environment - https://phabricator.wikimedia.org/T362318#10866629 (taavi)
[08:04:09] (update) raymond-ndibe: [deploy] add force-build and force-run query params [repos/cloud/toolforge/components-api] (skip_build_if_refs_are_same) - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/80 (https://phabricator.wikimedia.org/T389044)
[08:09:58] (update) raymond-ndibe: [components-smoke-test] reduce testcase dep on each other [repos/cloud/toolforge/toolforge-deploy] (run_specific_tests_on_deploy) - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/798 (https://phabricator.wikimedia.org/T389044 https://phabricator.wikimedia.org/T395533)
[08:11:56] (update) raymond-ndibe: Draft: [components-smoke-test] add components-api conditional build and run tests [repos/cloud/toolforge/toolforge-deploy] (refactor_components_smoke_test) - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/797 (https://phabricator.wikimedia.org/T389044)
[08:12:04] (update) raymond-ndibe: Draft: [components-smoke-test] add components-api conditional build and run tests [repos/cloud/toolforge/toolforge-deploy] (refactor_components_smoke_test) - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/797 (https://phabricator.wikimedia.org/T389044)
[08:12:12] (update) raymond-ndibe: [components-smoke-test] add components-api conditional build and run tests [repos/cloud/toolforge/toolforge-deploy] (refactor_components_smoke_test) - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/797 (https://phabricator.wikimedia.org/T389044)
[08:19:48] FIRING: PuppetZeroResources: Puppet has failed generate resources on cloudgw1003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[08:21:44] (PS11) Majavah: Add Bitu container [labs/striker] - https://gerrit.wikimedia.org/r/1035718 (https://phabricator.wikimedia.org/T362318) (owner: Slyngshede)
[08:23:19] (CR) Majavah: [C:+1] "Thanks, and sorry it's taken me this long to get to this. I've tested this locally and it works fine, sans a minor documentation issue inl" [labs/striker] - https://gerrit.wikimedia.org/r/1035718 (https://phabricator.wikimedia.org/T362318) (owner: Slyngshede)
[08:28:24] (update) raymond-ndibe: [deploy] add force-build and force-run query params [repos/cloud/toolforge/components-api] (skip_build_if_refs_are_same) - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/80 (https://phabricator.wikimedia.org/T389044)
[08:29:48] RESOLVED: PuppetZeroResources: Puppet has failed generate resources on cloudgw1003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[08:34:56] supertassu closed https://github.com/toolforge/quarry/pull/83
[08:38:24] cloud-services-team, Striker: 500 error on /tools/id/[tool name] - https://phabricator.wikimedia.org/T395541 (Alien333) NEW
[08:43:00] cloud-services-team, Cloud-VPS: Keystone in Epoxy does not support shell names with underscores - https://phabricator.wikimedia.org/T395542 (taavi) NEW
[08:43:01] cloud-services-team, Cloud-VPS: Keystone in Epoxy does not support shell names with underscores - https://phabricator.wikimedia.org/T395542#10866721 (taavi) p:Triage→High
[08:50:24] cloud-services-team, Cloud-VPS: Keystone in Epoxy does not support shell names with underscores - https://phabricator.wikimedia.org/T395542#10866723 (taavi)
[08:55:27] (update) raymond-ndibe: [components-smoke-test] add components-api conditional build and run tests [repos/cloud/toolforge/toolforge-deploy] (refactor_components_smoke_test) - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/797 (https://phabricator.wikimedia.org/T389044)
[10:12:12] cloud-services-team (FY2024/2025-Q3-Q4), Data-Platform-SRE (2025.05.24 - 2025.06.13), Patch-For-Review: PuppetConstantChange on clouddumps100[12] - https://phabricator.wikimedia.org/T394921#10866894 (BTullis) Resolved→Open I'm reopening this, as I didn't actually apply a fix yet. The Airflow...
[10:59:07] Data-Services, DBA: Remove sanitarium hosts from codfw - https://phabricator.wikimedia.org/T394884#10867025 (jcrespo) db2186 needs some extra due to care: FIRING: SystemdUnitFailed: wmf_auto_restart_prometheus-mysqld-exporter@s3.service on db2186:9100 - https://wikitech.wikimedia.org/wiki/Mon...
[11:01:44] Data-Services, DBA: Remove sanitarium hosts from codfw - https://phabricator.wikimedia.org/T394884#10867027 (Marostegui) >>! In T394884#10867025, @jcrespo wrote: > db2186 needs some extra due to care: FIRING: SystemdUnitFailed: wmf_auto_restart_prometheus-mysqld-exporter@s3.service on db2186:...
[11:13:36] (CR) NkwadaNora: [C:+1] added endpoints.ts [labs/tools/WdTmCollab] - https://gerrit.wikimedia.org/r/1151745 (owner: Martindevelops)
[11:16:32] (CR) NkwadaNora: [C:+1] "all changes are good by me" [labs/tools/WdTmCollab] - https://gerrit.wikimedia.org/r/1151745 (owner: Martindevelops)
[11:21:02] (CR) NkwadaNora: [C:+1] "all good" [labs/tools/WdTmCollab] - https://gerrit.wikimedia.org/r/1151745 (owner: Martindevelops)
[11:34:04] cloud-services-team (FY2024/2025-Q3-Q4), Data-Platform-SRE (2025.05.24 - 2025.06.13): PuppetConstantChange on clouddumps100[12] - https://phabricator.wikimedia.org/T394921#10867122 (BTullis) Open→Resolved OK, I think that this should be fixed now. Apologies for the noise and delay.
[11:38:18] Data-Services, DBA: Remove sanitarium hosts from codfw - https://phabricator.wikimedia.org/T394884#10867134 (jcrespo) I believe I fixed db2186, but know that it will require the same kind of cleanup (specially of systemd and /etc/mysql/mysql.d/) for the other hosts, too.
[11:38:33] Data-Services, Data-Persistence, Data-Platform, Data-Platform-SRE (2025.05.24 - 2025.06.13): an-redacteddb1001: upgrade MariaDB to 10.11 - https://phabricator.wikimedia.org/T394930#10867136 (BTullis) p:Triage→Medium
[11:41:03] Data-Services, DBA: Remove sanitarium hosts from codfw - https://phabricator.wikimedia.org/T394884#10867139 (Marostegui) Thank you. db2187 will needed it too, so I am thinking it is just easier/cleaner to reimage keeping the data
[11:47:17] Data-Services, DBA: Remove sanitarium hosts from codfw - https://phabricator.wikimedia.org/T394884#10867162 (jcrespo) Up to you, I can fix other hosts quickly, it is just a few commands.
[11:47:53] Data-Services, DBA: Remove sanitarium hosts from codfw - https://phabricator.wikimedia.org/T394884#10867178 (Marostegui) >>! In T394884#10867162, @jcrespo wrote: > Up to you, I can fix other hosts quickly, it is just a few commands. Go for it if you have time! Thanks!
[11:51:01] Data-Services, Data-Persistence, Data-Platform, Data-Platform-SRE (2025.05.24 - 2025.06.13), Patch-For-Review: an-redacteddb1001: upgrade MariaDB to 10.11 - https://phabricator.wikimedia.org/T394930#10867186 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=0f45b8f1-0c2a-4554-...
[11:57:26] Data-Services, Data-Persistence, Data-Platform, Data-Platform-SRE (2025.05.24 - 2025.06.13), Patch-For-Review: an-redacteddb1001: upgrade MariaDB to 10.11 - https://phabricator.wikimedia.org/T394930#10867194 (BTullis) All replication threads stopped. All mariadb services stopped. ` btullis@an...
[12:01:38] cloud-services-team, Cloud-VPS: Keystone in Epoxy does not support shell names with underscores - https://phabricator.wikimedia.org/T395542#10867203 (taavi) a:Andrew
[12:07:59] Data-Services, Data-Persistence, Data-Platform, Data-Platform-SRE (2025.05.24 - 2025.06.13), Patch-For-Review: an-redacteddb1001: upgrade MariaDB to 10.11 - https://phabricator.wikimedia.org/T394930#10867235 (BTullis) Open→Resolved Puppet ran cleanly. ` btullis@an-redacteddb1001:~...
[12:11:06] cloud-services-team, Cloud-VPS: Keystone in Epoxy does not support shell names with underscores - https://phabricator.wikimedia.org/T395542#10867243 (Andrew) Filed upstream bug https://bugs.launchpad.net/keystone/+bug/2112112
[12:11:36] cloud-services-team, Data-Services, Data-Persistence, Security-Team: Add "wikishared" database to wiki replicas - https://phabricator.wikimedia.org/T395072#10867246 (Marostegui)
[12:11:41] cloud-services-team, Cloud-VPS: Keystone in Epoxy does not support shell names with underscores - https://phabricator.wikimedia.org/T395542#10867247 (Andrew) Looks like they attempted to fix this with https://bugs.launchpad.net/keystone/+bug/2104185
[12:57:36] Tool-translatetagger, Capacity Exchange, LPL Essential (LPL Essential 2025 Apr-Jun: CX), Unplanned-Sprint-Work: Translation export failure for capacity-exchange - https://phabricator.wikimedia.org/T395564 (Nikerabbit) NEW
[12:59:08] cloud-services-team, Cloud-VPS: Keystone in Epoxy does not support shell names with underscores - https://phabricator.wikimedia.org/T395542#10867444 (Andrew) >>! In T395542#10867247, @Andrew wrote: > Looks like they attempted to fix this with https://bugs.launchpad.net/keystone/+bug/2104185 nope! That's...
[14:06:45] cloud-services-team (Kanban), DBA, Patch-For-Review, Puppet: labtestpuppetmaster2001 is failing to backup - https://phabricator.wikimedia.org/T256846#10867604 (jcrespo) @Andrew should we revert now https://gerrit.wikimedia.org/r/c/operations/puppet/+/612167/6/modules/profile/files/backup/job_...
[14:33:47] (CR) Krinkle: Build: Update build system (1 comment) [labs/countervandalism/CVNBot] - https://gerrit.wikimedia.org/r/1143806 (owner: Slyngshede)
[14:34:47] (CR) Krinkle: Build: Update build system (1 comment) [labs/countervandalism/CVNBot] - https://gerrit.wikimedia.org/r/1143806 (owner: Slyngshede)
[15:04:57] Toolforge (Toolforge iteration 20), good first task, Patch-For-Review: [builds-api] populate the `image_name` for the builds returned - https://phabricator.wikimedia.org/T395035#10867874 (Chuckonwumelu) Open→In progress
[15:23:20] (CR) Krinkle: [V:+2 C:+2] setup: Remove redundant ca-certificates-mono workaround [labs/countervandalism/cvn-infrastructure] - https://gerrit.wikimedia.org/r/1150753 (https://phabricator.wikimedia.org/T395164) (owner: Krinkle)
[15:30:45] cloud-services-team, Cloud-VPS, Patch-For-Review: Keystone in Epoxy does not support shell names with underscores - https://phabricator.wikimedia.org/T395542#10867986 (Andrew) Open→Resolved
[15:31:13] cloud-services-team, Striker: 500 error on /tools/id/[tool name] - https://phabricator.wikimedia.org/T395541#10867992 (Andrew) Open→Resolved a:Andrew Fixed, I think.
[16:51:42] (merge) wikibayer: Edit translations.json [toolforge-repos/GlobalUserInfo] - https://gitlab.wikimedia.org/toolforge-repos/GlobalUserInfo/-/merge_requests/1 (owner: eihel)
[16:53:29] (update) chuckonwumelu: [builds-api] Return image_name when retrieving build info [repos/cloud/toolforge/builds-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-api/-/merge_requests/135 (https://phabricator.wikimedia.org/T395035)
[16:58:34] FIRING: DiskSpace: Disk space cloudcontrol2004-dev:9100:/ 0% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=cloudcontrol2004-dev - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[17:08:34] RESOLVED: DiskSpace: Disk space cloudcontrol2004-dev:9100:/ 0% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=cloudcontrol2004-dev - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[17:10:32] (update) chuckonwumelu: [builds-api] Return image_name when retrieving build info [repos/cloud/toolforge/builds-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-api/-/merge_requests/135 (https://phabricator.wikimedia.org/T395035)
[17:20:49] (update) chuckonwumelu: [builds-api] Return image_name when retrieving build info [repos/cloud/toolforge/builds-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-api/-/merge_requests/135 (https://phabricator.wikimedia.org/T395035)
[17:31:55] (update) chuckonwumelu: [api] Adding warning message for beta [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/78 (https://phabricator.wikimedia.org/T394277)
[17:50:35] (update) chuckonwumelu: [build] Return image_name when retrieving build info [repos/cloud/toolforge/builds-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-api/-/merge_requests/135 (https://phabricator.wikimedia.org/T395035)
[18:00:08] (PS1) NkwadaNora: rearrange the location of some files [labs/tools/WdTmCollab] - https://gerrit.wikimedia.org/r/1152117
[18:09:47] (CR) Martindevelops: [C:+1] added endpoints.ts [labs/tools/WdTmCollab] - https://gerrit.wikimedia.org/r/1151745 (owner: Martindevelops)
[21:13:39] Wikibugs, Phabricator: Wikibugs reports color of milestones wrong - https://phabricator.wikimedia.org/T395250#10869316 (bd808) The current wikibugs logic [[https://gitlab.wikimedia.org/toolforge-repos/wikibugs2/-/blob/afbea805fbb6f7794e7cd4b308355fdb9f4d99ef/src/wikibugs2/messagebuilder.py#L194-207|maps...
[21:20:20] Wikibugs: wikibugs is not outputting anything on IRC.. - https://phabricator.wikimedia.org/T395217#10869353 (bd808) `ERROR: Failed to connect to http://wikibugs:8000/api/event/stream` sounds like the `webservice` was down as well. The architecture here is somewhat complex: https://www.mediawiki.org/wiki/...
[21:41:03] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-11 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[22:09:34] FIRING: DiskSpace: Disk space cloudcontrol2004-dev:9100:/ 0.4768% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=cloudcontrol2004-dev - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[22:19:34] RESOLVED: DiskSpace: Disk space cloudcontrol2004-dev:9100:/ 1.493% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=cloudcontrol2004-dev - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[22:51:51] FIRING: ProbeDown: Service tools-static-15:80 has failed probes (http_tools_static_wmflabs_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-static-15:80 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[23:01:34] FIRING: DiskSpace: Disk space cloudcontrol2004-dev:9100:/ 3.04% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=cloudcontrol2004-dev - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[23:01:57] Cloud-Services: Is it a bug to have a hostname in profile::resolving::nameservers? - https://phabricator.wikimedia.org/T395633 (dancy) NEW The #Cloud-Services project tag is not intended to have any tasks. Please check the list on https://phabricator.wikimedia.org/project/profile/832/ and replace it with...
[23:04:01] cloud-services-team: Is it a bug to have a hostname in profile::resolving::nameservers? - https://phabricator.wikimedia.org/T395633#10869681 (dancy)
[23:05:49] (update) raymond-ndibe: [components-smoke-test] add components-api conditional build and run tests [repos/cloud/toolforge/toolforge-deploy] (refactor_components_smoke_test) - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/797 (https://phabricator.wikimedia.org/T389044)
[23:06:34] RESOLVED: DiskSpace: Disk space cloudcontrol2004-dev:9100:/ 3.026% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=cloudcontrol2004-dev - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[23:36:03] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-11 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[23:36:33] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-11 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[23:41:33] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-11 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[23:42:33] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-11 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[23:49:34] FIRING: DiskSpace: Disk space cloudcontrol2004-dev:9100:/ 3.572% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=cloudcontrol2004-dev - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[23:59:34] RESOLVED: DiskSpace: Disk space cloudcontrol2004-dev:9100:/ 3.553% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=cloudcontrol2004-dev - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace