[00:05:37] (03open) 10bd808: Upgrade to ZNC 1.9.0 [toolforge-repos/containers-bnc] - 10https://gitlab.wikimedia.org/toolforge-repos/containers-bnc/-/merge_requests/1 (https://phabricator.wikimedia.org/T380108) [00:11:00] (03update) 10bd808: Upgrade to ZNC 1.9.0 [toolforge-repos/containers-bnc] - 10https://gitlab.wikimedia.org/toolforge-repos/containers-bnc/-/merge_requests/1 (https://phabricator.wikimedia.org/T380108) [00:45:28] FIRING: PuppetAgentNoResources: No Puppet resources found on instance toolsbeta-test-k8s-ingress-9 on project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [00:50:28] FIRING: [3x] PuppetAgentNoResources: No Puppet resources found on instance toolsbeta-test-k8s-ingress-9 on project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [01:05:28] RESOLVED: [3x] PuppetAgentNoResources: No Puppet resources found on instance toolsbeta-test-k8s-ingress-9 on project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [01:19:16] (03update) 10raymond-ndibe: crds: use the api name in the crd name [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/68 (https://phabricator.wikimedia.org/T386829) (owner: 10dcaro) [01:37:46] (03update) 10raymond-ndibe: start: resolve the commit hash to build on start [repos/cloud/toolforge/builds-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-api/-/merge_requests/129 (owner: 10dcaro) [01:38:08] (03update) 10raymond-ndibe: start: resolve the commit hash to build on start [repos/cloud/toolforge/builds-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-api/-/merge_requests/129 (owner: 10dcaro) [01:40:40] (03approved) 10raymond-ndibe: pipeline: add unresolved source reference parameter [repos/cloud/toolforge/builds-builder] - 
10https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-builder/-/merge_requests/71 (https://phabricator.wikimedia.org/T389043) (owner: 10dcaro) [01:54:15] (03update) 10raymond-ndibe: crds: use the api name in the crd name [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/68 (https://phabricator.wikimedia.org/T386829) (owner: 10dcaro) [01:54:17] (03approved) 10raymond-ndibe: crds: use the api name in the crd name [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/68 (https://phabricator.wikimedia.org/T386829) (owner: 10dcaro) [02:02:01] (03update) 10ttaylor: Draft: Refactored project structure to add Python API to relay events [toolforge-repos/listen-to-wiki-changes] - 10https://gitlab.wikimedia.org/toolforge-repos/listen-to-wiki-changes/-/merge_requests/1 [02:26:51] (03update) 10ttaylor: Draft: Refactored project structure to add Python API to relay events [toolforge-repos/listen-to-wiki-changes] - 10https://gitlab.wikimedia.org/toolforge-repos/listen-to-wiki-changes/-/merge_requests/1 [02:27:41] (03update) 10ttaylor: Refactored project structure to add Python API to relay events + lots of front end improvements [toolforge-repos/listen-to-wiki-changes] - 10https://gitlab.wikimedia.org/toolforge-repos/listen-to-wiki-changes/-/merge_requests/1 [02:27:48] (03merge) 10ttaylor: Refactored project structure to add Python API to relay events + lots of front end improvements [toolforge-repos/listen-to-wiki-changes] - 10https://gitlab.wikimedia.org/toolforge-repos/listen-to-wiki-changes/-/merge_requests/1 [02:43:31] (03update) 10raymond-ndibe: [toolforge-deploy] run specific tests on deploy [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/755 (https://phabricator.wikimedia.org/T381011) [02:45:33] (03update) 10raymond-ndibe: 
[toolforge-deploy] run specific tests on deploy [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/755 (https://phabricator.wikimedia.org/T381011) [03:00:16] (03update) 10raymond-ndibe: [toolforge-deploy] run specific tests on deploy [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/755 (https://phabricator.wikimedia.org/T381011) [03:06:30] (03update) 10raymond-ndibe: [toolforge-deploy] run specific tests on deploy [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/755 (https://phabricator.wikimedia.org/T381011) [03:08:59] (03update) 10raymond-ndibe: [toolforge-deploy] run specific tests on deploy [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/755 (https://phabricator.wikimedia.org/T381011) [07:01:57] 06cloud-services-team, 10Toolforge: Toolforge bastion sssd/LDAP flakiness (May 2025) - https://phabricator.wikimedia.org/T393732#10820291 (10Fnielsen) I have got a problem again: ` $ ssh toolforge Connection closed by 185.15.56.62 port 22 `` [07:30:26] (03CR) 10Majavah: [C:03+2] Upgrade to Django 3.2 LTS [labs/striker] - 10https://gerrit.wikimedia.org/r/1009594 (https://phabricator.wikimedia.org/T359217) (owner: 10Majavah) [07:30:30] (03CR) 10Majavah: [C:03+2] Replace ugettext_lazy with gettext_lazy [labs/striker] - 10https://gerrit.wikimedia.org/r/1145211 (owner: 10Majavah) [07:32:58] (03Merged) 10jenkins-bot: Upgrade to Django 3.2 LTS [labs/striker] - 10https://gerrit.wikimedia.org/r/1009594 (https://phabricator.wikimedia.org/T359217) (owner: 10Majavah) [07:33:21] (03Merged) 10jenkins-bot: Replace ugettext_lazy with gettext_lazy [labs/striker] - 10https://gerrit.wikimedia.org/r/1145211 (owner: 10Majavah) [07:40:15] 10Striker, 13Patch-For-Review: 
django-ratelimit-backend is not compatible with Django 3.x - https://phabricator.wikimedia.org/T359559#10820403 (10taavi) 05Open→03Resolved a:03taavi This is worked around for now by pinning a commit hash from https://github.com/supertassu/django-ratelimit-backend until... [07:41:18] (03approved) 10dcaro: crds: use the api name in the crd name [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/68 (https://phabricator.wikimedia.org/T386829) [07:41:22] (03merge) 10dcaro: crds: use the api name in the crd name [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/68 (https://phabricator.wikimedia.org/T386829) [07:43:51] (03open) 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: components-api: bump to 0.0.106-20250514074135-c6b123d6 [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/780 (https://phabricator.wikimedia.org/T386829) [07:49:08] (03update) 10dcaro: pre-commit: Autoupdate [repos/cloud/toolforge/maintain-kubeusers] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-kubeusers/-/merge_requests/65 (owner: 10ghost) [07:49:23] !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer [07:55:53] (03merge) 10dcaro: pipeline: add unresolved source reference parameter [repos/cloud/toolforge/builds-builder] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-builder/-/merge_requests/71 (https://phabricator.wikimedia.org/T389043) [07:57:22] (03open) 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: builds-builder: bump to 0.0.131-20250514075559-ac5a6006 [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/781 (https://phabricator.wikimedia.org/T389043) [08:00:08] !log dcaro@cloudcumin1001 
toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer [08:05:53] !log dcaro@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer [08:18:37] !log dcaro@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer [08:38:49] 10Striker: Update Django version used in Striker - https://phabricator.wikimedia.org/T359217#10820613 (10taavi) a:03taavi [08:42:48] 10Toolforge (Toolforge iteration 19): [components-api] add tool config version check - https://phabricator.wikimedia.org/T394273 (10dcaro) 03NEW [08:44:01] 10Toolforge (Toolforge iteration 19): [components-api,buildsa-api] When building and deploying, if none of the settings changed, the jobs are not restarted - https://phabricator.wikimedia.org/T389044#10820632 (10dcaro) [08:44:04] 10Toolforge (Toolforge iteration 19), 13Patch-For-Review: [builds-api] Store the commit hash that was used for the build - https://phabricator.wikimedia.org/T389043#10820633 (10dcaro) [08:45:23] 10Toolforge (Toolforge iteration 19): [components-api] add tool config version check - https://phabricator.wikimedia.org/T394273#10820652 (10dcaro) p:05Triage→03High [08:46:58] 10Toolforge (Toolforge iteration 19): [components-api] Add alerts and runbooks for basic service health - https://phabricator.wikimedia.org/T394275 (10dcaro) 03NEW [08:47:04] !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component components-api [08:47:09] (03approved) 10dcaro: jobs-emailer: bump to 0.0.57-20250512162230-f6958e24 [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/777 (owner: 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620) [08:47:13] (03update) 10dcaro: jobs-emailer: bump to 0.0.57-20250512162230-f6958e24 [repos/cloud/toolforge/toolforge-deploy] - 
10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/777 (owner: 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620) [08:47:31] (03merge) 10dcaro: jobs-emailer: bump to 0.0.57-20250512162230-f6958e24 [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/777 (owner: 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620) [08:54:37] (03PS1) 10Majavah: Replace remaining deprecated u* translation methods [labs/striker] - 10https://gerrit.wikimedia.org/r/1145820 [08:54:37] (03PS1) 10Majavah: Upgrade to Django 4.2 LTS [labs/striker] - 10https://gerrit.wikimedia.org/r/1145821 [08:54:37] (03PS1) 10Majavah: build: Remove unused direct dependencies [labs/striker] - 10https://gerrit.wikimedia.org/r/1145822 [08:54:37] (03PS1) 10Majavah: Upgrade non-Django dependencies [labs/striker] - 10https://gerrit.wikimedia.org/r/1145823 [08:54:40] 10Toolforge (Toolforge iteration 19): [components-api] Add alerts and runbooks for basic service health - https://phabricator.wikimedia.org/T394275#10820697 (10dcaro) p:05Triage→03High [08:56:05] 10Toolforge (Toolforge iteration 19): [components-api] Add basic stats - https://phabricator.wikimedia.org/T394276 (10dcaro) 03NEW [08:56:28] (03CR) 10CI reject: [V:04-1] Upgrade to Django 4.2 LTS [labs/striker] - 10https://gerrit.wikimedia.org/r/1145821 (owner: 10Majavah) [08:56:30] (03CR) 10CI reject: [V:04-1] Upgrade non-Django dependencies [labs/striker] - 10https://gerrit.wikimedia.org/r/1145823 (owner: 10Majavah) [08:56:34] 10Toolforge (Toolforge iteration 19): [components-api] Add basic stats - https://phabricator.wikimedia.org/T394276#10820725 (10dcaro) p:05Triage→03High [08:57:36] (03CR) 10CI reject: [V:04-1] build: Remove unused direct dependencies [labs/striker] - 10https://gerrit.wikimedia.org/r/1145822 (owner: 10Majavah) [08:57:50] 10Toolforge (Toolforge iteration 19): [components-cli] Add a warning message saying it's 'beta' - 
https://phabricator.wikimedia.org/T394277 (10dcaro) 03NEW [08:57:51] 10Toolforge (Toolforge iteration 19): [components-cli] Add a warning message saying it's 'beta' - https://phabricator.wikimedia.org/T394277#10820744 (10dcaro) p:05Triage→03High [08:58:20] 10Striker: django.core.cache.backends.memcached.MemcachedCache is removed in Django 4.1 - https://phabricator.wikimedia.org/T394278 (10taavi) 03NEW [08:58:30] 06cloud-services-team, 10Toolforge: Toolforge bastion sssd/LDAP flakiness (May 2025) - https://phabricator.wikimedia.org/T393732#10820757 (10Magnus) @Fnielsen same here [08:59:14] !log dcaro@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api [08:59:23] 10Toolforge (Toolforge iteration 19): [components-api,components-cli] add user docs - https://phabricator.wikimedia.org/T394279 (10dcaro) 03NEW [08:59:47] 10Toolforge (Toolforge iteration 19): [components-api,components-cli] add user docs - https://phabricator.wikimedia.org/T394279#10820776 (10dcaro) p:05Triage→03High [09:01:13] 10Toolforge (Toolforge iteration 19): [components-api] Add admin page - https://phabricator.wikimedia.org/T394280 (10dcaro) 03NEW [09:01:14] 10Toolforge (Toolforge iteration 19): [components-api] Add admin page - https://phabricator.wikimedia.org/T394280#10820788 (10dcaro) p:05Triage→03High [09:02:32] 10Cloud Services Proposals, 10cloud-services-team (FY2024/2025-Q3-Q4), 10Toolforge, 05Cloud-Services-Origin-Team, and 3 others: [Hypothesis] WE6.3.10 start a beta for the push-to-deploy features - https://phabricator.wikimedia.org/T393564#10820791 (10dcaro) [09:03:28] 10Toolforge (Toolforge iteration 19), 07Documentation: [components-api] Add admin documentation page - https://phabricator.wikimedia.org/T394280#10820802 (10taavi) [09:03:31] 10Toolforge (Toolforge iteration 19), 07Documentation: [components-api] Add admin documentation page - https://phabricator.wikimedia.org/T394280#10820803 (10taavi) [09:03:34] 
10Cloud Services Proposals, 10cloud-services-team (FY2024/2025-Q3-Q4), 10Toolforge, 05Cloud-Services-Origin-Team, and 3 others: [Hypothesis] WE6.3.10 start a beta for the push-to-deploy features - https://phabricator.wikimedia.org/T393564#10820804 (10dcaro) [09:03:36] 10Toolforge (Toolforge iteration 19): [components-api,components-cli] add user docs - https://phabricator.wikimedia.org/T394279#10820806 (10dcaro) [09:03:37] 10Toolforge (Toolforge iteration 19), 07Documentation: [components-api] Add admin documentation page - https://phabricator.wikimedia.org/T394280#10820805 (10dcaro) [09:03:39] 10Toolforge (Toolforge iteration 19): [components-api] Add basic stats - https://phabricator.wikimedia.org/T394276#10820807 (10dcaro) [09:03:39] 10Toolforge (Toolforge iteration 19): [components-api] add tool config version check - https://phabricator.wikimedia.org/T394273#10820809 (10dcaro) [09:03:40] 10Toolforge (Toolforge iteration 19): [components-api] Add alerts and runbooks for basic service health - https://phabricator.wikimedia.org/T394275#10820808 (10dcaro) [09:03:42] 10Toolforge (Toolforge iteration 19): [components-api] Add support for port/helathcheck for continuous jobs in tool config/depolyment - https://phabricator.wikimedia.org/T362072#10820810 (10dcaro) [09:04:26] 10Toolforge (Toolforge iteration 19), 07Documentation: [components-api,components-cli] add user documentation page - https://phabricator.wikimedia.org/T394279#10820811 (10dcaro) [09:05:07] (03PS2) 10Majavah: Upgrade to Django 4.2 LTS [labs/striker] - 10https://gerrit.wikimedia.org/r/1145821 (https://phabricator.wikimedia.org/T359217) [09:05:10] (03PS2) 10Majavah: build: Remove unused direct dependencies [labs/striker] - 10https://gerrit.wikimedia.org/r/1145822 [09:05:10] (03PS2) 10Majavah: Upgrade non-Django dependencies [labs/striker] - 10https://gerrit.wikimedia.org/r/1145823 [09:05:10] (03PS1) 10Majavah: Swap Memcached driver [labs/striker] - 10https://gerrit.wikimedia.org/r/1145829 
(https://phabricator.wikimedia.org/T394278) [09:06:48] (03CR) 10CI reject: [V:04-1] Upgrade to Django 4.2 LTS [labs/striker] - 10https://gerrit.wikimedia.org/r/1145821 (https://phabricator.wikimedia.org/T359217) (owner: 10Majavah) [09:07:59] (03CR) 10CI reject: [V:04-1] Swap Memcached driver [labs/striker] - 10https://gerrit.wikimedia.org/r/1145829 (https://phabricator.wikimedia.org/T394278) (owner: 10Majavah) [09:08:06] (03CR) 10CI reject: [V:04-1] build: Remove unused direct dependencies [labs/striker] - 10https://gerrit.wikimedia.org/r/1145822 (owner: 10Majavah) [09:08:08] (03CR) 10CI reject: [V:04-1] Upgrade non-Django dependencies [labs/striker] - 10https://gerrit.wikimedia.org/r/1145823 (owner: 10Majavah) [09:15:06] !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component components-api [09:16:35] (03PS2) 10Majavah: Swap Memcached driver [labs/striker] - 10https://gerrit.wikimedia.org/r/1145829 (https://phabricator.wikimedia.org/T394278) [09:16:35] (03PS3) 10Majavah: Upgrade to Django 4.2 LTS [labs/striker] - 10https://gerrit.wikimedia.org/r/1145821 (https://phabricator.wikimedia.org/T359217) [09:16:35] (03PS3) 10Majavah: build: Remove unused direct dependencies [labs/striker] - 10https://gerrit.wikimedia.org/r/1145822 [09:16:36] (03PS3) 10Majavah: Upgrade non-Django dependencies [labs/striker] - 10https://gerrit.wikimedia.org/r/1145823 [09:16:37] (03PS1) 10Majavah: build: Make flake8 ignore node_modules [labs/striker] - 10https://gerrit.wikimedia.org/r/1145832 [09:25:53] !log dcaro@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api [09:27:36] 06cloud-services-team, 10Toolforge (Toolforge iteration 19): [builds-builder] Golang buildpack does not allow using Procfiles so can't use custom scripts/entrypoints - https://phabricator.wikimedia.org/T390845#10820842 (10dcaro) Tested also that it works now with golang 1.24: ` local.tf-test@toolslocal:~$ 
too... [09:28:37] (03approved) 10dcaro: components-api: bump to 0.0.106-20250514074135-c6b123d6 [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/780 (https://phabricator.wikimedia.org/T386829) (owner: 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620) [09:28:41] (03update) 10dcaro: components-api: bump to 0.0.106-20250514074135-c6b123d6 [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/780 (https://phabricator.wikimedia.org/T386829) (owner: 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620) [09:29:02] (03merge) 10dcaro: components-api: bump to 0.0.106-20250514074135-c6b123d6 [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/780 (https://phabricator.wikimedia.org/T386829) (owner: 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620) [09:29:02] (03update) 10dcaro: builds-builder: bump to 0.0.131-20250514075559-ac5a6006 [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/781 (https://phabricator.wikimedia.org/T389043) (owner: 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620) [09:29:03] !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component builds-builder [09:40:27] !log dcaro@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder [09:47:14] 10Cloud Services Proposals, 10Striker: Decision request - Tool account management and Striker - https://phabricator.wikimedia.org/T394035#10820913 (10fnegri) >> Could it be a generic LDAP adapter, with some minimal logic to restrict the damage you can do through its API? > This would increase the security issu... 
[09:53:23] !log dcaro@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component builds-builder [10:05:12] !log dcaro@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder [10:28:52] (03approved) 10dcaro: builds-builder: bump to 0.0.131-20250514075559-ac5a6006 [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/781 (https://phabricator.wikimedia.org/T389043) (owner: 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620) [10:28:54] (03update) 10dcaro: builds-builder: bump to 0.0.131-20250514075559-ac5a6006 [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/781 (https://phabricator.wikimedia.org/T389043) (owner: 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620) [10:29:10] (03merge) 10dcaro: builds-builder: bump to 0.0.131-20250514075559-ac5a6006 [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/781 (https://phabricator.wikimedia.org/T389043) (owner: 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620) [10:29:21] 10Toolforge (Toolforge iteration 19): [components-api] Rename the CRDs groups to be `components-api.toolforge.org` - https://phabricator.wikimedia.org/T386829#10821048 (10dcaro) 05In progress→03Resolved [10:35:56] (03update) 10dcaro: start: resolve the commit hash to build on start [repos/cloud/toolforge/builds-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-api/-/merge_requests/129 [11:19:00] (03CR) 10Majavah: [C:03+2] Replace remaining deprecated u* translation methods [labs/striker] - 10https://gerrit.wikimedia.org/r/1145820 (owner: 10Majavah) [11:19:10] (03CR) 10Majavah: [C:03+2] build: Make flake8 ignore node_modules [labs/striker] - 10https://gerrit.wikimedia.org/r/1145832 (owner: 10Majavah) [11:20:24] (03Merged) 
10jenkins-bot: Replace remaining deprecated u* translation methods [labs/striker] - 10https://gerrit.wikimedia.org/r/1145820 (owner: 10Majavah) [11:21:27] (03Merged) 10jenkins-bot: build: Make flake8 ignore node_modules [labs/striker] - 10https://gerrit.wikimedia.org/r/1145832 (owner: 10Majavah) [11:57:56] (03update) 10dcaro: start: resolve the commit hash to build on start [repos/cloud/toolforge/builds-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-api/-/merge_requests/129 [12:00:57] FIRING: PuppetCertificateAboutToExpire: Puppet CA certificate Puppet CA: toolsbeta-puppetmaster-04.toolsbeta.eqiad.wmflabs is about to expire in 26d 23h 56m 43s - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetCertificateAboutToExpire - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetCertificateAboutToExpire [12:08:37] (03approved) 10dcaro: start: resolve the commit hash to build on start [repos/cloud/toolforge/builds-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-api/-/merge_requests/129 [12:08:40] (03merge) 10dcaro: start: resolve the commit hash to build on start [repos/cloud/toolforge/builds-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-api/-/merge_requests/129 [12:15:06] (03open) 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: builds-api: bump to 0.0.191-20250514120852-fff150a3 [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/782 (https://phabricator.wikimedia.org/T389043) [12:28:28] FIRING: WidespreadPuppetAgentFailure: Widespread puppet agent failures in project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DWidespreadPuppetAgentFailure [12:30:33] 10Toolforge (Toolforge iteration 19): [builds-cli] add resolved reference when showing a build - https://phabricator.wikimedia.org/T394300 (10dcaro) 03NEW [12:32:28] FIRING: WidespreadPuppetAgentFailure: Widespread puppet agent failures in 
project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DWidespreadPuppetAgentFailure [12:33:28] FIRING: WidespreadPuppetAgentFailure: Widespread puppet agent failures in project gitlab-runners - https://prometheus-alerts.wmcloud.org/?q=alertname%3DWidespreadPuppetAgentFailure [12:35:28] FIRING: WidespreadPuppetAgentFailure: Widespread puppet agent failures in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DWidespreadPuppetAgentFailure [12:37:28] FIRING: WidespreadPuppetAgentFailure: Widespread puppet agent failures in project cvn - https://prometheus-alerts.wmcloud.org/?q=alertname%3DWidespreadPuppetAgentFailure [12:38:42] 10cloud-services-team (FY2024/2025-Q3-Q4), 10Toolforge (Toolforge iteration 19): [toolsdb] ToolsToolsDBReplicationLagIsTooHigh - 2025-05-09 - https://phabricator.wikimedia.org/T393766#10821573 (10fnegri) The relevant value is not `row locks` but `undo log entries`, which is currently at `1501638`, so very clos... [12:44:46] !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component builds-api [12:45:28] FIRING: WidespreadPuppetAgentFailure: Widespread puppet agent failures in project project-proxy - https://prometheus-alerts.wmcloud.org/?q=alertname%3DWidespreadPuppetAgentFailure [12:52:28] FIRING: WidespreadPuppetAgentFailure: Widespread puppet agent failures in project quarry - https://prometheus-alerts.wmcloud.org/?q=alertname%3DWidespreadPuppetAgentFailure [12:54:28] FIRING: TargetDown: Job toolsdb-mariadb is unreachable in project tools instance tools-db-5 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTargetDown [12:55:57] 10Toolforge (Toolforge iteration 19): [components-cli] Add a warning message saying it's 'beta' - https://phabricator.wikimedia.org/T394277#10821616 (10Chuckonwumelu) a:03Chuckonwumelu [12:56:42] !log dcaro@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api 
[12:57:07] 06cloud-services-team, 10Toolforge, 07Epic: toolforge: introduce additional IaC automation - https://phabricator.wikimedia.org/T390056#10821632 (10Chuckonwumelu) 05Open→03Resolved [12:57:28] RESOLVED: WidespreadPuppetAgentFailure: Widespread puppet agent failures in project quarry - https://prometheus-alerts.wmcloud.org/?q=alertname%3DWidespreadPuppetAgentFailure [12:57:28] RESOLVED: WidespreadPuppetAgentFailure: Widespread puppet agent failures in project cvn - https://prometheus-alerts.wmcloud.org/?q=alertname%3DWidespreadPuppetAgentFailure [12:57:53] 10cloud-services-team (FY2024/2025-Q3-Q4), 10Toolforge (Toolforge iteration 19): [toolsdb] ToolsToolsDBReplicationLagIsTooHigh - 2025-05-09 - https://phabricator.wikimedia.org/T393766#10821633 (10fnegri) `STOP SLAVE;` did not work, it just hangs waiting for the transaction to complete. Killing the query doesn'... [13:00:10] (03open) 10arthurtaylor: Add support for build completion notification [toolforge-repos/phpunit-results-cache] - 10https://gitlab.wikimedia.org/toolforge-repos/phpunit-results-cache/-/merge_requests/10 (https://phabricator.wikimedia.org/T392892) [13:00:58] RESOLVED: WidespreadPuppetAgentFailure: Widespread puppet agent failures in project project-proxy - https://prometheus-alerts.wmcloud.org/?q=alertname%3DWidespreadPuppetAgentFailure [13:02:43] (03update) 10arthurtaylor: Add support for build completion notification [toolforge-repos/phpunit-results-cache] - 10https://gitlab.wikimedia.org/toolforge-repos/phpunit-results-cache/-/merge_requests/10 (https://phabricator.wikimedia.org/T392892) [13:06:23] (03update) 10arthurtaylor: Add support for build completion notification [toolforge-repos/phpunit-results-cache] - 10https://gitlab.wikimedia.org/toolforge-repos/phpunit-results-cache/-/merge_requests/10 (https://phabricator.wikimedia.org/T392892) [13:08:28] RESOLVED: WidespreadPuppetAgentFailure: Widespread puppet agent failures in project gitlab-runners - 
https://prometheus-alerts.wmcloud.org/?q=alertname%3DWidespreadPuppetAgentFailure [13:08:58] (03update) 10arthurtaylor: Add support for build completion notification [toolforge-repos/phpunit-results-cache] - 10https://gitlab.wikimedia.org/toolforge-repos/phpunit-results-cache/-/merge_requests/10 (https://phabricator.wikimedia.org/T392892) [13:08:58] RESOLVED: WidespreadPuppetAgentFailure: Widespread puppet agent failures in project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DWidespreadPuppetAgentFailure [13:08:58] RESOLVED: WidespreadPuppetAgentFailure: Widespread puppet agent failures in project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DWidespreadPuppetAgentFailure [13:12:58] RESOLVED: WidespreadPuppetAgentFailure: Widespread puppet agent failures in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DWidespreadPuppetAgentFailure [13:13:36] 06cloud-services-team, 10Toolforge: ToolsDB: discard obsolete GTID domains - https://phabricator.wikimedia.org/T334947#10821716 (10fnegri) After some testing in a local dev environment, I am quite confident this can be done safely in ToolsDB. I started by clearing up `gtid_slave_pos` in the current primary (`... [13:13:48] 10cloud-services-team (FY2024/2025-Q3-Q4), 10Toolforge: ToolsDB: discard obsolete GTID domains - https://phabricator.wikimedia.org/T334947#10821718 (10fnegri) a:03fnegri [13:13:58] RESOLVED: WidespreadPuppetAgentFailure: Widespread puppet agent failures in project cloudinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DWidespreadPuppetAgentFailure [13:18:32] 06cloud-services-team, 10Cloud-VPS: Make Puppet able to reliabily restart sssd - https://phabricator.wikimedia.org/T394304 (10taavi) 03NEW [13:20:50] 06cloud-services-team, 10Cloud-VPS: Understand Octavia network needs - https://phabricator.wikimedia.org/T394099#10821783 (10Andrew) p:05Triage→03Medium a:05aborrero→03None [13:21:05] 06cloud-services-team, 10Cloud-VPS: OpenTofu vs. 
radosgw in codfw1dev - https://phabricator.wikimedia.org/T394061#10821785 (10Andrew) p:05Triage→03Medium [13:28:14] 06cloud-services-team, 10Cloud-VPS: Make Puppet able to reliabily restart sssd - https://phabricator.wikimedia.org/T394304#10821821 (10Andrew) p:05Triage→03Low [13:30:06] 10Cloud Services Proposals, 06cloud-services-team, 10Striker: Decision request - Tool account management and Striker - https://phabricator.wikimedia.org/T394035#10821822 (10taavi) p:05Triage→03Medium [13:49:34] (03update) 10dcaro: jobs: continuous: set strategy based on number of replicas [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/124 (https://phabricator.wikimedia.org/T375366) (owner: 10aborrero) [13:50:11] 10Toolforge (Toolforge iteration 19): [builds-cli] add resolved reference when showing a build - https://phabricator.wikimedia.org/T394300#10821949 (10dcaro) p:05Triage→03Low [13:54:35] 10Toolforge (Toolforge iteration 19): [components-api] Add basic prometheus metrics - https://phabricator.wikimedia.org/T394276#10821982 (10dcaro) [13:58:21] 06cloud-services-team, 10Toolforge (Toolforge iteration 19): [builds-builder] Golang buildpack does not allow using Procfiles so can't use custom scripts/entrypoints - https://phabricator.wikimedia.org/T390845#10822020 (10dcaro) a:03dcaro [13:58:26] 06cloud-services-team, 10Toolforge (Toolforge iteration 19): [builds-builder] Golang buildpack does not allow using Procfiles so can't use custom scripts/entrypoints - https://phabricator.wikimedia.org/T390845#10822023 (10dcaro) 05Open→03In progress [14:01:43] 06cloud-services-team, 10Toolforge: [components-api] allow stopping a deployment that's running - https://phabricator.wikimedia.org/T388644#10822056 (10dcaro) [14:03:31] FIRING: ToolsToolsDBReplicationLagIsTooHigh: ToolsDB replication on tools-db-5 is lagging behind the primary, the current lag is 492310 - 
https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBReplicationLagIsTooHigh [14:03:49] 06cloud-services-team, 10Toolforge: [usage] Try to get an idea of the amount of tools that were created, but never started anything - https://phabricator.wikimedia.org/T379144#10822076 (10dcaro) [14:04:51] 06cloud-services-team, 10Toolforge (Toolforge iteration 19): Upgrade python buildpack to v0.17.0 or newer for Poetry support - https://phabricator.wikimedia.org/T374056#10822079 (10dcaro) a:03dcaro [14:04:52] 06cloud-services-team, 10Toolforge (Toolforge iteration 19): Upgrade python buildpack to v0.17.0 or newer for Poetry support - https://phabricator.wikimedia.org/T374056#10822081 (10dcaro) 05Open→03In progress [14:07:23] 10Toolforge (Toolforge iteration 19): [jobs-api] prepend date and pod name to filelog lines - https://phabricator.wikimedia.org/T372025#10822085 (10dcaro) →14Duplicate dup:03T127367 [14:07:25] 06cloud-services-team, 10Toolforge, 07Epic: [toolforge,jobs-api,webservice,storage] Provide modern, non-NFS log solution for Toolforge tools - https://phabricator.wikimedia.org/T127367#10822087 (10dcaro) [14:08:05] 06cloud-services-team, 10Toolforge: [functional-tests] maintain-harbor tests are a bit flaky - https://phabricator.wikimedia.org/T393878#10822095 (10dcaro) [14:08:10] 06cloud-services-team, 10Toolforge: [toolforge deploy] direct-api tests fail intermittently on toolsbeta - https://phabricator.wikimedia.org/T369891#10822097 (10dcaro) [14:08:31] RESOLVED: ToolsToolsDBReplicationLagIsTooHigh: ToolsDB replication on tools-db-5 is lagging behind the primary, the current lag is 492430 - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBReplicationLagIsTooHigh [14:08:51] 10cloud-services-team (FY2024/2025-Q3-Q4), 10Toolforge (Toolforge iteration 19), 07Epic: 
[KR] WE6.3 Introduce a sustainability scoring system for the Toolforge platform - https://phabricator.wikimedia.org/T368600#10822099 (10dcaro) a:05dcaro→03komla [14:08:51] 10cloud-services-team (FY2024/2025-Q3-Q4), 10Toolforge (Toolforge iteration 19), 07Epic: [KR] WE6.3 Introduce a sustainability scoring system for the Toolforge platform - https://phabricator.wikimedia.org/T368600#10822102 (10dcaro) 05Open→03In progress [14:11:31] FIRING: ToolsToolsDBReplicationMissing: ToolsDB replication is not running on tools-db-4 (errno 0) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBReplicationMissing [14:13:17] 10Tool-campwiz-nxt, 06translatewiki.net, 10LPL Essential (LPL Essential 2025 Apr-Jun: CX), 13Patch-For-Review, 07Unplanned-Sprint-Work: Add CampWiz NXT to translatewiki.net - https://phabricator.wikimedia.org/T393850#10822130 (10Wangombe) [14:13:31] FIRING: ToolsToolsDBReplicationError: ToolsDB replication is broken on tools-db-5 (errno 1236) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBReplicationError [14:15:38] 06cloud-services-team, 10Cloud-VPS: Make Puppet able to reliabily restart sssd - https://phabricator.wikimedia.org/T394304#10822149 (10dcaro) Note that there's a hierarchy of services, stemming from `sssd`, like `sssd-nss`. The last errors was trying to restart that `sssd-nss` sub-service. 
[14:16:58] RESOLVED: TargetDown: Job toolsdb-mariadb is unreachable in project tools instance tools-db-5 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTargetDown [14:18:47] 06cloud-services-team, 10Cloud-VPS: Make Puppet able to reliabily restart sssd - https://phabricator.wikimedia.org/T394304#10822161 (10dcaro) If I don't understand wrong, this means that any of the sub-services will be restarted when restarting `sssd`: ` root@tools-k8s-control-9:~# systemctl show sssd | grep B... [14:23:31] RESOLVED: ToolsToolsDBReplicationError: ToolsDB replication is broken on tools-db-5 (errno 1236) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBReplicationError [14:24:31] FIRING: ToolsToolsDBReplicationLagIsTooHigh: ToolsDB replication on tools-db-5 is lagging behind the primary, the current lag is 493241 - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBReplicationLagIsTooHigh [14:26:31] RESOLVED: ToolsToolsDBReplicationMissing: ToolsDB replication is not running on tools-db-4 (errno 0) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBReplicationMissing [14:29:13] 10Toolforge (Toolforge iteration 20), 13Patch-For-Review: Check for diff in services when running diff_with_running_job - https://phabricator.wikimedia.org/T392717#10822204 (10dcaro) [14:29:14] 06cloud-services-team, 10Toolforge (Toolforge iteration 20), 13Patch-For-Review: [functional-tests,deploy,cookbook] Run only selected tests when deploying a component - https://phabricator.wikimedia.org/T381011#10822202 (10dcaro) [14:29:16] 06cloud-services-team, 10Toolforge (Toolforge iteration 20), 13Patch-For-Review: [builds-builder] Add support for Heroku's "24" builder stack based on Ubuntu 
2024.04 noble - https://phabricator.wikimedia.org/T380127#10822206 (10dcaro) [14:29:18] 10Toolforge (Toolforge iteration 20), 13Patch-For-Review: [builds-api] Store the commit hash that was used for the build - https://phabricator.wikimedia.org/T389043#10822210 (10dcaro) [14:29:21] 10Toolforge (Toolforge iteration 20), 13Patch-For-Review: Persist important toolforge k8s components logs - https://phabricator.wikimedia.org/T383081#10822212 (10dcaro) [14:29:24] 06cloud-services-team, 10Toolforge (Toolforge iteration 20), 13Patch-For-Review: [jobs-api] when running a command with wrong quoting, no logs nor useful feedback is given to the user - https://phabricator.wikimedia.org/T356267#10822208 (10dcaro) [14:29:28] 06cloud-services-team, 10Toolforge (Toolforge iteration 20), 13Patch-For-Review: [jobs-api,infra] upgrade all the existing toolforge jobs to the latest job version - https://phabricator.wikimedia.org/T359649#10822214 (10dcaro) [14:29:31] RESOLVED: ToolsToolsDBReplicationLagIsTooHigh: ToolsDB replication on tools-db-5 is lagging behind the primary, the current lag is 493361 - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBReplicationLagIsTooHigh [14:29:36] 06cloud-services-team, 10Toolforge (Toolforge iteration 20), 13Patch-For-Review: [harbor, builds-builder] Audit robot account permissions - https://phabricator.wikimedia.org/T361708#10822216 (10dcaro) [14:29:44] 10Toolforge (Toolforge iteration 20), 13Patch-For-Review: [jobs-api] refactor models - https://phabricator.wikimedia.org/T389118#10822220 (10dcaro) [14:29:48] 06cloud-services-team, 10Toolforge (Toolforge iteration 20), 13Patch-For-Review: [jobs-api] Split the `*Job` API models into three - https://phabricator.wikimedia.org/T390136#10822218 (10dcaro) [14:30:20] 10Toolforge (Toolforge iteration 20), 07Upstream: [builds-builder] golang based images get infinite nested loops for procfile entries - 
https://phabricator.wikimedia.org/T363417#10822227 (10dcaro) [14:30:25] 10Toolforge (Toolforge iteration 20), 07Upstream: [builds-builder,jobs-api,upstream] Calling nontrivial Procfile commands with arguments results in confusing error (“no such file or directory”) - https://phabricator.wikimedia.org/T356016#10822229 (10dcaro) [14:30:29] 10Toolforge (Toolforge iteration 20): [toolforge] simplify calling the different toolforge apis from within the containers - https://phabricator.wikimedia.org/T356377#10822225 (10dcaro) [14:30:37] 10cloud-services-team (FY2024/2025-Q3-Q4), 10Toolforge (Toolforge iteration 20): Intermittent redis connection timeouts in Toolforge - https://phabricator.wikimedia.org/T318479#10822233 (10dcaro) [14:30:41] 06cloud-services-team, 10Toolforge (Toolforge iteration 20), 05Goal: [harbor] Move harbor data to object storage service - https://phabricator.wikimedia.org/T350687#10822235 (10dcaro) [14:30:45] 10cloud-services-team (FY2024/2025-Q3-Q4), 10Toolforge (Toolforge iteration 20), 05Goal, 13Patch-For-Review: [infra] Decommission the Grid Engine infrastructure - https://phabricator.wikimedia.org/T314664#10822231 (10dcaro) [14:30:49] 10Toolforge (Toolforge iteration 20), 13Patch-For-Review: [jobs-api] Create storage layer, and save business models in persistent storage - https://phabricator.wikimedia.org/T359650#10822237 (10dcaro) [14:30:53] 06cloud-services-team, 10Toolforge (Toolforge iteration 20): [k8s,infra] Upgrade Toolforge to Uwubernetes (1.30) - https://phabricator.wikimedia.org/T362869#10822241 (10dcaro) [14:30:57] 06cloud-services-team, 10Toolforge (Toolforge iteration 20): Upgrade python buildpack to v0.17.0 or newer for Poetry support - https://phabricator.wikimedia.org/T374056#10822243 (10dcaro) [14:31:01] 10cloud-services-team (FY2024/2025-Q3-Q4), 10Toolforge (Toolforge iteration 20), 07Epic: [KR] WE6.3 Introduce a sustainability scoring system for the Toolforge platform - https://phabricator.wikimedia.org/T368600#10822245 
(10dcaro) [14:31:05] 10cloud-services-team (FY2024/2025-Q3-Q4), 10Toolforge (Toolforge iteration 20): [toolsdb] ToolsToolsDBReplicationLagIsTooHigh - 2025-05-09 - https://phabricator.wikimedia.org/T393766#10822247 (10dcaro) [14:31:09] 06cloud-services-team, 10Toolforge (Toolforge iteration 20): [builds-builder] Golang buildpack does not allow using Procfiles so can't use custom scripts/entrypoints - https://phabricator.wikimedia.org/T390845#10822249 (10dcaro) [14:31:13] 06cloud-services-team, 10Toolforge (Toolforge iteration 20), 13Patch-For-Review: [jobs-api] Periodically refresh image-config data - https://phabricator.wikimedia.org/T357112#10822251 (10dcaro) [14:31:17] 10Toolforge (Toolforge iteration 20), 13Patch-For-Review: [envvars-api] bug in envvars-api EnvvarName validation Regex - https://phabricator.wikimedia.org/T391966#10822253 (10dcaro) [14:31:21] 06cloud-services-team, 10Toolforge (Toolforge iteration 20): [jobs-api] Introduce deprecation metrics - https://phabricator.wikimedia.org/T390137#10822257 (10dcaro) [14:31:25] 06cloud-services-team, 10Toolforge (Toolforge iteration 20), 07Epic: [jobs-api] expose jobs-api continuous jobs to the internet via `toolname.toolforge.org`, just like webservice - https://phabricator.wikimedia.org/T388092#10822255 (10dcaro) [14:31:29] 10cloud-services-team (FY2024/2025-Q3-Q4), 10Toolforge (Toolforge iteration 20), 07Epic: [components-api] First iteration of the component API - https://phabricator.wikimedia.org/T362051#10822261 (10dcaro) [14:31:33] 06cloud-services-team, 10Toolforge (Toolforge iteration 20), 13Patch-For-Review: Toolforge: Replace all bastion with grid-less bookworm based bastion hosts - https://phabricator.wikimedia.org/T314665#10822259 (10dcaro) [14:31:37] 10Toolforge (Toolforge iteration 20), 13Patch-For-Review: [jobs-api,jobs-cli] restarting a continuous jobs causes for some seconds two jobs are running side by side - https://phabricator.wikimedia.org/T375366#10822263 (10dcaro) [14:31:41] 
06cloud-services-team, 10Toolforge (Toolforge iteration 20), 07Epic: [jobs-api,webservice] Run webservices via the jobs framework - https://phabricator.wikimedia.org/T348755#10822265 (10dcaro) [14:31:50] 10Toolforge (Toolforge iteration 20): [builds-cli] add resolved reference when showing a build - https://phabricator.wikimedia.org/T394300#10822268 (10dcaro) [14:31:53] 10Toolforge (Toolforge iteration 20): [components-cli] Add a warning message saying it's 'beta' - https://phabricator.wikimedia.org/T394277#10822271 (10dcaro) [14:31:57] 10Toolforge (Toolforge iteration 20), 07Documentation: [components-api] Add admin documentation page - https://phabricator.wikimedia.org/T394280#10822269 (10dcaro) [14:32:02] 10Toolforge (Toolforge iteration 20): [components-api] Add basic prometheus metrics - https://phabricator.wikimedia.org/T394276#10822272 (10dcaro) [14:32:05] 10Toolforge (Toolforge iteration 20): [components-api] Add alerts and runbooks for basic service health - https://phabricator.wikimedia.org/T394275#10822274 (10dcaro) [14:32:09] 10Toolforge (Toolforge iteration 20): [components-api] add tool config version check - https://phabricator.wikimedia.org/T394273#10822275 (10dcaro) [14:32:14] 10Toolforge (Toolforge iteration 20): [builds-api] define a policy to update runtimes - https://phabricator.wikimedia.org/T393937#10822276 (10dcaro) [14:32:17] 10Toolforge (Toolforge iteration 20), 07Epic: [cicd] Streamline toolforge cli deployment and external contributor ci flows - https://phabricator.wikimedia.org/T392524#10822277 (10dcaro) [14:32:22] 06cloud-services-team, 10Toolforge (Toolforge iteration 20): [jobs-api] Indicate when a job is too big to be scheduled - https://phabricator.wikimedia.org/T383515#10822278 (10dcaro) [14:32:26] 10Toolforge (Toolforge iteration 20): [components-api,buildsa-api] When building and deploying, if none of the settings changed, the jobs are not restarted - https://phabricator.wikimedia.org/T389044#10822279 (10dcaro) [14:32:30] 
10Toolforge (Toolforge iteration 20): [jobs-api] prepend date and pod name to filelog lines - https://phabricator.wikimedia.org/T372025#10822281 (10dcaro) [14:32:31] FIRING: [2x] ToolsToolsDBReplicationMissing: ToolsDB replication is not running on tools-db-4 (errno 0) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBReplicationMissing [14:32:34] 10Toolforge (Toolforge iteration 20): [toolforge,jobs] "toolforge jobs logs" fails when job has not started yet - https://phabricator.wikimedia.org/T349775#10822283 (10dcaro) [14:32:38] 06cloud-services-team, 10Toolforge (Toolforge iteration 20), 10Sustainability (Incident Followup): [docs,envvars-api,jobs-api,builds-api] create docs on how to operate the cluster and core components - https://phabricator.wikimedia.org/T380959#10822280 (10dcaro) [14:32:42] 10Toolforge (Toolforge iteration 20): [components-api] Add support for port/helathcheck for continuous jobs in tool config/depolyment - https://phabricator.wikimedia.org/T362072#10822282 (10dcaro) [14:32:46] 10cloud-services-team (FY2024/2025-Q3-Q4), 10Cloud-VPS (Debian Buster Deprecation), 10Toolforge (Toolforge iteration 20), 07Epic, 05Goal: Toolforge: migrate to Debian Bullseye or later - https://phabricator.wikimedia.org/T311897#10822284 (10dcaro) [14:34:42] !log dcaro@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component builds-api [14:35:39] 06cloud-services-team, 10Cloud-VPS: Understand Octavia network needs - https://phabricator.wikimedia.org/T394099#10822306 (10aborrero) yes, this sounds fine, and we can most likely implement what they require. [14:37:08] 06cloud-services-team, 10Toolforge (Toolforge iteration 20): [builds-builder] Golang buildpack does not allow using Procfiles so can't use custom scripts/entrypoints - https://phabricator.wikimedia.org/T390845#10822308 (10Nokib_Sarkar) So, The Issue is fixed apparently. 
[14:46:16] !log dcaro@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api [14:46:33] (03open) 10aborrero: codfw1dev: network: add octavia-lb-mgmt-net [repos/cloud/cloud-vps/tofu-infra] - 10https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/231 (https://phabricator.wikimedia.org/T394099) [14:47:31] FIRING: [2x] ToolsToolsDBReplicationMissing: ToolsDB replication is not running on tools-db-4 (errno 0) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBReplicationMissing [14:48:50] (03approved) 10dcaro: builds-api: bump to 0.0.191-20250514120852-fff150a3 [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/782 (https://phabricator.wikimedia.org/T389043) (owner: 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620) [14:48:54] (03merge) 10dcaro: builds-api: bump to 0.0.191-20250514120852-fff150a3 [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/782 (https://phabricator.wikimedia.org/T389043) (owner: 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620) [14:49:31] FIRING: ToolsToolsDBReplicationError: ToolsDB replication is broken on tools-db-5 (errno 1236) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBReplicationError [14:52:55] 10Cloud Services Proposals, 06cloud-services-team, 10Striker: Decision request - Tool account management and Striker - https://phabricator.wikimedia.org/T394035#10822386 (10taavi) >>! In T394035#10820913, @fnegri wrote: >>> Could it be a generic LDAP adapter, with some minimal logic to restrict the damage yo... 
[14:53:53] (03update) 10aborrero: codfw1dev: network: add octavia-lb-mgmt-net [repos/cloud/cloud-vps/tofu-infra] - 10https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/231 (https://phabricator.wikimedia.org/T394099) [14:56:46] (03update) 10aborrero: codfw1dev: network: add octavia-lb-mgmt-net [repos/cloud/cloud-vps/tofu-infra] - 10https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/231 (https://phabricator.wikimedia.org/T394099) [14:58:12] 10Cloud Services Proposals, 06cloud-services-team, 10Striker: Decision request - Tool account management and Striker - https://phabricator.wikimedia.org/T394035#10822420 (10taavi) >>! In T394035#10817585, @fnegri wrote: >> it's better to make a Toolforge-specific service than to make something that's in theo... [14:59:31] RESOLVED: ToolsToolsDBReplicationError: ToolsDB replication is broken on tools-db-5 (errno 1236) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBReplicationError [15:00:31] FIRING: ToolsToolsDBReplicationLagIsTooHigh: ToolsDB replication on tools-db-5 is lagging behind the primary, the current lag is 495401 - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBReplicationLagIsTooHigh [15:02:31] RESOLVED: [2x] ToolsToolsDBReplicationMissing: ToolsDB replication is not running on tools-db-4 (errno 0) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBReplicationMissing [15:02:49] 10cloud-services-team (FY2024/2025-Q3-Q4), 10Toolforge (Toolforge iteration 20): [toolsdb] ToolsToolsDBReplicationLagIsTooHigh - 2025-05-09 - https://phabricator.wikimedia.org/T393766#10822476 (10fnegri) I think `STOP SLAVE` hanged because it was trying to roll back the 
partially-applied transaction. I waited... [15:05:26] 10cloud-services-team (FY2024/2025-Q3-Q4), 10Toolforge (Toolforge iteration 20): ToolsDB: discard obsolete GTID domains - https://phabricator.wikimedia.org/T334947#10822502 (10fnegri) 05Stalled→03In progress [15:06:41] (03approved) 10dcaro: jobs: continuous: set strategy based on number of replicas [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/124 (https://phabricator.wikimedia.org/T375366) (owner: 10aborrero) [15:06:47] (03merge) 10dcaro: jobs: continuous: set strategy based on number of replicas [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/124 (https://phabricator.wikimedia.org/T375366) (owner: 10aborrero) [15:07:03] 10cloud-services-team (FY2024/2025-Q3-Q4), 10Toolforge (Toolforge iteration 20): ToolsDB: discard obsolete GTID domains - https://phabricator.wikimedia.org/T334947#10822517 (10fnegri) `FLUSH BINARY LOGS DELETE_DOMAIN_ID` worked and we're left with only one domain_id: ` MariaDB [(none)]> SHOW GLOBAL VARIABLES... [15:08:50] 06cloud-services-team, 10Toolforge (Toolforge iteration 20): [builds-builder] Golang buildpack does not allow using Procfiles so can't use custom scripts/entrypoints - https://phabricator.wikimedia.org/T390845#10822543 (10dcaro) >>! In T390845#10822308, @Nokib_Sarkar wrote: > So, The Issue is fixed apparently.... 
[15:09:05] 06cloud-services-team, 10Toolforge (Toolforge iteration 20): [builds-builder] Golang buildpack does not allow using Procfiles so can't use custom scripts/entrypoints - https://phabricator.wikimedia.org/T390845#10822549 (10dcaro) 05In progress→03Resolved [15:10:17] (03open) 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: jobs-api: bump to 0.0.374-20250514150659-bba54ffd [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/783 (https://phabricator.wikimedia.org/T375366) [15:17:05] 10cloud-services-team (FY2024/2025-Q3-Q4), 10Toolforge (Toolforge iteration 20): ToolsDB: discard obsolete GTID domains - https://phabricator.wikimedia.org/T334947#10822623 (10fnegri) In the replica I could note remove all the obsolete domain IDs yet, because the replica is currently lagging ({T393766}) and it... [15:26:31] 10Cloud Services Proposals, 06cloud-services-team, 10Striker: Decision request - Tool account management and Striker - https://phabricator.wikimedia.org/T394035#10822666 (10dcaro) >> Agreed! But we should consider carefully the requirements for this service and its interface. I think I would like this servic... [15:40:15] !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component jobs-api [15:44:43] !log dcaro@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api [16:08:05] 06cloud-services-team, 10Toolforge: Toolforge bastion sssd/LDAP flakiness (May 2025) - https://phabricator.wikimedia.org/T393732#10822994 (10Benjavalero) >>! El T393732#10820291, @Fnielsen escribió: > I have got a problem again: > ` > $ ssh toolforge > Connection closed by 185.15.56.62 port 22 > ` I have had... 
[16:20:37] (03open) 10dcaro: models: update toolforge models [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/73 [16:30:45] (03approved) 10dcaro: models: update toolforge models [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/73 [16:30:49] (03merge) 10dcaro: models: update toolforge models [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/73 [16:32:17] (03update) 10dcaro: [toolforge-deploy] run specific tests on deploy [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/755 (https://phabricator.wikimedia.org/T381011) (owner: 10raymond-ndibe) [16:33:29] (03open) 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: components-api: bump to 0.0.107-20250514163100-8a9caa4d [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/784 [16:40:47] 10cloud-services-team (Hardware), 06DC-Ops, 10ops-eqiad: Q#:rack/setup/install X - https://phabricator.wikimedia.org/T394333 (10RobH) 03NEW [16:41:08] 10cloud-services-team (Hardware), 06DC-Ops, 10ops-eqiad: Q#:rack/setup/install X - https://phabricator.wikimedia.org/T394333#10823162 (10RobH) [16:41:30] 10cloud-services-team (Hardware), 06DC-Ops, 10ops-eqiad: Q4:rack/setup/install cloudcephosd10[48-51] & relocate cloudcephosd1039 - https://phabricator.wikimedia.org/T394333#10823165 (10RobH) [16:41:51] 10cloud-services-team (Hardware), 06DC-Ops, 10ops-eqiad: Q4:rack/setup/install cloudcephosd10[48-51] & relocate cloudcephosd1039 - https://phabricator.wikimedia.org/T394333#10823166 (10RobH) [16:43:03] 10cloud-services-team (Hardware), 06DC-Ops, 10ops-eqiad: Q4:rack/setup/install cloudcephosd10[48-51] & relocate cloudcephosd1039 - 
https://phabricator.wikimedia.org/T394333#10823169 (10RobH) a:03Andrew @andrew, Please double check the racking details, as I took the old details from the planned order of 12... [16:44:53] (03update) 10dcaro: components-api: bump to 0.0.107-20250514163100-8a9caa4d [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/784 (owner: 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620) [16:46:51] !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component components-api [16:48:06] 10Cloud Services Proposals, 06cloud-services-team, 10Striker: Decision request - Tool account management and Striker - https://phabricator.wikimedia.org/T394035#10823194 (10fnegri) Option Purple is my favourite so far, but I'm still a bit confused about how the new service would look like. Apologies if I'm f... [16:58:44] (03update) 10dcaro: [jobs-api] refactor quota models [repos/cloud/toolforge/jobs-api] (use_pydantic_for_core_job_model) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/164 (https://phabricator.wikimedia.org/T389118) (owner: 10raymond-ndibe) [16:59:05] (03update) 10dcaro: [jobs-api] use pydantic for all models [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/139 (https://phabricator.wikimedia.org/T389118) (owner: 10raymond-ndibe) [16:59:48] !log dcaro@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api [17:02:39] !log dcaro@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component components-api [17:02:46] !log dcaro@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component components-api [17:03:18] (03approved) 10dcaro: components-api: bump to 0.0.107-20250514163100-8a9caa4d [repos/cloud/toolforge/toolforge-deploy] - 
10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/784 (owner: 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620) [17:03:20] (03merge) 10dcaro: components-api: bump to 0.0.107-20250514163100-8a9caa4d [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/784 (owner: 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620) [17:03:33] (03update) 10dcaro: jobs-api: bump to 0.0.374-20250514150659-bba54ffd [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/783 (https://phabricator.wikimedia.org/T375366) (owner: 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620) [17:03:48] !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component jobs-api [17:08:17] 06cloud-services-team, 10Toolforge (Toolforge iteration 20): Toolforge bastion sssd/LDAP flakiness (May 2025) - https://phabricator.wikimedia.org/T393732#10823284 (10dcaro) [17:08:21] 10Cloud Services Proposals, 10cloud-services-team (FY2024/2025-Q3-Q4), 10Toolforge (Toolforge iteration 20), 05Cloud-Services-Origin-Team, and 3 others: [Hypothesis] WE6.3.10 start a beta for the push-to-deploy features - https://phabricator.wikimedia.org/T393564#10823286 (10dcaro) [17:08:25] 06cloud-services-team, 10Toolforge (Toolforge iteration 20), 07IPv6: Rebuild Toolforge Prometheus nodes in v6-dualstack network - https://phabricator.wikimedia.org/T393697#10823288 (10dcaro) [17:10:31] 06cloud-services-team, 10Toolforge: toolforge-legacy-redirector: constant failed probes by prometheus - https://phabricator.wikimedia.org/T385908#10823313 (10dcaro) 05Open→03Resolved it's been a while since I've seen any of these, closing [17:14:23] !log dcaro@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api [17:14:28] !log dcaro@cloudcumin1001 tools START - Cookbook 
wmcs.toolforge.component.deploy for component jobs-api [17:19:08] 10Toolforge (Toolforge iteration 20): [components-api] deploy on tools - https://phabricator.wikimedia.org/T394337 (10dcaro) 03NEW [17:19:14] 10Toolforge (Toolforge iteration 20): [components-api] deploy on tools - https://phabricator.wikimedia.org/T394337#10823388 (10dcaro) p:05Triage→03High [17:19:43] 10Toolforge (Toolforge iteration 20): [components-api] deploy on tools - https://phabricator.wikimedia.org/T394337#10823389 (10dcaro) [17:19:45] 10Cloud Services Proposals, 10cloud-services-team (FY2024/2025-Q3-Q4), 10Toolforge (Toolforge iteration 20), 05Cloud-Services-Origin-Team, and 3 others: [Hypothesis] WE6.3.10 start a beta for the push-to-deploy features - https://phabricator.wikimedia.org/T393564#10823390 (10dcaro) [17:20:16] 10Toolforge (Toolforge iteration 20): [components-cli] Add a warning message saying it's 'beta' - https://phabricator.wikimedia.org/T394277#10823392 (10dcaro) [17:20:25] 10Cloud Services Proposals, 10cloud-services-team (FY2024/2025-Q3-Q4), 10Toolforge (Toolforge iteration 20), 05Cloud-Services-Origin-Team, and 3 others: [Hypothesis] WE6.3.10 start a beta for the push-to-deploy features - https://phabricator.wikimedia.org/T393564#10823393 (10dcaro) [17:21:13] (03open) 10dcaro: components-api: deploy also on tools [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/785 [17:26:40] !log dcaro@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api [17:36:08] (03approved) 10dcaro: jobs-api: bump to 0.0.374-20250514150659-bba54ffd [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/783 (https://phabricator.wikimedia.org/T375366) (owner: 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620) [17:36:12] (03merge) 10dcaro: jobs-api: bump to 0.0.374-20250514150659-bba54ffd 
[repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/783 (https://phabricator.wikimedia.org/T375366) (owner: 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620) [17:36:31] 10Toolforge (Toolforge iteration 20), 13Patch-For-Review: [jobs-api,jobs-cli] restarting a continuous jobs causes for some seconds two jobs are running side by side - https://phabricator.wikimedia.org/T375366#10823434 (10dcaro) 05In progress→03Resolved [17:36:58] (03update) 10dcaro: components-api: deploy also on tools [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/785 [17:45:00] 10cloud-services-team (Hardware), 06DC-Ops, 10ops-eqiad, 06SRE: Q2:rack/setup/install cloudvirt10[68-76] - https://phabricator.wikimedia.org/T382492#10823468 (10VRiley-WMF) [17:48:15] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/231 [17:48:33] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/231 [17:48:36] 10cloud-services-team (Hardware), 06DC-Ops, 10ops-eqiad, 06SRE: Q2:rack/setup/install cloudvirt10[68-76] - https://phabricator.wikimedia.org/T382492#10823485 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by vriley@cumin1002 for host cloudvirt1068.eqiad.wmnet with OS bookworm [17:49:08] (03merge) 10andrew: codfw1dev: network: add octavia-lb-mgmt-net [repos/cloud/cloud-vps/tofu-infra] - 10https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/231 (https://phabricator.wikimedia.org/T394099) (owner: 10aborrero) [17:49:12] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for main branch [17:49:26] !log 
andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan for main branch [17:49:38] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan+apply for main branch [17:50:37] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan+apply for main branch [18:09:43] cloud-services-team, Toolforge, IPv6, Kubernetes: Support IPv6 in Toolforge Kubernetes - https://phabricator.wikimedia.org/T380060#10823558 (bd808) An idea that @aborrero and I talked about a bit at the Istanbul Hackathon would be to allocate IPv6 in Toolforge's Kubernetes cluster by namespace/to... [18:17:34] cloud-services-team, Cloud-VPS: ldaptui fails to create new users - https://phabricator.wikimedia.org/T394341 (Andrew) NEW [18:22:32] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q2:rack/setup/install cloudvirt10[68-76] - https://phabricator.wikimedia.org/T382492#10823646 (VRiley-WMF) [18:56:50] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q2:rack/setup/install cloudvirt10[68-76] - https://phabricator.wikimedia.org/T382492#10823714 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by vriley@cumin1002 for host cloudvirt1068.eqiad.wmnet with OS bookworm [19:16:46] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q2:rack/setup/install cloudvirt10[68-76] - https://phabricator.wikimedia.org/T382492#10823795 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by vriley@cumin1002 for host cloudvirt1068.eqiad.wmnet with OS bookworm executed... 
[19:19:09] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q2:rack/setup/install cloudvirt10[68-76] - https://phabricator.wikimedia.org/T382492#10823830 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by vriley@cumin1002 for host cloudvirt1070.eqiad.wmnet with OS bookworm
[19:21:38] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q2:rack/setup/install cloudvirt10[68-76] - https://phabricator.wikimedia.org/T382492#10823844 (VRiley-WMF)
[19:27:48] FIRING: PuppetFailure: Puppet has failed on cloudcontrol2006-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[19:27:54] cloud-services-team: PuppetFailure Puppet has failed on cloudcontrol2006-dev:9100 - https://phabricator.wikimedia.org/T394349 (phaultfinder) NEW
[19:32:27] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q2:rack/setup/install cloudvirt10[68-76] - https://phabricator.wikimedia.org/T382492#10823913 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by vriley@cumin1002 for host cloudvirt1068.eqiad.wmnet with OS bookworm
[19:36:50] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q2:rack/setup/install cloudvirt10[68-76] - https://phabricator.wikimedia.org/T382492#10823931 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by vriley@cumin1002 for host cloudvirt1070.eqiad.wmnet with OS bookworm executed...
[19:48:32] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q2:rack/setup/install cloudvirt10[68-76] - https://phabricator.wikimedia.org/T382492#10823976 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by vriley@cumin1002 for host cloudvirt1070.eqiad.wmnet with OS bookworm
[19:50:49] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q2:rack/setup/install cloudvirt10[68-76] - https://phabricator.wikimedia.org/T382492#10823982 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by vriley@cumin1002 for host cloudvirt1069.eqiad.wmnet with OS bookworm
[20:19:25] (PS1) Andrew Bogott: Dummy certs and keys for Openstack Octavia [labs/private] - https://gerrit.wikimedia.org/r/1146060 (https://phabricator.wikimedia.org/T393783)
[20:37:02] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q2:rack/setup/install cloudvirt10[68-76] - https://phabricator.wikimedia.org/T382492#10824159 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by vriley@cumin1002 for host cloudvirt1070.eqiad.wmnet with OS bookworm executed...
[20:41:45] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q2:rack/setup/install cloudvirt10[68-76] - https://phabricator.wikimedia.org/T382492#10824191 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by vriley@cumin1002 for host cloudvirt1069.eqiad.wmnet with OS bookworm executed...
[20:53:02] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q2:rack/setup/install cloudvirt10[68-76] - https://phabricator.wikimedia.org/T382492#10824252 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by vriley@cumin1002 for host cloudvirt1070.eqiad.wmnet with OS bookworm
[20:59:55] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q2:rack/setup/install cloudvirt10[68-76] - https://phabricator.wikimedia.org/T382492#10824286 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by vriley@cumin1002 for host cloudvirt1069.eqiad.wmnet with OS bookworm
[21:03:10] (PS2) Andrew Bogott: Dummy certs and keys for Openstack Octavia [labs/private] - https://gerrit.wikimedia.org/r/1146060 (https://phabricator.wikimedia.org/T393783)
[21:12:15] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q2:rack/setup/install cloudvirt10[68-76] - https://phabricator.wikimedia.org/T382492#10824310 (VRiley-WMF)
[21:19:47] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q2:rack/setup/install cloudvirt10[68-76] - https://phabricator.wikimedia.org/T382492#10824325 (VRiley-WMF)
[21:24:38] (CR) Andrew Bogott: [V:+2 C:+2] Dummy certs and keys for Openstack Octavia [labs/private] - https://gerrit.wikimedia.org/r/1146060 (https://phabricator.wikimedia.org/T393783) (owner: Andrew Bogott)
[21:28:10] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q2:rack/setup/install cloudvirt10[68-76] - https://phabricator.wikimedia.org/T382492#10824342 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by vriley@cumin1002 for host cloudvirt1070.eqiad.wmnet with OS bookworm completed...
[21:37:42] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q2:rack/setup/install cloudvirt10[68-76] - https://phabricator.wikimedia.org/T382492#10824381 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by vriley@cumin1002 for host cloudvirt1069.eqiad.wmnet with OS bookworm completed...
[21:41:41] (PS1) Andrew Bogott: Octavia: Added a fake ca passphrase [labs/private] - https://gerrit.wikimedia.org/r/1146086
[21:42:18] (CR) Andrew Bogott: [V:+2 C:+2] Octavia: Added a fake ca passphrase [labs/private] - https://gerrit.wikimedia.org/r/1146086 (owner: Andrew Bogott)
[21:44:55] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q2:rack/setup/install cloudvirt10[68-76] - https://phabricator.wikimedia.org/T382492#10824395 (VRiley-WMF)
[22:21:48] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q2:rack/setup/install cloudvirt10[68-76] - https://phabricator.wikimedia.org/T382492#10824502 (VRiley-WMF)
[22:37:48] RESOLVED: PuppetFailure: Puppet has failed on cloudcontrol2006-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure