[00:02:27] (update) raymond-ndibe: images: load refresh time from settings [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/249 (owner: dcaro)
[00:02:27] (approved) raymond-ndibe: images: load refresh time from settings [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/249 (owner: dcaro)
[00:04:23] (update) raymond-ndibe: images: cache images retrieved from harbor [repos/cloud/toolforge/jobs-api] (image_use_setting) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/250 (owner: dcaro)
[01:13:40] Cloud-Services: redis-cli is absent on tools bastion hosts - https://phabricator.wikimedia.org/T410102 (Krinkle) NEW The #Cloud-Services project tag is not intended to have any tasks. Please check the list on https://phabricator.wikimedia.org/project/profile/832/ and replace it with a more specific proje...
[01:13:54] cloud-services-team, Toolforge: redis-cli is absent on tools bastion hosts - https://phabricator.wikimedia.org/T410102#11373109 (Krinkle)
[01:14:16] cloud-services-team, Toolforge: redis-cli is absent from tools bastion hosts - https://phabricator.wikimedia.org/T410102#11373111 (Krinkle)
[01:27:40] Wikibugs, MediaWiki-Platform-Team (Radar): Allow filtering of Gerrit events by file pattern - https://phabricator.wikimedia.org/T410103 (Krinkle) NEW
[01:32:15] Wikibugs, MediaWiki-Platform-Team (Radar): Allow filtering of Gerrit events by file pattern - https://phabricator.wikimedia.org/T410103#11373127 (Krinkle) Wikibugs bot uses `gerrit stream-events` ([docs](https://gerrit-review.googlesource.com/Documentation/cmd-stream-events.html)) where the docs suggest...
[01:48:42] cloud-services-team, Cloud-VPS, Data-Persistence, Thumbor, and 2 others: haproxy::site doesn't work as expected on the first puppet run - https://phabricator.wikimedia.org/T321684#11373151 (Pppery)
[01:50:54] cloud-services-team, Data-Services, Security-Team, Patch-Needs-Improvement: (partially) expose oauth_registered_consumer table - https://phabricator.wikimedia.org/T247800#11373159 (Pppery)
[01:51:59] cloud-services-team, Toolforge, Patch-Needs-Improvement: Use GitLab CI to upload packages to the toolsbeta repo - https://phabricator.wikimedia.org/T340180#11373164 (Pppery)
[06:43:55] cloud-services-team, Data-Services, Patch-For-Review: Add support for x1 and x4 sections on wiki replicas on the load balancer layer - https://phabricator.wikimedia.org/T409560#11373354 (Marostegui) @taavi you'd also need to add support for x3 (the split from wikidata). There's a thing here that also...
[07:11:51] (PS1) Muehlenhoff: Fix secret name [labs/private] - https://gerrit.wikimedia.org/r/1205007 (https://phabricator.wikimedia.org/T381565)
[07:12:54] (CR) Muehlenhoff: [V:+2 C:+2] Fix secret name [labs/private] - https://gerrit.wikimedia.org/r/1205007 (https://phabricator.wikimedia.org/T381565) (owner: Muehlenhoff)
[08:28:31] !log tools.cluebotng-trainer Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/19358733947 (https://github.com/cluebotng/component-configs/commits/80f11fd31e068d517962c3bcc76a29581d4d3e0a)
[08:28:34] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-trainer/SAL
[08:28:43] !log tools.cluebotng-editsets Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/19358733943 (https://github.com/cluebotng/component-configs/commits/80f11fd31e068d517962c3bcc76a29581d4d3e0a)
[08:28:44] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-editsets/SAL
[08:34:31] !log tools.cluebotng-trainer Deployment completed: https://github.com/cluebotng/component-configs/actions/runs/19358864102 (https://github.com/cluebotng/component-configs/commits/b9059392c8da641c97c97919e1d017bf0251914f)
[08:34:34] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-trainer/SAL
[08:40:51] (CR) Elukey: [C:+1] Remove a lot of historical stub secrets [labs/private] - https://gerrit.wikimedia.org/r/1204913 (https://phabricator.wikimedia.org/T381565) (owner: Muehlenhoff)
[08:44:08] (CR) Muehlenhoff: [V:+2 C:+2] Remove a lot of historical stub secrets [labs/private] - https://gerrit.wikimedia.org/r/1204913 (https://phabricator.wikimedia.org/T381565) (owner: Muehlenhoff)
[09:53:38] VPS-Projects: gitlab-docker-runner-v2.analytics instance is inaccessible via SSH - https://phabricator.wikimedia.org/T410083#11373608 (taavi) If the initial set of Puppet runs fail then the instance will get stuck in a weird state. I have manually kicked off a run which might or might not get it working again.
[10:07:24] cloud-services-team, Data-Services, Patch-For-Review: Add support for x1 and x4 sections on wiki replicas on the load balancer layer - https://phabricator.wikimedia.org/T409560#11373645 (taavi) x3 is already live on the replicas and has been for a while (T390954). For x4, the main thing missing at t...
[10:34:44] cloud-services-team, Data-Services, Patch-For-Review: Add support for x1 and x4 sections on wiki replicas on the load balancer layer - https://phabricator.wikimedia.org/T409560#11373757 (Marostegui) >>! In T409560#11373645, @taavi wrote: > x3 is already live on the replicas and has been for a while (...
[12:03:16] cloud-services-team, Toolforge, Tools: Geohack tool frequently triggers the Toolforge front proxy's per-tool rate limit due to too much traffic - https://phabricator.wikimedia.org/T409185#11374100 (Magnus) Rust version is done: https://github.com/magnusmanske/geohack Fully compatibly including path/...
[12:47:55] cloud-services-team, Toolforge: [jobs-api,jobs-cli] add shell support - https://phabricator.wikimedia.org/T410138 (DamianZaremba) NEW
[12:50:13] cloud-services-team, Toolforge: [jobs-api,jobs-cli] add shell support - https://phabricator.wikimedia.org/T410138#11374202 (taavi) →Duplicate dup:T311917
[12:50:20] cloud-services-team, Toolforge: [webservice,toolforge-cli] Make `webservice shell` a standalone tool - https://phabricator.wikimedia.org/T311917#11374204 (taavi)
[14:01:42] cloud-services-team, Striker, CAS-SSO, Patch-For-Review: Use IDP for authentication in Striker - https://phabricator.wikimedia.org/T359554#11374327 (taavi) >>! In T359554#11372282, @Arendpieter wrote: > @taavi do I need to do something else for https://gerrit.wikimedia.org/r/c/labs/striker/+/1189...
[14:09:37] cloud-services-team, Toolforge: [webservice,toolforge-cli] Make `webservice shell` a standalone tool - https://phabricator.wikimedia.org/T311917#11374360 (DamianZaremba) >>! In T311917#10316856, @dcaro wrote: > We might want to run a continuous job using jobs-api, and then kubectl exec into it instead of...
[14:45:22] (open) fnegri: Resolve T409981 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1078
[14:45:42] (PS6) Arendpieter: Use IDP for authentication [labs/striker] - https://gerrit.wikimedia.org/r/1189915 (https://phabricator.wikimedia.org/T359554)
[14:46:04] (update) fnegri: Resolve T409981 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1078 (https://phabricator.wikimedia.org/T409981)
[14:46:27] (update) fnegri: Increase harbor quota for milhistbot [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1078 (https://phabricator.wikimedia.org/T409981)
[14:48:05] (update) fnegri: Increase harbor quota for milhistbot [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1078 (https://phabricator.wikimedia.org/T409981)
[14:48:21] (CR) CI reject: [V:-1] Use IDP for authentication [labs/striker] - https://gerrit.wikimedia.org/r/1189915 (https://phabricator.wikimedia.org/T359554) (owner: Arendpieter)
[14:48:43] Cloud-VPS (Quota-requests): Increase volume storage on project analytics - https://phabricator.wikimedia.org/T409970#11374458 (fnegri) a:fnegri
[14:48:51] Toolforge (Quota-requests): Request increased build quota for MilHistBot Toolforge tool - https://phabricator.wikimedia.org/T409981#11374459 (fnegri) a:fnegri
[14:53:04] !log fnegri@cloudcumin1001 analytics START - Cookbook wmcs.openstack.quota_increase (T409970)
[14:53:09] T409970: Increase volume storage on project analytics - https://phabricator.wikimedia.org/T409970
[14:53:12] !log fnegri@cloudcumin1001 analytics END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0) (T409970)
[14:54:15] Cloud-VPS (Quota-requests): Increase volume storage on project analytics - https://phabricator.wikimedia.org/T409970#11374473 (fnegri) Open→Resolved `lang=shell-session fnegri@cloudcontrol1007:~$ sudo wmcs-openstack quota show --usage analytics +-----------------------+---------------------------...
[14:55:04] (PS7) Arendpieter: Use IDP for authentication [labs/striker] - https://gerrit.wikimedia.org/r/1189915 (https://phabricator.wikimedia.org/T359554)
[15:04:46] (update) dcaro: images: cache images retrieved from harbor [repos/cloud/toolforge/jobs-api] (image_use_setting) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/250
[15:06:31] (update) fnegri: flavors: add zuul to g4.cores8.ram32.disk20.4xiops [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/279 (https://phabricator.wikimedia.org/T409365) (owner: volans)
[15:08:30] (update) fnegri: flavors: add zuul to g4.cores8.ram32.disk20.4xiops [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/279 (https://phabricator.wikimedia.org/T409365) (owner: volans)
[15:09:32] !log fnegri@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for main branch (T409365)
[15:09:39] T409365: Grant zuul project access to `fast-iops` volume type and `4xiops` instance flavor - https://phabricator.wikimedia.org/T409365
[15:10:58] !log fnegri@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.tofu (exit_code=99) running tofu plan for main branch (T409365)
[15:12:30] !log fnegri@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/279 (T409365)
[15:12:48] !log fnegri@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/279 (T409365)
[15:13:24] (update) fnegri: flavors: add zuul to g4.cores8.ram32.disk20.4xiops [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/279 (https://phabricator.wikimedia.org/T409365) (owner: volans)
[15:13:35] !log fnegri@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/279 (T409365)
[15:13:53] !log fnegri@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/279 (T409365)
[15:38:05] cloud-services-team, Cloud-VPS: tofu-infra: add cinder volume types - https://phabricator.wikimedia.org/T410148 (fnegri) NEW
[15:49:02] Cloud Services Proposals, cloud-services-team, Cloud-VPS: Decision Request - How openstack projects relate to tofu-infra - https://phabricator.wikimedia.org/T385604#11374669 (fnegri) I'm fine with going with Option 1 if there is consensus, we can re-evaluate option 3 in the future.
[15:49:56] VPS-Projects: gitlab-docker-runner-v2.analytics instance is inaccessible via SSH - https://phabricator.wikimedia.org/T410083#11374671 (xcollazo) >>! In T410083#11373608, @taavi wrote: > If the initial set of Puppet runs fail then the instance will get stuck in a weird state. I have manually kicked off a run...
[15:50:51] VPS-Projects: gitlab-docker-runner-v2.analytics instance is inaccessible via SSH - https://phabricator.wikimedia.org/T410083#11374678 (xcollazo) Open→Resolved a:xcollazo Accessible again, closing.
[16:06:19] cloud-services-team, Cloud-VPS: `logging` project missing normal DNS zone delegation - https://phabricator.wikimedia.org/T409361#11374766 (fnegri) a:fnegri
[16:11:45] Cloud-VPS (Quota-requests): Increase volume storage on project analytics - https://phabricator.wikimedia.org/T409970#11374775 (xcollazo) Thanks!
[16:16:21] cloud-services-team, Cloud-VPS: `logging` project missing normal DNS zone delegation - https://phabricator.wikimedia.org/T409361#11374795 (fnegri) `lang=shell-session fnegri@cloudcontrol1007:~$ sudo wmcs-makedomain --project logging --domain logging.wmcloud.org --orig-project cloudinfra`1 fnegri@cloudcon...
[16:17:24] cloud-services-team, Cloud-VPS: `logging` project missing normal DNS zone delegation - https://phabricator.wikimedia.org/T409361#11374798 (fnegri) Open→Resolved The domains are now there: {F70217056}
[16:26:48] !log fnegri@cloudcumin1001 tools START - Cookbook wmcs.vps.remove_instance for instance tools-db-4 (T409287)
[16:26:53] T409287: [toolsdb] Destroy tools-db-4 and create new host - https://phabricator.wikimedia.org/T409287
[16:27:43] !log fnegri@cloudcumin1001 tools END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-db-4 (T409287)
[16:32:13] cloud-services-team (FY2025/26-Q1), Toolforge, Sustainability (Incident Followup): [toolsdb] Destroy tools-db-4 and create new host - https://phabricator.wikimedia.org/T409287#11374863 (fnegri) I destroyed the `tools-db-4` instance, the volume is still there. I will keep this task open until the volu...
[16:33:08] cloud-services-team (FY2025/26-Q1), Toolforge, Sustainability (Incident Followup): [toolsdb] Destroy tools-db-4 and create new host - https://phabricator.wikimedia.org/T409287#11374864 (fnegri)
[17:17:57] cloud-services-team, Cloud-VPS: `logging` project missing normal DNS zone delegation - https://phabricator.wikimedia.org/T409361#11375013 (colewhite) Thank you!
[19:15:03] Cloud Services Proposals, cloud-services-team, Cloud-VPS: Decision Request - How openstack projects relate to tofu-infra - https://phabricator.wikimedia.org/T385604#11375304 (Andrew) Seems like consensus around option 1 -- let's close this next week if no one objects.
[19:18:41] cloud-services-team, Cloud-VPS: OpenStack services should use system users to talk to Keystone - https://phabricator.wikimedia.org/T273150#11375313 (Andrew) Open→Resolved a:Andrew Refactoring Neutron is scary, and splitting out a new user for Neutron won't really enhance security so I'm d...
[19:19:16] cloud-services-team, Cloud-VPS, Patch-For-Review: Modernize openstack rbac - https://phabricator.wikimedia.org/T330759#11375317 (Andrew)
[19:38:41] FIRING: CloudVPSDesignateLeaks: Detected 4 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[19:42:05] cloud-services-team, Cloud-VPS, CAS-SSO, Infrastructure-Foundations: sso failure in codfw1dev (labtesthorizon.wikimedia.org) - https://phabricator.wikimedia.org/T409328#11375366 (taavi) It seems like CAS issues the redirect when a request has the `x-forwarded-proto` header present.
[19:48:41] RESOLVED: CloudVPSDesignateLeaks: Detected 4 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[20:03:22] FIRING: HAProxyBackendUnavailable: HAProxy service keystone-public-api_backend backend cloudcontrol1011.private.eqiad.wikimedia.cloud is DOWN - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[20:08:22] RESOLVED: HAProxyBackendUnavailable: HAProxy service keystone-public-api_backend backend cloudcontrol1011.private.eqiad.wikimedia.cloud is DOWN - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[20:19:30] cloud-services-team, Cloud-VPS, CAS-SSO, Infrastructure-Foundations, Patch-For-Review: sso failure in codfw1dev (labtesthorizon.wikimedia.org) - https://phabricator.wikimedia.org/T409328#11375488 (taavi) a:MoritzMuehlenhoff→taavi
[20:21:37] cloud-services-team, Cloud-VPS, Patch-For-Review: Modernize openstack rbac - https://phabricator.wikimedia.org/T330759#11375496 (Andrew) Open→Resolved Keystone logs are still fairly full of warnings like "DeprecationWarning: Policy enforcement is depending on the value of token." howeve...
[20:39:04] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1040.eqiad.wmnet'
[20:40:57] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1040.eqiad.wmnet'
[20:41:35] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1041.eqiad.wmnet'
[20:56:11] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1041.eqiad.wmnet'
[21:23:11] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1042.eqiad.wmnet'
[21:30:41] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=99) on host 'cloudvirt1042.eqiad.wmnet'
[21:33:47] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1042.eqiad.wmnet'
[21:35:01] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=99) on host 'cloudvirt1042.eqiad.wmnet'
[21:36:07] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1042.eqiad.wmnet'
[21:37:22] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=99) on host 'cloudvirt1042.eqiad.wmnet'
[22:08:41] FIRING: CloudVPSDesignateLeaks: Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[22:21:16] cloud-services-team, Cloud-VPS, Patch-For-Review: Modernize openstack rbac - https://phabricator.wikimedia.org/T330759#11375803 (bd808) >>! In T330759#11375496, @Andrew wrote: > Keystone logs are still fairly full of warnings like > > "DeprecationWarning: Policy enforcement is depending on the v...
[23:33:09] cloud-services-team, Cloud-VPS, Patch-For-Review: MTU setting in IPv6 VMs causes issues with Docker - https://phabricator.wikimedia.org/T408543#11375953 (Andrew) Today I'm draining a cloudvirt and I see this error in the logs (along with a failed migration): ` ERROR nova.virt.libvirt.driver [None re...
[23:41:06] cloud-services-team, Cloud-VPS, Patch-For-Review: Modernize openstack rbac - https://phabricator.wikimedia.org/T330759#11375959 (Andrew) I think I care about deprecation warnings when they apply to our custom policies, but don't care when keystone is issuing warnings about policies that shipped d...
[23:58:47] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1042.eqiad.wmnet'
[23:59:48] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1042.eqiad.wmnet'