[06:58:14] Tool-campwiz-nxt, translatewiki.net, LPL Essential (LPL Essential 2025 Apr-Jun: CX), Unplanned-Sprint-Work: Add CampWiz NXT to translatewiki.net - https://phabricator.wikimedia.org/T393850#10937147 (abi_) In progress→Resolved I'm marking this as done. @Nokib_Sarkar please let us know...
[07:56:08] Toolforge (Toolforge iteration 21): [infra] 2025-06-21 tools-prometheus-8 stopped responding for a bit - https://phabricator.wikimedia.org/T397563#10937320 (dcaro) This happened again on 2025-06-22 23:54:00: ` Jun 21 23:54:54 tools-prometheus-8 systemd-timesyncd[469]: Network configuration changed, trying to...
[07:56:57] Toolforge (Toolforge iteration 21): [infra] 2025-06-21 tools-prometheus-8 stopped responding for a bit - https://phabricator.wikimedia.org/T397563#10937322 (dcaro) There was no OOM this time, only network loss
[08:38:18] cloud-services-team, Toolforge: [jobs-api] logs internal datetime error - https://phabricator.wikimedia.org/T362521#10937437 (taavi) >>! In T362521#10936676, @derenrich wrote: > i'd make a pull request but i don't have rights. i The GitLab model requires you to "fork" the repository first. Anyway, at l...
[09:09:10] cloud-services-team, Toolforge: Lock down tools-sgebastion-10 (login-buster.toolforge.org) to only members of tools with known dependencies on it - https://phabricator.wikimedia.org/T397459#10937535 (taavi) Open→Resolved a:taavi Per the last WMCS team meeting, I've set `profile::ldap::cli...
[09:14:30] cloud-services-team, Bitu, Infrastructure-Foundations, LDAP: Allocate more available UNIX UIDs for human users - https://phabricator.wikimedia.org/T355663#10937550 (taavi) >>! In T355663#10931620, @MoritzMuehlenhoff wrote: > My alternative proposal would be to allocate 100.000 to 500.000 for huma...
[09:31:12] (update) taavi: Use separate project for log storage buckets [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/50 (https://phabricator.wikimedia.org/T396574)
[09:35:36] (update) taavi: Revert "shared: Provision storage buckets for Loki" [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/51 (https://phabricator.wikimedia.org/T396574)
[09:35:36] (update) taavi: Provision log storage buckets in a separate project [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/50 (https://phabricator.wikimedia.org/T396574)
[09:35:37] (open) taavi: Revert "shared: Provision storage buckets for Loki" [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/51 (https://phabricator.wikimedia.org/T396574)
[09:35:43] (update) taavi: Provision log storage buckets in a separate project [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/50 (https://phabricator.wikimedia.org/T396574)
[09:35:48] (update) taavi: Revert "shared: Provision storage buckets for Loki" [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/51 (https://phabricator.wikimedia.org/T396574)
[09:36:46] cloud-services-team, Striker, Phabricator: Rebuild Striker demo server - https://phabricator.wikimedia.org/T329687#10937606 (Arendpieter)
[09:36:59] cloud-services-team, Striker, Phabricator: Striker dev environment needs a new Phabricator base image - https://phabricator.wikimedia.org/T340080#10937608 (Arendpieter)
[09:46:18] cloud-services-team, Data-Services: [maintain-views] --table acts as a
wildcard - https://phabricator.wikimedia.org/T397533#10937625 (taavi) Hmm. `abuse_filter_action` and `abuse_filter_history` both join `abuse_filter`, and I think `--table` is actually matching on source table names instead of the name...
[09:48:18] cloud-services-team, Striker: Striker should use ID instead of username to identify SUL accounts - https://phabricator.wikimedia.org/T359428#10937627 (taavi) What's left is finding a way to set the new ID field for old accounts that only have an username set.
[09:52:38] cloud-services-team, Data-Services: [maintain-views] --table acts as a wildcard - https://phabricator.wikimedia.org/T397533#10937633 (fnegri) Ah that makes sense. Maybe a better fix is to add a "--view" argument to indicate which view you want to recreate?
[10:14:31] Quarry: DeprecationWarning: 'werkzeug.contrib.iterio' is deprecated as of version 0.15 and will be removed in version 1.0. - https://phabricator.wikimedia.org/T397613 (taavi) NEW
[10:16:36] Quarry: quarry: Add a robots.txt - https://phabricator.wikimedia.org/T397502#10937733 (taavi) a:taavi
[10:16:57] Quarry: quarry: Add a robots.txt - https://phabricator.wikimedia.org/T397502#10937734 (github-toolforge-bot) supertassu opened https://github.com/toolforge/quarry/pull/92
[10:17:20] supertassu opened https://github.com/toolforge/quarry/pull/92
[10:45:18] Quarry: quarry: Add a robots.txt - https://phabricator.wikimedia.org/T397502#10937826 (github-toolforge-bot) supertassu closed https://github.com/toolforge/quarry/pull/92
[10:46:34] supertassu closed https://github.com/toolforge/quarry/pull/92
[10:47:33] Quarry: quarry: Add a robots.txt - https://phabricator.wikimedia.org/T397502#10937844 (taavi) Open→Resolved
[11:02:01] cloud-services-team (FY2024/2025-Q3-Q4), Quarry: Fix Quarry's Redis pod exiting causing frequent outages - https://phabricator.wikimedia.org/T396785#10937883 (taavi) Open→Resolved Calling this done.
[11:55:15] (update) dcaro: functional_tests.jobs: add tests for health-check [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/810 (https://phabricator.wikimedia.org/T396210)
[11:56:22] (approved) dcaro: functional_tests.jobs: add tests for health-check [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/810 (https://phabricator.wikimedia.org/T396210)
[11:56:27] (update) dcaro: functional_tests.jobs: add tests for health-check [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/810 (https://phabricator.wikimedia.org/T396210)
[11:56:27] (merge) dcaro: functional_tests.jobs: add tests for health-check [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/810 (https://phabricator.wikimedia.org/T396210)
[11:56:57] (update) dcaro: components-api: bump to 0.0.120-20250619182909-09ea62ae [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/820 (https://phabricator.wikimedia.org/T394990) (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[11:57:15] (merge) dcaro: components-api: bump to 0.0.120-20250619182909-09ea62ae [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/820 (https://phabricator.wikimedia.org/T394990) (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[12:51:36] (update) dcaro: generate: add new subcommand [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/38
[13:05:45] (update) dcaro: runtime: create runtime module to handle actions [repos/cloud/toolforge/components-api] -
https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/88
[13:08:59] cloud-services-team, Bitu, Infrastructure-Foundations, LDAP: Allocate more available UNIX UIDs for human users - https://phabricator.wikimedia.org/T355663#10938232 (jhathaway) >>! In T355663#10931620, @MoritzMuehlenhoff wrote: > That's a good point! I poked around if I could find a way to configu...
[13:13:08] Toolforge (Toolforge iteration 21), good first task: [components-api] add `GET` endpoint `/v1/tool//deployments/latest` - https://phabricator.wikimedia.org/T394990#10938239 (Chuckonwumelu) In progress→Resolved
[13:19:33] !log taavi@cloudcumin1001 toolsbeta-logging START - Cookbook wmcs.vps.add_user_to_project for user 'toolsbeta-tofu' in role 'member'
[13:19:38] !log taavi@cloudcumin1001 toolsbeta-logging END (PASS) - Cookbook wmcs.vps.add_user_to_project (exit_code=0) for user 'toolsbeta-tofu' in role 'member'
[13:20:08] !log taavi@cloudcumin1001 tools-logging START - Cookbook wmcs.vps.add_user_to_project for user 'tools-tofu' in role 'member'
[13:20:14] !log taavi@cloudcumin1001 tools-logging END (PASS) - Cookbook wmcs.vps.add_user_to_project (exit_code=0) for user 'tools-tofu' in role 'member'
[13:26:20] (update) dcaro: deploy_task: store error when build fails [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/92
[13:30:04] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack on deployment codfw1dev for service: project,neutron
[13:30:51] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0) on deployment codfw1dev for service: project,neutron
[13:31:07] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack on deployment eqiad1 for service: project,neutron
[13:34:28] (open) dcaro: builds: handle long_status [repos/cloud/toolforge/components-cli] -
https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/39
[13:34:54] (update) dcaro: builds: handle long_status [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/39
[13:35:17] VPS-project-Phabricator, collaboration-services: User "Recent Activity" feeds don't display anything on the Phabricator test instance - https://phabricator.wikimedia.org/T397626 (A_smart_kitten) NEW
[13:35:43] (update) dcaro: builds: handle long_status [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/39
[13:36:50] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0) on deployment eqiad1 for service: project,neutron
[13:42:35] (approved) fnegri: Revert "shared: Provision storage buckets for Loki" [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/51 (https://phabricator.wikimedia.org/T396574) (owner: taavi)
[13:44:21] (merge) taavi: Revert "shared: Provision storage buckets for Loki" [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/51 (https://phabricator.wikimedia.org/T396574)
[13:44:30] (update) taavi: Provision log storage buckets in a separate project [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/50 (https://phabricator.wikimedia.org/T396574)
[13:47:27] (update) dcaro: builds: handle long_status [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/39
[13:48:15] (PS2) NkwadaNora: Folder restructuring [labs/tools/WdTmCollab] - https://gerrit.wikimedia.org/r/1155646
[13:48:15] (PS1) NkwadaNora: [fix]:
resolved conflict and linter check errors [labs/tools/WdTmCollab] - https://gerrit.wikimedia.org/r/1162912
[13:53:09] cloud-services-team, Toolforge, Wikidata, Wikidata-Query-Service, Discovery-Search (2025.06.13 - 2025.07.04): Problem with SPARQL endpoint response and crawling on Toolforge - https://phabricator.wikimedia.org/T397570#10938436 (pfischer)
[13:54:36] cloud-services-team, Toolforge, Wikidata, Wikidata-Query-Service, Discovery-Search (2025.06.13 - 2025.07.04): Problem with SPARQL endpoint response and crawling on Toolforge - https://phabricator.wikimedia.org/T397570#10938439 (pfischer) We could at least investigate the reason for failing re...
[13:55:24] (update) ahecht: Draft: Cache database queries [toolforge-repos/afdstats] - https://gitlab.wikimedia.org/toolforge-repos/afdstats/-/merge_requests/3
[14:00:02] (CR) Eugene233: "recheck" [labs/tools/WdTmCollab] - https://gerrit.wikimedia.org/r/1155646 (owner: NkwadaNora)
[14:00:27] (approved) fnegri: Provision log storage buckets in a separate project [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/50 (https://phabricator.wikimedia.org/T396574) (owner: taavi)
[14:00:48] (CR) CI reject: [V:-1] Folder restructuring [labs/tools/WdTmCollab] - https://gerrit.wikimedia.org/r/1155646 (owner: NkwadaNora)
[14:00:57] (merge) taavi: Provision log storage buckets in a separate project [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/50 (https://phabricator.wikimedia.org/T396574)
[14:01:11] (open) dcaro: build: fail if ref failed to resolve [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/96
[14:02:51] cloud-services-team, Domains, Traffic, IPv6: Add IPv6 glue records for WMCS Designate-hosted domains -
https://phabricator.wikimedia.org/T397185#10938453 (taavi) We talked about this in the WMCS team meeting last week and the result was that this can go ahead.
[14:03:30] (update) dcaro: builds: handle long_status [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/39
[14:05:05] cloud-services-team, Toolforge, Wikidata, Wikidata-Query-Service, Discovery-Search (2025.06.13 - 2025.07.04): Problem with SPARQL endpoint response and crawling on Toolforge - https://phabricator.wikimedia.org/T397570#10938473 (Fnielsen) I do not see a User-Agent in my log. What I see is, e.g...
[14:06:48] (CR) Andrew Bogott: [C:+2] views: fix access to non-initialized self.instance_tuples [openstack/horizon/wmf-proxy-dashboard] - https://gerrit.wikimedia.org/r/1160677 (https://phabricator.wikimedia.org/T397272) (owner: David Caro)
[14:07:01] (CR) Eugene233: "@nkwadanora@gmail.com it looks like the rebase was not complete. Please see the comments."
[labs/tools/WdTmCollab] - https://gerrit.wikimedia.org/r/1155646 (owner: NkwadaNora)
[14:07:07] (update) dcaro: builds: handle long_status [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/39
[14:07:24] (Merged) jenkins-bot: views: fix access to non-initialized self.instance_tuples [openstack/horizon/wmf-proxy-dashboard] - https://gerrit.wikimedia.org/r/1160677 (https://phabricator.wikimedia.org/T397272) (owner: David Caro)
[14:18:13] (open) taavi: logging: Use separate app creds for separate projects [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/52 (https://phabricator.wikimedia.org/T396574)
[14:18:17] (update) taavi: logging: Use separate app creds for separate projects [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/52 (https://phabricator.wikimedia.org/T396574)
[14:20:51] (update) dcaro: deploy_task: store error when build fails [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/92
[14:22:23] (update) dcaro: deploy_task: store error when build fails [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/92
[14:26:49] (update) dcaro: build: fail if ref failed to resolve [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/96
[14:32:00] (approved) fnegri: logging: Use separate app creds for separate projects [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/52 (https://phabricator.wikimedia.org/T396574) (owner: taavi)
[14:39:31] (merge) taavi: logging: Use separate app creds for
separate projects [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/52 (https://phabricator.wikimedia.org/T396574)
[14:56:04] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-61 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[15:02:38] cloud-services-team, Toolforge: NFS issues should not be able tools-static down - https://phabricator.wikimedia.org/T397634 (taavi) NEW
[15:06:04] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-61 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[15:07:07] cloud-services-team, Toolforge: NFS issues should not be able tools-static down - https://phabricator.wikimedia.org/T397634#10938688 (taavi) Some stats to confirm that the CDNjs and FontCDN mirrors make up most of the traffic here: `lang=shell-session taavi@tools-static-15:~ $ sudo grep '"GET /' /var/log...
[15:18:10] (PS1) NkwadaNora: [fix]: tox tests all passes [labs/tools/WdTmCollab] - https://gerrit.wikimedia.org/r/1162933
[15:19:10] (update) dcaro: bump_version: copy from jobs-api [repos/cloud/toolforge/toolforge-weld] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/67
[15:23:03] (open) chuckonwumelu: d/changelog: bump to 0.0.8 [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/40 (https://phabricator.wikimedia.org/T394994)
[15:23:53] (update) chuckonwumelu: d/changelog: bump to 0.0.8 [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/40 (https://phabricator.wikimedia.org/T394994)
[15:26:33] (approved) fnegri: d/changelog: bump to 0.0.8 [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/40 (https://phabricator.wikimedia.org/T394994) (owner: chuckonwumelu)
[15:28:33] !log chuckonwumelu@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component components-cli
[15:31:09] !log chuckonwumelu@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-cli
[15:32:56] (merge) chuckonwumelu: d/changelog: bump to 0.0.8 [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/40 (https://phabricator.wikimedia.org/T394994)
[15:40:53] Toolforge (Toolforge iteration 21), good first task, Patch-For-Review: [components-cli] make `toolforge components deployment show` show the latest deployment if no id passed - https://phabricator.wikimedia.org/T394994#10938829 (Chuckonwumelu) In progress→Resolved
[15:41:04] cloud-services-team (FY2024/2025-Q3-Q4), Data-Services, AbuseFilter, Data-Engineering, and 6 others: Wiki
replicas contain the hit count for private and protected filters - https://phabricator.wikimedia.org/T397508#10938830 (sbassett)
[15:43:38] cloud-services-team, Cloud-VPS: Nova metadata service failing for all VMs - https://phabricator.wikimedia.org/T395742#10938858 (Andrew) I just had to restart the service again, in both eqiad1 and codfw1dev
[15:45:42] cloud-services-team, Toolforge: NFS issues should not bring tools-static down - https://phabricator.wikimedia.org/T397634#10938865 (fnegri)
[15:46:56] cloud-services-team, Toolforge (Toolforge iteration 21): [tools-static,infra] NFS issues should not bring tools-static down - https://phabricator.wikimedia.org/T397634#10938869 (dcaro)
[15:47:40] Toolforge (Toolforge iteration 21): [infra] 2025-06-21 Several correlated potentially network issues during the night - https://phabricator.wikimedia.org/T397566#10938901 (dcaro)
[15:47:56] (update) ahecht: Draft: Cache database queries [toolforge-repos/afdstats] - https://gitlab.wikimedia.org/toolforge-repos/afdstats/-/merge_requests/3
[15:48:37] cloud-services-team, Domains, Traffic, IPv6: Add IPv6 glue records for WMCS Designate-hosted domains - https://phabricator.wikimedia.org/T397185#10938907 (ssingh) I updated Markmonitor to further add the v6 glue records: ` ;; ADDITIONAL SECTION: ns0.openstack.eqiad1.wikimediacloud.org. 3600 IN A...
[15:52:49] (open) dcaro: prometheus: use vm 9 to test network issues [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/53 (https://phabricator.wikimedia.org/T397566)
[16:02:14] cloud-services-team, Cloud-VPS: openstack: keystone may be failing to add users to the bastion project - https://phabricator.wikimedia.org/T379550#10939015 (Andrew) suspicious log: ` 2025-06-18 11:38:07.604 127265 CRITICAL keystone [None req-c0a32d5d-203a-4817-badc-6856e1f13556 - - - - - -] Unhandled e...
[16:03:14] cloud-services-team, Cloud-VPS: openstack: keystone may be failing to add users to the bastion project - https://phabricator.wikimedia.org/T379550#10939017 (Andrew) ` 2025-06-18 15:27:43.714 3950140 CRITICAL keystone [None req-3f3de4ae-4796-4ba0-9363-f2ff944332f4 novaobserver teyora - - default default]...
[16:03:52] cloud-services-team, Cloud-VPS: openstack: keystone may be failing to add users to the bastion project - https://phabricator.wikimedia.org/T379550#10939018 (Andrew) ` 2025-06-18 19:12:06.914 275182 ERROR ldappool [None req-50ef4a30-984b-4be5-9faa-f21fabe9ec15 - - - - - -] Invalid credentials. Bind is u...
[16:07:48] (update) dcaro: prometheus: use vm 9 to test network issues [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/53 (https://phabricator.wikimedia.org/T397566)
[16:08:42] (approved) taavi: prometheus: use vm 9 to test network issues [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/53 (https://phabricator.wikimedia.org/T397566) (owner: dcaro)
[16:09:24] (merge) dcaro: prometheus: use vm 9 to test network issues [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/53 (https://phabricator.wikimedia.org/T397566)
[16:23:13] cloud-services-team, Toolforge, Documentation, Kubernetes: Figure out and document how to call the Kubernetes API as your tool user from inside a pod - https://phabricator.wikimedia.org/T321919#10939110 (dcaro) Can that be split from the cli? As in, make it optional like a plugin of sorts? There...
[16:36:51] Cloud Services Proposals, cloud-services-team (FY2024/2025-Q3-Q4), Toolforge (Toolforge iteration 21), Cloud-Services-Origin-Team, and 3 others: [Hypothesis] WE6.3.10 start a beta for the push-to-deploy features - https://phabricator.wikimedia.org/T393564#10939220 (dcaro)
[16:38:35] cloud-services-team (FY2024/2025-Q3-Q4), Toolforge (Toolforge iteration 21), Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project, and 2 others: [Hypothesis] WE6.3.10 start a beta for the push-to-deploy features - https://phabricator.wikimedia.org/T393564#10939228 (taavi)
[16:41:33] Cloud-VPS (Quota-requests), affects-Kiwix-and-openZIM: Increase RAM quota of mwoffliner project - https://phabricator.wikimedia.org/T396840#10939259 (dcaro) +1
[16:41:42] Cloud-VPS (Quota-requests), affects-Kiwix-and-openZIM: Increase RAM quota of mwoffliner project - https://phabricator.wikimedia.org/T396840#10939260 (dcaro) a:komla
[16:42:05] Cloud-VPS (Quota-requests): Increase Pixel project disk quota to 160 GB - https://phabricator.wikimedia.org/T397266#10939263 (dcaro) a:komla +1
[16:54:28] (open) taavi: logging: Add values to deploy to toolsbeta [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/821 (https://phabricator.wikimedia.org/T396574)
[16:54:31] (update) taavi: logging: Add values to deploy to toolsbeta [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/821 (https://phabricator.wikimedia.org/T396574)
[17:04:11] cloud-services-team, Cloud-VPS: Keystone not cleaning up ldap groups on project delete - https://phabricator.wikimedia.org/T397648 (Andrew) NEW
[17:09:01] (open) ilanen1: Ilanmerge [toolforge-repos/miss-search] (update-cycle) - https://gitlab.wikimedia.org/toolforge-repos/miss-search/-/merge_requests/8
[17:10:55] (update) ilanen1: Ilanmerge
[toolforge-repos/miss-search] (update-cycle) - https://gitlab.wikimedia.org/toolforge-repos/miss-search/-/merge_requests/8
[17:15:30] (update) taavi: logging: Add values to deploy to toolsbeta [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/821 (https://phabricator.wikimedia.org/T386480 https://phabricator.wikimedia.org/T396574)
[17:15:34] (update) taavi: logging: Add values to deploy to toolsbeta [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/821 (https://phabricator.wikimedia.org/T386480)
[17:15:46] cloud-services-team, Toolforge, Patch-For-Review: Provision object storage volumes for Loki - https://phabricator.wikimedia.org/T396574#10939401 (taavi) Open→Resolved
[18:51:04] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-39 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[19:11:51] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack on deployment eqiad1 for service: project,neutron
[19:18:03] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0) on deployment eqiad1 for service: project,neutron
[19:19:57] FIRING: [2x] ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_admin_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[19:20:46] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack on
deployment codfw1dev for service: project,neutron
[19:21:49] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0) on deployment codfw1dev for service: project,neutron
[19:24:57] RESOLVED: [2x] ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_admin_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[19:32:27] cloud-services-team, Cloud-VPS: Nova metadata service failing for all VMs - https://phabricator.wikimedia.org/T395742#10939810 (Andrew) Looks like this has been identified https://bugs.launchpad.net/neutron/+bug/2112492 and fixed https://review.opendev.org/c/openstack/neutron/+/952399
[19:32:59] cloud-services-team, Cloud-VPS: Nova metadata service failing for all VMs - https://phabricator.wikimedia.org/T395742#10939815 (Andrew)
[19:46:04] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-39 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[19:46:34] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-39 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[19:51:34] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-39 has at least 12 procs in D state, and may be having NFS/IO issues
- https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[19:52:34] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-39 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[19:57:34] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-39 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[19:58:34] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-39 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[20:18:34] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-39 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview -
https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[20:25:30] FIRING: PuppetAgentStaleLastRun: Last Puppet run was over 24 hours ago on instance cvn-app10 in project cvn - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[21:58:13] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack on deployment codfw1dev for service: project,neutron
[21:58:56] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0) on deployment codfw1dev for service: project,neutron
[21:59:36] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack on deployment eqiad1 for service: project,neutron
[22:00:35] Tool-admin, Mobile: Zoom on "admin" tools pages get stuck - https://phabricator.wikimedia.org/T230508#10940202 (Krinkle)
[22:02:01] Tools: sigma.toolforge.org exceeded 'max_user_connections' (due to concurrent requests problem and outdated SQL query) - https://phabricator.wikimedia.org/T240036#10940204 (Krinkle)
[22:02:41] Tools: paste.toolforge.org is continuously spammed - https://phabricator.wikimedia.org/T189255#10940207 (Krinkle)
[22:02:57] Tools: Update paste.toolforge.org to Stikked 0.12.0 - https://phabricator.wikimedia.org/T189256#10940209 (Krinkle)
[22:03:34] Tools: Request @framawiki access to spamadmin feature of paste.toolforge.org tool - https://phabricator.wikimedia.org/T189257#10940210 (Krinkle)
[22:05:36] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0) on deployment eqiad1 for service: project,neutron
[22:09:48] Tools: https://pirsquared.toolforge.org/iw.php crashes with fatal error - https://phabricator.wikimedia.org/T150592#10940232 (Krinkle)
[22:12:28] Tools, Video: Please add Meisam to https://video2commons.toolforge.org/ - https://phabricator.wikimedia.org/T221830#10940245 (Krinkle)
[22:15:19] Tools, Video:
Please add Meisam to Video project on WMCS - https://phabricator.wikimedia.org/T221830#10940285 (Krinkle)
[22:17:10] Tool-admin, Toolhub: Add Pagination or similar to https://admin.toolforge.org/tools - https://phabricator.wikimedia.org/T278084#10940293 (Krinkle) Open→Resolved a:Krinkle https://admin.toolforge.org/tools is now a redirect to https://toolhub.wikimedia.org/ which has pagination.
[22:17:33] !log komla@cloudcumin1001 pixel START - Cookbook wmcs.openstack.quota_increase (T397266)
[22:17:36] T397266: Increase Pixel project disk quota to 160 GB - https://phabricator.wikimedia.org/T397266
[22:17:39] !log komla@cloudcumin1001 pixel END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0) (T397266)
[22:17:51] Tool-admin, Toolhub: https://tools.wmflabs.org/admin/tools layout slightly broken except the first tools in Firefox 56 - https://phabricator.wikimedia.org/T178258#10940300 (Krinkle) https://admin.toolforge.org/tools is now a redirect to https://toolhub.wikimedia.org/ which has a different layout seemingl...
[22:17:59] Tool-admin, Toolhub: https://tools.wmflabs.org/admin/tools layout slightly broken except the first tools in Firefox 56 - https://phabricator.wikimedia.org/T178258#10940302 (Krinkle) Open→Resolved
[22:22:57] Tool-Pageviews, I18n: GRAMMAR doesn't seem to work in https://pageviews.wmcloud.org/siteviews/url_structure/ - https://phabricator.wikimedia.org/T156576#10940310 (Krinkle)
[22:23:39] Tools: "around" map of Wikidata-todo tool not user-friendly - https://phabricator.wikimedia.org/T221350#10940324 (Krinkle)
[22:24:33] Cloud-VPS (Quota-requests): Increase Pixel project disk quota to 160 GB - https://phabricator.wikimedia.org/T397266#10940330 (komla) This has been done: ` 100.0% (1/1) success ratio (>= 100.0% threshold) for command: 'sudo -i wmcs-ope...-cloud novaadmin'. 100.0% (1/1) success ratio (>= 100.0% threshold) of...
[22:24:39] Cloud-VPS (Quota-requests): Increase Pixel project disk quota to 160 GB - https://phabricator.wikimedia.org/T397266#10940331 (komla) Open→Resolved
[23:49:04] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-33 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses