[00:05:55] FIRING: MaxConntrack: Max conntrack at 80.36% on cloudvirt1067:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[00:10:55] RESOLVED: MaxConntrack: Max conntrack at 80.11% on cloudvirt1067:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[00:11:25] FIRING: MaxConntrack: Max conntrack at 80.23% on cloudvirt1067:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[00:16:25] RESOLVED: MaxConntrack: Max conntrack at 80.11% on cloudvirt1067:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[00:21:25] FIRING: MaxConntrack: Max conntrack at 81.07% on cloudvirt1067:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[00:46:26] RESOLVED: MaxConntrack: Max conntrack at 80.49% on cloudvirt1067:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[00:47:55] FIRING: MaxConntrack: Max conntrack at 80.58% on cloudvirt1067:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[00:52:56] RESOLVED: MaxConntrack: Max conntrack at 80.39% on cloudvirt1067:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[00:53:25] FIRING: MaxConntrack: Max conntrack at 81.44% on cloudvirt1067:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[00:58:25] RESOLVED: MaxConntrack: Max conntrack at 82.87% on cloudvirt1067:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[02:57:31] FIRING: ToolsToolsDBReplicationLagIsTooHigh: ToolsDB replication on tools-db-5 is lagging behind the primary, the current lag is 1d 0h 5m 6s - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBReplicationLagIsTooHigh
[03:23:05] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-70 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[04:48:05] FIRING: [2x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-69 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess
[06:40:03] Tool-Pageviews: Pageview API – No data from 2025-06-28 - https://phabricator.wikimedia.org/T398157#10957222 (Aklapper)
[07:01:29] FIRING: PuppetStaleCertificates: Found non-revoked Puppet certificates for 1 deleted instances on gitlab-runners-puppetserver-01 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates
[07:26:47] cloud-services-team, Cloud-VPS: Creation of Hiera Puppet Prefix via OpenTofu fails - https://phabricator.wikimedia.org/T398117#10957288 (taavi) `counterexample 2025-06-28 17:19:09.861 3678954 ERROR uwsgi_file__usr_local_lib_python3_9_dist-packages_puppet-enc [None req-bc681680-3458-47d4-80a2-21b4c5364c69...
[07:28:08] cloud-services-team, Cloud-VPS: Creation of Hiera Puppet Prefix via OpenTofu fails - https://phabricator.wikimedia.org/T398117#10957299 (taavi) Based on the stack trace, it seems like the provider (or the go-cloudvps library, not sure which one is the problem here) sends `"roles": null` when roles haven'...
[07:30:39] cloud-services-team, Cloud-VPS, Documentation: [tofu-cloudvps] Document using `cloudvps_puppet_project` to manage project-wide and instance specific puppet classes and hiera settings - https://phabricator.wikimedia.org/T397994#10957303 (taavi) >>! In T397994#10955587, @bd808 wrote: > The API level ma...
[08:13:36] cloud-services-team, Cloud-VPS: Creation of Hiera Puppet Prefix via OpenTofu fails - https://phabricator.wikimedia.org/T398117#10957371 (taavi) a:taavi
[08:44:13] cloud-services-team, Cloud-VPS, Patch-For-Review: Creation of Hiera Puppet Prefix via OpenTofu fails - https://phabricator.wikimedia.org/T398117#10957499 (taavi) As expected, the above patch changes the API to return a proper error: ` cloudvps_puppet_prefix.testinstances: Creating... ╷ │ Error: Unabl...
[08:51:02] cloud-services-team (FY2024/2025-Q3-Q4), Toolforge: [toolsdb] ToolsToolsDBReplicationLagIsTooHigh - 2025-06-30 - https://phabricator.wikimedia.org/T398170 (fnegri) NEW
[08:52:24] cloud-services-team, Striker: Stop trying to store MW real name in Striker - https://phabricator.wikimedia.org/T384206#10957545 (Arendpieter) Open→Resolved
[08:52:34] FIRING: DiskSpace: Disk space cloudbackup1004:9100:/srv 5.989% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[08:55:16] cloud-services-team, Striker: Concatenated URLs in toolinfo.json - https://phabricator.wikimedia.org/T345776#10957549 (Arendpieter) Open→Resolved
[08:56:48] cloud-services-team (FY2024/2025-Q3-Q4), Toolforge: [toolsdb] ToolsToolsDBReplicationLagIsTooHigh - 2025-06-30 - https://phabricator.wikimedia.org/T398170#10957555 (fnegri) @Hamishcn your tool hamishbot is causing some performance issues on ToolsDB. Please consider adding some indexes to the `admin_data_...
[09:00:08] Tool-Pageviews: Pageview API – No data from 2025-06-28 - https://phabricator.wikimedia.org/T398157#10957574 (Aklapper) →Duplicate dup:T398150
[09:07:15] (open) taavi: puppet: Use omitempty for optional attributes [repos/cloud/cloud-vps/go-cloudvps] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/go-cloudvps/-/merge_requests/5 (https://phabricator.wikimedia.org/T398117)
[09:08:01] (update) taavi: puppet: Use omitempty for optional attributes [repos/cloud/cloud-vps/go-cloudvps] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/go-cloudvps/-/merge_requests/5 (https://phabricator.wikimedia.org/T398117)
[09:10:19] cloud-services-team (FY2024/2025-Q3-Q4), Toolforge: [toolsdb] ToolsToolsDBReplicationLagIsTooHigh - 2025-06-30 - https://phabricator.wikimedia.org/T398170#10957610 (fnegri) Replication is flowing again and should catch up soon.
[09:10:45] (open) taavi: puppet_prefix: Save details directly after creation [repos/cloud/cloud-vps/terraform-cloudvps] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/terraform-cloudvps/-/merge_requests/6
[09:10:46] (update) taavi: puppet_prefix: Save details directly after creation [repos/cloud/cloud-vps/terraform-cloudvps] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/terraform-cloudvps/-/merge_requests/6
[09:10:46] (update) taavi: puppet_prefix: Default roles to an empty list [repos/cloud/cloud-vps/terraform-cloudvps] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/terraform-cloudvps/-/merge_requests/7
[09:10:46] (open) taavi: puppet_prefix: Default roles to an empty list [repos/cloud/cloud-vps/terraform-cloudvps] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/terraform-cloudvps/-/merge_requests/7
[09:10:52] (update) taavi: puppet_prefix: Save details directly after creation [repos/cloud/cloud-vps/terraform-cloudvps] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/terraform-cloudvps/-/merge_requests/6
[09:10:57] (update) taavi: puppet_prefix: Default roles to an empty list [repos/cloud/cloud-vps/terraform-cloudvps] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/terraform-cloudvps/-/merge_requests/7
[09:12:10] (update) taavi: puppet_prefix: Default roles to an empty list [repos/cloud/cloud-vps/terraform-cloudvps] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/terraform-cloudvps/-/merge_requests/7
[09:12:16] (update) taavi: puppet_prefix: Save details directly after creation [repos/cloud/cloud-vps/terraform-cloudvps] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/terraform-cloudvps/-/merge_requests/6
[09:13:47] (update) taavi: puppet: Use omitempty for optional attributes [repos/cloud/cloud-vps/go-cloudvps] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/go-cloudvps/-/merge_requests/5 (https://phabricator.wikimedia.org/T398117)
[09:19:31] FIRING: ToolsToolsDBReplicationMissing: ToolsDB replication is not running on tools-db-5 (errno 0) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBReplicationMissing
[09:19:31] FIRING: ToolsToolsDBReplicationError: ToolsDB replication is broken on tools-db-5 (errno 1032) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBReplicationError
[09:20:00] cloud-services-team, Toolforge: tools-static.wmflabs.org down (504) 2025-06-28 - https://phabricator.wikimedia.org/T398103#10957627 (dcaro) During the weekend the script did restart nginx ~12 times :/
[09:23:01] RESOLVED: ToolsToolsDBReplicationLagIsTooHigh: ToolsDB replication on tools-db-5 is lagging behind the primary, the current lag is 1d 5h 14m 43s - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBReplicationLagIsTooHigh
[09:51:13] Toolforge (Toolforge iteration 21), Documentation: [components-api,components-cli] add user documentation page - https://phabricator.wikimedia.org/T394279#10957751 (dcaro) Created https://wikitech.wikimedia.org/wiki/Help:Toolforge/Deploy_your_tool
[09:51:20] cloud-services-team, Toolforge (Toolforge iteration 21): [components-api] allow stopping a deployment that's running - https://phabricator.wikimedia.org/T388644#10957755 (dcaro) Duplicate→Resolved
[09:51:33] Toolforge (Toolforge iteration 21), Documentation: [components-api,components-cli] add user documentation page - https://phabricator.wikimedia.org/T394279#10957758 (dcaro) a:dcaro
[09:51:38] Toolforge (Toolforge iteration 21), Documentation: [components-api,components-cli] add user documentation page - https://phabricator.wikimedia.org/T394279#10957760 (dcaro) Open→Resolved
[09:56:25] cloud-services-team, Toolforge (Toolforge iteration 21): [tools-static,infra] NFS issues should not bring tools-static down - https://phabricator.wikimedia.org/T397634#10957772 (dcaro) Related {T398103}
[09:56:56] cloud-services-team (FY2024/2025-Q3-Q4), Toolforge: [toolsdb] ToolsToolsDBReplicationLagIsTooHigh - 2025-06-30 - https://phabricator.wikimedia.org/T398170#10957776 (fnegri) p:Triage→High
[09:57:17] cloud-services-team, Toolforge, Patch-For-Review: wmcs-package-build syntax warning - https://phabricator.wikimedia.org/T396004#10957777 (taavi) Open→Resolved
[09:59:14] cloud-services-team, Toolforge (Toolforge iteration 21), Patch-For-Review: `toolforge jobs dump` fails for tools.stewardsbot - https://phabricator.wikimedia.org/T396210#10957786 (dcaro) > This patch addressed that https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/173. I mer...
[09:59:21] cloud-services-team (FY2024/2025-Q3-Q4), Toolforge: [toolsdb] ToolsToolsDBReplicationLagIsTooHigh - 2025-06-30 - https://phabricator.wikimedia.org/T398170#10957787 (fnegri) Replication failed shortly after restarting with: ` Last_SQL_Error: Could not execute Delete_rows_v1 event on table s53993__prc_adm...
[10:06:52] cloud-services-team, Cloud-VPS, Patch-For-Review: Creation of Hiera Puppet Prefix via OpenTofu fails - https://phabricator.wikimedia.org/T398117#10957817 (taavi) p:Triage→Medium
[10:17:44] cloud-services-team, Cloud-VPS: Instance accounts-appserver7 in account-creation-assistance cannot connect to internal NTP - https://phabricator.wikimedia.org/T398099#10957879 (taavi) p:Triage→Medium
[10:20:15] cloud-services-team (FY2024/2025-Q3-Q4), Toolforge: [toolsdb] ToolsToolsDBReplicationLagIsTooHigh - 2025-06-30 - https://phabricator.wikimedia.org/T398170#10957882 (fnegri) Looking at the mariadb logs I realized what went wrong: The replication was stuck on transaction with GTID `2886729896-2886729896-3...
[10:21:33] cloud-services-team (FY2024/2025-Q3-Q4), Toolforge: [toolsdb] ToolsToolsDBReplicationLagIsTooHigh - 2025-06-30 - https://phabricator.wikimedia.org/T398170#10957887 (fnegri) I will recreate the replica from scratch using the standard and well-tested [procedure](https://wikitech.wikimedia.org/wiki/Portal:T...
[10:25:38] cloud-services-team (FY2024/2025-Q3-Q4), Quarry: Improve Quarry's observability - https://phabricator.wikimedia.org/T396770#10957905 (taavi) p:High→Triage a:taavi→None
[10:25:52] cloud-services-team, Toolforge, Patch-For-Review: [toolforge,infra] Centralized logging for Toolforge infrastructure logs - https://phabricator.wikimedia.org/T97861#10957908 (dcaro) Hmm, does this mean that you foresee having two centralized places for logs? It's better than not having any, but I wou...
[10:25:53] Quarry: Improve Quarry's observability - https://phabricator.wikimedia.org/T396770#10957909 (taavi)
[10:25:59] Quarry: Deploy prometheus-redis-exporter - https://phabricator.wikimedia.org/T396771#10957910 (taavi) p:High→Triage a:taavi→None
[10:29:21] Quarry: Fix metrics collection from quarry app pods - https://phabricator.wikimedia.org/T398184 (taavi) NEW
[10:36:14] (open) dcaro: global: don't return tracebacks to users [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/102
[10:43:51] !log fnegri@cloudcumin1001 tools START - Cookbook wmcs.vps.create_instance_with_prefix with prefix 'tools-db' (T398170)
[10:43:57] T398170: [toolsdb] ToolsToolsDBReplicationLagIsTooHigh - 2025-06-30 - https://phabricator.wikimedia.org/T398170
[10:44:11] !log fnegri@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.vps.create_instance_with_prefix (exit_code=99) with prefix 'tools-db' (T398170)
[10:45:10] !log fnegri@cloudcumin1001 tools START - Cookbook wmcs.openstack.quota_increase (T398170)
[10:45:18] !log fnegri@cloudcumin1001 tools END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0) (T398170)
[10:45:37] !log fnegri@cloudcumin1001 tools START - Cookbook wmcs.vps.create_instance_with_prefix with prefix 'tools-db' (T398170)
[10:45:57] !log fnegri@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.vps.create_instance_with_prefix (exit_code=99) with prefix 'tools-db' (T398170)
[10:46:51] !log fnegri@cloudcumin1001 tools START - Cookbook wmcs.openstack.quota_increase (T398170)
[10:46:58] !log fnegri@cloudcumin1001 tools END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0) (T398170)
[10:47:02] !log fnegri@cloudcumin1001 tools START - Cookbook wmcs.vps.create_instance_with_prefix with prefix 'tools-db' (T398170)
[10:51:01] !log fnegri@cloudcumin1001 tools END (PASS) - Cookbook wmcs.vps.create_instance_with_prefix (exit_code=0) with prefix 'tools-db' (T398170)
[10:51:06] T398170: [toolsdb] ToolsToolsDBReplicationLagIsTooHigh - 2025-06-30 - https://phabricator.wikimedia.org/T398170
[10:55:28] FIRING: TargetDown: Job toolsdb-mariadb is unreachable in project tools instance tools-db-6 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTargetDown
[11:01:01] Tool-quickcategories: QuickCategories is extremely slow to load - https://phabricator.wikimedia.org/T398104#10958073 (adiba_anjum) Open→Resolved a:adiba_anjum
[11:06:47] (open) fnegri: toolsdb: new replica host [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/57 (https://phabricator.wikimedia.org/T398170)
[11:16:27] (approved) taavi: toolsdb: new replica host [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/57 (https://phabricator.wikimedia.org/T398170) (owner: fnegri)
[11:23:38] FIRING: [2x] ProbeDown: Service toolsbeta-proxy-8:443 has failed probes (http_beta_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#toolsbeta-proxy-8:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[11:25:39] cloud-services-team, Cloud-VPS, IPv6: Trove managed instances should be dual stack - https://phabricator.wikimedia.org/T398189 (taavi) NEW
[11:26:24] cloud-services-team, Cloud-VPS, IPv6, Patch-For-Review: Support IPv6 for Cloud VPS DNS services - https://phabricator.wikimedia.org/T396448#10958148 (taavi) Open→Resolved This is done except {T397822} which is tracked there separately.
[11:37:28] FIRING: PuppetAgentNoResources: No Puppet resources found on instance tools-db-6 on project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[11:38:38] FIRING: [3x] ProbeDown: Service toolsbeta-proxy-8:443 has failed probes (http_beta_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[11:43:38] FIRING: [6x] ProbeDown: Service api.svc.beta.toolforge.org:443 has failed probes (http_api_svc_beta_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[11:48:24] cloud-services-team, Cloud-VPS, IPv6, Upstream: Trove managed instances should be dual stack - https://phabricator.wikimedia.org/T398189#10958231 (taavi)
[11:48:38] FIRING: [6x] ProbeDown: Service api.svc.beta.toolforge.org:443 has failed probes (http_api_svc_beta_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[11:53:38] RESOLVED: [6x] ProbeDown: Service api.svc.beta.toolforge.org:443 has failed probes (http_api_svc_beta_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[12:52:49] FIRING: DiskSpace: Disk space cloudbackup1004:9100:/srv 5.491% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[12:55:48] (merge) fnegri: toolsdb: new replica host [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/57 (https://phabricator.wikimedia.org/T398170)
[13:01:57] Toolforge (Toolforge iteration 21): [lima-kilo,misctools] no arm64 version for mac-os based installations - https://phabricator.wikimedia.org/T398016#10958509 (dcaro)
[13:08:54] cloud-services-team, Striker, Bitu, Infrastructure-Foundations: Remove feature to connect SUL account to Striker (read from Bitu instead) - https://phabricator.wikimedia.org/T371595#10958537 (Arendpieter) Open→Declined
[13:13:32] Toolforge (Toolforge iteration 21): [lima-kilo,misctools] no arm64 version for mac-os based installations - https://phabricator.wikimedia.org/T398016#10958547 (dcaro) Yep, the issue is that misctools has some compiled binaries (`take`) that will need to be recompiled for each arch, maybe we can build that in...
[13:14:54] cloud-services-team, Striker, Bitu, Infrastructure-Foundations: Remove feature to connect SUL account to Striker (read from Bitu instead) - https://phabricator.wikimedia.org/T371595#10958555 (Arendpieter) >>! In T388498#10629171, @bd808 wrote: >> It is better to have one place where Developer...
[13:16:11] cloud-services-team, Striker, Bitu, Infrastructure-Foundations, Phabricator: Inconsistent mapping of Developer accounts and SUL accounts across Phabricator, Bitu, and Striker - https://phabricator.wikimedia.org/T388498#10958558 (Arendpieter)
[13:18:50] cloud-services-team, Striker, Bitu, Infrastructure-Foundations, Phabricator: Inconsistent mapping of Developer accounts and SUL accounts across Phabricator, Bitu, and Striker - https://phabricator.wikimedia.org/T388498#10958568 (Arendpieter)
[13:23:50] cloud-services-team (FY2024/2025-Q3-Q4), Toolforge, Patch-For-Review: [toolsdb] ToolsToolsDBReplicationLagIsTooHigh - 2025-06-30 - https://phabricator.wikimedia.org/T398170#10958576 (fnegri) Currently copying the snapshot of tools-db-4 to the new host tools-db-6. Snapshot was taken with GTID `2886729...
[13:30:35] cloud-services-team, Striker, Bitu, Infrastructure-Foundations: Remove feature to connect SUL account to Striker (read from Bitu instead) - https://phabricator.wikimedia.org/T371595#10958590 (Bugreporter) Declined→Open Reopen - what I expected is if the user already has a Wikimedia accoun...
[13:32:59] cloud-services-team, Striker, Bitu, Infrastructure-Foundations: Remove feature to connect SUL account to Striker (read from Bitu instead) - https://phabricator.wikimedia.org/T371595#10958600 (taavi) →Duplicate dup:T148048
[13:33:00] cloud-services-team, Striker, Infrastructure-Foundations, LDAP, Patch-For-Review: Store Wikimedia unified account name (SUL) in LDAP directory - https://phabricator.wikimedia.org/T148048#10958602 (taavi)
[13:37:10] Cloud-VPS (Project-requests): Request creation of torrents VPS project - https://phabricator.wikimedia.org/T397861#10958611 (Andrew) This sounds mostly fine, although I have a couple of concerns: 1) Can you pick a more specific name that specifies what is being torrented? Like maybe dumptorrents or similar?...
[13:41:37] cloud-services-team, Toolforge: toolforge: rework toollabs debian package (misctools) - https://phabricator.wikimedia.org/T207968#10958629 (taavi)
[13:42:41] VPS-project-Phabricator, collaboration-services: User "Recent Activity" feeds don't display anything on the Phabricator test instance - https://phabricator.wikimedia.org/T397626#10958631 (Aklapper) May need a `./bin/phd restart`
[13:43:26] (PS1) David Caro: Move to gitlab [labs/toollabs] - https://gerrit.wikimedia.org/r/1165027
[13:43:43] (CR) CI reject: [V:-1] Move to gitlab [labs/toollabs] - https://gerrit.wikimedia.org/r/1165027 (owner: David Caro)
[13:45:11] cloud-services-team, Toolforge: Migrate misctools package to GitLab - https://phabricator.wikimedia.org/T398202 (taavi) NEW
[13:45:17] cloud-services-team, Toolforge: Migrate misctools package to GitLab - https://phabricator.wikimedia.org/T398202#10958661 (taavi) p:Triage→Low
[13:45:37] cloud-services-team, Toolforge: toolforge: rework toollabs debian package (misctools) - https://phabricator.wikimedia.org/T207968#10958666 (taavi) Open→Resolved I think most of this has been done over the years, and {T398202} tracks the rest of the standardization here.
[13:45:52] cloud-services-team: debian packaging: create common guidelines and workflow - https://phabricator.wikimedia.org/T212291#10958672 (taavi)
[13:45:55] cloud-services-team, Toolforge: toolforge: rework toollabs debian package (misctools) - https://phabricator.wikimedia.org/T207968#10958673 (taavi)
[13:46:16] !log andrew@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-69, tools-k8s-worker-nfs-70
[13:48:18] cloud-services-team: debian packaging: create common guidelines and workflow - https://phabricator.wikimedia.org/T212291#10958684 (taavi) Minor status update here: Practically all of the #Toolforge packages have been migrated to a GitLab and a single CI pipeline in https://gitlab.wikimedia.org/repos/cloud/ci...
[13:58:07] !log andrew@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-69, tools-k8s-worker-nfs-70
[13:58:32] (PS2) David Caro: Move to gitlab [labs/toollabs] - https://gerrit.wikimedia.org/r/1165027 (https://phabricator.wikimedia.org/T398202)
[13:58:48] (CR) CI reject: [V:-1] Move to gitlab [labs/toollabs] - https://gerrit.wikimedia.org/r/1165027 (https://phabricator.wikimedia.org/T398202) (owner: David Caro)
[14:03:08] (approved) fnegri: puppet_prefix: Default roles to an empty list [repos/cloud/cloud-vps/terraform-cloudvps] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/terraform-cloudvps/-/merge_requests/7 (owner: taavi)
[14:03:35] !log andrew@cloudcumin1001 catalyst START - Cookbook wmcs.openstack.quota_increase (T397716)
[14:03:39] T397716: Quota increase required for Catalyst - https://phabricator.wikimedia.org/T397716
[14:03:42] !log andrew@cloudcumin1001 catalyst END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0) (T397716)
[14:04:05] !log andrew@cloudcumin1001 catalyst START - Cookbook wmcs.openstack.quota_increase (T397716)
[14:04:11] !log andrew@cloudcumin1001 catalyst END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0) (T397716)
[14:04:39] !log andrew@cloudcumin1001 catalyst START - Cookbook wmcs.openstack.quota_increase (T397716)
[14:04:46] !log andrew@cloudcumin1001 catalyst END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0) (T397716)
[14:05:28] cloud-services-team, Cloud-VPS (Quota-requests): Quota increase required for Catalyst - https://phabricator.wikimedia.org/T397716#10958733 (Andrew) Open→Resolved a:Andrew
[14:05:46] cloud-services-team, Cloud-VPS: Keystone not cleaning up ldap groups on project delete - https://phabricator.wikimedia.org/T397648#10958742 (Andrew) p:Triage→High
[14:06:16] cloud-services-team (Hardware), DC-Ops, ops-codfw, SRE: cloudcephosd200[567]-dev service implementation - https://phabricator.wikimedia.org/T397237#10958743 (Andrew) p:High→Medium
[14:06:55] cloud-services-team, Cloud-VPS: Secrets management on cloud-vps - https://phabricator.wikimedia.org/T283032#10958744 (Andrew) p:High→Medium
[14:08:17] (approved) fnegri: puppet_prefix: Save details directly after creation [repos/cloud/cloud-vps/terraform-cloudvps] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/terraform-cloudvps/-/merge_requests/6 (owner: taavi)
[14:08:19] wikitech.wikimedia.org, MW-on-K8s, serviceops: Cleanup: Wikitech code leftovers - https://phabricator.wikimedia.org/T371378#10958747 (Arendpieter) Open→Resolved
[14:09:08] cloud-services-team, Striker, Infrastructure-Foundations, LDAP, Patch-For-Review: Store Wikimedia unified account name (SUL) in LDAP directory - https://phabricator.wikimedia.org/T148048#10958759 (Arendpieter)
[14:09:10] (merge) taavi: puppet_prefix: Save details directly after creation [repos/cloud/cloud-vps/terraform-cloudvps] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/terraform-cloudvps/-/merge_requests/6
[14:09:12] cloud-services-team, wikitech.wikimedia.org, Infrastructure-Foundations, Epic: Make Wikitech an SUL wiki - https://phabricator.wikimedia.org/T161859#10958760 (Arendpieter)
[14:09:13] (update) taavi: puppet_prefix: Default roles to an empty list [repos/cloud/cloud-vps/terraform-cloudvps] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/terraform-cloudvps/-/merge_requests/7
[14:09:44] (approved) dcaro: puppet: Use omitempty for optional attributes [repos/cloud/cloud-vps/go-cloudvps] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/go-cloudvps/-/merge_requests/5 (https://phabricator.wikimedia.org/T398117) (owner: taavi)
[14:10:05] (merge) taavi: puppet: Use omitempty for optional attributes [repos/cloud/cloud-vps/go-cloudvps] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/go-cloudvps/-/merge_requests/5 (https://phabricator.wikimedia.org/T398117)
[14:14:49] (merge) taavi: puppet_prefix: Default roles to an empty list [repos/cloud/cloud-vps/terraform-cloudvps] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/terraform-cloudvps/-/merge_requests/7
[14:15:17] (open) taavi: Upgrade go-cloudvps to v0.3.1 [repos/cloud/cloud-vps/terraform-cloudvps] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/terraform-cloudvps/-/merge_requests/8 (https://phabricator.wikimedia.org/T398117)
[14:15:21] (update) taavi: Upgrade go-cloudvps to v0.3.1 [repos/cloud/cloud-vps/terraform-cloudvps] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/terraform-cloudvps/-/merge_requests/8 (https://phabricator.wikimedia.org/T398117)
[14:15:30] (update) taavi: Upgrade go-cloudvps to v0.3.1 [repos/cloud/cloud-vps/terraform-cloudvps] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/terraform-cloudvps/-/merge_requests/8 (https://phabricator.wikimedia.org/T398117)
[14:15:34] (update) taavi: Upgrade go-cloudvps to v0.3.1 [repos/cloud/cloud-vps/terraform-cloudvps] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/terraform-cloudvps/-/merge_requests/8 (https://phabricator.wikimedia.org/T398117)
[14:17:29] (approved) fnegri: Upgrade go-cloudvps to v0.3.1 [repos/cloud/cloud-vps/terraform-cloudvps] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/terraform-cloudvps/-/merge_requests/8 (https://phabricator.wikimedia.org/T398117) (owner: taavi)
[14:17:49] (merge) taavi: Upgrade go-cloudvps to v0.3.1 [repos/cloud/cloud-vps/terraform-cloudvps] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/terraform-cloudvps/-/merge_requests/8 (https://phabricator.wikimedia.org/T398117)
[14:22:28] RESOLVED: PuppetAgentNoResources: No Puppet resources found on instance tools-db-6 on project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[14:23:05] FIRING: [2x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-69 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess
[14:23:50] cloud-services-team, Cloud-VPS, Patch-For-Review: Creation of Hiera Puppet Prefix via OpenTofu fails - https://phabricator.wikimedia.org/T398117#10958833 (taavi) Open→Resolved I published v0.3.1 of the provider with this fix included.
[14:38:05] RESOLVED: [2x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-69 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProce [14:38:34] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-70 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [14:43:35] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-70 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [14:43:46] cloud-services-team, wikitech.wikimedia.org, Infrastructure-Foundations, Epic: Make Wikitech an SUL wiki - https://phabricator.wikimedia.org/T161859#10958915 (Arendpieter) [14:53:57] (CR) Majavah: [C:-1] "Unfortunately PAWS hardcodes a path to this repo: https://github.com/toolforge/paws/blob/9c42a38a368b3ad460fd122e60851c67335c3ed2/images/s" [labs/toollabs] - https://gerrit.wikimedia.org/r/1165027 (https://phabricator.wikimedia.org/T398202) (owner: David Caro) [14:55:01] (CR) David Caro: "Great catch! How did you find it?" 
[labs/toollabs] - https://gerrit.wikimedia.org/r/1165027 (https://phabricator.wikimedia.org/T398202) (owner: David Caro) [14:55:26] (open) raymond-ndibe: [lima-kilo.toolforge] ensure use of amd64 arch [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/249 (https://phabricator.wikimedia.org/T398016) [14:58:55] (update) dcaro: logging: Deploy remaining Loki buckets [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/55 (https://phabricator.wikimedia.org/T386480 https://phabricator.wikimedia.org/T97861) (owner: taavi) [15:02:04] (CR) Majavah: [C:-1] "I searched for `toollabs` on https://codesearch.wmcloud.org. That didn't actually find the PAWS repo (which we may want to fix separately)" [labs/toollabs] - https://gerrit.wikimedia.org/r/1165027 (https://phabricator.wikimedia.org/T398202) (owner: David Caro) [15:08:27] cloud-services-team, Cloud-VPS, Bitu, Infrastructure-Foundations: developer service accounts and email - https://phabricator.wikimedia.org/T398074#10959042 (bd808) >>! In T398074#10955669, @bd808 wrote: > ** for example https://ldap.toolforge.org/user/betadevopsbot uses `bdavis+betadevopsbot@wiki... [15:15:25] cloud-services-team, Cloud-VPS, Bitu, Infrastructure-Foundations: developer service accounts and email - https://phabricator.wikimedia.org/T398074#10959113 (taavi) Is not having an email at all an option? 
[15:21:17] david-caro opened https://github.com/toolforge/paws/pull/489 [15:21:31] (CR) David Caro: "This would be the fix https://github.com/toolforge/paws/pull/489" [labs/toollabs] - https://gerrit.wikimedia.org/r/1165027 (https://phabricator.wikimedia.org/T398202) (owner: David Caro) [15:24:41] cloud-services-team, Cloud-VPS, Bitu, Infrastructure-Foundations: developer service accounts and email - https://phabricator.wikimedia.org/T398074#10959158 (bd808) >>! In T398074#10959113, @taavi wrote: > Is not having an email at all an option? Technically for OpenStack, yes. For Developer acco... [15:25:20] (PS1) David Caro: repos: add paws repository [labs/codesearch] - https://gerrit.wikimedia.org/r/1165049 [15:31:49] Cloud-VPS (Project-requests): Request creation of torrents VPS project - https://phabricator.wikimedia.org/T397861#10959279 (dcaro) +1 (pending an Andrew's questions) [15:33:58] (approved) dcaro: [lima-kilo.toolforge] ensure use of amd64 arch [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/249 (https://phabricator.wikimedia.org/T398016) (owner: raymond-ndibe) [15:38:33] cloud-services-team, Cloud-VPS: Add check for cloud-wide root keys to the offboarding script - https://phabricator.wikimedia.org/T398214 (taavi) NEW [15:39:17] (CR) David Caro: "Adding the paws gh repo to codesearch: https://gerrit.wikimedia.org/r/c/labs/codesearch/+/1165049" [labs/toollabs] - https://gerrit.wikimedia.org/r/1165027 (https://phabricator.wikimedia.org/T398202) (owner: David Caro) [15:40:08] (PS1) Majavah: Remove root keys for former staff [labs/private] - https://gerrit.wikimedia.org/r/1165052 [15:41:28] cloud-services-team, Cloud-VPS: Add check for cloud-wide root keys to the offboarding script - https://phabricator.wikimedia.org/T398214#10959328 (taavi) [15:41:33] cloud-services-team, Cloud-VPS, Security: Move cloud-wide root keys to the 
main puppet repo - https://phabricator.wikimedia.org/T317362#10959329 (taavi) [15:43:15] cloud-services-team: Improve WMCS offboarding process - https://phabricator.wikimedia.org/T398215 (taavi) NEW [15:43:32] cloud-services-team: Improve WMCS offboarding process - https://phabricator.wikimedia.org/T398215#10959353 (taavi) [15:43:34] cloud-services-team, Cloud-VPS: Add check for cloud-wide root keys to the offboarding script - https://phabricator.wikimedia.org/T398214#10959352 (taavi) [15:52:30] cloud-services-team: Sync WMCS GitLab group membership from LDAP - https://phabricator.wikimedia.org/T398217 (taavi) NEW [15:52:35] cloud-services-team, DNS, Infrastructure-Foundations, netbox, and 2 others: Cloud: define relationship between wikimediacloud.org domain, CIDR prefixes and netbox automation - https://phabricator.wikimedia.org/T266331#10959423 (ayounsi) Open→Declined Closing for now, please reopen if... [15:52:56] cloud-services-team, Cloud-VPS, Toolforge, GitLab (Auth & Access): Sync WMCS GitLab group membership from LDAP - https://phabricator.wikimedia.org/T398217#10959425 (taavi) [16:10:49] cloud-services-team (FY2024/2025-Q3-Q4), Toolforge (Toolforge iteration 21), Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project, and 2 others: [Hypothesis] WE6.3.10 start a beta for the push-to-deploy features - https://phabricator.wikimedia.org/T393564#10959590 (dcaro) Beta announced ht... [16:11:44] cloud-services-team (FY2024/2025-Q3-Q4), Toolforge (Toolforge iteration 21), Epic: [components-api] First iteration of the component API - https://phabricator.wikimedia.org/T362051#10959595 (dcaro) In progress→Resolved This task is not useful anymore, I'll close (the subtasks are still us... 
[16:14:05] Cloud Services Proposals, cloud-services-team (FY2024/2025-Q3-Q4), Toolforge, Cloud-Services-Origin-Team, and 3 others: [Epic,builds-api,components-api,webservice,jobs-api] Make Toolforge a proper platform as a service with push-to-deploy and build... - https://phabricator.wikimedia.org/T194332#10959617 [16:19:06] cloud-services-team, Toolforge, Patch-For-Review: [toolforge,infra] Centralized logging for Toolforge infrastructure logs - https://phabricator.wikimedia.org/T97861#10959645 (taavi) I agree that in an ideal world with infinite engineering resources, it would be great to have some system to collect al... [16:52:49] FIRING: DiskSpace: Disk space cloudbackup1004:9100:/srv 4.916% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace [16:55:58] (update) dcaro: [maintain-harbor] persist log [repos/cloud/toolforge/maintain-harbor] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-harbor/-/merge_requests/42 (https://phabricator.wikimedia.org/T383081) (owner: raymond-ndibe) [17:09:34] cloud-services-team, Cloud-VPS, Toolforge, GitLab (Auth & Access): Sync WMCS GitLab group membership from LDAP - https://phabricator.wikimedia.org/T398217#10959885 (bd808) I think this would have to be done with a bot or runbook. LDAP based group membership management is a closed extension to the... 
[17:09:56] (CR) FNegri: [C:+1] Remove root keys for former staff [labs/private] - https://gerrit.wikimedia.org/r/1165052 (owner: Majavah) [17:19:38] cloud-services-team, Cloud-VPS, Toolforge, GitLab (Auth & Access): Sync WMCS GitLab group membership from LDAP - https://phabricator.wikimedia.org/T398217#10959948 (thcipriani) FWIW, there are utilities that run in systemd (managed by puppet) to manage ldap -> gitlab group sync for a few groups:... [17:21:57] (CR) Majavah: [V:+2 C:+2] Remove root keys for former staff [labs/private] - https://gerrit.wikimedia.org/r/1165052 (owner: Majavah) [17:25:05] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-39 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [17:52:25] cloud-services-team, Toolforge: [infra] Reports of slow connectivity from APAC - https://phabricator.wikimedia.org/T395135#10960127 (Nokib_Sarkar) >>! In T395135#10948050, @fnegri wrote: >> We have https://network-tests.toolforge.org/ > > TIL! That's perfect for testing. Example from my location (Milan,... 
[19:39:38] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/248 [19:39:56] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/248 [19:47:39] (close) andrew: Add 'magnum' service project in codfw1dev [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/248 (https://phabricator.wikimedia.org/T393782) [20:05:05] FIRING: [2x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-14 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess [20:37:34] RESOLVED: DiskSpace: Disk space cloudbackup1004:9100:/srv 5.959% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace [22:41:43] Cloud-VPS (Project-requests): Request creation of wikidata-deleted VPS project - https://phabricator.wikimedia.org/T398254 (Bovlb) NEW [22:46:14] PROBLEM - Wikitech-static main page has content on wikitech-static.wikimedia.org is CRITICAL: HTTP CRITICAL: HTTP/1.0 500 Internal Server Error - string Wikitech not found on https://wikitech-static.wikimedia.org:443/wiki/Main_Page?debug=true - 331 bytes in 0.119 second response time https://wikitech.wikimedia.org/wiki/Wikitech-static [22:47:14] RECOVERY - Wikitech-static main page has content on wikitech-static.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK 
- 29767 bytes in 0.211 second response time https://wikitech.wikimedia.org/wiki/Wikitech-static [22:50:33] !log andrew@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-39, tools-k8s-worker-nfs-14 [23:01:47] !log andrew@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-39, tools-k8s-worker-nfs-14 [23:30:05] FIRING: [2x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-14 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess [23:32:07] (open) addshore: openapi spec: ToolConfig-Output config_version is nullable [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/103 [23:35:16] (open) addshore: openapi spec: Add servers [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/104 [23:50:05] RESOLVED: [2x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-14 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProce