[00:06:12] (PS1) Andrea Denisse: alert: Add slack_bot_token Bug: T401730 [labs/private] - https://gerrit.wikimedia.org/r/1184613 (https://phabricator.wikimedia.org/T401730)
[00:17:13] (CR) Andrea Denisse: [V:+2 C:+2] alert: Add slack_bot_token Bug: T401730 [labs/private] - https://gerrit.wikimedia.org/r/1184613 (https://phabricator.wikimedia.org/T401730) (owner: Andrea Denisse)
[01:33:08] (update) raymond-ndibe: [helm_publish, image_publish]: skip harbor helm and image publish if PR owner not repo owner [repos/cloud/cicd/gitlab-ci] - https://gitlab.wikimedia.org/repos/cloud/cicd/gitlab-ci/-/merge_requests/61 (https://phabricator.wikimedia.org/T394595)
[01:34:41] (update) raymond-ndibe: k8s: upgrade to 1.30 [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/241 (https://phabricator.wikimedia.org/T362869) (owner: dcaro)
[01:34:42] (approved) raymond-ndibe: k8s: upgrade to 1.30 [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/241 (https://phabricator.wikimedia.org/T362869) (owner: dcaro)
[01:36:20] (update) raymond-ndibe: [helm_publish, image_publish]: skip harbor helm and image publish if PR owner not repo owner [repos/cloud/cicd/gitlab-ci] - https://gitlab.wikimedia.org/repos/cloud/cicd/gitlab-ci/-/merge_requests/61 (https://phabricator.wikimedia.org/T394595)
[01:37:28] (update) raymond-ndibe: [helm_publish, image_publish]: skip harbor helm and image publish if PR owner not repo owner [repos/cloud/cicd/gitlab-ci] - https://gitlab.wikimedia.org/repos/cloud/cicd/gitlab-ci/-/merge_requests/61 (https://phabricator.wikimedia.org/T394595)
[01:44:12] (open) raymond-ndibe: [DO NOT MERGE] testing gitlab ci changes [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/210
[01:47:09] (update) raymond-ndibe: [DO NOT MERGE] testing gitlab ci changes [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/210
[01:53:25] (update) raymond-ndibe: [helm_publish, image_publish]: skip harbor helm and image publish if PR owner not repo owner [repos/cloud/cicd/gitlab-ci] - https://gitlab.wikimedia.org/repos/cloud/cicd/gitlab-ci/-/merge_requests/61 (https://phabricator.wikimedia.org/T394595)
[01:59:33] Cloud-VPS (Quota-requests), Content-Transform-Team (Work In Progress): Quote increase request for wikitextexp - https://phabricator.wikimedia.org/T403114#11146363 (ssastry) Even after I deleted the 1.5TB volume (parsing), looks like I cannot create a new volume larger than 499 GB. Now, this needs fix...
[02:04:46] (update) raymond-ndibe: [helm_publish, image_publish]: skip harbor helm and image publish if PR owner not repo owner [repos/cloud/cicd/gitlab-ci] - https://gitlab.wikimedia.org/repos/cloud/cicd/gitlab-ci/-/merge_requests/61 (https://phabricator.wikimedia.org/T394595)
[02:09:50] (update) raymond-ndibe: [helm_publish, image_publish]: skip harbor helm and image publish if PR owner not repo owner [repos/cloud/cicd/gitlab-ci] - https://gitlab.wikimedia.org/repos/cloud/cicd/gitlab-ci/-/merge_requests/61 (https://phabricator.wikimedia.org/T394595)
[02:13:40] (update) raymond-ndibe: [DO NOT MERGE] testing gitlab ci changes [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/210
[02:14:19] FIRING: HighIOWaitStalling: High iowait detected on clouddumps1002:9100. - https://wikitech.wikimedia.org/wiki/Portal:Data_Services/Admin/Shared_storage#Dumps - https://grafana.wikimedia.org/d/000000568/wmcs-dumps-general-view - https://alerts.wikimedia.org/?q=alertname%3DHighIOWaitStalling
[02:17:17] Tool-gawa: [Code] Design of the STATISTICS page - https://phabricator.wikimedia.org/T401767#11146377 (PenScribe) For the design (code) task on the statistics page, the 4 requested features are already finished: ✅ Graphs ✅ A feature for displaying the statistics...
[02:27:31] (update) raymond-ndibe: [helm_publish, image_publish]: skip harbor helm and image publish if PR owner not repo owner [repos/cloud/cicd/gitlab-ci] - https://gitlab.wikimedia.org/repos/cloud/cicd/gitlab-ci/-/merge_requests/61 (https://phabricator.wikimedia.org/T394595)
[02:27:52] (update) raymond-ndibe: [DO NOT MERGE] testing gitlab ci changes [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/210
[04:04:19] FIRING: HighIOWaitStalling: High iowait detected on clouddumps1002:9100. - https://wikitech.wikimedia.org/wiki/Portal:Data_Services/Admin/Shared_storage#Dumps - https://grafana.wikimedia.org/d/000000568/wmcs-dumps-general-view - https://alerts.wikimedia.org/?q=alertname%3DHighIOWaitStalling
[04:04:27] cloud-services-team: HighIOWaitStalling High iowait detected on clouddumps1002:9100. - https://phabricator.wikimedia.org/T403684 (phaultfinder) NEW
[04:09:19] RESOLVED: HighIOWaitStalling: High iowait detected on clouddumps1002:9100. - https://wikitech.wikimedia.org/wiki/Portal:Data_Services/Admin/Shared_storage#Dumps - https://grafana.wikimedia.org/d/000000568/wmcs-dumps-general-view - https://alerts.wikimedia.org/?q=alertname%3DHighIOWaitStalling
[04:09:19] RESOLVED: HighIOWaitStalling: High iowait detected on clouddumps1002:9100. - https://wikitech.wikimedia.org/wiki/Portal:Data_Services/Admin/Shared_storage#Dumps - https://grafana.wikimedia.org/d/000000568/wmcs-dumps-general-view - https://alerts.wikimedia.org/?q=alertname%3DHighIOWaitStalling
[06:34:40] (CR) Sebastian Berlin (WMSE): "I'm no longer involved in the development and I can't do a full review." [labs/tools/Isa] - https://gerrit.wikimedia.org/r/1181245 (https://phabricator.wikimedia.org/T316197) (owner: Rehan_khan_78)
[07:25:13] !log filippo@cloudcumin1001 wikitextexp START - Cookbook wmcs.openstack.quota_increase (T403114)
[07:25:16] T403114: Quote increase request for wikitextexp - https://phabricator.wikimedia.org/T403114
[07:25:20] !log filippo@cloudcumin1001 wikitextexp END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0) (T403114)
[07:26:23] Cloud-VPS (Quota-requests), Content-Transform-Team (Work In Progress): Quote increase request for wikitextexp - https://phabricator.wikimedia.org/T403114#11146616 (fgiunchedi) >>! In T403114#11145839, @bd808 wrote: > Hmmm... still getting this error: > `counterexample > Error: Unable to create volum...
[08:01:49] FIRING: ObjectStorageSizeQuotaFull: Object storage quota by 'size' is 80% full for project toolsbeta - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/ObjectStorageSizeQuotaFull - https://grafana.wikimedia.org/d/7120b794-4638-49f5-bccd-9716efc60f24/wmcs-object-storage-quotas - https://alerts.wikimedia.org/?q=alertname%3DObjectStorageSizeQuotaFull
[08:03:48] (merge) dcaro: pre-commit: add golangci-lint check [repos/cloud/toolforge/envvars-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-api/-/merge_requests/63
[08:04:26] (merge) dcaro: gitlab: move to bookworm images [repos/cloud/toolforge/webservice-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/webservice-cli/-/merge_requests/86
[08:04:35] (merge) dcaro: gitlab: move to bookworm images [repos/cloud/toolforge/image-config] - https://gitlab.wikimedia.org/repos/cloud/toolforge/image-config/-/merge_requests/15
[08:06:26] cloud-services-team: HighIOWaitStalling High iowait detected on clouddumps1002:9100. - https://phabricator.wikimedia.org/T403684#11146672 (dcaro) Open→Resolved a:dcaro
[08:07:02] (open) group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: image-config: bump to 0.0.22-20250904080448-f3620fc9 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/953
[08:09:15] (open) group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: envvars-api: bump to 0.0.74-20250904080401-f39b2dc6 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/954
[08:30:01] Tools: Migrate Incubator Dashboard to public Superset + improvements - https://phabricator.wikimedia.org/T392813#11146758 (KCVelaga_WMF) Open→Resolved
[09:14:57] cloud-services-team, Cloud-VPS, Toolforge, Patch-For-Review: Enable SSL in Trove MariaDB - Trixie MariaDB client requires SSL but SSL is not enabled in the Trove server - https://phabricator.wikimedia.org/T401861#11146978 (dcaro) Thanks @JJMC89 for the fix, that default has been set now on all th...
[09:34:28] FIRING: PuppetAgentNoResources: No Puppet resources found on instance tools-redis-5 on project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[09:36:28] FIRING: PuppetAgentNoResources: No Puppet resources found on instance toolsbeta-proxy-8 on project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[09:36:28] FIRING: PuppetAgentNoResources: No Puppet resources found on instance mx-out06 on project cloudinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[09:39:28] FIRING: [2x] PuppetAgentNoResources: No Puppet resources found on instance tools-puppetdb-2 on project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[09:41:28] FIRING: [2x] PuppetAgentNoResources: No Puppet resources found on instance toolsbeta-proxy-8 on project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[09:41:28] FIRING: [3x] PuppetAgentNoResources: No Puppet resources found on instance cloudinfra-idp-1 on project cloudinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[09:43:28] FIRING: PuppetAgentNoResources: No Puppet resources found on instance runner-1032 on project gitlab-runners - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[09:44:28] FIRING: [7x] PuppetAgentNoResources: No Puppet resources found on instance tools-elastic-5 on project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[09:46:28] FIRING: [3x] PuppetAgentNoResources: No Puppet resources found on instance toolsbeta-proxy-8 on project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[09:46:28] FIRING: [4x] PuppetAgentNoResources: No Puppet resources found on instance cloudinfra-idp-1 on project cloudinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[09:48:28] FIRING: [2x] PuppetAgentNoResources: No Puppet resources found on instance runner-1032 on project gitlab-runners - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[09:49:28] FIRING: [8x] PuppetAgentNoResources: No Puppet resources found on instance tools-acme-chief-3 on project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[09:51:28] FIRING: [9x] PuppetAgentNoResources: No Puppet resources found on instance cloudinfra-acme-chief-02 on project cloudinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[09:52:51] FIRING: ProbeDown: Service tools-static-15:80 has failed probes (http_tools_static_wmflabs_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-static-15:80 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[09:53:28] FIRING: [6x] PuppetAgentNoResources: No Puppet resources found on instance runner-1031 on project gitlab-runners - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[09:54:28] FIRING: [11x] PuppetAgentNoResources: No Puppet resources found on instance tools-acme-chief-3 on project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[09:56:28] FIRING: [7x] PuppetAgentNoResources: No Puppet resources found on instance toolsbeta-proxy-7 on project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[09:57:51] RESOLVED: ProbeDown: Service tools-static-15:80 has failed probes (http_tools_static_wmflabs_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-static-15:80 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[09:58:28] FIRING: PuppetAgentNoResources: No Puppet resources found on instance proxy-6 on project project-proxy - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[09:58:28] FIRING: [7x] PuppetAgentNoResources: No Puppet resources found on instance runner-1031 on project gitlab-runners - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[09:59:28] FIRING: [15x] PuppetAgentNoResources: No Puppet resources found on instance tools-acme-chief-3 on project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[10:01:28] FIRING: [9x] PuppetAgentNoResources: No Puppet resources found on instance toolsbeta-acme-chief-2 on project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[10:01:28] FIRING: [10x] PuppetAgentNoResources: No Puppet resources found on instance cloudinfra-acme-chief-02 on project cloudinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[10:03:28] FIRING: [2x] PuppetAgentNoResources: No Puppet resources found on instance proxy-5 on project project-proxy - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[10:03:28] FIRING: [7x] PuppetAgentNoResources: No Puppet resources found on instance runner-1031 on project gitlab-runners - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[10:06:28] FIRING: [10x] PuppetAgentNoResources: No Puppet resources found on instance toolsbeta-acme-chief-2 on project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[10:06:28] FIRING: [10x] PuppetAgentNoResources: No Puppet resources found on instance cloudinfra-acme-chief-02 on project cloudinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[10:08:28] FIRING: [4x] PuppetAgentNoResources: No Puppet resources found on instance project-proxy-acme-chief-02 on project project-proxy - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[10:08:28] FIRING: [10x] PuppetAgentNoResources: No Puppet resources found on instance runner-1031 on project gitlab-runners - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[10:09:28] FIRING: [15x] PuppetAgentNoResources: No Puppet resources found on instance tools-acme-chief-3 on project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[10:11:28] FIRING: [10x] PuppetAgentNoResources: No Puppet resources found on instance toolsbeta-acme-chief-2 on project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[10:11:43] FIRING: [10x] PuppetAgentNoResources: No Puppet resources found on instance toolsbeta-acme-chief-2 on project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[10:13:28] FIRING: [10x] PuppetAgentNoResources: No Puppet resources found on instance runner-1031 on project gitlab-runners - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[10:14:28] FIRING: [15x] PuppetAgentNoResources: No Puppet resources found on instance tools-acme-chief-3 on project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[10:16:13] cloud-services-team, Data-Services: [wikireplicas] slow query runs every hour, but never completes - https://phabricator.wikimedia.org/T403639#11147156 (fgiunchedi) p:Triage→Medium
[10:16:28] FIRING: [10x] PuppetAgentNoResources: No Puppet resources found on instance toolsbeta-acme-chief-2 on project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[10:16:28] FIRING: [10x] PuppetAgentNoResources: No Puppet resources found on instance cloudinfra-acme-chief-02 on project cloudinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[10:18:28] FIRING: [4x] PuppetAgentNoResources: No Puppet resources found on instance project-proxy-acme-chief-02 on project project-proxy - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[10:19:08] !log filippo@cloudcumin1001 wikilink START - Cookbook wmcs.openstack.quota_increase (T403609)
[10:19:12] T403609: wikilink: Increase cpu + ram + disk quotas - https://phabricator.wikimedia.org/T403609
[10:19:15] !log filippo@cloudcumin1001 wikilink END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0) (T403609)
[10:19:28] FIRING: [15x] PuppetAgentNoResources: No Puppet resources found on instance tools-acme-chief-3 on project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[10:20:43] Cloud-VPS (Quota-requests), Moderator-Tools-Team, Wikilink-Tool: wikilink: Increase cpu + ram + disk quotas - https://phabricator.wikimedia.org/T403609#11147168 (fgiunchedi) Open→Resolved a:fgiunchedi Done
[10:21:28] FIRING: [10x] PuppetAgentNoResources: No Puppet resources found on instance toolsbeta-acme-chief-2 on project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[10:23:28] RESOLVED: [4x] PuppetAgentNoResources: No Puppet resources found on instance project-proxy-acme-chief-02 on project project-proxy - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[10:23:28] RESOLVED: [8x] PuppetAgentNoResources: No Puppet resources found on instance runner-1031 on project gitlab-runners - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[10:24:28] FIRING: [14x] PuppetAgentNoResources: No Puppet resources found on instance tools-acme-chief-4 on project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[10:26:28] FIRING: [9x] PuppetAgentNoResources: No Puppet resources found on instance toolsbeta-acme-chief-2 on project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[10:26:28] FIRING: [6x] PuppetAgentNoResources: No Puppet resources found on instance cloudinfra-db03 on project cloudinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[10:29:28] RESOLVED: [10x] PuppetAgentNoResources: No Puppet resources found on instance tools-acme-chief-4 on project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[10:31:28] RESOLVED: [6x] PuppetAgentNoResources: No Puppet resources found on instance toolsbeta-acme-chief-2 on project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[10:31:28] FIRING: [6x] PuppetAgentNoResources: No Puppet resources found on instance cloudinfra-db03 on project cloudinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[10:34:07] Tool-digitalt2commons, WMNO-General: Digitalt2Commons: Investigate possibility for adding a button to the tool in Digitalt Museum proper - https://phabricator.wikimedia.org/T403708 (jhsoby-WMNO) NEW
[10:36:28] RESOLVED: [5x] PuppetAgentNoResources: No Puppet resources found on instance cloudinfra-db03 on project cloudinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[10:44:58] Tool-Global-user-contributions: Replace rc_type with rc_source - https://phabricator.wikimedia.org/T403710 (MarcoAurelio) NEW
[11:29:03] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-5 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[11:34:03] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-5 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[12:02:04] FIRING: ObjectStorageSizeQuotaFull: Object storage quota by 'size' is 80.03% full for project toolsbeta - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/ObjectStorageSizeQuotaFull - https://grafana.wikimedia.org/d/7120b794-4638-49f5-bccd-9716efc60f24/wmcs-object-storage-quotas - https://alerts.wikimedia.org/?q=alertname%3DObjectStorageSizeQuotaFull
[12:15:08] cloud-services-team: HighIOWaitStalling High iowait detected on clouddumps1002:9100. - https://phabricator.wikimedia.org/T403684#11147490 (fgiunchedi) For the record, after discussion on IRC we decided to nuke the alert in https://gerrit.wikimedia.org/r/c/operations/alerts/+/1184715
[12:16:43] (update) raymond-ndibe: [helm_publish, image_publish]: skip harbor helm and image publish if PR owner not repo owner [repos/cloud/cicd/gitlab-ci] - https://gitlab.wikimedia.org/repos/cloud/cicd/gitlab-ci/-/merge_requests/61 (https://phabricator.wikimedia.org/T394595)
[12:17:07] (update) raymond-ndibe: [DO NOT MERGE] testing gitlab ci changes [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/210
[12:23:58] (update) raymond-ndibe: [helm_publish, image_publish]: skip harbor helm and image publish if PR owner not repo owner [repos/cloud/cicd/gitlab-ci] - https://gitlab.wikimedia.org/repos/cloud/cicd/gitlab-ci/-/merge_requests/61 (https://phabricator.wikimedia.org/T394595)
[12:24:14] (update) raymond-ndibe: [DO NOT MERGE] testing gitlab ci changes [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/210
[12:24:55] (open) l10n-bot: Localisation updates from https://translatewiki.net. [toolforge-repos/ranker] - https://gitlab.wikimedia.org/toolforge-repos/ranker/-/merge_requests/25
[12:24:55] (open) l10n-bot: Localisation updates from https://translatewiki.net. [toolforge-repos/lexeme-forms] - https://gitlab.wikimedia.org/toolforge-repos/lexeme-forms/-/merge_requests/14
[12:32:59] (update) dcaro: [jobs-api] split job models to oneoff, scheduled and continuous [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/154 (https://phabricator.wikimedia.org/T389118 https://phabricator.wikimedia.org/T390136) (owner: raymond-ndibe)
[13:14:19] RESOLVED: ObjectStorageSizeQuotaFull: Object storage quota by 'size' is 80.03% full for project toolsbeta - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/ObjectStorageSizeQuotaFull - https://grafana.wikimedia.org/d/7120b794-4638-49f5-bccd-9716efc60f24/wmcs-object-storage-quotas - https://alerts.wikimedia.org/?q=alertname%3DObjectStorageSizeQuotaFull
[13:17:19] cloud-services-team, Data-Services: [wikireplicas] slow query runs every hour, but never completes - https://phabricator.wikimedia.org/T403639#11147793 (fnegri) It's slightly more complicated, as it does sometimes complete before the 3 hours. This is the number of times it was killed per day in the last...
[13:33:53] cloud-services-team, Data-Services: [wikireplicas] slow query runs every hour, but never completes - https://phabricator.wikimedia.org/T403639#11147900 (dschwen) Ok, I have set this to run once a day, and I think I can split this up into many small queries, by querying ranges of page IDs. Would that help?
[13:38:08] Tool-Global-user-contributions: Replace rc_type with rc_source - https://phabricator.wikimedia.org/T403710#11147918 (MarcoAurelio)
[13:41:51] (update) raymond-ndibe: [helm_publish, image_publish]: skip harbor helm and image publish if PR owner not repo owner [repos/cloud/cicd/gitlab-ci] - https://gitlab.wikimedia.org/repos/cloud/cicd/gitlab-ci/-/merge_requests/61 (https://phabricator.wikimedia.org/T394595)
[13:48:35] Cloud-VPS (Quota-requests), Content-Transform-Team (Work In Progress): Quote increase request for wikitextexp - https://phabricator.wikimedia.org/T403114#11147974 (ssastry) Worked! Thanks!
[13:58:05] cloud-services-team, Data-Services: [wikireplicas] slow query runs every hour, but never completes - https://phabricator.wikimedia.org/T403639#11148018 (fnegri) Yes, that would help thank you! The other thing that would help is if you could make sure that a new query is not started if the previous one i...
[13:58:22] cloud-services-team, Data-Services: [wikireplicas] slow query runs every hour, but never completes - https://phabricator.wikimedia.org/T403639#11148022 (dschwen) It's done. Let's see if that helps
[13:59:38] cloud-services-team, Data-Services: [wikireplicas] slow query runs every hour, but never completes - https://phabricator.wikimedia.org/T403639#11148026 (dschwen) Yes, they are running on a wmcloud VM. With spacing them out to run daily there should not be any overlap, but I can add a lock to my script to...
[13:59:48] cloud-services-team, Cloud-VPS: systemctl restart keystone doesn't actually restart keystone sometimes - https://phabricator.wikimedia.org/T340127#11148030 (Andrew) Invalid→Open Unfortunately this is still true, anytime I want a proper config change with keystone I have to manually kill things.
[14:00:20] (update) raymond-ndibe: [helm_publish, image_publish]: skip harbor helm and image publish if PR owner not repo owner [repos/cloud/cicd/gitlab-ci] - https://gitlab.wikimedia.org/repos/cloud/cicd/gitlab-ci/-/merge_requests/61 (https://phabricator.wikimedia.org/T394595)
[14:01:06] cloud-services-team, Data-Services: [wikireplicas] slow query runs every hour, but never completes - https://phabricator.wikimedia.org/T403639#11148036 (dschwen) The new query template is ` sql = ( "SELECT /* SLOW_OK */ cl_from, page_id, " "CAST(cl_type AS CHAR) AS cl_type "...
[14:08:36] (update) raymond-ndibe: [DO NOT MERGE] testing gitlab ci changes [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/210
[14:15:18] (PS2) MarcoAurelio: WIP: Replace rc_type with rc_source [labs/tools/guc] - https://gerrit.wikimedia.org/r/1184777 (https://phabricator.wikimedia.org/T403710)
[14:16:03] (PS3) MarcoAurelio: Replace rc_type with rc_source [labs/tools/guc] - https://gerrit.wikimedia.org/r/1184777 (https://phabricator.wikimedia.org/T403710)
[14:18:29] (CR) MarcoAurelio: "Patch needs review/testing. Thanks." [labs/tools/guc] - https://gerrit.wikimedia.org/r/1184777 (https://phabricator.wikimedia.org/T403710) (owner: MarcoAurelio)
[14:18:32] cloud-services-team, Data-Services: [wikireplicas] slow query runs every hour, but never completes - https://phabricator.wikimedia.org/T403639#11148124 (dschwen) ChatGPT suggests ` SELECT /* SLOW_OK */ cl.cl_from, p.page_id, cl.cl_type FROM page AS p STRAIGHT_JOIN categorylinks AS cl ON p.page_n...
[14:20:56] Tool-Global-user-contributions, Patch-For-Review: Replace rc_type with rc_source - https://phabricator.wikimedia.org/T403710#11148136 (MarcoAurelio)
[14:36:28] (update) raymond-ndibe: [helm_publish, image_publish]: skip harbor helm and image publish if PR owner not repo owner [repos/cloud/cicd/gitlab-ci] - https://gitlab.wikimedia.org/repos/cloud/cicd/gitlab-ci/-/merge_requests/61 (https://phabricator.wikimedia.org/T394595)
[14:36:52] (update) raymond-ndibe: [DO NOT MERGE] testing gitlab ci changes [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/210
[14:39:29] cloud-services-team, Data-Services: [wikireplicas] slow query runs every hour, but never completes - https://phabricator.wikimedia.org/T403639#11148266 (fnegri) > With spacing them out to run daily there should not be any overlap Yes that should be fine without a lock, if you split into smaller more fre...
[14:40:08] cloud-services-team, Toolforge: [jobs-api] Allow configuring health check timeout - https://phabricator.wikimedia.org/T403733 (DamianZaremba) NEW
[14:40:08] (update) raymond-ndibe: [helm_publish, image_publish]: skip harbor helm and image publish if PR owner not repo owner [repos/cloud/cicd/gitlab-ci] - https://gitlab.wikimedia.org/repos/cloud/cicd/gitlab-ci/-/merge_requests/61 (https://phabricator.wikimedia.org/T394595)
[14:40:19] (update) raymond-ndibe: [DO NOT MERGE] testing gitlab ci changes [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/210
[14:48:22] cloud-services-team (FY2025/26-Q1), Data-Services: [wikireplicas] slow query runs every hour, but never completes - https://phabricator.wikimedia.org/T403639#11148293 (fnegri) a:fnegri
[14:48:25] cloud-services-team (FY2025/26-Q1), Data-Services: [wikireplicas] slow query runs every hour, but never completes - https://phabricator.wikimedia.org/T403639#11148295 (fnegri) Open→In progress
[14:48:38] cloud-services-team, Toolforge: [jobs-api] Allow configuring health check timeout - https://phabricator.wikimedia.org/T403733#11148296 (DamianZaremba) As a data point, this is also happening on ClueBot NG which is more impactful ` Events: Type Reason Age From Message...
[14:48:56] cloud-services-team, Toolforge: [jobs-api] inconsistent command modification for continuous job - https://phabricator.wikimedia.org/T403735 (DamianZaremba) NEW
[15:05:28] cloud-services-team (FY2025/26-Q1), Data-Services: [wikireplicas] slow query runs every hour, but never completes - https://phabricator.wikimedia.org/T403639#11148410 (dschwen) > Can I ask you the VM name? I was trying to reverse-engineer it but it's hard because there's a proxy in between :) Oh yes, of...
[15:11:11] (approved) lucaswerkmeister: Localisation updates from https://translatewiki.net. [toolforge-repos/ranker] - https://gitlab.wikimedia.org/toolforge-repos/ranker/-/merge_requests/25 (owner: l10n-bot)
[15:11:23] (merge) lucaswerkmeister: Localisation updates from https://translatewiki.net. [toolforge-repos/ranker] - https://gitlab.wikimedia.org/toolforge-repos/ranker/-/merge_requests/25 (owner: l10n-bot)
[15:13:11] (approved) lucaswerkmeister: Localisation updates from https://translatewiki.net. [toolforge-repos/lexeme-forms] - https://gitlab.wikimedia.org/toolforge-repos/lexeme-forms/-/merge_requests/14 (owner: l10n-bot)
[15:14:54] (update) raymond-ndibe: [helm_publish, image_publish]: skip harbor helm and image publish if PR owner not repo owner [repos/cloud/cicd/gitlab-ci] - https://gitlab.wikimedia.org/repos/cloud/cicd/gitlab-ci/-/merge_requests/61 (https://phabricator.wikimedia.org/T394595)
[15:15:16] (update) raymond-ndibe: [DO NOT MERGE] testing gitlab ci changes [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/210
[15:16:03] (merge) lucaswerkmeister: Localisation updates from https://translatewiki.net. [toolforge-repos/lexeme-forms] - https://gitlab.wikimedia.org/toolforge-repos/lexeme-forms/-/merge_requests/14 (owner: l10n-bot)
[15:18:28] FIRING: WidespreadPuppetAgentFailure: Widespread puppet agent failures in project cloudinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DWidespreadPuppetAgentFailure
[15:20:46] (update) raymond-ndibe: [helm_publish, image_publish]: skip harbor helm and image publish if PR owner not repo owner [repos/cloud/cicd/gitlab-ci] - https://gitlab.wikimedia.org/repos/cloud/cicd/gitlab-ci/-/merge_requests/61 (https://phabricator.wikimedia.org/T394595)
[15:21:03] (update) raymond-ndibe: [DO NOT MERGE] testing gitlab ci changes [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/210
[15:22:28] FIRING: WidespreadPuppetAgentFailure: Widespread puppet agent failures in project gitlab-runners - https://prometheus-alerts.wmcloud.org/?q=alertname%3DWidespreadPuppetAgentFailure
[15:33:28] RESOLVED: WidespreadPuppetAgentFailure: Widespread puppet agent failures in project cloudinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DWidespreadPuppetAgentFailure
[15:38:13] (update) raymond-ndibe: [helm_publish, image_publish]: skip harbor helm and image publish if PR owner not repo owner [repos/cloud/cicd/gitlab-ci] - https://gitlab.wikimedia.org/repos/cloud/cicd/gitlab-ci/-/merge_requests/61 (https://phabricator.wikimedia.org/T394595)
[15:39:18] (update) raymond-ndibe: [DO NOT MERGE] testing gitlab ci changes [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/210
[15:42:28] RESOLVED: WidespreadPuppetAgentFailure: Widespread puppet agent failures in project gitlab-runners - https://prometheus-alerts.wmcloud.org/?q=alertname%3DWidespreadPuppetAgentFailure
[15:47:39] Cloud-VPS (Quota-requests), Content-Transform-Team (Work In Progress): Quote increase request for wikitextexp - https://phabricator.wikimedia.org/T403114#11148660 (ssastry) >>! In T403114#11128863, @dcaro wrote: > +1 > > For the disk, it's not possible to shrink an existing volume, so the process he...
[15:54:14] Cloud-VPS (Quota-requests), Content-Transform-Team (Work In Progress): Quote increase request for wikitextexp - https://phabricator.wikimedia.org/T403114#11148684 (fnegri) a:fgiunchedi→fnegri Claiming as I'm now on clinic duty. I'll lower the quota.
[15:54:19] Cloud-VPS (Quota-requests), Content-Transform-Team (Work In Progress): Quote increase request for wikitextexp - https://phabricator.wikimedia.org/T403114#11148686 (fnegri) Resolved→In progress
[16:20:20] cloud-services-team, Toolforge: [components-api] restart rather than delete/create continuous jobs - https://phabricator.wikimedia.org/T403321#11148800 (DamianZaremba) Another side effect of this I just found; ` tools.cluebotng-review@tools-bastion-13:~$ toolforge jobs list +-----------------------------...
[16:26:29] (update) raymond-ndibe: [helm_publish, image_publish]: skip harbor helm and image publish if PR owner not repo owner [repos/cloud/cicd/gitlab-ci] - https://gitlab.wikimedia.org/repos/cloud/cicd/gitlab-ci/-/merge_requests/61 (https://phabricator.wikimedia.org/T394595)
[16:27:15] (update) raymond-ndibe: [DO NOT MERGE] testing gitlab ci changes [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/210
[16:32:03] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.bootstrap_and_add (T395910)
[16:32:11] T395910: cloudcephosd10[48-52] service implementation - https://phabricator.wikimedia.org/T395910
[16:35:25] PROBLEM - Host cloudcephosd1049 is DOWN: PING CRITICAL - Packet loss = 100%
[16:35:53] RECOVERY - Host cloudcephosd1049 is UP: PING OK - Packet loss = 0%, RTA = 0.42 ms
[16:36:08] Cloud-VPS (Debian Bullseye Deprecation), The-Wikipedia-Library, Epic, Moderator-Tools-Team (Kanban): hashtags: Replace deprecated Bullseye VM in Cloud VPS - https://phabricator.wikimedia.org/T402056#11148897 (Kgraessle) In progress→Resolved
[16:36:16] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=99) (T395910)
[16:36:29] cloud-services-team, Toolforge: [jobs-api] Allow customizing time to request Loki logs for - https://phabricator.wikimedia.org/T400917#11148900 (dcaro) p:Medium→High
[16:36:44] cloud-services-team, Toolforge: [components-api] restart rather than delete/create continuous jobs - https://phabricator.wikimedia.org/T403321#11148902 (dcaro) Yep, the time limiting is a big issue, I'll reprioritize.
[16:39:09] FIRING: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[16:39:41] (update) raymond-ndibe: [helm_publish, image_publish]: skip harbor helm and image publish if PR owner not repo owner [repos/cloud/cicd/gitlab-ci] - https://gitlab.wikimedia.org/repos/cloud/cicd/gitlab-ci/-/merge_requests/61 (https://phabricator.wikimedia.org/T394595)
[16:40:08] (update) raymond-ndibe: [DO NOT MERGE] testing gitlab ci changes [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/210
[16:41:10] (update) raymond-ndibe: [helm_publish, image_publish]: skip harbor helm and image publish if PR owner not repo owner [repos/cloud/cicd/gitlab-ci] - https://gitlab.wikimedia.org/repos/cloud/cicd/gitlab-ci/-/merge_requests/61 (https://phabricator.wikimedia.org/T394595)
[16:44:09] RESOLVED: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[16:51:14] (PS4) Rehan_khan_78: Split main.css into multiple CSS files by page and update templates [labs/tools/Isa] - https://gerrit.wikimedia.org/r/1181245
[16:52:38] Tools: `mariadb` image changed behaviour around Aug 28 - https://phabricator.wikimedia.org/T403755#11148993 (Reedy)
[16:55:57] Toolforge (Toolforge iteration 24): jobs are being updated (deleted/created) when no changes present - https://phabricator.wikimedia.org/T403760 (DamianZaremba) NEW
[16:56:14] cloud-services-team, Toolforge: jobs are being updated (deleted/created) when no changes present - https://phabricator.wikimedia.org/T403760#11149041 (DamianZaremba)
[16:56:25] cloud-services-team, Toolforge: [jobs-cli] jobs are being updated (deleted/created) when no changes present - https://phabricator.wikimedia.org/T403760#11149044 (DamianZaremba)
[16:57:38] cloud-services-team, Toolforge: `mariadb` image changed behaviour around Aug 28 - https://phabricator.wikimedia.org/T403755#11149053 (DamianZaremba)
[16:58:44] cloud-services-team, Toolforge: [jobs-cli] jobs are being updated (deleted/created) when no changes present - https://phabricator.wikimedia.org/T403760#11149062 (dcaro) The slowness might be related to it recreating the jobs, it has to wait for the pods to go away. This might be a regression, looking
[16:58:46] cloud-services-team, Toolforge: `mariadb` image changed behaviour around Aug 28 - https://phabricator.wikimedia.org/T403755#11149064 (DamianZaremba) https://github.com/cluebotng/reviewer/commit/897c1c7a0713b0f68b4fdb269fde90dd76995e96 & https://github.com/cluebotng/utilities/compare/350f39b9e59bfd1758c91...
[17:00:16] cloud-services-team, Toolforge: `mariadb` image changed behaviour around Aug 28 - https://phabricator.wikimedia.org/T403755#11149096 (dcaro) We added 'disable-ssl' to the default config for mariadb/mysql cli, as that was causing issues when connecting to wikireplicas and toolsdb. Let me manually change...
[17:00:59] cloud-services-team, Toolforge: [jobs-cli] jobs are being updated (deleted/created) when no changes present - https://phabricator.wikimedia.org/T403760#11149111 (dcaro) Btw. we have tests for this, so if it's a regression the tests have a bug, but still, worth checking. What's in your jobs.yaml?
[17:02:53] cloud-services-team, Toolforge: `mariadb` image changed behaviour around Aug 28 - https://phabricator.wikimedia.org/T403755#11149136 (dcaro) @DamianZaremba Changed, can you verify that now it works?
[17:03:56] cloud-services-team, Toolforge: `mariadb` image changed behaviour around Aug 28 - https://phabricator.wikimedia.org/T403755#11149140 (dcaro) Wait, is this hapenning now for you? or it happened before yesterday (when I did the changes in the default config)
[17:05:25] cloud-services-team, Toolforge: `mariadb` image changed behaviour around Aug 28 - https://phabricator.wikimedia.org/T403755#11149144 (dcaro) (added the `disable-ssl` to the `replica.my.cnf` file for you to test with the new config)
[17:05:26] cloud-services-team, Toolforge: [jobs-cli] jobs are being updated (deleted/created) when no changes present - https://phabricator.wikimedia.org/T403760#11149145 (DamianZaremba) > What's in your jobs.yaml? https://github.com/cluebotng/utilities/blob/main/jobs/cluebotng.yaml or https://github.com/cluebot...
[17:07:15] cloud-services-team, Toolforge: [jobs-cli] jobs are being updated (deleted/created) when no changes present - https://phabricator.wikimedia.org/T403760#11149149 (dcaro) It does not happen for every job, looking: ` tools.wm-lol@tools-bastion-13:~$ toolforge jobs load myjobs.yaml INFO: loading job 'logrot...
[17:07:49] cloud-services-team, Toolforge: `mariadb` image changed behaviour around Aug 28 - https://phabricator.wikimedia.org/T403755#11149155 (bd808) https://gerrit.wikimedia.org/r/c/operations/docker-images/toollabs-images/+/1182805 was the change to the mariadb base image that would have caused this behavioral...
[17:10:30] cloud-services-team, Toolforge: [jobs-cli] jobs are being updated (deleted/created) when no changes present - https://phabricator.wikimedia.org/T403760#11149174 (DamianZaremba) There is no change in the runtime config: ` tools.cluebotng-review@tools-bastion-13:~$ kubectl get deployment redis -oyaml > pre...
[17:12:49] cloud-services-team, Toolforge: [jobs-cli] jobs are being updated (deleted/created) when no changes present - https://phabricator.wikimedia.org/T403760#11149179 (DamianZaremba) There is no change in the runtime config: ` tools.cluebotng-review@tools-bastion-13:~$ sha256sum jobs.yaml; toolforge jobs load...
[17:14:55] cloud-services-team, Toolforge: [jobs-cli] jobs are being updated (deleted/created) when no changes present - https://phabricator.wikimedia.org/T403760#11149186 (dcaro) @DamianZaremba I think I have the fix, the issue is that the cpu and memory defaults are not being stripped of correctly. Sending a patch.
[17:16:12] cloud-services-team, Toolforge: [jobs-cli] jobs are being updated (deleted/created) when no changes present - https://phabricator.wikimedia.org/T403760#11149189 (DamianZaremba) Entry in `jobs.yaml`: ` - name: irc-relay image: tool-cluebotng-review/irc-relay:latest command: run-relay continuous: tru...
[17:17:01] cloud-services-team, Toolforge: [jobs-cli] jobs are being updated (deleted/created) when no changes present - https://phabricator.wikimedia.org/T403760#11149192 (DamianZaremba) > @DamianZaremba I think I have the fix, the issue is that the cpu and memory defaults are not being stripped of correctly. Send...
[17:19:30] (open) dcaro: get_job_from_k8s: use the correct cpu and mem defaults [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/211
[17:20:37] (open) dcaro: jobs: add test to check manually created jobs don't get overwritten [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/955
[17:24:08] cloud-services-team, Toolforge: `mariadb` image changed behaviour around Aug 28 - https://phabricator.wikimedia.org/T403755#11149234 (DamianZaremba) > Wait, is this hapenning now for you? or it happened before yesterday (when I did the changes in the default config) It looks like this did get fixed for...
[17:24:41] cloud-services-team, Toolforge: `mariadb` image changed behaviour around Aug 28 - https://phabricator.wikimedia.org/T403755#11149236 (DamianZaremba) > solve general usage (When NFS is mounted and not using envvars)
[17:28:48] (update) dcaro: get_job_from_k8s: use the correct cpu and mem defaults [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/211
[17:29:55] cloud-services-team, Toolforge: `mariadb` image changed behaviour around Aug 28 - https://phabricator.wikimedia.org/T403755#11149261 (dcaro) > (When NFS is mounted and not using envvars) Unfortunately yes, that's why the task is still open
[17:30:27] cloud-services-team, Toolforge: `mariadb` image changed behaviour around Aug 28 - https://phabricator.wikimedia.org/T403755#11149267 (dcaro) →Duplicate dup:T401861
[17:30:34] cloud-services-team, Cloud-VPS, Toolforge: Enable SSL in Trove MariaDB - Trixie MariaDB client requires SSL but SSL is not enabled in the Trove server - https://phabricator.wikimedia.org/T401861#11149269 (dcaro)
[17:53:00] Tool-gawa: [Code] Conception de la page STATISTIQUES - https://phabricator.wikimedia.org/T401767#11149371 (PenScribe) In progress→Resolved
[18:04:06] (update) dcaro: get_job_from_k8s: use the correct cpu and mem defaults [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/211
[18:07:08] (update) dcaro: get_job_from_k8s: use the correct cpu and mem defaults [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/211
[18:09:38] (open) dcaro: toolforge_deploy_mr: also wait when pipeline is creating [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/272
[18:38:49] (approved) dcaro: jobs: add test to check manually created jobs don't get overwritten [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/955
[18:38:52] (merge) dcaro: jobs: add test to check manually created jobs don't get overwritten [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/955
[18:38:54] (approved) dcaro: get_job_from_k8s: use the correct cpu and mem defaults [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/211
[18:39:13] (merge) dcaro: get_job_from_k8s: use the correct cpu and mem defaults [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/211
[18:41:32] (open) group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: jobs-api: bump to 0.0.412-20250904183922-18515223 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/956 (https://phabricator.wikimedia.org/T403760)
[18:44:00] !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
[18:54:20] (update) dcaro: [jobs-api] split job models to oneoff, scheduled and continuous [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/154 (https://phabricator.wikimedia.org/T389118 https://phabricator.wikimedia.org/T390136) (owner: raymond-ndibe)
[18:58:48] !log dcaro@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
[18:59:22] !log dcaro@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
[18:59:39] (update) dcaro: [jobs-api] split job models to oneoff, scheduled and continuous [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/154 (https://phabricator.wikimedia.org/T389118 https://phabricator.wikimedia.org/T390136) (owner: raymond-ndibe)
[19:06:26] (open) perry: help: use correct repo link [toolforge-repos/jouncebot] - https://gitlab.wikimedia.org/toolforge-repos/jouncebot/-/merge_requests/6
[19:15:02] !log dcaro@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
[19:16:16] (approved) dcaro: jobs-api: bump to 0.0.412-20250904183922-18515223 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/956 (https://phabricator.wikimedia.org/T403760) (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[19:16:20] (merge) dcaro: jobs-api: bump to 0.0.412-20250904183922-18515223 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/956 (https://phabricator.wikimedia.org/T403760) (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[19:17:12] cloud-services-team, Toolforge, Patch-For-Review: [jobs-cli] jobs are being updated (deleted/created) when no changes present - https://phabricator.wikimedia.org/T403760#11149712 (dcaro) @DamianZaremba So this should be fixed now, can you test? (I added also the missing test to the functional tests f...
[19:21:03] (open) raymond-ndibe: [toolforge_deploy_mr.py] support deploy of MRs from external contributors [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/273 (https://phabricator.wikimedia.org/T394595)
[19:22:11] (update) raymond-ndibe: [toolforge_deploy_mr.py] support deploy of MRs from external contributors [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/273 (https://phabricator.wikimedia.org/T394595)
[19:24:53] (update) dcaro: [jobs-api] split job models to oneoff, scheduled and continuous [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/154 (https://phabricator.wikimedia.org/T389118 https://phabricator.wikimedia.org/T390136) (owner: raymond-ndibe)
[19:24:55] cloud-services-team, Toolforge, Patch-For-Review: [jobs-cli] jobs are being updated (deleted/created) when no changes present - https://phabricator.wikimedia.org/T403760#11149743 (DamianZaremba) Unfortunately it still appears to be happening; ` tools.cluebotng-review@tools-bastion-13:~$ cat jobs.yam...
[19:27:34] cloud-services-team, Toolforge, Patch-For-Review: [jobs-cli] jobs are being updated (deleted/created) when no changes present - https://phabricator.wikimedia.org/T403760#11149751 (dcaro) hmm, I think the `filelog` might also not being parsed ok, looking
[19:35:11] (open) dcaro: get_job_from_k8s: remove correctly the default filelog [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/212
[19:50:25] cloud-services-team, Toolforge: [jobs-cli] jobs are being updated (deleted/created) when no changes present - https://phabricator.wikimedia.org/T403760#11149837 (DamianZaremba) I'm out of time for now, but feel free to load the jobs on `cluebotng-review` as above for testing; it will cause the bot to dis...
[20:41:15] PAWS: New upstream release for Wikimedia Commons Extension for OpenRefine - https://phabricator.wikimedia.org/T403780#11150002 (LibUp-bot)
[22:37:00] cloud-services-team (FY2025/26-Q1), Data-Services: [wikireplicas] slow query runs every hour, but never completes - https://phabricator.wikimedia.org/T403639#11150476 (dschwen) A test run finished in just under 4h.