[00:01:08] (open) group_203_bot_4866fc124f4b41659f667468a6115cf3: builds-builder: bump to 0.0.125-20250219235943-91ad05c3 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/671 (https://phabricator.wikimedia.org/T384327)
[00:26:41] (open) raymond-ndibe: [builds-builder] create and use maintain-harbor robot account [repos/cloud/toolforge/builds-builder] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-builder/-/merge_requests/66 (https://phabricator.wikimedia.org/T361698)
[00:29:24] (update) raymond-ndibe: [builds-builder] create and use maintain-harbor robot account [repos/cloud/toolforge/builds-builder] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-builder/-/merge_requests/66 (https://phabricator.wikimedia.org/T361698)
[00:33:13] wikitech.wikimedia.org: ☂ Wikitech account linking and SUL error reporting - https://phabricator.wikimedia.org/T376267#10566153 (Gryllida) Works now, was able to login with Wikimedia SUL account into Wikitech. Thank you.
[00:35:29] Tools, Privacy: wmtran Tool Loads External Resource - https://phabricator.wikimedia.org/T333894#10566156 (Gryllida) Hello, 1. This is my tool, why wasn't I assigned to this task immediately? 2. Can I use jQuery if I save a copy locally in toolforge? Thanks
[01:02:39] (update) raymond-ndibe: [builds-builder] create and use maintain-harbor robot account [repos/cloud/toolforge/builds-builder] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-builder/-/merge_requests/66 (https://phabricator.wikimedia.org/T361698)
[01:08:40] (update) raymond-ndibe: [builds-builder] create and use maintain-harbor robot account [repos/cloud/toolforge/builds-builder] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-builder/-/merge_requests/66 (https://phabricator.wikimedia.org/T361698)
[01:12:58] (open) raymond-ndibe: [toolforge-deploy] maintain-harbor use robot account [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/672 (https://phabricator.wikimedia.org/T361698)
[02:03:10] (update) raymond-ndibe: [toolforge-deploy] maintain-harbor use robot account [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/672 (https://phabricator.wikimedia.org/T361698)
[02:08:58] (update) raymond-ndibe: [toolforge-deploy] maintain-harbor use robot account [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/672 (https://phabricator.wikimedia.org/T361698)
[02:44:29] (open) raymond-ndibe: [builds-api] use maintain-harbor robot account locally [repos/cloud/toolforge/builds-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-api/-/merge_requests/119 (https://phabricator.wikimedia.org/T361698)
[03:12:48] (update) raymond-ndibe: [builds-api] use maintain-harbor robot account locally [repos/cloud/toolforge/builds-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-api/-/merge_requests/119 (https://phabricator.wikimedia.org/T361698)
[03:35:20] (update) raymond-ndibe: [builds-builder] create and use maintain-harbor robot account [repos/cloud/toolforge/builds-builder] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-builder/-/merge_requests/66 (https://phabricator.wikimedia.org/T361698)
[08:53:00] cloud-services-team, Cloud-VPS: [monitoring] KernelErrors alerts trigger incorrectly when a host is reimaged - https://phabricator.wikimedia.org/T386850#10566876 (aborrero) Open→Resolved a:aborrero
[08:59:07] cloud-services-team, Cloud-VPS: [monitoring] KernelErrors alerts trigger incorrectly when a host is reimaged - https://phabricator.wikimedia.org/T386850#10566880 (aborrero) Resolved→In progress p:Triage→Medium Resolved by mistake, reopening now, and I'll work on a fix.
[09:21:23] cloud-services-team, Cloud-VPS, VPS-Projects, Catalyst: metricsinfra: send alerts for the catalyst project to catalyst@w.o email - https://phabricator.wikimedia.org/T386416#10566934 (dcaro) > This sounds good - provided it will not cause any functionality issues, can we give adding the special co...
[09:34:56] cloud-services-team, Cloud-VPS, VPS-Projects, Catalyst: metricsinfra: send alerts for the catalyst project to catalyst@w.o email - https://phabricator.wikimedia.org/T386416#10566954 (dcaro) This should kinda do it: ` MariaDB [prometheusconfig]> select * from projects as p join contact_groups as c...
[09:37:21] cloud-services-team, Cloud-VPS, VPS-Projects, Catalyst: metricsinfra: send alerts for the catalyst project to catalyst@w.o email - https://phabricator.wikimedia.org/T386416#10566956 (taavi) > Now, you won't be able to silence alerts unless I add an acl_group (or if you do it from production alert...
[09:45:26] cloud-services-team: FQDN virt.cloudgw.eqiad1.wikimediacloud.org is missing - https://phabricator.wikimedia.org/T386907 (aborrero) NEW
[09:46:03] cloud-services-team: FQDN virt.cloudgw.eqiad1.wikimediacloud.org is missing - https://phabricator.wikimedia.org/T386907#10566972 (aborrero) Open→In progress p:Triage→High
[09:49:29] cloud-services-team, Cloud-VPS: FQDN virt.cloudgw.eqiad1.wikimediacloud.org is missing - https://phabricator.wikimedia.org/T386907#10566977 (taavi)
[09:50:41] cloud-services-team, Cloud-VPS, VPS-Projects, Catalyst: metricsinfra: send alerts for the catalyst project to catalyst@w.o email - https://phabricator.wikimedia.org/T386416#10566978 (dcaro) >>! In T386416#10566956, @taavi wrote: >> Now, you won't be able to silence alerts unless I add an acl_grou...
[09:54:17] cloud-services-team, Cloud-VPS: FQDN virt.cloudgw.eqiad1.wikimediacloud.org is missing - https://phabricator.wikimedia.org/T386907#10566981 (aborrero) Deleted by `sre_bot` https://netbox.wikimedia.org/extras/changelog/210721/ I tried re-creating: https://netbox.wikimedia.org/extras/changelog/210827/
[09:54:29] cloud-services-team, Cloud-VPS, VPS-Projects, Catalyst: metricsinfra: send alerts for the catalyst project to catalyst@w.o email - https://phabricator.wikimedia.org/T386416#10566982 (dcaro) Open→Resolved a:dcaro I'll close this task, but please test (ex. creating a VM and stopping...
[10:24:53] cloud-services-team, Cloud-VPS, Patch-For-Review: [monitoring] KernelErrors alerts trigger incorrectly when a host is reimaged - https://phabricator.wikimedia.org/T386850#10567046 (fnegri) @aborrero thanks for fixing the "missing directory" problem. My patch above should get rid of the other false po...
[10:25:57] cloud-services-team, Cloud-VPS: FQDN virt.cloudgw.eqiad1.wikimediacloud.org is missing - https://phabricator.wikimedia.org/T386907#10567047 (aborrero) In progress→Resolved a:aborrero I think the FQDN was removed by either the reimage or the decomission worklows, and that is a bit concern...
[10:32:04] Cloud Services Proposals, cloud-services-team, Cloud-VPS: Decision Request - How openstack projects relate to tofu-infra - https://phabricator.wikimedia.org/T385604#10567062 (dcaro) I would like to see the 'TBD' parts filled up before expressing an opinion. Some ideas to fill them up: I think it wo...
[10:59:42] cloud-services-team: KernelErrors Server cloudgw1003 logged kernel errors - https://phabricator.wikimedia.org/T386865#10567099 (aborrero) Open→Resolved a:aborrero
[11:24:37] (update) aborrero: Draft: test new project module [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/93 (https://phabricator.wikimedia.org/T375283) (owner: fnegri)
[11:27:21] cloud-services-team, Cloud-VPS: FQDN virt.cloudgw.eqiad1.wikimediacloud.org is missing - https://phabricator.wikimedia.org/T386907#10567149 (Andrew) Here is the decom script deleting it: {F58433379}
[11:28:11] (update) aborrero: Draft: test new project module [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/93 (https://phabricator.wikimedia.org/T375283) (owner: fnegri)
[11:38:11] (update) aborrero: Draft: test new project module [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/93 (https://phabricator.wikimedia.org/T375283) (owner: fnegri)
[12:00:51] (open) aborrero: volumes: mount /etc/openstack/clouds.yaml [repos/cloud/toolforge/volume-admission] - https://gitlab.wikimedia.org/repos/cloud/toolforge/volume-admission/-/merge_requests/24 (https://phabricator.wikimedia.org/T379030)
[12:03:03] (CR) Andrew Bogott: Get openstack project list from keystone (1 comment) [labs/tools/stashbot] - https://gerrit.wikimedia.org/r/1093997 (https://phabricator.wikimedia.org/T379030) (owner: Andrew Bogott)
[12:03:45] (PS6) Andrew Bogott: Get openstack project list from keystone [labs/tools/stashbot] - https://gerrit.wikimedia.org/r/1093997 (https://phabricator.wikimedia.org/T379030)
[12:08:39] cloud-services-team: KernelErrors Server cloudgw1001 logged kernel errors - https://phabricator.wikimedia.org/T386852#10567312 (fnegri) Open→Resolved a:fnegri cloudgw1001 is being decommissioned in {T386810}.
[12:09:41] (open) aborrero: toolforge: introduce /etc/openstack/clouds.yaml [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/228 (https://phabricator.wikimedia.org/T379030)
[12:17:10] cloud-services-team (FY2024/2025-Q3-Q4), Toolforge, Patch-For-Review: [toolsdb] Remove apt pinning and upgrade to latest version - https://phabricator.wikimedia.org/T385885#10567335 (fnegri) I haven't upgraded the primary yet. I will do the upgrade next Monday so I can give some advance notice to users.
[12:23:24] cloud-services-team, Toolforge: [lima-kilo] when using "--ha", some containers are not restarting after restarting the VM - https://phabricator.wikimedia.org/T385082#10567360 (fnegri)
[12:23:25] cloud-services-team (FY2024/2025-Q3-Q4), Toolforge (Toolforge iteration 17): [infra,k8s] Upgrade Toolforge Kubernetes to version 1.29 - https://phabricator.wikimedia.org/T362868#10567361 (fnegri)
[12:49:07] cloud-services-team, Toolforge (Toolforge iteration 17): [k8s, infra] update pause image to 3.9 - https://phabricator.wikimedia.org/T374193#10567431 (fnegri) In progress→Resolved Verified that `tools-k8s-worker-nfs-55` (which was rebooted recently) is now using pause:3.9: ` fnegri@tools-k8s-...
[12:50:50] (open) aborrero: kyverno: add cluster-wide-policy [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/673 (https://phabricator.wikimedia.org/T386921)
[13:01:23] cloud-services-team, Toolforge (Toolforge iteration 17): [infra,k8s] remove deprecated kubelet flags before 1.28 upgrade (we might be able to remove all custom ones) - https://phabricator.wikimedia.org/T370245#10567485 (fnegri) In progress→Resolved I merged the patch above, and Puppet removed...
[13:07:26] cloud-services-team (FY2024/2025-Q3-Q4), Toolforge (Toolforge iteration 17), Patch-For-Review: [infra,k8s] Upgrade Toolforge Kubernetes to version 1.28 - https://phabricator.wikimedia.org/T362867#10567511 (fnegri)
[13:07:32] cloud-services-team (FY2024/2025-Q1-Q2), Toolforge (Toolforge iteration 15): [infra,k8s] Upgrade Toolforge Kubernetes to version 1.27 - https://phabricator.wikimedia.org/T359641#10567512 (fnegri)
[13:08:18] cloud-services-team (FY2024/2025-Q3-Q4), Toolforge (Toolforge iteration 17), Patch-For-Review: [infra,k8s] Upgrade Toolforge Kubernetes to version 1.28 - https://phabricator.wikimedia.org/T362867#10567513 (fnegri) In progress→Resolved We've been running 1.28 for a while, and all subtasks...
[13:14:29] (reopen) raymond-ndibe: jobs: add job for managing harbor quotas [repos/cloud/toolforge/maintain-harbor] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-harbor/-/merge_requests/22 (https://phabricator.wikimedia.org/T352417) (owner: sstefanova)
[13:16:57] (CR) Andrew Bogott: [C:-1] "...or maybe not? The plan in shifting" [labs/tools/stashbot] - https://gerrit.wikimedia.org/r/1093997 (https://phabricator.wikimedia.org/T379030) (owner: Andrew Bogott)
[13:18:13] !log dcaro@urcuchillay tools START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer (T320284)
[13:18:18] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[13:18:18] T320284: [jobs-api,jobs-emailer] Prometheus monitoring toolforge-jobs server side components - https://phabricator.wikimedia.org/T320284
[13:26:35] !log dcaro@urcuchillay tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer (T320284)
[13:26:39] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[13:26:40] T320284: [jobs-api,jobs-emailer] Prometheus monitoring toolforge-jobs server side components - https://phabricator.wikimedia.org/T320284
[13:27:26] (update) aborrero: kyverno: add cluster-wide-policy [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/673 (https://phabricator.wikimedia.org/T386921)
[13:27:48] (approved) dcaro: jobs-emailer: bump to 0.0.52-20250219170643-a14ae54d [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/670 (https://phabricator.wikimedia.org/T320284 https://phabricator.wikimedia.org/T379924) (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[13:27:53] (merge) dcaro: jobs-emailer: bump to 0.0.52-20250219170643-a14ae54d [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/670 (https://phabricator.wikimedia.org/T320284 https://phabricator.wikimedia.org/T379924) (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[13:30:09] Toolforge (Toolforge iteration 17), Patch-For-Review: [jobs-emailer] http requests are blocked by the loops - https://phabricator.wikimedia.org/T379924#10567603 (dcaro) In progress→Resolved
[13:34:28] (update) raymond-ndibe: jobs: add job for managing harbor quotas [repos/cloud/toolforge/maintain-harbor] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-harbor/-/merge_requests/22 (https://phabricator.wikimedia.org/T352417) (owner: sstefanova)
[13:34:34] (update) raymond-ndibe: jobs: add job for managing harbor quotas [repos/cloud/toolforge/maintain-harbor] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-harbor/-/merge_requests/22 (https://phabricator.wikimedia.org/T352417) (owner: sstefanova)
[13:36:03] (update) aborrero: kyverno: add cluster-wide-policy [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/673 (https://phabricator.wikimedia.org/T386921)
[13:54:39] FIRING: [3x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[13:59:39] RESOLVED: [4x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[14:03:24] cloud-services-team, Toolforge, Privacy: Make tools-static fontcdn/ and cdnjs/ redact UA - https://phabricator.wikimedia.org/T210959#10567752 (stjn) Yeah, I also think that the proper way to deal with this is to pass the uniform user agent for all requests that would just output WOFF2 fonts and nothi...
[14:09:36] (open) dcaro: jobs-emailer: add basic alerts [repos/cloud/toolforge/alerts] - https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/28
[14:15:13] (update) dcaro: jobs-emailer: add basic alerts [repos/cloud/toolforge/alerts] - https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/28
[14:28:30] (update) dcaro: scheduled job: add timeout parameter [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/76 (https://phabricator.wikimedia.org/T306391)
[14:45:14] (update) dcaro: scheduled job: add timeout parameter [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/76 (https://phabricator.wikimedia.org/T306391)
[15:15:39] (update) dcaro: scheduled jobs: add timeout option [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/129 (https://phabricator.wikimedia.org/T306391)
[15:26:44] (update) dcaro: scheduled job: add timeout parameter [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/76 (https://phabricator.wikimedia.org/T306391)
[15:27:55] (open) aborrero: k9s: bump version to 0.40.5 [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/229
[15:28:24] (approved) aborrero: jobs-emailer: add basic alerts [repos/cloud/toolforge/alerts] - https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/28 (owner: dcaro)
[15:30:32] (approved) dcaro: k9s: bump version to 0.40.5 [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/229 (owner: aborrero)
[15:30:59] (merge) dcaro: jobs-emailer: add basic alerts [repos/cloud/toolforge/alerts] - https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/28
[15:31:35] cloud-services-team, Toolforge (Toolforge iteration 17), Patch-For-Review: [jobs-api,jobs-emailer] Prometheus monitoring toolforge-jobs server side components - https://phabricator.wikimedia.org/T320284#10568119 (dcaro)
[15:37:14] (open) aborrero: ansible: update YAML output setting [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/230
[15:39:35] cloud-services-team, Toolforge (Toolforge iteration 17), Patch-For-Review: [jobs-api,jobs-emailer] Prometheus monitoring toolforge-jobs server side components - https://phabricator.wikimedia.org/T320284#10568150 (dcaro) Added them to the toolforge overview dashboard: {F58435136}
[15:39:50] cloud-services-team, Toolforge (Toolforge iteration 17), Patch-For-Review: [jobs-api,jobs-emailer] Prometheus monitoring toolforge-jobs server side components - https://phabricator.wikimedia.org/T320284#10568152 (dcaro) In progress→Resolved
[15:40:28] cloud-services-team, Cloud-VPS, Patch-For-Review: openstack: wmfkeystonehooks: project ids rather than names are being used in LDAP group creation - https://phabricator.wikimedia.org/T379030#10568154 (Andrew) Here is the list of VMs currently running with project IDs in the fqdn/puppet cert: ` 664e...
[15:41:23] (approved) dcaro: ansible: update YAML output setting [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/230 (owner: aborrero)
[15:52:12] (update) dcaro: scheduled jobs: add timeout option [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/129 (https://phabricator.wikimedia.org/T306391)
[16:24:41] (update) dcaro: scheduled jobs: add timeout option [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/129 (https://phabricator.wikimedia.org/T306391)
[16:38:35] Toolforge (Toolforge iteration 17): [maintain-harbor] refactor config handling - https://phabricator.wikimedia.org/T386953 (Raymond_Ndibe) NEW
[16:39:06] Toolforge (Toolforge iteration 17): [maintain-harbor] refactor config handling - https://phabricator.wikimedia.org/T386953#10568407 (Raymond_Ndibe)
[16:39:11] Toolforge (Toolforge iteration 17), Patch-For-Review, Upstream: [maintain-harbor] Manage project quotas via maintain-harbor - https://phabricator.wikimedia.org/T352417#10568408 (Raymond_Ndibe)
[16:39:21] Toolforge (Toolforge iteration 17): [maintain-harbor] refactor config handling - https://phabricator.wikimedia.org/T386953#10568412 (Raymond_Ndibe) Open→In progress
[16:40:45] (open) raymond-ndibe: [maintain-harbor] move image_retention_policy_template to config [repos/cloud/toolforge/maintain-harbor] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-harbor/-/merge_requests/44 (https://phabricator.wikimedia.org/T386953)
[16:40:54] (update) raymond-ndibe: [maintain-harbor] move image_retention_policy_template to config [repos/cloud/toolforge/maintain-harbor] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-harbor/-/merge_requests/44 (https://phabricator.wikimedia.org/T386953)
[16:48:30] (open) raymond-ndibe: [do-not-merge][lima-kilo] test maintain-harbor robot account [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/231 (https://phabricator.wikimedia.org/T361698)
[16:50:40] (update) raymond-ndibe: [toolforge-deploy] maintain-harbor use robot account [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/672 (https://phabricator.wikimedia.org/T361698)
[16:51:01] (update) raymond-ndibe: [do-not-merge][lima-kilo] test maintain-harbor robot account [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/231 (https://phabricator.wikimedia.org/T361698)
[16:52:04] (update) raymond-ndibe: [do-not-merge][lima-kilo] test maintain-harbor robot account [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/231 (https://phabricator.wikimedia.org/T361698)
[16:54:20] (update) raymond-ndibe: [do-not-merge][lima-kilo] test maintain-harbor robot account [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/231 (https://phabricator.wikimedia.org/T361698)
[16:59:09] (update) raymond-ndibe: [do-not-merge][lima-kilo] test maintain-harbor robot account [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/231 (https://phabricator.wikimedia.org/T361698)
[17:01:53] (update) aborrero: Draft: kyverno: add cluster-wide-policy [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/673 (https://phabricator.wikimedia.org/T386921)
[17:07:35] cloud-services-team, Toolforge, Epic: [toolforge,jobs-api,webservice,storage] Provide modern, non-NFS log solution for Toolforge tools - https://phabricator.wikimedia.org/T127367#10568497 (dcaro)
[17:24:24] cloud-services-team, Cloud-VPS, DC-Ops, ops-eqiad, SRE: [Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 4 on cloudvirt1047 - https://phabricator.wikimedia.org/T386083#10568570 (Jhancock.wm) a:VRiley-WMF
[17:25:16] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.unset_maintenance (T386083)
[17:25:22] T386083: [Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 4 on cloudvirt1047 - https://phabricator.wikimedia.org/T386083
[17:25:25] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.unset_maintenance (exit_code=0) (T386083)
[17:25:53] cloud-services-team, Cloud-VPS, DC-Ops, ops-eqiad, SRE: [Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 4 on cloudvirt1047 - https://phabricator.wikimedia.org/T386083#10568579 (Andrew) Resolved→Invalid I'm putting this host back in service and closing the...
[18:00:30] cloud-services-team, Toolforge (Toolforge iteration 17): [infra,k8s] remove deprecated kubelet flags before 1.28 upgrade (we might be able to remove all custom ones) - https://phabricator.wikimedia.org/T370245#10568742 (Raymond_Ndibe) There is an issue here. I just checked and `containerRuntimeEndpoi...
[18:20:45] cloud-services-team, Toolforge (Toolforge iteration 17): [infra,k8s] remove deprecated kubelet flags before 1.28 upgrade (we might be able to remove all custom ones) - https://phabricator.wikimedia.org/T370245#10568824 (Raymond_Ndibe) >>! In T370245#10568742, @Raymond_Ndibe wrote: > There is an issue...
[18:22:46] cloud-services-team (FY2024/2025-Q3-Q4), Toolforge (Toolforge iteration 17), Patch-For-Review: [infra,k8s] Upgrade Toolforge Kubernetes to version 1.28 - https://phabricator.wikimedia.org/T362867#10568828 (Raymond_Ndibe)
[18:46:15] (update) raymond-ndibe: jobs: add job for managing harbor quotas [repos/cloud/toolforge/maintain-harbor] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-harbor/-/merge_requests/22 (https://phabricator.wikimedia.org/T352417) (owner: sstefanova)
[18:46:40] (update) raymond-ndibe: jobs: add job for managing harbor quotas [repos/cloud/toolforge/maintain-harbor] (refactor_config) - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-harbor/-/merge_requests/22 (https://phabricator.wikimedia.org/T352417) (owner: sstefanova)
[18:48:40] Toolforge (Toolforge iteration 17): [harbor] move harbor setup scripts from builds-builder to maintain-harbor - https://phabricator.wikimedia.org/T386964 (Raymond_Ndibe) NEW
[19:08:55] cloud-services-team, Cloud-VPS, VPS-Projects, Catalyst: metricsinfra: send alerts for the catalyst project to catalyst@w.o email - https://phabricator.wikimedia.org/T386416#10568981 (EBomani) Resolved→Open @dcaro Thank you for making the changes. I made an instance to test whether we'd ge...
[19:11:14] cloud-services-team, Cloud-VPS, VPS-Projects, Catalyst: metricsinfra: send alerts for the catalyst project to catalyst@w.o email - https://phabricator.wikimedia.org/T386416#10568998 (taavi) Instances that have been manually shut down will automatically be removed from the list of Prometheus scrap...
[19:22:44] cloud-services-team, Data-Services, Data-Engineering, Data-Engineering-Icebox, Datasets-General-or-Unknown: Provide dumps using bittorrent - https://phabricator.wikimedia.org/T29653#10569081 (Ahoelzl)
[19:28:21] cloud-services-team, Cloud-VPS, VPS-Projects, Catalyst: metricsinfra: send alerts for the catalyst project to catalyst@w.o email - https://phabricator.wikimedia.org/T386416#10569130 (EBomani) I see.. is there anything we can do to test this then? If not, I am alright with closing this task and wi...
[21:21:54] cloud-services-team, Toolforge (Toolforge iteration 17): [infra,k8s] remove deprecated kubelet flags before 1.28 upgrade (we might be able to remove all custom ones) - https://phabricator.wikimedia.org/T370245#10569365 (fnegri) Thanks @Raymond_Ndibe for checking and fixing this! A couple of nodes we...
[22:14:35] Tools, Wikidata: blocked Wikidata user doing automated misconduct with QuickStatements - https://phabricator.wikimedia.org/T386978#10569593 (A_smart_kitten)
[23:32:11] FIRING: Temperature: Inlet Temp issue on clouddumps1001:9290 - https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook - https://grafana.wikimedia.org/d/ZA1I-IB4z/ipmi-sensor-state?orgId=1&viewPanel=92&var-server=clouddumps1001 - https://alerts.wikimedia.org/?q=alertname%3DTemperature
[23:52:11] RESOLVED: Temperature: Inlet Temp issue on clouddumps1001:9290 - https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook - https://grafana.wikimedia.org/d/ZA1I-IB4z/ipmi-sensor-state?orgId=1&viewPanel=92&var-server=clouddumps1001 - https://alerts.wikimedia.org/?q=alertname%3DTemperature