[00:04:17] (03update) 10raymond-ndibe: Draft: [jobs-api] use job k8s custom resources in code [repos/cloud/toolforge/jobs-api] (diff_job_runtime_method) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/114 (https://phabricator.wikimedia.org/T359650) [01:49:01] (03update) 10raymond-ndibe: Draft: [jobs-api] use job k8s custom resources in code [repos/cloud/toolforge/jobs-api] (diff_job_runtime_method) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/114 (https://phabricator.wikimedia.org/T359650) [02:04:42] (03update) 10raymond-ndibe: Draft: [jobs-api] use job k8s custom resources in code [repos/cloud/toolforge/jobs-api] (diff_job_runtime_method) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/114 (https://phabricator.wikimedia.org/T359650) [02:32:21] (03update) 10raymond-ndibe: Draft: [jobs-api] use job k8s custom resources in code [repos/cloud/toolforge/jobs-api] (diff_job_runtime_method) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/114 (https://phabricator.wikimedia.org/T359650) [03:15:56] FIRING: SystemdUnitDown: The service unit opentofu-infra-diff.service is in failed status on host cloudcontrol1007. 
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [03:19:42] (03update) 10raymond-ndibe: [jobs-api] replace load with diff_job runtime method [repos/cloud/toolforge/jobs-api] (save_business_models_to_db) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/143 (https://phabricator.wikimedia.org/T359804) [03:23:24] (03update) 10raymond-ndibe: [jobs-api] create seperate api.py and move flask things there [repos/cloud/toolforge/jobs-api] (diff_job_runtime_method) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/91 (https://phabricator.wikimedia.org/T359804) [03:29:47] (03update) 10raymond-ndibe: Draft: [jobs-api] use job k8s custom resources in code [repos/cloud/toolforge/jobs-api] (diff_job_runtime_method) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/114 (https://phabricator.wikimedia.org/T359650) [03:30:54] (03update) 10raymond-ndibe: [jobs-api] use job k8s custom resources in code [repos/cloud/toolforge/jobs-api] (diff_job_runtime_method) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/114 (https://phabricator.wikimedia.org/T359650) [03:31:05] (03update) 10raymond-ndibe: [jobs-api] use job k8s custom resources in code [repos/cloud/toolforge/jobs-api] (diff_job_runtime_method) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/114 (https://phabricator.wikimedia.org/T359650) [04:20:07] (03update) 10raymond-ndibe: [jobs-api] use job k8s custom resources in code [repos/cloud/toolforge/jobs-api] (diff_job_runtime_method) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/114 (https://phabricator.wikimedia.org/T359650) [04:21:09] (03update) 10raymond-ndibe: [jobs-api] use job k8s custom resources 
in code [repos/cloud/toolforge/jobs-api] (diff_job_runtime_method) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/114 (https://phabricator.wikimedia.org/T359650) [04:21:54] (03update) 10raymond-ndibe: [jobs-api] use job k8s custom resources in code [repos/cloud/toolforge/jobs-api] (split_logic_from_api) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/114 (https://phabricator.wikimedia.org/T359650) [04:22:31] (03update) 10raymond-ndibe: [jobs-api] use job k8s custom resources in code [repos/cloud/toolforge/jobs-api] (split_logic_from_api) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/114 (https://phabricator.wikimedia.org/T359650) [04:23:18] (03update) 10raymond-ndibe: [jobs-api] save business models in a DB [repos/cloud/toolforge/jobs-api] (split_logic_from_api) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/114 (https://phabricator.wikimedia.org/T359650) [04:23:51] (03update) 10raymond-ndibe: [jobs-api] save business models in a DB [repos/cloud/toolforge/jobs-api] (split_logic_from_api) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/114 (https://phabricator.wikimedia.org/T359650) [04:24:10] (03update) 10raymond-ndibe: [jobs-api] save business models in a DB [repos/cloud/toolforge/jobs-api] (split_logic_from_api) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/114 (https://phabricator.wikimedia.org/T359650) [04:32:57] (03update) 10raymond-ndibe: [jobs-api] save business models in a DB [repos/cloud/toolforge/jobs-api] (split_logic_from_api) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/114 (https://phabricator.wikimedia.org/T359650) [04:40:05] (03update) 10raymond-ndibe: [jobs-api] save business models in a DB [repos/cloud/toolforge/jobs-api] (split_logic_from_api) - 
10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/114 (https://phabricator.wikimedia.org/T359650) [04:47:07] (03update) 10raymond-ndibe: [jobs-api] save business models in a DB [repos/cloud/toolforge/jobs-api] (split_logic_from_api) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/114 (https://phabricator.wikimedia.org/T359650) [04:49:09] (03update) 10raymond-ndibe: [jobs-api] save business models in a DB [repos/cloud/toolforge/jobs-api] (split_logic_from_api) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/114 (https://phabricator.wikimedia.org/T359650) [04:52:40] (03update) 10raymond-ndibe: [jobs-api] save business models in a DB [repos/cloud/toolforge/jobs-api] (split_logic_from_api) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/114 (https://phabricator.wikimedia.org/T359650) [05:10:56] FIRING: SystemdUnitDown: The systemd unit opentofu-infra-diff.service on node cloudcontrol1007 has been failing for more than two hours. 
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [05:11:45] (03update) 10raymond-ndibe: [jobs-api] save business models in a DB [repos/cloud/toolforge/jobs-api] (split_logic_from_api) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/114 (https://phabricator.wikimedia.org/T359650) [05:41:02] (03update) 10raymond-ndibe: [jobs-api] save business models in a DB [repos/cloud/toolforge/jobs-api] (split_logic_from_api) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/114 (https://phabricator.wikimedia.org/T359650) [06:19:37] (03update) 10raymond-ndibe: [jobs-api] save business models in a DB [repos/cloud/toolforge/jobs-api] (split_logic_from_api) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/114 (https://phabricator.wikimedia.org/T359650) [06:58:05] 10Tools: Geohack page link timing out - https://phabricator.wikimedia.org/T386670#10558397 (10Aklapper) [07:25:03] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-75 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [09:10:56] FIRING: SystemdUnitDown: The systemd unit opentofu-infra-diff.service on node cloudcontrol1007 has been failing for more than two hours. 
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [09:18:47] 06cloud-services-team, 10Cloud-VPS: Changing the IPs of cloudcephmons should not require VM reboots - https://phabricator.wikimedia.org/T385288#10558734 (10dcaro) >>! In T385288#10512381, @Andrew wrote: > Yes, I think service IPs/fqdns is the solution to this. This discussion post states that a reboot is neces... [09:19:46] 06cloud-services-team, 10Cloud-VPS: Changing the IPs of cloudcephmons should not require VM reboots - https://phabricator.wikimedia.org/T385288#10558736 (10dcaro) We can for sure add an alert/warning detecting that situation though, so we don't forget it's happening. [09:48:00] 06cloud-services-team, 10Toolforge, 07Epic: loki into lima-kilo - https://phabricator.wikimedia.org/T386480#10558833 (10dcaro) > I fussed a little with the git clone in roles/k8s/tasks/toolforge-deploy_components.yaml to pull T386480 as it isn't merged yet, so I don't really have a full test of the patch. Is... [09:57:18] 06cloud-services-team, 10Data-Services, 10Toolforge, 07Documentation: Restructure and improve content for: https://wikitech.wikimedia.org/wiki/Help:Toolforge/Database - https://phabricator.wikimedia.org/T232404#10558857 (10taavi) I just moved a bunch of content from https://wikitech.wikimedia.org/wiki/Help... [09:58:28] 06cloud-services-team, 10Cloud-VPS, 10VPS-Projects, 10Catalyst: metricsinfra: Add catalyst project to prometheus-alerts alertmanager. 
- https://phabricator.wikimedia.org/T386416#10558858 (10dcaro) [10:04:30] 06cloud-services-team, 10Cloud-VPS, 10VPS-Projects, 10Catalyst: metricsinfra: send alerts for the catalyst project to catalyst@w.o email - https://phabricator.wikimedia.org/T386416#10558868 (10dcaro) [10:10:21] 06cloud-services-team, 10Data-Services, 10Toolforge, 07Documentation: Restructure and improve content for: https://wikitech.wikimedia.org/wiki/Help:Toolforge/Database - https://phabricator.wikimedia.org/T232404#10558893 (10fnegri) Thanks @taavi this task has been in my radar forever, but I never got to sta... [10:13:13] 06cloud-services-team, 10Cloud-VPS, 10VPS-Projects, 10Catalyst: metricsinfra: send alerts for the catalyst project to catalyst@w.o email - https://phabricator.wikimedia.org/T386416#10558898 (10dcaro) Unfortunately we don't have a self-service alerting setup yet, but we can help :) I don't see any alerts t... [10:14:08] 06cloud-services-team, 10Cloud-VPS, 10VPS-Projects, 10Catalyst: metricsinfra: send alerts for the catalyst project to catalyst@w.o email - https://phabricator.wikimedia.org/T386416#10558900 (10dcaro) Note also that any custom alerts will need adding some stuff to the metricsinfra DB for now, so let me know... [10:24:30] 06cloud-services-team, 10Cloud-VPS: SystemdUnitDown The systemd unit opentofu-infra-diff.service on node cloudcontrol1007 has been failing for more than two hours. - https://phabricator.wikimedia.org/T386543#10558941 (10fnegri) Now it's failing again because https://gitlab.wikimedia.org/repos/cloud/cloud-vps/t... [10:25:00] 06cloud-services-team, 10Cloud-VPS: SystemdUnitDown The systemd unit opentofu-infra-diff.service on node cloudcontrol1007 has been failing for more than two hours. - https://phabricator.wikimedia.org/T386543#10558942 (10fnegri) 05Open→03In progress [10:25:12] 06cloud-services-team, 10Cloud-VPS: SystemdUnitDown The systemd unit opentofu-infra-diff.service on node cloudcontrol1007 has been failing for more than two hours. 
- https://phabricator.wikimedia.org/T386543#10558943 (10fnegri) p:05Triage→03Medium [10:26:39] 06cloud-services-team, 10Toolforge, 07Epic: loki into lima-kilo - https://phabricator.wikimedia.org/T386480#10558944 (10dcaro) > Will get loki installed in lima-kilo. @aborrero @dcaro @Andrew @Raymond_Ndibe opinions? The patches look ok to me (have not tested them yet, testing something else...), they have... [10:47:34] 06cloud-services-team, 10Toolforge, 05Cloud-Services-Origin-Team, 07Cloud-Services-Worktype-Project, 07Epic: [toolforge,storage,infra,k8s] Investigate persistent volume support - https://phabricator.wikimedia.org/T384596#10559010 (10aborrero) Regarding network connectivity. Assuming our current cloudceph... [10:47:42] 06cloud-services-team, 10Toolforge, 05Cloud-Services-Origin-Team, 07Cloud-Services-Worktype-Project, 07Epic: [toolforge,storage,infra,k8s] Investigate persistent volume support - https://phabricator.wikimedia.org/T384596#10559011 (10aborrero) [11:00:54] (03update) 10dcaro: [jobs-api] create seperate api.py and move flask things there [repos/cloud/toolforge/jobs-api] (diff_job_runtime_method) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/91 (https://phabricator.wikimedia.org/T359804) (owner: 10raymond-ndibe) [11:00:56] (03update) 10dcaro: [jobs-api] create seperate api.py and move flask things there [repos/cloud/toolforge/jobs-api] (diff_job_runtime_method) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/91 (https://phabricator.wikimedia.org/T359804) (owner: 10raymond-ndibe) [11:01:31] (03update) 10dcaro: [jobs-api] replace load with diff_job runtime method [repos/cloud/toolforge/jobs-api] (save_business_models_to_db) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/143 (https://phabricator.wikimedia.org/T359804) (owner: 10raymond-ndibe) [11:05:46] (03update) 10dcaro: [jobs-api] replace load with diff_job runtime method 
[repos/cloud/toolforge/jobs-api] (save_business_models_to_db) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/143 (https://phabricator.wikimedia.org/T359804) (owner: 10raymond-ndibe) [11:12:16] 06cloud-services-team, 10Cloud-VPS: Puppet fails on cloudcontrol when updating /srv/tofu-infra - https://phabricator.wikimedia.org/T373815#10559067 (10fnegri) 05Open→03Resolved This has not happened in a few weeks, I'll resolve this task. We have the parent task {T374022} to track possible improvements. [11:32:26] 06cloud-services-team, 10Cloud-VPS: petscan5 unresponsive - https://phabricator.wikimedia.org/T384642#10559090 (10Magnus) I have limited the RAM for PetScan via `systemctl`, which also should restart PetScan after VM reboot. Please let me know if that doesn't take care of the problem. [11:47:45] 06cloud-services-team, 10Toolforge, 07Epic: loki into lima-kilo - https://phabricator.wikimedia.org/T386480#10559123 (10rook) >>! In T386480#10558944, @dcaro wrote: > The patches look ok to me (have not tested them yet, testing something else...), they have no extra config for loki yet right? Correct this i... [11:58:10] 10wikitech.wikimedia.org: ☂ Wikitech account linking and SUL error reporting - https://phabricator.wikimedia.org/T376267#10559149 (10Ladsgroup) Your wikitech account has been renamed away from Gryllida (https://wikitech.wikimedia.org/w/index.php?title=User:Gryllida&action=history). Fixing this means we are goin... 
[12:10:17] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan+apply for main branch [12:13:07] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan+apply for main branch [12:13:17] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan+apply for main branch [12:13:38] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan+apply for main branch [12:52:48] 10PAWS: New upstream release for Pywikibot - https://phabricator.wikimedia.org/T385399#10559295 (10github-toolforge-bot) vivian-rook opened https://github.com/toolforge/paws/pull/481 [12:53:35] vivian-rook opened https://github.com/toolforge/paws/pull/481 [12:59:42] (03CR) 10Majavah: [C:03+2] build: Updating mediawiki/mediawiki-codesniffer to 46.0.0 [labs/tools/coverme] - 10https://gerrit.wikimedia.org/r/1115535 (owner: 10Libraryupgrader) [13:05:16] FIRING: ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_tool_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [13:10:16] RESOLVED: ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_tool_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [13:10:56] RESOLVED: SystemdUnitDown: The systemd unit opentofu-infra-diff.service on node cloudcontrol1007 has been failing for more than two hours. 
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1007 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [13:14:49] 06cloud-services-team, 10Toolforge: toolforge-legacy-redirector: constant failed probes by prometheus - https://phabricator.wikimedia.org/T385908#10559436 (10taavi) One option to reduce load on this box would be to add the hostnames it serves to HSTS preload lists and then stop responding anything on port 80.... [13:21:18] 06cloud-services-team, 10Data-Services, 10Toolforge, 07Documentation: Restructure and improve content for: https://wikitech.wikimedia.org/wiki/Help:Toolforge/Database - https://phabricator.wikimedia.org/T232404#10559473 (10taavi) I'd rather not mix ToolsDB and Trove things into the same doc, but I'd be fin... [13:32:02] 10PAWS: New upstream release for Pywikibot - https://phabricator.wikimedia.org/T385399#10559505 (10rook) 05Open→03Resolved a:03rook [13:32:20] vivian-rook closed https://github.com/toolforge/paws/pull/481 [13:32:49] 06cloud-services-team, 10Cloud-VPS, 13Patch-For-Review: Drop support for VMs with .wmflabs FQDNs - https://phabricator.wikimedia.org/T380679#10559511 (10taavi) >>! In T380679#10544786, @Andrew wrote: > OK! I can already report that people are using those short names quite a lot. I see queries incoming from a... 
[13:35:55] 10PAWS: New upstream release for OpenRefine - https://phabricator.wikimedia.org/T386408#10559531 (10github-toolforge-bot) vivian-rook opened https://github.com/toolforge/paws/pull/482 [13:36:16] vivian-rook opened https://github.com/toolforge/paws/pull/482 [13:41:28] 06cloud-services-team, 10Cloud-VPS, 13Patch-For-Review: Unable to persistently set fs.inotify.max_user_instances and fs.inotify.max_user_watches - https://phabricator.wikimedia.org/T385530#10559566 (10Andrew) 05Open→03Resolved @SDunlap you can now add 'base::sysctl::inotify' to your VM hiera and ever... [13:52:39] 06cloud-services-team: Onboard Chuck Onwumelu - https://phabricator.wikimedia.org/T386715 (10aborrero) 03NEW [13:52:59] 06cloud-services-team: Onboard Chuck Onwumelu - https://phabricator.wikimedia.org/T386715#10559632 (10aborrero) p:05Triage→03Medium [13:53:03] 06cloud-services-team, 10Data-Services, 10Toolforge, 07Documentation: Restructure and improve content for: https://wikitech.wikimedia.org/wiki/Help:Toolforge/Database - https://phabricator.wikimedia.org/T232404#10559631 (10fnegri) Agreed, linking to all offerings from Help:Toolforge/Database seems cleaner. 
[13:57:48] 06cloud-services-team: Onboard Chuck Onwumelu - https://phabricator.wikimedia.org/T386715#10559661 (10fnegri) [14:01:35] 06cloud-services-team: Onboard Chuck Onwumelu - https://phabricator.wikimedia.org/T386715#10559682 (10fnegri) [14:05:39] 10PAWS: New upstream release for OpenRefine - https://phabricator.wikimedia.org/T386408#10559692 (10github-toolforge-bot) vivian-rook closed https://github.com/toolforge/paws/pull/482 [14:06:01] vivian-rook closed https://github.com/toolforge/paws/pull/482 [14:34:34] 06cloud-services-team: Onboard Chuck Onwumelu - https://phabricator.wikimedia.org/T386715#10559786 (10Aklapper) @aborrero: Is there an underlying template somewhere which could be changed from "Associate WMF mediawiki account with phab user account" to "Associate both WMF ITS provided Mediawiki.org account and D... [14:37:57] 10Tool-eswstool2: Failed uploads to Commons and the various kinds of errors they throw - https://phabricator.wikimedia.org/T384101#10559792 (10Ninovolador) https://bdh.bne.es/bnesearch/detalle/bdh0000258110 https://bdh-rd.bne.es/high.raw?id=bdh0000258110&name=00000001.original.pdf 67.4M chunk_size = 1024 * 1024... 
[14:40:21] 06cloud-services-team: Onboard Chuck Onwumelu - https://phabricator.wikimedia.org/T386715#10559796 (10fnegri) @Aklapper https://www.mediawiki.org/wiki/Wikimedia_Cloud_Services_team/Onboarding_template [14:44:39] FIRING: [2x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [14:49:39] RESOLVED: [2x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [14:53:16] 06cloud-services-team, 10Data-Services, 10Toolforge, 07Documentation: Restructure and improve content for: https://wikitech.wikimedia.org/wiki/Help:Toolforge/Database - https://phabricator.wikimedia.org/T232404#10559825 (10dcaro) >>! In T232404#10559631, @fnegri wrote: > Agreed, linking to all offerings fr... [15:01:37] !log andrew@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-control-8 (T380679) [15:01:41] T380679: Drop support for VMs with .wmflabs FQDNs - https://phabricator.wikimedia.org/T380679 [15:03:47] !log andrew@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-control-8 (T380679) [15:04:03] 06cloud-services-team, 10Data-Services, 10Toolforge, 07Documentation: Restructure and improve content for: https://wikitech.wikimedia.org/wiki/Help:Toolforge/Database - https://phabricator.wikimedia.org/T232404#10559865 (10taavi) ok, I did a very basic split there. I'm sure there are still many places to u... 
[15:04:13] !log andrew@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-103, tools-k8s-worker-108, tools-k8s-control-7 (T380679) [15:04:14] FIRING: [2x] ToolforgeKubernetesHAproxyServerDown: Toolforge HAproxy server down: tools-k8s-control-8.tools.eqiad1.wikimedia.cloud - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesHAproxyServerDown - https://grafana.wmcloud.org/d/toolforge-k8s-haproxy/toolforge-k8s-haproxy?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesHAproxyServerDown [15:07:40] !log andrew@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-103, tools-k8s-worker-108, tools-k8s-control-7 (T380679) [15:07:54] T380679: Drop support for VMs with .wmflabs FQDNs - https://phabricator.wikimedia.org/T380679 [15:09:14] FIRING: [4x] ToolforgeKubernetesHAproxyServerDown: Toolforge HAproxy server down: tools-k8s-control-7.tools.eqiad1.wikimedia.cloud - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesHAproxyServerDown - https://grafana.wmcloud.org/d/toolforge-k8s-haproxy/toolforge-k8s-haproxy?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesHAproxyServerDown [15:14:14] RESOLVED: [4x] ToolforgeKubernetesHAproxyServerDown: Toolforge HAproxy server down: tools-k8s-control-7.tools.eqiad1.wikimedia.cloud - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesHAproxyServerDown - https://grafana.wmcloud.org/d/toolforge-k8s-haproxy/toolforge-k8s-haproxy?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesHAproxyServerDown [16:29:50] !log dcaro@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-38 [16:35:25] !log dcaro@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-38 [17:31:19] !log 
dcaro@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-39, tools-k8s-worker-nfs-5, tools-k8s-worker-nfs-54 [17:47:47] !log dcaro@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-39, tools-k8s-worker-nfs-5, tools-k8s-worker-nfs-54 [17:51:11] FIRING: Temperature: Inlet Temp issue on clouddumps1001:9290 - https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook - https://grafana.wikimedia.org/d/ZA1I-IB4z/ipmi-sensor-state?orgId=1&viewPanel=92&var-server=clouddumps1001 - https://alerts.wikimedia.org/?q=alertname%3DTemperature [17:55:17] (03update) 10raymond-ndibe: [jobs-api] save business models in a DB [repos/cloud/toolforge/jobs-api] (split_logic_from_api) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/114 (https://phabricator.wikimedia.org/T359650) [17:59:07] (03update) 10raymond-ndibe: [jobs-api] create seperate api.py and move flask things there [repos/cloud/toolforge/jobs-api] (diff_job_runtime_method) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/91 (https://phabricator.wikimedia.org/T359804) [17:59:22] (03update) 10dcaro: [jobs-api] replace load with diff_job runtime method [repos/cloud/toolforge/jobs-api] (save_business_models_to_db) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/143 (https://phabricator.wikimedia.org/T359804) (owner: 10raymond-ndibe) [18:01:09] (03update) 10raymond-ndibe: [jobs-api] save business models in a DB [repos/cloud/toolforge/jobs-api] (split_logic_from_api) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/114 (https://phabricator.wikimedia.org/T359650) [18:01:11] RESOLVED: Temperature: Inlet Temp issue on clouddumps1001:9290 - https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook - 
https://grafana.wikimedia.org/d/ZA1I-IB4z/ipmi-sensor-state?orgId=1&viewPanel=92&var-server=clouddumps1001 - https://alerts.wikimedia.org/?q=alertname%3DTemperature [18:03:19] (03update) 10dcaro: deploy-token: prevent accidental token overwrites [repos/cloud/toolforge/components-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/49 (owner: 10sstefanova) [18:03:21] (03update) 10raymond-ndibe: [jobs-api] save business models in a DB [repos/cloud/toolforge/jobs-api] (split_logic_from_api) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/114 (https://phabricator.wikimedia.org/T359650) [18:06:56] (03update) 10dcaro: [jobs-api] save business models in a DB [repos/cloud/toolforge/jobs-api] (split_logic_from_api) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/114 (https://phabricator.wikimedia.org/T359650) (owner: 10raymond-ndibe) [18:08:04] (03update) 10dcaro: [jobs-api] save business models in a DB [repos/cloud/toolforge/jobs-api] (split_logic_from_api) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/114 (https://phabricator.wikimedia.org/T359650) (owner: 10raymond-ndibe) [18:18:57] 10Tool-erinnermich: [ErinnerMichBot] Possible support for other languages and projects? - https://phabricator.wikimedia.org/T384842#10560624 (10Tkarcher) The bot I created two weeks ago for the French Wikipedia is indeed called [[ https://meta.wikimedia.org/wiki/Special:CentralAuth/RappelleMoiBot | RappelleMoiBo... 
[18:19:32] 10Data-Services, 06Data-Engineering, 06Data-Engineering-Radar, 06DBA, 06Privacy Engineering: Create views for SecurePoll db tables in Toolforge replicas - https://phabricator.wikimedia.org/T381197#10560629 (10Ottomata) [18:21:08] (03update) 10raymond-ndibe: [jobs-api] save business models in a DB [repos/cloud/toolforge/jobs-api] (split_logic_from_api) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/114 (https://phabricator.wikimedia.org/T359650) [18:24:11] FIRING: Temperature: Inlet Temp issue on clouddumps1001:9290 - https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook - https://grafana.wikimedia.org/d/ZA1I-IB4z/ipmi-sensor-state?orgId=1&viewPanel=92&var-server=clouddumps1001 - https://alerts.wikimedia.org/?q=alertname%3DTemperature [18:24:55] (03update) 10raymond-ndibe: [jobs-api] custom resource definition deployment templates [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/101 (https://phabricator.wikimedia.org/T359650) [18:27:42] (03update) 10raymond-ndibe: [jobs-api] replace load with diff_job runtime method [repos/cloud/toolforge/jobs-api] (save_business_models_to_db) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/143 (https://phabricator.wikimedia.org/T359804) [18:29:10] (03PS1) 10QChris: Allow “Gerrit Managers” to import history [labs/tools/wdaudiolex-be] (refs/meta/config) - 10https://gerrit.wikimedia.org/r/1120627 [18:29:10] (03CR) 10QChris: [V:03+2 C:03+2] Allow “Gerrit Managers” to import history [labs/tools/wdaudiolex-be] (refs/meta/config) - 10https://gerrit.wikimedia.org/r/1120627 (owner: 10QChris) [18:29:38] (03PS1) 10QChris: Import done. Revoke import grants [labs/tools/wdaudiolex-be] (refs/meta/config) - 10https://gerrit.wikimedia.org/r/1120629 [18:29:38] (03CR) 10QChris: [V:03+2 C:03+2] Import done. 
Revoke import grants [labs/tools/wdaudiolex-be] (refs/meta/config) - 10https://gerrit.wikimedia.org/r/1120629 (owner: 10QChris) [18:29:40] (03update) 10raymond-ndibe: [jobs-api] create seperate api.py and move flask things there [repos/cloud/toolforge/jobs-api] (diff_job_runtime_method) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/91 (https://phabricator.wikimedia.org/T359804) [18:31:00] (03update) 10raymond-ndibe: [jobs-api] save business models in a DB [repos/cloud/toolforge/jobs-api] (split_logic_from_api) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/114 (https://phabricator.wikimedia.org/T359650) [18:31:29] 06cloud-services-team, 10Cloud-VPS, 13Patch-For-Review: openstack: wmfkeystonehooks: project ids rather than names are being used in LDAP group creation - https://phabricator.wikimedia.org/T379030#10560684 (10Andrew) Server fqdn is a bit of a puzzle. Right now the fqdn is generated by cloud-init, based on op... [18:45:18] 06cloud-services-team, 10Cloud-VPS, 06DC-Ops, 10ops-eqiad, 06SRE: [Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 4 on cloudvirt1047 - https://phabricator.wikimedia.org/T386083#10560719 (10VRiley-WMF) After looking into this, it seems it was a small glitch with the memory, h... 
[18:45:24] 06cloud-services-team, 10Cloud-VPS, 06DC-Ops, 10ops-eqiad, 06SRE: [Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 4 on cloudvirt1047 - https://phabricator.wikimedia.org/T386083#10560720 (10VRiley-WMF) 05Open→03Resolved [18:49:11] RESOLVED: Temperature: Inlet Temp issue on clouddumps1001:9290 - https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook - https://grafana.wikimedia.org/d/ZA1I-IB4z/ipmi-sensor-state?orgId=1&viewPanel=92&var-server=clouddumps1001 - https://alerts.wikimedia.org/?q=alertname%3DTemperature [18:54:36] 06cloud-services-team, 10Toolforge (Toolforge iteration 17): [jobs-api] move jobs.toolforge.org/* labels to annotations - https://phabricator.wikimedia.org/T385904#10560760 (10Raymond_Ndibe) 05Declined→03Resolved [18:55:58] 06cloud-services-team, 10Toolforge (Toolforge iteration 17): toolsbeta: maintain-kubeusers not running because ImagePullBackOff - https://phabricator.wikimedia.org/T384809#10560767 (10Raymond_Ndibe) closing because this is no longer happening. This happened when I was testing the harbor upgrade patch. That pat... [18:56:09] 06cloud-services-team, 10Toolforge (Toolforge iteration 17): toolsbeta: maintain-kubeusers not running because ImagePullBackOff - https://phabricator.wikimedia.org/T384809#10560772 (10Raymond_Ndibe) 05Open→03Resolved a:03Raymond_Ndibe [19:02:55] 06cloud-services-team, 10Cloud-VPS, 13Patch-For-Review: openstack: wmfkeystonehooks: project ids rather than names are being used in LDAP group creation - https://phabricator.wikimedia.org/T379030#10560835 (10bd808) >>! In T379030#10560684, @Andrew wrote: > 1) write a custom dynamic metadata service to provi... 
[19:04:11] FIRING: Temperature: Inlet Temp issue on clouddumps1001:9290 - https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook - https://grafana.wikimedia.org/d/ZA1I-IB4z/ipmi-sensor-state?orgId=1&viewPanel=92&var-server=clouddumps1001 - https://alerts.wikimedia.org/?q=alertname%3DTemperature
[19:09:13] cloud-services-team, Cloud-VPS, Patch-For-Review: openstack: wmfkeystonehooks: project ids rather than names are being used in LDAP group creation - https://phabricator.wikimedia.org/T379030#10560867 (Andrew) >>! In T379030#10560835, @bd808 wrote: > > https://docs.openstack.org/nova/2024.2/admin/ven...
[19:09:59] (CR) BryanDavis: [C:+2] "Usage of these legacy bits was removed in fe2aaa216fd0eafcf2bb5c3cd6f08db421a7931c when switching to gitlab repos." [labs/striker] - https://gerrit.wikimedia.org/r/1119865 (owner: Majavah)
[19:10:05] (CR) CI reject: [V:-1] templates: Remove unused Phabricator policy formatting code [labs/striker] - https://gerrit.wikimedia.org/r/1119865 (owner: Majavah)
[19:11:19] (CR) BryanDavis: templates: Remove unused Phabricator policy formatting code [labs/striker] - https://gerrit.wikimedia.org/r/1119865 (owner: Majavah)
[19:11:43] (CR) BryanDavis: [C:+2] templates: Remove unused Phabricator policy formatting code [labs/striker] - https://gerrit.wikimedia.org/r/1119865 (owner: Majavah)
[19:11:50] (CR) CI reject: [V:-1] templates: Remove unused Phabricator policy formatting code [labs/striker] - https://gerrit.wikimedia.org/r/1119865 (owner: Majavah)
[19:14:11] RESOLVED: Temperature: Inlet Temp issue on clouddumps1001:9290 - https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook - https://grafana.wikimedia.org/d/ZA1I-IB4z/ipmi-sensor-state?orgId=1&viewPanel=92&var-server=clouddumps1001 - https://alerts.wikimedia.org/?q=alertname%3DTemperature
[19:17:19] (CR) Majavah: [C:+2] "retry" [labs/striker] - https://gerrit.wikimedia.org/r/1119865 (owner: Majavah)
[19:17:26] (CR) CI reject: [V:-1] templates: Remove unused Phabricator policy formatting code [labs/striker] - https://gerrit.wikimedia.org/r/1119865 (owner: Majavah)
[19:27:11] FIRING: Temperature: Inlet Temp issue on clouddumps1001:9290 - https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook - https://grafana.wikimedia.org/d/ZA1I-IB4z/ipmi-sensor-state?orgId=1&viewPanel=92&var-server=clouddumps1001 - https://alerts.wikimedia.org/?q=alertname%3DTemperature
[19:44:26] RESOLVED: Temperature: Inlet Temp issue on clouddumps1001:9290 - https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook - https://grafana.wikimedia.org/d/ZA1I-IB4z/ipmi-sensor-state?orgId=1&viewPanel=92&var-server=clouddumps1001 - https://alerts.wikimedia.org/?q=alertname%3DTemperature
[19:53:07] (update) rook: Adding loki to install [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/669 (https://phabricator.wikimedia.org/T386480)
[19:55:36] Toolforge (Toolforge iteration 17), Patch-For-Review: Support HTTP health checks in jobs framework - https://phabricator.wikimedia.org/T362621#10561103 (Raymond_Ndibe) updated changelog and documentation https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Changelog
[19:55:50] Toolforge (Toolforge iteration 17), Patch-For-Review: Support HTTP health checks in jobs framework - https://phabricator.wikimedia.org/T362621#10561105 (Raymond_Ndibe) In progress→Resolved
[19:55:51] (update) sstefanova: Adding loki to install [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/669 (https://phabricator.wikimedia.org/T386480) (owner: rook)
[20:13:08] wikitech.wikimedia.org: Decide what to do with SUL attached Wikitech accounts that Bitu associates with a different SUL account - https://phabricator.wikimedia.org/T386026#10561154 (xcollazo) Please detach 'Xcollazo' from SUL, rename it to 'XCollazo-WMF', and reattach to SUL.
[20:25:11] FIRING: Temperature: Inlet Temp issue on clouddumps1001:9290 - https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook - https://grafana.wikimedia.org/d/ZA1I-IB4z/ipmi-sensor-state?orgId=1&viewPanel=92&var-server=clouddumps1001 - https://alerts.wikimedia.org/?q=alertname%3DTemperature
[20:31:52] cloud-services-team, Toolforge, Tools: Flickr blocking image requests from Toolforge k8s, breaking multiple tools - https://phabricator.wikimedia.org/T384468#10561220 (Don-vip) >>! In T384468#10552651, @Andrew wrote: > I would very much like to follow up with flickr support about this. Can I please g...
[20:51:04] (CR) BryanDavis: templates: Remove unused Phabricator policy formatting code [labs/striker] - https://gerrit.wikimedia.org/r/1119865 (owner: Majavah)
[20:51:19] (CR) BryanDavis: [C:+2] "retry #3" [labs/striker] - https://gerrit.wikimedia.org/r/1119865 (owner: Majavah)
[20:51:25] (CR) CI reject: [V:-1] templates: Remove unused Phabricator policy formatting code [labs/striker] - https://gerrit.wikimedia.org/r/1119865 (owner: Majavah)
[20:56:57] Striker, ci-test-error (WMF-deployed Build Failure): striker-pipeline-test failing to load pipelinelib with git error - https://phabricator.wikimedia.org/T386755 (bd808) NEW
[21:13:53] Striker, ci-test-error (WMF-deployed Build Failure): striker-pipeline-test failing to load pipelinelib with git error - https://phabricator.wikimedia.org/T386755#10561324 (bd808) A [[https://integration.wikimedia.org/ci/job/striker-pipeline-test/474/|build from a few days ago]] (2025-02-16) shows what is...
[21:14:14] Striker, ci-test-error (WMF-deployed Build Failure): striker-pipeline-test failing to load pipelinelib with git error - https://phabricator.wikimedia.org/T386755#10561325 (bd808) I think the failures are happening on the Jenkins server itself. The crash has been very reproducible today via gerrit/zuul/J...
[21:15:08] Striker, Continuous-Integration-Infrastructure, ci-test-error (WMF-deployed Build Failure): striker-pipeline-test failing to load pipelinelib with git error - https://phabricator.wikimedia.org/T386755#10561327 (bd808)
[21:27:30] Tool-erinnermich: [ErinnerMichBot] Possible support for other languages and projects? - https://phabricator.wikimedia.org/T384842#10561365 (M-J) >>! In T384842#10560624, @Tkarcher wrote: > The bot I created two weeks ago for the French Wikipedia is indeed called [[ https://meta.wikimedia.org/wiki/S...
[21:36:54] wikitech.wikimedia.org: Decide what to do with SUL attached Wikitech accounts that Bitu associates with a different SUL account - https://phabricator.wikimedia.org/T386026#10561394 (bd808) >>! In T386026#10561154, @xcollazo wrote: > Please detach 'Xcollazo' from SUL, rename it to 'XCollazo-WMF', and reattach...
[21:53:02] cloud-services-team, Cloud-VPS, Patch-For-Review: openstack: wmfkeystonehooks: project ids rather than names are being used in LDAP group creation - https://phabricator.wikimedia.org/T379030#10561446 (Andrew) There is more background than you might like about extra vendor data vs. cloud-init here htt...
[22:18:06] FIRING: ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_toolserver_org_redirects_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[22:23:06] RESOLVED: ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_toolserver_org_redirects_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[23:05:06] FIRING: [2x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_tool_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[23:10:06] RESOLVED: [2x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_tool_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[23:16:06] FIRING: ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_toolserver_org_redirects_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[23:21:06] RESOLVED: [4x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown