[00:04:38] cloud-services-team, wikitech.wikimedia.org, Infrastructure-Foundations, Epic, Patch-For-Review: Make Wikitech an SUL wiki - https://phabricator.wikimedia.org/T161859#10542240 (Bugreporter) Not a requirement in any way, but (after matches based on Phabricator) we can still match several of th...
[00:58:06] FIRING: [2x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_tool_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[00:59:36] cloud-services-team, wikitech.wikimedia.org, Infrastructure-Foundations, Epic, Patch-For-Review: Make Wikitech an SUL wiki - https://phabricator.wikimedia.org/T161859#10542370 (bd808) >>! In T161859#10542240, @Bugreporter wrote: > What **is** an issue is some of accounts renamed by you are no...
[01:03:06] RESOLVED: [2x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_tool_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[01:06:39] FIRING: [2x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[01:11:39] RESOLVED: [2x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[01:40:39] FIRING: [3x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[01:45:39] RESOLVED: [4x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[03:11:06] FIRING: [2x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_tool_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[03:16:06] RESOLVED: [2x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_tool_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[03:44:45] FIRING: ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_toolserver_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[03:49:45] RESOLVED: [3x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[05:17:44] Tool-ranker, translatewiki.net, LPL Essential (LPL Essential 2024 Nov-Jan), Patch-For-Review, Unplanned-Sprint-Work: Add Ranker to translatewiki.net - https://phabricator.wikimedia.org/T384061#10542601 (abi_) >>! In T384061#10541494, @abi_ wrote: >>>! In T384061#10541391, @LucasWerkmeister wr...
[08:30:36] cloud-services-team (Kanban), wikitech.wikimedia.org, CirrusSearch, Wikimedia-production-error: DBQueryError on Wikitech Static Search - https://phabricator.wikimedia.org/T243730#10542758 (Gehel)
[08:33:09] Cloud-Services, Elasticsearch: Replicate production elasticsearch indices to labs - https://phabricator.wikimedia.org/T109715#10542837 (Gehel) The #Cloud-Services project tag is not intended to have any tasks. Please check the list on https://phabricator.wikimedia.org/project/profile/832/ and replace...
[08:40:14] cloud-services-team (Kanban), VPS-Projects: Several 'search' project instances not accessible via cloud-cumin - https://phabricator.wikimedia.org/T306491#10543032 (Gehel)
[11:24:23] wikitech.wikimedia.org: Decide what to do with SUL attached Wikitech accounts that Bitu associates with a different SUL account - https://phabricator.wikimedia.org/T386026#10543818 (Jnanaranjan_sahu) Please detach 'Jnanaranjan Sahu' from SUL, rename it to 'Jnanaranjan sahu', and reattach to SUL.
[11:28:31] FIRING: ToolsToolsDBReplicationLagIsTooHigh: ToolsDB replication on tools-db-5 is lagging behind the primary, the current lag is 35490 - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBReplicationLagIsTooHigh
[11:47:48] wikitech.wikimedia.org: Decide what to do with SUL attached Wikitech accounts that Bitu associates with a different SUL account - https://phabricator.wikimedia.org/T386026#10543916 (Ladsgroup) >>! In T386026#10543818, @Jnanaranjan_sahu wrote: > Please detach 'Jnanaranjan Sahu' from SUL, rename it to 'Jnanara...
[12:10:57] (open) samwilson: Upgrade to Symfony 7 [toolforge-repos/wdlocator] - https://gitlab.wikimedia.org/toolforge-repos/wdlocator/-/merge_requests/34
[12:12:10] (update) samwilson: Upgrade to Symfony 7 [toolforge-repos/wdlocator] - https://gitlab.wikimedia.org/toolforge-repos/wdlocator/-/merge_requests/34
[12:15:55] (merge) samwilson: Upgrade to Symfony 7 [toolforge-repos/wdlocator] - https://gitlab.wikimedia.org/toolforge-repos/wdlocator/-/merge_requests/34
[12:30:31] (Abandoned) Andrew Bogott: wmcs.toolforge.k8s.reboot: always do reboot --hard [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1116059 (https://phabricator.wikimedia.org/T385264) (owner: Andrew Bogott)
[12:56:57] (open) samwilson: Add toolforge-jobs job file for installing [toolforge-repos/wdlocator] - https://gitlab.wikimedia.org/toolforge-repos/wdlocator/-/merge_requests/35
[12:59:34] (merge) samwilson: Add toolforge-jobs job file for installing [toolforge-repos/wdlocator] - https://gitlab.wikimedia.org/toolforge-repos/wdlocator/-/merge_requests/35
[13:06:18] FIRING: KernelErrors: Server cloudvirt1041 logged kernel errors - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/KernelErrors - https://grafana.wikimedia.org/d/b013af4c-d405-4d9f-85d4-985abb3dec0c/wmcs-kernel-errors?orgId=1&var-instance=cloudvirt1041 - https://alerts.wikimedia.org/?q=alertname%3DKernelErrors
[14:46:44] cloud-services-team, Cloud-VPS, Patch-For-Review: Drop support for VMs with .wmflabs FQDNs - https://phabricator.wikimedia.org/T380679#10544745 (Andrew) > So I propose to add cnames for tools-redis.tools.eqiad1.wikimedia.cloud -> redis.svc.tools.eqiad1.wikimedia.cloud and tools-db.tools.eqiad1.wikime...
[14:50:49] cloud-services-team, Cloud-VPS, Patch-For-Review: Drop support for VMs with .wmflabs FQDNs - https://phabricator.wikimedia.org/T380679#10544775 (Andrew) I've put a trace on the recursor to see if people are still using the short names 'tools-redis' and 'tools-db'. Of course due to resolv.conf behavi...
[14:53:29] cloud-services-team, Cloud-VPS, Patch-For-Review: Drop support for VMs with .wmflabs FQDNs - https://phabricator.wikimedia.org/T380679#10544786 (Andrew) OK! I can already report that people are using those short names quite a lot. I see queries incoming from a variety of k8s workers. I also see queri...
[14:57:07] cloud-services-team, Cloud-VPS, Patch-For-Review: Drop support for VMs with .wmflabs FQDNs - https://phabricator.wikimedia.org/T380679#10544805 (fnegri) > I propose to add cnames for tools-redis.tools.eqiad1.wikimedia.cloud -> redis.svc.tools.eqiad1.wikimedia.cloud SGTM. > we should skip the middl...
[15:10:06] FIRING: [2x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_tool_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[15:15:06] RESOLVED: [3x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_tool_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[15:16:41] cloud-services-team, Cloud-VPS, Patch-For-Review: Drop support for VMs with .wmflabs FQDNs - https://phabricator.wikimedia.org/T380679#10544893 (Andrew) I've added those two new cnames. Now my test script looks like this: ` andrew@abogott-nstesting:~$ sh ./nstest.sh tools-redis.tools.eqiad1.wikime...
[15:25:12] cloud-services-team, Cloud-VPS, Patch-For-Review: Drop support for VMs with .wmflabs FQDNs - https://phabricator.wikimedia.org/T380679#10544957 (fnegri) > I've added those two new cnames. Where did you add them? I was expecting to find them in a `tools.eqiad1.wikimedia.cloud` zone.
[16:11:39] (update) raymond-ndibe: Draft: [jobs-api] use job k8s custom resources in code [repos/cloud/toolforge/jobs-api] (diff_job_runtime_method) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/114 (https://phabricator.wikimedia.org/T359650)
[16:28:54] Change on wikitech.wikimedia.org a page News/2025 Eliminating the .wmflabs domain was created, changed by Andrewbogott link https://wikitech.wikimedia.org/wiki/News/2025_Eliminating_the_.wmflabs_domain edit summary: Created page with "{{Draft}} {{Tracked|T380679}} As of June 1, 2025, the obsolete domain .wmflabs will be fully removed from cloud-vps and toolforge infrastructure. == What's already done == * There are no longer any VMs with A records in the .wmflabs tld. * All service records in .wmflabs are also present in .wikimedia.cloud * The search string in resolv.conf prefers to look up domains in .wikimedia.cloud and only uses .wmflabs as a last resort. == Timeline == * The .e..."
[16:28:55] T380679: Drop support for VMs with .wmflabs FQDNs - https://phabricator.wikimedia.org/T380679
[16:42:31] Change on wikitech.wikimedia.org a page News/2025 Eliminating the .wmflabs domain was modified, changed by Taavi link https://wikitech.wikimedia.org/w/index.php?diff=2271453 edit summary: /* More information */
[16:42:44] Change on wikitech.wikimedia.org a page News/2025 Eliminating the .wmflabs domain was modified, changed by Taavi link https://wikitech.wikimedia.org/w/index.php?diff=2271454 edit summary: /* Frequently asked questions */ +1
[16:43:23] Change on wikitech.wikimedia.org a page News/2025 Eliminating the .wmflabs domain was modified, changed by Taavi link https://wikitech.wikimedia.org/w/index.php?diff=2271456 edit summary: c/e
[16:50:26] cloud-services-team, Cloud-VPS, Patch-For-Review: Drop support for VMs with .wmflabs FQDNs - https://phabricator.wikimedia.org/T380679#10545465 (Andrew) >>! In T380679#10544957, @fnegri wrote: >> I've added those two new cnames. > > Where did you add them? I was expecting to find them in a `tools.eq...
[16:52:32] cloud-services-team, Cloud-VPS, Patch-For-Review: Drop support for VMs with .wmflabs FQDNs - https://phabricator.wikimedia.org/T380679#10545484 (Andrew) ` root@cloudcontrol1006:~# openstack zone list --all-projects | grep " eqiad1.wikimedia.cloud" | 67603ef4-3d64-40d6-90d3-5b7776a99034 | cloudinfra...
[17:11:33] cloud-services-team, Toolforge, Epic: Migrate largest ToolsDB users to Trove - https://phabricator.wikimedia.org/T291782#10545551 (fnegri) This is a snapshot of the current top DBs by size: ` root@tools-db-4:/srv/labsdb/binlogs# sudo du -chs /srv/labsdb/data/* | sort -rh | head 2.0T total 281G...
[17:19:28] cloud-services-team, Toolforge, Epic: Migrate largest ToolsDB users to Trove - https://phabricator.wikimedia.org/T291782#10545574 (fnegri) I'm confused by the fact that https://tool-db-usage.toolforge.org/ is reporting different numbers for some DBs, but the top 3 ones are the same: | Owner...
[17:22:48] cloud-services-team, Toolforge: [toolsdb] ToolsToolsDBReplicationLagIsTooHigh - 2025-02-12 - https://phabricator.wikimedia.org/T386240 (fnegri) NEW
[17:24:22] cloud-services-team, Toolforge: [toolsdb] Replica is frequently lagging behind the primary - https://phabricator.wikimedia.org/T357624#10545624 (fnegri)
[17:26:32] cloud-services-team, Toolforge: [toolsdb] Replica is frequently lagging behind the primary - https://phabricator.wikimedia.org/T357624#10545643 (fnegri) > We are currently using slave_parallel_mode=conservative. Setting slave_parallel_mode=optimistic (the same that is used in production) is likely to hel...
[17:29:08] cloud-services-team, Toolforge: [toolsdb] ToolsToolsDBReplicationLagIsTooHigh - 2025-02-12 - https://phabricator.wikimedia.org/T386240#10545660 (fnegri)
[17:29:09] cloud-services-team (FY2024/2025-Q3-Q4), Toolforge: [toolsdb] mariadb crashing repeatedly on primary host - https://phabricator.wikimedia.org/T385900#10545661 (fnegri)
[17:31:02] cloud-services-team (FY2024/2025-Q3-Q4), Toolforge: [toolsdb] mariadb crashing repeatedly on primary host - https://phabricator.wikimedia.org/T385900#10545675 (fnegri) > Then I was able to restart mariadb and replication resumed. Replication was back in sync for a moment, then it started lagging again:...
[17:35:47] cloud-services-team (FY2024/2025-Q3-Q4), Toolforge: [toolsdb] mariadb crashing repeatedly on primary host - https://phabricator.wikimedia.org/T385900#10545689 (fnegri) In progress→Resolved I think the best way to reduce both the crashes/deadlocks and replication lag issues is prioritizing {T2...
[17:35:55] cloud-services-team, Cloud-VPS, Patch-For-Review: Drop support for VMs with .wmflabs FQDNs - https://phabricator.wikimedia.org/T380679#10545693 (Andrew) proposed announcement email: ` tl;dr: Minor change to DNS resolution[0] for toolforge and cloud-vps services on Monday. Should have no effect but...
[17:36:43] cloud-services-team, Toolforge: [toolsdb] ToolsToolsDBReplicationLagIsTooHigh - 2025-02-12 - https://phabricator.wikimedia.org/T386240#10545697 (fnegri) Open→In progress p:Triage→High
[17:36:52] cloud-services-team (FY2024/2025-Q3-Q4), Toolforge: [toolsdb] ToolsToolsDBReplicationLagIsTooHigh - 2025-02-12 - https://phabricator.wikimedia.org/T386240#10545701 (fnegri)
[17:56:05] cloud-services-team, Toolforge: ConnectTimeoutError when trying to pip install - https://phabricator.wikimedia.org/T386059#10545931 (bd808) The bug seems somehow related to running inside the deadlinkscanner tool's Kubernetes namespace: `lang=shell-session,COUNTEREXAMPLE $ sudo become deadlinkscanner $...
[17:56:21] cloud-services-team (FY2024/2025-Q3-Q4), Tools, linkwatcher: Reduce size of linkwatcher db if at all possible - https://phabricator.wikimedia.org/T224154#10545933 (fnegri) p:High→Medium a:fnegri→None The current size is 912GB, which is less that it was at the time it was migrated to T...
[17:56:37] cloud-services-team, Tools, linkwatcher: Reduce size of linkwatcher db if at all possible - https://phabricator.wikimedia.org/T224154#10545936 (fnegri)
[18:03:44] cloud-services-team (FY2024/2025-Q3-Q4), Toolforge: [toolsdb] ToolsToolsDBReplicationLagIsTooHigh - 2025-02-12 - https://phabricator.wikimedia.org/T386240#10546027 (fnegri) Compared to previous situations like {T370760} this looks a bit different: * CPU usage is close to zero, so replication is not CPU-...
[18:05:17] cloud-services-team, Toolforge: ConnectTimeoutError when trying to pip install - https://phabricator.wikimedia.org/T386059#10546035 (bd808) I tried `mv .cache .cache-possibly-corrupt` to see if something in there had become corrupted, but nothing changed. `lang=shell-session,COUNTEREXAMPLE tools.deadlink...
[18:06:24] cloud-services-team, Toolforge, Tools: ConnectTimeoutError when trying to `pip install` inside the deadlinkscanner Kubernetes namespace - https://phabricator.wikimedia.org/T386059#10546044 (bd808)
[18:14:06] cloud-services-team, Toolforge, Tools: ConnectTimeoutError when trying to `pip install` inside the deadlinkscanner Kubernetes namespace - https://phabricator.wikimedia.org/T386059#10546120 (bd808) These envvars are causing the problem @Ederporto. They tell things like `pip` and `curl` that all outbou...
[18:17:45] cloud-services-team, Cloud-VPS, Patch-For-Review: Drop support for VMs with .wmflabs FQDNs - https://phabricator.wikimedia.org/T380679#10546145 (fnegri) > For historical reasons (and also coding simplicity) most ..eqiad1.wikimedia.cloud records actually belong in eqiad1.wikimedia....
[18:36:40] cloud-services-team (FY2024/2025-Q3-Q4), Toolforge: [toolsdb] mariadb crashing repeatedly (innodb_fatal_semaphore_wait_threshold) - https://phabricator.wikimedia.org/T385900#10546208 (fnegri)
[18:40:07] cloud-services-team (FY2024/2025-Q3-Q4), Toolforge: [toolsdb] mariadb crashing repeatedly (innodb_fatal_semaphore_wait_threshold) - https://phabricator.wikimedia.org/T385900#10546226 (fnegri)
[18:48:33] cloud-services-team (FY2024/2025-Q3-Q4), Toolforge: [toolsdb] mariadb crashing repeatedly (innodb_fatal_semaphore_wait_threshold) - https://phabricator.wikimedia.org/T385900#10546250 (fnegri) I discovered that the replica crashed twice with the same error, but not at the same time of the primary. I a...
[18:49:29] cloud-services-team (FY2024/2025-Q3-Q4), Toolforge: [toolsdb] mariadb crashing repeatedly (innodb_fatal_semaphore_wait_threshold) - https://phabricator.wikimedia.org/T385900#10546252 (fnegri)
[18:58:36] Tool-ranker, translatewiki.net, LPL Essential (LPL Essential 2024 Nov-Jan), Patch-For-Review, Unplanned-Sprint-Work: Add Ranker to translatewiki.net - https://phabricator.wikimedia.org/T384061#10546300 (LucasWerkmeister) Alright, thanks! >>! In T384061#10512190, @LucasWerkmeister wrote: > I...
[19:16:19] cloud-services-team, Toolforge, observability: [toolforge.infra] Provide centralized logging for Toolforge platform - https://phabricator.wikimedia.org/T97861#10546357 (Andrew) Just so I understand the relationship between this and T127367... this is specifically about logs for admins/infra component...
[21:31:54] cloud-services-team (FY2024/2025-Q3-Q4), Toolforge: [toolsdb] mariadb crashing repeatedly (innodb_fatal_semaphore_wait_threshold) - https://phabricator.wikimedia.org/T385900#10546842 (aborrero) Conversation with @Marostegui: * upgrade to mariadb 16.6.20 (which is what they are running on prod) ** ma...
[22:09:01] RESOLVED: ToolsToolsDBReplicationLagIsTooHigh: ToolsDB replication on tools-db-5 is lagging behind the primary, the current lag is 4406 - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBReplicationLagIsTooHigh
[22:15:09] cloud-services-team, Toolforge, observability: [toolforge.infra] Provide centralized logging for Toolforge platform - https://phabricator.wikimedia.org/T97861#10546976 (dcaro) >>! In T97861#10546357, @Andrew wrote: > Just so I understand the relationship between this and T127367... this is specifical...
[22:15:22] cloud-services-team, Toolforge, observability: [toolforge,infra] Provide centralized logging for Toolforge platform - https://phabricator.wikimedia.org/T97861#10546977 (dcaro)
[22:25:45] FIRING: ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_toolserver_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[22:30:45] RESOLVED: [3x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[22:54:31] FIRING: ToolsToolsDBReplicationLagIsTooHigh: ToolsDB replication on tools-db-5 is lagging behind the primary, the current lag is 3681 - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBReplicationLagIsTooHigh
[23:09:39] FIRING: ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[23:14:39] FIRING: [2x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[23:19:39] RESOLVED: [2x] ProbeDown: Service tools-legacy-redirector-2:443 has failed probes (http_tools_wmflabs_org_main_page_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-legacy-redirector-2:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown