[00:43:23] <wikibugs>	 10Data-Services: Replicate DiscussionTools items tables to cloud - https://phabricator.wikimedia.org/T374584 (10Bugreporter) 03NEW
[00:56:57] <wikibugs>	 10Toolforge (Toolforge iteration 14): [lima-kilo] allow for the creation of a multi-node high availability cluster - https://phabricator.wikimedia.org/T374585 (10Raymond_Ndibe) 03NEW
[01:01:45] <wikibugs>	 10Toolforge (Toolforge iteration 14): [lima-kilo] allow for the creation of a multi-node high availability cluster - https://phabricator.wikimedia.org/T374585#10139492 (10Raymond_Ndibe)
[01:05:26] <wikibugs>	 (03open) 10raymond-ndibe: [lima-kilo] configure high-availability [repos/cloud/toolforge/lima-kilo] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/189 (https://phabricator.wikimedia.org/T374585)
[01:37:07] <wikibugs>	 (03open) 10raymond-ndibe: [toolforge-deploy] upgrade metrics-server [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/520 (https://phabricator.wikimedia.org/T359641)
[01:55:21] <wikibugs>	 (03update) 10raymond-ndibe: Draft: [lima-kilo] configure high-availability [repos/cloud/toolforge/lima-kilo] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/189 (https://phabricator.wikimedia.org/T374585)
[02:20:41] <jinxer-wm>	 FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[02:30:41] <jinxer-wm>	 RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[03:06:26] <wikibugs>	 (03open) 10chandrapratap25: Draft: Resolve T373919 "Center Align Username Placeholder for Consistency" [toolforge-repos/yearinreview] - 10https://gitlab.wikimedia.org/toolforge-repos/yearinreview/-/merge_requests/6
[03:10:14] <wikibugs>	 (03close) 10chandrapratap25: Draft: Resolve T373919 "Center Align Username Placeholder for Consistency" [toolforge-repos/yearinreview] - 10https://gitlab.wikimedia.org/toolforge-repos/yearinreview/-/merge_requests/6
[03:10:22] <wikibugs>	 (03reopen) 10chandrapratap25: Draft: Resolve T373919 "Center Align Username Placeholder for Consistency" [toolforge-repos/yearinreview] - 10https://gitlab.wikimedia.org/toolforge-repos/yearinreview/-/merge_requests/6
[03:22:11] <wikibugs>	 10Tool-yearinreview, 07good first task: Center Align Username Placeholder for Consistency - https://phabricator.wikimedia.org/T373919#10139555 (10ChandraPratap25) 05Open→03Resolved Resolve T373919 "Center Align Username Placeholder for Consistency"  toolforge-repos/yearinreview!6
[03:52:52] <wikibugs>	 (03open) 10raymond-ndibe: [toolforge-deploy] test multi-replica support for continuous jobs [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/521 (https://phabricator.wikimedia.org/T341066)
[04:01:36] <wikibugs>	 (03update) 10raymond-ndibe: [toolforge-deploy] test multi-replica support for continuous jobs [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/521 (https://phabricator.wikimedia.org/T341066)
[04:03:08] <wikibugs>	 (03update) 10raymond-ndibe: [toolforge-deploy] test multi-replica support for continuous jobs [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/521 (https://phabricator.wikimedia.org/T341066)
[06:59:40] <jinxer-wm>	 FIRING: CephClusterInUnknown: #page Ceph cluster in eqiad is in unknown status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInUnknown - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInUnknown
[06:59:48] <wikibugs>	 06cloud-services-team: CephClusterInUnknown - https://phabricator.wikimedia.org/T374593 (10phaultfinder) 03NEW
[07:00:30] <jinxer-wm>	 FIRING: OpenstackAPIResponse: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[07:10:09] <jinxer-wm>	 FIRING: CephSlowOps: Ceph cluster in eqiad has 1 slow ops - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephSlowOps - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephSlowOps
[07:10:23] <wikibugs>	 06cloud-services-team: CephSlowOps  Ceph cluster in eqiad has slow ops, which might be blocking some writes - https://phabricator.wikimedia.org/T373632#10139781 (10phaultfinder)
[07:22:41] <jinxer-wm>	 FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[07:32:41] <jinxer-wm>	 RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[08:43:06] <wikibugs>	 10cloud-services-team (FY2024/2025-Q1-Q2), 10Data-Services, 07Wikimedia-Slow-DB-Query: [wikireplicas] Log very slow queries - https://phabricator.wikimedia.org/T372859#10139859 (10fnegri) a:03fnegri
[09:12:09] <wikibugs>	 10cloud-services-team (FY2024/2025-Q1-Q2): Lint problems for NeutronAgentDownForLong and NeutronAgentDown - https://phabricator.wikimedia.org/T374513#10139915 (10aborrero) →14Duplicate dup:03T373878
[09:13:52] <wikibugs>	 06cloud-services-team, 10Cloud-VPS: openstack: fix missing prometheus metrics - https://phabricator.wikimedia.org/T373878#10139917 (10aborrero)
[09:35:09] <jinxer-wm>	 RESOLVED: CephSlowOps: Ceph cluster in eqiad has 1 slow ops - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephSlowOps - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephSlowOps
[09:35:39] <wikibugs>	 06cloud-services-team, 10observability: cloud: prometheus: investigate weirdness with metrics and alertmanager - https://phabricator.wikimedia.org/T374599 (10aborrero) 03NEW
[09:35:55] <wikibugs>	 06cloud-services-team, 10Cloud-VPS: openstack: fix missing prometheus metrics - https://phabricator.wikimedia.org/T373878#10140004 (10aborrero)
[09:35:58] <wikibugs>	 06cloud-services-team, 10observability: cloud: prometheus: investigate weirdness with metrics and alertmanager - https://phabricator.wikimedia.org/T374599#10140005 (10aborrero)
[09:36:35] <wikibugs>	 06cloud-services-team: CephSlowOps  Ceph cluster in eqiad has slow ops, which might be blocking some writes - https://phabricator.wikimedia.org/T373632#10140007 (10aborrero)
[09:36:36] <wikibugs>	 06cloud-services-team, 10observability: cloud: prometheus: investigate weirdness with metrics and alertmanager - https://phabricator.wikimedia.org/T374599#10140008 (10aborrero)
[09:36:44] <wikibugs>	 06cloud-services-team: CephClusterInUnknown - https://phabricator.wikimedia.org/T374593#10140009 (10aborrero)
[09:36:46] <wikibugs>	 06cloud-services-team, 10observability: cloud: prometheus: investigate weirdness with metrics and alertmanager - https://phabricator.wikimedia.org/T374599#10140010 (10aborrero)
[09:37:47] <wikibugs>	 06cloud-services-team, 10observability: cloud: prometheus: investigate weirdness with metrics and alertmanager - https://phabricator.wikimedia.org/T374599#10140015 (10aborrero) p:05Triage→03High
[09:42:06] <wikibugs>	 06cloud-services-team, 10observability: cloud: prometheus: investigate weirdness with metrics and alertmanager - https://phabricator.wikimedia.org/T374599#10140040 (10aborrero)
[09:42:54] <wikibugs>	 06cloud-services-team, 10observability: cloud: prometheus: investigate weirdness with metrics and alertmanager - https://phabricator.wikimedia.org/T374599#10140043 (10aborrero)
[09:55:09] <jinxer-wm>	 FIRING: CephSlowOps: Ceph cluster in eqiad has 6 slow ops - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephSlowOps - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephSlowOps
[09:55:18] <wikibugs>	 06cloud-services-team: CephSlowOps  Ceph cluster in eqiad has slow ops, which might be blocking some writes - https://phabricator.wikimedia.org/T373632#10140078 (10phaultfinder)
[09:55:30] <jinxer-wm>	 RESOLVED: OpenstackAPIResponse: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[09:56:50] <wmcs-alerts>	 FIRING: [3x] ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_admin_toolforge_org_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[09:57:02] <icinga-wm>	 PROBLEM - toolschecker: NFS read/writeable on labs instances on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 504 Gateway Time-out - string OK not found on http://checker.tools.wmflabs.org:80/nfs/home - 324 bytes in 60.003 second response time https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Toolschecker
[10:00:31] <wmcs-alerts>	 FIRING: ToolsToolsDBWritableState: There should be exactly one writable MariaDB instance instead of -1 - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsToolsDBWritableState  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBWritableState
[10:01:21] <wikibugs>	 06cloud-services-team, 10observability: cloud: prometheus: investigate weirdness with metrics and alertmanager - https://phabricator.wikimedia.org/T374599#10140080 (10fgiunchedi) From my investigation so far on IRC:  ` 09:51  <godog> so far I have more questions than answers :( both prometheus1005...
[10:01:28] <wmcs-alerts>	 FIRING: InstanceDown: Project tools instance tools-k8s-ingress-9 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[10:01:51] <wmcs-alerts>	 FIRING: [5x] ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_admin_toolforge_org_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[10:02:29] <wmcs-alerts>	 FIRING: InstanceDown: Project gitlab-runners instance runner-1026 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[10:08:00] <icinga-wm>	 RECOVERY - toolschecker: NFS read/writeable on labs instances on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 158 bytes in 35.464 second response time https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Toolschecker
[10:08:23] <wmcs-alerts>	 FIRING: ToolforgeKubernetesNodeNotReady: Kubernetes node tools-k8s-worker-nfs-6 is not ready - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesNodeNotReady - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesNodeNotReady
[10:11:28] <wmcs-alerts>	 RESOLVED: InstanceDown: Project tools instance tools-k8s-ingress-9 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[10:11:51] <wmcs-alerts>	 RESOLVED: [5x] ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_admin_toolforge_org_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[10:12:29] <wmcs-alerts>	 RESOLVED: InstanceDown: Project gitlab-runners instance runner-1026 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[10:13:23] <wmcs-alerts>	 RESOLVED: ToolforgeKubernetesNodeNotReady: Kubernetes node tools-k8s-worker-nfs-6 is not ready - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesNodeNotReady - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesNodeNotReady
[10:15:31] <wmcs-alerts>	 RESOLVED: ToolsToolsDBWritableState: There should be exactly one writable MariaDB instance instead of 0 - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsToolsDBWritableState  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBWritableState
[10:24:01] <wikibugs>	 06cloud-services-team: CephSlowOps  Ceph cluster in eqiad has slow ops, which might be blocking some writes - https://phabricator.wikimedia.org/T373632#10140111 (10aborrero)
[10:24:02] <wikibugs>	 06cloud-services-team, 10observability: cloud: prometheus: investigate weirdness with metrics and alertmanager - https://phabricator.wikimedia.org/T374599#10140112 (10aborrero)
[10:28:39] <jinxer-wm>	 RESOLVED: CephSlowOps: Ceph cluster in eqiad has 104 slow ops - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephSlowOps - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephSlowOps
[10:30:02] <wikibugs>	 10cloud-services-team (FY2024/2025-Q1-Q2), 10Data-Services, 06Data-Persistence, 06Data-Persistence-SRE: [wikireplicas] Update Admin docs - https://phabricator.wikimedia.org/T365717#10140145 (10fnegri) 05In progress→03Resolved
[10:31:27] <wikibugs>	 10cloud-services-team (FY2024/2025-Q1-Q2), 10Data-Services, 06Data-Persistence, 06Data-Persistence-SRE: [wikireplicas] Update Admin docs - https://phabricator.wikimedia.org/T365717#10140140 (10fnegri) > the procedure looks similar to what is documented in MariaDB#Manipulating_the_Replication_Tree  Re-r...
[10:40:15] <wikibugs>	 10cloud-services-team (FY2024/2025-Q1-Q2), 10Data-Services, 06Data-Persistence, 06Data-Persistence-SRE: [wikireplicas] Update Admin docs - https://phabricator.wikimedia.org/T365717#10140197 (10jcrespo) I wouldn't be responsible if I didn't tell you that GTID has been very error prone to us, and that is...
[10:45:36] <wikibugs>	 10cloud-services-team (FY2024/2025-Q1-Q2), 10Data-Services, 06Data-Persistence, 06Data-Persistence-SRE: [wikireplicas] Update Admin docs - https://phabricator.wikimedia.org/T365717#10140243 (10fnegri) @jcrespo thanks, does it mean that `repl.pl` is still used? Are the docs at [MariaDB#Manipulating_the_...
[10:52:51] <wmcs-alerts>	 FIRING: ProbeDown: Service tools-k8s-haproxy-6:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-6:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[10:56:21] <wikibugs>	 10cloud-services-team (FY2024/2025-Q1-Q2), 10Data-Services, 06Data-Persistence, 06Data-Persistence-SRE: [wikireplicas] Update Admin docs - https://phabricator.wikimedia.org/T365717#10140266 (10Ladsgroup) I think we use a script in wmfmariadbpy called `move_replica`
[10:57:00] <wikibugs>	 10cloud-services-team (FY2024/2025-Q1-Q2), 10Data-Services, 06Data-Persistence, 06Data-Persistence-SRE: [wikireplicas] Update Admin docs - https://phabricator.wikimedia.org/T365717#10140272 (10jcrespo) This method is used: https://wikitech.wikimedia.org/wiki/Primary_database_switchover but it only work...
[10:57:51] <wmcs-alerts>	 RESOLVED: ProbeDown: Service tools-k8s-haproxy-6:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-6:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[11:01:03] <wmcs-alerts>	 FIRING: [2x] ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-27 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[11:06:03] <wmcs-alerts>	 FIRING: [3x] ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-23 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[11:11:03] <wmcs-alerts>	 FIRING: [5x] ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-16 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[11:16:03] <wmcs-alerts>	 FIRING: [9x] ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-12 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[11:17:51] <wmcs-alerts>	 FIRING: ProbeDown: Service tools-k8s-haproxy-6:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-6:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[11:21:03] <wmcs-alerts>	 RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-27 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[11:22:51] <wmcs-alerts>	 RESOLVED: ProbeDown: Service tools-k8s-haproxy-6:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-6:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[11:34:51] <wmcs-alerts>	 FIRING: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-5:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[11:36:04] <wikibugs>	 06cloud-services-team: toolforge: workers with many procs (2024-09-12 edition) - https://phabricator.wikimedia.org/T374612 (10aborrero) 03NEW
[11:37:26] <logmsgbot_cloud>	 !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-28 (T374612)
[11:37:30] <stashbot>	 T374612: toolforge: workers with many procs (2024-09-12 edition) - https://phabricator.wikimedia.org/T374612
[11:39:08] <wikibugs>	 06cloud-services-team: toolforge: workers with many procs (2024-09-12 edition) - https://phabricator.wikimedia.org/T374612#10140418 (10aborrero)
[11:39:09] <wikibugs>	 06cloud-services-team: CephSlowOps  Ceph cluster in eqiad has slow ops, which might be blocking some writes - https://phabricator.wikimedia.org/T373632#10140419 (10aborrero)
[11:39:51] <wmcs-alerts>	 RESOLVED: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-5:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[11:42:57] <logmsgbot_cloud>	 !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-28 (T374612)
[11:43:01] <stashbot>	 T374612: toolforge: workers with many procs (2024-09-12 edition) - https://phabricator.wikimedia.org/T374612
[11:48:47] <logmsgbot_cloud>	 !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-23, tools-k8s-worker-16, tools-k8s-worker-nfs-33 (T374612)
[11:48:51] <stashbot>	 T374612: toolforge: workers with many procs (2024-09-12 edition) - https://phabricator.wikimedia.org/T374612
[11:51:41] <jinxer-wm>	 FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[11:52:51] <wmcs-alerts>	 FIRING: ProbeDown: Service tools-k8s-haproxy-6:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-6:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[11:54:13] <logmsgbot_cloud>	 !log aborrero@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-worker-nfs-23, tools-k8s-worker-16, tools-k8s-worker-nfs-33 (T374612)
[11:54:17] <stashbot>	 T374612: toolforge: workers with many procs (2024-09-12 edition) - https://phabricator.wikimedia.org/T374612
[11:57:51] <wmcs-alerts>	 RESOLVED: ProbeDown: Service tools-k8s-haproxy-6:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-6:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[11:59:42] <logmsgbot_cloud>	 !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-33 (T374612)
[11:59:46] <stashbot>	 T374612: toolforge: workers with many procs (2024-09-12 edition) - https://phabricator.wikimedia.org/T374612
[12:01:41] <jinxer-wm>	 RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[12:06:00] <logmsgbot_cloud>	 !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-33 (T374612)
[12:06:04] <stashbot>	 T374612: toolforge: workers with many procs (2024-09-12 edition) - https://phabricator.wikimedia.org/T374612
[12:16:18] <wikibugs>	 06cloud-services-team: toolforge: workers with many D procs (2024-09-12 edition) - https://phabricator.wikimedia.org/T374612#10140482 (10aborrero)
[12:17:04] <wikibugs>	 06cloud-services-team: toolforge: workers with many D procs (2024-09-12 edition) - https://phabricator.wikimedia.org/T374612#10140479 (10aborrero) 05Open→03Resolved a:03aborrero
[12:23:51] <wmcs-alerts>	 FIRING: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-5:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[12:24:23] <wikibugs>	 (03CR) 10CI reject: [V:04-1] Localisation updates from https://translatewiki.net. [labs/tools/commons-mass-description] - 10https://gerrit.wikimedia.org/r/1072530 (owner: 10L10n-bot)
[12:29:25] <wikibugs>	 10cloud-services-team (FY2024/2025-Q1-Q2), 10Data-Services, 06Data-Persistence, 06Data-Persistence-SRE: [wikireplicas] Update Admin docs - https://phabricator.wikimedia.org/T365717#10140497 (10fnegri) @jcrespo I have added your comment above to [MariaDB#Manipulating_the_Replication_Tree](https://wikite...
[12:32:13] <wikibugs>	 10cloud-services-team (FY2024/2025-Q1-Q2), 10Data-Services, 06Data-Persistence, 06Data-Persistence-SRE: [wikireplicas] Update Admin docs - https://phabricator.wikimedia.org/T365717#10140507 (10jcrespo) >>! In T365717#10140497, @fnegri wrote: > @jcrespo I have added your comment above to [MariaDB#Manipu...
[12:33:52] <wmcs-alerts>	 RESOLVED: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-5:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[12:45:02] <wikibugs>	 10cloud-services-team (FY2024/2025-Q1-Q2), 10Data-Services, 06Data-Persistence, 06Data-Persistence-SRE: [wikireplicas] Update Admin docs - https://phabricator.wikimedia.org/T365717#10140589 (10fnegri) I wasn't sure if "cannot apply to wikireplicas" included Sanitariums or only clouddbs. If it also incl...
[12:47:53] <wikibugs>	 10cloud-services-team (FY2024/2025-Q1-Q2), 10Data-Services, 06Data-Persistence, 06Data-Persistence-SRE: [wikireplicas] Update Admin docs - https://phabricator.wikimedia.org/T365717#10140604 (10jcrespo) >>! In T365717#10140589, @fnegri wrote: > I wasn't sure if "cannot apply to wikireplicas" included Sa...
[12:52:51] <wmcs-alerts>	 FIRING: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-5:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[12:57:51] <wmcs-alerts>	 RESOLVED: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-5:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[13:00:54] <wikibugs>	 10cloud-services-team (FY2024/2025-Q1-Q2), 10Data-Services, 06Data-Persistence, 06Data-Persistence-SRE: [wikireplicas] Update Admin docs - https://phabricator.wikimedia.org/T365717#10140655 (10fnegri) Thanks, I've updated again [MariaDB/Sanitarium_and_clouddb_instances](https://wikitech.wikimedia.org/w...
[13:53:51] <wmcs-alerts>	 FIRING: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-5:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[13:58:51] <wmcs-alerts>	 RESOLVED: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-5:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[14:00:51] <wmcs-alerts>	 FIRING: ProbeDown: Service tools-k8s-haproxy-6:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-6:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[14:05:52] <wmcs-alerts>	 RESOLVED: ProbeDown: Service tools-k8s-haproxy-6:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-6:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[14:10:51] <wmcs-alerts>	 FIRING: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-5:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[14:13:36] <icinga-wm>	 PROBLEM - Host wikitech-static.wikimedia.org is DOWN: PING CRITICAL - Packet loss = 100%
[14:15:51] <wmcs-alerts>	 RESOLVED: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-5:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[14:16:21] <wmcs-alerts>	 FIRING: [2x] ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[14:20:55] <jinxer-wm>	 FIRING: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[14:21:06] <wmcs-alerts>	 RESOLVED: [2x] ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[14:29:32] <wikibugs>	 (03CR) 10Abijeet Patro: [V:03+2] Localisation updates from https://translatewiki.net. [labs/tools/commons-mass-description] - 10https://gerrit.wikimedia.org/r/1072530 (owner: 10L10n-bot)
[14:31:51] <wmcs-alerts>	 FIRING: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-5:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[14:36:51] <wmcs-alerts>	 RESOLVED: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-5:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[14:50:49] <wikibugs>	 10cloud-services-team (FY2024/2025-Q1-Q2), 10Quarry, 07User-notice: Support queries against Quarry's own database and ToolsDB - https://phabricator.wikimedia.org/T151158#10141169 (10fnegri) @UOzurumba apologies for the delay. Yes, it's a good idea to add it to the next Tech News! Here's a summary, feel f...
[14:50:59] <icinga-wm>	 RECOVERY - Host wikitech-static.wikimedia.org is UP: PING OK - Packet loss = 0%, RTA = 24.11 ms
[15:09:45] <wikibugs>	 10Toolforge (Toolforge iteration 14), 13Patch-For-Review: [lima-kilo] allow for the creation of a multi-node high availability cluster - https://phabricator.wikimedia.org/T374585#10141230 (10dcaro) +1 for being able to test upgrades in lima-kilo, though we would need to be able to replicate the upgrade procedu...
[15:11:03] <wikibugs>	 10cloud-services-team (FY2024/2025-Q1-Q2), 10Cloud-VPS: openstack: instrument VXLAN-based flat network - https://phabricator.wikimedia.org/T374020#10141235 (10aborrero) The network is in better shape now, VMs have now connectivity by default.  I have recreated the VMs. To test:  * `ssh arturo-test-vm3.cloudinf...
[15:13:13] <wikibugs>	 10Toolforge (Toolforge iteration 14), 13Patch-For-Review: [lima-kilo] allow for the creation of a multi-node high availability cluster - https://phabricator.wikimedia.org/T374585#10141239 (10dcaro) This might be interesting: https://pkg.go.dev/k8s.io/kubeadm/kinder#section-readme
[15:15:51] <wmcs-alerts>	 FIRING: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-5:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[15:20:51] <wmcs-alerts>	 RESOLVED: [2x] ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4)  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[15:29:51] <wmcs-alerts>	 FIRING: ProbeDown: Service tools-k8s-haproxy-6:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-6:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[15:34:51] <wmcs-alerts>	 RESOLVED: ProbeDown: Service tools-k8s-haproxy-6:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-6:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[15:47:33] <wikibugs>	 10Cloud Services Proposals, 06cloud-services-team, 10Toolforge: Decision Request: To strictly enforce semantic versioning rules for toolforge services' APIs or not - https://phabricator.wikimedia.org/T373072#10141412 (10fnegri) I think Option 2 is a good trade off, it's true it is prone to errors, but I like...
[16:43:52] <wikibugs>	 10cloud-services-team (FY2024/2025-Q1-Q2), 10Cloud-VPS, 06DC-Ops, 10ops-eqiad, 06SRE: cloudcephosd1021-1034: hard drive sector errors increasing - https://phabricator.wikimedia.org/T348643#10141703 (10wiki_willy) It looks like it'll be 3 drives minimum from the latest email today, and @Jclark-ctr - you c...
[16:44:51] <wmcs-alerts>	 FIRING: ProbeDown: Service tools-k8s-haproxy-6:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-6:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[16:49:51] <wmcs-alerts>	 RESOLVED: ProbeDown: Service tools-k8s-haproxy-6:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-6:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[16:50:41] <jinxer-wm>	 FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[17:00:41] <jinxer-wm>	 RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[17:00:56] <jinxer-wm>	 FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[17:01:11] <jinxer-wm>	 RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[17:01:26] <jinxer-wm>	 FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[17:02:11] <jinxer-wm>	 RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[17:02:51] <wmcs-alerts>	 FIRING: ProbeDown: Service tools-k8s-haproxy-6:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-6:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[17:12:51] <wmcs-alerts>	 RESOLVED: ProbeDown: Service tools-k8s-haproxy-6:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-6:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[17:15:52] <wmcs-alerts>	 FIRING: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-5:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[17:20:51] <wmcs-alerts>	 RESOLVED: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-5:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[17:36:51] <wmcs-alerts>	 FIRING: ProbeDown: Service tools-k8s-haproxy-6:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-6:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[17:41:51] <wmcs-alerts>	 RESOLVED: ProbeDown: Service tools-k8s-haproxy-6:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-6:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[17:54:51] <wmcs-alerts>	 FIRING: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-5:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[17:59:51] <wmcs-alerts>	 RESOLVED: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-5:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[18:28:51] <wmcs-alerts>	 FIRING: ProbeDown: Service tools-k8s-haproxy-6:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-6:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[18:33:51] <wmcs-alerts>	 RESOLVED: ProbeDown: Service tools-k8s-haproxy-6:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-6:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[18:46:51] <wmcs-alerts>	 FIRING: ProbeDown: Service tools-k8s-haproxy-6:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-6:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[18:51:51] <wmcs-alerts>	 RESOLVED: ProbeDown: Service tools-k8s-haproxy-6:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-6:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[19:06:52] <wmcs-alerts>	 FIRING: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-5:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[19:11:51] <wmcs-alerts>	 RESOLVED: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-5:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[19:25:03] <wmcs-alerts>	 FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-5 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[19:35:03] <wmcs-alerts>	 RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-5 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[19:35:51] <wmcs-alerts>	 FIRING: ProbeDown: Service tools-k8s-haproxy-6:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-6:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[19:40:51] <wmcs-alerts>	 RESOLVED: ProbeDown: Service tools-k8s-haproxy-6:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-6:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[19:50:41] <jinxer-wm>	 FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[20:00:41] <jinxer-wm>	 RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[20:13:51] <wmcs-alerts>	 FIRING: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-5:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[20:18:51] <wmcs-alerts>	 RESOLVED: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-5:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[20:42:51] <wmcs-alerts>	 FIRING: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-5:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[20:47:51] <wmcs-alerts>	 RESOLVED: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-5:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[21:16:51] <wmcs-alerts>	 FIRING: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-5:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[21:21:51] <wmcs-alerts>	 RESOLVED: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-5:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[21:38:29] <wikibugs>	 10VPS-project-Wikistats: Add moswiki to wikistats - https://phabricator.wikimedia.org/T374648#10142558 (10Dzahn) a:03Dzahn
[21:39:51] <wmcs-alerts>	 FIRING: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-5:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[21:44:51] <wmcs-alerts>	 RESOLVED: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-5:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[22:35:03] <wmcs-alerts>	 FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-55 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses