[00:04:48] FIRING: [66x] CertAlmostExpired: Certificate for service logstash1023:443 is about to expire - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [00:08:24] FIRING: [16x] ProbeDown: Service aqs1010-a:7000 has failed probes (tcp_cassandra_a_ssl_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [00:26:40] FIRING: SystemdUnitFailed: wmf_auto_restart_prometheus-blazegraph-exporter-wdqs-blazegraph.service on wdqs1016:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [00:39:38] FIRING: [13x] CertAlmostExpired: Certificate for service wdqs1012:443 is about to expire - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [00:44:38] FIRING: [14x] CertAlmostExpired: Certificate for service wdqs1011:443 is about to expire - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [00:54:38] FIRING: [14x] CertAlmostExpired: Certificate for service wdqs1011:443 is about to expire - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [00:59:52] FIRING: [32x] CertAlmostExpired: Certificate for service people1005:30443 is about to expire - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [01:09:43] (PS1) TrainBranchBot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1277250 [01:09:43] (CR) TrainBranchBot: [C:+2] Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1277250
(owner: TrainBranchBot) [01:13:40] FIRING: SystemdUnitFailed: wmf_auto_restart_rsyslog.service on ml-serve2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [01:19:52] (Merged) jenkins-bot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1277250 (owner: TrainBranchBot) [01:29:38] FIRING: [10x] CertAlmostExpired: Certificate for service wdqs1011:443 is about to expire - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [02:00:43] !log mwpresync@deploy1003 Started scap build-images: Publishing wmf/next image [02:01:38] !log mwpresync@deploy1003 Finished scap build-images: Publishing wmf/next image (duration: 00m 55s) [02:09:19] FIRING: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [02:14:38] FIRING: [11x] CertAlmostExpired: Certificate for service wdqs1011:443 is about to expire - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [02:29:38] FIRING: [15x] CertAlmostExpired: Certificate for service wdqs1011:443 is about to expire - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [02:34:19] RESOLVED: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [02:44:10] (PS1) Harej: Change IP address for Scatter mirror
[puppet] - https://gerrit.wikimedia.org/r/1277254 [02:59:38] FIRING: [21x] CertAlmostExpired: Certificate for service wdqs1011:443 is about to expire - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [03:24:38] FIRING: [21x] CertAlmostExpired: Certificate for service wdqs1011:443 is about to expire - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [03:29:38] FIRING: [20x] CertAlmostExpired: Certificate for service wdqs1011:443 is about to expire - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [03:34:03] FIRING: HelmReleaseBadStatus: Helm release mw-script/nngkzgw8 on k8s@eqiad in state pending-install - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://grafana.wikimedia.org/d/UT4GtK3nz?var-site=eqiad&var-cluster=k8s&var-namespace=mw-script - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus [03:34:38] FIRING: [20x] CertAlmostExpired: Certificate for service wdqs1011:443 is about to expire - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [03:54:38] FIRING: [20x] CertAlmostExpired: Certificate for service wdqs1012:443 is about to expire - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [04:04:48] FIRING: [66x] CertAlmostExpired: Certificate for service logstash1023:443 is about to expire - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [04:08:24] FIRING: [16x] ProbeDown: Service aqs1010-a:7000 has failed probes (tcp_cassandra_a_ssl_ip4) -
https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [04:26:40] FIRING: SystemdUnitFailed: wmf_auto_restart_prometheus-blazegraph-exporter-wdqs-blazegraph.service on wdqs1016:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [04:59:52] FIRING: [32x] CertAlmostExpired: Certificate for service people1005:30443 is about to expire - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [05:09:38] FIRING: [20x] CertAlmostExpired: Certificate for service wdqs1012:443 is about to expire - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [05:13:40] FIRING: SystemdUnitFailed: wmf_auto_restart_rsyslog.service on ml-serve2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [05:14:38] FIRING: [17x] CertAlmostExpired: Certificate for service wdqs1012:443 is about to expire - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [06:14:38] FIRING: [10x] CertAlmostExpired: Certificate for service wdqs1012:443 is about to expire - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [06:24:38] FIRING: [10x] CertAlmostExpired: Certificate for service wdqs1012:443 is about to expire - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [06:28:02] (PS1) HakanIST: Add sva to wmgExtraLanguageNames [mediawiki-config] -
https://gerrit.wikimedia.org/r/1277256 (https://phabricator.wikimedia.org/T407106) [06:29:38] FIRING: [12x] CertAlmostExpired: Certificate for service wdqs1012:443 is about to expire - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [07:00:05] Deploy window No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20260426T0700) [07:34:03] FIRING: HelmReleaseBadStatus: Helm release mw-script/nngkzgw8 on k8s@eqiad in state pending-install - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://grafana.wikimedia.org/d/UT4GtK3nz?var-site=eqiad&var-cluster=k8s&var-namespace=mw-script - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus [08:04:48] FIRING: [66x] CertAlmostExpired: Certificate for service logstash1023:443 is about to expire - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [08:08:25] FIRING: [16x] ProbeDown: Service aqs1010-a:7000 has failed probes (tcp_cassandra_a_ssl_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [08:14:38] FIRING: [18x] CertAlmostExpired: Certificate for service wdqs1012:443 is about to expire - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [08:19:38] FIRING: [24x] CertAlmostExpired: Certificate for service wdqs1012:443 is about to expire - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [08:26:40] FIRING: SystemdUnitFailed: wmf_auto_restart_prometheus-blazegraph-exporter-wdqs-blazegraph.service on wdqs1016:9100 -
https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [08:44:38] FIRING: [25x] CertAlmostExpired: Certificate for service wdqs1011:443 is about to expire - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [08:59:52] FIRING: [32x] CertAlmostExpired: Certificate for service people1005:30443 is about to expire - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [09:13:40] FIRING: SystemdUnitFailed: wmf_auto_restart_rsyslog.service on ml-serve2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [09:24:38] FIRING: [25x] CertAlmostExpired: Certificate for service wdqs1011:443 is about to expire - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [09:54:38] FIRING: [23x] CertAlmostExpired: Certificate for service wdqs1011:443 is about to expire - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [10:04:38] FIRING: [22x] CertAlmostExpired: Certificate for service wdqs1011:443 is about to expire - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [10:09:38] FIRING: [20x] CertAlmostExpired: Certificate for service wdqs1011:443 is about to expire - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [10:14:38] FIRING: [19x] CertAlmostExpired: Certificate for service wdqs1011:443 is about to expire - 
https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [10:29:38] FIRING: [16x] CertAlmostExpired: Certificate for service wdqs1011:443 is about to expire - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [10:54:38] FIRING: [10x] CertAlmostExpired: Certificate for service wdqs1011:443 is about to expire - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [10:59:38] FIRING: [10x] CertAlmostExpired: Certificate for service wdqs1011:443 is about to expire - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [11:34:03] FIRING: HelmReleaseBadStatus: Helm release mw-script/nngkzgw8 on k8s@eqiad in state pending-install - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://grafana.wikimedia.org/d/UT4GtK3nz?var-site=eqiad&var-cluster=k8s&var-namespace=mw-script - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus [12:04:48] FIRING: [66x] CertAlmostExpired: Certificate for service logstash1023:443 is about to expire - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [12:08:25] FIRING: [16x] ProbeDown: Service aqs1010-a:7000 has failed probes (tcp_cassandra_a_ssl_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [12:19:40] PROBLEM - librenms.wikimedia.org requires authentication on netmon2002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/CAS-SSO/Administration [12:19:52] PROBLEM - SSH on netmon2002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds 
https://wikitech.wikimedia.org/wiki/SSH/monitoring [12:20:16] PROBLEM - librenms.wikimedia.org tls expiry on netmon2002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/CAS-SSO/Administration [12:24:19] FIRING: JobUnavailable: Reduced availability for job atlas_exporter in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [12:25:12] RECOVERY - librenms.wikimedia.org tls expiry on netmon2002 is OK: OK - Certificate librenms.wikimedia.org will expire on Sun 12 Jul 2026 02:51:32 AM GMT +0000. https://wikitech.wikimedia.org/wiki/CAS-SSO/Administration [12:26:40] FIRING: SystemdUnitFailed: wmf_auto_restart_prometheus-blazegraph-exporter-wdqs-blazegraph.service on wdqs1016:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [12:28:16] PROBLEM - librenms.wikimedia.org tls expiry on netmon2002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/CAS-SSO/Administration [12:30:14] RECOVERY - librenms.wikimedia.org tls expiry on netmon2002 is OK: OK - Certificate librenms.wikimedia.org will expire on Sun 12 Jul 2026 02:51:32 AM GMT +0000. https://wikitech.wikimedia.org/wiki/CAS-SSO/Administration [12:33:16] PROBLEM - librenms.wikimedia.org tls expiry on netmon2002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/CAS-SSO/Administration [12:40:14] RECOVERY - librenms.wikimedia.org tls expiry on netmon2002 is OK: OK - Certificate librenms.wikimedia.org will expire on Sun 12 Jul 2026 02:51:32 AM GMT +0000. 
https://wikitech.wikimedia.org/wiki/CAS-SSO/Administration [12:40:16] PROBLEM - Postfix SMTP on crm2001 is CRITICAL: CRITICAL - Certificate crm2001.codfw.wmnet expires in 15 day(s) (Tue 12 May 2026 12:40:00 PM GMT +0000). https://wikitech.wikimedia.org/wiki/Mail%23Troubleshooting [12:40:30] RECOVERY - librenms.wikimedia.org requires authentication on netmon2002 is OK: HTTP OK: Status line output matched HTTP/1.1 302 - 701 bytes in 0.136 second response time https://wikitech.wikimedia.org/wiki/CAS-SSO/Administration [12:40:42] RECOVERY - SSH on netmon2002 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u9 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring [12:49:19] RESOLVED: JobUnavailable: Reduced availability for job atlas_exporter in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [12:59:52] FIRING: [32x] CertAlmostExpired: Certificate for service people1005:30443 is about to expire - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [13:13:40] FIRING: SystemdUnitFailed: wmf_auto_restart_rsyslog.service on ml-serve2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [13:14:38] FIRING: [9x] CertAlmostExpired: Certificate for service wdqs1011:443 is about to expire - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [13:29:38] FIRING: [11x] CertAlmostExpired: Certificate for service wdqs1011:443 is about to expire - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [13:34:38] FIRING: [12x] CertAlmostExpired: Certificate 
for service wdqs1011:443 is about to expire - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [13:59:34] FIRING: [236x] CertAlmostExpired: Certificate for service logstash1023:443 is about to expire - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [13:59:38] FIRING: [14x] CertAlmostExpired: Certificate for service wdqs1011:443 is about to expire - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [14:04:34] FIRING: [237x] CertAlmostExpired: Certificate for service logstash1023:443 is about to expire - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [14:05:05] FIRING: [3x] PKICertificateExpiry: Intermediate certificate in the trust chain for discovery expires in 6d 23h 49m 25s - https://wikitech.wikimedia.org/wiki/PKI/CA_Operations - TODO - https://alerts.wikimedia.org/?q=alertname%3DPKICertificateExpiry [14:14:34] FIRING: [237x] CertAlmostExpired: Certificate for service logstash1023:443 is about to expire - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [14:19:34] FIRING: [237x] CertAlmostExpired: Certificate for service logstash1023:443 is about to expire - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [14:24:34] FIRING: [237x] CertAlmostExpired: Certificate for service logstash1023:443 is about to expire - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [14:29:38] FIRING: [14x] CertAlmostExpired: Certificate for service wdqs1011:443 is about to expire - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [14:34:16] PROBLEM - PyBal backends health check on lvs2014 is CRITICAL: PYBAL CRITICAL - CRITICAL - wdqs-main_443: Servers wdqs2021.codfw.wmnet, wdqs2007.codfw.wmnet, wdqs2011.codfw.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal 
[14:34:34] FIRING: [236x] CertAlmostExpired: Certificate for service logstash1023:443 is about to expire - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [14:34:38] FIRING: [11x] CertAlmostExpired: Certificate for service wdqs1011:443 is about to expire - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [14:34:48] PROBLEM - PyBal backends health check on lvs2013 is CRITICAL: PYBAL CRITICAL - CRITICAL - wdqs-main_443: Servers wdqs2013.codfw.wmnet, wdqs2021.codfw.wmnet, wdqs2007.codfw.wmnet, wdqs2014.codfw.wmnet, wdqs2011.codfw.wmnet, wdqs2010.codfw.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal [14:34:53] FIRING: [11x] CertAlmostExpired: Certificate for service wdqs1011:443 is about to expire - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [14:35:48] RECOVERY - PyBal backends health check on lvs2013 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [14:37:16] RECOVERY - PyBal backends health check on lvs2014 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [14:39:34] FIRING: [235x] CertAlmostExpired: Certificate for service logstash1023:443 is about to expire - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [14:39:38] FIRING: [2x] CertAlmostExpired: Certificate for service wdqs1019:443 is about to expire - https://wikitech.wikimedia.org/wiki/TLS/Runbook#wdqs1019:443 - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [14:40:16] PROBLEM - PyBal backends health check on lvs2014 is CRITICAL: PYBAL CRITICAL - CRITICAL - wdqs-main_443: Servers wdqs2013.codfw.wmnet, wdqs2021.codfw.wmnet, wdqs2014.codfw.wmnet, wdqs2008.codfw.wmnet, wdqs2022.codfw.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal 
[14:42:16] RECOVERY - PyBal backends health check on lvs2014 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [14:44:34] FIRING: [235x] CertAlmostExpired: Certificate for service logstash1023:443 is about to expire - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [14:49:34] FIRING: [236x] CertAlmostExpired: Certificate for service logstash1023:443 is about to expire - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [14:59:34] FIRING: [236x] CertAlmostExpired: Certificate for service logstash1023:443 is about to expire - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [15:24:34] FIRING: [237x] CertAlmostExpired: Certificate for service logstash1023:443 is about to expire - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [15:34:03] FIRING: HelmReleaseBadStatus: Helm release mw-script/nngkzgw8 on k8s@eqiad in state pending-install - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://grafana.wikimedia.org/d/UT4GtK3nz?var-site=eqiad&var-cluster=k8s&var-namespace=mw-script - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus [15:44:38] FIRING: [4x] CertAlmostExpired: Certificate for service wdqs1013:443 is about to expire - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [15:59:34] FIRING: [237x] CertAlmostExpired: Certificate for service logstash1023:443 is about to expire - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [16:04:34] FIRING: [237x] CertAlmostExpired: Certificate for service logstash1023:443 is about to expire - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [16:08:25] FIRING: [16x] ProbeDown: Service aqs1010-a:7000 has failed probes (tcp_cassandra_a_ssl_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - 
https://alerts.wikimedia.org/?q=alertname%3DProbeDown [16:09:19] FIRING: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [16:09:34] FIRING: [237x] CertAlmostExpired: Certificate for service logstash1023:443 is about to expire - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [16:19:34] FIRING: [237x] CertAlmostExpired: Certificate for service logstash1023:443 is about to expire - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [16:26:40] FIRING: SystemdUnitFailed: wmf_auto_restart_prometheus-blazegraph-exporter-wdqs-blazegraph.service on wdqs1016:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [16:34:19] RESOLVED: [2x] JobUnavailable: Reduced availability for job sidekiq in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [16:44:34] FIRING: [237x] CertAlmostExpired: Certificate for service logstash1023:443 is about to expire - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [16:46:32] PROBLEM - PyBal backends health check on lvs1019 is CRITICAL: PYBAL CRITICAL - CRITICAL - eventstreams_4892: Servers wikikube-worker1280.eqiad.wmnet, wikikube-worker1067.eqiad.wmnet, wikikube-worker1303.eqiad.wmnet, wikikube-worker1007.eqiad.wmnet, wikikube-worker1165.eqiad.wmnet, wikikube-worker1079.eqiad.wmnet, wikikube-worker1161.eqiad.wmnet, wikikube-worker1304.eqiad.wmnet, wikikube-worker1298.eqiad.wmnet, wikikube-worker1306.eqiad.wm [16:46:32] ikube-worker1259.eqiad.wmnet, wikikube-worker1155.eqiad.wmnet, 
wikikube-worker1257.eqiad.wmnet, wikikube-worker1348.eqiad.wmnet, wikikube-worker1242.eqiad.wmnet, wikikube-worker1050.eqiad.wmnet, wikikube-worker1042.eqiad.wmnet, wikikube-worker1122.eqiad.wmnet, wikikube-worker1297.eqiad.wmnet, wikikube-worker1080.eqiad.wmnet, wikikube-worker1049.eqiad.wmnet, wikikube-worker1359.eqiad.wmnet, wikikube-worker1260.eqiad.wmnet, wikikube-worker1 [16:46:32] d.wmnet, wikikube-worker1047.eqiad.wmnet, wikikube-worker1352.eqiad.wmnet, wikikube-worker1144.eqiad.wmnet, wikikube-worker1118.eqiad.wmnet, wikikube-worker1077.eqiad.wmnet, wikikube-wo https://wikitech.wikimedia.org/wiki/PyBal [16:46:44] PROBLEM - PyBal backends health check on lvs1020 is CRITICAL: PYBAL CRITICAL - CRITICAL - eventstreams_4892: Servers wikikube-worker1280.eqiad.wmnet, wikikube-worker1144.eqiad.wmnet, wikikube-worker1291.eqiad.wmnet, wikikube-worker1322.eqiad.wmnet, wikikube-worker1304.eqiad.wmnet, wikikube-worker1148.eqiad.wmnet, wikikube-worker1345.eqiad.wmnet, wikikube-worker1116.eqiad.wmnet, wikikube-worker1281.eqiad.wmnet, wikikube-worker1036.eqiad.wm [16:46:44] ikube-worker1371.eqiad.wmnet, wikikube-worker1315.eqiad.wmnet, wikikube-worker1132.eqiad.wmnet, wikikube-worker1247.eqiad.wmnet, wikikube-worker1136.eqiad.wmnet, wikikube-worker1260.eqiad.wmnet, wikikube-worker1279.eqiad.wmnet, wikikube-worker1282.eqiad.wmnet, wikikube-worker1263.eqiad.wmnet, wikikube-worker1338.eqiad.wmnet, wikikube-worker1307.eqiad.wmnet, wikikube-worker1296.eqiad.wmnet, wikikube-worker1159.eqiad.wmnet, wikikube-worker1 [16:46:44] d.wmnet, wikikube-worker1288.eqiad.wmnet, wikikube-worker1278.eqiad.wmnet, wikikube-worker1340.eqiad.wmnet, wikikube-worker1289.eqiad.wmnet, wikikube-worker1135.eqiad.wmnet, wikikube-wo https://wikitech.wikimedia.org/wiki/PyBal [16:48:30] RECOVERY - PyBal backends health check on lvs1019 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [16:48:44] RECOVERY - PyBal backends health check on lvs1020 is OK: PYBAL OK - All pools 
are healthy https://wikitech.wikimedia.org/wiki/PyBal [16:49:19] FIRING: [3x] JobUnavailable: Reduced availability for job atlas_exporter in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [16:49:34] FIRING: [237x] CertAlmostExpired: Certificate for service logstash1023:443 is about to expire - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [16:54:19] RESOLVED: JobUnavailable: Reduced availability for job atlas_exporter in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [16:54:34] FIRING: [237x] CertAlmostExpired: Certificate for service logstash1023:443 is about to expire - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [16:59:52] FIRING: [32x] CertAlmostExpired: Certificate for service people1005:30443 is about to expire - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [17:13:40] FIRING: SystemdUnitFailed: wmf_auto_restart_rsyslog.service on ml-serve2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [17:14:38] FIRING: [5x] CertAlmostExpired: Certificate for service wdqs1013:443 is about to expire - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [17:29:34] FIRING: [237x] CertAlmostExpired: Certificate for service logstash1023:443 is about to expire - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [17:34:34] FIRING: [237x] CertAlmostExpired: Certificate for service logstash1023:443 is about to 
expire - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [17:34:38] FIRING: [6x] CertAlmostExpired: Certificate for service wdqs1013:443 is about to expire - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [17:39:34] FIRING: [237x] CertAlmostExpired: Certificate for service logstash1023:443 is about to expire - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [17:39:38] FIRING: [11x] CertAlmostExpired: Certificate for service wdqs1013:443 is about to expire - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [17:44:38] FIRING: [19x] CertAlmostExpired: Certificate for service wdqs1011:443 is about to expire - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [17:49:38] FIRING: [23x] CertAlmostExpired: Certificate for service wdqs1011:443 is about to expire - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [17:54:34] FIRING: [237x] CertAlmostExpired: Certificate for service logstash1023:443 is about to expire - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [17:54:38] FIRING: [27x] CertAlmostExpired: Certificate for service wdqs1011:443 is about to expire - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [18:05:20] FIRING: [3x] PKICertificateExpiry: Intermediate certificate in the trust chain for discovery expires in 6d 19h 49m 25s - https://wikitech.wikimedia.org/wiki/PKI/CA_Operations - TODO - https://alerts.wikimedia.org/?q=alertname%3DPKICertificateExpiry [18:19:34] FIRING: [237x] CertAlmostExpired: Certificate for service logstash1023:443 is about to expire - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [18:19:38] 
FIRING: [27x] CertAlmostExpired: Certificate for service wdqs1011:443 is about to expire - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [18:24:34] FIRING: [237x] CertAlmostExpired: Certificate for service logstash1023:443 is about to expire - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [18:24:38] FIRING: [26x] CertAlmostExpired: Certificate for service wdqs1011:443 is about to expire - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [18:29:38] FIRING: [28x] CertAlmostExpired: Certificate for service wdqs1011:443 is about to expire - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [18:34:38] FIRING: [28x] CertAlmostExpired: Certificate for service wdqs1011:443 is about to expire - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [18:39:38] FIRING: [24x] CertAlmostExpired: Certificate for service wdqs1011:443 is about to expire - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [18:44:34] FIRING: [237x] CertAlmostExpired: Certificate for service logstash1023:443 is about to expire - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [18:49:34] FIRING: [237x] CertAlmostExpired: Certificate for service logstash1023:443 is about to expire - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [18:59:34] FIRING: [237x] CertAlmostExpired: Certificate for service logstash1023:443 is about to expire - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [19:04:34] FIRING: [237x] CertAlmostExpired: Certificate for service logstash1023:443 is about to expire - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [19:09:38] FIRING: 
[22x] CertAlmostExpired: Certificate for service wdqs1011:443 is about to expire - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [19:14:34] FIRING: [236x] CertAlmostExpired: Certificate for service logstash1023:443 is about to expire - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [19:14:38] FIRING: [22x] CertAlmostExpired: Certificate for service wdqs1011:443 is about to expire - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [19:19:34] FIRING: [236x] CertAlmostExpired: Certificate for service logstash1023:443 is about to expire - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [19:19:38] FIRING: [20x] CertAlmostExpired: Certificate for service wdqs1011:443 is about to expire - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [19:29:34] FIRING: [236x] CertAlmostExpired: Certificate for service logstash1023:443 is about to expire - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [19:34:03] FIRING: HelmReleaseBadStatus: Helm release mw-script/nngkzgw8 on k8s@eqiad in state pending-install - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://grafana.wikimedia.org/d/UT4GtK3nz?var-site=eqiad&var-cluster=k8s&var-namespace=mw-script - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus [19:39:34] FIRING: [237x] CertAlmostExpired: Certificate for service logstash1023:443 is about to expire - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [19:54:34] FIRING: [237x] CertAlmostExpired: Certificate for service logstash1023:443 is about to expire - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [19:59:34] FIRING: [237x] CertAlmostExpired: Certificate for service logstash1023:443 is about to expire - 
https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [20:08:25] FIRING: [16x] ProbeDown: Service aqs1010-a:7000 has failed probes (tcp_cassandra_a_ssl_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown [20:24:34] FIRING: [236x] CertAlmostExpired: Certificate for service logstash1023:443 is about to expire - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [20:24:49] FIRING: [236x] CertAlmostExpired: Certificate for service logstash1023:443 is about to expire - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [20:26:40] FIRING: SystemdUnitFailed: wmf_auto_restart_prometheus-blazegraph-exporter-wdqs-blazegraph.service on wdqs1016:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [20:29:34] FIRING: [237x] CertAlmostExpired: Certificate for service logstash1023:443 is about to expire - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [20:44:34] FIRING: [237x] CertAlmostExpired: Certificate for service logstash1023:443 is about to expire - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [20:44:41] ops-eqiad, DC-Ops: Alert for device ps1-b2-eqiad.mgmt.eqiad.wmnet - PDU sensor over limit - https://phabricator.wikimedia.org/T424455 (phaultfinder) NEW [20:54:19] FIRING: JobUnavailable: Reduced availability for job atlas_exporter in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [20:59:19] RESOLVED: JobUnavailable: Reduced availability for job atlas_exporter in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - 
https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [20:59:52] FIRING: [32x] CertAlmostExpired: Certificate for service people1005:30443 is about to expire - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [21:04:34] FIRING: [237x] CertAlmostExpired: Certificate for service logstash1023:443 is about to expire - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [21:09:34] FIRING: [237x] CertAlmostExpired: Certificate for service logstash1023:443 is about to expire - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [21:09:38] FIRING: [17x] CertAlmostExpired: Certificate for service wdqs1011:443 is about to expire - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [21:13:40] FIRING: SystemdUnitFailed: wmf_auto_restart_rsyslog.service on ml-serve2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [21:34:34] FIRING: [237x] CertAlmostExpired: Certificate for service logstash1023:443 is about to expire - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [21:39:34] FIRING: [237x] CertAlmostExpired: Certificate for service logstash1023:443 is about to expire - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [22:05:20] FIRING: [3x] PKICertificateExpiry: Intermediate certificate in the trust chain for discovery expires in 6d 15h 49m 25s - https://wikitech.wikimedia.org/wiki/PKI/CA_Operations - TODO - https://alerts.wikimedia.org/?q=alertname%3DPKICertificateExpiry [22:19:34] FIRING: [237x] CertAlmostExpired: Certificate for service logstash1023:443 is about to expire - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [22:19:38] FIRING: [16x] 
CertAlmostExpired: Certificate for service wdqs1012:443 is about to expire - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [22:24:38] FIRING: [18x] CertAlmostExpired: Certificate for service wdqs1012:443 is about to expire - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [22:29:34] FIRING: [237x] CertAlmostExpired: Certificate for service logstash1023:443 is about to expire - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [22:29:38] FIRING: [18x] CertAlmostExpired: Certificate for service wdqs1012:443 is about to expire - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [22:34:38] FIRING: [18x] CertAlmostExpired: Certificate for service wdqs1012:443 is about to expire - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [22:39:34] FIRING: [237x] CertAlmostExpired: Certificate for service logstash1023:443 is about to expire - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [22:39:38] FIRING: [18x] CertAlmostExpired: Certificate for service wdqs1012:443 is about to expire - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [22:44:34] FIRING: [237x] CertAlmostExpired: Certificate for service logstash1023:443 is about to expire - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [22:44:38] FIRING: [18x] CertAlmostExpired: Certificate for service wdqs1012:443 is about to expire - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [23:03:43] ops-eqiad, SRE, DC-Ops: Alert for device ps1-b2-eqiad.mgmt.eqiad.wmnet - PDU sensor over limit - 
https://phabricator.wikimedia.org/T424455#11859096 (Jclark-ctr) Open→Resolved a:Jclark-ctr [23:14:34] FIRING: [237x] CertAlmostExpired: Certificate for service logstash1023:443 is about to expire - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [23:19:34] FIRING: [237x] CertAlmostExpired: Certificate for service logstash1023:443 is about to expire - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [23:30:05] (Abandoned) TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1277247 (owner: TrainBranchBot) [23:34:03] FIRING: HelmReleaseBadStatus: Helm release mw-script/nngkzgw8 on k8s@eqiad in state pending-install - https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments#Rolling_back_in_an_emergency - https://grafana.wikimedia.org/d/UT4GtK3nz?var-site=eqiad&var-cluster=k8s&var-namespace=mw-script - https://alerts.wikimedia.org/?q=alertname%3DHelmReleaseBadStatus [23:39:34] FIRING: [237x] CertAlmostExpired: Certificate for service logstash1023:443 is about to expire - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [23:39:38] FIRING: [18x] CertAlmostExpired: Certificate for service wdqs1012:443 is about to expire - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [23:39:44] (PS1) TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1277282 [23:39:44] (CR) TrainBranchBot: [C:+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1277282 (owner: TrainBranchBot) [23:44:34] FIRING: [237x] CertAlmostExpired: Certificate for service logstash1023:443 is about to expire - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [23:44:38] FIRING: [19x] CertAlmostExpired: Certificate for service wdqs1012:443 is about to expire - 
https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired [23:50:10] (Merged) jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1277282 (owner: TrainBranchBot) [23:54:19] FIRING: JobUnavailable: Reduced availability for job atlas_exporter in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [23:59:19] RESOLVED: JobUnavailable: Reduced availability for job atlas_exporter in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [23:59:38] FIRING: [21x] CertAlmostExpired: Certificate for service wdqs1012:443 is about to expire - https://grafana.wikimedia.org/d/K1dRhGCnz/probes-tls-dashboard - https://alerts.wikimedia.org/?q=alertname%3DCertAlmostExpired