[00:16:36] <icinga-wm>	 PROBLEM - Router interfaces on cr1-codfw is CRITICAL: CRITICAL: host 208.80.153.192, interfaces up: 131, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[00:17:38] <icinga-wm>	 PROBLEM - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 241, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[00:33:55] <wikibugs>	 ops-eqiad: PDU sensor over limit - https://phabricator.wikimedia.org/T314997 (phaultfinder)
[00:42:02] <icinga-wm>	 RECOVERY - Check systemd state on logstash1026 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:42:07] <wikibugs>	 (PS1) Stang: kowiki: Add logo (legacy vector and vector-2022) for 600k articles [mediawiki-config] - https://gerrit.wikimedia.org/r/822717 (https://phabricator.wikimedia.org/T315127)
[00:42:56] <icinga-wm>	 RECOVERY - Check systemd state on logstash2026 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:44:27] <wikibugs>	 (PS1) Stang: kowiki: Change logo for 600k articles [mediawiki-config] - https://gerrit.wikimedia.org/r/822718 (https://phabricator.wikimedia.org/T315127)
[00:58:42] <icinga-wm>	 RECOVERY - SSH on wtp1036.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[01:02:55] <wikibugs>	 (PS2) Stang: kowiki: Add logo (legacy vector and vector-2022) for 600k articles [mediawiki-config] - https://gerrit.wikimedia.org/r/822717 (https://phabricator.wikimedia.org/T315127)
[01:03:21] <wikibugs>	 (PS2) Stang: kowiki: Change logo for 600k articles [mediawiki-config] - https://gerrit.wikimedia.org/r/822718 (https://phabricator.wikimedia.org/T315127)
[01:04:06] <wikibugs>	 (CR) CI reject: [V: -1] kowiki: Change logo for 600k articles [mediawiki-config] - https://gerrit.wikimedia.org/r/822718 (https://phabricator.wikimedia.org/T315127) (owner: Stang)
[01:08:32] <wikibugs>	 (PS3) Stang: kowiki: Change logo for 600k articles [mediawiki-config] - https://gerrit.wikimedia.org/r/822718 (https://phabricator.wikimedia.org/T315127)
[01:30:52] <icinga-wm>	 RECOVERY - Check systemd state on gitlab1003 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:39:45] <jinxer-wm>	 (JobUnavailable) firing: (9) Reduced availability for job nginx in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[01:46:54] <icinga-wm>	 PROBLEM - puppet last run on gitlab1003 is CRITICAL: CRITICAL: Puppet last ran 1 day ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[01:49:45] <jinxer-wm>	 (JobUnavailable) firing: (11) Reduced availability for job gitaly in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[01:50:08] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase-dev1006 is CRITICAL: /en.wikipedia.org/v1/page/talk/{title} (Get structured talk page for enwiki Salt article) is CRITICAL: Test Get structured talk page for enwiki Salt article returned the unexpected status 503 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[01:50:28] <icinga-wm>	 RECOVERY - SSH on db1110.mgmt is OK: SSH OK - OpenSSH_7.4 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[01:52:02] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase-dev1006 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[01:52:50] <icinga-wm>	 PROBLEM - Check systemd state on webperf2004 is CRITICAL: CRITICAL - degraded: The following units failed: arclamp_generate_svgs.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:55:13] <jinxer-wm>	 (ThanosCompactIsDown) firing: Thanos component has disappeared. - https://wikitech.wikimedia.org/wiki/Thanos#Alerts - https://grafana.wikimedia.org/d/0cb8830a6e957978796729870f560cda/thanos-overview - https://alerts.wikimedia.org/?q=alertname%3DThanosCompactIsDown
[02:01:34] <icinga-wm>	 RECOVERY - Check systemd state on webperf2004 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:09:45] <jinxer-wm>	 (JobUnavailable) firing: (11) Reduced availability for job gitaly in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:12:52] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on api_appserver in codfw on alert1001 is CRITICAL: cluster=api_appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=codfw+prometheus/ops&var-cluster=api_appserver&var-method=GET
[02:13:18] <icinga-wm>	 PROBLEM - Check systemd state on webperf2004 is CRITICAL: CRITICAL - degraded: The following units failed: arclamp_generate_svgs.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:15:40] <icinga-wm>	 RECOVERY - Check systemd state on webperf2004 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:17:32] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on api_appserver in codfw on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=codfw+prometheus/ops&var-cluster=api_appserver&var-method=GET
[02:17:58] <icinga-wm>	 RECOVERY - puppet last run on gitlab1003 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[02:19:45] <jinxer-wm>	 (JobUnavailable) firing: (11) Reduced availability for job gitaly in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:25:18] <icinga-wm>	 PROBLEM - Check systemd state on webperf1004 is CRITICAL: CRITICAL - degraded: The following units failed: arclamp_generate_svgs.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:27:24] <icinga-wm>	 PROBLEM - Check systemd state on webperf2004 is CRITICAL: CRITICAL - degraded: The following units failed: arclamp_generate_svgs.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:29:12] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on api_appserver in codfw on alert1001 is CRITICAL: cluster=api_appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=codfw+prometheus/ops&var-cluster=api_appserver&var-method=GET
[02:31:28] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on api_appserver in codfw on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=codfw+prometheus/ops&var-cluster=api_appserver&var-method=GET
[02:31:54] <icinga-wm>	 RECOVERY - Check systemd state on webperf2004 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:32:08] <icinga-wm>	 RECOVERY - Check systemd state on webperf1004 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:38:58] <icinga-wm>	 PROBLEM - Check systemd state on webperf2004 is CRITICAL: CRITICAL - degraded: The following units failed: arclamp_generate_svgs.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:45:59] <icinga-wm>	 RECOVERY - Check systemd state on webperf2004 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:59:06] <icinga-wm>	 PROBLEM - Check systemd state on webperf2004 is CRITICAL: CRITICAL - degraded: The following units failed: arclamp_generate_svgs.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[03:01:16] <icinga-wm>	 RECOVERY - Check systemd state on webperf2004 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[03:03:10] <icinga-wm>	 PROBLEM - Router interfaces on cr2-eqsin is CRITICAL: CRITICAL: host 103.102.166.130, interfaces up: 69, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[03:03:46] <icinga-wm>	 PROBLEM - Router interfaces on cr4-ulsfo is CRITICAL: CRITICAL: host 198.35.26.193, interfaces up: 69, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[03:14:44] <icinga-wm>	 PROBLEM - Check systemd state on webperf2004 is CRITICAL: CRITICAL - degraded: The following units failed: arclamp_generate_svgs.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[03:17:08] <icinga-wm>	 RECOVERY - Check systemd state on webperf2004 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[03:27:13] <jinxer-wm>	 (KubernetesRsyslogDown) firing: (2) rsyslog on kubestage2001:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[03:37:30] <wikibugs>	 SRE, Infrastructure-Foundations, netops, cloud-services-team (Kanban): More public IPs for codfw1dev - https://phabricator.wikimedia.org/T313977 (Andrew) Open→Resolved all good now!
[03:40:58] <icinga-wm>	 PROBLEM - Check systemd state on webperf1004 is CRITICAL: CRITICAL - degraded: The following units failed: arclamp_generate_svgs.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[03:45:38] <icinga-wm>	 RECOVERY - Check systemd state on webperf1004 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[03:57:04] <icinga-wm>	 RECOVERY - Router interfaces on cr2-eqsin is OK: OK: host 103.102.166.130, interfaces up: 70, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[03:57:22] <icinga-wm>	 PROBLEM - Check systemd state on webperf1004 is CRITICAL: CRITICAL - degraded: The following units failed: arclamp_generate_svgs.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[03:57:46] <icinga-wm>	 RECOVERY - Router interfaces on cr4-ulsfo is OK: OK: host 198.35.26.193, interfaces up: 70, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[04:02:04] <icinga-wm>	 RECOVERY - Check systemd state on webperf1004 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[04:11:28] <icinga-wm>	 PROBLEM - Check systemd state on webperf1004 is CRITICAL: CRITICAL - degraded: The following units failed: arclamp_generate_svgs.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[04:14:16] <icinga-wm>	 PROBLEM - Router interfaces on cr4-ulsfo is CRITICAL: CRITICAL: host 198.35.26.193, interfaces up: 69, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[04:15:58] <icinga-wm>	 PROBLEM - Router interfaces on cr2-eqsin is CRITICAL: CRITICAL: host 103.102.166.130, interfaces up: 69, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[04:16:12] <icinga-wm>	 RECOVERY - Check systemd state on webperf1004 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[04:28:04] <icinga-wm>	 PROBLEM - Check systemd state on webperf1004 is CRITICAL: CRITICAL - degraded: The following units failed: arclamp_generate_svgs.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[04:30:24] <icinga-wm>	 RECOVERY - Check systemd state on webperf1004 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[04:32:14] <icinga-wm>	 PROBLEM - PHD should be supervising processes on phab1001 is CRITICAL: PROCS CRITICAL: 2 processes with UID = 497 (phd) https://wikitech.wikimedia.org/wiki/Phabricator
[04:34:38] <icinga-wm>	 RECOVERY - PHD should be supervising processes on phab1001 is OK: PROCS OK: 10 processes with UID = 497 (phd) https://wikitech.wikimedia.org/wiki/Phabricator
[04:54:22] <icinga-wm>	 PROBLEM - SSH on db1110.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[05:29:36] <icinga-wm>	 PROBLEM - Check systemd state on webperf1004 is CRITICAL: CRITICAL - degraded: The following units failed: arclamp_generate_svgs.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[05:32:00] <icinga-wm>	 RECOVERY - Check systemd state on webperf1004 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[05:43:50] <icinga-wm>	 PROBLEM - Check systemd state on webperf1004 is CRITICAL: CRITICAL - degraded: The following units failed: arclamp_generate_svgs.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[05:52:46] <icinga-wm>	 PROBLEM - PHD should be supervising processes on phab1001 is CRITICAL: PROCS CRITICAL: 2 processes with UID = 497 (phd) https://wikitech.wikimedia.org/wiki/Phabricator
[05:55:08] <icinga-wm>	 RECOVERY - PHD should be supervising processes on phab1001 is OK: PROCS OK: 4 processes with UID = 497 (phd) https://wikitech.wikimedia.org/wiki/Phabricator
[05:55:13] <jinxer-wm>	 (ThanosCompactIsDown) firing: Thanos component has disappeared. - https://wikitech.wikimedia.org/wiki/Thanos#Alerts - https://grafana.wikimedia.org/d/0cb8830a6e957978796729870f560cda/thanos-overview - https://alerts.wikimedia.org/?q=alertname%3DThanosCompactIsDown
[06:00:24] <icinga-wm>	 RECOVERY - Check systemd state on webperf1004 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:20:00] <jinxer-wm>	 (JobUnavailable) firing: Reduced availability for job thanos-compact in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[06:41:36] <icinga-wm>	 RECOVERY - SSH on wtp1044.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[07:00:05] <jouncebot>	 Deploy window No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220814T0700)
[07:06:38] <icinga-wm>	 PROBLEM - SSH on wtp1036.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[07:27:13] <jinxer-wm>	 (KubernetesRsyslogDown) firing: (2) rsyslog on kubestage2001:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[08:17:50] <icinga-wm>	 PROBLEM - Check systemd state on logstash2026 is CRITICAL: CRITICAL - degraded: The following units failed: curator_actions_cluster_wide.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[08:33:12] <icinga-wm>	 PROBLEM - PHD should be supervising processes on phab1001 is CRITICAL: PROCS CRITICAL: 2 processes with UID = 497 (phd) https://wikitech.wikimedia.org/wiki/Phabricator
[08:35:34] <icinga-wm>	 RECOVERY - PHD should be supervising processes on phab1001 is OK: PROCS OK: 4 processes with UID = 497 (phd) https://wikitech.wikimedia.org/wiki/Phabricator
[08:54:13] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
[08:54:38] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
[08:54:44] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1141 (T312863)', diff saved to https://phabricator.wikimedia.org/P32380 and previous config saved to /var/cache/conftool/dbconfig/20220814-085443-ladsgroup.json
[08:54:47] <stashbot>	 T312863: Schema change to change primary key of templatelinks - https://phabricator.wikimedia.org/T312863
[08:57:28] <icinga-wm>	 PROBLEM - Check systemd state on webperf1004 is CRITICAL: CRITICAL - degraded: The following units failed: arclamp_generate_svgs.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[08:59:40] <icinga-wm>	 RECOVERY - SSH on db1110.mgmt is OK: SSH OK - OpenSSH_7.4 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[09:02:10] <icinga-wm>	 RECOVERY - Check systemd state on webperf1004 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[09:55:13] <jinxer-wm>	 (ThanosCompactIsDown) firing: Thanos component has disappeared. - https://wikitech.wikimedia.org/wiki/Thanos#Alerts - https://grafana.wikimedia.org/d/0cb8830a6e957978796729870f560cda/thanos-overview - https://alerts.wikimedia.org/?q=alertname%3DThanosCompactIsDown
[10:13:06] <icinga-wm>	 PROBLEM - Check systemd state on webperf1004 is CRITICAL: CRITICAL - degraded: The following units failed: arclamp_generate_svgs.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:15:28] <icinga-wm>	 RECOVERY - Check systemd state on webperf1004 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:20:00] <jinxer-wm>	 (JobUnavailable) firing: Reduced availability for job thanos-compact in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[10:27:28] <icinga-wm>	 PROBLEM - Check systemd state on webperf2004 is CRITICAL: CRITICAL - degraded: The following units failed: arclamp_generate_svgs.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:32:14] <icinga-wm>	 RECOVERY - Check systemd state on webperf2004 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[11:11:56] <icinga-wm>	 RECOVERY - SSH on wtp1036.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[11:19:13] <wikibugs>	 (PS1) Urbanecm: Add new throttle rule [mediawiki-config] - https://gerrit.wikimedia.org/r/822725 (https://phabricator.wikimedia.org/T315182)
[11:22:57] <wikibugs>	 (CR) Urbanecm: [C: +1] "change would work, but 'w' is the more standard prefix used in import sources." [mediawiki-config] - https://gerrit.wikimedia.org/r/819071 (https://phabricator.wikimedia.org/T314820) (owner: MdsShakil)
[11:27:13] <jinxer-wm>	 (KubernetesRsyslogDown) firing: (2) rsyslog on kubestage2001:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[11:28:50] <wikibugs>	 (CR) Urbanecm: [C: -1] "extension needs to be present in at least two trains to be addable to extension-list (scap sync-world breaks otherwise). looks to be only " [mediawiki-config] - https://gerrit.wikimedia.org/r/821249 (https://phabricator.wikimedia.org/T314294) (owner: Samtar)
[11:46:06] <wikibugs>	 (CR) Urbanecm: [C: -1] "Personally speaking, I don't like w.wiki URLs in user agents. Contact/link info in user agents is usually important when the infrastructur" [mediawiki-config] - https://gerrit.wikimedia.org/r/821246 (owner: Samtar)
[11:59:01] <wikibugs>	 (CR) JMeybohm: Create basic haproxy container (1 comment) [docker-images/production-images] - https://gerrit.wikimedia.org/r/821275 (https://phabricator.wikimedia.org/T233196) (owner: Hnowlan)
[12:09:18] <icinga-wm>	 PROBLEM - Check systemd state on webperf2004 is CRITICAL: CRITICAL - degraded: The following units failed: arclamp_generate_svgs.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:16:26] <icinga-wm>	 RECOVERY - Check systemd state on webperf2004 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:54:14] <icinga-wm>	 PROBLEM - Check systemd state on webperf2004 is CRITICAL: CRITICAL - degraded: The following units failed: arclamp_generate_svgs.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:01:18] <icinga-wm>	 RECOVERY - Check systemd state on webperf2004 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:05:14] <icinga-wm>	 PROBLEM - SSH on db1110.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[13:22:40] <icinga-wm>	 PROBLEM - Check systemd state on webperf2004 is CRITICAL: CRITICAL - degraded: The following units failed: arclamp_generate_svgs.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:28:49] <wikibugs>	 ops-eqiad: PDU sensor over limit - https://phabricator.wikimedia.org/T314997 (phaultfinder)
[13:32:08] <icinga-wm>	 RECOVERY - Check systemd state on webperf2004 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:37:44] <wikibugs>	 (PS12) MdsShakil: Add bnwiki in wgImportSources to bnwikibooks [mediawiki-config] - https://gerrit.wikimedia.org/r/819071 (https://phabricator.wikimedia.org/T314820)
[13:40:17] <wikibugs>	 (CR) MdsShakil: Add bnwiki in wgImportSources to bnwikibooks (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/819071 (https://phabricator.wikimedia.org/T314820) (owner: MdsShakil)
[13:40:49] <wikibugs>	 (CR) MdsShakil: "recheck" [mediawiki-config] - https://gerrit.wikimedia.org/r/819071 (https://phabricator.wikimedia.org/T314820) (owner: MdsShakil)
[13:55:13] <jinxer-wm>	 (ThanosCompactIsDown) firing: Thanos component has disappeared. - https://wikitech.wikimedia.org/wiki/Thanos#Alerts - https://grafana.wikimedia.org/d/0cb8830a6e957978796729870f560cda/thanos-overview - https://alerts.wikimedia.org/?q=alertname%3DThanosCompactIsDown
[14:03:48] <wikibugs>	 ops-eqiad: PDU sensor over limit - https://phabricator.wikimedia.org/T314997 (phaultfinder)
[14:20:00] <jinxer-wm>	 (JobUnavailable) firing: Reduced availability for job thanos-compact in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[15:03:49] <wikibugs>	 ops-eqiad: PDU sensor over limit - https://phabricator.wikimedia.org/T314997 (phaultfinder)
[15:27:13] <jinxer-wm>	 (KubernetesRsyslogDown) firing: (2) rsyslog on kubestage2001:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[16:03:57] <wikibugs>	 (PS3) Urbanecm: Pin the reason migration stage to read and write old [mediawiki-config] - https://gerrit.wikimedia.org/r/820838 (https://phabricator.wikimedia.org/T233004) (owner: Dreamy Jazz)
[16:04:28] <wikibugs>	 (CR) Urbanecm: [C: +1] "LGTM, will merge tomorrow." [mediawiki-config] - https://gerrit.wikimedia.org/r/820838 (https://phabricator.wikimedia.org/T233004) (owner: Dreamy Jazz)
[16:05:12] <wikibugs>	 (PS4) Urbanecm: Pin wgCheckUserLogReasonMigrationStage to read and write old [mediawiki-config] - https://gerrit.wikimedia.org/r/820838 (https://phabricator.wikimedia.org/T233004) (owner: Dreamy Jazz)
[16:09:10] <icinga-wm>	 RECOVERY - SSH on db1110.mgmt is OK: SSH OK - OpenSSH_7.4 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[16:12:06] <icinga-wm>	 PROBLEM - Host cp1089.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[16:24:56] <icinga-wm>	 RECOVERY - Host cp1089.mgmt is UP: PING OK - Packet loss = 0%, RTA = 0.74 ms
[16:34:04] <icinga-wm>	 PROBLEM - Host cp1089.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[16:47:04] <icinga-wm>	 RECOVERY - Host cp1089.mgmt is UP: PING OK - Packet loss = 0%, RTA = 0.77 ms
[17:15:26] <icinga-wm>	 PROBLEM - Host cp1089.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[17:21:50] <icinga-wm>	 RECOVERY - Host cp1089.mgmt is UP: PING OK - Packet loss = 0%, RTA = 0.86 ms
[17:23:09] <wikibugs>	 (CR) BryanDavis: Introduce DriverInterface (1 comment) [docker-images/docker-pkg] - https://gerrit.wikimedia.org/r/699719 (owner: Giuseppe Lavagetto)
[17:30:18] <wikibugs>	 (PS1) BryanDavis: Add missing attrs dependency [docker-images/docker-pkg] - https://gerrit.wikimedia.org/r/822734
[17:31:02] <wikibugs>	 (CR) BryanDavis: Introduce DriverInterface (1 comment) [docker-images/docker-pkg] - https://gerrit.wikimedia.org/r/699719 (owner: Giuseppe Lavagetto)
[17:32:58] <wikibugs>	 (CR) CI reject: [V: -1] Add missing attrs dependency [docker-images/docker-pkg] - https://gerrit.wikimedia.org/r/822734 (owner: BryanDavis)
[17:50:05] <wikibugs>	 (PS1) David Caro: openstack: update control nodes after refresh [cookbooks] (wmcs) - https://gerrit.wikimedia.org/r/822735
[17:50:07] <wikibugs>	 (PS1) David Caro: wmcs.quota_increase: fix not needed parameter [cookbooks] (wmcs) - https://gerrit.wikimedia.org/r/822736
[17:55:13] <jinxer-wm>	 (ThanosCompactIsDown) firing: Thanos component has disappeared. - https://wikitech.wikimedia.org/wiki/Thanos#Alerts - https://grafana.wikimedia.org/d/0cb8830a6e957978796729870f560cda/thanos-overview - https://alerts.wikimedia.org/?q=alertname%3DThanosCompactIsDown
[17:56:44] <icinga-wm>	 PROBLEM - SSH on wtp1044.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[17:59:10] <icinga-wm>	 PROBLEM - Check systemd state on webperf2004 is CRITICAL: CRITICAL - degraded: The following units failed: arclamp_generate_svgs.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[18:01:34] <icinga-wm>	 RECOVERY - Check systemd state on webperf2004 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[18:03:08] <icinga-wm>	 PROBLEM - Host cp1089.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[18:09:36] <icinga-wm>	 RECOVERY - Host cp1089.mgmt is UP: PING WARNING - Packet loss = 66%, RTA = 0.83 ms
[18:20:00] <jinxer-wm>	 (JobUnavailable) firing: Reduced availability for job thanos-compact in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[18:26:03] <wikibugs>	 (CR) Andrew Bogott: "Apologies! I duplicated most of this patch in https://gerrit.wikimedia.org/r/c/operations/puppet/+/822145" [puppet] - https://gerrit.wikimedia.org/r/800949 (owner: Majavah)
[18:26:51] <wikibugs>	 (PS5) Andrew Bogott: P:openstack::glance: tidy up monitoring params [puppet] - https://gerrit.wikimedia.org/r/800949 (owner: Majavah)
[18:27:42] <icinga-wm>	 PROBLEM - Host cp1089.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[18:30:53] <wikibugs>	 (CR) Andrew Bogott: [C: +2] P:openstack::glance: tidy up monitoring params [puppet] - https://gerrit.wikimedia.org/r/800949 (owner: Majavah)
[18:31:40] <wikibugs>	 (PS5) Andrew Bogott: openstack::cinder: monitor the backend port [puppet] - https://gerrit.wikimedia.org/r/800950 (owner: Majavah)
[18:34:33] <wikibugs>	 (CR) Andrew Bogott: [C: +2] openstack::cinder: monitor the backend port [puppet] - https://gerrit.wikimedia.org/r/800950 (owner: Majavah)
[18:34:55] <wikibugs>	 (PS5) Andrew Bogott: openstack::nova: monitor the backend port [puppet] - https://gerrit.wikimedia.org/r/800951 (owner: Majavah)
[18:36:43] <wikibugs>	 (CR) Andrew Bogott: [C: +2] openstack::nova: monitor the backend port [puppet] - https://gerrit.wikimedia.org/r/800951 (owner: Majavah)
[18:37:29] <wikibugs>	 (PS5) Andrew Bogott: P:openstack::haproxy: codfw1dev: remove non-tls ports [puppet] - https://gerrit.wikimedia.org/r/800952 (https://phabricator.wikimedia.org/T267194) (owner: Majavah)
[19:12:32] <icinga-wm>	 PROBLEM - SSH on db1110.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[19:16:35] <wikibugs>	 (CR) Andrew Bogott: [C: +2] P:openstack::haproxy: codfw1dev: remove non-tls ports [puppet] - https://gerrit.wikimedia.org/r/800952 (https://phabricator.wikimedia.org/T267194) (owner: Majavah)
[19:19:06] <wikibugs>	 (PS5) Andrew Bogott: P:openstack::haproxy: eqiad1: remove non-tls ports [puppet] - https://gerrit.wikimedia.org/r/800953 (https://phabricator.wikimedia.org/T267194) (owner: Majavah)
[19:22:46] <icinga-wm>	 RECOVERY - Host cp1089.mgmt is UP: PING OK - Packet loss = 0%, RTA = 0.95 ms
[19:26:21] <wikibugs>	 (CR) Andrew Bogott: [C: +2] P:openstack::haproxy: eqiad1: remove non-tls ports [puppet] - https://gerrit.wikimedia.org/r/800953 (https://phabricator.wikimedia.org/T267194) (owner: Majavah)
[19:27:13] <jinxer-wm>	 (KubernetesRsyslogDown) firing: (2) rsyslog on kubestage2001:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[19:36:36] <wikibugs>	 (PS5) Andrew Bogott: P:openstack::designate::firewall: cleanup [puppet] - https://gerrit.wikimedia.org/r/800954 (https://phabricator.wikimedia.org/T267194) (owner: Majavah)
[19:39:18] <wikibugs>	 (CR) Andrew Bogott: [C: +2] "diff lgtm  https://puppet-compiler.wmflabs.org/pcc-worker1003/36736/cloudcontrol1005.wikimedia.org/index.html" [puppet] - https://gerrit.wikimedia.org/r/800954 (https://phabricator.wikimedia.org/T267194) (owner: Majavah)
[19:42:41] <wikibugs>	 (PS5) Andrew Bogott: P:openstack: misc cleanup for non-tls ports [puppet] - https://gerrit.wikimedia.org/r/800955 (https://phabricator.wikimedia.org/T267194) (owner: Majavah)
[19:49:02] <wikibugs>	 (CR) Andrew Bogott: [C: +2] P:openstack: misc cleanup for non-tls ports [puppet] - https://gerrit.wikimedia.org/r/800955 (https://phabricator.wikimedia.org/T267194) (owner: Majavah)
[20:01:41] <wikibugs>	 (PS1) Andrew Bogott: profile::openstack::base::designate::firewall::api: add a missing ) [puppet] - https://gerrit.wikimedia.org/r/822738 (https://phabricator.wikimedia.org/T267194)
[20:04:51] <wikibugs>	 (CR) Andrew Bogott: [C: +2] profile::openstack::base::designate::firewall::api: add a missing ) [puppet] - https://gerrit.wikimedia.org/r/822738 (https://phabricator.wikimedia.org/T267194) (owner: Andrew Bogott)
[20:13:44] <icinga-wm>	 RECOVERY - SSH on db1110.mgmt is OK: SSH OK - OpenSSH_7.4 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[20:42:32] <icinga-wm>	 PROBLEM - SSH on wtp1040.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[20:53:56] <wikibugs>	 (CR) Samtar: extension-list: Add Phonos (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/821249 (https://phabricator.wikimedia.org/T314294) (owner: Samtar)
[21:23:52] <wikibugs>	 ops-eqiad: PDU sensor over limit - https://phabricator.wikimedia.org/T314997 (phaultfinder)
[21:25:48] <icinga-wm>	 PROBLEM - Host cp1089.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[21:38:48] <icinga-wm>	 RECOVERY - Host cp1089.mgmt is UP: PING OK - Packet loss = 0%, RTA = 11.90 ms
[21:47:52] <icinga-wm>	 PROBLEM - Host cp1089.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[21:55:13] <jinxer-wm>	 (ThanosCompactIsDown) firing: Thanos component has disappeared. - https://wikitech.wikimedia.org/wiki/Thanos#Alerts - https://grafana.wikimedia.org/d/0cb8830a6e957978796729870f560cda/thanos-overview - https://alerts.wikimedia.org/?q=alertname%3DThanosCompactIsDown
[22:06:52] <icinga-wm>	 RECOVERY - Host cp1089.mgmt is UP: PING OK - Packet loss = 0%, RTA = 0.86 ms
[22:20:00] <jinxer-wm>	 (JobUnavailable) firing: Reduced availability for job thanos-compact in ops@codfw - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[22:26:30] <icinga-wm>	 PROBLEM - SSH on wtp1036.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[22:26:36] <icinga-wm>	 PROBLEM - Check systemd state on webperf1004 is CRITICAL: CRITICAL - degraded: The following units failed: arclamp_generate_svgs.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[22:29:02] <icinga-wm>	 PROBLEM - Host cp1089.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[22:31:20] <icinga-wm>	 RECOVERY - Check systemd state on webperf1004 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[22:41:58] <icinga-wm>	 RECOVERY - Host cp1089.mgmt is UP: PING OK - Packet loss = 0%, RTA = 3.01 ms
[22:45:12] <icinga-wm>	 RECOVERY - SSH on wtp1040.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[23:19:20] <icinga-wm>	 PROBLEM - Host cp1089.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[23:23:09] <wikibugs>	 SRE-swift-storage, Commons, MediaWiki-File-management, MediaWiki-Page-deletion, and 3 others: Some files cannot be deleted "Error deleting file: An unknown error occurred in storage backend "local-multiwrite". " - https://phabricator.wikimedia.org/T244567 (Base)
[23:27:13] <jinxer-wm>	 (KubernetesRsyslogDown) firing: (2) rsyslog on kubestage2001:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[23:45:30] <icinga-wm>	 RECOVERY - Host cp1089.mgmt is UP: PING OK - Packet loss = 0%, RTA = 0.83 ms