[00:09:37] <icinga-wm>	 RECOVERY - SSH on cp5005.mgmt is OK: SSH OK - OpenSSH_7.4 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[06:16:05] <icinga-wm>	 PROBLEM - Backup freshness on backup1001 is CRITICAL: Stale: 1 (gerrit1001), No backups: 6 (dbprov1001, ...), Fresh: 97 jobs https://wikitech.wikimedia.org/wiki/Bacula%23Monitoring
[07:00:05] <jouncebot>	 Deploy window No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210801T0700)
[08:58:43] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=atlas_exporter site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[09:00:33] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[09:10:45] <icinga-wm>	 PROBLEM - PHD should be supervising processes on phab1001 is CRITICAL: PROCS CRITICAL: 2 processes with UID = 497 (phd) https://wikitech.wikimedia.org/wiki/Phabricator
[09:12:41] <icinga-wm>	 RECOVERY - PHD should be supervising processes on phab1001 is OK: PROCS OK: 7 processes with UID = 497 (phd) https://wikitech.wikimedia.org/wiki/Phabricator
[09:29:05] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=rails site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[09:32:53] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[11:21:51] <icinga-wm>	 PROBLEM - CirrusSearch eqiad 95th percentile latency on graphite1004 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0] https://wikitech.wikimedia.org/wiki/Search%23Health/Activity_Monitoring https://grafana.wikimedia.org/dashboard/db/elasticsearch-percentiles?panelId=19&fullscreen&orgId=1&var-cluster=eqiad&var-smoothing=1
[11:27:39] <icinga-wm>	 RECOVERY - CirrusSearch eqiad 95th percentile latency on graphite1004 is OK: OK: Less than 20.00% above the threshold [500.0] https://wikitech.wikimedia.org/wiki/Search%23Health/Activity_Monitoring https://grafana.wikimedia.org/dashboard/db/elasticsearch-percentiles?panelId=19&fullscreen&orgId=1&var-cluster=eqiad&var-smoothing=1
[11:58:14] <wikibugs>	 SRE, SRE-swift-storage: Can't delete a file - https://phabricator.wikimedia.org/T287828 (Peachey88)
[11:59:11] <wikibugs>	 SRE, SRE-swift-storage: Unable to delete `Балістичні таблиці P1720666.JPG` on uk.wikipedia - An unknown error occurred in storage backend "local-multiwrite" - https://phabricator.wikimedia.org/T287828 (Peachey88)
[11:59:27] <wikibugs>	 SRE, SRE-swift-storage: Unable to delete `Балістичні таблиці P1720666.JPG` on uk.wikipedia - An unknown error occurred in storage backend "local-multiwrite" - https://phabricator.wikimedia.org/T287828 (RhinosF1) T244567 ?
[12:00:02] <wikibugs>	 SRE, SRE-swift-storage: Unable to delete `Балістичні таблиці P1720666.JPG` on uk.wikipedia - An unknown error occurred in storage backend "local-multiwrite" - https://phabricator.wikimedia.org/T287828 (RhinosF1)
[12:00:13] <wikibugs>	 SRE-swift-storage, Commons, MediaWiki-File-management, MediaWiki-Page-deletion, and 3 others: Some files cannot be deleted "Error deleting file: An unknown error occurred in storage backend "local-multiwrite". " - https://phabricator.wikimedia.org/T244567 (RhinosF1)
[12:00:50] <wikibugs>	 SRE-swift-storage, MediaWiki-File-management, Structured Data Engineering, Structured-Data-Backlog, Wikimedia-production-error: Cannot delete one image file on Thai Wikipedia: Error deleting file: An unknown error occurred in storage backend "local-mult... - https://phabricator.wikimedia.org/T270811
[12:08:47] <wikibugs>	 SRE-swift-storage, Commons, MediaWiki-File-management, MediaWiki-Page-deletion, and 3 others: Some files cannot be deleted "Error deleting file: An unknown error occurred in storage backend "local-multiwrite". " - https://phabricator.wikimedia.org/T244567 (Andriy.v) >>! In T244567#7231975, @WindE...
[12:17:17] <icinga-wm>	 PROBLEM - MariaDB memory on clouddb1019 is CRITICAL: CRIT Memory 98% used. Largest process: mysqld (18326) = 75.8% https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting
[12:19:52] <wikibugs>	 SRE-swift-storage, MediaWiki-File-management, Structured Data Engineering, Structured-Data-Backlog, Wikimedia-production-error: Cannot delete one image file on Thai Wikipedia: Error deleting file: An unknown error occurred in storage backend "local-mult... - https://phabricator.wikimedia.org/T270811
[12:20:02] <wikibugs>	 SRE-swift-storage, Commons, MediaWiki-File-management, MediaWiki-Page-deletion, and 3 others: Some files cannot be deleted "Error deleting file: An unknown error occurred in storage backend "local-multiwrite". " - https://phabricator.wikimedia.org/T244567 (Zabe)
[12:21:03] <RhinosF1>	 Ty zabe
[12:21:52] <zabe>	 np
[12:22:12] * RhinosF1 didn't do it in case he was blind and missing anything
[12:25:04] * zabe just thought to himself: 'They can reopen the task again anyway in such a case'
[12:25:59] <zabe>	 T174269 seems to also be a duplicate
[12:26:00] <stashbot>	 T174269: Two cases of local-multiwrite storage backend failure - https://phabricator.wikimedia.org/T174269
[12:26:41] <RhinosF1>	 Amir1: that's yours ^
[12:27:29] <RhinosF1>	 That seems older
[12:27:34] <RhinosF1>	 By 3 years than the main
[12:30:07] <wikibugs>	 SRE, SRE-swift-storage: Two cases of local-multiwrite storage backend failure - https://phabricator.wikimedia.org/T174269 (Zabe) Is this the same as T244567?
[13:45:22] <wikibugs>	 (PS5) Labdajiwa: Set the project namespace and sitename for Javanese Wiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/708206 (https://phabricator.wikimedia.org/T287437)
[15:45:11] <icinga-wm>	 PROBLEM - MariaDB memory on clouddb1019 is CRITICAL: CRIT Memory 98% used. Largest process: mysqld (18326) = 75.9% https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting
[17:26:03] <icinga-wm>	 PROBLEM - PHD should be supervising processes on phab1001 is CRITICAL: PROCS CRITICAL: 2 processes with UID = 497 (phd) https://wikitech.wikimedia.org/wiki/Phabricator
[17:28:00] <icinga-wm>	 RECOVERY - PHD should be supervising processes on phab1001 is OK: PROCS OK: 11 processes with UID = 497 (phd) https://wikitech.wikimedia.org/wiki/Phabricator
[18:24:39] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=atlas_exporter site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[18:26:35] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[19:12:52] <wikibugs>	 SRE, ops-eqiad: Degraded RAID on cloudcephosd1008 - https://phabricator.wikimedia.org/T287838 (ops-monitoring-bot)
[20:26:01] <icinga-wm>	 PROBLEM - PHD should be supervising processes on phab1001 is CRITICAL: PROCS CRITICAL: 2 processes with UID = 497 (phd) https://wikitech.wikimedia.org/wiki/Phabricator
[20:27:57] <icinga-wm>	 RECOVERY - PHD should be supervising processes on phab1001 is OK: PROCS OK: 11 processes with UID = 497 (phd) https://wikitech.wikimedia.org/wiki/Phabricator
[21:02:24] <icinga-wm>	 PROBLEM - PHD should be supervising processes on phab1001 is CRITICAL: PROCS CRITICAL: 2 processes with UID = 497 (phd) https://wikitech.wikimedia.org/wiki/Phabricator
[21:04:19] <icinga-wm>	 RECOVERY - PHD should be supervising processes on phab1001 is OK: PROCS OK: 14 processes with UID = 497 (phd) https://wikitech.wikimedia.org/wiki/Phabricator
[21:37:00] <icinga-wm>	 PROBLEM - PHD should be supervising processes on phab1001 is CRITICAL: PROCS CRITICAL: 2 processes with UID = 497 (phd) https://wikitech.wikimedia.org/wiki/Phabricator
[21:38:55] <icinga-wm>	 RECOVERY - PHD should be supervising processes on phab1001 is OK: PROCS OK: 4 processes with UID = 497 (phd) https://wikitech.wikimedia.org/wiki/Phabricator