[00:05:47] PROBLEM - Puppet errors on deployment-cumin is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[01:46:10] PROBLEM - App Server Main HTTP Response on deployment-mediawiki-09 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:46:38] PROBLEM - App Server Main HTTP Response on deployment-mediawiki06 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:49:10] PROBLEM - Puppet errors on deployment-mathoid is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[01:49:16] PROBLEM - App Server Main HTTP Response on deployment-mediawiki-07 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:49:29] PROBLEM - English Wikipedia Main page on beta-cluster is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:49:51] Project beta-scap-eqiad build #210252: FAILURE in 6 min 1 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/210252/
[01:52:18] PROBLEM - Puppet errors on deployment-chromium01 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[01:54:19] RECOVERY - English Wikipedia Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 47487 bytes in 3.537 second response time
[01:54:41] PROBLEM - English Wikipedia Mobile Main page on beta-cluster is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:55:37] PROBLEM - Puppet errors on deployment-restbase02 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[01:58:32] PROBLEM - Puppet errors on deployment-mcs01 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[01:59:23] PROBLEM - Puppet errors on deployment-pdfrender02 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[01:59:25] PROBLEM - Puppet errors on deployment-zotero01 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[01:59:53] Project beta-scap-eqiad build #210253: STILL FAILING in 5 min 56 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/210253/
[01:59:53] PROBLEM - Puppet errors on deployment-ores01 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[02:01:12] PROBLEM - Puppet errors on deployment-kafka-main-2 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[02:02:13] PROBLEM - Puppet errors on deployment-restbase01 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[02:04:23] PROBLEM - Puppet errors on deployment-logstash2 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[02:05:00] PROBLEM - Puppet errors on deployment-kafka-main-1 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[02:06:48] PROBLEM - Puppet errors on deployment-aqs03 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[02:08:24] PROBLEM - Puppet errors on deployment-parsoid09 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[02:09:08] RECOVERY - App Server Main HTTP Response on deployment-mediawiki-07 is OK: HTTP OK: HTTP/1.1 200 OK - 46950 bytes in 1.866 second response time
[02:09:33] RECOVERY - English Wikipedia Mobile Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 35876 bytes in 1.687 second response time
[02:09:49] Project beta-scap-eqiad build #210254: STILL FAILING in 5 min 55 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/210254/
[02:09:53] PROBLEM - Puppet errors on deployment-sca02 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[02:10:11] PROBLEM - Puppet errors on deployment-cassandra3-01 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[02:10:25] PROBLEM - Puppet errors on deployment-cpjobqueue is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[02:10:38] PROBLEM - Puppet errors on deployment-changeprop is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[02:11:00] PROBLEM - Puppet errors on deployment-eventlog05 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[02:12:26] PROBLEM - Puppet errors on deployment-aqs01 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[02:13:04] PROBLEM - Puppet errors on deployment-sca01 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[02:13:42] PROBLEM - Puppet errors on deployment-webperf11 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[02:14:20] PROBLEM - Puppet errors on deployment-imagescaler01 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[02:16:12] PROBLEM - Puppet errors on deployment-cassandra3-02 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[02:16:47] PROBLEM - Puppet errors on deployment-imagescaler02 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[02:17:12] PROBLEM - Puppet errors on deployment-aqs02 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[02:19:49] Project beta-scap-eqiad build #210255: STILL FAILING in 5 min 54 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/210255/
[02:21:02] RECOVERY - App Server Main HTTP Response on deployment-mediawiki-09 is OK: HTTP OK: HTTP/1.1 200 OK - 46942 bytes in 0.901 second response time
[02:21:28] RECOVERY - App Server Main HTTP Response on deployment-mediawiki06 is OK: HTTP OK: HTTP/1.1 200 OK - 47536 bytes in 1.138 second response time
[02:24:11] uh
[02:26:21] puppet was probably me, idk about the app server HTTP thing
[02:29:01] Yippee, build fixed!
[02:29:01] Project beta-scap-eqiad build #210256: FIXED in 5 min 15 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/210256/
[02:34:23] RECOVERY - Puppet errors on deployment-pdfrender02 is OK: OK: Less than 1.00% above the threshold [0.0]
[02:35:48] RECOVERY - Puppet errors on deployment-cumin is OK: OK: Less than 1.00% above the threshold [0.0]
[02:37:16] RECOVERY - Puppet errors on deployment-restbase01 is OK: OK: Less than 1.00% above the threshold [0.0]
[02:38:32] RECOVERY - Puppet errors on deployment-mcs01 is OK: OK: Less than 1.00% above the threshold [0.0]
[02:39:24] RECOVERY - Puppet errors on deployment-logstash2 is OK: OK: Less than 1.00% above the threshold [0.0]
[02:39:26] RECOVERY - Puppet errors on deployment-zotero01 is OK: OK: Less than 1.00% above the threshold [0.0]
[02:39:52] RECOVERY - Puppet errors on deployment-ores01 is OK: OK: Less than 1.00% above the threshold [0.0]
[02:41:11] RECOVERY - Puppet errors on deployment-kafka-main-2 is OK: OK: Less than 1.00% above the threshold [0.0]
[02:41:47] RECOVERY - Puppet errors on deployment-aqs03 is OK: OK: Less than 1.00% above the threshold [0.0]
[02:43:20] RECOVERY - Puppet errors on deployment-parsoid09 is OK: OK: Less than 1.00% above the threshold [0.0]
[02:44:54] RECOVERY - Puppet errors on deployment-sca02 is OK: OK: Less than 1.00% above the threshold [0.0]
[02:45:00] RECOVERY - Puppet errors on deployment-kafka-main-1 is OK: OK: Less than 1.00% above the threshold [0.0]
[02:45:09] RECOVERY - Puppet errors on deployment-cassandra3-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[02:45:25] RECOVERY - Puppet errors on deployment-cpjobqueue is OK: OK: Less than 1.00% above the threshold [0.0]
[02:45:39] RECOVERY - Puppet errors on deployment-changeprop is OK: OK: Less than 1.00% above the threshold [0.0]
[02:46:00] RECOVERY - Puppet errors on deployment-eventlog05 is OK: OK: Less than 1.00% above the threshold [0.0]
[02:47:24] RECOVERY - Puppet errors on deployment-aqs01 is OK: OK: Less than 1.00% above the threshold [0.0]
[02:48:05] RECOVERY - Puppet errors on deployment-sca01 is OK: OK: Less than 1.00% above the threshold [0.0]
[02:48:43] RECOVERY - Puppet errors on deployment-webperf11 is OK: OK: Less than 1.00% above the threshold [0.0]
[02:51:48] RECOVERY - Puppet errors on deployment-imagescaler02 is OK: OK: Less than 1.00% above the threshold [0.0]
[02:52:12] RECOVERY - Puppet errors on deployment-aqs02 is OK: OK: Less than 1.00% above the threshold [0.0]
[02:52:54] might've been https://phabricator.wikimedia.org/T196252
[02:53:09] that host crashing probably causes issues with wmflabs.org DNS
[02:54:10] RECOVERY - Puppet errors on deployment-mathoid is OK: OK: Less than 1.00% above the threshold [0.0]
[02:54:19] RECOVERY - Puppet errors on deployment-imagescaler01 is OK: OK: Less than 1.00% above the threshold [0.0]
[02:56:16] RECOVERY - Puppet errors on deployment-cassandra3-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[02:57:16] RECOVERY - Puppet errors on deployment-chromium01 is OK: OK: Less than 1.00% above the threshold [0.0]
[03:00:37] RECOVERY - Puppet errors on deployment-restbase02 is OK: OK: Less than 1.00% above the threshold [0.0]
[03:24:27] PROBLEM - Puppet errors on deployment-dumps-puppetmaster is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[03:25:05] RECOVERY - Puppet errors on deployment-snapshot01 is OK: OK: Less than 1.00% above the threshold [0.0]
[03:34:26] RECOVERY - Puppet errors on deployment-dumps-puppetmaster is OK: OK: Less than 1.00% above the threshold [0.0]
[04:01:04] PROBLEM - Puppet errors on deployment-snapshot01 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[04:15:12] I declare victory over puppet
[04:15:19] +deployment-snapshot01.deployment-prep.eqiad.wmflabs,deployment-snapshot01,10.68.19.94 ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBB9inbQwRHIuiyIWdD0emR9rXeCP7fESQKsKFCV/6Gax7x5PPsy2ACSGZFooBXOiszX0utPFGXKOVePTUaBnT7U=
[04:15:23] in SSH known hosts
[04:16:03] I'm not entirely thrilled by Puppet DB's security but anyway
[04:26:07] RECOVERY - Puppet errors on deployment-snapshot01 is OK: OK: Less than 1.00% above the threshold [0.0]
[05:09:42] PROBLEM - Puppet errors on deployment-webperf11 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[05:39:06] PROBLEM - Puppet errors on deployment-puppetdb02 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[05:49:05] RECOVERY - Puppet errors on deployment-puppetdb02 is OK: OK: Less than 1.00% above the threshold [0.0]
[05:49:43] RECOVERY - Puppet errors on deployment-webperf11 is OK: OK: Less than 1.00% above the threshold [0.0]
[05:51:00] PROBLEM - Puppet errors on deployment-memc05 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[05:51:37] PROBLEM - Puppet errors on deployment-restbase02 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[05:52:21] PROBLEM - Puppet errors on deployment-puppetmaster03 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[05:52:33] PROBLEM - Puppet errors on deployment-sca04 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[05:53:03] PROBLEM - Puppet errors on deployment-ms-be03 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[05:53:17] PROBLEM - Puppet errors on deployment-mediawiki-09 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[05:53:21] PROBLEM - Puppet errors on deployment-ircd is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[05:55:03] PROBLEM - Puppet errors on deployment-puppetdb02 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[05:55:37] PROBLEM - Puppet errors on deployment-jobrunner03 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[05:56:40] PROBLEM - Puppet errors on deployment-apertium02 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[05:57:04] PROBLEM - Puppet errors on deployment-snapshot01 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[05:57:04] PROBLEM - Puppet errors on deployment-conf03 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[05:58:14] PROBLEM - Puppet errors on deployment-restbase01 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[05:59:32] PROBLEM - Puppet errors on deployment-poolcounter04 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[06:00:22] PROBLEM - Puppet errors on deployment-pdfrender02 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[06:00:24] PROBLEM - Puppet errors on deployment-zotero01 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[06:00:52] PROBLEM - Puppet errors on deployment-fluorine02 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[06:01:49] PROBLEM - Puppet errors on deployment-memc04 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[06:05:05] RECOVERY - Puppet errors on deployment-puppetdb02 is OK: OK: Less than 1.00% above the threshold [0.0]
[06:09:07] PROBLEM - Puppet errors on deployment-kafka-jumbo-2 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[06:12:12] PROBLEM - Puppet errors on deployment-kafka-jumbo-1 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[06:12:34] PROBLEM - Free space - all mounts on deployment-fluorine02 is CRITICAL: CRITICAL: deployment-prep.deployment-fluorine02.diskspace._srv.byte_percentfree (<50.00%)
[06:13:08] PROBLEM - Puppet errors on deployment-db04 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[06:17:14] PROBLEM - Puppet errors on deployment-cassandra3-02 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[06:18:10] PROBLEM - Puppet errors on deployment-certcentral-testclient02 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[06:21:00] PROBLEM - Puppet errors on deployment-mediawiki06 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[06:21:04] PROBLEM - Puppet errors on deployment-puppetdb02 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [0.0]
[06:28:03] RECOVERY - Puppet errors on deployment-ms-be03 is OK: OK: Less than 1.00% above the threshold [0.0]
[06:29:31] PROBLEM - Puppet errors on deployment-mcs01 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[06:30:57] RECOVERY - Puppet errors on deployment-memc05 is OK: OK: Less than 1.00% above the threshold [0.0]
[06:31:37] RECOVERY - Puppet errors on deployment-restbase02 is OK: OK: Less than 1.00% above the threshold [0.0]
[06:31:40] RECOVERY - Puppet errors on deployment-apertium02 is OK: OK: Less than 1.00% above the threshold [0.0]
[06:32:35] RECOVERY - Puppet errors on deployment-sca04 is OK: OK: Less than 1.00% above the threshold [0.0]
[06:33:15] RECOVERY - Puppet errors on deployment-mediawiki-09 is OK: OK: Less than 1.00% above the threshold [0.0]
[06:33:23] RECOVERY - Puppet errors on deployment-ircd is OK: OK: Less than 1.00% above the threshold [0.0]
[06:35:37] RECOVERY - Puppet errors on deployment-jobrunner03 is OK: OK: Less than 1.00% above the threshold [0.0]
[06:36:45] RECOVERY - Puppet errors on deployment-memc04 is OK: OK: Less than 1.00% above the threshold [0.0]
[06:39:34] RECOVERY - Puppet errors on deployment-poolcounter04 is OK: OK: Less than 1.00% above the threshold [0.0]
[06:47:33] RECOVERY - Free space - all mounts on deployment-fluorine02 is OK: OK: All targets OK
[06:48:08] RECOVERY - Puppet errors on deployment-db04 is OK: OK: Less than 1.00% above the threshold [0.0]
[06:49:08] RECOVERY - Puppet errors on deployment-kafka-jumbo-2 is OK: OK: Less than 1.00% above the threshold [0.0]
[06:50:08] PROBLEM - Puppet errors on integration-slave-jessie-1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[06:52:10] RECOVERY - Puppet errors on deployment-kafka-jumbo-1 is OK: OK: Less than 1.00% above the threshold [0.0]
[06:52:14] RECOVERY - Puppet errors on deployment-cassandra3-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[06:53:09] RECOVERY - Puppet errors on deployment-certcentral-testclient02 is OK: OK: Less than 1.00% above the threshold [0.0]
[06:53:35] PROBLEM - Puppet errors on deployment-sca04 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[06:54:19] PROBLEM - Puppet errors on deployment-mediawiki-09 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[06:56:37] PROBLEM - Puppet errors on deployment-jobrunner03 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[06:57:19] RECOVERY - Puppet errors on deployment-puppetmaster03 is OK: OK: Less than 1.00% above the threshold [0.0]
[07:01:02] RECOVERY - Puppet errors on deployment-mediawiki06 is OK: OK: Less than 1.00% above the threshold [0.0]
[07:05:22] RECOVERY - Puppet errors on deployment-pdfrender02 is OK: OK: Less than 1.00% above the threshold [0.0]
[07:07:05] RECOVERY - Puppet errors on deployment-conf03 is OK: OK: Less than 1.00% above the threshold [0.0]
[07:07:05] RECOVERY - Puppet errors on deployment-snapshot01 is OK: OK: Less than 1.00% above the threshold [0.0]
[07:09:31] RECOVERY - Puppet errors on deployment-mcs01 is OK: OK: Less than 1.00% above the threshold [0.0]
[07:30:10] RECOVERY - Puppet errors on integration-slave-jessie-1001 is OK: OK: Less than 1.00% above the threshold [0.0]
[07:31:37] RECOVERY - Puppet errors on deployment-jobrunner03 is OK: OK: Less than 1.00% above the threshold [0.0]
[07:33:15] RECOVERY - Puppet errors on deployment-restbase01 is OK: OK: Less than 1.00% above the threshold [0.0]
[07:33:31] RECOVERY - Puppet errors on deployment-sca04 is OK: OK: Less than 1.00% above the threshold [0.0]
[07:34:15] RECOVERY - Puppet errors on deployment-mediawiki-09 is OK: OK: Less than 1.00% above the threshold [0.0]
[07:35:23] RECOVERY - Puppet errors on deployment-zotero01 is OK: OK: Less than 1.00% above the threshold [0.0]
[07:35:53] RECOVERY - Puppet errors on deployment-fluorine02 is OK: OK: Less than 1.00% above the threshold [0.0]
[07:40:42] PROBLEM - Puppet errors on deployment-webperf11 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[08:15:42] RECOVERY - Puppet errors on deployment-webperf11 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:07:16] Gerrit, Release-Engineering-Team, Documentation: Document process to merge a reviewed but not submitted change using the gerrit web-ui - https://phabricator.wikimedia.org/T196265#4251391 (Physikerwelt)
[13:18:47] PROBLEM - Puppet errors on deployment-maps03 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[14:03:47] Phabricator (Upstream), Upstream: Option to Turn Off Status Updates in Phabricator Task-Threads - https://phabricator.wikimedia.org/T195728#4251985 (Johnywhy) @Aklapper phabricator dev says muting is stronger than mentions. https://discourse.phabricator-community.org/t/request-less-obtrusive-status-upda...
[16:09:37] PROBLEM - Long lived cherry-picks on puppetmaster on deployment-puppetmaster03 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[16:47:46] Beta-Cluster-Infrastructure, Patch-For-Review, Puppet: Set up puppet exported resources to collect ssh host keys for beta - https://phabricator.wikimedia.org/T72792#4252230 (Krenair) >>! In T72792#4247163, @Krenair wrote: > Also probably isn't handling deployment-snapshot01 as that uses deployment-du...
[20:43:50] PROBLEM - Puppet errors on integration-slave-jessie-android is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[23:57:27] (PS1) Legoktm: Automatically fix `@gropu` to `@group` [tools/codesniffer] - https://gerrit.wikimedia.org/r/437151