[00:06:22] PROBLEM - Puppet failure on deployment-noc is CRITICAL 100.00% of data above the critical threshold [0.0]
[00:06:30] PROBLEM - App Server bits response on deployment-noc is CRITICAL - Socket timeout after 10 seconds
[00:07:39] PROBLEM - App Server Main HTTP Response on deployment-noc is CRITICAL - Socket timeout after 10 seconds
[00:11:29] PROBLEM - Host deployment-noc is DOWN: CRITICAL - Host Unreachable (10.68.16.134)
[00:12:18] well
[00:12:21] I did just delete that
[00:12:24] so... yeah?
[00:13:00] * Krenair sighs
[03:54:06] !log fixed rebase conflict with "Enable firejail containment for zotero" by removing stale cherry-pick
[03:54:09] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[03:54:34] !log cherry-picked https://gerrit.wikimedia.org/r/#/c/224219/
[03:54:36] PROBLEM - Puppet failure on deployment-cxserver03 is CRITICAL 40.00% of data above the critical threshold [0.0]
[03:54:36] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[03:55:54] PROBLEM - Puppet failure on deployment-redis01 is CRITICAL 20.00% of data above the critical threshold [0.0]
[03:58:06] PROBLEM - Puppet failure on deployment-sentry2 is CRITICAL 33.33% of data above the critical threshold [0.0]
[04:00:54] PROBLEM - Puppet failure on deployment-kafka02 is CRITICAL 60.00% of data above the critical threshold [0.0]
[04:00:56] PROBLEM - Puppet failure on deployment-pdf02 is CRITICAL 50.00% of data above the critical threshold [0.0]
[04:01:19] PROBLEM - Puppet failure on deployment-jobrunner01 is CRITICAL 44.44% of data above the critical threshold [0.0]
[04:08:27] PROBLEM - Puppet failure on deployment-stream is CRITICAL 40.00% of data above the critical threshold [0.0]
[04:09:55] RECOVERY - Puppet staleness on deployment-logstash2 is OK Less than 1.00% above the threshold [3600.0]
[04:12:21] PROBLEM - Puppet failure on deployment-sca01 is CRITICAL 66.67% of data above the critical threshold [0.0]
[04:14:26] PROBLEM - Puppet failure on deployment-mathoid is CRITICAL 30.00% of data above the critical threshold [0.0]
[04:15:08] something not awesome must have been stuck in rebase on the puppet repo.
[04:16:48] Upgraded Elasticsearch to 1.6.0 on logstash1006; replicas recovering now
[04:17:07] !log Upgraded Elasticsearch to 1.6.0 on logstash1006; replicas recovering now
[04:17:10] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[04:18:43] !log Logstash cluster upgrade complete! Kibana working again
[04:18:46] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[04:19:33] The recovery on the last node was awesome. ~2 minutes vs ~4 hours for each of the others
[04:20:45] doh wrong channel
[04:24:34] RECOVERY - Puppet failure on deployment-cxserver03 is OK Less than 1.00% above the threshold [0.0]
[04:25:51] RECOVERY - Puppet failure on deployment-redis01 is OK Less than 1.00% above the threshold [0.0]
[04:25:53] RECOVERY - Puppet failure on deployment-kafka02 is OK Less than 1.00% above the threshold [0.0]
[04:26:17] RECOVERY - Puppet failure on deployment-jobrunner01 is OK Less than 1.00% above the threshold [0.0]
[04:28:07] RECOVERY - Puppet failure on deployment-sentry2 is OK Less than 1.00% above the threshold [0.0]
[04:35:13] !log Updated /var/lib/git/labs/private to latest upstream
[04:35:16] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[04:35:54] RECOVERY - Puppet failure on deployment-pdf02 is OK Less than 1.00% above the threshold [0.0]
[04:44:31] RECOVERY - Puppet failure on deployment-mathoid is OK Less than 1.00% above the threshold [0.0]
[04:57:20] RECOVERY - Puppet failure on deployment-sca01 is OK Less than 1.00% above the threshold [0.0]
[04:58:26] RECOVERY - Puppet failure on deployment-stream is OK Less than 1.00% above the threshold [0.0]
[06:46:32] PROBLEM - Free space - all mounts on deployment-videoscaler01 is CRITICAL deployment-prep.deployment-videoscaler01.diskspace._var.byte_percentfree (<30.00%)
[06:54:58] PROBLEM - Puppet failure on integration-slave-trusty-1017 is CRITICAL 30.00% of data above the critical threshold [0.0]
[07:01:29] RECOVERY - Free space - all mounts on deployment-videoscaler01 is OK All targets OK
[07:21:03] (CR) Zfilipin: [C: 2] Delete the ULS job [integration/config] - https://gerrit.wikimedia.org/r/224034 (https://phabricator.wikimedia.org/T94158) (owner: Amire80)
[07:23:09] (Merged) jenkins-bot: Delete the ULS job [integration/config] - https://gerrit.wikimedia.org/r/224034 (https://phabricator.wikimedia.org/T94158) (owner: Amire80)
[07:27:17] (CR) Zfilipin: "recheck" [integration/config] - https://gerrit.wikimedia.org/r/224179 (https://phabricator.wikimedia.org/T103039) (owner: Dduvall)
[07:30:27] (CR) Zfilipin: [C: 1] Create and cleanup TMPDIR for mw-selenium builder [integration/config] - https://gerrit.wikimedia.org/r/224179 (https://phabricator.wikimedia.org/T103039) (owner: Dduvall)
[07:34:59] RECOVERY - Puppet failure on integration-slave-trusty-1017 is OK Less than 1.00% above the threshold [0.0]
[08:21:24] Yippee, build fixed!
[08:21:24] Project browsertests-CirrusSearch-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #639: FIXED in 1 min 22 sec: https://integration.wikimedia.org/ci/job/browsertests-CirrusSearch-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/639/
[09:22:25] PROBLEM - Puppet failure on deployment-mx is CRITICAL 100.00% of data above the critical threshold [0.0]
[12:22:30] PROBLEM - Free space - all mounts on deployment-videoscaler01 is CRITICAL deployment-prep.deployment-videoscaler01.diskspace._var.byte_percentfree (<40.00%)
[12:54:56] Yippee, build fixed!
[12:54:56] Project browsertests-GettingStarted-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #530: FIXED in 55 sec: https://integration.wikimedia.org/ci/job/browsertests-GettingStarted-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/530/
[12:56:16] Browser-Tests, Gather: Failed Jenkins job sets Sauce Labs job to passed - https://phabricator.wikimedia.org/T105589#1447336 (zeljkofilipin) NEW
[15:23:28] PROBLEM - Puppet failure on deployment-cache-bits01 is CRITICAL 100.00% of data above the critical threshold [0.0]
[15:24:44] Beta-Cluster, Release-Engineering, Tracking: create API level tests to monitor services on beta/test2wiki and also on production (tracking) - https://phabricator.wikimedia.org/T60353#1447464 (jayvdb) >>! In T60353#594652, @Legoktm wrote: > Pywikibot has a decently large suite of tests that actually use...
[18:31:18] PROBLEM - Free space - all mounts on deployment-bastion is CRITICAL deployment-prep.deployment-bastion.diskspace._var.byte_percentfree (<55.56%)
[21:18:26] PROBLEM - Puppet failure on deployment-parsoidcache02 is CRITICAL 100.00% of data above the critical threshold [0.0]
[21:26:16] PROBLEM - Puppet failure on deployment-cache-text03 is CRITICAL 100.00% of data above the critical threshold [0.0]
[23:20:09] PROBLEM - Puppet failure on deployment-bastion is CRITICAL 100.00% of data above the critical threshold [0.0]