[00:06:22] PROBLEM - Puppet failure on deployment-noc is CRITICAL 100.00% of data above the critical threshold [0.0]
[00:06:30] PROBLEM - App Server bits response on deployment-noc is CRITICAL - Socket timeout after 10 seconds
[00:07:39] PROBLEM - App Server Main HTTP Response on deployment-noc is CRITICAL - Socket timeout after 10 seconds
[00:11:29] PROBLEM - Host deployment-noc is DOWN: CRITICAL - Host Unreachable (10.68.16.134)
[00:12:18] well
[00:12:21] I did just delete that
[00:12:24] so... yeah?
[00:13:00] * Krenair sighs
[03:54:06] !log fixed rebase conflict with "Enable firejail containment for zotero" by removing stale cherry-pick
[03:54:09] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[03:54:34] !log cherry-picked https://gerrit.wikimedia.org/r/#/c/224219/
[03:54:36] PROBLEM - Puppet failure on deployment-cxserver03 is CRITICAL 40.00% of data above the critical threshold [0.0]
[03:54:36] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[03:55:54] PROBLEM - Puppet failure on deployment-redis01 is CRITICAL 20.00% of data above the critical threshold [0.0]
[03:58:06] PROBLEM - Puppet failure on deployment-sentry2 is CRITICAL 33.33% of data above the critical threshold [0.0]
[04:00:54] PROBLEM - Puppet failure on deployment-kafka02 is CRITICAL 60.00% of data above the critical threshold [0.0]
[04:00:56] PROBLEM - Puppet failure on deployment-pdf02 is CRITICAL 50.00% of data above the critical threshold [0.0]
[04:01:19] PROBLEM - Puppet failure on deployment-jobrunner01 is CRITICAL 44.44% of data above the critical threshold [0.0]
[04:08:27] PROBLEM - Puppet failure on deployment-stream is CRITICAL 40.00% of data above the critical threshold [0.0]
[04:09:55] RECOVERY - Puppet staleness on deployment-logstash2 is OK Less than 1.00% above the threshold [3600.0]
[04:12:21] PROBLEM - Puppet failure on deployment-sca01 is CRITICAL 66.67% of data above the critical threshold [0.0]
[04:14:26] PROBLEM - Puppet failure on deployment-mathoid is CRITICAL 30.00% of data above the critical threshold [0.0]
[04:15:08] something not awesome must have been stuck in rebase on the puppet repo.
[04:16:48] Upgraded Elasticsearch to 1.6.0 on logstash1006; replicas recovering now
[04:17:07] !log Upgraded Elasticsearch to 1.6.0 on logstash1006; replicas recovering now
[04:17:10] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[04:18:43] !log Logstash cluster upgrade complete! Kibana working again
[04:18:46] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[04:19:33] The recovery on the last node was awesome. ~2 minutes vs ~4 hours for each of the others
[04:20:45] doh wrong channel
[04:24:34] RECOVERY - Puppet failure on deployment-cxserver03 is OK Less than 1.00% above the threshold [0.0]
[04:25:51] RECOVERY - Puppet failure on deployment-redis01 is OK Less than 1.00% above the threshold [0.0]
[04:25:53] RECOVERY - Puppet failure on deployment-kafka02 is OK Less than 1.00% above the threshold [0.0]
[04:26:17] RECOVERY - Puppet failure on deployment-jobrunner01 is OK Less than 1.00% above the threshold [0.0]
[04:28:07] RECOVERY - Puppet failure on deployment-sentry2 is OK Less than 1.00% above the threshold [0.0]
[04:35:13] !log Updated /var/lib/git/labs/private to latest upstream
[04:35:16] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[04:35:54] RECOVERY - Puppet failure on deployment-pdf02 is OK Less than 1.00% above the threshold [0.0]
[04:44:31] RECOVERY - Puppet failure on deployment-mathoid is OK Less than 1.00% above the threshold [0.0]
[04:57:20] RECOVERY - Puppet failure on deployment-sca01 is OK Less than 1.00% above the threshold [0.0]
[04:58:26] RECOVERY - Puppet failure on deployment-stream is OK Less than 1.00% above the threshold [0.0]
[06:46:32] PROBLEM - Free space - all mounts on deployment-videoscaler01 is CRITICAL deployment-prep.deployment-videoscaler01.diskspace._var.byte_percentfree (<30.00%)
[06:54:58] PROBLEM - Puppet failure on integration-slave-trusty-1017 is CRITICAL 30.00% of data above the critical threshold [0.0]
[07:01:29] RECOVERY - Free space - all mounts on deployment-videoscaler01 is OK All targets OK
[07:21:03] (CR) Zfilipin: [C: 2] Delete the ULS job [integration/config] - https://gerrit.wikimedia.org/r/224034 (https://phabricator.wikimedia.org/T94158) (owner: Amire80)
[07:23:09] (Merged) jenkins-bot: Delete the ULS job [integration/config] - https://gerrit.wikimedia.org/r/224034 (https://phabricator.wikimedia.org/T94158) (owner: Amire80)
[07:27:17] (CR) Zfilipin: "recheck" [integration/config] - https://gerrit.wikimedia.org/r/224179 (https://phabricator.wikimedia.org/T103039) (owner: Dduvall)
[07:30:27] (CR) Zfilipin: [C: 1] Create and cleanup TMPDIR for mw-selenium builder [integration/config] - https://gerrit.wikimedia.org/r/224179 (https://phabricator.wikimedia.org/T103039) (owner: Dduvall)
[07:34:59] RECOVERY - Puppet failure on integration-slave-trusty-1017 is OK Less than 1.00% above the threshold [0.0]
[08:21:24] Yippee, build fixed!
[08:21:24] Project browsertests-CirrusSearch-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #639: FIXED in 1 min 22 sec: https://integration.wikimedia.org/ci/job/browsertests-CirrusSearch-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/639/
[09:22:25] PROBLEM - Puppet failure on deployment-mx is CRITICAL 100.00% of data above the critical threshold [0.0]
[12:22:30] PROBLEM - Free space - all mounts on deployment-videoscaler01 is CRITICAL deployment-prep.deployment-videoscaler01.diskspace._var.byte_percentfree (<40.00%)
[12:54:56] Yippee, build fixed!
[12:54:56] Project browsertests-GettingStarted-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #530: FIXED in 55 sec: https://integration.wikimedia.org/ci/job/browsertests-GettingStarted-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/530/
[12:56:16] Browser-Tests, Gather: Failed Jenkins job sets Sauce Labs job to passed - https://phabricator.wikimedia.org/T105589#1447336 (zeljkofilipin) NEW
[15:23:28] PROBLEM - Puppet failure on deployment-cache-bits01 is CRITICAL 100.00% of data above the critical threshold [0.0]
[15:24:44] Beta-Cluster, Release-Engineering, Tracking: create API level tests to monitor services on beta/test2wiki and also on production (tracking) - https://phabricator.wikimedia.org/T60353#1447464 (jayvdb) >>! In T60353#594652, @Legoktm wrote: > Pywikibot has a decently large suite of tests that actually use...
[18:31:18] PROBLEM - Free space - all mounts on deployment-bastion is CRITICAL deployment-prep.deployment-bastion.diskspace._var.byte_percentfree (<55.56%)
[21:18:26] PROBLEM - Puppet failure on deployment-parsoidcache02 is CRITICAL 100.00% of data above the critical threshold [0.0]
[21:26:16] PROBLEM - Puppet failure on deployment-cache-text03 is CRITICAL 100.00% of data above the critical threshold [0.0]
[23:20:09] PROBLEM - Puppet failure on deployment-bastion is CRITICAL 100.00% of data above the critical threshold [0.0]