[00:08:37] <wikibugs>	 Deployment-Systems, MediaWiki-extensions-LocalisationUpdate, I18n, Patch-For-Review: Set l10nupdate cron to run Mon-Thursday - https://phabricator.wikimedia.org/T164035#3365979 (Dzahn) Open>Resolved ``` [tin:~] $ sudo crontab -u l10nupdate -l | tail -2 # Puppet Name: l10nupdate 0 2 * * 1,...
[00:17:37] <wikibugs>	 (PS2) Thcipriani: puppet docker: add missing wrappers [integration/config] - https://gerrit.wikimedia.org/r/360363 (https://phabricator.wikimedia.org/T166888) (owner: Hashar)
[00:25:48] <wikibugs>	 (CR) Thcipriani: [C: 2] puppet docker: add missing wrappers [integration/config] - https://gerrit.wikimedia.org/r/360363 (https://phabricator.wikimedia.org/T166888) (owner: Hashar)
[00:28:02] <wikibugs>	 (Merged) jenkins-bot: puppet docker: add missing wrappers [integration/config] - https://gerrit.wikimedia.org/r/360363 (https://phabricator.wikimedia.org/T166888) (owner: Hashar)
[00:40:32] <wikibugs>	 (PS1) MaxSem: Add Phan tests to LoginNotify [integration/config] - https://gerrit.wikimedia.org/r/360587
[01:34:33] <shinken-wm>	 PROBLEM - Puppet staleness on deployment-aqs01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [43200.0]
[04:08:27] <wmf-insecte>	 Yippee, build fixed!
[04:08:27] <wmf-insecte>	 Project selenium-MultimediaViewer » safari,beta,OS X 10.9,BrowserTests build #429: FIXED in 12 min: https://integration.wikimedia.org/ci/job/selenium-MultimediaViewer/BROWSER=safari,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=OS%20X%2010.9,label=BrowserTests/429/
[05:20:39] <wikibugs>	 Gerrit, Operations, Patch-For-Review: Gerrit constantly throws HTTP 500 error when reviewing patches (due to "Too many open files") - https://phabricator.wikimedia.org/T168360#3366375 (demon) Hmm, all this started after we tried swapping SysV init for systemd. Funny how that correlates 🤔 😏
[05:23:17] <wikibugs>	 Gerrit, Operations, Patch-For-Review: Gerrit constantly throws HTTP 500 error when reviewing patches (due to "Too many open files") - https://phabricator.wikimedia.org/T168360#3366376 (Dzahn) Better to raise it (https://gerrit.wikimedia.org/r/#/c//1) than not raise it. I am happy to build the new deb...
[07:19:43] <wikibugs>	 Deployment-Systems, ArchCom-RfC, I18n: RFC: Reevaluate LocalisationUpdate extension for WMF - https://phabricator.wikimedia.org/T158360#3366521 (Nemo_bis)
[07:36:04] <wikibugs>	 Scap, Discovery, Interactive-Sprint, Maps (Kartotherian), Patch-For-Review: Break Kartotherian scap3 deployment into 2 groups - https://phabricator.wikimedia.org/T147337#3366535 (Gehel) Patch merged, next deployment will use it. This can be closed.
[07:38:38] <wikibugs>	 Continuous-Integration-Infrastructure, MediaWiki-History-or-Diffs, TCB-Team, wikidiff2, and 3 others: php-compile-hhvm jenkins job failing with "command not found" error - https://phabricator.wikimedia.org/T168410#3366547 (Tobi_WMDE_SW) Thank you @Paladox&@hashar!
[07:39:53] <wikibugs>	 Continuous-Integration-Infrastructure, MediaWiki-History-or-Diffs, TCB-Team, wikidiff2, and 3 others: Migrate php-compile-hhvm test to jessie - https://phabricator.wikimedia.org/T168410#3366550 (Tobi_WMDE_SW) Open>Resolved
[08:34:27] <wikibugs>	 (CR) Hashar: [C: 2] Add Phan tests to LoginNotify [integration/config] - https://gerrit.wikimedia.org/r/360587 (owner: MaxSem)
[08:36:16] <wikibugs>	 (Merged) jenkins-bot: Add Phan tests to LoginNotify [integration/config] - https://gerrit.wikimedia.org/r/360587 (owner: MaxSem)
[08:37:24] <wmf-insecte>	 Project beta-scap-eqiad build #160628: FAILURE in 1 min 49 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/160628/
[08:45:58] <wmf-insecte>	 Yippee, build fixed!
[08:45:59] <wmf-insecte>	 Project beta-scap-eqiad build #160629: FIXED in 2 min 14 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/160629/
[08:55:58] <shinken-wm>	 PROBLEM - Puppet errors on deployment-mira is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[09:16:17] <shinken-wm>	 PROBLEM - Puppet errors on deployment-pdfrender02 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[09:30:57] <shinken-wm>	 RECOVERY - Puppet errors on deployment-mira is OK: OK: Less than 1.00% above the threshold [0.0]
[09:38:27] <hashar>	 !log upgrading/Rebooting all instances from integration project to catch up with Linux kernel upgrades
[09:38:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[09:39:07] <hashar>	 !log integration: deleting swift and swift-storage-01 (unused)
[09:39:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[09:39:44] <shinken-wm>	 PROBLEM - Puppet errors on saucelabs-01 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [0.0]
[09:40:48] <shinken-wm>	 PROBLEM - Host swift is DOWN: CRITICAL - Host Unreachable (10.68.20.150)
[09:41:18] <shinken-wm>	 RECOVERY - Puppet errors on deployment-pdfrender02 is OK: OK: Less than 1.00% above the threshold [0.0]
[09:42:18] <wikibugs>	 Browser-Tests-Infrastructure, Continuous-Integration-Infrastructure, Release-Engineering-Team (Next), Wikidata, and 2 others: Run Wikibase daily browser tests on Jenkins - https://phabricator.wikimedia.org/T167432#3366736 (Aleksey_WMDE) Hey!  Today I received another pack of emails about failing...
[09:42:27] <shinken-wm>	 PROBLEM - Host swift-storage-01 is DOWN: CRITICAL - Host Unreachable (10.68.22.30)
[09:49:44] <shinken-wm>	 RECOVERY - Puppet errors on saucelabs-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[09:57:11] <hashar>	 !log Upgrading puppet 3.7.2 .. 3.8.5 on integration-slave-docker-1001 and integration-slave-docker-1002
[09:57:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[09:59:28] <hashar>	 !log integration: removing swift / python-swift from integration-puppetmaster01
[09:59:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[10:01:07] <shinken-wm>	 PROBLEM - Puppet errors on integration-slave-docker-1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[10:04:29] <shinken-wm>	 PROBLEM - Puppet errors on integration-slave-docker-1002 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[10:04:33] <shinken-wm>	 PROBLEM - Puppet errors on integration-puppetmaster01 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [0.0]
[10:09:56] <shinken-wm>	 PROBLEM - Puppet errors on integration-saltmaster is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [0.0]
[10:14:34] <shinken-wm>	 RECOVERY - Puppet errors on integration-puppetmaster01 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:16:06] <shinken-wm>	 RECOVERY - Puppet errors on integration-slave-docker-1001 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:18:17] <shinken-wm>	 PROBLEM - Puppet errors on integration-slave-jessie-1001 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[10:19:29] <shinken-wm>	 RECOVERY - Puppet errors on integration-slave-docker-1002 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:19:55] <shinken-wm>	 RECOVERY - Puppet errors on integration-saltmaster is OK: OK: Less than 1.00% above the threshold [0.0]
[10:20:01] <wikibugs>	 Browser-Tests-Infrastructure, Continuous-Integration-Infrastructure, Release-Engineering-Team (Next), Wikidata, and 2 others: Run Wikibase daily browser tests on Jenkins - https://phabricator.wikimedia.org/T167432#3366845 (Tobi_WMDE_SW) Yeah, most of them are failing due to  ``` Sauce could not s...
[10:21:56] <hashar>	 !log deployment-zotero01  apt-get upgrade and rebooted
[10:21:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[10:22:27] <shinken-wm>	 PROBLEM - Long lived cherry-picks on puppetmaster on deployment-puppetmaster02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[10:28:18] <shinken-wm>	 RECOVERY - Puppet errors on integration-slave-jessie-1001 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:35:54] <wmf-insecte>	 Project beta-scap-eqiad build #160640: FAILURE in 2 min 11 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/160640/
[10:35:59] <shinken-wm>	 PROBLEM - Puppet errors on deployment-tmh01 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[10:36:07] <shinken-wm>	 PROBLEM - Puppet errors on deployment-mx is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [0.0]
[10:46:08] <shinken-wm>	 RECOVERY - Puppet errors on deployment-mx is OK: OK: Less than 1.00% above the threshold [0.0]
[10:46:12] <wmf-insecte>	 Yippee, build fixed!
[10:46:13] <wmf-insecte>	 Project beta-scap-eqiad build #160641: FIXED in 2 min 29 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/160641/
[10:48:50] <shinken-wm>	 PROBLEM - Puppet errors on deployment-eventlogging03 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[10:51:01] <shinken-wm>	 RECOVERY - Puppet errors on deployment-tmh01 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:59:32] <shinken-wm>	 PROBLEM - Puppet errors on deployment-pdf01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[11:00:28] <hashar>	 !log deployment-prep apt-get upgrade and reboot all hosts
[11:00:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[11:07:51] <shinken-wm>	 PROBLEM - Host Generic Beta Cluster is DOWN: PING CRITICAL - Packet loss = 100%
[11:08:13] <hashar>	 !log deployment-prep : rebooting deployment-tin deployment-mira deployment-cache-text04 deployment-cache-upload04
[11:08:15] <shinken-wm>	 PROBLEM - SSH on deployment-tin is CRITICAL: Connection refused
[11:08:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[11:08:27] <shinken-wm>	 RECOVERY - Host Generic Beta Cluster is UP: PING OK - Packet loss = 0%, RTA = 6.77 ms
[11:11:15] <shinken-wm>	 PROBLEM - English Wikipedia Mobile Main page on beta-cluster is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Backend fetch failed - string 'Wikipedia' not found on 'https://en.m.wikipedia.beta.wmflabs.org:443/wiki/Main_Page?debug=true' - 2298 bytes in 0.048 second response time
[11:12:00] <shinken-wm>	 PROBLEM - Puppet errors on deployment-mira is CRITICAL: CRITICAL: 25.00% of data above the critical threshold [0.0]
[11:12:12] <hashar>	 !log varnish fails on deployment-cache-text04
[11:12:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[11:12:18] <shinken-wm>	 PROBLEM - Puppet errors on deployment-pdfrender02 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [0.0]
[11:13:16] <shinken-wm>	 RECOVERY - SSH on deployment-tin is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u3 (protocol 2.0)
[11:14:22] <shinken-wm>	 PROBLEM - Puppet errors on deployment-elastic07 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[11:14:32] <shinken-wm>	 PROBLEM - Puppet errors on deployment-tin is CRITICAL: CRITICAL: 37.50% of data above the critical threshold [0.0]
[11:15:00] <hashar>	 !log deployment-cache-text04 : apt-get dist-upgrade
[11:15:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[11:18:29] <hashar>	 Backend host '"citoid.wmflabs.org"' could not be resolved to an IP address:
[11:18:29] <hashar>	 bah
[11:18:31] <shinken-wm>	 PROBLEM - Puppet errors on deployment-cache-text04 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[11:18:51] <shinken-wm>	 RECOVERY - Puppet errors on deployment-eventlogging03 is OK: OK: Less than 1.00% above the threshold [0.0]
[11:22:17] <shinken-wm>	 RECOVERY - Puppet errors on deployment-pdfrender02 is OK: OK: Less than 1.00% above the threshold [0.0]
[11:23:59] <wikibugs>	 Continuous-Integration-Infrastructure, MediaWiki-History-or-Diffs, TCB-Team, wikidiff2, and 3 others: Migrate php-compile-hhvm test to jessie - https://phabricator.wikimedia.org/T168410#3367000 (Paladox) Your welcome :)
[11:24:29] <shinken-wm>	 RECOVERY - Puppet errors on deployment-tin is OK: OK: Less than 1.00% above the threshold [0.0]
[11:24:44] <wmf-insecte>	 Project beta-code-update-eqiad build #160748: FAILURE in 11 min: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/160748/
[11:24:45] <wmf-insecte>	 Project beta-update-databases-eqiad build #17909: FAILURE in 4 min 43 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/17909/
[11:25:47] <shinken-wm>	 PROBLEM - Puppet errors on deployment-poolcounter04 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [0.0]
[11:26:54] <wmf-insecte>	 Yippee, build fixed!
[11:26:55] <wmf-insecte>	 Project beta-code-update-eqiad build #160749: FIXED in 51 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/160749/
[11:26:58] <shinken-wm>	 RECOVERY - Puppet errors on deployment-mira is OK: OK: Less than 1.00% above the threshold [0.0]
[11:28:59] <wmf-insecte>	 Project beta-scap-eqiad build #160644: FAILURE in 2 min 3 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/160644/
[11:33:16] <wikibugs>	 Beta-Cluster-Infrastructure: Beta cluster varnish fails VCL compilation because citoid.wmflabs.org does not resolve - https://phabricator.wikimedia.org/T168519#3367010 (hashar)
[11:35:07] <wmf-insecte>	 Project beta-scap-eqiad build #160645: STILL FAILING in 1 min 24 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/160645/
[11:35:49] <shinken-wm>	 RECOVERY - Puppet errors on deployment-poolcounter04 is OK: OK: Less than 1.00% above the threshold [0.0]
[11:36:19] <shinken-wm>	 RECOVERY - English Wikipedia Mobile Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 33390 bytes in 2.236 second response time
[11:37:26] <wikibugs>	 Beta-Cluster-Infrastructure: Could not find class role::etcd::common for deployment-conf03.deployment-prep.eqiad.wmflabs - https://phabricator.wikimedia.org/T168520#3367026 (hashar)
[11:42:30] <wikibugs>	 (CR) Aude: [C: 2] Update Wikidata to wmf/1.30.0-wmf.6 [tools/release] - https://gerrit.wikimedia.org/r/360351 (owner: Aude)
[11:43:20] <wikibugs>	 (Merged) jenkins-bot: Update Wikidata to wmf/1.30.0-wmf.6 [tools/release] - https://gerrit.wikimedia.org/r/360351 (owner: Aude)
[11:45:06] <wmf-insecte>	 Project beta-scap-eqiad build #160646: STILL FAILING in 1 min 26 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/160646/
[11:45:09] <wikibugs>	 Beta-Cluster-Infrastructure: Could not find class role::etcd::common for deployment-conf03.deployment-prep.eqiad.wmflabs - https://phabricator.wikimedia.org/T168520#3367047 (hashar) At https://horizon.wikimedia.org/project/prefixpuppet/  Changed `role::etcd::common` to `profile::etcd`
[11:45:49] <wikibugs>	 Beta-Cluster-Infrastructure: Could not find class role::etcd::common for deployment-conf03.deployment-prep.eqiad.wmflabs - https://phabricator.wikimedia.org/T168520#3367049 (hashar)
[11:45:51] <wikibugs>	 Beta-Cluster-Infrastructure: Beta cluster varnish fails VCL compilation because citoid.wmflabs.org does not resolve - https://phabricator.wikimedia.org/T168519#3367048 (hashar)
[11:51:22] <wikibugs>	 Beta-Cluster-Infrastructure: Could not find class role::etcd::common for deployment-conf03.deployment-prep.eqiad.wmflabs - https://phabricator.wikimedia.org/T168520#3367052 (hashar) It is broken further due to  `profile:etcd:discovery` not being set. In production that seems to point to a k8s cluster.
[11:54:23] <shinken-wm>	 RECOVERY - Puppet errors on deployment-elastic07 is OK: OK: Less than 1.00% above the threshold [0.0]
[11:55:08] <wmf-insecte>	 Project beta-scap-eqiad build #160647: STILL FAILING in 1 min 27 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/160647/
[11:57:52] <wikibugs>	 Beta-Cluster-Infrastructure, Patch-For-Review: Beta cluster varnish fails VCL compilation because citoid.wmflabs.org does not resolve - https://phabricator.wikimedia.org/T168519#3367071 (hashar) I have cherry picked https://gerrit.wikimedia.org/r/360639 on the beta cluster. Somehow that causes the citoid...
[11:58:31] <shinken-wm>	 RECOVERY - Puppet errors on deployment-cache-text04 is OK: OK: Less than 1.00% above the threshold [0.0]
[11:59:01] <hashar>	 !log armed keyholder on deployment-tin and deployment-mira
[11:59:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[12:02:04] <wmf-insecte>	 Yippee, build fixed!
[12:02:04] <wmf-insecte>	 Project beta-scap-eqiad build #160648: FIXED in 2 min 56 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/160648/
[12:04:50] <hashar>	 !log apt-get dist-upgrade on deployment-mediawiki hosts
[12:04:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[12:07:01] <wikibugs>	 Release-Engineering-Team, Page-Previews, Reading-Web-Backlog, Reading-Web-Kanban-Board: Create bot that automatically rebases and rebuilds patches to master - https://phabricator.wikimedia.org/T167181#3367096 (Jhernandez) >>! In T167181#3363861, @Jdlrobson wrote: > Yep. I did originally do this f...
[12:12:15] <shinken-wm>	 PROBLEM - SSH on deployment-mediawiki06 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:15:47] <shinken-wm>	 PROBLEM - Puppet errors on deployment-mediawiki06 is CRITICAL: CRITICAL: 12.50% of data above the critical threshold [0.0]
[12:17:05] <shinken-wm>	 RECOVERY - SSH on deployment-mediawiki06 is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u3 (protocol 2.0)
[12:17:18] <shinken-wm>	 PROBLEM - Puppet errors on deployment-mediawiki05 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [0.0]
[12:17:56] <hashar>	 ^^ that is me upgrading the hosts
[12:18:20] <shinken-wm>	 PROBLEM - Puppet errors on deployment-pdfrender02 is CRITICAL: CRITICAL: 12.50% of data above the critical threshold [0.0]
[12:22:16] <shinken-wm>	 RECOVERY - Puppet errors on deployment-mediawiki05 is OK: OK: Less than 1.00% above the threshold [0.0]
[12:23:20] <shinken-wm>	 RECOVERY - Puppet errors on deployment-pdfrender02 is OK: OK: Less than 1.00% above the threshold [0.0]
[12:25:01] <shinken-wm>	 PROBLEM - SSH on deployment-jobrunner02 is CRITICAL: Connection refused
[12:25:49] <shinken-wm>	 RECOVERY - Puppet errors on deployment-mediawiki06 is OK: OK: Less than 1.00% above the threshold [0.0]
[12:26:18] <wmf-insecte>	 Project beta-scap-eqiad build #160651: FAILURE in 2 min 29 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/160651/
[12:28:22] <hashar>	 !log deployment-imagescaler01 removed puppetmaster and puppetmaster-common packages
[12:28:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[12:28:47] <shinken-wm>	 PROBLEM - Puppet errors on deployment-zookeeper02 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [0.0]
[12:29:49] <shinken-wm>	 PROBLEM - Puppet errors on deployment-jobrunner02 is CRITICAL: CRITICAL: 12.50% of data above the critical threshold [0.0]
[12:29:55] <shinken-wm>	 PROBLEM - Puppet errors on deployment-mathoid is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [0.0]
[12:30:00] <shinken-wm>	 RECOVERY - SSH on deployment-jobrunner02 is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u3 (protocol 2.0)
[12:30:32] <shinken-wm>	 PROBLEM - Puppet errors on deployment-secureredirexperiment is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [0.0]
[12:33:48] <shinken-wm>	 PROBLEM - Puppet errors on deployment-trending01 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[12:34:23] <wmf-insecte>	 Yippee, build fixed!
[12:34:24] <wmf-insecte>	 Project beta-update-databases-eqiad build #17910: FIXED in 14 min: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/17910/
[12:36:04] <wmf-insecte>	 Yippee, build fixed!
[12:36:04] <wmf-insecte>	 Project beta-scap-eqiad build #160652: FIXED in 2 min 19 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/160652/
[12:37:10] <shinken-wm>	 PROBLEM - Puppet errors on deployment-aqs02 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[12:38:47] <shinken-wm>	 RECOVERY - Puppet errors on deployment-zookeeper02 is OK: OK: Less than 1.00% above the threshold [0.0]
[12:39:09] <shinken-wm>	 PROBLEM - Puppet errors on deployment-puppetdb01 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[12:39:21] <shinken-wm>	 PROBLEM - Puppet errors on deployment-cache-upload04 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[12:39:25] <shinken-wm>	 PROBLEM - Puppet errors on deployment-fluorine02 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [0.0]
[12:39:49] <shinken-wm>	 RECOVERY - Puppet errors on deployment-jobrunner02 is OK: OK: Less than 1.00% above the threshold [0.0]
[12:39:49] <shinken-wm>	 PROBLEM - Puppet errors on deployment-eventlogging03 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[12:39:55] <shinken-wm>	 RECOVERY - Puppet errors on deployment-mathoid is OK: OK: Less than 1.00% above the threshold [0.0]
[12:40:09] <shinken-wm>	 PROBLEM - Puppet errors on deployment-mcs01 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[12:40:31] <shinken-wm>	 RECOVERY - Puppet errors on deployment-secureredirexperiment is OK: OK: Less than 1.00% above the threshold [0.0]
[12:40:31] <shinken-wm>	 PROBLEM - Puppet errors on deployment-changeprop is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[12:42:08] <shinken-wm>	 PROBLEM - Puppet errors on deployment-memc04 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[12:42:34] <shinken-wm>	 PROBLEM - Puppet errors on deployment-sca01 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[12:42:54] <shinken-wm>	 PROBLEM - Puppet errors on deployment-sca03 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[12:43:02] <shinken-wm>	 PROBLEM - Puppet errors on deployment-db04 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[12:43:16] <shinken-wm>	 PROBLEM - Puppet errors on deployment-sentry01 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[12:43:16] <shinken-wm>	 PROBLEM - Puppet errors on deployment-mediawiki04 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[12:43:34] <shinken-wm>	 PROBLEM - Puppet errors on deployment-restbase01 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[12:43:47] <shinken-wm>	 RECOVERY - Puppet errors on deployment-trending01 is OK: OK: Less than 1.00% above the threshold [0.0]
[12:44:24] <shinken-wm>	 RECOVERY - Puppet errors on deployment-fluorine02 is OK: OK: Less than 1.00% above the threshold [0.0]
[12:44:40] <shinken-wm>	 PROBLEM - Puppet errors on deployment-zotero01 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[12:44:42] <shinken-wm>	 PROBLEM - Puppet errors on deployment-etcd-01 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[12:44:44] <shinken-wm>	 PROBLEM - Puppet errors on deployment-elastic05 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[12:45:12] <shinken-wm>	 PROBLEM - Puppet errors on deployment-aqs03 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[12:45:22] <shinken-wm>	 PROBLEM - Puppet errors on deployment-elastic07 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[12:45:51] <shinken-wm>	 PROBLEM - Puppet errors on deployment-logstash2 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[12:45:51] <shinken-wm>	 PROBLEM - Puppet errors on deployment-ms-be03 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[12:45:55] <shinken-wm>	 PROBLEM - Puppet errors on deployment-mathoid is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[12:47:03] <shinken-wm>	 PROBLEM - Puppet errors on deployment-puppetmaster02 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[12:47:10] <hashar>	 !log broke deployment-prep puppet master while upgrading it :(
[12:47:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[12:48:19] <shinken-wm>	 PROBLEM - Puppet errors on deployment-mediawiki05 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[12:49:19] <shinken-wm>	 PROBLEM - Puppet errors on deployment-pdfrender02 is CRITICAL: CRITICAL: 70.00% of data above the critical threshold [0.0]
[12:49:31] <shinken-wm>	 PROBLEM - Puppet errors on deployment-cache-text04 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[12:49:39] <shinken-wm>	 PROBLEM - Puppet errors on deployment-salt02 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[12:49:45] <shinken-wm>	 PROBLEM - Puppet errors on deployment-zookeeper02 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[12:50:25] <shinken-wm>	 PROBLEM - Puppet errors on deployment-fluorine02 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[12:51:48] <shinken-wm>	 PROBLEM - Puppet errors on deployment-poolcounter04 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[12:52:26] <shinken-wm>	 PROBLEM - Puppet errors on deployment-kafka05 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[12:52:58] <shinken-wm>	 PROBLEM - Puppet errors on deployment-ores-redis-01 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[12:53:00] <shinken-wm>	 PROBLEM - Puppet errors on deployment-mira is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[12:53:06] <shinken-wm>	 PROBLEM - Puppet errors on deployment-kafka01 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[12:53:44] <shinken-wm>	 PROBLEM - Puppet errors on deployment-urldownloader is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[12:53:46] <shinken-wm>	 PROBLEM - Puppet errors on deployment-ms-be04 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[12:54:16] <shinken-wm>	 PROBLEM - Puppet errors on deployment-apertium02 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[12:55:00] <shinken-wm>	 PROBLEM - Puppet errors on deployment-sca02 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[12:55:22] <shinken-wm>	 PROBLEM - Puppet errors on deployment-prometheus01 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[12:55:48] <shinken-wm>	 PROBLEM - Puppet errors on deployment-jobrunner02 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[12:56:32] <shinken-wm>	 PROBLEM - Puppet errors on deployment-secureredirexperiment is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[12:56:49] <shinken-wm>	 PROBLEM - Puppet errors on deployment-mediawiki06 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[12:58:49] <shinken-wm>	 PROBLEM - Puppet errors on deployment-redis01 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[12:59:27] <shinken-wm>	 PROBLEM - Puppet errors on deployment-parsoid09 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[13:00:31] <shinken-wm>	 PROBLEM - Puppet errors on deployment-tin is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[13:00:47] <shinken-wm>	 PROBLEM - Puppet errors on deployment-memc05 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[13:01:58] <shinken-wm>	 PROBLEM - Puppet errors on deployment-tmh01 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[13:03:08] <shinken-wm>	 PROBLEM - Puppet errors on deployment-elastic06 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[13:03:22] <shinken-wm>	 PROBLEM - Puppet errors on deployment-restbase02 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[13:04:02] <shinken-wm>	 PROBLEM - Puppet errors on deployment-db03 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[13:04:04] <shinken-wm>	 PROBLEM - Puppet errors on deployment-eventlogging04 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[13:04:24] <shinken-wm>	 PROBLEM - Puppet errors on deployment-ircd is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[13:04:28] <shinken-wm>	 PROBLEM - Puppet errors on deployment-redis02 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[13:04:34] <shinken-wm>	 PROBLEM - Puppet errors on deployment-kafka03 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[13:04:46] <shinken-wm>	 PROBLEM - Puppet errors on deployment-conf03 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[13:04:48] <shinken-wm>	 PROBLEM - Puppet errors on deployment-trending01 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[13:06:34] <wikibugs>	 Gerrit, Operations, Patch-For-Review: Gerrit constantly throws HTTP 500 error when reviewing patches (due to "Too many open files") - https://phabricator.wikimedia.org/T168360#3367211 (Paladox) According to bin/gerrit.sh status  this is what the init script has    GERRIT_FDS             =  12000  (th...
[13:09:25] <shinken-wm>	 PROBLEM - Puppet errors on deployment-stream is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[13:09:49] <shinken-wm>	 PROBLEM - Puppet errors on deployment-ms-fe02 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[13:09:57] <shinken-wm>	 PROBLEM - Puppet errors on deployment-kafka04 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[13:12:07] <shinken-wm>	 PROBLEM - Puppet errors on deployment-mx is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[13:20:51] <wikibugs>	 Browser-Tests-Infrastructure, Release-Engineering-Team (Kanban), Patch-For-Review, User-zeljkofilipin: Run WebdriverIO tests in CI for extensions - https://phabricator.wikimedia.org/T164721#3367250 (zeljkofilipin) >>! In T164721#3355289, @Jdlrobson wrote: > The purpose is only for browser tests....
[13:26:35] <hashar>	 !log deployment-prep: puppet master got erroneously upgraded to puppet* 4.8. Rolled it back to 3.8, which failed, and then back to 3.7!
[13:26:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[13:27:24] <hashar>	 f**** that
[13:29:17] <paladox>	 hashar what worked for me when that happened was downloading the debs for puppet manually.
[13:29:34] <paladox>	 i.e. search on Debian, download the deb and install it.
[13:29:38] <hashar>	 apt-get install puppetmaster=3.7.2-4+deb8u1 puppetmaster-common=3.7.2-4+deb8u1 puppet-common=3.7.2-4+deb8u1 puppetmaster-passenger=3.7.2-4+deb8u1
[13:29:39] <hashar>	 works :)
[13:30:05] <paladox>	 oh. I wonder why it is trying to update you to 4.8 :)
[13:41:13] <hashar>	 ok maybe I have fixed something
[13:41:20] <hashar>	 but it still fails to find roles: could not find class role::beta::mediawiki
[13:50:13] <shinken-wm>	 RECOVERY - Puppet errors on deployment-mcs01 is OK: OK: Less than 1.00% above the threshold [0.0]
[13:54:19] <shinken-wm>	 RECOVERY - Puppet errors on deployment-pdfrender02 is OK: OK: Less than 1.00% above the threshold [0.0]
[13:54:39] <shinken-wm>	 RECOVERY - Puppet errors on deployment-salt02 is OK: OK: Less than 1.00% above the threshold [0.0]
[13:56:29] <wikibugs>	 Release-Engineering-Team (Watching / External), Operations, Performance-Team, monitoring, Wikimedia-Incident: MediaWiki load time regression should trigger an alarm / page people - https://phabricator.wikimedia.org/T146125#3367352 (Gilles) Open>Resolved a:Gilles Our new Grafana-ba...
[14:01:31] <shinken-wm>	 RECOVERY - Puppet errors on deployment-secureredirexperiment is OK: OK: Less than 1.00% above the threshold [0.0]
[14:02:53] <hashar>	 !log deployment-puppetmaster (cd /etc/puppet && ln -s /var/lib/git/operations/puppet/manifests && ln -s /var/lib/git/operations/puppet/modules)
[14:02:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[14:03:41] <hashar>	 now they all fail with "Could not find data item discovery::app_routes in any Hiera data file and no default supplied at /etc/puppet/manifests/realm.pp:51"
[14:04:29] <shinken-wm>	 PROBLEM - Host deployment-ms-be03 is DOWN: CRITICAL - Host Unreachable (10.68.22.125)
[14:04:49] <shinken-wm>	 PROBLEM - Host deployment-etcd-01 is DOWN: CRITICAL - Host Unreachable (10.68.19.227)
[14:04:54] <hashar>	 :(
[14:09:47] <shinken-wm>	 RECOVERY - Puppet errors on deployment-trending01 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:11:09] <shinken-wm>	 PROBLEM - Puppet errors on deployment-mcs01 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[14:17:21] <hashar>	 !log finally fixed puppet on deployment-prep!
[14:17:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[14:18:54] <hashar>	 ;(
[14:19:51] <shinken-wm>	 RECOVERY - Host deployment-etcd-01 is UP: PING OK - Packet loss = 0%, RTA = 0.83 ms
[14:20:19] <shinken-wm>	 PROBLEM - Puppet errors on deployment-pdfrender02 is CRITICAL: CRITICAL: 70.00% of data above the critical threshold [0.0]
[14:20:59] <shinken-wm>	 RECOVERY - Host deployment-ms-be03 is UP: PING OK - Packet loss = 0%, RTA = 2.00 ms
[14:21:54] <hashar>	 !log deployment-prep: force running puppet on all instances
[14:21:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[14:24:56] <shinken-wm>	 PROBLEM - Host deployment-ms-fe02 is DOWN: CRITICAL - Host Unreachable (10.68.19.247)
[14:24:56] <shinken-wm>	 PROBLEM - Host deployment-eventlogging03 is DOWN: CRITICAL - Host Unreachable (10.68.18.111)
[14:24:58] <shinken-wm>	 PROBLEM - Host deployment-elastic06 is DOWN: CRITICAL - Host Unreachable (10.68.23.242)
[14:25:00] <shinken-wm>	 PROBLEM - Host integration-slave-trusty-1001 is DOWN: CRITICAL - Host Unreachable (10.68.16.168)
[14:25:26] <shinken-wm>	 RECOVERY - Puppet errors on deployment-fluorine02 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:25:38] <shinken-wm>	 PROBLEM - Host integration-slave-trusty-1006 is DOWN: CRITICAL - Host Unreachable (10.68.17.118)
[14:25:41] <wmf-insecte>	 Project beta-scap-eqiad build #160664: FAILURE in 1 min 56 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/160664/
[14:27:05] <shinken-wm>	 RECOVERY - Puppet errors on deployment-puppetmaster02 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:27:25] <shinken-wm>	 RECOVERY - Puppet errors on deployment-kafka05 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:27:27] <shinken-wm>	 PROBLEM - Host deployment-salt02 is DOWN: CRITICAL - Host Unreachable (10.68.17.58)
[14:27:33] <shinken-wm>	 PROBLEM - Host deployment-sca03 is DOWN: CRITICAL - Host Unreachable (10.68.21.183)
[14:27:43] <shinken-wm>	 PROBLEM - Host deployment-sca01 is DOWN: CRITICAL - Host Unreachable (10.68.20.183)
[14:28:16] <shinken-wm>	 RECOVERY - Puppet errors on deployment-mediawiki05 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:28:16] <shinken-wm>	 RECOVERY - Puppet errors on deployment-mediawiki04 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:28:41] <shinken-wm>	 PROBLEM - Host deployment-mx is DOWN: CRITICAL - Host Unreachable (10.68.17.78)
[14:28:49] <shinken-wm>	 RECOVERY - Puppet errors on deployment-ms-be04 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:28:51] <shinken-wm>	 PROBLEM - Host deployment-redis01 is DOWN: CRITICAL - Host Unreachable (10.68.16.177)
[14:29:04] <shinken-wm>	 PROBLEM - Host deployment-memc05 is DOWN: CRITICAL - Host Unreachable (10.68.23.49)
[14:29:30] <shinken-wm>	 RECOVERY - Puppet errors on deployment-ircd is OK: OK: Less than 1.00% above the threshold [0.0]
[14:29:30] <shinken-wm>	 RECOVERY - Puppet errors on deployment-redis02 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:29:36] <shinken-wm>	 RECOVERY - Puppet errors on deployment-cache-text04 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:29:47] <shinken-wm>	 PROBLEM - Host integration-saltmaster is DOWN: CRITICAL - Host Unreachable (10.68.17.68)
[14:29:47] <shinken-wm>	 RECOVERY - Puppet errors on deployment-zookeeper02 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:29:59] <shinken-wm>	 RECOVERY - Puppet errors on deployment-sca02 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:30:51] <shinken-wm>	 RECOVERY - Puppet errors on deployment-jobrunner02 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:30:51] <shinken-wm>	 RECOVERY - Puppet errors on deployment-ms-be03 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:31:09] <shinken-wm>	 RECOVERY - Puppet errors on deployment-mcs01 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:31:14] <shinken-wm>	 RECOVERY - Host integration-slave-trusty-1006 is UP: PING OK - Packet loss = 0%, RTA = 3.44 ms
[14:31:24] <shinken-wm>	 RECOVERY - Host integration-slave-trusty-1001 is UP: PING OK - Packet loss = 0%, RTA = 4.82 ms
[14:31:48] <shinken-wm>	 RECOVERY - Puppet errors on deployment-poolcounter04 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:32:59] <shinken-wm>	 RECOVERY - Puppet errors on deployment-ores-redis-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:33:01] <shinken-wm>	 RECOVERY - Host deployment-eventlogging03 is UP: PING OK - Packet loss = 0%, RTA = 1.66 ms
[14:33:05] <shinken-wm>	 RECOVERY - Host deployment-ms-fe02 is UP: PING OK - Packet loss = 0%, RTA = 2.91 ms
[14:33:07] <shinken-wm>	 RECOVERY - Puppet errors on deployment-kafka01 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:33:15] <shinken-wm>	 RECOVERY - Host deployment-elastic06 is UP: PING OK - Packet loss = 0%, RTA = 7.87 ms
[14:33:44] <shinken-wm>	 RECOVERY - Puppet errors on deployment-urldownloader is OK: OK: Less than 1.00% above the threshold [0.0]
[14:34:16] <shinken-wm>	 RECOVERY - Puppet errors on deployment-apertium02 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:34:20] <wmf-insecte>	 Project selenium-WikiLove » firefox,beta,Linux,BrowserTests build #430: FAILURE in 2 min 18 sec: https://integration.wikimedia.org/ci/job/selenium-WikiLove/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/430/
[14:34:26] <shinken-wm>	 RECOVERY - Puppet errors on deployment-stream is OK: OK: Less than 1.00% above the threshold [0.0]
[14:34:40] <shinken-wm>	 RECOVERY - Puppet errors on deployment-zotero01 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:34:58] <shinken-wm>	 RECOVERY - Puppet errors on deployment-kafka04 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:35:20] <shinken-wm>	 RECOVERY - Puppet errors on deployment-prometheus01 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:35:34] <shinken-wm>	 RECOVERY - Puppet errors on deployment-changeprop is OK: OK: Less than 1.00% above the threshold [0.0]
[14:35:40] <wmf-insecte>	 Project beta-scap-eqiad build #160665: STILL FAILING in 1 min 53 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/160665/
[14:35:50] <shinken-wm>	 RECOVERY - Puppet errors on deployment-logstash2 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:35:57] <shinken-wm>	 RECOVERY - Puppet errors on deployment-mathoid is OK: OK: Less than 1.00% above the threshold [0.0]
[14:36:07] <shinken-wm>	 RECOVERY - Host deployment-memc05 is UP: PING OK - Packet loss = 0%, RTA = 1.75 ms
[14:36:49] <shinken-wm>	 RECOVERY - Puppet errors on deployment-mediawiki06 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:37:07] <shinken-wm>	 RECOVERY - Puppet errors on deployment-memc04 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:37:11] <shinken-wm>	 RECOVERY - Puppet errors on deployment-aqs02 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:37:17] <shinken-wm>	 RECOVERY - Host deployment-mx is UP: PING OK - Packet loss = 0%, RTA = 0.68 ms
[14:37:35] <shinken-wm>	 RECOVERY - Host deployment-sca03 is UP: PING OK - Packet loss = 0%, RTA = 0.56 ms
[14:37:43] <shinken-wm>	 RECOVERY - Host deployment-sca01 is UP: PING OK - Packet loss = 0%, RTA = 1.65 ms
[14:38:23] <shinken-wm>	 RECOVERY - Puppet errors on deployment-restbase02 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:38:30] <shinken-wm>	 RECOVERY - Host deployment-redis01 is UP: PING OK - Packet loss = 0%, RTA = 0.82 ms
[14:39:03] <shinken-wm>	 RECOVERY - Puppet errors on deployment-db03 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:39:09] <shinken-wm>	 RECOVERY - Puppet errors on deployment-puppetdb01 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:39:27] <shinken-wm>	 RECOVERY - Puppet errors on deployment-parsoid09 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:39:28] <wmf-insecte>	 Project beta-scap-eqiad build #160666: STILL FAILING in 2 min 12 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/160666/
[14:39:36] <shinken-wm>	 RECOVERY - Puppet errors on deployment-kafka03 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:40:12] <shinken-wm>	 PROBLEM - Host deployment-stream is DOWN: CRITICAL - Host Unreachable (10.68.17.106)
[14:40:42] <shinken-wm>	 PROBLEM - Puppet errors on deployment-salt02 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[14:40:44] <shinken-wm>	 PROBLEM - Host saucelabs-01 is DOWN: CRITICAL - Host Unreachable (10.68.21.186)
[14:41:07] <shinken-wm>	 RECOVERY - Host integration-saltmaster is UP: PING OK - Packet loss = 0%, RTA = 1.43 ms
[14:41:19] <shinken-wm>	 PROBLEM - Host deployment-zotero01 is DOWN: CRITICAL - Host Unreachable (10.68.17.102)
[14:41:28] <hashar>	 !log deployment-tmh01 is down for some reason
[14:41:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[14:41:37] <shinken-wm>	 PROBLEM - Host deployment-kafka04 is DOWN: CRITICAL - Host Unreachable (10.68.17.9)
[14:42:22] <hashar>	 ah no
[14:42:25] <paladox>	 hashar they are rebooting labvirts
[14:42:27] <hashar>	 that is just instances being repooled
[14:42:34] <shinken-wm>	 PROBLEM - Host deployment-redis02 is DOWN: CRITICAL - Host Unreachable (10.68.16.231)
[14:42:58] <shinken-wm>	 PROBLEM - Host deployment-tmh01 is DOWN: CRITICAL - Host Unreachable (10.68.16.211)
[14:43:06] <hashar>	 that explains it :)
[14:43:48] <shinken-wm>	 RECOVERY - Puppet errors on deployment-redis01 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:44:02] <shinken-wm>	 RECOVERY - Puppet errors on deployment-eventlogging04 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:44:50] <shinken-wm>	 RECOVERY - Puppet errors on deployment-eventlogging03 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:44:52] <shinken-wm>	 RECOVERY - Puppet errors on deployment-ms-fe02 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:45:45] <wmf-insecte>	 Project beta-scap-eqiad build #160667: STILL FAILING in 2 min 4 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/160667/
[14:45:49] <shinken-wm>	 RECOVERY - Puppet errors on deployment-memc05 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:45:55] <shinken-wm>	 PROBLEM - Puppet errors on integration-saltmaster is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [0.0]
[14:47:09] <shinken-wm>	 RECOVERY - Puppet errors on deployment-mx is OK: OK: Less than 1.00% above the threshold [0.0]
[14:47:38] <shinken-wm>	 RECOVERY - Puppet errors on deployment-sca01 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:47:44] <shinken-wm>	 RECOVERY - Host deployment-stream is UP: PING OK - Packet loss = 0%, RTA = 2.93 ms
[14:47:52] <shinken-wm>	 RECOVERY - Puppet errors on deployment-sca03 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:48:02] <shinken-wm>	 RECOVERY - Puppet errors on deployment-db04 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:48:06] <shinken-wm>	 RECOVERY - Host deployment-tmh01 is UP: PING OK - Packet loss = 0%, RTA = 1.77 ms
[14:48:16] <shinken-wm>	 RECOVERY - Puppet errors on deployment-sentry01 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:48:52] <shinken-wm>	 RECOVERY - Host deployment-zotero01 is UP: PING OK - Packet loss = 0%, RTA = 2.24 ms
[14:49:04] <shinken-wm>	 RECOVERY - Host deployment-redis02 is UP: PING OK - Packet loss = 0%, RTA = 1.86 ms
[14:49:21] <shinken-wm>	 RECOVERY - Puppet errors on deployment-cache-upload04 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:49:45] <shinken-wm>	 RECOVERY - Puppet errors on deployment-elastic05 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:49:49] <shinken-wm>	 RECOVERY - Host deployment-kafka04 is UP: PING OK - Packet loss = 0%, RTA = 3.41 ms
[14:50:11] <shinken-wm>	 RECOVERY - Puppet errors on deployment-aqs03 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:50:24] <shinken-wm>	 RECOVERY - Puppet errors on deployment-elastic07 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:50:39] <shinken-wm>	 RECOVERY - Puppet errors on deployment-salt02 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:50:57] <shinken-wm>	 RECOVERY - Puppet errors on integration-saltmaster is OK: OK: Less than 1.00% above the threshold [0.0]
[14:52:12] <shinken-wm>	 RECOVERY - Host saucelabs-01 is UP: PING OK - Packet loss = 0%, RTA = 4.91 ms
[14:52:26] <shinken-wm>	 RECOVERY - Long lived cherry-picks on puppetmaster on deployment-puppetmaster02 is OK: OK: Less than 100.00% above the threshold [0.0]
[14:52:57] <shinken-wm>	 RECOVERY - Puppet errors on deployment-tmh01 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:53:35] <shinken-wm>	 RECOVERY - Puppet errors on deployment-restbase01 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:55:18] <shinken-wm>	 RECOVERY - Puppet errors on deployment-pdfrender02 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:55:44] <shinken-wm>	 PROBLEM - Puppet errors on saucelabs-01 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[14:55:48] <shinken-wm>	 PROBLEM - Host deployment-memc04 is DOWN: CRITICAL - Host Unreachable (10.68.23.25)
[14:56:06] <wmf-insecte>	 Yippee, build fixed!
[14:56:07] <wmf-insecte>	 Project beta-scap-eqiad build #160668: FIXED in 2 min 26 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/160668/
[14:56:27] <shinken-wm>	 PROBLEM - Host deployment-zookeeper02 is DOWN: CRITICAL - Host Unreachable (10.68.18.75)
[14:57:34] <shinken-wm>	 PROBLEM - Host integration-slave-trusty-1003 is DOWN: CRITICAL - Host Unreachable (10.68.17.54)
[14:57:48] <shinken-wm>	 PROBLEM - Puppet errors on deployment-mediawiki06 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[14:58:06] <shinken-wm>	 PROBLEM - Host deployment-aqs01 is DOWN: CRITICAL - Host Unreachable (10.68.18.237)
[14:58:08] <shinken-wm>	 PROBLEM - Host deployment-restbase01 is DOWN: CRITICAL - Host Unreachable (10.68.16.128)
[14:58:41] <shinken-wm>	 PROBLEM - Host deployment-puppetdb01 is DOWN: CRITICAL - Host Unreachable (10.68.23.76)
[15:00:28] <shinken-wm>	 PROBLEM - Puppet errors on deployment-parsoid09 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[15:00:36] <shinken-wm>	 PROBLEM - Puppet errors on deployment-kafka03 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[15:00:49] <shinken-wm>	 PROBLEM - Puppet errors on deployment-trending01 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[15:01:30] <shinken-wm>	 PROBLEM - Puppet errors on deployment-redis02 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[15:01:49] <shinken-wm>	 PROBLEM - Puppet errors on deployment-memc05 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[15:02:57] <shinken-wm>	 RECOVERY - Puppet errors on deployment-mira is OK: OK: Less than 1.00% above the threshold [0.0]
[15:03:59] <shinken-wm>	 PROBLEM - Puppet errors on deployment-tmh01 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[15:04:21] <shinken-wm>	 PROBLEM - Puppet errors on deployment-restbase02 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[15:05:01] <shinken-wm>	 PROBLEM - Puppet errors on deployment-db03 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[15:05:01] <shinken-wm>	 PROBLEM - Puppet errors on deployment-eventlogging04 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[15:05:16] <shinken-wm>	 RECOVERY - Host deployment-puppetdb01 is UP: PING OK - Packet loss = 0%, RTA = 1.06 ms
[15:05:24] <shinken-wm>	 PROBLEM - Puppet errors on deployment-ircd is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[15:05:34] <shinken-wm>	 RECOVERY - Host deployment-aqs01 is UP: PING OK - Packet loss = 0%, RTA = 1.51 ms
[15:05:42] <shinken-wm>	 RECOVERY - Host deployment-zookeeper02 is UP: PING OK - Packet loss = 0%, RTA = 2.17 ms
[15:05:42] <shinken-wm>	 RECOVERY - Puppet errors on saucelabs-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:05:42] <shinken-wm>	 RECOVERY - Host deployment-memc04 is UP: PING OK - Packet loss = 0%, RTA = 0.93 ms
[15:06:00] <shinken-wm>	 RECOVERY - Host integration-slave-trusty-1003 is UP: PING OK - Packet loss = 0%, RTA = 0.58 ms
[15:06:04] <shinken-wm>	 RECOVERY - Host deployment-restbase01 is UP: PING OK - Packet loss = 0%, RTA = 7.58 ms
[15:06:26] <shinken-wm>	 PROBLEM - Puppet errors on deployment-stream is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[15:08:11] <shinken-wm>	 PROBLEM - Puppet errors on deployment-aqs02 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[15:09:34] <shinken-wm>	 PROBLEM - Puppet errors on deployment-restbase01 is CRITICAL: CRITICAL: 80.00% of data above the critical threshold [0.0]
[15:10:08] <shinken-wm>	 PROBLEM - Puppet errors on deployment-puppetdb01 is CRITICAL: CRITICAL: 83.33% of data above the critical threshold [0.0]
[15:10:18] <shinken-wm>	 PROBLEM - Puppet errors on deployment-cache-upload04 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[15:10:47] <shinken-wm>	 PROBLEM - Puppet errors on deployment-zookeeper02 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[15:10:51] <shinken-wm>	 PROBLEM - Puppet errors on deployment-ms-fe02 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[15:10:51] <shinken-wm>	 PROBLEM - Puppet errors on deployment-eventlogging03 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[15:11:57] <shinken-wm>	 PROBLEM - Puppet errors on deployment-kafka04 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[15:13:07] <shinken-wm>	 PROBLEM - Puppet errors on deployment-memc04 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[15:14:25] <shinken-wm>	 PROBLEM - Host Generic Beta Cluster is DOWN: CRITICAL - Host Unreachable (en.wikipedia.beta.wmflabs.org)
[15:14:28] <shinken-wm>	 PROBLEM - Host deployment-restbase02 is DOWN: CRITICAL - Host Unreachable (10.68.17.189)
[15:15:08] <shinken-wm>	 PROBLEM - Host deployment-ms-be04 is DOWN: CRITICAL - Host Unreachable (10.68.16.139)
[15:15:52] <shinken-wm>	 PROBLEM - Host deployment-sentry01 is DOWN: CRITICAL - Host Unreachable (10.68.19.148)
[15:16:36] <shinken-wm>	 PROBLEM - Host deployment-cache-text04 is DOWN: CRITICAL - Host Unreachable (10.68.18.103)
[15:17:02] <shinken-wm>	 PROBLEM - Host deployment-imagescaler01 is DOWN: CRITICAL - Host Unreachable (10.68.19.158)
[15:17:04] <shinken-wm>	 PROBLEM - Host deployment-trending01 is DOWN: CRITICAL - Host Unreachable (10.68.18.186)
[15:17:18] <shinken-wm>	 PROBLEM - Host deployment-aqs02 is DOWN: CRITICAL - Host Unreachable (10.68.17.90)
[15:17:54] <shinken-wm>	 PROBLEM - Host deployment-fluorine02 is DOWN: CRITICAL - Host Unreachable (10.68.23.106)
[15:18:08] <shinken-wm>	 RECOVERY - Puppet errors on deployment-memc04 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:20:45] <shinken-wm>	 RECOVERY - Puppet errors on deployment-zookeeper02 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:21:31] <shinken-wm>	 RECOVERY - Host deployment-fluorine02 is UP: PING OK - Packet loss = 0%, RTA = 1.81 ms
[15:21:39] <shinken-wm>	 RECOVERY - Host deployment-aqs02 is UP: PING OK - Packet loss = 0%, RTA = 2.16 ms
[15:22:01] <shinken-wm>	 RECOVERY - Host Generic Beta Cluster is UP: PING OK - Packet loss = 0%, RTA = 1.70 ms
[15:22:01] <shinken-wm>	 RECOVERY - Host deployment-imagescaler01 is UP: PING OK - Packet loss = 0%, RTA = 1.16 ms
[15:22:23] <shinken-wm>	 RECOVERY - Host deployment-restbase02 is UP: PING OK - Packet loss = 0%, RTA = 4.98 ms
[15:22:59] <shinken-wm>	 RECOVERY - Host deployment-cache-text04 is UP: PING OK - Packet loss = 0%, RTA = 1.92 ms
[15:22:59] <shinken-wm>	 RECOVERY - Host deployment-trending01 is UP: PING OK - Packet loss = 0%, RTA = 2.82 ms
[15:23:03] <shinken-wm>	 RECOVERY - Host deployment-sentry01 is UP: PING OK - Packet loss = 0%, RTA = 2.91 ms
[15:24:32] <shinken-wm>	 RECOVERY - Puppet errors on deployment-restbase01 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:24:57] <shinken-wm>	 RECOVERY - Host deployment-ms-be04 is UP: PING OK - Packet loss = 0%, RTA = 3.20 ms
[15:26:25] <shinken-wm>	 PROBLEM - Puppet errors on deployment-fluorine02 is CRITICAL: CRITICAL: 16.67% of data above the critical threshold [0.0]
[15:29:47] <shinken-wm>	 PROBLEM - Puppet errors on deployment-ms-be04 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [0.0]