[00:03:58] Project beta-scap-eqiad build #164825: 04STILL FAILING in 14 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/164825/ [00:17:22] Project beta-scap-eqiad build #164826: 04STILL FAILING in 12 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/164826/ [00:30:44] Project beta-scap-eqiad build #164827: 04STILL FAILING in 12 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/164827/ [00:44:00] Project beta-scap-eqiad build #164828: 04STILL FAILING in 12 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/164828/ [00:57:20] Project beta-scap-eqiad build #164829: 04STILL FAILING in 12 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/164829/ [00:59:57] 00:56:59 bash: /var/lib/mwdeploy/.bashrc: Permission denied [01:00:11] 00:56:59 00:56:59 ['/usr/bin/scap', 'pull', '--no-update-l10n'] on deployment-mediawiki05.deployment-prep.eqiad.wmflabs returned [70]: Could not chdir to home directory /var/lib/mwdeploy: Permission denied [01:00:11] 00:56:59 bash: /var/lib/mwdeploy/.bashrc: Permission denied [01:00:43] blerg, I think I know what's happening... [01:01:07] we lose our connection to ldap and puppet then creates a local mwdeploy user that shadows the ldap user [01:01:17] rsync gets confused because of different uids [01:02:46] oh, i guess delete the local user and run puppet? :) [01:03:32] yeah, vipw [01:05:09] thanks :) [01:05:35] * paladox goes again - 02:05am [01:10:32] Project beta-scap-eqiad build #164830: 04STILL FAILING in 12 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/164830/ [01:20:14] alright, I think after this next failure beta-scap-eqiad should work again... [01:21:28] Project beta-scap-eqiad build #164831: 04STILL FAILING in 10 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/164831/ [01:30:55] and after this next failure :( [01:31:13] Project beta-scap-eqiad build #164832: 04STILL FAILING in 8 min 57 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/164832/ [01:37:21] Yippee, build fixed! [01:37:21] Project beta-scap-eqiad build #164833: 09FIXED in 5 min 24 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/164833/ [01:38:26] !log scap on beta was failing because during the ldap downtime puppet created a shadow mwdeploy user, fixed using vipw and vigr [01:38:29] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [01:46:18] 10Continuous-Integration-Config, 10Release-Engineering-Team, 10MW-1.30-release-notes, 10MediaWiki-Core-Tests, and 7 others: Parser tests fail if default Skin for unit tests makes use of doEditSectionLink - https://phabricator.wikimedia.org/T170880#3455178 (10Legoktm) @Jdlrobson I basically did the same con... [01:47:50] 10Continuous-Integration-Config, 10Release-Engineering-Team, 10MW-1.30-release-notes, 10MediaWiki-Core-Tests, and 7 others: Parser tests fail if default Skin for unit tests makes use of doEditSectionLink - https://phabricator.wikimedia.org/T170880#3455180 (10Legoktm) (Also whenever I would try testing it w...
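The diagnosis above — LDAP drops out, puppet provisions a local mwdeploy account that shadows the LDAP one under a different uid, and rsync/scap then trip over the mismatched ownership — can be spotted by checking whether a supposedly LDAP-only user also has an entry in the local passwd file. A minimal sketch of that check, assuming a Debian-style /etc/passwd; the mwdeploy name is taken from the log, and the script is only illustrative — the actual fix was simply deleting the local entry with vipw/vigr and re-running puppet:

```
#!/usr/bin/env python3
"""Report when a user that should come from LDAP also has a local
/etc/passwd entry (a shadow account, possibly with a different uid)."""
import pwd


def local_uid(name, passwd_file="/etc/passwd"):
    """uid from the local passwd file only, or None if not defined locally."""
    with open(passwd_file) as fh:
        for line in fh:
            fields = line.rstrip("\n").split(":")
            if len(fields) > 2 and fields[0] == name:
                return int(fields[2])
    return None


def resolved_uid(name):
    """uid as the rest of the system resolves it (local files win over LDAP)."""
    try:
        return pwd.getpwnam(name).pw_uid
    except KeyError:
        return None


if __name__ == "__main__":
    user = "mwdeploy"  # account that is expected to live in LDAP only
    local, resolved = local_uid(user), resolved_uid(user)
    if local is not None:
        print(f"{user} is defined locally (uid {local}, resolves to {resolved}): "
              "remove the local entry with vipw/vigr and re-run puppet")
    else:
        print(f"{user} has no local entry; nothing is shadowing LDAP")
```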
[02:14:31] PROBLEM - Puppet errors on deployment-pdfrender02 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [02:15:05] PROBLEM - Puppet errors on deployment-zotero01 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [02:18:46] PROBLEM - Puppet errors on deployment-salt02 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [02:20:42] PROBLEM - Puppet errors on deployment-kafka05 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [02:21:06] PROBLEM - Puppet errors on deployment-mediawiki05 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [02:21:36] PROBLEM - Puppet errors on integration-slave-jessie-1001 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [02:23:06] PROBLEM - Puppet errors on integration-slave-jessie-1002 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [02:23:43] PROBLEM - Puppet errors on deployment-ores-redis-01 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [02:23:45] PROBLEM - Puppet errors on deployment-fluorine02 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [02:24:43] PROBLEM - Puppet errors on deployment-mira is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [02:25:17] PROBLEM - Puppet errors on deployment-urldownloader is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [02:25:29] PROBLEM - Puppet errors on deployment-poolcounter04 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [02:28:00] PROBLEM - Puppet errors on deployment-mediawiki06 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [02:29:46] PROBLEM - Puppet errors on integration-slave-docker-1002 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [02:31:28] PROBLEM - Puppet errors on deployment-restbase02 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [02:34:03] PROBLEM - Puppet errors on deployment-db03 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [02:34:11] PROBLEM - Puppet errors on deployment-aqs01 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [02:34:51] PROBLEM - Puppet errors on deployment-ircd is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [02:35:00] PROBLEM - Puppet errors on deployment-stream is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [02:35:11] PROBLEM - Puppet errors on saucelabs-03 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [02:35:25] PROBLEM - Puppet errors on integration-r-lang-01 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [02:35:59] PROBLEM - Puppet errors on deployment-eventlogging04 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [02:36:47] PROBLEM - Puppet errors on integration-publishing is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [02:36:53] PROBLEM - Puppet errors on integration-slave-docker-1001 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [02:38:54] PROBLEM - Puppet errors on integration-slave-trusty-1004 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [02:39:46] PROBLEM - Check for valid instance states on labnodepool1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[02:39:48] PROBLEM - Puppet errors on saucelabs-02 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [02:43:37] PROBLEM - Puppet errors on deployment-restbase01 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [02:46:24] PROBLEM - Puppet errors on deployment-elastic07 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [02:54:31] RECOVERY - Puppet errors on deployment-pdfrender02 is OK: OK: Less than 1.00% above the threshold [0.0] [02:55:41] RECOVERY - Puppet errors on deployment-kafka05 is OK: OK: Less than 1.00% above the threshold [0.0] [02:56:05] RECOVERY - Puppet errors on deployment-mediawiki05 is OK: OK: Less than 1.00% above the threshold [0.0] [02:58:04] RECOVERY - Puppet errors on integration-slave-jessie-1002 is OK: OK: Less than 1.00% above the threshold [0.0] [02:58:44] RECOVERY - Puppet errors on deployment-fluorine02 is OK: OK: Less than 1.00% above the threshold [0.0] [02:58:46] RECOVERY - Puppet errors on deployment-salt02 is OK: OK: Less than 1.00% above the threshold [0.0] [03:00:18] RECOVERY - Puppet errors on deployment-urldownloader is OK: OK: Less than 1.00% above the threshold [0.0] [03:00:28] RECOVERY - Puppet errors on deployment-poolcounter04 is OK: OK: Less than 1.00% above the threshold [0.0] [03:01:13] PROBLEM - Check for valid instance states on labnodepool1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:01:36] RECOVERY - Puppet errors on integration-slave-jessie-1001 is OK: OK: Less than 1.00% above the threshold [0.0] [03:03:43] RECOVERY - Puppet errors on deployment-ores-redis-01 is OK: OK: Less than 1.00% above the threshold [0.0] [03:03:45] PROBLEM - Puppet errors on deployment-elastic06 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [03:04:45] RECOVERY - Puppet errors on integration-slave-docker-1002 is OK: OK: Less than 1.00% above the threshold [0.0] [03:04:45] RECOVERY - Puppet errors on deployment-mira is OK: OK: Less than 1.00% above the threshold [0.0] [03:08:02] RECOVERY - Puppet errors on deployment-mediawiki06 is OK: OK: Less than 1.00% above the threshold [0.0] [03:08:04] PROBLEM - Puppet errors on deployment-aqs02 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [03:08:51] PROBLEM - Puppet errors on deployment-ms-fe02 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [03:09:05] RECOVERY - Puppet errors on deployment-db03 is OK: OK: Less than 1.00% above the threshold [0.0] [03:09:12] RECOVERY - Puppet errors on deployment-aqs01 is OK: OK: Less than 1.00% above the threshold [0.0] [03:09:28] PROBLEM - Puppet errors on deployment-kafka04 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [03:09:40] PROBLEM - Puppet errors on deployment-mx is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [03:09:54] PROBLEM - Puppet errors on deployment-cache-upload04 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [03:11:04] PROBLEM - Puppet errors on deployment-memc04 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [03:11:30] RECOVERY - Puppet errors on deployment-restbase02 is OK: OK: Less than 1.00% above the threshold [0.0] [03:11:32] PROBLEM - Puppet errors on deployment-logstash2 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [03:11:36] PROBLEM - Puppet errors on integration-slave-trusty-1001 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [03:11:48] 
RECOVERY - Puppet errors on integration-publishing is OK: OK: Less than 1.00% above the threshold [0.0] [03:12:19] PROBLEM - Puppet errors on deployment-sentry01 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [03:12:21] PROBLEM - Puppet errors on deployment-db04 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [03:13:22] PROBLEM - Puppet errors on integration-puppetmaster01 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [03:14:46] PROBLEM - Puppet errors on deployment-elastic05 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [03:14:47] RECOVERY - Puppet errors on saucelabs-02 is OK: OK: Less than 1.00% above the threshold [0.0] [03:15:33] PROBLEM - Puppet errors on deployment-pdfrender02 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [03:15:35] PROBLEM - Puppet errors on integration-slave-jessie-android is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [03:15:43] PROBLEM - Puppet errors on deployment-aqs03 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [03:16:01] RECOVERY - Puppet errors on deployment-eventlogging04 is OK: OK: Less than 1.00% above the threshold [0.0] [03:16:25] PROBLEM - Puppet errors on deployment-mediawiki04 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [03:19:39] PROBLEM - Puppet errors on deployment-zookeeper02 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [03:19:43] PROBLEM - Puppet errors on deployment-fluorine02 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [03:19:43] PROBLEM - Puppet errors on deployment-salt02 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [03:20:44] PROBLEM - Puppet errors on deployment-cache-text04 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [03:21:29] PROBLEM - Puppet errors on deployment-poolcounter04 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [03:21:41] PROBLEM - Puppet errors on deployment-kafka05 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [03:22:05] PROBLEM - Puppet errors on deployment-mediawiki05 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [03:22:33] PROBLEM - Puppet errors on integration-slave-jessie-1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [03:23:40] PROBLEM - Puppet errors on jenkinstest is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [03:23:52] PROBLEM - Puppet errors on integration-saltmaster is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [03:24:04] PROBLEM - Puppet errors on integration-slave-jessie-1002 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [03:24:42] PROBLEM - Puppet errors on deployment-ores-redis-01 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [03:24:46] PROBLEM - Puppet errors on deployment-jobrunner02 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [03:25:16] PROBLEM - Puppet errors on deployment-redis01 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [03:25:45] PROBLEM - Puppet errors on deployment-mira is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [03:26:17] PROBLEM - Puppet errors on deployment-urldownloader is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [03:26:40] PROBLEM - 
Puppet errors on deployment-secureredirexperiment is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [03:26:40] PROBLEM - Puppet errors on saucelabs-01 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [03:26:54] PROBLEM - Puppet errors on deployment-apertium02 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [03:29:11] PROBLEM - Puppet errors on deployment-imagescaler01 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [03:29:43] PROBLEM - Puppet errors on deployment-puppetdb01 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [03:30:22] PROBLEM - Puppet errors on deployment-redis02 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [03:31:16] PROBLEM - Puppet errors on deployment-parsoid09 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [03:35:13] PROBLEM - Puppet errors on deployment-aqs01 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [03:38:43] RECOVERY - Puppet errors on deployment-elastic06 is OK: OK: Less than 1.00% above the threshold [0.0] [03:40:11] RECOVERY - Puppet errors on saucelabs-03 is OK: OK: Less than 1.00% above the threshold [0.0] [03:44:51] RECOVERY - Puppet errors on deployment-ircd is OK: OK: Less than 1.00% above the threshold [0.0] [03:45:00] RECOVERY - Puppet errors on deployment-stream is OK: OK: Less than 1.00% above the threshold [0.0] [03:46:38] RECOVERY - Puppet errors on integration-slave-trusty-1001 is OK: OK: Less than 1.00% above the threshold [0.0] [03:46:54] RECOVERY - Puppet errors on integration-slave-docker-1001 is OK: OK: Less than 1.00% above the threshold [0.0] [03:47:19] RECOVERY - Puppet errors on deployment-db04 is OK: OK: Less than 1.00% above the threshold [0.0] [03:47:19] RECOVERY - Puppet errors on deployment-sentry01 is OK: OK: Less than 1.00% above the threshold [0.0] [03:48:05] RECOVERY - Puppet errors on deployment-aqs02 is OK: OK: Less than 1.00% above the threshold [0.0] [03:48:52] RECOVERY - Puppet errors on deployment-ms-fe02 is OK: OK: Less than 1.00% above the threshold [0.0] [03:48:52] RECOVERY - Puppet errors on integration-slave-trusty-1004 is OK: OK: Less than 1.00% above the threshold [0.0] [03:49:28] RECOVERY - Puppet errors on deployment-kafka04 is OK: OK: Less than 1.00% above the threshold [0.0] [03:49:40] RECOVERY - Puppet errors on deployment-mx is OK: OK: Less than 1.00% above the threshold [0.0] [03:49:48] RECOVERY - Puppet errors on deployment-elastic05 is OK: OK: Less than 1.00% above the threshold [0.0] [03:49:56] RECOVERY - Puppet errors on deployment-cache-upload04 is OK: OK: Less than 1.00% above the threshold [0.0] [03:50:34] RECOVERY - Puppet errors on integration-slave-jessie-android is OK: OK: Less than 1.00% above the threshold [0.0] [03:50:42] RECOVERY - Puppet errors on deployment-aqs03 is OK: OK: Less than 1.00% above the threshold [0.0] [03:51:01] RECOVERY - Puppet errors on deployment-memc04 is OK: OK: Less than 1.00% above the threshold [0.0] [03:51:25] RECOVERY - Puppet errors on deployment-elastic07 is OK: OK: Less than 1.00% above the threshold [0.0] [03:51:27] RECOVERY - Puppet errors on deployment-mediawiki04 is OK: OK: Less than 1.00% above the threshold [0.0] [03:51:33] RECOVERY - Puppet errors on deployment-logstash2 is OK: OK: Less than 1.00% above the threshold [0.0] [03:53:24] RECOVERY - Puppet errors on integration-puppetmaster01 is OK: OK: Less than 1.00% above the threshold [0.0] [03:53:40] RECOVERY - 
Puppet errors on deployment-restbase01 is OK: OK: Less than 1.00% above the threshold [0.0] [03:55:06] RECOVERY - Puppet errors on deployment-zotero01 is OK: OK: Less than 1.00% above the threshold [0.0] [03:55:33] RECOVERY - Puppet errors on deployment-pdfrender02 is OK: OK: Less than 1.00% above the threshold [0.0] [03:56:41] RECOVERY - Puppet errors on deployment-kafka05 is OK: OK: Less than 1.00% above the threshold [0.0] [03:57:07] RECOVERY - Puppet errors on deployment-mediawiki05 is OK: OK: Less than 1.00% above the threshold [0.0] [03:59:05] RECOVERY - Puppet errors on integration-slave-jessie-1002 is OK: OK: Less than 1.00% above the threshold [0.0] [03:59:41] RECOVERY - Puppet errors on deployment-zookeeper02 is OK: OK: Less than 1.00% above the threshold [0.0] [03:59:44] RECOVERY - Puppet errors on deployment-ores-redis-01 is OK: OK: Less than 1.00% above the threshold [0.0] [03:59:45] RECOVERY - Puppet errors on deployment-salt02 is OK: OK: Less than 1.00% above the threshold [0.0] [03:59:47] RECOVERY - Puppet errors on deployment-fluorine02 is OK: OK: Less than 1.00% above the threshold [0.0] [04:00:44] RECOVERY - Puppet errors on deployment-cache-text04 is OK: OK: Less than 1.00% above the threshold [0.0] [04:00:44] RECOVERY - Puppet errors on deployment-mira is OK: OK: Less than 1.00% above the threshold [0.0] [04:01:17] RECOVERY - Puppet errors on deployment-urldownloader is OK: OK: Less than 1.00% above the threshold [0.0] [04:01:29] RECOVERY - Puppet errors on deployment-poolcounter04 is OK: OK: Less than 1.00% above the threshold [0.0] [04:01:39] RECOVERY - Puppet errors on deployment-secureredirexperiment is OK: OK: Less than 1.00% above the threshold [0.0] [04:01:55] RECOVERY - Puppet errors on deployment-apertium02 is OK: OK: Less than 1.00% above the threshold [0.0] [04:02:34] RECOVERY - Puppet errors on integration-slave-jessie-1001 is OK: OK: Less than 1.00% above the threshold [0.0] [04:03:40] RECOVERY - Puppet errors on jenkinstest is OK: OK: Less than 1.00% above the threshold [0.0] [04:03:50] RECOVERY - Puppet errors on integration-saltmaster is OK: OK: Less than 1.00% above the threshold [0.0] [04:04:46] RECOVERY - Puppet errors on deployment-jobrunner02 is OK: OK: Less than 1.00% above the threshold [0.0] [04:05:16] RECOVERY - Puppet errors on deployment-redis01 is OK: OK: Less than 1.00% above the threshold [0.0] [04:06:17] RECOVERY - Puppet errors on deployment-parsoid09 is OK: OK: Less than 1.00% above the threshold [0.0] [04:06:39] RECOVERY - Puppet errors on saucelabs-01 is OK: OK: Less than 1.00% above the threshold [0.0] [04:08:22] Yippee, build fixed! 
[04:08:22] Project selenium-MultimediaViewer » safari,beta,OS X 10.9,BrowserTests build #458: 09FIXED in 12 min: https://integration.wikimedia.org/ci/job/selenium-MultimediaViewer/BROWSER=safari,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=OS%20X%2010.9,label=BrowserTests/458/ [04:08:28] (03PS1) 10Legoktm: Reduce false positives in ReferenceThisSniff [tools/codesniffer] - 10https://gerrit.wikimedia.org/r/366504 (https://phabricator.wikimedia.org/T170316) [04:09:10] RECOVERY - Puppet errors on deployment-imagescaler01 is OK: OK: Less than 1.00% above the threshold [0.0] [04:09:42] RECOVERY - Puppet errors on deployment-puppetdb01 is OK: OK: Less than 1.00% above the threshold [0.0] [04:10:13] RECOVERY - Puppet errors on deployment-aqs01 is OK: OK: Less than 1.00% above the threshold [0.0] [04:10:21] RECOVERY - Puppet errors on deployment-redis02 is OK: OK: Less than 1.00% above the threshold [0.0] [04:10:25] RECOVERY - Puppet errors on integration-r-lang-01 is OK: OK: Less than 1.00% above the threshold [0.0] [04:18:46] Yippee, build fixed! [04:18:46] Project selenium-MultimediaViewer » firefox,beta,Linux,BrowserTests build #458: 09FIXED in 22 min: https://integration.wikimedia.org/ci/job/selenium-MultimediaViewer/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/458/ [05:25:42] 10Continuous-Integration-Config, 10Release-Engineering-Team, 10MW-1.30-release-notes, 10MediaWiki-Core-Tests, and 7 others: Parser tests fail if default Skin for unit tests makes use of doEditSectionLink - https://phabricator.wikimedia.org/T170880#3455358 (10Jdlrobson) Legroom you1 [05:53:27] 10Beta-Cluster-Infrastructure, 10Recommendation-API: recommendation_api module breaking beta labs puppet - https://phabricator.wikimedia.org/T171075#3455380 (10Joe) Please apply the same role/profile we use in production to beta too. [06:00:10] PROBLEM - Puppet errors on deployment-imagescaler01 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [06:34:08] RECOVERY - Check for valid instance states on labnodepool1001 is OK: nodepool state management is OK [06:40:09] RECOVERY - Puppet errors on deployment-imagescaler01 is OK: OK: Less than 1.00% above the threshold [0.0] [07:05:07] <_joe_> !log adding myself to projectadmins for integration, trying to troubleshoot castor [07:05:09] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [07:09:55] <_joe_> !log rebooting castor, jobs are failing, and no one seems able to login [07:09:58] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [07:34:56] _joe_: aye, it seems CI isn't working (well) indeed. [07:34:58] Status? [07:35:48] <_joe_> Krinkle: the integration puppetmaster is broken, we are waiting for hashar to come online as we know little about the ci infrastructure [07:36:04] It seems the jobs start fine, but then timeout on trying to write to castor [07:36:16] which is the very last step [07:36:22] <_joe_> yeah, and I cannot log into castor either [07:36:24] e.g. https://integration.wikimedia.org/ci/job/mediawiki-extensions-qunit-jessie/36027/console [07:36:27] <_joe_> as puppet is broken there [07:37:02] been hanging for 34 minutes at "00:02:45.352 Waiting for the completion of castor-save" – after "00:02:45.233 Done. 00:02:45.348 [PostBuildScript] - Execution post build scripts." 
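The hang described above is the cache-save step at the end of each job waiting on a castor host that no longer answers. Purely as an illustration — assuming for the sketch that saving the cache amounts to an rsync push to the castor instance, with placeholder host and paths rather than the real job configuration — a save step with an explicit timeout that treats failure as non-fatal would let builds finish instead of blocking:

```
#!/usr/bin/env python3
"""Toy cache-save step with a hard timeout, so an unreachable cache host
makes the step fail fast instead of blocking the whole build."""
import subprocess


def save_cache(src="/home/jenkins/cache/",
               dest="rsync://castor02.integration.eqiad.wmflabs/caches/demo/",
               timeout=300):
    """Push the local cache to the central cache host; give up quietly on error."""
    try:
        subprocess.run(["rsync", "-a", "--delete", src, dest],
                       check=True, timeout=timeout)
    except (subprocess.TimeoutExpired, subprocess.CalledProcessError) as err:
        # Saving the cache is an optimisation, not a build requirement.
        print(f"castor-save skipped: {err}")


if __name__ == "__main__":
    save_cache()
```

In the log the same effect is reached more bluntly: hashar disables castor-save entirely and later points the jobs at a freshly built castor02 instance.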
[07:37:05] <_joe_> so we need someone with admin rights _before_ the incident to log onto the puppetmaster (which has been broken for ~ 10 days I'd say) [07:37:16] <_joe_> Krinkle: you could just cancel that job [07:37:20] <_joe_> castor-save I mean [07:37:35] https://integration.wikimedia.org/ci/job/castor-save/ [07:37:41] No, because the job hasn't been created yet [07:37:51] _joe_: which exact wmflabs server? [07:38:03] <_joe_> Krinkle: heh, lemme check [07:38:15] I might be able [07:38:19] <_joe_> Krinkle: if you can log into castor, then you can log into the project [07:38:29] other way around I assume [07:38:30] but yes [07:38:32] which server :) [07:38:47] <_joe_> I have to find out, one sec [07:39:02] only 1 at https://tools.wmflabs.org/openstack-browser/project/integration [07:39:04] so I'll take that one [07:39:16] <_joe_> yeah integration-puppetmaster01 [07:39:26] hm.. key denied at castor.integration.eqiad.wmflabs [07:39:50] integration-puppetmaster01 works fine though [07:39:54] what do you want me to do [07:40:02] <_joe_> dpkg -l apache2 [07:40:06] <_joe_> for starters [07:41:25] _joe_: for the record - https://gist.github.com/Krinkle/e8f07deadc3963d42be24721bc82f30b [07:41:41] Desired=Unknown/Install/Remove/Purge/Hold [07:41:41] | Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend [07:41:41] |/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad) [07:41:41] ||/ Name Version Architecture Description [07:41:41] +++-==========================================-==========================-==========================-========================================================================================== [07:41:41] ii apache2 2.4.10-10+deb8u9+wmf1 amd64 Apache HTTP Server [07:41:50] <_joe_> so it's at the correct version [07:41:55] <_joe_> damn [07:42:09] <_joe_> hashar: can you check why castor cannot run puppet? [07:42:53] hashar: CI is down (mostly) jobs start and run but timeout at saving to castor (also, why are non-gate jobs trying to save to castor? maybe we can make it skip earlier somehow based on pipeline) [07:43:45] https://integration.wikimedia.org/ci/job/castor-save/494271/console [07:45:05] OK. I gotta run unfortunately. It's been a long day. [07:45:06] o/ [07:45:14] <_joe_> so if puppet is not failing globally it's a castor issue [07:54:02] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Kanban): CI jobs are blocked because castor is unreachable - https://phabricator.wikimedia.org/T171148#3455468 (10hashar) [07:55:05] !log Refreshing all Jenkins jobs defined in JJB in order to then disable castor entirely for T171148 [07:55:08] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [07:55:11] T171148: CI jobs are blocked because castor is unreachable - https://phabricator.wikimedia.org/T171148 [07:55:14] Krinkle: yeah filed it as https://phabricator.wikimedia.org/T171148 [07:55:19] seems something somehow is broken entirely [07:55:23] I am going to disable castor [07:59:55] (03PS1) 10Hashar: Disable castor entirely [integration/config] - 10https://gerrit.wikimedia.org/r/366520 (https://phabricator.wikimedia.org/T171148) [08:00:31] !log Disabled castor entirely via https://gerrit.wikimedia.org/r/366520 . The instance is broken - T171148
[08:00:34] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [08:00:34] T171148: CI jobs are blocked because castor is unreachable - https://phabricator.wikimedia.org/T171148 [08:05:35] so castor should be disabled now and no longer blocks jobs [08:05:53] _joe_: the instance is no longer reachable by any means :( [08:06:06] we usually use salt as a fallback but the minion is not responding [08:08:49] <_joe_> hashar: did you try the openstack console? [08:09:01] <_joe_> else ask someone with working root access in labs for assistance [08:09:19] we don't have access to it. It is probably easier to just recreate the instance [08:09:41] (hoping a newly created instance is actually reachable) [08:13:20] <_joe_> wait [08:13:33] <_joe_> ask someone with a working root labs account to help you [08:13:48] <_joe_> mine was outdated, so it doesn't work on castor [08:13:53] ah directly attaching to the kvm host? [08:16:08] _joe_: it is probably easier to just recreate it from scratch. The instance is in puppet, we would just lose the cache that can be repopulated manually for the busiest repos [08:20:20] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Kanban), 10Operations, 10Patch-For-Review, 10User-Joe: CI for operations/puppet is taking too long - https://phabricator.wikimedia.org/T166888#3455526 (10Joe) [08:38:37] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Kanban), 10Patch-For-Review: CI jobs are blocked because castor is unreachable - https://phabricator.wikimedia.org/T171148#3455549 (10hashar) From the console log, puppet-agent on boot reports: ``` SSL_connect returned=1 errno=0 state=error... [08:41:46] (03CR) 10Hashar: [C: 032] "Jobs refreshed. I will restore it when a new instance is ready." [integration/config] - 10https://gerrit.wikimedia.org/r/366520 (https://phabricator.wikimedia.org/T171148) (owner: 10Hashar) [08:44:10] 10Release-Engineering-Team (Kanban), 10Cloud-VPS: Labs Jessie images come with puppet 3.7.2, should be 3.8.5 - https://phabricator.wikimedia.org/T168511#3455552 (10hashar) 05Open>03Resolved a:03hashar I have booted a Jessie instance with the latest labs image and it comes with puppet 3.8.5: ``` apt-cache... [08:53:56] !log Created castor02.integration.eqiad.wmflabs with puppet role role::ci::castor::server and adding it to Jenkins. Will then update the Jenkins jobs to point to it - T171148 [08:54:00] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [08:54:00] T171148: CI jobs are blocked because castor is unreachable - https://phabricator.wikimedia.org/T171148 [08:57:13] (03PS1) 10Hashar: Revert "Disable castor entirely" [integration/config] - 10https://gerrit.wikimedia.org/r/366523 (https://phabricator.wikimedia.org/T171148) [08:57:15] (03PS1) 10Hashar: Point Castor to castor02.integration.eqiad.wmflabs [integration/config] - 10https://gerrit.wikimedia.org/r/366524 (https://phabricator.wikimedia.org/T171148) [08:59:18] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Kanban), 10Patch-For-Review: CI jobs are blocked because castor is unreachable - https://phabricator.wikimedia.org/T171148#3455575 (10hashar) p:05Triage>03Unbreak!
a:03hashar [09:01:35] (03CR) 10Hashar: [C: 032] "Transient change" [integration/config] - 10https://gerrit.wikimedia.org/r/366523 (https://phabricator.wikimedia.org/T171148) (owner: 10Hashar) [09:02:11] (03CR) 10Hashar: [C: 032] Point Castor to castor02.integration.eqiad.wmflabs [integration/config] - 10https://gerrit.wikimedia.org/r/366524 (https://phabricator.wikimedia.org/T171148) (owner: 10Hashar) [09:03:12] !log Restoring castor by updating all jobs to point to castor02 ( https://gerrit.wikimedia.org/r/366524 ) Starts with a cold cache :( - T171148 [09:03:15] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [09:03:15] T171148: CI jobs are blocked because castor is unreachable - https://phabricator.wikimedia.org/T171148 [09:03:39] (03Merged) 10jenkins-bot: Revert "Disable castor entirely" [integration/config] - 10https://gerrit.wikimedia.org/r/366523 (https://phabricator.wikimedia.org/T171148) (owner: 10Hashar) [09:03:44] (03Merged) 10jenkins-bot: Point Castor to castor02.integration.eqiad.wmflabs [integration/config] - 10https://gerrit.wikimedia.org/r/366524 (https://phabricator.wikimedia.org/T171148) (owner: 10Hashar) [09:13:13] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Kanban), 10Patch-For-Review: CI jobs are blocked because castor is unreachable - https://phabricator.wikimedia.org/T171148#3455601 (10hashar) I have manually repopulated the cache for operations/puppet.git by triggering https://integration.... [09:13:18] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Kanban), 10Patch-For-Review: CI jobs are blocked because castor is unreachable - https://phabricator.wikimedia.org/T171148#3455602 (10hashar) 05Open>03Resolved [09:15:55] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Kanban), 10Patch-For-Review: Set up experimental Docker CI slave - https://phabricator.wikimedia.org/T150502#3455621 (10hashar) I have removed `integration-slave-docker-1000` since puppet is completely broken on it.
[09:17:31] PROBLEM - Host castor is DOWN: CRITICAL - Host Unreachable (10.68.23.216) [09:17:34] !log Spawning and pooling integration-slave-docker-1003 as replacement to integration-slave-docker-1000 (broken) - T150502 [09:17:38] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [09:17:38] T150502: Set up experimental Docker CI slave - https://phabricator.wikimedia.org/T150502 [09:18:09] PROBLEM - Host integration-slave-docker-1000 is DOWN: CRITICAL - Host Unreachable (10.68.19.131) [09:25:54] ^^^ I have deleted both castor and integration-slave-docker-1000 [10:10:14] (03CR) 10Zfilipin: [C: 031] Set up CI for ReadingLists extension [integration/config] - 10https://gerrit.wikimedia.org/r/366248 (https://phabricator.wikimedia.org/T168975) (owner: 10Gergő Tisza) [10:12:49] zeljkof: you can deploy that one :-) [10:13:02] hashar: will do [10:15:43] (03CR) 10Zfilipin: [C: 032] Set up CI for ReadingLists extension [integration/config] - 10https://gerrit.wikimedia.org/r/366248 (https://phabricator.wikimedia.org/T168975) (owner: 10Gergő Tisza) [10:16:34] (03Merged) 10jenkins-bot: Set up CI for ReadingLists extension [integration/config] - 10https://gerrit.wikimedia.org/r/366248 (https://phabricator.wikimedia.org/T168975) (owner: 10Gergő Tisza) [10:18:02] (03PS1) 10Zfilipin: WIP Run WebdriverIO tests in CI for extensions [integration/config] - 10https://gerrit.wikimedia.org/r/366531 (https://phabricator.wikimedia.org/T164721) [10:19:04] (03CR) 10jerkins-bot: [V: 04-1] WIP Run WebdriverIO tests in CI for extensions [integration/config] - 10https://gerrit.wikimedia.org/r/366531 (https://phabricator.wikimedia.org/T164721) (owner: 10Zfilipin) [10:20:19] !log Reloading Zuul to deploy 80b9d855443a2f572d877b280783110684344c5d [10:20:23] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [10:20:49] (03CR) 10Zfilipin: "Deployed." [integration/config] - 10https://gerrit.wikimedia.org/r/366248 (https://phabricator.wikimedia.org/T168975) (owner: 10Gergő Tisza) [11:01:05] 10Continuous-Integration-Infrastructure, 10Cloud-VPS: contintcloud instance refuses to launch due to "Maximum number of fixed ips exceeded - https://phabricator.wikimedia.org/T171158#3455895 (10hashar) [11:01:36] 10Continuous-Integration-Infrastructure, 10Cloud-VPS: contintcloud instance refuses to launch due to "Maximum number of fixed ips exceeded - https://phabricator.wikimedia.org/T171158#3455907 (10hashar) [11:15:51] 10Continuous-Integration-Infrastructure, 10Cloud-VPS: contintcloud instance refuses to launch due to "Maximum number of fixed ips exceeded - https://phabricator.wikimedia.org/T171158#3455925 (10hashar) labnet1001.eqiad.wmnet has a lot of such errors in /var/log/nova/nova-network.log* The first suspicious one:... [11:16:19] 10Release-Engineering-Team, 10Cloud-Services, 10Operations, 10Patch-For-Review: contintcloud project thinks it is using 206 fixed-ip quota errantly - https://phabricator.wikimedia.org/T158350#3034394 (10hashar) That is happening again after something got restarted yesterday. 
Filled as T171158 [11:21:55] (03Abandoned) 10Zfilipin: Use RelatedArticles' LocalSettings.php when running Selenium tests [integration/config] - 10https://gerrit.wikimedia.org/r/366236 (https://phabricator.wikimedia.org/T164721) (owner: 10Zfilipin) [11:25:02] 10Continuous-Integration-Infrastructure, 10Cloud-VPS: contintcloud instance refuses to launch due to "Maximum number of fixed ips exceeded - https://phabricator.wikimedia.org/T171158#3455941 (10hashar) The Nodepool launch errors https://grafana.wikimedia.org/dashboard/db/nodepool?panelId=12&fullscreen&orgId=1&... [11:37:15] 10Continuous-Integration-Infrastructure, 10Cloud-VPS: contintcloud instance refuses to launch due to "Maximum number of fixed ips exceeded - https://phabricator.wikimedia.org/T171158#3455950 (10hashar) Seems the nova database is on `m5-master.eqiad.wmnet` db name `nova`. [11:39:25] (03PS2) 10Zfilipin: WIP Run WebdriverIO tests in CI for extensions [integration/config] - 10https://gerrit.wikimedia.org/r/366531 (https://phabricator.wikimedia.org/T164721) [11:41:19] hashar the reason why castor.integration.eqiad.wmflabs was inaccessible was that it needed two services restarted [11:41:30] due to the ldap certificate being updated [11:41:40] nscd and nslcd [11:43:02] we did reboot it [11:43:14] but puppet was broken on the instance so the new CA was not provisioned on the host [11:43:24] ah [11:43:34] so even with a restart, the instance still had the old/obsolete cert and thus would not connect [11:43:47] yeh [11:44:00] did you manage to salt in? [11:44:11] no it was broken as well [11:44:16] so I just deleted the instance [11:44:16] oh [11:44:39] hashar you can recreate it :). It should work now. [11:44:43] yeah it does [11:44:55] oh i see castor02 [11:45:00] next issue is that openstack is broken and refuses to spawn more instances [11:45:08] oh [11:45:11] and there are tens of instances on beta cluster which are broken [11:45:25] hashar can it ssh? [11:45:42] I don't know [11:45:49] ok [11:46:16] nodepool probably needs to pick up the new certificate for ldap [11:46:25] since the images are rebuilt at 2pm every day [11:46:40] 10Continuous-Integration-Config, 10MediaWiki-extensions-Scribunto, 10Wikidata: [Task] Add Scribunto to extension-gate in CI - https://phabricator.wikimedia.org/T125050#3455974 (10Ladsgroup) [11:49:09] hmm how do i fix [11:49:11] Could not chdir to home directory /home/paladox: Permission denied [11:50:41] (03PS3) 10Zfilipin: WIP Run WebdriverIO tests in CI for extensions [integration/config] - 10https://gerrit.wikimedia.org/r/366531 (https://phabricator.wikimedia.org/T164721) [12:01:37] 10Continuous-Integration-Infrastructure, 10Cloud-VPS: contintcloud instance refuses to launch due to "Maximum number of fixed ips exceeded - https://phabricator.wikimedia.org/T171158#3455996 (10Luke081515) p:05Triage>03High [12:20:06] Project beta-update-databases-eqiad build #18593: 04FAILURE in 5.2 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/18593/ [12:25:34] Yippee, build fixed!
[12:25:34] Project beta-update-databases-eqiad build #18594: 09FIXED in 1 min 39 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/18594/ [12:32:01] PROBLEM - Puppet errors on deployment-memc05 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [12:35:32] PROBLEM - Puppet errors on deployment-tmh01 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [12:37:22] PROBLEM - Puppet errors on deployment-redis02 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [12:37:28] PROBLEM - Puppet errors on deployment-restbase02 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [12:37:58] PROBLEM - Puppet errors on deployment-eventlogging04 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [12:39:42] PROBLEM - Puppet errors on deployment-elastic06 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [12:40:03] PROBLEM - Puppet errors on deployment-db03 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [12:40:53] PROBLEM - Puppet errors on deployment-ircd is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [12:41:01] PROBLEM - Puppet errors on deployment-stream is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [12:41:13] PROBLEM - Puppet errors on deployment-aqs01 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [12:43:17] PROBLEM - Puppet errors on deployment-sentry01 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [12:44:03] PROBLEM - Puppet errors on deployment-aqs02 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [12:45:29] PROBLEM - Puppet errors on deployment-kafka04 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [12:45:38] PROBLEM - Puppet errors on deployment-mx is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [12:45:54] PROBLEM - Puppet errors on deployment-cache-upload04 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [12:47:02] PROBLEM - Puppet errors on deployment-memc04 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [12:47:40] PROBLEM - Puppet errors on deployment-aqs03 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [12:48:18] PROBLEM - Puppet errors on deployment-db04 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [12:49:55] PROBLEM - Puppet errors on deployment-ms-fe02 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [12:50:50] PROBLEM - Puppet errors on deployment-elastic05 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [12:51:06] PROBLEM - Puppet errors on deployment-zotero01 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [12:52:27] PROBLEM - Puppet errors on deployment-mediawiki04 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [12:52:29] PROBLEM - Puppet errors on deployment-elastic07 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [12:52:33] PROBLEM - Puppet errors on deployment-pdfrender02 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [12:52:35] PROBLEM - Puppet errors on deployment-logstash2 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [12:54:37] PROBLEM - Puppet errors on deployment-restbase01 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] 
[12:55:38] PROBLEM - Puppet errors on deployment-zookeeper02 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [12:55:44] PROBLEM - Puppet errors on deployment-salt02 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [12:57:41] PROBLEM - Puppet errors on deployment-kafka05 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [12:57:45] PROBLEM - Puppet errors on deployment-cache-text04 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [12:58:07] PROBLEM - Puppet errors on deployment-mediawiki05 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [12:59:29] hashar does zuul support ecdsa keys? [12:59:33] i get this warning [12:59:34] UserWarning: Unknown ssh-rsa host key for [127.0.0.1]:29418: 61424736f3dea4ebc8cd59f27ec94a20 [12:59:52] paladox: it uses Paramiko, a python implementation of ssh [13:00:03] ah /me checks paramiko [13:00:04] thanks [13:00:06] that message is because you have to manually accept the gerrit ssh key [13:00:13] oh, i did [13:00:41] PROBLEM - Puppet errors on deployment-ores-redis-01 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [13:00:45] PROBLEM - Puppet errors on deployment-jobrunner02 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [13:00:45] PROBLEM - Puppet errors on deployment-fluorine02 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [13:02:16] PROBLEM - Puppet errors on deployment-urldownloader is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [13:02:30] PROBLEM - Puppet errors on deployment-poolcounter04 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [13:02:40] PROBLEM - Puppet errors on deployment-secureredirexperiment is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [13:02:44] PROBLEM - Puppet errors on deployment-mira is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [13:02:54] PROBLEM - Puppet errors on deployment-apertium02 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [13:04:00] PROBLEM - Puppet errors on deployment-mediawiki06 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [13:04:10] hashar how do i manually verify it? [13:04:18] i've ssh'd into it by using the zuul user [13:04:23] to store it in known_hosts [13:04:27] but that does not seem to work [13:04:48] PROBLEM - Puppet errors on deployment-prometheus01 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [13:05:10] paladox: double check that it is actually in /var/lib/zuul/.ssh/known_hosts ?
[13:05:17] ok [13:05:31] I guess you can do something like: sudo su - zuul [13:05:34] nope that's not in there [13:05:40] ssh -p 29418 127.0.0.1 [13:05:48] it has the new key from the gerrit server [13:05:56] (it's now ecdsa) [13:05:58] so maybe you added it to /home/paladox/.ssh/known_hosts :D [13:06:17] PROBLEM - Puppet errors on deployment-redis01 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [13:06:20] ssh -p 29418 jenkins@127.0.0.1 works [13:07:17] PROBLEM - Puppet errors on deployment-parsoid09 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [13:10:45] PROBLEM - Puppet errors on deployment-puppetdb01 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [13:12:47] :-} [13:13:33] but it doesn't work in zuul it seems, getting unknown ssh-rsa key [13:13:43] PROBLEM - Puppet errors on deployment-tin is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [13:13:49] PROBLEM - Puppet errors on deployment-kafka03 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [13:15:02] hashar this https://github.com/paramiko/paramiko/issues/67 may be related [13:15:38] https://github.com/paramiko/paramiko/issues/88 [13:16:11] PROBLEM - Puppet errors on deployment-imagescaler01 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [13:16:35] aha [13:16:44] hashar we are using a very old version of it [13:17:02] oh [13:17:07] 1.8.0<2.0.0 [13:17:29] 10Continuous-Integration-Infrastructure, 10monitoring: tune gearman alarms - https://phabricator.wikimedia.org/T168085#3456140 (10faidon) p:05Triage>03Low [13:20:40] paladox: sorry I can't investigate it [13:20:54] ok [13:21:32] paladox: but on CI we seem to have paramiko 1.15.1 from jessie [13:21:49] oh [13:21:54] on stretch i have 2.0.0 [13:23:00] most probably zuul does not work with it ? [13:23:03] I have no idea really [13:26:00] It seems zuul only works with rsa [13:26:09] as i managed to get the rsa key into known_hosts [13:27:17] i will file a task so that we can try to fix that for wmf (as when we upgrade to gerrit 2.14 that will be a problem if we do ssh instead of ssh -o HostKeyAlgorithms=ssh-rsa -p 29418 jenkins@127.0.0.1) [13:27:41] (that adds it to known_hosts, you won't need to do that part again once it's added :)) [13:29:09] 10Release-Engineering-Team, 10Zuul: Add support for edcsa keys in zuul - https://phabricator.wikimedia.org/T171165#3456188 (10Paladox) [13:39:08] 10Release-Engineering-Team, 10Zuul: Add support for ecdsa keys in zuul - https://phabricator.wikimedia.org/T171165#3456245 (10Paladox) [13:39:15] 10Release-Engineering-Team, 10Zuul: Add support for ecdsa keys in zuul - https://phabricator.wikimedia.org/T171165#3456188 (10Paladox) [13:49:17] hashar aha [13:49:21] i think https://github.com/paramiko/paramiko/commit/0ddb28f3313e793cf574ed5fed42761be1adf6d5 this fixes it [13:51:44] hashar yep that fixes it [13:51:56] * paladox tested it [13:52:09] 10Release-Engineering-Team, 10Zuul: Add support for ecdsa keys in zuul - https://phabricator.wikimedia.org/T171165#3456278 (10Paladox) Fixed by https://github.com/paramiko/paramiko/commit/0ddb28f3313e793cf574ed5fed42761be1adf6d5 [13:56:49] 10Continuous-Integration-Infrastructure, 10Cloud-VPS: contintcloud instance refuses to launch due to "Maximum number of fixed ips exceeded - https://phabricator.wikimedia.org/T171158#3456299 (10Andrew) 05Open>03Resolved a:03Andrew I resolved this by running the query in https://ask.openstack.org/en/quest...
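The underlying issue in the exchange above: Zuul verifies Gerrit's host key through paramiko, and the old paramiko release pinned by Zuul only understands ssh-rsa host keys, so the ecdsa entry that a modern ssh client stores in known_hosts is never matched (the paramiko commit linked above adds that support). A minimal sketch of the verification step using the paramiko API — the 127.0.0.1:29418 endpoint and the zuul user's known_hosts path come from the log, while the warning-only policy is an assumption for illustration, not Zuul's actual connection code:

```
#!/usr/bin/env python3
"""Connect to Gerrit's SSH port the way a paramiko-based client does,
matching the offered host key against an existing known_hosts file."""
import paramiko


def gerrit_client(host="127.0.0.1", port=29418, user="jenkins",
                  known_hosts="/var/lib/zuul/.ssh/known_hosts"):
    client = paramiko.SSHClient()
    client.load_host_keys(known_hosts)
    # Host keys are matched by type: a paramiko release without ECDSA
    # support can only negotiate an ssh-rsa host key, so a known_hosts
    # file holding just the server's ecdsa entry never matches and this
    # policy emits the "Unknown ssh-rsa host key" UserWarning quoted above.
    client.set_missing_host_key_policy(paramiko.WarningPolicy())
    client.connect(host, port=port, username=user)
    return client


if __name__ == "__main__":
    # Interim workaround from the discussion above: record an RSA host key
    # first with `ssh -o HostKeyAlgorithms=ssh-rsa -p 29418 jenkins@127.0.0.1`.
    gerrit_client().close()
```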
[13:57:39] 10Continuous-Integration-Infrastructure, 10Cloud-VPS: contintcloud instance refuses to launch due to "Maximum number of fixed ips exceeded - https://phabricator.wikimedia.org/T171158#3456302 (10hashar) I can confirm that resolved the issue completely. Thank you! [14:04:39] (03CR) 10Zfilipin: "Tested using mediawiki-core-qunit-selenium-337602-jessie job." [integration/config] - 10https://gerrit.wikimedia.org/r/366531 (https://phabricator.wikimedia.org/T164721) (owner: 10Zfilipin) [14:12:09] (03CR) 10Zfilipin: "One more test for core when EXT_NAME is not set." [integration/config] - 10https://gerrit.wikimedia.org/r/366531 (https://phabricator.wikimedia.org/T164721) (owner: 10Zfilipin) [14:12:38] (03PS4) 10Zfilipin: Run WebdriverIO tests in CI for extensions [integration/config] - 10https://gerrit.wikimedia.org/r/366531 (https://phabricator.wikimedia.org/T164721) [14:31:10] !log deployment-prep: manually cleaned out the puppet master configuration. It was all screwed up. Notably I removed bits about the puppetdb [14:31:14] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [14:34:24] PROBLEM - Puppet errors on integration-puppetmaster01 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [0.0] [14:39:23] RECOVERY - Puppet errors on integration-puppetmaster01 is OK: OK: Less than 1.00% above the threshold [0.0] [14:41:11] RECOVERY - Puppet errors on deployment-aqs01 is OK: OK: Less than 1.00% above the threshold [0.0] [14:42:31] RECOVERY - Puppet errors on deployment-restbase02 is OK: OK: Less than 1.00% above the threshold [0.0] [14:42:43] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team, 10Services: puppet dependency loop on deployment-sca hosts - https://phabricator.wikimedia.org/T171173#3456426 (10hashar) [14:43:00] RECOVERY - Puppet errors on deployment-eventlogging04 is OK: OK: Less than 1.00% above the threshold [0.0] [14:43:18] RECOVERY - Puppet errors on deployment-db04 is OK: OK: Less than 1.00% above the threshold [0.0] [14:45:50] RECOVERY - Puppet errors on deployment-ircd is OK: OK: Less than 1.00% above the threshold [0.0] [14:45:54] RECOVERY - Puppet errors on deployment-cache-upload04 is OK: OK: Less than 1.00% above the threshold [0.0] [14:46:00] RECOVERY - Puppet errors on deployment-stream is OK: OK: Less than 1.00% above the threshold [0.0] [14:48:13] 10Beta-Cluster-Infrastructure, 10Recommendation-API: recommendation_api module breaking beta labs puppet - https://phabricator.wikimedia.org/T171075#3456464 (10mobrovac) This is a general problem with `service::node` in beta, it seems, closing as duplicate. 
[14:48:18] RECOVERY - Puppet errors on deployment-sentry01 is OK: OK: Less than 1.00% above the threshold [0.0] [14:48:30] 10Beta-Cluster-Infrastructure, 10Recommendation-API: recommendation_api module breaking beta labs puppet - https://phabricator.wikimedia.org/T171075#3456466 (10mobrovac) [14:48:32] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team, 10Services: puppet dependency loop on deployment-sca hosts - https://phabricator.wikimedia.org/T171173#3456469 (10mobrovac) [14:49:03] RECOVERY - Puppet errors on deployment-aqs02 is OK: OK: Less than 1.00% above the threshold [0.0] [14:49:53] RECOVERY - Puppet errors on deployment-ms-fe02 is OK: OK: Less than 1.00% above the threshold [0.0] [14:50:02] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team, 10Services (next): puppet dependency loop on deployment-sca hosts - https://phabricator.wikimedia.org/T171173#3456426 (10mobrovac) [14:50:29] RECOVERY - Puppet errors on deployment-kafka04 is OK: OK: Less than 1.00% above the threshold [0.0] [14:50:39] RECOVERY - Puppet errors on deployment-mx is OK: OK: Less than 1.00% above the threshold [0.0] [14:50:47] RECOVERY - Puppet errors on deployment-elastic05 is OK: OK: Less than 1.00% above the threshold [0.0] [14:52:01] RECOVERY - Puppet errors on deployment-memc04 is OK: OK: Less than 1.00% above the threshold [0.0] [14:52:24] RECOVERY - Puppet errors on deployment-mediawiki04 is OK: OK: Less than 1.00% above the threshold [0.0] [14:52:26] RECOVERY - Puppet errors on deployment-elastic07 is OK: OK: Less than 1.00% above the threshold [0.0] [14:52:32] RECOVERY - Puppet errors on deployment-pdfrender02 is OK: OK: Less than 1.00% above the threshold [0.0] [14:52:34] RECOVERY - Puppet errors on deployment-logstash2 is OK: OK: Less than 1.00% above the threshold [0.0] [14:52:42] RECOVERY - Puppet errors on deployment-aqs03 is OK: OK: Less than 1.00% above the threshold [0.0] [14:52:50] zeljkof: https://gerrit.wikimedia.org/r/#/c/366248/ was deployed (thanks!) but CI still does not seem to work: https://gerrit.wikimedia.org/r/#/c/365986/ [14:52:56] did I miss something? [14:53:10] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team (Kanban), 10Cloud-Services, 10Operations, 10Services: a lot of beta cluster instances are not reachable over SSH - https://phabricator.wikimedia.org/T171174#3456488 (10hashar) [14:54:04] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Kanban), 10Patch-For-Review: CI jobs are blocked because castor is unreachable - https://phabricator.wikimedia.org/T171148#3455468 (10hashar) Beta cluster instances have the exact same issue. Filled as T171174 [14:54:06] tgr: maybe I made a mistake while deploying, will try again, just a minute [14:54:36] RECOVERY - Puppet errors on deployment-restbase01 is OK: OK: Less than 1.00% above the threshold [0.0] [14:55:03] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team (Kanban), 10Cloud-Services, 10Operations, 10Services: a lot of beta cluster instances are not reachable over SSH - https://phabricator.wikimedia.org/T171174#3456488 (10Paladox) Now that puppet is fixed, you can either wait a few hours for puppet t... 
[14:55:36] RECOVERY - Puppet errors on deployment-zookeeper02 is OK: OK: Less than 1.00% above the threshold [0.0] [14:55:44] RECOVERY - Puppet errors on deployment-salt02 is OK: OK: Less than 1.00% above the threshold [0.0] [14:55:56] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team (Kanban), 10Cloud-Services, 10Operations, 10Services: a lot of beta cluster instances are not reachable over SSH - https://phabricator.wikimedia.org/T171174#3456519 (10hashar) [14:56:06] RECOVERY - Puppet errors on deployment-zotero01 is OK: OK: Less than 1.00% above the threshold [0.0] [14:57:08] !log reloading Zuul to deploy 80b9d85 [14:57:11] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [14:57:42] RECOVERY - Puppet errors on deployment-kafka05 is OK: OK: Less than 1.00% above the threshold [0.0] [14:57:44] RECOVERY - Puppet errors on deployment-cache-text04 is OK: OK: Less than 1.00% above the threshold [0.0] [14:58:06] RECOVERY - Puppet errors on deployment-mediawiki05 is OK: OK: Less than 1.00% above the threshold [0.0] [14:58:11] tgr: should be fine now, sorry, looks like I did not deploy correctly [15:00:29] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team, 10Services (next): puppet dependency loop on deployment-sca hosts - https://phabricator.wikimedia.org/T171173#3456558 (10hashar) deployment-trending01.deployment-prep.eqiad.wmflabs has a similar issue: ``` (Exec[trendingedits config deploy] => Se... [15:00:41] RECOVERY - Puppet errors on deployment-ores-redis-01 is OK: OK: Less than 1.00% above the threshold [0.0] [15:00:45] RECOVERY - Puppet errors on deployment-fluorine02 is OK: OK: Less than 1.00% above the threshold [0.0] [15:00:47] RECOVERY - Puppet errors on deployment-jobrunner02 is OK: OK: Less than 1.00% above the threshold [0.0] [15:02:17] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team, 10Services (next): puppet dependency loop on deployment-sca hosts - https://phabricator.wikimedia.org/T171173#3456565 (10mobrovac) Yeah, there seems to be something weird in the Scap3 config deploy part of `service::node`. The difference between Beta...
[15:02:28] RECOVERY - Puppet errors on deployment-poolcounter04 is OK: OK: Less than 1.00% above the threshold [0.0] [15:02:40] RECOVERY - Puppet errors on deployment-secureredirexperiment is OK: OK: Less than 1.00% above the threshold [0.0] [15:02:44] RECOVERY - Puppet errors on deployment-mira is OK: OK: Less than 1.00% above the threshold [0.0] [15:02:57] RECOVERY - Puppet errors on deployment-apertium02 is OK: OK: Less than 1.00% above the threshold [0.0] [15:03:10] yeah [15:03:27] RECOVERY - Puppet errors on deployment-puppetmaster02 is OK: OK: Less than 1.00% above the threshold [0.0] [15:04:47] RECOVERY - Puppet errors on deployment-prometheus01 is OK: OK: Less than 1.00% above the threshold [0.0] [15:06:16] RECOVERY - Puppet errors on deployment-redis01 is OK: OK: Less than 1.00% above the threshold [0.0] [15:07:00] RECOVERY - Puppet errors on deployment-memc05 is OK: OK: Less than 1.00% above the threshold [0.0] [15:07:01] PROBLEM - Puppet errors on deployment-stream is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [15:07:15] RECOVERY - Puppet errors on deployment-parsoid09 is OK: OK: Less than 1.00% above the threshold [0.0] [15:07:15] RECOVERY - Puppet errors on deployment-urldownloader is OK: OK: Less than 1.00% above the threshold [0.0] [15:07:21] RECOVERY - Puppet errors on deployment-redis02 is OK: OK: Less than 1.00% above the threshold [0.0] [15:08:10] !log removed profile::recommendation_api from deployment-sca01 to try to fix the ssh access for mobrovac T171173 T171174 [15:08:14] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [15:08:15] T171174: a lot of beta cluster instances are not reachable over SSH - https://phabricator.wikimedia.org/T171174 [15:08:15] T171173: puppet dependency loop on deployment-sca hosts - https://phabricator.wikimedia.org/T171173 [15:08:41] RECOVERY - Puppet errors on deployment-tin is OK: OK: Less than 1.00% above the threshold [0.0] [15:08:49] RECOVERY - Puppet errors on deployment-kafka03 is OK: OK: Less than 1.00% above the threshold [0.0] [15:08:56] PROBLEM - Puppet errors on deployment-apertium02 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [15:09:00] RECOVERY - Puppet errors on deployment-mediawiki06 is OK: OK: Less than 1.00% above the threshold [0.0] [15:09:20] PROBLEM - Puppet errors on deployment-db04 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [15:09:22] PROBLEM - Puppet errors on deployment-sentry01 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [15:09:44] RECOVERY - Puppet errors on deployment-elastic06 is OK: OK: Less than 1.00% above the threshold [0.0] [15:10:03] 10Browser-Tests-Infrastructure, 10Release-Engineering-Team (Kanban), 10MW-1.30-release-notes (WMF-deploy-2017-07-11_(1.30.0-wmf.9)), 10Patch-For-Review, 10User-zeljkofilipin: Run WebdriverIO tests in CI for extensions - https://phabricator.wikimedia.org/T164721#3456600 (10zeljkofilipin) Done: - I have c... [15:10:04] thx! 
[15:10:04] PROBLEM - Puppet errors on deployment-aqs02 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [15:10:06] RECOVERY - Puppet errors on deployment-db03 is OK: OK: Less than 1.00% above the threshold [0.0] [15:10:22] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team, 10Services (next): puppet dependency loop on deployment-sca hosts - https://phabricator.wikimedia.org/T171173#3456602 (10hashar) On deployment-sca01 I have removed `profile::recommendation_api` puppet then fails with: ``` Error: Failed to apply catal... [15:10:33] RECOVERY - Puppet errors on deployment-tmh01 is OK: OK: Less than 1.00% above the threshold [0.0] [15:10:43] RECOVERY - Puppet errors on deployment-puppetdb01 is OK: OK: Less than 1.00% above the threshold [0.0] [15:10:53] PROBLEM - Puppet errors on deployment-ms-fe02 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [15:11:09] RECOVERY - Puppet errors on deployment-imagescaler01 is OK: OK: Less than 1.00% above the threshold [0.0] [15:11:31] PROBLEM - Puppet errors on deployment-kafka04 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [15:11:39] PROBLEM - Puppet errors on deployment-mx is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [15:11:50] PROBLEM - Puppet errors on deployment-elastic05 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [15:11:56] PROBLEM - Puppet errors on deployment-cache-upload04 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [15:13:04] PROBLEM - Puppet errors on deployment-memc04 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [15:13:24] PROBLEM - Puppet errors on deployment-mediawiki04 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [15:13:26] PROBLEM - Puppet errors on deployment-elastic07 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [15:13:36] PROBLEM - Puppet errors on deployment-logstash2 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [15:13:43] PROBLEM - Puppet errors on deployment-aqs03 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [15:13:43] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team (Kanban), 10Operations, 10Services, 10VPS-Projects: a lot of beta cluster instances are not reachable over SSH - https://phabricator.wikimedia.org/T171174#3456611 (10bd808) [15:15:15] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team (Kanban), 10Operations, 10Services, 10VPS-Projects: New instance in deployment prep can't run puppet for the first time - https://phabricator.wikimedia.org/T171177#3456618 (10Ottomata) [15:15:36] PROBLEM - Puppet errors on deployment-restbase01 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [15:16:45] tgr: sorry for messing up, I do zuul deploys rarely, I'm not even sure what I did wrong, since it is just one command :/ [15:16:58] anyway, deploying again worked :) [15:17:04] PROBLEM - Puppet errors on deployment-zotero01 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [15:17:15] zeljkof: thanks for the quick fix! [15:17:23] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team, 10Services (next): puppet dependency loop on deployment-sca hosts - https://phabricator.wikimedia.org/T171173#3456637 (10hashar) I have added `profile::recommendation_api` back on deployment-sca01. 
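[editor's note] For anyone chasing the deployment-sca dependency loop discussed above (T171173), here is a minimal diagnostic sketch for visualizing a Puppet dependency cycle. The `--graph` flag and the graphs directory are standard Puppet 3 defaults of that era, not details taken from this log, and the paths may differ on these hosts.

```bash
# Hedged sketch: render a dependency cycle like the one reported for deployment-sca.
sudo puppet agent --test --noop --graph   # prints "Found 1 dependency cycle: ..." when a loop exists
ls /var/lib/puppet/state/graphs/          # with --graph, a cycles.dot file should appear here (Puppet 3 default path)
dot -Tsvg /var/lib/puppet/state/graphs/cycles.dot -o /tmp/cycles.svg   # requires graphviz
```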
[15:18:33] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team (Kanban), 10Operations, 10Services, 10VPS-Projects: New instance in deployment prep can't run puppet for the first time - https://phabricator.wikimedia.org/T171177#3456656 (10hashar) [15:18:35] PROBLEM - Puppet errors on deployment-pdfrender02 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [15:18:41] PROBLEM - Puppet errors on deployment-kafka05 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [15:18:48] tgr: not sure if you can deploy, but it's just `fab deploy_zuul` [15:18:56] https://www.mediawiki.org/wiki/Continuous_integration/Zuul#Deploy_configuration [15:19:05] PROBLEM - Puppet errors on deployment-mediawiki05 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [15:19:07] in case you have to do it yourself one of these days [15:20:09] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team (Kanban), 10Operations, 10Services, 10VPS-Projects: New instance in deployment prep can't run puppet for the first time - https://phabricator.wikimedia.org/T171177#3456618 (10hashar) Seems the initial puppet run refuses to process for whatever rea... [15:20:44] thanks, that's good to know [15:21:23] 10Gerrit, 10Developer-Relations, 10Documentation: [[mw:Gerrit/Tutorial]] is way too much information for new contributors - https://phabricator.wikimedia.org/T161901#3456688 (10Aklapper) [15:21:39] PROBLEM - Puppet errors on deployment-zookeeper02 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [15:21:41] PROBLEM - Puppet errors on deployment-ores-redis-01 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [15:21:43] PROBLEM - Puppet errors on deployment-fluorine02 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [15:21:43] PROBLEM - Puppet errors on deployment-salt02 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [15:21:49] What the fuck :( [15:22:10] Error: Failed to apply catalog: Could not find dependent Service[eventlogging/init] for File[/usr/local/lib/eventlogging/filters.py] at /etc/puppet/modules/eventlogging/manifests/plugin.pp:49 [15:23:15] PROBLEM - Puppet errors on deployment-urldownloader is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [15:23:28] PROBLEM - Puppet errors on deployment-poolcounter04 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [15:23:40] PROBLEM - Puppet errors on deployment-secureredirexperiment is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [15:23:42] PROBLEM - Puppet errors on deployment-cache-text04 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [15:23:42] PROBLEM - Puppet errors on deployment-mira is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [15:24:26] PROBLEM - Puppet errors on deployment-puppetmaster02 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [15:25:48] PROBLEM - Puppet errors on deployment-prometheus01 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [15:26:46] PROBLEM - Puppet errors on deployment-jobrunner02 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [15:27:16] PROBLEM - Puppet errors on deployment-redis01 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [15:28:16] PROBLEM - Puppet errors on deployment-parsoid09 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] 
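[editor's note] Picking up the `fab deploy_zuul` pointer earlier in this stretch, a minimal sketch of the Zuul configuration deploy as described on the linked wiki page. The checkout path is hypothetical, and the assumption that the fabric task lives in a local integration/config working copy is not confirmed by the log.

```bash
# Hedged sketch of the Zuul config deploy mentioned above.
cd ~/src/integration/config   # hypothetical checkout path for the fabfile
git pull --ff-only            # assumed step: make sure the merged layout change (e.g. 80b9d85) is present
fab deploy_zuul               # the single command referenced above; pushes the config and reloads Zuul
```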
[15:29:41] PROBLEM - Puppet errors on deployment-tin is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [15:29:49] PROBLEM - Puppet errors on deployment-kafka03 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [15:29:59] PROBLEM - Puppet errors on deployment-mediawiki06 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [15:30:29] !log deployment-prep : removing project wide puppet classes from https://horizon.wikimedia.org/project/puppet/ All are role::eventlogging::analytics::* [15:30:32] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [15:30:45] PROBLEM - Puppet errors on deployment-elastic06 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [15:31:03] PROBLEM - Puppet errors on deployment-db03 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [15:31:31] PROBLEM - Puppet errors on deployment-tmh01 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [15:31:43] PROBLEM - Puppet errors on deployment-puppetdb01 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [15:33:01] PROBLEM - Puppet errors on deployment-memc05 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [15:33:13] PROBLEM - Puppet errors on deployment-imagescaler01 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [15:33:13] PROBLEM - Puppet errors on deployment-aqs01 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [15:33:29] PROBLEM - Puppet errors on deployment-restbase02 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [15:33:59] PROBLEM - Puppet errors on deployment-eventlogging04 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [15:38:22] PROBLEM - Puppet errors on deployment-redis02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [15:42:49] hashar: do you need help from cloud team? they were dealing with the CA issues yesterday... [15:43:16] RECOVERY - Puppet errors on deployment-parsoid09 is OK: OK: Less than 1.00% above the threshold [0.0] [15:43:28] greg-g: yeah I reached out to andrew as soon as he connected and fixed up an issue with nodepool [15:43:34] and provided guidance for the ssh/ldap etc. issues [15:43:39] it is mostly sorted out now [15:43:44] I am filling my bits in https://wikitech.wikimedia.org/wiki/Incident_documentation/20170719-ldap [15:44:06] cool, just saw the "WHAT THE FUCK" and was worried :) [15:45:04] RECOVERY - Puppet errors on deployment-aqs02 is OK: OK: Less than 1.00% above the threshold [0.0] [15:46:29] RECOVERY - Puppet errors on deployment-kafka04 is OK: OK: Less than 1.00% above the threshold [0.0] [15:46:39] RECOVERY - Puppet errors on deployment-mx is OK: OK: Less than 1.00% above the threshold [0.0] [15:46:55] RECOVERY - Puppet errors on deployment-cache-upload04 is OK: OK: Less than 1.00% above the threshold [0.0] [15:46:59] RECOVERY - Puppet errors on deployment-stream is OK: OK: Less than 1.00% above the threshold [0.0] [15:48:14] greg-g: how the WTF is me being exhausted.
But I found the reason, someone added some faulty puppet classes on all beta instances which broke puppet :} [15:48:34] RECOVERY - Puppet errors on deployment-logstash2 is OK: OK: Less than 1.00% above the threshold [0.0] [15:48:40] RECOVERY - Puppet errors on deployment-aqs03 is OK: OK: Less than 1.00% above the threshold [0.0] [15:48:54] hashar: heh, "great" [15:49:19] RECOVERY - Puppet errors on deployment-db04 is OK: OK: Less than 1.00% above the threshold [0.0] [15:49:20] RECOVERY - Puppet errors on deployment-sentry01 is OK: OK: Less than 1.00% above the threshold [0.0] [15:49:21] greg-g: Though better for Staging than Prod. ;-) [15:49:52] James_F: indeed, then only hashar/I get upset, as opposed to all of Ops ;) [15:49:55] greg-g: https://phabricator.wikimedia.org/p/hashar/ more or less captures my day [15:50:00] RECOVERY - Puppet errors on deployment-mediawiki06 is OK: OK: Less than 1.00% above the threshold [0.0] [15:50:16] Indeed. [15:50:36] RECOVERY - Puppet errors on deployment-restbase01 is OK: OK: Less than 1.00% above the threshold [0.0] [15:50:50] RECOVERY - Puppet errors on deployment-ms-fe02 is OK: OK: Less than 1.00% above the threshold [0.0] [15:51:48] RECOVERY - Puppet errors on deployment-elastic05 is OK: OK: Less than 1.00% above the threshold [0.0] [15:52:06] RECOVERY - Puppet errors on deployment-zotero01 is OK: OK: Less than 1.00% above the threshold [0.0] [15:53:01] RECOVERY - Puppet errors on deployment-memc04 is OK: OK: Less than 1.00% above the threshold [0.0] [15:53:27] RECOVERY - Puppet errors on deployment-mediawiki04 is OK: OK: Less than 1.00% above the threshold [0.0] [15:53:27] RECOVERY - Puppet errors on deployment-elastic07 is OK: OK: Less than 1.00% above the threshold [0.0] [15:53:33] RECOVERY - Puppet errors on deployment-pdfrender02 is OK: OK: Less than 1.00% above the threshold [0.0] [15:56:35] RECOVERY - Puppet errors on deployment-zookeeper02 is OK: OK: Less than 1.00% above the threshold [0.0] [15:56:43] RECOVERY - Puppet errors on deployment-salt02 is OK: OK: Less than 1.00% above the threshold [0.0] [15:58:28] RECOVERY - Puppet errors on deployment-poolcounter04 is OK: OK: Less than 1.00% above the threshold [0.0] [15:58:42] RECOVERY - Puppet errors on deployment-kafka05 is OK: OK: Less than 1.00% above the threshold [0.0] [15:58:44] RECOVERY - Puppet errors on deployment-cache-text04 is OK: OK: Less than 1.00% above the threshold [0.0] [15:59:05] RECOVERY - Puppet errors on deployment-mediawiki05 is OK: OK: Less than 1.00% above the threshold [0.0] [16:01:43] RECOVERY - Puppet errors on deployment-ores-redis-01 is OK: OK: Less than 1.00% above the threshold [0.0] [16:01:47] RECOVERY - Puppet errors on deployment-fluorine02 is OK: OK: Less than 1.00% above the threshold [0.0] [16:01:47] RECOVERY - Puppet errors on deployment-jobrunner02 is OK: OK: Less than 1.00% above the threshold [0.0] [16:02:17] RECOVERY - Puppet errors on deployment-redis01 is OK: OK: Less than 1.00% above the threshold [0.0] [16:03:15] RECOVERY - Puppet errors on deployment-urldownloader is OK: OK: Less than 1.00% above the threshold [0.0] [16:03:40] RECOVERY - Puppet errors on deployment-secureredirexperiment is OK: OK: Less than 1.00% above the threshold [0.0] [16:03:44] RECOVERY - Puppet errors on deployment-mira is OK: OK: Less than 1.00% above the threshold [0.0] [16:03:56] RECOVERY - Puppet errors on deployment-apertium02 is OK: OK: Less than 1.00% above the threshold [0.0] [16:04:26] RECOVERY - Puppet errors on deployment-puppetmaster02 is OK: OK: Less than 1.00% 
above the threshold [0.0] [16:04:51] 10Browser-Tests-Infrastructure, 10Release-Engineering-Team (Kanban), 10MW-1.30-release-notes (WMF-deploy-2017-07-11_(1.30.0-wmf.9)), 10Patch-For-Review, 10User-zeljkofilipin: Run WebdriverIO tests in CI for extensions - https://phabricator.wikimedia.org/T164721#3456823 (10Jdlrobson) > looks like a page i... [16:05:46] RECOVERY - Puppet errors on deployment-prometheus01 is OK: OK: Less than 1.00% above the threshold [0.0] [16:06:47] RECOVERY - Puppet errors on deployment-puppetdb01 is OK: OK: Less than 1.00% above the threshold [0.0] [16:08:01] RECOVERY - Puppet errors on deployment-memc05 is OK: OK: Less than 1.00% above the threshold [0.0] [16:08:09] RECOVERY - Puppet errors on deployment-imagescaler01 is OK: OK: Less than 1.00% above the threshold [0.0] [16:08:23] RECOVERY - Puppet errors on deployment-redis02 is OK: OK: Less than 1.00% above the threshold [0.0] [16:08:29] RECOVERY - Puppet errors on deployment-restbase02 is OK: OK: Less than 1.00% above the threshold [0.0] [16:09:41] RECOVERY - Puppet errors on deployment-tin is OK: OK: Less than 1.00% above the threshold [0.0] [16:09:42] 10Continuous-Integration-Config, 10Release-Engineering-Team, 10MW-1.30-release-notes, 10MediaWiki-Core-Tests, and 5 others: Parser tests fail if default Skin for unit tests makes use of doEditSectionLink - https://phabricator.wikimedia.org/T170880#3456857 (10Jdlrobson) [16:09:48] RECOVERY - Puppet errors on deployment-kafka03 is OK: OK: Less than 1.00% above the threshold [0.0] [16:10:46] RECOVERY - Puppet errors on deployment-elastic06 is OK: OK: Less than 1.00% above the threshold [0.0] [16:11:04] RECOVERY - Puppet errors on deployment-db03 is OK: OK: Less than 1.00% above the threshold [0.0] [16:11:33] RECOVERY - Puppet errors on deployment-tmh01 is OK: OK: Less than 1.00% above the threshold [0.0] [16:13:10] RECOVERY - Puppet errors on deployment-aqs01 is OK: OK: Less than 1.00% above the threshold [0.0] [16:14:01] RECOVERY - Puppet errors on deployment-eventlogging04 is OK: OK: Less than 1.00% above the threshold [0.0] [16:18:00] (03Abandoned) 10Jdlrobson: Include Vector in phpunit tests for MobileFrontend [integration/config] - 10https://gerrit.wikimedia.org/r/366470 (https://phabricator.wikimedia.org/T170880) (owner: 10Jdlrobson) [16:19:10] (03CR) 10Umherirrender: [C: 031] Reduce false positives in ReferenceThisSniff [tools/codesniffer] - 10https://gerrit.wikimedia.org/r/366504 (https://phabricator.wikimedia.org/T170316) (owner: 10Legoktm) [16:26:20] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team, 10Services (next), 10User-Joe: puppet dependency loop on deployment-sca hosts - https://phabricator.wikimedia.org/T171173#3456941 (10mobrovac) I can't make sense of this part: > `User[deploy-service] => Exec[recommendation_api config deploy]` I j... [16:29:48] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Kanban), 10Wikimedia-Incident: CI jobs are blocked because castor is unreachable - https://phabricator.wikimedia.org/T171148#3456944 (10hashar) https://wikitech.wikimedia.org/wiki/Incident_documentation/20170719-ldap#CI.2Fbeta [16:30:05] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Kanban), 10Cloud-VPS, 10Wikimedia-Incident: contintcloud instance refuses to launch due to "Maximum number of fixed ips exceeded - https://phabricator.wikimedia.org/T171158#3456946 (10hashar) https://wikitech.wikimedia.org/wiki/Incident_d... 
[16:30:21] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team (Kanban), 10Operations, 10Services, and 2 others: a lot of beta cluster instances are not reachable over SSH - https://phabricator.wikimedia.org/T171174#3456948 (10hashar) https://wikitech.wikimedia.org/wiki/Incident_documentation/20170719-ldap#CI.2... [16:35:39] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team (Kanban), 10Operations, 10Services, and 2 others: a lot of beta cluster instances are not reachable over SSH - https://phabricator.wikimedia.org/T171174#3456966 (10hashar) So the state as I understand it right now: The puppet master was broken, I h... [16:35:47] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team (Kanban), 10Operations, 10Services, and 2 others: a lot of beta cluster instances are not reachable over SSH - https://phabricator.wikimedia.org/T171174#3456969 (10hashar) p:05Triage>03High [16:36:07] greg-g: I have filled my bits on the incident report ( warning it is long, you wanna skip reading https://wikitech.wikimedia.org/wiki/Incident_documentation/20170719-ldap#CI.2Fbeta ) :D [16:36:23] the aftermath for beta is to fix up ssh on all the instances https://phabricator.wikimedia.org/T171174 [16:36:33] I had a bunch fixed by running puppet and mass restarting nslcd [16:36:41] but I haven't verified whether they all work. No idea how to do that [16:36:51] at least I have left some instructions [16:37:31] hashar: I rebooted deployment-eventlog01, no luck [16:37:43] ottomata: recreate it I guess [16:37:56] hm, really? but i just created it this morning [16:38:10] can i delete it and recreate it with the same name? [16:41:33] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team (Kanban), 10Operations, 10Services, and 2 others: a lot of beta cluster instances are not reachable over SSH - https://phabricator.wikimedia.org/T171174#3456998 (10hashar) Announced on the QA list pointing back to this task [16:41:35] ottomata: I don't know [16:41:38] hah ok [16:41:45] new name it is :/ [16:41:47] ottomata: but labs / beta puppet master were all f**d up today [16:41:51] ya [16:41:52] so it does not surprise me it is broken somehow [16:41:57] I would say [16:41:58] delete it [16:42:02] wait a minute or so [16:42:10] and create with same name [16:42:10] with no class applied [16:42:22] PROBLEM - Host deployment-eventlog01 is DOWN: CRITICAL - Host Unreachable (10.68.22.64) [16:42:23] (sometimes if you apply a class to an instance, puppet will fail the first provisioning) [16:42:47] !log How to fix ssh access on beta cluster instances: https://phabricator.wikimedia.org/T171174#3456966 [16:42:51] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [16:54:04] hashar as you fixed puppet on the beta cluster, you can now tell users to restart their instance once and wait 5-10 mins then restart it again.
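[editor's note] A hedged sketch of the per-instance fix hashar describes above (run puppet, then restart nslcd). The host list and the use of plain ssh are assumptions; the log does not say how the mass restart was actually driven.

```bash
# Hedged sketch, not the exact commands used; host names below are examples only.
for host in deployment-mediawiki05 deployment-sca01; do
  ssh "${host}.deployment-prep.eqiad.wmflabs" \
    'sudo puppet agent --test; sudo service nslcd restart'
done
```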
[16:54:09] They should regain access :) [16:54:31] potentiall [16:54:32] y [16:54:40] but I did that for most of them [16:54:47] the rest are instances for which puppet does not run properly [16:55:11] and hence the new CA Certificate is not provisioned, thus even a reboot would not fix it :( [16:55:21] I am heading back home [16:55:23] been a busy day [16:55:32] ok [16:55:39] hashar: magic, i'm into the new instance [16:55:39] thanks [16:56:43] ottomata: \O/ [16:57:10] ottomata: I guess the previous one had a bad initial provisioning which prevented it from running puppet [16:57:12] ottomata: I am happy to see it fixed :} [16:57:32] aye [16:57:34] ya thanks [16:59:48] hm hashar except, puppet won't run to connect to the deployment-prep puppetmaster :( [16:59:55] -;( [16:59:56] certificate verify failed: [self signed certificate in certificate chain for /CN=Puppet CA: deployment-puppetmaster02.deployment-prep.eqiad.wmflabs [17:00:00] ah yeah [17:00:24] ottomata: puppet is broken for new instances when the project has a puppet master [17:00:25] https://phabricator.wikimedia.org/T152941 [17:00:28] haha [17:00:29] that has the workaround to copy paste [17:00:30] :((( [17:00:32] ok [17:01:14] have a good afternoon! [17:01:45] laters! [17:01:48] have a good one [17:01:50] (gonna keep posting here, feel free to ignore) [17:04:57] few ok, some version of the workaround worked, but not quite the one(s) pasted [17:04:59] phew* [17:05:10] PROBLEM - Puppet errors on deployment-eventlog02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [17:06:49] say whaaa? [17:06:50] Provider scap3 is not functional on this host [17:06:51] ?? [17:10:17] no scap package for trusty? [17:10:29] thcipriani: ? [17:11:21] ottomata: which host? [17:11:28] deployment-eventlog02 [17:11:31] in deployment-prep [17:12:02] missing python-semver ? [17:12:16] hrm, scap is not installed on that host, but there is an available...oh [17:12:32] its a brand new instance [17:12:37] i'm trying to spin up a new beta eventlogging host there [17:12:41] since the old one seems bugsted [17:12:42] busted [17:12:46] and we've wanted to make a new one anyway [17:13:40] blerg. I thought we added that dependency a while ago, but all the trusty instances probably had scap installed by that point. [17:13:56] why a new trusty host? [17:14:48] install python-semver from xenial [17:14:50] works for me [17:14:58] lemme see where we're using semver, I can't remember...maybe we can move it to suggests [17:15:06] RainbowSprinkles: ^ [17:15:08] http://mirrors.kernel.org/ubuntu/pool/universe/p/python-semver/python-semver_2.0.1-1_all.deb [17:15:12] thcipriani: i'm replacing an old trusty host [17:15:13] because it broke [17:15:19] EL stuff still uses upstart [17:15:22] I haven't used it yet [17:15:24] big task to change [17:15:30] It was just to get the dependency in for future use [17:15:32] ottomata why not go with jessie? [17:15:47] need upstart for now [17:15:57] that's on jessie too [17:16:03] oh? [17:16:05] hmmm [17:16:10] https://packages.debian.org/jessie/upstart [17:16:25] RainbowSprinkles: can we move it to suggests for now? 
[17:16:32] That's fine [17:16:35] * thcipriani does [17:16:39] paladox: interesting [17:16:45] i would prefer to go with trusty if we can for now [17:16:49] but that might help us migrate faster in the future [17:16:50] it's not installed by default though [17:16:53] we're on trusty in prod [17:16:56] oh [17:17:13] and we have this instance in beta to test prod deployments (and puppet) so ya [17:18:06] iridium would have problems updating scap too [17:18:10] as it's also trusty [17:18:33] ottomata wget http://mirrors.kernel.org/ubuntu/pool/universe/p/python-semver/python-semver_2.0.1-1_all.deb [17:18:39] dpkg -i python-semver_2.0.1-1_all.deb [17:18:42] apt-get install scap [17:18:45] that should work [17:18:46] :) [17:18:47] n2it :) [17:19:16] RainbowSprinkles: could you bless https://phabricator.wikimedia.org/D724 [17:19:20] thanks paladox [17:19:23] (because python-semver is not on trusty but is in xenial. Tested on a trusty instance myself and found no conflicts) [17:19:26] your welcome :) [17:19:27] aye [17:19:46] thcipriani: Ok, I did the holy incantations and lit some incense. [17:20:07] :D [17:20:21] thcipriani we could backport https://packages.ubuntu.com/xenial/python-semver onto the trusty wikimedia apt repo. [17:20:30] it has no conflicts [17:20:32] Or we could just stop using trusty ;-) [17:20:37] yeh :) [17:20:44] (continuing to /make trusty work/ is a losing battle :)) [17:24:04] ottomata: FWIW, whenever https://phabricator.wikimedia.org/D724 makes its way through the pipes (you'll see the scap version update on beta to something that contains 20170720) you should be able to install [17:24:34] "the pipes" == "jenkins debian glue" [17:38:15] 10Release-Engineering-Team, 10Packaging, 10Release: MediaWiki 1.29 tarball comes with the wrong extensions - and misses some - https://phabricator.wikimedia.org/T171197#3457185 (10Joergi123) [17:41:04] 10Release-Engineering-Team, 10Packaging, 10Release: MediaWiki 1.29 tarball comes with the wrong extensions - and misses some - https://phabricator.wikimedia.org/T171197#3457199 (10Joergi123) Reedy wrote on IRC: SimpleAntiSpam was there in 1.22 and removed in 1.23 Vector was removed in 1.23 Looks like a bug... [17:44:23] hm, other q [17:44:27] i'm on deployment-tin [17:44:28] in [17:44:37] oh wait, i think i know [17:45:12] RECOVERY - Puppet errors on deployment-eventlog02 is OK: OK: Less than 1.00% above the threshold [0.0] [17:54:06] Yippee, build fixed! [17:54:06] Project mediawiki-core-code-coverage build #2895: 09FIXED in 2 hr 54 min: https://integration.wikimedia.org/ci/job/mediawiki-core-code-coverage/2895/ [18:02:07] 10Release-Engineering-Team, 10Release: MediaWiki 1.29 tarball comes with the wrong extensions - and misses some - https://phabricator.wikimedia.org/T171197#3457290 (10Aklapper) [18:02:57] 10Release-Engineering-Team (Kanban), 10Operations, 10Phabricator: replace sdb and then setup/install phab1001.eqiad.wmnet - https://phabricator.wikimedia.org/T163938#3457291 (10Cmjohnson) a:05Cmjohnson>03RobH Disk has been replaced: Return shipping info is USPS 9202 3946 5301 2436 1520 81 FEDEX 96119... [18:03:08] 10Release-Engineering-Team, 10MW-1.29-release, 10Release: MediaWiki 1.29 tarball comes with the wrong extensions - and misses some - https://phabricator.wikimedia.org/T171197#3457293 (10greg) [18:09:11] hi! 
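[editor's note] The trusty workaround paladox spells out above, gathered into one copy-pasteable block. It assumes sudo on the target host; the package URL is the xenial build he links.

```bash
# Same three steps as quoted above: pull python-semver from xenial, install it,
# then install scap normally.
wget http://mirrors.kernel.org/ubuntu/pool/universe/p/python-semver/python-semver_2.0.1-1_all.deb
sudo dpkg -i python-semver_2.0.1-1_all.deb
sudo apt-get install scap
```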
[18:09:26] i'm struggling to figure out why these tests are failing: https://integration.wikimedia.org/ci/job/mwext-donationinterfacecore-REL1_27-zend56-jessie/76/console [18:09:33] when they are not failing locally for me or ejegg [18:20:32] PROBLEM - Host deployment-eventlogging03 is DOWN: CRITICAL - Host Unreachable (10.68.18.111) [18:28:10] 10Scap, 10ORES, 10Scoring-platform-team-Backlog: ORES deployment finish "successfully" even when uwsgi and celery fail to successfully start up - https://phabricator.wikimedia.org/T170950#3457373 (10Ladsgroup) [18:34:23] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team (Kanban), 10Operations, 10VPS-Projects, 10Services (watching): New instance in deployment prep can't run puppet for the first time - https://phabricator.wikimedia.org/T171177#3457390 (10mobrovac) [18:35:05] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team (Kanban), 10Operations, 10VPS-Projects, and 2 others: a lot of beta cluster instances are not reachable over SSH - https://phabricator.wikimedia.org/T171174#3457392 (10mobrovac) [18:37:50] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team, 10Services (next), 10User-Joe: puppet dependency loop on deployment-sca hosts - https://phabricator.wikimedia.org/T171173#3457398 (10mobrovac) p:05Triage>03High Setting to high prio, as this is now precluding us from logging into the boxes and... [18:40:56] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team, 10Services (next), 10User-Joe: puppet dependency loop on deployment-sca hosts - https://phabricator.wikimedia.org/T171173#3456426 (10Paladox) @mobrovac you could remove the puppet class from the instance. Restart the instance after that wait 5-10 m... [18:54:45] PROBLEM - Puppet errors on deployment-mira is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [19:34:43] RECOVERY - Puppet errors on deployment-mira is OK: OK: Less than 1.00% above the threshold [0.0] [20:08:40] Project selenium-MinervaNeue » chrome,beta,Linux,BrowserTests build #16: 04FAILURE in 1 hr 19 min: https://integration.wikimedia.org/ci/job/selenium-MinervaNeue/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/16/ [20:15:33] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team, 10Services (next), 10User-Joe: puppet dependency loop on deployment-sca hosts - https://phabricator.wikimedia.org/T171173#3457781 (10hashar) I did remove `profile::recommendation_api` on deployment-sca01 earlier but was hitting another puppet issue... [20:19:03] 10Release-Engineering-Team, 10MW-1.29-release, 10Release: MediaWiki 1.29 tarball comes with the wrong extensions - and misses some - https://phabricator.wikimedia.org/T171197#3457792 (10MacFan4000) a:03MacFan4000 [20:19:21] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team (Kanban), 10Operations, 10VPS-Projects, and 2 others: a lot of beta cluster instances are not reachable over SSH - https://phabricator.wikimedia.org/T171174#3457796 (10hashar) [20:19:26] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team (Kanban), 10Operations, 10VPS-Projects, 10Services (watching): New instance in deployment prep can't run puppet for the first time - https://phabricator.wikimedia.org/T171177#3457793 (10hashar) 05Open>03Resolved a:03Ottomata Andrew has delet... 
[20:23:39] RECOVERY - Puppet errors on deployment-sca01 is OK: OK: Less than 1.00% above the threshold [0.0] [20:25:15] twentyafterfour hi, question for you since your git-ssh codfw patch will probaly take alot longer to be reviewed, we could use git-ssh from eqiad instead of it being in codfw for now. The question is can traffic in codfw reach eqiad git-ssh? [20:34:29] 10Release-Engineering-Team (Kanban), 10Operations, 10Phabricator: replace sdb and then setup/install phab1001.eqiad.wmnet - https://phabricator.wikimedia.org/T163938#3457839 (10RobH) [20:39:39] well, i just realized i installed phab1001 with jessie, and I guess i should have asked if it could be stretch [20:39:53] paladox: jessie fine or should this be strech? [20:39:54] stretch [20:40:25] robh uh, will have to forward that to releng (twentyafterfour, greg-g) [20:40:41] i pinged since you asked me about the install [20:40:43] but stretch wont work, will have to be jessie i think but will leave that up to releng [20:40:45] i assumed you were involved ;] [20:40:51] i would assume if stretch supports everything phab needs i see why not [20:40:58] itd be a nice way to find out no? [20:41:05] Zppix php7 is not supported by phabricator [20:41:12] php7.1 is but that's not in stretch [20:41:21] you could downgrade no? [20:41:26] well, jessie is being isntalled now but it could be changed to stretch [20:41:29] installed even [20:41:32] ok thanks [20:41:54] Zppix no it carn't php5 wont work on stretch [20:42:04] needs to be compiled by someone. [20:42:52] paladox: can't [20:42:55] there's never an r in it [20:43:09] woops sorry. thanks [20:44:24] 10Release-Engineering-Team (Kanban), 10Release, 10Train Deployments: MW 1.30.0-wmf.10 deployment blockers - https://phabricator.wikimedia.org/T168050#3457863 (10demon) 05Open>03Resolved [20:44:39] PROBLEM - Puppet errors on deployment-sca01 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [20:53:48] robh: please wait until twentyafterfour gives you an answer [20:54:05] greg-g: well, i had already had jessie installing when i asked ;] [20:54:13] robh: jessie should be fine [20:54:15] so its really already done, but same amount of work to reimage [20:54:23] robh: mostly just "mukunda is authority" :) [20:54:25] ie: if stretch seems something to try im happy to reimage whenever! 
[20:58:41] RECOVERY - Puppet errors on deployment-sca04 is OK: OK: Less than 1.00% above the threshold [0.0] [21:09:15] (03PS1) 10Thcipriani: Dockerfiles use build container pattern [integration/config] - 10https://gerrit.wikimedia.org/r/366726 (https://phabricator.wikimedia.org/T166888) [21:30:53] (03PS2) 10Thcipriani: Dockerfiles use build container pattern [integration/config] - 10https://gerrit.wikimedia.org/r/366726 (https://phabricator.wikimedia.org/T166888) [22:13:40] 10MediaWiki-Codesniffer: MediaWiki.ExtraCharacters.CharacterBeforePHPOpeningTag.Found broken on hhvm-fatal-error.php - https://phabricator.wikimedia.org/T171234#3458354 (10Reedy) [22:14:29] 10MediaWiki-Codesniffer: MediaWiki.ExtraCharacters.CharacterBeforePHPOpeningTag.Found broken on hhvm-fatal-error.php - https://phabricator.wikimedia.org/T171234#3458370 (10Reedy) [22:17:29] 10Browser-Tests-Infrastructure, 10MinervaNeue, 10Reading-Web-Backlog: MinervaNeue browser test are flaking (waiting for {:class=>"mw-notification", :tag_name=>"div"} to become present ) - https://phabricator.wikimedia.org/T170890#3458397 (10Jdlrobson) [22:17:32] 10Browser-Tests-Infrastructure, 10MinervaNeue, 10Reading-Web-Backlog: MinervaNeue browser test are flaking (waiting for {:class=>"mw-notification", :tag_name=>"div"} to become present ) - https://phabricator.wikimedia.org/T170890#3446357 (10Jdlrobson) p:05Normal>03High This is happening more and more oft... [22:37:41] (03CR) 10Thcipriani: [C: 032] "This is live now and appears working." [integration/config] - 10https://gerrit.wikimedia.org/r/366726 (https://phabricator.wikimedia.org/T166888) (owner: 10Thcipriani) [22:38:42] (03Merged) 10jenkins-bot: Dockerfiles use build container pattern [integration/config] - 10https://gerrit.wikimedia.org/r/366726 (https://phabricator.wikimedia.org/T166888) (owner: 10Thcipriani)
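[editor's note] The change merged above names the "build container pattern". As a general illustration only (file names and image tags below are hypothetical, not the actual integration/config Dockerfiles), the idea is to build artifacts in a throwaway fat image and copy only the result into the image that actually runs.

```bash
# Generic sketch of the build-container pattern, not the merged change itself.
docker build -t ci-example-build -f Dockerfile.build .     # fat image with the full toolchain
docker run --rm -v "$PWD/dist:/out" ci-example-build \
  cp /build/app /out/                                      # extract the built artifact onto the host
docker build -t ci-example-run -f Dockerfile.run .         # slim runtime image that copies in ./dist/app
```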