[00:23:04] RECOVERY - Puppet run on deployment-db1 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:33:35] Scap3, scap: File ownership differences between Scap3 and Trebuchet - https://phabricator.wikimedia.org/T116632#2647176 (thcipriani) Open>Resolved a:thcipriani This has been solved in `scap::target`.
[00:36:29] Scap3: Local config deploys should use the target's current version - https://phabricator.wikimedia.org/T145373#2647179 (thcipriani) a:thcipriani
[03:04:55] PROBLEM - Puppet run on deployment-redis01 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[03:06:43] PROBLEM - Puppet run on deployment-ms-fe01 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[03:12:43] Project mediawiki-core-code-coverage build #2271: STILL FAILING in 12 min: https://integration.wikimedia.org/ci/job/mediawiki-core-code-coverage/2271/
[03:44:53] RECOVERY - Puppet run on deployment-redis01 is OK: OK: Less than 1.00% above the threshold [0.0]
[03:46:44] RECOVERY - Puppet run on deployment-ms-fe01 is OK: OK: Less than 1.00% above the threshold [0.0]
[05:43:37] PROBLEM - Puppet run on deployment-elastic08 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[05:54:10] RECOVERY - Long lived cherry-picks on puppetmaster on deployment-puppetmaster is OK: OK: Less than 100.00% above the threshold [0.0]
[06:23:39] RECOVERY - Puppet run on deployment-elastic08 is OK: OK: Less than 1.00% above the threshold [0.0]
[11:22:09] PROBLEM - Free space - all mounts on mira02 is CRITICAL: CRITICAL: deployment-prep.mira02.diskspace._srv.byte_percentfree (<11.11%)
[11:22:33] Jenkins can't install mediawiki core
[11:22:47] https://integration.wikimedia.org/ci/job/mwext-Wikibase-repo-tests-sqlite-hhvm/11859/console
[11:29:56] Amir1: that repo is broken / not compatible with mediawiki/core @ master
[11:30:17] Amir1: look at the stacktrace?!
exception 'MediaWiki\Services\ServiceDisabledException' with message 'Service disabled: DBLoadBalancerFactory' in /mnt/jenkins-workspace/workspace/mwext-Wikibase-repo-tests-sqlite-hhvm/src/includes/Services/ServiceContainer.php:340
[11:30:34] okay, strange
[11:30:37] aude: ^
[11:30:41] thanks hashar
[11:31:27] Amir1: ask about it in #wikidata , there might be a bug filed for it already
[11:31:44] I actually brought up the issue from there :D
[11:32:10] RECOVERY - Free space - all mounts on mira02 is OK: OK: All targets OK
[11:35:48] Amir1: https://phabricator.wikimedia.org/T146019
[11:35:56] maybe that's a duplicate, not sure
[11:36:14] happens just with a fresh mediawiki install.... no wikibase
[11:37:04] it's probably trivial to fix, but i have to look into the changes aaron has been doing
[12:41:14] hey hashar !
[12:42:41] Continuous-Integration-Infrastructure, Release-Engineering-Team, MediaWiki-Unit-tests, MediaWiki-extensions-WikibaseClient, and 4 others: Job mediawiki-extensions-php55 frequently fails due to "Segmentation fault" - https://phabricator.wikimedia.org/T142158#2525383 (aude) at the moment, i am runn...
[12:46:31] aude: that looks like a horrible segfault :/
[12:46:45] (PS1) Tobias Gritschacher: Add 2ColConflict extension to integration [integration/config] - https://gerrit.wikimedia.org/r/311408 (https://phabricator.wikimedia.org/T145411)
[12:46:49] but now i'm not getting it :(
[12:46:59] * aude doesn't know how to reliably reproduce this
[12:47:35] btw, i have mw core wmf/1.28.0-wmf.19 checked out with whatever extensions it uses
[12:48:47] aude: so there is one specific test that segfaults right? / should I try?
[12:49:14] i don't know if it's a specific test
[12:50:03] * aude tried 5 times now
[12:54:36] addshore: aude: for the random php5.5 segfault, the CI slaves do generate a core dump for them
[12:54:51] I ran one via gdb with a super long trace.
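The core-dump workflow hashar describes (CI slaves keep the dumps, gdb yields the trace) can be sketched as below; the binary and core-file paths are assumptions, not taken from the log:

```shell
# Sketch: pull a backtrace out of a segfault core dump with batch-mode gdb.
# Both paths are hypothetical -- check /proc/sys/kernel/core_pattern to see
# where cores actually land on the slave.
BIN=/usr/bin/php5
CORE=/var/crash/core.php5.12345
# Build the invocation; 'bt full' prints the full backtrace non-interactively.
cmd=(gdb --batch -ex 'bt full' "$BIN" "$CORE")
echo "${cmd[@]}"
# On a host that actually has the core, running it captures the trace:
#   "${cmd[@]}" > /tmp/segfault-trace.txt
```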
[12:54:57] that hints at the garbage collector
[12:55:08] I have pasted to whatever task is open about it
[12:55:12] but haven't looked further
[12:55:30] one thing I remember is that we had the Zend 5.3 garbage collector segfaulting
[12:55:43] so went backporting a few patches to our debian package
[12:55:57] and we also had some hack in phpunit.php to disable the garbage collector
[12:56:13] i vaguely remember that has
[12:56:14] hack
[13:02:00] i had just checked out new / different code
[13:02:23] so maybe more memory, etc. involved in running tests on fresh (uncached) code
[13:02:46] and then maybe hit the garbage collector or something
[13:14:32] checked out master and then checked out the branch again
[13:14:35] reproduced
[13:15:21] (PS1) Tobias Gritschacher: Add ElectronPdfService extension to integration [integration/config] - https://gerrit.wikimedia.org/r/311412 (https://phabricator.wikimedia.org/T142201)
[13:17:17] Continuous-Integration-Infrastructure, Release-Engineering-Team, MediaWiki-Unit-tests, MediaWiki-extensions-WikibaseClient, and 4 others: Job mediawiki-extensions-php55 frequently fails due to "Segmentation fault" - https://phabricator.wikimedia.org/T142158#2648071 (aude) seems I am able to repro...
[13:38:40] PROBLEM - Puppet run on deployment-apertium02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[13:43:27] hashar: do you know who is doing the train this week?
[13:43:42] assuming the issues we had last week are resolved
[13:49:52] Release-Engineering-Team (Deployment-Blockers), MediaWiki-extensions-WikibaseRepository, Wikidata, Beta-Cluster-reproducible, and 7 others: Jobs invoking SiteConfiguration::getConfig cause HHVM to fail updating the bytecode cache due to being filesi... - https://phabricator.wikimedia.org/T145819#2648220
[13:51:54] Continuous-Integration-Infrastructure, Release-Engineering-Team, MediaWiki-Unit-tests, MediaWiki-extensions-WikibaseClient, and 4 others: Job mediawiki-extensions-php55 frequently fails due to "Segmentation fault" - https://phabricator.wikimedia.org/T142158#2648253 (aude) with set env MALLOC_CHEC...
[14:01:17] aude: will talk about it tonight
[14:01:19] (CR) Addshore: [C: 1] Add ElectronPdfService extension to integration [integration/config] - https://gerrit.wikimedia.org/r/311412 (https://phabricator.wikimedia.org/T142201) (owner: Tobias Gritschacher)
[14:01:32] (CR) Addshore: [C: 1] Add 2ColConflict extension to integration [integration/config] - https://gerrit.wikimedia.org/r/311408 (https://phabricator.wikimedia.org/T145411) (owner: Tobias Gritschacher)
[14:02:12] aude: looks like the train this week will be Tyler and there is no train next week
[14:02:52] ok
[14:03:15] MALLOC_CHECK_=3 !!
[14:03:22] that is magic ( https://phabricator.wikimedia.org/T142158#2648253 )
[14:03:24] if possible, i'd like to deploy wmf20 to wikidata earlier in the day on wednesday
[14:03:30] :)
[14:03:43] well
[14:03:50] we might push wmf.19 this week
[14:03:56] wmf.20 I have no clue.
[14:03:57] * aude will be on an airplane in the evening and hoo is busy with studies
[14:04:04] I guess you get some patches to catch up with recent changes in mw ?
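aude's MALLOC_CHECK_ trick and the older gc workaround hashar mentions amount to invocations like the following; the phpunit path and the `--group Wikibase` filter are assumptions for illustration, not taken from the log:

```shell
# glibc heap checking: MALLOC_CHECK_=3 makes malloc abort immediately with a
# diagnostic on heap corruption, instead of letting it surface later as a
# seemingly random segfault.
MALLOC_CHECK_=3 php5 tests/phpunit/phpunit.php --group Wikibase

# The old workaround: run the suite with the Zend garbage collector disabled,
# to rule the collector in or out as the culprit.
php5 -d zend.enable_gc=0 tests/phpunit/phpunit.php --group Wikibase
```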
[14:04:06] ok, wmf19 with new wikibase
[14:05:05] guess you can list out by editing the blocker task: https://phabricator.wikimedia.org/T143328
[14:05:09] just edit the task detail summary maybe
[14:05:41] ok
[14:06:02] right now i found a bug in lua on wikibase master
[14:06:32] so i'm not sure, but we might at least want to backport some patches then to make sure wikibase is compatible with changes in core
[14:07:02] and been trying to run our phpunit tests against wmf19 core + wikibase that is deployed now
[14:10:03] aude: I am pretty sure it is a bug in the php5.5 package we have on Trusty
[14:10:10] it must be missing some fix to Zend garbage collectors
[14:11:55] could be
[14:11:59] think i have the same package
[14:25:10] aude: and the ci Trusty slaves do capture core dumps
[14:25:41] ok
[14:25:54] i am able to reliably reproduce the issue now
[14:25:56] for some reason
[14:41:02] hashar hi, i managed to get integration.wikimedia.org homepage on http://gerrit-zuul.wmflabs.org/ :)
[15:00:16] Project mediawiki-core-code-coverage build #2272: STILL FAILING in 14 sec: https://integration.wikimedia.org/ci/job/mediawiki-core-code-coverage/2272/
[15:00:26] PROBLEM - Keyholder status on mira02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[15:10:52] Continuous-Integration-Infrastructure, Analytics-Kanban, Differential, EventBus, Wikimedia-Stream: Run Kasocki tests in Jenkins via Differential commits - https://phabricator.wikimedia.org/T145140#2648480 (Nuria) Open>Resolved
[15:11:01] Trebuchet is broken on deployment-tin for /srv/deployment/jobrunner/jobrunner
[15:11:08] and given it is just two hosts, I am not going to try to fix it
[15:11:23] !log beta: updating jobrunner service 0dc341f..a0e8216
[15:11:27] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[15:19:48] (CR) Paladox: [C: 1] Add 2ColConflict extension to integration [integration/config] - https://gerrit.wikimedia.org/r/311408 (https://phabricator.wikimedia.org/T145411) (owner: Tobias Gritschacher)
[15:20:02] (CR) Paladox: [C: 1] Add ElectronPdfService extension to integration [integration/config] - https://gerrit.wikimedia.org/r/311412 (https://phabricator.wikimedia.org/T142201) (owner: Tobias Gritschacher)
[15:44:06] PROBLEM - Puppet run on deployment-db1 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[15:44:32] Beta-Cluster-Infrastructure, Scap3: Fixup beta scap3 keyholder problems - https://phabricator.wikimedia.org/T144647#2648622 (thcipriani) >>! In T144647#2634619, @bd808 wrote: > Honestly new hosts are spun up so infrequently that could just be managed manually by someone. Done for right now. Simplest th...
[16:20:47] Browser-Tests-Infrastructure, Release-Engineering-Team, MediaWiki-extensions-Examples, Documentation, and 5 others: Improve documentation around running/writing (with lots of examples) browser tests - https://phabricator.wikimedia.org/T108108#1512435 (zeljkofilipin) a:zeljkofilipin>None
[16:24:04] RECOVERY - Puppet run on deployment-db1 is OK: OK: Less than 1.00% above the threshold [0.0]
[16:37:41] PROBLEM - Puppet run on deployment-ms-fe01 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[16:45:06] PROBLEM - Puppet run on deployment-db1 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[16:45:56] mornin' releng folks
[16:46:05] mw-install-sqlite.sh is failing
[16:46:10] https://integration.wikimedia.org/ci/job/parsoidsvc-hhvm-parsertests-jessie/675/console
[16:46:15] https://integration.wikimedia.org/ci/job/parsoidsvc-hhvm-parsertests-jessie/674/console
[16:47:07] hasharAway ^^
[16:47:14] 16:34:05 [Mon Sep 19 16:34:04 2016] [hphp] [1421:7f4c3061b100:0:000001] [] Unable to set ResourceLimit.CoreFileSize to 8589934592: Operation not permitted (1)
[16:49:55] [Mon Sep 19 16:34:06 2016] [hphp] [1421:7f4c3061b100:0:000002] [] Exception handler threw an object exception:
[16:50:08] I have no idea why mediawiki Services are disabled.
[16:50:14] Maybe to do with db?
[16:51:45] Beta-Cluster-Infrastructure, RESTBase, Services: Beta cluster RESTbase not getting new revisions(?), so "Error loading data from server: HTTP 504" in VE - https://phabricator.wikimedia.org/T146053#2649053 (Jdforrester-WMF)
[16:52:01] Beta-Cluster-Infrastructure, RESTBase, Services: Beta cluster RESTbase not getting new revisions(?), so "Error loading data from server: HTTP 504" in VE - https://phabricator.wikimedia.org/T146053#2649041 (Jdforrester-WMF) p:Triage>High
[16:52:57] (Abandoned) Jforrester: Finish removing MoodBar, including nl.wikipedia [tools/release] - https://gerrit.wikimedia.org/r/303575 (https://phabricator.wikimedia.org/T131340) (owner: Nemo bis)
[16:53:08] (Restored) Jforrester: Finish removing MoodBar, including nl.wikipedia [tools/release] - https://gerrit.wikimedia.org/r/303575 (https://phabricator.wikimedia.org/T131340) (owner: Nemo bis)
[17:04:18] Beta-Cluster-Infrastructure, Labs: Please raise quota for deployment-prep - https://phabricator.wikimedia.org/T145611#2635940 (Andrew) This increase sounds fine to me.
[17:04:25] Beta-Cluster-Infrastructure, Labs: Request increased quota for deployment-prep labs project - https://phabricator.wikimedia.org/T145636#2636577 (Andrew) Yep, increase is fine with me.
[17:07:02] Beta-Cluster-Infrastructure, RESTBase, Services, User-mobrovac: Beta cluster RESTbase not getting new revisions(?), so "Error loading data from server: HTTP 504" in VE - https://phabricator.wikimedia.org/T146053#2649125 (mobrovac) a:mobrovac ``` Error: getaddrinfo ENOTFOUND deployment-mediawi...
[17:12:42] RECOVERY - Puppet run on deployment-ms-fe01 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:13:40] RECOVERY - Puppet run on deployment-apertium02 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:16:16] Beta-Cluster-Infrastructure, RESTBase, Services, User-mobrovac: Beta cluster RESTbase not getting new revisions(?), so "Error loading data from server: HTTP 504" in VE - https://phabricator.wikimedia.org/T146053#2649189 (AlexMonk-WMF) I just ran `sudo service restb...
[17:19:33] PROBLEM - Puppet run on deployment-apertium01 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[17:20:05] RECOVERY - Puppet run on deployment-db1 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:20:47] Beta-Cluster-Infrastructure, RESTBase, Services, User-mobrovac: Beta cluster RESTbase not getting new revisions(?), so "Error loading data from server: HTTP 504" in VE - https://phabricator.wikimedia.org/T146053#2649211 (AlexMonk-WMF) Seems it didn't because `modules/restbase/manifests/init.pp` s...
[17:23:43] PROBLEM - Puppet run on deployment-mx is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[17:25:39] Beta-Cluster-Infrastructure, RESTBase, Services, User-mobrovac: Beta cluster RESTbase not getting new revisions(?), so "Error loading data from server: HTTP 504" in VE - https://phabricator.wikimedia.org/T146053#2649275 (mobrovac) Open>Resolved >>! In T146053#2649211, @AlexMonk-WMF wrote:...
[17:38:16] Scap3: Local config deploys should use the target's current version - https://phabricator.wikimedia.org/T145373#2649338 (thcipriani) p:Normal>High
[17:41:11] Browser-Tests-Infrastructure, Fundraising-Backlog, MediaWiki-extensions-CentralNotice, Fundraising Sprint Rocket Surgery 2016, User-zeljkofilipin: CentralNotice: Intermittent unexplained browser test failures - https://phabricator.wikimedia.org/T145718#2649345 (DStrine)
[17:42:19] Scap3: DEPLOY_HEAD should be a symbolic ref - https://phabricator.wikimedia.org/T146062#2649353 (thcipriani)
[17:42:39] Scap3: DEPLOY_HEAD should be a symbolic ref - https://phabricator.wikimedia.org/T146062#2649367 (thcipriani) p:Triage>Low
[17:45:11] Scap3: Local config deploys should use the target's current version - https://phabricator.wikimedia.org/T145373#2649380 (thcipriani) Per today's [[ https://www.mediawiki.org/wiki/Deployment_tooling/Cabal/2016-09-19 | deployment-tooling meeting ]], the easiest path forward here might be to cache `DEPLOY_HEAD`...
[17:59:04] Release-Engineering-Team (Deployment-Blockers), MediaWiki-extensions-WikibaseRepository, Wikidata, Beta-Cluster-reproducible, and 7 others: Jobs invoking SiteConfiguration::getConfig cause HHVM to fail updating the bytecode cache due to being filesi... - https://phabricator.wikimedia.org/T145819#2641971
[17:59:23] Release-Engineering-Team (Deployment-Blockers), MediaWiki-extensions-WikibaseRepository, Wikidata, Beta-Cluster-reproducible, and 7 others: Jobs invoking SiteConfiguration::getConfig cause HHVM to fail updating the bytecode cache due to being filesi... - https://phabricator.wikimedia.org/T145819#2649511
[18:03:45] RECOVERY - Puppet run on deployment-mx is OK: OK: Less than 1.00% above the threshold [0.0]
[18:12:51] Release-Engineering-Team (Deployment-Blockers), MediaWiki-extensions-WikibaseRepository, Wikidata, Beta-Cluster-reproducible, and 7 others: Jobs invoking SiteConfiguration::getConfig cause HHVM to fail updating the bytecode cache due to being filesi... - https://phabricator.wikimedia.org/T145819#2649573
[18:22:37] Beta-Cluster-Infrastructure, RESTBase, Services, User-mobrovac: Beta cluster RESTbase not getting new revisions(?), so "Error loading data from server: HTTP 504" in VE - https://phabricator.wikimedia.org/T146053#2649681 (Jdforrester-WMF)
[18:25:51] Release-Engineering-Team, Monitoring, Operations, Patch-For-Review, Wikimedia-Incident: Monitoring and alerts for "business" metrics - https://phabricator.wikimedia.org/T140942#2649707 (greg) This is really a follow-up item from a wikimedia incident.
[18:46:39] (CR) Hashar: [C: 2] Add 2ColConflict extension to integration [integration/config] - https://gerrit.wikimedia.org/r/311408 (https://phabricator.wikimedia.org/T145411) (owner: Tobias Gritschacher)
[18:47:17] (Merged) jenkins-bot: Add 2ColConflict extension to integration [integration/config] - https://gerrit.wikimedia.org/r/311408 (https://phabricator.wikimedia.org/T145411) (owner: Tobias Gritschacher)
[18:47:44] (CR) Hashar: [C: 2] Add ElectronPdfService extension to integration [integration/config] - https://gerrit.wikimedia.org/r/311412 (https://phabricator.wikimedia.org/T142201) (owner: Tobias Gritschacher)
[18:47:48] (PS2) Hashar: Add ElectronPdfService extension to integration [integration/config] - https://gerrit.wikimedia.org/r/311412 (https://phabricator.wikimedia.org/T142201) (owner: Tobias Gritschacher)
[18:47:52] (CR) Hashar: Add ElectronPdfService extension to integration [integration/config] - https://gerrit.wikimedia.org/r/311412 (https://phabricator.wikimedia.org/T142201) (owner: Tobias Gritschacher)
[18:47:54] (CR) Hashar: [C: 2] Add ElectronPdfService extension to integration [integration/config] - https://gerrit.wikimedia.org/r/311412 (https://phabricator.wikimedia.org/T142201) (owner: Tobias Gritschacher)
[18:48:54] (Merged) jenkins-bot: Add ElectronPdfService extension to integration [integration/config] - https://gerrit.wikimedia.org/r/311412 (https://phabricator.wikimedia.org/T142201) (owner: Tobias Gritschacher)
[18:50:27] Yay the new update to grrrit-wm is working, no more i18n bot merges being shown :) Plus npm 2 and node 6 :)
[18:51:04] PROBLEM - Puppet run on deployment-zotero01 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[18:52:18] what does it mean for puppet run to be 55.56% above the critical threshold?
[18:53:58] (CR) Hashar: "If it is not broken, there is imho no need to update it is there?" [integration/docroot] - https://gerrit.wikimedia.org/r/311345 (https://phabricator.wikimedia.org/T109747) (owner: Paladox)
[18:57:54] ori: no idea
[18:58:16] ori: but deployment-prep has a lot of "wrong" puppet failures since a couple weeks ago or so
[18:58:20] havent looked into it yet
[19:24:53] Release-Engineering-Team (Deployment-Blockers), MediaWiki-extensions-WikibaseRepository, Wikidata, Beta-Cluster-reproducible, and 7 others: Jobs invoking SiteConfiguration::getConfig cause HHVM to fail updating the bytecode cache due to being filesi... - https://phabricator.wikimedia.org/T145819#2649943
[19:26:07] RECOVERY - Puppet run on deployment-zotero01 is OK: OK: Less than 1.00% above the threshold [0.0]
[19:32:59] hasharAway: should I file a bug for the above?
[19:33:36] !log creating T144951 integration-puppetmaster01 instance using m1.small and debian jessie
[19:33:40] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[19:33:52] arlolra: I think the install sqlite issue has been fixed...
[19:34:02] thanks for the merges hasharAway !
[19:34:04] ok, thanks
[19:34:12] https://phabricator.wikimedia.org/T146019
[19:36:39] chasemp: if you are around, could I get your +1 on the Nodepool patch that gets rid of listing floating IP ?
https://phabricator.wikimedia.org/T145142
[19:36:53] chasemp: could probably get it pushed to apt / upgraded with eu ops in the morning
[19:39:52] Beta-Cluster-Infrastructure, Operations, HHVM, Patch-For-Review: Move the MW Beta appservers to Debian - https://phabricator.wikimedia.org/T144006#2649992 (chasemp)
[19:39:56] Beta-Cluster-Infrastructure, Labs: Please raise quota for deployment-prep - https://phabricator.wikimedia.org/T145611#2649990 (chasemp) Open>Resolved a:chasemp
[19:45:12] Beta-Cluster-Infrastructure, Labs: Request increased quota for deployment-prep labs project - https://phabricator.wikimedia.org/T145636#2650016 (chasemp) Open>Resolved a:chasemp should be gtg, there are a few stacked quota bumps for deployment-prep so let me know @fgiunchedi if you get hung u...
[19:45:26] Release-Engineering-Team, Operations, HHVM, Patch-For-Review: Migrate deployment servers (tin/mira) to jessie - https://phabricator.wikimedia.org/T144578#2650020 (thcipriani) >>! In T144578#2639586, @hashar wrote: > @mmodell @thcipriani @demon @dduvall can you check mira02 on beta is all fine ?...
[19:45:28] Beta-Cluster-Infrastructure, Labs: Please raise quota for deployment-prep - https://phabricator.wikimedia.org/T145611#2650022 (hashar) New quotas: | Cores | 171/192 | RAM | 350208/392400
[19:49:14] hasharAway: I commented, probably not the best nodepool specific reviewing but from my angle it's not a concern
[19:49:22] !log creating T144951 enabled role::puppetmaster::standalone role on integration-puppetmaster01
[19:49:25] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[19:49:32] s/creating //
[19:51:24] chasemp: it will be fine :] the code is not called anywhere else but where I have shortcircuited it :)
[20:03:17] !log disable puppet across integration project, moving puppetmasters
[20:03:20] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[20:03:47] (CR) Paladox: "Well, no it isen't broken but could do with an update. I am not sure if it fixes the bug linked, I think we may have to manually add the c" [integration/docroot] - https://gerrit.wikimedia.org/r/311345 (https://phabricator.wikimedia.org/T109747) (owner: Paladox)
[20:04:13] legoktm hmm, for some reason I can't autocomplete your name here
[20:04:21] o.O
[20:04:36] my client was having issues after the netsplits and I had to part/rejoin every channel
[20:04:44] heh
[20:04:49] i can now
[20:04:51] PROBLEM - Puppet run on deployment-mediawiki04 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[20:05:17] legoktm: puppet i srunning now
[20:05:18] *is
[20:05:53] PROBLEM - Puppet run on deployment-redis01 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[20:06:08] legoktm: I guess ^ are all unrelated?
[20:07:09] deployment is unrelated...
[20:07:26] yeah ok
[20:07:57] PROBLEM - Puppet run on deployment-memc05 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[20:08:45] PROBLEM - Puppet run on deployment-ms-fe01 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[20:08:46] !log reset puppetmaster of integration-puppetmaster01 to be labs puppetmaster
[20:08:49] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[20:09:12] yuvipanda: legoktm: for CI make sure to get the cherry picks on puppet.git
[20:09:38] specially that one: * 7688f83 (DO NOT SUBMIT) contint: pin firefox to 46 on Trusty
[20:09:42] hasharAway: cherry picks are no longer supported, you will lose them all when we do this migration
[20:10:08] (just kidding)
[20:10:16] ;D
[20:10:34] legoktm: ok, the puppetmaster is up
[20:10:37] hasharAway: yep, that's on our list :)
[20:10:39] be careful with the oldies like me, we might well suffer from a heart attack!
[20:10:41] legoktm: do you have a list of cherry picks already?
[20:10:55] the firefox pinning to v46 might well get resolved now. But havent looked at it yet
[20:11:06] no, give me a minute
[20:11:13] hasharAway i believe that was fixed in a recent firefox update
[20:11:18] all patches should be retrievable from Gerrit based on the Change-Id:
[20:11:22] legoktm: ok
[20:11:27] hasharAway: go sleep!
[20:11:31] i read somewhere about it fixing a driver to do with something i forgot now
[20:11:36] oh nice, only 3 cherry-picks
[20:11:37] f435b59 contint: role for Android testing
[20:11:37] 7688f83 (DO NOT SUBMIT) contint: pin firefox to 46 on Trusty
[20:11:37] 7ca33f5 ci: Role for running Raita
[20:12:03] legoktm: nice
[20:12:13] legoktm: I guess we need to pull those from gerrit somehow
[20:12:39] I already created patch files, putting them on the new puppetmaster in a minute
[20:13:04] legoktm: awesome
[20:13:50] yuvipanda: ok, they're in my home dir. Should I apply them to the git checkout now?
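Pulling a cherry-pick back from Gerrit works because every patchset is published under a refs/changes/ ref derived from the change number; a sketch, with placeholder change and patchset numbers:

```shell
# Gerrit exposes every patchset at refs/changes/<NN>/<change>/<patchset>,
# where <NN> is the change number's last two digits, zero-padded.
change_ref() {
  printf 'refs/changes/%02d/%d/%d\n' "$(( $1 % 100 ))" "$1" "$2"
}

ref=$(change_ref 311408 1)   # placeholder change/patchset numbers
echo "$ref"
# In the puppet.git checkout on the puppetmaster (not run here):
#   git fetch https://gerrit.wikimedia.org/r/operations/puppet "$ref"
#   git cherry-pick FETCH_HEAD
```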
[20:14:03] legoktm: yup
[20:14:06] same place as before
[20:14:36] PROBLEM - Puppet run on deployment-redis02 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[20:14:50] yuvipanda: done
[20:15:28] legoktm: ok
[20:15:33] legoktm: which instance do you wanna switch?
[20:15:54] PROBLEM - Puppet run on deployment-logstash2 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[20:16:56] uh, integration-slave-trusty-1001
[20:17:00] legoktm: ok
[20:17:05] (PS1) Paladox: [mediawiki/extensions] Add noop jenkins test [integration/config] - https://gerrit.wikimedia.org/r/311497
[20:18:29] PROBLEM - Puppet run on deployment-mediawiki05 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[20:18:49] PROBLEM - Puppet run on deployment-kafka04 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[20:19:17] legoktm: https://wikitech.wikimedia.org/wiki/Hiera:Integration/host/integration-slave-trusty-1001
[20:19:39] and it'll automatically transition?
[20:19:45] :D
[20:20:07] legoktm: going to find out ;)
[20:20:11] I suppose we need to re-enable puppet there?
[20:20:12] ok
[20:20:27] !log re-enabled puppet on integration-slave-trusty-1001
[20:20:30] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[20:21:15] (PS2) Paladox: [mediawiki/extensions] Add noop jenkins test [integration/config] - https://gerrit.wikimedia.org/r/311497
[20:24:18] yuvipanda: do we just wait now...?
[20:24:37] legoktm: I forced a puppet run, am waiting
[20:24:44] :D
[20:25:02] !log delete /etc/puppet/puppet.conf.d/10-self.conf and /var/lib/puppet/ssl on integration-slave-trusty-1001
[20:25:06] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[20:25:43] legoktm: seems ok. test?
[20:26:21] like...run a jenkins job on it?
[20:26:26] legoktm: idk
[20:26:30] hmm
[20:26:42] something to see it isn't totally utterly broken?
[20:27:35] * legoktm hacks something up
[20:27:59] if it works, here's my plan
[20:28:03] oh, jenkins already started running a job
[20:28:12] Continuous-Integration-Infrastructure, Zuul: Fix Zuul package "postinst called with unknown argument `triggered' - https://phabricator.wikimedia.org/T146084#2650129 (hashar)
[20:28:16] copy /etc/puppet/puppet.conf from that node to all other nodes
[20:28:20] rm /etc/puppet/puppet.conf.d/10-self.conf from them all
[20:28:25] and same for /var/lib/puppet/ssl
[20:28:30] PROBLEM - Puppet run on integration-slave-trusty-1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [0.0]
[20:28:30] and that should make them all automatically work
[20:28:40] uh ^
[20:28:53] legoktm: was the transient first failure before I rm'd /var/lib/puppet/ssl
[20:28:54] works now
[20:29:00] these lag by several minutes
[20:29:36] yuvipanda: ok, that slave looks fine
[20:29:42] ok!
[20:30:06] legoktm: ok, so I'm going to do part 1 now (copy puppet.conf to all instances)
[20:31:52] ok
[20:32:07] we have a working saltmaster thing btw, not sure how you were planning to do it
[20:32:42] legoktm: oh, I've been using clush
[20:32:50] a lot nicer than salt
[20:33:49] legoktm: ok, if we enable puppet again now, it should all 'just work'
[20:33:53] let's try
[20:33:55] ok :D
[20:34:08] !log copied /etc/puppet/puppet.conf from integration-trusty-slave-1001 to all integration
[20:34:11] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[20:34:14] Continuous-Integration-Infrastructure, Zuul: Fix Zuul package "postinst called with unknown argument `triggered' - https://phabricator.wikimedia.org/T146084#2650160 (hashar)
[20:34:17] !log rm -rf /var/lib/puppet/ssl on all integration nodes
[20:34:21] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[20:34:49] I am surprised scap does not have that already
[20:34:51] RECOVERY - Puppet run on deployment-mediawiki04 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:34:54] scap puppet recycle
[20:35:36] yuvipanda: legoktm thank you very much for taking care of the switch to Jessie
[20:36:09] legoktm: hang on, something is slightly fucked up
[20:36:16] don't enable puppet anywhere
[20:36:21] uh ok
[20:38:33] RECOVERY - Puppet run on integration-slave-trusty-1001 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:40:51] RECOVERY - Puppet run on deployment-redis01 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:41:13] PROBLEM - Puppet run on integration-puppetmaster01 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[20:41:14] legoktm: sorted out
[20:41:57] !log accidentally deleted /var/lib/puppet/ssl on integration-puppetmaster01 as well, causing it to lose keys. Reprovision by pointing to labs puppetmaster
[20:42:00] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[20:43:11] !log enable puppet and run on integration-slave-trusty-1003.eqiad.wmflabs
[20:43:15] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[20:43:42] RECOVERY - Puppet run on deployment-ms-fe01 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:44:32] heh oops
[20:44:32] legoktm: works fine now
[20:45:04] legoktm: wanna re-enable one by one?
[20:45:11] or shall I just mass re-enable? :D
[20:46:27] !log re-enable puppet everywhere
[20:46:31] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[20:47:31] legoktm: ok, I think it'll just run puppet on schedule now and we can watch for failures
[20:47:45] legoktm: also right now the puppetmaster itself is on labs puppet. I'm guessing we should change that
[20:47:56] RECOVERY - Puppet run on deployment-memc05 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:48:08] sorry, was getting more food
[20:48:33] mass-enable sounds good :D
[20:48:53] hmm, what was the old one set up to do?
[20:49:47] Release-Engineering-Team, Editing-Department, Monitoring, Operations, Wikimedia-Incident: High failure rate of account creation should trigger an alarm / page people - https://phabricator.wikimedia.org/T146090#2650309 (hashar)
[20:50:29] Release-Engineering-Team, Monitoring, Operations, Patch-For-Review, Wikimedia-Incident: Monitoring and alerts for "business" metrics - https://phabricator.wikimedia.org/T140942#2481685 (hashar) Account creation got broken entirely for 18 hours last week despite metrics being available. I have...
[20:51:14] RECOVERY - Puppet run on integration-puppetmaster01 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:51:57] legoktm: the old one is its own puppetmaster
[20:53:42] I guess that makes sense?
[20:53:44] legoktm: I'm going to go afk to eat now
[20:53:49] RECOVERY - Puppet run on deployment-kafka04 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:53:52] legoktm: yeah, I agree. let's try that too
[20:53:58] I'll check that after lunch
[20:54:10] legoktm: call me if anything goes wrong within the next 30 mins, won't be checking IRC
[20:54:27] ok :)
[20:54:28] legoktm: but you are now completely free of the terrible role::puppet::self :)
[20:54:34] woo!
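The copy / rm plan above, driven through clush as described, would look roughly like this; the `@integration` node group is an assumption about the local clush configuration, and the commands are a sketch rather than the exact ones that were run:

```shell
# Push the known-good puppet.conf from the already-migrated slave to every
# node, drop the role::puppet::self leftovers, then let agents re-request
# certificates from the new standalone puppetmaster.
NODES='@integration'   # hypothetical clush group covering the project
clush -w "$NODES" --copy /etc/puppet/puppet.conf --dest /etc/puppet/puppet.conf
clush -w "$NODES" 'rm -f /etc/puppet/puppet.conf.d/10-self.conf'
clush -w "$NODES" 'rm -rf /var/lib/puppet/ssl'
clush -w "$NODES" 'puppet agent --enable && puppet agent --test'
```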
[20:54:37] RECOVERY - Puppet run on deployment-redis02 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:54:37] first project to be free of it even
[20:54:47] so your puppetmaster code is the same as prod/labs puppetmasters
[20:54:48] same for client
[20:54:54] not a bastardized copy pasta version
[20:55:53] RECOVERY - Puppet run on deployment-logstash2 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:57:46] yay :D
[20:58:28] RECOVERY - Puppet run on deployment-mediawiki05 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:03:30] Jenkins is still not happy .. https://phabricator.wikimedia.org/T146019#2650353 /cc arlolra
[21:09:43] PROBLEM - Puppet run on integration-slave-precise-1011 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[21:19:09] 10Beta-Cluster-Infrastructure, 03Scap3: Fixup beta scap3 keyholder problems - https://phabricator.wikimedia.org/T144647#2650420 (10hashar) That is a neat trick! And indeed given a complete list of hostnames it is quite trivial to grab the keys. I am hereby blaming everyone above to eventually have forced me...
[21:22:15] PROBLEM - Puppet run on integration-puppetmaster01 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[21:25:46] ^ is me
[21:29:15] PROBLEM - Puppet run on integration-puppetmaster is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[21:29:46] !log regenerated client certs only on integration-puppetmaster01, seems ok now
[21:29:50] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[21:29:53] legoktm: ok, I think we can call this done now
[21:32:42] PROBLEM - Puppet run on integration-slave-jessie-1004 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[21:33:20] hmmm
[21:33:28] yuvipanda: shinken lags behind
[21:33:33] pretty sure about that
[21:33:47] so might want to come back in like 10 - 15 minutes and let it settle
[21:33:56] (or hook to instance to confirm)
[21:35:04] I just did on 3 instances
[21:35:06] all good
[21:35:50] yuvipanda: great! :]
[21:35:54] thank you!
[21:37:08] yw! now to write documentation
[21:37:43] 06Release-Engineering-Team (Deployment-Blockers), 10MediaWiki-extensions-WikibaseRepository, 10Wikidata, 07Beta-Cluster-reproducible, and 7 others: Jobs invoking SiteConfiguration::getConfig cause HHVM to fail updating the bytecode cache due to being filesi... - https://phabricator.wikimedia.org/T145819#2650487
[21:38:19] krenair we should schedule some time to move deployment-prep over as well
[21:38:59] yuvipanda, move puppetmasters?
[21:41:16] yeah
[21:42:14] RECOVERY - Puppet run on integration-puppetmaster01 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:47:45] RECOVERY - Puppet run on integration-slave-jessie-1004 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:50:45] yuvipanda, do we need to schedule stuff like that?
[21:51:05] Krenair: we don't, just need someone familiar with beta to be around for an hour or so when we do it
[21:51:19] and also to let people know, in case they are in the middle of cherry picking stuff while this is going on
[21:54:39] yuvipanda, well, I'm familiar with beta
[21:55:15] and no one is logged into the puppetmaster (except me, I just logged in to see if anyone else was)
[21:56:07] Krenair: hmm, do you have headroom to create another puppetmaster?
[21:56:18] can be m1.small right?
[21:57:18] oh, existing one is medium
[21:57:47] Krenair: should probably be medium then yeah
[21:58:13] I don't know where we are in terms of quotas because we just got a couple of bumps but haven't made the instances we requested those for yet
[21:59:06] Yippee, build fixed!
[21:59:07] Project selenium-PageTriage » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #150: 09FIXED in 1 min 5 sec: https://integration.wikimedia.org/ci/job/selenium-PageTriage/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/150/
[21:59:28] PROBLEM - Puppet run on deployment-sca02 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[21:59:55] yuvipanda: I'm gonna shutdown the old puppetmaster, and then make a reminder for myself to delete it in a week...does that sound good?
[22:00:12] and I'll email qa@
[22:00:23] legoktm: yup
[22:01:11] !log shutdown integration-puppetmaster
[22:01:15] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[22:02:59] legoktm: the only material difference is that 'restarting the puppetmaster' is now 'service apache2 restart' rather than 'service puppetmaster restart'
[22:03:58] ok
[22:04:03] I'll mention that in my email
[22:05:15] yuvipanda, yeah we'll need another quota bump for that
[22:13:53] Krenair: yeah ouch.
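[Editor's note: the operational difference mentioned at 22:02:59 can be sketched as below. This is a hedged illustration, not from the log: it assumes the new standalone puppetmaster serves requests from inside Apache (commonly via Passenger on puppetmasters of this vintage), so there is no longer a separate puppetmaster service to restart.]

```shell
# Old self-hosted setup (role::puppet::self): the master was its own service.
#   sudo service puppetmaster restart
# New standalone puppetmaster: the master runs under Apache (assumed:
# via Passenger), so "restarting the puppetmaster" is an Apache restart.
sudo service apache2 restart
```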
[22:13:54] ok
[22:13:58] Krenair: I wrote up docs https://wikitech.wikimedia.org/wiki/Standalone_puppetmaster
[22:14:21] I mean, we could right now
[22:14:22] legoktm: another thing I just realized is that there is now a manual step when you set up a new instance
[22:14:31] you need to do 'rm -rf /var/lib/puppet/ssl'
[22:14:32] But then I'd be using quota space allocated for something entirely different
[22:14:45] yuvipanda: hmm, we should document this somewhere
[22:14:46] And we probably can't just terminate a puppetmaster without leaving it shutdown for a week first
[22:14:50] Krenair: well, if we can delete the old puppetmaster by the end
[22:14:54] legoktm: https://wikitech.wikimedia.org/wiki/Standalone_puppetmaster
[22:14:58] oh xD
[22:25:04] 06Release-Engineering-Team (Deployment-Blockers), 13Patch-For-Review, 05Release: MW-1.28.0-wmf.19 deployment blockers - https://phabricator.wikimedia.org/T143328#2650712 (10Matanya)
[22:34:27] RECOVERY - Puppet run on deployment-sca02 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:36:38] 06Release-Engineering-Team, 06Editing-Department, 10Monitoring, 06Operations, 07Wikimedia-Incident: High failure rate of account creation should trigger an alarm / page people - https://phabricator.wikimedia.org/T146090#2650309 (10Tgr) We might want separate api and non-api metrics since they have differ...
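[Editor's note: the manual step yuvipanda describes at 22:14:22/22:14:31 can be sketched as below. Only the `rm -rf /var/lib/puppet/ssl` command comes from the log; the follow-up agent run is an assumption about how a fresh instance would obtain a new client cert from the project's standalone puppetmaster.]

```shell
# On a newly created instance: drop the SSL material generated against
# the labs puppetmaster (this is the step from the log)...
sudo rm -rf /var/lib/puppet/ssl
# ...then (assumed) re-run the agent so it requests a fresh client cert
# from the project's standalone puppetmaster.
sudo puppet agent --test
```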
[22:40:23] 06Release-Engineering-Team, 10Monitoring, 06Operations, 13Patch-For-Review, 07Wikimedia-Incident: Monitoring and alerts for "business" metrics - https://phabricator.wikimedia.org/T140942#2650747 (10greg)
[22:43:55] 06Release-Engineering-Team, 10Monitoring, 06Operations, 07Wikimedia-Incident: Monitoring and alerts for "business" metrics - https://phabricator.wikimedia.org/T140942#2650768 (10greg)
[23:01:16] 06Release-Engineering-Team (Deployment-Blockers), 10MediaWiki-extensions-WikibaseRepository, 10Wikidata, 07Beta-Cluster-reproducible, and 7 others: Jobs invoking SiteConfiguration::getConfig cause HHVM to fail updating the bytecode cache due to being filesi... - https://phabricator.wikimedia.org/T145819#2650834
[23:46:18] 06Release-Engineering-Team (Deployment-Blockers), 13Patch-For-Review, 05Release: MW-1.28.0-wmf.19 deployment blockers - https://phabricator.wikimedia.org/T143328#2564761 (10thcipriani) I briefly moved group0 to wmf.19 and saw a large spike in the overall fatalmonitor error-rate. I realized that the appserver...
[23:46:36] 06Release-Engineering-Team (Deployment-Blockers), 13Patch-For-Review, 05Release: MW-1.28.0-wmf.19 deployment blockers - https://phabricator.wikimedia.org/T143328#2651007 (10thcipriani) a:05hashar>03thcipriani
[23:57:28] 06Release-Engineering-Team (Deployment-Blockers), 10MediaWiki-extensions-WikibaseRepository, 10Wikidata, 07Beta-Cluster-reproducible, and 7 others: Jobs invoking SiteConfiguration::getConfig cause HHVM to fail updating the bytecode cache due to being filesi... - https://phabricator.wikimedia.org/T145819#2651025