[00:46:00] <grrrit-wm>	 (03CR) 10Krinkle: [C: 04-1] "The bootstrap change will need careful review to ensure the layout has no breaking changed that require further updates to our HTML conten" (032 comments) [integration/docroot] - 10https://gerrit.wikimedia.org/r/311345 (https://phabricator.wikimedia.org/T109747) (owner: 10Paladox)
[01:07:19] <shinken-wm>	 PROBLEM - Puppet run on deployment-eventlogging04 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[01:12:49] <wikibugs>	 10Deployment-Systems, 06Operations, 13Patch-For-Review: Make l10nupdate user a system user - https://phabricator.wikimedia.org/T120585#2651178 (10Dzahn) The last comment "Definitely better than hardcoding uids in the puppet tree." sounds like this ticket might be rejected?
[01:17:46] <wikibugs>	 10Gerrit, 06Repository-Ownership-Requests, 10grrrit-wm: Give paladox a +2 in labs/tools/grrrit - https://phabricator.wikimedia.org/T145416#2629393 (10Legoktm) @yuvipanda, could you comment that you're okay with this?
[01:19:06] <wikibugs>	 10Gerrit, 06Repository-Ownership-Requests, 10grrrit-wm: Give paladox a +2 in labs/tools/grrrit - https://phabricator.wikimedia.org/T145416#2651189 (10yuvipanda) I'm not a maintainer anymore :)
[01:31:52] <wikibugs>	 10Gerrit, 06Repository-Ownership-Requests, 10grrrit-wm: Give paladox a +2 in labs/tools/grrrit - https://phabricator.wikimedia.org/T145416#2651199 (10Legoktm) 05Open>03Resolved a:03Legoktm Alright then, done. Now that I re-read the request this seems uncontroversial to me since he already can deploy ch...
[01:42:18] <shinken-wm>	 RECOVERY - Puppet run on deployment-eventlogging04 is OK: OK: Less than 1.00% above the threshold [0.0]
[01:55:37] <shinken-wm>	 PROBLEM - Puppet run on deployment-kafka05 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[01:55:40] <wikibugs>	 10Continuous-Integration-Infrastructure: Move integration-puppetmaster off of precise (probably to jessie) - https://phabricator.wikimedia.org/T144951#2651207 (10Legoktm) 05Open>03Resolved a:03Legoktm https://lists.wikimedia.org/pipermail/qa/2016-September/002552.html  I will delete integration-puppetmaste...
[02:30:36] <shinken-wm>	 RECOVERY - Puppet run on deployment-kafka05 is OK: OK: Less than 1.00% above the threshold [0.0]
[02:34:45] <shinken-wm>	 PROBLEM - Puppet run on deployment-ms-fe01 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[03:11:17] <wmf-insecte>	 Project mediawiki-core-code-coverage build #2273: 04STILL FAILING in 11 min: https://integration.wikimedia.org/ci/job/mediawiki-core-code-coverage/2273/
[03:14:43] <shinken-wm>	 RECOVERY - Puppet run on deployment-ms-fe01 is OK: OK: Less than 1.00% above the threshold [0.0]
[03:26:40] <shinken-wm>	 PROBLEM - Puppet run on deployment-kafka05 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[04:01:37] <shinken-wm>	 RECOVERY - Puppet run on deployment-kafka05 is OK: OK: Less than 1.00% above the threshold [0.0]
[05:05:43] <shinken-wm>	 PROBLEM - Puppet run on deployment-ms-fe01 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[05:34:54] <shinken-wm>	 PROBLEM - Puppet staleness on deployment-db03 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [43200.0]
[05:45:43] <shinken-wm>	 RECOVERY - Puppet run on deployment-ms-fe01 is OK: OK: Less than 1.00% above the threshold [0.0]
[06:35:32] <wikibugs>	 10Gerrit, 06Repository-Ownership-Requests, 10grrrit-wm: Give paladox a +2 in labs/tools/grrrit - https://phabricator.wikimedia.org/T145416#2651409 (10Paladox) Thank you :)
[07:00:30] <shinken-wm>	 PROBLEM - Puppet run on deployment-sca02 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[07:35:29] <shinken-wm>	 RECOVERY - Puppet run on deployment-sca02 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:13:40] <wikibugs>	 06Release-Engineering-Team, 10Monitoring, 06Operations, 07Wikimedia-Incident: Monitoring and alerts for "business" metrics - https://phabricator.wikimedia.org/T140942#2651497 (10hashar)
[08:14:21] <wikibugs>	 10Browser-Tests-Infrastructure, 10Fundraising-Backlog, 10MediaWiki-extensions-CentralNotice, 03Fundraising Sprint Rocket Surgery 2016, 15User-zeljkofilipin: CentralNotice: Intermittent unexplained browser test failures - https://phabricator.wikimedia.org/T145718#2651498 (10zeljkofilipin) Only three faile...
[08:27:15] <wikibugs>	 06Release-Engineering-Team, 06Editing-Department, 10Monitoring, 06Operations, 07Wikimedia-Incident: High failure rate of account creation should trigger an alarm / page people - https://phabricator.wikimedia.org/T146090#2651523 (10hashar) The graph from https://grafana.wikimedia.org/dashboard/db/authenti...
[08:33:30] <wikibugs>	 06Release-Engineering-Team, 10Monitoring, 06Operations, 06Performance-Team, 07Wikimedia-Incident: MediaWiki load time regression should trigger an alarm / page people - https://phabricator.wikimedia.org/T146125#2651529 (10hashar)
[08:34:03] <wikibugs>	 06Release-Engineering-Team (Deployment-Blockers), 13Patch-For-Review, 05Release: MW-1.28.0-wmf.19 deployment blockers - https://phabricator.wikimedia.org/T143328#2564761 (10hashar) From T146099 > load times increased by approx 600ms: > {F4488077}  That got introduced by wmf.18 and left unnoticed for almost 1...
[08:34:32] <wikibugs>	 06Release-Engineering-Team, 10Monitoring, 06Operations, 07Tracking, 07Wikimedia-Incident: Tracking: Monitoring and alerts for "business" metrics - https://phabricator.wikimedia.org/T140942#2481685 (10hashar)
[08:36:55] <wikibugs>	 06Release-Engineering-Team, 10Monitoring, 06Operations, 06Performance-Team, 07Wikimedia-Incident: MediaWiki load time regression should trigger an alarm / page people - https://phabricator.wikimedia.org/T146125#2651529 (10hashar)
[08:39:25] <grrrit-wm>	 (03PS1) 10Nschaaf: Add research/recommendation-api to integration [integration/config] - 10https://gerrit.wikimedia.org/r/311667 (https://phabricator.wikimedia.org/T146057) 
[08:39:32] <wikibugs>	 10Continuous-Integration-Config, 06Wikipedia-Android-App-Backlog, 13Patch-For-Review: [Dev] Fix periodic tests - https://phabricator.wikimedia.org/T139137#2651585 (10hashar) @Mholloway definitely aced this :]  Well done!
[08:43:30] <wikibugs>	 06Release-Engineering-Team, 06Editing-Department, 10Monitoring, 06Operations, 07Wikimedia-Incident: High failure rate of account creation should trigger an alarm / page people - https://phabricator.wikimedia.org/T146090#2651615 (10Tgr) centrallogin is not interesting, it can be added to web or just ignor...
[08:47:47] <wikibugs>	 10Browser-Tests-Infrastructure, 10Fundraising-Backlog, 10MediaWiki-extensions-CentralNotice, 03Fundraising Sprint Rocket Surgery 2016, 15User-zeljkofilipin: CentralNotice: Intermittent unexplained browser test failures - https://phabricator.wikimedia.org/T145718#2651635 (10zeljkofilipin)
[09:02:02] <wikibugs>	 10Browser-Tests-Infrastructure, 10Fundraising-Backlog, 10MediaWiki-extensions-CentralNotice, 03Fundraising Sprint Rocket Surgery 2016, 15User-zeljkofilipin: CentralNotice: Intermittent unexplained browser test failures - https://phabricator.wikimedia.org/T145718#2651654 (10zeljkofilipin) All three failur...
[09:26:08] <wikibugs>	 10Beta-Cluster-Infrastructure, 06Operations, 07HHVM, 13Patch-For-Review: Move the MW Beta appservers to Debian - https://phabricator.wikimedia.org/T144006#2651679 (10elukey) As @AlexMonk-WMF reported, we caused an issue when dealing with restbase configs: https://phabricator.wikimedia.org/T146053  As follo...
[09:47:57] <elukey>	 hashar: --^
[10:13:00] <hashar>	 elukey: hello :)
[10:13:06] <elukey>	 o/ o/
[10:13:17] <elukey>	 sorry I forgot the client hello part :D
[10:13:19] <hashar>	 so yeah restbase points to some deployment-mw host
[10:13:31] <hashar>	 and puppet does not restart the restbase service on configuration change
[10:13:35] <hashar>	 (intentionally)
[10:14:02] <hashar>	 as I rember it
[10:14:04] <hashar>	 remember it
[10:15:33] <elukey>	 oh yes I wasn't blaming anybody but me for the outage, only suggesting the "tribal knowledge" page
[10:15:42] <elukey>	 if it is not there already
[10:15:43] <hashar>	 well it is hard to catch really
[10:16:17] <hashar>	 on other news mira02.deployment-prep.eqiad.wmflabs needs more disk :(
[10:34:01] <moritzm>	 hashar: since we now have sufficient quota again, I'll simply "reimage" mira02 with a new instance (and also change the name since people seem to prefer the redundant form :-)
[10:34:18] <moritzm>	 that way it's also a clean rebuiild
[10:34:44] <wikibugs>	 10Browser-Tests-Infrastructure, 13Patch-For-Review, 15User-zeljkofilipin: mediawiki_selenium feature to show/capture Selenium WebDriver requests to remote browser. - https://phabricator.wikimedia.org/T94577#2651805 (10zeljkofilipin) @Jhernandez, @hashar:  Looks like this works great for Chrome, but it is not...
[10:35:28] <hashar>	 moritzm: sounds good
[10:35:41] <hashar>	 and I am 99 sure it is going to recreate all fine
[10:35:47] <hashar>	 then we can dish out the mira one
[10:36:15] <moritzm>	 yep, will start in half an hour or so
[10:39:57] <hashar>	 elukey: do you remember about my Debian packages building not including orig.tar.gz in .changes?  
[10:40:15] <hashar>	 I got the issue for another package (Nodepool)   and I am pretty sure I shared a link to debian policy but cant find it anymore :(
[10:40:36] <hashar>	 I tried instructing git buildpackage to pass -sa  to force orig.tar.gz to be included in the .changes but without any result :(
[10:42:48] <elukey>	 mmm IIRC you shared to me a link with a contact us from random people explaining the issue..
[10:42:55] <elukey>	 was it the one that you are looking?
[10:43:02] <hashar>	 yeah maybe :]
[10:43:22] <hashar>	 I want to fix that once for good
[10:45:15] <elukey>	 !log created deployment-jobrunner02 in deployment-prep
[10:45:19] <qa-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[10:45:55] <elukey>	 hashar: https://www.logilab.org/ticket/22071
[10:46:04] <hashar>	 you are awesome! 
[10:48:34] <elukey>	 say thank you to my Chrome history :P
[10:48:58] <wikibugs>	 10Continuous-Integration-Infrastructure, 13Patch-For-Review, 07Zuul: Get the Nodepool/Zuul debian packaging to pass -sa so orig.tar.gz is always in the .changes file - https://phabricator.wikimedia.org/T145797#2651824 (10hashar)
[11:04:23] <wikibugs>	 10Continuous-Integration-Infrastructure, 13Patch-For-Review, 07Zuul: Get the Nodepool/Zuul debian packaging to pass -sa so orig.tar.gz is always in the .changes file - https://phabricator.wikimedia.org/T145797#2651860 (10hashar) I have updated the task details, but apparently not including the original tarba...
[11:05:24] <hashar>	 I found the culpirt code  " dpkg-genchanges  "
[11:05:32] <hashar>	 now gotta figure out how to pass it -sa :D
[11:08:00] <hashar>	 (reads its 6th man page in a raw)
[11:13:49] <elukey>	 hashar: when you have time https://gerrit.wikimedia.org/r/#/c/311681/1
[11:13:57] <wikibugs>	 10Browser-Tests-Infrastructure, 10Continuous-Integration-Config, 07Upstream, 15User-zeljkofilipin: Firefox v47 breaks mediawiki_selenium - https://phabricator.wikimedia.org/T137561#2651868 (10zeljkofilipin) I am not sure what needs to be done here to resolved the task. Firefox 47.0.1 is released and workin...
[11:14:08] <hashar>	 (trying to figure out how to have debian/gpb.conf to pass -sa reliably :D )
[11:14:26] <hashar>	 elukey: yeah be bold :]
[11:14:57] <hashar>	 elukey: the jobrunner to Jessie should be fine since the prod one got moved back in July.  I am more worried about deployment-tmh01  which deal with video scaling
[11:15:28] <elukey>	 ah yeah that one is still pending for production too
[11:15:51] <elukey>	 but we can survive for a while with a few ubuntu trustys rather than tons of them :D
[11:16:02] <hashar>	 guess whoever takes care of that for prod can spawn a Jessie videoscaler on beta
[11:16:07] <hashar>	 and catch up issues / test on beta
[11:20:19] <elukey>	 !log applied beta::deployaccess, role::labs::lvm::srv, role::mediawiki::jobrunner to jobrunner02
[11:20:23] <qa-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[11:22:16] <shinken-wm>	 PROBLEM - Puppet run on deployment-ores-redis is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[11:26:18] <wmf-insecte>	 Project beta-scap-eqiad build #120853: 04FAILURE in 1 min 44 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/120853/
[11:36:16] <wmf-insecte>	 Project beta-scap-eqiad build #120854: 04STILL FAILING in 1 min 45 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/120854/
[11:41:10] <wmf-insecte>	 Project beta-scap-eqiad build #120855: 04STILL FAILING in 1 min 48 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/120855/
[11:42:03] <grrrit-wm>	 (03Abandoned) 10Hashar: Pass option -sa to git-pbuilder [integration/zuul] (debian/precise-wikimedia) - 10https://gerrit.wikimedia.org/r/310959 (https://phabricator.wikimedia.org/T145797) (owner: 10Paladox)
[11:42:38] <hashar>	 HURRAH FIXED !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
[11:42:57] <hashar>	 (and it only took me an hour to figure it out)
[11:46:12] <wmf-insecte>	 Project beta-scap-eqiad build #120856: 04STILL FAILING in 1 min 41 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/120856/
[11:46:37] <shinken-wm>	 PROBLEM - Puppet run on deployment-jobrunner02 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[11:52:03] <elukey>	 hashar: can you try to deploy again to jobrunner02?
[11:52:07] <elukey>	 it should be ok now
[11:52:15] <elukey>	 not sure about the puppet run since it worked
[11:54:10] <wikibugs>	 10Continuous-Integration-Infrastructure, 13Patch-For-Review, 07Zuul: Get the Nodepool/Zuul debian packaging to pass -sa so orig.tar.gz is always in the .changes file - https://phabricator.wikimedia.org/T145797#2651885 (10hashar) So after much trial and errors, reading 6 or 7 different man pages and digging i...
[11:54:56] <hashar>	 elukey: you can login in Jenkins with your wmflabs account from https://integration.wikimedia.org/ci/job/beta-scap-eqiad/
[11:55:03] <hashar>	 elukey: then hit the "rebuild" link on the left
[11:55:11] * elukey learns
[11:55:36] <hashar>	 or eventually the job that runs every 10 minutes or so to pull new code will end up triggering the beta-scap-eqiad job
[11:55:42] <hashar>	 which it actually did already !
[11:55:50] <hashar>	 https://integration.wikimedia.org/ci/job/beta-scap-eqiad/120857/
[11:56:00] <hashar>	 it shows:  Started by upstream project beta-code-update-eqiad build number 122288
[11:56:15] <wmf-insecte>	 Project beta-scap-eqiad build #120857: 04STILL FAILING in 1 min 44 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/120857/
[11:56:20] <hashar>	 I guess it ran too late :]
[11:56:21] <elukey>	 weird
[11:56:25] <elukey>	 ahhh okok
[11:56:29] <elukey>	 let me re-run it
[11:56:38] <shinken-wm>	 RECOVERY - Puppet run on deployment-jobrunner02 is OK: OK: Less than 1.00% above the threshold [0.0]
[11:57:02] <elukey>	 thanks
[11:58:20] <wmf-insecte>	 Project beta-scap-eqiad build #120858: 04STILL FAILING in 1 min 40 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/120858/
[11:58:28] <elukey>	 grrr
[11:59:14] <grrrit-wm>	 (03PS2) 10Zfilipin: WIP Marionette [selenium] - 10https://gerrit.wikimedia.org/r/310286 (https://phabricator.wikimedia.org/T137540) 
[11:59:52] * elukey realized that he accepted the key on puppet master 
[11:59:59] * elukey cries a bit
[12:00:40] <wikibugs>	 10Beta-Cluster-Infrastructure, 06Operations, 07HHVM, 13Patch-For-Review: Move the MW Beta appservers to Debian - https://phabricator.wikimedia.org/T144006#2651889 (10AlexMonk-WMF) >>! In T144006#2651679, @elukey wrote: > As @AlexMonk-WMF reported, we caused an issue when dealing with restbase configs: http...
[12:01:27] <grrrit-wm>	 (03CR) 10jenkins-bot: [V: 04-1] WIP Marionette [selenium] - 10https://gerrit.wikimedia.org/r/310286 (https://phabricator.wikimedia.org/T137540) (owner: 10Zfilipin)
[12:01:56] <wmf-insecte>	 Yippee, build fixed!
[12:01:57] <wmf-insecte>	 Project beta-scap-eqiad build #120859: 09FIXED in 1 min 42 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/120859/
[12:02:14] <shinken-wm>	 RECOVERY - Puppet run on deployment-ores-redis is OK: OK: Less than 1.00% above the threshold [0.0]
[12:03:27] <wikibugs>	 10Beta-Cluster-Infrastructure, 06Operations, 07HHVM, 13Patch-For-Review: Move the MW Beta appservers to Debian - https://phabricator.wikimedia.org/T144006#2651892 (10elukey) >>! In T144006#2651889, @AlexMonk-WMF wrote: >>>! In T144006#2651679, @elukey wrote: >> As @AlexMonk-WMF reported, we caused an issue...
[12:04:21] <elukey>	 goooood deployment fixed
[12:04:27] <elukey>	 the jobrunner seems working \o/
[12:06:02] <Krenair>	 hey elukey w
[12:06:10] <Krenair>	 what's the status of the migrations to jessie?
[12:06:24] <elukey>	 o/
[12:06:42] <elukey>	 the videoscaler is the only one left, but we are still working on it in production
[12:06:51] <elukey>	 so it might wait a bit
[12:06:56] <Krenair>	 have the old instances been terminated?
[12:07:20] <elukey>	 only the trusty jobrunner is still active, I'll kill it today if nothing comes up
[12:07:49] <Krenair>	 so we have room in the quota to create a new instance without using up the quota increase you got for that?
[12:08:20] <wikibugs>	 10Continuous-Integration-Infrastructure, 13Patch-For-Review, 07Zuul: Get the Nodepool/Zuul debian packaging to pass -sa so orig.tar.gz is always in the .changes file - https://phabricator.wikimedia.org/T145797#2651911 (10hashar) a:03hashar Definitely solved on my local installation by adjusting my `~/.gbp....
[12:08:52] <elukey>	 so the quota increase was needed since I had to replace one by one old/new instances and I didn't want it to do for the jobrunner since it was alone
[12:09:10] <elukey>	 plus godog needs to room for his prometheus instances
[12:09:18] <Krenair>	 prometheus was a separate ticket
[12:10:07] <Krenair>	 I ask because yuvipanda and I are going to swap the puppetmaster soon
[12:10:07] <elukey>	 I think he can use the extra quota for it?
[12:10:13] <Krenair>	 yes
[12:10:43] <Krenair>	 but until he does use it, we have to make sure we don't accidentally use it for something else
[12:11:36] <paladox__>	 hashar hi, thankyou for reasearching on how to include the original source today +1 :)
[12:11:39] <elukey>	 from my side I will not need more instances, and moritzm is going to replace mira in place IIRC
[12:12:01] <elukey>	 the videoscaler is the only question mark but not sure when it will happen
[12:12:04] <elukey>	 we can coordinate
[12:12:37] <moritzm>	 should be no problem, the old mira instance can also go away after more tests
[12:13:29] <wikibugs>	 10Continuous-Integration-Infrastructure, 06Labs, 06Operations, 07Nodepool: Upgrade Nodepool to 0.1.1-wmf5 to reduce requests made to OpenStack API - https://phabricator.wikimedia.org/T145142#2651917 (10hashar) I have refreshed the package on https://people.wikimedia.org/~hashar/debs/nodepool_0.1.1-wmf5/ ....
[12:13:38] <hashar>	 paladox__: that is rather messy :( but I got the doc updated!
[12:13:54] <paladox__>	 Yep
[12:14:04] <hashar>	 moritzm: elukey mira is not really used so I would just dish out in favor of mira02
[12:14:08] <paladox__>	 But at least we now know a solution we can use :)
[12:15:49] <hashar>	 yeah
[12:16:05] <hashar>	 and the root cause is really that I create package for a new upstream version
[12:16:12] <hashar>	 but that first iteration is not always send to apt.wm.o
[12:16:22] <hashar>	 so I then get a 2nd iteration (eg:  1.0.0-wmf2)
[12:16:23] <paladox__>	 Oh
[12:16:41] <hashar>	 which does not have the original tarball since 1.0.0-wmf1 has it and is supposed to have been uploaded already
[12:16:51] <paladox__>	 Yep
[12:17:04] <hashar>	 regardless, wikimedia system requires the original to always be included
[12:17:10] <hashar>	 so -sa all the time and we are covered \o/
[12:17:11] <paladox__>	 Oh
[12:17:18] <paladox__>	 Yep
[12:17:22] <paladox__>	 Lol
[12:17:28] * paladox__ is going now, use paladox so I can read messages later :)
[12:21:43] <wikibugs>	 10Browser-Tests-Infrastructure, 13Patch-For-Review, 15User-zeljkofilipin: Update mediawiki_selenium to use Marionette - https://phabricator.wikimedia.org/T137540#2651940 (10zeljkofilipin) ``` $ bundle exec cucumber tests/browser/features/create_account.feature:10 /usr/local/lib/ruby/gems/2.3.0/gems/page-obje...
[12:30:12] <wikibugs>	 10Browser-Tests-Infrastructure, 13Patch-For-Review, 15User-zeljkofilipin: Update mediawiki_selenium to use Marionette - https://phabricator.wikimedia.org/T137540#2651942 (10zeljkofilipin) https://github.com/SeleniumHQ/selenium/wiki/DesiredCapabilities#firefox-specific
[13:04:28] <wmf-insecte>	 Project selenium-Math » chrome,beta,Linux,contintLabsSlave && UbuntuTrusty build #150: 04FAILURE in 27 sec: https://integration.wikimedia.org/ci/job/selenium-Math/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/150/
[13:07:33] <godog>	 !log add deployment-prometheus01 instance T53497
[13:07:37] <qa-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[13:10:36] <godog>	 elukey: did puppet work on jobrunner02 today? I'm getting a self-signed cert on prometheus01 when running puppet
[13:10:46] <godog>	 Error: Could not retrieve catalog from remote server: SSL_connect returned=1 errno=0 state=error: certificate verify failed: [self signed certificate in certificate chain for /CN=Puppet CA: deployment-puppetmaster.deployment-prep.eqiad.wmflabs]
[13:16:04] <elukey>	 ah yes it is because of the self hosted puppet master
[13:16:21] <elukey>	 delete the cert on the host, re-run puppet and it will go
[13:16:40] <elukey>	 there is a weird use case in which the self hosted puppet master is not recognized
[13:16:57] <elukey>	 BUT I didn't have the time to follow up :)
[13:18:02] <godog>	 elukey: which cert should be deleted?
[13:19:28] <elukey>	 the ones on the target host
[13:19:31] <elukey>	 *one
[13:19:52] <elukey>	 the brutal find /var/lib/puppet/ssl -type f -exec rm {} \;
[13:20:44] <elukey>	 you'll see that when the first puppet run will go a little snippet for the self hosted puppet master is added
[13:20:58] <elukey>	 but don't ask me why :D
[13:22:13] <godog>	 heh, that also means that puppet won't work by default for freshly-provisioned instances?
[13:22:32] <godog>	 it does work indeed after removing the files in /var/lib/puppet/ssl
[13:22:50] <elukey>	 only with self-hosted puppet masters yes 
[13:28:47] <hashar>	 godog: yeah new instances have broken puppet
[13:29:07] <hashar>	 gotta dish bunch of material from /var and iirc redo the puppet.conf manually
[13:34:49] <shinken-wm>	 PROBLEM - Puppet run on deployment-kafka04 is CRITICAL
[13:34:49] <shinken-wm>	 PROBLEM - Puppet run on deployment-zookeeper01 is CRITICAL
[13:34:49] <shinken-wm>	 PROBLEM - Puppet staleness on integration-slave-jessie-1003 is CRITICAL
[13:34:50] <shinken-wm>	 PROBLEM - Puppet run on deployment-imagescaler01 is CRITICAL
[13:34:51] <shinken-wm>	 PROBLEM - Puppet staleness on deployment-sentry01 is CRITICAL
[13:34:53] <shinken-wm>	 PROBLEM - Free space - all mounts on deployment-elastic06 is CRITICAL
[13:34:55] <shinken-wm>	 PROBLEM - Free space - all mounts on deployment-kafka05 is CRITICAL
[13:34:56] <shinken-wm>	 PROBLEM - Puppet staleness on mira02 is CRITICAL
[13:34:58] <shinken-wm>	 PROBLEM - Puppet run on deployment-pdf01 is CRITICAL
[13:35:01] <shinken-wm>	 PROBLEM - Puppet run on deployment-sentry01 is CRITICAL
[13:35:02] <shinken-wm>	 PROBLEM - Puppet staleness on integration-slave-precise-1002 is CRITICAL
[13:35:02] <shinken-wm>	 PROBLEM - Puppet run on integration-slave-jessie-1001 is CRITICAL
[13:35:03] <shinken-wm>	 PROBLEM - Free space - all mounts on integration-slave-jessie-1002 is CRITICAL
[13:35:05] <shinken-wm>	 PROBLEM - Free space - all mounts on deployment-redis01 is CRITICAL
[13:35:06] <shinken-wm>	 PROBLEM - Free space - all mounts on deployment-cache-text04 is CRITICAL
[13:35:08] <shinken-wm>	 PROBLEM - Puppet run on deployment-conf03 is CRITICAL
[13:35:12] <shinken-wm>	 PROBLEM - Free space - all mounts on deployment-mediawiki04 is CRITICAL
[13:35:13] <shinken-wm>	 PROBLEM - Free space - all mounts on integration-slave-jessie-android is CRITICAL
[13:35:15] <shinken-wm>	 PROBLEM - Long lived cherry-picks on puppetmaster on deployment-puppetmaster is CRITICAL
[13:36:31] <shinken-wm>	 PROBLEM - Free space - all mounts on deployment-jobrunner02 is CRITICAL
[13:36:31] <shinken-wm>	 PROBLEM - Puppet run on deployment-sca02 is CRITICAL
[13:36:35] <shinken-wm>	 PROBLEM - Puppet staleness on integration-slave-trusty-1003 is CRITICAL
[13:36:44] <shinken-wm>	 PROBLEM - Puppet staleness on deployment-mediawiki05 is CRITICAL
[13:36:44] <shinken-wm>	 PROBLEM - Puppet staleness on deployment-elastic05 is CRITICAL
[13:36:44] <shinken-wm>	 PROBLEM - Puppet run on deployment-ms-fe01 is CRITICAL
[13:36:48] <shinken-wm>	 PROBLEM - Puppet run on integration-slave-jessie-android is CRITICAL
[13:36:48] <shinken-wm>	 PROBLEM - Puppet run on integration-slave-trusty-1014 is CRITICAL
[13:36:52] <shinken-wm>	 PROBLEM - Puppet run on integration-slave-trusty-1004 is CRITICAL
[13:36:53] <shinken-wm>	 PROBLEM - Puppet run on deployment-logstash2 is CRITICAL
[13:36:53] <shinken-wm>	 PROBLEM - Puppet run on deployment-redis01 is CRITICAL
[13:36:54] <shinken-wm>	 PROBLEM - Puppet run on integration-slave-trusty-1017 is CRITICAL
[13:36:55] <shinken-wm>	 PROBLEM - Puppet run on deployment-db03 is CRITICAL
[13:36:57] <shinken-wm>	 PROBLEM - Free space - all mounts on deployment-logstash2 is CRITICAL
[13:36:58] <shinken-wm>	 PROBLEM - Puppet staleness on castor is CRITICAL
[13:36:58] <shinken-wm>	 PROBLEM - Free space - all mounts on deployment-tin is CRITICAL
[13:36:59] <shinken-wm>	 PROBLEM - Puppet staleness on deployment-eventlogging04 is CRITICAL
[13:37:01] <shinken-wm>	 PROBLEM - Puppet staleness on deployment-sca02 is CRITICAL
[13:37:01] <shinken-wm>	 PROBLEM - Puppet staleness on integration-slave-trusty-1011 is CRITICAL
[13:37:03] <shinken-wm>	 PROBLEM - Puppet run on deployment-puppetmaster is CRITICAL
[13:37:03] <shinken-wm>	 PROBLEM - Puppet staleness on deployment-ores-redis is CRITICAL
[13:37:06] <shinken-wm>	 PROBLEM - Puppet staleness on deployment-db2 is CRITICAL
[13:37:06] <shinken-wm>	 PROBLEM - Free space - all mounts on integration-slave-precise-1011 is CRITICAL
[13:37:07] <shinken-wm>	 PROBLEM - Puppet run on deployment-zotero01 is CRITICAL
[13:37:08] <shinken-wm>	 PROBLEM - Free space - all mounts on deployment-mediawiki05 is CRITICAL
[13:37:09] <shinken-wm>	 PROBLEM - Free space - all mounts on integration-slave-precise-1012 is CRITICAL
[13:37:09] <shinken-wm>	 PROBLEM - Free space - all mounts on deployment-eventlogging03 is CRITICAL
[13:37:10] <shinken-wm>	 PROBLEM - Free space - all mounts on deployment-mx is CRITICAL
[13:37:10] <shinken-wm>	 PROBLEM - Free space - all mounts on deployment-memc04 is CRITICAL
[13:37:12] <shinken-wm>	 PROBLEM - Puppet run on integration-slave-trusty-1018 is CRITICAL
[13:37:13] <shinken-wm>	 PROBLEM - Free space - all mounts on mira02 is CRITICAL
[13:37:55] <hashar>	 bah
[13:38:19] <hashar>	 same in -labs
[13:38:23] <hashar>	 shinken is dead :D
[13:39:24] <shinken-wm>	 PROBLEM - Puppet staleness on integration-slave-jessie-1005 is CRITICAL
[13:39:25] <shinken-wm>	 PROBLEM - Puppet run on deployment-restbase02 is CRITICAL
[13:39:26] <shinken-wm>	 PROBLEM - Puppet run on deployment-stream is CRITICAL
[13:39:27] <shinken-wm>	 PROBLEM - Puppet staleness on deployment-redis02 is CRITICAL
[13:39:28] <shinken-wm>	 PROBLEM - Free space - all mounts on deployment-eventlogging04 is CRITICAL
[13:39:28] <shinken-wm>	 PROBLEM - Free space - all mounts on deployment-sentry01 is CRITICAL
[13:51:36] <wikibugs>	 10Continuous-Integration-Infrastructure, 13Patch-For-Review, 07Zuul: Get the Nodepool/Zuul debian packaging to pass -sa so orig.tar.gz is always in the .changes file - https://phabricator.wikimedia.org/T145797#2652108 (10hashar) 05Open>03Resolved Documentation patch has been reviewed and merged by @akosi...
[14:08:10] <aude>	 hashar: how are things going?
[14:08:21] <aude>	 problems with labs now?
[14:12:07] <hashar>	 aude: back again to wmf.18 
[14:12:19] <hashar>	 due to a huge regression in MediaWiki load time :(
[14:12:40] <hashar>	 !sal
[14:12:41] <wm-bot>	 https://tools.wmflabs.org/sal/releng
[14:13:02] <hashar>	 elukey: may i please mess up with deployment-jobrunner02 ? Would like to try a rsyslog config change :D
[14:13:26] <matanya>	 hashar: is https://phabricator.wikimedia.org/T146044 in your scope ?
[14:14:01] <elukey>	 please go, I think that we could delete 01
[14:14:07] <elukey>	 what do you think?
[14:15:25] <hashar>	 elukey: yeah lets do that
[14:15:30] <hashar>	 can further tweak as needed
[14:15:49] <hashar>	 matanya: I will let mobile folks do the first investigation :D
[14:16:04] <matanya>	 thanks hashar 
[14:16:38] <hashar>	 FORAZEIOAZ eppuet
[14:16:42] <hashar>	 disabling puppet
[14:16:50] <marxarelli>	 hashar: hey! checking in, since you've been awake for a while. i'm about to put beta cluster in read-only mode for the data migration
[14:17:04] <marxarelli>	 any problem with that right now?
[14:18:10] <aude>	 hashar: at this point, i don't know about trying to deploy any new wikidata code
[14:18:35] <hashar>	 elukey: looks like the whole migration of jobrunner/jobchron to systemd is fucked up
[14:18:42] <aude>	 might be too confusing but could at least get some backports out for the code we have now
[14:18:45] <aude>	 https://gerrit.wikimedia.org/r/#/c/311453/
[14:18:45] <hashar>	 elukey: service status jobrunner:  status: unrecognized service
[14:18:46] <aude>	 at swat
[14:19:00] <hashar>	 marxarelli: yeah be bold let s do that :]
[14:19:12] <hashar>	 marxarelli: if somebody complains we can point at the maintenance anouncement you wrote :D
[14:19:16] <marxarelli>	 hashar: alrighty :]
[14:19:54] <elukey>	 hashar: you are systemdinzing init :D
[14:20:03] <elukey>	 service jobrunner status
[14:20:09] <marxarelli>	 !log disabling beta cluster jenkins jobs in preparation for data migration (T138778)
[14:20:13] <qa-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[14:20:21] <hashar>	 elukey: ho for god sake
[14:20:25] <hashar>	 really
[14:20:41] <hashar>	 they switched the order of parameters ccompared to upstart...
[14:21:20] <marxarelli>	 hashar: it should be systemctl [action] [unit] for systemd i believe
[14:21:46] <hashar>	 I am just going to create a shell function
[14:22:07] <hashar>	 catch the wrongly ordered args and reorder :]
[14:23:40] <elukey>	 yes so systemctl status jobrunner or service jobrunner status
[14:23:46] <elukey>	 I confuse them all the times
[14:27:14] <elukey>	 !log stopped puppet, jobrunner and jobchron on deployment-jobrunner01
[14:27:18] <qa-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[14:37:05] <shinken-wm>	 PROBLEM - Puppet run on deployment-mira02 is CRITICAL
[14:37:48] <hashar>	 elukey: is that common practice to have the systemd units to have stdout/stderr send to syslog?
[14:37:57] <hashar>	 we used to just  > /var/log/foo/soft.log
[14:38:06] <hashar>	 which is really super easier compared to handling rsyslog config :]
[14:39:01] <shinken-wm>	 PROBLEM - Free space - all mounts on deployment-mira02 is CRITICAL
[14:39:07] <godog>	 yeah that's how production is done
[14:39:20] <godog>	 > redirection opens a can of worms wrt to e.g. rotation
[14:39:25] <shinken-wm>	 PROBLEM - Puppet staleness on deployment-mira02 is CRITICAL
[14:39:47] <hashar>	 ahh i see
[14:39:58] <hashar>	 so the soft just write to the same io channels
[14:40:04] <hashar>	 and we can logrotate as needed
[14:40:49] <godog>	 yeah, whatever needs to be run is a binary that stays in the foreground and logs to stdout/stderr and that's it
[14:41:05] <godog>	 in production using syslog has the added benefit of central logging too
[14:41:12] <hashar>	 yeah
[14:41:18] <hashar>	 which is exactly what I need actually
[14:41:38] <hashar>	 know I am trying to figure out how to have rsyslog to not write to the generic syslog :D
[14:42:32] <godog>	 what do you mean?
[14:44:25] <marxarelli>	 !log entering read-only mode on beta cluster
[14:44:28] <qa-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[14:44:31] <hashar>	 godog: I crafted a rsyslog rule   to catch my program and write to a given file
[14:44:40] <hashar>	 but the messages are also written to /var/log/syslog
[14:45:59] <godog>	 ah, check puppet there should be some examples there
[14:46:02] <godog>	 for e.g. thumbor
[14:46:26] <hashar>	 found it
[14:46:59] <hashar>	 & ~ 
[14:47:50] <elukey>	 also if I am not mistaken everything is also handled by journald
[14:48:06] <elukey>	 and journald pushes to rsyslog if instructed to
[14:48:19] <hashar>	 E_TOOMANY_TERMINALS
[14:48:21] <elukey>	 so you have journalctl -u unit.service available
[14:48:23] <hashar>	 .... can you believe it ?
[14:49:50] <wikibugs>	 10Browser-Tests-Infrastructure, 13Patch-For-Review, 15User-zeljkofilipin: Update mediawiki_selenium to use Marionette - https://phabricator.wikimedia.org/T137540#2652243 (10zeljkofilipin) Explodes here:   ```lang=ruby def new_browser(options)   Watir::Browser.new(browser_name, options) end ```  ```lang=ruby...
[14:52:47] <elukey>	 hashar: https://gerrit.wikimedia.org/r/#/c/311717
[14:53:16] <elukey>	 after I merge the last step is to kill the jobrunner
[14:53:24] <elukey>	 if you agree
[14:54:14] <hashar>	 !log beta: cherry picking fix up for the jobrunner logging https://gerrit.wikimedia.org/r/#/c/311702/ and  https://gerrit.wikimedia.org/r/311719 T146040
[14:54:18] <qa-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[14:54:53] <hashar>	 elukey: the jobchron service fails for some reason :(
[14:54:56] <hashar>	 on the jessie instance
[14:57:17] <elukey>	 really?
[14:57:22] <elukey>	 what is the error?
[14:57:23] <elukey>	 didn't see it
[14:57:24] <elukey>	 snap
[15:00:01] <elukey>	 so it failed for LightProcess::closeShadow failed due to exception: Failed in afdt::sendRaw: Broken pipe
[15:00:06] <elukey>	 but then systemd restarted it
[15:00:33] <elukey>	 mmm no sorry it was stopped
[15:00:33] <elukey>	 Sep 20 14:57:08 deployment-jobrunner02 systemd[1]: Stopping "Mediawiki job queue chron loop"...
[15:00:37] <elukey>	 Sep 20 14:57:08 deployment-jobrunner02 jobchron[10839]: Caught signal (15)
[15:01:18] <hashar>	 elukey: yeah that was me play testing it
[15:01:22] <hashar>	 to fix rsyslog
[15:01:28] <hashar>	 I have reenabled puppet and done with tests sorry
[15:02:13] <elukey>	 ah okok
[15:02:27] <elukey>	 but what "elukey: the jobchron service fails for some reason :(" was about then?
[15:06:50] <shinken-wm>	 PROBLEM - Keyholder status on deployment-mira02 is CRITICAL
[15:12:37] <wmf-insecte>	 Project mediawiki-core-code-coverage build #2274: 04STILL FAILING in 12 min: https://integration.wikimedia.org/ci/job/mediawiki-core-code-coverage/2274/
[15:19:20] <hashar>	 elukey: on deployment-jobrunner02  in /var/log/mediawiki/jobchron.log there are some suspicious messages.
[15:19:30] <hashar>	 havent had time to investigate though and I am about to leave :(
[15:20:54] <elukey>	 hashar: sure.. do you want me to re-enable jobrunner01?
[15:21:32] <hashar>	 not sure
[15:21:35] <elukey>	 ah the  2016-09-20T14:57:07+0000 NOTICE: Raced out of periodic tasks.
[15:21:39] <hashar>	 maybe the issue also existed on runner01 :]
[15:21:42] <hashar>	 yeah those messages
[15:21:55] <hashar>	 one might want to compare with runner01
[15:21:58] <marxarelli>	 hashar: deployment-db1 is in read-only mode while i migrate the beta cluster dbs, in case that may be breaking jobchron
[15:22:08] <hashar>	 elukey: maybe it was just a transient issue
[15:22:27] <hashar>	 or what marxarelli said :]
[15:22:46] <hashar>	 elukey: lets forget about the trusty instance  just delete it indeed
[15:23:04] <hashar>	 we can then tune the jessie one as needed, but it is most probably all fine since prod runs on it
[15:23:11] <elukey>	 we can wait a bit and do it tomorrow, I'll keep it stopped (via puppet) but ready to go
[15:23:31] <elukey>	 would it be ok?
[15:25:32] <wikibugs>	 10Browser-Tests-Infrastructure, 13Patch-For-Review, 15User-zeljkofilipin: Update mediawiki_selenium to use Marionette - https://phabricator.wikimedia.org/T137540#2652378 (10zeljkofilipin) Firefox 47.0.1 mediawiki_selenium 1.7.2 watir-webdriver 0.9.3 selenium-webdriver 2.53.4 firefox driver   ```lang=bash $ b...
[15:28:22] <hashar>	 elukey: yeah
[15:28:37] <hashar>	 elukey: and we got a quota bump so we can afford to get some unused instances :]
[15:28:46] <hashar>	 also marxarelli is migrating the huge dbx Precies instances
[15:28:59] <hashar>	 which I believe will soonish free up ton of resources AND get us prometheus on beta \o/
[15:47:46] <marxarelli>	 !log completed innobackupex on deployment-db1. copying backup to deployment-db03 for restoration
[15:47:50] <qa-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[16:23:40] <marxarelli>	 !log applied innodb transaction logs to deployment-db1 backup and successfully restored on deployment-db03
[16:23:44] <qa-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[16:30:16] <moritzm>	 !log rebooting deployment-mira02
[16:30:20] <qa-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[16:31:41] <marxarelli>	 !log cherry picking operations/puppet patches (T138778) to deployment-puppetmaster
[16:31:45] <qa-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[16:54:52] <marxarelli>	 !log upgraded package and data to mariadb 10 on deployment-db03
[16:54:56] <qa-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[16:58:15] <wikibugs>	 10Beta-Cluster-Infrastructure, 07Puppet, 07Tracking: Deployment-prep hosts with puppet errors (tracking) - https://phabricator.wikimedia.org/T132259#2652713 (10AlexMonk-WMF)
[16:58:17] <wikibugs>	 10Beta-Cluster-Infrastructure, 13Patch-For-Review, 07Puppet: Puppet failing on deployment-conf03 due to missing files - https://phabricator.wikimedia.org/T144703#2652712 (10AlexMonk-WMF) 05Open>03Resolved
[17:07:35] <wikibugs>	 06Release-Engineering-Team, 06Operations, 07HHVM, 13Patch-For-Review: Migrate deployment servers (tin/mira) to jessie - https://phabricator.wikimedia.org/T144578#2652761 (10MoritzMuehlenhoff) I've rebuilt the host as deployment-mira02 with /srv/ on a separate 20 GB partition and deleted the mira02 instance...
[17:09:29] <shinken-wm>	 PROBLEM - Host mira02 is DOWN: CRITICAL - Host Unreachable (10.68.22.226)
[17:26:48] <marktraceur>	 Hey e.g. greg-g, I'm looking at https://www.mediawiki.org/wiki/Review_queue - we already have a tracking task for beta cluster deployment (which is all I want to accomplish for now), should I add that to Wikimedia-Extension-setup?
[17:28:11] <greg-g>	 yup
[17:28:39] <marktraceur>	 Cool cool cool
[17:30:13] <greg-g>	 alright alright alright
[17:37:06] <marxarelli>	 !log deployment-db04 restored from backup and replication started
[17:37:09] <qa-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[17:37:21] <greg-g>	 wee
[17:39:08] <Krenair>	 nice, does this mean we're migrated?
[17:39:26] <marxarelli>	 Krenair: almost. just need to merge and deploy the mediawiki-config changes
[17:43:51] <wikibugs>	 06Release-Engineering-Team, 06Operations, 07HHVM, 13Patch-For-Review: Migrate deployment servers (tin/mira) to jessie - https://phabricator.wikimedia.org/T144578#2652884 (10thcipriani) >>! In T144578#2652761, @MoritzMuehlenhoff wrote: > I've rebuilt the host as deployment-mira02 with /srv/ on a separate 20...
[17:44:54] <marxarelli>	 hmm, copied up the new wmf-config/db-labs.php but still seeing the read-only message
[17:45:13] <marxarelli>	 db03 and db04 are definitely not read-only
[17:53:54] <wikibugs>	 10Beta-Cluster-Infrastructure, 10Wikimedia-Extension-setup, 07Category, 10FileAnnotations (Beta Cluster Release): Release FileAnnotations on the Beta Cluster - https://phabricator.wikimedia.org/T144302#2652949 (10greg)
[18:00:59] <Krenair>	 marxarelli, you didn't deploy your change?
[18:01:04] <Krenair>	 you just merged it on deployment-tin
[18:01:05] <marxarelli>	 !log deployed mediawiki-config changes on beta cluster. back in read/write mode using new database instances
[18:01:09] <qa-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[18:01:35] <marxarelli>	 Krenair: no, worse than that. i didn't pull it after merging :) we're good now
[18:02:00] <Krenair>	 doesn't look okay to me
[18:02:17] <marxarelli>	 scap sync still failed on deployment-jobrunner02, but everything else synced
[18:02:18] <Krenair>	 `/home/krenair/foreachapache 'grep db1 /srv/mediawiki/wmf-config/db-labs.php'` still shows old db config
[18:02:18] <marxarelli>	 no?
[18:02:32] <|404>	 I don't get at least the read-only message anymore
[18:02:36] <Krenair>	 ah, deployment-jobrunner02
[18:02:43] <Krenair>	 deployment-tin too :/
[18:03:06] <marxarelli>	 well that's not good, using two different databases :/
[18:05:19] <Krenair>	 also `diff ../../mediawiki/wmf-config/db-labs.php db-labs.php` from /srv/mediawiki-staging/wmf-config
[18:05:25] <Krenair>	 didn't sync fully
[18:05:44] <marxarelli>	 yeah, i saw that
[18:06:35] <marxarelli>	 going to try a full scap by letting the jenkins job run
[18:07:03] <Krenair>	 btw, like the colour prompts on new hosts?
[18:10:16] <marxarelli>	 wait, i don't have color prompts!
[18:10:21] * marxarelli feels cheated
[18:11:08] <Krenair>	 which host?
[18:11:41] <marxarelli>	 db03 and db04
[18:11:50] <Krenair>	 oh, those might be just old enough
[18:11:59] <Krenair>	 try mira02
[18:12:21] <Krenair>	 basically I fixed up the code in the skeleton bashrc files that linux copies to new users' home directories on their first login to the system
[18:12:38] <marxarelli>	 Krenair: fancy! :D
[18:12:52] <marxarelli>	 k. looks like a full scap worked from jenkins
[18:12:58] <marxarelli>	 maybe a permissions issue
[18:14:43] <Krenair>	 if you want colour prompts on existing hosts you can copy it in from /etc/skel/
[18:14:51] <marxarelli>	 well, that was fun. haven't done a database migration in a while and the tools are so much better now. only an hour and 15 over the window. no bigs, right greg-g? :)
[18:16:05] <greg-g>	 :)
[18:16:23] <marxarelli>	 Krenair: nice! thanks for the tip
[18:16:26] <Krenair>	 I'm supposed to be getting dinner but something is up with keyholder on -mira02
[18:16:30] <marxarelli>	 i always want more color
[18:16:50] <Krenair>	 it has the same private key as -tin but `SSH_AUTH_SOCK=/run/keyholder/proxy.sock ssh mwdeploy@deployment-tin` says "Agent admitted failure to sign using the key."
[18:16:56] <Krenair>	 so if someone wants to look into that..
[18:17:29] <Krenair>	 diamond says in syslog 'diamond[426]: CRITICAL: Keyholder is not armed. Run 'keyholder arm' to arm it.', but keyholder arm says it successfully added the key
[18:20:49] <marxarelli>	 Krenair: i still know embarrassingly little about keyholder but i can poke at it
[18:22:34] <wikibugs>	 06Release-Engineering-Team, 06Editing-Department, 10Monitoring, 06Operations, 07Wikimedia-Incident: High failure rate of account creation should trigger an alarm / page people - https://phabricator.wikimedia.org/T146090#2653054 (10Tgr) Note that failure means the authentication code ran successfully but...
[18:25:57] <wmf-insecte>	 Project beta-scap-eqiad build #120875: 04FAILURE in 16 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/120875/
[18:29:03] <shinken-wm>	 RECOVERY - Free space - all mounts on deployment-kafka01 is OK: OK: All targets OK
[18:29:04] <shinken-wm>	 RECOVERY - Puppet staleness on integration-slave-trusty-1006 is OK: OK: Less than 1.00% above the threshold [3600.0]
[18:29:05] <shinken-wm>	 RECOVERY - Puppet staleness on integration-slave-trusty-1013 is OK: OK: Less than 1.00% above the threshold [3600.0]
[18:29:06] <shinken-wm>	 RECOVERY - Free space - all mounts on deployment-restbase01 is OK: OK: deployment-prep.deployment-restbase01.diskspace._var_log.byte_percentfree (No valid datapoints found)
[18:29:06] <shinken-wm>	 RECOVERY - Puppet run on deployment-tin is OK: OK: Less than 1.00% above the threshold [0.0]
[18:29:08] <shinken-wm>	 RECOVERY - Free space - all mounts on deployment-memc05 is OK: OK: All targets OK
[18:29:09] <shinken-wm>	 RECOVERY - Free space - all mounts on deployment-kafka03 is OK: OK: All targets OK
[18:29:10] <shinken-wm>	 RECOVERY - Puppet staleness on deployment-conftool is OK: OK: Less than 1.00% above the threshold [3600.0]
[18:29:11] <shinken-wm>	 RECOVERY - Free space - all mounts on deployment-salt02 is OK: OK: All targets OK
[18:29:11] <shinken-wm>	 RECOVERY - Puppet run on deployment-ircd is OK: OK: Less than 1.00% above the threshold [0.0]
[18:29:14] <shinken-wm>	 RECOVERY - Puppet run on deployment-sca01 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:29:14] <shinken-wm>	 RECOVERY - Puppet staleness on deployment-kafka03 is OK: OK: Less than 1.00% above the threshold [3600.0]
[18:29:14] <shinken-wm>	 RECOVERY - Free space - all mounts on deployment-urldownloader is OK: OK: All targets OK
[18:29:17] <shinken-wm>	 RECOVERY - Free space - all mounts on deployment-apertium01 is OK: OK: All targets OK
[18:29:18] <shinken-wm>	 RECOVERY - Puppet run on deployment-restbase02 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:29:19] <shinken-wm>	 RECOVERY - Puppet staleness on integration-slave-jessie-1005 is OK: OK: Less than 1.00% above the threshold [3600.0]
[18:29:21] <shinken-wm>	 RECOVERY - Puppet run on deployment-stream is OK: OK: Less than 1.00% above the threshold [0.0]
[18:29:21] <shinken-wm>	 RECOVERY - Puppet staleness on deployment-redis02 is OK: OK: Less than 1.00% above the threshold [3600.0]
[18:29:23] <shinken-wm>	 RECOVERY - Free space - all mounts on deployment-eventlogging04 is OK: OK: All targets OK
[18:29:23] <shinken-wm>	 RECOVERY - Free space - all mounts on deployment-sentry01 is OK: OK: All targets OK
[18:29:25] <shinken-wm>	 RECOVERY - Free space - all mounts on deployment-pdf02 is OK: OK: All targets OK
[18:29:26] <shinken-wm>	 RECOVERY - Puppet run on deployment-kafka03 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:29:26] <shinken-wm>	 RECOVERY - Puppet run on deployment-conftool is OK: OK: Less than 1.00% above the threshold [0.0]
[18:29:27] <shinken-wm>	 RECOVERY - Puppet staleness on deployment-stream is OK: OK: Less than 1.00% above the threshold [3600.0]
[18:29:28] <shinken-wm>	 RECOVERY - Puppet staleness on deployment-elastic08 is OK: OK: Less than 1.00% above the threshold [3600.0]
[18:29:30] <shinken-wm>	 RECOVERY - Puppet staleness on deployment-mira02 is OK: OK: Less than 1.00% above the threshold [3600.0]
[18:29:30] <shinken-wm>	 RECOVERY - Puppet run on deployment-mediawiki05 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:29:31] <shinken-wm>	 RECOVERY - Puppet run on deployment-jobrunner01 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:29:31] <shinken-wm>	 RECOVERY - Free space - all mounts on deployment-ores-redis is OK: OK: All targets OK
[18:29:35] <shinken-wm>	 RECOVERY - Puppet staleness on deployment-salt02 is OK: OK: Less than 1.00% above the threshold [3600.0]
[18:29:35] <shinken-wm>	 RECOVERY - Puppet run on integration-slave-trusty-1001 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:29:36] <shinken-wm>	 RECOVERY - Puppet staleness on integration-slave-trusty-1017 is OK: OK: Less than 1.00% above the threshold [3600.0]
[18:29:36] <shinken-wm>	 RECOVERY - Puppet staleness on deployment-redis01 is OK: OK: Less than 1.00% above the threshold [3600.0]
[18:29:38] <shinken-wm>	 RECOVERY - Puppet staleness on deployment-tin is OK: OK: Less than 1.00% above the threshold [3600.0]
[18:29:39] <shinken-wm>	 RECOVERY - Puppet staleness on deployment-pdf02 is OK: OK: Less than 1.00% above the threshold [3600.0]
[18:29:39] <shinken-wm>	 RECOVERY - Puppet staleness on deployment-elastic07 is OK: OK: Less than 1.00% above the threshold [3600.0]
[18:29:40] <shinken-wm>	 RECOVERY - Free space - all mounts on deployment-tmh01 is OK: OK: All targets OK
[18:29:42] <shinken-wm>	 RECOVERY - Puppet run on deployment-db04 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:29:43] <shinken-wm>	 RECOVERY - Free space - all mounts on integration-puppetmaster is OK: OK: All targets OK
[18:29:44] <shinken-wm>	 RECOVERY - Puppet run on deployment-elastic08 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:29:45] <shinken-wm>	 RECOVERY - Puppet staleness on deployment-parsoid09 is OK: OK: Less than 1.00% above the threshold [3600.0]
[18:29:46] <shinken-wm>	 RECOVERY - Puppet staleness on integration-puppetmaster is OK: OK: Less than 1.00% above the threshold [3600.0]
[18:29:47] <shinken-wm>	 RECOVERY - Free space - all mounts on deployment-zotero01 is OK: OK: All targets OK
[18:29:47] <shinken-wm>	 RECOVERY - Puppet run on deployment-apertium02 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:29:48] <shinken-wm>	 RECOVERY - Puppet staleness on deployment-aqs01 is OK: OK: Less than 1.00% above the threshold [3600.0]
[18:29:49] <shinken-wm>	 RECOVERY - Puppet staleness on integration-slave-jessie-1001 is OK: OK: Less than 1.00% above the threshold [3600.0]
[18:29:53] <hashar>	 all that spam is check_graphite that got fixed up
[18:30:02] <greg-g>	 w005
[18:30:06] <greg-g>	 -5+t
[18:30:56] <shinken-wm>	 RECOVERY - Puppet run on deployment-mediawiki04 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:30:58] <shinken-wm>	 RECOVERY - Puppet staleness on integration-slave-jessie-1002 is OK: OK: Less than 1.00% above the threshold [3600.0]
[18:31:00] <shinken-wm>	 RECOVERY - Puppet staleness on deployment-mediawiki04 is OK: OK: Less than 1.00% above the threshold [3600.0]
[18:31:02] <shinken-wm>	 RECOVERY - Puppet staleness on integration-slave-trusty-1004 is OK: OK: Less than 1.00% above the threshold [3600.0]
[18:31:06] <shinken-wm>	 RECOVERY - Puppet run on deployment-ms-be01 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:31:08] <shinken-wm>	 RECOVERY - Puppet run on deployment-db1 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:31:09] <shinken-wm>	 RECOVERY - Puppet staleness on integration-publisher is OK: OK: Less than 1.00% above the threshold [3600.0]
[18:31:09] <shinken-wm>	 RECOVERY - Puppet staleness on integration-slave-jessie-1004 is OK: OK: Less than 1.00% above the threshold [3600.0]
[18:31:10] <shinken-wm>	 RECOVERY - Puppet staleness on deployment-mathoid is OK: OK: Less than 1.00% above the threshold [3600.0]
[18:31:13] <shinken-wm>	 RECOVERY - Free space - all mounts on integration-slave-trusty-1004 is OK: OK: All targets OK
[18:31:16] <shinken-wm>	 RECOVERY - Free space - all mounts on integration-slave-trusty-1013 is OK: OK: All targets OK
[18:31:17] <shinken-wm>	 RECOVERY - Puppet staleness on deployment-kafka05 is OK: OK: Less than 1.00% above the threshold [3600.0]
[18:31:31] <shinken-wm>	 RECOVERY - Free space - all mounts on deployment-jobrunner02 is OK: OK: All targets OK
[18:31:31] <shinken-wm>	 RECOVERY - Puppet run on deployment-sca02 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:31:35] <shinken-wm>	 RECOVERY - Puppet staleness on integration-slave-trusty-1003 is OK: OK: Less than 1.00% above the threshold [3600.0]
[18:31:36] <wmf-insecte>	 Project beta-scap-eqiad build #120876: 04STILL FAILING in 4 min 1 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/120876/
[18:31:43] <shinken-wm>	 RECOVERY - Puppet run on deployment-ms-fe01 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:31:45] <shinken-wm>	 RECOVERY - Puppet staleness on deployment-elastic05 is OK: OK: Less than 1.00% above the threshold [3600.0]
[18:31:47] <shinken-wm>	 RECOVERY - Puppet run on integration-slave-jessie-android is OK: OK: Less than 1.00% above the threshold [0.0]
[18:31:47] <shinken-wm>	 RECOVERY - Puppet staleness on deployment-mediawiki05 is OK: OK: Less than 1.00% above the threshold [3600.0]
[18:31:49] <shinken-wm>	 RECOVERY - Puppet run on integration-slave-trusty-1014 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:31:49] <shinken-wm>	 RECOVERY - Puppet run on integration-slave-trusty-1004 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:31:54] <shinken-wm>	 RECOVERY - Puppet run on deployment-redis01 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:31:56] <shinken-wm>	 RECOVERY - Free space - all mounts on deployment-logstash2 is OK: OK: deployment-prep.deployment-logstash2.diskspace._srv.byte_percentfree (No valid datapoints found)
[18:31:57] <shinken-wm>	 RECOVERY - Puppet run on deployment-logstash2 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:31:57] <shinken-wm>	 RECOVERY - Puppet run on deployment-db03 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:31:57] <shinken-wm>	 RECOVERY - Puppet staleness on castor is OK: OK: Less than 1.00% above the threshold [3600.0]
[18:31:58] <shinken-wm>	 RECOVERY - Puppet run on integration-slave-trusty-1017 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:31:59] <shinken-wm>	 RECOVERY - Free space - all mounts on deployment-tin is OK: OK: All targets OK
[18:32:00] <shinken-wm>	 RECOVERY - Puppet staleness on deployment-eventlogging04 is OK: OK: Less than 1.00% above the threshold [3600.0]
[18:32:02] <shinken-wm>	 RECOVERY - Puppet run on deployment-puppetmaster is OK: OK: Less than 1.00% above the threshold [0.0]
[18:32:03] <shinken-wm>	 RECOVERY - Puppet staleness on deployment-ores-redis is OK: OK: Less than 1.00% above the threshold [3600.0]
[18:32:04] <shinken-wm>	 RECOVERY - Puppet staleness on integration-slave-trusty-1011 is OK: OK: Less than 1.00% above the threshold [3600.0]
[18:32:04] <shinken-wm>	 RECOVERY - Puppet staleness on deployment-sca02 is OK: OK: Less than 1.00% above the threshold [3600.0]
[18:32:05] <shinken-wm>	 RECOVERY - Free space - all mounts on integration-slave-precise-1011 is OK: OK: All targets OK
[18:32:05] <shinken-wm>	 RECOVERY - Puppet run on deployment-zotero01 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:32:06] <hashar>	  deployment-mira02.deployment-prep.eqiad.wmflabs returned [255]: Host key verification failed.
[18:32:06] <shinken-wm>	 RECOVERY - Free space - all mounts on deployment-eventlogging03 is OK: OK: All targets OK
[18:32:06] <shinken-wm>	 RECOVERY - Free space - all mounts on integration-slave-precise-1012 is OK: OK: All targets OK
[18:32:07] <shinken-wm>	 RECOVERY - Free space - all mounts on deployment-mx is OK: OK: All targets OK
[18:32:07] <hashar>	 known issue
[18:32:08] <shinken-wm>	 RECOVERY - Free space - all mounts on deployment-mediawiki05 is OK: OK: All targets OK
[18:32:08] <shinken-wm>	 RECOVERY - Free space - all mounts on deployment-memc04 is OK: OK: All targets OK
[18:32:10] <tom29739>	 @q shinken-wm 
[18:32:10] <shinken-wm>	 RECOVERY - Puppet run on deployment-mira02 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:32:11] <shinken-wm>	 RECOVERY - Puppet run on integration-slave-trusty-1018 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:32:12] <shinken-wm>	 RECOVERY - Free space - all mounts on integration-slave-trusty-1011 is OK: OK: All targets OK
[18:32:13] <shinken-wm>	 RECOVERY - Puppet run on deployment-elastic06 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:32:16] <shinken-wm>	 RECOVERY - Puppet run on integration-slave-trusty-1016 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:32:17] <shinken-wm>	 RECOVERY - Puppet run on integration-slave-trusty-1012 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:32:36] <hashar>	 documented on https://phabricator.wikimedia.org/T144006
[18:33:31] <hashar>	 !log on deployment-mira02  ran `sudo -u jenkins-deploy -H SSH_AUTH_SOCK=/run/keyholder/proxy.sock ssh mwdeploy@deployment-mediawiki04.deployment-prep.eqiad.wmflabs` per T144006
[18:33:35] <qa-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[18:34:09] <hashar>	 marxarelli: have you completed the switch of db to jessie yet? :]
[18:36:14] <wmf-insecte>	 Project beta-scap-eqiad build #120877: 04STILL FAILING in 1 min 42 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/120877/
[18:37:30] <marxarelli>	 hashar: yep yep
[18:38:10] <hashar>	 !log on tin: `sudo -u jenkins-deploy -H SSH_AUTH_SOCK=/run/keyholder/proxy.sock ssh mwdeploy@deployment-mira02.deployment-prep.eqiad.wmflabs`   - T144006
[18:38:14] <qa-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[18:38:25] <hashar>	 marxarelli: awesome :]
[18:39:04] <hashar>	 tom29739: was that meant to silent shinken-wm ? if so can you restore it ? thx!
[18:39:39] <tom29739>	 hashar, it was but it doesn't work in here because wm-bot doesn't have ops in here.
[18:44:45] <wmf-insecte>	 Project beta-scap-eqiad build #120878: 04STILL FAILING in 6 min 26 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/120878/
[18:48:10] <Krenair>	 @uq shinken-wm
[18:48:13] <wmf-insecte>	 Project beta-scap-eqiad build #120879: 04STILL FAILING in 1 min 56 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/120879/
[18:48:18] <wm-bot>	 I know: add, changepass, channel-info, channellist, commands, configure, drop, github-, github+, github-off, github-on, grant, grantrole, help, info, instance, join, language, notify, optools-off, optools-on, optools-permanent-off, optools-permanent-on, part, rc-ping, rc-restart, reauth, recentchanges-bot-off, recentchanges-bot-on, recentchanges-minor-off, recentchanges-minor-on, recentchanges-off, recentchanges-on, reload, restart, revoke, revokerole, seen, seen-host, seen-off, seen-on, seenrx, suppress-off, suppress-on, systeminfo, system-rm, time, traffic-off, traffic-on, translate, trustadd, trustdel, trusted, uptime, verbosity--, verbosity++, wd, whoami
[18:48:18] <Krenair>	 @commands
[18:48:19] <Krenair>	 ah
[18:51:45] <thcipriani>	 blerg. Looks like deployment-mira02 needs to be added to network constants as a deploy host.
[18:56:21] <wmf-insecte>	 Project beta-scap-eqiad build #120880: 04STILL FAILING in 1 min 48 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/120880/
[19:00:40] <thcipriani>	 !log cherry-picked https://gerrit.wikimedia.org/r/#/c/311760/ to deployment-puppetmaster to fix failing beta-scap-eqiad job, had to manually start rsync, puppet failed to start
[19:00:44] <qa-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[19:05:47] <grrrit-wm>	 (03CR) 1020after4: [C: 032] "testing" [integration/config] - 10https://gerrit.wikimedia.org/r/311497 (owner: 10Paladox)
[19:06:20] <paladox>	 twentyafterfour ^^ thanks
[19:06:49] <grrrit-wm>	 (03Merged) 10jenkins-bot: [mediawiki/extensions] Add noop jenkins test [integration/config] - 10https://gerrit.wikimedia.org/r/311497 (owner: 10Paladox)
[19:07:44] <shinken-wm>	 PROBLEM - Puppet run on deployment-ms-fe01 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[19:08:01] <shinken-wm>	 PROBLEM - Puppet run on deployment-puppetmaster is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[19:08:31] <wmf-insecte>	 Yippee, build fixed!
[19:08:32] <wmf-insecte>	 Project beta-scap-eqiad build #120881: 09FIXED in 3 min 56 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/120881/
[19:08:47] <Krenair>	 ^ puppetmaster puppet issue is me & yuvi, we're working on it
[19:21:58] <hashar>	 thcipriani: deployment-mira02 looks all in good shape
[19:22:04] <hashar>	 I guess we can get rid of mira (trusty)
[19:29:00] <thcipriani>	 hashar: last I looked deployment-mira02 still had a small /srv/ of 20GB
[19:29:16] <thcipriani>	 hashar: https://phabricator.wikimedia.org/T144578#2652884
[19:29:24] <hashar>	 oh man
[19:29:38] <thcipriani>	 but as far as all the deployment tooling stuff goes: it looks good to me.
[19:30:07] <thcipriani>	 it might be nice to actually cut over to running a deploy from it, although that is probably a fair amount of work :\
[19:30:39] <hashar>	 well on beta it is really just to test comaster switch isn't it ?
[19:32:15] <thcipriani>	 yeah, It should just be changing a hiera var and letting puppet run everywhere
[19:32:38] <thcipriani>	 would have to turn off beta-scap-eqiad or re-point it to the right master as part of that, too
[19:34:20] <shinken-wm>	 PROBLEM - Puppet run on deployment-eventlogging04 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[19:38:01] <shinken-wm>	 RECOVERY - Puppet run on deployment-puppetmaster is OK: OK: Less than 1.00% above the threshold [0.0]
[19:42:43] <shinken-wm>	 RECOVERY - Puppet run on deployment-ms-fe01 is OK: OK: Less than 1.00% above the threshold [0.0]
[19:45:37] <shinken-wm>	 PROBLEM - Puppet run on deployment-elastic08 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[19:48:04] <shinken-wm>	 PROBLEM - Puppet run on deployment-zotero01 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[19:48:12] <shinken-wm>	 PROBLEM - Puppet run on deployment-elastic06 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[19:48:38] <shinken-wm>	 PROBLEM - Puppet run on deployment-jobrunner02 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[19:49:20] <shinken-wm>	 PROBLEM - Puppet run on deployment-salt02 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[19:49:46] <shinken-wm>	 PROBLEM - Puppet run on deployment-sca03 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[19:49:56] <shinken-wm>	 PROBLEM - Puppet run on deployment-pdfrender is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[19:51:50] <shinken-wm>	 PROBLEM - Puppet run on deployment-changeprop is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[19:53:22] <thcipriani>	 > Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Could not find class base::environment
[19:53:24] <thcipriani>	 why
[20:05:37] <shinken-wm>	 RECOVERY - Puppet run on deployment-elastic08 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:11:12] <wikibugs>	 10Beta-Cluster-Infrastructure, 06Release-Engineering-Team, 06Labs, 10Labs-Infrastructure, 07HHVM: OpenStack flavor for beta cluster deployment servers - https://phabricator.wikimedia.org/T146209#2653709 (10hashar)
[20:11:21] <wikibugs>	 06Release-Engineering-Team, 06Operations, 07HHVM, 13Patch-For-Review: Migrate deployment servers (tin/mira) to jessie - https://phabricator.wikimedia.org/T144578#2604309 (10hashar) Lets get a custom flavor for the deployment servers.  8 CPUs to get faster l10n rebuild 8 GB RAM: 2G for system, 6G for cache,...
[20:12:36] <hashar>	 thcipriani: transient issue :(
[20:13:48] <hashar>	 thcipriani: filled that has https://phabricator.wikimedia.org/T145631  which I later merged with https://phabricator.wikimedia.org/T131946
[20:14:03] <hashar>	 I suspect the auto git rebase causing puppet master to miss some files from time to time
[20:14:09] <hashar>	 eg it is not atomic enough :]
[20:14:18] <shinken-wm>	 RECOVERY - Puppet run on deployment-salt02 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:14:18] <shinken-wm>	 RECOVERY - Puppet run on deployment-eventlogging04 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:14:22] <Krenair>	 I just had to manually fix rebasing the puppet repo
[20:18:15] <grrrit-wm>	 (03PS2) 10Hashar: Add research/recommendation-api to integration [integration/config] - 10https://gerrit.wikimedia.org/r/311667 (https://phabricator.wikimedia.org/T146057) (owner: 10Nschaaf)
[20:24:18] <grrrit-wm>	 (03CR) 10Hashar: [C: 032] "I have granted jenkins-bot the ability to submit patches." [integration/config] - 10https://gerrit.wikimedia.org/r/311667 (https://phabricator.wikimedia.org/T146057) (owner: 10Nschaaf)
[20:25:17] <grrrit-wm>	 (03Merged) 10jenkins-bot: Add research/recommendation-api to integration [integration/config] - 10https://gerrit.wikimedia.org/r/311667 (https://phabricator.wikimedia.org/T146057) (owner: 10Nschaaf)
[20:25:18] <hashar>	 twentyafterfour: one of your patch did not get deployed on zuul server :/
[20:25:34] <hashar>	 the one adding noop jobs to mediawiki/extensions that is rather harmless though
[20:25:45] <hashar>	 CR+2 does not deploy on gallium. Has to be done manually
[20:26:48] <shinken-wm>	 RECOVERY - Puppet run on deployment-changeprop is OK: OK: Less than 1.00% above the threshold [0.0]
[20:27:50] <wikibugs>	 10Beta-Cluster-Infrastructure, 06Release-Engineering-Team, 06Labs, 10Labs-Infrastructure, 07HHVM: OpenStack flavor for beta cluster deployment servers - https://phabricator.wikimedia.org/T146209#2653807 (10Andrew)
[20:28:04] <shinken-wm>	 RECOVERY - Puppet run on deployment-zotero01 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:28:12] <shinken-wm>	 RECOVERY - Puppet run on deployment-elastic06 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:28:36] <shinken-wm>	 RECOVERY - Puppet run on deployment-jobrunner02 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:29:47] <shinken-wm>	 RECOVERY - Puppet run on deployment-sca03 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:29:59] <shinken-wm>	 RECOVERY - Puppet run on deployment-pdfrender is OK: OK: Less than 1.00% above the threshold [0.0]
[20:31:09] <twentyafterfour>	 hashar: sorry
[20:31:54] <hashar>	 twentyafterfour: np :]
[20:32:50] <hashar>	 twentyafterfour: actually the related doc to deploy a zuul config change was outdated and zeljkof prompted me to update it https://www.mediawiki.org/wiki/Continuous_integration/Zuul#Deploy_configuration  :D
[20:32:51] <hashar>	 in short:
[20:32:56] <hashar>	 fab deploy_zuul
[20:32:58] <hashar>	 :]
[20:33:25] <paladox>	 twentyafterfour theres https://www.mediawiki.org/wiki/Continuous_integration/Zuul#Actually_upgrade which hashar made
[20:33:33] <paladox>	 and super easy to follow, just copy and paste :)
[20:33:45] <wikibugs>	 06Release-Engineering-Team, 06Operations, 07HHVM, 13Patch-For-Review: Migrate deployment servers (tin/mira) to jessie - https://phabricator.wikimedia.org/T144578#2653842 (10Andrew)
[20:33:50] <wikibugs>	 10Beta-Cluster-Infrastructure, 06Release-Engineering-Team, 06Labs, 10Labs-Infrastructure, 07HHVM: OpenStack flavor for beta cluster deployment servers - https://phabricator.wikimedia.org/T146209#2653840 (10Andrew) 05Open>03Resolved a:03Andrew
[20:47:54] <hashar>	 !log Creating deployment-mira instance with flavor c8.m8.s60 (8 cpu, 8G RAM and 60G disk) T144578
[20:47:57] <qa-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[20:53:19] <hashar>	 !sal
[20:53:19] <wm-bot>	 https://tools.wmflabs.org/sal/releng
[20:54:29] <hashar>	 !log from deployment-tin for T144578, accept ssh host key of deployment-mira :  sudo -u jenkins-deploy -H SSH_AUTH_SOCK=/run/keyholder/proxy.sock ssh mwdeploy@deployment-mira.deployment-prep.eqiad.wmflabs
[20:54:33] <qa-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[21:02:38] <RoanKattouw>	 greg-g: So what's up with the deployment train this week? I see we're still on wmf.18? Is today's train going forward? Are we going to go to 19 or skip it and go straight to 20?
[21:02:58] <RoanKattouw>	 Oh, I now see there was an email to wikitech but not engineering
[21:03:02] <hashar>	 RoanKattouw: on hold due to a performance regression that occured with  wmf.18 :/
[21:03:19] <RoanKattouw>	 Ugh, branching of 20 is paused?
[21:03:23] <RoanKattouw>	 Can we not do that, please?
[21:03:46] <RoanKattouw>	 My week is scheduled around the branch cut happening on Tuesdays, I have patches waiting to be merged post-cut
[21:03:57] <hashar>	 we havent branched wmf.20 afaik
[21:04:02] <hashar>	 since wmf.19 hasnt been deployed
[21:04:05] <RoanKattouw>	 Yeah that's whta bothers me
[21:04:09] <RoanKattouw>	 But I guess it makes some sense
[21:04:18] <hashar>	 and wikidata team is in a similar position
[21:04:20] <RoanKattouw>	 I'll ask on the list
[21:04:34] <shinken-wm>	 PROBLEM - Puppet run on deployment-mira is CRITICAL: CRITICAL: 14.29% of data above the critical threshold [0.0]
[21:04:42] <hashar>	 specially with mediawiki/core going to have a few breaking changes in wmf.20 :(
[21:04:50] <hashar>	 (such as DBFactory that got renamed)
[21:05:02] <hashar>	 yeah better on list or poke Tyler about it
[21:05:37] <hashar>	 Info: /Stage[main]/Redis/Sysctl::Parameters[vm.overcommit_memory]/Sysctl::Conffile[vm.overcommit_memory]/File[/etc/sysctl.d/70-vm-overcommit_memory.conf]: Scheduling refresh of Exec[update_sysctl]
[21:05:47] <hashar>	 sounds scary :D
[21:09:31] <shinken-wm>	 RECOVERY - Puppet run on deployment-mira is OK: OK: Less than 1.00% above the threshold [0.0]
[21:10:43] <greg-g>	 we could cut it today but not do anything with it. We might not even deploy it, we might skip to wmf.21 if we can't resolve the issue in wmf.18/19 (cc thcipriani thoughts?)
[21:11:57] <thcipriani>	 we can cut so folks can continue with their cadence
[21:13:53] <thcipriani>	 what are the consequences of that on the other side? A few more backports?
[21:16:00] <thcipriani>	 may make it even harder to troubleshoot? Since now we'd be juggling 3 branches in limbo: wmf.18 (which has the regression), wmf.19 (which has the regression and is still untested), and wmf.20 (which has the regression and would be completely untested)
[21:17:03] <thcipriani>	 but, it's possible, that by delaying wmf.20 we'd be creating a more unstable branch since it seems like folk hold-off a merge to master until Tuesday afternoons
[21:18:15] <greg-g>	 right, also end of quarter complications :/
[21:18:47] <greg-g>	 I think we should have a policy of no non-urgent fixes deployed during SWATs while we're in a reverted state, as well
[21:18:52] <greg-g>	 gut check? :)
[21:20:33] <shinken-wm>	 PROBLEM - Puppet run on deployment-mira is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[21:20:44] <thcipriani>	 ehm, maybe. Seems like that could help isolate issues, but would likely result in other complications that I can't anticipate, particularly as SWAT is the only way to get *any* change into mw-config
[21:21:10] <greg-g>	 yeah, I'd say those are OK, but backports of code code are not so much
[21:21:10] <chasemp>	 is made more complicated by the freeze next week?
[21:21:17] <greg-g>	 chasemp: yes
[21:21:21] <greg-g>	 sadly
[21:21:24] <chasemp>	 (just confirming I'm following)
[21:21:41] <greg-g>	 I bet a lot of teams were counting on wmf.20 cutting today and going out this week with their end of quarter goals related code :)
[21:21:55] <chasemp>	 sure yeah
[21:22:19] <thcipriani>	 the thing about that is...nothing is actually *more* broken in wmf.20
[21:22:25] <greg-g>	 right
[21:22:41] <greg-g>	 (that we know of) and/or (just related to this one perf issue, probably)
[21:23:03] <thcipriani>	 blug. I'm going to cut the branch now. I'll update the mailing list.
[21:23:47] <thcipriani>	 If we haven't heard anything about performance by train time tomorrow, I'll go to group0 and keep a close eye on things and try to run a shortened schedule from there.
[21:23:51] <thcipriani>	 how does that sound for a plan?
[21:24:16] <greg-g>	 with 19 or 20?
[21:24:34] <thcipriani>	 with .20 and skip wmf.19 entirely
[21:24:39] * greg-g nods
[21:25:08] <greg-g>	 wmf.20 would then have whatever backports are in 19 now (modulu this evening's swat)
[21:25:22] <thcipriani>	 right
[21:25:23] <greg-g>	 s/modulu/modulo/ #effing coffe shop conversations next to me
[21:25:52] <RoanKattouw>	 Thanks, that's exactly what I was hoping for
[21:25:57] <thcipriani>	 ok, I'll cut the branch and then check in with performance folks then update wikitech-l
[21:26:38] <greg-g>	 Approved.
[21:26:44] <thcipriani>	 :D
[21:26:54] <greg-g>	 ;)
[21:27:39] <RoanKattouw>	 Ooh I also approve of the releng offsite coinciding with my team's offsite
[21:27:45] <RoanKattouw>	 No deployment worries during my offsite :)
[21:29:24] <greg-g>	 heh, you're welcome :P
[21:30:33] <greg-g>	 RoanKattouw: second opinions on "no non emergency or simple config chanes during SWAT deploys while we're in a reverted state" (mostly looking for language improvements, I feel we mostly do that now by default)
[21:33:17] <twentyafterfour>	 greg-g: "Only emergency hotfixes during SWAT while production is reverted to a previous branch" ?
[21:33:22] <RoanKattouw>	 Hmm, maybe
[21:33:42] <RoanKattouw>	 I'm not too convinced that simple config changes should be blocked in a reverted state, can you explain why?
[21:34:03] <greg-g>	 oh, bad english
[21:34:04] <RoanKattouw>	 Naive thinking: the reverted state is a stable state, so what's wrong with adding some import sources on the Kazakh Wikipedia
[21:34:07] <greg-g>	 simple configs ok
[21:34:10] <RoanKattouw>	 OK
[21:34:27] <greg-g>	 but not "enable new feature" obvs
[21:34:32] <wmf-insecte>	 Project beta-scap-eqiad build #120896: 04FAILURE in 0.3 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/120896/
[21:34:35] <RoanKattouw>	 OK, so only simple config changes and emergency fixes are allowed in SWAT while revertd?
[21:34:44] <greg-g>	 right
[21:35:01] <RoanKattouw>	 That makes sense to me
[21:35:09] <greg-g>	 word
[21:35:12] * greg-g documents more
[21:35:21] <RoanKattouw>	 Rules like that don't have to be too precise anyway, they can say "when in doubt ask Greg for approval" at the end
[21:35:31] <shinken-wm>	 RECOVERY - Puppet run on deployment-mira is OK: OK: Less than 1.00% above the threshold [0.0]
[21:35:56] <greg-g>	 yup, or "the on-point train conductor"
[21:35:57] <hashar>	 scap failure on beta is me
[21:36:13] * greg-g still can't find suitable adult sized train conductor hats
[21:38:28] <thcipriani>	 ahem. Will just leave this here for greg-g https://www.amazon.com/Mineola-Hat-Store-mhs-eng-Engineer/dp/B000BBS82K/ref=sr_1_1
[21:39:29] <chasemp>	 they make tons of them.  source: my old neighbor was an adult model train guy
[21:39:36] <chasemp>	 those ppl are serious business
[21:39:39] <greg-g>	 engineer != conductor
[21:39:52] <thcipriani>	 https://www.amazon.com/Large-Navy-Blue-Conductor-Hat/dp/B000JGAF3C/ref=pd_sim_193_6
[21:40:10] <greg-g>	 :)
[21:40:31] <greg-g>	 Why didn't I like this one before (I remember seeing it).. I think I started looking for customizable options...
[21:41:19] <greg-g>	 " Only 3 left in stock. " crap
[21:41:22] <thcipriani>	 needs an embroidered scappy the scap pig
[21:41:32] <shinken-wm>	 PROBLEM - Puppet run on deployment-mira is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[21:41:39] <paladox>	 Its on sale in the us but not in the uk LOL
[21:41:50] <bd808>	 thcipriani: that would be so \m/
[21:42:03] <greg-g>	 oh right, the quality, many complaints of quality (for $9 not surprising)
[21:42:08] <paladox>	 $9 = £39
[21:42:25] <wmf-insecte>	 Project beta-scap-eqiad build #120897: 04STILL FAILING in 4 min 45 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/120897/
[21:42:29] <paladox>	 Your technolly right but look at it in the uk ^^
[21:42:34] <greg-g>	 Question:
[21:42:35] <greg-g>	 What size(s) does it come in? I wear 7 3/8.
[21:42:35] <greg-g>	 Answer:
[21:42:35] <greg-g>	 Trying googling and see what you get.l
[21:42:41] <greg-g>	 ugh amazon
[21:43:33] <paladox>	 It would have to be great quality here otherwise for that price no one would buy it here
[21:43:54] <paladox>	 it looks like lots of people buy it here but wont in the us so there probaly doing a sale :)
[21:44:01] <chasemp>	 Interesting question, why do grown men all want to be train engineers and not conductors? One for the ages.
[21:44:07] <greg-g>	 it's not, the reviews are pretty clear on the quality
[21:44:31] <wmf-insecte>	 Project beta-scap-eqiad build #120898: 04STILL FAILING in 0.33 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/120898/
[21:45:03] <hashar>	 race condition
[21:45:37] <thcipriani>	 what's happening?
[21:45:42] <paladox>	 But if i order from america here it is £56
[21:45:51] <paladox>	 57 i mean
[21:45:52] <paladox>	 https://www.amazon.co.uk/gp/offer-listing/B000JGAF3C/ref=dp_olp_new_mbc?ie=UTF8&condition=new
[21:46:14] <paladox>	 Plus delivery
[21:47:49] <hashar>	 thcipriani: for info, I got a new deployment-mira host (that is like the 3rd or 4th)  this time 
[21:47:57] <hashar>	 has 8cpu and 40G on /srv :]
[21:48:03] <thcipriani>	 :)
[21:48:07] <thcipriani>	 that ought to do it
[21:48:22] <wmf-insecte>	 Yippee, build fixed!
[21:48:23] <wmf-insecte>	 Project beta-scap-eqiad build #120899: 09FIXED in 1 min 42 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/120899/
[21:48:42] <hashar>	 ah evening sprint completed
[21:49:07] <hashar>	 !log Deleting deployment-mira02 /srv was too small. Replaced by deployment-mira
[21:49:10] <qa-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[21:51:59] <shinken-wm>	 PROBLEM - Host deployment-mira02 is DOWN: CRITICAL - Host Unreachable (10.68.19.67)
[21:53:38] <wikibugs>	 06Release-Engineering-Team, 06Operations, 07HHVM, 13Patch-For-Review: Migrate deployment servers (tin/mira) to jessie - https://phabricator.wikimedia.org/T144578#2654077 (10hashar) I did a sprint tonight:  * Got a new flavor in openstack with larger disk T146209, huge thanks to Andrew to have created it up...
[21:54:48] <hashar>	 sprint complete :]
[21:56:10] <greg-g>	 g'night :)
[21:57:32] <hashar>	 I also wonder whether deployment-db1 and deployment-db2 can be shutdown (not deleted)
[21:57:37] <hashar>	 seems the migration is complete
[22:01:32] <shinken-wm>	 RECOVERY - Puppet run on deployment-mira is OK: OK: Less than 1.00% above the threshold [0.0]
[22:09:57] <hashar>	 ok deployment-mira good for service as far as I can tell
[22:10:01] <hashar>	 bed crash \o/
[22:16:23] <wikibugs>	 06Release-Engineering-Team (Deployment-Blockers), 05Release: MW-1.28.0-wmf.18 deployment blockers - https://phabricator.wikimedia.org/T142855#2654227 (10thcipriani) 05Open>03Resolved `1.28.0-wmf.18` is deployed and live everywhere.
[22:19:17] <wikibugs>	 06Release-Engineering-Team, 10DBA, 10MediaWiki-Maintenance-scripts, 06Operations, and 2 others: Add section for long-running tasks on the Deployment page (specially for database maintenance) - https://phabricator.wikimedia.org/T144661#2654239 (10greg) For the task at hand, I've added https://wikitech.wikim...
[22:20:20] <wikibugs>	 06Release-Engineering-Team (Deployment-Blockers), 05Release: MW-1.28.0-wmf.20 deployment blockers - https://phabricator.wikimedia.org/T144644#2654240 (10thcipriani) I just cut the `1.28.0-wmf.20` branch.  The tentative plan is simply to skip the roll out of `1.28.0-wmf.19` and sync `wmf.20` to group0 wikis tom...
[22:22:12] <wikibugs>	 06Release-Engineering-Team (Deployment-Blockers), 05Release: MW-1.28.0-wmf.20 deployment blockers - https://phabricator.wikimedia.org/T144644#2654246 (10thcipriani)
[22:24:03] <wikibugs>	 06Release-Engineering-Team (Deployment-Blockers), 13Patch-For-Review, 05Release: MW-1.28.0-wmf.19 deployment blockers - https://phabricator.wikimedia.org/T143328#2654251 (10thcipriani) 05Open>03Invalid Closing this task since the tentative plan is to skip the roll out of wmf.19 entirely.  Any discussion...
[22:30:27] <wikibugs>	 06Release-Engineering-Team, 10DBA, 10MediaWiki-Maintenance-scripts, 06Operations, and 2 others: Add section for long-running tasks on the Deployment page (specially for database maintenance) - https://phabricator.wikimedia.org/T144661#2654312 (10greg) Ok, emailed. Resolving.  Thanks @jcrespo for the sugges...
[22:30:35] <wikibugs>	 06Release-Engineering-Team, 10DBA, 10MediaWiki-Maintenance-scripts, 06Operations, and 2 others: Add section for long-running tasks on the Deployment page (specially for database maintenance) - https://phabricator.wikimedia.org/T144661#2654313 (10greg) a:03greg
[22:31:13] <wikibugs>	 06Release-Engineering-Team, 10DBA, 10MediaWiki-Maintenance-scripts, 06Operations, and 2 others: Add section for long-running tasks on the Deployment page (specially for database maintenance) - https://phabricator.wikimedia.org/T144661#2606542 (10greg) 05Open>03Resolved p:05Triage>03Normal
[22:33:23] <legoktm>	 thcipriani: Could I have https://phabricator.wikimedia.org/T143328#2627818 apply to wmf.20 too? cherry-pick is https://gerrit.wikimedia.org/r/#/c/311851/
[22:34:23] <wikibugs>	 10Beta-Cluster-Infrastructure, 06Release-Engineering-Team, 06Labs, 10Labs-Infrastructure, 07HHVM: OpenStack flavor for beta cluster deployment servers - https://phabricator.wikimedia.org/T146209#2653709 (10greg) (This isn't a quota increase request, if that was a mis-aligned upstream task addition ;) ).
[22:35:14] <thcipriani>	 legoktm: yup, sure. I won't do any actual checkout of wmf.20 code on tin until tomorrow.
[22:37:46] <wikibugs>	 10Deployment-Systems, 03Scap3: Purge the hhvm fcgi and cli bytecache as part of deployment - https://phabricator.wikimedia.org/T146226#2654338 (10thcipriani)
[22:37:49] <legoktm>	 thanks :)
[22:38:17] <wikibugs>	 10Deployment-Systems, 03Scap3: Purge the hhvm fcgi and cli bytecache as part of deployment - https://phabricator.wikimedia.org/T146226#2654357 (10thcipriani) p:05Triage>03Low
[22:38:27] <MaxSem>	 greg-g, so if I wanted to run a script that takes 3 weeks to complete, no deployments that time? :P
[22:38:58] <greg-g>	 MaxSem: I'll clarify that, but no, other deploys can happen at the same time :)
[22:39:32] <MaxSem>	 it can't go into the same table then
[22:43:04] <greg-g>	 yeah it can
[22:43:44] <greg-g>	 see https://wikitech.wikimedia.org/wiki/Deployments/Archive/2016/09#deploycal-item-20160912T1600
[22:59:51] <wikibugs>	 10Beta-Cluster-Infrastructure, 06Release-Engineering-Team, 06Labs, 10Labs-Infrastructure, 07HHVM: OpenStack flavor for beta cluster deployment servers - https://phabricator.wikimedia.org/T146209#2654466 (10AlexMonk-WMF)
[23:15:49] <wikibugs>	 10Beta-Cluster-Infrastructure, 06Release-Engineering-Team, 10DBA, 13Patch-For-Review, 07WorkType-Maintenance: Upgrade mariadb in deployment-prep from Precise/MariaDB 5.5 to Jessie/MariaDB 5.10 - https://phabricator.wikimedia.org/T138778#2654510 (10dduvall) The migration to `deployment-db03` and `deployme...
[23:21:35] <wikibugs>	 06Release-Engineering-Team (Long-Lived-Branches), 03Scap3 (Scap3-MediaWiki-MVP), 13Patch-For-Review: Create `scap swat` command to automate patch merging & testing during a swat deployment - https://phabricator.wikimedia.org/T142880#2654515 (10Krinkle)
[23:22:23] <wikibugs>	 10Beta-Cluster-Infrastructure, 06Release-Engineering-Team, 10DBA, 13Patch-For-Review, 07WorkType-Maintenance: Upgrade mariadb in deployment-prep from Precise/MariaDB 5.5 to Jessie/MariaDB 5.10 - https://phabricator.wikimedia.org/T138778#2654516 (10dduvall)