[01:17:02] <grrrit-wm>	 (03CR) 10Krinkle: "The wpt jobs don't use a matrix and don't have a parent." [integration/config] - 10https://gerrit.wikimedia.org/r/308579 (owner: 10Niedzielski)
[01:46:19] <legoktm>	 Krinkle: how much do we care about older branches?
[01:49:40] <Krinkle>	 legoktm: I removed the exclusion, so that should work gracefully by just being a bit slower.
[01:51:00] <legoktm>	 right. I should've said the performance of* older branches
[01:51:04] <legoktm>	 But I think that's fine
[02:33:27] <Krinkle>	 Deploying..
[02:34:27] <grrrit-wm>	 (03CR) 10Krinkle: [C: 032] "Recompiled and deployed 4 jobs:" [integration/config] - 10https://gerrit.wikimedia.org/r/310701 (owner: 10Krinkle)
[02:35:28] <grrrit-wm>	 (03Merged) 10jenkins-bot: mediawiki: Merge parsertests job back into main phpunit job [integration/config] - 10https://gerrit.wikimedia.org/r/310701 (owner: 10Krinkle)
[02:36:29] <Krinkle>	 !log Reloading Zuul to deploy https://gerrit.wikimedia.org/r/310701 
[02:36:34] <qa-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[02:37:22] <grrrit-wm>	 (03CR) 10Krinkle: "Deleted 4 dereferenced jobs:" [integration/config] - 10https://gerrit.wikimedia.org/r/310701 (owner: 10Krinkle)
[03:27:34] <wikibugs>	 06Release-Engineering-Team (Deployment-Blockers), 13Patch-For-Review, 05Release: MW-1.28.0-wmf.19 deployment blockers - https://phabricator.wikimedia.org/T143328#2639138 (10Krinkle)
[04:18:22] <wmf-insecte>	 Yippee, build fixed!
[04:18:22] <wmf-insecte>	 Project selenium-MultimediaViewer » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #142: 09FIXED in 22 min: https://integration.wikimedia.org/ci/job/selenium-MultimediaViewer/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/142/
[04:44:44] <shinken-wm>	 PROBLEM - Puppet run on deployment-mathoid is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[05:24:42] <shinken-wm>	 RECOVERY - Puppet run on deployment-mathoid is OK: OK: Less than 1.00% above the threshold [0.0]
[05:54:03] <wmf-insecte>	 Yippee, build fixed!
[05:54:04] <wmf-insecte>	 Project performance-webpagetest-wmf build #2574: 09FIXED in 2 hr 11 min: https://integration.wikimedia.org/ci/job/performance-webpagetest-wmf/2574/
[06:43:02] <shinken-wm>	 PROBLEM - Puppet run on integration-slave-jessie-1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[06:46:50] <shinken-wm>	 PROBLEM - Puppet run on integration-slave-jessie-1002 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[07:00:44] <shinken-wm>	 PROBLEM - Puppet run on integration-slave-jessie-1003 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[07:23:02] <shinken-wm>	 RECOVERY - Puppet run on integration-slave-jessie-1001 is OK: OK: Less than 1.00% above the threshold [0.0]
[07:26:51] <shinken-wm>	 RECOVERY - Puppet run on integration-slave-jessie-1002 is OK: OK: Less than 1.00% above the threshold [0.0]
[07:35:44] <shinken-wm>	 RECOVERY - Puppet run on integration-slave-jessie-1003 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:04:10] <wikibugs>	 10Continuous-Integration-Config, 10Continuous-Integration-Infrastructure, 10Differential, 07Jenkins: Add support for a wmf-ci.yaml type file for wikimedia jenkins - https://phabricator.wikimedia.org/T145669#2639368 (10hashar) Would be nice to do a proof of concept based on https://github.com/travis-ci/trav...
[08:23:16] <wikibugs>	 10Beta-Cluster-Infrastructure, 06Labs: Please raise quota for deployment-prep - https://phabricator.wikimedia.org/T145611#2639500 (10elukey)
[08:23:18] <wikibugs>	 10Beta-Cluster-Infrastructure, 06Operations, 07HHVM, 13Patch-For-Review: Move the MW Beta appservers to Debian - https://phabricator.wikimedia.org/T144006#2639499 (10elukey)
[08:23:30] <wikibugs>	 10Beta-Cluster-Infrastructure, 06Operations, 07HHVM, 13Patch-For-Review: Move the MW Beta appservers to Debian - https://phabricator.wikimedia.org/T144006#2586022 (10elukey) The task is currently blocked by T145611
[08:23:47] <wikibugs>	 10Beta-Cluster-Infrastructure, 06Operations, 07HHVM, 13Patch-For-Review: Move the MW Beta appservers to Debian - https://phabricator.wikimedia.org/T144006#2639519 (10elukey) a:03elukey
[08:25:11] <hashar>	 :(
[08:25:20] <hashar>	 moritzm: did you get mira02 on beta figured out ?
[08:25:31] <hashar>	 last time I checked it had some troubles related to trebuchet
[08:26:28] <elukey>	 hashar: this morning I was thinking that we could simply proceed with mw03 and 02, removing them from the puppet conf, deleting them and creating 05/06
[08:26:46] <moritzm>	 I asked for someone from releng to check out it's status on T144578, there's a minor bug in the keyholder script, which I'll fix later, but puppet runs etc. seem fine
[08:27:05] <hashar>	 moritzm: awesome
[08:27:09] <moritzm>	 I'm not very familiar with the deployment process, so someone from releng shoulld have a look instead
[08:27:22] <moritzm>	 it looks good, but it's possibly I'm missing something .:-)
[08:27:34] <hashar>	 elukey: yeah definitely.  The mw04 has been proven to work fine with the delta of the curl / update-ca-certificate issue from yesterday.   Fixed by force running update-ca-certificates --fresh
[08:27:47] <hashar>	 moritzm: has it been added in the dsh group files for scap to use it as a co master?  
[08:28:14] <moritzm>	 hashar: no. see, that's one of the issues I was referring to :-)
[08:28:35] <hashar>	 a scap run on deployment-tin is supposed to rsync to the other co masters eg  mira / mira02.  But I am not sure where that list is maintained, I am assuming via a dsh file
[08:29:11] <hashar>	 elukey: I guess depool  mw02,  run puppet on varnish cache + deployment-tin
[08:29:22] <hashar>	 elukey: then delete mw02 which free up quota for a new mw05 Jessie based instance
[08:29:53] <hashar>	 moritzm: great thanks :}
[08:31:04] <elukey>	 hashar: would it be ok for you or do you think that I'll cause some issue?
[08:31:30] <elukey>	 I am not super familiar with deployment-pre
[08:31:39] <wikibugs>	 06Release-Engineering-Team, 06Operations, 07HHVM: Migrate deployment servers (tin/mira) to jessie - https://phabricator.wikimedia.org/T144578#2639586 (10hashar) @mmodell @thcipriani @demon @dduvall  can you check mira02 on beta is all fine ?  I dont feel confident double checking that is working properly.  A...
[08:31:50] <hashar>	 moritzm: added rest of team to CC of that bug. I dont feel confident testing it :}
[08:32:15] <hashar>	 elukey: yeah it would be fine
[08:32:24] <hashar>	 pretty sure the load at this time can be handled by a single server
[08:32:42] <hashar>	 mw03 is for the security audits,  not sure it receives traffic for anything else
[08:32:59] <hashar>	 i think it is kind of a hack where when a given HTTP Header is set, Varnish route to mw03
[08:33:08] <hashar>	 but rest is routed to mw02 / mw04 
[08:33:53] <elukey>	 ah so maybe I can start with 03
[08:33:57] <elukey>	 and route the traffic to 02
[08:39:30] <elukey>	 hashar: https://gerrit.wikimedia.org/r/#/c/310749/1
[08:41:37] <hashar>	 elukey: restroom brb
[08:43:15] <hashar>	 looking
[08:44:30] <hashar>	 elukey: looks great lets do that and delete mw03
[08:44:48] <moritzm>	 hashar: for the dsh groups: https://gerrit.wikimedia.org/r/310752 
[08:45:29] <hashar>	 moritzm: that looks like it is magic
[08:45:40] <hashar>	 I suspect the initial sync to co master will fail due to unknown ssh key
[08:45:44] <hashar>	 but that is fixable manually 
[08:45:54] <hashar>	 (beta puppet does not collect ssh host keys centrally)
[08:45:56] <elukey>	 !log removed mediawiki03 from puppet with https://gerrit.wikimedia.org/r/#/c/310749/
[08:45:59] <qa-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[08:46:19] <hashar>	 rebasing puppet.git
[08:47:19] <elukey>	 just merged,
[08:48:06] <hashar>	 running puppet on deployment-tin and the text cache
[08:49:33] <hashar>	 backend changed \O/
[08:49:58] <hashar>	 elukey: guess you can delete mw03 and build a new jessie host :}
[08:50:08] <elukey>	 okkk!
[08:52:08] <shinken-wm>	 PROBLEM - Puppet run on deployment-tin is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[08:52:20] <elukey>	 !log terminated mediawiki03 and created mediawiki06
[08:52:21] <hashar>	 false alarm
[08:52:25] <qa-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[08:53:00] <shinken-wm>	 PROBLEM - Puppet run on deployment-puppetmaster is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[08:53:28] <shinken-wm>	 PROBLEM - Puppet run on deployment-cache-text04 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [0.0]
[08:55:04] <shinken-wm>	 PROBLEM - Host deployment-mediawiki03 is DOWN: CRITICAL - Host Unreachable (10.68.17.55)
[08:55:37] <icinga-wm>	 PROBLEM - puppet last run on gallium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:56:14] <hashar>	 bah
[08:56:54] <elukey>	 running puppet on mw06, looks good (I had to force a key regeneration as I did with 04)
[08:58:09] <icinga-wm>	 RECOVERY - puppet last run on gallium is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[08:58:29] <shinken-wm>	 RECOVERY - Puppet run on deployment-cache-text04 is OK: OK: Less than 1.00% above the threshold [0.0]
[09:02:04] <hashar>	 rushing for video chat with zeljkof  then I have to meet someone outside
[09:02:07] <shinken-wm>	 RECOVERY - Puppet run on deployment-tin is OK: OK: Less than 1.00% above the threshold [0.0]
[09:02:10] <hashar>	 so be back in like an hour and a half
[09:02:59] <shinken-wm>	 RECOVERY - Puppet run on deployment-puppetmaster is OK: OK: Less than 1.00% above the threshold [0.0]
[09:05:22] <shinken-wm>	 PROBLEM - Puppet run on deployment-mediawiki06 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [0.0]
[09:08:19] <shinken-wm>	 RECOVERY - Puppet run on deployment-puppetmaster is OK: OK: Less than 1.00% above the threshold [0.0]
[09:09:56] <elukey>	 hashar: I might have messed up with the deployment-puppet master
[09:10:23] <shinken-wm>	 RECOVERY - Puppet run on deployment-mediawiki06 is OK: OK: Less than 1.00% above the threshold [0.0]
[09:10:25] <elukey>	 !log executed git pull and then git rebase -i on deployment puppet master
[09:10:29] <qa-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[09:11:01] <elukey>	 so I logged what I have done, I am stopping since the git log looks different now
[09:11:25] <elukey>	 I followed what written in the task and I noticed that I had create a merge commit after my git pull
[09:11:33] <elukey>	 then I tried to inspect with rebase -i
[09:11:37] <elukey>	 and log changed
[09:11:42] <elukey>	 sorry if I caused issues
[09:16:16] <hashar>	 ah
[09:16:48] <hashar>	 elukey: looks good to me
[09:18:05] <hashar>	 git reflog  sometime has good infos
[09:18:40] <elukey>	 okok I thought I messed it up
[09:18:44] <elukey>	 good :)
[09:18:57] <elukey>	 I configured in hiera/wikitech 06, running puppet
[09:19:24] <hashar>	 \O/
[09:19:55] <elukey>	 wouldn't it be better to run git pull --rebase instead of git pull in /var/lib/git/operations/puppet ?
[09:20:04] <elukey>	 (asking for curiosity/ignorance)
[09:20:07] <elukey>	 to avoid the merge commit
[09:20:58] <hashar>	 yes
[09:21:00] <hashar>	 that is what we do
[09:21:04] <hashar>	 git fetch && git rebase
[09:21:07] <hashar>	 or git pull --rebase
[09:21:15] <hashar>	 they are slightly different, but really it does not matter
[09:21:31] <hashar>	 in 99,9999% use case both can be used with identical output
[09:21:44] <hashar>	 there is also a cron that auto rebase + tag the new head   every x minutes
[09:21:52] <elukey>	 ah ok because I saw git pull in the commands listed in the phab task
[09:21:58] <elukey>	 ahhh okok
[09:26:24] <wmf-insecte>	 Project beta-scap-eqiad build #120125: 04FAILURE in 1 min 48 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/120125/
[09:29:11] <hashar>	 deployment-mediawiki06.deployment-prep.eqiad.wmflabs returned [255]: Host key verification failed.
[09:29:16] <hashar>	 known issue :p
[09:33:36] <hashar>	 !log T144006 sudo -u jenkins-deploy -H SSH_AUTH_SOCK=/run/keyholder/proxy.sock ssh mwdeploy@deployment-mediawiki06.deployment-prep.eqiad.wmflabs
[09:33:40] <qa-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[09:38:04] <shinken-wm>	 PROBLEM - App Server Main HTTP Response on deployment-mediawiki06 is CRITICAL: HTTP CRITICAL: HTTP/1.1 404 Not Found - string 'Wikipedia' not found on 'http://en.wikipedia.beta.wmflabs.org:80/wiki/Main_Page?debug=true' - 392 bytes in 0.002 second response time
[09:38:37] <wmf-insecte>	 Project beta-scap-eqiad build #120126: 04STILL FAILING in 4 min 4 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/120126/
[09:39:24] <hashar>	 rsync fails
[09:39:40] <hashar>	 race condition of scap doing rsync  vs  puppet doing a preliminary rsync as well
[09:40:36] <hashar>	 moritzm: and I found the reason deployment servers on beta have 8 CPUS:  09:40:08 09:40:08 Updating LocalisationCache for master using 6 thread(s)
[09:40:52] <hashar>	 the l10ncache update uses n-2 CPU threads to rebuild the cdb files
[09:41:11] <hashar>	 so the co master mira02 would only have 2 threads. But maybe it is not much of an issue
[09:41:29] <hashar>	 maybe the cdb files are just rsynced and not rebuild
[09:41:29] <wmf-insecte>	 Yippee, build fixed!
[09:41:30] <wmf-insecte>	 Project beta-scap-eqiad build #120127: 09FIXED in 1 min 44 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/120127/
[09:41:37] <hashar>	 elukey: scap works with mw06 :}
[09:43:10] <shinken-wm>	 RECOVERY - App Server Main HTTP Response on deployment-mediawiki06 is OK: HTTP OK: HTTP/1.1 200 OK - 45256 bytes in 6.537 second response time
[09:45:06] <moritzm>	 hashar: ok, if that should really be a performance bottleneck (which I find hard to believe :-) the deployment-tin reimage can use 8 CPUs again
[09:46:25] <shinken-wm>	 PROBLEM - Puppet run on deployment-mediawiki06 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[09:46:25] <wikibugs>	 10Continuous-Integration-Infrastructure, 06Release-Engineering-Team, 10MediaWiki-Unit-tests, 10MediaWiki-extensions-WikibaseClient, and 4 others: Job mediawiki-extensions-php55 frequently fails due to "Segmentation fault" - https://phabricator.wikimedia.org/T142158#2639929 (10hashar) And here is a night wo...
[09:48:29] <shinken-wm>	 PROBLEM - Puppet run on mira is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[09:48:45] <wikibugs>	 10Continuous-Integration-Config, 10Continuous-Integration-Infrastructure, 10Differential, 07Jenkins: Add support for a wmf-ci.yaml type file for wikimedia jenkins - https://phabricator.wikimedia.org/T145669#2639931 (10Paladox) Also theres https://github.com/travis-ci/travis-ci/blob/master/README.markdown w...
[09:52:59] <shinken-wm>	 PROBLEM - Puppet run on deployment-eventlogging03 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[09:54:25] <elukey>	 hashar: puppet completed with mw stuff, all good!
[09:56:21] <shinken-wm>	 RECOVERY - Puppet run on deployment-mediawiki06 is OK: OK: Less than 1.00% above the threshold [0.0]
[09:59:03] <wikibugs>	 10Continuous-Integration-Infrastructure, 06Release-Engineering-Team, 10MediaWiki-Unit-tests, 10MediaWiki-extensions-WikibaseClient, and 4 others: Job mediawiki-extensions-php55 frequently fails due to "Segmentation fault" - https://phabricator.wikimedia.org/T142158#2639937 (10hashar) I have added a gdb scr...
[10:12:22] <elukey>	 hasharAway: https://gerrit.wikimedia.org/r/#/c/310773
[10:12:32] <wikibugs>	 10Continuous-Integration-Config, 10MediaWiki-extensions-JsonConfig, 10MediaWiki-extensions-ZeroBanner, 06Reading-Web-Backlog, and 3 others: Zero phpunit test failure (blocks merges to MobileFrontend) - https://phabricator.wikimedia.org/T145227#2639976 (10phuedx) @Yurik: Thanks for the explanation.  {096a80...
[10:12:33] <elukey>	 ah I missed the away :D
[10:14:59] <hashar>	 backkkk
[10:15:00] <hashar>	 elukey:  :)
[10:15:19] <hashar>	 elukey: looks great :)
[10:19:40] <wikibugs>	 10Continuous-Integration-Config, 10MediaWiki-extensions-JsonConfig, 10MediaWiki-extensions-ZeroBanner, 06Reading-Web-Backlog, and 3 others: Zero phpunit test failure (blocks merges to MobileFrontend) - https://phabricator.wikimedia.org/T145227#2640015 (10phuedx) >>! In T145227#2635626, @Yurik wrote: > P.S....
[10:21:04] <elukey>	 merged!
[10:22:09] * elukey now proceeds with mediawiki02
[10:22:45] <elukey>	 hashar: question - do you guys prefer to have ubuntu and debian until we have finished the prod migration?
[10:22:51] <elukey>	 or is it not worth it?
[10:25:19] <wikibugs>	 10Continuous-Integration-Infrastructure, 06Release-Engineering-Team, 10MediaWiki-Unit-tests, 10MediaWiki-extensions-WikibaseClient, and 4 others: Job mediawiki-extensions-php55 frequently fails due to "Segmentation fault" - https://phabricator.wikimedia.org/T142158#2640040 (10hashar) And the PHP aware stac...
[10:26:21] <hashar>	 elukey: as production is moving out of trusty
[10:26:27] <hashar>	 I dont think we need trusty instances any more
[10:26:38] <hashar>	 though in theory it might be useful to have trusty on beta
[10:27:00] <hashar>	 for the web servers, lets move to Jessie for sure.  Seems prod migration is pretty much complete
[10:27:58] <elukey>	 okok
[10:28:28] <shinken-wm>	 RECOVERY - Puppet run on mira is OK: OK: Less than 1.00% above the threshold [0.0]
[10:32:58] <shinken-wm>	 RECOVERY - Puppet run on deployment-eventlogging03 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:34:34] <elukey>	 hashar: there you go https://gerrit.wikimedia.org/r/#/c/310796/2
[10:39:46] <wikibugs>	 10Continuous-Integration-Infrastructure, 06Release-Engineering-Team, 10MediaWiki-Unit-tests, 10MediaWiki-extensions-WikibaseClient, and 4 others: Job mediawiki-extensions-php55 frequently fails due to "Segmentation fault" - https://phabricator.wikimedia.org/T142158#2640093 (10hashar) Looks like it is whate...
[10:40:22] <hashar>	 elukey: sorry been playing with gdb
[10:41:07] <elukey>	 nice! when you find good stuff feel free to share :)
[10:41:27] <hashar>	 elukey: yeah looks good
[10:41:35] <hashar>	 I am not sure why torrus references beta cluster
[10:41:45] <hashar>	 looks like it is merely for some rspec tests anyway
[10:46:05] <elukey>	 yeah
[10:48:17] <hashar>	 !log beta: cherry picking moritzm patch https://gerrit.wikimedia.org/r/#/c/310793/ "Also handle systemd in keyholder script" T144578
[10:48:20] <qa-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[10:49:38] <hashar>	 ran on deployment-tin and mira
[11:18:40] <elukey>	 I'll nuke mw02 after lunch!
[11:25:34] <shinken-wm>	 PROBLEM - Puppet run on deployment-mediawiki02 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[11:36:05] <wikibugs>	 10Browser-Tests-Infrastructure, 10Continuous-Integration-Config, 10CirrusSearch, 06Discovery-Search: Make browsertests for CirrusSearch run on every submitted patch with proper CI infrastructure rather than a bot - https://phabricator.wikimedia.org/T98374#2640235 (10zeljkofilipin)
[11:44:59] <elukey>	 hashar: merged var uuid = require('cassandra-uuid');
[11:45:02] <elukey>	 argh
[11:45:24] <elukey>	 I wanted to paste the link of the mw02 removal CR
[11:45:26] <elukey>	 anyhow
[11:45:27] <elukey>	 :)
[12:00:33] <shinken-wm>	 RECOVERY - Puppet run on deployment-mediawiki02 is OK: OK: Less than 1.00% above the threshold [0.0]
[12:09:53] <hashar>	 elukey: do nuke mw02 :}
[12:10:17] <hashar>	 and make sure mw06 is on deployment-cache-text04.deployment-prep.eqiad.wmflabs  :D
[12:10:17] <elukey>	 go go go
[12:10:47] <elukey>	 hashar: I don't see the commit in the puppet master though
[12:10:49] <elukey>	 mmmm
[12:10:52] <hashar>	 ah
[12:10:55] <elukey>	 should I git pull --rebase?
[12:11:03] <hashar>	 I cherry picked a patch for moritz related to the keyholder
[12:11:21] <hashar>	 or maybe you got it merged in puppet.git and rebase got rid of it ?
[12:11:22] <hashar>	 (there is an auto rebase)
[12:12:04] <hashar>	 elukey: maybe I have dropped your cherry pick by mistake :(
[12:12:20] <elukey>	 hashar: a git pull --rebase could help right?
[12:17:14] <wikibugs>	 06Release-Engineering-Team, 07Wikimedia-log-errors: Error: Couldn't find trailer dictionary - https://phabricator.wikimedia.org/T145772#2640326 (10hashar)
[12:18:34] <wikibugs>	 06Release-Engineering-Team, 07Wikimedia-log-errors: Error: Couldn't find trailer dictionary - https://phabricator.wikimedia.org/T145772#2640338 (10hashar)
[12:19:12] <elukey>	 ah no just inspected the repo on the puppet master, all updated
[12:19:28] <hashar>	 elukey: or cherry pick again :}
[12:20:10] <elukey>	 and I can see the change in deployment-cache-text04.deployment-prep.eqiad.wmflabs
[12:20:56] <elukey>	 !log terminate mediawiki02 to create mediawiki05
[12:20:59] <qa-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[12:23:59] <shinken-wm>	 PROBLEM - Host deployment-mediawiki02 is DOWN: CRITICAL - Host Unreachable (10.68.16.127)
[12:28:55] <wikibugs>	 10MediaWiki-Codesniffer, 13Patch-For-Review: Position of boolean operators inside an if condition - https://phabricator.wikimedia.org/T116561#2640346 (10Aashaka) Any updates/reviews on this?
[12:33:46] <elukey>	 !log added base::firewall, beta::deployaccess, mediawiki::conftool, role::mediawiki::appserver to mediawiki05
[12:33:50] <qa-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[12:34:20] <wikibugs>	 10MediaWiki-Codesniffer, 13Patch-For-Review: Position of boolean operators inside an if condition - https://phabricator.wikimedia.org/T116561#2640351 (10Aklapper) https://gerrit.wikimedia.org/r/#/c/279615/ needs a rebase (says "Cannot Merge")...
[12:35:29] <shinken-wm>	 PROBLEM - Puppet run on deployment-mediawiki05 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[12:37:07] <shinken-wm>	 PROBLEM - App Server Main HTTP Response on deployment-mediawiki05 is CRITICAL: Connection refused
[12:41:44] <elukey>	 puppet is running :)
[12:44:28] <shinken-wm>	 PROBLEM - Puppet run on deployment-cache-text04 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[12:45:26] <shinken-wm>	 RECOVERY - Puppet run on deployment-mediawiki05 is OK: OK: Less than 1.00% above the threshold [0.0]
[12:54:30] <hashar>	 awesome elukey 
[12:59:16] <wikibugs>	 06Release-Engineering-Team, 06Operations, 07HHVM: Migrate deployment servers (tin/mira) to jessie - https://phabricator.wikimedia.org/T144578#2640395 (10hashar) >>! In T144578#2637243, @AlexMonk-WMF wrote: >>>! In T144578#2637063, @MoritzMuehlenhoff wrote: >> I've added a new deployment server mira02 >  > I...
[13:01:26] <shinken-wm>	 PROBLEM - Puppet run on deployment-mediawiki05 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[13:02:14] <shinken-wm>	 RECOVERY - App Server Main HTTP Response on deployment-mediawiki05 is OK: HTTP OK: HTTP/1.1 200 OK - 45234 bytes in 7.122 second response time
[13:06:45] <elukey>	 puppet completed, going to send the other code reviews
[13:11:26] <shinken-wm>	 RECOVERY - Puppet run on deployment-mediawiki05 is OK: OK: Less than 1.00% above the threshold [0.0]
[13:20:11] <elukey>	 hashar: https://gerrit.wikimedia.org/r/#/c/310818/1
[13:20:12] <elukey>	 :)
[13:24:29] <shinken-wm>	 RECOVERY - Puppet run on deployment-cache-text04 is OK: OK: Less than 1.00% above the threshold [0.0]
[13:27:35] <hashar>	 elukey: why are you dropping mw06 ?
[13:27:42] <hashar>	 oh that is the security audit box
[13:27:42] <hashar>	  :D
[13:28:22] <hashar>	 elukey: looks good \O/
[13:29:46] <elukey>	 I tried to keep the previous mw01/02/03 config
[13:30:07] <hashar>	 yeah sounds good
[13:34:23] <elukey>	 hashar: good to rebase :)
[13:35:25] <elukey>	 so now if this goes fine we have left a jobrunner and a script runner (that may wait)
[13:35:33] <wikibugs>	 10Beta-Cluster-Infrastructure, 07Puppet: Puppet runs fails randomly on deployment-prep / beta cluster hosts - https://phabricator.wikimedia.org/T145631#2640466 (10hashar) Did a grep of `error` on all /var/log/puppet.log `root@deployment-salt02:~# salt -v '*' cmd.run 'grep -i error /var/log/puppet.log'`   **dep...
[13:35:40] <elukey>	 the jobrunner is single so I can't reall swap it
[13:36:36] <elukey>	 ah no sorry not script server, the video scaler
[13:39:50] <hashar>	 the jobrunner that might be just fine as well
[13:39:54] <hashar>	 the tmh01 box I am not sure
[13:40:06] <hashar>	 moving to Jessie is non trivial due the .deb packages installed there
[13:40:16] <hashar>	 such as ffmpeg  and whatever multimedia related packages there is there
[13:40:33] <hashar>	 then, it is probably a good smoke test for production switch
[13:45:29] <shinken-wm>	 PROBLEM - Puppet run on deployment-cache-text04 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[13:46:43] <Krenair>	 ^ me
[13:50:04] <hashar>	 the mw web servers are being switched to jessie :}
[13:50:56] <hashar>	 elukey: I guess merge as needed
[13:51:46] <elukey>	 already done :)
[14:00:29] <shinken-wm>	 PROBLEM - Puppet run on deployment-sca02 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[14:02:11] <hashar>	 elukey: if you get some spare time, I could use a bump of the Zuul .deb packages .  I have forgot to rebase a patch and some commands have a wrong shebang
[14:02:17] <hashar>	 merely a noop :}
[14:02:56] <elukey>	 sure, can you open a phab task?
[14:03:43] <hashar>	 elukey: https://phabricator.wikimedia.org/T103529#2632489  with links to people.wm.o :)
[14:03:53] <hashar>	 one for precise-wikimedia , other is for jessie-wikimedia
[14:04:03] <hashar>	 I will upgrade it on gallium / precise
[14:04:13] <hashar>	 scandium / Jessie would need your assistance (I lack root there)
[14:04:21] <elukey>	 sure
[14:05:07] <hashar>	 \O/
[14:08:43] <shinken-wm>	 PROBLEM - Puppet run on deployment-cache-upload04 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[14:11:51] <Krenair>	 ^ probably also me
[14:15:30] <shinken-wm>	 RECOVERY - Puppet run on deployment-cache-text04 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:22:19] <hashar>	 Krenair: there is some varnish local hack on the puppet master I guess that is you ?
[14:22:26] <Krenair>	 yes
[14:22:28] <hashar>	 going to stash them, rebase  and reapply your hacks
[14:22:33] <Krenair>	 ok
[14:24:56] <hashar>	 ran puppet on deployment-tin  that adds deployment-mediawiki05   to dsh group mediawiki-installation 
[14:26:17] <wmf-insecte>	 Project beta-scap-eqiad build #120156: 04FAILURE in 1 min 42 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/120156/
[14:28:43] <Krenair>	 hashar, done?
[14:29:20] <elukey>	 hello people, I'd need some info about scap and multiple branches on tin
[14:29:50] <hashar>	 for god sake [[Hiera:Deployment-prep]] :(
[14:29:56] <hashar>	 too many different places
[14:30:03] <hashar>	 Krenair: yes
[14:30:13] <hashar>	 Krenair: only takes a few sec. Forgot to ping back sorry
[14:30:18] <hashar>	 elukey: I can help on multiple branches
[14:30:42] <elukey>	 hashar: thanks :)
[14:31:37] <elukey>	 hashar: long story short - we have two clusters for AQS, and I'd need to deploy only to one.. so what we did, with the help of Services, was to come up with one branch called new-aqs-cluster in both src and deploy repos
[14:32:02] <Krenair>	 hm
[14:32:05] <elukey>	 then on tin I switched branch and executed scap deploy --rev new-aqs-cluster --limit hostname
[14:32:17] <Krenair>	 this varnish code doesn't take hyphens in hostnames into account
[14:32:20] <elukey>	 but the src dir doesn't get populated
[14:32:20] <Krenair>	 guess prod has none
[14:32:44] <elukey>	 and restbase doesn't like it of course :D
[14:32:57] <elukey>	 the error seems to be something like "Unable to checkout '4d9f5160687f8dc3df3401453d2da5e861c19db7' in submodule path 'src'"
[14:33:43] <elukey>	 and on tin, in the aqs-deploy repo branch new-aqs-cluster, I can see only master in .git/modules/src/config
[14:34:25] <hashar>	 oh
[14:34:33] <elukey>	 anyhow, scap does sees the correct sha that I want to deploy (with --rev branchname)
[14:34:39] <elukey>	 but it doesn't get deployed
[14:34:58] <hashar>	 which path is it ?
[14:35:30] <shinken-wm>	 RECOVERY - Puppet run on deployment-sca02 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:35:36] <elukey>	 hashar: on tin, /srv/deployment/analytics/aqs/deploy
[14:36:13] <wmf-insecte>	 Project beta-scap-eqiad build #120157: 04STILL FAILING in 1 min 41 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/120157/
[14:36:30] <shinken-wm>	 PROBLEM - Puppet run on deployment-cache-text04 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[14:37:23] <hashar>	 elukey: so  git show src  
[14:37:29] <hashar>	 -Subproject commit 532ba2af7b15f52a618fb7342e0c2148bb8aa372
[14:37:29] <hashar>	 +Subproject commit 4d9f5160687f8dc3df3401453d2da5e861c19db7
[14:38:06] <hashar>	 and  /srv/deployment/analytics/aqs/deploy/src  (which is a submodule)  has the proper HEAD  -> 4d9f5160687f8dc3df3401453d2da5e861c19db7
[14:38:27] <hashar>	 $ git status 
[14:38:27] <hashar>	 HEAD detached at 4d9f516
[14:38:32] <hashar>	 elukey: so looks like it is fixed?
[14:39:22] <elukey>	 mmm I deployed the old master stuff
[14:39:31] <elukey>	 checking, thanks for the pointer :)
[14:39:37] * elukey is confused
[14:40:32] <hashar>	 maybe ottomata can help
[14:41:07] <elukey>	 hashar: no sorry ok, on tin I switched the branch and ran submodules init
[14:41:12] <elukey>	 so it is ok that shows it
[14:41:30] <elukey>	 the problem is on aqs1004
[14:41:36] <elukey>	 that basically doesn't checkout the src dir :(
[14:41:42] <elukey>	 will try and let you know
[14:43:43] <hashar>	 !sal
[14:43:43] <wm-bot>	 https://tools.wmflabs.org/sal/releng
[14:43:56] <wmf-insecte>	 Project beta-scap-eqiad build #120158: 04STILL FAILING in 1 min 41 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/120158/
[14:44:08] <hashar>	 !log T144006 sudo -u jenkins-deploy -H SSH_AUTH_SOCK=/run/keyholder/proxy.sock ssh mwdeploy@deployment-mediawiki05.deployment-prep.eqiad.wmflabs
[14:44:12] <qa-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[14:45:00] <hashar>	 !log T144006 sudo -u jenkins-deploy -H SSH_AUTH_SOCK=/run/keyholder/proxy.sock ssh mwdeploy@mira02.deployment-prep.eqiad.wmflabs
[14:45:04] <qa-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[14:47:31] <elukey>	 hashar: so probably it was me not doing submodule init before the deployment
[14:47:40] <hashar>	 might be 
[14:47:45] <hashar>	 no idea how the submodule works with scap
[14:47:52] <hashar>	 guess you have to handle it manually
[14:53:07] <wmf-insecte>	 Project beta-scap-eqiad build #120159: 04STILL FAILING in 7 min 39 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/120159/
[14:54:09] <shinken-wm>	 PROBLEM - Free space - all mounts on mira02 is CRITICAL: CRITICAL: deployment-prep.mira02.diskspace.root.byte_percentfree (<44.44%)
[14:57:55] <wmf-insecte>	 Project beta-scap-eqiad build #120160: 04STILL FAILING in 3 min 14 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/120160/
[14:59:03] <hashar>	 oh my 
[14:59:07] <hashar>	 "no space left on device"
[14:59:12] <hashar>	 gives me baremetal!
[14:59:51] <hashar>	 mira02: /dev/vda3        19G   19G     0 100% /
[15:00:05] <hashar>	 why in hell doesn't it have the extended disk :(
[15:02:23] <thcipriani>	 hrm, don't we normally have labs::lvm::srv 
[15:02:34] <Krenair>	 hashar, you know those weird "Puppet::Parser::AST::Resource failed with error ArgumentError: Invalid resource type" we sometimes find in puppet logs?
[15:02:40] <Krenair>	 I just got one while running puppet manually
[15:05:04] <hashar>	 !log T144006  Applying class role::labs::lvm::srv to mira02  (it is out of disk space :D )
[15:05:08] <qa-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[15:05:24] <shinken-wm>	 PROBLEM - Keyholder status on mira02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[15:06:59] <wmf-insecte>	 Project beta-scap-eqiad build #120161: 15ABORTED in 2 min 23 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/120161/
[15:08:21] <hashar>	 !log T144006 Disabled Jenkins job  beta-scap-eqiad.  On mira02  rm -fR /srv/*  .  Applying puppet for role::labs::lvm::srv
[15:08:25] <qa-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[15:10:26] <hashar>	 !log beta: Applying puppet class role::prometheus::node_exporter to mira02 just like mira.  That is for godog
[15:10:30] <qa-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[15:11:28] <paladox>	 hashar hi, could we disable tests for refs/meta/* please?
[15:11:37] <paladox>	 It is causing it to fail here https://gerrit.wikimedia.org/r/#/c/310830/
[15:11:41] <paladox>	 and no way to overide it
[15:13:23] <hashar>	 paladox: there is task for it already
[15:13:31] <paladox>	 Oh
[15:13:45] <hashar>	 and if you are a repo owner, you can drop jenkins-bot v-1
[15:14:08] <hashar>	 that change could get a better message :}
[15:14:49] <paladox>	 Oh
[15:15:07] <hashar>	 reviewed :D
[15:15:10] <paladox>	 hashar i think i know how to 
[15:15:21] <paladox>	 Blacklisting refs is supported in zuul
[15:15:51] <hashar>	 paladox: the task is https://phabricator.wikimedia.org/T52389
[15:16:18] <paladox>	 thanks
[15:16:21] <wikibugs>	 10Continuous-Integration-Infrastructure: Zuul should not run jenkins-bot on changes for refs/meta/* - https://phabricator.wikimedia.org/T52389#2640832 (10hashar)
[15:16:23] <wikibugs>	 10Continuous-Integration-Infrastructure: Jenkins: jenkins-bot reports spurious merge error when pushing changes to one of the gerrit config branches - https://phabricator.wikimedia.org/T66678#2640834 (10hashar)
[15:18:47] <godog>	 hashar: thanks! appreciate it
[15:19:09] <shinken-wm>	 RECOVERY - Free space - all mounts on mira02 is OK: OK: All targets OK
[15:19:20] <hashar>	 godog: though I have no idea what prometheus is  / what it is used for or anything :}    Should attend the ops tech talk about it!
[15:20:10] <godog>	 hashar: hehe there's an outline at https://wikitech.wikimedia.org/wiki/Prometheus but yeah we'll have an ops session about it too
[15:22:04] <wikibugs>	 10Beta-Cluster-Infrastructure, 06Labs: Please raise quota for deployment-prep - https://phabricator.wikimedia.org/T145611#2635940 (10greg) >>! In T145611#2637678, @AlexMonk-WMF wrote: > From a deployment-prep admin PoV, I'd prefer the quota bump include the full VCPU count and RAM of the instances you'd like t...
[15:22:50] <wmf-insecte>	 Yippee, build fixed!
[15:22:51] <wmf-insecte>	 Project beta-scap-eqiad build #120162: 09FIXED in 8 min 22 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/120162/
[15:23:21] <hashar>	  moritzm ^^^^ mira02 is good again :}
[15:23:26] <moritzm>	 thanks
[15:23:27] <hashar>	 though it has some redis failing bah
[15:23:39] <hashar>	 returned 1: Job for redis-instance-tcp_6379.service failed. See 'systemctl status redis-instance-tcp_6379.service' and 'journalctl -xn' for details.
[15:23:41] <hashar>	 that never ends :(
[15:23:55] <paladox>	 I have no idea why grrrit-wm has started to restart every couple of hours again
[15:23:56] <moritzm>	 will merge https://gerrit.wikimedia.org/r/#/c/310827 once currently puppet maintenance is over
[15:23:59] <paladox>	 i thought i fixed that
[15:24:07] <paladox>	 guess it is using npm1 again instead of npm 2
[15:24:35] <moritzm>	 paladox: earlier the day Giuseppe posted a 18 patches patch set and gerrit-wm got flood-killed
[15:25:02] <paladox>	 moritzm oh, iv'e been looking into that, it was working a week ago, lastest 4 days
[15:25:18] <paladox>	 then i restarted it to pickup a change that was merged but i already deploy it
[15:25:45] <paladox>	 and then it started restating like this every few hours, i guess some how it managed to use npm1 instead of npm2
[15:25:54] <paladox>	 i actually workaround and got npm2 on tools
[15:25:55] <paladox>	 :)
[15:26:38] <hashar>	 I am fedud with unix
[15:26:47] <paladox>	 Whats that?
[15:26:50] <paladox>	 Is that a mac?
[15:26:59] <paladox>	 since the only unix i know is mac os
[15:27:56] <paladox>	 hashar do you know how to debug npm, since i got the file but it just says what command fails like npm run-script start failed
[15:28:02] <paladox>	 But dosent tell me why it fails
[15:28:45] <paladox>	 8 info grrrit@0.2.0 Failed to exec start script
[15:28:45] <paladox>	 9 error grrrit@0.2.0 start: `node src/relay.js`
[15:29:25] <paladox>	 Oh, i may have an idea
[15:29:37] <paladox>	 a workaround but ugly.
[15:29:41] <hashar>	 so redis complains on mira02
[15:29:45] <hashar>	 paladox: busy on something else
[15:29:48] <paladox>	 Ok
[15:31:04] <hashar>	 it is up but systemd complains about it 
[15:31:05] <hashar>	 bah
[15:31:30] <shinken-wm>	 RECOVERY - Puppet run on deployment-cache-text04 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:33:20] <hashar>	 moritzm: systemd for redis does not work apparently.    systemctl start redis-instance-tcp_6379.service   does spawn the redis instance
[15:33:20] <hashar>	 but it never returns
[15:33:26] <hashar>	 so I guess puppet times out waiting and errors out
[15:33:39] <moritzm>	 let me check
[15:33:43] <shinken-wm>	 RECOVERY - Puppet run on deployment-cache-upload04 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:33:45] <hashar>	 on mira02
[15:34:14] <hashar>	  /var/log/redis/tcp_6379.log  has some good looking output
[15:34:24] <hashar>	 netstat shows it is listening  and  ps -u redis f   shows the proc
[15:35:37] <hashar>	 it is stuck in    loaded **activating** start    
[15:38:25] <hashar>	 elukey: still got time to upload the zuul packages on apt.wm.o ? https://phabricator.wikimedia.org/T103529#2632489 
[15:38:59] <elukey>	 hashar: tomorrow morning maybe? Would it be ok?
[15:39:14] <hashar>	 would prefer to get it down today
[15:39:23] <elukey>	 because I have a long meeting
[15:39:26] <hashar>	 can ask around if you are busy :}
[15:39:28] <elukey>	 if you can wait a bit
[15:39:33] <elukey>	 sure
[15:40:20] * hashar picks a net and goes to Butter"ops"fly hunting 
[15:44:33] <paladox>	 moritzm i think irc may be forcing grrrit-wm to quit, because looking at the running time it is 1d
[15:46:10] <moritzm>	 paladox: yeah, it's likely that it only got kicked from teh -ops channel, but that the bot keeps running
[15:46:51] <paladox>	 Yep, But if it was one of irc then it would have been blocked by now. Probaly maybe something is causing it to race, causing it to disconnect
[15:47:01] <paladox>	 IE when doing a ton of patches it may crash on irc
[15:47:22] <paladox>	 I have found some packages to update, i also did some minor ajustments so im going to try that now
[15:47:32] <wikibugs>	 10Beta-Cluster-Infrastructure, 06Operations, 05Prometheus-metrics-monitoring: deploy prometheus node_exporter and server to deployment-prep - https://phabricator.wikimedia.org/T144502#2640898 (10fgiunchedi) It'd be nice to have `role::prometheus::node_exporter` applied blanket to all of deployment-prep, I se...
[15:48:15] <paladox>	 moritzm actually it seems to have a restart columb,
[15:48:16] <paladox>	 NAME                        READY     STATUS    RESTARTS   AGE
[15:48:16] <paladox>	 grrrit-wm-230500525-xxwlj   1/1       Running   3          1d
[15:48:22] <paladox>	 Something is causing it to restart
[15:49:07] <wikibugs>	 10Beta-Cluster-Infrastructure, 06Operations, 05Prometheus-metrics-monitoring: deploy prometheus node_exporter and server to deployment-prep - https://phabricator.wikimedia.org/T144502#2601885 (10AlexMonk-WMF) pretty sure it is, yeah
[15:49:11] <paladox>	 moritzm would flood protection count?
[15:49:23] <paladox>	 I could disable it if no one object's to me doing that
[15:50:20] <moritzm>	 no sure, don't know anything about this, only noticed thar gerrit-wm was floodkicker earlier
[15:50:51] <paladox>	 Ok, i guess that's it
[15:51:50] <godog>	 Krenair: thanks! objections to me applying it to deployment-prep ? 
[15:52:04] <Krenair>	 godog, well, I don't know what it does really
[15:52:43] <godog>	 it installs prometheus-node-exporter debian package and a couple of setup tasks
[15:52:43] <Krenair>	 but apply it if you like
[15:52:48] <wikibugs>	 03Scap3 (Scap3-Adoption-Phase1), 10scap, 10Analytics, 10Analytics-EventLogging, 13Patch-For-Review: Use scap3 to deploy eventlogging/eventlogging - https://phabricator.wikimedia.org/T118772#2640913 (10Milimetric)
[15:53:13] <godog>	 ok let's see how that goes
[15:53:52] <godog>	 !log add role::prometheus::node_exporter to classes in hiera:deployment-prep T144502
[15:53:56] <qa-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[15:53:57] <shinken-wm>	 PROBLEM - Puppet run on deployment-eventlogging03 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[15:58:45] <godog>	 nice, that worked
[16:00:05] <godog>	 it'll likely fail on precise instances db0[12]
[16:00:26] <hashar>	 godog: they are going to be dished out by marxarelli|afk soonish
[16:00:32] <hashar>	 in favor of db03 db04  (Jessie)
[16:00:58] <godog>	 hashar: sweet, the last two precises afaics!
[16:01:38] <wikibugs>	 10Browser-Tests-Infrastructure, 10Fundraising-Backlog, 10MediaWiki-extensions-CentralNotice, 15User-zeljkofilipin: CentralNotice: Intermittent unexplained browser test failures - https://phabricator.wikimedia.org/T145718#2640992 (10zeljkofilipin)
[16:02:29] <hashar>	 godog: and we have been sprinting moving the beta mw app servers to Jessie \O/
[16:03:05] <wikibugs>	 10Browser-Tests-Infrastructure, 13Patch-For-Review: mediawiki_selenium feature to show/capture Selenium WebDriver requests to remote browser. - https://phabricator.wikimedia.org/T94577#2641005 (10zeljkofilipin) a:03zeljkofilipin I will take a look.
[16:03:39] <paladox>	 moritzm, i belive it may be because he was force merging
[16:03:41] <wikibugs>	 10Browser-Tests-Infrastructure, 13Patch-For-Review, 15User-zeljkofilipin: mediawiki_selenium feature to show/capture Selenium WebDriver requests to remote browser. - https://phabricator.wikimedia.org/T94577#2641007 (10zeljkofilipin)
[16:03:51] <grrrit-wm>	 (03Abandoned) 10Hashar: (WIP) Demo to dump browser and selenium logs (WIP) [selenium] - 10https://gerrit.wikimedia.org/r/310583 (https://phabricator.wikimedia.org/T94577) (owner: 10Hashar)
[16:04:20] <paladox>	 moritzm caused by https://phabricator.wikimedia.org/diffusion/TGRT/browse/master/src/preprocess.js;74e812746f2ce846863d34d040e9b0d8f95a8a61$145
[16:04:23] <paladox>	 But not really
[16:04:24] <paladox>	 sure
[16:05:21] <wikibugs>	 06Release-Engineering-Team, 06Operations, 07HHVM: Migrate deployment servers (tin/mira) to jessie - https://phabricator.wikimedia.org/T144578#2641039 (10MoritzMuehlenhoff) I also built trebuchet-trigger for jessie and uploaded it to apt.wikimedia.org
[16:07:01] <wikibugs>	 10Browser-Tests-Infrastructure, 10Fundraising-Backlog, 10MediaWiki-extensions-CentralNotice, 15User-zeljkofilipin: CentralNotice: Intermittent unexplained browser test failures - https://phabricator.wikimedia.org/T145718#2641048 (10zeljkofilipin) p:05Triage>03Normal
[16:09:28] <icinga-wm>	 PROBLEM - puppet last run on gallium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[16:11:04] <wikibugs>	 10Browser-Tests-Infrastructure, 10Continuous-Integration-Infrastructure, 10Wikidata: Run Wikibase browser tests on gerrit triggered with keyword - https://phabricator.wikimedia.org/T145190#2641073 (10zeljkofilipin) @hashar: what do you think?
[16:12:34] <wikibugs>	 10Browser-Tests-Infrastructure, 05Continuous-Integration-Scaling, 13Patch-For-Review, 15User-zeljkofilipin: migrate mwext-mw-selenium to Nodepool instances - https://phabricator.wikimedia.org/T137112#2641083 (10zeljkofilipin) Is this really in progress? I thought you are no longer working on it @hashar.
[16:14:05] <shinken-wm>	 PROBLEM - Puppet run on deployment-db1 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[16:14:17] <wikibugs>	 10Browser-Tests-Infrastructure, 06Release-Engineering-Team, 07Epic, 07Tracking: [EPIC] trigger browser tests from Gerrit (tracking) - https://phabricator.wikimedia.org/T55697#2641105 (10hashar)
[16:14:19] <wikibugs>	 10Browser-Tests-Infrastructure, 13Patch-For-Review: Run subset of browser tests on isolated CI instances per commit submitted to extensions that run on WMF production - https://phabricator.wikimedia.org/T54425#2641106 (10hashar)
[16:14:21] <wikibugs>	 10Browser-Tests-Infrastructure, 05Continuous-Integration-Scaling, 13Patch-For-Review, 15User-zeljkofilipin: migrate mwext-mw-selenium to Nodepool instances - https://phabricator.wikimedia.org/T137112#2641102 (10hashar) 05Open>03stalled a:05hashar>03None mwext-mw-selenium is stuck to the permanent T...
[16:15:14] <wikibugs>	 10Browser-Tests-Infrastructure, 06Release-Engineering-Team, 10MediaWiki-extensions-Examples, 07Documentation, and 5 others: Improve documentation around running/writing (with lots of examples) browser tests - https://phabricator.wikimedia.org/T108108#2641111 (10zeljkofilipin)
[16:16:44] <hashar>	 godog: puppet indeed crash on Precise due to prometheus . Then it is just deployment-db1 and deployment-db2 so not a big deal
[16:18:03] <hashar>	 !log prometheus enabled on all beta cluster instance.  Does not support Precise hence puppet will fail on the last two Precise instances deployment-db1 and deployment-db2  until they are migrated to Jessie   T138778
[16:18:07] <qa-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[16:18:08] <shinken-wm>	 PROBLEM - Puppet run on deployment-db2 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[16:18:15] <hashar>	 marxarelli|afk: puppet fails on db1/db2  but that should be a non issue for the service :}
[16:19:04] <shinken-wm>	 PROBLEM - Puppet run on deployment-zotero01 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[16:19:50] <hashar>	 deployment-zotero01:~# lsb_release 
[16:19:50] <hashar>	 No LSB modules are available.
[16:19:52] <hashar>	 ...
[16:20:29] <hashar>	 Error: Could not create group prometheus-node-exporter: Execution of '/usr/sbin/groupadd prometheus-node-exporter' returned 10: groupadd: failure while writing changes to /etc/group
[16:23:36] <wikibugs>	 10Browser-Tests-Infrastructure, 10Fundraising-Backlog, 10MediaWiki-extensions-CentralNotice, 15User-zeljkofilipin: CentralNotice: Intermittent unexplained browser test failures - https://phabricator.wikimedia.org/T145718#2641165 (10AndyRussG)
[16:23:56] <wikibugs>	 10Beta-Cluster-Infrastructure, 05Prometheus-metrics-monitoring: Prometheus puppet manifest fail on Trusty instance deployment-zotero1 groupadd: failure while writing changes to /etc/group - https://phabricator.wikimedia.org/T145793#2641171 (10hashar)
[16:24:06] <wikibugs>	 10Browser-Tests-Infrastructure, 10Fundraising-Backlog, 10MediaWiki-extensions-CentralNotice, 15User-zeljkofilipin: CentralNotice: Intermittent unexplained browser test failures - https://phabricator.wikimedia.org/T145718#2639099 (10AndyRussG) @zeljkofilipin Thanks!! :)
[16:24:09] <hashar>	 filled the group add issue of prometheus as https://phabricator.wikimedia.org/T145793
[16:26:05] <godog>	 hashar: thanks I'll take a look, and remove the class in half an hour when I'm sure puppet has ran everywhere to avoid false positives like db boxes (cc marxarelli|afk)
[16:27:11] <wikibugs>	 03Scap3: Local config deploys should use the target's current version - https://phabricator.wikimedia.org/T145373#2641191 (10thcipriani) p:05Triage>03Normal One plan of attack here is to make `.git/DEPLOY_HEAD` an actual symbolic ref and keep a history of deployments as git objects...somewhere. Either on a l...
[16:27:14] <hashar>	 godog: the good news is that I have confirmed beta only has two Precise instances left \O/
[16:27:20] <icinga-wm>	 RECOVERY - puppet last run on gallium is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures
[16:27:35] <godog>	 hehe yeah that's good news indeed
[16:29:33] <grrrit-wm>	 (03PS3) 10Zfilipin: [ArticleFeedbackv5] Remove the rake test [integration/config] - 10https://gerrit.wikimedia.org/r/310567 (https://phabricator.wikimedia.org/T145792) (owner: 10Paladox)
[16:30:06] <grrrit-wm>	 (03PS4) 10Zfilipin: [ArticleFeedbackv5] Remove the rake test [integration/config] - 10https://gerrit.wikimedia.org/r/310567 (https://phabricator.wikimedia.org/T145792) (owner: 10Paladox)
[16:30:15] <paladox>	 hashar im wondering should i use console.log instead of logging.error
[16:30:30] <paladox>	 I have no idea what the difference is and if console will tell us more info on the errors
[16:31:20] <hashar>	 paladox: I have exactly ZERO knowledge about Javascript beside document.getElementById('foo'
[16:31:28] <paladox>	 Oh
[16:31:45] <paladox>	 Oh wait
[16:31:50] <paladox>	 sorry logging.error is from a package
[16:31:55] <paladox>	 just saw that, sorry
[16:33:57] <shinken-wm>	 RECOVERY - Puppet run on deployment-eventlogging03 is OK: OK: Less than 1.00% above the threshold [0.0]
[16:34:07] <shinken-wm>	 PROBLEM - Long lived cherry-picks on puppetmaster on deployment-puppetmaster is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[16:36:58] <wikibugs>	 10Continuous-Integration-Config, 06Wikipedia-Android-App-Backlog, 13Patch-For-Review: [Dev] Fix periodic tests - https://phabricator.wikimedia.org/T139137#2641255 (10hashar) I think this is now complete ? Or is there anything else to do?
[16:37:16] <wikibugs>	 10Beta-Cluster-Infrastructure, 05Prometheus-metrics-monitoring: Prometheus puppet manifest fail on Trusty instance deployment-zotero1 groupadd: failure while writing changes to /etc/group - https://phabricator.wikimedia.org/T145793#2641283 (10fgiunchedi) a:03fgiunchedi I remember seeing this before in {T1444...
[16:39:49] <wikibugs>	 10Continuous-Integration-Config, 06Wikipedia-Android-App-Backlog, 13Patch-For-Review: [Dev] Fix periodic tests - https://phabricator.wikimedia.org/T139137#2641303 (10Niedzielski) I'm guessing this is done. @Mholloway? (I've actually broken the job again working on T133183 but I'll get it sorted in that ticke...
[16:42:33] <wikibugs>	 10Beta-Cluster-Infrastructure, 05Prometheus-metrics-monitoring: Prometheus puppet manifest fail on Trusty instance deployment-zotero1 groupadd: failure while writing changes to /etc/group - https://phabricator.wikimedia.org/T145793#2641313 (10hashar) Ah that instances runs `3.13.0-83-generic` but `3.13.0.95.10...
[16:45:31] <godog>	 !log install xenial kernel on deployment-zotero01 and reboot T145793
[16:45:35] <qa-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[16:49:34] <wikibugs>	 10Beta-Cluster-Infrastructure, 05Prometheus-metrics-monitoring: Prometheus puppet manifest fail on Trusty instance deployment-zotero1 groupadd: failure while writing changes to /etc/group - https://phabricator.wikimedia.org/T145793#2641327 (10fgiunchedi) 05Open>03Resolved fixed!  ``` filippo@deployment-zot...
[16:51:16] <wikibugs>	 10Continuous-Integration-Infrastructure: Run zuul-clear-refs.py daily on all our repositories to reclaim Zuul references - https://phabricator.wikimedia.org/T103528#2641332 (10akosiaris)
[16:51:18] <wikibugs>	 10Continuous-Integration-Infrastructure, 10Packaging, 13Patch-For-Review, 07Zuul: Package / puppetize zuul-clear-refs.py - https://phabricator.wikimedia.org/T103529#2641331 (10akosiaris) 05Open>03Resolved
[16:52:32] <wikibugs>	 10Continuous-Integration-Infrastructure, 07Zuul: Get the Zuul debian packaging to pass -sa so orig.tar.gz is always in the .changes file - https://phabricator.wikimedia.org/T145797#2641337 (10hashar)
[16:55:01] <wikibugs>	 10Continuous-Integration-Infrastructure, 10Packaging, 13Patch-For-Review, 07Zuul: Package / puppetize zuul-clear-refs.py - https://phabricator.wikimedia.org/T103529#2641356 (10hashar) Packages have been uploaded to apt.wikimedia.org \O/
[16:57:31] <wikibugs>	 10Continuous-Integration-Infrastructure: Run zuul-clear-refs.py daily on all our repositories to reclaim Zuul references - https://phabricator.wikimedia.org/T103528#2641357 (10hashar) So we have the packages on apt.wikimedia.org with the zuul-clear-refs.py script.  What I have noticed though is that when it garb...
[16:57:59] <wikibugs>	 10Continuous-Integration-Config, 06Wikipedia-Android-App-Backlog, 13Patch-For-Review: [Dev] Fix periodic tests - https://phabricator.wikimedia.org/T139137#2641358 (10Mholloway) 05Open>03Resolved Done!
[16:58:30] <wikibugs>	 10Continuous-Integration-Infrastructure: Run zuul-clear-refs.py daily on all our repositories to reclaim Zuul references - https://phabricator.wikimedia.org/T103528#2641360 (10hashar) Example stacktrace: ``` 2016-09-15 11:20:36,855 ERROR zuul.Merger: Unable to reset repo <zuul.merger.merger.Repo object at 0x7fe0...
[16:58:56] <wikibugs>	 10Continuous-Integration-Infrastructure, 07Zuul: Run zuul-clear-refs.py daily on all our repositories to reclaim Zuul references - https://phabricator.wikimedia.org/T103528#2641362 (10hashar)
[16:59:04] <shinken-wm>	 RECOVERY - Puppet run on deployment-zotero01 is OK: OK: Less than 1.00% above the threshold [0.0]
[16:59:57] <wikibugs>	 10Continuous-Integration-Infrastructure, 07Upstream: Zuul repositories have too many refs causing slow updates - https://phabricator.wikimedia.org/T70481#2641382 (10hashar) Our .deb package is up-to-date and include the zuul-clear-refs.py utility.  It has a race condition though which I have detailed in T103528
[17:00:07] <wikibugs>	 10Continuous-Integration-Infrastructure, 07Upstream, 07Zuul: Zuul repositories have too many refs causing slow updates - https://phabricator.wikimedia.org/T70481#2641384 (10hashar)
[17:00:43] <hashar>	 godog: you got the new kernel right ? :)
[17:00:55] <godog>	 yeah
[17:01:50] <hashar>	 godog: awesome :}
[17:02:18] <hashar>	 and all puppet are happy ( http://shinken.wmflabs.org/problems?search=deployment )
[17:02:34] <hashar>	 godog: I can't wait to see some prometheus magic to happen for beta
[17:03:38] <hashar>	 *waves*
[17:03:52] <godog>	 heheh
[17:19:10] <wikibugs>	 06Release-Engineering-Team, 15User-greg: Create agenda outline for 2016 RelEng team offsite - https://phabricator.wikimedia.org/T138437#2400369 (10ksmith) Draft thinking (as I understand it):  * Future of CI (1.5 units) * Team yearly retrospective (1 unit) * Work process discussion/alignment (1 unit) * Teambui...
[17:54:05] <shinken-wm>	 RECOVERY - Puppet run on deployment-db1 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:08:10] <shinken-wm>	 RECOVERY - Puppet run on deployment-db2 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:15:08] <shinken-wm>	 PROBLEM - Puppet run on deployment-db1 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[18:39:25] <paladox>	 hashar hi, im looking into https://phabricator.wikimedia.org/T145797 and may have a solution for you
[18:39:37] <paladox>	 What about doing 
[18:39:37] <paladox>	 builder = /usr/bin/git-pbuilder -sa
[18:55:06] <shinken-wm>	 RECOVERY - Puppet run on deployment-db1 is OK: OK: Less than 1.00% above the threshold [0.0]
[19:10:26] <wikibugs>	 06Release-Engineering-Team (Deployment-Blockers), 13Patch-For-Review, 05Release: MW-1.28.0-wmf.19 deployment blockers - https://phabricator.wikimedia.org/T143328#2641828 (10hashar) [19:09:21] <logmsgbot> !log hashar@tin rebuilt wikiversions.php and synchronized wikiversions files: group1 wikis to 1.28.0-wmf.19
[19:24:19] <wikibugs>	 06Release-Engineering-Team (Deployment-Blockers), 13Patch-For-Review, 05Release: MW-1.28.0-wmf.19 deployment blockers - https://phabricator.wikimedia.org/T143328#2641877 (10hashar)
[19:24:35] <shinken-wm>	 PROBLEM - Puppet run on deployment-kafka05 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[19:24:57] <shinken-wm>	 PROBLEM - Puppet run on deployment-eventlogging03 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[19:28:06] <wikibugs>	 06Release-Engineering-Team, 15User-greg: Create agenda outline for 2016 RelEng team offsite - https://phabricator.wikimedia.org/T138437#2641912 (10ksmith) @greg: Let me know how much of the detailed planning you want me involved with. From my perspective, you're still the lead on that.
[19:49:55] <wikibugs>	 06Release-Engineering-Team (Deployment-Blockers), 10MediaWiki-extensions-WikibaseRepository, 10Wikidata, 05Release: Wikidata at 1.28.0-wmf.19 no more replicate to wikis (replag raise / dispatch stop) - https://phabricator.wikimedia.org/T145819#2641971 (10hashar)
[20:04:35] <shinken-wm>	 RECOVERY - Puppet run on deployment-kafka05 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:05:00] <shinken-wm>	 RECOVERY - Puppet run on deployment-eventlogging03 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:10:26] <wikibugs>	 06Release-Engineering-Team (Deployment-Blockers), 10MediaWiki-extensions-WikibaseRepository, 10Wikidata, 13Patch-For-Review, 05Release: Wikidata at 1.28.0-wmf.19 no more replicate to wikis (replag raise / dispatch stop) - https://phabricator.wikimedia.org/T145819#2642040 (10hashar)
[20:14:06] <twentyafterfour>	 wooo! https://phabricator.wikimedia.org/harbormaster/build/1399/
[20:15:00] <twentyafterfour>	 first green scap build in a while
[20:15:37] <shinken-wm>	 PROBLEM - Puppet run on deployment-elastic08 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[20:15:41] <shinken-wm>	 PROBLEM - Puppet run on deployment-mathoid is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[20:16:14] <grrrit-wm>	 (03PS2) 10Chad: Add CREDITS, including xZise and myself [integration/commit-message-validator] - 10https://gerrit.wikimedia.org/r/303752 (owner: 10Legoktm)
[20:27:17] <shinken-wm>	 PROBLEM - Puppet run on integration-puppetmaster is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[20:31:43] <shinken-wm>	 PROBLEM - Puppet run on integration-slave-jessie-1004 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[20:31:44] <shinken-wm>	 PROBLEM - Puppet run on integration-slave-jessie-1003 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[20:32:33] <shinken-wm>	 PROBLEM - Puppet run on integration-publisher is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[20:33:22] <shinken-wm>	 PROBLEM - Puppet run on integration-slave-trusty-1006 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[20:33:30] <paladox>	 Puppet fails ^^
[20:34:47] <wikibugs>	 06Release-Engineering-Team (Deployment-Blockers), 10MediaWiki-extensions-WikibaseRepository, 10Wikidata, 13Patch-For-Review, 05Release: Wikidata at 1.28.0-wmf.19 no more replicate to wikis (replag raise / dispatch stop) - https://phabricator.wikimedia.org/T145819#2642091 (10hashar) I have further pushed...
[20:35:26] <shinken-wm>	 PROBLEM - Puppet run on integration-saltmaster is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[20:35:48] <shinken-wm>	 PROBLEM - Puppet run on integration-slave-trusty-1014 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[20:36:08] <shinken-wm>	 PROBLEM - Puppet run on integration-slave-trusty-1018 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[20:37:10] <shinken-wm>	 PROBLEM - Puppet run on integration-slave-trusty-1011 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[20:37:42] <shinken-wm>	 PROBLEM - Puppet run on deployment-ms-fe01 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[20:38:53] <grrrit-wm>	 (03PS1) 10Paladox: Pass option -sa to git-pbuilder [integration/zuul] (debian/precise-wikimedia) - 10https://gerrit.wikimedia.org/r/310959 (https://phabricator.wikimedia.org/T145797) 
[20:38:55] <paladox>	 hashar ^^ ?
[20:41:49] <grrrit-wm>	 (03CR) 10Paladox: "recheck" [integration/zuul] (debian/precise-wikimedia) - 10https://gerrit.wikimedia.org/r/310959 (https://phabricator.wikimedia.org/T145797) (owner: 10Paladox)
[20:42:22] <shinken-wm>	 PROBLEM - Puppet run on integration-slave-precise-1002 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[20:42:38] <shinken-wm>	 PROBLEM - Puppet run on integration-slave-precise-1012 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[20:43:43] <shinken-wm>	 PROBLEM - Puppet run on integration-slave-precise-1011 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[20:44:01] <shinken-wm>	 PROBLEM - Puppet run on integration-slave-jessie-1001 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[20:45:31] <shinken-wm>	 PROBLEM - Puppet run on integration-slave-trusty-1003 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[20:45:35] <shinken-wm>	 PROBLEM - Puppet run on zuul-dev-jessie is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[20:47:51] <shinken-wm>	 PROBLEM - Puppet run on integration-slave-jessie-1002 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[20:50:20] <shinken-wm>	 PROBLEM - Puppet run on castor is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[20:50:38] <shinken-wm>	 RECOVERY - Puppet run on deployment-elastic08 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:50:42] <shinken-wm>	 RECOVERY - Puppet run on deployment-mathoid is OK: OK: Less than 1.00% above the threshold [0.0]
[20:51:14] <shinken-wm>	 PROBLEM - Puppet run on integration-slave-trusty-1016 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[20:52:10] <shinken-wm>	 PROBLEM - Puppet run on integration-slave-jessie-1005 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[20:52:34] <shinken-wm>	 PROBLEM - Puppet run on integration-slave-trusty-1001 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[20:53:25] <grrrit-wm>	 (03CR) 10Florianschmidtwelzow: [C: 031] [CookieWarning] Add Jenkins tests [integration/config] - 10https://gerrit.wikimedia.org/r/310561 (owner: 10Paladox)
[20:55:46] <shinken-wm>	 PROBLEM - Puppet run on integration-slave-jessie-android is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[20:55:50] <shinken-wm>	 PROBLEM - Puppet run on integration-slave-trusty-1004 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[20:55:54] <shinken-wm>	 PROBLEM - Puppet run on integration-slave-trusty-1017 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[20:55:56] <hashar>	 oh my god
[20:56:12] <shinken-wm>	 PROBLEM - Puppet run on integration-slave-trusty-1012 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[20:56:54] <shinken-wm>	 PROBLEM - Puppet run on integration-slave-trusty-1013 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[20:57:23] <hashar>	 easy 
[21:02:16] <hashar>	 Error: Could not retrieve catalog from remote server: Error 400 on SERVER: /var/lib/puppet/lib/hiera/httpcache.rb:21: duplicate optional argument name
[21:02:49] <shinken-wm>	 PROBLEM - Puppet run on deployment-imagescaler01 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[21:10:50] <shinken-wm>	 RECOVERY - Puppet run on integration-slave-trusty-1014 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:11:39] <grrrit-wm>	 (03CR) 10Hashar: [C: 04-1] "Found a potentially better setting." (031 comment) [integration/zuul] (debian/precise-wikimedia) - 10https://gerrit.wikimedia.org/r/310959 (https://phabricator.wikimedia.org/T145797) (owner: 10Paladox)
[21:12:13] <hashar>	 poked yuvi about it
[21:12:16] <grrrit-wm>	 (03PS2) 10Paladox: Pass option -sa to git-pbuilder [integration/zuul] (debian/precise-wikimedia) - 10https://gerrit.wikimedia.org/r/310959 (https://phabricator.wikimedia.org/T145797) 
[21:12:44] <shinken-wm>	 RECOVERY - Puppet run on deployment-ms-fe01 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:15:22] <shinken-wm>	 RECOVERY - Puppet run on integration-saltmaster is OK: OK: Less than 1.00% above the threshold [0.0]
[21:16:08] <shinken-wm>	 RECOVERY - Puppet run on integration-slave-trusty-1018 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:16:45] <grrrit-wm>	 (03CR) 10Chad: [C: 032] Add CREDITS, including xZise and myself [integration/commit-message-validator] - 10https://gerrit.wikimedia.org/r/303752 (owner: 10Legoktm)
[21:16:59] <paladox>	 hashar yay that worked https://integration.wikimedia.org/ci/job/debian-glue-non-voting/240/artifact/zuul_2.5.0-8-gcbc7f62-wmf3precise1+0~20160915211226.240+precise+wikimedia~1.gbpb1b6f5_amd64.changes/*view*/
[21:17:10] <shinken-wm>	 RECOVERY - Puppet run on integration-slave-trusty-1011 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:17:15] <grrrit-wm>	 (03Merged) 10jenkins-bot: Add CREDITS, including xZise and myself [integration/commit-message-validator] - 10https://gerrit.wikimedia.org/r/303752 (owner: 10Legoktm)
[21:18:42] <shinken-wm>	 RECOVERY - Puppet run on integration-slave-precise-1011 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:20:15] <grrrit-wm>	 (03PS1) 10Mattflaschen: Have PageTriage depend on WikiLove [integration/config] - 10https://gerrit.wikimedia.org/r/311024 (https://phabricator.wikimedia.org/T145798) 
[21:20:33] <shinken-wm>	 RECOVERY - Puppet run on integration-slave-trusty-1003 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:22:23] <shinken-wm>	 RECOVERY - Puppet run on integration-slave-precise-1002 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:22:37] <shinken-wm>	 RECOVERY - Puppet run on integration-slave-precise-1012 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:24:01] <shinken-wm>	 RECOVERY - Puppet run on integration-slave-jessie-1001 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:25:33] <shinken-wm>	 RECOVERY - Puppet run on zuul-dev-jessie is OK: OK: Less than 1.00% above the threshold [0.0]
[21:27:31] <shinken-wm>	 RECOVERY - Puppet run on integration-slave-trusty-1001 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:27:51] <shinken-wm>	 RECOVERY - Puppet run on integration-slave-jessie-1002 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:30:22] <shinken-wm>	 RECOVERY - Puppet run on castor is OK: OK: Less than 1.00% above the threshold [0.0]
[21:30:35] <wikibugs>	 06Release-Engineering-Team (Deployment-Blockers), 10MediaWiki-extensions-WikibaseRepository, 10Wikidata, 13Patch-For-Review, 05Release: Wikidata at 1.28.0-wmf.19 no more replicate to wikis (replag raise / dispatch stop) - https://phabricator.wikimedia.org/T145819#2641971 (10Addshore) From the logs   ```...
[21:30:54] <shinken-wm>	 RECOVERY - Puppet run on integration-slave-trusty-1017 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:31:12] <shinken-wm>	 RECOVERY - Puppet run on integration-slave-trusty-1012 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:31:12] <shinken-wm>	 RECOVERY - Puppet run on integration-slave-trusty-1016 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:31:54] <shinken-wm>	 RECOVERY - Puppet run on integration-slave-trusty-1013 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:32:08] <shinken-wm>	 RECOVERY - Puppet run on integration-slave-jessie-1005 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:35:26] <shinken-wm>	 RECOVERY - Host deployment-parsoid05 is UP: PING OK - Packet loss = 0%, RTA = 1.00 ms
[21:35:48] <shinken-wm>	 RECOVERY - Puppet run on integration-slave-jessie-android is OK: OK: Less than 1.00% above the threshold [0.0]
[21:35:50] <shinken-wm>	 RECOVERY - Puppet run on integration-slave-trusty-1004 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:36:44] <shinken-wm>	 RECOVERY - Puppet run on integration-slave-jessie-1003 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:37:16] <shinken-wm>	 RECOVERY - Puppet run on integration-puppetmaster is OK: OK: Less than 1.00% above the threshold [0.0]
[21:37:34] <shinken-wm>	 RECOVERY - Puppet run on integration-publisher is OK: OK: Less than 1.00% above the threshold [0.0]
[21:37:37] <wikibugs>	 06Release-Engineering-Team (Deployment-Blockers), 10MediaWiki-extensions-WikibaseRepository, 10Wikidata, 13Patch-For-Review, 05Release: Wikidata at 1.28.0-wmf.19 no more replicate to wikis (replag raise / dispatch stop) - https://phabricator.wikimedia.org/T145819#2642200 (10Addshore) One thing that kind...
[21:38:16] <shinken-wm>	 PROBLEM - Host deployment-parsoid05 is DOWN: CRITICAL - Host Unreachable (10.68.16.120)
[21:38:22] <shinken-wm>	 RECOVERY - Puppet run on integration-slave-trusty-1006 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:41:43] <shinken-wm>	 RECOVERY - Puppet run on integration-slave-jessie-1004 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:51:35] <grrrit-wm>	 (03PS1) 10Paladox: Ignore refs/meta/config refs from jenkins [integration/config] - 10https://gerrit.wikimedia.org/r/311032 (https://phabricator.wikimedia.org/T52389) 
[21:52:15] <grrrit-wm>	 (03CR) 10jenkins-bot: [V: 04-1] Ignore refs/meta/config refs from jenkins [integration/config] - 10https://gerrit.wikimedia.org/r/311032 (https://phabricator.wikimedia.org/T52389) (owner: 10Paladox)
[21:55:48] <grrrit-wm>	 (03PS2) 10Paladox: Ignore refs/meta/config refs from jenkins [integration/config] - 10https://gerrit.wikimedia.org/r/311032 (https://phabricator.wikimedia.org/T52389) 
[21:56:53] <grrrit-wm>	 (03CR) 10jenkins-bot: [V: 04-1] Ignore refs/meta/config refs from jenkins [integration/config] - 10https://gerrit.wikimedia.org/r/311032 (https://phabricator.wikimedia.org/T52389) (owner: 10Paladox)
[21:58:14] <shinken-wm>	 PROBLEM - Puppet run on integration-puppetmaster is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [0.0]
[22:00:55] <grrrit-wm>	 (03PS3) 10Paladox: Ignore refs/meta/config refs from jenkins [integration/config] - 10https://gerrit.wikimedia.org/r/311032 (https://phabricator.wikimedia.org/T52389) 
[22:01:58] <grrrit-wm>	 (03CR) 10jenkins-bot: [V: 04-1] Ignore refs/meta/config refs from jenkins [integration/config] - 10https://gerrit.wikimedia.org/r/311032 (https://phabricator.wikimedia.org/T52389) (owner: 10Paladox)
[22:05:34] <grrrit-wm>	 (03PS4) 10Paladox: Ignore refs/meta/config refs from jenkins [integration/config] - 10https://gerrit.wikimedia.org/r/311032 (https://phabricator.wikimedia.org/T52389) 
[22:06:01] <shinken-wm>	 PROBLEM - Puppet run on deployment-elastic07 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[22:08:17] <shinken-wm>	 RECOVERY - Puppet run on integration-puppetmaster is OK: OK: Less than 1.00% above the threshold [0.0]
[22:11:24] <grrrit-wm>	 (03CR) 10Paladox: [C: 031] Have PageTriage depend on WikiLove [integration/config] - 10https://gerrit.wikimedia.org/r/311024 (https://phabricator.wikimedia.org/T145798) (owner: 10Mattflaschen)
[22:18:44] <wikibugs>	 06Release-Engineering-Team (Deployment-Blockers), 10MediaWiki-extensions-WikibaseRepository, 10Wikidata, 13Patch-For-Review, 05Release: Wikidata at 1.28.0-wmf.19 no more replicate to wikis (replag raise / dispatch stop) - https://phabricator.wikimedia.org/T145819#2642290 (10Addshore) It looks like "Faile...
[22:19:39] <wikibugs>	 06Release-Engineering-Team (Deployment-Blockers), 10MediaWiki-extensions-WikibaseRepository, 10Wikidata, 13Patch-For-Review, 05Release: Wikidata at 1.28.0-wmf.19 no more replicate to wikis (replag raise / dispatch stop) - https://phabricator.wikimedia.org/T145819#2642295 (10Legoktm) GlobalRename insertin...
[22:25:29] <paladox>	 legoktm twentyafterfour hi, it seems that mysql is down on https://integration.wikimedia.org/ci/job/mediawiki-phpunit-php53/624/console
[22:25:37] <paladox>	 integration-slave-precise-1011
[22:25:56] <shinken-wm>	 PROBLEM - Puppet run on deployment-eventlogging03 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[22:28:19] <legoktm>	 oh
[22:28:21] <legoktm>	 hmm
[22:28:24] <legoktm>	 lemme see
[22:28:43] <legoktm>	 I suspect it died when all of the trusty mysql's died the other day
[22:28:53] <legoktm>	 but precise jobs rarely run
[22:29:20] <legoktm>	 yep all down
[22:29:28] <legoktm>	 integration-slave-precise-1011.integration.eqiad.wmflabs:
[22:29:28] <legoktm>	     mysql stop/waiting
[22:29:28] <legoktm>	 integration-slave-precise-1002.integration.eqiad.wmflabs:
[22:29:28] <legoktm>	     mysql stop/waiting
[22:29:29] <legoktm>	 integration-slave-precise-1012.integration.eqiad.wmflabs:
[22:29:32] <legoktm>	     mysql stop/waiting
[22:29:46] <legoktm>	 !log sudo salt '*precise*' cmd.run 'service mysql start', all mysql's are down
[22:29:52] <qa-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[22:30:17] <greg-g>	 legoktm: gracias
[22:30:35] <legoktm>	 np
[22:31:27] <paladox>	 oh thanks
[22:31:31] <shinken-wm>	 PROBLEM - Puppet run on deployment-sca02 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[22:46:02] <shinken-wm>	 RECOVERY - Puppet run on deployment-elastic07 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:55:13] <shinken-wm>	 PROBLEM - Puppet run on deployment-ores-redis is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[23:00:59] <shinken-wm>	 RECOVERY - Puppet run on deployment-eventlogging03 is OK: OK: Less than 1.00% above the threshold [0.0]
[23:06:30] <shinken-wm>	 RECOVERY - Puppet run on deployment-sca02 is OK: OK: Less than 1.00% above the threshold [0.0]
[23:15:42] <Pchelolo>	 Does anybody know how to become an admin on beta labs wiki? I need to be able to delete pages to test something
[23:30:15] <shinken-wm>	 RECOVERY - Puppet run on deployment-ores-redis is OK: OK: Less than 1.00% above the threshold [0.0]
[23:32:04] <legoktm>	 Pchelolo: ssh in and use createAndPromote.php :P