[00:55:15] <wikibugs>	 10Gerrit, 10grrrit-wm: Create grrrit-wm test instance - https://phabricator.wikimedia.org/T149529#2758960 (10Paladox) Oh @greg yes please for the test instance. Just needs labs to create the project for this test robot and to copy the original bot to this instance too :)
[01:18:21] <Reedy>	 Seems we have a phab spammer
[03:07:44] <shinken-wm>	 RECOVERY - Puppet run on deployment-mathoid is OK: OK: Less than 1.00% above the threshold [0.0]
[03:08:10] <shinken-wm>	 RECOVERY - Puppet run on deployment-tin is OK: OK: Less than 1.00% above the threshold [0.0]
[03:08:34] <shinken-wm>	 RECOVERY - Puppet run on deployment-mira is OK: OK: Less than 1.00% above the threshold [0.0]
[05:34:56] <wikibugs>	 10Gerrit, 10grrrit-wm: Create grrrit-wm test instance - https://phabricator.wikimedia.org/T149529#2759181 (10greg) Of what? Again, please use the names of things you want a test instance of. I'm still confused on what you need. You haven't listed anything yet other than "test instance".  I *think* your last se...
[07:23:02] <wikibugs>	 10Gerrit, 10grrrit-wm: Create grrrit-wm test instance - https://phabricator.wikimedia.org/T149529#2759232 (10Paladox) Oh, I will just be duplicating grrrit-wm on the instance so it should use the same gerrit account. But as it will be a test I will start it and test then stop it since we don't need two bots. A...
[07:28:26] <wikibugs>	 10Gerrit, 06Labs, 10grrrit-wm: Create grrrit-wm test instance - https://phabricator.wikimedia.org/T149529#2759234 (10Paladox)
[07:28:35] <wikibugs>	 10Gerrit, 06Labs, 10grrrit-wm: Create grrrit-wm test instance - https://phabricator.wikimedia.org/T149529#2755213 (10Paladox) 05stalled>03Open
[07:33:57] <wikibugs>	 10MediaWiki-Releasing, 10Timeless, 10Vector, 10Wikimedia-Developer-Summit (2017): Replacing Vector as the default MediaWiki skin - https://phabricator.wikimedia.org/T149636#2759237 (10Paladox) As some extensions Wikipedia uses hardcode support for Vector, we will need to update those extensions to also c...
[07:43:33] <shinken-wm>	 PROBLEM - Puppet run on integration-slave-trusty-1003 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[07:44:25] <wikibugs>	 03Scap3, 15User-mobrovac: Smart-merge checks for different environments - https://phabricator.wikimedia.org/T149668#2759242 (10mobrovac)
[07:45:59] <wikibugs>	 03Scap3, 15User-mobrovac: Smart-merge checks for different environments - https://phabricator.wikimedia.org/T149668#2759256 (10mobrovac)
[07:57:03] <wikibugs>	 10MediaWiki-Releasing, 10Timeless, 10Vector, 10Wikimedia-Developer-Summit (2017): Replacing Vector as the default MediaWiki skin - https://phabricator.wikimedia.org/T149636#2759281 (10Nemo_bis)
[08:23:32] <shinken-wm>	 RECOVERY - Puppet run on integration-slave-trusty-1003 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:35:47] <wikibugs>	 10Continuous-Integration-Config, 10Tool-Labs-tools-stewardbots, 13Patch-For-Review: Implement jenkins tests on labs/tools/stewardbots - https://phabricator.wikimedia.org/T128503#2759303 (10MarcoAurelio) Maybe we shouldn't be using the `mediawiki` queue. Is there a `labs` queue? (Sometimes the mediawiki queue...
[08:36:39] <wikibugs>	 10Continuous-Integration-Config, 10Tool-Labs-tools-stewardbots, 13Patch-For-Review: Implement jenkins tests on labs/tools/stewardbots - https://phabricator.wikimedia.org/T128503#2759304 (10MarcoAurelio) I've merged the above change. It always fails on tox, but I guess it's because the bot code is old.
[09:19:18] <wikibugs>	 10Beta-Cluster-Infrastructure, 10Mobile-Content-Service, 10RESTBase, 06Services: Set up MCS in BetaCluster - https://phabricator.wikimedia.org/T149671#2759340 (10mobrovac)
[09:20:01] <wikibugs>	 10Beta-Cluster-Infrastructure, 10Mobile-Content-Service, 10RESTBase, 06Services, 15User-mobrovac: Set up MCS in BetaCluster - https://phabricator.wikimedia.org/T149671#2759340 (10mobrovac)
[09:51:33] <shinken-wm>	 PROBLEM - Puppet run on deployment-apertium01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[10:18:51] <wikibugs>	 10Gerrit, 06Operations, 10grrrit-wm, 13Patch-For-Review: Support restarting grrrit-wm automatically when we restart production gerrit - https://phabricator.wikimedia.org/T149609#2759431 (10Peachey88)
[12:18:09] <wikibugs>	 10Gerrit, 06Labs, 10grrrit-wm: Create grrrit-wm test instance - https://phabricator.wikimedia.org/T149529#2759556 (10Aklapper) @Paladox: Why did you re-add the #Labs team project?
[12:19:38] <wikibugs>	 10Gerrit, 06Labs, 10grrrit-wm: Create grrrit-wm test instance - https://phabricator.wikimedia.org/T149529#2759569 (10Aklapper) (In general: Could people please be specific, avoid using only "this" and "it", actually be explicit what they're talking about, and take more time to phrase sentences that do not of...
[12:43:58] <shinken-wm>	 PROBLEM - Puppet run on repository is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[12:48:38] <wikibugs>	 10Gerrit, 06Labs, 10grrrit-wm: Create grrrit-wm test instance - https://phabricator.wikimedia.org/T149529#2759613 (10Paladox) Sorry, I didn't see the labs tag had already been added and then removed. Reason why I added the tag is because labs needs to create this new labs project so I can create the instance.
[13:23:59] <shinken-wm>	 RECOVERY - Puppet run on repository is OK: OK: Less than 1.00% above the threshold [0.0]
[14:15:38] <wikibugs>	 10Gerrit, 06Operations, 10grrrit-wm, 13Patch-For-Review: Support restarting grrrit-wm automatically when we restart production gerrit - https://phabricator.wikimedia.org/T149609#2759780 (10Zppix) >>! In T149609#2758578, @Dzahn wrote: > And how would that be triggered from gerrit, when the whole point of ne...
[14:21:32] <wikibugs>	 10Gerrit, 06Operations, 10grrrit-wm, 13Patch-For-Review: Support restarting grrrit-wm automatically when we restart production gerrit - https://phabricator.wikimedia.org/T149609#2759801 (10Zppix)
[14:33:32] <wikibugs>	 10Gerrit, 06Labs, 10grrrit-wm: Create grrrit-wm test instance - https://phabricator.wikimedia.org/T149529#2755213 (10Krenair) New project requests don't just need to be in #Labs, they also need to block T76375 - but I'm not sure this qualifies for a project of its own. Why not just an extra tool, or even a...
[14:34:49] <wikibugs>	 10Gerrit, 06Labs, 10grrrit-wm: Create grrrit-wm test instance - https://phabricator.wikimedia.org/T149529#2759828 (10Zppix) >>! In T149529#2759181, @greg wrote: > Of what? Again, please use the names of things you want a test instance of. I'm still confused on what you need. You haven't listed anything yet o...
[14:46:34] <shinken-wm>	 PROBLEM - Puppet run on zuul-dev-jessie is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[14:53:11] <shinken-wm>	 PROBLEM - Host deployment-pdf02 is DOWN: CRITICAL - Host Unreachable (10.68.16.129)
[14:54:29] <shinken-wm>	 PROBLEM - Host deployment-conftool is DOWN: CRITICAL - Host Unreachable (10.68.20.30)
[15:07:36] <robh>	 So how does one disable a phabricator user? https://phabricator.wikimedia.org/p/OakleyAlways1/  this person is just adding crap about hacking him to every task that he touches....
[15:08:34] <Krenair>	 robh, or.i disabled that one for me
[15:08:43] <robh>	 awesome
[15:08:50] <Krenair>	 robh, you need an admin account to do it
[15:08:56] <Krenair>	 robh, but you don't have one
[15:09:01] <robh>	 yeah i would have used the command line to take over as root admin ;D
[15:09:11] <Krenair>	 unless you have the password to that @admin account
[15:09:12] * robh doesnt use admin on robh since he can become root
[15:09:14] <Krenair>	 which I know you've used before
[15:09:28] <robh>	 but its disabled, no worries... not a very effective troll =]
[15:09:29] <Krenair>	 in which case, where is that password? two other ops couldn't find it when asked
[15:09:40] <robh>	 its a command line flag to give you a login url
[15:09:56] <robh>	 it's detailed in the phab module notes, i don't recall the exact command; i read the README in the module when i need it =]
[15:09:58] <robh>	 lemme see
[15:10:09] <Krenair>	 ew. I assumed you had a password stored away in pwstore somewhere
[15:10:10] <Krenair>	 ok
[15:10:21] <robh>	 yeah, its not a password but a generated string with login details in the url
[15:10:28] <robh>	 changes its generation each time too, so not static
[15:10:30] <Krenair>	 might be easier to ask ops to bin/accountadmin to disable people
[15:11:21] <Krenair>	 which is probably the answer to your original question now that I think about it, maybe
[15:12:00] <shinken-wm>	 PROBLEM - SSH on deployment-sca02 is CRITICAL: Server answer
[15:12:20] <robh>	 hrmm, readme says <phabricator_root>/bin/auth recover <username> 
[15:12:24] <robh>	 that sounds familiar
[15:15:07] <robh>	 nope, at least that file doesn't exist in /srv/phab/phabricator/bin 
[15:15:24] <robh>	 Krenair: if i figure it out i'll toss a file in pwstore that says how to do it for ease of reference =]
[15:15:59] <Krenair>	 to get the @admin credentials or to just disable an account from the CLI?
[15:16:05] <robh>	 get the admin account
[15:16:13] <Krenair>	 either way thanks
[15:19:04] <robh>	 got it to work to give me a recovery url string that logs me in.
[15:19:18] <robh>	 i'll go ahead and copy that info directly into pwstore now for future =]
[15:28:44] <robh>	 disabling is as easy as pulling up the user profile and managing it after being admin, so thats easy.  i love phabricator.
[15:30:41] <greg-g>	 thanks robh 
[15:30:47] <greg-g>	 (re getting that info into pwstore)
[15:31:04] <robh>	 it was always in the phab readme in the module but that may not be the first place someone looked.
[15:31:14] <robh>	 I only knew from being around for the first implementation of phab
[15:31:51] * greg-g nods
[15:37:26] <matanya>	 greg-g: we are doing 1.29 to group0 today ?
[15:39:49] <shinken-wm>	 PROBLEM - Puppet run on deployment-phab02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[15:39:59] <matanya>	 asking cause i don't see any changelog for this release 
[15:40:09] <Reedy>	 matanya: No changelog if it's not branched yet, presumably
[15:40:30] <matanya>	 yeah, makes sense
[15:42:13] <thcipriani>	 usually branch around 17:00 UTC, post notes on mediawiki.org shortly thereafter.
[15:42:23] <greg-g>	 matanya: yup: https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20161101T1900
[15:42:44] <matanya>	 and now i see https://phabricator.wikimedia.org/T149059
[15:42:55] <matanya>	 thanks folks, sorry for the noise
[15:44:55] <wikibugs>	 10Gerrit, 06Labs, 10grrrit-wm: Create grrrit-wm test instance - https://phabricator.wikimedia.org/T149529#2760101 (10Paladox) @Krenair oh, I guess we could have it on tools. But we need the ability to permanently stop this test bot since it will just duplicate things if it is left running.
[15:51:03] <shinken-wm>	 PROBLEM - Puppet run on deployment-phab01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[15:55:12] <wikibugs>	 06Release-Engineering-Team (Deployment-Blockers), 05Release: MW-1.29.0-wmf.2 deployment blockers - https://phabricator.wikimedia.org/T149338#2760145 (10greg) a:03mmodell
[15:55:39] <wikibugs>	 10Gerrit, 06Release-Engineering-Team, 06Operations, 13Patch-For-Review: Investigate why gerrit slowed down on 17/10/2016 / 18/10/2016 / 21/10/2016 - https://phabricator.wikimedia.org/T148478#2760146 (10ArielGlenn) Preliminary findings from the logs for about 1 week:   - There were 238 full GCs or an averag...
[15:59:06] <wikibugs>	 10Gerrit, 06Labs, 10grrrit-wm: Create grrrit-wm test instance - https://phabricator.wikimedia.org/T149529#2760158 (10greg) I saw complaints last night of testing related to this making noise in our production Gerrit (and Phab?); what is your testing plan and how will you ensure that you are not disruptive in...
[16:02:46] <wikibugs>	 10Gerrit, 06Labs, 10grrrit-wm: Create grrrit-wm test instance - https://phabricator.wikimedia.org/T149529#2760170 (10Paladox) @greg we could test on either an instance. Or duplicate grrrit-wm on the tools labs so that the production one is always working, and we can test using the test bot under a different...
[16:07:41] <wikibugs>	 10Gerrit, 06Labs, 10grrrit-wm: Create grrrit-wm test instance - https://phabricator.wikimedia.org/T149529#2760183 (10Paladox) I've managed to create an instance on the git project. It is a small instance.
[16:34:55] <paladox>	 greg-g I've managed to set up a test instance using the project git ^^
[16:39:00] <greg-g>	 paladox: a test instance of what?
[16:39:11] <paladox>	 greg-g of grrrit-wm
[16:39:53] <greg-g>	 and what impact will it have on Gerrit or IRC?
[16:41:57] <paladox>	 Well I will not do any testing on the production bot. I will do it on the test bot and will probably do it under a nickname like grrrit-wm-test
[16:42:02] <paladox>	 so as not to confuse
[16:59:16] <wikibugs>	 10Gerrit: Delete node-rdkafka-stats repo from gerrit - https://phabricator.wikimedia.org/T149712#2760441 (10Ottomata)
[17:00:39] <wikibugs>	 10Beta-Cluster-Infrastructure, 10Analytics, 10WikimediaPageViewInfo: Deploy WikimediaPageViewInfo extension to beta cluster - https://phabricator.wikimedia.org/T129602#2110544 (10Reedy) Don't want to do this until T148775 is fixed, but I'm happy to deploy this to beta and/or production afterwards as deemed a...
[18:02:09] <shinken-wm>	 PROBLEM - Host integration-slave-trusty-1001 is DOWN: CRITICAL - Host Unreachable (10.68.16.168)
[18:02:56] <shinken-wm>	 PROBLEM - Host integration-slave-trusty-1021 is DOWN: CRITICAL - Host Unreachable (10.68.17.118)
[18:03:45] <shinken-wm>	 PROBLEM - Host deployment-elastic06 is DOWN: CRITICAL - Host Unreachable (10.68.17.186)
[18:05:46] <grrrit-wm1>	 (03PS1) 10Arlolra: Build docs on jessie as well [integration/config] - 10https://gerrit.wikimedia.org/r/319114 
[18:07:51] <grrrit-wm1>	 (03CR) 10Arlolra: "Please test this before merging. This was in best effort." [integration/config] - 10https://gerrit.wikimedia.org/r/319114 (owner: 10Arlolra)
[18:56:28] <mutante>	 greg-g: #wikimedia-gerrit is a thing.. apparently .. just not a very popular thing
[18:56:31] <mutante>	 11:55 [Users #wikimedia-gerrit]
[18:56:33] <mutante>	 11:55 [@ChanServ] [ erikab] [ mutante] 
[18:56:36] <mutante>	 never knew
[18:56:50] <paladox>	 Me neither, found it after trying some channels that were empty
[18:56:53] <paladox>	 to test a bot
[19:08:17] <greg-g>	 well that's a dumb thing :)
[19:08:29] <greg-g>	 we don't need more channels for specific tools like that
[19:08:36] <paladox>	 Yep
[19:08:41] <greg-g>	 I'd leave it and let it go quietly into the sunset :)
[19:16:51] <paladox>	 ^^ that wasn't me
[19:16:57] <wmf-insecte>	 Project beta-scap-eqiad build #126977: 04FAILURE in 2 min 3 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/126977/
[19:26:48] <wmf-insecte>	 Project beta-scap-eqiad build #126978: 04STILL FAILING in 1 min 49 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/126978/
[19:37:10] <wmf-insecte>	 Yippee, build fixed!
[19:37:11] <wmf-insecte>	 Project beta-scap-eqiad build #126979: 09FIXED in 2 min 11 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/126979/
[21:19:33] <wmf-insecte>	 Project beta-scap-eqiad build #126989: 04STILL FAILING in 4 min 36 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/126989/
[21:19:58] <wikibugs>	 10Gerrit, 06Operations, 10grrrit-wm, 13Patch-For-Review: Support restarting grrrit-wm automatically when we restart production gerrit - https://phabricator.wikimedia.org/T149609#2761484 (10Paladox) @Legoktm I've figured out how to do that. I found an npm library that can restart the whole script.
[21:27:22] <greg-g>	 paladox: renames should be straight-forward, simply a #site-request to make the relevant config changes
[21:27:28] <greg-g>	 legoktm: ty sir
[21:27:31] <paladox>	 Oh
[21:27:55] <paladox>	 greg-g ive created https://gerrit.wikimedia.org/r/#/c/319131/
[21:28:05] <wmf-insecte>	 Project beta-scap-eqiad build #126990: 04STILL FAILING in 3 min 8 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/126990/
[21:29:24] <greg-g>	 paladox: it should really have an associated task for the config change in WMF production, not the task that's linked right now (the one about the procedural aspects of it)
[21:30:07] <paladox>	 greg-g oh is there a wiki page on this please? So i can create the task according to how the wiki page says this?
[21:30:57] <greg-g>	 paladox: https://www.mediawiki.org/wiki/Shell_requests
[21:31:03] <paladox>	 Thanks
[21:34:16] <wikibugs>	 10Gerrit, 06Repository-Admins, 13Patch-For-Review, 05WMF-deploy-2016-11-01_(1.29.0-wmf.1): Rename the Semantic Forms extension to "Page Forms" - https://phabricator.wikimedia.org/T147582#2697339 (10Paladox)
[21:34:20] <paladox>	 greg-g https://phabricator.wikimedia.org/T149749
[21:42:21] <grrrit-wm>	 (03PS2) 10Legoktm: Build Parsoid docs on jessie [integration/config] - 10https://gerrit.wikimedia.org/r/319114 (owner: 10Arlolra)
[21:50:30] <wmf-insecte>	 Project beta-code-update-eqiad build #128322: 04FAILURE in 17 min: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/128322/
[21:56:54] <wmf-insecte>	 Project selenium-Core » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #204: 04FAILURE in 4 min 53 sec: https://integration.wikimedia.org/ci/job/selenium-Core/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/204/
[21:57:21] <wmf-insecte>	 Yippee, build fixed!
[21:57:21] <wmf-insecte>	 Project beta-code-update-eqiad build #128323: 09FIXED in 2 min 16 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/128323/
[21:58:09] <wmf-insecte>	 Project selenium-PageTriage » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #194: 04FAILURE in 8.3 sec: https://integration.wikimedia.org/ci/job/selenium-PageTriage/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/194/
[21:58:10] <wmf-insecte>	 Project selenium-PageTriage » chrome,beta,Linux,contintLabsSlave && UbuntuTrusty build #194: 04FAILURE in 9.2 sec: https://integration.wikimedia.org/ci/job/selenium-PageTriage/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/194/
[22:00:24] <wmf-insecte>	 Project beta-scap-eqiad build #126991: 04STILL FAILING in 3 min 2 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/126991/
[22:05:55] <hashar>	 beta has some troubles apparently ^^^
[22:06:20] <greg-g>	 :/
[22:06:29] <Krenair>	 hm
[22:06:49] <wmf-insecte>	 Project beta-scap-eqiad build #126992: 04STILL FAILING in 1 min 54 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/126992/
[22:07:01] <hashar>	 permission denied
[22:07:14] <hashar>	 I guess the keyholder is crazy on deployment-tin
[22:07:45] <mutante>	 probably still because all of labs is rebooting
[22:07:54] <Krenair>	 or I broke it with the puppetmaster move :/
[22:08:01] <hashar>	 yup
[22:08:10] <hashar>	 https://en.wikipedia.beta.wmflabs.org/w/api.php yields a varnish error
[22:08:21] <hashar>	 so maybe the mw app instances are rebooting
[22:08:29] <hashar>	 or the cache did not come back all fine
[22:10:04] <hashar>	 !log Armed keyholder on deployment-tin . Instance had 20 minutes uptime and apparently keyholder does not self arm
[22:10:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[22:10:29] <hashar>	 building on https://integration.wikimedia.org/ci/job/beta-scap-eqiad/126993/console
[22:10:35] <Krenair>	 I already tried that hashar
[22:10:43] <Krenair>	 Oh
[22:10:51] <Krenair>	 Did you do it differently?
[22:11:03] <hashar>	 just did ssh to deployment-tin
[22:11:07] <hashar>	 sudo keyholder status
[22:11:16] <hashar>	 noticed keys are not loaded and thus:  sudo keyholder arm
[22:11:28] <Krenair>	 derp
[22:11:31] <Krenair>	 I armed on -mira
[22:11:36] <Krenair>	 then tried to use it on -tin
[22:12:12] <hashar>	 maybe we can make the jenkins job to always try to arm :D
[22:12:43] <Krenair>	 there's still something wrong on -mir
[22:12:44] <Krenair>	 mira
[22:12:52] <Krenair>	 krenair@deployment-mira:~$ SSH_AUTH_SOCK=/run/keyholder/proxy.sock ssh mwdeploy@deployment-mediawiki04
[22:12:52] <Krenair>	 Connection closed by 10.68.19.128
[22:13:06] <Krenair>	 krenair@deployment-tin:~$ SSH_AUTH_SOCK=/run/keyholder/proxy.sock ssh mwdeploy@deployment-mediawiki04
[22:13:06] <Krenair>	 Linux deployment-mediawiki04 4.4.0-2-amd64 #1 SMP Debian 4.4.2-3+wmf6 (2016-10-18) x86_64
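
The manual check pasted above (pointing SSH_AUTH_SOCK at the keyholder proxy socket and trying to ssh as mwdeploy) could be scripted roughly as follows. This is a minimal sketch, not an existing tool: the socket path, user and host come from the log, while the helper names, the `BatchMode` option and the remote `true` command are assumptions added for illustration.

```python
import os
import subprocess

# Socket path as pasted in the log above.
KEYHOLDER_SOCK = "/run/keyholder/proxy.sock"


def keyholder_env(base_env=None):
    """Copy the environment and point SSH_AUTH_SOCK at the keyholder proxy."""
    env = dict(os.environ if base_env is None else base_env)
    env["SSH_AUTH_SOCK"] = KEYHOLDER_SOCK
    return env


def keyholder_ssh_argv(user, host):
    """Build a non-interactive ssh command that just runs `true` remotely."""
    # BatchMode=yes makes ssh fail instead of prompting (hypothetical choice).
    return ["ssh", "-o", "BatchMode=yes", f"{user}@{host}", "true"]


def can_deploy_ssh(user, host, timeout=15):
    """Return True if ssh through the keyholder proxy exits with status 0."""
    result = subprocess.run(
        keyholder_ssh_argv(user, host),
        env=keyholder_env(),
        capture_output=True,
        timeout=timeout,
    )
    return result.returncode == 0
```

Run on each deployment host, `can_deploy_ssh("mwdeploy", "deployment-mediawiki04")` would be expected to tell apart the two outcomes seen above (connection closed from -mira, a login from -tin).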
[22:14:43] <hashar>	 who knows :(
[22:15:01] <hashar>	 the lame way would be:  run puppet ; reboot ; keyholder arm
[22:15:07] <wmf-insecte>	 Yippee, build fixed!
[22:15:08] <wmf-insecte>	 Project beta-scap-eqiad build #126993: 09FIXED in 4 min 50 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/126993/
[22:15:31] <hashar>	 so at least scap works now
[22:15:43] <hashar>	 the text cache / mw app servers is to be figured out though ( https://en.wikipedia.beta.wmflabs.org/w/api.php yields a 503 )
[22:16:04] <hashar>	 but I am not working today and it is time to get to bed :/
[22:16:53] <Krenair>	 Nov  1 22:14:39 deployment-mediawiki04 sshd[5342]: pam_access(sshd:account): access denied for user `mwdeploy' from `deployment-mira.deployment-prep.eqiad.wmflabs'
[22:17:09] <Krenair>	 right, I'll look at that first
[22:17:24] <hashar>	 Error: Could not retrieve catalog from remote server: Error 400 on SERVER: invalid byte sequence in US-ASCII at /etc/puppet/modules/role/manifests/cache/text.pp:1 on node deployment-cache-text04.deployment-prep.eqiad.wmflabs
[22:17:34] <hashar>	 that is puppet on deployment-cache-text04
[22:18:14] <hashar>	 iirc that would be due to a newer ruby version on the puppetmaster that is more strict about encoding
[22:18:41] * Krenair sighs
[22:18:42] <Krenair>	    13 FetchError   c Junk after gzip data
[22:18:49] <hashar>	 bah
[22:20:04] <wmf-insecte>	 Project beta-update-databases-eqiad build #12493: 04FAILURE in 3 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/12493/
[22:20:06] <Krenair>	 2016-11-01 22:19:55 [WBkVCwpEE4AAAANQjy4AAAAE] deployment-mediawiki04 enwiki 1.29.0-alpha exception ERROR: [WBkVCwpEE4AAAANQjy4AAAAE] /wiki/test   DBConnectionError from line 748 of /srv/mediawiki/php-master/includes/libs/rdbms/database/Database.php: Cannot access the database: Can't connect to MySQL server on '10.68.18.35' (111) (10.68.18.35) {"exception_id":"WBkVCwpEE4AAAANQjy4AAAAE"} 
[22:20:06] <wmf-insecte>	 Project selenium-CentralAuth » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #196: 04FAILURE in 5.9 sec: https://integration.wikimedia.org/ci/job/selenium-CentralAuth/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/196/
[22:20:17] <Krenair>	 -db04
[22:22:37] <Krenair>	 hashar, okay, started mysql on the -db hosts to fix that
[22:22:51] <hashar>	 magic
[22:23:39] <hashar>	 clicked the job that updates the db https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/12494/console
[22:24:01] <hashar>	 api link is all back
[22:24:03] <hashar>	 Krenair: well done :)
[22:24:23] <wmf-insecte>	 Yippee, build fixed!
[22:24:23] <wmf-insecte>	 Project beta-update-databases-eqiad build #12494: 09FIXED in 1 min 5 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/12494/
[22:24:50] <Krenair>	 Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Could not find class role::cache::text for deployment-cache-text04.deployment-prep.eqiad.wmflabs on node deployment-cache-text04.deployment-prep.eqiad.wmflabs
[22:24:52] <Krenair>	 seriously.
[22:25:22] <Krenair>	 oh, it's one of those silly temporary errors I think
[22:25:32] <wmf-insecte>	 Yippee, build fixed!
[22:25:33] <wmf-insecte>	 Project selenium-PageTriage » chrome,beta,Linux,contintLabsSlave && UbuntuTrusty build #195: 09FIXED in 54 sec: https://integration.wikimedia.org/ci/job/selenium-PageTriage/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/195/
[22:25:38] <hashar>	 retriggered https://integration.wikimedia.org/ci/job/selenium-PageTriage/ and https://integration.wikimedia.org/ci/job/selenium-Core/
[22:25:41] <wmf-insecte>	 Yippee, build fixed!
[22:25:42] <wmf-insecte>	 Project selenium-PageTriage » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #195: 09FIXED in 1 min 3 sec: https://integration.wikimedia.org/ci/job/selenium-PageTriage/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/195/
[22:25:43] <Krenair>	 wait... no
[22:30:27] <Krenair>	 hashar, did you change anything on that host?
[22:30:30] <Krenair>	 -cache-text04
[22:30:43] <wmf-insecte>	 Yippee, build fixed!
[22:30:44] <wmf-insecte>	 Project selenium-Core » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #205: 09FIXED in 6 min 12 sec: https://integration.wikimedia.org/ci/job/selenium-Core/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/205/
[22:32:01] <hashar>	 Krenair: nothing
[22:32:05] <hashar>	 haven't touched it besides looking at puppet.log
[22:32:16] <hashar>	 maybe puppet has been broken for a while
[22:32:29] <hashar>	 at least starting the db on db4 fixed the jobs/beta web pages
[22:33:17] <hashar>	 (and apparently we have lost shinken.wmflabs.org / all monitoring)
[22:33:49] <Krenair>	 sigh
[22:33:56] <Krenair>	 I am losing track of all the broken things
[22:34:02] <Krenair>	 The last Puppet run was at Tue Nov  1 18:03:20 UTC 2016 (263 minutes ago). Puppet is disabled. reason not specified
[22:34:03] <Krenair>	 FML
[22:34:14] <hashar>	 bah
[22:34:20] <Krenair>	 (this is on shinken-01)
[22:34:35] <hashar>	 just give up / move to other more interesting things :D
[22:35:14] <Krenair>	 okay let's go back to beta
[22:35:15] <Krenair>	 Error: Could not retrieve catalog from remote server: Error 400 on SERVER: invalid byte sequence in US-ASCII at /etc/puppet/modules/role/manifests/cache/text.pp:1 on node deployment-cache-text04.deployment-prep.eqiad.wmflabs
[22:35:22] <Krenair>	 this uses the deployment-puppetmaster puppetmaster
[22:35:50] <Krenair>	 root@deployment-puppetmaster:/var/lib/git/operations/puppet# head -n1 /etc/puppet/modules/role/manifests/cache/text.pp | xxd
[22:35:50] <Krenair>	 0000000: 636c 6173 7320 726f 6c65 3a3a 6361 6368  class role::cach
[22:35:50] <Krenair>	 0000010: 653a 3a74 6578 7428 0a                   e::text(.
[22:39:10] <Krenair>	 When I run puppet, I get a different error
[22:39:13] <Krenair>	 Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Could not find class role::cache::text for deployment-cache-text04.deployment-prep.eqiad.wmflabs on node deployment-cache-text04.deployment-prep.eqiad.wmflabs
[22:39:15] <hashar>	 the puppet message is misleading
[22:39:35] <hashar>	 that is probably ruby dying when parsing the .pp as ruby
[22:39:40] <hashar>	 and puppet assuming it is line 1
[22:40:18] <hashar>	 Krenair: example from the past https://phabricator.wikimedia.org/T86282
[22:40:49] <hashar>	 though that is for mw vagrant
[22:41:01] <hashar>	 but the trick is that ruby needs some appropriate locale to be passed
[22:41:11] <hashar>	 and whatever new puppetmaster setup we have is broken in that regard
[22:41:22] <hashar>	 worth filing it as a bug and reaching out to the ops list
[22:41:31] <hashar>	 some puppet guru will know
[22:41:46] <hashar>	 that is all. I am asleep now :)
[22:42:38] <Krenair>	 no
[22:42:42] <Krenair>	 this is the normal puppetmaster
[22:42:46] <Krenair>	 I have not moved this node over
[22:45:42] <hashar>	 Krenair: dont obsess too much about it though :)
[22:45:50] <hashar>	 have  a good day
[22:46:00] <tgr>	 any idea what happened here? https://integration.wikimedia.org/ci/job/selenium-CentralAuth/196/console
[22:46:11] <tgr>	 the selenium job failed with no output
[22:47:29] <Krenair>	 Think I got the damn thing
[22:47:53] <Krenair>	 Had python try to read the file, encode it, it told me what the invalid character was
[22:47:56] <Krenair>	 then I removed it
[22:48:00] <Krenair>	 -    # topic into�many JSON based kafka topics for further
[22:48:00] <Krenair>	 +    # topic into many JSON based kafka topics for further
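
The approach Krenair describes (have Python read the file and report the offending character) might look something like the sketch below. The exact commands used are not in the log, so the function name and the return shape are assumptions. It decodes each line as ASCII and records where decoding first fails, which also shows why checking only line 1 with `xxd` found nothing: the bad byte sat further down the file.

```python
from pathlib import Path


def find_non_ascii(path):
    """Return (line_number, byte_offset, byte_value) tuples.

    Only the first non-ASCII byte on each offending line is reported,
    because decoding stops at the first error.
    """
    hits = []
    data = Path(path).read_bytes()
    for lineno, line in enumerate(data.splitlines(), start=1):
        try:
            line.decode("ascii")
        except UnicodeDecodeError as err:
            # err.start is the offset of the first undecodable byte.
            hits.append((lineno, err.start, line[err.start]))
    return hits


if __name__ == "__main__":
    import sys

    for lineno, offset, value in find_non_ascii(sys.argv[1]):
        print(f"line {lineno}, byte offset {offset}: 0x{value:02x}")
```

Saved under a hypothetical name such as `find_non_ascii.py`, it could be pointed at the manifest path from the Puppet error to list the lines that need cleaning.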
[22:53:54] <shinken-wm>	 RECOVERY - Puppet run on deployment-mx is OK: OK: Less than 1.00% above the threshold [0.0]
[22:54:04] <shinken-wm>	 RECOVERY - Puppet run on deployment-aqs01 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:54:08] <shinken-wm>	 RECOVERY - Puppet run on deployment-imagescaler01 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:54:14] <shinken-wm>	 RECOVERY - Puppet run on deployment-sca01 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:54:31] <shinken-wm>	 RECOVERY - Puppet run on deployment-poolcounter04 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:54:35] <shinken-wm>	 RECOVERY - Puppet run on deployment-pdf01 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:54:35] <shinken-wm>	 RECOVERY - Puppet run on deployment-db03 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:54:53] <shinken-wm>	 RECOVERY - Puppet run on deployment-mathoid is OK: OK: Less than 1.00% above the threshold [0.0]
[22:55:18] <shinken-wm>	 RECOVERY - Puppet run on deployment-sentry01 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:55:49] <greg-g>	 tgr: there was weirdness in beta recently
[22:56:04] <shinken-wm>	 RECOVERY - Puppet run on deployment-kafka01 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:56:04] <shinken-wm>	 RECOVERY - Puppet run on deployment-ms-be01 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:56:08] <shinken-wm>	 RECOVERY - Puppet run on deployment-ircd is OK: OK: Less than 1.00% above the threshold [0.0]
[22:57:18] <shinken-wm>	 RECOVERY - Puppet run on deployment-elastic07 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:58:14] <shinken-wm>	 RECOVERY - Puppet run on deployment-kafka03 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:58:19] <shinken-wm>	 RECOVERY - Puppet run on deployment-eventlogging03 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:58:23] <shinken-wm>	 RECOVERY - Puppet run on deployment-changeprop is OK: OK: Less than 1.00% above the threshold [0.0]
[22:59:49] <shinken-wm>	 RECOVERY - Puppet run on deployment-cache-text04 is OK: OK: Less than 1.00% above the threshold [0.0]
[23:24:00] <wikibugs>	 10Gerrit, 06Operations, 10grrrit-wm, 13Patch-For-Review: Support restarting grrrit-wm automatically when we restart production gerrit - https://phabricator.wikimedia.org/T149609#2761798 (10Paladox) @Legoktm ok ive fixed everything https://gerrit.wikimedia.org/r/318976 now. It's ready to be merged but needs...
[23:37:21] <Krenair>	 tgr|away, greg-g: okay beta should have returned to normal levels of weirdness now
[23:40:39] <greg-g>	 :)
[23:45:48] <shinken-wm>	 PROBLEM - Puppet run on deployment-cache-text04 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[23:50:28] <shinken-wm>	 RECOVERY - Puppet run on deployment-tin is OK: OK: Less than 1.00% above the threshold [0.0]