[00:08:04] twentyafterfour: +1 to all the things you said about the train deploy process sucking. It needs to be rebuilt from the ground up, with much slaying of strange sacred ideas. [00:08:37] My biggest contribution was making the giant checklist visible rather than buried in Sam's head [00:09:42] bd808: indeed... I figured out how to semi-automate most of it today but it was taking too long and I had to finish the deploy so I didn't have time to tie it all together [00:10:07] can deployments be simulated on beta cluster? [00:10:21] I think they could [00:10:36] we run scap there and it has the whole multiwiki setup [00:10:57] today it only runs a single branch [00:11:19] but that was something that greg-g wanted to change this quarter at one point [00:11:50] one big improvement would be simply to have the different branches cloned from a local copy instead of cloning fresh from the origin every time [00:12:02] git-new-workdir would probably do the trick [00:12:30] yeah. ^d started to look at that ~1 year ago and then never got to finish it [00:12:56] <^d> It gets a little weird with the submodules [00:13:01] <^d> But it should be possible in theory [00:13:20] to test the waters I added a remote pointing from the new branch (wmf18) to /srv/mediawiki-staging/php-1.25wmf17/ ... 
then I was able to merge in the security patches using git instead of reapplying .patch files from csteipp [00:14:19] the submodules are a separate issue ...but at least the patches to core could be handled without making a fresh clone of the entire (large) repo for each branch deploy [00:14:41] In Beta today, each "deploy" runs the beta-code-update-eqiad jenkins job which is an in-place update of all the staging things and then the beta-scap-eqiad job that just runs scap to send the staged copy out across the cluster [00:14:47] I mean, git-new-workdir just symlinks a bunch of stuff to a shared .git [00:15:14] So you could practice things by stopping the update job and doing whatever you wanted before running scap [00:15:46] I want to make a simple dashboard that collects all the details about the current state of mediawiki-staging and summarizes it so that it's not so difficult to tell what's going on [00:16:30] that would be awesome [00:17:24] basically a conglomeration of git status / git log, git submodule summary, active mediawiki versions, and some kind of representation of the wikiversions.json pointers [00:48:56] marxarelli: I filed https://phabricator.wikimedia.org/T89917 as a placeholder for mw-vagrant install parties at both hackathons [00:49:25] feel free to reference it on your application to attend either or both [00:49:35] bd808: nice! [00:49:46] * bd808 is not sure about this "community buddy" idea [02:05:57] bd808: Can it be Reedy? [02:06:12] I was wondering the same thing :) [02:07:02] * James_F grins. [03:30:06] ^d, have you seen those unexpected N4HPHP13DataBlockFullE fatals? 
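The clone-from-a-local-copy idea discussed above can be sketched with stock git: `git clone --reference` borrows objects from an existing local checkout (recorded in `objects/info/alternates`, similar in spirit to git-new-workdir's shared `.git`), so only objects missing locally would come from the origin. Everything below runs in a throwaway directory; the `origin.git`/`php-old`/`php-new` names are invented stand-ins for the real staging layout.

```shell
# Sketch: prepare a new branch directory by borrowing objects from an
# existing local clone instead of re-fetching the whole repo from origin.
# All names (origin.git, php-old, php-new) are illustrative stand-ins.
set -e
tmp=$(mktemp -d)
cd "$tmp"

# Stand-in for the remote origin.
git -c init.defaultBranch=master init -q --bare origin.git
git -c init.defaultBranch=master clone -q origin.git seed
git -C seed -c user.name=demo -c user.email=demo@example.org \
    commit -q --allow-empty -m "base commit"
git -C seed push -q origin master

# First branch dir: a full clone (the slow path for a large repo).
git clone -q origin.git php-old

# New branch dir: share objects with the local copy via --reference;
# git records the borrowed object store in objects/info/alternates.
git clone -q --reference "$tmp/php-old" origin.git php-new
cat php-new/.git/objects/info/alternates
git -C php-new log --oneline -1
```

This is only the non-submodule half of the problem, as ^d notes above; submodules would need the same treatment per submodule.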
[03:30:38] <^d> I have not [03:32:25] looks like they stopped just over 5 minutes ago [03:33:15] <^d> Lob it in Phab if you'd like so we don't forget to follow up [03:33:16] * ^d is walking out the door to dinner [03:56:47] Project browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #378: FAILURE in 7 min 45 sec: https://integration.wikimedia.org/ci/job/browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/378/ [04:54:40] ^d: I've added you as co-deployer for CX/cxserver (specifically, need first-time assistance for CX) :) [05:02:25] Yippee, build fixed! [05:02:25] Project browsertests-VisualEditor-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #525: FIXED in 17 min: https://integration.wikimedia.org/ci/job/browsertests-VisualEditor-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce/525/ [08:57:06] 3Deployment-Systems: l10nupdate user can't access scap shared ssh key causing nightly l10nupdate sync process to fail - https://phabricator.wikimedia.org/T76061#1049229 (10Nikerabbit) Still same error in the log for last run. [09:03:12] (03PS1) 10Adrian Lang: Ignore HEADLESS and KEEP_BROWSER_OPEN for phantomjs [selenium] - 10https://gerrit.wikimedia.org/r/191552 [09:05:24] (03CR) 10Adrian Lang: "Btw, how come https://rubygems.org/gems/mediawiki_selenium/versions/0.4.2 is not in this repo?" [selenium] - 10https://gerrit.wikimedia.org/r/191552 (owner: 10Adrian Lang) [09:07:17] PROBLEM - Puppet staleness on deployment-eventlogging02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [43200.0] [09:18:55] 3Beta-Cluster: Don't throttle WMF office IP(s) for account creation - https://phabricator.wikimedia.org/T87841#1049247 (10hashar) 5Open>3Resolved Should be good now. Thanks @JohnLewis ! 
[09:18:56] 3Beta-Cluster: Account creation throttling too restrictive on Beta Cluster - https://phabricator.wikimedia.org/T87704#1049249 (10hashar) [09:25:34] Project browsertests-VisualEditor-language-screenshot-os_x_10.10-firefox » en,contintLabsSlave && UbuntuTrusty build #15: FAILURE in 18 min: https://integration.wikimedia.org/ci/job/browsertests-VisualEditor-language-screenshot-os_x_10.10-firefox/LANGUAGE_SCREENSHOT_CODE=en,label=contintLabsSlave%20&&%20UbuntuTrusty/15/ [09:41:48] 3Release-Engineering: Rethinking our deployment process - https://phabricator.wikimedia.org/T89945#1049275 (10mmodell) 3NEW a:3mmodell [09:53:40] 3Release-Engineering: Rethinking our deployment process - https://phabricator.wikimedia.org/T89945#1049302 (10Aklapper) p:5Triage>3Normal [09:53:56] 3Deployment-Systems, Release-Engineering: Rethinking our deployment process - https://phabricator.wikimedia.org/T89945#1049275 (10Aklapper) [10:45:48] (03PS1) 10Amire80: Remove failing ULS jobs: [integration/config] - 10https://gerrit.wikimedia.org/r/191566 [10:59:27] zeljkof: Great success! [10:59:28] https://integration.wikimedia.org/ci/view/BrowserTests/view/VisualEditor/job/browsertests-VisualEditor-language-screenshot-os_x_10.10-firefox/LANGUAGE_SCREENSHOT_CODE=ast,label=contintLabsSlave%20&&%20UbuntuTrusty/16/console [10:59:29] All the uploads worked. [10:59:31] I checked https://commons.wikimedia.org/wiki/Special:Contributions/LanguageScreenshotBot [10:59:32] uploads for ast appear to be correct. [11:00:36] aharoni: yeah! :) [11:00:52] you were worried about passwords but everything just worked! [11:01:03] I can now start doing the same for ContentTranslation! [11:01:05] Excitement! [11:02:41] aharoni: I think we can create a command for upload ? something like bundle exec commons_upload ? 
[11:03:04] aharoni: This will eliminate upload.rb [11:05:31] 3Deployment-Systems: HHVM lock-ups - https://phabricator.wikimedia.org/T89912#1049432 (10Aklapper) p:5Triage>3High [11:22:58] vikasyaligar: patches welcome :) [11:23:04] will it be in the gem? [11:23:22] aharoni: Yes ! [11:24:39] cool, pull request is welcome here: https://github.com/amire80/commons_upload [11:25:49] vikasyaligar: I made you a collaborator in GitHub and rubygems for this gem, too. [11:27:34] aharoni: yup ! thank you :) [11:29:59] aharoni: can that be used to send any screenshot to Commons? [11:44:47] (03CR) 10Hashar: [C: 032] "Already deleted apparently \O/" [integration/config] - 10https://gerrit.wikimedia.org/r/191566 (owner: 10Amire80) [11:51:31] (03Merged) 10jenkins-bot: Remove failing ULS jobs: [integration/config] - 10https://gerrit.wikimedia.org/r/191566 (owner: 10Amire80) [11:54:30] (03PS4) 10Hashar: VectorBeta depends on EventLogging [integration/config] - 10https://gerrit.wikimedia.org/r/191262 (owner: 10Mattflaschen) [12:01:25] (03CR) 10Hashar: [C: 032] "Jobs updated" [integration/config] - 10https://gerrit.wikimedia.org/r/191262 (owner: 10Mattflaschen) [12:03:29] kart_: yes, pretty much. [12:04:08] that's what I did with vikasyaligar and zeljko in the last GSoC, and since then the three of us are slowly maintaining and developing it. [12:04:57] kart_: until now it was all coupled to VisualEditor, but now we decoupled the generic screenshot capturing and uploading functionality, so it can be used by any MediaWiki component. [12:05:02] My guess is that CX will be next. [12:08:28] (03Merged) 10jenkins-bot: VectorBeta depends on EventLogging [integration/config] - 10https://gerrit.wikimedia.org/r/191262 (owner: 10Mattflaschen) [12:14:33] aharoni: nice! 
[12:37:35] PROBLEM - SSH on deployment-lucid-salt is CRITICAL: Connection refused [12:41:17] (03PS4) 10Hashar: zuul: test 'recheck' behavior [integration/config] - 10https://gerrit.wikimedia.org/r/184967 [12:41:23] (03PS3) 10Hashar: zuul: test check/test behavior [integration/config] - 10https://gerrit.wikimedia.org/r/184968 [14:39:56] (03CR) 10Hashar: [C: 031] Rebuild composer autoloader to support classmap-authoritative setting [integration/phpunit] - 10https://gerrit.wikimedia.org/r/188398 (https://phabricator.wikimedia.org/T85182) (owner: 10Legoktm) [14:55:02] aharoni: around? [15:14:22] aharoni: sent you e-mail [15:14:38] zeljkof: got it :) [15:14:42] zeljkof: he is in meeting. [15:14:48] oh. here he is :) [15:14:50] kart_, aharoni :) [15:23:18] <^d> Krenair: I'm still seeing those errors you mentioned last night [15:23:49] <^d> I'm going to take this to #-core. Looks like an HHVM thing [15:23:54] ok [15:23:54] (03CR) 10Hashar: "Hello Kartik, from my comment on T87607 we have to figure out how to get jenkins-debian-glue to build with a Trusty. We probably want to r" [integration/config] - 10https://gerrit.wikimedia.org/r/190708 (https://phabricator.wikimedia.org/T87607) (owner: 10KartikMistry) [15:28:39] 3Continuous-Integration: debian-glue need multiple distributions support (add Ubuntu Trusty and Debian Jessie) - https://phabricator.wikimedia.org/T89959#1049819 (10hashar) 3NEW a:3hashar [15:41:25] (03PS1) 10Hashar: Remove '{name}-debbuild' (unused) [integration/config] - 10https://gerrit.wikimedia.org/r/191623 [15:41:37] (03CR) 10Hashar: [C: 032] Remove '{name}-debbuild' (unused) [integration/config] - 10https://gerrit.wikimedia.org/r/191623 (owner: 10Hashar) [15:49:00] (03Merged) 10jenkins-bot: Remove '{name}-debbuild' (unused) [integration/config] - 10https://gerrit.wikimedia.org/r/191623 (owner: 10Hashar) [15:55:33] hi hashar! how would I go about deploying? https://gerrit.wikimedia.org/r/188398 (updating phpunit's composer autoloader)? 
will merging it update it everywhere? [15:58:22] I think a merge will update in labs slaves and then prod will need a trebuchet deploy [16:03:01] 3Continuous-Integration, MediaWiki-extensions-WikibaseRepository, Wikidata: generate patch code coverage on gerrit patch-set upload for wikibase.git - https://phabricator.wikimedia.org/T88435#1049926 (10Lydia_Pintscher) p:5Triage>3Normal [16:04:25] legoktm: what bd808 said :-] [16:04:41] the integration/phpunit is maintained by puppet git::clone on labs instance [16:04:44] and via git-deploy on prod [16:04:47] are there docs somewhere on how to do the trebuchet deploy? [16:05:00] if trebuchet has doc yeah [16:05:02] ssh tin.eqiad.wmnet [16:05:13] cd /srv/deployment/integration/slave-scripts [16:05:13] git deploy start [16:05:13] git pull [16:05:14] [16:05:20] git deploy sync [16:05:20] r [16:05:20] r [16:05:20] r [16:05:21] r [16:05:21] y [16:05:22] r [16:05:22] r [16:05:23] r [16:05:23] r [16:05:24] y [16:05:34] r == retry [16:05:37] y = yes / continue [16:06:28] a sad but true representation of a typical trebuchet install [16:07:28] legoktm: you want to do that in integration/phpunit rather than slave-scripts for this one though [16:07:38] right :P [16:07:50] (03CR) 10Legoktm: [C: 032] Rebuild composer autoloader to support classmap-authoritative setting [integration/phpunit] - 10https://gerrit.wikimedia.org/r/188398 (https://phabricator.wikimedia.org/T85182) (owner: 10Legoktm) [16:08:43] re [16:09:08] I typically do `git fetch; git log --stat HEAD..origin/master; git rebase origin/master` rather than `git pull` [16:09:08] bd808: with bit torrent we would know when all leechers finished fetching [16:09:13] then salt the switch to the new ver [16:09:51] Or we could get rid of salt, switch to mcollective and have a process that is actually scriptable [16:10:16] not sure whether ops will like mcollective [16:10:29] does ansible provide such orchestration system ? 
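bd808's fetch-then-inspect-then-rebase habit from above can be demoed end to end with throwaway clones; all repo and user names here are invented for the demo.

```shell
# Demo of fetch-then-inspect-then-rebase, instead of a blind `git pull`:
# you can peek at the incoming commits before applying them.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git -c init.defaultBranch=master init -q --bare origin.git

# Seed the shared history.
git -c init.defaultBranch=master clone -q origin.git committer
git -C committer -c user.name=demo -c user.email=demo@example.org \
    commit -q --allow-empty -m "base"
git -C committer push -q origin master

# "tin" stands in for the deploy host's working copy.
git clone -q origin.git tin
git -C tin config user.name demo
git -C tin config user.email demo@example.org

# Someone lands a new change upstream.
git -C committer -c user.name=demo -c user.email=demo@example.org \
    commit -q --allow-empty -m "change 1"
git -C committer push -q origin master

cd tin
git fetch -q                               # bring remote refs up to date
git log --oneline HEAD..origin/master      # peek: exactly what will land
git rebase -q origin/master                # then apply it deliberately
git log --oneline -1
```

The `git log HEAD..origin/master` step is the point: it shows exactly which commits will land before anything is applied, which a bare `git pull` skips.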
[16:10:34] some would, some wouldn't [16:11:43] (03CR) 10QEDK: [C: 031] Rebuild composer autoloader to support classmap-authoritative setting [integration/phpunit] - 10https://gerrit.wikimedia.org/r/188398 (https://phabricator.wikimedia.org/T85182) (owner: 10Legoktm) [16:12:21] <^d> bd808: My crazy idea was to torrent the .git directory about [16:12:53] <^d> Then git-deploy could just do the checkout on all hosts that are up to date [16:13:54] who's QEDK? [16:13:57] The big trick is ensuring that all hosts are either up-to-date or excluded from participating in whatever the repo is providing. Running mixed version networks is a huge pain for most things [16:14:33] <^d> With torrenting we'd know when clients were done [16:14:44] <^d> (plus they'd only have to ever fetch the delta) [16:15:06] <^d> Long as we don't do destructive repacks ;-) [16:15:15] (03Merged) 10jenkins-bot: Rebuild composer autoloader to support classmap-authoritative setting [integration/phpunit] - 10https://gerrit.wikimedia.org/r/188398 (https://phabricator.wikimedia.org/T85182) (owner: 10Legoktm) [16:15:18] legoktm: no idea. random looking gmail address and no patches anywhere [16:16:36] Missing the following configuration item: user.name [16:16:36] Missing the following configuration item: user.email [16:16:36] Please add the missing configuration items via git config or in the .trigger file [16:17:29] <^d> bd808: Anyway, it may all be very crazy [16:17:56] legoktm: that sounds weird. getting that on tin? [16:18:06] (03PS1) 10Hashar: Switch debian-glue to Trusty instance [integration/config] - 10https://gerrit.wikimedia.org/r/191626 (https://phabricator.wikimedia.org/T89959) [16:18:12] yeah, I created a ~/.gitconfig and it went away [16:18:23] ^d: crazy ideas are often the best. I want twentyafterfour to just tell us the right way to do it. 
:) [16:18:27] (03CR) 10Hashar: [C: 032] Switch debian-glue to Trusty instance [integration/config] - 10https://gerrit.wikimedia.org/r/191626 (https://phabricator.wikimedia.org/T89959) (owner: 10Hashar) [16:18:37] 2/2 minions completed fetch [16:18:37] Continue? ([d]etailed/[C]oncise report,[y]es,[n]o,[r]etry): [16:18:43] is it r or y? [16:18:50] 2/2 == y [16:19:13] <^d> Anything less than a full fetch you should at least check [d] first before continuing [16:19:21] *nod* [16:19:27] <^d> (that's the biggest gripe I have with it, that stale shit doesn't get purged) [16:19:35] checkout was 2/2 so I said y, and then "Deployment finished." [16:19:53] it often takes several trebuchet timeout cycles for all the minions to get the command from the salt master [16:20:07] <^d> *several hundred thousand [16:20:14] legoktm: perfect [16:20:45] * bd808 gives legoktm an "I survived using Trebuchet" sticker [16:20:45] !log updated phpunit for https://gerrit.wikimedia.org/r/188398 [16:20:48] Logged the message, Master [16:20:49] :> [16:21:50] <^d> bd808: I used trebuchet and all I got was a goddamn sticker [16:22:03] legoktm: so how many other composer installs do we need to touch to get the classmap authoritative ClassLoader.php everywhere? [16:22:21] theoretically that one should be good enough for mediawiki-config... [16:22:44] https://gerrit.wikimedia.org/r/#/c/188393/ [16:23:04] there are a bunch of 18hr old beta-mediawiki-config-update-eqiad jobs queued btw [16:23:12] blah [16:23:19] stuck I bet [16:23:34] ah [16:23:42] yeah deployment-bastion is deadlocked [16:23:46] gotta restart Jenkins :-( [16:23:53] stupid damn database update job locked it all again [16:24:11] can I try to shake it loose without a restart? 
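Going back to ^d's only-fetch-the-delta idea a bit earlier: the torrent transport itself is out of scope here, but the delta part can be sketched with `git bundle`, which packs just the commits a host is missing into one file that any transport (torrent included) could distribute. A minimal sketch with invented names:

```shell
# Sketch: ship only the missing commits to a replica as one bundle file,
# rather than having every host fetch from the origin server directly.
# origin.git, master-copy, and minion are illustrative stand-ins.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git -c init.defaultBranch=master init -q --bare origin.git

git -c init.defaultBranch=master clone -q origin.git master-copy
git -C master-copy -c user.name=demo -c user.email=demo@example.org \
    commit -q --allow-empty -m "base"
git -C master-copy push -q origin master

# A minion that already has the base state.
git clone -q origin.git minion
git -C minion config user.name demo
git -C minion config user.email demo@example.org

# New work lands upstream.
git -C master-copy -c user.name=demo -c user.email=demo@example.org \
    commit -q --allow-empty -m "delta"
git -C master-copy push -q origin master

# Pack only what the minion lacks: everything after its current HEAD.
old=$(git -C minion rev-parse HEAD)
git -C origin.git bundle create "$tmp/delta.bundle" "$old"..master

# The minion applies the bundle like any other remote; no origin contact.
git -C minion fetch -q "$tmp/delta.bundle" master:refs/remotes/bundle/master
git -C minion merge -q --ff-only bundle/master
git -C minion log --oneline -1
```

The bundle records its prerequisite commits, so a minion that is not actually at the expected base would refuse the fetch, which lines up with bd808's point about keeping mixed-version hosts out of the pool.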
[16:24:19] sure [16:24:26] sometimes it works to disable the slave and kill all the jobs [16:24:32] sometimes [16:24:44] from some trace I took, it seems it is the Gearman plugin that considers the executors on deployment-bastion slaves are unavailable [16:24:55] right [16:25:16] toggling the slave off and on can shake that loose most of the time [16:25:34] but you also have to kill all the stacked up jobs [16:25:45] * bd808 tries [16:26:39] !log disconnected deployment-bastion.eqiad from jenkins [16:26:44] Logged the message, Master [16:27:23] !log killed all pending jobs for deployment-bastion.eqiad [16:27:24] <^d> Running out of log space again or something else? [16:27:27] Logged the message, Master [16:28:15] !log disconnected deployment-bastion.eqiad from jenkins [16:28:18] Logged the message, Master [16:28:47] !log reconnected deployment-bastion.eqiad to jenkins [16:28:49] Logged the message, Master [16:29:01] oh [16:29:53] bah. still getting the waiting for executor message [16:29:56] one more round [16:30:46] the plugin has a bunch of logs at https://integration.wikimedia.org/ci/log/Plugins%20-%20Gearman/ [16:30:51] !log disconnected and reconnected deployment-bastion.eqiad again [16:30:52] Feb 19, 2015 4:28:14 PM FINE hudson.plugins.gearman.NodeAvailabilityMonitor unlock [16:30:53] AvailabilityMonitor unlock request: null [16:30:54] Logged the message, Master [16:30:58] the last 'null' should be a hostname [16:31:21] AvailabilityMonitor lock request: deployment-bastion.eqiad_exec-2 [16:31:22] oh [16:31:33] maybe that fixed it [16:31:51] nope :( [16:32:10] #45142 (pending—Waiting for next available executor on deployment-bastion.eqiad) [16:32:15] :-( [16:32:16] https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/ [16:32:27] gotta grab my daughter back home [16:32:32] it works sometimes but I've had to toggle it a bunch of times [16:32:34] only fix is to restart Jenkins :-/ [16:32:58] * hashar_ waves [16:33:08] zeljkof: aharoni: What do you 
think about https://github.com/amire80/commons_upload/pull/2 ? [16:33:34] vikasyaligar: sorry, in a meeting [16:33:41] Another thing we could try is toggling the gearman plugin off and back on [16:33:44] zeljkof: OK :) [16:34:00] * bd808 does that [16:34:23] bah that requires a restart too [16:35:16] ^d: have you done a full jenkins restart before? I haven't here. Not sure I know all the right bits [16:36:05] <^d> You just restart the service right? [16:36:32] <^d> Ah, https://integration.wikimedia.org/ci/manage [16:36:33] yeah I think that's it [16:36:42] <^d> "Prepare for shutdown" [16:36:47] <^d> Probably good to do that first [16:36:51] *nod* [16:37:11] <^d> "Stops executing new builds, so that the system can be eventually shut down safely." [16:37:20] <^d> Yeah, I'd do that, then kick the service [16:37:24] +1 [16:41:27] https://www.mediawiki.org/wiki/Continuous_integration/Jenkins#Restart_all_of_Jenkins [16:41:48] (g'morning btw) [16:42:23] <^d> Oh that's a nice shiny button [16:42:26] <^d> I like shiny buttons [16:52:20] (we need more nice shiny buttons) [16:58:39] <^d> bd808: You know the default pull strategy on deployment repos is rebase, right? Your fancy rebase commands are just extra keystrokes ;-) [16:59:08] <^d> Too many people just do `git pull` without thinking, so I made sure that does the right thing! [16:59:53] explicit is better than implicit and I can peek before I apply :P [17:01:09] <^d> Oh fetching and checking the log is always good :) [17:02:23] <^d> pulling without rebase leads to messy ugly merge commits you don't see in the gerrit-hosted version of the repo [17:02:29] <^d> And hides security patches, which is bad bad bad [17:18:05] 3Continuous-Integration: Why are the language screenshot tests stalled by so long? - https://phabricator.wikimedia.org/T89178#1050121 (10greg) p:5Triage>3Low [17:18:18] (03CR) 10Phuedx: "Are there any requirements for the folder structure inside of src/docs? 
Note well that running `make docs` generates docs in the js, php, " [integration/config] - 10https://gerrit.wikimedia.org/r/191046 (https://phabricator.wikimedia.org/T74794) (owner: 10Hashar) [17:25:32] (03Merged) 10jenkins-bot: Switch debian-glue to Trusty instance [integration/config] - 10https://gerrit.wikimedia.org/r/191626 (https://phabricator.wikimedia.org/T89959) (owner: 10Hashar) [17:55:57] 3Continuous-Integration, Wikimedia-Fundraising-CiviCRM: CI for Civi: provision and run tests under Jenkins/Zuul - https://phabricator.wikimedia.org/T86103#1050236 (10awight) [17:58:31] cscott: Can you kick Jenkins (or tell me who to ask; Krinkle|detached is |detached and hashar is absent)? It's got 328 items in the queue and counting, with nothing executing. [17:58:40] <^d> I already did kick it [17:58:48] Ah. Darn. [17:59:05] <^d> And I saw at least one job go through post-restart [17:59:14] * James_F sighs. [17:59:33] <^d> All the slaves are connected [18:18:12] chrismcmahon: Here you go: https://gerrit.wikimedia.org/r/#/c/191655 [18:19:18] thanks vikasyaligar, I merged it [18:19:30] chrismcmahon: thank you :) [18:22:13] <^d> greg-g: Jenkins is really hurting, need more help. [18:23:04] integration-slave1007 (offline) [18:23:04] integration-slave1008 (offline) [18:23:04] integration-slave1009 (offline) [18:23:06] why? [18:23:54] <^d> 7 and 8 were because low disk space, automatic [18:23:55] <^d> That's ok [18:24:06] <^d> 9 says d/c by Krinkle|detached for debugging [18:24:25] <^d> I'm not worried about that. I'm worried because no jobs except beta-scap-update seem to be making it into an executor [18:24:32] <^d> zuul's backed up with like >300 jobs [18:25:20] blerg [18:25:32] how did that break? [18:26:06] is it worth kicking jenkins the "hardcore way"? 
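An aside on ^d's point that the deployment repos default to pull-with-rebase: that behaviour is a single config knob. The throwaway-repo demo below (all names invented) shows a plain `git pull` rebasing a local patch on top of upstream instead of creating a merge commit.

```shell
# Demo: with pull.rebase=true, `git pull` rebases local work onto the
# upstream branch instead of creating a merge commit.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git -c init.defaultBranch=master init -q --bare origin.git

git -c init.defaultBranch=master clone -q origin.git alice
git -C alice config user.name demo
git -C alice config user.email demo@example.org
echo base > alice/app.txt
git -C alice add app.txt
git -C alice commit -q -m "base"
git -C alice push -q origin master

# "deployhost" stands in for a checkout configured like the deploy repos.
git clone -q origin.git deployhost
git -C deployhost config pull.rebase true
git -C deployhost config user.name demo
git -C deployhost config user.email demo@example.org

# Upstream moves ahead...
echo more >> alice/app.txt
git -C alice commit -qam "upstream change"
git -C alice push -q origin master

# ...while the deploy host carries a local (e.g. security) patch.
echo patch > deployhost/local-patch.txt
git -C deployhost add local-patch.txt
git -C deployhost commit -q -m "local security patch"

# A plain pull now rebases the local patch on top: linear history.
git -C deployhost pull -q
git -C deployhost log --oneline
```

Without `pull.rebase`, the same pull would produce a merge commit that does not exist in the gerrit-hosted repo, which is exactly the mess ^d describes.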
[18:26:17] https://www.mediawiki.org/wiki/Continuous_integration/Jenkins#Restart_all_of_Jenkins [18:26:38] let's check zuul's gearman [18:26:53] <^d> Ah yes, let's check gearman [18:26:58] https://www.mediawiki.org/wiki/Continuous_integration/Zuul#Debugging [18:29:03] ffs I did it [18:29:11] gearman plugin is disabled in jenkins [18:29:29] jenkins will need to restart again [18:29:44] * ^d gives bd808 another of his shirts [18:29:47] <^d> :) [18:29:53] !log restarting jenkins because I messed up and disabled gearman plugin earlier [18:29:58] Logged the message, Master [18:30:10] less work for me [18:30:28] can I nuke the running sauce tests? [18:30:38] bd808: nuke away [18:31:06] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_8.1-internet_explorer-11-sauce build #324: ABORTED in 24 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_8.1-internet_explorer-11-sauce/324/ [18:31:13] kill 'em all [18:31:16] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-os_x_10.9-safari-sauce build #478: ABORTED in 21 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-os_x_10.9-safari-sauce/478/ [18:32:02] Please wait while Jenkins is getting ready to work... [18:32:18] * greg-g brews him some coffee [18:32:45] somebody should take away my rights to do things while I'm pretending to be a PM [18:32:56] bam! [18:33:03] things are flowing in from gearman now [18:33:13] all of the things [18:33:28] * bd808 expects some slaves to fall on their faces [18:34:57] bd808: actually, for future reference, you can kill browser test builds any time you want. the daytime builds are less valuable than the overnight ones, and even those aren't sacred [18:35:25] bd808: Aha, thanks. 
[18:35:40] and Thu and Fri builds less valuable than Mon/Tue/Wed [18:36:13] just don't stop them forever :-) [18:36:32] chrismcmahon: you could go on vacation though! [18:37:03] heh [18:37:35] !log cleaned up mess in /tmp on integration-slave1007 [18:37:39] Logged the message, Master [18:38:53] !log brought integration-slave1007 back online [18:38:55] Logged the message, Master [18:41:56] !log cleaned up mess in /tmp on integration-slave1008 [18:41:59] Logged the message, Master [18:44:03] ^d: you're not going to believe this crap. (pending—Waiting for next available executor on deployment-bastion.eqiad) [18:44:13] fffffffffuuuuuuuuu [18:44:18] <^d> ... [18:45:11] bd808: Again? [18:45:23] yeah. [18:45:29] * James_F sighs. [18:45:52] https://crayfisher.files.wordpress.com/2012/07/double_facepalm_tng1.jpg [18:47:11] <^d> Why does it keep disconnecting from dp-bastion? [18:47:56] There is some gearman lockup that only seems to strike that box. It's a bug in the jenkins gearman plugin that deadlocks [18:48:16] <^d> And we have to kick the master? [18:48:20] hashar has stacktraces in a phab task somewhere [18:48:54] <^d> In the meantime, can we bring 7 and 8 back up? 
Some jobs are half-stuck on them [18:49:00] fixes are shake dp-bastion violently or restart jenkins yet again [18:49:14] I brought them up and they toggled right back down [18:49:20] <^d> boo [18:49:22] <^d> silly jenkins [18:49:33] df looks good but they haven't told jenkins apparently [18:49:48] "go home jerkins you're drunk" [18:50:22] <^d> Should've !logged that [18:50:56] <^d> Well, we can kick -bastion again [18:51:05] the problem isn't there [18:51:08] <^d> I'd rather not kick jenkins until the zuul queue goes down [18:51:11] yeah [18:51:25] I just did the jenkins detach/reattach dance [18:51:50] * greg-g sighs [18:53:12] looks like 07 and 08 are staying alive now [18:54:44] <^d> queue's down to 100 [18:55:18] https://graphite.wikimedia.org/render/?from=-8hours&height=180&width=400&target=alias(color(zuul.geard.queue.running.value,%27blue%27),%27Running%27)&target=alias(color(zuul.geard.queue.waiting.value,%27red%27),%27Waiting%27)&target=alias(color(zuul.geard.queue.total.value,%27888888%27),%27Total%27)&title=Zuul%20Geard%20job%20queue%20(8%20hours)&_=1424372094169 [18:55:23] fun graph [18:57:20] other than active tests the build queue is just sauce labs stuff again [18:58:58] !log took deployment-bastion jenkins connection offline and online 5 times; gearman plugin still stuck [18:59:01] Logged the message, Master [19:00:31] So there's this -- https://www.mediawiki.org/wiki/Continuous_integration/Zuul#Gearman_deadlock [19:01:24] !log toggling gearman plugin in jenkins admin console [19:01:28] Logged the message, Master [19:02:48] !log VICTORY! deployment-bastion jenkins slave unstuck [19:02:54] Logged the message, Master [19:03:40] .... for one f'ing job? [19:03:50] no there it goes again [19:04:15] <^d> https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/45147/console wfm? [19:04:52] <^d> And now https://integration.wikimedia.org/ci/job/beta-scap-eqiad/42319/console [19:04:54] yeah. 
I killed one that had been waiting for 5+ minutes then the queue started to move again [19:04:55] "there it goes again" I think was a positive not negative statement [19:05:38] now we just wait for the single-file line for gate-and-submit [19:06:10] no cutting please [19:06:38] I was never here [19:06:42] * bd808 slinks away [19:32:44] 3MediaWiki-extensions-GWToolset, Multimedia, Beta-Cluster: Creating directory with special characters - https://phabricator.wikimedia.org/T75725#1050759 (10Bawolff) I can't reproduce this (locally using the username Léna, and on beta commons using username Léna2). For example, I successfully uploaded http://com... [19:45:30] !log Destroying integration-slave1009 and re-imaging [19:45:35] Logged the message, Master [19:53:23] (03PS1) 10Hashar: debian-glue can now use a different distribution [integration/config] - 10https://gerrit.wikimedia.org/r/191676 (https://phabricator.wikimedia.org/T89959) [19:53:41] (03CR) 10Hashar: [C: 032] debian-glue can now use a different distribution [integration/config] - 10https://gerrit.wikimedia.org/r/191676 (https://phabricator.wikimedia.org/T89959) (owner: 10Hashar) [19:55:57] 3Continuous-Integration: debian-glue need multiple distributions support (add Ubuntu Trusty and Debian Jessie) - https://phabricator.wikimedia.org/T89959#1050959 (10hashar) [19:56:59] 3Continuous-Integration: debian-glue need multiple distributions support (add Ubuntu Trusty and Debian Jessie) - https://phabricator.wikimedia.org/T89959#1049819 (10hashar) To change the distribution, we just have to `export distribution=precise` which build-and-provide-package recognize. '{name}-debian-glue' l... 
[20:01:08] (03Merged) 10jenkins-bot: debian-glue can now use a different distribution [integration/config] - 10https://gerrit.wikimedia.org/r/191676 (https://phabricator.wikimedia.org/T89959) (owner: 10Hashar) [20:04:55] (03PS4) 10Hashar: Enable jenkins for operations/debs/contenttranslation [integration/config] - 10https://gerrit.wikimedia.org/r/190708 (https://phabricator.wikimedia.org/T87607) (owner: 10KartikMistry) [20:06:02] 3Continuous-Integration: debian-glue need multiple distributions support (add Ubuntu Trusty and Debian Jessie) - https://phabricator.wikimedia.org/T89959#1050984 (10hashar) 5Open>3Resolved Haven't tried Jessie, but Precise/Trusty should work. All debian-glue jobs are using `$distribution=trusty`. [20:06:03] 3Continuous-Integration, MediaWiki-extensions-ContentTranslation, ContentTranslation-Deployments: Enable Debian CI tests on all Apertium packages - https://phabricator.wikimedia.org/T87607#1050986 (10hashar) [20:06:59] (03CR) 10Hashar: [C: 032] "Rebased and integrated changes made for T89959: debian-glue need multiple distributions support (add Ubuntu Trusty and Debian Jessie)" [integration/config] - 10https://gerrit.wikimedia.org/r/190708 (https://phabricator.wikimedia.org/T87607) (owner: 10KartikMistry) [20:13:42] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #478: FAILURE in 30 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/478/ [20:13:56] (03Merged) 10jenkins-bot: Enable jenkins for operations/debs/contenttranslation [integration/config] - 10https://gerrit.wikimedia.org/r/190708 (https://phabricator.wikimedia.org/T87607) (owner: 10KartikMistry) [20:17:28] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #477: FAILURE in 21 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce/477/ [20:26:20] 
3Quality-Assurance, VisualEditor, VisualEditor-MediaWiki: Update language_screenshot test - https://phabricator.wikimedia.org/T89370#1051039 (10Cmcmahon) 5Open>3Resolved [20:37:22] 3Continuous-Integration, MediaWiki-extensions-ContentTranslation, ContentTranslation-Deployments: Enable Debian CI tests on all Apertium packages - https://phabricator.wikimedia.org/T87607#1051076 (10hashar) I have migrated the debian-glue jobs to Trusty instances and additionally made them to explicitly `export... [20:54:54] hey, is there a way for me to trigger full V+2 tests for a commit created by a non-whitelisted contributor? [20:55:18] MatmaRex: comment "recheck" [20:55:31] legoktm: will that run full tests? not just the V+1 ones? [20:55:40] yup [20:58:19] thanks legoktm [20:58:24] :) [21:03:12] (03PS1) 10Hashar: Experimental integration-zuul-debian-glue job [integration/config] - 10https://gerrit.wikimedia.org/r/191701 (https://phabricator.wikimedia.org/T48552) [21:03:25] (03CR) 10Hashar: [C: 032] Experimental integration-zuul-debian-glue job [integration/config] - 10https://gerrit.wikimedia.org/r/191701 (https://phabricator.wikimedia.org/T48552) (owner: 10Hashar) [21:10:20] (03Merged) 10jenkins-bot: Experimental integration-zuul-debian-glue job [integration/config] - 10https://gerrit.wikimedia.org/r/191701 (https://phabricator.wikimedia.org/T48552) (owner: 10Hashar) [21:15:57] (03PS1) 10Hashar: git-buildpackage config [integration/zuul] - 10https://gerrit.wikimedia.org/r/191765 [21:16:48] (03CR) 10Hashar: "check experimental" [integration/zuul] - 10https://gerrit.wikimedia.org/r/191765 (owner: 10Hashar) [21:21:30] Are phabricator and jenkins inaccessible for anyone else, or is it just me? [21:22:07] phabricator is fine [21:22:19] jenkins seems ok as well [21:22:25] gerrit isn't loading for me [21:22:31] phab is fine though [21:22:43] gerrit is fine for me, but 208.80.154.241 isn't reachable... [21:23:21] (right when chad loses his ssh keys...) 
[21:23:48] I'm guessing it's a comcast issue [21:24:21] I've been having issues with other sites as well [21:24:24] so probably yeah [21:25:38] * greg-g loads sonic.net [21:26:13] (Abandoned) Hashar: git-buildpackage config [integration/zuul] - https://gerrit.wikimedia.org/r/191765 (owner: Hashar) [21:27:24] (bah, only up to 20 Mbps for sonic, but I can only hope they'll bring fiber soon) [21:28:17] (we're on the list: https://www.sonic.com/gigabit-fiber-internet ) [21:28:39] re: fiber, hear hear. i'm stuck at 5 Mbps :/ [21:28:48] eek! [21:28:49] I hate comcast sometimes [21:29:00] s/ sometimes// [21:29:27] :) [21:29:29] So true [21:30:27] marxarelli: sonic.net support good though? [21:32:21] greg-g: it's incredible, but i haven't had to call since they set it up [21:32:37] awesome [21:33:14] * greg-g might just do that, he has 25 Mbps with comcast, no big deal going to 20 [21:33:50] bah, now smtp.google.com isn't responding.... [21:34:12] (smtp.gmail.com I mean) [21:34:14] csteipp: I'm on comcast and been having issues with about half the internet today :/ [21:34:22] including phab [21:34:26] gerrit works though [21:34:38] random routing issues suck [21:34:49] legoktm: Yep, gerrit was working for me, but tons of other stuff started dropping off. 
[21:36:18] * greg-g vpns into the office [21:36:29] Continuous-Integration, VisualEditor, Flow: Flow tests fails to run with VisualEditor installed - https://phabricator.wikimedia.org/T86920#1051322 (Jdforrester-WMF) [21:36:40] VisualEditor, Beta-Cluster: Beta Cluster: API PrefixSearch is taking a very long time to return, and returns nothing when it does - https://phabricator.wikimedia.org/T74332#1051329 (Jdforrester-WMF) [21:38:28] (PS1) Hashar: Merge branch 'upstream-debian-sid' into debian [integration/zuul] (debian) - https://gerrit.wikimedia.org/r/191770 [21:38:48] (CR) Hashar: "check experimental" [integration/zuul] (debian) - https://gerrit.wikimedia.org/r/191770 (owner: Hashar) [21:41:26] (PS2) Hashar: Merge branch 'upstream-debian-sid' into debian [integration/zuul] (debian) - https://gerrit.wikimedia.org/r/191770 [21:41:47] (CR) Hashar: "check experimental" [integration/zuul] (debian) - https://gerrit.wikimedia.org/r/191770 (owner: Hashar) [21:44:25] (PS3) Hashar: Merge branch 'upstream-debian-sid' into debian [integration/zuul] (debian) - https://gerrit.wikimedia.org/r/191770 [21:44:41] (CR) Hashar: "recheck" [integration/zuul] (debian) - https://gerrit.wikimedia.org/r/191770 (owner: Hashar) [21:45:20] (CR) Hashar: "check experimental" [integration/zuul] (debian) - https://gerrit.wikimedia.org/r/191770 (owner: Hashar) [22:04:49] (CR) Hashar: "Whatever is under /src/docs/ (which is /docs/ relatively to the source repo working copy) will be rsynced as it to https://doc.wikimedia." [integration/config] - https://gerrit.wikimedia.org/r/191046 (https://phabricator.wikimedia.org/T74794) (owner: Hashar) [22:06:58] Project beta-scap-eqiad build #42335: FAILURE in 12 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/42335/ [22:24:36] !next [22:24:53] hello deployers, could you do me a favor and sync one appserver? 
[22:25:09] it has been reinstalled, so it needs to be synced [22:25:27] i don't wanna hack dsh groups [22:29:12] twentyafterfour: ^ [22:30:28] Yippee, build fixed! [22:30:29] Project beta-scap-eqiad build #42336: FIXED in 22 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/42336/ [22:32:27] greg-g, mutante: I don't think I have dsh access [22:32:56] i was wondering how you do it without changing them [22:33:12] but if you ever need to, they are in puppet [22:33:26] so i just went to the server and ran sync-common [22:33:31] after Krenair helped me find it [22:33:44] 22:31:49 Copying to mw1062.eqiad.wmnet from tin.eqiad.wmnet [22:34:00] i hope that makes it so i can put it back in dsh now [22:34:03] well scap has its own ssh keys [22:34:09] and you guys don't run into issues on next deploy [22:34:29] twentyafterfour: you have dsh access [22:34:30] we always have this problem when hardware breaks, a server gets removed and fixed later [22:35:09] i reinstalled it, added back to puppet and then needed the initial sync [22:35:44] puppet will refresh if the /srv/mediawiki dir is completely absent but it doesn't update otherwise I don't think [22:35:55] bd808: I only said that because the one dsh command on https://wikitech.wikimedia.org/wiki/Heterogeneous_deployment/Train_deploys ..failed for me with a permission error [22:36:22] bd808: does "sync-common" sound like all there is to it? [22:36:28] yup [22:36:31] rsync finished fine.. cool [22:36:42] i'm re-adding it to dsh and pybal then [22:36:55] mutante: https://github.com/wikimedia/operations-puppet/blob/5bd92dcb68d945fd807515fc00a42249c58c9115/modules/mediawiki/manifests/scap.pp#L48-L55 [22:37:20] is there a way to have puppet run sync-common when it rejoins the flock? [22:37:25] and sync-common is what scap runs on each host [22:37:43] oh, it already does that?? 
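[Editor's note: the puppet behavior described above — sync-common runs only when /srv/mediawiki is entirely absent, so a reinstalled host bootstraps itself but a stale tree is never refreshed — can be sketched as follows. The function name, paths, and command are illustrative, not the actual puppet code linked above.]

```python
import os
import subprocess

def ensure_synced(mediawiki_dir="/srv/mediawiki", sync_cmd=("sync-common",)):
    """Run the sync command only when the deploy directory is entirely absent.

    Mirrors the behavior described in the channel: puppet bootstraps a
    freshly reinstalled appserver, but never refreshes an existing (possibly
    stale) tree. Returns True if a sync was triggered.
    """
    if os.path.isdir(mediawiki_dir):
        return False  # tree exists, even if out of date: puppet does nothing
    subprocess.check_call(sync_cmd)
    return True
```

This is why mutante had to log in and run sync-common by hand: the directory survived in some state, so the bootstrap condition never fired.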
nice [22:37:49] greg-g: we'd need a custom resource to decide if it was needed [22:38:20] ah [22:38:32] if we had something on each host that told what "revision" was there then we could have puppet check that against some canonical source [22:38:46] we have a bug for part of that [22:38:59] is that checking for directory php-1.25wmfFOO ? [22:39:25] ^^ might be something to add to https://phabricator.wikimedia.org/T89945 twentyafterfour :) [22:39:33] maybe like this: [22:39:45] let mediawiki puppet role add a salt grain with the mediawiki version [22:39:50] then check the salt grain [22:39:50] it would need to check for patches too, not just the presence of the directory [22:42:13] *nod* we need a .version file or something that we push to all hosts on each sync-* or scap [22:42:31] and then a way to check the "current" version from tin [22:42:56] or just kill all this shit and use git for real on all the hosts [22:43:06] then `git pull` whenever you want [22:43:31] bd808: that's how I'd like to do it ;) [22:43:41] oooorrrrr, push out the hhvm binary [22:43:46] ^ [22:43:49] adding a grain, i think i can do the patch, but it would still mean a puppet change on each MW version bump [22:43:57] that somebody needs to do and merge [22:44:10] nooooo [22:44:24] puppet change per week deploy? no [22:44:54] greg-g: that would also have to be updated for each swat [22:45:03] yeah, hellz no [22:45:05] the last thing we need is more manual stuff [22:45:47] how about asking Special:Version of en.wp [22:45:50] twentyafterfour: have I told you about all the cool things in the Facebook deploy process? [22:45:58] and comparing that to what the server has [22:46:01] mutante: not good enough at all [22:46:07] mutante: but what about worst case when we're deploying during an outage? :) [22:46:10] can we make it good enough? 
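[Editor's note: bd808's ".version file pushed on each sync-* or scap" idea could look something like the sketch below. Everything here is hypothetical — no such file existed at the time; the file name, JSON format, and function names are invented for illustration.]

```python
import json
import time
from pathlib import Path

def write_version_stamp(stage_dir, serial):
    """Drop a .version stamp into the tree before each sync-* / scap.

    Every deploy action (full scap, swat, security patch) would bump the
    serial, giving hosts something cheap to compare against the canonical
    copy on tin. Format and location are purely illustrative.
    """
    stamp = {"serial": serial, "written": int(time.time())}
    Path(stage_dir, ".version").write_text(json.dumps(stamp))
    return stamp

def host_in_sync(host_stamp, canonical_stamp):
    # a host is current only if its serial matches the canonical one;
    # this also catches one-off patches, which a bare directory check misses
    return host_stamp["serial"] == canonical_stamp["serial"]
```

A serial rather than a git hash is the key point from the discussion: the git hash only changes on a full scap, while a serial can be bumped on every change, patches included.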
[22:46:10] bd808: no [22:46:14] the git hash only updates on a full scap [22:46:19] make Special:Version show more, i mean [22:46:23] I'm familiar with facebook, a little bit [22:46:45] Their hhvm deploy process is basically: [22:46:50] greg-g: ok, version of wikitech-static ... /me hides [22:47:01] make a squashfs filesystem [22:47:22] fill it with hhvm binary, hhbc cache, other assets [22:47:28] torrent it to the cluster [22:47:38] mutante: the problem is we do so much one-off patching that there would really just need to be a centrally published serial number that increments with each change [22:47:39] touch a file to depool a server [22:47:47] wait for it to drain [22:47:55] stop hhvm [22:48:01] unmount current version [22:48:05] mount new version [22:48:10] prime cache [22:48:11] twentyafterfour: i wasn't aware that doesn't exist :p [22:48:14] rm stop file [22:48:36] (PS1) Dduvall: Run CentralAuth browser tests at en.m.wikipedia.beta.wmflabs.org [integration/config] - https://gerrit.wikimedia.org/r/191798 [22:48:45] mutante: it should exist. I'm happy to build it [22:49:04] bd808: that seems complex but honestly less scary than what we do now [22:49:24] twentyafterfour: so maybe it could be a post-commit hook in the mw repo, that automatically increments it [22:49:38] on merge ++ [22:50:37] all deployment changes go through tin so any scap action could increment the value ...just need a place to publish it (I'd vote for directly publishing it via http on tin) [22:51:26] or put that new version number in special:version as well? 
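[Editor's note: the Facebook-style flow bd808 lists above, gathered into one ordered command list. The commands, paths, and tool names (torrent-fetch, wait-for-drain, prime-cache) are guesses at the shape of the process, not a real deploy script.]

```python
def hhvm_image_deploy_steps(version):
    """The squashfs-image deploy flow described above, as an ordered list.

    Returning commands as data (rather than running them) keeps the sketch
    inspectable; a real runner would execute each step and bail on failure.
    """
    img = f"/srv/images/hhvm-{version}.squashfs"
    return [
        f"mksquashfs build/ {img}",           # hhvm binary + hhbc cache + assets
        f"torrent-fetch {img}",               # distribute image to the cluster
        "touch /var/run/depool",              # depool this server
        "wait-for-drain",                     # let in-flight requests finish
        "service hhvm stop",
        "umount /srv/hhvm/current",           # unmount current version
        f"mount -o loop {img} /srv/hhvm/current",  # mount new version
        "prime-cache",
        "rm /var/run/depool",                 # repool
    ]
```

The appeal, as noted in the chat, is that each host swaps an immutable image atomically instead of rsyncing thousands of files into a live tree.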
[22:51:34] then you can even check via API [22:52:10] yeah that would probably work [22:53:31] -1 for more crap in special:version [22:54:00] version number = crap? :) [22:54:01] we run http on tin for trebuchet already [22:54:20] well it would be custom something or other just for the wmf cluster [22:54:46] something simple that you can curl with no extra processing would be nice [22:54:50] we could try to jam it in the version number I guess 1.25wmf.37 [22:54:58] we could try to jam it in the version number I guess 1.25wmf18.37 [22:55:00] shouldn't have to parse html or run a full api client [22:55:06] right [22:55:13] there is also http://config-master.wikimedia.org/ [22:55:21] which is used for pybal currently [22:55:24] just a .txt with a hash/timestamp/whatever in it [22:55:40] what's config-master [22:56:00] root only stuff for pybal I think [22:56:19] a webserver made to store config (for which appserver is in the cluster and which isn't) [22:57:21] it could be super smart if it knew that one appserver has a wrong serial [22:57:25] and deactivate it [22:59:08] even though that list of which appserver is in the cluster or not is hand maintained out of vcs... :P [23:00:24] yea, that's why i say it would be nice if it could automatically do that.. 
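[Editor's note: a sketch of the "just a .txt you can curl, no html parsing, no api client" idea from the exchange above. The one-line "serial plus git hash" format and both function names are invented for illustration; nothing like this was actually published on tin or config-master.]

```python
def parse_serial(text):
    """Parse a one-line serial file fetched over http.

    Hypothetical format: '<serial> <git-hash>', hash optional.
    """
    serial, _, rest = text.strip().partition(" ")
    return int(serial), rest or None

def hosts_out_of_sync(canonical_text, host_texts):
    """Return hosts whose serial differs from the canonical copy.

    These are the hosts a smarter config-master could automatically
    depool, per mutante's suggestion above.
    """
    canonical, _ = parse_serial(canonical_text)
    return [host for host, text in host_texts.items()
            if parse_serial(text)[0] != canonical]
```

A plain text file keeps the check curl-able from anywhere, which matters in the worst case greg-g raises: deploying during an outage, when the API may be down.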
cycle closed [23:00:37] * greg-g nods [23:01:18] so much here to grapple with, it's almost like we need a cross-team (RelEng+Ops) group to drive and maintain these kinds of changes [23:01:20] also just because you said http and this is already an http server that could be used [23:01:47] greg-g: yea, just merge engineering teams into one and get rid of the overhead [23:01:51] (which, btw, is what I'm proposing during the budgeting process conversation that is kicking off in earnest tomorrow for engineering) [23:02:00] mutante: ish [23:02:03] :) [23:02:20] I want a small nimble group to work on this that has buy-in from ops and releng [23:03:19] uhmm yea, but on the other hand if you put every ops into a special project you have no regular ops left [23:04:00] budget meeting sounds "fun" :p [23:05:36] mutante: there is the possibility of hiring :) [23:07:14] <^d> I never realized how much I relied on my bouncer until today. [23:07:20] <^d> spof? [23:07:39] hah, i guess that depends on the budget :) [23:07:57] ^d: do you want the OIT bouncer? *g* [23:08:09] there was that project wasn't there [23:08:21] <^d> Yes but hours/day got in the way [23:08:34] ^d: just have two bouncers running :P [23:08:43] ^d: thcipriani needs sudo privs (he's not in the nda group in ldap?) [23:08:47] <^d> Then I'd need a second VM [23:08:57] <^d> twentyafterfour: I know, working on it. [23:09:07] <^d> Something something, legal hoops to jump through [23:09:09] I have a shared host and a VM [23:09:12] how are sudo privs related to that ldap group? [23:09:14] ^d: ok cool, figured you might have missed it [23:09:20] heh, thanks twentyafterfour and ^d [23:09:28] thcipriani: you signed all the forms they put in front of you at HR, right? 
[23:09:47] yup [23:10:09] <^d> greg-g: I feel like HR would've made a point to tell you if he had refused to sign some :) [23:10:23] if you say yes, I'm willing to take the fall and say "just freaking add him already, he's an employee who signed all his paperwork, just because we don't have a good HR -> Legal -> ops/us workflow for NDAs doesn't matter" [23:10:32] ^d: you'd think :) [23:10:59] but that LDAP group is just for logins on icinga and graphite [23:11:02] <^d> "So Greg, this new hire of yours...he won't sign the non-discrimination policy" [23:11:13] * bd808 got in based on "works for Rob" [23:11:14] if it turns out you sneakily wrote "Not Tyler" in cursive everywhere, well, good on you [23:11:19] <^d> mutante: He's already in `wmf` so no big deal there [23:11:26] <^d> It's the nda groups in beta cluster he needs [23:11:31] ^d: then i don't get what it has to do with sudo [23:11:46] <^d> sudo on beta requires nda [23:11:51] <^d> because $reasons [23:11:58] hah, there is a group in beta cluster called nda that is unlike the other LDAP group called nda? [23:12:05] oh. that's a dumb thing to worry about [23:12:08] <^d> Which is also not the Phab nda, right. 
[23:12:09] * bd808 will add him [23:12:12] lol [23:12:24] but it's even the same LDAP server that is wikitech and labs :p [23:12:45] <^d> We have 3 nda groups, all of which are managed separately :) [23:12:54] hahaha [23:13:05] if it has the keyword NDA in it, run [23:13:46] !log added Thcipriani to under_NDA sudoers group; WMF staff [23:13:51] Logged the message, Master [23:13:52] (it will never change though if we just keep doing it manually when needed instead of having it on the onboarding workflow docs) [23:14:37] give HR access to LDAP already so they can set it :) just needs some nice web UI [23:15:07] <^d> Different LDAP [23:15:10] <^d> That's OIT ldap [23:15:14] <^d> They don't add to wikitech ldap [23:15:18] * ^d cries a little [23:15:29] neat, I have sudo access, but I'm not really at liberty to talk about it: I signed an NDA, probably. [23:16:11] <^d> Also, if you disclose stuff from beta you'd be disclosing some of the most boring data we have :p [23:16:17] ^d: the OIT ldap needs to die then [23:16:20] <^d> Which is part of why nda-for-beta makes me lol. [23:16:48] <^d> mutante: Suggest it to techsupport@ ;-) [23:17:05] how is it even possible.. wikitech uses ldap to determine project memberships [23:17:14] being admin in a project gives you sudo [23:17:17] beta is a labs project [23:17:28] <^d> Sudo policies for beta aren't default [23:17:38] so why isn't this just adding them in wikitech to the project as admins [23:21:24] Can't project admins add people to the nda group though? [23:26:20] <^d> Yes, so you can escalate if you get added to the former :p [23:32:33] "escalate"