[00:07:01] <shinken-wm>	 PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL
[00:07:01] <mutante>	 Notice: /Stage[main]/Phabricator/Phabricator::Conf_env[vcs]/File[/srv/phab/phabricator/conf/local/vcs.json]/group: group changed 'vcs' to 'phd'
[00:07:11] <mutante>	 Notice: Finished catalog run in 21.16 seconds
[00:07:23] <mutante>	 twentyafterfour: ^ all good, just fyi
[00:08:41] <paladox>	 phabricator works on labs
[00:08:51] <paladox>	 time to deprecate phabricator labs class
[00:09:06] <paladox>	 just remember to use scap on localhost on labs.
[00:11:41] <mutante>	 :))
[00:14:50] <wikibugs_>	 Deployment-Systems, Release-Engineering-Team, Operations: Trebuchet targets for test/testrepo are out of date - https://phabricator.wikimedia.org/T149180#2833693 (fgiunchedi) p:Triage>Low
[00:21:49] <wikibugs_>	 Beta-Cluster-Infrastructure, Patch-For-Review: Mobile view url broken on beta cluster (redirect, mobile view, etc.) - https://phabricator.wikimedia.org/T151894#2833715 (Jdforrester-WMF) Open>Resolved Looks like this is now fixed. Thanks!
[00:23:53] <mutante>	 We are now getting into fixing the startup issue of the phab-ssh daemon on systemd
[00:24:29] <mutante>	 that will help with labs and phab2001 too i think
[00:34:53] <mutante>	 it seems the systemd unit file for the ssh-phab.service does not get created on the labs jessie instance
[00:35:02] <mutante>	 but on phab2001 we do have it
[00:35:50] <mutante>	 which is also jessie. but it could be that it used to be installed by puppet in the past and then it stopped due to another change later
[00:36:35] <mutante>	 since puppet is deactivated on phab2001. i'd like to enable it and remove that file and run it and see if it comes back or not 
[00:39:04] <paladox>	 There seems to be a difference between ssh phabricator and phd (service)
[00:39:06] <paladox>	 https://github.com/wikimedia/operations-puppet/blob/a9b55135045b9b7edd7dbc75dbec7fbe8097ca87/modules/phabricator/manifests/phd.pp#L27
[00:39:14] <paladox>	 https://github.com/wikimedia/operations-puppet/blob/8d5ac3337641041ae92e2ed7fab4e5e5b30f3f15/modules/phabricator/manifests/vcs.pp#L104
[00:39:18] <paladox>	 mutante ^^
[00:39:31] <paladox>	 most likely phd just redirects to the phabricator bin
[00:39:37] <mutante>	 one moment paladox, be right back, testing something
[00:39:43] <paladox>	 ok
[00:42:37] <paladox>	 mutante i believe this https://github.com/wikimedia/operations-puppet/commit/6b6a5849e13b572fa64149925e313b6ad39a681f
[00:42:39] <paladox>	 broke it
[00:42:46] <mutante>	 can you confirm if git-ssh from phab is working?
[00:42:58] <paladox>	 Oh how do i test that?
[00:43:04] <mutante>	 good find, will get back to that in a minute
[00:43:06] <paladox>	 is that from phabricator.wikimedia.org
[00:43:40] <mutante>	 git-ssh.wikimedia.org
[00:44:01] <paladox>	 phabricator ssh works
[00:44:06] <mutante>	 good
[00:44:15] <mutante>	 i ran puppet on phab2001
[00:44:25] <paladox>	 ok
[00:44:26] <paladox>	 :)
[00:44:27] <mutante>	 and remember how last time it set the wrong IP
[00:44:33] <mutante>	 and broke this
[00:44:33] <paladox>	 Yep
[00:44:50] <mutante>	 this means that phab2001 got a bunch of updates that had accumulated
[00:44:57] <paladox>	 yep
[00:45:46] <grrrit-wm>	 (PS1) Krinkle: Replace visualeditor-jsduck-jessie with npm-run-doc-jessie template [integration/config] - https://gerrit.wikimedia.org/r/324368
[00:45:58] <mutante>	 so let me tell you the details from prod right away
[00:46:13] <mutante>	 since you have the same issue in labs
[00:46:14] <mutante>	 well, similar
[00:46:20] <grrrit-wm>	 (PS2) Krinkle: Replace visualeditor-jsduck-jessie with npm-run-doc-jessie template [integration/config] - https://gerrit.wikimedia.org/r/324368
[00:46:48] <mutante>	 phab2001 has 2 IPs, the first  is phab2001.codfw.wmnet  and the second phab2001-vcs.codfw.wmnet
[00:46:59] <grrrit-wm>	 (CR) Krinkle: "Fixed in Idd396f1acaec78f." [integration/config] - https://gerrit.wikimedia.org/r/323872 (owner: Jforrester)
[00:47:03] <mutante>	 the second is the one your second SSHD runs on
[00:47:10] <paladox>	 yep
[00:47:17] <mutante>	 on iridium this is:
[00:47:46] <paladox>	 yep
[00:47:57] <mutante>	 iridium.eqiad.wmnet and phab1001-vcs.eqiad.wmnet
[00:48:02] <paladox>	 yep
[00:48:08] <mutante>	 you should imagine that iridium is already phab1001 for this
[00:48:36] <mutante>	 now what you need on labs is a second private IP
[00:48:39] <mutante>	 no need for public
[00:48:41] <paladox>	 yep
[00:48:49] <mutante>	 but you just need a second IP on the interface
[00:48:53] <paladox>	 oh
[00:49:05] <mutante>	 and then the second ssh server should start
[00:49:18] <mutante>	 BUT .. it also does not right now because you don't have the systemd unit file
[00:49:30] <mutante>	 and this brings us back to what you pasted above
[00:49:40] <paladox>	 yep
[00:49:40] <mutante>	 why does the unit file not get created by puppet for you
[00:50:01] <paladox>	 Not sure I think it may be https://github.com/wikimedia/operations-puppet/commit/6b6a5849e13b572fa64149925e313b6ad39a681f
[00:51:49] <mutante>	 i am moving that file on phab2001 and running puppet 
[00:52:30] <paladox>	 ok
[00:52:36] <mutante>	 it does not come back
[00:52:41] <mutante>	 and i see the behaviour you see
[00:52:46] <mutante>	 Could not evaluate: Could not retrieve information from environment production source(s) puppet:///modules/phabricator/sshd-phab.service
[00:53:00] <mutante>	 wait.. that path
[00:53:05] <paladox>	 Oh
[00:53:50] <mutante>	 sshd-phab.service
[00:53:52] <mutante>	 vs
[00:53:56] <mutante>	 ssh-phab.service
[00:54:25] <mutante>	 there is the extra "d" in the error but not in repo
[00:54:27] <bd808>	 marxarelli: my vbguest plugin became a bigger deal today when we found out that Vagrant 1.9.0 is broken with our current plugin loading scheme -- https://gerrit.wikimedia.org/r/#/c/320277/
[01:01:04] <mutante>	 i believe the fix is just moving the file  https://gerrit.wikimedia.org/r/#/c/324369/
[01:02:20] <mutante>	 just because that matches $init_source = 'puppet:///modules/phabricator/sshd-phab.service'
[01:03:33] <mutante>	 twentyafterfour: ^ sounds right? i'll go ahead with that since it would not influence trusty iridium anyways
[01:03:51] <mutante>	 but it should fix 2001 and labs on jessie
[01:03:56] <twentyafterfour>	 ok
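Editor's note: the mismatch found above comes from how Puppet resolves file sources: a `puppet:///modules/<module>/<path>` URL is served from `modules/<module>/files/<path>` in the repo. The following is only a minimal sketch of that mapping, not code from operations/puppet. The helper name `puppet_source_to_repo_path` is hypothetical, and the pre-fix repo path is assumed from the "extra d" remark in the log.

```python
from pathlib import PurePosixPath


def puppet_source_to_repo_path(source: str) -> PurePosixPath:
    """Map puppet:///modules/<module>/<path> to modules/<module>/files/<path>."""
    prefix = "puppet:///modules/"
    if not source.startswith(prefix):
        raise ValueError(f"not a module file source: {source}")
    module, _, rest = source[len(prefix):].partition("/")
    if not module or not rest:
        raise ValueError(f"missing module or file name: {source}")
    return PurePosixPath("modules") / module / "files" / rest


# The source URL quoted in the log (what the manifest asks for).
expected = puppet_source_to_repo_path(
    "puppet:///modules/phabricator/sshd-phab.service"
)

# Assumption: the file name in the repo before the fix, without the extra "d".
in_repo = PurePosixPath("modules/phabricator/files/ssh-phab.service")

# True means puppet cannot retrieve the file until it is moved or renamed.
mismatch = expected != in_repo
```

Comparing the two paths this way shows why moving the file, rather than editing the manifest, was enough to make the unit file reappear.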
[01:14:13] <wikibugs>	 Release-Engineering-Team, Operations, Parsoid: Provide a /parsoid directory on releases.wikimedia.org - https://phabricator.wikimedia.org/T150672#2833841 (fgiunchedi) p:Triage>Normal
[01:14:52] <mutante>	 in iridium: nothing
[01:15:00] <paladox>	 no more puppet errors on the phabricator class in labs
[01:15:01] <mutante>	 on phab2001: service ssh-phab started
[01:15:03] <mutante>	 :)
[01:15:13] <paladox>	 started on the test instance too
[01:15:17] <mutante>	 nice
[01:16:17] <paladox>	 yep
[01:17:07] <mutante>	 well, and we have phab2001 "back" 
[01:17:18] <mutante>	 puppet running i mean
[01:17:41] <paladox>	 yep :)
[01:17:42] <mutante>	 now let's see what else we need there
[01:17:49] <paladox>	 ok
[01:18:16] <grrrit-wm>	 (PS2) Reedy: Stop reference of string $content as an array [tools/codesniffer] - https://gerrit.wikimedia.org/r/283384 (https://phabricator.wikimedia.org/T127572) (owner: Aashaka)
[01:18:24] <mutante>	 T137928
[01:18:25] <grrrit-wm>	 (CR) jenkins-bot: [V: -1] Stop reference of string $content as an array [tools/codesniffer] - https://gerrit.wikimedia.org/r/283384 (https://phabricator.wikimedia.org/T127572) (owner: Aashaka)
[01:18:42] <mutante>	 is the bug for that right
[01:18:57] <mutante>	 i expected a bot to turn that into full URL
[01:19:48] <paladox>	 mutante the bot is silent in this channel
[01:20:15] <paladox>	 it won't reply to you in this channel but it will publish it to the task if you log it.
[01:20:44] <mutante>	 ah, ok
[01:20:55] <mutante>	 well then, at this point we take a break and continue more tomorrow
[01:21:05] <paladox>	 yep
[01:35:28] <grrrit-wm>	 (CR) Jforrester: [C: 1] Replace visualeditor-jsduck-jessie with npm-run-doc-jessie template [integration/config] - https://gerrit.wikimedia.org/r/324368 (owner: Krinkle)
[01:44:27] <wikibugs_>	 MediaWiki-Codesniffer: Should we require documentation for constructors? - https://phabricator.wikimedia.org/T146388#2659448 (Samwilson) You can just document the constructor parameters, and it'll pass.  I think it's worth documenting parameters for every method, including constructors. But no need, as you s...
[01:59:20] <wikibugs_>	 Release-Engineering-Team, Operations, Parsing-Team, HHVM, and 2 others: API cluster failure / OOM - https://phabricator.wikimedia.org/T151702#2833920 (tstarling) >>! In T151702#2831448, @Joe wrote: > From a quick look, most threads seem effectively blocked in a very simple function: >  > ``` > je...
[02:02:38] <wikibugs>	 Continuous-Integration-Config, MediaWiki-extensions-Other, Mobile: CommentStreams: The module 'ext.CommentStreams' must not have target 'mobile' because its dependency 'jquery.ui.dialog' does not have it - https://phabricator.wikimedia.org/T151863#2833926 (Krinkle) @cicalese If you are passing `ext.C...
[02:17:24] <wmf-insecte>	 Yippee, build fixed!
[02:17:25] <wmf-insecte>	 Project selenium-QuickSurveys » chrome,beta,Linux,contintLabsSlave && UbuntuTrusty build #234: FIXED in 4 min 24 sec: https://integration.wikimedia.org/ci/job/selenium-QuickSurveys/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/234/
[02:48:57] <grrrit-wm>	 (PS1) Samwilson: Return earlier when testing scope fields [tools/codesniffer] - https://gerrit.wikimedia.org/r/324376 (https://phabricator.wikimedia.org/T146439)
[02:52:56] <grrrit-wm>	 (PS2) Samwilson: Return earlier when testing scope fields [tools/codesniffer] - https://gerrit.wikimedia.org/r/324376 (https://phabricator.wikimedia.org/T146439)
[02:57:58] <wikibugs_>	 MediaWiki-Codesniffer, Patch-For-Review: Undefined index: parenthesis_closer in SpaceBeforeControlStructureBraceSniff.php - https://phabricator.wikimedia.org/T146439#2833984 (Samwilson) a:Samwilson The above changes fix this and one other similar problem. phpcs runs fine on TextExtracts (well, there'...
[03:40:08] <grrrit-wm>	 (CR) Legoktm: "Thanks, could you add a test case with code that previously would have triggered the warning?" [tools/codesniffer] - https://gerrit.wikimedia.org/r/324376 (https://phabricator.wikimedia.org/T146439) (owner: Samwilson)
[04:18:20] <wmf-insecte>	 Project selenium-MultimediaViewer » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #219: FAILURE in 22 min: https://integration.wikimedia.org/ci/job/selenium-MultimediaViewer/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/219/
[05:13:32] <wikibugs_>	 Release-Engineering-Team, Operations, Parsing-Team, HHVM, and 2 others: API cluster failure / OOM - https://phabricator.wikimedia.org/T151702#2834167 (tstarling) Filed upstream bug https://github.com/facebook/hhvm/issues/7515 , but we're not blocked on it, we can use the MALLOC_CONF environment v...
[06:01:21] <wikibugs>	 Beta-Cluster-Infrastructure, Patch-For-Review: Mobile view url broken on beta cluster (redirect, mobile view, etc.) - https://phabricator.wikimedia.org/T151894#2831435 (phuedx) Thanks @Krenair!  ---  >>! In T151894#2833266, @Krenair wrote: > Oh, no, maybe not, I misunderstood the hackery going on here: >...
[06:40:48] <wmf-insecte>	 Yippee, build fixed!
[06:40:48] <wmf-insecte>	 Project selenium-Wikibase » chrome,test,Linux,contintLabsSlave && UbuntuTrusty build #193: FIXED in 2 hr 0 min: https://integration.wikimedia.org/ci/job/selenium-Wikibase/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=test,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/193/
[09:13:49] <hashar>	 zeljkof: !;-)
[09:14:39] <zeljkof>	 hashar: what's up? 🤔
[09:14:47] <hashar>	 sorry about yesterday
[09:14:57] <hashar>	 could not really assist on the npm/selenium job
[09:15:01] <zeljkof>	 what about yesterday?
[09:15:08] <hashar>	 ended up swamped trying to add tests for a hundred or so of extensions
[09:15:14] <hashar>	 and it took slightly longer than expected
[09:15:19] <zeljkof>	 oh, no problem, will continue with that today
[09:15:25] <hashar>	 seen your patch to do the symlinks under /usr/local/bin
[09:15:37] <hashar>	 let's try it!
[09:15:45] <zeljkof>	 yeah, I have no idea if that would work
[09:18:27] <hashar>	 it would :)
[09:19:04] <zeljkof>	 it took me a while to figure out how it's done, the first hit on google was something completely different
[09:19:07] * zeljkof was confused
[09:19:35] <hashar>	 yeah puppet is messy
[09:19:40] <hashar>	 wanna deploy it?
[09:26:53] <zeljkof>	 hashar: sorry, just saw your comment
[09:27:01] <zeljkof>	 sure! let's deploy!
[09:27:14] <zeljkof>	 meeting in the usual hangout?
[09:27:45] <hashar>	 in a coworking place so that is not convenient
[09:27:59] <zeljkof>	 I have no idea how to deploy :)
[09:28:04] * zeljkof is searching for docs
[09:28:12] <hashar>	 puppet standalone
[09:28:14] <hashar>	 basically
[09:28:16] <hashar>	 ssh to integration-puppetmaster01.integration.eqiad.wmflabs
[09:28:17] <hashar>	 sudo su -
[09:28:22] <hashar>	 cd /var/lib/git/operations/puppet
[09:28:38] <hashar>	 (that is the local checkout of ops/puppet.git that is read by the puppet master running on that instance)
[09:28:51] <hashar>	 then git fetch  && git cherry-pick
[09:29:02] <hashar>	 the fetch url / reference is listed in Gerrit. Top right under "Download" link
[09:36:05] <zeljkof>	 hashar: uh oh, never done that, let's see... :)
[09:37:56] <wikibugs_>	 Release-Engineering-Team, Operations, Parsoid: Provide a /parsoid directory on releases.wikimedia.org - https://phabricator.wikimedia.org/T150672#2792988 (Legoktm) A new directory can be created by defining it in puppet: https://github.com/wikimedia/operations-puppet/blob/production/modules/releases/...
[10:31:06] <hashar>	 zeljkof: internet went down
[10:31:27] <zeljkof>	 welcome back! 🎉
[10:32:23] <zeljkof>	 hashar: I've left a comment at https://gerrit.wikimedia.org/r/#/c/324203/2
[10:32:39] <zeljkof>	 I did cherry pick, but not sure if I have to run puppet manually, or if it would run automatically
[10:32:41] <zeljkof>	 reading https://wikitech.wikimedia.org/wiki/Puppet
[10:32:49] <hashar>	 puppet runs from a cron
[10:32:57] <hashar>	 every maybe 20 minutes or so
[10:33:37] <zeljkof>	 it was more than 20 minutes ago, so it should be applied then?
[10:33:58] <zeljkof>	 I'll rerun one of the jobs and see if it can see chromedriver
[10:34:53] <wikibugs_>	 Browser-Tests-Infrastructure, Patch-For-Review, User-zeljkofilipin: Ensure ChromeDriver is installed for jobs that run Selenium tests - https://phabricator.wikimedia.org/T117418#2834427 (hashar) ``` $ ssh integration-saltmaster.integration.eqiad.wmflabs  hashar@integration-saltmaster:~$ sudo su - roo...
[10:35:05] <hashar>	 zeljkof: and salt lets one mass verify https://phabricator.wikimedia.org/T117418#2834427
[10:35:37] <hashar>	 the next trick is that it is solely for the permanent slaves
[10:35:42] <zeljkof>	 hashar: so it worked?!
[10:35:50] <hashar>	 for the nodepool slaves, they are booting out of  a snapshot that got generated yesterday
[10:35:56] <hashar>	 so lack the link :D
[10:36:32] <zeljkof>	 ok, but it will be there tomorrow?
[10:37:29] <hashar>	 zeljkof: or we can refresh the images
[10:37:37] <hashar>	 nodepool does it automatically at 14:14 UTC
[10:37:49] <hashar>	 (in other news, nodepool gets more instances to spawn! https://grafana.wikimedia.org/dashboard/db/nodepool?panelId=1&fullscreen&from=now-24h&to=now  )
[10:37:54] <hashar>	 from 12 to 19!
[10:38:10] <zeljkof>	 19? why not 20?
[10:39:13] <hashar>	 it is complicated :D
[10:39:29] <hashar>	 goes with the quota
[10:39:37] <hashar>	 we had  up to 12 instances  against a quota of 15
[10:39:48] <hashar>	 leaving extra room for 3 instances
[10:40:06] <hashar>	 two days ago, we had 3 instances leaked. They were in the openstack project but not known to nodepool
[10:40:09] <hashar>	 so it worked fine
[10:40:26] <hashar>	 when changing the quota to  20 instances 
[10:40:43] <hashar>	 that means nodepool could have 20 instances, add to that the 3 leaked instances and that is  23 instances
[10:40:54] <hashar>	 or 23 instances  *  2 CPU/instance =  46 CPU
[10:40:59] <hashar>	 but the quota is 44 CPU
[10:41:05] <hashar>	 hence openstack refused to boot an extra
[10:41:24] <hashar>	 moving the quota down to 19 lets us allow for up to 3 leaked instances as before
[10:41:27] <hashar>	 (sorry all confusing)
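Editor's note: the quota arithmetic above can be checked with a few lines of Python. This is a sketch using only the numbers quoted in the log (2 CPU per instance, a 44 CPU quota, 3 leaked instances). The function names `cpus_needed` and `fits_cpu_quota` are made up for illustration and are not part of Nodepool or OpenStack.

```python
CPU_PER_INSTANCE = 2   # from the log: 2 CPU/instance
CPU_QUOTA = 44         # from the log: the project quota is 44 CPU
LEAKED_INSTANCES = 3   # from the log: instances nodepool lost track of


def cpus_needed(nodepool_max: int, leaked: int = LEAKED_INSTANCES) -> int:
    """CPUs consumed if nodepool fills its limit while leaks still exist."""
    return (nodepool_max + leaked) * CPU_PER_INSTANCE


def fits_cpu_quota(nodepool_max: int, leaked: int = LEAKED_INSTANCES) -> bool:
    """True when the worst case still fits under the CPU quota."""
    return cpus_needed(nodepool_max, leaked) <= CPU_QUOTA


for limit in (20, 19):
    print(limit, cpus_needed(limit), fits_cpu_quota(limit))
```

With a limit of 20 the worst case is 23 instances, or 46 CPU, which is over the quota. With 19 it is 22 instances, or exactly 44 CPU, which is why 19 was chosen instead of 20.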
[10:41:35] <zeljkof>	 what is a leaked instance?
[10:41:36] <hashar>	 the fix is to get Nodepool to detect leaked instances and delete them
[10:41:40] <hashar>	 ah leaked
[10:41:46] <hashar>	 an instance that nodepool asked to spawn
[10:41:50] <hashar>	 which gets spawned by openstack
[10:41:58] <hashar>	 but that nodepool erroneously forgets/stops tracking
[10:42:13] <hashar>	 so the instance is idling/doing nothing in openstack, and consumes its quota
[10:42:30] <hashar>	 but nodepool hasn't acknowledged it / doesn't know about it
[10:42:52] <hashar>	 so eventually you could have 19 instances spawned in openstack.  Nodepool would know about none and will try to spawn instances over and over
[10:43:10] <hashar>	 only to get refused by openstack because the labs project has 19/19 instances used
[10:43:13] <zeljkof>	 but why is 3 the magic number?
[10:43:21] <hashar>	 that is what we had 2 days ago
[10:43:27] <hashar>	 so merely set the same
[10:43:33] <hashar>	 it is arbitrary really
[10:43:34] <zeljkof>	 why wouldn't there be more, or less?
[10:43:37] <zeljkof>	 I see
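Editor's note: the definition of a leaked instance given above amounts to a set difference: whatever exists in the OpenStack project but is absent from Nodepool's own records. The sketch below only illustrates that idea. It is not how Nodepool implements leak detection, and `find_leaked` and the instance names are hypothetical.

```python
from typing import Iterable, List


def find_leaked(openstack_ids: Iterable[str], nodepool_ids: Iterable[str]) -> List[str]:
    """Return instances present in OpenStack that nodepool does not track."""
    return sorted(set(openstack_ids) - set(nodepool_ids))


# Hypothetical instance names, for illustration only.
in_openstack = ["ci-jessie-1", "ci-jessie-2", "ci-jessie-3", "ci-trusty-1"]
known_to_nodepool = ["ci-jessie-1", "ci-trusty-1"]

leaked = find_leaked(in_openstack, known_to_nodepool)
```

Each name in `leaked` still counts against the project quota, so a cleanup job would delete those instances to free capacity.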
[10:44:17] <hashar>	 meanwhile
[10:44:18] <hashar>	 https://gerrit.wikimedia.org/r/#/c/324203/ is ready
[10:44:21] <zeljkof>	 hashar: all *-jessie jobs are on nodepool, right?
[10:44:27] <hashar>	 gotta drop the WIP,  maybe add some more info to the commit message
[10:44:32] <hashar>	 and we can get ops to review/merge the patch
[10:44:44] <zeljkof>	 hashar: will do
[10:45:01] <hashar>	 the -jessie -trusty jobs are on nodepool yes
[10:45:11] <hashar>	 once the patch is merged, it is quite trivial to refresh nodepool snapshots
[10:45:15] <zeljkof>	 oh, and -trusty too
[10:45:24] <hashar>	 https://wikitech.wikimedia.org/wiki/Nodepool#Manually_generate_a_new_snapshot
[10:45:30] <zeljkof>	 but the patch needs to be merged first?
[10:45:31] <hashar>	 ssh labnodepool1001.eqiad.wmnet
[10:45:33] <hashar>	 become-nodepool
[10:45:39] <hashar>	 git -C /etc/nodepool/wikimedia/ pull
[10:45:42] <hashar>	 nodepool image-update wmflabs-eqiad ci-jessie-wikimedia
[10:45:48] <hashar>	 ^^^^4 lines :}
[10:45:54] <hashar>	 and the patch has to be merged yes
[10:46:09] <zeljkof>	 ok, working on the commit message
[10:46:11] <hashar>	 the provisioning script will git pull from operations/puppet.git  and does not support cherry pick / local hacks
[10:46:19] <hashar>	 that is a limitation :(
[10:48:32] <zeljkof>	 hashar: better? https://gerrit.wikimedia.org/r/#/c/324203/
[10:48:40] * zeljkof is back in a few minutes
[10:51:38] <hashar>	 coffee etc
[10:53:02] <mafk>	 le café
[10:53:24] <hashar>	 mafk: indeed :-}
[10:54:27] <mafk>	 bien sûr
[11:03:08] <hashar>	 back
[11:04:25] <paladox>	 hashar hi
[11:06:15] <hashar>	 lo
[11:07:33] <paladox>	 hashar i'm not sure if you saw what i wrote last night
[11:07:34] <paladox>	 http://snapshot.debian.org/package/python-shade/0.6.1-1/
[11:07:40] <paladox>	 ^^ we could use that
[11:07:55] <hashar>	 yeah maybe or maybe not
[11:08:11] <hashar>	 as I said yesterday, I am not sure I am going to spend time trying to upgrade nodepool
[11:08:17] <hashar>	 gotta look at the patches that fix leaked instances
[11:08:25] <hashar>	 if I have confidence I can just cherry pick them I will just do that
[11:09:14] <paladox>	 ok
[11:10:13] <paladox>	 hashar if we decide to scap nodepool, are we going with Docker?
[11:14:52] <hashar>	 most probably
[11:15:00] <hashar>	 Dan did a proof of concept 
[11:15:08] <hashar>	 basically the source repo has a Dockerfile
[11:15:20] <hashar>	 he has set up a basic Jessie instance that really just has docker installed
[11:15:26] <hashar>	 clone the repo,  run the docker command
[11:15:26] <hashar>	 and report
[11:15:49] <hashar>	 https://integration.wikimedia.org/ci/job/differential-docker-test/
[11:16:18] <hashar>	 and the Dockerfile example is https://phabricator.wikimedia.org/D455
[11:18:48] <hashar>	 !log Gerrit  mediawiki/extensions/CentralNotice/BannerProxy.git  Empty since 2014
[11:18:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[11:20:52] <hashar>	 !log Gerrit made mediawiki/extensions/GuidedTour/guiders read-only (per README.md, no more used)
[11:20:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[11:22:11] <hashar>	 !log Gerrit hide mediawiki/extensions/JsonData/JsonSchema Empty since 2013
[11:22:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[11:22:34] <paladox>	 yep
[11:22:51] <paladox>	 hashar would that allow us to use one instance and run many Docker containers in it?
[11:22:59] <hashar>	 yes
[11:23:07] <hashar>	 with that proof of concept
[11:23:12] <paladox>	 oh :) :)
[11:23:17] <hashar>	 we would have permanent instances in wmflabs
[11:23:26] <hashar>	 allowing X builds to run in parallel
[11:23:36] <hashar>	 and the builds being run in their own docker containers
[11:23:43] <paladox>	 oh, i guess that would be the benefit: allowing multiple runs on the same instance.
[11:23:49] <paladox>	 would it be as secure as nodepool?
[11:24:17] <paladox>	 so we can remove the whitelist as planned?
[11:29:13] <hashar>	 ideally yes
[11:31:51] <paladox>	 Oh :)
[11:32:32] <paladox>	 hashar i cleaned up the look of the ext dependencies in paramater_function in https://gerrit.wikimedia.org/r/#/c/323540/
[11:32:59] <paladox>	 it is now all clear, making it easier for future users to add deps without jenkins failing hopefully
[11:37:59] <wikibugs>	 Deployment-Systems, Scap3 (Scap3-Adoption-Phase1), scap, MediaWiki-JobRunner: Deploy jobrunner with scap3 (Trebuchet jobrunner/jobrunner) - https://phabricator.wikimedia.org/T129148#2834598 (hashar)
[11:42:15] <wikibugs_>	 Scap3 (Scap3-Adoption-Phase1), scap, Discovery, Discovery-Search, Elasticsearch: Deploy elasticsearch plugins with scap3 - https://phabricator.wikimedia.org/T151996#2834610 (hashar)
[11:43:14] <wikibugs>	 Scap3 (Scap3-Adoption-Phase1), scap, Discovery, Discovery-Search, Elasticsearch: Deploy elasticsearch plugins with scap3 - https://phabricator.wikimedia.org/T151996#2834625 (hashar) @dcausse @Gehel I am going to try to migrate the jobrunner service first ( T129148 ), then I guess jump into tr...
[11:43:34] <wikibugs_>	 Scap3 (Scap3-Adoption-Phase1), scap, Discovery, Discovery-Search, Elasticsearch: Deploy elasticsearch plugins with scap3 (Trebuchet elasticsearch/plugins) - https://phabricator.wikimedia.org/T151996#2834627 (hashar)
[11:44:29] <wikibugs>	 Scap3 (Scap3-Adoption-Phase1), scap, Discovery, Discovery-Search, Elasticsearch: Deploy elasticsearch plugins with scap3 (Trebuchet elasticsearch/plugins) - https://phabricator.wikimedia.org/T151996#2834628 (Gehel) @hashar I only have very limited experience with scap3 and even less with treb...
[11:53:09] <wikibugs_>	 Scap3 (Scap3-Adoption-Phase1), scap, Discovery, Discovery-Search, Elasticsearch: Deploy elasticsearch plugins with scap3 (Trebuchet elasticsearch/plugins) - https://phabricator.wikimedia.org/T151996#2834637 (hashar) I will level up myself on jobrunner then brain dump what I know and lead the...
[11:57:01] <grrrit-wm>	 (PS1) Zfilipin: WIP Run experimental Node.js Selenium job for mediawiki/core in experimental pipeline [integration/config] - https://gerrit.wikimedia.org/r/324416 (https://phabricator.wikimedia.org/T139740)
[11:57:46] <grrrit-wm>	 (Abandoned) Zfilipin: WIP mediawiki-core-qunit-jessie Jenkins job needs Vector skin [integration/config] - https://gerrit.wikimedia.org/r/324178 (https://phabricator.wikimedia.org/T139740) (owner: Zfilipin)
[11:58:11] <zeljkof>	 hashar: how does this look? https://gerrit.wikimedia.org/r/#/c/324416/
[11:59:02] <wikibugs>	 Scap3 (Scap3-Adoption-Phase1), scap, Discovery, Discovery-Search, Elasticsearch: Deploy elasticsearch plugins with scap3 (Trebuchet elasticsearch/plugins) - https://phabricator.wikimedia.org/T151996#2834644 (dcausse) Thanks @hashar! Let me know if I can help, I know the existing process but I...
[12:03:52] <wmf-insecte>	 Yippee, build fixed!
[12:03:53] <wmf-insecte>	 Project selenium-RelatedArticles » chrome,beta-mobile,Linux,contintLabsSlave && UbuntuTrusty build #228: FIXED in 2 min 51 sec: https://integration.wikimedia.org/ci/job/selenium-RelatedArticles/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta-mobile,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/228/
[12:09:04] <grrrit-wm>	 (PS1) Hashar: test: add qa to list Gerrit repos not in Zuul [integration/config] - https://gerrit.wikimedia.org/r/324420
[12:18:37] <grrrit-wm>	 (CR) Hashar: [C: 2] test: add qa to list Gerrit repos not in Zuul [integration/config] - https://gerrit.wikimedia.org/r/324420 (owner: Hashar)
[12:19:30] <grrrit-wm>	 (Merged) jenkins-bot: test: add qa to list Gerrit repos not in Zuul [integration/config] - https://gerrit.wikimedia.org/r/324420 (owner: Hashar)
[12:21:11] <grrrit-wm>	 (PS1) Hashar: Add debian-glue-non-voting to four repos [integration/config] - https://gerrit.wikimedia.org/r/324422
[12:22:35] <grrrit-wm>	 (CR) Hashar: [C: 2] Add debian-glue-non-voting to four repos [integration/config] - https://gerrit.wikimedia.org/r/324422 (owner: Hashar)
[12:23:46] <grrrit-wm>	 (Merged) jenkins-bot: Add debian-glue-non-voting to four repos [integration/config] - https://gerrit.wikimedia.org/r/324422 (owner: Hashar)
[13:08:57] <Reedy>	 sync-masters:   0% (ok: 0; fail: 0; left: 1)                                    
[13:09:01] <Reedy>	 Shouldn't that be doing both?
[13:09:08] <Reedy>	 ie a sync-common on tin too?
[13:15:59] <wikibugs>	 Release-Engineering-Team, Scap3: /srv/mediawiki on tin not being updated when using scap sync-file - https://phabricator.wikimedia.org/T152005#2834815 (Reedy)
[13:16:24] <grrrit-wm>	 (PS1) Hashar: Tweak integration-config-qa email notification [integration/config] - https://gerrit.wikimedia.org/r/324437
[13:21:36] <hashar>	 Reedy: sync-masters is just to sync /srv/mediawiki-staging between the deployment hosts
[13:22:04] <hashar>	 sync-common I guess the deployment servers are targets of deployment and they are populated just like other mw app servers
[13:22:21] <grrrit-wm>	 (CR) Hashar: [C: 2] "Looks nicer now :-}" [integration/config] - https://gerrit.wikimedia.org/r/324437 (owner: Hashar)
[13:23:44] <grrrit-wm>	 (Merged) jenkins-bot: Tweak integration-config-qa email notification [integration/config] - https://gerrit.wikimedia.org/r/324437 (owner: Hashar)
[13:47:02] <wmf-insecte>	 Yippee, build fixed!
[13:47:02] <wmf-insecte>	 Project selenium-VisualEditor » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #229: FIXED in 3 min 1 sec: https://integration.wikimedia.org/ci/job/selenium-VisualEditor/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/229/
[14:27:02] <grrrit-wm>	 (PS1) Hashar: [EditPageTracking] make tests voting [integration/config] - https://gerrit.wikimedia.org/r/324450
[14:27:04] <grrrit-wm>	 (PS1) Hashar: [OpenIDConnect] make tests voting [integration/config] - https://gerrit.wikimedia.org/r/324451
[14:27:06] <grrrit-wm>	 (PS1) Hashar: [Auth_remoteuser] make tests voting [integration/config] - https://gerrit.wikimedia.org/r/324452
[14:34:08] <grrrit-wm>	 (CR) Hashar: [C: 2] [EditPageTracking] make tests voting [integration/config] - https://gerrit.wikimedia.org/r/324450 (owner: Hashar)
[14:34:12] <grrrit-wm>	 (CR) Hashar: [C: 2] [Auth_remoteuser] make tests voting [integration/config] - https://gerrit.wikimedia.org/r/324452 (owner: Hashar)
[14:34:15] <grrrit-wm>	 (CR) Hashar: [C: 2] [OpenIDConnect] make tests voting [integration/config] - https://gerrit.wikimedia.org/r/324451 (owner: Hashar)
[14:36:08] <grrrit-wm>	 (Merged) jenkins-bot: [EditPageTracking] make tests voting [integration/config] - https://gerrit.wikimedia.org/r/324450 (owner: Hashar)
[14:36:55] <grrrit-wm>	 (Merged) jenkins-bot: [OpenIDConnect] make tests voting [integration/config] - https://gerrit.wikimedia.org/r/324451 (owner: Hashar)
[14:36:57] <grrrit-wm>	 (Merged) jenkins-bot: [Auth_remoteuser] make tests voting [integration/config] - https://gerrit.wikimedia.org/r/324452 (owner: Hashar)
[14:44:58] <grrrit-wm>	 (PS1) Hashar: mediawiki-extensions-* jobs on Nodepool [integration/config] - https://gerrit.wikimedia.org/r/324456 (https://phabricator.wikimedia.org/T135001)
[14:45:28] <grrrit-wm>	 (CR) Paladox: [C: 1] "Yay" [integration/config] - https://gerrit.wikimedia.org/r/324456 (https://phabricator.wikimedia.org/T135001) (owner: Hashar)
[14:45:38] <wikibugs_>	 Continuous-Integration-Config, Continuous-Integration-Scaling, releng-201516-q3, Patch-For-Review, WorkType-NewFunctionality: Migrate PHPUnit MediaWiki core jobs to Nodepool - https://phabricator.wikimedia.org/T135001#2835039 (hashar)
[14:50:04] <grrrit-wm>	 (PS1) Hashar: mediawiki HHVM job from Trusty to Jessie [integration/config] - https://gerrit.wikimedia.org/r/324457
[14:54:41] <grrrit-wm>	 (CR) Hashar: [C: 2] mediawiki HHVM job from Trusty to Jessie [integration/config] - https://gerrit.wikimedia.org/r/324457 (owner: Hashar)
[14:55:41] <grrrit-wm>	 (Merged) jenkins-bot: mediawiki HHVM job from Trusty to Jessie [integration/config] - https://gerrit.wikimedia.org/r/324457 (owner: Hashar)
[14:57:18] <grrrit-wm>	 (PS2) Hashar: mediawiki-extensions-* jobs on Nodepool [integration/config] - https://gerrit.wikimedia.org/r/324456 (https://phabricator.wikimedia.org/T135001)
[14:59:27] <grrrit-wm>	 (PS1) Hashar: [CryoKey] make tests voting [integration/config] - https://gerrit.wikimedia.org/r/324463
[15:03:42] <grrrit-wm>	 (CR) Hashar: [C: 2] [CryoKey] make tests voting [integration/config] - https://gerrit.wikimedia.org/r/324463 (owner: Hashar)
[15:05:32] <shinken-wm>	 PROBLEM - Host deployment-elastic08 is DOWN: CRITICAL - Host Unreachable (10.68.21.29)
[15:06:33] <grrrit-wm>	 (Merged) jenkins-bot: [CryoKey] make tests voting [integration/config] - https://gerrit.wikimedia.org/r/324463 (owner: Hashar)
[15:44:17] <grrrit-wm>	 (PS3) Hashar: mediawiki-extensions-* jobs on Nodepool [integration/config] - https://gerrit.wikimedia.org/r/324456 (https://phabricator.wikimedia.org/T135001)
[15:44:19] <grrrit-wm>	 (PS1) Hashar: Drop mediawiki-phpunit-hhvm-jessie from experimental [integration/config] - https://gerrit.wikimedia.org/r/324470
[15:44:26] <shinken-wm>	 PROBLEM - Long lived cherry-picks on puppetmaster on deployment-puppetmaster02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[15:47:59] <grrrit-wm>	 (CR) Hashar: [C: 2] Drop mediawiki-phpunit-hhvm-jessie from experimental [integration/config] - https://gerrit.wikimedia.org/r/324470 (owner: Hashar)
[15:48:53] <grrrit-wm>	 (Merged) jenkins-bot: Drop mediawiki-phpunit-hhvm-jessie from experimental [integration/config] - https://gerrit.wikimedia.org/r/324470 (owner: Hashar)
[15:50:00] <grrrit-wm>	 (CR) Hashar: [C: 2] mediawiki-extensions-* jobs on Nodepool [integration/config] - https://gerrit.wikimedia.org/r/324456 (https://phabricator.wikimedia.org/T135001) (owner: Hashar)
[15:51:42] <grrrit-wm>	 (Merged) jenkins-bot: mediawiki-extensions-* jobs on Nodepool [integration/config] - https://gerrit.wikimedia.org/r/324456 (https://phabricator.wikimedia.org/T135001) (owner: Hashar)
[16:07:55] <addshore>	 hashar: does beta not have a beta restbase?
[16:08:17] <hashar>	 (let me find out how to give you back an answer that does not sound like trolling)
[16:08:24] <addshore>	 :D
[16:08:34] <hashar>	 maybe they have docker images running on rackspace?
[16:08:48] <hashar>	 honestly, I have no idea
[16:09:06] <addshore>	 okay! :D
[16:09:17] <hashar>	 maybe it is not quite possible to set up kafka/cassandra/whatever trendy tech on a labs instance
[16:09:58] <hashar>	 addshore: there is deployment-restbase01.deployment-prep.eqiad.wmflabs !!
[16:10:20] <hashar>	 isn't VisualEditor relying on restbase to reach parsoid nowadays?
[16:10:48] <hashar>	 and there is
[16:10:49] <hashar>	 wmf-config/LabsServices.php:$wmfAllServices['eqiad']['restbase'] = 'http://10.68.17.189:7231'; // deployment-restbase02.deployment-prep.eqiad.wmflabs
[16:11:25] <hashar>	 that latter URL seems to respond, addshore!
[16:12:11] <addshore>	 oooh
[16:13:11] <addshore>	 https://deployment.wikimedia.beta.wmflabs.org/api/rest_v1/page/html/User%3AErikaHerzog?redirect=false
[16:13:17] <addshore>	 oooh, okay it is there
[16:13:26] <hashar>	 Hi! I am active on the Wikimedia and Wikipedia projects as User:BrillLyle on English Wikipedia and am part of Wikimedia NYC.
[16:13:36] <hashar>	 ;D
[16:13:37] <addshore>	 but the doc page is a 404 https://deployment.wikimedia.beta.wmflabs.org/api/rest_v1
[16:13:41] <hashar>	 wrong user
[16:13:53] <addshore>	 haha, that was just a sample page ;)
[16:13:59] <hashar>	 trailing slash issue I guess
[16:14:00] <hashar>	 https://deployment.wikimedia.beta.wmflabs.org/api/rest_v1/
[16:14:01] <hashar>	 works
[16:14:12] <hashar>	 TIL restbase is available on beta :}}}
[16:14:22] <addshore>	 ahh epic, so it does exist for beta, and it's in the same place! woo!
[16:14:23] <hashar>	 there are some entry points that would not be available
[16:14:32] <hashar>	 I have seen a task related to setting up page views api on beta
[16:14:44] <hashar>	 and eventually I think folks will use a mock/fake database
[16:14:56] <hashar>	 instead of trying to replicate the whole analytics cluster
[16:15:55] <wikibugs>	 Release-Engineering-Team, Operations, Parsing-Team, HHVM, and 3 others: API cluster failure / OOM - https://phabricator.wikimedia.org/T151702#2835239 (Joe) I have set arenas for jemalloc to be equal to the number of processors seen by the OS, the bandaid fix should be in the process of being remo...
[16:20:30] <grrrit-wm>	 (PS1) Hashar: Switch to mediawiki-extensions-* jobs [integration/config] - https://gerrit.wikimedia.org/r/324477 (https://phabricator.wikimedia.org/T135001)
[16:21:00] <wikibugs>	 Release-Engineering-Team, Operations, Parsing-Team, HHVM, and 3 others: API cluster failure / OOM - https://phabricator.wikimedia.org/T151702#2835255 (Joe) So, with the HHVM part "solved" we still should take the prevention measures I named here:  - Check the concurrency/retry/timeout rates of al...
[16:21:00] <grrrit-wm>	 (PS2) Hashar: Switch to Nodepool mediawiki-extensions-* jobs [integration/config] - https://gerrit.wikimedia.org/r/324477 (https://phabricator.wikimedia.org/T135001)
[16:22:49] <grrrit-wm>	 (CR) jenkins-bot: [V: -1] Switch to Nodepool mediawiki-extensions-* jobs [integration/config] - https://gerrit.wikimedia.org/r/324477 (https://phabricator.wikimedia.org/T135001) (owner: Hashar)
[16:31:28] <grrrit-wm>	 (PS3) Hashar: Switch to mediawiki-extensions-* jobs [integration/config] - https://gerrit.wikimedia.org/r/324477 (https://phabricator.wikimedia.org/T135001)
[16:41:09] <andrewbogott>	 releng folks, gerrit seems to be broken
[16:41:34] <andrewbogott>	 giuseppe is looking at it, as of a minute ago in _security
[16:41:46] <paladox>	 see -operations
[16:41:53] <paladox>	 andrewbogott ^^
[16:41:54] <andrewbogott>	 heh, ok :)
[16:41:59] <paladox>	 :)
[16:42:27] <andrewbogott>	 paladox: I just thought the releng people might want to know
[16:42:37] <paladox>	 Oh ok
[16:42:46] <paladox>	 :)
[16:56:38] <wikibugs>	 Continuous-Integration-Config, MediaWiki-extensions-Other, Mobile, Patch-For-Review: CommentStreams: The module 'ext.CommentStreams' must not have target 'mobile' because its dependency 'jquery.ui.dialog' does not have it - https://phabricator.wikimedia.org/T151863#2835374 (cicalese) Thank you @K...
[17:01:41] <grrrit-wm>	 (CR) Hashar: "Should work (checked via experimental pipeline) but one never knows. Will do tomorrow I guess." [integration/config] - https://gerrit.wikimedia.org/r/324477 (https://phabricator.wikimedia.org/T135001) (owner: Hashar)
[17:04:42] <wmf-insecte>	 Project beta-code-update-eqiad build #132416: FAILURE in 1 min 41 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/132416/
[17:06:07] <wikibugs>	 Continuous-Integration-Config, MediaWiki-extensions-Other, Mobile, Patch-For-Review: CommentStreams: The module 'ext.CommentStreams' must not have target 'mobile' because its dependency 'jquery.ui.dialog' does not have it - https://phabricator.wikimedia.org/T151863#2835411 (cicalese) OK, tests pa...
[17:09:17] <wmf-insecte>	 Yippee, build fixed!
[17:09:17] <wmf-insecte>	 Project beta-code-update-eqiad build #132417: FIXED in 1 min 47 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/132417/
[17:11:01] <gehel>	 !log rolling restart of deployment-elastic0* - upgrade to Java 8 - T151325
[17:11:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[17:12:53] <wikibugs>	 Continuous-Integration-Config, MediaWiki-extensions-Other, Mobile, Patch-For-Review: CommentStreams: The module 'ext.CommentStreams' must not have target 'mobile' because its dependency 'jquery.ui.dialog' does not have it - https://phabricator.wikimedia.org/T151863#2835448 (hashar) There is no ro...
[17:13:06] <hashar>	 I am off
[17:16:00] <wikibugs_>	 Continuous-Integration-Config, MediaWiki-extensions-Other, Mobile, Patch-For-Review: CommentStreams: The module 'ext.CommentStreams' must not have target 'mobile' because its dependency 'jquery.ui.dialog' does not have it - https://phabricator.wikimedia.org/T151863#2835453 (cicalese) Great! I see...
[17:16:40] <wikibugs_>	 Release-Engineering-Team, Operations, Parsing-Team, HHVM, and 3 others: API cluster failure / OOM - https://phabricator.wikimedia.org/T151702#2835456 (greg) >>! In T151702#2835255, @Joe wrote: > So, with the HHVM part "solved" we still should take the prevention measures I named here: >  > - Chec...
[17:22:24] <gehel>	 !log restart of logstash on deployment-logstash2 - upgrade to Java 8 - T151325
[17:22:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[17:23:50] <wikibugs_>	 Continuous-Integration-Config, MediaWiki-extensions-Other, Mobile, Patch-For-Review: CommentStreams: The module 'ext.CommentStreams' must not have target 'mobile' because its dependency 'jquery.ui.dialog' does not have it - https://phabricator.wikimedia.org/T151863#2835488 (cicalese) One more que...
[17:27:25] <wikibugs>	 Continuous-Integration-Config, MediaWiki-extensions-Other, Mobile, Patch-For-Review: CommentStreams: The module 'ext.CommentStreams' must not have target 'mobile' because its dependency 'jquery.ui.dialog' does not have it - https://phabricator.wikimedia.org/T151863#2835507 (hashar) Sure thing! Th...
[17:29:52] <wikibugs_>	 Release-Engineering-Team, Operations, Parsing-Team, HHVM, and 3 others: API cluster failure / OOM - https://phabricator.wikimedia.org/T151702#2835543 (Joe) @greg yeah I know, I'll do my homework, promised :)  I'm just waiting to see if the issue happens again in the next couple of days before clo...
[17:30:03] <wikibugs>	 Continuous-Integration-Config, MediaWiki-extensions-Other, Mobile, Patch-For-Review: CommentStreams: The module 'ext.CommentStreams' must not have target 'mobile' because its dependency 'jquery.ui.dialog' does not have it - https://phabricator.wikimedia.org/T151863#2835544 (cicalese) Awesome! Tha...
[18:18:20] <wikibugs_>	 Scap3: scap version flag - https://phabricator.wikimedia.org/T147155#2835730 (mmodell) Open>Resolved a:mmodell resolved by D448
[20:07:31] <shinken-wm>	 PROBLEM - Puppet run on deployment-cache-upload04 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[20:14:29] <wikibugs>	 Scap3, Parsoid: Scap rollback fails after promote completes - https://phabricator.wikimedia.org/T149012#2836204 (dduvall) Open>Resolved a:dduvall Implemented in {D439}
[20:21:26] <wikibugs_>	 Continuous-Integration-Config, Operations, Operations-Software-Development: Flake8 for python files without extension in puppet repo - https://phabricator.wikimedia.org/T144169#2590514 (fgiunchedi) After some discussion in https://gerrit.wikimedia.org/r/#/c/323559/ I've changed my vote to "automatica...
[20:47:29] <shinken-wm>	 RECOVERY - Puppet run on deployment-cache-upload04 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:07:32] <wikibugs>	 Gerrit, Operations, Patch-For-Review: Investigate why gerrit slowed down on 17/10/2016 / 18/10/2016 / 21/10/2016 - https://phabricator.wikimedia.org/T148478#2836410 (Dzahn) today we disabled gc on gerrit completely  https://gerrit.wikimedia.org/r/#/c/323655/  this was linked to T151676 a related ticket
[21:08:17] <wikibugs>	 Gerrit, Operations, Beta-Cluster-reproducible, Patch-For-Review: gerrit jgit gc caused mediawiki/core repo problems - https://phabricator.wikimedia.org/T151676#2824332 (Dzahn) now gc is disabled.  also see T148478
[21:09:14] <wikibugs>	 Gerrit, Operations, Patch-For-Review: Investigate why gerrit slowed down on 17/10/2016 / 18/10/2016 / 21/10/2016  30/11/2016 - https://phabricator.wikimedia.org/T148478#2724179 (Dzahn)
[21:22:17] <shinken-wm>	 PROBLEM - Puppet run on deployment-eventlogging03 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[21:22:48] <shinken-wm>	 PROBLEM - Puppet run on deployment-mx is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[21:23:14] <shinken-wm>	 PROBLEM - Puppet run on deployment-mediawiki04 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[21:24:24] <shinken-wm>	 PROBLEM - Puppet run on integration-slave-trusty-1004 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[21:24:24] <shinken-wm>	 PROBLEM - Puppet run on deployment-puppetmaster02 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[21:24:24] <shinken-wm>	 PROBLEM - Puppet run on integration-slave-docker-1000 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[21:25:05] <shinken-wm>	 PROBLEM - Puppet run on deployment-ores-redis is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[21:25:30] <shinken-wm>	 PROBLEM - Puppet run on deployment-stream is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[21:26:07] <shinken-wm>	 PROBLEM - Puppet run on deployment-aqs01 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[21:27:05] <shinken-wm>	 PROBLEM - Puppet run on deployment-kafka05 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[21:27:27] <shinken-wm>	 PROBLEM - Puppet run on deployment-tin is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[21:27:38] <shinken-wm>	 PROBLEM - Puppet run on deployment-pdf01 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[21:27:45] <mutante>	 this will also be https://gerrit.wikimedia.org/r/#/c/256890/11
[21:27:48] <mutante>	 and was reverted
[21:28:10] <shinken-wm>	 PROBLEM - Puppet run on deployment-restbase02 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[21:28:18] <shinken-wm>	 PROBLEM - Puppet run on deployment-sca01 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[21:28:22] <shinken-wm>	 PROBLEM - Puppet run on integration-slave-jessie-android is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[21:30:10] <shinken-wm>	 PROBLEM - Puppet run on deployment-imagescaler01 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[21:30:30] <shinken-wm>	 PROBLEM - Puppet run on deployment-sca02 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[21:30:34] <shinken-wm>	 PROBLEM - Puppet run on deployment-poolcounter04 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[21:31:10] <shinken-wm>	 PROBLEM - Puppet run on integration-publisher is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[21:32:05] <shinken-wm>	 PROBLEM - Puppet run on deployment-kafka01 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[21:32:11] <shinken-wm>	 PROBLEM - Puppet run on deployment-ircd is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[21:32:17] <shinken-wm>	 PROBLEM - Puppet run on deployment-sentry01 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[21:32:19] <shinken-wm>	 PROBLEM - Puppet run on integration-slave-trusty-1006 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[21:32:25] <shinken-wm>	 PROBLEM - Puppet run on deployment-secureredirexperiment is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[21:33:13] <shinken-wm>	 PROBLEM - Puppet run on deployment-memc04 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[21:33:35] <shinken-wm>	 PROBLEM - Puppet run on deployment-db03 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[21:34:07] <shinken-wm>	 PROBLEM - Puppet run on deployment-ms-be01 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[21:34:14] <shinken-wm>	 PROBLEM - Puppet run on deployment-kafka03 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[21:40:36] <wikibugs>	 Deployment-Systems, Architecture, Availability: WikiDev 16 working area: Software engineering - https://phabricator.wikimedia.org/T119032#2836513 (daniel)
[22:00:05] <shinken-wm>	 RECOVERY - Puppet run on deployment-ores-redis is OK: OK: Less than 1.00% above the threshold [0.0]
[22:01:05] <shinken-wm>	 RECOVERY - Puppet run on deployment-aqs01 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:02:03] <shinken-wm>	 RECOVERY - Puppet run on deployment-kafka05 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:02:19] <shinken-wm>	 RECOVERY - Puppet run on deployment-eventlogging03 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:02:47] <shinken-wm>	 RECOVERY - Puppet run on deployment-mx is OK: OK: Less than 1.00% above the threshold [0.0]
[22:03:14] <shinken-wm>	 RECOVERY - Puppet run on deployment-mediawiki04 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:03:24] <shinken-wm>	 RECOVERY - Puppet run on integration-slave-jessie-android is OK: OK: Less than 1.00% above the threshold [0.0]
[22:04:24] <shinken-wm>	 RECOVERY - Puppet run on integration-slave-docker-1000 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:04:24] <shinken-wm>	 RECOVERY - Puppet run on integration-slave-trusty-1004 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:04:26] <shinken-wm>	 RECOVERY - Puppet run on deployment-puppetmaster02 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:05:16] <greg-g>	 we good?
[22:05:31] <shinken-wm>	 RECOVERY - Puppet run on deployment-stream is OK: OK: Less than 1.00% above the threshold [0.0]
[22:05:31] <greg-g>	 ah, I see daniel's comment
[22:05:33] <shinken-wm>	 RECOVERY - Puppet run on deployment-sca02 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:05:33] <shinken-wm>	 RECOVERY - Puppet run on deployment-poolcounter04 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:05:50] <greg-g>	 this == that == the puppet failures
[22:06:20] <mutante>	 it was a change in base, that's why it affected prod and labs
[22:06:36] * greg-g nods
[22:06:55] <mutante>	 i will also restart the prod icinga bot now, or you would have seen spam in -operations too
[22:07:10] <shinken-wm>	 RECOVERY - Puppet run on deployment-ircd is OK: OK: Less than 1.00% above the threshold [0.0]
[22:07:18] <shinken-wm>	 RECOVERY - Puppet run on deployment-sentry01 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:07:20] <shinken-wm>	 RECOVERY - Puppet run on integration-slave-trusty-1006 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:07:26] <shinken-wm>	 RECOVERY - Puppet run on deployment-secureredirexperiment is OK: OK: Less than 1.00% above the threshold [0.0]
[22:07:28] <shinken-wm>	 RECOVERY - Puppet run on deployment-tin is OK: OK: Less than 1.00% above the threshold [0.0]
[22:07:36] <shinken-wm>	 RECOVERY - Puppet run on deployment-pdf01 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:08:08] <shinken-wm>	 RECOVERY - Puppet run on deployment-restbase02 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:08:12] <shinken-wm>	 RECOVERY - Puppet run on deployment-memc04 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:08:15] <shinken-wm>	 RECOVERY - Puppet run on deployment-sca01 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:09:04] <shinken-wm>	 RECOVERY - Puppet run on deployment-ms-be01 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:09:14] <shinken-wm>	 RECOVERY - Puppet run on deployment-kafka03 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:10:12] <shinken-wm>	 RECOVERY - Puppet run on deployment-imagescaler01 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:11:09] <shinken-wm>	 RECOVERY - Puppet run on integration-publisher is OK: OK: Less than 1.00% above the threshold [0.0]
[22:12:05] <shinken-wm>	 RECOVERY - Puppet run on deployment-kafka01 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:13:35] <shinken-wm>	 RECOVERY - Puppet run on deployment-db03 is OK: OK: Less than 1.00% above the threshold [0.0]
[23:03:18] <wikibugs>	 Gerrit, Operations, Patch-For-Review: Investigate why gerrit slowed down on 17/10/2016 / 18/10/2016 / 21/10/2016  30/11/2016 - https://phabricator.wikimedia.org/T148478#2836728 (Paladox) The cpu seems to be still very high https://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&m=cpu_report&c=Miscellane...
[23:10:08] <grrrit-wm>	 (PS11) Paladox: Support extension and skin dependacies in the skin pipeline and extension pipeline [integration/config] - https://gerrit.wikimedia.org/r/323540 (https://phabricator.wikimedia.org/T151593)
[23:11:40] <grrrit-wm>	 (PS12) Paladox: Support extension and skin dependacies in the skin pipeline and extension pipeline [integration/config] - https://gerrit.wikimedia.org/r/323540 (https://phabricator.wikimedia.org/T151593)