[02:11:59] PROBLEM - Puppet staleness on integration-slave-precise-1012 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [43200.0] [02:13:30] PROBLEM - Puppet staleness on integration-slave-precise-1002 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [43200.0] [02:14:06] PROBLEM - Puppet staleness on integration-slave-precise-1011 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [43200.0] [03:03:08] (03PS1) 10Reedy: Simplify config, remove additions/removal upto REL1_23 [tools/release] - 10https://gerrit.wikimedia.org/r/343589 [03:05:24] (03PS2) 10Reedy: Simplify config, remove additions/removal upto REL1_23 [tools/release] - 10https://gerrit.wikimedia.org/r/343589 [03:05:26] (03PS1) 10Reedy: After REL1_23 is obsolete, simplify config further [tools/release] - 10https://gerrit.wikimedia.org/r/343591 [03:05:47] (03CR) 10Reedy: [C: 04-2] "Probably shouldn't be merged till ~May when REL1_23 goes EOL" [tools/release] - 10https://gerrit.wikimedia.org/r/343591 (owner: 10Reedy) [03:45:01] 10Gerrit, 06Operations: Decide how to support polygerrit - https://phabricator.wikimedia.org/T158479#3113507 (10Dzahn) Very nice that we don't need the rewrites anymore :) [04:04:33] PROBLEM - Puppet staleness on deployment-aqs02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [43200.0] [04:07:45] Project selenium-MultimediaViewer » safari,beta,OS X 10.9,BrowserTests build #335: 04FAILURE in 11 min: https://integration.wikimedia.org/ci/job/selenium-MultimediaViewer/BROWSER=safari,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=OS%20X%2010.9,label=BrowserTests/335/ [04:17:56] Project selenium-MultimediaViewer » firefox,beta,Linux,BrowserTests build #335: 04FAILURE in 21 min: https://integration.wikimedia.org/ci/job/selenium-MultimediaViewer/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/335/ [05:48:19] Project mediawiki-core-code-coverage build #2644: 04STILL FAILING in 2 hr 48 min: https://integration.wikimedia.org/ci/job/mediawiki-core-code-coverage/2644/ [08:51:03] !log Jenkins: depooling / deleting Precise instances. [08:51:06] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [08:52:08] PROBLEM - Host integration-slave-precise-1012 is DOWN: CRITICAL - Host Unreachable (10.68.17.174) [08:52:49] 10Continuous-Integration-Infrastructure, 06Release-Engineering-Team, 13Patch-For-Review: Depool precise jenkins instances - https://phabricator.wikimedia.org/T158652#3113710 (10hashar) Removed from Jenkins, puppet master and salt master. I have deleted the three instances via Horizon. 
[08:53:14] PROBLEM - Host integration-slave-precise-1011 is DOWN: CRITICAL - Host Unreachable (10.68.17.70) [08:53:36] PROBLEM - Host integration-slave-precise-1002 is DOWN: CRITICAL - Host Unreachable (10.68.17.87) [10:09:44] (03PS2) 10Hashar: [operations/switchdc] add tox-jessie [integration/config] - 10https://gerrit.wikimedia.org/r/343005 (owner: 10Volans) [10:09:52] (03CR) 10Hashar: [C: 032] [operations/switchdc] add tox-jessie [integration/config] - 10https://gerrit.wikimedia.org/r/343005 (owner: 10Volans) [10:11:41] (03Merged) 10jenkins-bot: [operations/switchdc] add tox-jessie [integration/config] - 10https://gerrit.wikimedia.org/r/343005 (owner: 10Volans) [10:12:55] (03PS3) 10Hashar: test: invoke rspec directly [selenium] - 10https://gerrit.wikimedia.org/r/330856 (https://phabricator.wikimedia.org/T137112) [10:12:57] (03PS2) 10Hashar: (WIP) have cucumber to auto install phantomjs (WIP) [selenium] - 10https://gerrit.wikimedia.org/r/330864 [13:25:29] twentyafterfour: hi, are you ready to switch phab search to codfw? [13:26:11] dcausse: yeah, I just need to make a config change patch and get +2 in ops/puppet [13:26:52] twentyafterfour: ok, we are going to first depool eqiad so that mw stops writing to it, then we will shutdown the cluster [13:27:05] so please switch when you want [13:29:53] dcausse: ok I'll put the change through right away [13:30:19] thanks :) [13:32:50] dcausse: https://gerrit.wikimedia.org/r/#/c/343635/ [13:36:24] twentyafterfour: will ask gehel to +2 when we're ready but not sure what to do about this tox failure :/ [13:36:53] yeah I don't know what's up with that either, I can't actually see anything but warnings [13:36:59] * gehel is having alook... [13:37:57] yeah, the 140 char per line is a new rule. The checks are run only on files that are changed, so we need to fix it. [13:39:12] fix coming up... [13:39:45] that shouldn't be listed as a warning if it's gonna fail and it should only be enforced on lines of code that changed [13:40:05] agreed and agreed, but that's not the case... [13:46:40] Yippee, build fixed! 
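The 140-characters-per-line rule being discussed is a flake8 maximum line length check. As a rough illustration only (the exact file and section in operations/puppet may differ from this sketch), such a rule is usually declared in the repository's flake8/tox configuration:

    [flake8]
    max-line-length = 140

Because the tox job only lints the files a patch touches, a pre-existing over-long line starts failing as soon as any change edits that file, which is why a separate fix was needed before the puppet config change above could pass.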
[13:46:40] Project selenium-VisualEditor » firefox,beta,Linux,BrowserTests build #342: 09FIXED in 2 min 39 sec: https://integration.wikimedia.org/ci/job/selenium-VisualEditor/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/342/ [13:49:09] PROBLEM - Puppet run on buildlog is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [13:49:31] RECOVERY - Puppet staleness on deployment-aqs02 is OK: OK: Less than 1.00% above the threshold [3600.0] [13:52:17] PROBLEM - English Wikipedia Mobile Main page on beta-cluster is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 3816 bytes in 0.198 second response time [13:52:17] PROBLEM - App Server Main HTTP Response on deployment-mediawiki04 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 3250 bytes in 0.092 second response time [13:52:20] PROBLEM - English Wikipedia Main page on beta-cluster is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 3816 bytes in 0.185 second response time [13:57:16] RECOVERY - English Wikipedia Mobile Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 33422 bytes in 0.957 second response time [13:57:16] RECOVERY - App Server Main HTTP Response on deployment-mediawiki04 is OK: HTTP OK: HTTP/1.1 200 OK - 45865 bytes in 0.769 second response time [13:57:18] RECOVERY - English Wikipedia Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 46474 bytes in 0.678 second response time [15:33:26] 10Scap: If aborting a scap due to test canary error rate, output some errors for reference - https://phabricator.wikimedia.org/T159991#3114626 (10thcipriani) p:05Triage>03Normal It does seem like showing some errors would be helpful; although possibly a little tricky to get at that information. The reason s... [15:43:29] 10Deployment-Systems, 10Scap, 15User-bd808: sync-wikiversions not syncing wikiversions.json with mira - https://phabricator.wikimedia.org/T121585#3114671 (10thcipriani) 05Open>03Resolved a:03bd808 Haven't noticed the problem since {rMSCA9fbd6f8f486a7405e3df7125bd50ad03164f5512} merged. [15:50:20] RainbowSprinkles hi, I've managed to fix polygerrit for prefixed urls fully (there may be some places I haven't fixed it but it should work in almost all cases in my testing.) I have found significant performance improvements using polygerrit compared to gwt. That includes cherry pick which worked immediately for me. This doesn't measure up to prod though.
:) (this dosen't require rewrites either only a new config in gerrit.config) [15:53:20] Project selenium-MobileFrontend » firefox,beta,Linux,BrowserTests build #365: 04FAILURE in 31 min: https://integration.wikimedia.org/ci/job/selenium-MobileFrontend/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/365/ [15:54:27] PROBLEM - Long lived cherry-picks on puppetmaster on deployment-puppetmaster02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [16:22:38] (03PS4) 10Paladox: [operations-puppet-catalog-compiler] Adding it to jenkins job builder [integration/config] - 10https://gerrit.wikimedia.org/r/315994 (https://phabricator.wikimedia.org/T97513) [16:25:47] (03PS5) 10Paladox: Create a test that deploy's jjb changes without needing to ssh in and deploy your self [integration/config] - 10https://gerrit.wikimedia.org/r/323198 [16:25:51] (03PS6) 10Paladox: Create a test that deploy's jjb changes without needing to ssh in and deploy your self [integration/config] - 10https://gerrit.wikimedia.org/r/323198 [16:26:14] (03PS7) 10Paladox: Create a test that deploy's jjb changes without needing to ssh in and deploy your self [integration/config] - 10https://gerrit.wikimedia.org/r/323198 [16:32:23] 10Continuous-Integration-Infrastructure (Little Steps Sprint): For operations/puppet : merge tox / rake jobs in a single job? - https://phabricator.wikimedia.org/T160923#3114888 (10hashar) [16:35:17] 10Continuous-Integration-Infrastructure (Little Steps Sprint): Get rid of zend tests for wmf branches - https://phabricator.wikimedia.org/T94149#3114924 (10hashar) Talked about it again during the release engineering meeting. We believe that although maintenance script are still using Zend php5.5, running tests... [16:37:51] (03PS4) 10Paladox: Revert "Rely on `php` in assert-phpflavor macro" [integration/config] - 10https://gerrit.wikimedia.org/r/337225 (https://phabricator.wikimedia.org/T157750) [16:40:23] (03PS3) 10Paladox: Whitelist tosfos [integration/config] - 10https://gerrit.wikimedia.org/r/343403 [16:44:00] (03Abandoned) 10Paladox: Reuse phplint code in job-template.yaml [integration/config] - 10https://gerrit.wikimedia.org/r/313230 (owner: 10Paladox) [16:52:00] anybody here that knows about browser tests for mediawiki? [16:52:40] SMalyshev: zeljkof / marxarelli. Though we are in a team meeting right now [16:52:56] and we have a patch to migrate the testruner to javascript :] [16:53:49] https://gerrit.wikimedia.org/r/#/c/328191/ which will land this week [16:56:52] SMalyshev: what's the question? [16:57:40] zelikof: I got an impression from https://www.mediawiki.org/wiki/Browser_testing/Writing_tests#API_login that browser tests don't support logging in via browser session [16:57:46] is that true [16:58:00] zeljkof: ^? [16:58:18] SMalyshev: oh man, where to start [16:58:34] not sure how you stumbled upon that page, or why it isn't marked as obsolete [16:58:56] I just looked for any docs on our browser tests. Are there better ones? [16:58:57] in short, you can log in via the API or via the web interface [16:59:19] SMalyshev: https://www.mediawiki.org/wiki/Selenium [16:59:34] so, at the moment we have two stacks, ruby and nodejs [16:59:54] we are phasing out ruby stack, and introducing nodejs stack [16:59:58] what do you want to do? [16:59:59] zeljkof: ah, so you can log in via web int. Cool! [17:00:06] I'm using cucumber tests so that'd be ruby I imagine [17:00:28] what are you trying to do? what are you testing? 
[17:00:31] since all other tests we have for Cirrus are in cucumber I guess I need that one too [17:00:50] ok, yes, cirrus has big investment in ruby stack [17:00:59] zeljkof: a feature in Special:Undelete. Unfortunately it is a) only available in GUI and b) requires admin to see [17:01:27] you can log in via the api and then insert a cookie in the browser and be logged in [17:01:33] I think it is the fastest way to do it [17:01:34] you can see it here: https://gerrit.wikimedia.org/r/#/c/281077/19/tests/browser/features/update_general_api.feature [17:01:59] zeljkof: are there any examples in the code of how to do it? [17:02:21] SMalyshev: sure, looking for them, it should be in the mediawiki_selenium gem [17:02:57] https://phabricator.wikimedia.org/diffusion/MSEL/browse/master/lib/mediawiki_selenium/step_definitions/login_steps.rb [17:03:08] I think you can use it like this, it has been a while... [17:03:37] in the feature file: Given I am logged in as Admin [17:04:07] aha sure I will try that. Thanks! [17:04:26] and it should just work, I mean log you in as Admin, if you have Admin and credentials in env variables or config file [17:05:03] it's 6pm here so I will probably be away, but feel free to ask, or send me e-mail [17:05:25] zeljkof: sure, thanks for your help! [17:25:45] RainbowSprinkles, hi im wondering do you know how i can get java to look into json like data: {["fields": {"name": "test"}]}? [17:26:00] Im trying to do it here https://gerrit-review.googlesource.com/#/c/98611/12/src/main/java/com/googlesource/gerrit/plugins/its/phabricator/conduit/Conduit.java [17:26:09] on replacing phabricator deprecated code in the its-phabricator plugin but carn't find a way to do it [17:26:18] Whatever json library they have bundled in, I suppose [17:26:21] I dunno what it is [17:26:25] gson [17:26:28] Yeah that then [17:27:02] Yep, been looking at a lot of docs which kind of do what i want to do but in the end it dosen't do it. [17:53:25] RainbowSprinkles fixed it by removing the if checks. [17:56:37] RainbowSprinkles i've tested https://gerrit-review.googlesource.com/#/c/98576/ and https://gerrit-review.googlesource.com/#/c/98611/ and https://gerrit-review.googlesource.com/#/c/98613/ [17:56:44] twentyafterfour ^^ [18:06:23] (03PS1) 10Reedy: Start branching CollaborationKit [tools/release] - 10https://gerrit.wikimedia.org/r/343686 (https://phabricator.wikimedia.org/T138326) [18:19:05] anybody knows why git review -d may fail with 404? [18:19:06] Cannot query patchset information [18:19:06] The following command failed with exit code 104 [18:19:07] "GET https://gerrit.wikimedia.org/changes/?q=Id6099fe9fbf18481068a6f0a329bbde0d218135f&o=CURRENT_REVISION" [18:19:46] looks like maybe some old config but no idea where it comes from... [18:22:49] git-review does dumb things sometimes :( [18:23:07] known workaround: ssh as your remote instead of https [18:23:20] best workaround: stop using git-review [18:23:33] not really an option here I think [18:24:28] Not using git-review isn't an option? [18:24:56] well, how do I download gerrit change without git-review? 
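To make zeljkof's earlier suggestion about Special:Undelete concrete: with the Ruby mediawiki_selenium stack, the admin login can go straight into the feature file using the step he quotes from login_steps.rb. A minimal sketch, assuming the Admin credentials are supplied through the gem's environment configuration (environment variables or the config file, as noted above); the non-login steps are hypothetical and would still need their own step definitions and page object:

    Feature: Special:Undelete
      Scenario: An admin can search deleted revisions
        Given I am logged in as Admin
        When I navigate to the Special:Undelete page
        Then I should see the deleted revision search form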
[18:25:30] The top right corner of a change has a box that says "Download" [18:25:39] Gives you copy+pasteable commands for checkout / cherry pick / pull [18:25:43] That's what I use all day :) [18:25:46] Project mediawiki-core-code-coverage build #2645: 04STILL FAILING in 3 hr 25 min: https://integration.wikimedia.org/ci/job/mediawiki-core-code-coverage/2645/ [18:25:59] unfortunately python scripts are not great at finding boxes and copy-pasting things [18:26:35] eg: git fetch https://gerrit.wikimedia.org/r/operations/puppet refs/changes/88/342788/2 && git checkout FETCH_HEAD [18:26:48] The only part you don't know from the CLI is the /2 for the current PS # [18:27:02] riight. And also I need to get there from change id [18:27:09] that's why there is git-review? [18:27:15] 88 comes from the last 2 digits from the changeid (342788 -> 88, 1234 -> 32) [18:27:26] * RainbowSprinkles shrugs [18:27:44] 342788 is not change id. Change id is something like Id6099fe9fbf18481068a6f0a329bbde0d218135f [18:28:11] anyway, I'm not looking for a way to redesign the whole system, just to make git-review work as it worked before [18:28:41] I dunno, point is this is a known issue and the only known workaround is to use ssh :\ [18:29:01] ok tahnks, will try to use ssh then [18:29:08] Basically, git-review assumes you're running from the docroot and strips the /r/ sometimes [18:29:14] (but not always!) [18:29:28] is there phab task about it? [18:29:33] I think.... [18:30:02] I thought so, hmm.... [18:31:19] T159869 / T100987 / T154760 [18:31:20] T154760: 404 downloading any changes with https remote url: "The requested URL /changes/ was not found" - https://phabricator.wikimedia.org/T154760 [18:31:20] T159869: git review -d fails with 404 on https://gerrit.wikimedia.org/changes/ endpoint - https://phabricator.wikimedia.org/T159869 [18:31:20] T100987: "git review -d XXX" doesn't work for http gerrit - https://phabricator.wikimedia.org/T100987 [18:32:17] T100987 is the best one with workarounds & such [18:34:11] RainbowSprinkles: thank you! [18:34:16] yw [18:37:39] (03CR) 10Chad: [C: 032] Start branching CollaborationKit [tools/release] - 10https://gerrit.wikimedia.org/r/343686 (https://phabricator.wikimedia.org/T138326) (owner: 10Reedy) [18:45:52] (03CR) 10Harej: [C: 031] Start branching CollaborationKit [tools/release] - 10https://gerrit.wikimedia.org/r/343686 (https://phabricator.wikimedia.org/T138326) (owner: 10Reedy) [18:49:16] (03Merged) 10jenkins-bot: Start branching CollaborationKit [tools/release] - 10https://gerrit.wikimedia.org/r/343686 (https://phabricator.wikimedia.org/T138326) (owner: 10Reedy) [18:49:32] 06Release-Engineering-Team, 10TimedMediaHandler, 10TimedMediaHandler-Transcode, 07Wikimedia-maintenance-script-run: Please mass-reset the video transcodes of tens of thousand videos stuck in "Unknown" state - https://phabricator.wikimedia.org/T151199#2810203 (10brion) I've started a job running `requeueTra... [18:49:33] hasharAway: thanks for taking care of those precise integration instances [19:22:29] chasemp: ah yeah Precise is all gone from beta and integration finally!!!! 
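For a script that only needs what `git review -d` does, the lookup can also be done directly against the Gerrit REST API, which sidesteps the /changes/ vs /r/changes/ path confusion behind the 404s above. A rough sketch, not the code git-review actually runs, assuming anonymous read access to gerrit.wikimedia.org and a public change:

    import json
    import subprocess
    import urllib.request

    GERRIT = 'https://gerrit.wikimedia.org/r'
    change_id = 'Id6099fe9fbf18481068a6f0a329bbde0d218135f'

    url = '%s/changes/?q=%s&o=CURRENT_REVISION' % (GERRIT, change_id)
    with urllib.request.urlopen(url) as resp:
        body = resp.read().decode('utf-8')
    # Gerrit prefixes JSON responses with ")]}'" to guard against XSSI; drop that first line.
    change = json.loads(body.split('\n', 1)[1])[0]

    number = change['_number']
    patchset = change['revisions'][change['current_revision']]['_number']
    project = change['project']

    # refs/changes/<last two digits, zero-padded>/<change number>/<patchset>
    ref = 'refs/changes/%02d/%d/%d' % (number % 100, number, patchset)
    subprocess.check_call(['git', 'fetch', '%s/%s' % (GERRIT, project), ref])
    subprocess.check_call(['git', 'checkout', 'FETCH_HEAD'])

This is essentially the "Download" box command with the change number and current patchset filled in programmatically.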
:] [19:23:46] chasemp: my next step will be to phase out Trusty entirely and then try to migrate to bootstrap-vz for the base images [19:24:03] bootstrap-vz is pretty slick [19:25:17] I gave it a short try last week, it lacks a lot of features I could use but then it is python so that is easily hackable [19:25:38] probably a good opportuniy to simplify the base images we use currently [19:47:49] hashar: Hey, it seems jenkins times out every time on this: https://gerrit.wikimedia.org/r/#/c/343661/ [19:47:55] Can you take a look [19:48:01] or it was just bad luck [19:49:13] E_TOOMANYTESTS [19:52:07] That's Wikibase ;) [19:52:38] So now jenkins won't anything on master because there are too many tests? [19:52:53] (03PS1) 10Umherirrender: [FileImporter] Add npm job [integration/config] - 10https://gerrit.wikimedia.org/r/343734 [19:53:53] jerkins is a dick [19:54:31] Maybe up it to 40mins? [19:58:24] (03PS2) 10Umherirrender: [FileImporter] Add npm job [integration/config] - 10https://gerrit.wikimedia.org/r/343734 [19:59:32] (03PS1) 10Umherirrender: [FileExporter] Add npm job [integration/config] - 10https://gerrit.wikimedia.org/r/343737 [19:59:36] (03PS1) 10Ladsgroup: Increase timeout for Wikidata jobs from 30 minutes to 40 minutes [integration/config] - 10https://gerrit.wikimedia.org/r/343738 [19:59:51] paladox: https://gerrit.wikimedia.org/r/343738 [19:59:56] afak [19:59:59] *afk [20:00:03] Thanks [20:00:22] (03CR) 10Paladox: [C: 031] "Needed due to ci getting slower at peak times." [integration/config] - 10https://gerrit.wikimedia.org/r/343738 (owner: 10Ladsgroup) [20:00:55] Amir1: Another solution is to delete a bunch of tests ;-) [20:03:51] (03PS1) 10Umherirrender: [TheWikipediaLibrary] Add npm job [integration/config] - 10https://gerrit.wikimedia.org/r/343740 [20:04:01] and they're merging [20:09:28] !log Update mobileapps to c0ab01d [20:09:31] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [20:11:53] I would like to cause some downtime for the following beta VMs: deployment-pdf01, deployment-puppetmaster02, deployment-urldownloader [20:11:58] does anyone object, or have words of wisdom? [20:12:01] thcipriani for example? [20:12:13] :) [20:12:51] I do not object, puppetmaster02 is going to be noisy to take down in here, but should be fine. [20:13:11] thcipriani: ok, I'll do that one first. Right now OK? [20:13:48] andrewbogott: should be fine now. [20:14:03] thcipriani: ok! 
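For context on the timeout patch Amir1 links above (343738): Jenkins Job Builder exposes the Build Timeout plugin as a job wrapper, so bumping a job's limit is essentially a one-number change in YAML. A rough sketch of what such a wrapper looks like; the actual template names and options used in integration/config may differ:

    wrappers:
      - timeout:
          timeout: 40   # minutes, up from the 30 the Wikidata jobs were hitting
          fail: true    # mark the build as failed rather than aborted

As the later discussion about CPU and I/O starvation suggests, raising the limit works around slow nodes rather than fixing them.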
Thanks [20:17:45] PROBLEM - Host deployment-puppetmaster02 is DOWN: CRITICAL - Host Unreachable (10.68.21.200) [20:18:11] PROBLEM - Puppet run on deployment-jobrunner02 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [20:18:54] PROBLEM - Puppet run on deployment-fluorine02 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [20:19:06] PROBLEM - Puppet run on deployment-phab02 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [20:19:20] PROBLEM - Puppet run on deployment-kafka04 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [20:19:32] PROBLEM - Puppet run on deployment-pdfrender02 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [20:20:00] PROBLEM - Puppet run on deployment-mira is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [20:20:14] PROBLEM - Puppet run on deployment-zotero01 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [20:20:30] PROBLEM - Puppet run on deployment-zookeeper01 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [20:20:46] PROBLEM - Puppet run on deployment-sca03 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [20:21:57] PROBLEM - Puppet run on deployment-phab01 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [20:22:57] PROBLEM - Puppet run on deployment-prometheus01 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [20:23:05] PROBLEM - Puppet run on deployment-apertium02 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [20:23:11] PROBLEM - Puppet run on deployment-salt02 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [20:23:33] PROBLEM - Puppet run on deployment-elastic06 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [20:23:43] PROBLEM - Puppet run on deployment-puppetdb01 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [20:24:04] PROBLEM - Puppet run on deployment-kafka05 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [20:24:13] PROBLEM - Puppet run on deployment-mediawiki04 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [20:25:24] PROBLEM - Puppet run on deployment-changeprop is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [20:25:30] PROBLEM - Puppet run on deployment-stream is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [20:25:48] PROBLEM - Puppet run on deployment-mx is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [20:26:06] PROBLEM - Puppet run on deployment-aqs01 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [20:27:04] PROBLEM - Puppet run on deployment-ores-redis is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [20:27:28] PROBLEM - Puppet run on deployment-tin is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [20:28:18] PROBLEM - Puppet run on deployment-trending01 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [20:28:30] PROBLEM - Puppet run on deployment-poolcounter04 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [20:29:05] PROBLEM - Puppet run on deployment-ms-be01 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [20:29:16] PROBLEM - Puppet run on deployment-ircd is CRITICAL: CRITICAL: 22.22% of data above the critical 
threshold [0.0] [20:29:21] PROBLEM - Puppet run on deployment-sca01 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [20:29:31] PROBLEM - Puppet run on deployment-sca02 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [20:32:07] PROBLEM - Puppet run on deployment-restbase02 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [20:32:13] PROBLEM - Puppet run on deployment-memc04 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [20:32:13] PROBLEM - Puppet run on deployment-kafka03 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [20:32:38] PROBLEM - Puppet run on deployment-db03 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [20:33:16] PROBLEM - Puppet run on deployment-sentry01 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [20:33:36] PROBLEM - Puppet run on deployment-pdf01 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [20:34:20] PROBLEM - Puppet run on deployment-eventlogging03 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [20:34:20] PROBLEM - Puppet run on deployment-tmh01 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [20:34:26] PROBLEM - Puppet run on deployment-secureredirexperiment is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [20:34:45] thcipriani: am I somehow causing a jenkins outage by migrating that beta VM? [20:35:04] PROBLEM - Puppet run on deployment-kafka01 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [20:35:30] PROBLEM - Puppet run on deployment-aqs02 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [20:35:42] PROBLEM - Puppet run on deployment-memc05 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [20:35:50] andrewbogott: you shouldn't be... 
[20:36:36] maybe it's just the normal mid-afternoon jenkins jam [20:36:39] deployment-puppetmaster02 and jenkins are not connected in any way except that it's the puppetmaster for beta and jenkins deploys stuff on beta [20:37:07] PROBLEM - Puppet run on deployment-ms-fe01 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [20:37:09] things appear to be moving normally through zuul [20:37:19] PROBLEM - Puppet run on deployment-elastic07 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [20:37:21] PROBLEM - Puppet run on deployment-elastic05 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [20:38:03] PROBLEM - Puppet run on deployment-eventlogging04 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [20:39:17] PROBLEM - Puppet run on deployment-db04 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [20:39:41] PROBLEM - Puppet run on deployment-ms-be02 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [20:40:30] PROBLEM - Puppet run on deployment-redis01 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [20:40:30] PROBLEM - Puppet run on deployment-cache-upload04 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [20:42:46] PROBLEM - Puppet run on deployment-restbase01 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [20:42:50] RECOVERY - Host deployment-puppetmaster02 is UP: PING OK - Packet loss = 0%, RTA = 1.10 ms [20:44:47] PROBLEM - Puppet run on deployment-cache-text04 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [20:44:53] PROBLEM - Puppet run on deployment-mathoid is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [20:46:23] PROBLEM - Puppet run on deployment-mediawiki06 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [20:46:45] PROBLEM - Host deployment-pdf01 is DOWN: CRITICAL - Host Unreachable (10.68.16.73) [20:52:17] andrewbogott: thcipriani: looks like that is just the instance deployment-puppetmaster02 is DOWN [20:52:28] so puppet fails on all beta cluster instances which is probably not a big deal [20:52:29] RECOVERY - Host deployment-pdf01 is UP: PING OK - Packet loss = 0%, RTA = 0.78 ms [20:52:37] hashar: that was me, migrating it. Should be back and happy by now. [20:52:55] \o/ [20:52:56] indeed, starting to see recoveries [20:54:02] thcipriani, hashar, I guess I never explained… one of the labvirts is having goofy IO problems so I'm trying to evacuate it. Moving the staff-managed stuff first but at some point I'll probably have to make a list post about it. [20:54:16] the symptoms are weird, generally undetectable from within a VM but still concerning. [20:54:23] which labvirt ? [20:55:30] RECOVERY - Puppet run on deployment-zookeeper01 is OK: OK: Less than 1.00% above the threshold [0.0] [20:55:45] PROBLEM - Host deployment-urldownloader is DOWN: CRITICAL - Host Unreachable (10.68.16.135) [20:55:58] hashar: 1001 [20:56:09] It's already out of the scheduler pool, so nothing new should land there [20:56:52] RainbowSprinkles: It seems you maintain gerritbot in phab. It has been a week that it doesn't make a comment when I make a patch. It does when they get merged though. 
And it happens only to me more strangely [20:57:17] https://phabricator.wikimedia.org/T160462#3105216 [20:57:29] https://phabricator.wikimedia.org/T151194 [20:57:45] https://phabricator.wikimedia.org/T160613 [20:58:03] RECOVERY - Puppet run on deployment-apertium02 is OK: OK: Less than 1.00% above the threshold [0.0] [20:58:07] RECOVERY - Puppet run on deployment-jobrunner02 is OK: OK: Less than 1.00% above the threshold [0.0] [20:58:13] RECOVERY - Puppet run on deployment-salt02 is OK: OK: Less than 1.00% above the threshold [0.0] [20:58:35] RECOVERY - Puppet run on deployment-elastic06 is OK: OK: Less than 1.00% above the threshold [0.0] [20:58:45] Amir1: I do not :) [20:58:53] RECOVERY - Puppet run on deployment-fluorine02 is OK: OK: Less than 1.00% above the threshold [0.0] [20:58:55] I don't touch the bot [20:59:02] https://wikitech.wikimedia.org/wiki/Gerrit_Notification_Bot [20:59:09] RECOVERY - Puppet run on deployment-phab02 is OK: OK: Less than 1.00% above the threshold [0.0] [20:59:10] If there's a problem, file a bug in Phabricator and bother Chad or Christian. [20:59:17] I'm the CollaborationKit person. I make myself available for questions, complaints, and other assorted commentary. [20:59:23] RECOVERY - Puppet run on deployment-kafka04 is OK: OK: Less than 1.00% above the threshold [0.0] [20:59:33] RECOVERY - Puppet run on deployment-pdfrender02 is OK: OK: Less than 1.00% above the threshold [0.0] [20:59:39] Amir1: That's old text :p [20:59:44] I've actually never touched it [20:59:59] RECOVERY - Puppet run on deployment-mira is OK: OK: Less than 1.00% above the threshold [0.0] [21:00:03] loool [21:00:08] Do you know who does? :D [21:00:14] RECOVERY - Puppet run on deployment-zotero01 is OK: OK: Less than 1.00% above the threshold [0.0] [21:00:22] RECOVERY - Puppet run on deployment-changeprop is OK: OK: Less than 1.00% above the threshold [0.0] [21:00:29] Amir1: Paladox is the one who works on its-phabricator (the phab/gerrit bridge) [21:00:38] RECOVERY - Host deployment-urldownloader is UP: PING OK - Packet loss = 0%, RTA = 1.32 ms [21:00:48] RECOVERY - Puppet run on deployment-mx is OK: OK: Less than 1.00% above the threshold [0.0] [21:00:48] RECOVERY - Puppet run on deployment-sca03 is OK: OK: Less than 1.00% above the threshold [0.0] [21:00:51] I worked on its-bugzilla, the prior version. But since the rewrite never looked at it :) [21:01:06] RECOVERY - Puppet run on deployment-aqs01 is OK: OK: Less than 1.00% above the threshold [0.0] [21:01:28] Thanks. Sorry for bothering [21:01:58] RECOVERY - Puppet run on deployment-phab01 is OK: OK: Less than 1.00% above the threshold [0.0] [21:02:00] No worries. 
I'm the likely suspect :p [21:02:06] RECOVERY - Puppet run on deployment-ores-redis is OK: OK: Less than 1.00% above the threshold [0.0] [21:02:38] :))) [21:02:56] RECOVERY - Puppet run on deployment-prometheus01 is OK: OK: Less than 1.00% above the threshold [0.0] [21:03:18] RECOVERY - Puppet run on deployment-trending01 is OK: OK: Less than 1.00% above the threshold [0.0] [21:03:35] RECOVERY - Puppet run on deployment-pdf01 is OK: OK: Less than 1.00% above the threshold [0.0] [21:03:45] RECOVERY - Puppet run on deployment-puppetdb01 is OK: OK: Less than 1.00% above the threshold [0.0] [21:04:05] RECOVERY - Puppet run on deployment-kafka05 is OK: OK: Less than 1.00% above the threshold [0.0] [21:04:15] RECOVERY - Puppet run on deployment-mediawiki04 is OK: OK: Less than 1.00% above the threshold [0.0] [21:04:19] RECOVERY - Puppet run on deployment-eventlogging03 is OK: OK: Less than 1.00% above the threshold [0.0] [21:05:30] RECOVERY - Puppet run on deployment-stream is OK: OK: Less than 1.00% above the threshold [0.0] [21:07:06] RECOVERY - Puppet run on deployment-restbase02 is OK: OK: Less than 1.00% above the threshold [0.0] [21:07:12] RECOVERY - Puppet run on deployment-kafka03 is OK: OK: Less than 1.00% above the threshold [0.0] [21:07:26] RECOVERY - Puppet run on deployment-tin is OK: OK: Less than 1.00% above the threshold [0.0] [21:08:17] RECOVERY - Puppet run on deployment-sentry01 is OK: OK: Less than 1.00% above the threshold [0.0] [21:08:33] RECOVERY - Puppet run on deployment-poolcounter04 is OK: OK: Less than 1.00% above the threshold [0.0] [21:09:07] RECOVERY - Puppet run on deployment-ms-be01 is OK: OK: Less than 1.00% above the threshold [0.0] [21:09:09] RECOVERY - Puppet run on deployment-ircd is OK: OK: Less than 1.00% above the threshold [0.0] [21:09:17] RECOVERY - Puppet run on deployment-sca01 is OK: OK: Less than 1.00% above the threshold [0.0] [21:09:25] RECOVERY - Puppet run on deployment-secureredirexperiment is OK: OK: Less than 1.00% above the threshold [0.0] [21:09:29] RECOVERY - Puppet run on deployment-sca02 is OK: OK: Less than 1.00% above the threshold [0.0] [21:09:37] Amir1 hi, what is your patch url please? [21:10:05] RECOVERY - Puppet run on deployment-kafka01 is OK: OK: Less than 1.00% above the threshold [0.0] [21:10:12] paladox: I commented in the phab cards [21:10:28] or when they got merged, you can see gerrit-bot's comment [21:10:30] RECOVERY - Puppet run on deployment-aqs02 is OK: OK: Less than 1.00% above the threshold [0.0] [21:11:29] Hmm strange [21:11:44] RainbowSprinkles seems the bot is not working on a mediawiki/core change [21:12:06] RECOVERY - Puppet run on deployment-ms-fe01 is OK: OK: Less than 1.00% above the threshold [0.0] [21:12:15] RECOVERY - Puppet run on deployment-memc04 is OK: OK: Less than 1.00% above the threshold [0.0] [21:12:18] RECOVERY - Puppet run on deployment-elastic07 is OK: OK: Less than 1.00% above the threshold [0.0] [21:12:37] RECOVERY - Puppet run on deployment-db03 is OK: OK: Less than 1.00% above the threshold [0.0] [21:13:00] RECOVERY - Puppet run on deployment-eventlogging04 is OK: OK: Less than 1.00% above the threshold [0.0] [21:13:12] Amir1 does it work on any other changes? [21:13:35] paladox: It doesn't work in any repos but it's just for me [21:13:43] other patches get through [21:13:49] Amir1 dosen't work for me either. 
[21:14:19] RECOVERY - Puppet run on deployment-db04 is OK: OK: Less than 1.00% above the threshold [0.0] [21:14:21] RECOVERY - Puppet run on deployment-tmh01 is OK: OK: Less than 1.00% above the threshold [0.0] [21:14:39] RECOVERY - Puppet run on deployment-ms-be02 is OK: OK: Less than 1.00% above the threshold [0.0] [21:15:00] Scratch that it works for me here https://phabricator.wikimedia.org/T86229#3116139 [21:15:03] but on my change [21:15:31] RECOVERY - Puppet run on deployment-redis01 is OK: OK: Less than 1.00% above the threshold [0.0] [21:15:31] RECOVERY - Puppet run on deployment-cache-upload04 is OK: OK: Less than 1.00% above the threshold [0.0] [21:15:40] RECOVERY - Puppet run on deployment-memc05 is OK: OK: Less than 1.00% above the threshold [0.0] [21:16:46] Amir1 must be a bug or something. As it works on one of my mediawiki/core changes but doesn't with yours [21:17:22] RECOVERY - Puppet run on deployment-elastic05 is OK: OK: Less than 1.00% above the threshold [0.0] [21:17:48] RECOVERY - Puppet run on deployment-restbase01 is OK: OK: Less than 1.00% above the threshold [0.0] [21:19:39] * paladox wonders if there are any errors to do with the gerritbot in the last few mins in the error log [21:21:23] RECOVERY - Puppet run on deployment-mediawiki06 is OK: OK: Less than 1.00% above the threshold [0.0] [21:21:47] paladox: It happens in all repos, zuul config/wikibase/core/etc. [21:22:18] Hmm, it must be all your changes as mine worked. hashar did linking to bug reports work for you today? [21:24:47] RECOVERY - Puppet run on deployment-cache-text04 is OK: OK: Less than 1.00% above the threshold [0.0] [21:24:53] RECOVERY - Puppet run on deployment-mathoid is OK: OK: Less than 1.00% above the threshold [0.0] [21:31:39] so, who clogged jenkins today? ;) [21:32:02] andrewbogott: labvirt1001 is having an issue, could it be the reason the instances took a while to boot a couple weeks ago? [21:32:04] At peak times it will get slow [21:32:19] andrewbogott: iirc you ended up dropping labvirt1001 and labvirt1002 from the scheduler pool because of slow response [21:32:24] MatmaRex: depends :D [21:32:39] MatmaRex: yeah too many patches https://integration.wikimedia.org/zuul/ :( [21:32:45] we have a sprint to try to mitigate that [21:32:53] https://integration.wikimedia.org/zuul/ [21:32:59] the short rule is for now CR+2 get priority above everything else [21:33:12] we will eventually prioritize patches to operations/puppet and wmf branches [21:33:34] some oojs/ui builds are suspiciously timing out: https://integration.wikimedia.org/ci/job/oojs-ui-npm-node-6-jessie/628/consoleFull – our build is terribly slow, but it's not half-an-hour slow [21:33:49] npm install took 13 minutes? [21:33:56] hashar: that's part of it, yes, scheduling on 1001 was super slow. I'm not convinced that 1002 is actually troubled but it needs more study. [21:34:11] hashar what about merging mediawiki-extensions-qunit-jessie and mediawiki-core-qunit-jessie together? [21:34:17] karma runs took 2 and 3 minutes? [21:34:38] As we could do an if check to run mediawiki/core specific checks. Which will also leave a few more nodes available for use [21:34:48] andrewbogott: good luck with whatever hardware/driver/kernel issue you might end up hitting :( [21:35:05] thanks. Mostly I'm just trying to clear it out so I can drop it in Chris's lap [21:35:08] MatmaRex: potentially that might be some networking delay as well [21:35:38] svg2png and image are also slower than i would expect.
these are cpu-heavy [21:35:50] and yeah eventually some jobs will get merged so we avoid doing the same git clones / composer install / npm install for each of the jobs [21:37:20] MatmaRex: there is some labvirt with high cpu usage (labvirt1006) https://grafana.wikimedia.org/dashboard/db/labs-capacity-planning?panelId=5&fullscreen&from=now-24h&to=now [21:37:29] svg2png taking 158 seconds. that's insane. it takes 60 seconds on my machine, and my machine has an intel core 2 duo. [21:37:33] yeah [21:37:44] that is for oojs/ui right? [21:37:50] yes [21:37:57] numbers from here specifically: https://integration.wikimedia.org/ci/job/oojs-ui-npm-node-6-jessie/628/consoleFull [21:38:10] we have to streamline the three jobs it is triggering in a single one :/ [21:38:18] * paladox wonders what labvirt1014 is. It's at 0% [21:38:24] each patchset ends up invoking svg2png three times (via three different jobs) [21:38:38] (which is a failed gate build for https://gerrit.wikimedia.org/r/#/c/343748/) [21:38:42] that caused some nice overload last tuesday with 30+ patches being sent [21:39:07] paladox: I assume it is a spare machine in case one of the other labvirt explodes [21:39:13] oh [21:39:16] thanks [21:39:42] yeah, i was here last tuesday too ;) but we didn't make quite as many patchsets this time (and we started merging them 24 hours early ;) ) [21:39:47] MatmaRex: yeah and on that build npm install took 13 minutes apparently [21:40:35] if the machines can't handle all the parallel cpu-heavy jobs, can they be made to run fewer jobs in parallel? [21:41:05] it's silly for jobs to time out because of that. it would be better if they are slow, but actually finish [21:41:39] Wouldn't concurrent fix that? [21:41:46] ie set concurrent to false [21:43:04] MatmaRex: I can't remember the default timeout, but it is probably 30 minutes [21:43:45] it is. that job timed out after 30 minutes [21:44:07] it should totally be able to complete in 30 minutes though. it takes like 10 usually [21:44:29] unless we face CPU or I/O starvation on the node the job happened to run [21:46:36] We could put it at 40? [21:46:54] (03CR) 10Thiemo Mättig (WMDE): [C: 031] Increase timeout for Wikidata jobs from 30 minutes to 40 minutes [integration/config] - 10https://gerrit.wikimedia.org/r/343738 (owner: 10Ladsgroup) [21:48:43] thcipriani: also on friday I crafted a new Grafana board showing the Nodepool pool details https://grafana.wikimedia.org/dashboard/db/nodepool-pool-details [21:48:49] eg split the pool graph in a graph per metric [21:51:08] nice [21:51:23] so in practice there are 17 nodepool instances ready at any time [21:52:05] yeah in nodepool.yaml that is the min-ready parameter [21:52:11] so 12 jessie and 5 trusty [21:52:35] we could probably lower the trusty baseline [21:52:48] 10Continuous-Integration-Config, 06Release-Engineering-Team, 13Patch-For-Review: Switch MediaWiki coverage job from Trusty/Zend PHP 5.5 to Jessie/Zend PHP 7.0 - https://phabricator.wikimedia.org/T147778#3116264 (10Krinkle) [21:52:50] and like 10 are needed for every core job :( [21:52:51] 10Continuous-Integration-Config, 06Release-Engineering-Team, 10MediaWiki-Unit-tests: MediaWiki code coverage no longer runs parser tests - https://phabricator.wikimedia.org/T147779#3116262 (10Krinkle) 05Open>03Invalid I don't know about coverage, but both before and after the aforementioned changes, Pars...
[21:52:56] core g+s [21:53:06] yeah gotta merge a bunch of those jobs [21:53:21] ie make the job that runs phpunit tests first run composer test [21:53:23] 10Continuous-Integration-Config, 06Release-Engineering-Team, 10MediaWiki-Unit-tests: MediaWiki code coverage no longer runs parser tests - https://phabricator.wikimedia.org/T147779#3116266 (10Krinkle) [21:53:25] 10Continuous-Integration-Config, 06Release-Engineering-Team, 13Patch-For-Review: Switch MediaWiki coverage job from Trusty/Zend PHP 5.5 to Jessie/Zend PHP 7.0 - https://phabricator.wikimedia.org/T147778#2703156 (10Krinkle) [21:53:33] and we could run npm test as well in the same job [21:53:38] indeed, optimizing for instance use vs optimizing for fast parallel permanent agents [21:53:42] so we would do: composer test, npm test, phpunit [21:53:49] and if composer fails early, the rest is skipped entirely [21:54:03] MatmaRex James_F they are warning users who use the cdn for phantomjs to enable the caching feature https://github.com/Medium/phantomjs/commit/d8ebc23016c784fe84e5d2b29ae57d157c8c5d84 [21:54:22] thcipriani: and mediawiki-extensions-hhvm has to be overhauled. It is too slow [21:55:16] though i have no idea how it will work for us [21:55:46] anyway I am heading to bed, it is well over time [21:56:03] will work on the little steps sprint tomorrow. and most probably deploy the high prio test pipeline [22:01:20] getting out. more CI refactoring tomorrow [22:50:48] oojs/ui is hogging the nodepool instances again :/ [22:56:27] (03PS4) 10Ejegg: Use upstream civicrm-buildkit [integration/config] - 10https://gerrit.wikimedia.org/r/336960 [22:57:30] (03CR) 10Ejegg: "This is no longer blocked! We now have mcrypt on the integration servers along with the rest of the extensions that upstream buildkit asks" [integration/config] - 10https://gerrit.wikimedia.org/r/336960 (owner: 10Ejegg) [23:17:47] Hm.. yeah, took 45 minutes to get a response on a GuidedTour patch. [23:17:50] each job only taking 2 minutes [23:17:55] but it got stalled behind the queue for a long time [23:17:58] https://grafana.wikimedia.org/dashboard/db/nodepool [23:18:10] 25 instances in total does seem fairly low though [23:18:20] Seems worth increasing quota just in general? [23:48:54] Krinkle: the last time we bumped the nodepool quota it just leaked runners faster
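Tying the last two threads together: the 17 ready instances hashar mentions come from per-label min-ready settings in nodepool.yaml, while the ceiling Krinkle runs into is the provider-level cap (backed by the OpenStack project quota). A rough sketch of that shape of configuration; the label, image and provider names here are assumptions rather than the exact values in the production file, and the provider's auth and image settings are omitted:

    labels:
      - name: ci-jessie-wikimedia
        image: ci-jessie-wikimedia
        min-ready: 12
        providers:
          - name: wmflabs
      - name: ci-trusty-wikimedia
        image: ci-trusty-wikimedia
        min-ready: 5
        providers:
          - name: wmflabs

    providers:
      - name: wmflabs
        max-servers: 25   # overall cap; raising it also needs quota on the OpenStack side

As the closing remark points out, a higher cap only helps if instances are actually recycled; the previous quota bump mostly produced leaked servers faster.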