[00:00:58] By all means
[00:19:25] Krinkle: I'm also killing a few lagging test queue jobs that were superseded by gate-and-submit.
[00:19:32] (since we're +2ing on sight)
[00:19:39] indeed
[00:19:48] I just noticed :)
[00:20:18] You can push for review with CR attached
[00:20:18] I've disabled mwcore-jsduck and -doxygen for the moment (instant auto-skip using a prepended shell script)
[00:20:22] So I did that :)
[00:20:40] aha, nice
[00:20:53] `git push origin HEAD:refs/for/%l=Code-Review+2`
[00:21:17] Which zuul correctly (thankfully) throws straight into gate-and-submit
[00:21:38] (it meant the changes all landed in master pretty quick, which was the first branch I did)
[00:29:38] RainbowSprinkles: REL1_27 qunit failing again (wikiBase)
[00:29:49] Yeah I saw
[00:30:35] also phpcs :/
[00:30:36] And a minor phpcs failure
[00:30:39] I'll tidy that up
[00:30:46] I might wait a few minutes to let the test queue unclog further
[00:30:50] So I can have more nodes
[00:30:55] RECOVERY - Work requests waiting in Zuul Gearman server https://grafana.wikimedia.org/dashboard/db/zuul-gearman on contint1001 is OK: OK: Less than 30.00% above the threshold [90.0]
[00:31:41] Yeah.
[00:31:52] A few of them got stuck btw, not sure how or why but it's SNAFU with Zuul
[00:32:10] post-merge has a few >1h old that are queued but jenkins never got it
[00:32:15] and it'll never go away until the next zuul restart
[00:32:26] as long as they don't occupy ghost nodes, it's all fine though
[00:34:15] I was wondering if we have any ghosts that haven't been reclaimed
[00:34:19] But dunno how to check tbh
[00:38:21] yeah, me neither. afaik the stuckness is just Zuul seeing it still in its internal state array. It doesn't relate to nodes or anything; that happens via Gearman/Jenkins only when it is allocated, and from there it eventually starts to run without issue afaik, and then is reclaimed once aborted/completed.
[00:38:27] So this just means it never got a node in the first place
[00:38:43] Or it got one, was aborted, reclaimed, but Zuul still thinks it's pending in Jenkins
[00:47:09] A bunch of queued post-merge ones disappeared
[00:47:11] Something caught up
[00:49:24] And a gazillion fundraising patches just appeared :)
[00:49:35] Oh that chain has been there
[00:49:36] Pushed way down
[00:49:50] And someone playing with MobileFrontend ;-)
[00:50:21] Sadly, it's not really possible to stop people from committing short of taking gerrit offline :p
[00:50:28] Which might upset someone!
[01:19:34] Krinkle: And this time qunit works, figures
[01:24:16] Oh crap I forgot to fix the phpcs issue
[01:24:17] dammit
[01:29:18] YAY THE QUEUE IS ALL GONE
[04:57:54] Browser-Tests-Infrastructure, Documentation, Easy, Software-Licensing: Ruby gem documentation should state license - https://phabricator.wikimedia.org/T94001#1152205 (Rammanojpotla) can anyone give me more information about this issue?
[05:05:15] (PS1) KartikMistry: Add apertium-spa-cat package [integration/config] - https://gerrit.wikimedia.org/r/346941
[07:50:10] Scap, HHVM: [scap] Compile HHVM bytecode cache as deployment step - https://phabricator.wikimedia.org/T66272#685932 (MoritzMuehlenhoff) I'm not convinced that this would be an overall win. Last week we had problems with a depleted byte code cache. I checked monitoring after pruning the cache for effects o...
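A quick aside on the push command mentioned at 00:20:53: Gerrit lets you attach a review label as a push option on the refs/for/ refspec, which is why the change lands in the queue already carrying a Code-Review+2. The branch name was omitted in the log, so the sketch below assumes `master` and assumes your account has +2 rights on the repository:

```sh
# Minimal sketch: push the current HEAD for review on master and attach a
# Code-Review+2 vote in the same push ("%l=" is Gerrit's short form of "%label=").
git push origin HEAD:refs/for/master%l=Code-Review+2
```

As noted at 00:21:17, Zuul then sees a change that already has +2 and enqueues it straight into the gate-and-submit pipeline instead of the plain test pipeline.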
[08:33:17] Scap (Scap3-Adoption-Phase1), Operations, RESTBase, Patch-For-Review, and 2 others: Deploy RESTBase with scap3 - https://phabricator.wikimedia.org/T116335#3162764 (akosiaris) Agreed.
[10:48:44] Browser-Tests-Infrastructure, Documentation, Easy, Software-Licensing: Ruby gem documentation should state license - https://phabricator.wikimedia.org/T94001#3163027 (zeljkofilipin) p:Normal>Low
[11:08:05] PROBLEM - Puppet run on integration-c1 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[11:15:26] Browser-Tests-Infrastructure, Documentation, Easy, Software-Licensing: Ruby gem documentation should state license - https://phabricator.wikimedia.org/T94001#3163114 (zeljkofilipin) Looks like this is the context: T312#1152213 @Mattflaschen-WMF: the license should be similar to license in the fo...
[11:20:27] Release-Engineering-Team, Operations, Goal, Services (designing), and 2 others: Prepare and maintain base container images - https://phabricator.wikimedia.org/T162042#3163120 (MoritzMuehlenhoff) p:Triage>High
[11:46:17] Browser-Tests-Infrastructure, Documentation, Easy, Software-Licensing: Ruby gem documentation should state license - https://phabricator.wikimedia.org/T94001#3163542 (hashar) Just have the README.md link to the LICENSE files that are in the repo. That will then show up on the doc page. Eventu...
[12:28:09] elukey: good afternoon.
[12:28:21] I noticed in logstash a bunch of redis connections are closed due to "bad file descriptor"
[12:28:49] no clue if that could be related. And I don't know why opening a tcp socket would lead to a "bad file" :D
[12:29:31] hashar: o/
[12:29:53] do you have an example of a filter that I can use? I tried a config on mw1306 today but now it is reverted
[12:30:20] (a socket is a fd so maybe it is complaining about a socket that is not there anymore?)
[12:30:35] (s/is/has/)
[12:30:41] ahhhhhh
[12:30:51] so that explains the "file" prt :]
[12:30:55] part
[12:31:23] for logstash I have looked at https://logstash.wikimedia.org/app/kibana#/dashboard/Redis
[12:31:35] and dropping the filter at top left in a green bubble box
[12:33:54] elukey: from your last comment maybe that is redis not being fast enough
[12:34:06] or maybe the server can't accept enough simultaneous connections
[12:34:46] still not finding a log with the bad file desc, do you have a link that I can check?
[12:34:57] I can see only the usual "Unable to connect to redis server rdb1005.eqiad.wmnet:6379."
[12:35:00] etc..
[12:37:18] I am lost
[12:38:37] elukey: https://logstash.wikimedia.org/goto/c17233aad31878d5c3d06be602029efb
[12:38:56] that is one event per minute or so. Probably less concerning
[12:40:30] and that would be another reason for the "Could not connect to server "rdb1001.eqiad.wmnet:6381""
[12:41:18] ah these are appservers, not jobrunners!
[12:41:24] I believe those are different problems
[12:41:50] yeah and I filed that one as https://phabricator.wikimedia.org/T158770
[12:41:59] bad file desc in my experience is usually a write() to a socket already closed
[12:42:19] the bad file descriptors appeared after Tim applied a fix to "Route PHP warnings from the handler into udp2log"
[12:43:23] ah nice :)
[12:43:31] this could also fit very well
[12:43:53] I got one for mw1287 which is an appserver::api
[12:44:31] which queries site information, and that ends up hitting the jobqueue redis I guess to get the size of the queues
[12:47:05] elukey: eg https://en.wikipedia.org//w/api.php?action=query&meta=siteinfo&siprop=statistics&format=json
[12:47:16] which yields statistics including the # of jobs
[12:47:23] and that is apparently uncached
[13:03:05] (CR) Hashar: [C: +2] Add apertium-spa-cat package [integration/config] - https://gerrit.wikimedia.org/r/346941 (owner: KartikMistry)
[13:04:41] (Merged) jenkins-bot: Add apertium-spa-cat package [integration/config] - https://gerrit.wikimedia.org/r/346941 (owner: KartikMistry)
[13:09:00] (Abandoned) Hashar: Add more jobs to test-prio [integration/config] - https://gerrit.wikimedia.org/r/346656 (owner: Zppix)
[13:49:09] PROBLEM - Puppet run on buildlog is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[15:37:33] Project selenium-MobileFrontend » chrome,beta,Linux,BrowserTests build #383: FAILURE in 15 min: https://integration.wikimedia.org/ci/job/selenium-MobileFrontend/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/383/
[15:46:16] Project selenium-MobileFrontend » firefox,beta,Linux,BrowserTests build #383: FAILURE in 24 min: https://integration.wikimedia.org/ci/job/selenium-MobileFrontend/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/383/
[19:41:21] MediaWiki-Codesniffer: Provide an MWCS rule to enforce "short" type definitions: int and bool, not integer and boolean - https://phabricator.wikimedia.org/T145162#3164729 (Krinkle)
[19:56:37] Continuous-Integration-Config, Continuous-Integration-Scaling, releng-201516-q3, Patch-For-Review, WorkType-NewFunctionality: [keyresult] Migrate as many misc CI jobs as possible to Nodepool - https://phabricator.wikimedia.org/T119140#3164757 (EddieGP)
[20:00:07] Release-Engineering-Team (Deployment-Blockers), Release: MW-1.29.0-wmf.19 deployment blockers - https://phabricator.wikimedia.org/T160551#3164770 (demon)
[21:45:16] Continuous-Integration-Infrastructure, Jenkins, Upstream, WorkType-NewFunctionality: Jenkins trilead-ssh2 doesn't support our MAC/KEX algorithms - https://phabricator.wikimedia.org/T103351#3165201 (Paladox) A new release may happen next week :)
[22:43:28] PROBLEM - Long lived cherry-picks on puppetmaster on deployment-puppetmaster02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[22:58:14] (PS1) Thcipriani: PoC: Port existing Trusty jobs to Docker [integration/config] - https://gerrit.wikimedia.org/r/347130
[22:59:39] (CR) jerkins-bot: [V: -1] PoC: Port existing Trusty jobs to Docker [integration/config] - https://gerrit.wikimedia.org/r/347130 (owner: Thcipriani)
[22:59:41] thcipriani: what's wrong with just using trusty?
[23:00:15] just another thing to support, but nothing generally :)
[23:03:17] (PS2) Thcipriani: PoC: Port existing Trusty jobs to Docker [integration/config] - https://gerrit.wikimedia.org/r/347130
[23:08:01] Do we really need 3 instances of saucelabs? I barely even see 1 instance being used
[23:17:50] Release-Engineering-Team, Jenkins: Jenkins Web UI error - https://phabricator.wikimedia.org/T162505#3165426 (Zppix)
[23:18:33] Release-Engineering-Team, Jenkins: Jenkins Web UI error - https://phabricator.wikimedia.org/T162505#3165426 (Paladox) This always happens. Workaround is to refresh the page. It's a varnish problem.
[23:18:51] Release-Engineering-Team, Operations: Jenkins Web UI error - https://phabricator.wikimedia.org/T162505#3165440 (Paladox)
[23:19:09] Release-Engineering-Team, Operations, Jenkins: Jenkins Web UI error - https://phabricator.wikimedia.org/T162505#3165426 (Paladox)
[23:19:13] Release-Engineering-Team, Operations, Jenkins: Jenkins Web UI error - https://phabricator.wikimedia.org/T162505#3165442 (Zppix) Instead of a constant workaround, why don't we fix it?
[23:20:07] Release-Engineering-Team, Operations, Jenkins: Jenkins Web UI error - https://phabricator.wikimedia.org/T162505#3165443 (Paladox) It would be good to fix that problem.
[23:22:25] Release-Engineering-Team, Operations, Jenkins: Jenkins Web UI error - https://phabricator.wikimedia.org/T162505#3165426 (Dzahn) >>! In T162505#3165438, @Paladox wrote: > Varnish looks like the problem. I don't think so. Varnish says "Backend fetch failed". The Backend is integration.wm.org.
[23:29:30] Release-Engineering-Team, Operations, Jenkins: Jenkins Web UI error - https://phabricator.wikimedia.org/T162505#3165452 (Paladox) Oh
[23:47:54] Continuous-Integration-Config, Regression: doc.wikimedia.org generates master branch docs with label of old release - https://phabricator.wikimedia.org/T162506#3165466 (Krinkle)
[23:48:29] Continuous-Integration-Config, Regression: doc.wikimedia.org docs for old releases is actually master - https://phabricator.wikimedia.org/T162506#3165479 (Krinkle)
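A footnote on the 12:44–12:47 exchange about api appservers touching the jobqueue redis: the siteinfo statistics the API returns include the current job queue size, which is why a plain read request can end up asking redis for queue lengths, and per the discussion that result is apparently uncached. A minimal sketch of the query mentioned at 12:47:05, assuming `jq` is available for picking out the field:

```sh
# Fetch site statistics and extract the number of queued jobs
# (the value that comes from the jobqueue redis, per the discussion above).
curl -s 'https://en.wikipedia.org/w/api.php?action=query&meta=siteinfo&siprop=statistics&format=json' \
  | jq '.query.statistics.jobs'
```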