[00:52:02] Hi! Whom should I ask for a user to get additional permissions on the beta cluster? [01:06:32] 06Release-Engineering-Team (Deployment-Blockers), 05MW-1.30-release-notes (WMF-deploy-2017-05-09_(1.30.0-wmf.1)), 05Release: MW-1.30.0-wmf.1 deployment blockers - https://phabricator.wikimedia.org/T162954#3253896 (10mmodell) [01:06:42] 06Release-Engineering-Team (Deployment-Blockers), 05MW-1.30-release-notes (WMF-deploy-2017-05-09_(1.30.0-wmf.1)), 05Release: MW-1.30.0-wmf.1 deployment blockers - https://phabricator.wikimedia.org/T162954#3181337 (10mmodell) [01:32:53] 06Release-Engineering-Team (Deployment-Blockers), 05MW-1.30-release-notes (WMF-deploy-2017-05-09_(1.30.0-wmf.1)), 05Release: MW-1.30.0-wmf.1 deployment blockers - https://phabricator.wikimedia.org/T162954#3253975 (10mmodell) [06:05:12] 10Deployment-Systems, 10Scap (Scap3-MediaWiki-MVP), 06Operations, 13Patch-For-Review, 15User-Joe: Install conftool on deployment masters - https://phabricator.wikimedia.org/T163565#3254105 (10Joe) >>! In T163565#3214272, @mmodell wrote: > @joe: That all seems reasonable. I don't particularly want to dupl... [06:23:23] Yippee, build fixed! [06:23:24] Project selenium-Wikibase » chrome,test,Linux,BrowserTests build #357: 09FIXED in 1 hr 43 min: https://integration.wikimedia.org/ci/job/selenium-Wikibase/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=test,PLATFORM=Linux,label=BrowserTests/357/ [06:25:17] 10Scap (Scap3-MediaWiki-MVP), 10scap2, 06Operations: Depool proxies temporarily while scap is ongoing to avoid taxing those nodes - https://phabricator.wikimedia.org/T125629#3254119 (10Joe) [06:25:19] 10Deployment-Systems, 10Scap (Scap3-MediaWiki-MVP), 06Operations, 13Patch-For-Review, 15User-Joe: Install conftool on deployment masters - https://phabricator.wikimedia.org/T163565#3254117 (10Joe) 05Open>03Resolved a:03Joe [06:25:21] 10Scap (Scap3-MediaWiki-MVP), 03releng-201617-q4, 10scap2, 06Operations, and 2 others: Make scap able to depool/repool servers via the conftool API - https://phabricator.wikimedia.org/T104352#3254120 (10Joe) [06:34:46] PROBLEM - Puppet errors on deployment-conf03 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [06:54:36] 10Deployment-Systems, 10Scap (Scap3-MediaWiki-MVP), 06Operations, 13Patch-For-Review, 15User-Joe: Install conftool on deployment masters - https://phabricator.wikimedia.org/T163565#3254140 (10mmodell) Thanks @joe! [07:04:01] 06Release-Engineering-Team (Deployment-Blockers), 05MW-1.30-release-notes (WMF-deploy-2017-05-09_(1.30.0-wmf.1)), 05Release: MW-1.30.0-wmf.1 deployment blockers - https://phabricator.wikimedia.org/T162954#3254169 (10mmodell) [07:09:45] RECOVERY - Puppet errors on deployment-conf03 is OK: OK: Less than 1.00% above the threshold [0.0] [08:50:40] hashar: o/ [08:51:02] hashar: do you think that we could deploy https://gerrit.wikimedia.org/r/#/c/353247/ today? [08:51:26] elukey: the labs one? yes definitely [08:51:30] and see what happens :-} [08:51:59] nice! [08:52:00] I thought that was already the case. Sorry I have not closely followed the state of the redis/jobrunner things [08:52:39] there is two job runners instances: deployment-jobrunner02.deployment-prep.eqiad.wmflabs [08:52:44] and deployment-tmh01.deployment-prep.eqiad.wmflabs [08:52:58] but the second one is a videoscaler right [08:52:59] ? 
[08:53:01] and I am not sure which redis db they end up hitting [08:53:02] yes [08:53:21] "tmh" probably stands for TimedMediaHandler [08:53:26] there were some rdb instances in labs, let me chec [08:53:27] the mediawiki extension that handles transcoding of video [08:53:28] check [08:53:49] that show up in the file changed by the patch above [08:53:49] https://gerrit.wikimedia.org/r/#/c/353247/1/wmf-config/jobqueue-labs.php [08:53:55] deployment-redis01 apparently [08:54:04] then I dont think beta is affected by the socket timeout [08:54:45] yeah [08:55:12] but we can measure the TCP time waits and see the number of connections [08:58:27] elukey: most probably we will want to deploy that on a single production jobrunner [08:58:40] monitor it for a few and see whether there is any impact [08:59:00] elukey: would be for later unfortunately. I am not around today :\ [08:59:57] okok let me know when you want to test it! I tried to live hack some days ago the jobrunner but didn't find any joy [09:06:56] elukey: maybe others can assist. For now I am off for rest of the day sorry ! [09:06:59] maybe tomorrow :} [09:07:02] * hashar waves [09:19:08] (03PS1) 10Addshore: Add extension-qunit-generic for TwoColConflict [integration/config] - 10https://gerrit.wikimedia.org/r/353258 (https://phabricator.wikimedia.org/T165021) [09:34:48] (03CR) 10WMDE-leszek: [C: 031] Add extension-qunit-generic for TwoColConflict [integration/config] - 10https://gerrit.wikimedia.org/r/353258 (https://phabricator.wikimedia.org/T165021) (owner: 10Addshore) [09:35:24] (03CR) 10Thiemo Mättig (WMDE): [C: 031] Add extension-qunit-generic for TwoColConflict [integration/config] - 10https://gerrit.wikimedia.org/r/353258 (https://phabricator.wikimedia.org/T165021) (owner: 10Addshore) [11:21:07] (03CR) 10Tobias Gritschacher: [C: 032] Add extension-qunit-generic for TwoColConflict [integration/config] - 10https://gerrit.wikimedia.org/r/353258 (https://phabricator.wikimedia.org/T165021) (owner: 10Addshore) [11:22:13] (03Merged) 10jenkins-bot: Add extension-qunit-generic for TwoColConflict [integration/config] - 10https://gerrit.wikimedia.org/r/353258 (https://phabricator.wikimedia.org/T165021) (owner: 10Addshore) [12:34:20] RECOVERY - Puppet errors on deployment-cache-upload04 is OK: OK: Less than 1.00% above the threshold [0.0] [12:34:34] RECOVERY - Puppet errors on deployment-cache-text04 is OK: OK: Less than 1.00% above the threshold [0.0] [12:35:13] RECOVERY - Puppet errors on deployment-aqs03 is OK: OK: Less than 1.00% above the threshold [0.0] [12:38:10] RECOVERY - Puppet errors on deployment-aqs02 is OK: OK: Less than 1.00% above the threshold [0.0] [12:40:29] RECOVERY - Puppet errors on deployment-aqs01 is OK: OK: Less than 1.00% above the threshold [0.0] [12:57:12] !log cherry-pick https://gerrit.wikimedia.org/r/#/c/353282/ [12:57:15] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [13:44:46] 06Release-Engineering-Team (Deployment-Blockers), 05MW-1.30-release-notes (WMF-deploy-2017-05-09_(1.30.0-wmf.1)), 05Release: MW-1.30.0-wmf.1 deployment blockers - https://phabricator.wikimedia.org/T162954#3255112 (10Addshore) [13:46:39] Yippee, build fixed! 
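For the "measure the TCP time waits" idea at 08:55, a minimal sketch of checking it directly on deployment-jobrunner02 (the default redis port 6379 is an assumption; the graphite series deployment-prep.deployment-jobrunner02.network.connections.TIME_WAIT checked later tracks the same thing over time):

    # sockets in TIME-WAIT toward the redis job queue (port 6379 assumed)
    ss -tan state time-wait | grep -c ':6379'
    # total TIME-WAIT count every 5s, to compare before/after the jobqueue-labs.php change
    watch -n5 'ss -tan state time-wait | wc -l'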
[13:46:39] Project selenium-VisualEditor » firefox,beta,Linux,BrowserTests build #394: 09FIXED in 2 min 38 sec: https://integration.wikimedia.org/ci/job/selenium-VisualEditor/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/394/ [13:57:51] There are quite some reports today about new JS/Gadget/RL breakage which I fail to debug (no helpful output in browser's DevTools). See https://phabricator.wikimedia.org/maniphest/?ids=165040,165031,165015#R for a list so far. [13:58:22] ^ Krinkle: FYI (if you have somebody better/else in mind please share names :) [14:16:51] Krinkle,AaronSchulz - https://gerrit.wikimedia.org/r/#/c/353247/1/wmf-config/jobqueue-labs.php got merged and it is now on deployment-jobrunner02.deployment-prep.eqiad.wmflabs, but I am not really seeing less connections in TIME-WAIT as I was expecting.. Am I missing something or is it intended to be in this way? [14:29:36] PROBLEM - Host deployment-phab02 is DOWN: CRITICAL - Host Unreachable (10.68.19.232) [14:41:52] (I am checking deployment-prep.deployment-jobrunner02.network.connections.TIME_WAIT) [14:42:00] (in https://graphite-labs.wikimedia.org/) [15:20:12] Going to deploy a fix for T165011 (cc twentyafterfour ) [15:20:12] T165011: Global default 'hard' is invalid for field oresDamagingPref - https://phabricator.wikimedia.org/T165011 [15:38:51] 06Release-Engineering-Team (Deployment-Blockers), 05MW-1.30-release-notes (WMF-deploy-2017-05-09_(1.30.0-wmf.1)), 05Release: MW-1.30.0-wmf.1 deployment blockers - https://phabricator.wikimedia.org/T162954#3255500 (10Ladsgroup) [15:49:11] 10Scap, 13Patch-For-Review: scap should always announce when it halts a sync due to error rate - https://phabricator.wikimedia.org/T164981#3255532 (10thcipriani) 05Open>03Resolved [16:17:37] 10Continuous-Integration-Infrastructure, 10MediaWiki-Unit-tests, 10Wikibase-Quality, 10Wikibase-Quality-Constraints, and 2 others: Segmentation fault in mwext-testextension-hhvm-composer-jessie builds - https://phabricator.wikimedia.org/T165064#3255706 (10Lucas_Werkmeister_WMDE) [16:51:03] Mmm could someone remind me of whom I might ask to get permissions on the beta cluster (short of just updating the beta cluster db directly, which I suppose I could do)? [16:51:08] thx in advance!!! [16:51:29] AndyRussG: If you've got access, you can just do that [16:51:40] But what permissions do you want/need on what wikis? :) [16:52:48] Reedy: Ah K, thx... Mmm I need to give User:Pcoombe (WMF) Central notice administrator rights on meta.wikimedia.beta.wmflabs.org [16:53:38] Yeah I do have ssh access, so if directly updating the db is the "right" way... [16:54:00] 16:53, 11 May 2017 Reedy (talk | contribs | block) changed group membership for Pcoombe (WMF) from petitiondata to petitiondata and central notice administrator [16:54:17] It's not the right way, but no one is likely to care if you were to do so [16:54:26] Ah K... [16:54:54] Reedy: thx so much!!!!! :) [16:55:28] no problem! [16:55:30] * AndyRussG hides puritan concerns about correctness behind a rock 8p [16:55:54] :) [16:55:58] If you did it on the production wikis... 
Unless for very good reason, yes someone would probably complain :P [16:57:10] Heh yeah rightly so [16:57:48] Reedy: I complain regardless xD [16:58:21] 06Release-Engineering-Team (Deployment-Blockers), 05MW-1.30-release-notes (WMF-deploy-2017-05-09_(1.30.0-wmf.1)), 05Release: MW-1.30.0-wmf.1 deployment blockers - https://phabricator.wikimedia.org/T162954#3255927 (10matmarex) [17:04:10] PROBLEM - Puppet errors on integration-slave-docker-1000 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [17:19:17] is T165069 within releng's scope? [17:19:19] T165069: Update swat deployers documation - https://phabricator.wikimedia.org/T165069 [17:31:44] Zppix: yes [17:32:00] Zppix: a better explainantion of the problem would be helpful. What was missing etc. [17:33:06] otherwise that's simply bug1/task2001 [17:39:07] RECOVERY - Puppet errors on integration-slave-docker-1000 is OK: OK: Less than 1.00% above the threshold [0.0] [17:40:18] Hey, https://integration.wikimedia.org/ci/job/mediawiki-extensions-hhvm-jessie looks pretty unwell (about 90% of jobs are failing with "Lost parent, LightProcess exiting"). Nothing looked likely on Phab directly, though T145819 has the same error. [17:40:19] T145819: Jobs invoking SiteConfiguration::getConfig cause HHVM to fail updating the bytecode cache due to being filesize limited to 512MBytes - https://phabricator.wikimedia.org/T145819 [17:41:12] It's likely it's hhvm related, indeed [17:41:20] Upgrade test of 3.18.2 [17:42:24] 10Continuous-Integration-Infrastructure, 10Wikidata, 07HHVM, 07Jenkins: CI tests failing with segfault - https://phabricator.wikimedia.org/T165074#3256111 (10daniel) [17:42:43] 10Continuous-Integration-Infrastructure, 10Wikidata, 07HHVM, 07Jenkins: CI tests failing with segfault - https://phabricator.wikimedia.org/T165074#3256127 (10daniel) [17:43:20] Ah. [17:44:16] 10Continuous-Integration-Infrastructure, 10Wikidata, 07HHVM, 07Jenkins: CI tests failing with segfault - https://phabricator.wikimedia.org/T165074#3256111 (10Jdforrester-WMF) Not just Wikibase. Errors in VE-MW and MobileFrontend. :-( Can we revert the test for now to see if that fixes it? [17:45:08] ugh, that error [17:45:27] I think it's not so easy to revert... As it's in the apt repo [17:45:49] Oh. Did we just do a cluster upgrade of HHVM? [17:45:56] Not just [17:45:58] Earlier today [17:46:02] And in production, it's not all servers [17:46:03] So, yes. [17:46:04] who? moritz? [17:46:15] or joe? [17:46:24] Moritz [17:46:41] I know there was some prod testing of the new HHVM which had been going well, but I hadn't heard anything for a couple of weeks. 
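A quick sketch for confirming which HHVM build a given CI slave or beta host actually ended up with, and from which apt component it was pulled (plain Debian tooling, nothing WMF-specific assumed):

    hhvm --version            # the installed HHVM build
    apt-cache policy hhvm     # installed vs. candidate version, and the repo (main vs. experimental) offering it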
[17:46:43] I guess, specifically this is the problem [17:46:45] 15:18 moritzm: uploaded HHVM 3.18.2 and HHVM extensions to apt.wikimedia.org/main (previously only in experimental) [17:46:48] 06Release-Engineering-Team (Deployment-Blockers), 05MW-1.30-release-notes (WMF-deploy-2017-05-09_(1.30.0-wmf.1)), 05Release: MW-1.30.0-wmf.1 deployment blockers - https://phabricator.wikimedia.org/T162954#3256150 (10Krinkle) [17:46:51] from yesterday [17:49:19] 10Continuous-Integration-Infrastructure, 10Wikidata, 07HHVM, 07Jenkins: CI tests failing with segfault - https://phabricator.wikimedia.org/T165074#3256153 (10greg) @MoritzMuehlenhoff HHVM upgrade is causing segfaults in CI [17:49:35] 10Continuous-Integration-Infrastructure, 06Operations, 10Wikidata, 07HHVM, 07Jenkins: CI tests failing with segfault - https://phabricator.wikimedia.org/T165074#3256158 (10greg) [17:53:50] 10Continuous-Integration-Infrastructure, 06Operations, 10Wikidata, 07HHVM, 07Jenkins: CI tests failing with segfault - https://phabricator.wikimedia.org/T165074#3256111 (10Paladox) May be related T165043 [17:55:41] 06Release-Engineering-Team, 07Documentation, 15User-Zppix: Update swat deployers documentation - https://phabricator.wikimedia.org/T165069#3256178 (10Zppix) per @greg in IRC [17:56:47] 06Release-Engineering-Team (Deployment-Blockers), 05MW-1.30-release-notes (WMF-deploy-2017-05-09_(1.30.0-wmf.1)), 05Release: MW-1.30.0-wmf.1 deployment blockers - https://phabricator.wikimedia.org/T162954#3256193 (10Krinkle) [17:58:47] 06Release-Engineering-Team, 07Documentation, 15User-Zppix: Update swat deployers documentation - https://phabricator.wikimedia.org/T165069#3256197 (10Zppix) [17:58:59] greg-g: Updated task description ^ [17:59:05] 10Continuous-Integration-Infrastructure, 06Operations, 10Wikidata, 07HHVM, 07Jenkins: CI tests failing with segfault - https://phabricator.wikimedia.org/T165074#3256198 (10greg) Can someone create a simple repo case? Or at least a backtrace? [17:59:30] greg-g: Hey, added another window for cleaning the table: https://wikitech.wikimedia.org/wiki/Deployments#Week_of_May_8th I hope that's okay for you [17:59:51] typical, ci-jessie-wikimedia-658297 has disappeared [18:01:24] sorry for mistagging greg :/ I must of misunderstood your answer to my first question. [18:02:58] Hmmmmmmm [18:03:05] well, you asked about SWAT deploys, which is us, but your actual question/issue is with the script, not SWATs [18:03:12] typical X/Y problem [18:04:18] Zppix: you aren't making any sense. [18:04:29] How do we ssh onto the ci slaves? [18:04:49] greg-g: What do you have questions upon? [18:05:04] exactly* [18:05:24] THAT'S WHAT I'M ASKING YOU [18:06:08] Zppix: seriously, you don't appear to know what the issue is, so please just let others take care of it if they need to. [18:06:48] 10Continuous-Integration-Infrastructure, 05Security: SSH Host Key Verifiers are not configured for all SSH slaves on this Jenkins instance - https://phabricator.wikimedia.org/T165075#3256226 (10Reedy) [18:27:07] 10Continuous-Integration-Infrastructure, 06Operations, 10Wikidata, 07HHVM, 07Jenkins: CI tests failing with segfault - https://phabricator.wikimedia.org/T165074#3256310 (10Reedy) >>! In T165074#3256198, @greg wrote: > Can someone create a simple repo case? Or at least a backtrace? Do we have a jessie hh... [18:27:47] Reedy: a host in beta cluster I guess? 
[18:28:09] Preferably somewhere we can trivially run the php unit script under gdb or something [18:29:54] I guess beta cluster should work [18:29:56] * Reedy looks at tin [18:33:55] No phpunit installed on tin [18:37:49] Reedy: Is this failure related to that? https://integration.wikimedia.org/ci/job/mwext-testextension-hhvm-composer-jessie/2278/console [18:38:44] "Segmentation fault" [18:39:07] Very likely, yeah [18:41:29] I saw that and guessed it was a different HHVM upgrade bug. [18:41:31] But yes. [18:43:37] (03PS1) 10Reedy: Branch LoginNotify [tools/release] - 10https://gerrit.wikimedia.org/r/353348 [18:43:42] Niharika: ^ [18:44:43] Reedy: Thanks! I also need to add it to extension-list too? [18:44:59] Yup, remove it from extension-list-labs too [18:45:28] Reedy: What all does one need to do for adding a new extension? 1. Add to tools/release 2. Add to extension-list 3. Add to CS/IS [18:45:30] Anything else? [18:47:42] https://wikitech.wikimedia.org/w/index.php?title=How_to_deploy_code [18:48:04] That's pretty much it... [18:48:27] tools/release needed because whereas beta has access to all the extensions, production only has that list [18:48:50] extension-list (or moving from beta to productions -- beta uses productions too) for scap and localisation update to include the messages [18:48:55] Then CS/IS for loading/configuration [18:50:00] greg-g: I wonder if beta should just have a server provisioned like tin but with phpunit... Or just put phpunit on tin? [18:50:13] * greg-g shrugs [18:50:17] probably easier to do option 2 [18:50:28] the ci boxes include it via composer [18:50:39] I'm sure we don't really want to apt-get install it... or pear [18:50:43] * Reedy wget's phpunit.phar [18:52:19] 10Continuous-Integration-Infrastructure, 06Operations, 10Wikidata, 07HHVM, 07Jenkins: CI tests failing with segfault - https://phabricator.wikimedia.org/T165074#3256387 (10Jdforrester-WMF) p:05Triage>03High This is at least High, as it's stopping merges into master in most repos. [18:52:22] 10Continuous-Integration-Infrastructure, 10MediaWiki-Unit-tests, 10Wikibase-Quality, 10Wikibase-Quality-Constraints, and 2 others: Segmentation fault in mwext-testextension-hhvm-composer-jessie builds - https://phabricator.wikimedia.org/T165064#3255706 (10Mattflaschen-WMF) I'm getting them as well: E.g. h... [18:52:59] Of course, this is made harder by having no phpunit.php flag for the phar anymoe [18:53:17] So i have this in my dev wiki... [18:53:18] if ( defined( 'MW_PHPUNIT_TEST' ) && MW_PHPUNIT_TEST ) { [18:53:18] include_once ( '/var/www/wiki/mediawiki/phpunit-old.phar' ); [18:53:18] } [18:55:56] Why do I feel I'm over thinking this [18:57:49] Reedy: Is there a way to resurrect a merged (and the reverted) patch? Like https://gerrit.wikimedia.org/r/#/c/351195/ [18:57:58] Or do I have to make a new one? [18:57:59] 10Continuous-Integration-Infrastructure, 10MediaWiki-Unit-tests, 10Wikibase-Quality, 10Wikibase-Quality-Constraints, and 2 others: Segmentation fault in mwext-testextension-hhvm-composer-jessie builds - https://phabricator.wikimedia.org/T165064#3256438 (10Paladox) [18:58:01] 10Continuous-Integration-Infrastructure, 06Operations, 10Wikidata, 07HHVM, 07Jenkins: CI tests failing with segfault - https://phabricator.wikimedia.org/T165074#3256436 (10Paladox) [18:58:13] Niharika: You can revert the revert [18:58:27] And edit the commit summary, to make it nicer [18:58:35] Ah, nice. 
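Condensing the new-extension checklist above into one rough sketch (file names follow the tools/release and mediawiki-config conventions; the branch name, URL and scap invocation are illustrative, and the submodule step is the manual "add it to the branch" part discussed a bit later):

    # 1. tools/release: add the extension to make-wmf-branch's config so future wmf branches include it
    # 2. mediawiki-config: move it from wmf-config/extension-list-labs to wmf-config/extension-list,
    #    then wire up wfLoadExtension()/settings in CommonSettings.php and InitialiseSettings.php
    # 3. for an already-cut branch, add the submodule by hand on the deployment host and sync:
    cd /srv/mediawiki-staging/php-1.30.0-wmf.1/extensions
    git submodule add https://gerrit.wikimedia.org/r/mediawiki/extensions/LoginNotify
    git commit -m "Add LoginNotify submodule"
    scap sync "Add LoginNotify to wmf.1"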
[18:58:38] "Enable LoginNotify on testwiki (take 2)" [18:58:39] or similar [19:00:19] And of course, tin doesn't use hhvm by default [19:01:45] Niharika: I didn't realize you were deploying LoginNotify today [19:02:30] kaldari: I added it to the swat but didn't add it to tools/release. [19:02:35] So reverted for now. [19:02:42] Can retry in evening swat. [19:02:59] You'll need to manually add it to the branch, and run scap too obviously when you want to deploy it [19:03:04] Niharika: Yeah, I wrote up some instructions at https://phabricator.wikimedia.org/T165007 [19:03:48] Niharika: There is more documentation here: https://wikitech.wikimedia.org/wiki/How_to_deploy_code#Add_new_extension_to_extension-list_and_release_tools [19:05:17] Niharika: Sorry, I didn't mention that part to you :P [19:25:06] PROBLEM - Puppet errors on deployment-puppetmaster02 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [19:44:53] 10Beta-Cluster-Infrastructure, 10Continuous-Integration-Infrastructure: Have a way to run phpunit (etc) manually on a machine in beta - https://phabricator.wikimedia.org/T165088#3256575 (10Reedy) [19:51:57] Well...] [19:52:06] I've got a fairly minimal replication case for facebook [19:52:09] use our vagrant [19:52:12] run phpunit in hhvm [19:52:15] segfault [19:53:09] Reedy: I put all of the related changes on https://gerrit.wikimedia.org/r/#/c/353352/ (all changes in config that is) [19:53:30] Reedy: How do I "manually add it to the branch"? [19:53:38] Niharika: git submodule add... [19:53:45] I think it's on the how to deploy code page [19:53:53] Ah, okay. [19:54:25] Or it was, it may have been removed at some point :P [19:54:31] Reedy: MaxSem and ebernhardson were pretty good at getting useful traces for HHVM crashes in the past if I'm remembering correctly [19:55:09] I've just gotta wait for 500MB of debug stuff to install ;) [19:55:57] PROBLEM - Puppet errors on deployment-mira is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [20:01:24] ^ aww, why now? [20:01:32] was fine yesterday [20:04:07] Notice: /Stage[main]/Confd/Base::Service_unit[confd]/Service[confd]/ensure: ensure changed 'stopped' to 'running' [20:04:13] Notice: Finished catalog run in 96.21 seconds [20:05:05] RECOVERY - Puppet errors on deployment-puppetmaster02 is OK: OK: Less than 1.00% above the threshold [0.0] [20:06:06] .. ok then .. [20:16:00] RECOVERY - Puppet errors on deployment-mira is OK: OK: Less than 1.00% above the threshold [0.0] [20:16:38] Getting a segmentation fault from mw-phpunit.sh: https://gerrit.wikimedia.org/r/#/c/353341/ [20:17:07] Ah, looks like it's due to https://phabricator.wikimedia.org/T165074 [20:17:08] kaldari: known issue atm [20:17:09] :) [20:17:11] :( [20:17:24] :| [20:40:20] 10Continuous-Integration-Infrastructure, 06Operations, 10Wikidata, 07HHVM, 07Jenkins: CI tests failing with segfault - https://phabricator.wikimedia.org/T165074#3256780 (10hashar) p:05High>03Unbreak! That is caused by the upgrade of HHVM {T158176}. 3.18 has been uploaded to apt.wikimedia.org under j... [20:42:21] 10Continuous-Integration-Infrastructure, 06Operations, 10Wikidata, 07HHVM, 07Jenkins: CI tests failing with segfault - https://phabricator.wikimedia.org/T165074#3256789 (10hashar) The snapshots we have: | ID | Provider | Image | Hostname | Version | Image ID... [20:43:40] !log nodepool: delete today jessie image snapshot. It comes with HHVM 3.18 which segfault with MediaWiki/PHPUnit. Rolled back to snapshot-ci-jessie-1494425642 from 30 hours ago. 
T165074 [20:43:44] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [20:43:44] T165074: CI tests failing with segfault - https://phabricator.wikimedia.org/T165074 [20:44:05] thcipriani: ^^ [20:44:21] heh [20:44:23] some new version of HHVM ends up segfaulting on PHPUnit so I have deleted nodepool jessie image [20:44:29] yeah, we know [20:44:30] :P [20:44:35] in theory it should rollback to the image from yesterday [20:44:40] 10Continuous-Integration-Infrastructure, 06Operations, 10Wikidata, 07HHVM, 07Jenkins: CI tests failing with segfault - https://phabricator.wikimedia.org/T165074#3256794 (10greg) @MoritzMuehlenhoff we should probably downgrade the HHVM version from Beta and CI and work on repro'ing elsewhere. This is prev... [20:44:44] I'm currently fighting it to get a backtrace out of it [20:44:51] ah, right, I remember that "fix" from last time we upgraded hhvm :\ [20:45:21] moritz told me there is a known bug in the new version but systemd restarts it and it's not critical for now [20:45:52] so some (automatic) restarts would still be in the known category, but manual ones should not be needed [20:45:53] yeah [20:45:58] We've possibly found other segfaults [20:46:02] ugh, ok [20:46:33] trying to get a backtrace... to see if it's one of the others we know about [20:46:41] and if so, a minimal replication case [20:46:43] or if it's a new one [20:46:55] mutante: i would need the old hhvm 3.12 to be uploaded to jessie-wikimedia/main [20:47:09] else hhvm 3.18 is going to be reinstalled again tomorrow :/ [20:49:11] 10Continuous-Integration-Infrastructure, 06Operations, 10Wikidata, 07HHVM, 07Jenkins: CI tests failing with segfault - https://phabricator.wikimedia.org/T165074#3256803 (10hashar) Jessie instances are now being booted from `snapshot-ci-jessie-1494425642` which should have the previous HHVM version. What... [20:50:01] hashar: if it is urgent for right now i'd rather make the phone call, instead of trying to downgrade and remove new version from reprepro. [20:50:21] 10Continuous-Integration-Infrastructure, 06Operations, 10Wikidata, 07HHVM, 07Jenkins: CI tests failing with segfault - https://phabricator.wikimedia.org/T165074#3256804 (10Reedy) Ok, so a clean vagrant vm (with 4GB ram!), will segfault by running phpunit with no extensions From gdb attached... ``` Cont... [20:50:21] Reedy: should be good now [20:50:38] mutante: well I think CI is fine now [20:50:39] 10Continuous-Integration-Infrastructure, 06Operations, 10Wikidata, 07HHVM, 07Jenkins: CI tests failing with segfault - https://phabricator.wikimedia.org/T165074#3256805 (10MoritzMuehlenhoff) We can't easily downgrade the HHVM package in the main repo, it's otherwise working fine in production and running... [20:50:42] i have tried that for releases/misc before and became a looong issue.. including caching [20:50:49] hashar: pheew. ok! great [20:50:49] mutante: I will circle back with moritz tomorrow morning [20:50:58] hashar: awesome [20:53:32] 10Continuous-Integration-Infrastructure, 06Operations, 10Wikidata, 07HHVM, 07Jenkins: CI tests failing with segfault - https://phabricator.wikimedia.org/T165074#3256807 (10greg) We should really use the new HHVM in testing first before going to production. If the tests are broken it means fix the tests/t... 
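The minimal reproduction described at 19:52 and in the 20:50 task comment above, as a sketch (MediaWiki-Vagrant paths; the VM wants ~4GB of RAM and no extra extensions loaded):

    cd /vagrant/mediawiki
    hhvm tests/phpunit/phpunit.php --wiki=wiki
    # all ~14947 tests run to completion, then HHVM segfaults while sweeping memory at shutdown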
[20:56:19] 10Continuous-Integration-Infrastructure, 06Operations, 10Wikidata, 07HHVM, 07Jenkins: CI tests failing with segfault - https://phabricator.wikimedia.org/T165074#3256810 (10hashar) p:05Unbreak!>03High a:03hashar CI instances have been rollbacked to the last known snapshot which uses HHVM 3.12.14. I... [20:56:57] mutante: yeah new jobs definitely run on 3.12 so it is all fine and there is no need to page :] [20:59:09] 10Continuous-Integration-Infrastructure, 06Operations, 10Wikidata, 07HHVM, 07Jenkins: CI tests failing with segfault - https://phabricator.wikimedia.org/T165074#3256816 (10MoritzMuehlenhoff) The new HHVM version has been extensively tested on five canary servers in production for 5-6 week now. As per Ree... [21:00:01] hashar: yay:) [21:01:29] mutante: moritz has been in touch with me about it almost on a daily basis [21:01:35] we will figure out something tomorrow :] [21:02:17] it's fun that we had a bug when running wikipedia and then just running the testsuite exposes it :P [21:03:02] Platonides: yuuup ;) [21:03:13] Can't seem to identify which test is causing it to segfault [21:03:15] It's one of the last few [21:03:38] ........................................................... 14927 / 14947 ( 99%) [21:03:38] .................... [21:03:58] maybe it's related to the cleanup rather than the actual tests executed ? [21:04:03] It kinda seems to be [21:04:06] there's 20 dots after [21:04:20] so, all the tests are running [21:04:20] https://github.com/facebook/hhvm/issues/7779#issuecomment-300914747 [21:04:39] I guess these are telling [21:04:40] #0 HPHP::UserFile::close (this=0x7f1afd461090) at /tmp/buildd/hhvm-3.18.2+dfsg/hphp/runtime/base/user-file.cpp:212 [21:04:40] #1 0x0000000001a74ec2 in HPHP::XMLReader::close (this=0x7f1afd411ec0) at /tmp/buildd/hhvm-3.18.2+dfsg/hphp/runtime/ext/xmlreader/ext_xmlreader.cpp:95 [21:04:40] #2 0x0000000002133130 in HPHP::MemoryManager::sweep (this=0x7f1b25a8c840, this@entry=) [21:04:40] at /tmp/buildd/hhvm-3.18.2+dfsg/hphp/runtime/base/memory-manager.cpp:471 [21:05:00] I had seen the comment :) [21:05:16] seems a bug in xml extension [21:08:07] Yup, narrowed case [21:08:08] tests/phpunit/includes/import/ImportTest.php [21:10:04] How do I get phpunit to run individual tests in the file? [21:11:06] --filter apparently.. [21:11:25] testUnknownXMLTags [21:17:01] Reedy: I dont think it is a specific test. it is most probably late when hhvm clean up the memory [21:17:10] hashar: I know [21:17:16] ok ok :] [21:17:20] But I'm narrowing the test case for hhvm people [21:17:29] Rather than saying "run all our phpunit tests!" [21:17:30] :P [21:17:30] neat! [21:17:42] they're not quick to run, as we know [21:17:44] that seems like a test which doesn't even need a db [21:17:56] so having only one test file to run... which takes seconds [21:18:11] so that would help building the build environment [21:18:21] Reedy: have you seen https://phabricator.wikimedia.org/T156923 ? "New HHVM 3.12.11 segfault at end of MediaWiki PHPUnit tests" [21:18:30] mentions xmlreader as well [21:18:51] The stack trace looks very similar [21:19:13] Reedy: that tasks has log of my debugging / repro journey [21:19:30] Of course, this just means it's not been fixed in hhvm ;) [21:19:45] hhvm -v Eval.Jit=false tests/phpunit/phpunit.php tests/phpunit/includes/import/ [21:19:48] try that one maybe? 
[21:20:03] that should hit includes/import/WikiImporter.php / XMLReader [21:20:09] it might just be that bug surfacing again [21:20:13] [492dbf97df60049a66692f6d] [no req] Wikimedia\Rdbms\DBConnectionError from line 769 of /vagrant/mediawiki/includes/libs/rdbms/database/Database.php: Cannot access the database: Unknown database 'tests/phpunit/includes/import/' (127.0.0.1) [21:20:14] :D [21:20:33] ???!!!! [21:21:40] vagrant [21:22:00] 10Continuous-Integration-Infrastructure, 06Operations, 10Wikidata, 07HHVM, 07Jenkins: CI tests failing with segfault - https://phabricator.wikimedia.org/T165074#3256869 (10hashar) Might be {T156923} surfacing again which mentionned XMLReader. [21:22:02] ;) [21:22:06] hhvm -v Eval.Jit=false tests/phpunit/phpunit.php --wiki=wiki tests/phpunit/includes/import/ [21:22:18] But I don't get core dumps [21:22:22] Which is annoying as hell [21:22:40] gotta enable them and set the core file max size [21:22:55] ResourceLimit.CoreFileSize + some ulimit [21:23:02] ulimit -c unlimited [21:23:04] that's enough [21:23:07] ah [21:23:08] :] [21:23:14] before... the vm just didn't have enough memory :P [21:25:56] "/vagrant/mediawiki/core": not in executable format: File format not recognized [21:26:00] silly hhvm-gdb [22:00:50] 06Release-Engineering-Team (Deployment-Blockers), 05Release: MW-1.30.0-wmf.2 deployment blockers - https://phabricator.wikimedia.org/T163512#3257029 (10mmodell) [22:09:38] 06Release-Engineering-Team (Deployment-Blockers), 05MW-1.30-release-notes (WMF-deploy-2017-05-09_(1.30.0-wmf.1)), 05Release: MW-1.30.0-wmf.1 deployment blockers - https://phabricator.wikimedia.org/T162954#3257055 (10mmodell) [23:15:12] 06Release-Engineering-Team (Deployment-Blockers), 05MW-1.30-release-notes (WMF-deploy-2017-05-09_(1.30.0-wmf.1)), 05Release: MW-1.30.0-wmf.1 deployment blockers - https://phabricator.wikimedia.org/T162954#3257236 (10mmodell) 05Open>03Resolved [23:42:27] PROBLEM - Long lived cherry-picks on puppetmaster on deployment-puppetmaster02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
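Pulling the narrowed-down repro and core-dump steps from 21:08-21:26 together as one sketch (where the core file lands depends on kernel.core_pattern; /usr/bin/hhvm is the usual Debian path, and the hhvm-gdb wrapper mentioned at 21:26 is an alternative front end):

    cd /vagrant/mediawiki
    ulimit -c unlimited        # as noted at 21:23, enough to get a core file
    hhvm -v Eval.Jit=false tests/phpunit/phpunit.php --wiki=wiki tests/phpunit/includes/import/
    # or narrow to the single suspect test with --filter:
    hhvm tests/phpunit/phpunit.php --wiki=wiki --filter testUnknownXMLTags tests/phpunit/includes/import/ImportTest.php
    gdb /usr/bin/hhvm core     # then `bt` for the backtrace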