[00:00:40] Continuous-Integration-Config, MediaWiki-General-or-Unknown, MobileFrontend, Patch-For-Review: MediaWiki core and MobileFrontend break branches REL1_25 and fundraising/REL1_25 and REL1_26 and REL1_27 tests - https://phabricator.wikimedia.org/T135906#2316908 (Bawolff) So, I can't figure out the re...
[00:01:15] Continuous-Integration-Config, MediaWiki-General-or-Unknown, MobileFrontend, Patch-For-Review: MediaWiki core and MobileFrontend break branches REL1_25 and fundraising/REL1_25 and REL1_26 and REL1_27 tests - https://phabricator.wikimedia.org/T135906#2316911 (Bawolff) >>! In T135906#2316881, @gerr...
[02:38:21] PROBLEM - Parsoid on deployment-parsoid06 is CRITICAL: Connection refused
[04:04:32] Project selenium-MultimediaViewer » safari,beta,OS X 10.9,contintLabsSlave && UbuntuTrusty build #22: FAILURE in 8 min 31 sec: https://integration.wikimedia.org/ci/job/selenium-MultimediaViewer/BROWSER=safari,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=OS%20X%2010.9,label=contintLabsSlave%20&&%20UbuntuTrusty/22/
[04:17:26] Project selenium-MultimediaViewer » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #22: FAILURE in 21 min: https://integration.wikimedia.org/ci/job/selenium-MultimediaViewer/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/22/
[04:49:28] MediaWiki-Codesniffer: [GSoC 2016]Phabricator tasks (tracker) - https://phabricator.wikimedia.org/T135966#2317032 (Lethexie)
[04:57:59] MediaWiki-Codesniffer: Community Bonding Report for Improving static analysis tools for MediaWiki - https://phabricator.wikimedia.org/T135393#2317049 (Lethexie)
[07:23:18] MediaWiki-Codesniffer: [GSoC 2016]Phabricator tasks (tracker) - https://phabricator.wikimedia.org/T135966#2317032 (Danny_B) @Lethexie Could you please elaborate more on the task description and its purpose? Thank you.
[07:35:31] MediaWiki-Codesniffer: [GSoC 2016] Improving static analysis tools for MediaWiki Phabricator tasks (tracker) - https://phabricator.wikimedia.org/T135966#2317344 (Lethexie)
[07:38:17] MediaWiki-Codesniffer: [GSoC 2016] Improving static analysis tools for MediaWiki Phabricator tasks (tracker) - https://phabricator.wikimedia.org/T135966#2317346 (Lethexie) @Danny_B thanks and is it ok now?
[08:12:48] MediaWiki-Codesniffer, Tracking: [GSoC 2016] Improving static analysis tools for MediaWiki Phabricator tasks (tracker) - https://phabricator.wikimedia.org/T135966#2317414 (Lethexie)
[08:16:46] MediaWiki-Codesniffer, Tracking: [GSoC 2016] Improving static analysis tools for MediaWiki Phabricator tasks (tracking) - https://phabricator.wikimedia.org/T135966#2317416 (Lethexie)
[08:54:40] !log Regenerating Nodepool image manually. Broke over the week-end due to a hhvm/libicu transition. Should get pip 8.1.x now
[08:54:45] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[08:56:50] <_joe_> !log deployment-prep: starting upgrade of HHVM to a version linked to libicu52, T86096
[08:56:51] T86096: Switch HAT appservers to trusty's ICU (or newer) - https://phabricator.wikimedia.org/T86096
[08:56:55] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[08:57:57] <_joe_> hashar: where in beta should I run scripts from?
[08:58:38] <_joe_> as in php scripts
[09:01:18] !log Image ci-trusty-wikimedia-1463993508 in wmflabs-eqiad is ready
[09:01:23] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[09:01:31] _joe_: usually deployment-tin.eqiad.wmnet
[09:01:31] <_joe_> hashar: ^^ :)
[09:01:38] <_joe_> oh ok sorry, syncronicity :P
[09:01:40] which has Jenkins jobs running scap / l10n etc automatically
[09:01:46] <_joe_> ok
[09:01:57] we have a salt master on deployment-salt.deployment-prep.eqiad.wmflabs
[09:02:05] with app servrs being deployment-mediawiki*
[09:02:11] <_joe_> the wiki list for beta is where?
[09:02:15] and some others like deployment-jobrunner*
[09:02:39] operations/mediawiki-config.git dblist/all-labs.dblist iirc
[09:02:49] <_joe_> ok thanks
[09:02:49] there might be an up to date map on one of the page
[09:03:18] _joe_: lame one at http://deployment.wikimedia.beta.wmflabs.org/wiki/Special:SiteMatrix
[09:03:27] lame cause there is lot of red links hehe
[09:03:50] and https://logstash-beta.wmflabs.org/#/dashboard/elasticsearch/fatalmonitor might be handy
[09:04:10] as well as https://integration.wikimedia.org/ci/view/Beta/ which are the jobs updating beta
[09:10:36] <_joe_> hashar: so there is no deployment-terbium?
[09:10:43] <_joe_> a host where you can run crons?
[09:10:47] nop
[09:11:11] we have a task about having the puppet mediawiki::maintenance:scripts applied somewhere
[09:11:20] which enable the cron jobs
[09:11:30] !log Image ci-jessie-wikimedia-1463994307 in wmflabs-eqiad is ready
[09:11:35] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[09:11:49] <_joe_> ok cool
[09:11:57] https://phabricator.wikimedia.org/T125976 Run mediawiki::maintenance scripts?
[09:12:29] apparently due to most of those cron / scripts being hardcoded to use production dblist instead of beta ones
[09:12:49] <_joe_> !log deployment-prep: all hhvm hosts in beta upgraded to run on the newer libicu; now running updateCollation.php (T86096)
[09:12:49] T86096: Switch HAT appservers to trusty's ICU (or newer) - https://phabricator.wikimedia.org/T86096
[09:12:54] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[09:13:15] bunch of " Lost parent, LightProcess exiting" but I guess that is related to the restart
[09:23:30] <_joe_> hashar: yes
[09:23:42] <_joe_> that is what hhvm spits out every time you stop it
[09:23:52] <_joe_> in fact, I should make a patch to that :P
[09:24:27] would be rather nice to have yeah :)
[09:24:36] bonus point if you get a nice log like "hhvm restarting"
[09:26:20] <_joe_> well, no
[09:26:24] <_joe_> it's not telling you that
[09:26:38] <_joe_> formally, the process has just lost the handle to its parent
[10:22:32] (CR) Zfilipin: [C: 2] Matt and Sam are owners of selenium-GettingStarted job [integration/config] - https://gerrit.wikimedia.org/r/289828 (https://phabricator.wikimedia.org/T134492) (owner: Zfilipin)
[10:23:37] (Merged) jenkins-bot: Matt and Sam are owners of selenium-GettingStarted job [integration/config] - https://gerrit.wikimedia.org/r/289828 (https://phabricator.wikimedia.org/T134492) (owner: Zfilipin)
[10:54:07] Beta-Cluster-Infrastructure, Labs, Operations, Traffic: deployment-cache-upload04 (m1.medium) / is almost full - https://phabricator.wikimedia.org/T135700#2317668 (hashar) I have checked after the week-end and deployment-cache-upload04 shows the FD leak. Via `lsof -X -n|grep deleted`: * Lot of...
[11:01:55] !log Upgrading hhvm on Trusty slaves.
Bring him hhvm compiled against libicu52 instead of libicu48
[11:02:01] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[11:05:19] Beta-Cluster-Infrastructure, Labs, Operations, Traffic: deployment-cache-upload04 (m1.medium) / is almost full - https://phabricator.wikimedia.org/T135700#2317675 (Joe) @hashar the reason you see all those deleted "varnishd" lines is that varnish has been updated on disk but not restarted, which...
[11:10:03] Beta-Cluster-Infrastructure, Labs, Operations, Traffic: deployment-cache-upload04 (m1.medium) / is almost full - https://phabricator.wikimedia.org/T135700#2317678 (Joe) So the problem - that we have in production too (!!!) is that the logrotate receipt calls ``` invoke-rc.d varnishlog reload ```...
[11:11:29] Beta-Cluster-Infrastructure, Labs, Operations, Traffic: Varnishlog doesn't properly rotates logs, varnish.log is empty since forever (was: deployment-cache-upload04 (m1.medium) / is almost full) - https://phabricator.wikimedia.org/T135700#2317679 (Joe) p:Low>High
[12:17:55] Beta-Cluster-Infrastructure, Commons, MediaWiki-File-management, Multimedia, Tracking: Thumbnail generation should happen via the same setup in the beta cluster and in production (tracking) - https://phabricator.wikimedia.org/T84950#2318115 (Danny_B)
[12:18:58] Continuous-Integration-Config, MediaWiki-General-or-Unknown, MobileFrontend, Patch-For-Review, Reading-Web-Sprint-73-O: MediaWiki core and MobileFrontend break branches REL1_25 and fundraising/REL1_25 and REL1_26 and REL1_27 tests - https://phabricator.wikimedia.org/T135906#2318124 (dr0ptp4kt)
[12:38:12] Beta-Cluster-Infrastructure, Labs, Operations, Traffic: Varnishlog doesn't properly rotates logs, varnish.log is empty since forever (was: deployment-cache-upload04 (m1.medium) / is almost full) - https://phabricator.wikimedia.org/T135700#2318200 (Joe) A third option is we just stop
varnishlog as...
[12:51:31] (PS5) Zfilipin: Created selenium-Wikidata Jenkins job [integration/config] - https://gerrit.wikimedia.org/r/289396 (https://phabricator.wikimedia.org/T128190)
[12:51:41] (PS6) Zfilipin: Created selenium-Wikidata Jenkins job [integration/config] - https://gerrit.wikimedia.org/r/289396 (https://phabricator.wikimedia.org/T128190)
[12:52:23] (CR) Zfilipin: "Patch set 5 is adding Adrian as owner of selenium-Wikidata job." [integration/config] - https://gerrit.wikimedia.org/r/289396 (https://phabricator.wikimedia.org/T128190) (owner: Zfilipin)
[12:52:47] (CR) Zfilipin: [C: 2] Created selenium-Wikidata Jenkins job [integration/config] - https://gerrit.wikimedia.org/r/289396 (https://phabricator.wikimedia.org/T128190) (owner: Zfilipin)
[12:53:53] (Merged) jenkins-bot: Created selenium-Wikidata Jenkins job [integration/config] - https://gerrit.wikimedia.org/r/289396 (https://phabricator.wikimedia.org/T128190) (owner: Zfilipin)
[13:00:46] (CR) Zfilipin: "The job is deployed:" [integration/config] - https://gerrit.wikimedia.org/r/289828 (https://phabricator.wikimedia.org/T134492) (owner: Zfilipin)
[13:12:17] Beta-Cluster-Infrastructure, Labs, Operations, Traffic, Patch-For-Review: Varnishlog doesn't properly rotates logs, varnish.log is empty since forever (was: deployment-cache-upload04 (m1.medium) / is almost full) - https://phabricator.wikimedia.org/T135700#2318265 (Joe) Open>Resolved
[13:31:28] Project selenium-Wikidata » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #1: ABORTED in 30 min: https://integration.wikimedia.org/ci/job/selenium-Wikidata/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/1/
[13:32:09] !log Upgrading Jenkins git plugins and restarting Jenkins
[13:32:10] qa-morebots: mog
[13:32:14] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[13:32:14] I am a logbot
running on tools-exec-1206.
[13:32:14] Messages are logged to https://tools.wmflabs.org/sal/releng.
[13:32:14] To log a message, type !log .
[13:33:47] do we have a swift repo available from beta?
[13:47:25] hashar: zeljkof: where does the selenium user creation code live?
[13:48:06] tgr: what do you mean?
[13:49:03] zeljkof: what creates Selenium User after the wiki has been set up?
[13:50:21] tgr: where? in mediawiki-vagrant?
[13:54:51] zeljkof: I'm trying to find the culprit for these errors whoch coincide with enabling AuthManager: https://integration.wikimedia.org/ci/job/mwext-mw-selenium/6607/consoleFull
[13:55:10] they seem to be related to user account creation
[13:55:31] the unexpected API error message is 21:21:22 The token parameter must be set (createnotoken) (MediawikiApi::ApiError)
[13:56:05] and it's either a "Given page X exists" or a "Given I am logged in" rule
[13:56:38] I would like to find out what code raises that and how can I test patches to it
[13:58:07] tgr: I see
[13:58:18] Selenium user exists only on beta cluster
[13:58:48] mwext-mw-selenium creates users as needed, on the fly
[13:58:58] as far as I know
[14:00:01] stack trace does not tell you where there error is raised?
[14:00:38] 00:01:26.357 The token parameter must be set (createnotoken) (MediawikiApi::ApiError)
[14:00:46] 00:01:26.357 ./features/step_definitions/common_steps.rb:30:in `/^I am logged into the mobile website$/'
[14:00:56] looks like common_steps.rb:30
[14:01:29] can you reproduce the error when you run the test locally using mediawiki-vagrant?
[14:16:35] is jenkins having problems (can't connect to local mysql server through socket) ?
[14:17:09] e.g.
random failures https://gerrit.wikimedia.org/r/#/c/290216/
[14:31:45] Continuous-Integration-Infrastructure: integration-slave-trusty-1004 can't connect to mysql - https://phabricator.wikimedia.org/T135997#2318474 (JanZerebecki)
[14:32:18] !log offlined integration-slave-trusty-1004 because it can't connect to mysql T135997
[14:32:19] T135997: integration-slave-trusty-1004 can't connect to mysql - https://phabricator.wikimedia.org/T135997
[14:32:24] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[14:33:52] Project selenium-WikiLove » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #29: FAILURE in 1 min 51 sec: https://integration.wikimedia.org/ci/job/selenium-WikiLove/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/29/
[14:47:29] Browser-Tests: Make Selenium tests work with AuthManager - https://phabricator.wikimedia.org/T135884#2318528 (Tgr)
[14:55:41] hasharAway: Hi i think https://integration.wikimedia.org/ci/job/mwext-qunit/16415/console has stalled
[14:56:40] hasharAway seems nodepool has gotten unstable since on friday tests were freezing manly qunit then nodepool went down
[14:56:44] Per tasks https://phabricator.wikimedia.org/T135885
[14:56:50] https://phabricator.wikimedia.org/T135875
[15:31:03] MediaWiki-Codesniffer, Google-Summer-of-Code-2016: Improving an static analysis tools for MediaWiki - Weekly reports - https://phabricator.wikimedia.org/T134225#2318723 (EBernhardson)
[15:31:21] MediaWiki-Codesniffer, Google-Summer-of-Code-2016: Community bonding evaluation for Improving static analysis tools for MediaWiki - https://phabricator.wikimedia.org/T133829#2318725 (EBernhardson)
[15:31:43] MediaWiki-Codesniffer: Community bonding evaluation for Improving static analysis tools for MediaWiki - https://phabricator.wikimedia.org/T133829#2246443 (EBernhardson)
[15:31:46] zeljkof: I set up mobilefrontend + selenium locally and can
reproduce
[15:32:02] tgr: ok, great
[15:32:04] common_steps.rb:30 is "log_in" which is the mediawiki_selenium gem
[15:32:10] how do I get a trace within that?
[15:32:23] MediaWiki-Codesniffer, Google-Summer-of-Code-2016: Community Bonding Report for Improving static analysis tools for MediaWiki - https://phabricator.wikimedia.org/T135393#2318728 (EBernhardson)
[15:32:33] MediaWiki-Codesniffer, Google-Summer-of-Code-2016, Tracking: [GSoC 2016] Improving static analysis tools for MediaWiki Phabricator tasks (tracking) - https://phabricator.wikimedia.org/T135966#2318730 (EBernhardson)
[15:32:48] run: bundle exec cucumber --verbose feature_name.feature
[15:32:53] hm
[15:32:57] or something lese
[15:32:59] else
[15:33:00] let me check
[15:33:18] try also: bundle exec cucumber --backtrace feature_name.feature
[15:36:55] thanks, --backtrace worked
[15:47:19] Project selenium-MobileFrontend » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #24: FAILURE in 25 min: https://integration.wikimedia.org/ci/job/selenium-MobileFrontend/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/24/
[15:49:18] !log beta code update not running, disconnect-reconnect dance resulted in: [05/23/16 15:48:39] [SSH] Authentication failed.
[15:49:24] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[15:50:16] deployment-tin ssh appears to still be acting wacky.
[16:03:25] scap: Add blacklist support to scap.tasks.check_valid_syntax linter - https://phabricator.wikimedia.org/T136009#2318831 (bd808)
[16:06:09] scap: Add blacklist support to scap.tasks.check_valid_syntax linter - https://phabricator.wikimedia.org/T136009#2318865 (bd808)
[16:27:49] Beta-Cluster-Infrastructure, Parsoid, Services, Patch-For-Review: Creating wiki at beta cluster for the Dutch Wikipedia - https://phabricator.wikimedia.org/T118005#2318993 (Krenair) Open>Resolved Looks like the Parsoid issue got fixed at some point.
[16:32:58] Browser-Tests-Infrastructure, Patch-For-Review, User-zeljkofilipin: Ownership of Selenium tests - https://phabricator.wikimedia.org/T134492#2319019 (TerraCodes) Shouldn't someone claim `mediawiki/core`?
[16:37:09] Hi releng! Looks like a ton of gate-and-submit jobs are stuck waiting behind hung job https://integration.wikimedia.org/ci/job/mwext-testextension-php55/11652/
[16:37:09] Continuous-Integration-Config, MediaWiki-General-or-Unknown, MobileFrontend, Patch-For-Review, Reading-Web-Sprint-73-O: MediaWiki core and MobileFrontend break branches REL1_25 and fundraising/REL1_25 and REL1_26 and REL1_27 tests - https://phabricator.wikimedia.org/T135906#2319030 (Florian)...
[16:37:48] ostriches thcipriani ^^
[16:37:53] https://integration.wikimedia.org/ci/job/mwext-testextension-php55/11652/
[16:38:08] Deployment-Systems, Release-Engineering-Team: thoroughly document the new branch cutting plan / strategy - https://phabricator.wikimedia.org/T136015#2319033 (mmodell)
[16:38:17] is it ok if I log into jenkins and kill that one build?
[16:38:27] ejegg yes you can kill the build
[16:38:32] thanks paladox
[16:38:34] It keeps happening
[16:38:43] never sure what will scrable zuul's brain...
[16:39:10] ejegg on friday and wedsday last week we had problems with nodepool kept failing and then it went down
[16:39:59] dang
[16:44:02] ejegg thanks
[16:45:17] paladox besides mentioning it here & killing the blocking build, is there anything else that's helpful to do when that happens?
[16:45:57] ejegg im not sure normaly when that happens we write on this channel and ping some admins.
[16:46:18] sounds good
[16:46:35] ejegg but if you find you have to keep killing it or keep asking here we should write a task so it can be investigated.
[16:47:07] right, makes sense. That's the first I'd seen that particular one hang so long
[16:48:50] ejegg oh, qunit was failing last week for us. Once we killed it an hour later it failed causing us to kill it again. Maybe migrating to nodepol will fix the issue
[16:49:06] But nodepool failed but not as much as the normal tests.
[16:49:37] yeah, this one was just a failure in mw-setup that timed out, then also seemed to hang at the db teardown step
[16:52:37] Continuous-Integration-Infrastructure: integration-slave-trusty-1004 can't connect to mysql - https://phabricator.wikimedia.org/T135997#2319121 (hashar) @thcipriani mentioned slaves are somehow/sometime missing mysql :( I have rebooted that host earlier today. So maybe our puppet / service does not start on...
[16:55:05] paladox: I've got a meeting to go to so I don't have time for a long explanation, but the sort answer is, no we can't make random branches/forks/sub-directories to fix the Composer 1.1.x problem
[16:55:17] Oh ok.
[16:55:49] We would need to ignore that file in mw core.
[16:55:52] we need to make the mainline mediawiki/vendor.git work with PHP >=5.6. And we will need to fix various tests to do that
[16:55:58] no, we don't
[16:56:00] Oh7
[16:56:01] OH
[16:56:05] we just need to not lint it
[16:56:16] the file is properly excluded from runtime use
[16:56:16] Ok.
[16:58:18] (PS1) Gergő Tisza: Update account creation code for AuthManager [ruby/api] - https://gerrit.wikimedia.org/r/290269 (https://phabricator.wikimedia.org/T135884)
[16:58:21] So, on to the tricky bit... fundraising/REL1_25 is currently failing hard (Failures: 3, Errors: 103), but it's running a different set of tests than REL1_25 (which I understand has a couple of its own fails now)
[16:58:45] here's the fr branch: https://integration.wikimedia.org/ci/job/mediawiki-phpunit-php55-trusty/218/console
[16:59:06] (CR) jenkins-bot: [V: -1] Update account creation code for AuthManager [ruby/api] - https://gerrit.wikimedia.org/r/290269 (https://phabricator.wikimedia.org/T135884) (owner: Gergő Tisza)
[17:00:56] ejegg Hi yep i think there are some patches in mw core REL1_25
[17:01:09] See https://phabricator.wikimedia.org/T135906 please.
[17:01:12] oh cool, i'll take a look
[17:01:22] I was actually just going to ask about that
[17:01:42] also, I think we want to run a different test suite on the fr branch
[17:01:42] https://gerrit.wikimedia.org/r/#/c/290153/ https://gerrit.wikimedia.org/r/#/c/290151/ and https://gerrit.wikimedia.org/r/#/c/290150/
[17:02:06] Am I allowed to self merge them, if they're breaking silly unit tests, and its only on a REL branch?
[17:02:10] mediawiki-phpunit-php55-trusty doesn't make a lot of sense given the prod environment
[17:02:24] if not, can someone merge them for me?
[17:02:30] and REL1_25 is running mediawiki-phpunit-php53
[17:02:50] ejegg: REL1_25 is supposed to maintain back compat with php 5.4
[17:02:51] *5.3
[17:03:11] I made this layout.yaml patch to try to bring the fr branch tests in line with the matching REL tests: https://gerrit.wikimedia.org/r/289975
[17:03:20] ejegg would you like to update to REL1_26 which is more modern
[17:03:32] And has more fixes for extension.json
[17:03:36] And is more stable
[17:03:59] paladox: yeah, we're hoping to get a little closer to upstream
[17:04:11] ejegg Ok :)
[17:04:20] Would 1.27 not work for you.
[17:04:27] ideally we could follow the mw deploy train at a few weeks distance
[17:04:31] Are you looking to stay compat with php 5.3 and 5.4
[17:04:37] ejegg yep
[17:04:55] paladox: for now, yeah, till we overhaul the whole payments cluster
[17:05:32] ok, then REL1_26 would be good for you since it still supports php 5.3 and is supported longer then REL1_25.
[17:05:42] but until then, we still need to be able to merge things into fundraising/REL1_25 since that's part of our deployment process
[17:05:58] Ok
[17:06:32] PROBLEM - App Server Main HTTP Response on deployment-mediawiki02 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 1754 bytes in 2.089 second response time
[17:06:36] PROBLEM - English Wikipedia Mobile Main page on beta-cluster is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - string 'Wikipedia' not found on 'http://en.m.wikipedia.beta.wmflabs.org:80/wiki/Main_Page?debug=true' - 2165 bytes in 6.001 second response time
[17:06:44] constant battle between near-term campaign requirements and modernizing code / keeping up with core
[17:06:58] Oh
[17:06:58] PROBLEM - App Server Main HTTP Response on deployment-mediawiki03 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 1755 bytes in 0.463 second response time
[17:07:02] PROBLEM - Host deployment-mediawiki01 is DOWN: CRITICAL - Host Unreachable (10.68.17.170)
[17:07:45] PROBLEM -
Host deployment-db2 is DOWN: CRITICAL - Host Unreachable (10.68.17.94)
[17:08:30] PROBLEM - English Wikipedia Main page on beta-cluster is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - string 'Wikipedia' not found on 'http://en.wikipedia.beta.wmflabs.org:80/wiki/Main_Page?debug=true' - 2165 bytes in 6.005 second response time
[17:08:35] (PS2) Gergő Tisza: Update account creation code for AuthManager [ruby/api] - https://gerrit.wikimedia.org/r/290269 (https://phabricator.wikimedia.org/T135884)
[17:09:28] I'm rebooting a virt host which is going to cause deployment-prep to get all freaky
[17:09:28] (CR) jenkins-bot: [V: -1] Update account creation code for AuthManager [ruby/api] - https://gerrit.wikimedia.org/r/290269 (https://phabricator.wikimedia.org/T135884) (owner: Gergő Tisza)
[17:09:31] shouldn't take too long
[17:09:46] PROBLEM - Host integration-slave-trusty-1017 is DOWN: CRITICAL - Host Unreachable (10.68.17.28)
[17:10:48] RECOVERY - Host integration-slave-trusty-1017 is UP: PING OK - Packet loss = 0%, RTA = 0.51 ms
[17:10:55] Anybody have a minute to review https://gerrit.wikimedia.org/r/289975 ? Hoping that'll get us a good ways towards being able to deploy fundraising stuff
[17:11:22] andrewbogott, I don't think it took down deployment-mediawiki02 or 03 though...
[17:11:58] Krenair: yeah, it's just whatever's on labvirt1003
[17:12:03] https://phabricator.wikimedia.org/P3159
[17:12:42] Ah, I see why it's all failing now
[17:12:52] Can't connect to MySQL server on '10.68.17.94' (4) (10.68.17.94)
[17:13:00] db2
[17:13:11] (CR) Paladox: [C: 1] "Looks good but this will stop php53 from testing this repo which means php55 will test now."
[integration/config] - https://gerrit.wikimedia.org/r/289975 (owner: Ejegg)
[17:13:24] db2 was suspended
[17:15:15] PROBLEM - Host deployment-sca01 is DOWN: CRITICAL - Host Unreachable (10.68.20.183)
[17:15:45] PROBLEM - Host deployment-mx is DOWN: CRITICAL - Host Unreachable (10.68.17.78)
[17:15:45] PROBLEM - Host integration-slave-trusty-1017 is DOWN: CRITICAL - Host Unreachable (10.68.17.28)
[17:15:54] PROBLEM - Host deployment-zookeeper01 is DOWN: CRITICAL - Host Unreachable (10.68.17.157)
[17:15:57] PROBLEM - Host deployment-sentry2 is DOWN: CRITICAL - Host Unreachable (10.68.17.204)
[17:16:09] PROBLEM - Host deployment-redis01 is DOWN: CRITICAL - Host Unreachable (10.68.16.177)
[17:16:18] PROBLEM - Host deployment-ores-web is DOWN: CRITICAL - Host Unreachable (10.68.21.158)
[17:16:59] bd808, hi, could you take a look at https://gerrit.wikimedia.org/r/#/c/289506/
[17:17:31] PROBLEM - Host integration-slave-trusty-1023 is DOWN: CRITICAL - Host Unreachable (10.68.18.10)
[17:18:59] PROBLEM - Host deployment-memc05 is DOWN: CRITICAL - Host Unreachable (10.68.23.49)
[17:19:53] PROBLEM - Host integration-publisher is DOWN: CRITICAL - Host Unreachable (10.68.16.255)
[17:20:34] yurik: Looks ok at a quick glance. Have you tested it locally?
[17:21:16] bd808, a bit - i am having some issues with integrating it with restbase, but on the other hand - it is much less broken now than it is in master :)
[17:21:20] bd808: We can blacklist that file in https://phabricator.wikimedia.org/diffusion/MCUT/browse/master/lint.php;708a35d0121703257c5ab938d2bed92de1aef8b8
[17:22:40] yurik: :) less broken is good
[17:22:55] maybe mobrovac can give you a hand with the restbase bits?
[17:22:57] it was hard to make it more broken ;)
[17:23:25] would be great, but that's not a req really for that patch, could be done in the next one
[17:23:46] yurik: *nod* let's +2 this one then and you can work on some followup
[17:23:53] sounds good
[17:24:03] thx
[17:25:59] bd808 actually not that file this one https://github.com/wikimedia/integration-config/blob/ab9882384665a704dae02ed70e865662f03f698c/jjb/macro.yaml#L533
[17:26:53] paladox: yeah, slave-scripts/bin/git-changed-in-head is probably what needs some fixing
[17:27:03] Ok thanks
[17:28:03] bd808 this is what it is https://phabricator.wikimedia.org/diffusion/CIJE/browse/master/bin/git-changed-in-head
[17:28:12] Im not sure which bit to change or add to ignore files.
[17:29:40] paladox: can you open anther blocker task of T135161 for that? It needs some thought. Ideally we would find a way to list files to ignore in the repo itself so that the git-change-in-head and scap scripts can share the same config
[17:29:40] T135161: Composer v1.1.0 generated vendor dirs will fail lint by PHP <5.6 - https://phabricator.wikimedia.org/T135161
[17:30:00] Ok
[17:30:32] naively something like a .lintignore file, but I'm not sure if that will work well or not
[17:30:38] (CR) JanZerebecki: [C: -1] "This change handles some of the fr branches like master instead of the normal REL branches.
In general I would like these branch regex to " [integration/config] - https://gerrit.wikimedia.org/r/289975 (owner: Ejegg)
[17:31:03] bd808: thats actually is a great idea
[17:31:17] Since we would be able to do it per repo like we can with composer phplint
[17:31:23] right
[17:31:32] RECOVERY - App Server Main HTTP Response on deployment-mediawiki02 is OK: HTTP OK: HTTP/1.1 200 OK - 42951 bytes in 4.034 second response time
[17:31:32] RECOVERY - English Wikipedia Mobile Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 29446 bytes in 0.902 second response time
[17:32:02] RECOVERY - Host deployment-mediawiki01 is UP: PING OK - Packet loss = 0%, RTA = 0.51 ms
[17:32:04] RECOVERY - App Server Main HTTP Response on deployment-mediawiki03 is OK: HTTP OK: HTTP/1.1 200 OK - 42939 bytes in 7.099 second response time
[17:32:08] RECOVERY - Host deployment-ores-web is UP: PING OK - Packet loss = 0%, RTA = 0.63 ms
[17:32:10] RECOVERY - Host deployment-redis01 is UP: PING OK - Packet loss = 0%, RTA = 0.52 ms
[17:32:22] RECOVERY - Host deployment-db2 is UP: PING OK - Packet loss = 0%, RTA = 0.78 ms
[17:32:34] RECOVERY - Host deployment-zookeeper01 is UP: PING OK - Packet loss = 0%, RTA = 0.69 ms
[17:33:08] RECOVERY - Host deployment-mx is UP: PING OK - Packet loss = 0%, RTA = 1.12 ms
[17:33:13] (CR) Ejegg: "Thanks JanZerebecki. I'll try to make a more limited change.
I guess I'd really just like to swap out mediawiki-phpunit-php55-trusty for" [integration/config] - https://gerrit.wikimedia.org/r/289975 (owner: Ejegg)
[17:33:25] RECOVERY - English Wikipedia Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 43268 bytes in 1.017 second response time
[17:33:35] RECOVERY - Host deployment-sca01 is UP: PING OK - Packet loss = 0%, RTA = 0.67 ms
[17:35:23] Continuous-Integration-Config, Continuous-Integration-Infrastructure: Update git-change-in-head script to be able to allow us to ignore files - https://phabricator.wikimedia.org/T136021#2319320 (Paladox)
[17:35:32] Continuous-Integration-Config, Continuous-Integration-Infrastructure: Update git-change-in-head script to be able to allow us to ignore files - https://phabricator.wikimedia.org/T136021#2319333 (Paladox)
[17:35:40] bd808: https://phabricator.wikimedia.org/T136021
[17:36:22] ok, that should be it for the deployment-prep outage. Ping me if anything is still broken.
[17:36:23] RECOVERY - Host deployment-sentry2 is UP: PING OK - Packet loss = 0%, RTA = 0.58 ms
[17:36:35] RECOVERY - Host integration-slave-trusty-1017 is UP: PING OK - Packet loss = 0%, RTA = 0.41 ms
[17:37:09] PROBLEM - Puppet run on deployment-ores-web is CRITICAL: CRITICAL: 16.67% of data above the critical threshold [0.0]
[17:38:03] RECOVERY - Host integration-publisher is UP: PING OK - Packet loss = 0%, RTA = 0.60 ms
[17:38:11] RECOVERY - Host integration-slave-trusty-1023 is UP: PING OK - Packet loss = 0%, RTA = 0.88 ms
[17:41:22] right. back to poking at the postmerge queue.
[17:42:58] (CR) JanZerebecki: "To find where that can be done search for:" [integration/config] - https://gerrit.wikimedia.org/r/289975 (owner: Ejegg)
[17:43:20] zeljkof: how do I run the rspec tests locally for https://gerrit.wikimedia.org/r/#/c/290269 ?
[17:43:21] Continuous-Integration-Config, Continuous-Integration-Infrastructure: Update git-change-in-head script to be able to allow us to ignore files - https://phabricator.wikimedia.org/T136021#2319361 (bd808) Composer 1.1.0 introduced a conditionally included autoloader file that is optimized for PHP >=5.6 usag...
[17:43:56] the gem is in /var/lib/gems/1.9.1/gems/mediawiki_api-0.5.0 so I turned that into a git repo, ran bundle install, but still get: cannot load such file -- support/request_helpers
[17:44:44] Continuous-Integration-Infrastructure, Documentation, Patch-For-Review: Jenkins: Generate CSS docs from LESS and publish to doc.wikimedia.org - https://phabricator.wikimedia.org/T60620#2319367 (Danny_B)
[17:45:51] hrmmm, something must be weird with the /etc/security whatever... pam_access(sshd:account): access denied for user `jenkins-deploy' from `gallium.wikimedia.org'
[17:45:57] Continuous-Integration-Infrastructure, Documentation, Patch-For-Review: Jenkins: Generate CSS docs from LESS and publish to doc.wikimedia.org - https://phabricator.wikimedia.org/T60620#2319370 (Jdlrobson) Open>declined We are no longer using kss in MobileFrontend. It wasn't being maintained d...
[17:47:00] RECOVERY - Puppet run on deployment-ores-web is OK: OK: Less than 1.00% above the threshold [0.0]
[17:51:32] bd808: https://phabricator.wikimedia.org/D234
[17:58:49] ejegg Hi could you force merge https://gerrit.wikimedia.org/r/#/c/275035/ please.
[18:03:05] done [18:11:05] 10Continuous-Integration-Config, 10MediaWiki-General-or-Unknown, 10MobileFrontend, 06Reading-Web-Backlog, and 2 others: MediaWiki core and MobileFrontend break branches REL1_25 and fundraising/REL1_25 and REL1_26 and REL1_27 tests - https://phabricator.wikimedia.org/T135906#2320559 (10Jdlrobson) [18:13:49] (03CR) 10Anomie: Update account creation code for AuthManager (031 comment) [ruby/api] - 10https://gerrit.wikimedia.org/r/290269 (https://phabricator.wikimedia.org/T135884) (owner: 10Gergő Tisza) [18:24:30] (03PS3) 10Ejegg: Swap out php55 for php53 on fundraising branches [integration/config] - 10https://gerrit.wikimedia.org/r/289975 [18:29:15] (03PS4) 10Awight: Use php 5.3 rather than php55 on fundraising branches [integration/config] - 10https://gerrit.wikimedia.org/r/289975 (owner: 10Ejegg) [18:29:27] (03CR) 10Awight: [C: 031] "Let's try it!" [integration/config] - 10https://gerrit.wikimedia.org/r/289975 (owner: 10Ejegg) [18:31:27] thcipriani: any chance you could help me out with a rubygem problem? [18:32:25] so... does 'check experimental' run tests from unmerged layout.yaml changes? [18:32:30] ejegg: Hi could you force merge https://gerrit.wikimedia.org/r/#/c/290281/ [18:32:31] please [18:32:44] ejegg no. It runs the experimental pipeline [18:32:53] tgr: I could try :) the browser-test ruby stuff isn't my specialty certainly. 
[18:32:57] ah, got it [18:33:12] So for example We only have jsonlint on one repo but want to add composer but the test wasent added to the repo [18:33:35] We use check experimental which if jsonlint has the experimental pipeline and composer test was listed in there [18:33:39] it will test composer [18:33:43] ejegg ^^ [18:33:44] paladox: i'm hoping to use that commit to see if we can get our tests passing [18:34:21] I think the pared down layout.yaml patch i just amended should just swap us to php53 [18:34:27] without any of the other changes [18:35:01] so if https://gerrit.wikimedia.org/r/289975 merges, I'm hoping https://gerrit.wikimedia.org/r/#/c/290281/ won't need to be forced [18:35:27] good evening [18:35:45] ejegg: need some CI config loves ? :) [18:36:06] hashar Hi [18:36:07] yes please hashar ! [18:36:28] oh men 55 vs 53 ... :) [18:36:28] ejegg looking at the patch yes that should do it. [18:36:30] :) [18:36:47] trying to get the php53 tests which mostly work for REL1_25 to run on fundraising/REL1_25 [18:36:47] thcipriani: I'm trying to write a patch for the mediawiki_api gem, and would need to update its rspec tests but not sure how to run them [18:36:48] (03CR) 10Paladox: [C: 031] "Looks alright too me :)" [integration/config] - 10https://gerrit.wikimedia.org/r/289975 (owner: 10Ejegg) [18:37:11] ejegg: with fundraising/REL1_25 being your fork of REL1_25 isnt it ? 
[18:37:20] yeah, hardly any changes [18:37:25] so I figure it should run the same tests [18:37:41] also, 53 matches prod environment for now [18:37:44] yeah that makes sense [18:37:50] eeeek [18:37:54] iknow [18:38:06] I installed the gem by running bundle install for some extension, found where it was installed in /var/lib, set up a git remote for that, wrote the patch, ran bundler install, ran rspec and got `require': cannot load such file -- support/request_helpers (LoadError) [18:38:26] ejegg maybe we should merge REL1_25 into funraising REL1_2 [18:38:27] 5 [18:38:49] fundraising has a different deployment / code maintenance process [18:38:50] paladox: that's all set [18:38:50] since it is failing the tests with different error porbaly missing the update we did to 1.25 a few days ago [18:38:53] 07Browser-Tests, 13Patch-For-Review: Make Selenium tests work with AuthManager - https://phabricator.wikimedia.org/T135884#2320661 (10Anomie) The shared-secret SessionProvider is a good idea if the tests just need to be logged in and not worry about how that happened. Or, if you can assume a `$wgAuthManagerCon... [18:38:58] ok thanks [18:38:59] Oh [18:39:46] https://gerrit.wikimedia.org/r/#/c/289975/4/tests/test_zuul_scheduler.py,cm not sure why the test is updated there [18:39:50] paladox: even cherry-picked a couple test fixes that aren't merged into 1.25 yet 'cause they were tiny and harmless [18:39:56] that is to make sure rake is not triggered on fundraising branches [18:40:08] hashar: oh, oops [18:40:11] ejegg yep. [18:40:24] I was making more changes in that patch before, forgot to undo that one [18:40:27] one sec [18:40:31] we have introduced the Rakefile with 1.26 maybe [18:40:41] so the fundraising/REL1_25 does not have it [18:41:02] nah, it's in there, from REL1_25 [18:41:30] but there are a bunch of things that aren't run on any fr branch, and I decided not to change that just yet [18:41:33] hashar: I think mw core is failing. 
Since we updated 1.25 and 1.26 [18:41:50] then that is for linting the ruby files used with the browser tests, hardly needed on fundraising branch [18:41:50] Only the REL1_25 and REL1_26 branches. [18:41:53] (03PS5) 10Ejegg: Swap out php55 for php53 on fundraising branches [integration/config] - 10https://gerrit.wikimedia.org/r/289975 [18:43:53] paladox: bawolff's last couple of patches should fix that (for REL1_25 at least, not sure about 1_26) [18:44:15] Yep [18:44:31] It fixes both REL1_26 and REL1_25 mobilefrontend now breaks REL1_26 [18:45:26] tgr: hmm, so I was able to get medaiwki-api rspec to run just now: cloned the repo, ran bundle, then bundle exec rspec [18:46:45] thcipriani: ah, thanks, that does work, I just tried plain rspec [18:47:17] hashar: i took that bogus test change out of https://gerrit.wikimedia.org/r/289975 [18:47:41] tgr: glad to hear it, I think I just exhausted the sum total of my knowledge of ruby tooling :) [18:48:07] ejegg: I am trying to understand what that change is actually doing :) [18:48:31] and what is the intent :) [18:48:58] I'd like fundraising/REL1_25 to get the same php flavor tests as REL1_25 [18:49:50] since the php55 tests it's getting now are bombing out with 103 errors and 3 failures [18:51:23] then the regex are large [18:51:28] might well skip the php5 lint job [18:51:35] so i'm adding the fundraising branches to the negative-lookahead pattern for php55 things [18:52:02] and to the double-negative clause that'll run php53 tests [18:52:35] ohhhh yeah a double negative .. [18:52:42] that is what was confusing me ahah [18:53:01] (03PS6) 10Hashar: Swap out php55 for php53 on fundraising branches [integration/config] - 10https://gerrit.wikimedia.org/r/289975 (owner: 10Ejegg) [18:53:03] lets push that [18:53:07] heh, not sure why they're written that way, it tripped me up for a couple hrs too! 
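Note on the "double negative" discussed above: a minimal Python sketch of how a negative-lookahead branch pattern can route fundraising branches away from php55 jobs and onto php53 jobs. The patterns and the php53 job name here are hypothetical illustrations, not the actual contents of layout.yaml in integration/config; only mediawiki-phpunit-php55-trusty is named in this log.

```python
import re

# Hypothetical pattern: matches every branch EXCEPT old release branches
# and fundraising branches (the negative lookahead rejects those).
NOT_LEGACY = re.compile(r"^(?!REL1_2[3-6]$|fundraising/REL.*$).*$")


def jobs_for(branch):
    """Return the (illustrative) phpunit jobs that would run for a branch."""
    jobs = []
    # Single negative: php55 runs on anything that is not a legacy branch.
    if NOT_LEGACY.match(branch):
        jobs.append("mediawiki-phpunit-php55-trusty")
    # Double negative: php53 is skipped on anything that is not a legacy
    # branch, so it ends up running only on the legacy branches.
    if not NOT_LEGACY.match(branch):
        jobs.append("mediawiki-phpunit-php53")  # hypothetical job name
    return jobs


if __name__ == "__main__":
    for b in ("master", "REL1_25", "REL1_27", "fundraising/REL1_25"):
        print(b, jobs_for(b))
```

Adding a branch to the lookahead alternation is then enough to move it from the php55 set to the php53 set, which is roughly what the patch under discussion is described as doing.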
[18:53:21] (03CR) 10Hashar: [C: 032] "The double negative is rather confusing thanks Ejegg :)" [integration/config] - 10https://gerrit.wikimedia.org/r/289975 (owner: 10Ejegg) [18:53:37] thanks hashar ! [18:53:39] we probably had a hard time dealing with it a while back [18:54:49] i forget, do layout updates take effect on merge, or do you have to deploy them in a separate step? [18:55:17] ejegg they have to be deployed [18:56:13] ah, cool. I'mma grab a bite and see where we are in 20 min. [18:56:21] thanks everyone [18:56:32] (03Merged) 10jenkins-bot: Swap out php55 for php53 on fundraising branches [integration/config] - 10https://gerrit.wikimedia.org/r/289975 (owner: 10Ejegg) [18:56:58] ejegg|food: deployed [18:57:22] I'm a little bit creeped out--I've been watching the CI jobs for SmashPig#290291, and they have run gate-and-submit three times now... Not a huge problem for me, but I thought you should know. [18:57:37] It looks like they're running every time something before them in the job queue finishes. [18:58:40] hashar: https://gerrit.wikimedia.org/r/#/c/290281/ recheck wont run mediawiki-phpunit-php55-trusty [18:58:46] unless on gate and sibmit [18:59:15] hashar: it seems https://integration.wikimedia.org/zuul/ has frozen again [18:59:15] that does not make sense [18:59:25] that is the exact same job [18:59:44] it is not frozen [18:59:55] pending executors to be available on Jenkins :D [18:59:58] then [19:00:07] mysql is down on Trusty slves apparently [19:00:10] hashar: Oh [19:01:30] thcipriani has filled a bug about mysql being down [19:01:37] Ok [19:03:34] 10Continuous-Integration-Infrastructure, 07WorkType-Maintenance: integration-slave-trusty-1004 can't connect to mysql - https://phabricator.wikimedia.org/T135997#2320744 (10hashar) p:05Triage>03Unbreak! Sounds bad: ``` integration-slave-trusty-1017.integration.eqiad.wmflabs: mysql stop/waiting integra... 
[19:04:17] https://www.mediawiki.org/wiki/Extension:WhosOnline < Will these extensuon have problems with AuthManager [19:05:53] 10Continuous-Integration-Infrastructure, 07WorkType-Maintenance: integration-slave-trusty-1004 can't connect to mysql - https://phabricator.wikimedia.org/T135997#2320749 (10hashar) The mysql service is managed by puppet. Due to T96230 / T126699 we have a custom patch to handle mysql https://gerrit.wikimedia.or... [19:05:54] Project beta-scap-eqiad build #103713: 04FAILURE in 1 min 3 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/103713/ [19:07:35] *this [19:11:46] 10Continuous-Integration-Infrastructure, 07WorkType-Maintenance: integration-slave-trusty-1004 can't connect to mysql - https://phabricator.wikimedia.org/T135997#2320775 (10thcipriani) These machines seems to have mysql enabled on reboot: thcipriani@integration-saltmaster:~$ sudo salt -G 'oscodename:trusty' c... [19:15:06] 10Continuous-Integration-Infrastructure, 07WorkType-Maintenance: integration-slave-trusty-1004 can't connect to mysql - https://phabricator.wikimedia.org/T135997#2320799 (10hashar) The puppet service uses `provider => debian` and puppet agent eventually runs: ``` /etc/init.d/mysql status; echo $? mysql stop/wa... [19:15:37] Project beta-scap-eqiad build #103714: 04STILL FAILING in 55 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/103714/ [19:16:02] oh boy. [19:20:32] thcipriani: that's me [19:20:39] it should be back now though [19:21:08] kk. deployment-tin ssh is still acting weird for me :\ Like getting *to* deployment-tin [19:21:51] I haven't had any problem with that :-/ [19:21:56] 10Continuous-Integration-Infrastructure, 07WorkType-Maintenance: integration-slave-trusty-1004 can't connect to mysql - https://phabricator.wikimedia.org/T135997#2320849 (10hashar) Restarting an instance: ``` Notice: /Stage[main]/Role::Ci::Slave::Labs/File[/var/lib/mysql]/owner: owner changed 'root' to 'mysql'... 
[19:22:00] I am fed up with puppet [19:22:07] can we switch to something else ? :D [19:22:07] hashar: :-o [19:22:12] lol [19:22:13] 10Beta-Cluster-Infrastructure: deployment-tin ssh: Connection closed by UNKNOWN - https://phabricator.wikimedia.org/T134777#2320850 (10thcipriani) Just ran into this again. I realized I already had a shell open to `deployment-tin` in a different window. Tailed `/var/log/auth.log` while trying to ssh from the oth... [19:22:16] scap3 [19:22:26] we can make it implement all of config management [19:22:29] ;) [19:22:49] NIH, I can't even...etc [19:22:52] oh good :) [19:23:02] woohoo, fr branch is passing again! thanks hashar [19:25:06] stuck behind a hung qunit though.... 14 minutes since it said Disconnecting all browsers [19:25:48] Project beta-scap-eqiad build #103715: 04STILL FAILING in 1 min 4 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/103715/ [19:26:34] 10Continuous-Integration-Infrastructure, 07WorkType-Maintenance: integration-slave-trusty-1004 can't connect to mysql - https://phabricator.wikimedia.org/T135997#2320856 (10hashar) On some instances we have two process: ``` /usr/sbin/mysqld /bin/sh /usr/bin/mysqld_safe \_ /usr/sbin/mysqld --basedir=/usr --da... [19:30:40] :-/ [19:30:50] freakin keyholder [19:35:27] !log killed all mysqld process on Trusty CI slaves [19:35:32] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [19:35:44] Project beta-scap-eqiad build #103716: 04STILL FAILING in 1 min 0 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/103716/ [19:36:22] 10Continuous-Integration-Infrastructure, 07WorkType-Maintenance: integration-slave-trusty-1004 can't connect to mysql - https://phabricator.wikimedia.org/T135997#2320878 (10hashar) 05Open>03Resolved a:03hashar I have ended up with `killall mysqld` and upstart restarted it: ``` # salt -v '*trusty*' cmd.... [19:42:44] I'm confused - shouldn't zuul merge something that passes gate & submit tests? 
https://gerrit.wikimedia.org/r/#/c/290281/ [19:42:57] Or is that a per-project setting? [19:43:31] I can totally submit it manually, but I'm not sure if that's supposed to be necessary [19:44:24] (03PS3) 10Hashar: Whitelist user Urbanecm [integration/config] - 10https://gerrit.wikimedia.org/r/290001 (owner: 10Paladox) [19:44:35] (03CR) 10Hashar: [C: 032] "Thank you for the reviews :)" [integration/config] - 10https://gerrit.wikimedia.org/r/290001 (owner: 10Paladox) [19:45:06] ejegg: gate-and-submit set verified +1 and ask Gerrit to submit the change (ie get it merged) [19:45:19] let me check the logs :) [19:45:47] Project beta-scap-eqiad build #103717: 04STILL FAILING in 1 min 5 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/103717/ [19:46:00] cool, so it is supposed to auto-merge on pass [19:47:06] that particular patch set might be odd, with the aborted build earlier and me removing & re-adding myself [19:47:43] ohh [19:48:57] ejegg: it just Zuul being lagged [19:49:22] ah no [19:49:24] got a trace [19:49:58] 2016-05-23 19:31:14,261 INFO zuul.DependentPipelineManager: Reporting change , actions: [, {'verified': 2, 'submit': True}>] [19:49:58] 2016-05-23 19:31:14,902 INFO zuul.DependentPipelineManager: Reported change status: all-succeeded: True, merged: False [19:49:58] 2016-05-23 19:31:14,902 INFO zuul.DependentPipelineManager: Resetting builds for change because the item ahead, in gate-and-submit>, failed to merge [19:50:28] item ahead failed to merge, huh? [19:50:30] lemme see [19:50:50] Exception: Gerrit error executing gerrit review --project mediawiki/core --message "Gate pipeline build succeeded. [19:50:50] ... 
[19:51:00] so I am not sure what happened on Gerrit side [19:51:45] so if I manually submit that one, it shouldn't confuse zuul further, since zuul's all done with it [19:52:27] it would [19:52:32] I am wondering why it did not merge it [19:52:36] (03Merged) 10jenkins-bot: Whitelist user Urbanecm [integration/config] - 10https://gerrit.wikimedia.org/r/290001 (owner: 10Paladox) [19:52:37] maybe the branch has special permissions [19:53:51] that would make sense [19:54:00] apparently it does not :( [19:54:07] https://gerrit.wikimedia.org/r/#/admin/projects/mediawiki/core,access [19:54:18] OHH [19:54:19] https://gerrit.wikimedia.org/r/#/admin/projects/mediawiki,access [19:54:41] so projects under mediawiki/* have a special rule for the fundraising/* branches [19:54:43] and [19:54:43] ohhh, no submit access [19:54:46] yeah :) [19:54:50] for jenkinsbot [19:55:14] so if one adds the JenkinsBot group to that ref that will let it submit for you [19:55:32] pretty sure it should have that. Who can admin those permissions? [19:55:42] Project beta-scap-eqiad build #103718: 04STILL FAILING in 59 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/103718/ [19:55:44] which would also mean anyone able to do CR+2 can merge, but that is already the case imho [19:56:15] yah, looks like the case [19:56:50] looking at the histore of mw/core fundraising branch https://gerrit.wikimedia.org/r/#/q/status:merged+project:mediawiki/core+branch:fundraising/REL1_25,n,z [19:56:59] apparently all patches got merged manually :) [19:57:20] I can change the permissions [19:58:19] could you please? Would be nice to have it act like DonationInterface, where jenkinsbot can merge [19:58:22] ejegg: I have did the change, could you highlight that change to fr-tech ? [19:58:31] will do [19:58:33] thanks! [19:58:44] so from now on a +2 would get passing tests [19:58:52] and Zuul to submit the patch in Gerrit which will merge_ [19:58:54] progress!!! [19:59:37] woohoo! 
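Note on the "Resetting builds ... because the item ahead ... failed to merge" trace above: a simplified Python model of a dependent (gate) queue, assuming each change is tested on top of the ones ahead of it. This is an illustrative sketch only, not Zuul's implementation; the function and parameter names are made up, and can_submit stands in for whether Gerrit lets the CI user submit (which is what the missing submit permission on fundraising/* prevented).

```python
def report_queue(queue, can_submit):
    """Walk a gate queue whose builds have all passed.

    queue: change ids in gate order.
    can_submit: callable(change) -> bool, a stand-in for Gerrit
        accepting the 'submit' action from the CI user.
    Returns (merged, reset): changes that merged, and changes whose
    builds must be reset because an item ahead failed to merge.
    """
    merged, reset = [], []
    for i, change in enumerate(queue):
        if can_submit(change):
            merged.append(change)
        else:
            # Everything behind was tested on top of this change, so
            # those results are no longer valid and get re-run.
            reset = list(queue[i + 1:])
            break
    return merged, reset


if __name__ == "__main__":
    # 'A' passes its tests but cannot be submitted (no permission).
    print(report_queue(["A", "B", "C"], lambda c: c != "A"))
```

In this model a change that passes but cannot be submitted forces every change queued behind it to be re-tested, even though nothing failed.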
[20:00:10] rsync --archive --compress --contimeout 3 rsync://castor.integration.eqiad.wmflabs:/caches/mediawiki-vagrant/master/rake-jessie/ /home/jenkins [20:00:11] rsync error: timeout waiting for daemon connection (code 35) at socket.c(281) [Receiver=3.1.0] [20:00:24] is castor sick? [20:01:46] oh [20:02:15] bd808: some labs instance got killed apparently [20:02:19] or suspended [20:02:33] hmm [20:02:35] they should all be back up and running now I thought [20:02:37] not that one (27 days up) [20:05:41] Project beta-scap-eqiad build #103719: 04STILL FAILING in 56 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/103719/ [20:07:19] https://integration.wikimedia.org/ci/job/rake-jessie/40551/consoleFull [20:07:20] bah [20:07:29] bundle: command not found [20:08:31] bd808: I have no clue bryan sorry :( [20:08:45] I think it was transient [20:08:56] maybe a network hiccup of some sort [20:14:14] (03PS3) 10Gergő Tisza: Update account creation code for AuthManager [ruby/api] - 10https://gerrit.wikimedia.org/r/290269 (https://phabricator.wikimedia.org/T135884) [20:16:43] Project beta-scap-eqiad build #103720: 04STILL FAILING in 1 min 59 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/103720/ [20:17:04] (03CR) 10jenkins-bot: [V: 04-1] Update account creation code for AuthManager [ruby/api] - 10https://gerrit.wikimedia.org/r/290269 (https://phabricator.wikimedia.org/T135884) (owner: 10Gergő Tisza) [20:20:53] (03CR) 10Gergő Tisza: Update account creation code for AuthManager (031 comment) [ruby/api] - 10https://gerrit.wikimedia.org/r/290269 (https://phabricator.wikimedia.org/T135884) (owner: 10Gergő Tisza) [20:26:00] Project beta-scap-eqiad build #103721: 04STILL FAILING in 1 min 16 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/103721/ [20:28:51] (03PS4) 10Gergő Tisza: Update account creation code for AuthManager [ruby/api] - 10https://gerrit.wikimedia.org/r/290269 (https://phabricator.wikimedia.org/T135884) [20:30:49] (03CR) 
10jenkins-bot: [V: 04-1] Update account creation code for AuthManager [ruby/api] - 10https://gerrit.wikimedia.org/r/290269 (https://phabricator.wikimedia.org/T135884) (owner: 10Gergő Tisza) [20:37:10] Project beta-scap-eqiad build #103722: 04STILL FAILING in 1 min 58 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/103722/ [20:37:13] PROBLEM - Host Generic Beta Cluster is DOWN: (Host Check Timed Out) [20:40:52] :( [20:42:31] PROBLEM - Puppet run on deployment-jobrunner01 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [20:44:28] RECOVERY - Host Generic Beta Cluster is UP: PING OK - Packet loss = 0%, RTA = 0.97 ms [20:45:23] can't get this damn keyholder to work now :-/ [20:46:11] Project beta-scap-eqiad build #103723: 04STILL FAILING in 1 min 24 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/103723/ [20:55:44] Project beta-scap-eqiad build #103724: 04STILL FAILING in 1 min 0 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/103724/ [21:02:37] (03PS5) 10Gergő Tisza: Update account creation code for AuthManager [ruby/api] - 10https://gerrit.wikimedia.org/r/290269 (https://phabricator.wikimedia.org/T135884) [21:09:17] Project beta-scap-eqiad build #103725: 04STILL FAILING in 4 min 32 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/103725/ [21:15:22] could someone review https://gerrit.wikimedia.org/r/#/c/290269/ ? [21:15:39] hashar or thcipriani maybe? [21:15:52] it should be merged before 1.27rc0 is released today [21:15:58] tgr: zeljkof could do [21:16:16] hashar: wrong timezone though [21:16:21] (I think?) [21:16:35] yeah he's probably not around [21:16:50] why would it block 1.27rc0 ? 
[21:17:26] Project beta-scap-eqiad build #103726: 04STILL FAILING in 2 min 17 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/103726/ [21:17:27] because of https://gerrit.wikimedia.org/r/#/c/289972/ [21:17:42] oh [21:18:33] hashar: do you know if there are any other ways in which CI account creation happens? [21:19:01] well that mediawiki-api gem is mostly used for the browser tests [21:19:06] against beta / prod [21:19:26] so if it is broken for 1.27rc0 it might not be too much of a worry [21:19:35] though we haqve some browsertests running on test pipeline [21:19:41] eg for a few extensions [21:20:25] sure, but it would avoid a lot of confusion if we could keep the flag the same for 1.27 and master [21:21:44] 06Release-Engineering-Team, 05Release: MW-1.28.0-wmf.3 deployment blockers - https://phabricator.wikimedia.org/T135559#2303065 (10greg) [21:21:47] maybe had a bunch of context to the commit message [21:21:52] I will poke zeljkof about it tomorrow [21:21:53] 06Release-Engineering-Team, 05Release: MW-1.28.0-wmf.3 deployment blockers - https://phabricator.wikimedia.org/T135559#2303065 (10greg) p:05Triage>03Normal [21:22:12] Project selenium-Wikidata » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #2: 04FAILURE in 2 hr 32 min: https://integration.wikimedia.org/ci/job/selenium-Wikidata/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/2/ [21:25:01] 06Release-Engineering-Team, 10ReleaseTaggerBot: No activity by ReleaseTaggerBot since 17 May - https://phabricator.wikimedia.org/T136041#2321160 (10Jdforrester-WMF) [21:25:40] Project beta-scap-eqiad build #103727: 04STILL FAILING in 57 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/103727/ [21:27:26] hashar: is jenkins ailing? 
[21:27:26] tgr: one sure thing it is to late for me to take a glance at it :) I have wrote a self note for tomorrow morning :) [21:27:34] andrewbogott: I hvqave no idea [21:27:49] andrewbogott: CI runs a lot of jobs and I am not really monitoring all of them :D [21:28:01] just thought you might have noticed [21:28:08] hashar: thanks [21:28:10] but I noticed a weird range of issues over the last few hours [21:28:15] I guess related to labs maintenance [21:28:32] tgr: maybe you can amend the commit message and add a bit of context, that will help :) [21:28:44] yeah, but things should be better now, I hope... [21:29:24] andrewbogott: no major outage as far as CI is concerned [21:29:35] ok, probably I just need to be patient [21:29:39] andrewbogott: it might have been disrupted a bit, but overall self recovered from whatever happened [21:30:16] if it can help, Nodepool spams its errors to labnodepool1001.eqiad.wmnet tail -F /var/log/nodepool/nodepool.log [21:30:50] the last one is from 19:57UTC or one hour and a half ago. 
So that looks good to me andrewbogott :) [21:30:56] Release-Engineering-Team: MW-1.28.0-wmf.4 deployment blockers - https://phabricator.wikimedia.org/T136040#2321135 (greg) [21:31:12] Release-Engineering-Team: MW-1.28.0-wmf.5 deployment blockers - https://phabricator.wikimedia.org/T136042#2321180 (greg) [21:31:48] (PS6) Gergő Tisza: Update account creation code for AuthManager [ruby/api] - https://gerrit.wikimedia.org/r/290269 (https://phabricator.wikimedia.org/T135884) [21:32:07] Release-Engineering-Team, Release: MW-1.28.0-wmf.4 deployment blockers - https://phabricator.wikimedia.org/T136040#2321207 (Danny_B) [21:32:40] RECOVERY - Puppet run on deployment-jobrunner01 is OK: OK: Less than 1.00% above the threshold [0.0] [21:33:00] tgr: neat :) [21:33:57] (CR) Gergő Tisza: "This is a weak blocker for 1.27rc0 (we want AuthManager enabled by default in the release candidate since that's how it will be in the fin" [ruby/api] - https://gerrit.wikimedia.org/r/290269 (https://phabricator.wikimedia.org/T135884) (owner: Gergő Tisza) [21:34:19] tgr: must have been the mwext-mw-selenium jobs that run on patchset proposal ? [21:35:24] hashar: the ones I noticed ran after merge [21:35:33] still not fun to break a lot of them [21:35:41] might be https://integration.wikimedia.org/ci/job/mwext-mw-selenium/ [21:35:44] Project beta-scap-eqiad build #103728: STILL FAILING in 1 min 1 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/103728/ [21:35:53] if you get any change that showed failure, that would be helpful :) [21:36:50] anyway sleepy time [21:36:56] thx for the commit msg update [21:37:25] hashar: MobileFrontend changes are the ones I can recall [21:54:17] Yippee, build fixed! 
[21:54:18] Project beta-scap-eqiad build #103729: FIXED in 9 min 35 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/103729/ [22:15:37] Project beta-scap-eqiad build #103732: FAILURE in 54 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/103732/ [22:20:53] Beta-Cluster-Infrastructure, Flow, Collab-Team-2016-Apr-Jun-Q4: Run Flow External Store migration in dry-run mode on Beta - https://phabricator.wikimedia.org/T119567#2321363 (Mattflaschen-WMF) a:Mattflaschen-WMF [22:25:38] Project beta-scap-eqiad build #103733: STILL FAILING in 57 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/103733/ [22:35:42] Project beta-scap-eqiad build #103734: STILL FAILING in 56 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/103734/ [22:45:50] Project beta-scap-eqiad build #103735: STILL FAILING in 1 min 0 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/103735/ [22:55:41] Project beta-scap-eqiad build #103736: STILL FAILING in 56 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/103736/ [23:05:36] Project beta-scap-eqiad build #103737: STILL FAILING in 55 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/103737/ [23:10:40] Project beta-scap-eqiad build #103738: STILL FAILING in 57 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/103738/ [23:15:34] Project beta-scap-eqiad build #103739: STILL FAILING in 53 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/103739/ [23:25:39] Project beta-scap-eqiad build #103740: STILL FAILING in 57 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/103740/ [23:35:34] Project beta-scap-eqiad build #103741: STILL FAILING in 54 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/103741/ [23:42:07] 23:35:33 23:35:33 ['/usr/bin/scap', 'pull-master', 'deployment-tin.deployment-prep.eqiad.wmflabs'] on mira.deployment-prep.eqiad.wmflabs returned [255]: Permission denied 
(publickey,keyboard-interactive) [23:43:32] Reedy: twentyafterfour was working on keyholder earlier [23:45:39] Project beta-scap-eqiad build #103742: STILL FAILING in 56 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/103742/ [23:48:20] thcipriani: I'm still working on it [23:55:40] Project beta-scap-eqiad build #103743: STILL FAILING in 56 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/103743/