[03:06:51] RECOVERY - Free space - all mounts on deployment-cache-upload04 is OK: OK: All targets OK [04:18:39] Yippee, build fixed! [04:18:39] Project selenium-MultimediaViewer » chrome,beta,OS X 10.9,contintLabsSlave && UbuntuTrusty build #19: 09FIXED in 22 min: https://integration.wikimedia.org/ci/job/selenium-MultimediaViewer/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=OS%20X%2010.9,label=contintLabsSlave%20&&%20UbuntuTrusty/19/ [05:12:02] RECOVERY - Free space - all mounts on deployment-fluorine is OK: OK: All targets OK [06:35:10] 10Beta-Cluster-Infrastructure, 06Labs, 06Operations, 10Traffic: deployment-cache-upload04 (m1.medium) / is almost full - https://phabricator.wikimedia.org/T135700#2311660 (10Joe) @hashar that was exactly my plan [08:19:03] jzerebecki: this is amazing, wikidata browser test run, using new job [08:19:04] https://integration.wikimedia.org/ci/view/Selenium/job/selenium-Wikidata-289396/9/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/consoleFull [08:19:25] 210 scenarios (2 failed, 4 skipped, 204 passed) [08:21:45] nice [08:22:27] zeljkof: btw why did you abandon quite a few of your patches? [08:23:00] jzerebecki: thought of a better way to do it [08:23:10] can always restore them [08:31:42] Yippee, build fixed! [08:31:42] Project selenium-VisualEditor » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #24: 09FIXED in 3 min 34 sec: https://integration.wikimedia.org/ci/job/selenium-VisualEditor/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/24/ [08:52:26] ostriches: Is there a git hook we can use to block refs/changes/* from being pushed [08:52:28] please [08:54:29] but even then mw core was failing to push [08:54:56] due to it having 60 thousond plus commits [08:55:00] in refs/heads/* [09:23:42] 10Browser-Tests-Infrastructure, 13Patch-For-Review, 15User-zeljkofilipin: Ownership of Selenium tests - https://phabricator.wikimedia.org/T134492#2311869 (10zeljkofilipin) [09:24:16] 10Browser-Tests-Infrastructure, 13Patch-For-Review, 15User-zeljkofilipin: Ownership of Selenium tests - https://phabricator.wikimedia.org/T134492#2267217 (10zeljkofilipin) [09:27:46] 10Browser-Tests-Infrastructure, 13Patch-For-Review, 15User-zeljkofilipin: Ownership of Selenium tests - https://phabricator.wikimedia.org/T134492#2311878 (10zeljkofilipin) >>! In T134492#2309747, @Mattflaschen-WMF wrote: >> mediawiki/extensions/GettingStarted > > I thought it was enough to confirm I was lis... [09:33:14] can anyone kick jenkins? [09:33:16] or zuul? [09:33:27] there's a gate-and-submit job that's been running for 7 hours [09:33:28] https://integration.wikimedia.org/zuul/ [09:36:51] legoktm jzerebecki ^^ [09:37:00] or zeljkof ^^ [09:37:28] please [09:37:51] phuedx: I killed the build. That should clear it out. [09:38:07] Thanks James_F [09:38:39] phuedx: OK for me to re-+2? [09:39:11] (03PS1) 10Zfilipin: Matt and Sam are owners of selenium-GettingStarted job [integration/config] - 10https://gerrit.wikimedia.org/r/289828 (https://phabricator.wikimedia.org/T134492) [09:40:30] (03CR) 10Zfilipin: "Please +1/+2 and I will merge the commit and update the job." [integration/config] - 10https://gerrit.wikimedia.org/r/289828 (https://phabricator.wikimedia.org/T134492) (owner: 10Zfilipin) [09:41:59] James_F: go for it! 
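Regarding the 08:52 question above about blocking pushes to refs/changes/*: below is a minimal sketch of a server-side update hook that would do this on a plain bare git remote. The hook path and the idea of enforcing it with a hook at all are assumptions; Gerrit itself normally gates refs through its access rules rather than plain git hooks.

```bash
#!/bin/bash
# Hypothetical hooks/update script for a bare git repo (not Gerrit itself):
# reject any push that tries to create or update a ref under refs/changes/*.
refname="$1"   # e.g. refs/changes/28/289828/1
oldrev="$2"
newrev="$3"

case "$refname" in
  refs/changes/*)
    echo "error: pushing to ${refname} is not allowed here" >&2
    exit 1
    ;;
esac

exit 0
```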
[09:42:00] \o/ [09:42:16] :-) [09:43:48] (03CR) 10Phuedx: [C: 031] Matt and Sam are owners of selenium-GettingStarted job [integration/config] - 10https://gerrit.wikimedia.org/r/289828 (https://phabricator.wikimedia.org/T134492) (owner: 10Zfilipin) [10:22:21] (03CR) 10Zfilipin: "The last job run was almost green!" [integration/config] - 10https://gerrit.wikimedia.org/r/289396 (https://phabricator.wikimedia.org/T128190) (owner: 10Zfilipin) [10:24:57] (03PS2) 10Zfilipin: Created selenium-Wikidata Jenkins job [integration/config] - 10https://gerrit.wikimedia.org/r/289396 (https://phabricator.wikimedia.org/T128190) [10:25:05] (03PS3) 10Zfilipin: Created selenium-Wikidata Jenkins job [integration/config] - 10https://gerrit.wikimedia.org/r/289396 (https://phabricator.wikimedia.org/T128190) [10:25:09] 10Browser-Tests-Infrastructure, 13Patch-For-Review, 15User-zeljkofilipin: Ownership of Selenium tests - https://phabricator.wikimedia.org/T134492#2267217 (10Nemo_bis) When the table is ready, remember to move it to the wiki. The logical place would be https://www.mediawiki.org/wiki/Developers/Maintainers [10:25:56] (03CR) 10Zfilipin: "I think this is ready to get merged. T128097 is no longer blocking this." [integration/config] - 10https://gerrit.wikimedia.org/r/289396 (https://phabricator.wikimedia.org/T128190) (owner: 10Zfilipin) [10:30:57] (03PS4) 10Zfilipin: Created selenium-Wikidata Jenkins job [integration/config] - 10https://gerrit.wikimedia.org/r/289396 (https://phabricator.wikimedia.org/T128190) [10:55:01] 10Browser-Tests-Infrastructure, 10Wikidata, 13Patch-For-Review, 15User-zeljkofilipin: Merge tests/browser/environments.yml and tests/browser/config/config.yml in WikidataBrowserTests - https://phabricator.wikimedia.org/T128097#2312116 (10zeljkofilipin) a:05zeljkofilipin>03None [11:48:10] 06Release-Engineering-Team, 13Patch-For-Review, 05Release, 05WMF-deploy-2016-05-17_(1.28.0-wmf.2): MW-1.28.0-wmf.2 deployment blockers - https://phabricator.wikimedia.org/T134450#2312259 (10Pokefan95) [12:08:31] ostriches: GitHub i think limit uploading to 100mb [12:08:41] So we go over the limit eith refs/changes [12:53:25] twentyafterfour and ostriches i found a way to mirror mw core without refs/changes being mirrored. and without manua [12:53:36] But open changes will still be mirrored to github [12:53:44] Just wont be the working copy [12:55:24] All we have to do is go in to mw core and add [12:55:25] [remote "https://github.com/wikimedia/mediawiki"] [12:55:25] url = https://github.com/wikimedia/mediawiki [12:55:25] push = +refs/heads/*:refs/heads/* [12:55:25] push = +refs/tags/*:refs/tags/* [12:55:25] threads = 3 [12:55:38] to config in the mw core repo directory [12:56:22] It is a better solution then mirroing refs/changes directory, commits from refs/changes is still mirrored [12:56:36] due to the fact they are saved in objects instead of refs/* [13:45:35] Project selenium-VisualEditor » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #25: 04FAILURE in 1 min 35 sec: https://integration.wikimedia.org/ci/job/selenium-VisualEditor/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/25/ [13:49:50] Yippee, build fixed! 
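For readability, the `[remote]` block pasted at 12:55 above can be expressed as git commands. The bare-repo path and remote name below are made up for illustration, and the `threads = 3` key from the paste looks like Gerrit replication-plugin syntax rather than something plain git understands, so it is only noted in a comment.

```bash
# Sketch of the mirroring setup quoted above: push only branches and tags to
# GitHub, so refs/changes/* stays behind. Repo path and remote name are
# assumptions for the example.
cd /srv/git/mediawiki-core.git    # hypothetical bare repo location

git remote add github https://github.com/wikimedia/mediawiki
git config --add remote.github.push '+refs/heads/*:refs/heads/*'
git config --add remote.github.push '+refs/tags/*:refs/tags/*'
# "threads = 3" from the pasted block appears to be Gerrit replication.config
# syntax; plain `git push` has no per-remote threads setting.

git push github    # uses only the push refspecs configured above
```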
[13:49:50] Project selenium-VisualEditor » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #26: 09FIXED in 3 min 28 sec: https://integration.wikimedia.org/ci/job/selenium-VisualEditor/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/26/ [13:52:17] 05Gerrit-Migration, 10Diffusion, 10GitHub-Mirrors, 06Repository-Admins: Have Phabricator take over replication to Github - https://phabricator.wikimedia.org/T115624#2312556 (10Paladox) @QChris hi could you when you create repos in gerrit and also create them in diffusion please add the mirror to diffusion... [14:05:26] ostriches and twentyafterfour that prevents any new open changes from being mirrored there [14:05:39] so manually adding refs/changes is the solution now [14:12:34] Yippee, build fixed! [14:12:34] Project selenium-MobileFrontend » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #20: 09FIXED in 26 min: https://integration.wikimedia.org/ci/job/selenium-MobileFrontend/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/20/ [14:12:58] 05Gerrit-Migration, 10Diffusion, 10GitHub-Mirrors, 06Repository-Admins: Have Phabricator take over replication to Github - https://phabricator.wikimedia.org/T115624#2312665 (10Paladox) These repos have been turned on for diffusion mirroring to GitHub.. * integration/config * operations/mediawiki/config... [14:49:32] gate-and-submit jobs backing up again [14:53:28] 10scap: l10nupdate failing due to sudo rights - https://phabricator.wikimedia.org/T135849#2312828 (10bd808) [14:59:07] Yeah just saw that. [14:59:10] Hmm, 1h49m? [14:59:37] mediawiki-extensions-qunit stuck on teardown [15:00:27] I'm going to kill that job. [15:01:26] mdholloway: Marked it as failed, queue is moving again [15:01:30] Should catch up pretty soon [15:03:02] ostriches: cool thx! [15:19:29] James_F It seems zuul has frozen again https://integration.wikimedia.org/zuul/ [15:19:37] legoktm jzerebecki ^^ [15:19:58] James_F: https://integration.wikimedia.org/ci/job/mediawiki-extensions-qunit/43601/console [15:20:16] This is a second time that it has frozen on qunit within a day [15:25:52] Ugh, same problem. [15:26:02] I just killed a previous one. [15:26:42] ostriches: Thanks [15:26:51] And this one now too [15:26:54] *sigh* [15:26:57] * ostriches goes back to work [15:27:12] Ok [16:51:27] 10scap, 13Patch-For-Review, 15User-bd808: l10nupdate failing due to sudo rights - https://phabricator.wikimedia.org/T135849#2313102 (10matmarex) 05Open>03Resolved a:03bd808 [17:01:17] eek, we're seeing a node version complaint in a merge job for something completely unrelated. "Assertion error: node version v0.10.25 does not match '^v4[.]3[.]'" https://integration.wikimedia.org/ci/job/npm-node-4.3/12380/console [17:06:20] niedzielski: Oh, nodejs 0.10 should not be running on jessie [17:06:26] I wonder why [17:09:08] niedzielski: Im not sure if it is related but we have been having qunit tests freeze today [17:09:30] and also a day or two ago nodepool stopped working [17:10:03] Could you report it please. Hashar may be able to look into that. [17:10:12] It may have been a one time thing but not sure [17:18:01] ostriches: one more qunit sitting on it's thumb in the corner :/ -- https://integration.wikimedia.org/ci/job/mediawiki-core-qunit/65228/console [17:18:15] are you just aborting the job via jenkins UI? [17:18:15] sonofa.... [17:18:23] Good thing I skipped jenkins for all the security patches. 
[17:18:24] Yes [17:19:32] * bd808 locks up jenkins web ui somehow.... [17:23:06] ok. I shot that test run in the head. The hang on that one looked like a comm problem between the runner and the browser [17:44:38] 10scap, 13Patch-For-Review, 15User-bd808: l10nupdate failing due to sudo rights - https://phabricator.wikimedia.org/T135849#2313324 (10bd808) a:05bd808>03thcipriani [18:26:35] ostriches: One of the blocked tasks is still not public. Expected or not? https://phabricator.wikimedia.org/T124940 [18:27:10] Yeah, it wasn't actually included in the release. [18:27:12] Lemme unlink it [18:27:16] ok [19:03:12] (03CR) 10Mattflaschen: [C: 031] Matt and Sam are owners of selenium-GettingStarted job [integration/config] - 10https://gerrit.wikimedia.org/r/289828 (https://phabricator.wikimedia.org/T134492) (owner: 10Zfilipin) [19:45:15] (03PS3) 10Florianschmidtwelzow: Add MGChecker to test pipeline (Verified+2) group [integration/config] - 10https://gerrit.wikimedia.org/r/288514 [20:25:39] ostriches it seems https://integration.wikimedia.org/zuul/ has frozen [20:25:42] again [20:25:46] bd808 ^^ [20:26:31] It seems nodepool has gone down [20:29:07] andrewbogott hi I'm not sure if nodepool is failing again but looking at https://integration.wikimedia.org/zuul/ it is [20:30:17] paladox: I have no idea how to read that page :) [20:30:17] !log Killing https://integration.wikimedia.org/ci/job/mediawiki-extensions-qunit/43608/ which has been running for 5 hours [20:30:23] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [20:30:31] Do you have any info about what's actually going wrong? [20:31:02] andrewbogott Oh no. But I only see that -trusty and -jessie are the ones stalled [20:31:09] I don't have access to check [20:31:27] andrewbogott: it actually looks like the same problem we've seen before. All instances are in a 'delete' state. [20:31:50] I think this problem happened on Wednesday [20:32:06] hashar said they ran out of instances to spawn [20:33:01] time to kill the wabbit :) [20:34:03] I can restart on labcontrol here... [20:34:21] doing [20:34:22] bd808: I'm going to create a task on phabricator about mw qunit tests having been failing all day [20:34:45] paladox: good idea [20:35:17] 10Continuous-Integration-Infrastructure, 05Continuous-Integration-Scaling: mediawiki qunit tests have been failing all day - https://phabricator.wikimedia.org/T135875#2313752 (10Paladox) [20:35:24] bd808 ^^ [20:35:43] 10Continuous-Integration-Infrastructure, 05Continuous-Integration-Scaling: mediawiki qunit tests have been failing all day - https://phabricator.wikimedia.org/T135875#2313764 (10Paladox) [20:35:53] so how to tell if a rabbitmq restart did anything? [20:36:06] * thcipriani looks [20:36:34] I was able to delete an instance, so that's a good sign [20:37:06] the nova-api log is less full of errors about impl_rabbit (yay new access :)) [20:37:19] 10Continuous-Integration-Infrastructure, 05Continuous-Integration-Scaling, 10MediaWiki-General-or-Unknown: mediawiki qunit tests have been failing all day - https://phabricator.wikimedia.org/T135875#2313752 (10Paladox) [20:37:20] still throwing a lot of: [20:37:22] May 20 20:37:02 labnodepool1001 nodepoold[6306]: raise Exception("Timeout waiting for %s" % purpose) [20:37:22] May 20 20:37:02 labnodepool1001 nodepoold[6306]: Exception: Timeout waiting for server 75838c2d-86f2-4311-91d4-3d5104766183 deletion in wmflabs-eqiad [20:37:39] but some are succeeding...
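A rough sketch of the checks being done above, for readers following along. Host roles, service names, and log paths are assumptions reconstructed from the conversation, not a documented runbook.

```bash
# On the nodepool host (labnodepool1001 in the log): count nodes stuck in the
# "delete" state and look for the deletion-timeout exceptions quoted above.
sudo nodepool list | grep -c ' delete '
sudo grep -i 'Timeout waiting for' /var/log/nodepool/nodepool.log | tail   # log path is a guess

# On the labcontrol host: restart rabbitmq (what fixed it this time) and see
# whether any queues are still badly backed up afterwards.
sudo service rabbitmq-server restart
sudo rabbitmqctl list_queues | sort -k2 -n | tail
```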
[20:39:09] yeah, it was rabbit this time [20:39:17] Needs more tuning apparently [20:39:23] man the rabbitmq logs are terrible [20:40:29] CI seems to be recovering; new instances are being built. [20:42:34] thcipriani: It seems in the publish queue the mediawiki change is 0000000 [20:42:47] andrewbogott: is nodepool causing this somehow? Seems like this wasn't much of a problem with just node jobs, now that php is moved it seems like it's gotten a bit worse. this could also be that I'm just more aware of this being a problem. [20:43:18] I don't know. It might be that nodepool is hammering on the api when something fails [20:43:44] it sure looks that way [20:44:01] anecdotally, it totally is. if you do: nodepool delete [instance] it makes a bunch of connections to labnet1002 [20:44:03] and it's timing out but what's the timeout and where is it set? [20:44:24] but the timeout between urllib making connections is...maybe a second? [20:44:47] lots of: INFO urllib3.connectionpool: Starting new HTTP connection (1): labnet1002.eqiad.wmnet [20:45:17] so maybe that's a nodepool bug… it should only take one connection to perform a delete [20:46:01] I mean, I also don't see the response from nova, hopefully it's waiting for a response before making a new connection. [20:46:12] and not just a client timeout. [20:46:54] * thcipriani digs in logs [20:47:52] thcipriani zuul is linking one of the changes to https://gerrit.wikimedia.org/r/gitweb?p=mediawiki/core.git;a=commitdiff;h=0000000000000000000000000000000000000000 [20:48:04] in the publish column [20:48:30] paladox: that seems wrong [20:48:37] Yep [20:48:48] it is naming the change as 0000000 [20:49:07] mediawiki/core [20:50:18] I don't have a lot of fighting zuul knowledge, so I'm not clear on what to do about that. [20:50:58] oh [20:53:10] paladox: where are you seeing that information/where is the documentation for that? [20:54:11] thcipriani: Hi, here https://integration.wikimedia.org/zuul/ and scroll down to publish, which is on the right near the bottom [20:57:13] ah, I see what you're saying now. hmmm. mediawiki-core-doxygen is backed up, too. [20:57:33] (03CR) 10Paladox: [C: 031] Add MGChecker to test pipeline (Verified+2) group [integration/config] - 10https://gerrit.wikimedia.org/r/288514 (owner: 10Florianschmidtwelzow) [20:58:03] thcipriani: Yep. Tests have also been failing all day [20:58:05] qunit [20:58:23] (03CR) 10Luke081515: [C: 031] Add MGChecker to test pipeline (Verified+2) group [integration/config] - 10https://gerrit.wikimedia.org/r/288514 (owner: 10Florianschmidtwelzow) [20:58:25] I reported it here https://phabricator.wikimedia.org/T135875 for qunit [20:59:55] thcipriani: Also mediawiki-core-doxygen-publish has backed up [20:59:59] and is stalled [21:01:52] the job seems to be moving, it is definitely backed-up. Why do you think it's stalled? [21:04:50] thcipriani: Because it was not testing. It just said queued [21:04:56] but yes it seems to be running now [21:10:58] so fwiw, api timeout for nodepool is set to 60 seconds.
http://docs.openstack.org/infra/nodepool/configuration.html it says it prefers the value in clouds.yaml which either doesn't exist (there is no /etc/openstack but there is an /etc/nova) or I don't have access to it (not allowed in /etc/nova) [21:20:08] 60s seems reasonable to me but then again maybe not [21:36:10] ostriches: I think enabling AuthManager broke a shitload of browser tests [21:36:36] since the account creation API changed and they probably used that to set up the test users [21:46:20] zeljkof: any idea where that code lives? [21:46:46] tgr zeljkof I don't think he is online. His name is greyed out [21:51:03] tgr: We can revert if you'd like [21:51:52] 07Browser-Tests: Make Selenium tests work with AuthManager - https://phabricator.wikimedia.org/T135884#2314108 (10Tgr) [21:53:01] 07Browser-Tests: Make Selenium tests work with AuthManager - https://phabricator.wikimedia.org/T135884#2314122 (10Tgr) [21:54:39] ostriches: we probably just need to set $wgDisableAuthManager = true for the CI MediaWiki installs [21:54:48] no clue how to do that though [21:55:59] tgr you do [21:56:01] if ( isset( $wgWikimediaJenkinsCI ) && $wgWikimediaJenkinsCI == true ) { [21:56:08] $wgDisableAuthManager = tru [21:56:12] $wgDisableAuthManager = true [21:56:17] } else { [21:56:24] $wgDisableAuthManager = false; [21:56:25] } [21:56:33] in core? eww [21:56:34] I forgot to add ; to $wgDisableAuthManager = tru [21:56:35] yes [21:56:41] isn't there any separate config file? [21:56:54] surely the test machines need their own config tweaks [21:57:02] Nope [21:57:09] It is done by if ( isset( $wgWikimediaJenkinsCI ) && $wgWikimediaJenkinsCI == true ) { [21:58:06] If $wgWikimediaJenkinsCI is set, CI will do whatever is inside that if block, overriding all the defaults [22:02:07] at a glance https://github.com/wikimedia/integration-jenkins/blob/master/bin/mw-apply-settings.sh seems like the place [22:03:12] Ok [22:06:42] 10Continuous-Integration-Infrastructure, 05Continuous-Integration-Scaling: Nodepool has gotten alot unstable since introducing php tests - https://phabricator.wikimedia.org/T135885#2314186 (10Paladox) [22:06:58] thcipriani: I've created https://phabricator.wikimedia.org/T135885 [22:07:04] for nodepool instability. [22:07:09] since introducing php tests [22:07:29] paladox: thank you [22:07:54] also it seems it needs more instances since it is only running 2 for trusty but can go past 2 if there is enough left [22:07:57] you're welcome [22:11:47] 10Continuous-Integration-Infrastructure, 05Continuous-Integration-Scaling: Nodepool has gotten alot unstable since introducing php tests - https://phabricator.wikimedia.org/T135885#2314246 (10Paladox) We should probably add support for monitoring nodepool. Which if it fails should notify channels #wikimedia-op... [22:14:22] 07Browser-Tests: Make Selenium tests work with AuthManager - https://phabricator.wikimedia.org/T135884#2314274 (10Tgr) In the shorter term, `$wgDisableAuthManager = true` should probably be set for CI MediaWiki installs which run browser tests (but not the ones which run PHPUnit). Or if that's not possible just...
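One way the short-term workaround from T135884 could be wired up, in the spirit of the mw-apply-settings.sh script linked at 22:02 above. The install path, the guard on $wgWikimediaJenkinsCI, and doing this from a shell step at all are assumptions for illustration, not the actual CI implementation.

```bash
# Hypothetical CI settings step: append the AuthManager kill switch to the
# test wiki's LocalSettings.php. The MW_INSTALL_PATH default is made up.
MW_INSTALL_PATH="${MW_INSTALL_PATH:-/workspace/src}"

cat >> "${MW_INSTALL_PATH}/LocalSettings.php" <<'PHP'
// CI-only override while the Selenium suites still create accounts through
// the pre-AuthManager API (see T135884).
if ( isset( $wgWikimediaJenkinsCI ) && $wgWikimediaJenkinsCI === true ) {
	$wgDisableAuthManager = true;
}
PHP
```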
[22:16:06] ostriches: summarized in ^, I'll leave the decision to revert or find a different fix to somone who knows more about the CI setup than I do [22:20:15] 06Release-Engineering-Team, 06Project-Admins, 15User-greg: Add in Phabricator quarterly milestones for RelEng - https://phabricator.wikimedia.org/T75729#2314298 (10Danny_B) [22:54:55] tgr: I bet the easier thing for the weekend would be to just revert for now [22:55:00] We can revisit Monday [22:55:32] works for me [22:56:27] https://gerrit.wikimedia.org/r/#/c/289972/ [23:04:29] 07Browser-Tests: Make Selenium tests work with AuthManager - https://phabricator.wikimedia.org/T135884#2314108 (10demon) Reverted out of master for now, we'll revisit Monday prior to cutting the RC. [23:51:01] anyone around from releng? we are trying to merge security patches from 1.25 but have run into some problems [23:51:45] Tons of failing tests - maybe mediawiki-phpunit-php55-trusty is out of date? [23:53:21] greg-g: are you around? [23:53:53] we might just need to switch the fundraising/REL1_25 branch to using the same tests as regular REL1_25 [23:54:23] that would probably be a good place to start [23:54:35] * bd808 goes to look at config [23:56:01] bd808: thanks. yeah for instance here the tests are different: https://gerrit.wikimedia.org/r/#/c/254419/ [23:56:41] however the last two patches show different tests: https://gerrit.wikimedia.org/r/#/c/289908/ [23:56:46] and were force merged [23:57:05] where can I see one of yours? I'm not seeing an obvious difference for the fundraising stuff in the zuul files [23:57:16] bd808: https://gerrit.wikimedia.org/r/#/c/289974/ [23:57:45] i was confused to see it run different tests on pushing a patch vs. gate+submit [23:58:26] I think some of that has been split up to make the initial test faster [23:59:01] ah sure