[00:35:41] (03CR) 10Legoktm: "That's what I thought too....in any case it needs to be re-done." [integration/config] - 10https://gerrit.wikimedia.org/r/267829 (owner: 10Legoktm) [00:35:45] (03Abandoned) 10Legoktm: Fix php53/php55 regexes. Regex is hard [integration/config] - 10https://gerrit.wikimedia.org/r/267829 (owner: 10Legoktm) [00:38:48] legoktm[NE]: So… https://gerrit.wikimedia.org/r/#/c/268047/ happening? :-) [00:39:55] Yeah, I have to rebase the parent first [00:39:59] * James_F nods. [00:40:00] Yay. [00:40:14] Because of course people changed the code after I moved it around :P [00:41:25] * James_F blames everyone else. [00:50:04] (03PS2) 10Legoktm: Use a wrapper parameter-function so jobs can have multiple functions [integration/config] - 10https://gerrit.wikimedia.org/r/268031 (https://phabricator.wikimedia.org/T125498) [00:50:22] (03CR) 10Legoktm: [C: 04-1] "PS2: Only rebased" [integration/config] - 10https://gerrit.wikimedia.org/r/268031 (https://phabricator.wikimedia.org/T125498) (owner: 10Legoktm) [00:50:51] 7Blocked-on-RelEng, 10Continuous-Integration-Infrastructure, 6Labs, 10Tool-Labs: debian-glue tries to fetch obsolete package - https://phabricator.wikimedia.org/T125999#2006998 (10scfc) The triggering package has now moved to `groff-base` and others (https://integration.wikimedia.org/ci/job/debian-glue/89/... [00:52:58] (03CR) 10Legoktm: Use a wrapper parameter-function so jobs can have multiple functions (033 comments) [integration/config] - 10https://gerrit.wikimedia.org/r/268031 (https://phabricator.wikimedia.org/T125498) (owner: 10Legoktm) [00:53:40] (03PS3) 10Legoktm: Use a wrapper parameter-function so jobs can have multiple functions [integration/config] - 10https://gerrit.wikimedia.org/r/268031 (https://phabricator.wikimedia.org/T125498) [00:55:37] (03CR) 10Legoktm: [C: 032] Use a wrapper parameter-function so jobs can have multiple functions [integration/config] - 10https://gerrit.wikimedia.org/r/268031 (https://phabricator.wikimedia.org/T125498) (owner: 10Legoktm) [00:56:49] (03Merged) 10jenkins-bot: Use a wrapper parameter-function so jobs can have multiple functions [integration/config] - 10https://gerrit.wikimedia.org/r/268031 (https://phabricator.wikimedia.org/T125498) (owner: 10Legoktm) [00:57:38] !log deploying https://gerrit.wikimedia.org/r/268031 [00:57:41] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [01:07:11] 10Continuous-Integration-Config, 7WorkType-NewFunctionality: Zuul can only apply one parameter function. Prevent us from injecting both php55 and extension dependencies - https://phabricator.wikimedia.org/T125498#2007011 (10Jdforrester-WMF) a:3Legoktm [01:14:53] 10Continuous-Integration-Config, 10MediaWiki-extensions-ContentTranslation, 5ContentTranslation-Release8, 5Patch-For-Review, 7WorkType-Maintenance: ContentTranslation builds fail because of missing UniversalLanguageSelector dependency - https://phabricator.wikimedia.org/T125495#2007030 (10Legoktm) [01:14:56] 10Continuous-Integration-Config, 7WorkType-NewFunctionality: Zuul can only apply one parameter function. Prevent us from injecting both php55 and extension dependencies - https://phabricator.wikimedia.org/T125498#2007028 (10Legoktm) 5Open>3Resolved Implemented. I'll file a follow-up for writing tests for s... 
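(For context on the "wrapper parameter-function" change merged above, T125498: Zuul v2 lets a job reference only a single `parameter-function` in `layout.yaml`, which is why injecting both the PHP flavour and extension dependencies clashed. The sketch below only illustrates the wrapper idea; the helper names and parameter keys are placeholders, not the exact code in integration/config.)

```python
# Illustrative sketch of a Zuul v2 wrapper parameter-function.
# layout.yaml can name only one function per job, so one set_parameters()
# entry point dispatches to the individual helpers.
# Helper names and parameter keys below are placeholders.

def set_php_bin(item, job, params):
    # Pick the PHP interpreter from the job name suffix.
    if job.name.endswith('-php55'):
        params['PHP_BIN'] = 'php5'
    elif job.name.endswith('-hhvm'):
        params['PHP_BIN'] = 'hhvm'


def set_extension_dependencies(item, job, params):
    # Placeholder: inject dependent extensions for mwext jobs.
    if params.get('ZUUL_PROJECT', '').startswith('mediawiki/extensions/'):
        params.setdefault('EXT_DEPENDENCIES', '')


def set_parameters(item, job, params):
    """Single entry point referenced from zuul/layout.yaml."""
    set_php_bin(item, job, params)
    set_extension_dependencies(item, job, params)
```

(Splitting the logic this way is also what makes the follow-up task about unit tests, T126182 below, tractable: each helper can be exercised with a stub job object and a plain dict of parameters.)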
[01:17:03] 10Continuous-Integration-Config, 7Technical-Debt: Write unit tests for set_parameters() function in zuul config - https://phabricator.wikimedia.org/T126182#2007031 (10Legoktm) 3NEW [01:54:30] (03PS4) 10Legoktm: Set up php55lint jobs, adapt phplint macro for $PHP_BIN (re-do) [integration/config] - 10https://gerrit.wikimedia.org/r/268047 [01:59:51] (03CR) 10Legoktm: [C: 032] Set up php55lint jobs, adapt phplint macro for $PHP_BIN (re-do) [integration/config] - 10https://gerrit.wikimedia.org/r/268047 (owner: 10Legoktm) [02:01:02] Yay. [02:01:13] Only 20 minutes of the pipeline outstanding. ;-) [02:09:03] 10Deployment-Systems, 3Scap3, 5Patch-For-Review: Make puppet provider for scap3 - https://phabricator.wikimedia.org/T113072#2007099 (10mmodell) I tried to install the backport on my test instance (deploy.eqiad.wmflabs) but I got an error: https://github.com/ptomulik/puppet-backport_package_settings/issues/1 [02:09:42] (03Merged) 10jenkins-bot: Set up php55lint jobs, adapt phplint macro for $PHP_BIN (re-do) [integration/config] - 10https://gerrit.wikimedia.org/r/268047 (owner: 10Legoktm) [02:09:49] !log deploying https://gerrit.wikimedia.org/r/268047 [02:09:52] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [02:50:50] 10Continuous-Integration-Infrastructure: PHP5.5 tests say tidy is not installed, even though it appears to be installed - https://phabricator.wikimedia.org/T124801#2007111 (10Legoktm) ``` legoktm@integration-slave-trusty-1014:~$ php5 -i | grep tidy /etc/php5/cli/conf.d/20-tidy.ini, tidy tidy.clean_output => no v... [02:58:06] 10Continuous-Integration-Infrastructure: CI trusty slaves do not have php5-apcu installed - https://phabricator.wikimedia.org/T124800#2007113 (10Legoktm) Hrm...those are all about the old APC extension, and problems caused by the opcode caching part of it. That part is now in PHP itself, and only the userland ca... [02:59:49] 10Continuous-Integration-Infrastructure: CI trusty slaves do not have php5-apcu installed - https://phabricator.wikimedia.org/T124800#2007114 (10Legoktm) p:5Triage>3Low The tests being skipped are minimal, so I don't think this is a blocker to the php55 transition. And HHVM runs them as well. [03:09:15] 10Continuous-Integration-Infrastructure: PHP5.5 tests say tidy is not installed, even though it appears to be installed - https://phabricator.wikimedia.org/T124801#2007116 (10Legoktm) https://integration.wikimedia.org/ci/job/mediawiki-phpunit-php53/359/consoleFull apparently php53 isn't running these tests, so n... [03:20:30] 10Continuous-Integration-Infrastructure: Zend tests say tidy is not installed, even though it appears to be installed; they're run with HHVM - https://phabricator.wikimedia.org/T124801#2007118 (10Jdforrester-WMF) [03:21:29] * legoktm[NE] stabs "zend" [03:21:48] :-) [03:21:50] 10Continuous-Integration-Infrastructure: PHP53 & PHP55 tests say tidy is not installed, even though it appears to be installed; they're run with HHVM - https://phabricator.wikimedia.org/T124801#2007120 (10Legoktm) [03:22:20] Eww. [03:22:23] For each new Zend version we add we're going to edit the title to be longer? :-) [03:22:41] PHP53 & PHP55 & PHP56 & PHP57 tests say… [03:23:32] lol [03:23:47] I hope we'll have fixed the bug by then ;) [03:24:14] Your optimism bias is showing. ;-) [03:25:38] legoktm: So… are we good to throw the switch? 
[03:25:49] don't think so [03:25:53] I updated https://www.mediawiki.org/wiki/User:Legoktm/PHP_5.5 [03:26:07] I'll try and do the next two bullet points today [03:26:10] Wait, what? [03:26:25] ? [03:26:27] When did we suddenly add yet another 48 hour delay? [03:26:47] Because I expect adding the php55 jobs are going to break a bunch of extensions [03:27:15] Waiting 'til the SessionManager train isn't a great way to get them fixed. :-) [03:27:16] And I don't think it's a great idea to do the branch cut with CI broken [03:27:25] Hrmmm... [03:27:34] I forgot about that :P [03:27:47] Yeeeah, clearly. :-) [03:27:51] So the concern here is: [03:28:23] Extensions that are on the cluster work on PHP53 and HHVM but not in PHP55 in master; if we needed to adjust/backport it'd be a pain. [03:28:37] Is this likely? [03:31:03] More like: extensions on the cluster work functionally fine on PHP53 and HHVM, but only pass tests in PHP53, and are broken in PHP55 and HHVM [03:31:09] Most extensions don't actually run HHVM tests :S [03:31:43] They don't? Eww. [03:31:46] Only ones that are a part of the shared job do [03:32:02] * James_F sighs. [03:32:46] If they functionally work in HHVM they probably functionally work in PHP55, so it's 'just' test failures that block backports, right? [03:33:28] We could pin PHP53 to -wmf.13 for a while to make this week easier, maybe? [03:34:24] Yeah. [03:35:00] We can't keep running php53 jobs if core master bumps the version requirement [03:35:31] RegExs let us do totally random CI jobs for the -wmf.13 branches. [03:35:43] Not that that's a good idea, but we have easy emergency measures. [03:36:08] Every php53 job run against master *after* the core version bump requirement is going to fail... [03:36:33] And if the version bump goes in wmf.13, php53 jobs against that will fail too. [03:39:33] I was assuming we'd wait on the actual version bump until after the cut. [03:39:51] Then we shouldn't do the CI change yet [03:41:18] I'm objecting to sending an e-mail inviting people to try to veto it again. [03:41:31] No happiness will arise from that. [03:43:56] Right, so the email should be 'This is going to happen on Tuesday, you should check that your extensions pass. If not, lets file bugs and figure stuff out. But it's going to happen anyways." [03:44:23] Even then, people will try to say 'no'. :-( [03:52:56] People are still trying to say no :P [03:53:00] Well, one person is. [03:53:50] Indeed. [03:57:52] legoktm: Should we (you?) aim to do it at 13:00 on Tuesday? [03:58:48] Yeah, that sounds good [03:59:21] Cool. [03:59:35] Also https://gerrit.wikimedia.org/r/#/c/268951/ from F.lorianSW would be nice to merge. [03:59:44] * James_F grumbles about 'read only' repos that aren't. 
[06:11:47] !log tgr set $wgAuthenticationTokenVersion on beta cluster (test run for T124440) [06:11:49] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [06:12:31] gah, I can never find my way around the labs SALs [06:12:35] thanks [06:13:39] yeah they are kind of a mess [06:31:01] PROBLEM - puppet last run on gallium is CRITICAL: CRITICAL: Puppet has 2 failures [06:59:20] RECOVERY - puppet last run on gallium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [09:31:32] (03PS1) 10Legoktm: Rename composer-test jobs for packages to "composer-package-{phpflavor}" [integration/config] - 10https://gerrit.wikimedia.org/r/269093 [09:31:34] (03PS1) 10Legoktm: Add composer-package-php55 job [integration/config] - 10https://gerrit.wikimedia.org/r/269094 [09:36:00] !log restarting integration puppetmaster (out of memory / cannot fork) [09:36:02] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [09:38:21] Yippee, build fixed! [09:38:21] Project browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #763: 09FIXED in 1 min 20 sec: https://integration.wikimedia.org/ci/job/browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce/763/ [09:39:01] (03CR) 10Legoktm: [C: 032] Rename composer-test jobs for packages to "composer-package-{phpflavor}" [integration/config] - 10https://gerrit.wikimedia.org/r/269093 (owner: 10Legoktm) [09:39:07] (03CR) 10Legoktm: [C: 032] Add composer-package-php55 job [integration/config] - 10https://gerrit.wikimedia.org/r/269094 (owner: 10Legoktm) [09:40:42] (03Merged) 10jenkins-bot: Rename composer-test jobs for packages to "composer-package-{phpflavor}" [integration/config] - 10https://gerrit.wikimedia.org/r/269093 (owner: 10Legoktm) [09:40:58] (03Merged) 10jenkins-bot: Add composer-package-php55 job [integration/config] - 10https://gerrit.wikimedia.org/r/269094 (owner: 10Legoktm) [09:41:08] !log deploying https://gerrit.wikimedia.org/r/269093 https://gerrit.wikimedia.org/r/269094 [09:41:10] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [09:44:47] legoktm: :-} [09:45:59] my list on https://www.mediawiki.org/wiki/User:Legoktm/PHP_5.5 keeps getting longer and longer :P [09:48:09] hashar: do you know if there's a way to get the list of repos that trigger a specific job in zuul? [09:48:36] legoktm: potentially using the integration/config test suite test_zuul_scheduler [09:48:46] it loads Zuul scheduler and make it load the layout.yaml [09:48:58] so from there you have a state of pipelines -> repos -> jobs [09:49:04] but no tool out of the box :( [09:49:54] ah, good enough for my purposes, thanks :) [10:24:29] 10Deployment-Systems, 3Scap3: Scap3 needs a way to handle large binary file transport - https://phabricator.wikimedia.org/T119443#2007425 (10fgiunchedi) git-annex has released a feature to make it even easier to work with large binary files, https://git-annex.branchable.com/tips/unlocked_files/ [10:48:50] 7Blocked-on-RelEng, 10Continuous-Integration-Infrastructure, 6Labs, 10Tool-Labs, 5Patch-For-Review: debian-glue tries to fetch obsolete package - https://phabricator.wikimedia.org/T125999#2007473 (10hashar) I am trying to update the cow image manually with: ``` jenkins-deploy@integration-slave-jessie-100... [10:55:12] http://fpaste.org/319796/92890614/raw/ hehe [11:03:13] (03CR) 10Paladox: "@Legoktm it seems that this test runs php 5.6 instead of 5.5. 
Please see https://integration.wikimedia.org/ci/job/composer-package-php55/1" [integration/config] - 10https://gerrit.wikimedia.org/r/269094 (owner: 10Legoktm) [11:04:00] whoops [11:05:55] :D [11:06:20] (03CR) 10Legoktm: "Oops, you're right. Let me see how to fix that..." [integration/config] - 10https://gerrit.wikimedia.org/r/269094 (owner: 10Legoktm) [11:07:23] legoktm: my bet is that a bunch of script references 'php' explicitly [11:07:33] legoktm: stuff like hardcoded #!/usr/bin env php [11:07:39] yeah,,, [11:08:40] legoktm@integration-slave-trusty-1011:~$ php5 /srv/deployment/integration/composer/vendor/bin/composer -v [11:08:43] so that works [11:08:44] legoktm: my idea was to have a 'php' slave script that would switch case [11:08:48] and have it injected ini the PATH [11:09:22] (03CR) 10Paladox: "Ok thanks." [integration/config] - 10https://gerrit.wikimedia.org/r/269094 (owner: 10Legoktm) [11:10:25] hmm [11:10:47] that would let us centralize the $PHP_BIN logic [11:11:20] but that would require us to do a bunch more puppet changes because we'd have to override the current stuff that makes php == hhvm on trusty [11:12:16] What I'd really like is a macro that sets up the phpenv thing :) [11:14:29] 10Continuous-Integration-Config: Set up composer-test for all MW extensions where it isn't broken - https://phabricator.wikimedia.org/T124342#2007508 (10Paladox) @Legoktm we could use the skip-if function in Zuul so that the repos we know will fail we can skip. [11:14:47] 7Blocked-on-RelEng, 10Continuous-Integration-Infrastructure, 6Labs, 10Tool-Labs, 5Patch-For-Review: debian-glue tries to fetch obsolete package - https://phabricator.wikimedia.org/T125999#2007509 (10hashar) I tried tweaking the $basepath in puppet, but that is not the issue actually though we should stil... [11:16:56] legoktm: a long term solution would be to have 5.3 / 5.5 / 5.6 / hhvm etc installed in parallel in /opt/versionXXX or something [11:17:16] then use debian alternative system to have /usr/bin/php --> /etc/alternatives/php --> /opt/version5.5/bin/php set for us [11:17:22] but would need to do that in nodepool instances [11:17:45] I wanna migrate the nodejs/npm jobs there first. it is being delayed :( [11:34:18] oh come on [11:34:39] even if you change the `php` that the composer process is invoked by, we need to change all the parallel-lint stuff [11:34:41] and phpunit [11:34:41] grr [11:35:30] okay, I think we need to change /usr/bin/php... [11:46:16] 10Continuous-Integration-Infrastructure: Make /usr/bin/php a wrapper that picks the right PHP version on CI slaves - https://phabricator.wikimedia.org/T126211#2007557 (10Legoktm) 3NEW [11:47:14] hashar: ^ if you have any thoughts you want to add, I'll try and tackle that in the morning. good night :) [11:47:28] legoktm: will try! good night :-} [12:01:33] 10Continuous-Integration-Infrastructure: Make /usr/bin/php a wrapper that picks the right PHP version on CI slaves - https://phabricator.wikimedia.org/T126211#2007583 (10hashar) On Debian systems, `php` is managed by the 'alternative` system. When we wanted to run tests with HHVM I have split the jobs so that Z... 
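(To make the idea discussed above concrete: the proposal is a small `php` shim placed early in `$PATH` on the slaves, so tools that hardcode `php` — shebangs, composer scripts, parallel-lint, phpunit — still land on the interpreter chosen per job via `$PHP_BIN`. This is only a sketch of the dispatch logic, written in Python for illustration; the real slave script in Gerrit change 269109 and the exact binary paths may differ.)

```python
#!/usr/bin/env python
# Hypothetical "php" shim for CI slaves: dispatch to the interpreter named by
# the PHP_BIN job parameter. Paths below are assumptions for illustration,
# not the provisioned layout.
import os
import sys

FLAVOURS = {
    'php53': '/usr/bin/php5',   # Precise slaves ship PHP 5.3 as php5
    'php55': '/usr/bin/php5',   # Trusty slaves ship PHP 5.5 as php5
    'hhvm': '/usr/bin/hhvm',
}

flavour = os.environ.get('PHP_BIN', 'hhvm')
binary = FLAVOURS.get(flavour, flavour)
# Replace the current process so exit codes and signals pass through untouched.
os.execvp(binary, [binary] + sys.argv[1:])
```

(The Debian alternatives route hashar mentions — /usr/bin/php -> /etc/alternatives/php -> a versioned install under /opt — would achieve the same effect at the packaging level, but needs the parallel PHP builds provisioned first, including on Nodepool instances.)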
[12:01:38] (03PS1) 10Hashar: (WIP) php flavored entry point (WIP) [integration/jenkins] - 10https://gerrit.wikimedia.org/r/269109 (https://phabricator.wikimedia.org/T126211) [12:02:45] 10Continuous-Integration-Infrastructure, 5Patch-For-Review: Make /usr/bin/php a wrapper that picks the right PHP version on CI slaves - https://phabricator.wikimedia.org/T126211#2007587 (10hashar) https://gerrit.wikimedia.org/r/269109 Lame https://gerrit.wikimedia.org/r/269109 being a slave script bin/php: ```... [12:51:00] 7Blocked-on-RelEng, 10Continuous-Integration-Infrastructure, 6Labs, 10Tool-Labs, 5Patch-For-Review: debian-glue tries to fetch obsolete package - https://phabricator.wikimedia.org/T125999#2007646 (10hashar) The reason for the symlink of sid/unstable is {T111097} [12:54:55] 10Continuous-Integration-Infrastructure, 6Labs, 10Tool-Labs, 5Patch-For-Review, 7WorkType-Maintenance: Change sid pbuilder image name to 'unstable' - https://phabricator.wikimedia.org/T111097#2007656 (10hashar) Funny side effect found on {T125999}. The labs/toollabs repo mentions `unstable` and thus the... [12:55:07] Project browsertests-GettingStarted-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #750: 04FAILURE in 1 min 6 sec: https://integration.wikimedia.org/ci/job/browsertests-GettingStarted-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/750/ [12:55:25] hashar: any idea why "full log" link here opens a page that obviously does not have a full log :/ [12:55:26] https://integration.wikimedia.org/ci/view/BrowserTests/view/language-screenshot/job/browsertests-VisualEditor-language-screenshot-os_x_10.10-firefox/82/LANGUAGE_SCREENSHOT_CODE=en,label=contintLabsSlave%20&&%20UbuntuTrusty/console [12:57:33] 7Blocked-on-RelEng, 10Continuous-Integration-Infrastructure, 6Labs, 10Tool-Labs, 5Patch-For-Review: debian-glue tries to fetch obsolete package - https://phabricator.wikimedia.org/T125999#2007660 (10hashar) Something I don't quite understand yet is that the `base-unstable-amd64.cow` is a symlink to `base... [12:58:40] zeljkof: eek [12:58:44] zeljkof: some jenkins bug / oddity :-( [12:59:08] I have noticed that a long time ago, always forget to ask [12:59:32] must be a Jenkins bug of some sort [13:00:00] zeljkof: note how there are two views referred to in the URL /view/BrowserTests/view/language-screenshot [13:00:22] https://integration.wikimedia.org/ci/job/browsertests-VisualEditor-language-screenshot-os_x_10.10-firefox/82/LANGUAGE_SCREENSHOT_CODE=en,label=contintLabsSlave%20&&%20UbuntuTrusty/consoleFull [13:00:24] fails as well [13:00:39] hashar: sorry, in a meeting with aharoni, we have just noticed it, wanted to ping you before I forget [13:01:56] zeljkof: workaround, on the left of the console window use the [View as plain text] https://integration.wikimedia.org/ci/job/browsertests-VisualEditor-language-screenshot-os_x_10.10-firefox/82/LANGUAGE_SCREENSHOT_CODE=en,label=contintLabsSlave%20&&%20UbuntuTrusty/consoleText [13:02:00] that ones work [13:02:10] hashar: great, did not notice that [13:02:17] one of the plugin formatting the console output must have some issue :( [13:03:10] 10Deployment-Systems, 3Scap3, 5Patch-For-Review: Make puppet provider for scap3 - https://phabricator.wikimedia.org/T113072#2007665 (10akosiaris) >>! In T113072#2006936, @mmodell wrote: > It looks like we can't use `package_settings` in the provider because that feature was added in puppet 3.5 and Trusty use... [13:07:01] zeljkof: the HTML is there but hidden. 
Collapsing Console Sections plugin has an issue :-} [13:07:25] hashar: makes sense [13:07:43] I think I have noticed the problem after we have installed the extension [13:17:22] zeljkof: fixed. https://integration.wikimedia.org/ci/job/browsertests-VisualEditor-language-screenshot-os_x_10.10-firefox/82/LANGUAGE_SCREENSHOT_CODE=en,label=contintLabsSlave%20&&%20UbuntuTrusty/consoleFull [13:17:35] hashar: you are quick! :D [13:17:40] zeljkof: I got rid of the broken feature that attempts to create a section for a shell command [13:41:45] I'm looking at how to deploy a new jar for Elasticsearch on labs (context: https://phabricator.wikimedia.org/T109101). Is there some doc on lab's puppetmaster? [13:43:26] This change would require adding a jar to Elasticsearch lib folder, is there a standard way to integrate this with puppet code, or should it be done with a standalone git-fat / git-deploy ? [14:30:33] (03PS1) 10Hashar: dib: stop installing grunt-cli [integration/config] - 10https://gerrit.wikimedia.org/r/269126 (https://phabricator.wikimedia.org/T119143) [14:30:43] (03CR) 10Hashar: [C: 032] dib: stop installing grunt-cli [integration/config] - 10https://gerrit.wikimedia.org/r/269126 (https://phabricator.wikimedia.org/T119143) (owner: 10Hashar) [14:31:43] (03Merged) 10jenkins-bot: dib: stop installing grunt-cli [integration/config] - 10https://gerrit.wikimedia.org/r/269126 (https://phabricator.wikimedia.org/T119143) (owner: 10Hashar) [14:40:57] 10Continuous-Integration-Infrastructure, 10Mathoid, 5Patch-For-Review: Enable node4 for mathoid tests - https://phabricator.wikimedia.org/T124447#2007826 (10hashar) Got nodejs 4.2 provisioned. The mathoid tests seems to fail though https://integration.wikimedia.org/ci/job/npm-node-4.2/4/console :( [14:47:00] (03PS1) 10Hashar: npm-node-4.2 on a bunch of JS services repos [integration/config] - 10https://gerrit.wikimedia.org/r/269129 (https://phabricator.wikimedia.org/T124989) [14:47:10] (03CR) 10Hashar: [C: 032] npm-node-4.2 on a bunch of JS services repos [integration/config] - 10https://gerrit.wikimedia.org/r/269129 (https://phabricator.wikimedia.org/T124989) (owner: 10Hashar) [14:47:58] !log regenerated nodepool reference image (got rid of grunt-cli https://gerrit.wikimedia.org/r/269126 ) [14:48:00] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [14:48:44] (03Merged) 10jenkins-bot: npm-node-4.2 on a bunch of JS services repos [integration/config] - 10https://gerrit.wikimedia.org/r/269129 (https://phabricator.wikimedia.org/T124989) (owner: 10Hashar) [14:50:55] 10Beta-Cluster-Infrastructure, 6Services, 6operations, 5Patch-For-Review: Move Node.JS services to Jessie and Node 4.2 - https://phabricator.wikimedia.org/T124989#2007854 (10hashar) For the sources repositories under mediawiki/services/.* you should now be able to comment `check experimental` to trigger `... 
[14:54:46] (03PS1) 10Hashar: zuul: template npm-node-4.2 [integration/config] - 10https://gerrit.wikimedia.org/r/269131 (https://phabricator.wikimedia.org/T119143) [14:54:54] !log nodepool: refreshed snapshot image , Image ci-jessie-wikimedia-1454942958 in wmflabs-eqiad is ready [14:54:56] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [14:58:15] (03CR) 10Hashar: [C: 032] zuul: template npm-node-4.2 [integration/config] - 10https://gerrit.wikimedia.org/r/269131 (https://phabricator.wikimedia.org/T119143) (owner: 10Hashar) [14:59:56] (03Merged) 10jenkins-bot: zuul: template npm-node-4.2 [integration/config] - 10https://gerrit.wikimedia.org/r/269131 (https://phabricator.wikimedia.org/T119143) (owner: 10Hashar) [15:11:04] PROBLEM - Puppet failure on integration-slave-precise-1011 is CRITICAL: CRITICAL: 12.50% of data above the critical threshold [0.0] [15:13:04] PROBLEM - Puppet failure on deployment-jobrunner01 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [15:13:58] PROBLEM - Puppet failure on deployment-upload is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [15:14:48] (03PS1) 10Hashar: jjb: macro to archive /log/ and allow it to be empty [integration/config] - 10https://gerrit.wikimedia.org/r/269136 [15:14:50] (03PS1) 10Hashar: jjb: npm-node-4.2 archive /log/ (allow it to be empty) [integration/config] - 10https://gerrit.wikimedia.org/r/269137 [15:16:16] 7Browser-Tests, 10VisualEditor, 5Patch-For-Review: Delete or fix failed VisualEditor browsertests Jenkins jobs - https://phabricator.wikimedia.org/T94162#2007907 (10zeljkofilipin) a:3zeljkofilipin [15:16:30] (03CR) 10jenkins-bot: [V: 04-1] jjb: npm-node-4.2 archive /log/ (allow it to be empty) [integration/config] - 10https://gerrit.wikimedia.org/r/269137 (owner: 10Hashar) [15:16:56] (03PS2) 10Hashar: jjb: macro to archive /log/ and allow it to be empty [integration/config] - 10https://gerrit.wikimedia.org/r/269136 [15:16:58] (03PS2) 10Hashar: jjb: npm-node-4.2 archive /log/ (allow it to be empty) [integration/config] - 10https://gerrit.wikimedia.org/r/269137 [15:19:17] PROBLEM - Puppet failure on integration-slave-trusty-1015 is CRITICAL: CRITICAL: 87.50% of data above the critical threshold [0.0] [15:19:30] (03CR) 10Hashar: [C: 032] jjb: npm-node-4.2 archive /log/ (allow it to be empty) [integration/config] - 10https://gerrit.wikimedia.org/r/269137 (owner: 10Hashar) [15:19:38] (03CR) 10Hashar: [C: 032] jjb: macro to archive /log/ and allow it to be empty [integration/config] - 10https://gerrit.wikimedia.org/r/269136 (owner: 10Hashar) [15:22:42] PROBLEM - Puppet failure on deployment-logstash2 is CRITICAL: CRITICAL: 80.00% of data above the critical threshold [0.0] [15:28:35] (03Merged) 10jenkins-bot: jjb: macro to archive /log/ and allow it to be empty [integration/config] - 10https://gerrit.wikimedia.org/r/269136 (owner: 10Hashar) [15:28:37] (03Merged) 10jenkins-bot: jjb: npm-node-4.2 archive /log/ (allow it to be empty) [integration/config] - 10https://gerrit.wikimedia.org/r/269137 (owner: 10Hashar) [15:32:03] (03PS1) 10Hashar: jjb: port {name}-{repository}-npm to nodejs 4.2 [integration/config] - 10https://gerrit.wikimedia.org/r/269142 (https://phabricator.wikimedia.org/T119143) [15:34:26] (03PS1) 10Hashar: citoid: migrate to node4.2 [integration/config] - 10https://gerrit.wikimedia.org/r/269144 (https://phabricator.wikimedia.org/T119143) [15:34:40] (03CR) 10Hashar: [C: 032] citoid: migrate to node4.2 [integration/config] - 
10https://gerrit.wikimedia.org/r/269144 (https://phabricator.wikimedia.org/T119143) (owner: 10Hashar) [15:37:27] (03Merged) 10jenkins-bot: citoid: migrate to node4.2 [integration/config] - 10https://gerrit.wikimedia.org/r/269144 (https://phabricator.wikimedia.org/T119143) (owner: 10Hashar) [15:44:54] (03PS1) 10Hashar: graphoid: add noop jobs [integration/config] - 10https://gerrit.wikimedia.org/r/269147 (https://phabricator.wikimedia.org/T106668) [15:46:17] (03CR) 10Hashar: [C: 032] "@yurik this way you can CR+2 and Zuul will happily merge the change for you instead of you force merging them." [integration/config] - 10https://gerrit.wikimedia.org/r/269147 (https://phabricator.wikimedia.org/T106668) (owner: 10Hashar) [15:46:34] hashar, saw your patches, i need to do? [15:46:48] yurik: hello :) [15:46:53] hi ) [15:46:55] PROBLEM - Puppet failure on deployment-elastic05 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [15:46:58] yurik: seems npm install against Graphoid requires a bunch of -dev dependencies [15:47:03] yurik: and they are not on the CI slaves :D [15:47:19] hashar, i think its not the dev deps, its the apt-get deps [15:47:27] canvas uses some native code [15:47:32] yurik: but they are in puppet. So one would need to extract the dependency list in a standalone puppet class so we can get those deps installed on CI ( https://phabricator.wikimedia.org/T119693 ) [15:47:55] meanwhile [15:48:04] I am adding a job "noop" which always succeed :-} [15:48:09] hashar, i moved the "vega" dep from required to required-dev [15:48:42] in reality it should be part of req [15:48:52] noops are good ) [15:49:18] yurik: and if you comment in gerrit "check experimental", that will cause Zuul to run npm install && npm test using nodejs 4.2 :-} [15:49:27] though still without the -dev .deb packages [15:52:18] hashar, which dev deps are needed/fail? [15:52:41] all "dependencies" are for regular usage, not dev [15:52:48] (inside the "deploy") [15:52:59] RECOVERY - Puppet failure on deployment-jobrunner01 is OK: OK: Less than 1.00% above the threshold [0.0] [15:53:17] yurik: I am referring to system packages [15:53:52] seems there libjpeg*-dev libcairo*-dev needed according to an old task I filled about it https://phabricator.wikimedia.org/T119693 [15:54:01] RECOVERY - Puppet failure on deployment-upload is OK: OK: Less than 1.00% above the threshold [0.0] [15:55:41] yurik: earlier I have picked the last merged change https://gerrit.wikimedia.org/r/#/c/268819/ and commented 'check experimental' the npm install failure can be seen at https://integration.wikimedia.org/ci/job/npm-node-4.2/5/console [15:55:52] gyp: Call to './util/has_lib.sh freetype' returned exit status 0 while in binding.gyp. [15:55:56] and I have no clue what it is [15:55:57] :( [15:56:08] oh pkg-config [15:57:31] 10Deployment-Systems, 6Performance-Team, 10Traffic, 6operations, 5Patch-For-Review: Make Varnish cache for /static/$wmfbranch/ expire when resources change within branch lifetime - https://phabricator.wikimedia.org/T99096#2008045 (10Krinkle) [15:57:50] RECOVERY - Puppet failure on deployment-logstash2 is OK: OK: Less than 1.00% above the threshold [0.0] [15:58:08] hashar: Did you ever find out what was causing Jenkins/Zuul to not be able to test things right after submission? I see things like this very often https://gerrit.wikimedia.org/r/#/c/269149/ [15:58:11] Couple of times a week [15:58:16] Seems like a race condition. 
[15:58:30] Somehow Zuul ends up testing the patch before it "exists" [15:59:07] ah [15:59:18] RECOVERY - Puppet failure on integration-slave-trusty-1015 is OK: OK: Less than 1.00% above the threshold [0.0] [15:59:43] Krinkle: when the patchset event is received by Zuul scheduler, it invokes a merge function that is processed by the Zuul mergers process [16:00:05] they fetch the patch, attempt to merge it on tip of branch and fail whenever it can't trivially merge it [16:00:14] so https://gerrit.wikimedia.org/r/#/c/269149/ is in conflict [16:00:48] can dig in log on either gallium.wikimedia.org or scanidum.eqiad.wmnet (the two hosts of zuul-merger service) and look at /var/log/zuul/merger.log or something [16:01:42] hashar, yeah, i have no clue what that is either - i am guessing it is part of installing canvas, which rebuilds itself from C [16:02:17] if the server had all the dependencies (packages, not npm) listed at the end, it probably should have worked [16:03:24] 10Continuous-Integration-Config, 5Continuous-Integration-Scaling, 5Patch-For-Review, 7WorkType-NewFunctionality: Provision pkg-config on Nodepool instances - https://phabricator.wikimedia.org/T126230#2008087 (10hashar) 3NEW [16:03:43] 10Continuous-Integration-Config, 10Graphoid, 6Services, 5Patch-For-Review: Enable jenkins test & submit for graphoid repo - https://phabricator.wikimedia.org/T106668#2008096 (10hashar) [16:03:46] 10Continuous-Integration-Config, 5Continuous-Integration-Scaling, 5Patch-For-Review, 7WorkType-NewFunctionality: Provision pkg-config on Nodepool instances - https://phabricator.wikimedia.org/T126230#2008087 (10hashar) [16:04:05] yurik: yup [16:04:23] yurik: so to provision the dependencies on CI we would need a puppet class extracted from modules/graphoid/xxx.pp [16:04:34] and that we can include when bootstrapping the CI slaves [16:04:45] * yurik has no clue what that means [16:12:01] hashar: did zuul go crazy and start processing events from long ago? I got several emails about test builds like this one -- https://gerrit.wikimedia.org/r/#/c/249908/ -- on patches there were merged days or even months ago. [16:12:28] bd808: oh [16:12:49] bd808: yeah that one is missing a meaningful comment / message [16:13:09] bd808: legoktm has been working on adding php55 support in CI and he probably enqueued a bunch of changes directly in Zuul [16:13:16] bd808: i.e. without commenting in Gerrit "recheck" [16:13:26] ah. 
makes sense I guess [16:13:38] he probably listed the last change merged for any repo that got a zend 53 job [16:13:52] then mass triggered a recheck bypassing Gerrit and looked at the result of php55 jobs [16:14:02] which ends up being slightly confusing indeed hehe [16:18:48] (03Merged) 10jenkins-bot: graphoid: add noop jobs [integration/config] - 10https://gerrit.wikimedia.org/r/269147 (https://phabricator.wikimedia.org/T106668) (owner: 10Hashar) [16:28:17] 10Deployment-Systems, 6operations, 5Patch-For-Review: l10nupdate user uid mismatch between tin and mira - https://phabricator.wikimedia.org/T119165#2008191 (10Joe) Just for the record, cron needs to be restarted if an uid has been changed, this made the l10nupdate job fail for days in a row [16:33:32] 7Browser-Tests, 10Continuous-Integration-Config, 10MediaWiki-extensions-RelatedArticles: RelatedArticles browser tests should run on a commit basis - https://phabricator.wikimedia.org/T120715#2008202 (10zeljkofilipin) [16:47:35] 10Continuous-Integration-Infrastructure, 10Mathoid, 5Patch-For-Review: Enable node4 for mathoid tests - https://phabricator.wikimedia.org/T124447#2008246 (10mobrovac) Mathoid needs the JRE for the tests :/ [16:49:59] 7Browser-Tests, 10Continuous-Integration-Config, 10MediaWiki-extensions-RelatedArticles: RelatedArticles browser tests should run on a commit basis - https://phabricator.wikimedia.org/T120715#2008249 (10zeljkofilipin) >>! In T120715#1930208, @bmansurov wrote: > @zeljkofilipin, what may be causing an element... [17:08:23] 10Deployment-Systems, 6operations, 5Patch-For-Review: l10nupdate user uid mismatch between tin and mira - https://phabricator.wikimedia.org/T119165#2008312 (10Dzahn) >>! In T119165#2008191, @Joe wrote: > Just for the record, cron needs to be restarted if an uid has been changed, this made the l10nupdate job... [17:11:55] 10Deployment-Systems, 3Scap3, 5Patch-For-Review: Make puppet provider for scap3 - https://phabricator.wikimedia.org/T113072#2008319 (10mmodell) @akosiaris: I agree, the backport doesn't seem to work right and I can come up with something that uses install_options easily enough. Thanks for your feedback. [17:13:21] 6Release-Engineering-Team, 5WMF-deploy-2016-02-16_(1.27.0-wmf.14): MW 1.27.0-wmf.14 deployment blockers - https://phabricator.wikimedia.org/T125597#2008321 (10hashar) a:3hashar Per #releng meeting I will handle that train. [17:15:46] 6Release-Engineering-Team, 5WMF-deploy-2016-02-09_(1.27.0-wmf.13): MW 1.27.0-wmf.13 deployment blockers - https://phabricator.wikimedia.org/T125596#2008324 (10hashar) I will cut 1.27.0-wmf.13 early in my european afternoon probably in the span of 1pm-3pm UTC. That will get us SessionManager again. I might sc... [17:30:26] 10Continuous-Integration-Infrastructure, 10Mathoid, 5Patch-For-Review: Enable node4 for mathoid tests - https://phabricator.wikimedia.org/T124447#2008381 (10Physikerwelt) cf. T71702 [17:36:20] (03CR) 10Ricordisamoa: "Is it advisable to start migrating to npm-node-4.2?" [integration/config] - 10https://gerrit.wikimedia.org/r/263344 (owner: 10Hashar) [17:40:40] bd808: hey. can I convince you to process https://phabricator.wikimedia.org/T116506 ? would instantly save you work, as I would then have permission to merge https://gerrit.wikimedia.org/r/#/c/267864/ ;) my attempts to trick other people into merging patches in mediawiki/vagrant always fail... 
[17:43:29] jzerebecki: I have to admit I've never messed with gerrit perms [17:45:08] 10Continuous-Integration-Infrastructure, 10Mathoid, 5Patch-For-Review: Enable node4 for mathoid tests - https://phabricator.wikimedia.org/T124447#2008446 (10hashar) All the CI slaves have `openjdk-7-jre-headless` which is required by the Jenkins client : https://packages.debian.org/jessie/openjdk-7-jre-headl... [17:45:15] bd808: easy: https://gerrit.wikimedia.org/r/#/admin/groups/11,members enter name, click add [17:46:32] jzerebecki: apparently I don't have perms to change that so you'll need to find another victim [17:47:22] aw sorry didn't check, that group is owned by gerrit Administrators [17:47:55] thx for trying [17:49:42] 10Continuous-Integration-Infrastructure, 10Mathoid, 5Patch-For-Review: Enable node4 for mathoid tests - https://phabricator.wikimedia.org/T124447#2008483 (10hashar) I am wrong. Java is not provisioned on the Jessie slaves. Jenkins automatically install it on the host for us though but that is not really avai... [17:51:07] 10Continuous-Integration-Infrastructure, 5Continuous-Integration-Scaling: Provision openjdk-8-jre-headless on Nodepool slaves - https://phabricator.wikimedia.org/T126246#2008504 (10hashar) 3NEW a:3hashar [17:58:45] twentyafterfour: may I bother you to process a gerrit premission request https://phabricator.wikimedia.org/T116506 ? [18:00:08] jzerebecki: I am not usually the one to handle gerrit permissions, though I can probably figure it out... [18:00:17] nevermind hashar was faster. [18:00:20] twentyafterfour: thx anyway [18:00:23] :) [18:03:33] greg-g: can i deploy a patch? no one has a deploy window for a few hours. An array of namespace weights for search gets re-indexed by php and ends up wrong, meaning if you search the main ns and another namespace, the main namespace score gets reduced by 95% and gives shitty results: https://gerrit.wikimedia.org/r/269168 [18:03:46] greg-g: and multiple namespaces basically means anything with multiple content namespaces, like wikitech or wikisource [18:13:19] ebernhardson: yeah [18:16:29] greg-g: thanks [18:24:39] 10Continuous-Integration-Infrastructure, 5Continuous-Integration-Scaling: Provision openjdk-8-jre-headless on Nodepool slaves - https://phabricator.wikimedia.org/T126246#2008674 (10Johsthao) [18:24:54] 10Continuous-Integration-Config, 5Continuous-Integration-Scaling, 5Patch-For-Review, 7WorkType-NewFunctionality: Provision pkg-config on Nodepool instances - https://phabricator.wikimedia.org/T126230#2008681 (10Johsthao) [18:25:22] 10Continuous-Integration-Infrastructure, 5Patch-For-Review: Make /usr/bin/php a wrapper that picks the right PHP version on CI slaves - https://phabricator.wikimedia.org/T126211#2008695 (10Johsthao) [18:25:57] 10Continuous-Integration-Config, 7Technical-Debt: Write unit tests for set_parameters() function in zuul config - https://phabricator.wikimedia.org/T126182#2008708 (10Johsthao) [18:27:06] Johsthao is mucking up phabricator, merging all kinds of tasks into https://phabricator.wikimedia.org/T126250 [18:32:24] 10Continuous-Integration-Infrastructure, 5Continuous-Integration-Scaling: Provision openjdk-8-jre-headless on Nodepool slaves - https://phabricator.wikimedia.org/T126246#2008782 (10matmarex) 5duplicate>3Open [18:32:26] 10Continuous-Integration-Infrastructure, 10Mathoid, 5Patch-For-Review: Enable node4 for mathoid tests - https://phabricator.wikimedia.org/T124447#2008783 (10matmarex) [18:32:47] 10Continuous-Integration-Config, 10Graphoid, 6Services, 5Patch-For-Review: Enable 
jenkins test & submit for graphoid repo - https://phabricator.wikimedia.org/T106668#2008797 (10matmarex) [18:32:50] 10Continuous-Integration-Config, 5Continuous-Integration-Scaling, 5Patch-For-Review, 7WorkType-NewFunctionality: Provision pkg-config on Nodepool instances - https://phabricator.wikimedia.org/T126230#2008791 (10matmarex) 5duplicate>3Open [18:33:52] 10Continuous-Integration-Config, 7Technical-Debt: Write unit tests for set_parameters() function in zuul config - https://phabricator.wikimedia.org/T126182#2008830 (10matmarex) 5duplicate>3Open [18:55:05] (03CR) 10JanZerebecki: "This is not sufficient as there is still the mediawiki-extensions-qunit job which also runs the Wikidata tests if it runs for other extens" [integration/config] - 10https://gerrit.wikimedia.org/r/268790 (owner: 10Paladox) [18:55:14] (03CR) 10JanZerebecki: [C: 04-1] Blacklist Wikidata in branch REL1_25 for the qunit tests [integration/config] - 10https://gerrit.wikimedia.org/r/268790 (owner: 10Paladox) [18:58:04] (03CR) 10Paladox: [C: 04-1] "Ive fixed the problem on the REL1_25 branch just waiting for review. Unit tests still doint work though so I disabled them on the extensio" [integration/config] - 10https://gerrit.wikimedia.org/r/268790 (owner: 10Paladox) [19:04:12] hey all, what's the typical deployment cadence for mw extension changes? [19:04:17] specifically i’m wondering if this https://gerrit.wikimedia.org/r/#/c/268577/ will go out this week as part of 1.27.0-wmf.13 [19:05:14] mdholloway: we cut from master every tuesday and then https://wikitech.wikimedia.org/wiki/Deployments/One_week [19:06:01] greg-g: ah, that's the page i was looking for but didn't know if it existed, thanks! [19:06:42] mdholloway: linked in the top part of The One True Place To Look For Deployment Inforamtion (ish): https://wikitech.wikimedia.org/wiki/Deployments [19:21:48] 10Continuous-Integration-Config, 10Continuous-Integration-Infrastructure, 10Flow, 10Wikidata, and 2 others: Wikidata QUnit broken on branch REL1_25 causing other extensions to fail - https://phabricator.wikimedia.org/T126073#2009092 (10JanZerebecki) The fix to Wikidata to make it pass that job on master wa... 
[19:26:53] (03CR) 10Paladox: Blacklist Wikidata in branch REL1_25 for the qunit tests [integration/config] - 10https://gerrit.wikimedia.org/r/268790 (owner: 10Paladox) [19:28:22] (03PS2) 10Paladox: Blacklist Wikidata in branch REL1_25 for the qunit tests [integration/config] - 10https://gerrit.wikimedia.org/r/268790 [19:32:27] (03PS1) 10Paladox: Add php55 zuul template [integration/config] - 10https://gerrit.wikimedia.org/r/269188 [19:37:06] (03CR) 10Reedy: Add php55 zuul template (031 comment) [integration/config] - 10https://gerrit.wikimedia.org/r/269188 (owner: 10Paladox) [19:39:27] (03PS1) 10Paladox: Add php55 to mwext-testextension- [integration/config] - 10https://gerrit.wikimedia.org/r/269189 [19:41:46] (03CR) 10jenkins-bot: [V: 04-1] Add php55 to mwext-testextension- [integration/config] - 10https://gerrit.wikimedia.org/r/269189 (owner: 10Paladox) [19:42:32] (03PS1) 10Paladox: [SemanticForms] Add npm and move jshint and jsonlint to check: [integration/config] - 10https://gerrit.wikimedia.org/r/269191 [19:45:14] 3Scap3: Parameterize global /etc/scap.cfg in ops/puppet - https://phabricator.wikimedia.org/T126259#2009242 (10dduvall) 3NEW [19:46:25] 3Scap3: Parameterize global /etc/scap.cfg in ops/puppet - https://phabricator.wikimedia.org/T126259#2009259 (10dduvall) p:5Triage>3Normal [19:47:38] 3Scap3: Parameterize global /etc/scap.cfg in ops/puppet - https://phabricator.wikimedia.org/T126259#2009242 (10dduvall) [19:47:40] 10Deployment-Systems, 3Scap3: scap3 configuration selection is confusing - https://phabricator.wikimedia.org/T120410#2009271 (10dduvall) [19:57:21] The gate-and-submit queue is being stoopid slow. job at the top has been enqueued for 38 minutes [19:58:32] also post-merge. [20:00:02] (03PS3) 10Paladox: Blacklist Wikidata in branch REL1_25 for the qunit tests [integration/config] - 10https://gerrit.wikimedia.org/r/268790 [20:03:55] 7Browser-Tests, 10MediaWiki-Authentication-and-authorization: Create some end-to-end tests for SessionManager - https://phabricator.wikimedia.org/T125599#2009364 (10dduvall) Thanks, @tgr! It's tremendously helpful to have such well-defined scenarios to work from. In our meeting last week, we came to the concl... [20:07:10] bd808: more dependent pipeline fun it looks like [20:07:22] CheckUser php53 tests are holding everything up [20:07:45] (03PS2) 10Paladox: [SemanticForms] Add npm and move jshint and jsonlint to check: [integration/config] - 10https://gerrit.wikimedia.org/r/269191 [20:07:45] 10Continuous-Integration-Config, 10Continuous-Integration-Infrastructure, 10Flow, 10Wikidata, and 2 others: Wikidata QUnit broken on branch REL1_25 causing other extensions to fail - https://phabricator.wikimedia.org/T126073#2009379 (10Mattflaschen) 5. Fix the Wikidata extension directly on the release bra... [20:08:23] !log toggled "Enable Gearman" off and on in Jenkins to wake up deployment-bastion workers [20:08:30] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [20:09:33] marxarelli: the php53 tests seem to be even slower than usual today [20:09:58] er, now that job is queued but still blocking?! 
[20:10:34] this really doesn't look good -- https://integration.wikimedia.org/ci/job/mediawiki-extensions-php53/ [20:11:20] my +2 for my deploy window has now been stuck for 30m -- https://gerrit.wikimedia.org/r/#/c/269083/ [20:11:25] that's not cool [20:13:02] marxarelli: thcipriani help ping ^ [20:14:10] !log aborting pending mediawiki-extensions-php53 job for CheckUser [20:14:12] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [20:15:07] 7Browser-Tests, 10MediaWiki-Authentication-and-authorization: Create some end-to-end tests for SessionManager - https://phabricator.wikimedia.org/T125599#2009422 (10bd808) I think we need these tests before rolling out AuthManager which is ~~hopefully~~ going to deploy this quarter. [20:15:29] looks like i can't abort a pending job in jenkins [20:19:44] anybody here familiar with gerrit setup? [20:19:53] frack. so my "extra long" hour to do 15 minutes of work is all stuck thanks to my buddy jerkins [20:19:57] * bd808 pouts [20:22:59] what's up with the rise in mediawiki-extensions-php53 runs? did we add that job to the project templates recently or something? [20:23:07] marxarelli: is that CheckUser merge just completely stuck now? [20:23:50] I wonder if some/all are related to the emails I was asking about earlier today? [20:23:57] bd808: i think restarting gearman through it into a weird state [20:24:13] "[09:13] < hashar> bd808: legoktm has been working on adding php55 support in CI and he probably enqueued a bunch of changes directly in Zuul" [20:25:15] marxarelli: hmm.. I didn't actually restart gearman, I just toggled it to clear the "waiting for available workers" deadlock [20:25:40] but I certainly could have been the one who broke it [20:27:28] is it normal for only 1 executor per node to be in use? [20:28:15] it depends on the job I think thcipriani [20:29:19] Krinkle: are you around? We've got zuul messed up and need to make it forget about a job [20:29:49] bd808: checking [20:29:56] ugh, where does gearman run? [20:30:09] i don't see it on gallium [20:30:18] inside the zuul process on gallium I think [20:30:53] bd808: Blocked on CheckUser, which is blocked on https://integration.wikimedia.org/ci/job/mediawiki-extensions-php53/, which is blocked on available executors for the CI slaves [20:31:27] Krinkle: right. i tried to abort the CheckUser job, but it doesn't seem to work from the jenkins ui [20:31:37] And beta update is broken for 3+ hours as well [20:31:42] can we tell gearman/zuul to drop it? [20:31:46] No [20:31:47] Hold on [20:33:53] I woke the beta updates up briefly but doing so may have caused the CheckUser job mess [20:34:53] I did what it listed on https://phabricator.wikimedia.org/T72597 to break the deadlock ~25 minutes ago [20:35:29] Fixed [20:35:34] Hold on [20:35:40] i'm not sure there was a deadlock. 
it was just gate-and-submit contention due to long running php53 jobs [20:36:04] beta jenkins host is always broken due to the job construction and further unknown brokenness that creeps up from time to time causing Jenkins to lose ability to schedule jobs there [20:36:08] doesn't affect mw jobs in anyway [20:36:23] CheckUser is at least running the php53 job now [20:36:35] so ~20m from now maybe it will finish [20:37:38] the php53 job went from running around 4k unit tests to around 10k tests 10 days ago [20:38:06] perhaps this is yet another dirty workspace issue [20:38:06] And as ever, Jenkins/Zuul is completely incompetent about smartly scheduling jobs on hosts and systematically under using executors and blocked for mysterious reasons waiting for executors on a host with only 1/4 executors used. No reason for it to have been waiting [20:38:27] Yeah, no deadlock [20:38:37] I killed a few non-gate jobs for Precise [20:38:45] to re-order the queue [20:38:52] the php53 job is definitely a 1-per-node job, FWIW [20:39:12] Krinkle: you killed pending jobs or active ones to make room for the pending jobs? [20:39:26] mix of both [20:39:32] and yeah, there are an additional 2000 tests being run as of build 699 which is weird. [20:39:39] The php53 shouldn't be a 1-per-node [20:39:47] how do you kill a pending job. i wasn't able to for some reason [20:39:52] ^ ? [20:39:59] I killed that concept 1.5 years ago because it didn't scale and wasn't compatible with how Jenkins likes to schedule things [20:40:14] Only works if you have disposable pre-assigned nodes [20:40:39] to clarify, the job template '{name}-{ext-name}-testextension-{phpflavor}' is 1-per-node [20:40:45] marxarelli: Click [x] in the sidebar of the job page in the list of builds [20:40:51] It shouldn't be [20:40:56] We shoudl add more precise workers in that case [20:41:01] Krinkle: that's what i tried [20:41:03] It wasn't originally [20:41:14] I guess hashar changed that for some reason at some point. Don't know why. [20:42:10] Krinkle: in any case, thanks for unclogging CI :) [20:42:37] we should look into why the number of executed unit tests has escalated for that job [20:43:03] it hasn't (I think) [20:43:14] It's a generic job for running "the unit tests" of one or more extensions. [20:44:03] And its "mediawiki-extensions-{phpflavor}" not '{name}-{ext-name}-testextension-{phpflavor}' [20:44:06] the latter is deprecated [20:44:09] and mostly unused [20:44:10] right, but wouldn't the fact that we leave cloned repos around result in more tests being run? [20:44:21] or is the unit test runner smart enough to not include other repos? [20:44:45] I'm not sure I understand [20:44:52] " we leave cloned repos around result in more tests being run" [20:44:59] leave them around where [20:45:42] In the workspace itself? I sure hope not. [20:45:56] If that is the case, that should be immediately undone. That doesn't scale in anyway [20:47:20] Krinkle: sec, let me find the irc log but yeah, re undone and doesn't scale, i completely agree [20:47:31] Either the workspace is cleaned and we clone from git cache. Or it isn't cleaned and list of exts is provided at run time [20:47:35] I think we currently do the latter [20:47:48] we should make zuul-cloner use a cache directory to improve performance issues, not leave cloned repos around [20:47:50] We used to do the former on the prod slaves [20:48:08] but Antoine changed it as part of migration to labs because we still havent' deployed git caches on ci nodes... 
[20:48:20] That should work fine though [20:48:38] marxarelli: Yes, I wrote 90% of the code for that including an upstream patch with Antoine. It's just waiting to be activated now [20:48:43] Speak of the hashar. ;-) [20:48:52] o/ [20:48:55] Krinkle: oh, awesome :) [20:49:21] It requires scaling down ci nodes to 1 executor, creating more nodes (to match executor number), and then enablign the git cache script, and updating the zuul-cloner call in Jenkns to use the directory in question. [20:49:24] I don't know why that hasn't been done yet :) [20:49:28] Not a prio I guess. [20:49:40] re one-per-node: https://github.com/wikimedia/integration-config/commit/e0f978cfd32805df4d500a0dfb10c7c445e27076 [20:50:06] Krinkle: do we necessarily need single-executor nodes to start using it. i thought zuul-cloner had a --cache-dir option [20:50:21] unless there are locking issues with it [20:50:27] marxarelli: Yes, but that needs to be populated and kept update by something [20:50:33] Which cannot concurrently with jobs [20:50:34] i see [20:50:49] So you need single-executor nodes, and schedule the update as a job presumably [20:51:06] That's the closest you can get short of actually moving to disposable vms [20:51:20] I wrote it all up before I stopped working on ci infra last eyar [20:51:26] what would be wrong with a mutex-protected 2 step clone? [20:51:43] i.e. clone to a cache directory (mutex around this), then to the workspace [20:51:55] then cleanup the workspace repo clone during teardown [20:52:07] Workspace clean up is not an issue. That's 1 line in jjb yaml. [20:52:18] (and not at tear down but startup)_ [20:52:21] (or both rather) [20:52:36] sure, that's not the tricky part [20:52:38] Short answer: That's not how Zuul does or wants to work (mutex before clone). [20:52:58] could be an flock around zuul-cloner [20:52:59] And woudl rquire aditional infra just for it, infra we won't use in the disposable model. [20:53:16] I recommended this way so that we don't have to redo much in the Next iteration of the infra [20:53:37] true true. we're moving toward that anyhow [20:53:47] It's simple and known to work. And leaves us with single-exeec nodes which is better and many ways and simplifies so much [20:54:01] such as our cache handling which is still broken until recently. [20:54:17] Nothing is truely properly concurrent in our stack. We just approach as as much as we can. [20:54:23] Better to just abandon [20:54:32] https://phabricator.wikimedia.org/T97098 [20:54:51] https://phabricator.wikimedia.org/T96627 [20:55:02] https://phabricator.wikimedia.org/T86730 [21:00:36] Going to git-cache, improves perf a lot, allows enabling workspace clean, which fixes many bugs including Zuul lock and corruption. [21:22:48] marxarelli|lunch: so yeah what Timo said [21:23:34] marxarelli|lunch: basically migrate all jobs to Nodepool instances, have them use zuul-cloner and a local cache or maybe for single repo the Jenkins git plugin that would use the local repo as reference [21:23:43] hashar: If/How do git-cache (for current ci nodes), and vm isolation fit in to current quarter plan? [21:23:45] progress is slow overall :( [21:23:57] It's been a while now. 
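(On the mutex-protected two-step clone floated above: zuul-cloner in Zuul v2 does accept a `--cache-dir` pointing at local repos it copies from before fetching the change, so the contended part is only keeping that cache fresh. A rough sketch of a flock-style guard around the cache update, with paths, URLs and the repository list as placeholder assumptions rather than the actual CI layout:)

```python
# Rough sketch: refresh a shared git cache under an exclusive lock, then let
# zuul-cloner populate the workspace from it. Paths and the repository list
# are illustrative assumptions.
import fcntl
import os
import subprocess

CACHE_DIR = '/srv/git-cache'          # assumed shared cache location
LOCK_FILE = os.path.join(CACHE_DIR, '.lock')
REPOS = ['mediawiki/core']            # placeholder repository list


def refresh_cache():
    with open(LOCK_FILE, 'w') as lock:
        # Exclusive lock: only one job updates the cache at a time.
        fcntl.flock(lock, fcntl.LOCK_EX)
        for repo in REPOS:
            dest = os.path.join(CACHE_DIR, repo)
            if os.path.isdir(dest):
                subprocess.check_call(
                    ['git', '--git-dir', dest, 'fetch', '--all'])
            else:
                subprocess.check_call(
                    ['git', 'clone', '--mirror',
                     'https://gerrit.wikimedia.org/r/' + repo, dest])


def clone_workspace(workspace):
    # zuul-cloner copies from --cache-dir before fetching the Zuul refs,
    # so only the cache refresh above needs the mutex.
    subprocess.check_call(
        ['zuul-cloner', '--cache-dir', CACHE_DIR,
         '--workspace', workspace,
         'https://gerrit.wikimedia.org/r/p', 'mediawiki/core'])
```

(Whether that buys much over the approach Krinkle describes — single-executor nodes plus a scheduled cache-update job — is exactly the trade-off being debated above.)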
[21:24:06] Krinkle: we keep juggling with priority to be honest [21:24:15] I know the feeling :) [21:24:23] so the whole priority of CI is really : dont die() :D [21:24:54] last week services team announced us they are migrating to nodejs 4.2, and apparently prod already migrated so I am catching up [21:25:17] gotta migrate all repos one by one making sure they pass fine and migrate the few crazy integration tests we have [21:32:00] [08:13:09] bd808: legoktm has been working on adding php55 support in CI and he probably enqueued a bunch of changes directly in Zuul <-- yup that. I haven't figured out a way to do it yet without the spam :( [21:33:05] [12:22:59] what's up with the rise in mediawiki-extensions-php53 runs? did we add that job to the project templates recently or something? <-- Wikibase was recently added to the shared job, and they have a lot of tests. [21:34:15] we need to kill off the php53 tests because the new runtime it gross [21:35:19] over 1.5h to merge this patch -- https://gerrit.wikimedia.org/r/#/c/269083/ [21:35:52] da fuck? [21:36:38] https://grafana.wikimedia.org/dashboard/db/releng-kpis [21:36:59] the longest job was 20 minutes [21:36:59] 2nd graph is max time spent in gate-and-submit for a mediawiki/core change [21:37:08] https://grafana.wikimedia.org/dashboard/db/releng-kpis?panelId=2&fullscreen [21:37:28] mediawiki-extensions-php53 SUCCESS in 20m 55s [21:37:28] yeah that one is bad [21:37:37] but crap, that doesn't look good :( [21:37:56] with the mediawiki-extensions-hhvm job taking 13m 17s [21:38:08] both jobs have core + a bunch of extensions [21:40:28] which ends up running the whole Scribunto ( a lot of minutes) and the whole Wikibase (moaaar minutes) [21:44:27] just to be clear, this will cause a lot of bad feelings if we leave it to be so slow for so long, we should revert whatever caused this to get us back to our awesome times :) [21:48:50] well [21:48:54] at first lets get a task [21:52:22] 10Continuous-Integration-Infrastructure: MediaWiki gate takes 20 minutes for extensions tests and 1.5 hour for at least a patch - https://phabricator.wikimedia.org/T126274#2009830 (10hashar) 3NEW [21:52:38] bd808: I have cced you to the new https://phabricator.wikimedia.org/T126274 [21:53:24] * marxarelli grumbles something about unit tests that aren't unit tests [21:55:16] lol [21:55:56] marxarelli: 90+% of our tests are full stack integration tests [21:58:10] 10Continuous-Integration-Infrastructure: MediaWiki gate takes 20 minutes for extensions tests and 1.5 hour for at least a patch - https://phabricator.wikimedia.org/T126274#2009859 (10hashar) Part of the slowness can be explained by the addition of Scribunto to the shared job. The change https://gerrit.wikimedia.... [21:58:24] marxarelli: yeah we have no such thing as unit tests [21:58:29] the whole mess needs an overhaul [21:58:43] bd808: our testing pattern is a pyramid balancing on a toothpick [21:59:51] oh maybe a street juggler standing in a wheelbarrow that's balanced on a toothpick [22:02:44] 10Continuous-Integration-Infrastructure, 5Continuous-Integration-Scaling: Provision openjdk-8-jre-headless on Nodepool slaves - https://phabricator.wikimedia.org/T126246#2009901 (10hashar) The apt configuration is broken on Nodepool instances, most probably because of duplicate definitions: The disk image bui... [22:10:17] !log Deleting pmcache.integration.eqiad.wmflabs (was to investigate various kind of central caches). 
[22:10:21] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [22:11:58] PROBLEM - Host pmcache is DOWN: CRITICAL - Host Unreachable (10.68.22.133) [22:13:04] 10Deployment-Systems, 10Salt, 6operations, 5Patch-For-Review: [Trebuchet] Salt times out on parsoid restarts - https://phabricator.wikimedia.org/T63882#2009937 (10ArielGlenn) hm, testing failed for want of an argument to the git deploy restart code (trigger). and service-restart no longer gets deployed.... [22:13:25] !log Deleted cache-rsync instance, superseded by castor instance [22:13:28] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [22:14:32] 10Continuous-Integration-Infrastructure, 5Continuous-Integration-Scaling: Provision openjdk-8-jre-headless on Nodepool slaves - https://phabricator.wikimedia.org/T126246#2009939 (10mobrovac) AFAIK, duplicates are considered only as warnings and do not influence the update/upgrade path. [22:15:23] 10Continuous-Integration-Infrastructure: MediaWiki gate takes 20 minutes for extensions tests and 1.5 hour for at least a patch - https://phabricator.wikimedia.org/T126274#2009950 (10hashar) Pooled four more Precise instances with 2 CPU, will have two executors and that will let Jenkins spread some jobs from... [22:15:57] !log Provisioning integration-slave-precise-{1001-1004} https://phabricator.wikimedia.org/T126274 (need more php53 slots) [22:16:00] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [22:16:24] PROBLEM - Host cache-rsync is DOWN: CRITICAL - Host Unreachable (10.68.23.165) [22:25:10] !log integration-slave-precise-{1001-1004} applied role::ci::slave::labs, running puppet on the slaves. I have added the instances as Jenkins slaves and put them offline. Whenever puppet is done, we can mark them online in Jenkins, then monitor that the jobs running on them are working properly [22:25:12] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [22:25:38] that will give us four more precise slots [22:26:19] 10Continuous-Integration-Infrastructure: MediaWiki gate takes 20 minutes for extensions tests and 1.5 hour for at least a patch - https://phabricator.wikimedia.org/T126274#2009991 (10hashar) I have applied role::ci::slave::labs and puppet is running on all four instances. I have added the instances as Jenkins s...
[22:26:38] the puppet provisioning is going to take a couple hours though [22:29:17] greg-g: marxarelli: I think we should just remove Scribunto from the shared job [22:29:25] that seems to be the main cause of slowness / timeout failures [22:30:17] kk [22:32:13] (03CR) 10Paladox: Add php55 zuul template (031 comment) [integration/config] - 10https://gerrit.wikimedia.org/r/269188 (owner: 10Paladox) [22:38:44] (03PS1) 10Tim Starling: Fix screenShotDelay interpretation [integration/visualdiff] - 10https://gerrit.wikimedia.org/r/269316 [22:39:45] PROBLEM - Puppet failure on integration-slave-precise-1013 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [22:40:21] (03CR) 10Subramanya Sastry: [C: 032 V: 032] Fix screenShotDelay interpretation [integration/visualdiff] - 10https://gerrit.wikimedia.org/r/269316 (owner: 10Tim Starling) [22:45:58] !log Pooled https://integration.wikimedia.org/ci/computer/integration-slave-precise-1003/ [22:46:01] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [22:47:21] !log Err, need to reboot newly provisioned instances before adding them to Jenkins (kernel upgrade, apache restart etc) [22:47:24] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [22:50:02] (03PS1) 10Subramanya Sastry: Set postJSON: true in the testreduce client config [integration/visualdiff] - 10https://gerrit.wikimedia.org/r/269326 [22:50:10] hashar: what is WIP about https://gerrit.wikimedia.org/r/#/c/269109/ ? I tested it and the script seems to work fine... [22:50:33] (03CR) 10Subramanya Sastry: [C: 032 V: 032] Set postJSON: true in the testreduce client config [integration/visualdiff] - 10https://gerrit.wikimedia.org/r/269326 (owner: 10Subramanya Sastry) [22:50:34] hashar: There seems to be a big queue at https://integration.wikimedia.org/zuul/ in gate and submit. The postmerge queue also isn't loading. [22:50:56] legoktm: been around for a few weeks on my laptop, I have just done git add && git-review following your comment last night :-} [22:51:08] paladox: yeah it is broken [22:51:08] ok :) [22:51:13] !sal [22:51:13] https://tools.wmflabs.org/sal/releng [22:51:38] paladox: the huge queue is https://phabricator.wikimedia.org/T126274 [22:51:56] and one of the issues is that Scribunto got added to the shared job. Those tests are notably long [22:52:08] PROBLEM - Puppet failure on integration-slave-precise-1003 is CRITICAL: CRITICAL: 12.50% of data above the critical threshold [0.0] [22:52:13] I am adding four more instances with 2 cpu / 2 executors each [22:53:00] and maybe labs infra is a bit overloaded [22:53:03] hashar: Oh ok. Should we revert the patch until we make Scribunto faster to test? [22:53:24] yup reverting is probably a good idea [22:53:30] hashar: Ok. [22:53:44] the full stack integration test for each change is not great :D [22:53:59] i.e.
if one changes Flow, there is most likely no need to run the Scribunto Lua tests [22:54:17] we are going to need to heavily rework how tests are split [22:54:19] (03PS1) 10Paladox: Revert "[Scribunto] Add template extension-gate to Scribunto" [integration/config] - 10https://gerrit.wikimedia.org/r/269327 (https://phabricator.wikimedia.org/T126274) [22:54:31] 10Continuous-Integration-Config, 10MediaWiki-extensions-Scribunto, 10Wikidata, 5Patch-For-Review: Add Scribunto to extension-gate in CI - https://phabricator.wikimedia.org/T125050#2010034 (10Paladox) 5Resolved>3Open [22:55:03] 10Continuous-Integration-Config, 10MediaWiki-extensions-Scribunto, 10Wikidata, 5Patch-For-Review: Add Scribunto to extension-gate in CI - https://phabricator.wikimedia.org/T125050#1973302 (10Paladox) Re-opening since this extension will need some performance improvements such as testing since it is causing... [22:55:07] hashar: https://gerrit.wikimedia.org/r/#/c/269327/ [22:55:15] (03PS2) 10Paladox: Revert "[Scribunto] Add template extension-gate to Scribunto" [integration/config] - 10https://gerrit.wikimedia.org/r/269327 (https://phabricator.wikimedia.org/T126274) [22:55:19] (03PS2) 10Legoktm: Add bin/php wrapper entry point [integration/jenkins] - 10https://gerrit.wikimedia.org/r/269109 (https://phabricator.wikimedia.org/T126211) (owner: 10Hashar) [22:56:28] paladox: yeah :-} [22:56:38] (03PS3) 10Legoktm: Add bin/php wrapper entry point [integration/jenkins] - 10https://gerrit.wikimedia.org/r/269109 (https://phabricator.wikimedia.org/T126211) (owner: 10Hashar) [22:56:57] legoktm: feel free to make yourself the commit author [22:56:58] hashar: Yep. I don't know why Scribunto slows down everything. [22:57:08] it is probably not the only reason [22:57:10] RECOVERY - Puppet failure on integration-slave-precise-1003 is OK: OK: Less than 1.00% above the threshold [0.0] [22:57:17] (03CR) 10Legoktm: [C: 032] "PS3: Use $PHP_BIN variable since that's what we're already using" [integration/jenkins] - 10https://gerrit.wikimedia.org/r/269109 (https://phabricator.wikimedia.org/T126211) (owner: 10Hashar) [22:57:20] oh no, I barely changed anything :) [22:57:38] 10Continuous-Integration-Infrastructure, 5Patch-For-Review: Make /usr/bin/php a wrapper that picks the right PHP version on CI slaves - https://phabricator.wikimedia.org/T126211#2010052 (10Legoktm) a:3Legoktm [22:57:55] PROBLEM - Puppet failure on integration-slave-precise-1001 is CRITICAL: CRITICAL: 88.89% of data above the critical threshold [0.0] [22:58:11] hashar: Oh. It seems to be slow on the extension-unittests. [22:58:20] legoktm: so ideally that script should be in puppet and we can then have puppet set it as the alternative for 'php' [22:58:31] legoktm: but at least in integration/jenkins you can give it a try / build around it hehe [22:59:03] mhm, I'm looking into the puppet changes we need to make now :) [22:59:08] legoktm: ah in puppet have a look at alternatives::select :D [22:59:40] you can have it pin php to the slave script and add a dependency to make sure integration/jenkins is cloned [23:00:00] Notice: Finished catalog run in 1281.19 seconds [23:00:00] ! [23:00:54] https://integration.wikimedia.org/zuul/ [23:00:56] legoktm: will you be able to remove Scribunto from the shared job?
Paladox kindly provided the revert patch https://gerrit.wikimedia.org/r/#/c/269327/ :D [23:01:00] this is unusable [23:01:27] bd808: Scribunto is being removed from the shared job [23:01:35] sorry to be the whiner today but I'm trying to actually deploy config changes and jenkins/zuul are in the way pretty seriously [23:01:47] bd808: and i am half asleep provisioning more Precise nodes to run the php53 jobs [23:01:55] I think a task needs to be created so that somehow we can improve the performance of Scribunto before bringing it back into the shared job. [23:02:21] (03PS3) 10Paladox: Revert "[Scribunto] Add template extension-gate to Scribunto" [integration/config] - 10https://gerrit.wikimedia.org/r/269327 (https://phabricator.wikimedia.org/T126274) [23:02:59] paladox: Scribunto tests should only be run for ... Scribunto tests [23:03:09] PROBLEM - Puppet failure on integration-slave-precise-1003 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [23:03:11] Yes. [23:03:11] bd808: just force merge? [23:03:14] hashar: thanks for working on it [23:03:20] the scribunto tests are currently running on a Math change https://gerrit.wikimedia.org/r/269166 [23:03:22] hashar: yep, looking [23:03:30] because Scribunto is somehow pulled as a dependency of Math [23:03:59] legoktm: if one force merges, that cancels the whole queue [23:04:08] and retriggers everything [23:04:12] (03PS4) 10Legoktm: Revert "[Scribunto] Add template extension-gate to Scribunto" [integration/config] - 10https://gerrit.wikimedia.org/r/269327 (https://phabricator.wikimedia.org/T126274) (owner: 10Paladox) [23:04:34] yeah, but prod being happy is typically more important than CI being happy. [23:04:40] RECOVERY - Puppet failure on integration-slave-precise-1013 is OK: OK: Less than 1.00% above the threshold [0.0] [23:04:56] Anyways, this is a bit of an oddity [23:05:10] er, wrong button [23:05:17] aoeroaeruaoerouaerouaze [23:05:28] Math depends on VE and Wikidata [23:06:23] (03CR) 10Legoktm: [C: 032] Revert "[Scribunto] Add template extension-gate to Scribunto" [integration/config] - 10https://gerrit.wikimedia.org/r/269327 (https://phabricator.wikimedia.org/T126274) (owner: 10Paladox) [23:06:31] err [23:06:38] legoktm: What's up with the php5.3 tests being run for everything now? For Flow they appear to be so slow that they just stopped something from merging by hitting the 30-minute timeout [23:06:41] integration/config is still in the mediawiki queue... [23:06:51] RoanKattouw: we're working on it :( [23:08:29] PROBLEM - Puppet failure on integration-slave-precise-1004 is CRITICAL: CRITICAL: 25.00% of data above the critical threshold [0.0] [23:08:55] legoktm: well you can push the jobs [23:08:57] RoanKattouw: https://phabricator.wikimedia.org/T126274 [23:08:59] legoktm: using jjb [23:09:02] hashar: yeah, already did that [23:09:16] legoktm: the zuul part is just to stop triggering the shared job from Scribunto changes so that can wait [23:09:37] PROBLEM - Puppet failure on integration-slave-precise-1002 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [23:09:46] (03CR) 10Paladox: "You will probably want to force merge this or it will be a while."
[integration/config] - 10https://gerrit.wikimedia.org/r/269327 (https://phabricator.wikimedia.org/T126274) (owner: 10Paladox) [23:09:50] I'm aware of the super slow zuul queue, but this seems to be a separate problem (the job itself took 30 mins and timed out), wanted to make sure that was known too [23:10:08] !log pooling integration-slave-precise-1001 1002 1004 [23:10:10] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [23:10:42] RoanKattouw: it's because we introduced Scribunto into gate and submit. We are reverting the patch so that it is not in the shared test any more. [23:11:25] Ooh, every extension tests with Scribunto now? Yeah that would take a while [23:11:29] yeah [23:11:47] that has been reverted by paladox/legoktm [23:11:48] 10Continuous-Integration-Infrastructure, 5Patch-For-Review: MediaWiki gate takes 20 minutes for extensions tests and 1.5 hour for at least a patch - https://phabricator.wikimedia.org/T126274#2010097 (10Catrope) Not sure if this is related: https://gerrit.wikimedia.org/r/#/c/269171/ "only" took 43 minutes to go... [23:11:52] but the jobs already running still have it :( [23:12:29] hashar/legoktm we may have to force merge the patch or we will probably be waiting more than an hour for the patch to merge. [23:12:38] bd808: can you list the changes you need in? [23:12:39] paladox: I already deployed the jjb part of it [23:12:50] the zuul thing can happen later [23:12:50] legoktm: Ok thanks. [23:12:50] we can prioritize them in zuul, though I don't think I have tried that before [23:12:53] hashar: this one is waiting now -- https://gerrit.wikimedia.org/r/#/c/269065/ [23:12:57] RECOVERY - Puppet failure on integration-slave-precise-1001 is OK: OK: Less than 1.00% above the threshold [0.0] [23:13:11] RECOVERY - Puppet failure on integration-slave-precise-1003 is OK: OK: Less than 1.00% above the threshold [0.0] [23:13:13] Should I re-+2 my change now, or wait? [23:13:24] wait a bit [23:13:47] OK [23:14:00] !log zuul promote --pipeline gate-and-submit --changes 269065,2 https://gerrit.wikimedia.org/r/#/c/269065/ [23:14:02] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [23:14:10] legoktm: ^^^^ magic command for gallium to bump a change to the top of the queue [23:14:17] OOOH [23:14:23] in theory [23:14:38] well, mediawiki-config is already in its own queue [23:14:39] RECOVERY - Puppet failure on integration-slave-precise-1002 is OK: OK: Less than 1.00% above the threshold [0.0] [23:14:40] that recomputes the WHOLE pipeline (i.e. cancels everything) [23:15:00] guess I will need to write some postmortem [23:15:08] and list a bunch of actions [23:15:24] errr so that means all the mw core jobs will restart from the beginning? :S [23:15:45] Yeah, that makes sense cause you'd change the dependency chain, right? [23:16:15] right...but that change was in a totally separate pipeline [23:16:20] er, queue [23:16:21] That one was. [23:16:43] Also it merged 2 mins ago [23:16:50] 268460,3 is going to cause the whole chain to re-start, right? [23:16:53] ('cos it failed php53lint) [23:17:05] Probably yes [23:17:09] * James_F sighs. [23:17:25] Also that repo shouldn't have php53lint probably? Since its master doesn't pass [23:17:30] (The failure is in a file that wasn't touched in that change, IIRC) [23:17:42] FR repos are often a bit… odd.
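As an aside on the bin/php wrapper discussed earlier (https://gerrit.wikimedia.org/r/#/c/269109/, T126211): the idea is simply that a job exports which interpreter it wants and a wrapper defers to it, so puppet can later point the 'php' alternative at that script. A minimal sketch of the shape of such a wrapper, assuming $PHP_BIN is set by the job and php5 is the fallback; this is not the merged integration/jenkins script:

```
#!/bin/bash
# Hypothetical /usr/bin/php-style wrapper: run whatever interpreter the job
# requested via $PHP_BIN (e.g. php5.3, php5.5 or hhvm), defaulting to php5.
exec "${PHP_BIN:-php5}" "$@"
```

A job defining PHP_BIN=php5.5 would then transparently get PHP 5.5 for every plain `php` invocation on that slave, which is what makes a single shared phplint macro workable across runtimes.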
[23:17:53] hahaha no look at this [23:17:55] 23:15:07 PHP Parse error: syntax error, unexpected T_STRING in vendor/psr/log/Psr/Log/LoggerAwareTrait.php on line 8 [23:18:35] :-/ [23:18:38] because php53 doesn't know about T_TRAIT yet :P [23:18:44] It probably shouldn't be linting vendor/, but it also makes no sense to php53lint it if it pulls in a library that uses traits [23:18:52] (Also doesn't MW core use Psr for logging?) [23:18:56] how do we get traits? We still have a bunch of wikimedia servers running Zend 5.3 [23:19:07] RoanKattouw: Not until php55 lands? [23:19:07] hashar: No we don't. [23:19:17] Do we? I thought the last one was reimaged recently [23:19:41] Yeah, tin was the last production 5.3 server, I thought? [23:19:53] at least "Complete the use of HHVM over Zend PHP on the Wikimedia cluster (tracking)" https://phabricator.wikimedia.org/T86081 still has 3 blockers [23:20:05] hashar: HHVM !== 5.6 [23:20:15] wikitech (silver) | snapshot hosts | HAT appservers [23:20:40] RoanKattouw: we had to delete that file from mw/vendor [23:20:49] What's HAT? [23:21:01] HHVM Apache Trusty [23:21:04] ho if only I knew even 5% of what we are running ... :D [23:21:05] HHVM Apache ?Tomcat. [23:21:13] aha [23:21:27] Because the HHVM migration was also a migration from precise -> trusty and apache 2.2 -> 2.4 [23:21:58] Because why do it simply and swiftly in little steps when we can get stuck on a major migration for three years instead? [23:22:54] hashar: It seems that the postmerge queue is not moving. But it is separate from gate and submit. [23:23:22] RECOVERY - Puppet failure on integration-slave-precise-1004 is OK: OK: Less than 1.00% above the threshold [0.0] [23:23:35] paladox: That's probably just Beta being stuck. [23:23:35] paladox: Don't worry. [23:23:52] James_F: Ok. Yes probably. [23:24:08] James_F: Thanks for replying. [23:24:09] yeah. beta is locked up. I've tried to shake it loose but to no avail [23:24:20] bd808: Ok thanks. [23:24:38] James_F: posted on the tracking task https://phabricator.wikimedia.org/T86081#2010135 [23:24:56] #89108 (pending—Waiting for next available executor on deployment-bastion.eqiad) [23:25:08] bd808: yeah it is a pity, to unblock the beta job we get to disconnect the Jenkins gearman client, which as a side effect cancels running jobs [23:25:43] Let's… not. [23:25:44] And the other two beta jobs are blocked on their upstream job being pending [23:25:50] lol yeah let's not [23:26:13] we should setup a different Jenkins just for beta :D [23:26:32] https://gerrit.wikimedia.org/r/#/c/261617 had better be worth it. [23:27:29] hashar: Yes. Would that stop the side effect of it cancelling the jobs? [23:28:08] paladox: probably [23:28:31] hashar: Ok thanks. How would we do that, set up a jenkins job for beta? [23:30:07] paladox: I think hashar meant an entirely different installation of Jenkins just for processing Beta Cluster jobs. [23:30:21] (And not being serious.) [23:30:25] hashar, legoktm: can you bump https://gerrit.wikimedia.org/r/#/c/269330/ to the top of the queue for ori? (prod backport) [23:30:29] James_F: Oh ok. [23:30:43] "23:26 <+ hashar> we should setup a different Jenkins just for beta :D" probably wise [23:30:43] oh for god sake [23:30:44] bd808: ori already force merged it [23:31:13] nevermind me then [23:32:36] Force-merge in MW-core will make Jenkins /so/ happy.
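On the vendor/ lint failure above: the pattern being suggested is to exclude vendored libraries from the per-file lint, so that bundled code using newer syntax (traits, in this case) does not fail a PHP 5.3 lint job. A rough sketch of such a pass, assuming a plain find/php -l loop rather than the actual phplint macro:

```
#!/bin/bash
# Sketch of a PHP lint run that skips vendor/ so third-party libraries with
# newer syntax do not break an older interpreter's lint job.
# $PHP_BIN and the find-based loop are assumptions, not the real CI macro.
set -eu

PHP_BIN="${PHP_BIN:-php5}"

find . -name '*.php' -not -path './vendor/*' -print0 \
  | xargs -0 -n1 -P4 "$PHP_BIN" -l >/dev/null
```

xargs exits non-zero if any file fails to lint, which is enough to fail the build; whether a repository that ships trait-using code should run a php53 lint at all is the separate question raised above.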
[23:34:27] ho beta jobs are running again [23:38:26] Project beta-scap-eqiad build #89109: 04FAILURE in 21 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/89109/ [23:40:23] At least it ran. [23:40:39] rerunning it [23:47:31] (03Merged) 10jenkins-bot: Add bin/php wrapper entry point [integration/jenkins] - 10https://gerrit.wikimedia.org/r/269109 (https://phabricator.wikimedia.org/T126211) (owner: 10Hashar) [23:48:24] (03Merged) 10jenkins-bot: Revert "[Scribunto] Add template extension-gate to Scribunto" [integration/config] - 10https://gerrit.wikimedia.org/r/269327 (https://phabricator.wikimedia.org/T126274) (owner: 10Paladox) [23:48:32] !log finally deploying https://gerrit.wikimedia.org/r/269327 [23:48:34] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [23:49:57] paladox: ^ [23:50:18] legoktm: Thanks. [23:53:57] thank you :-) [23:54:59] (03PS1) 10Subramanya Sastry: Hide the read-more block in the default mediawiki rendering [integration/visualdiff] - 10https://gerrit.wikimedia.org/r/269335 [23:57:34] Is zuul working ok now? I'm trying to merge https://gerrit.wikimedia.org/r/#/c/269211/ but it's being ignored for some reason [23:58:24] SMalyshev: more or less. it is busy catching up with all the mess :-( [23:58:35] SMalyshev: that one will land eventually [23:59:02] hashar: Is there any way for zuul to merge patches regardless of whether there is a patch in front being tested? [23:59:17] hashar: Maybe we should put a comment of "DO NOT ADD EXTENSIONS TO THIS WITHOUT CHECKING FIRST" on extensions-gate? [23:59:19] hashar: do I need some special config for new repos? Because this change is sitting there for 3 hrs already and ones after it in other repos got merged [23:59:36] so I wonder maybe it's not configured properly [23:59:43] James_F: it is more like: we wrote too many tests as complete newbies. Time to refactor the whole crap