[00:04:10] beer sounds nice [00:05:14] 10Continuous-Integration-Config: Timeouts in mediawiki-extensions-php53 - https://phabricator.wikimedia.org/T126406#2013605 (10Mattflaschen) 3NEW [00:05:33] 10Continuous-Integration-Config: Timeouts in mediawiki-extensions-php53 - https://phabricator.wikimedia.org/T126406#2013612 (10Mattflaschen) [00:06:02] bd808: thanks for updating the image for #scap :) [00:11:11] Ahm, so I'm doing SWAT and running sync-file [00:11:16] And I get this: [00:11:16] No syntax errors detected in /srv/mediawiki-staging/wmf-config/InitialiseSettings.php [00:11:17] [Wed Feb 10 00:09:26 2016] [hphp] [28614:7f2089c32d00:0:000001] [] Lost parent, LightProcess exiting [00:11:17] [Wed Feb 10 00:09:26 2016] [hphp] [28613:7f2089c32d00:0:000001] [] Lost parent, LightProcess exiting [00:11:19] [Wed Feb 10 00:09:26 2016] [hphp] [28611:7f2089c32d00:0:000001] [] Lost parent, LightProcess exiting [00:11:20] [Wed Feb 10 00:09:26 2016] [hphp] [28615:7f2089c32d00:0:000001] [] Lost parent, LightProcess exiting [00:11:22] [Wed Feb 10 00:09:26 2016] [hphp] [28612:7f2089c32d00:0:000001] [] Lost parent, LightProcess exiting [00:11:24] 00:09:26 Started sync-masters [00:11:34] normal [00:12:10] Why does it crash hhvm five times? That didn't happen last time I deployed (TBF I have been away for 2 weeks) [00:13:46] did you last deploy from a server running php 5.3? [00:14:21] RoanKattouw: it's most likely the lint checks in scap [00:14:44] i was able to avoid those last time i deployed by forcing it to run `php5 -l` instead [00:14:49] (with a local hack) [00:15:00] no idea why hhvm is crapping out though [00:15:09] RoanKattouw: it's a known problem with the HHVM runtime. Not scary, just ignore it [00:15:28] OK [00:15:45] https://phabricator.wikimedia.org/T124956 [00:22:58] greg-g: that was from mw1019, not from tin/mira [00:23:09] Legoktm: could you review https://gerrit.wikimedia.org/r/#/c/267548/ please. It is about updating composer to alpha11. The patch is much different to the one you uploaded a few months ago. And it should not repeat the error that happend when we upgraded. The reasons are in commit msg. [00:25:41] 6Release-Engineering-Team, 10scap: refreshCdbJsonFiles in scap fails on mira due to missing dba_open function in hhvm - https://phabricator.wikimedia.org/T125477#2013660 (10greg) [00:26:05] marxarelli: right right [00:34:18] (03CR) 10Legoktm: "Thanks paladox. I will look into this after the php5.5 migration." [integration/composer] - 10https://gerrit.wikimedia.org/r/267548 (https://phabricator.wikimedia.org/T125343) (owner: 10Paladox) [00:49:42] 10Beta-Cluster-Infrastructure, 10Math, 10Mathoid: [Regression wmf.13] Math tags not rendering on beta labs - https://phabricator.wikimedia.org/T126371#2013707 (10Esanders) [00:54:23] 10Beta-Cluster-Infrastructure, 10Math, 10Mathoid: Math tags not rendering on Beta Cluster due to Mathoid not working(?) 
- https://phabricator.wikimedia.org/T126371#2013730 (10Jdforrester-WMF) p:5Triage>3High [00:59:11] (03PS1) 10Legoktm: Add php55 phpunit jobs for MediaWiki core and gate extensions [integration/config] - 10https://gerrit.wikimedia.org/r/269576 [01:01:21] (03CR) 10Legoktm: [C: 032] Add php55 phpunit jobs for MediaWiki core and gate extensions [integration/config] - 10https://gerrit.wikimedia.org/r/269576 (owner: 10Legoktm) [01:02:54] (03Merged) 10jenkins-bot: Add php55 phpunit jobs for MediaWiki core and gate extensions [integration/config] - 10https://gerrit.wikimedia.org/r/269576 (owner: 10Legoktm) [01:03:03] !log deploying https://gerrit.wikimedia.org/r/269576 [01:03:06] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [01:06:05] legoktm: Still need to switch composer-php53 for composer-php55 right? [01:06:20] (Awesome.) [01:07:12] 10Beta-Cluster-Infrastructure, 10Math, 10Mathoid: Math tags not rendering on Beta Cluster due to Mathoid not working(?) - https://phabricator.wikimedia.org/T126371#2013758 (10mobrovac) I confirmed Mathoid is running on `deployment-mathoid.deployment-prep.eqiad.wmflabs` (IP: `10.68.20.104`): ``` $ netstat -n... [01:08:13] James_F: yeah, and then remove the php53 jobs [01:08:23] * James_F nods. [01:08:25] 10Beta-Cluster-Infrastructure, 10Math, 10Mathoid, 10VisualEditor: Math tags not rendering on Beta Cluster due to Mathoid not working(?) - https://phabricator.wikimedia.org/T126371#2013763 (10mobrovac) [01:12:05] 10Beta-Cluster-Infrastructure, 10Math, 10Mathoid: Math tags not rendering on Beta Cluster due to Mathoid not working(?) - https://phabricator.wikimedia.org/T126371#2013768 (10mobrovac) [01:14:20] Hmm. Why would mediawiki-extensions-php55 take longer than mediawiki-extensions-php53? [01:14:48] 10Beta-Cluster-Infrastructure, 10Math, 10Mathoid: Math tags not rendering on Beta Cluster due to Mathoid not working(?) - https://phabricator.wikimedia.org/T126371#2013782 (10mobrovac) While trying to save a page with the `` tag, I got: ``` Exception encountered, of type "UnexpectedValueException" [fa... [01:15:40] 10Beta-Cluster-Infrastructure, 10Math, 10Mathoid, 10MediaWiki-Cache: Math tags not rendering on Beta Cluster due to Mathoid not working(?) 
- https://phabricator.wikimedia.org/T126371#2013784 (10mobrovac) [01:21:23] 3Scap3, 6Phabricator, 5Patch-For-Review, 7WorkType-Maintenance: Refactor phabricator module in puppet to remove git tag pinning behavior - https://phabricator.wikimedia.org/T125851#2013788 (10mmodell) [01:23:38] 3Scap3, 10scap: Scap should touch symlinks when originals are touched - https://phabricator.wikimedia.org/T126306#2013798 (10greg) [01:23:40] 3Scap3, 10scap: Parameterize global /etc/scap.cfg in ops/puppet - https://phabricator.wikimedia.org/T126259#2013799 (10greg) [01:23:42] 6Release-Engineering-Team, 3Scap3, 10scap: updateWikiversions: Don't assume that all versions being operated on +/- of each other - https://phabricator.wikimedia.org/T125702#2013800 (10greg) [01:23:44] 6Release-Engineering-Team, 3Scap3, 6operations, 10scap: Depool proxies temporarily while scap is ongoing to avoid taxing those nodes - https://phabricator.wikimedia.org/T125629#2013801 (10greg) [01:39:27] 3Scap3, 6Phabricator, 7WorkType-Maintenance: Move /srv/phab/repos to /srv/repos - https://phabricator.wikimedia.org/T125853#2013839 (10mmodell) [01:45:32] why oh why is MobileFrontend special cased >.> [01:46:54] (03PS1) 10Legoktm: Add php55 and hhvm tests for all extensions that have php53 tests [integration/config] - 10https://gerrit.wikimedia.org/r/269582 [01:47:06] Is it? [01:47:43] Oh, I see. [01:48:00] Is that just tech debt at this point? [01:48:48] I think we can convert it to use a generic template [01:48:55] but I don't want to mess with that now [01:49:19] Sure. [01:49:43] 10Continuous-Integration-Config: MobileFrontend zuul config should use one of the normal templates - https://phabricator.wikimedia.org/T126412#2013849 (10Legoktm) 3NEW [01:49:54] So the idea is to run php55 and hhvm for all things, rather than just one of them? [01:50:41] (Presumably we had a reason before to comment out the hhvm jobs?) [01:51:02] Because it was probably going to break things [01:51:12] But I assume php55 will break things at the same scale [01:51:15] Ha. [01:51:23] Fair. [01:51:34] *And* I figured out a way to figure out which ones are broken [02:20:27] (03CR) 10Legoktm: [C: 032] Add php55 and hhvm tests for all extensions that have php53 tests [integration/config] - 10https://gerrit.wikimedia.org/r/269582 (owner: 10Legoktm) [02:21:23] (03Merged) 10jenkins-bot: Add php55 and hhvm tests for all extensions that have php53 tests [integration/config] - 10https://gerrit.wikimedia.org/r/269582 (owner: 10Legoktm) [02:21:30] !log deploying https://gerrit.wikimedia.org/r/269582 [02:21:34] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [02:23:57] Progress. [02:26:44] !log queuing mwext jobs server-side to identify failing ones [02:26:47] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [02:29:32] 10Continuous-Integration-Config: Make AWS extension tests voting because they pass - https://phabricator.wikimedia.org/T126414#2013893 (10Legoktm) 3NEW [02:31:42] legoktm: AjaxLogin, Annotator, maybe some others. [02:33:29] :( I have a script that'll generate a list of the ones that failed [02:33:54] Oh, OK. [02:34:10] * legoktm queues up the next set [02:34:27] So far the only one we care about that I've seen failing is ArticlePlaceholder. (Failed as hhvm, passed as php55 and php53; it's due to go into prod very soon.) [02:35:04] o.O [02:35:35] Yeah. [02:35:38] I was surprised too. 
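legoktm's script for spotting which of the queued extension jobs failed isn't shown in the log; a rough way to pull similar information from the Jenkins JSON API might look like the sketch below (the job name is one of those discussed above, the jq filter is mine, and mapping build numbers back to individual extensions would additionally need each build's parameters):

```bash
# List failed builds of one of the per-extension jobs via the Jenkins JSON
# API. Illustrative query only, not the actual script used here.
curl -s 'https://integration.wikimedia.org/ci/job/mwext-testextension-hhvm/api/json?tree=builds[number,result]' \
  | jq -r '.builds[] | select(.result == "FAILURE") | .number'
```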
[02:36:40] 02:31:46 Fatal error: Uncaught exception 'InvalidArgumentException' with message 'The value for 'SkipSkins' should be an array' in /mnt/jenkins-workspace/workspace/mwext-testextension-hhvm-composer/src/includes/registration/ExtensionProcessor.php:367 [02:36:42] oh ffs [02:36:45] that's a different issue [02:37:56] !log deleting /mnt/jenkins-workspace/workspace/mwext-testextension-hhvm-composer on trusty-1017, it had a skin cloned into it [02:37:59] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [02:44:13] https://www.mediawiki.org/wiki/User:Legoktm/PHP_5.5/Extensions [02:45:39] hmm, the trusty slaves are fully loaded because of the double php55 and hhvm [02:45:55] Yeah. [02:46:09] This is why we don't want to load all three on every patch. :-) [02:50:58] there's some hilarious irony in how the php53 jobs are running faster than the php55 or hhvm ones [02:51:35] Ha. [02:51:39] Not normally, AFAICT. [02:51:51] mediawiki-extensions-php53 SUCCESS in 11m 24s [02:51:51] mediawiki-extensions-php55 SUCCESS in 6m 44s [02:51:58] From an earlier run on a VE patch. [02:52:59] those are jobs where the run time is larger than the zuul/jenkins overhead :P [02:53:13] "Real" jobs. [02:56:29] PROBLEM - Free space - all mounts on integration-slave-precise-1004 is CRITICAL: CRITICAL: integration.integration-slave-precise-1004.diskspace._mnt.byte_percentfree (<37.50%) [02:56:48] Oops. Space exhaustion? [02:57:02] (Same issue as when we ran CI for the i18n changes.) [02:58:05] hmm [02:58:23] 14G mediawiki-core-php53lint [02:58:44] legoktm@integration-slave-precise-1004:/mnt/jenkins-workspace/workspace/mediawiki-core-php53lint$ du -hs .git [02:58:44] 14G .git [02:59:04] bonus points: the git repo is busted [02:59:45] !log deleting 14GB broken workspace of mediawiki-core-php53lint from integration-slave-precise-1004 [02:59:48] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [03:01:53] * James_F grns. [03:07:16] PROBLEM - Free space - all mounts on integration-slave-trusty-1016 is CRITICAL: CRITICAL: integration.integration-slave-trusty-1016.diskspace._mnt.byte_percentfree (<25.00%) [03:07:46] * legoktm looks [03:09:46] legoktm: Did you already do the 53->55 switchover and I didn't notice? [03:09:51] No [03:10:20] https://gerrit.wikimedia.org/r/#/c/269592/ only ran the -55 and -hhvm tests [03:10:32] Or just the -55 ones. [03:10:44] because php53 only runs in gate [03:11:06] Im looking into CodeReview failure [03:11:08] "check php53" will run them [03:11:23] Ah, right. [03:11:27] Never mind me. :-) [03:11:28] RECOVERY - Free space - all mounts on integration-slave-precise-1004 is OK: OK: All targets OK [03:14:05] * legoktm goes off for dinner [03:18:09] * James_F too. 
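The disk-space firefighting above boils down to finding oversized or broken workspaces and removing them while no build is using them; roughly (paths are the ones from the log, the du/sort incantation is just one way to do it):

```bash
# On a slave that tripped the free-space alert: rank workspaces by size,
# then delete a known-broken one, as was done for mediawiki-core-php53lint.
cd /mnt/jenkins-workspace/workspace
sudo du -sh -- */ 2>/dev/null | sort -rh | head -n 15
sudo rm -rf mediawiki-core-php53lint   # only while no build is running in it
```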
[03:22:19] RECOVERY - Free space - all mounts on integration-slave-trusty-1016 is OK: OK: All targets OK [04:03:14] * legoktm back [04:18:42] Math is failing under hhvm [04:21:08] PROBLEM - Free space - all mounts on integration-slave-trusty-1011 is CRITICAL: CRITICAL: integration.integration-slave-trusty-1011.diskspace._mnt.byte_percentfree (<50.00%) [04:26:38] ^ yeah, I'm deleting stuff [04:31:06] PROBLEM - Free space - all mounts on integration-slave-trusty-1013 is CRITICAL: CRITICAL: integration.integration-slave-trusty-1013.diskspace._mnt.byte_percentfree (<88.89%) [04:48:20] gah [04:48:25] a bunch failed due to a full disk [04:50:10] PROBLEM - Free space - all mounts on integration-slave-trusty-1017 is CRITICAL: CRITICAL: integration.integration-slave-trusty-1017.diskspace._mnt.byte_percentfree (<28.57%) [04:51:12] RECOVERY - Free space - all mounts on integration-slave-trusty-1013 is OK: OK: All targets OK [04:58:17] legoktm: tests not cleaning up on failure or just too many new tests? [04:58:25] lots of new ones [04:58:36] I've run like 400+ extensions now? [04:58:53] and concurrently, so they duplicate workspaces [05:10:09] RECOVERY - Free space - all mounts on integration-slave-trusty-1017 is OK: OK: All targets OK [06:03:49] https://www.mediawiki.org/wiki/User:Legoktm/PHP_5.5/Extensions [06:03:51] not too bad! [06:06:35] 10Continuous-Integration-Config, 10Math: Math test fail fpr php55 - https://phabricator.wikimedia.org/T126422#2014058 (10Physikerwelt) 3NEW [06:23:36] (03PS1) 10Legoktm: Disable php53 jobs on MW master + extensions [integration/config] - 10https://gerrit.wikimedia.org/r/269605 [06:31:56] (03PS2) 10Legoktm: Disable php53 jobs on MW master + extensions [integration/config] - 10https://gerrit.wikimedia.org/r/269605 [06:33:45] (03CR) 10Legoktm: [C: 032] Disable php53 jobs on MW master + extensions [integration/config] - 10https://gerrit.wikimedia.org/r/269605 (owner: 10Legoktm) [06:34:43] (03Merged) 10jenkins-bot: Disable php53 jobs on MW master + extensions [integration/config] - 10https://gerrit.wikimedia.org/r/269605 (owner: 10Legoktm) [06:34:52] what's the worst that could happen? [06:34:58] !log deploying https://gerrit.wikimedia.org/r/269605 [06:35:01] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [06:37:56] Well...that didn't actually work. 
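Given how the deploy above went, it's worth noting that integration/config carries its own test suite that catches many layout.yaml mistakes before they reach Zuul; running it locally looks roughly like this (assuming tox is installed; check the repo's tox.ini for the real environment names):

```bash
# Sanity-check a zuul/layout.yaml change locally before deploying it.
# Hypothetical invocation; the repo's tox.ini defines the actual envs.
git clone https://gerrit.wikimedia.org/r/p/integration/config
cd config
tox
```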
[06:38:34] oh whoops [06:39:41] (03PS1) 10Legoktm: Flip skip-if regex for php53 jobs, whoops [integration/config] - 10https://gerrit.wikimedia.org/r/269607 [06:39:56] (03CR) 10Legoktm: [C: 032] Flip skip-if regex for php53 jobs, whoops [integration/config] - 10https://gerrit.wikimedia.org/r/269607 (owner: 10Legoktm) [06:41:36] (03Merged) 10jenkins-bot: Flip skip-if regex for php53 jobs, whoops [integration/config] - 10https://gerrit.wikimedia.org/r/269607 (owner: 10Legoktm) [06:41:44] !log deploying https://gerrit.wikimedia.org/r/269607 [06:41:46] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [06:49:57] zuul is laaaaging [06:58:29] (03PS1) 10Legoktm: Add Generic.Arrays.DisallowLongArraySyntax to ruleset, autofix this repo [tools/codesniffer] - 10https://gerrit.wikimedia.org/r/269612 [07:04:10] (03PS1) 10Legoktm: Drop php- prefix for composer validate jobs, pin to trusty [integration/config] - 10https://gerrit.wikimedia.org/r/269613 [07:15:48] Project beta-scap-eqiad build #89271: 04FAILURE in 11 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/89271/ [07:19:36] (03CR) 10Nikerabbit: Add Generic.Arrays.DisallowLongArraySyntax to ruleset, autofix this repo (031 comment) [tools/codesniffer] - 10https://gerrit.wikimedia.org/r/269612 (owner: 10Legoktm) [07:20:01] Project beta-update-databases-eqiad build #6370: 04FAILURE in 0.98 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/6370/ [07:57:27] (03PS1) 10Legoktm: Run composer-(php55|hhvm) too! [integration/config] - 10https://gerrit.wikimedia.org/r/269618 [08:00:02] (03CR) 10Legoktm: [C: 032] Drop php- prefix for composer validate jobs, pin to trusty [integration/config] - 10https://gerrit.wikimedia.org/r/269613 (owner: 10Legoktm) [08:01:17] (03Merged) 10jenkins-bot: Drop php- prefix for composer validate jobs, pin to trusty [integration/config] - 10https://gerrit.wikimedia.org/r/269613 (owner: 10Legoktm) [08:02:00] (03PS1) 10Adrian Lang: Make new jobs non-voting for WikibaseJavaScriptApi [integration/config] - 10https://gerrit.wikimedia.org/r/269619 [08:02:04] (03CR) 10Legoktm: [C: 032] Run composer-(php55|hhvm) too! [integration/config] - 10https://gerrit.wikimedia.org/r/269618 (owner: 10Legoktm) [08:02:47] (03PS2) 10Adrian Lang: Make new jobs non-voting for WikibaseJavaScriptApi [integration/config] - 10https://gerrit.wikimedia.org/r/269619 [08:03:08] (03Merged) 10jenkins-bot: Run composer-(php55|hhvm) too! 
[integration/config] - 10https://gerrit.wikimedia.org/r/269618 (owner: 10Legoktm) [08:03:38] !log deploying https://gerrit.wikimedia.org/r/269613 and https://gerrit.wikimedia.org/r/269618 [08:03:40] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [08:05:09] (03PS3) 10Legoktm: Make new jobs non-voting for WikibaseJavaScriptApi [integration/config] - 10https://gerrit.wikimedia.org/r/269619 (owner: 10Adrian Lang) [08:05:15] (03CR) 10Legoktm: [C: 032] Make new jobs non-voting for WikibaseJavaScriptApi [integration/config] - 10https://gerrit.wikimedia.org/r/269619 (owner: 10Adrian Lang) [08:06:15] (03Merged) 10jenkins-bot: Make new jobs non-voting for WikibaseJavaScriptApi [integration/config] - 10https://gerrit.wikimedia.org/r/269619 (owner: 10Adrian Lang) [08:07:07] !log deploying https://gerrit.wikimedia.org/r/269619 [08:07:09] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [08:08:57] 10Continuous-Integration-Config: Timeouts in mediawiki-extensions-php53 - https://phabricator.wikimedia.org/T126406#2014103 (10Legoktm) https://integration.wikimedia.org/ci/job/mediawiki-extensions-php55/buildTimeTrend averages ~10 minutes. The php53 variant which is only run for older branches is even slower. [08:11:04] 10Continuous-Integration-Infrastructure: Consider increasing number of trusty CI slaves - https://phabricator.wikimedia.org/T126423#2014105 (10Legoktm) 3NEW [08:13:14] 10Continuous-Integration-Infrastructure, 5Patch-For-Review: Adjust CI for PHP5.5 support - https://phabricator.wikimedia.org/T119675#2014114 (10Legoktm) 5Open>3Resolved a:3Legoktm This happened. There were 20+ changes involved, I'm not going to bother to link them all. [08:15:22] 10Continuous-Integration-Infrastructure: Zuul seems to be running slower - https://phabricator.wikimedia.org/T118083#2014129 (10Legoktm) [08:15:35] 10Continuous-Integration-Infrastructure: Zuul seems to be running slower - https://phabricator.wikimedia.org/T118083#1791243 (10Legoktm) I have noticed this as well, however I believe the issue is that zuul is slower, not jenkins. [08:18:47] 10Continuous-Integration-Infrastructure: Zuul seems to be running slower - https://phabricator.wikimedia.org/T118083#2014134 (10Paladox) I believe that it is Zuul I think they added in some kind of timer so when you upload a patch in gerrit Zuul waits a few seconds to detect it. [08:22:28] (03PS1) 10Legoktm: Test to verify skins do not run 'testextension' jobs [integration/config] - 10https://gerrit.wikimedia.org/r/269624 (https://phabricator.wikimedia.org/T117710) [08:22:28] 10Continuous-Integration-Infrastructure: some tests run from mwext-testextension-hhvm will pick up files from extensions that were not checked out for this job - https://phabricator.wikimedia.org/T117710#2014143 (10Legoktm) Certain skins have a "testextension" jobs in their experimental pipelines. If someone tri... 
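For reference, the composer-(php55|hhvm) and composer validate jobs being wired up here amount to a handful of commands run in the repository checkout; run by hand, the typical shape is roughly this (the exact "test" script varies per repo's composer.json):

```bash
# Approximation of what the composer-flavoured CI jobs run in a checkout.
composer validate
composer install --no-progress --prefer-dist
composer test   # commonly parallel-lint + phpcs, per the repo's composer.json
```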
[08:23:02] 10Continuous-Integration-Infrastructure: PHP53 & PHP55 tests say tidy is not installed, even though it appears to be installed; they're run with HHVM - https://phabricator.wikimedia.org/T124801#2014148 (10Legoktm) [08:23:04] 10Continuous-Integration-Infrastructure, 10MediaWiki-Unit-tests: MediaWiki PHPUnit tests skips TidyTest because "Tidy not found" - https://phabricator.wikimedia.org/T118814#2014147 (10Legoktm) [08:23:29] (03CR) 10jenkins-bot: [V: 04-1] Test to verify skins do not run 'testextension' jobs [integration/config] - 10https://gerrit.wikimedia.org/r/269624 (https://phabricator.wikimedia.org/T117710) (owner: 10Legoktm) [08:23:36] 10Continuous-Integration-Infrastructure: Transient mwext-qunit failure: The value for 'SkipSkins' should be an array - https://phabricator.wikimedia.org/T124394#2014151 (10Legoktm) [08:23:38] 10Continuous-Integration-Infrastructure, 5Patch-For-Review: some tests run from mwext-testextension-hhvm will pick up files from extensions that were not checked out for this job - https://phabricator.wikimedia.org/T117710#2014152 (10Legoktm) [08:31:50] what's going on with jenkins? [08:32:13] seems to retest every single patch in existence, including already merged ones [08:32:38] that's funny yes [08:32:51] legoktm: https://phabricator.wikimedia.org/T97889 partly fixed implicitly now? [08:33:46] Nikerabbit: no, it's still running php53lint, I only touched stuff under mediawiki/* [08:34:15] tgr: I triggered a bunch of extension jobs today to make sure they still passed under php55/hhvm, sorry for the spam [08:43:35] 10Continuous-Integration-Config: gate-and-submit may run `composer validate` twice - https://phabricator.wikimedia.org/T114451#2014174 (10Legoktm) 5Open>3declined a:3Legoktm Running an extra `composer validate` is extremely cheap, and running the extra job avoids a lot of CI complexity. [08:46:21] 10Continuous-Integration-Config, 10MassAction: Add HHVM Jenkins jobs for MassAction - https://phabricator.wikimedia.org/T100649#2014182 (10Legoktm) [08:46:23] 10Continuous-Integration-Config: Trigger PHPUnit job for MediaWiki extensions with HHVM instead of Zend in 'test' pipeline - https://phabricator.wikimedia.org/T101392#2014179 (10Legoktm) 5Open>3Resolved a:3Legoktm HHVM tests for extensions are now run as part of test and gate-and-submit. [08:48:55] 10Continuous-Integration-Infrastructure, 7Tracking: Have unit tests of all wmf deployed extensions pass when installed together, in both PHP-Zend and HHVM (tracking) - https://phabricator.wikimedia.org/T69216#2014188 (10Legoktm) [08:48:57] 10Continuous-Integration-Config, 5Release-Engineering-Epics, 7HHVM: Jenkins: Implement hhvm based voting jobs for mediawiki and extensions (tracking) - https://phabricator.wikimedia.org/T75521#2014185 (10Legoktm) 5Open>3Resolved a:3Legoktm This happened as part of the php55 version bump. [08:49:03] 10Continuous-Integration-Config, 5Release-Engineering-Epics, 7HHVM: Jenkins: Implement hhvm based voting jobs for mediawiki and extensions (tracking) - https://phabricator.wikimedia.org/T75521#2014189 (10Legoktm) a:5Legoktm>3None [09:02:03] (03CR) 10Paladox: "Ok. I was using php7 when I did this. 
But the error that happen when we did upgrade previously was fixed in master branch after the releas" [integration/composer] - 10https://gerrit.wikimedia.org/r/267548 (https://phabricator.wikimedia.org/T125343) (owner: 10Paladox) [09:13:05] PROBLEM - Free space - all mounts on integration-slave-trusty-1013 is CRITICAL: CRITICAL: integration.integration-slave-trusty-1013.diskspace._mnt.byte_percentfree (<40.00%) [10:04:52] (03PS1) 10Hoo man: Use PHP 5.5 lint for Wikibase [integration/config] - 10https://gerrit.wikimedia.org/r/269644 [10:04:57] jzerebecki: ^ [10:06:37] (03PS2) 10Hoo man: Use PHP 5.5 lint for Wikibase [integration/config] - 10https://gerrit.wikimedia.org/r/269644 [10:11:24] (03CR) 10Thiemo Mättig (WMDE): [C: 031] Use PHP 5.5 lint for Wikibase [integration/config] - 10https://gerrit.wikimedia.org/r/269644 (owner: 10Hoo man) [10:32:02] (03PS1) 10JanZerebecki: Use extension-qunit-composer template [integration/config] - 10https://gerrit.wikimedia.org/r/269651 [10:32:04] (03PS1) 10JanZerebecki: Move WikibaseQuality extensions to generic composer jobs [integration/config] - 10https://gerrit.wikimedia.org/r/269652 [10:32:06] (03PS1) 10JanZerebecki: [WIP] Add a sqlite variant of extension-unittests-composer [integration/config] - 10https://gerrit.wikimedia.org/r/269653 [10:33:49] (03CR) 10jenkins-bot: [V: 04-1] [WIP] Add a sqlite variant of extension-unittests-composer [integration/config] - 10https://gerrit.wikimedia.org/r/269653 (owner: 10JanZerebecki) [10:55:31] 10Continuous-Integration-Config: Drop PHP 5.3 enforcement from CI for mediawiki and extensions - https://phabricator.wikimedia.org/T126440#2014412 (10JanZerebecki) 3NEW [10:58:37] 10Continuous-Integration-Config: Drop PHP 5.3 enforcement from CI for mediawiki and extensions - https://phabricator.wikimedia.org/T126440#2014425 (10Paladox) @JanZerebecki do you know which tests we would need to update to do mediawiki 1.26 or lower. I thought the unit tests at least would not run the php 5.3... 
[10:59:02] 10Continuous-Integration-Config, 10Wikidata: Support PHP 5.5 in CI for Wikidata stuff - https://phabricator.wikimedia.org/T126441#2014426 (10JanZerebecki) 3NEW [11:01:17] 10Continuous-Integration-Config, 10Wikidata, 3Wikidata-Sprint-2016-02-02: Support PHP 5.5 in CI for Wikidata stuff - https://phabricator.wikimedia.org/T126441#2014433 (10JanZerebecki) [11:01:36] 10Continuous-Integration-Config: Drop PHP 5.3 enforcement from CI for mediawiki and extensions - https://phabricator.wikimedia.org/T126440#2014436 (10JanZerebecki) [11:01:39] 10Continuous-Integration-Config, 10Wikidata, 3Wikidata-Sprint-2016-02-02: Support PHP 5.5 in CI for Wikidata stuff - https://phabricator.wikimedia.org/T126441#2014426 (10JanZerebecki) [11:05:01] (03PS3) 10JanZerebecki: Use PHP 5.5 lint for Wikibase [integration/config] - 10https://gerrit.wikimedia.org/r/269644 (https://phabricator.wikimedia.org/T126441) (owner: 10Hoo man) [11:08:18] 10Continuous-Integration-Config, 10Continuous-Integration-Infrastructure: Split layout.yaml into smaller files - https://phabricator.wikimedia.org/T126442#2014453 (10Paladox) [11:12:59] 10Continuous-Integration-Config, 10Wikidata, 5Patch-For-Review, 3Wikidata-Sprint-2016-02-02: Support PHP 5.5 in CI for Wikidata stuff - https://phabricator.wikimedia.org/T126441#2014463 (10JanZerebecki) [11:13:02] 10Continuous-Integration-Config: Drop PHP 5.3 enforcement from CI for mediawiki and extensions - https://phabricator.wikimedia.org/T126440#2014460 (10JanZerebecki) 5Open>3Invalid a:3JanZerebecki Never mind, I looked at an old patch and was confused by the skip in zuul/layout.yaml. [11:15:25] (03PS2) 10JanZerebecki: Move WikibaseQuality extensions to generic composer jobs [integration/config] - 10https://gerrit.wikimedia.org/r/269652 (https://phabricator.wikimedia.org/T126441) [11:15:51] 10Continuous-Integration-Config, 10Continuous-Integration-Infrastructure: Split layout.yaml into smaller files - https://phabricator.wikimedia.org/T126442#2014482 (10Paladox) Ive filed the WebKit bug here https://bugs.webkit.org/show_bug.cgi?id=154068 [12:38:54] (03CR) 10JanZerebecki: [C: 032] Use extension-qunit-composer template [integration/config] - 10https://gerrit.wikimedia.org/r/269651 (owner: 10JanZerebecki) [12:40:17] (03Merged) 10jenkins-bot: Use extension-qunit-composer template [integration/config] - 10https://gerrit.wikimedia.org/r/269651 (owner: 10JanZerebecki) [12:45:43] (03CR) 10JanZerebecki: [C: 032] Move WikibaseQuality extensions to generic composer jobs [integration/config] - 10https://gerrit.wikimedia.org/r/269652 (https://phabricator.wikimedia.org/T126441) (owner: 10JanZerebecki) [12:49:27] (03Merged) 10jenkins-bot: Move WikibaseQuality extensions to generic composer jobs [integration/config] - 10https://gerrit.wikimedia.org/r/269652 (https://phabricator.wikimedia.org/T126441) (owner: 10JanZerebecki) [12:52:09] (03PS2) 10JanZerebecki: Add HtmlFormatter [integration/config] - 10https://gerrit.wikimedia.org/r/269530 (https://phabricator.wikimedia.org/T125001) (owner: 10MaxSem) [12:52:17] (03CR) 10JanZerebecki: [C: 032] Add HtmlFormatter [integration/config] - 10https://gerrit.wikimedia.org/r/269530 (https://phabricator.wikimedia.org/T125001) (owner: 10MaxSem) [12:53:16] (03Merged) 10jenkins-bot: Add HtmlFormatter [integration/config] - 10https://gerrit.wikimedia.org/r/269530 (https://phabricator.wikimedia.org/T125001) (owner: 10MaxSem) [12:55:20] Yippee, build fixed! 
[12:55:21] Project browsertests-GettingStarted-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #752: 09FIXED in 1 min 20 sec: https://integration.wikimedia.org/ci/job/browsertests-GettingStarted-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/752/ [13:00:15] 5Testing-Initiative-2015, 10Browser-Tests-Infrastructure, 7JavaScript, 5Patch-For-Review: Experiment with browser testing in other software languages - https://phabricator.wikimedia.org/T108874#2014659 (10zeljkofilipin) [13:15:02] !log reloading zuul for 3be81c1..e8e0615 [13:15:04] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [13:22:03] (03PS1) 10JanZerebecki: DO NOT MERGE test php53lin skip [integration/config] - 10https://gerrit.wikimedia.org/r/269667 [13:23:10] 10Continuous-Integration-Config, 10Continuous-Integration-Infrastructure: Split layout.yaml into smaller files - https://phabricator.wikimedia.org/T126442#2014698 (10Aklapper) > **Steps to Reproduce** > > * Go to https://gerrit.wikimedia.org/r/#/c/269651/ > > You can use iOS 9.2 on an iPhone or iPad. > Or u... [13:27:36] 10Continuous-Integration-Config, 10Continuous-Integration-Infrastructure: Split layout.yaml into smaller files - https://phabricator.wikimedia.org/T126442#2014700 (10Aklapper) 5Open>3declined a:3Aklapper Browsers should not crash. If they do, report it to your browser's maintainers. Nothing to fix in W... [13:37:11] (03PS2) 10JanZerebecki: Fix php lint job attributes [integration/config] - 10https://gerrit.wikimedia.org/r/269667 [13:38:48] (03PS3) 10JanZerebecki: Fix php lint job attributes [integration/config] - 10https://gerrit.wikimedia.org/r/269667 (https://phabricator.wikimedia.org/T126440) [13:43:50] (03CR) 10JanZerebecki: [C: 032] Fix php lint job attributes [integration/config] - 10https://gerrit.wikimedia.org/r/269667 (https://phabricator.wikimedia.org/T126440) (owner: 10JanZerebecki) [13:45:59] (03Merged) 10jenkins-bot: Fix php lint job attributes [integration/config] - 10https://gerrit.wikimedia.org/r/269667 (https://phabricator.wikimedia.org/T126440) (owner: 10JanZerebecki) [13:46:56] !log reloading zuul for 639dd40 [13:46:58] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [13:52:41] (03CR) 10JanZerebecki: "Removal happens in If19e4769. Will likely try to move Wikibase to the composer templates also used by other extensions, which would fix th" [integration/config] - 10https://gerrit.wikimedia.org/r/269644 (https://phabricator.wikimedia.org/T126441) (owner: 10Hoo man) [13:53:36] 10Continuous-Integration-Config, 10Wikidata, 5Patch-For-Review, 3Wikidata-Sprint-2016-02-02: Support PHP 5.5 in CI for Wikidata stuff - https://phabricator.wikimedia.org/T126441#2014762 (10JanZerebecki) [13:53:41] 10Continuous-Integration-Config, 5Patch-For-Review: Drop PHP 5.3 enforcement from CI for mediawiki and extensions - https://phabricator.wikimedia.org/T126440#2014760 (10JanZerebecki) 5Invalid>3Resolved It seems only the first match for job attributes sticks, that is why php53lint was not skipped. [14:12:40] !log recover a bit of disk space: integration-saltmaster:~# salt --show-timeout '*slave*' cmd.run 'rm -rf /mnt/jenkins-workspace/workspace/*WikibaseQuality*' [14:12:42] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [14:15:39] 7Browser-Tests, 10MediaWiki-extensions-MultimediaViewer: Fix failed MultimediaViewer browsertests Jenkins jobs - https://phabricator.wikimedia.org/T94157#2014792 (10zeljkofilipin) >>! 
In T94157#1909192, @Jdlrobson wrote: > @zeljkofilipin would you be willing to fix this since I am not too familiar with the inn... [14:25:20] 5Release-Engineering-Epics, 10Browser-Tests-Infrastructure, 7Epic, 7Tracking: Fix or delete failing browser tests Jenkins jobs - https://phabricator.wikimedia.org/T94150#2014827 (10zeljkofilipin) [14:25:32] good morning [14:31:24] 5Release-Engineering-Epics, 10Browser-Tests-Infrastructure, 7Epic, 7Tracking: Fix or delete failing browser tests Jenkins jobs - https://phabricator.wikimedia.org/T94150#2014841 (10zeljkofilipin) [14:31:26] 7Browser-Tests, 6Collaboration-Team-Backlog, 10Notifications: Fix or delete failing Echo browsertests Jenkins jobs - https://phabricator.wikimedia.org/T94152#2014838 (10zeljkofilipin) 5Open>3Resolved a:3zeljkofilipin Echo browser test jobs are now back to green: https://integration.wikimedia.org/ci/vie... [14:52:04] (03PS1) 10JanZerebecki: Wikibase: use extension-qunit-generic instead of special job [integration/config] - 10https://gerrit.wikimedia.org/r/269679 [14:58:35] (03CR) 10Hashar: [C: 031] "One more extension moved to generic job. You might want to add it to experimental first if you wanna confirm it is working properly." [integration/config] - 10https://gerrit.wikimedia.org/r/269679 (owner: 10JanZerebecki) [15:01:10] (03CR) 10JanZerebecki: "Tested with experimental here: https://gerrit.wikimedia.org/r/#/c/269643/2" [integration/config] - 10https://gerrit.wikimedia.org/r/269679 (owner: 10JanZerebecki) [15:02:07] (03PS2) 10JanZerebecki: Wikibase: use extension-qunit-composer instead of special job [integration/config] - 10https://gerrit.wikimedia.org/r/269679 [15:08:03] 10Continuous-Integration-Config: Drop PHP 5.3 enforcement from CI for mediawiki and extensions - https://phabricator.wikimedia.org/T126440#2014955 (10Ricordisamoa) [15:09:12] * greg-g waves to hashar [15:12:34] (03CR) 10JanZerebecki: [C: 032] Wikibase: use extension-qunit-composer instead of special job [integration/config] - 10https://gerrit.wikimedia.org/r/269679 (owner: 10JanZerebecki) [15:20:26] hashar: I belive https://integration.wikimedia.org/ci/job/mwext-testextension-php53-composer/208/console is failing because we need to re organise the database. It happends to semanticmediawiki too. [15:20:48] I think extension-unittests works or the generic one not sure which one. [15:21:00] jzerebecki: ^ [15:21:31] (03Merged) 10jenkins-bot: Wikibase: use extension-qunit-composer instead of special job [integration/config] - 10https://gerrit.wikimedia.org/r/269679 (owner: 10JanZerebecki) [15:21:44] paladox: I am in meeting/checkin [15:21:58] hashar: Ok. [15:22:47] (03PS1) 10JanZerebecki: Wikibase: use extension-unittests-composer [integration/config] - 10https://gerrit.wikimedia.org/r/269693 [15:22:56] (03CR) 10Paladox: "I think we need to reorganise the database since it is causing errors in branches such as REL1_26." [integration/config] - 10https://gerrit.wikimedia.org/r/269651 (owner: 10JanZerebecki) [15:25:47] (03CR) 10Paladox: [C: 04-1] "@JanZerebecki there seems to be a problem in the -composer jobs that's test unit tests and qunit where as in generic I belive it works." [integration/config] - 10https://gerrit.wikimedia.org/r/269693 (owner: 10JanZerebecki) [15:29:22] 10Continuous-Integration-Config, 10Math: Math test fail for php55 - https://phabricator.wikimedia.org/T126422#2015087 (10Physikerwelt) [15:29:57] (03CR) 10JanZerebecki: "Where does it fail?" 
[integration/config] - 10https://gerrit.wikimedia.org/r/264333 (owner: 10Paladox) [15:31:18] (03CR) 10JanZerebecki: "@Paladox That is not relevant to this patch as nothing in the patch stops using the -generic template." [integration/config] - 10https://gerrit.wikimedia.org/r/269693 (owner: 10JanZerebecki) [15:33:24] (03PS2) 10JanZerebecki: Wikibase: use extension-unittests-composer [integration/config] - 10https://gerrit.wikimedia.org/r/269693 (https://phabricator.wikimedia.org/T126441) [15:33:45] (03CR) 10JanZerebecki: [C: 032] Wikibase: use extension-unittests-composer [integration/config] - 10https://gerrit.wikimedia.org/r/269693 (https://phabricator.wikimedia.org/T126441) (owner: 10JanZerebecki) [15:34:21] (03CR) 10JanZerebecki: [C: 04-1] Lets install MySQL before installing dependacy extensions [integration/config] - 10https://gerrit.wikimedia.org/r/264333 (owner: 10Paladox) [15:35:53] (03Abandoned) 10JanZerebecki: Use PHP 5.5 lint for Wikibase [integration/config] - 10https://gerrit.wikimedia.org/r/269644 (https://phabricator.wikimedia.org/T126441) (owner: 10Hoo man) [15:36:16] 5Release-Engineering-Epics, 10Browser-Tests-Infrastructure, 7Epic, 7Tracking: Fix or delete failing browser tests Jenkins jobs - https://phabricator.wikimedia.org/T94150#2015106 (10zeljkofilipin) a:3zeljkofilipin [15:36:41] (03Merged) 10jenkins-bot: Wikibase: use extension-unittests-composer [integration/config] - 10https://gerrit.wikimedia.org/r/269693 (https://phabricator.wikimedia.org/T126441) (owner: 10JanZerebecki) [15:40:48] 7Browser-Tests, 10Math: Math Selenium test fails with unable to locate element, using {:id=>"wpTextbox1", :tag_name=>"textarea"} (Watir::Exception::UnknownObjectException) - https://phabricator.wikimedia.org/T126463#2015110 (10zeljkofilipin) 3NEW a:3zeljkofilipin [15:41:32] 5Release-Engineering-Epics, 10Browser-Tests-Infrastructure, 7Epic, 7Tracking: Fix or delete failing browser tests Jenkins jobs - https://phabricator.wikimedia.org/T94150#2015127 (10zeljkofilipin) [15:41:33] (03CR) 10JanZerebecki: "16:20 < paladox> hashar: I belive https://integration.wikimedia.org/ci/job/mwext-testextension-php53-composer/208/console is failing becau" [integration/config] - 10https://gerrit.wikimedia.org/r/264333 (owner: 10Paladox) [15:42:10] 5Release-Engineering-Epics, 10Browser-Tests-Infrastructure, 7Epic, 7Tracking: Fix or delete failing browser tests Jenkins jobs - https://phabricator.wikimedia.org/T94150#1177837 (10zeljkofilipin) [15:42:39] !log reloading zuul for 639dd40..41a92d5 [15:42:41] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [15:43:03] (03CR) 10Paladox: "Oh. But it was passing before." [integration/config] - 10https://gerrit.wikimedia.org/r/264333 (owner: 10Paladox) [15:44:00] 5Release-Engineering-Epics, 10Browser-Tests-Infrastructure, 7Epic, 7Tracking: Fix or delete failing browser tests Jenkins jobs - https://phabricator.wikimedia.org/T94150#2015138 (10zeljkofilipin) [15:44:39] 5Release-Engineering-Epics, 10Browser-Tests-Infrastructure, 7Epic, 7Tracking: Fix or delete failing browser tests Jenkins jobs - https://phabricator.wikimedia.org/T94150#1188957 (10zeljkofilipin) [15:48:05] 5Release-Engineering-Epics, 10Browser-Tests-Infrastructure, 7Epic, 7Tracking: Fix or delete failing browser tests Jenkins jobs - https://phabricator.wikimedia.org/T94150#2015152 (10zeljkofilipin) [15:48:42] (03CR) 10Paladox: "Please see https://integration.wikimedia.org/ci/job/mwext-SemanticMediaWiki-testextension-php55/2/console which fails. 
Also in SemanticFor" [integration/config] - 10https://gerrit.wikimedia.org/r/264333 (owner: 10Paladox) [15:52:31] (03CR) 10Paladox: "@ but if you look at https://integration.wikimedia.org/ci/job/mwext-testextension-hhvm/2081/console yes it fails but if you look closly yo" [integration/config] - 10https://gerrit.wikimedia.org/r/264333 (owner: 10Paladox) [15:52:47] (03CR) 10Paladox: "@JanZerebecki" [integration/config] - 10https://gerrit.wikimedia.org/r/264333 (owner: 10Paladox) [15:52:51] jzerebecki: ^ [15:53:37] 10Deployment-Systems, 10MediaWiki-Releasing, 6Release-Engineering-Team: make-wmf-branch instance requires ssh auth forwarding - https://phabricator.wikimedia.org/T126335#2015155 (10greg) [15:53:40] 10Deployment-Systems, 10MediaWiki-Releasing, 6Release-Engineering-Team: make-wmf-branch doesn't ensure git has proper user.name and user.email - https://phabricator.wikimedia.org/T126334#2015156 (10greg) [16:02:39] (03PS1) 10JanZerebecki: Add missing dependency for ext:Cite [integration/config] - 10https://gerrit.wikimedia.org/r/269704 [16:02:59] (03CR) 10JanZerebecki: [C: 032] Add missing dependency for ext:Cite [integration/config] - 10https://gerrit.wikimedia.org/r/269704 (owner: 10JanZerebecki) [16:03:36] (03CR) 10Jforrester: "How did this get removed?" [integration/config] - 10https://gerrit.wikimedia.org/r/269704 (owner: 10JanZerebecki) [16:05:21] (03Merged) 10jenkins-bot: Add missing dependency for ext:Cite [integration/config] - 10https://gerrit.wikimedia.org/r/269704 (owner: 10JanZerebecki) [16:06:34] !log reloading zuul for 41a92d5..5b971d1 [16:06:37] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [16:06:51] jzerebecki: (It's a good thing, but I thought we added it before.) [16:07:24] James_F: I don't think anyone did, only the other way around. [16:07:37] Oh. [16:07:37] Hmm. [16:08:25] Thanks, anyway. :-) [16:14:20] jzerebecki legoktm hashar it dosent seem like the composer test no longer works in experimental please see https://gerrit.wikimedia.org/r/#/c/269705/ [16:14:50] 00:00:37.574 Could not insert main page: Invalid callback SFFormUtils::purgeCache in hooks for ArticleSave [16:15:07] I think we have seen that on Semantic already [16:15:58] hashar: I mean the run that runs composer test. [16:16:08] the MediaWiki installer creates a Main Page , trigger a hook which fail because the hook is missing apparently [16:16:10] hashar: This one composer-php53 [16:16:59] hashar: Similar test https://integration.wikimedia.org/ci/job/composer-php55/13/console but in php53. https://gerrit.wikimedia.org/r/#/c/267431/ [16:17:54] This version of the DeletePagesForGood extension requires MediaWiki 1.25+.. [16:17:57] booh [16:18:02] Doesent look like https://integration.wikimedia.org/ci/job/composer-php55/13/console runs the test. Since i see this message now This version of the DeletePagesForGood extension requires MediaWiki 1.25+.. where as before it ran the phpcs test and phplint. [16:18:16] Before the composer update it worked. [16:18:24] Not sure why it is starting to say that. [16:19:17] that extension dies unless function_exists( 'wfLoadExtension' ) which is rom Mw 1.25 [16:19:45] hashar: Oh. Could we install mediawiki with that test. [16:19:46] oh [16:20:03] I think we had that one as well [16:20:04] I just looked through and i am wrong it does say that in the prevous test. [16:20:10] for some reason composer ends up loading the .php file [16:20:13] ie include them [16:20:17] but core is not around and that fails [16:20:47] hashar: Yes. 
Maybe we could clone core in that test. Since when we include any core code it fails because it carnt find it. [16:21:10] na [16:21:26] funnily it works on my setup... [16:22:48] hashar: Oh. It fails on mine when i just use composer inside that extension and not include core. But when i do include core it passes. [16:23:16] 7Browser-Tests, 10Math, 5Patch-For-Review: Math Selenium test fails with unable to locate element, using {:id=>"wpTextbox1", :tag_name=>"textarea"} (Watir::Exception::UnknownObjectException) - https://phabricator.wikimedia.org/T126463#2015220 (10zeljkofilipin) [16:27:14] paladox: will want to ask legoktm. Maybe that is our old composer version misbehaving [16:27:23] it should not have to include the .php [16:27:37] legoktm: reference ( https://integration.wikimedia.org/ci/job/composer-php55/13/console but in php53. https://gerrit.wikimedia.org/r/#/c/267431/ ) [16:27:46] hashar: Ok. It does it in the new test too. [16:28:00] or composer test that runs parallel-lint include the php entry point and executes the die() statement ini there [16:29:36] hashar: Ok. [16:30:53] paladox: pretty sure we encountered that previously [16:30:56] can't find [16:30:59] the task though [16:31:02] meeting again! [16:31:04] hashar: Ok. [16:34:43] 10Continuous-Integration-Config, 10Wikidata, 5Patch-For-Review, 3Wikidata-Sprint-2016-02-02: Support PHP 5.5 in CI for Wikidata stuff - https://phabricator.wikimedia.org/T126441#2015236 (10JanZerebecki) We still have a few jobs that need a php55 variant. [16:34:46] PROBLEM - Host deployment-mediawiki02 is DOWN: PING CRITICAL - Packet loss = 86%, RTA = 2485.75 ms [16:37:47] legoktm: ah, I see I missed an email about that [16:38:01] that looks like very solid preparation, thanks for doing that! [16:38:29] hashar: can you help me figure out what's going on with https://integration.wikimedia.org/ci/job/mediawiki-extensions-qunit/30042/console ? [16:39:09] qunit tests call load.php, which initiates session handling, wich calls IP::isPublic (that's added in the patch) and that blows up spectacularly [16:40:25] but the xdebug trace does not contain function arguments so I don't know how exactly that happens [16:54:49] Fatal error: Maximum function nesting level of '100' reached, aborting! in /mnt/jenkins-workspace/workspace/mediawiki-extensions-qunit/src/vendor/wikimedia/ip-set/src/IPSet.php [16:54:50] doh [16:54:55] tgr: looking :d [16:55:02] and yeah the stacktrace are mungled [16:55:20] https://integration.wikimedia.org/ci/job/mediawiki-extensions-qunit/30042/ has a few wfDebugLog* outputs [16:55:38] mw-debug-www.log being from the http://localhost/ install [16:56:11] stupid HTML trace [16:57:09] https://integration.wikimedia.org/ci/job/mediawiki-extensions-qunit/30042/artifact/log/mw-debug-www.log/*view*/ has a trace [16:57:14] [fatal] [3eb962c1] PHP Fatal Error: Maximum function nesting level of '100' reached, aborting! [16:57:14] #0 [internal function]: MWExceptionHandler::handleFatalError() [16:57:15] #1 {main} [16:57:17] totally unhelpful [16:59:43] hashar: maybe hhvm.error_handling.call_user_handler_on_fatals is not set? 
[17:00:23] but then, that stack trace would not include arguments either [17:02:53] tgr: found that https://phabricator.wikimedia.org/P2585 [17:02:57] https://doc.wikimedia.org/mediawiki-core/master/php/IP_8php_source.html#l00381 is the call that explodes [17:03:49] in the qunit job we do warm up / try Special:BlankPage whith curl --include http://localhost:9412/jenkins-mediawiki-extensions-qunit-30042/index.php/Special:BlankPage [17:03:58] and that output the html formatted trace [17:04:08] hhvm.error_handling.call_user_handler_on_fatals <-- I have no clue what it is [17:04:10] yeah, that trace is included in the artifacts, it's just hard to read because xdebug turns it into a html table [17:04:29] hhvm.error_handling.call_user_handler_on_fatals will make fatal error logging show real stack traces [17:04:33] now the question is why we have infinite calls of IPSet\IPSet::recOptimize [17:04:43] it tells hhvm to call the error handler on a fatal error [17:04:48] oh [17:05:07] I had a bug about thatç [17:05:13] by default, that does not happen, and MediaWiki catches the fatal in a shutdown handler, and stack traces have been unwound by then [17:05:17] for production, something like "restore stacktrace on php fatals for logstash) [17:05:29] yeah, for production that has been fixed [17:05:39] but it relies on that setting [17:05:51] * hashar looks at a slave [17:06:41] re: IPSet, as far as I can tell that whole class has no configurable dependencies whatsoever [17:06:49] so it should behave the same on any machine [17:06:59] so I am not sure what's going on [17:07:20] but seeing what argument is passed to recOptimize would maybe give a clue [17:08:57] 10Continuous-Integration-Infrastructure: CI slaves should have HHVM call the exception user handler so we have useful stack trace on fatal errors - https://phabricator.wikimedia.org/T126473#2015381 (10hashar) 3NEW [17:09:18] tgr: the poor lame way would be to have a var_dump() :D [17:09:30] is it possible to set xdebug.collect_params [17:09:37] to 2? [17:10:07] maybe via ini_set ? [17:10:20] I dont think we have any specific xdebug config on the ci slave [17:10:43] I'll try, my guess would be that that's too late to change XDebug settings though [17:11:30] well [17:11:37] we can have that settings set for all CI instanecs [17:11:53] we just install php5-xdebug [17:12:37] find /etc -name '*xdebug*' [17:12:52] /etc/php5/mods-available/xdebug.ini [17:12:52] /etc/php5/cli/conf.d/20-xdebug.ini [17:12:53] /etc/php5/apache2/conf.d/20-xdebug.ini [17:13:28] so one might have puppet to provision some .ini file that set the xdebug.collect_params [17:13:50] xdebug.collect_assignments => Off => Off [17:13:50] xdebug.collect_includes => On => On [17:13:51] xdebug.collect_params => 0 => 0 [17:13:51] xdebug.collect_return => Off => Off [17:13:53] xdebug.collect_vars => Off => Off [17:14:31] tgr: so essentially yeah that sounds like a good. Probably want to task it and figure out a puppet patch to ship a .ini file with wanted xdebug settings [17:14:56] is there a way to trigger that for specific builds? [17:15:05] I don't think it would be a good default [17:15:41] I dont know whether you can set it via ini_set [17:15:53] I'm just about to try that [17:15:55] potentially can be a hack in your patch to set it up in Setup.php or index.php [17:18:30] one sure thing, would be nice to reproduce and fix up IPSet to prevent it from recursing infinitely [17:20:23] tgr: well hacked! 
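If ini_set() turns out to be too late (as tgr suspects), the one-off equivalent on a slave is to edit the Xdebug ini file located above and reload Apache; a puppet-shipped snippet would be the durable version of the same thing:

```bash
# One-off on a CI slave: make Xdebug record function arguments in traces.
# The conf.d entries found above normally symlink to this file, so both the
# CLI and the mod_php install serving the qunit warm-up pick up the change.
echo 'xdebug.collect_params = 2' | sudo tee -a /etc/php5/mods-available/xdebug.ini
sudo service apache2 reload
```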
[17:21:40] I guess I could use wfDebugLog if nothing else, that should end up in the artifact files, right? [17:21:45] yup [17:22:01] var_dump() might spurt it out as well [17:22:17] tgr: can you reproduce the issue locally ? [17:22:25] a curl to your localhost SpecialBlank would do [17:22:44] I didn't try but the tests seem to blow up on load.php as well [17:22:51] and that definitely works locally [17:22:52] yeah pretty much anywhere I guess [17:22:56] since that is in Setup.php [17:23:00] doh [17:23:07] maybe IPSet version is different :( [17:23:25] no, all the other test pass, just qunit fails [17:23:35] this is somehow related to curl [17:24:07] in which case I guess I did not try to sensibly reproduce it yet, since I didn't use curl [17:24:44] well if you get the array/parameters passed to IPSet::__construct() that might help reproduce the loop [17:25:10] and maybe attempt to get a unit test that reproduce it in IPSet.git [17:25:15] (random thoughts really sorry ) [17:26:16] hashar: Could we set up a new kind of the non generic composer test so that we can make the job a non voting for some repos. So for example mwext-ExtensionName-testextension-phpflavour-composer [17:26:45] nope, curl works fine locally [17:26:53] Since SemanticMediaWiki should really be running the composer unit tests not non generic. [17:27:07] hashar: the weird thing is that the parameters are constant [17:27:33] https://doc.wikimedia.org/mediawiki-core/master/php/IP_8php_source.html#l00381 [17:27:57] that just sets up an IPSet with the address ranges which are considered nonpublic per RFC [17:28:11] so I have no clue what's going on there [17:29:08] will have to check what is passed to IP::isPublic() [17:29:23] maybe it is a corner case that ends up causing IPSet to deathloop [17:29:29] such as passing '' or null [17:30:09] that should be the request origin IP [17:30:14] so 127.0.0.1 I guess [17:30:15] paladox: potentially, but what about using the generic composer test in experimental ? [17:30:27] paladox: then iterate until it pass, and at that point we can use the generic job [17:30:29] but that parameter is not passed to IPSet at all [17:31:31] (03PS1) 10Zfilipin: WIP recreate VisualEditor Selenium Jenkins job [integration/config] - 10https://gerrit.wikimedia.org/r/269732 (https://phabricator.wikimedia.org/T94162) [17:31:32] hard to tell [17:31:37] from https://integration.wikimedia.org/ci/job/mediawiki-core-qunit/59062/consoleFull [17:31:42] hashar: Yes. I belive that it isent todo with the extension failing i think its related to something with the db https://integration.wikimedia.org/ci/job/mwext-testextension-hhvm-composer/985/console [17:31:49] IP::isPublic( ) does not show the parameter [17:32:14] paladox: looks like the table creation is not registered for install.Php to create it [17:32:27] paladox: or that depends on another Semantic extension [17:32:47] hashar: No doesent depend on any other extension. [17:33:17] paladox: ah that is happening again when creating the main page .. [17:33:26] hashar: Its because of the way db is installed. I think the db should be installed first then it should test the extension when running db update. [17:33:32] a hook is kicking that rely on smw_object_ids table to be created but it hasn't been created yet [17:33:57] hashar: See https://gerrit.wikimedia.org/r/#/c/264333/ please. [17:34:41] hashar: Because it gets class missing on non generic and generic. 
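Since the same curl works fine locally for tgr, one next step along the lines suggested above (reproduce it and capture what IPSet is given) is to poke the exact checkout on the slave, under its own php5 + Xdebug, outside the HTTP request; whether this trips the same nesting fatal is an open question, so treat it purely as a repro attempt:

```bash
# Run IP::isPublic() in the workspace the qunit job installed (it has a
# LocalSettings.php because the job ran the installer). Path from the log.
cd /mnt/jenkins-workspace/workspace/mediawiki-extensions-qunit/src
echo 'var_dump( IP::isPublic( "127.0.0.1" ) );' | php5 maintenance/eval.php
```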
[17:35:07] ahh [17:36:20] hashar: I think because it tests with the extension installed where as it should do it after db is installed. [17:37:22] (03CR) 10Hashar: "Yeah that might be solving an issue that is gone on with install.php failing because the database is not all setup yet but hooks are kicki" [integration/config] - 10https://gerrit.wikimedia.org/r/264333 (owner: 10Paladox) [17:37:37] paladox: we will want to list out what each kind of jobs are doing and in which order [17:37:41] and find out the proper order [17:37:47] no idea right now really :( [17:37:53] gotta commute out back home sorry [17:38:03] ok. [17:40:35] Project browsertests-VisualEditor-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce-T94162 build #1: 04FAILURE in 4 min 45 sec: https://integration.wikimedia.org/ci/job/browsertests-VisualEditor-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce-T94162/1/ [17:45:18] (03PS7) 10Paladox: Lets install MySQL before installing dependacy extensions [integration/config] - 10https://gerrit.wikimedia.org/r/264333 [17:47:24] (03CR) 10Paladox: "@Hashar yes. This will work because if you look at" [integration/config] - 10https://gerrit.wikimedia.org/r/264333 (owner: 10Paladox) [17:54:10] o/ Hello! We're seeing errors this morning on a job that nearly never fails. I think it's an issue with integration-slave-precise-1004 specifically. Example error: https://integration.wikimedia.org/ci/job/tox-flake8/12544/console [18:04:47] 7Browser-Tests, 10VisualEditor, 5Patch-For-Review: Delete or fix failed VisualEditor browsertests Jenkins jobs - https://phabricator.wikimedia.org/T94162#2015621 (10zeljkofilipin) 5 tests are passing: P2586. [18:17:24] (03PS1) 10Paladox: Add mw-submodule-update.sh file [integration/jenkins] - 10https://gerrit.wikimedia.org/r/269742 [18:20:42] !log Creating a Trusty slave to support increased demand following MediaWIki php53(precise)>php55(trusty) bump [18:20:45] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [18:29:36] PROBLEM - Host integration-dev is DOWN: CRITICAL - Host Unreachable (10.68.16.227) [18:34:42] Krinkle: yeah, looking at the bottom two graphs on https://integration.wikimedia.org/zuul/ looks intense [18:34:46] thank you [18:42:28] greg-g: yw. It'll take about 30min to fully provision, but I assume the demand will stay, not one-time [18:42:42] * greg-g nods [18:58:11] PROBLEM - Free space - all mounts on integration-slave-trusty-1011 is CRITICAL: CRITICAL: integration.integration-slave-trusty-1011.diskspace._mnt.byte_percentfree (<42.86%) [19:08:36] jzerebecki: Why not bring the old test back for WikibaseQualityExternalValidation but for branches REL1_26 or lower. But use the new test for master branch. [19:09:17] hey, how do I remove 53 lint from my extensions/modules? [19:11:32] MaxSem: You could use skip-if. [19:11:55] MaxSem: which branch? [19:12:04] master, jzerebecki [19:12:13] MaxSem: For example [19:12:14] - name: ^rake-jessie$ [19:12:14] # Rake entry points have not been backported to wmf branches yet [19:12:14] # -- hashar Nov 10th 2015 [19:12:14] branch: (?!^wmf/1\.27\.0-wmf\.[45]) [19:12:15] skip-if: [19:12:17] - project: '^mediawiki/core$' [19:12:19] branch: (?:^REL1_23$|^REL1_24$|^fundraising/REL.*) [19:12:21] - project: '^mediawiki/extensions/Flow$' [19:12:23] branch: (?:^REL1_25$) [19:13:10] MaxSem: What extensions/modules do you want it disabled on. [19:13:28] ideally, all WMF-deployed :D [19:13:43] MaxSem: I should have already fixed that. do you have a link wwhere it happens? 
[19:13:45] but now just the stuff I'm amaintainer of [19:14:23] jzerebecki, https://gerrit.wikimedia.org/r/#/c/262681/ [19:14:39] PROBLEM - Puppet failure on integration-slave-trusty-1020 is CRITICAL: CRITICAL: 25.00% of data above the critical threshold [0.0] [19:17:42] MaxSem: yes fixed that earlier. should work now. [19:17:56] 10Continuous-Integration-Config, 10Math: Math test fail for php55 - https://phabricator.wikimedia.org/T126422#2015922 (10Physikerwelt) as discussed on more hints lead to the absense of https://www.ctan.org/tex-archive/macros/latex/contrib/teubner jzerebecki do you by chance know if the php55 tes... [19:25:55] 10Continuous-Integration-Infrastructure, 7Regression: Puppet failure: "Exec[install_alternative_php] path slave-scripts/bin/php doesn't exist" - https://phabricator.wikimedia.org/T126498#2015967 (10Krinkle) 3NEW [19:28:35] (03PS1) 10Paladox: Fix phplint 5.3 loading for master [integration/config] - 10https://gerrit.wikimedia.org/r/269754 [19:29:22] (03CR) 10JanZerebecki: [C: 04-2] "git submodule update --init does not execute PHP code." [integration/config] - 10https://gerrit.wikimedia.org/r/264333 (owner: 10Paladox) [19:29:24] (03PS2) 10Paladox: Fix phplint 5.3 loading for master [integration/config] - 10https://gerrit.wikimedia.org/r/269754 [19:29:29] jzerebecki: ^ [19:30:20] (03CR) 10Paladox: "@JanZerebecki I'm not sure what you mean. This is moving the db install above loading the extension." [integration/config] - 10https://gerrit.wikimedia.org/r/264333 (owner: 10Paladox) [19:31:08] (03PS3) 10Paladox: Fix phplint 5.3 loading for master [integration/config] - 10https://gerrit.wikimedia.org/r/269754 [19:31:19] (03CR) 10JanZerebecki: "What do you mean exactly with "loading the extension"?" [integration/config] - 10https://gerrit.wikimedia.org/r/264333 (owner: 10Paladox) [19:34:13] (03CR) 10Paladox: "Well I mean that if you look at" [integration/config] - 10https://gerrit.wikimedia.org/r/264333 (owner: 10Paladox) [19:36:57] hashar: Im not sure but the queue at https://integration.wikimedia.org/zuul/ is very long. [19:38:13] paladox: 18:20 < Krink.le> !log Creating a Trusty slave to support increased demand following MediaWIki php53(precise)>php55(trusty) bump [19:38:13] ENOTENOUGHRESOURCES [19:38:30] I'm creating two instances [19:38:33] hashar: Oh. [19:38:37] Ok. [19:38:53] ohh [19:39:08] yeah timo is right php53 --> php55 caused the workload to shift to Trusty instances grblblbl [19:39:27] Krinkle: last tuesday I have created a few 2 CPU / 2 executor slots precise instances [19:39:44] ci1.medium I'm using for these two instances [19:39:58] also, people started committing 5.5 fixes like hell [19:40:14] which isi also 2 CPU but with custom memory [19:40:15] have we officially switched core to 5.5 ? [19:40:19] Yes [19:40:21] yup [19:40:41] \O/ [19:40:49] hashar: I'm tailing syslog, provisioning takes looong [19:40:51] like 45min [19:40:58] the puppet one yeah [19:40:59] for ci slave labs [19:41:18] because that includes puppet class mediawiki::package which install a lot of cruft [19:42:24] (03CR) 10JanZerebecki: [C: 04-1] "What is currently broken?" 
[integration/config] - 10https://gerrit.wikimedia.org/r/269754 (owner: 10Paladox) [19:42:31] Krinkle: I will phase out a couple Precise m1.large instances not much use for them now [19:42:40] that will free up some quota [19:42:48] hashar: quota raised [19:43:29] !log Dropping slaves Precise m1.large integration-slave-precise-1014 and integration-slave-precise-1013 , most load shifted to Trusty (php53 -> php55 transition) [19:43:31] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [19:43:47] (03CR) 10Paladox: "Oh wait looking at https://gerrit.wikimedia.org/r/#/c/262681/ it was done before you fixed the problem woops sorry." [integration/config] - 10https://gerrit.wikimedia.org/r/269754 (owner: 10Paladox) [19:44:15] Also, it'll take like forever for the first jobs to finish thanks to cloning mediawiki core [19:44:37] (03Abandoned) 10Paladox: Fix phplint 5.3 loading for master [integration/config] - 10https://gerrit.wikimedia.org/r/269754 (owner: 10Paladox) [19:44:37] RECOVERY - Puppet failure on integration-slave-trusty-1020 is OK: OK: Less than 1.00% above the threshold [0.0] [19:44:38] hashar: I've pooled trusty-1020 [19:44:46] hashar: trusty-1019 is still provisoning [19:44:56] Krinkle: need to reboot the instance after puppet is done [19:45:05] there are some services that are not properly initialized by puppet [19:45:10] apache is one of them iirc [19:45:27] hashar krinkle in the generic job it is quicker because it clones faster could we bring that to the non generic test which should speed up everything, [19:45:27] I also usually do apt-get update && apt-get dost-upgrade [19:45:45] PROBLEM - Host integration-slave-precise-1013 is DOWN: CRITICAL - Host Unreachable (10.68.17.209) [19:45:49] for https://phabricator.wikimedia.org/T126422 can somebody look at trusty slaves if they have texlive-lang-greek installed? [19:45:51] hashar: Document it or it doesn't exist [19:45:55] PROBLEM - Host integration-slave-precise-1014 is DOWN: CRITICAL - Host Unreachable (10.68.17.16) [19:45:57] https://wikitech.wikimedia.org/wiki/Nova_Resource:Integration/Setup [19:46:03] doing [19:46:20] hey it is already there: Reboot the instance (Before adding to Jenkins). This cleans state, launches deamons, and fixes Shinken monitoring (phabricator:T91351). [19:46:20] :D [19:48:03] !log did cleanup across all integration slaves, some were very close to out of room. results: https://phabricator.wikimedia.org/P2587 [19:48:05] hashar: What paladox describes sounds like https://gerrit.wikimedia.org/r/#/c/195021/ [19:48:05] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [19:48:10] gah, meant to prefix with ariel [19:48:21] !log that cleanup was done by apergos [19:48:23] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [19:50:09] Krinkle: looking at name: 'zuul-cloner-extdeps' it seems that mediawiki/core is not cloned it is instead copied i think which increases the speed. [19:50:20] But yes that looks similar to what im saying. [19:50:41] Krinkle: What does shallow-clone do. [19:50:58] paladox: It doesn't work with zuul, it is a thing for simple jobs that use Jenkins' internal git clone function [19:51:12] (e.g. scm: git-(something), instead of builder: zuul) [19:51:22] Krinkle: Oh. Ok [19:51:39] shallow-clone means that Git will only copy the needed revision from Gerrit, and not the full history. 
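To illustrate what shallow-clone buys, in plain Git terms rather than the literal job configuration:
```
# Full clone: the entire history of mediawiki/core comes down from Gerrit.
git clone https://gerrit.wikimedia.org/r/p/mediawiki/core

# Shallow clone: only the tip of the branch under test, which is all a
# test run of master actually needs.
git clone --depth 1 --branch master https://gerrit.wikimedia.org/r/p/mediawiki/core
```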
[19:51:50] I will do my best to complete the npm/nodejs migration to nodepool instance [19:51:53] There is no need to download 15 years of Git History when testing the current master of MediaWiki [19:52:07] and I guess then start figuring out how to ship zend 5.5 and beg for hhvm on Jessie to switch the php jobs to nodepool [19:52:10] Which zuul currently still does for all repos I think. [19:52:15] Oh thanks for explaining. [19:52:34] Without shallow and without git-cache, Zuul clone is a regression from the Git clone Jenkins used to do. [19:53:11] Krinkle: Should we open your patch since it would help probably a lot with all the load. [19:53:43] Nah, hashar better create a new patch for the current infra. [19:53:47] Mine is outdated [19:53:50] Krinkle: But looks like the difference between generic test and non is that the non uses zuul-clone whereas generic doesn't. [19:53:54] Krinkle: Ok. [19:54:15] paladox: We can't use this for jobs with more than 1 git repo needed because the generic can only clone 1 repo [19:54:20] E.g. if you need mwcore + extensions [19:54:23] then this cannot work [19:54:29] it needs Zuul to orchestrate the multiple repos [19:54:41] Krinkle: Ok. [19:56:12] well, folks even CR+2 without waiting for test results :( [19:59:18] bah slaves dying due to proc_open(): fork failed - Cannot allocate memory [19:59:19] :( [20:00:36] Krinkle: the ci.medium slaves only have 2GB of ram that is not enough for 4 builds in // [20:00:45] hashar: 2 builds, not 4 [20:00:47] I fixed it 20 minutes ago [20:00:47] ah [20:00:54] it's slow like I said [20:00:59] :) [20:01:26] It's finished now, back to 2 [20:01:40] Existing builds are allowed to finish [20:02:00] well they broke [20:02:06] some of the tests failed due to out of memory [20:02:11] but that will get reenqueued I guess [20:02:21] Ah, yeah [20:02:25] hashar: puppet is getting TERM [20:02:29] is that normal? [20:02:30] (03PS1) 10Paladox: Improve cloning mediawiki/core and mediawiki/vendor under non generic tests [integration/config] - 10https://gerrit.wikimedia.org/r/269768 [20:02:35] Had to run it like 3 times [20:02:49] before it actually finished properly [20:02:52] Krinkle hashar legoktm jzerebecki https://gerrit.wikimedia.org/r/#/c/269768/ [20:02:54] and then 2 more times before it became no-op [20:03:16] yeah the manifests have a bunch of breakage [20:04:15] paladox: Hm.. That commit doesn't change anything. It just moves the name of the repo from the job to a text file. Still cloned exactly the same. [20:05:16] Krinkle: But it doesn't use zuul cloner, I don't think. But how does the generic test clone fast. [20:05:40] which generic test do you have in mind? [20:05:51] paladox: Creating a text file does not clone a repository. This change moves the list of repos to a new file deps.txt and then tells Zuul to clone it, just like before? [20:05:58] * twentyafterfour grumbles at zuul [20:06:13] * hashar blames php55 switch overloading CI slaves :D [20:06:16] We should not have switched over without changing slaves first. [20:06:16] PROBLEM - Puppet failure on integration-slave-trusty-1019 is CRITICAL: CRITICAL: 16.67% of data above the critical threshold [0.0] [20:06:32] Krinkle: Oh. Do you know where the generic is set to have a look at seeing how mediawiki clone is set.
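The deps.txt mechanism Krinkle describes shows up verbatim in a job console log later in the evening. Roughly, the builder does something like the following; the repo list written into deps.txt is illustrative, while the zuul-cloner invocation is the one from that log:
```
# The job writes the repos it needs into deps.txt (core, vendor, extensions, ...).
echo mediawiki/core mediawiki/vendor mediawiki/extensions/Echo > deps.txt

# Zuul then clones every listed repo at the right revision for the change under test.
zuul-cloner --color --verbose \
  --map /srv/deployment/integration/slave-scripts/etc/zuul-clonemap.yaml \
  --workspace src \
  https://gerrit.wikimedia.org/r/p \
  $(cat deps.txt)
```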
[20:06:36] !log creating integration-slave-trusty-1021 and integration-slave-trusty-1022 (ci.medium) [20:06:40] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [20:07:25] Krinkle: Would it being moved to a text file cause it to take really long cloning mediawiki. [20:07:33] paladox: The generic test is fast because it does not use Zuul. The generic test works without Zuul because it only needs to clone 1 repoitory from Gerrit. The more complex jobs need multiple repositories, which means you have to use Zuul. [20:07:58] paladox: No. The text file is unrelated. It's just a way to transfer information from the job configuration to Zuul. [20:08:03] Krinkle: re not switching until we have more executers first, yeah :/, next time I guess [20:08:15] RECOVERY - Free space - all mounts on integration-slave-trusty-1011 is OK: OK: All targets OK [20:08:19] Krinkle: Would it be possible to clone mediawiki/core and vendor without using zuul then use zuul to clone the rest. [20:08:48] hashar: Also getting lots of puppet errors about mysql user and group not existing. [20:08:58] I don't recall those errors existing when creating an instance last year [20:09:41] maybe a race condition with some manifest attempting to create a file owned by user mysql before mysql package is installed [20:09:57] Yes, the tmpdir probably. But that used to work fine. [20:10:00] 10Continuous-Integration-Infrastructure, 7Puppet: Need a better way of testing puppet patches for contint/integration stuff - https://phabricator.wikimedia.org/T126370#2016189 (10scfc) The use case fits what Puppet calls "environments" (which I think is where `production` for the default branch comes from as t... [20:10:32] Puppet failures should be filed and result in revert. I used to perform zero tolerance on that. [20:10:45] Anyway, existing slaves are unaffected it seems, so not too bad. [20:11:19] it seems that it is duplicated in name: 'mwext-testextension-{phpflavor}' - job-template: [20:11:19] name: 'mwext-testextension-{phpflavor}' [20:11:20] node: 'contintLabsSlave && ((UbuntuPrecise && phpflavor-php53 && phpflavor-{phpflavor}) || (UbuntuTrusty && phpflavor-hhvm && phpflavor-{phpflavor}))' [20:11:20] concurrent: true [20:11:56] phpflavor-php53 && phpflavor-{phpflavor}) are the same thing just that php53 will have the same name twice. [20:13:12] hashar: If php55 is on trusty why is it set to UbuntuPrecise for phpflavour. [20:13:27] it is not [20:13:40] the node: is longer than that it is a label selector [20:14:06] it is a dirty hack though [20:17:08] Hm.. my IP ping is resolving incorrectly hashar [20:17:21] ping integration-slave-trusty-1020 [20:17:21] integration-slave-trusty-1020.integration.eqiad.wmflabs (10.68.17.66) [20:17:23] That one is fine [20:17:28] (03PS1) 10Paladox: Add php55 to phpflavour in mwext-testextension-generic [integration/config] - 10https://gerrit.wikimedia.org/r/269815 [20:17:32] ping integration-slave-trusty-1019 [20:17:33] create-test-121.testlabs.eqiad.wmflabs (10.68.17.61) [20:17:36] What is that [20:18:03] RECOVERY - Free space - all mounts on integration-slave-trusty-1013 is OK: OK: All targets OK [20:18:12] oh man [20:18:14] labs is confused [20:18:27] I have added integration-slave-trusty 1021 and 1022 [20:18:44] !log created integration-slave-trusty-1009 and 1010 (trusty ci.medium) [20:18:48] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [20:19:06] 12.17.68.10.in-addr.arpa. 
60 IN PTR ci-jessie-wikimedia-22792.contintcloud.eqiad.wmflabs. [20:19:06] 12.17.68.10.in-addr.arpa. 60 IN PTR integration-slave-trusty-1022.integration.eqiad.wmflabs. [20:19:18] puppet keeps failing with some kind of dpkg time out. Cascades all the way with dependencies has failure: true [20:19:20] (03PS2) 10Paladox: Add php55 to phpflavour in mwext-testextension-generic [integration/config] - 10https://gerrit.wikimedia.org/r/269815 [20:19:48] ips are getting assigned multiple times [20:20:41] Krinkle: Where would i look for looking at how mediawiki is cloned so i could apply it to the non generic. So all we have todo is use zuul to clone the extensions and anything else. [20:20:49] paladox: I can't help you now. [20:20:57] Busy fixing ci capacity [20:20:59] Krinkle: Ok. [20:21:22] Ask hashar later, or file a task and we'll see there [20:23:54] Krinkle: I will file a task. [20:24:32] !log created integration-slave-trusty-1019 and integration-slave-trusty-1020 (ci1.medium) [20:24:37] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [20:28:40] !log Pooled integration-slave-trusty-1020 (new) [20:28:42] 1021 / 1022 / 1009 and 1010 being build still [20:28:45] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [20:31:11] Krinkle: Sorry to ask you in the middle of something but can i add you to the task i am about tot create. [20:31:23] I would prefer not. [20:32:06] 10Continuous-Integration-Config, 10Continuous-Integration-Infrastructure: Doint use zuul to clone mediawiki/core and mediawiki/vendor under non generic tests including composer - https://phabricator.wikimedia.org/T126519#2016378 (10Paladox) 3NEW [20:32:07] Krinkle: Ok. [20:32:22] I'm not actively working on new CI patches [20:32:25] hashar: Could i add you too ^ and legoktm please. [20:32:27] Ok. [20:32:40] (03CR) 10JanZerebecki: "The call to wfLoadExtensions is only added to LocalSettings.php during mw-apply-settings. Maybe some of this is because the LodalSettings." [integration/config] - 10https://gerrit.wikimedia.org/r/264333 (owner: 10Paladox) [20:33:12] paladox: if it is CI, feel free to CC me anytime. But I am already receiving email for anything happing in the CI projects ( continuous-integration-infrastructure and continuous-integration-config ) [20:33:41] hashar: btw, I always run `println "uname -a".execute().text` from the Scrip console in Jenkins after creating a slave to verify the IP [20:33:52] oh [20:33:54] hashar: Ok thanks, Yes it is ci related. [20:34:15] Krinkle: I myself just /usr/sbin/ifconfig on slave then copy paste in Jenkins UI [20:34:22] 10Continuous-Integration-Config, 10Continuous-Integration-Infrastructure: Doint use zuul to clone mediawiki/core and mediawiki/vendor under non generic tests including composer - https://phabricator.wikimedia.org/T126519#2016387 (10Paladox) [20:34:34] !log Pooled integration-slave-trusty-1019 (new) [20:34:39] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [20:35:01] hashar: Yeah. I use ping myself instead. But either way, it's nice to verify [20:35:11] sometimes labs screws it up, or it's unreachable, or I copy to the wrong tab in my browser :D [20:35:25] (03CR) 10Paladox: "But I doint belive it to be that bug. 
I compare the db code to the other and the db install is loaded after extension but generic is loade" [integration/config] - 10https://gerrit.wikimedia.org/r/264333 (owner: 10Paladox) [20:38:27] !log cancelling mediawiki-core-jsduck-publish and mediawiki-core-doxygen-publish jobs manually. They will catch up on next merge [20:38:32] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [20:38:38] 10Continuous-Integration-Infrastructure: Consider increasing number of trusty CI slaves - https://phabricator.wikimedia.org/T126423#2016405 (10Krinkle) 5Open>3Resolved a:3Krinkle @hashar and I just created half a dozen Trusty slaves to supply for the new demand of php55/trusty jobs. [20:40:07] hashar: https://phabricator.wikimedia.org/T94715 [20:40:09] :) [20:40:25] 10Continuous-Integration-Infrastructure: Consider increasing number of trusty CI slaves - https://phabricator.wikimedia.org/T126423#2016426 (10greg) Thank you both, seriously. [20:41:04] 10Continuous-Integration-Infrastructure: Consider increasing number of trusty CI slaves - https://phabricator.wikimedia.org/T126423#2016431 (10Paladox) [20:41:04] I have this bad habit of saying "seriously" when I mean "really" or "like, a lot dude" [20:41:10] ah [20:41:18] so folks now change everything to 55 [20:41:19] RECOVERY - Puppet failure on integration-slave-trusty-1019 is OK: OK: Less than 1.00% above the threshold [0.0] [20:41:24] include mass changes of array() ---> [] [20:41:29] heh [20:41:33] which is going to cause a lot of nightmare to backport patches [20:41:35] breaks git blame [20:41:36] etc [20:41:37] ... [20:41:48] hashar: I think we agreed not to do that as mass change [20:41:51] if it aint broke, dont fix it! should be our motto [20:41:51] but we'll see [20:42:07] ex: https://gerrit.wikimedia.org/r/#/c/269745/ [20:45:57] * MaxSem bites hashar [20:48:27] PROBLEM - Puppet failure on integration-slave-trusty-1021 is CRITICAL: CRITICAL: 42.86% of data above the critical threshold [0.0] [20:48:41] MaxSem: :D [20:49:02] 10Continuous-Integration-Config, 10Math: Math test fail for php55 - https://phabricator.wikimedia.org/T126422#2016466 (10MaxSem) That thing is contained in `texlive-lang-greek` which is present in production but not CI, apparently. From a cursory glance in Puppet, it should be there via `texlive-lang-all` - ca... [20:49:15] hey, can ppl look ^^^ plz? [20:51:09] hashar: Hello! We we're seeing errors this morning on a job that nearly never fails. I think it was an issue with integration-slave-precise-1004 specifically. Example error: https://integration.wikimedia.org/ci/job/tox-flake8/12544/console . Now no tests are being executed for new patches, https://gerrit.wikimedia.org/r/#/c/269548/ [20:51:25] 10Continuous-Integration-Config, 10Math: Math test fail for php55 - https://phabricator.wikimedia.org/T126422#2016476 (10Paladox) https://github.com/search?l=&q=texlive-lang-all+user%3Awikimedia&ref=advsearch&type=Code&utf8=%E2%9C%93 [20:51:44] MaxSem: done [20:51:44] 10Continuous-Integration-Config, 10Math: Math test fail for php55 - https://phabricator.wikimedia.org/T126422#2016477 (10hashar) CI slaves are provisioned by puppet using role::ci::slaves::labs which includes `mediawiki::packages`. That should chip the same stuff the Wikimedia MediaWiki application servers are... [20:51:56] niedzielski: Could you recheck please. 
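Checking whether the package in question is actually present on a slave is a plain dpkg/apt query; the hostname below is one of the Trusty slaves that appears in the dpkg paste further down the log:
```
# e.g. on integration-slave-trusty-1012:
dpkg -s texlive-lang-greek | grep -E '^(Package|Status|Version)'

# Or compare installed vs. candidate versions, matching the paste on the task:
apt-cache policy texlive-lang-greek
```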
[20:52:00] 10Continuous-Integration-Infrastructure, 10Math: Math test fail for php55 - https://phabricator.wikimedia.org/T126422#2016478 (10hashar) p:5High>3Normal [20:53:03] hashar, but can you check for the actual presence of this package on slaves? [20:53:23] paladox: hm, doesn't seem to be getting picked up https://integration.wikimedia.org/ci/job/apps-android-wikipedia-test/ [20:54:53] 10Continuous-Integration-Infrastructure, 10Math: Math test fail for php55 - https://phabricator.wikimedia.org/T126422#2016486 (10hashar) Looking for `texlive-lang-greek` Precise: ``` integration-slave-precise-1012.integration.eqiad.wmflabs: texlive-lang-greek: Installed: 2009-3 Candidate: 2009... [20:55:11] MaxSem: yeah did and paste result on task. The texlive-lang-greek package is around [20:55:35] thanks! [20:55:59] MaxSem: the php55 jobs run on Trusty, so maybe the texlive package doesn't have the \Foobar that are reported missing [20:56:48] niedzielski: Hum yes. [20:56:59] so mebbe broken/incompatible package [20:57:26] niedzielski: Im not sure why it is not packaging. [20:57:59] integration-slave-trusty-1010 ... Notice: Finished catalog run in 1664.45 seconds [20:58:02] that takes a while [20:58:26] RECOVERY - Puppet failure on integration-slave-trusty-1021 is OK: OK: Less than 1.00% above the threshold [0.0] [20:58:27] Krinkle: got two slaves for which the initial puppet run took 1664 and 1668 seconds. That is consistent [21:00:11] PROBLEM - Puppet failure on integration-slave-trusty-1022 is CRITICAL: CRITICAL: 14.29% of data above the critical threshold [0.0] [21:00:25] PROBLEM - Puppet failure on integration-slave-trusty-1010 is CRITICAL: CRITICAL: 12.50% of data above the critical threshold [0.0] [21:00:31] mmm, the lag approaches 2 hours even though the graph looks much better [21:01:12] 10Continuous-Integration-Infrastructure, 10Math: Math test fail for php55 - https://phabricator.wikimedia.org/T126422#2016506 (10JanZerebecki) Both `texlive-lang-greek` and `texlive-lang-all` are installed on both the trusty (php55) and precise (php53) slaves. [21:02:23] 10Continuous-Integration-Infrastructure, 10Math: Math test fail for php55 - https://phabricator.wikimedia.org/T126422#2016520 (10Paladox) Could it be because maybe math is incompatible with 2013 version and is compatible with 2009 version which means math needs a compatibility update to support 2013. [21:02:43] PROBLEM - Puppet failure on integration-slave-trusty-1009 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [21:03:09] ahem, hashar (i guess?) - do you know why beta puppetmaster has tons of weird patches different from production? [21:03:27] 10Continuous-Integration-Infrastructure, 10Math: Math test fail for php55 - https://phabricator.wikimedia.org/T126422#2016545 (10Physikerwelt) on vagrant we have ``` 'ocaml-native-compilers', 'texlive', 'texlive-bibtex-extra', 'texlive-font-utils', '... [21:03:34] yurik: your guess? [21:03:52] hashar, i guess you might know :) [21:04:06] hashar the all mighty [21:04:09] yurik: maybe you can look at the patch and see what they do :D [21:04:47] yurik: mostly it is either because A) we play test on beta before having them pushed to prod / reviewed by ops B) we need puppet change right now ™, i.e. 
without having to beg/annoy ops to merge a change [21:04:57] after looking at the actual patches for clues, next asking thciprian.i is probably a good choice [21:05:08] yurik: and there is a few ones prefixed [LOCAL] which are really only for beta and dont make sense in puppet.git [21:05:32] hashar, ah, so it might be totally ok to have a big list - i was following the instructions on how to test my puppet patch [21:05:35] wasn't sure [21:05:38] thanks for expalining [21:05:49] thx greg-g [21:06:20] hashar: email sent to wikitech-l re slowness [21:06:22] jzerebecki hashar how would we ignore the composer-package-php53 for repo HtmlFormatter https://gerrit.wikimedia.org/r/#/c/267391/ [21:06:24] paladox: should i open a bug to track this issue? we can't merge anything at the moment [21:07:17] niedzielski: Yes since there is quite high load on zuul which krinkle and hashar are trying to fix. The load is going down but is still very high. [21:07:49] paladox: ok cool, thanks! [21:07:51] greg-g: awesome thank you [21:08:12] !log pooling trusty slaves 1009, 1010, 1021, 1022 with 2 executors (they are ci.medium) [21:08:17] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [21:09:22] hi [21:09:22] niedzielski: An email has been sent out by greg-g to explain the slowness. But nothing is broken it will just be slow testing. [21:09:25] is it really necessary to re-run all the jenkins jobs for a simple rebase? we've been trying to merge https://gerrit.wikimedia.org/r/#/c/262742/ for a long time but every time it gets tested, something else merges and resets the verified status [21:09:28] hashar: ^ [21:10:01] paladox: oh ok. well i'll hold off on the bug then [21:10:15] and *rebase resets the verified status, then back to the end of the queue, rinse and repeat [21:10:32] so what's broken? :) [21:10:55] niedzielski: I doint think the test your running would be affected by the high load it will just be slow testing. I would file a bug because php55 is causing the high load, [21:10:56] 10Continuous-Integration-Infrastructure, 10Math: Math test fail for php55 - https://phabricator.wikimedia.org/T126422#2016586 (10JanZerebecki) ``` integration-slave-trusty-1012:~$ dpkg -s 'ocaml-native-compilers' 'texlive' 'texlive-bibtex-extra' 'texlive-font-utils' 'texlive-fonts-extra' 'texlive-lang-croatian... [21:10:57] legoktm: the world? [21:10:58] twentyafterfour: just merge it? ops puppet doesn't have jenkins submit [21:11:07] legoktm: I can't merge it... [21:11:14] I mean, have someone from ops do it. [21:11:21] legoktm: composer is broke for extensions using the extension registration system please see https://integration.wikimedia.org/ci/job/php-composer-test/30659/console and https://integration.wikimedia.org/ci/job/composer-php55/13/console please. [21:11:32] legoktm: nothing, just overloaded [21:11:36] legoktm: apergos has been doing the rebase dance for a while [21:12:12] paladox: you need to remove the PHP entrypoint from the autoload [21:12:14] 6Release-Engineering-Team, 3Wikipedia-Android-App: Wikipedia Android CI tests are failing or not running - https://phabricator.wikimedia.org/T126532#2016587 (10Niedzielski) 3NEW [21:12:34] legoktm: Oh woops. [21:12:36] paladox: okey dokey. i've posted a bug here https://phabricator.wikimedia.org/T126532 [21:12:44] twentyafterfour: yeah a rebase gotta be retested. You never know what has landed in between. [21:13:00] niedzielski: Thanks. [21:13:13] twentyafterfour: the CI got some issue and had exhausted all its capacity. 
It prioritizes changes in gate-and-submit over the ones in the test pipeline [21:13:43] hashar: there is still a gate-and-submit job that runs after the merge, right? [21:13:47] twentyafterfour: and since operations/puppet is fast forward only, you get to rebase, which sends it back to the test pipeline, and thus it sits idling waiting for a chance to run (because gate takes all resources) [21:13:59] 6Release-Engineering-Team, 3Wikipedia-Android-App: Wikipedia Android CI tests are failing or not running - https://phabricator.wikimedia.org/T126532#2016596 (10greg) Regarding tests "not being executed": that's actually not true (thankfully), things are just really slow (sadly) due to the php5.3 -> 5.5 change.... [21:14:02] krinkle and I have added a few more slaves [21:14:25] hashar: what's the use of re-test if gate will catch the error (assuming there is one, which there isn't) [21:14:42] twentyafterfour: Jenkins/Zuul doesn't have submit rights on puppet.git so there is no gate-and-submit [21:14:48] 6Release-Engineering-Team, 3Wikipedia-Android-App: Wikipedia Android CI tests are failing - https://phabricator.wikimedia.org/T126532#2016605 (10greg) [21:14:49] ohhh [21:15:07] twentyafterfour: I am not sure why ops switched to fast forward only merge strategy [21:15:08] hashar: I see, I didn't realize that [21:15:12] 10Continuous-Integration-Infrastructure, 6Release-Engineering-Team, 3Wikipedia-Android-App: Wikipedia Android CI tests are failing - https://phabricator.wikimedia.org/T126532#2016587 (10greg) [21:15:21] RECOVERY - Puppet failure on integration-slave-trusty-1010 is OK: OK: Less than 1.00% above the threshold [0.0] [21:15:28] twentyafterfour: maybe to avoid stuff breaking when changes are merged in between (since they don't have gate-and-submit) [21:15:30] niedzielski: Could you recheck your patch now since the load is almost fully gone now. [21:15:36] twentyafterfour: yeah it is all very scary :-D [21:16:06] twentyafterfour: maybe we can have puppet.git tests run in their own pipeline with a high priority [21:16:29] paladox: seems to have started which is great :) [21:16:44] legoktm hashar why does it say php 5.6.99 when no such php exists. Shouldn't it be php 5.5? See bottom of https://integration.wikimedia.org/ci/job/composer-php55/174/console please. [21:16:59] !log CI dust has settled. Krinkle and I have pooled a lot more Trusty slaves to accommodate for the overload caused by switching to php55 (jobs run on Trusty) [21:17:04] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [21:17:06] Krinkle: thank you very much for pooling some more new instances [21:17:26] hashar: are the new slaves set up now? I need to re-apply https://gerrit.wikimedia.org/r/#/c/269370/ to the puppetmaster [21:17:28] niedzielski: Ok i hope it works this time for you. [21:17:38] Krinkle: see my comment on https://gerrit.wikimedia.org/r/#/c/269370/ [21:17:48] paladox: working on it :) [21:18:08] legoktm: Please don't apply it [21:18:08] legoktm: thanks. I'm not sure where php 5.6.99 came from. [21:18:11] paladox greg-g hashar Krinkle: seems to have worked. yay! \o/ [21:18:14] legoktm: The loop is irrelevant [21:18:17] the puppet code doesn't work [21:18:27] Yes it does [21:18:35] It just doesn't work while provisioning new slaves, as you discovered [21:18:46] Because the slave-scripts repo hasn't been cloned yet [21:19:06] niedzielski: Ok.
[21:19:12] Right now all the composer-php55 jobs are actually running HHVM, not PHP5.5 because that patch is missing [21:19:40] legoktm: yup they are [21:19:47] legoktm: OK. I'm done. Just look at https://phabricator.wikimedia.org/T126498 - do as you see fit. I'm off now [21:20:11] RECOVERY - Puppet failure on integration-slave-trusty-1022 is OK: OK: Less than 1.00% above the threshold [0.0] [21:20:17] Krinkle: thank you again ! [21:20:22] Probably need to add a dependency on the slave-scripts [21:20:33] Krinkle: will do, thanks [21:20:47] legoktm: I think I removed your puppet patch because of the death loop [21:20:51] wasn't sure what to revert/remove [21:21:07] legoktm: oh actually I have commented on the patch "removed from puppet master. That is a death loop of doom :(" [21:21:16] 10Continuous-Integration-Infrastructure, 7Regression: Puppet failure: "Exec[install_alternative_php] path slave-scripts/bin/php doesn't exist" - https://phabricator.wikimedia.org/T126498#2016657 (10Legoktm) Caused by https://gerrit.wikimedia.org/r/#/c/269370/, we probably need a dependency on the cloning of sl... [21:21:23] 10Continuous-Integration-Infrastructure, 7Regression: Puppet failure: "Exec[install_alternative_php] path slave-scripts/bin/php doesn't exist" - https://phabricator.wikimedia.org/T126498#2016659 (10Legoktm) a:3Legoktm [21:21:29] 10Continuous-Integration-Infrastructure, 6Release-Engineering-Team, 3Wikipedia-Android-App: Wikipedia Android CI tests are failing - https://phabricator.wikimedia.org/T126532#2016660 (10Paladox) @Niedzielski should we close this since you said it passed now. [21:22:07] hashar: yeah, I re-applied it after fixing it, just didn't comment back on the patch :/ [21:22:26] legoktm: another thing is that I also had to salt a git pull of /srv/deployment/integration/slave-scripts iirc [21:22:42] RECOVERY - Puppet failure on integration-slave-trusty-1009 is OK: OK: Less than 1.00% above the threshold [0.0] [21:22:46] !log cherry-picking https://gerrit.wikimedia.org/r/#/c/269370/ on integration-puppetmaster again [21:22:50] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [21:23:05] legoktm: Could i add you task https://phabricator.wikimedia.org/T126519 please. [21:23:07] 10Continuous-Integration-Infrastructure: Consider increasing number of trusty CI slaves - https://phabricator.wikimedia.org/T126423#2016667 (10greg) Visual representation of what happened: {F3330913} and {F3330915} [21:23:13] 10Continuous-Integration-Infrastructure, 6Release-Engineering-Team, 3Wikipedia-Android-App: Wikipedia Android CI tests are failing - https://phabricator.wikimedia.org/T126532#2016668 (10Niedzielski) 5Open>3Resolved a:3Niedzielski @Paladox, I haven't seen the tests run on the 1004 server yet but I'll op... [21:24:16] thcipriani: ostriches: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/ :( [21:24:29] (I haven't look at all yet) [21:25:11] greg-g: oh good. [21:25:21] oh speaking of beta-scap-eqiad [21:25:23] bastion is kind of messed up right now. [21:25:29] https://phabricator.wikimedia.org/P2588 [21:25:58] 10Continuous-Integration-Infrastructure, 6Release-Engineering-Team, 3Wikipedia-Android-App: Wikipedia Android CI tests are failing - https://phabricator.wikimedia.org/T126532#2016682 (10Paladox) @Niedzielski oh so the tests are meant to run on 1004 server. I'm not sure if the tests are being moved around ser... 
[21:26:04] maybe we will want to create a new instance with more CPUs and with Jessie [21:26:14] deployment-bastion is Precise and only 4 cpu iirc [21:26:42] hashar: yeah, that might be the best thing. /var is constantly full (which is why puppet is failing there right now) [21:26:51] eeeew MediaWiki 1.27 needs PHP 5.5.9 or higher. [21:27:29] there is a puppet class to add the very basic stuff needed to make it a Jenkins slave [21:27:33] 10Continuous-Integration-Infrastructure, 6Release-Engineering-Team, 3Wikipedia-Android-App: Wikipedia Android CI tests are failing - https://phabricator.wikimedia.org/T126532#2016693 (10Niedzielski) @Paladox, they aren't tied to 1004 but can execute on there. Based on the past few failures, it seemed to be s... [21:27:47] maybe some other classes are needed. Then the Jenkins jobs are tied to deployment-bastion with the usual node: stanza [21:30:13] 10Continuous-Integration-Infrastructure, 6Release-Engineering-Team, 3Wikipedia-Android-App: Wikipedia Android CI tests are failing - https://phabricator.wikimedia.org/T126532#2016700 (10Paladox) @Niedzielski did you notice in the last 3 days that it wasent running on 1004. It may be because of php 5.5 migrat... [21:30:58] hashar: seems like all production deployment-hosts are on trusty [21:31:07] tin and mira [21:31:08] thcipriani: yeah mostly [21:31:15] err [21:31:28] ah yeah we have Trusty on production. Because of HHVM I guess [21:31:43] ops went for Jessie after we already started migrating to Trusty [21:31:48] hashar: Is php5.5 being run on https://integration.wikimedia.org/ci/job/beta-scap-eqiad/ [21:31:59] so we end up with Precise/Trusty/Jessie. Precise hosts are being moved straight to Jessie though [21:32:21] paladox: nop. Tyler was mentioning it earlier. It is Precise and there is Zend 5.3 / no hhvm on Precise [21:32:32] paladox: so we need a new instance and migrate [21:32:33] PHP 5.3.10 [21:33:25] hashar: Yes. Would we be doing that today or tomaror. [21:33:43] so we should move to trusty to match prod, then make the move to jessie once they've gone through the update again. [21:34:04] just to avoid any weird issues with puppet. [21:34:12] thcipriani: Yes. [21:34:17] making a ticket for it now, just confiming. [21:34:20] *confirming [21:34:41] thcipriani: yup and hhvm is not available on Jessie [21:35:25] thcipriani: make sure to get an instance with a bunch of CPU to speed up rebuild CDB thing. Eventually we can have a custom flavor of image by asking labs ops (i.e. 8 cpu / 4GB ram / 50 gb disk) or whatever [21:35:33] hashar: Is Jessie newer then trusty. [21:35:38] I am not sure how CPU are provisioned, in openstack though [21:35:50] 10Beta-Cluster-Infrastructure: rebuild deployment-bastion on trusty - https://phabricator.wikimedia.org/T126537#2016744 (10thcipriani) 3NEW [21:36:04] paladox: Ubuntu is based on Debian and has its own release: Lucid -> Precise -> Trusty ... [21:36:12] hashar: Maybe by migrating it will reduce the liklyness of it keep failing every so often. [21:36:20] paladox: Debian is the base and has different release name. The current stable one is nicknamed Jessie [21:36:27] hashar: Oh thanks. [21:37:01] the beta-scap-eqiad job usually fail because it reach the 30 minutes timeout while regenerating the localization files (that takes a while) . Part of the reason is it only uses two CPUs [21:37:29] greg-g: I need to do a messy OpenStack migration that will cause a substantial wikitech outage — the content will be visible but no logins for maybe 90-120 minutes. 
[21:37:34] hashar: Should we migrate zuul to trusty. [21:37:42] Optimistically, we might be ready to go on Friday; will that ruin anyone’s day that you know of? [21:38:04] paladox: nope to Jessie. There is a task floating around for that but not a prio [21:38:13] hashar: Ok. [21:39:14] andrewbogott: if the OpenStack API is not reachable, nodepool would not be able to replenish it is pool of instances and thus CI jobs will idle / wait [21:41:34] hashar: it’ll be weirder than that… keystone will temporarily forget about all of its projects, and then re-learn about them gradually. [21:41:43] So yeah, it might break nodepool :( [21:42:20] hashar: is there any ‘good’ time for that to happen? [21:42:42] andrewbogott: then we dont deploy on friday so that has little impact. But puppet.git and mediawiki-config changes might end up not having jobs reported to them [21:43:11] I guess if it is announced ahead of time, people can wait a couple hours for changes to land [21:43:38] or maybe you can trick keystone to relearn about contintcloud first [21:44:02] Is there any reason to explicitly set nodepool to downtime? Or will it just wait patiently as long as the api returns ‘forbidden’? [21:45:53] andrewbogott: friday is great [21:46:03] andrewbogott: I am pretty sure nodepool loops till the api is back [21:46:05] andrewbogott: but I haven't read what antoine's saying, I'm in a 1:1 [21:46:27] andrewbogott: last time we had a labs outage on a sunday, I havent had to do anything with nodepool it just caught up something [21:46:29] somehow [21:46:50] I guess it really does: while True: try: domystuff except: pass [21:46:58] hashar legoktm: composer-hhvm fails now https://integration.wikimedia.org/ci/job/composer-hhvm/187/console [21:47:23] uhmmm [21:47:23] 21:45:29 Assertion failure: tl_base != ((void *) -1) [21:47:23] 21:45:29 Failed to mmap persistent RDS region. errno = Cannot allocate memory [21:47:24] 21:45:29 [21:47:24] 21:45:29 /tmp/buildd/hhvm-3.6.5+dfsg1/hphp/runtime/base/rds.cpp:435: void HPHP::rds::threadInit(): assertion `tl_base != ((void *) -1)' failed. [21:47:26] > errno = Cannot allocate memory [21:47:29] it ran out of memory? [21:47:38] legoktm: Oh. [21:48:12] which is fishy [21:48:23] paladox: try recheck and see if it works? [21:48:42] legoktm: Ok trying now. [21:50:17] https://integration.wikimedia.org/ci/job/mwext-Echo-testextension-php55/18/console on the same slave also had OOM issues [21:50:27] grmbmbl [21:50:33] so the slaves do not have enough memory [21:50:39] they are ci.medium with 2GB [21:50:49] https://integration.wikimedia.org/ci/job/mediawiki-extensions-hhvm/50030/console different slave, but segfaulted [21:51:07] 21:44:50 /tmp/hudson3001619205221190848.sh: line 9: 1657 Segmentation fault zuul-cloner --color --verbose --map /srv/deployment/integration/slave-scripts/etc/zuul-clonemap.yaml --workspace src https://gerrit.wikimedia.org/r/p $(cat deps.txt) [21:52:49] legoktm: Now passes hhvm thanks.https://integration.wikimedia.org/ci/job/composer-php55/190/console [21:53:15] legoktm: https://gerrit.wikimedia.org/r/#/c/269815/ [21:55:49] andrewbogott: so i think it will be fine for nodepool. 
Just have to announce the jobs running on nodepool slaves will end up idling for a while [21:56:54] hashar: https://gerrit.wikimedia.org/r/#/c/269815/ [21:57:02] looking [21:57:10] legoktm: integration-slave-trusty-1009 memory looks fine https://grafana.wikimedia.org/dashboard/db/labs-project-board?panelId=17&fullscreen&from=1455130608565&to=1455141108566&var-project=integration&var-server=integration-slave-trusty-1009 [21:57:16] legoktm: but them it might be a spike [21:58:10] paladox: oh good catch, we never re-added those jobs to jjb. I'll amend and merge [21:59:06] legoktm: Ok thanks. [21:59:15] hashar: ok, thanks [22:00:20] hashar: That is alot of cached data. [22:01:06] I found the memory issue [22:01:23] (03PS3) 10Legoktm: Add php55 to phpflavour in mwext-testextension-generic [integration/config] - 10https://gerrit.wikimedia.org/r/269815 (owner: 10Paladox) [22:02:19] (03CR) 10Paladox: [C: 031] "Thanks @Legoktm." [integration/config] - 10https://gerrit.wikimedia.org/r/269815 (owner: 10Paladox) [22:03:08] hashar: Oh where. [22:03:38] hmm actually no [22:03:47] Our current hhvm version only has 3 weeks more support from FB [22:03:47] got confused by virtual vs resident size [22:04:04] Reedy: please fill a task for ops :-} [22:04:09] I think there's numerous [22:04:18] (03CR) 10Legoktm: [C: 032] "Deploying" [integration/config] - 10https://gerrit.wikimedia.org/r/269815 (owner: 10Paladox) [22:04:26] If Joe, ori et al don't know, I shall be slapping them soon ;) [22:04:35] (03CR) 10Paladox: "Thanks." [integration/config] - 10https://gerrit.wikimedia.org/r/269815 (owner: 10Paladox) [22:04:35] I think we're waiting for the next (point) release [22:05:40] hashar: I tried to add debug logic to IPSet and that weirded composer out: https://integration.wikimedia.org/ci/job/mediawiki-extensions-hhvm/50028/consoleFull [22:05:58] 21:43:50 wikimedia/ip-set: 1.0.1 installed, sandbox/tgr/loop-debug required. [22:06:01] 21:43:50 Error: your composer.lock file is not up to date. Run "composer update" to install newer dependencies [22:06:17] is composer.lock committed somewhere? [22:06:17] legoktm: I think we need to force all the repos to be re tested after the patch is merged to make sure it passes. [22:06:53] tgr: mediawiki/vendor [22:07:11] I thought we don't use that for CI [22:07:17] well I am off [22:07:20] gotta sleep now [22:07:27] good night :) [22:07:27] will catch up with madness if any tomorrow morning [22:07:30] \O/ [22:07:34] (03Merged) 10jenkins-bot: Add php55 to phpflavour in mwext-testextension-generic [integration/config] - 10https://gerrit.wikimedia.org/r/269815 (owner: 10Paladox) [22:07:54] tgr: we do, I think there's an experimental job that actually uses composer [22:12:00] 10Beta-Cluster-Infrastructure: rebuild deployment-bastion on trusty - https://phabricator.wikimedia.org/T126537#2016874 (10thcipriani) p:5Triage>3High This is also causing breakage for the beta-scap-eqiad-job https://phabricator.wikimedia.org/P2588 [22:14:39] paladox: the jobs were running, they just weren't in jjb, so there's no point re-testing everything [22:15:17] legoktm: Oh ok. What would happen if they wernt in jjb. 
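The composer.lock complaint tgr hits is composer's standard staleness check: the lock file records a hash of composer.json, so requiring a different version of wikimedia/ip-set without regenerating the lock trips the warning. A generic way to surface and fix that locally (not necessarily what the CI job itself does):
```
composer validate                 # warns that composer.lock is not up to date
composer update wikimedia/ip-set  # re-resolve just that package and rewrite the lock
```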
[22:16:26] PROBLEM - Host cache-rsync is DOWN: CRITICAL - Host Unreachable (10.68.23.165) [22:17:18] legoktm: PHP Warning: popen(/usr/bin/diff -u -w '/mnt/home/jenkins-deploy/tmpfs/jenkins-1/merge-old-7lSOL7' '/mnt/home/jenkins-deploy/tmpfs/jenkins-1/merge-your-nfK0gW',r): Cannot allocate memory in /mnt/jenkins-workspace/workspace/mediawiki-extensions-php55/src/includes/GlobalFunctions.php on line 2827 [22:17:28] hashar_ ^ [22:17:36] https://integration.wikimedia.org/ci/job/mediawiki-extensions-php55/339/console [22:18:40] uh [22:18:46] legoktm hashar seems https://integration.wikimedia.org/zuul/ has gone down. [22:18:58] works for me? [22:19:24] legoktm: I mean the tests arent being tested they show grey. [22:20:30] legoktm: With mediawiki/extensions/MobileFrontend in gate and submit it shows that mwext-MobileFrontend-qunit is not being tested but it is. It is being tested at https://integration.wikimedia.org/ci/job/mwext-MobileFrontend-qunit/17699/console [22:20:34] lets wait a little bit, I'm still deploying new jobs [22:21:16] legoktm: Oh ok [22:21:17] 10Continuous-Integration-Infrastructure: CI trusty slaves running out of memory - https://phabricator.wikimedia.org/T126545#2016911 (10Legoktm) 3NEW [22:21:37] 10Continuous-Integration-Infrastructure: CI trusty slaves running out of memory - https://phabricator.wikimedia.org/T126545#2016919 (10Paladox) [22:21:47] (03PS1) 10MaxSem: Don't run 5.3 tests for HtmlFormatter [integration/config] - 10https://gerrit.wikimedia.org/r/269853 [22:25:35] 10Continuous-Integration-Infrastructure, 6Release-Engineering-Team, 3Wikipedia-Android-App: Wikipedia Android CI tests are failing - https://phabricator.wikimedia.org/T126532#2016938 (10hashar) Looks similar to {T110506} :-( [22:29:03] 10Continuous-Integration-Infrastructure, 6operations: Provide lint for yaml files in operations repository - https://phabricator.wikimedia.org/T91496#2016943 (10ori) 5declined>3Open https://gerrit.wikimedia.org/r/#/c/269849/ just broke Puppet everywhere (because of invalid YAML in eqiad.yaml -- see the bot... [22:30:10] legoktm: Seems Zuul doesn't trigger on mediawiki-config anymore [22:30:22] I'm considering a restart of Zuul given there are many new slaves registered [22:30:24] maybe it got confused [22:30:32] > Queue lengths: 65 events, 0 results. [22:30:40] maybe gearman got stuck? [22:30:50] many idle executors [22:31:06] everythin on https://integration.wikimedia.org/zuul/ is still queued [22:31:17] 2016-02-10 22:27:10,659 ERROR gear.Client.unknown: Connection timed out waiting for a response to a submit job request: [22:31:22] !log Full restart of Zuul. Seems Gearman/Zuul got stuck. All executors were idling. No new Gerrit events processed either. [22:31:26] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [22:33:28] (03CR) 10Paladox: "recheck" [integration/config] - 10https://gerrit.wikimedia.org/r/269853 (owner: 10MaxSem) [22:33:44] OK. Zuul is now picking up mediawiki-config cange [22:33:49] but still queued against Jenkins [22:34:27] !log Zuul is back up and procesing Gerrit events, but jobs are still queued indefinitely. 
Jenkins is not accepting new jobs [22:34:32] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [22:34:53] Will try a Gearman plugin restart [22:35:19] I'm still deploying ~100 jobs with jjb if that could be related [22:35:37] 10Continuous-Integration-Infrastructure, 5Patch-For-Review, 7WorkType-Maintenance: Fix tox installation on new Precise slaves - https://phabricator.wikimedia.org/T110506#2016984 (10hashar) Since the dependencies are crazy, one wants to run: pip install --upgrade setuptools [22:35:47] legoktm: It seems that Slave ci-jessie-wikimedia-29436 has gone offline https://integration.wikimedia.org/ci/computer/ci-jessie-wikimedia-29436/ Disconnected by anonymous : Offline due to Gearman request [22:35:52] errr, "429" apparently >.> [22:36:54] legoktm: Krinkle do we need to create more executors due to that memory limit issue? [22:36:57] paladox: That's normal. ci-jessie* slaves are re-created for one job only. [22:37:06] greg-g: No, this is something else. [22:37:09] Krinkle: Oh ok thanks. [22:37:30] Krinkle: I know what you're working on right now is, but https://phabricator.wikimedia.org/T126545 [22:37:33] Seems like one of the Zuul/Gearman/Jenkins instabilities is hitting again. This should seriously be investigated. It's taking down Zuul like once every two weeks [22:37:46] 10Continuous-Integration-Infrastructure, 6Release-Engineering-Team, 3Wikipedia-Android-App: Wikipedia Android CI tests are failing - https://phabricator.wikimedia.org/T126532#2017005 (10hashar) That is definitely the same as T110506. There is a puppet patch but it does not properly --upgrade setuptools. Gahh... [22:37:48] * greg-g nods [22:38:04] hashar_: zuul/gearman issues ^ [22:38:35] eeeke [22:40:08] (my jjb deploy finished) [22:40:22] might be related [22:40:28] 2016-02-10 22:40:22,657 DEBUG gear.Server: Polling 6 connections [22:40:36] from /var/log/zuul/gearman-server-debug.log [22:40:53] so Jenkins is no more connected to Gearman blblbl [22:41:33] * hashar_ and zuul-gearman.py status reports jobs have 0 workers [22:42:08] well [22:42:25] legoktm: easy fix https://integration.wikimedia.org/ci/configure Gearman is disabled :-} [22:42:33] ...what [22:43:24] well, things are running now [22:43:35] https://graphite.wikimedia.org/render/?from=-2hours&height=600&width=800&target=alias(color(zuul.geard.queue.running,'blue'),'Running')&target=alias(color(zuul.geard.queue.waiting,'red'),'Waiting')&target=alias(color(zuul.geard.queue.total,'888888'),'Total')&title=Zuul%20Geard%20job%20queue%20(8%20hours) [22:43:37] grr [22:44:13] something weird happened around 22:15 [22:44:22] maybe the gearman server got overloaded somehow and Jenkins disconnected itself for good [22:44:25] never seen that myself [22:44:30] (03CR) 10Legoktm: [C: 04-1] "Let's just modify the pipeline to not run php53 tests. I'll amend." [integration/config] - 10https://gerrit.wikimedia.org/r/269853 (owner: 10MaxSem) [22:46:18] java.lang.RuntimeException: Could not create rootDir /var/lib/jenkins/config-history/config/2016-02-10_22-36-03 [22:46:18] eeek [22:47:10] (03PS2) 10Legoktm: Don't run 5.3 tests for HtmlFormatter [integration/config] - 10https://gerrit.wikimedia.org/r/269853 (owner: 10MaxSem) [22:47:48] (03CR) 10Legoktm: [C: 031] "Going to wait a bit for CI to settle before deploying" [integration/config] - 10https://gerrit.wikimedia.org/r/269853 (owner: 10MaxSem) [22:47:56] does anyone know what deployment-tin.deployment-prep.eqiad.wmflabs is? it's a jessie instance, spawned in Nov. 2015. 
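When Jenkins drops off Gearman like this, the Gearman admin protocol gives a quick view of registered functions and their workers; the zuul-gearman.py helper mentioned above wraps the same interface. Assuming the default geard port on the Zuul host:
```
# Each output line is: function name, queued, running, available workers.
# Every function reporting 0 workers matches what hashar saw.
echo status | nc localhost 4730
```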
[22:48:00] 10Continuous-Integration-Infrastructure, 7Jenkins: Jenkins files under /var/lib/jenkins/config-history/config need to be garbage collected - https://phabricator.wikimedia.org/T126552#2017065 (10hashar) 3NEW [22:48:23] thcipriani: not me [22:48:31] I was going to use the deployment-tin name for the new bastion instance, but realized it was taken... [22:49:47] kk, looks deployment-tin is not configured as anything. I'm going to kill it. [22:50:25] thcipriani: only mention in SAL is from 2014: https://tools.wmflabs.org/sal/log/AU8UJaLi6snAnmqnLieW [22:50:36] +1 to killing [22:51:06] greg-g: kk, thanks [22:51:19] +1 [22:51:31] was probably setup to try scap3 deployment under jessie [22:51:34] or whoever knows [22:51:42] (03PS2) 10Legoktm: Add Generic.Arrays.DisallowLongArraySyntax to ruleset, autofix this repo [tools/codesniffer] - 10https://gerrit.wikimedia.org/r/269612 [22:51:56] legoktm: If we didnt do those changes in jjb for composer what differences would that make. [22:52:20] 10Deployment-Systems, 6Performance-Team, 10Traffic, 6operations, 5Patch-For-Review: Make Varnish cache for /static/$wmfbranch/ expire when resources change within branch lifetime - https://phabricator.wikimedia.org/T99096#2017086 (10Krinkle) [22:52:31] paladox: which changes? [22:52:53] legoktm: This one https://gerrit.wikimedia.org/r/#/c/269815/ [22:53:04] what differences would it make. [22:54:07] PROBLEM - Host deployment-tin is DOWN: CRITICAL - Host Unreachable (10.68.20.251) [22:54:56] paladox: currently none, but if someone changed the job definition, the php55 ones wouldn't be updated. Also all of our job definitions should be stored in jjb so we can easily re-create them and view diffs, not just in jenkins xml [22:55:24] legoktm: Oh ok thanks for explaning and replying. [22:55:32] !log gallium: find /var/lib/jenkins/config-history/config -type f -wholename '*/2015*' -delete ( https://phabricator.wikimedia.org/T126552 ) [22:55:37] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [22:59:04] legoktm: That fixed the problem for composer using the wrong verison. See https://integration.wikimedia.org/ci/job/composer-php55/201/console [22:59:13] (03CR) 10jenkins-bot: [V: 04-1] Add Generic.Arrays.DisallowLongArraySyntax to ruleset, autofix this repo [tools/codesniffer] - 10https://gerrit.wikimedia.org/r/269612 (owner: 10Legoktm) [22:59:20] It now uses php 5.5 and not hhvm and php 5.6 [22:59:28] wonderful! [23:00:05] (03PS3) 10Legoktm: Add Generic.Arrays.DisallowLongArraySyntax to ruleset, autofix this repo [tools/codesniffer] - 10https://gerrit.wikimedia.org/r/269612 [23:00:24] legoktm: But why is it saying php 5.6.99 on https://integration.wikimedia.org/ci/job/composer-hhvm/202/console where is php 5.6 comming from since php 5.6.99 there is no version from php called that. [23:01:03] HHVM pretends to be PHP 5.6.99 [23:01:11] since it matches 5.6 compatability [23:01:17] legoktm: Oh ok. [23:01:32] 10Continuous-Integration-Infrastructure, 7Jenkins: Jenkins files under /var/lib/jenkins/config-history/config need to be garbage collected - https://phabricator.wikimedia.org/T126552#2017109 (10hashar) a:3hashar On gallium: `find /var/lib/jenkins/config-history/config -type f -wholename '*/2015*' -delete`... [23:03:37] 10Continuous-Integration-Infrastructure: CI trusty slaves running out of memory - https://phabricator.wikimedia.org/T126545#2017120 (10greg) So... do we need to recreate those 6 new executers that Timo and Antoine made today? 
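The 5.6.99 number is just what HHVM reports as PHP_VERSION to signal PHP 5.6-level compatibility. A quick way to tell the two runtimes apart from the command line (plain CLI usage, not part of the job definitions):
```
php -r 'echo PHP_VERSION, PHP_EOL;'                          # Zend 5.5 prints 5.5.x, HHVM prints 5.6.99
php -r 'var_export(defined("HHVM_VERSION")); echo PHP_EOL;'  # true only under HHVM
```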
[23:03:45] (03PS1) 10Paladox: [FundraisingEmailUnsubscribe] Switch extension-jslint to jshint and jsonlint [integration/config] - 10https://gerrit.wikimedia.org/r/269865 [23:04:36] jenking-bot doesn't auto-merge after V+2, e.g. https://gerrit.wikimedia.org/r/#/c/244632/ [23:05:28] (03CR) 10Paladox: "jslint passes here https://integration.wikimedia.org/ci/job/mwext-FundraisingEmailUnsubscribe-jslint/25/console https://gerrit.wikimedia.o" [integration/config] - 10https://gerrit.wikimedia.org/r/269865 (owner: 10Paladox) [23:05:54] MaxSem: zuul is running behind [23:06:01] 10Continuous-Integration-Infrastructure: CI trusty slaves running out of memory - https://phabricator.wikimedia.org/T126545#2017129 (10Krinkle) >>! In T126545#2017120, @greg wrote: > So... do we need to recreate those 6 new executers that Timo and Antoine made today? Just talked on IRC with @hashar. I made the... [23:06:39] RECOVERY - Host deployment-tin is UP: PING OK - Packet loss = 0%, RTA = 0.98 ms [23:07:20] 10Continuous-Integration-Infrastructure: CI trusty slaves running out of memory - https://phabricator.wikimedia.org/T126545#2017133 (10greg) +1 thank you (and then we can create more nodes if that reduction in slots is too much) [23:07:55] hashar_ legoktm i carnt rember what the blocker was for upgrading phpunit to 4 but now we can update to 5. But was it because we were using php 5.3. [23:08:05] 10Continuous-Integration-Infrastructure: CI trusty slaves running out of memory - https://phabricator.wikimedia.org/T126545#2017134 (10hashar) The new Trusty slaves have been created with ci.medium flavor which has 2 CPU and 2GB RAM. They have been pooled with 2 executors. So we have 2GB RAM being shared by th... [23:08:21] paladox: no idea off hand ,but there is definitely a task about it [23:08:38] paladox: the idea is to have the mediawiki tests to be run via composer , but that doesn't work yet [23:08:52] hashar_: Yes. Oh wait i think that was one of the blockers. [23:10:03] 10Continuous-Integration-Infrastructure: CI trusty slaves running out of memory - https://phabricator.wikimedia.org/T126545#2017136 (10hashar) No mistake @krinkle. I really thought 2GB would be enough to run two of our jobs in parallel .... :} salt/puppet etc randomly kicking in must consume what is left and... [23:10:45] 10Continuous-Integration-Infrastructure: CI trusty slaves running out of memory - https://phabricator.wikimedia.org/T126545#2017139 (10greg) >>! In T126545#2017136, @hashar wrote: > No mistake @krinkle. I really thought 2GB would be enough to run two of our jobs in parallel .... :} salt/puppet etc randomly k... [23:11:30] PROBLEM - Puppet failure on deployment-tin is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [23:13:47] thcipriani: Would deployment-tin be the new https://integration.wikimedia.org/ci/view/Beta/job/beta-scap-eqiad/ [23:14:29] paladox: that's my current plan. Applying configuration, currently [23:14:48] thcipriani: Ok thanks. [23:15:47] 10Continuous-Integration-Infrastructure, 7Jenkins: Jenkins files under /var/lib/jenkins/config-history/config need to be garbage collected - https://phabricator.wikimedia.org/T126552#2017160 (10hashar) Same for `2014` [23:17:48] 10Continuous-Integration-Infrastructure, 7Jenkins: Jenkins files under /var/lib/jenkins/config-history/config need to be garbage collected - https://phabricator.wikimedia.org/T126552#2017167 (10hashar) Down to 9k entries ``` # ls -ld /var/lib/jenkins/config-history/config drwxrwsr-x 9203 ``` Now we need a b... 
[23:18:08] 10Continuous-Integration-Infrastructure, 7Jenkins: Jenkins files under /var/lib/jenkins/config-history/config need to be garbage collected - https://phabricator.wikimedia.org/T126552#2017168 (10hashar) p:5Triage>3Normal [23:24:18] 10Continuous-Integration-Infrastructure: CI trusty slaves running out of memory - https://phabricator.wikimedia.org/T126545#2017186 (10hashar) I have changed the six slaves we have created to have only one executor. We are out of quota though: | Cores | 81/85 | <--- | RAM | 151552/204800 | ok | Instances | 29/... [23:27:36] hashar_: want me to file a task to ask for more cores/instances? [23:28:19] greg-g: thinking about numbers :-} [23:28:27] kk [23:28:55] hashar_, DEMAND ALL THE CORES [23:33:02] 10Continuous-Integration-Infrastructure: CI trusty slaves running out of memory - https://phabricator.wikimedia.org/T126545#2017205 (10hashar) For 8 more executor slots we would need 8 `ci.medium` or: | Stuff | ci.medium | 8 of them |--|--|-- | CPU | 2 | 16 | RAM | 2GB | 16GB | Disk | 40GB | 320GB We would ne... [23:39:14] 10Continuous-Integration-Infrastructure, 6Labs, 10Labs-Infrastructure: Bump labs quota for 'integration' project - https://phabricator.wikimedia.org/T126557#2017237 (10hashar) 3NEW [23:39:29] greg-g: filled the quota request as https://phabricator.wikimedia.org/T126557 [23:39:38] word [23:40:03] 10Continuous-Integration-Infrastructure: CI trusty slaves running out of memory - https://phabricator.wikimedia.org/T126545#2017246 (10hashar) Quota bump asked with {T126557} [23:40:18] and it is almost 1am already [23:40:34] I know why I am not a sysadmin. It is time consuming :-} [23:41:07] no kidding [23:41:30] " Via SSH, force a puppet run (provisions the instance, takes about an hour)." [23:41:33] :) [23:41:37] g'night hashar_ [23:43:43] yeah we install a loot of packages [23:52:10] oh for god sake [23:54:27] 10Continuous-Integration-Infrastructure: CI trusty slaves running out of memory - https://phabricator.wikimedia.org/T126545#2017301 (10hashar) Additionally we have a couple tmpfs systems: | /var/lib/mysql | 256 MB | /mnt/home/jenkins-deploy/tmpfs | 512 MB That does not help for the ci.medium that only have 2GB... [23:54:44] !log depooling Trusty slaves that only have 2GB of ram that is not enough. https://phabricator.wikimedia.org/T126545 [23:54:46] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [23:56:12] :/// [23:56:46] We can move the php55 jobs only into gate [23:56:59] So they don't run on test for now [23:57:10] the ci.medium nodes are too small [23:57:19] even with 1 slot? [23:57:21] the system itself consume a bunch of ram (diamond/puppet/linux whatever) [23:57:29] and we have 768 MB of ram used for tmpfs [23:57:36] :/ [23:57:38] so there is too little left for a job [23:57:48] specially I have seen failure that are totally crazy [23:58:06] such as install.php balling out because recentchanges mysql table is gone [23:58:17] I guess when linux has too little memory it reclaims from tmpfs [23:58:21] (hearsay dont quote me)
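Those two tmpfs mounts alone can pin up to 768 MB of the 2 GB on a ci.medium once they fill up, which is easy to confirm on a slave with standard tooling:
```
df -h -t tmpfs   # shows the /var/lib/mysql and jenkins-deploy tmpfs mounts and their usage
free -m          # tmpfs contents are held in RAM, so a full tmpfs cuts into what jobs can allocate
```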