[00:22:52] Reedy: seems to be working... [00:38:05] PROBLEM - App Server Main HTTP Response on deployment-mediawiki01 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:38:47] geez, why are the beta cluster app servers falling over so frequently recently? [00:39:35] I broke them earlier [00:39:37] then I unbroke them [00:39:44] and at the same time labs stuff was being restarted [00:39:51] no idea what's up this time though [00:41:10] seems fine to me from a quick check [00:41:37] yeah, just shinken has been complaining at least once/day this week about at least one of the hosts [00:41:50] mostly all has been fine with them, so I'm curious what's happening [00:42:06] * greg-g can ignore for now as long as no real breakages happen/the browser tests run fine [00:43:04] RECOVERY - App Server Main HTTP Response on deployment-mediawiki01 is OK: HTTP OK: HTTP/1.1 200 OK - 39399 bytes in 7.713 second response time [00:43:19] see, I'm ok now [00:43:20] :P [00:43:31] aka greg-g is easily pleased [00:43:51] if you haven't figured that out yet, you're too slow :) [00:44:02] it was aka, not TIL ;) [00:44:45] :) [00:55:50] 6Release-Engineering-Team, 10Browser-Tests-Infrastructure, 10Reading-Web, 5Patch-For-Review: Failed Jenkins job sets Sauce Labs job to passed - https://phabricator.wikimedia.org/T105589#1954351 (10Jdlrobson) @dduval https://integration.wikimedia.org/ci/job/browsertests-QuickSurveys-en.m.wikipedia.beta.wmfl... 
[02:08:45] Project browsertests-Wikidata-SmokeTests-linux-firefox-sauce build #509: 04FAILURE in 51 min: https://integration.wikimedia.org/ci/job/browsertests-Wikidata-SmokeTests-linux-firefox-sauce/509/ [02:34:57] Project browsertests-WikiLove-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #824: 04FAILURE in 1 min 56 sec: https://integration.wikimedia.org/ci/job/browsertests-WikiLove-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/824/ [04:11:30] 6Release-Engineering-Team, 6Phabricator, 10Traffic, 6operations, 5Patch-For-Review: Phabricator needs to expose ssh - https://phabricator.wikimedia.org/T100519#1954591 (10Volker_E) +1 WFM too! Awesome, thanks all people involved! > time GIT_SSH_COMMAND="ssh -v" git clone ssh://vcs@git-ssh.wikimedia.org/... [04:26:14] 6Release-Engineering-Team, 10MediaWiki-extensions-ContentTranslation, 5ContentTranslation-Release8, 3LE-CX8-Sprint 1, and 2 others: Review and create CX Parallel corpora table - https://phabricator.wikimedia.org/T120815#1954599 (10KartikMistry) 5Open>3Resolved [05:22:03] Project browsertests-Wikidata-WikidataTests-linux-firefox-sauce build #494: 15ABORTED in 4 hr 0 min: https://integration.wikimedia.org/ci/job/browsertests-Wikidata-WikidataTests-linux-firefox-sauce/494/ [05:29:02] Project browsertests-Wikidata-WikidataTests-linux-chrome-sauce build #278: 15ABORTED in 4 hr 0 min: https://integration.wikimedia.org/ci/job/browsertests-Wikidata-WikidataTests-linux-chrome-sauce/278/ [07:06:28] 10Beta-Cluster-Infrastructure: Can not create account at beta cluster - https://phabricator.wikimedia.org/T124388#1954740 (10Bugreporter) 3NEW [07:44:13] 07:35:11 Fatal error: Uncaught exception 'InvalidArgumentException' with message 'The value for 'SkipSkins' should be an array' in /mnt/jenkins-workspace/workspace/mwext-qunit/src/includes/registration/ExtensionProcessor.php:367 [07:49:31] 10Continuous-Integration-Infrastructure: Transient mwext-qunit failure: The value for 'SkipSkins' should be an array - 
https://phabricator.wikimedia.org/T124394#1954807 (10Nikerabbit) 3NEW [07:56:32] 6Release-Engineering-Team, 7Jenkins, 7Puppet: Jenkins jobs for puppet failing for no good reason - https://phabricator.wikimedia.org/T124395#1954817 (10Joe) 3NEW [08:06:26] 10Continuous-Integration-Infrastructure: Transient mwext-qunit failure: The value for 'SkipSkins' should be an array - https://phabricator.wikimedia.org/T124394#1954832 (10Nikerabbit) Happened again in https://integration.wikimedia.org/ci/job/mwext-qunit/10339/console ... usually when I am trying to merge a patc... [08:10:43] there are three patches stuck in mediawiki-config: https://integration.wikimedia.org/zuul/ [08:12:11] 10Beta-Cluster-Infrastructure: Can not create account at beta cluster: "Unable to connect to redis server deployment-redis01.eqiad.wmflabs." - https://phabricator.wikimedia.org/T124388#1954847 (10Aklapper) [08:13:29] 10Continuous-Integration-Infrastructure: Transient mwext-qunit failure: The value for 'SkipSkins' should be an array - https://phabricator.wikimedia.org/T124394#1954850 (10santhosh) Not sure about the root cause, but I got the exception y'day in my local machine and went away when I took latest core and did comp... [08:44:39] 10Continuous-Integration-Infrastructure: Transient mwext-qunit failure: The value for 'SkipSkins' should be an array - https://phabricator.wikimedia.org/T124394#1954867 (10Nikerabbit) The two above were for ContentTranslation, but it also happens in Translate: https://integration.wikimedia.org/ci/job/mwext-qunit... [09:16:41] morning y'all [09:16:49] is the beta cluster update job/thing running? 
[09:17:08] the quicksurveys extension is currently a patch version behind
[09:20:33] Continuous-Integration-Config, Puppet: Jenkins jobs for puppet failing for no good reason - https://phabricator.wikimedia.org/T124395#1954910 (hashar) a:hashar
[09:25:10] Continuous-Integration-Config, Puppet: Jenkins jobs for puppet failing for no good reason - https://phabricator.wikimedia.org/T124395#1954914 (hashar) That is the job https://integration.wikimedia.org/ci/job/operations-puppet-tox-py27/ which fails whenever it runs on the integration-slave-precise1011 . Th...
[09:26:28] hashar: do you know about how the beta cluster gets updated?
[09:26:31] and how one might poke it
[09:29:30] Continuous-Integration-Config, Puppet: Jenkins jobs for puppet failing for no good reason - https://phabricator.wikimedia.org/T124395#1954916 (hashar) p:Unbreak!>Normal The object is a single file and is 0 size: ``` -r--r--r-- 1 jenkins-deploy wikidev 0 Jan 21 19:32 .git/objects/0a/25119b9c0c2fb8705...
[09:30:21] phuedx: good morning
[09:31:27] phuedx: interestingly someone filed a bug enquiring about how the beta cluster is being updated, which is https://phabricator.wikimedia.org/T124198
[09:31:30] phuedx: the actual doc is https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/How_code_is_updated
[09:31:32] or in short
[09:31:51] we have a jenkins job that brute-force git pulls mediawiki/core, mediawiki/vendor and mediawiki/extensions.git every 10 minutes or so
[09:32:02] but sometimes it stalls
[09:32:31] !log beta cluster Jenkins jobs have been stalled for 9 hours and 25 minutes. Disabling/reenabling the Gearman plugin to remove the deadlock
[09:32:36] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[09:33:29] awesome docs, thanks hashar!
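(Editor's sketch, not from the channel.) The stall described above, a job expected every 10 minutes or so that had not run for 9 hours and 25 minutes, is the kind of thing a small freshness check could flag before end users do. The function name `stalled_for`, the `missed_runs` tolerance and the example date are assumptions made for illustration. The millisecond timestamp is assumed to come from something like the `timestamp` field of a Jenkins build's JSON description, and no request is made here.

```python
from datetime import datetime, timedelta, timezone


def stalled_for(last_success_ms, now,
                expected_interval=timedelta(minutes=10), missed_runs=3):
    """Return how long a periodic job has been silent, or None if it looks healthy.

    last_success_ms: epoch milliseconds of the last successful run (assumed input).
    missed_runs: hypothetical tolerance, i.e. how many intervals may pass
    before the job is considered stalled.
    """
    last_success = datetime.fromtimestamp(last_success_ms / 1000, tz=timezone.utc)
    age = now - last_success
    if age > expected_interval * missed_runs:
        return age
    return None


# Hypothetical values mirroring the 9 hours 25 minutes mentioned in the log.
now = datetime(2016, 1, 22, 9, 32, tzinfo=timezone.utc)
last_ok = now - timedelta(hours=9, minutes=25)
print(stalled_for(int(last_ok.timestamp() * 1000), now))
```

A check like this only reports that the job is late. It says nothing about why, and the Gearman deadlock itself still needed the manual kick described in the log.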
[09:34:03] phuedx: there is a bad interaction between the Jenkins scheduler that triggers the beta cluster job and the Zuul/Gearman stuff that runs jobs in reaction to Gerrit patches
[09:34:21] phuedx: and somehow, the beta cluster job ends up being deadlocked / waiting for Null/None whatever
[09:34:27] :(
[09:34:35] who watches the watcher processes?
[09:34:38] ;D
[09:34:39] phuedx: the wikitech link is still a good read though it might deserve some updating
[09:34:51] end-users monitoring!
[09:34:51] folks complain, we fix it :-}
[09:41:22] classic ops/releng :D
[09:42:16] (PS1) Hashar: Migrate operations-puppet-tox-* jobs to Nodepool/Castor [integration/config] - https://gerrit.wikimedia.org/r/265698 (https://phabricator.wikimedia.org/T124395)
[09:46:48] (CR) Hashar: [C: 2] Migrate operations-puppet-tox-* jobs to Nodepool/Castor [integration/config] - https://gerrit.wikimedia.org/r/265698 (https://phabricator.wikimedia.org/T124395) (owner: Hashar)
[09:48:31] (Merged) jenkins-bot: Migrate operations-puppet-tox-* jobs to Nodepool/Castor [integration/config] - https://gerrit.wikimedia.org/r/265698 (https://phabricator.wikimedia.org/T124395) (owner: Hashar)
[09:56:03] hashar: any news on that zuul/gearman thingemy -- i'm still seeing an out of date version of the quicksurveys extension
[09:56:10] y'know what
[09:56:16] i should raise a bug, shouldn't i :)
[09:56:23] phuedx: ah yeah sorry
[09:56:26] so if you look at https://integration.wikimedia.org/zuul/
[09:56:31] there is a section "postmerge"
[09:56:44] that lists jobs triggered after a change has been merged in Gerrit
[09:56:53] 3 of them are deadlocked
[09:57:21] and on the Jenkins main page https://integration.wikimedia.org/ci/ , the build queue shows some beta cluster related jobs that are pending
[09:57:29] kicking it again
[09:58:50] just seen the postmerge section clear out
[09:59:03] https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/89537/console
[09:59:11] that is the job
fetching mediawiki/extensions.git
[09:59:28] that is a meta repo in Gerrit which gets automatically updated by Gerrit whenever a change under mediawiki/extensions/*.git repos is merged
[09:59:52] phuedx: once git pull has run, the job triggers scap, the deployment tool
[09:59:55] currently running at https://integration.wikimedia.org/ci/job/beta-scap-eqiad/87181/
[10:00:24] that page shows a bunch of upstream jobs that triggered it
[10:00:28] and a spam of "Started by timer"
[10:00:43] so in the end, scap has not run for the last 9+ hours and it is now catching up
[10:01:15] now waiting on the localisation cache generation which takes a long time :( I guess it will be updated in roughly 40 minutes :(
[10:02:15] yowza
[10:02:20] thanks for your help hashar
[10:03:37] phuedx: sorry if all of that is fairly confusing :(
[10:03:47] but the idea is roughly: change gets merged, pull it, trigger scap
[10:03:49] profit!
[10:03:56] no it kinda makes sense
[10:04:00] we also run update.php on all wiki databases once per hour
[10:04:04] lame brute force approach
[10:04:09] ^ will remember that
[10:04:26] so code is every 10 minutes but db is every 60
[10:04:51] and in theory if you go to https://integration.wikimedia.org there is a link [Jenkins view of Beta] which shows all jobs running on beta
[10:04:51] https://integration.wikimedia.org/ci/view/Beta/
[10:05:08] that more or less gives a status about what is going on and freshness of updates
[10:49:17] Gerrit-Migration, Analytics-Tech-community-metrics, DevRel-March-2016: Make MetricsGrimoire/korma support gathering Code Review statistics from Phabricator's Differential - https://phabricator.wikimedia.org/T118753#1955010 (Aklapper)
[10:53:10] Gerrit-Migration, Analytics-Tech-community-metrics, Developer-Relations, DevRel-March-2016: Make MetricsGrimoire/korma support gathering Code Review statistics from Phabricator's Differential - https://phabricator.wikimedia.org/T118753#1955014 (Aklapper)
[10:57:47] (PS1) Hashar: Make
operations-puppet-tox-pep8-jessie non voting [integration/config] - 10https://gerrit.wikimedia.org/r/265708 (https://phabricator.wikimedia.org/T124395) [10:58:31] (03CR) 10Hashar: [C: 032] Make operations-puppet-tox-pep8-jessie non voting [integration/config] - 10https://gerrit.wikimedia.org/r/265708 (https://phabricator.wikimedia.org/T124395) (owner: 10Hashar) [10:59:39] (03Merged) 10jenkins-bot: Make operations-puppet-tox-pep8-jessie non voting [integration/config] - 10https://gerrit.wikimedia.org/r/265708 (https://phabricator.wikimedia.org/T124395) (owner: 10Hashar) [11:38:40] 10Continuous-Integration-Config, 5Patch-For-Review, 7Puppet: Jenkins jobs for puppet failing for no good reason - https://phabricator.wikimedia.org/T124395#1955103 (10hashar) 5Open>3Resolved Solved by clearing out the workspace. I have then migrate the job to run on Nodepool disposable instances, i.e. t... [12:18:06] hashar: looks like beta-scap-eqiad is failing [12:18:14] (and has been failing for some 18 hrs) [12:18:33] i'll raise a bug if there isn't one already [12:18:47] eek [12:19:11] Permission denied (publickey). [12:19:11] grr [12:23:33] !log beta: reinitialized keyholder on deployment-bastion. 
The proxy apparently had no identity
[12:23:38] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[12:23:58] phuedx: scap uses an ssh agent holding the ssh identity
[12:24:03] named keyholder
[12:24:11] apparently it was broken somehow so I followed a bunch of steps from https://wikitech.wikimedia.org/wiki/Keyholder
[12:24:14] you fix bugs faster than i write tasks ;)
[12:24:37] namely: ssh deployment-bastion ; keyholder restart ; keyholder arm ; *enter ssh private key passwords from wiki page*
[12:24:51] task is still worthwhile :-}
[12:24:53] so now
[12:25:21] on deployment-bastion "keyholder status" reports some identities
[12:25:28] \o/
[12:26:01] it is building at https://integration.wikimedia.org/ci/view/Beta/job/beta-scap-eqiad/87197/console
[12:26:04] we will see
[12:29:14] phuedx: seems good now!
[12:29:46] \o/
[12:29:49] * phuedx watches
[12:32:06] Yippee, build fixed!
[12:32:06] Project beta-scap-eqiad build #87197: FIXED in 9 min 18 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/87197/
[12:36:15] woo!
[12:40:27] Yippee, build fixed!
[12:40:28] Project browsertests-QuickSurveys-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #150: FIXED in 3 min 43 sec: https://integration.wikimedia.org/ci/job/browsertests-QuickSurveys-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce/150/
[12:41:12] YESSSSSSS
[12:41:16] thanks for your help hashar
[12:41:23] sincerely
[12:41:26] ^^^^
[12:54:41] Project browsertests-GettingStarted-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #728: FAILURE in 41 sec: https://integration.wikimedia.org/ci/job/browsertests-GettingStarted-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/728/
[13:10:42] Gerrit-Migration, Diffusion, GitHub-Mirrors, Repository-Admins: Have Phabricator take over replication to Github - https://phabricator.wikimedia.org/T115624#1955232 (Paladox) I think that they are currently trying to do this to mediawiki/core but it doesn't seem to be working. mediawiki/core at gi...
[13:27:52] phuedx: and magically the QuickSurveys browser test got fixed \O/
[14:21:20] hashar: but not quite, but still magic ;) :D
[14:22:23] hashar: Maybe you or someone else with knowledge of jenkins can look: Why is there no verified from jenkins? https://gerrit.wikimedia.org/r/#/c/265647/
[14:48:10] Luke081515: looking
[14:49:29] Luke081515: there is nothing defined in CI for labs/tools/crosswatch :(
[14:50:59] Luke081515: we will want to define the test entry points; for python it is tox https://www.mediawiki.org/wiki/Continuous_integration/Tutorials/Test_your_Python
[14:51:16] hashar: ok, thanks
[14:51:26] Luke081515: want me to do the boilerplate?
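(Editor's sketch, not from the channel.) For readers unfamiliar with tox, the boilerplate on offer here would plausibly be a small `tox.ini`. The file contents below are an assumption, not the file that was actually committed for labs/tools/crosswatch. They only reuse names that come up in this conversation (a `py27` environment running `nosetests`, and a `[flake8]` section with `ignore = W241,E221`), while `deps = nose` and the separate `flake8` environment are illustrative guesses. The Python around it just parses the text with the standard library to show which sections tox and flake8 would look at.

```python
import configparser

# Hypothetical tox.ini, assembled from the options mentioned in the chat.
TOX_INI = """
[tox]
envlist = py27, flake8

[testenv]
deps = nose
commands = nosetests

[testenv:flake8]
deps = flake8
commands = flake8

[flake8]
ignore = W241,E221
"""

config = configparser.ConfigParser()
config.read_string(TOX_INI)

# Environments tox would create, one virtualenv each.
envs = [name.strip() for name in config["tox"]["envlist"].split(",")]

# Error codes flake8 would skip, as quoted in the conversation.
ignored = [code.strip() for code in config["flake8"]["ignore"].split(",")]

print(envs)
print(ignored)
```

Keeping the lint settings in the same file as the environments means the CI job only has to run `tox`, which matches the "CI is rather dumb" idea voiced in the channel.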
[14:51:38] tox is basically a wrapper around virtualenv
[14:51:45] it lets you easily define a bunch of venvs and commands to run in them
[14:52:03] once defined,
[14:52:20] we can configure CI to run the 'tox' job that clones the repo, checks out the patch and runs 'tox'
[14:52:32] which would spawn each env and run the command passed to it
[14:52:33] eg:
[14:52:42] environment py27, commands = nosetests
[14:52:50] or python setup.py test
[14:52:50] or whatever
[14:52:54] would be great
[14:53:13] I will set up an experimental job on ci
[14:53:22] so you can trigger the jenkins job by commenting in gerrit "check experimental"
[14:54:13] (PS1) Hashar: labs/tools/crosswatch: experimental tox job [integration/config] - https://gerrit.wikimedia.org/r/265729
[14:54:42] Luke081515: ^^^ :-}
[14:55:29] (CR) Luke081515: [C: 1] "thanks!" [integration/config] - https://gerrit.wikimedia.org/r/265729 (owner: Hashar)
[15:04:07] PROBLEM - App Server Main HTTP Response on deployment-mediawiki01 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:04:15] PROBLEM - English Wikipedia Main page on beta-cluster is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:04:27] Luke081515: the zuul-layout-diff output shows the repo is added with a job :D
[15:04:28] https://integration.wikimedia.org/ci/job/integration-zuul-layoutdiff/7265/console
[15:04:32] deploying that
[15:04:37] (CR) Hashar: [C: 2] labs/tools/crosswatch: experimental tox job [integration/config] - https://gerrit.wikimedia.org/r/265729 (owner: Hashar)
[15:05:48] (Merged) jenkins-bot: labs/tools/crosswatch: experimental tox job [integration/config] - https://gerrit.wikimedia.org/r/265729 (owner: Hashar)
[15:09:07] RECOVERY - English Wikipedia Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 39854 bytes in 0.419 second response time
[15:13:05] * hashar Luke081515: I am creating some basic template :-)
[15:13:05] ok :)
[15:17:37] Luke081515:
https://gerrit.wikimedia.org/r/#/c/265735/ if you comment "check experimental" that will trigger the job
[15:18:26] Luke081515: we have another jenkins job that invokes npm
[15:18:37] hmm really: npm install && npm test
[15:18:56] RECOVERY - App Server Main HTTP Response on deployment-mediawiki01 is OK: HTTP OK: HTTP/1.1 200 OK - 39540 bytes in 0.643 second response time
[15:19:15] Luke081515: that can be used to run the i18n lint checker https://www.npmjs.com/package/grunt-banana-checker :}
[15:20:54] or chain to bower/gulp
[15:21:11] the idea is CI is rather dumb
[15:21:38] ok, thanks. Is this now active, or is the review of https://gerrit.wikimedia.org/r/#/c/265735/ needed first?
[15:25:31] works, but build failed ^^
[15:26:20] Luke081515: so flake8 runs the linter utilities pep8 and pyflakes
[15:26:27] and yeah it reports a bunch of errors
[15:26:50] some you might want to ignore, in this case in /tox.ini you would add a section:
[15:26:50] [flake8]
[15:26:54] ignore = W241,E221
[15:27:00] which would ignore errors W241 or E221
[15:27:05] or you can fix them :D
[15:27:30] or skip some files entirely with exclude =
[15:27:53] will have to find a way to have npm install && npm test run the gulp stuff for frontend
[15:34:19] (PS2) Hashar: Dummy job to let wikimedia/fundraising/crm gate [integration/config] - https://gerrit.wikimedia.org/r/263045 (https://phabricator.wikimedia.org/T120881)
[15:35:41] (PS3) Hashar: Dummy job to let wikimedia/fundraising/crm gate [integration/config] - https://gerrit.wikimedia.org/r/263045 (https://phabricator.wikimedia.org/T120881)
[15:35:58] (CR) Hashar: [C: 2] "Added noop to the test pipeline per Jan."
[integration/config] - 10https://gerrit.wikimedia.org/r/263045 (https://phabricator.wikimedia.org/T120881) (owner: 10Hashar) [15:37:09] (03Merged) 10jenkins-bot: Dummy job to let wikimedia/fundraising/crm gate [integration/config] - 10https://gerrit.wikimedia.org/r/263045 (https://phabricator.wikimedia.org/T120881) (owner: 10Hashar) [15:38:00] 10Continuous-Integration-Config, 10Fundraising-Backlog, 10Wikimedia-Fundraising-CiviCRM, 5Patch-For-Review, 7WorkType-Maintenance: Bad empty CI jobs on wikimedia/fundraising/crm deployment branch - https://phabricator.wikimedia.org/T120881#1955530 (10hashar) 5Open>3Resolved I have added the `noop` jo... [15:43:30] 10Continuous-Integration-Config, 5Continuous-Integration-Scaling, 5Patch-For-Review, 7WorkType-NewFunctionality: Migrate javascript npm CI jobs to Nodepool - https://phabricator.wikimedia.org/T119143#1955559 (10hashar) [15:43:32] 5Continuous-Integration-Scaling, 5Patch-For-Review, 7Tracking: [tracking] Disposable VMs need a cache for package managers - https://phabricator.wikimedia.org/T112560#1955556 (10hashar) 5Open>3Resolved a:3hashar Being bold. This is solved by adding in JJB jobs running on Nodepool: ``` lang=yaml builder... 
[15:45:32] (03PS1) 10Hashar: Enable castor on tox-{toxenv}-jessie jobs [integration/config] - 10https://gerrit.wikimedia.org/r/265741 (https://phabricator.wikimedia.org/T112560) [15:53:09] (03CR) 10Hashar: [C: 032] Enable castor on tox-{toxenv}-jessie jobs [integration/config] - 10https://gerrit.wikimedia.org/r/265741 (https://phabricator.wikimedia.org/T112560) (owner: 10Hashar) [15:54:24] (03PS1) 10Hashar: Enable castor on tox-jessie job [integration/config] - 10https://gerrit.wikimedia.org/r/265744 (https://phabricator.wikimedia.org/T112560) [15:54:37] (03CR) 10Hashar: [C: 032] Enable castor on tox-jessie job [integration/config] - 10https://gerrit.wikimedia.org/r/265744 (https://phabricator.wikimedia.org/T112560) (owner: 10Hashar) [15:54:54] (03Merged) 10jenkins-bot: Enable castor on tox-{toxenv}-jessie jobs [integration/config] - 10https://gerrit.wikimedia.org/r/265741 (https://phabricator.wikimedia.org/T112560) (owner: 10Hashar) [15:56:22] (03Merged) 10jenkins-bot: Enable castor on tox-jessie job [integration/config] - 10https://gerrit.wikimedia.org/r/265744 (https://phabricator.wikimedia.org/T112560) (owner: 10Hashar) [15:57:55] 10Continuous-Integration-Infrastructure, 6operations: Investigate usage of ttf-ubuntu-font-family which is not available on Jessie - https://phabricator.wikimedia.org/T103325#1955622 (10akosiaris) Any news on this ? [15:59:35] (03PS1) 10Hashar: rake-jessie was no more generated [integration/config] - 10https://gerrit.wikimedia.org/r/265745 [16:01:44] ryasmeen|Away: do we have the meeting now? or do we start next week? [16:09:05] (03PS1) 10Hashar: Enable castor on rake-jessie [integration/config] - 10https://gerrit.wikimedia.org/r/265747 (https://phabricator.wikimedia.org/T112560) [16:09:48] (03CR) 10Hashar: [C: 032] "It is back!" 
[integration/config] - 10https://gerrit.wikimedia.org/r/265745 (owner: 10Hashar) [16:11:47] (03Merged) 10jenkins-bot: rake-jessie was no more generated [integration/config] - 10https://gerrit.wikimedia.org/r/265745 (owner: 10Hashar) [16:16:53] (03CR) 10Hashar: "I have manually retriggered this change so rake-jessie populate the package management cache. i.e. https://integration.wikimedia.org/ci/j" [ruby/api] - 10https://gerrit.wikimedia.org/r/252698 (https://phabricator.wikimedia.org/T117993) (owner: 10Zfilipin) [16:17:08] (03CR) 10Hashar: "recheck" [ruby/api] - 10https://gerrit.wikimedia.org/r/252698 (https://phabricator.wikimedia.org/T117993) (owner: 10Zfilipin) [16:25:19] (03CR) 10Hashar: [C: 032] "Gave it a try on https://gerrit.wikimedia.org/r/#/c/252698/2 which is merged. I reenqueued it in gate-and-submit to have the cache populat" [integration/config] - 10https://gerrit.wikimedia.org/r/265747 (https://phabricator.wikimedia.org/T112560) (owner: 10Hashar) [16:25:53] 5Continuous-Integration-Scaling, 5Patch-For-Review, 7Tracking: [tracking] Disposable VMs need a cache for package managers - https://phabricator.wikimedia.org/T112560#1955763 (10hashar) I have enabled castor on the rake-jessie job as well. Gave it a try on https://gerrit.wikimedia.org/r/#/c/252698/2 which is... 
[16:27:06] (Merged) jenkins-bot: Enable castor on rake-jessie [integration/config] - https://gerrit.wikimedia.org/r/265747 (https://phabricator.wikimedia.org/T112560) (owner: Hashar)
[16:28:29] so
[16:28:44] rake-jessie uses the central package manager cache now, so it should be slightly faster
[16:32:06] !log Nuked corrupted git repo on integration-slave-precise-1012 /mnt/jenkins-workspace/workspace/mediawiki-extensions-php53
[16:32:11] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[16:52:56] ostriches: Could the reason why mediawiki/core isn't being updated in git.wikimedia.org be that mediawiki-replication is set to denied at https://gerrit.wikimedia.org/r/#/admin/projects/mediawiki/core,access whereas at https://gerrit.wikimedia.org/r/#/admin/projects/mediawiki,access it is set to allow?
[16:55:07] It's set to denied because we want phab to take over
[16:57:52] ostriches: Oh ok.
[17:13:42] ostriches: Would mediawiki/core require pushing in ssh.https://developer.blender.org/T37592
[17:16:42] No
[17:18:22] paladox, the links that you post are not clickable when they start with an "ssh." prefix. Just saying. :)
[17:18:51] andre__: I'm not sure what link you mean.
[17:19:23] paladox, your previous message that you posted in this very channel.
[17:19:58] andre__: You mean this link https://developer.blender.org/T37592
[17:20:03] andre__: Depends on your client :p
[17:20:13] Mine figures it out fine because . can't be part of a scheme :)
[17:21:13] 17:19:39 Fatal error: Uncaught exception 'InvalidArgumentException' with message 'The value for 'SkipSkins' should be an array' in /mnt/jenkins-workspace/workspace/mwext-qunit/src/includes/registration/ExtensionProcessor.php:367
[17:21:19] paladox: obviously yes, as your previous message only contained one link.
[17:21:47] ostriches: heh. my client is very simple it seems :)
[17:22:01] andre: Ok. It isn't an ssh link. It is a normal http link.
[17:25:23] gnome-terminal dealt fine with the ssh. :)
[17:30:10] i asked this y'day but anyone know what this git error is about? https://integration.wikimedia.org/ci/job/operations-puppet-tox-py27/17678/console
[17:31:30] subbu: I think this is related to when it happened to other tests yesterday.
[17:31:58] subbu: Hashar should have fixed it by now since the test now passes in mediawiki. Not sure if he would need to do it on that test.
[17:32:14] let me recheck then.
[17:33:29] subbu: Seems the test is non-voting now.
[17:34:00] Reedy: Do you know why it says this here https://integration.wikimedia.org/ci/job/operations-puppet-tox-pep8-jessie/18/console : ERROR: unknown environment 'pep8'
[17:34:07] looks like it passed. thanks. all good now.
[17:39:26] Reedy: I'm getting this error IOError: Lock at '/mnt/jenkins-workspace/workspace/mediawiki-extensions-qunit/src/extensions/cldr/.git/refs/heads/wmf/1.25wmf8.lock' could not be obtained here https://integration.wikimedia.org/ci/job/mediawiki-extensions-qunit/27365/console
[17:43:01] greg-g: we have a logging channel that was generating 100G of logs per day which has caused some disk space issues on fluorine. Can i deploy https://gerrit.wikimedia.org/r/265772 which stops the logging to fluorine?
[17:43:50] yes
[17:43:54] that's no good
[17:44:02] this week
[17:44:05] such a bad week
[17:44:15] things will be better next week :)
[17:44:41] we kinda expected this to happen... which is why all of this data is now logged to hadoop, I just didn't follow up and turn off this old channel when we finished converting...
[17:44:47] ahhhh
[17:44:49] gotcha
[17:58:04] Project browsertests-Wikidata-WikidataTests-linux-firefox build #81: ABORTED in 4 hr 0 min: https://integration.wikimedia.org/ci/job/browsertests-Wikidata-WikidataTests-linux-firefox/81/
[18:02:24] heh, the wikidata test hit the time limit
[18:11:19] ostriches: Did you set the gh credential for mediawiki core repo.
Meaning gh credential for wikimedia account at github.com.
[18:11:34] I just had some help in phabricator to do it and i managed to do it.
[18:11:37] Of course.
[18:12:07] I have to do some testing and fix it.
[18:12:56] ostriches: ok.
[18:13:36] ostriches: Did you use the https link instead of ssh? I tried using ssh first and it didn't work, but with https it works.
[18:13:48] I'm not using ssh.
[18:14:02] ostriches: Ok i mean github links.
[18:14:23] I'm using https. I'm also in the middle of a meeting and not working on it right now
[18:15:08] ok.
[18:21:39] Continuous-Integration-Config, Mathoid: Enable node4 for mathoid tests - https://phabricator.wikimedia.org/T124447#1956210 (Physikerwelt) NEW
[18:22:58] Continuous-Integration-Config, Mathoid: Enable node4 for mathoid tests - https://phabricator.wikimedia.org/T124447#1956221 (mobrovac) ping @hashar! :)
[18:58:29] Deployment-Systems, Scap3: Merge `git_deploy_user` and `ssh_user` - https://phabricator.wikimedia.org/T124460#1956464 (thcipriani) NEW
[19:14:42] Deployment-Systems, Scap3: Merge `git_deploy_user` and `ssh_user` - https://phabricator.wikimedia.org/T124460#1956549 (dduvall) a:dduvall
[19:15:14] Deployment-Systems, Scap3: Merge `git_deploy_user` and `ssh_user` - https://phabricator.wikimedia.org/T124460#1956464 (dduvall) p:Triage>Normal
[19:17:36] Continuous-Integration-Infrastructure, Mathoid: Enable node4 for mathoid tests - https://phabricator.wikimedia.org/T124447#1956562 (hashar) For the permanent slaves we have: | Distro | Nodes |--|-- | Precise | v0.8.2 | Trusty | v0.10.25 The disposable slaves instances are running Jessie: | Distro | No...
[19:25:28] Continuous-Integration-Infrastructure, Mathoid: Enable node4 for mathoid tests - https://phabricator.wikimedia.org/T124447#1956612 (mobrovac) That's great news, @hashar, because we moving all of the Node.JS services to Jessie and 4.2 (cf. {T96017}). Thanks a ton!
[19:27:01] 10Continuous-Integration-Infrastructure: QUnit on integration-slave-trusty-1011 fails with IOError: Lock at '/extensions/cldr/.git/refs/heads/wmf/1.25wmf8.lock' could not be obtained - https://phabricator.wikimedia.org/T124462#1956632 (10Florian) 3NEW [19:28:07] legoktm: want to take a look? ^ :) [19:32:09] 10Continuous-Integration-Infrastructure, 10Mathoid: Enable node4 for mathoid tests - https://phabricator.wikimedia.org/T124447#1956644 (10hashar) Found the nodejs issue. We ensure `nodejs-legacy` is latest but that doesn't bump nodejs: ``` $ apt-cache depends nodejs-legacy nodejs-legacy Depends: nodejs ```... [19:32:12] ostriches: Does https://github.com/wikimedia/mediawiki have protection on the branchs. [19:35:18] paladox, what is "protection" exactly? [19:35:28] hashar: https://gerrit.wikimedia.org/r/#/c/265520/ please. [19:35:58] andre__: Please see https://help.github.com/articles/about-protected-branches/ [19:36:04] hashar, stuff like "Please merge this" should not be part of a commit message. [19:36:09] err paladox ^^ [19:36:16] paladox, see https://www.mediawiki.org/wiki/Gerrit/Commit_message_guidelines [19:36:25] (03PS2) 10Paladox: [OpenLayers] Add jenkins tests [integration/config] - 10https://gerrit.wikimedia.org/r/265520 [19:37:01] andre__: It is to prevent --force pushes. Which phabricator needs so that may be a reason why it isent pushing to github. [19:37:30] paladox: ah. that explanation is welcome at the beginning of your question. [19:37:38] without having to explicitly ask. :) thanks. [19:37:54] andre__: Sorry. [19:38:21] np [19:40:29] 10Continuous-Integration-Infrastructure, 10Mathoid: Enable node4 for mathoid tests - https://phabricator.wikimedia.org/T124447#1956710 (10hashar) Nodepool spawn instances based on a snapshot which is updated by the script https://github.com/wikimedia/integration-config/blob/1c5ca2680/nodepool/scripts/setup_no... 
[19:46:28] 10Deployment-Systems, 3Scap3: Merge `git_deploy_user` and `ssh_user` - https://phabricator.wikimedia.org/T124460#1956755 (10dduvall) [19:52:12] (03PS3) 10Hashar: [OpenLayers] Add jenkins tests [integration/config] - 10https://gerrit.wikimedia.org/r/265520 (owner: 10Paladox) [19:52:39] (03CR) 10Hashar: [C: 032] "That is being proactive :-} Thank you Paladox." [integration/config] - 10https://gerrit.wikimedia.org/r/265520 (owner: 10Paladox) [19:53:28] (03CR) 10Paladox: "Thanks." [integration/config] - 10https://gerrit.wikimedia.org/r/265520 (owner: 10Paladox) [19:54:37] (03Merged) 10jenkins-bot: [OpenLayers] Add jenkins tests [integration/config] - 10https://gerrit.wikimedia.org/r/265520 (owner: 10Paladox) [19:58:55] (03PS1) 10Hashar: nodepool: dist-upgrade when preparing snapshot [integration/config] - 10https://gerrit.wikimedia.org/r/265802 (https://phabricator.wikimedia.org/T124447) [20:00:51] !log Refreshing nodepool image to hopefully get Nodejs 4.2.4 https://phabricator.wikimedia.org/T124447 https://gerrit.wikimedia.org/r/#/c/265802/ [20:00:57] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [20:04:37] !log Image ci-jessie-wikimedia-1453492820 in wmflabs-eqiad is ready [20:04:41] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [20:05:53] (03CR) 10Hashar: [C: 032] "The following packages will be upgraded:" [integration/config] - 10https://gerrit.wikimedia.org/r/265802 (https://phabricator.wikimedia.org/T124447) (owner: 10Hashar) [20:06:52] 10Continuous-Integration-Infrastructure, 10Mathoid, 5Patch-For-Review: Enable node4 for mathoid tests - https://phabricator.wikimedia.org/T124447#1956830 (10hashar) So dist-upgrade did the trick: ``` The following packages will be upgraded: gyp (0.1~svn1729-3 => 0.1+20150913git1f374df9-1~bpo8+1) nodejs... 
[20:08:59] (03Merged) 10jenkins-bot: nodepool: dist-upgrade when preparing snapshot [integration/config] - 10https://gerrit.wikimedia.org/r/265802 (https://phabricator.wikimedia.org/T124447) (owner: 10Hashar) [20:13:04] 10Continuous-Integration-Infrastructure, 10Mathoid, 5Patch-For-Review: Enable node4 for mathoid tests - https://phabricator.wikimedia.org/T124447#1956845 (10hashar) ``` ssh jenkins@10.68.23.16 nodejs --version v4.2.4 ``` So Nodepool instances now have `v4.2.4`. Will craft a Jenkins job for that. [20:32:05] (03PS1) 10Hashar: [mathoid] experimental npm-node-4-2 [integration/config] - 10https://gerrit.wikimedia.org/r/265817 (https://phabricator.wikimedia.org/T124447) [20:35:51] (03PS2) 10Hashar: [mathoid] experimental npm-node-4.2 [integration/config] - 10https://gerrit.wikimedia.org/r/265817 (https://phabricator.wikimedia.org/T124447) [20:41:03] (03CR) 10Hashar: [C: 032] [mathoid] experimental npm-node-4.2 [integration/config] - 10https://gerrit.wikimedia.org/r/265817 (https://phabricator.wikimedia.org/T124447) (owner: 10Hashar) [20:41:38] (03PS3) 10Hashar: [mathoid] experimental npm-node-4.2 [integration/config] - 10https://gerrit.wikimedia.org/r/265817 (https://phabricator.wikimedia.org/T119143) [20:41:53] (03CR) 10Hashar: [mathoid] experimental npm-node-4.2 [integration/config] - 10https://gerrit.wikimedia.org/r/265817 (https://phabricator.wikimedia.org/T119143) (owner: 10Hashar) [20:42:07] (03CR) 10Hashar: [C: 032] "Forgot to link to T119143" [integration/config] - 10https://gerrit.wikimedia.org/r/265817 (https://phabricator.wikimedia.org/T119143) (owner: 10Hashar) [20:43:53] (03Merged) 10jenkins-bot: [mathoid] experimental npm-node-4.2 [integration/config] - 10https://gerrit.wikimedia.org/r/265817 (https://phabricator.wikimedia.org/T119143) (owner: 10Hashar) [20:54:19] 10Continuous-Integration-Infrastructure, 10Mathoid, 5Patch-For-Review: Enable node4 for mathoid tests - https://phabricator.wikimedia.org/T124447#1956992 (10hashar) Gave it a try 
it fails because npm has been installed with nodejs 0.10.x and before nodejs is updated to 4.2. The relevant part is in puppet mo... [21:02:35] 10Continuous-Integration-Infrastructure: QUnit on integration-slave-trusty-1011 fails with IOError: Lock at '/extensions/cldr/.git/refs/heads/wmf/1.25wmf8.lock' could not be obtained - https://phabricator.wikimedia.org/T124462#1957029 (10hashar) 5Open>3Resolved a:3hashar Yup that happens from time to time... [21:06:29] 10Continuous-Integration-Config, 5Continuous-Integration-Scaling, 5Patch-For-Review, 7WorkType-NewFunctionality: Migrate javascript npm CI jobs to Nodepool - https://phabricator.wikimedia.org/T119143#1957046 (10hashar) [21:06:32] 10Continuous-Integration-Infrastructure, 10Mathoid, 5Patch-For-Review: Enable node4 for mathoid tests - https://phabricator.wikimedia.org/T124447#1957045 (10hashar) [21:06:44] 10Continuous-Integration-Infrastructure, 10Mathoid, 5Patch-For-Review: Enable node4 for mathoid tests - https://phabricator.wikimedia.org/T124447#1956210 (10hashar) Follow up on T119143 [21:09:43] 10Continuous-Integration-Config, 5Continuous-Integration-Scaling, 5Patch-For-Review, 7WorkType-NewFunctionality: Migrate javascript npm CI jobs to Nodepool - https://phabricator.wikimedia.org/T119143#1957058 (10hashar) From T124447 Gave it a try it fails because npm has been installed with nodejs 0.10.x an... [21:12:43] !log rebuilding nodepool reference image [21:12:47] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [21:14:39] !log updating nodepool snapshot based on new image [21:14:44] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [21:14:50] Creating image id: 443 with hostname ci-jessie-wikimedia-1453497269 for ci-jessie-wikimedia in wmflabs-eqiad [21:14:51] ... 
[21:15:05] * hashar grab a beer [21:18:35] 10Continuous-Integration-Config, 5Continuous-Integration-Scaling, 5Patch-For-Review, 7WorkType-NewFunctionality: Migrate javascript npm CI jobs to Nodepool - https://phabricator.wikimedia.org/T119143#1957091 (10mobrovac) The problem here is that we have the newer `nodejs` package, but not the accompanying... [21:19:25] Can beta-scap-eqiad ever complete when I need it to [21:19:27] Like just once [21:19:35] I'd love it to complete in a normal time period [21:19:58] MarkTraceur: no :) [21:20:02] God damn it [21:20:09] Thanks thcipriani [21:20:28] that's what we're here for [21:22:07] !log Image ci-jessie-wikimedia-1453497269 in wmflabs-eqiad is ready (with node 4.2 for https://phabricator.wikimedia.org/T119143 ) [21:22:12] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [21:24:56] 6Release-Engineering-Team, 10Browser-Tests-Infrastructure, 10Reading-Web, 5Patch-For-Review: MW-Selenium associates wrong SauceLabs job with Jenkins artifact - https://phabricator.wikimedia.org/T105589#1957152 (10dduvall) [21:26:36] 10Continuous-Integration-Config, 5Continuous-Integration-Scaling, 5Patch-For-Review, 7WorkType-NewFunctionality: Migrate javascript npm CI jobs to Nodepool - https://phabricator.wikimedia.org/T119143#1957177 (10mobrovac) >>! In T119143#1957091, @mobrovac wrote: > The problem here is that we have the newer... [21:43:07] 6Release-Engineering-Team, 6Phabricator, 10Traffic, 6operations, 5Patch-For-Review: Phabricator needs to expose ssh - https://phabricator.wikimedia.org/T100519#1957241 (10greg) Anything else needed here? Or is this complete now? [21:48:48] unrelatedly, it seems redis is borked in beta? https://phabricator.wikimedia.org/T124388 [21:49:02] ostriches: Why not set up a test repo on github and set up to mirror to that repo and once you get it working mirror to wikimedia/mediawiki [21:49:12] at least deployment-redis01.eqiad.wmflabs. 
is not the proper hostname anymore :-} [21:49:51] ostriches: That way we can go back an enable replication from git.wikimedia.org to github for mediawiki core until we figure out what the problem is. [21:54:24] 10Beta-Cluster-Infrastructure: Can not create account at beta cluster: "Unable to connect to redis server deployment-redis01.eqiad.wmflabs." - https://phabricator.wikimedia.org/T124388#1957313 (10hashar) Thu Jan 21 19:24:12 2016] init: redis-instance-tcp_6380 main process (2135) terminated with status 1 [Thu Jan... [21:54:43] 10Continuous-Integration-Infrastructure: QUnit on integration-slave-trusty-1011 fails with IOError: Lock at '/extensions/cldr/.git/refs/heads/wmf/1.25wmf8.lock' could not be obtained - https://phabricator.wikimedia.org/T124462#1957316 (10Florian) Ah, ok, thanks for the explanation :) If this happens again (I hop... [21:55:07] 10Beta-Cluster-Infrastructure: Can not create account at beta cluster: "Unable to connect to redis server deployment-redis01.eqiad.wmflabs." - https://phabricator.wikimedia.org/T124388#1957317 (10hashar) And it is dead: ``` root@deployment-redis01:/var/log# /etc/init.d/redis-server status redis-server is not run... [21:56:07] thcipriani: in short, someone broke redis on trusty :D [21:56:44] oh, I see. [21:56:58] *** FATAL CONFIG FILE ERROR *** [21:56:58] Reading the configuration file, at line 549 [21:56:58] >>> 'latency-monitor-threshold 100' [21:56:58] Bad directive or wrong number of arguments [21:57:05] so is there a task to upgrade all the beta instances? [21:57:16] (or do we need one, I guess) [21:57:30] 10Beta-Cluster-Infrastructure: Can not create account at beta cluster: "Unable to connect to redis server deployment-redis01.eqiad.wmflabs." - https://phabricator.wikimedia.org/T124388#1957332 (10hashar) From /var/log/upstart/redis-instance* files which are all at `Jan 21 19:24`: ``` *** FATAL CONFIG FILE ERROR... [21:57:42] is ops switching to jessie? 
[21:58:19] ugh, yeah [21:58:27] upgrade all of beta to jessie [21:58:47] we should get help from ops ("if you upgrade this type of machine in prod to jessie, pleaes also do in beta") [21:59:01] +1 [21:59:09] * mobrovac hides as he'd need to do it [21:59:15] mobrovac: :) :) [21:59:22] 10Beta-Cluster-Infrastructure: Can not create account at beta cluster: "Unable to connect to redis server deployment-redis01.eqiad.wmflabs." - https://phabricator.wikimedia.org/T124388#1957339 (10hashar) Well puppet patch from Dec 29th introduced the 'latency-monitor-threshold' config https://gerrit.wikimedia.or... [21:59:55] just upgrade to stretch so you're a step ahead of ops? ;) [22:01:03] "beta cluster - welcome to the next generation of testing and damage control" [22:01:04] legoktm: sid4life [22:01:28] legoktm: yeah that is what restbase01 ran at a point but it is all messed up now :D [22:02:12] heh [22:02:47] so [22:02:54] redis on Trusty is dead on the beta cluster [22:03:13] and I have no idea what the f*** is happening [22:03:21] beside that lame latency-monitor-threshold 100 error [22:06:18] 6Release-Engineering-Team, 10Browser-Tests-Infrastructure, 10Reading-Web, 5Patch-For-Review: MW-Selenium associates wrong SauceLabs job with Jenkins artifact - https://phabricator.wikimedia.org/T105589#1957366 (10dduvall) a:3dduvall [22:08:34] 10Beta-Cluster-Infrastructure: Can not create account at beta cluster: "Unable to connect to redis server deployment-redis01.eqiad.wmflabs." - https://phabricator.wikimedia.org/T124388#1957378 (10hashar) 5Open>3Resolved a:3hashar So the `redis-server` errors above are unrelated maybe. The services are run... 
[22:08:44] Project beta-scap-eqiad build #87252: 04FAILURE in 4 min 24 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/87252/ [22:09:19] !log rebooting deployment-redis01 (kernel upgrade) [22:09:24] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [22:09:51] (03PS1) 10Paladox: Add npm-node-4.2 to experimental: in template npm [integration/config] - 10https://gerrit.wikimedia.org/r/265854 [22:10:17] hashar: Should we enable composer-test on all repos by adding it to the templates. [22:10:33] paladox: that is a good idea :-} [22:10:41] but a bit too early. It does not work yet [22:10:48] will polish it up next week [22:10:57] hashar: Ok. [22:11:13] but yeah [22:11:26] whenever I get it working, I will happily deploy your proposal [22:11:28] sounds good [22:13:38] It seems that https://integration.wikimedia.org/zuul/ has crashed again. [22:14:12] hashar: Ok. I will upload a patch that you can merge when ever you polish it up next week. [22:14:49] hashar: Could you review https://gerrit.wikimedia.org/r/#/c/265854/ please. It is todo with adding npm-node-4.2 to experimental in template npm. [22:18:45] (03PS1) 10Paladox: Add php-composer-test to a few templates [integration/config] - 10https://gerrit.wikimedia.org/r/265857 [22:20:33] Yippee, build fixed! [22:20:33] Project beta-scap-eqiad build #87253: 09FIXED in 5 min 45 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/87253/ [22:20:35] (03CR) 10jenkins-bot: [V: 04-1] Add php-composer-test to a few templates [integration/config] - 10https://gerrit.wikimedia.org/r/265857 (owner: 10Paladox) [22:20:36] hashar: The test does work. 
Please see https://integration.wikimedia.org/ci/job/php-composer-test/29792/console [22:23:57] (03PS2) 10Paladox: Add php-composer-test to a few templates [integration/config] - 10https://gerrit.wikimedia.org/r/265857 [22:24:25] (03CR) 10Legoktm: [C: 04-1] "I don't think this is a good idea, mainly because of the number of extensions that are failing: T124342." [integration/config] - 10https://gerrit.wikimedia.org/r/265857 (owner: 10Paladox) [22:27:10] !log rebooted all CI slaves using OpenStackManager [22:27:14] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [22:28:42] (03CR) 10Paladox: "@Legoktm but it seems that it passes here https://integration.wikimedia.org/ci/job/php-composer-test/29792/console" [integration/config] - 10https://gerrit.wikimedia.org/r/265857 (owner: 10Paladox) [22:31:39] (03CR) 10Paladox: "@Legoktm how would I do that Instead lets add the job to all of the extensions where it isn't failing? bit since I thought the point of th" [integration/config] - 10https://gerrit.wikimedia.org/r/265857 (owner: 10Paladox) [22:35:39] ostriches: replication from phabricator to github is broken? did we enable 2 factor on github or something? [22:37:18] No we didn't. And only busted for 2 repos for 2 unrelated reasons [22:39:57] scap repo isn't replicating, can't figure out why. [22:40:04] no error message that I can find [22:41:21] (03PS1) 10Legoktm: Whitelist IoannisKydonis [integration/config] - 10https://gerrit.wikimedia.org/r/265864 [22:41:46] Hmm, now that *is* interesting. [22:41:52] (03CR) 10Legoktm: [C: 032] Whitelist IoannisKydonis [integration/config] - 10https://gerrit.wikimedia.org/r/265864 (owner: 10Legoktm) [22:42:18] it hasn't updated in over a week [22:43:36] hmm, zuul is lagging [22:43:48] twentyafterfour: I can't find any sort of logs from attempted pushing. 
[22:44:16] I ran the repository mirror script from the shell and it looked like it completed successfully [22:44:24] but commits don't seem to be showing up on github [22:44:35] (03Merged) 10jenkins-bot: Whitelist IoannisKydonis [integration/config] - 10https://gerrit.wikimedia.org/r/265864 (owner: 10Legoktm) [22:45:04] !log deploying https://gerrit.wikimedia.org/r/265864 [22:45:07] PROBLEM - App Server Main HTTP Response on deployment-mediawiki01 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:45:09] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [22:45:39] it only shows in the logs if it fails, right? [22:45:46] 10Beta-Cluster-Infrastructure: Can not create account at beta cluster: "Unable to connect to redis server deployment-redis01.eqiad.wmflabs." - https://phabricator.wikimedia.org/T124388#1957605 (10MGChecker) 5Resolved>3Open If I try to delete something right now (and earlier today too) I get a similar error w... [22:49:30] twentyafterfour: Yeah, but it's clearly not succeeding :p [22:49:56] 10Continuous-Integration-Infrastructure: QUnit on integration-slave-trusty-1011 fails with IOError: Lock at '/extensions/cldr/.git/refs/heads/wmf/1.25wmf8.lock' could not be obtained - https://phabricator.wikimedia.org/T124462#1957617 (10hashar) >>! In T124462#1957316, @Florian wrote: > Ah, ok, thanks for the ex... [22:52:06] twentyafterfour: imma try something. [22:52:23] #holdmybeer [22:53:01] 10Continuous-Integration-Config: Set up composer-test for all MW extensions where it isn't broken - https://phabricator.wikimedia.org/T124342#1957625 (10Paladox) Could we also do the same to test like disable it erroring and causing jenkins to fail I mean only if test isent present not if it fails because it fou... 
[22:53:09] https://github.com/wikimedia/mediawiki-tools-scap/commits/master [22:53:22] phd@iridium /srv/phab/repos/MSCA (BARE:master)$ git remote set-url origin https://github.com/wikimedia/mediawiki-tools-scap [22:53:22] phd@iridium /srv/phab/repos/MSCA (BARE:master)$ git push origin master [22:53:22] Username for 'https://github.com': wmfphab [22:53:22] Password for 'https://wmfphab@github.com': [22:53:22] Everything up-to-date [22:53:29] I tried doing it manually from the repo on disk [22:53:37] So it clearly can (and does) reach the outside. [22:54:56] RECOVERY - App Server Main HTTP Response on deployment-mediawiki01 is OK: HTTP OK: HTTP/1.1 200 OK - 39540 bytes in 0.485 second response time [22:57:26] 6Release-Engineering-Team, 6Phabricator, 10Traffic, 6operations, 5Patch-For-Review: Phabricator needs to expose ssh - https://phabricator.wikimedia.org/T100519#1957632 (10hashar) git clone works for me over v6 :-) There is still one comment that I dont think is formally addressed: >>! In T100519#171061... [23:00:18] 6Release-Engineering-Team, 6Phabricator, 10Traffic, 6operations, 5Patch-For-Review: Phabricator needs to expose ssh - https://phabricator.wikimedia.org/T100519#1957636 (10BBlack) I don't really understand that quoted comment, but the ferm rules do have destination addresses that work at this time, and th... [23:03:47] twentyafterfour: I'll poke it over the weekend some more. [23:06:34] Project browsertests-Gather-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #394: 04FAILURE in 9 min 33 sec: https://integration.wikimedia.org/ci/job/browsertests-Gather-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce/394/ [23:11:53] (03CR) 10Hashar: [C: 04-1] "Almost! The json lint script only look for .json files so it is not going to work." (032 comments) [integration/config] - 10https://gerrit.wikimedia.org/r/265534 (https://phabricator.wikimedia.org/T124319) (owner: 10Paladox) [23:17:14] Hi! anyone have any thoughts about this error? 
Fatal error: Uncaught exception 'InvalidArgumentException' with message 'The value for 'SkipSkins' should be an array' in /mnt/jenkins-workspace/workspace/mwext-qunit/src/includes/registration/ExtensionProcessor.php:367 [23:17:21] https://integration.wikimedia.org/ci/job/mwext-qunit/10375/console [23:17:30] Qunit tests [23:17:55] 5Gerrit-Migration, 10Gitblit-Deprecate, 3releng-201516-q1, 10Diffusion: [keyresult] Allow cloning of Phabricator hosted git repositories - https://phabricator.wikimedia.org/T128#1957816 (10chasemp) [23:17:57] 6Release-Engineering-Team, 6Phabricator, 10Traffic, 6operations, 5Patch-For-Review: Phabricator needs to expose ssh - https://phabricator.wikimedia.org/T100519#1957814 (10chasemp) 5Open>3Resolved that comment is out dated [23:20:07] ostriches: greg-g: hi!!! :) ^ ? [23:22:09] marxarelli: hi!! ^ ? [23:23:42] thcipriani: hi!! ^ ? [23:23:43] AndyRussG: there are no skins installed/required for qunit jobs/tests, so that be related [23:24:20] legoktm or Krinkle might know more [23:25:08] marxarelli: maybe some needed config for runing qunit tests has changed? user to work... [23:25:53] or something in ExtensionProcessor has changed, or something in CentralNotice? [23:26:08] Tried running grunt qunit from the command line, got a million errors like "22 01 2016 18:08:35.297:DEBUG [proxy]: failed to proxy /mw1/load.php?debug=false&lang=en&modules=jquery%2Cmediawiki&only=scripts&skin=vector&version=A0h0UtqB (browser hung up the socket)" [23:26:40] hmm dunno what could have changed in CentralNotice, not with the test failing anyway! [23:26:58] It's also impacting Echo. [23:27:12] fun times [23:27:19] * marxarelli goes digging [23:27:21] https://integration.wikimedia.org/ci/job/mwext-qunit/10388/console [23:29:26] umm what [23:30:15] # Enabled skins. 
[23:30:15] # The following skins were automatically enabled: [23:30:15] wfLoadSkin( 'BlueSky' ); [23:30:15] wfLoadSkin( 'Donate' ); [23:30:17] one of those probably [23:30:31] AndyRussG: James_F ^^ [23:31:44] legoktm: hi! thx!! uhhh where is that? [23:31:50] marxarelli: Skins being installed doesn't affect the Skin system itself or validation of extension.json contents (which is skin-agnostic). The error about SkipSkins seems genuine. [23:32:11] AndyRussG: https://integration.wikimedia.org/ci/job/mwext-qunit/10388/ build artifacts, LocalSettings.php [23:32:26] somehow a skin got cloned into the qunit job directory? [23:33:22] Hmmm [23:34:21] !log rm -rf /mnt/jenkins-workspace/workspace/mediawiki-phpunit-php53 on slave precise 1012 [23:34:23] also there in the CentralNotice job artifact: https://integration.wikimedia.org/ci/job/mwext-qunit/10375/artifact/log/LocalSettings.php/*view*/ [23:34:26] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [23:34:56] Where is that built? What builds that? [23:36:09] [15:32:26] somehow a skin got cloned into the qunit job directory? [23:36:12] the installer [23:36:18] someone just needs to login to the slave and delete it [23:36:35] Krinkle: got it. so it's most likely a bad extensions.json somewhere? [23:39:50] Ah hmmm [23:41:21] marxarelli: legoktm: I think CentralNotice doesn't have extensions.json at this point [23:41:32] Hmmm neither does Echo [23:43:41] [15:36:09] [15:32:26] somehow a skin got cloned into the qunit job directory? [23:43:42] really [23:43:44] that's what it is [23:43:54] I'll fix it after prod is fixed [23:44:38] legoktm: i'll handle it [23:44:50] thanks for looking into it [23:45:17] legoktm: marxarelli: thx! LMK if u need any +2'ing or anything else... [23:45:34] * AndyRussG looks for extensions.json documentation [23:46:32] Ah hmm https://www.mediawiki.org/wiki/Manual:Extension_registration [23:46:46] That was easier than Googling a candidate for trustee! 
[23:47:12] AndyRussG: looks like the Donate skin might be the culprit [23:47:24] skin.json says " "SkipSkins": "Donate"," [23:47:40] Huh I don't even know what that's from [23:47:46] It's also in Echo tho [23:47:52] which would explain the exception 'The value for 'SkipSkins' should be an array' [23:48:26] yes, because as legoktm explained the skins are still sitting in the jenkins workspace [23:50:23] marxarelli: where is this version of skin.json of which u speak? [23:50:58] integration-slave-trusty-1013:/mnt/jenkins-workspace/workspace/mwext-qunit/src/skins/Donate/skin.json [23:52:34] Huh, so nothing specifically to do with CentralNotice? [23:52:55] doesn't appear to be, no [23:53:03] * AndyRussG doesn't know enuf about how the CI infrastructure works to say anything intelligent [23:53:24] Hmmm [23:56:05] marxarelli: in any case I guess we should make an extension.json for CentralNotice, no? [23:56:44] * marxarelli shrugs [23:57:59] 8þ [23:58:10] ok, looking [23:58:40] !log removed skins from mwext-qunit workspace on trusty-1013 slave [23:58:45] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [23:59:19] AndyRussG: try recheck now? [23:59:20] legoktm: eek. i was debugging that [23:59:24] :P [23:59:26] er, [23:59:30] debugging what specifically? [23:59:52] the installer automatically reads the skin.json files and adds them to the generated LocalSettings.php
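Editor's note: the failure discussed above comes down to a stale skin.json whose "SkipSkins" value is a string where an array is expected. The following is a minimal stand-alone sketch of that kind of check, not MediaWiki's actual ExtensionProcessor logic (which is PHP). The function name `check_skip_skins` and the sample manifests are hypothetical, written only to illustrate the string-versus-array difference quoted in the log.

```python
import json


def check_skip_skins(manifest_text):
    """Return an error message if 'SkipSkins' is present but not a list.

    Hypothetical helper for illustration only; it mirrors the wording of
    the exception quoted in the log, not the real registration code.
    """
    manifest = json.loads(manifest_text)
    value = manifest.get("SkipSkins")
    if value is not None and not isinstance(value, list):
        return "The value for 'SkipSkins' should be an array"
    return None


# Assumed sample manifests: the first mirrors the quoted skin.json line,
# the second shows the same value wrapped in an array.
broken = '{"name": "Donate", "SkipSkins": "Donate"}'
fixed = '{"name": "Donate", "SkipSkins": ["Donate"]}'

print(check_skip_skins(broken))
print(check_skip_skins(fixed))
```

Under these assumptions, a scan like this over the skin.json files left in a job workspace could flag a malformed manifest before the installer reads it. The actual fix in the log was simply removing the leftover skins from the workspace.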