[00:00:07] hashar: Eh. I'm not sure that we want to discourage extensive unit tests.
[00:00:21] SMalyshev: aloha yeah hmm. New repos in Gerrit do not have job set in CI. The CI part is done independently :-(
[00:00:37] hashar: It seems to be stuck on wikimedia/fundraising/crm which has been merged and has been fully tested.
[00:00:39] hashar: ah! that may explain it
[00:00:42] https://integration.wikimedia.org/zuul/
[00:00:51] hashar: so how it should be done?
[00:01:06] SMalyshev: so it really all depends what you want to run for wikidata/query/gui . You could get maven :D
[00:01:10] hashar: does it need to be in layout.xml or something else?
[00:01:15] yup
[00:01:25] hashar: no, don't need maven, need npm
[00:01:26] SMalyshev: You could either file a task or you can checkout integration/config and edit the layout.yaml file.
[00:01:33] (CR) Subramanya Sastry: [C: -1] "Let us actually wait to confirm this is the source." [integration/visualdiff] - https://gerrit.wikimedia.org/r/269335 (owner: Subramanya Sastry)
[00:01:39] paladox: ah, ok, will do
[00:01:40] SMalyshev: so we have a bunch of basic entry points described at https://www.mediawiki.org/wiki/Continuous_integration/Entry_points
[00:02:08] SMalyshev: but really for you, there is a jenkins job named 'npm' already that would grab your patch, run npm install && npm test and report back
[00:02:27] paladox can assist ;-} he has a ton of experience with the zuul/layout.yaml file
[00:03:14] James_F: oh I dont want to discourage more tests. The problem is that we blindly run them all.
[00:03:35] James_F: say I propose a patch for Math ... the Scribunto LUA tests were being run which dont really make any sense
[00:03:45] (PS1) Smalyshev: Add wikidata/query/gui [integration/config] - https://gerrit.wikimedia.org/r/269338
[00:04:15] hashar: I think we should run all of the cross-dependency ones.
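The 'npm' job mentioned above boils down to running `npm install` and then `npm test`, stopping at the first failure and reporting the result. A minimal Python sketch of that `&&` chaining, using a hypothetical `run_job` helper (this is not the actual Jenkins job definition, which lives in Wikimedia's CI configuration):

```python
from __future__ import annotations

import subprocess
from typing import Sequence

# Hypothetical step list mirroring "npm install && npm test".
NPM_JOB: list[list[str]] = [["npm", "install"], ["npm", "test"]]


def run_job(commands: Sequence[Sequence[str]], workdir: str = ".") -> tuple[bool, int]:
    """Run each command in order, like `a && b`: stop at the first non-zero exit.

    Returns (passed, number_of_commands_run).
    """
    ran = 0
    for cmd in commands:
        ran += 1
        result = subprocess.run(list(cmd), cwd=workdir)
        if result.returncode != 0:
            return False, ran
    return True, ran
```

Calling `run_job(NPM_JOB, workdir="/path/to/checkout")` would return `(True, 2)` when both steps succeed; a CI system would then turn `passed` into a vote on the change.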
[00:05:08] (PS2) Smalyshev: Add wikidata/query/gui [integration/config] - https://gerrit.wikimedia.org/r/269338
[00:05:20] (But none of the others.)
[00:05:27] I thought about splitting our tests in various tiers: true unit tests (dont even need mw core), mw integration (bring extensions as dependencies but dont run their tests), mw ext integrations (the mess we run)
[00:05:31] hashar: something like this: https://gerrit.wikimedia.org/r/#/c/269338/1 ?
[00:08:15] SMalyshev: yeah exactly
[00:11:33] (CR) Hashar: Add wikidata/query/gui (2 comments) [integration/config] - https://gerrit.wikimedia.org/r/269338 (owner: Smalyshev)
[00:11:46] oh I reviewed ps1
[00:12:08] (CR) Hashar: [C: 2] Add wikidata/query/gui (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/269338 (owner: Smalyshev)
[00:15:24] hashar: could you merge https://gerrit.wikimedia.org/r/269191 and https://gerrit.wikimedia.org/r/#/c/269188/ please
[00:16:00] Also could you review https://gerrit.wikimedia.org/r/#/c/269310/ please
[00:17:31] hashar https://gerrit.wikimedia.org/r/#/c/267548/ please
[00:24:56] (CR) Subramanya Sastry: "Looks like something else is going on .. but, this would be useful for when this is no longer in beta features." [integration/visualdiff] - https://gerrit.wikimedia.org/r/269335 (owner: Subramanya Sastry)
[00:31:00] Continuous-Integration-Config, Continuous-Integration-Infrastructure, Documentation, Jenkins: Jenkins-mwext-sync needs documentation - https://phabricator.wikimedia.org/T62793#2010349 (greg)
[00:34:42] * James_F sighs at https://gerrit.wikimedia.org/r/#/c/269330/ holding up the queue despite being cancelled.
[00:35:30] James_F: yeah found a sleep call in Zuul ...
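The three tiers floated above (true unit tests, MediaWiki integration that loads dependencies without running their tests, and the full extension integration run) can be expressed as a test-selection function. This is a hypothetical sketch: the tier names, `plan()`, the `deps` mapping and the `shared` list of always-loaded extensions are illustrative, not an existing CI API.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, Mapping, Sequence

CORE = "mediawiki/core"


@dataclass(frozen=True)
class Plan:
    checkout: frozenset[str]      # repositories cloned into the workspace
    run_tests_of: frozenset[str]  # repositories whose test suites are executed


def dependency_closure(repo: str, deps: Mapping[str, Sequence[str]]) -> set[str]:
    """Return repo plus everything it depends on, transitively."""
    seen: set[str] = set()
    stack = [repo]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(deps.get(current, ()))
    return seen


def plan(tier: str, repo: str, deps: Mapping[str, Sequence[str]],
         shared: Iterable[str] = ()) -> Plan:
    if tier == "unit":
        # True unit tests: no MediaWiki core, only the patched repo.
        return Plan(frozenset({repo}), frozenset({repo}))
    needed = frozenset(dependency_closure(repo, deps) | {CORE})
    if tier == "integration":
        # Bring dependencies in, but only run the patched repo's tests.
        return Plan(needed, frozenset({repo}))
    if tier == "ext-integration":
        # The current behaviour: load a shared set too and run every suite.
        everything = needed | frozenset(shared)
        return Plan(everything, everything)
    raise ValueError(f"unknown tier: {tier!r}")
```

Under this model a Math patch in the `integration` tier only runs Math's own tests; Scribunto's suite would run only in `ext-integration`, where it comes in through the `shared` list.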
[00:36:04] I am killing zuul
[00:36:19] !log killing zuul
[00:36:22] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[00:37:12] !log live hacking Zuul code to have it stop sleeping() on force merge
[00:37:14] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[00:37:26] hashar: will restarting zuul fix the lock in Jenkins that is keeping beta cluster from updating too?
[00:37:28] cause I wake up in 5 hours, no point in wasting an hour trying to build a package :D
[00:37:40] * James_F grins at hashar.
[00:38:00] "status.json: Service Temporarily Unavailable".
[00:38:00] Quite.
[00:38:08] Beta cluster updates have been backed up all day which is not awesome heading into a branch cut
[00:38:09] "status.json: hashar just killed me!".
[00:39:13] !log gallium edited /usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/trigger/gerrit.py and modified: replication_timeout = 300 -> replication_timeout = 10
[00:39:16] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[00:41:10] Deployment-Systems, MediaWiki-Logging: Include dbname in fatal logs - https://phabricator.wikimedia.org/T62324#2010371 (greg)
[00:41:52] (CR) Hashar: Add wikidata/query/gui [integration/config] - https://gerrit.wikimedia.org/r/269338 (owner: Smalyshev)
[00:41:58] (CR) Hashar: [C: 2] Add wikidata/query/gui [integration/config] - https://gerrit.wikimedia.org/r/269338 (owner: Smalyshev)
[00:43:58] (Merged) jenkins-bot: Add wikidata/query/gui [integration/config] - https://gerrit.wikimedia.org/r/269338 (owner: Smalyshev)
[00:47:08] SMalyshev: sorry took a while but wikidata/query/gui can finally get npm to run in Jenkins
[00:49:20] hashar: Ah, I see
[00:49:56] SMalyshev: it is running at https://integration.wikimedia.org/ci/job/npm/51022/console
[00:50:08] SMalyshev: the change adding a README change can no more be merged
[00:50:14] because it has no npm entry point
[00:50:24] so you will have to rebase it on top of the change adding the test files
[00:50:30] and it should be all fine :-}
[00:50:49] ah, ok
[00:51:22] reviewing https://gerrit.wikimedia.org/r/#/c/269212/ :D
[00:55:18] SMalyshev: your favorite frontend devs should be familiar with the npm / grunt and all the tasks in there
[00:56:09] hashar: yes I think so
[00:56:26] hashar: I didn't write these files, I just moved them from other repo
[00:56:39] yeah that is what most folks ends up doing :-}
[00:57:08] mediawiki/extensions/BoilerPlate is usually quite up-to-date
[00:57:15] it is supposed to provide a skeleton to start an extension
[00:58:15] jshint seems to be getting stuck
[00:58:30] but now I see that old one actually didn't run it on any useful files
[00:58:44] so I wonder if I should drop it
[00:59:54] SMalyshev: i think jshint is attempting to process all the dependencies in node_modules
[01:00:20] hashar: ahh, I forgot to add node_modules
[01:00:27] yeah see my comments on ps1
[01:00:42] I think you need both node_modules/** in .jshintignore
[01:00:54] and the !node_modules/** rule in the grunt file
[01:01:11] not sure whether both are strictly needed though
[01:01:27] hashar: we're back to a good place re merge gate timing, right?
[01:01:56] yeah more or less
[01:02:27] then go to bed
[01:02:28] :)
[01:02:31] will have to shuffle stuff around
[01:02:38] yeah
[01:02:38] tomorrow is going to be a long day
[01:02:47] will nap in the morning, lunch and cut branch
[01:02:50] * greg-g nods
[01:02:56] nap definitely needed tomorrow :)
[01:06:36] ok, seems to be fine now, I'll fine-tune it after
[01:06:42] hashar: thanks for your help!
[01:07:21] greg-g: yeah settled :(
[01:07:30] hashar: g'night sir
[01:07:38] SMalyshev: you are welcome. Happy hacking!
[01:07:38] thanks for staying up late fighting this
[01:07:56] bd808: sorry for the mess! Still have a ton of stuff to polish up
[01:17:42] Continuous-Integration-Infrastructure, Patch-For-Review: MediaWiki gate takes 20 minutes for extensions tests and 1.5 hour for at least a patch - https://phabricator.wikimedia.org/T126274#2010465 (hashar) So in short: * 4 new 2 CPU Precise slaves have been added to help processing the php53 jobs * Scribu...
[01:18:00] Continuous-Integration-Infrastructure, Continuous-Integration-Scaling, Patch-For-Review: MediaWiki gate takes 20 minutes for extensions tests and 1.5 hour for at least a patch - https://phabricator.wikimedia.org/T126274#2010467 (hashar)
[01:31:36] "snowball effect of doom".
[01:31:45] Nice.
[01:41:48] something weird is going on with https://gerrit.wikimedia.org/r/#/c/269212/. It passed the verification, but the merge actually never happens
[01:47:13] Yippee, build fixed!
[01:47:14] Project beta-scap-eqiad build #89120: FIXED in 7 min 58 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/89120/
[01:49:09] SMalyshev: jenkins probably doesn't have merge rights in there?
[01:49:35] legoktm: hmm... no idea. It's a standard repo set up like any other... any way to check it?
[01:50:08] one sec
[01:51:34] SMalyshev: fixed :)
[01:51:36] legoktm: thanks, now seems to have worked!
[01:52:14] legoktm: I wonder has something changed? I've never had to do that before with other repos and they worked fine
[01:52:29] SMalyshev: for future reference, I went to https://gerrit.wikimedia.org/r/#/admin/projects/wikidata/query/gui,access and added Submit: JenkinsBot.
[01:52:45] SMalyshev: most repositories are typically under mediawiki/* which automatically grants jenkins submit rights
[01:52:47] legoktm: I see, thanks
[01:53:06] legoktm: ahhh... interesting. That may be it, different namespace
[01:53:31] yeah, wikidata/query/rdf has the same setting
[01:54:08] so we could set that permission at the wikidata/query level or wikidata/ even and never have to worry about that again
[01:54:13] but I'm not sure what's under those namespaces
[01:57:11] Deployment-Systems, Scap3, Patch-For-Review: Make puppet provider for scap3 - https://phabricator.wikimedia.org/T113072#2010541 (mmodell) Ok I updated the patch at https://gerrit.wikimedia.org/r/#/c/262742/18, works on labs with some minor hacks to get around environment-related peculiarities.
[01:57:18] wikidata/ probably not a good idea
[01:57:23] it's not all mine :)
[01:57:37] wikidata/query maybe if I get more under that namespace
[01:58:27] I just added you to the wikidata-query group, so you should have full ownership of the wikidata/query namespace and can adjust permissions as you wish :)
[01:59:14] legoktm: thank you!
[01:59:56] np
[02:22:25] (PS1) Legoktm: bin/php: Default to php5, because plain `php` would be itself [integration/jenkins] - https://gerrit.wikimedia.org/r/269355
[02:22:45] (CR) Legoktm: [C: 2] bin/php: Default to php5, because plain `php` would be itself [integration/jenkins] - https://gerrit.wikimedia.org/r/269355 (owner: Legoktm)
[02:23:32] (Merged) jenkins-bot: bin/php: Default to php5, because plain `php` would be itself [integration/jenkins] - https://gerrit.wikimedia.org/r/269355 (owner: Legoktm)
[02:26:36] Project beta-scap-eqiad build #89124: FAILURE in 7 min 9 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/89124/
[02:34:45] Yippee, build fixed!
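legoktm's explanation comes down to inheritance: a Gerrit project picks up the access rules of the project it inherits from, so repositories parented under mediawiki got Submit for JenkinsBot without further setup, while wikidata/query/gui needed an explicit grant. Below is a simplified Python model of that lookup. The parent chain in the usage is hypothetical, and real Gerrit access control also involves ref patterns, exclusive flags and BLOCK rules, which this ignores.

```python
from __future__ import annotations

from typing import Mapping, Optional


def can_submit(
    project: str,
    group: str,
    parents: Mapping[str, Optional[str]],
    grants: Mapping[str, set[str]],
) -> bool:
    """Walk from `project` up its parent chain; True if any level grants Submit to `group`.

    Simplified model: ignores ref patterns, exclusive permissions and BLOCK rules.
    """
    seen: set[str] = set()
    node: Optional[str] = project
    while node is not None and node not in seen:  # `seen` guards against cycles
        seen.add(node)
        if group in grants.get(node, set()):
            return True
        node = parents.get(node)
    return False
```

In this model, granting Submit at an intermediate parent (such as the wikidata/query level discussed above) would cover every child project that inherits from it.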
[02:34:46] Project beta-scap-eqiad build #89125: FIXED in 6 min 34 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/89125/
[02:42:33] (PS2) Legoktm: Get integration/config out of the 'mediawiki' queue [integration/config] - https://gerrit.wikimedia.org/r/238988
[02:44:07] Continuous-Integration-Config: Get integration/config out of the mediawiki gate queue - https://phabricator.wikimedia.org/T126298#2010566 (Legoktm) NEW a:Legoktm
[02:44:24] (PS3) Legoktm: Get integration/config out of the 'mediawiki' queue [integration/config] - https://gerrit.wikimedia.org/r/238988 (https://phabricator.wikimedia.org/T126298)
[02:49:21] (CR) Legoktm: [C: 2] Get integration/config out of the 'mediawiki' queue [integration/config] - https://gerrit.wikimedia.org/r/238988 (https://phabricator.wikimedia.org/T126298) (owner: Legoktm)
[02:52:23] (Merged) jenkins-bot: Get integration/config out of the 'mediawiki' queue [integration/config] - https://gerrit.wikimedia.org/r/238988 (https://phabricator.wikimedia.org/T126298) (owner: Legoktm)
[02:53:22] !log deploying https://gerrit.wikimedia.org/r/238988
[02:53:26] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[03:03:15] Continuous-Integration-Config, Patch-For-Review: Get integration/config out of the mediawiki gate queue - https://phabricator.wikimedia.org/T126298#2010595 (Legoktm) Open>Resolved
[03:13:23] PROBLEM - Host deployment-mediawiki01 is DOWN: PING CRITICAL - Packet loss = 80%, RTA = 2709.90 ms
[03:13:34] Project beta-scap-eqiad build #89129: FAILURE in 8 min 43 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/89129/
[03:15:45] RECOVERY - Host deployment-mediawiki01 is UP: PING OK - Packet loss = 0%, RTA = 1.54 ms
[03:17:57] (PS1) Legoktm: Set $PHP_BIN for hhvm jobs [integration/config] - https://gerrit.wikimedia.org/r/269359
[03:18:14] (CR) Legoktm: [C: 2] Set $PHP_BIN for hhvm jobs [integration/config] - https://gerrit.wikimedia.org/r/269359 (owner: Legoktm)
[03:19:16] (Merged) jenkins-bot: Set $PHP_BIN for hhvm jobs [integration/config] - https://gerrit.wikimedia.org/r/269359 (owner: Legoktm)
[03:19:48] !log deploying https://gerrit.wikimedia.org/r/269359
[03:19:51] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[04:15:46] Scap3: Scap should touch symlinks when originals are touched - https://phabricator.wikimedia.org/T126306#2010701 (Tgr) NEW
[04:26:44] Scap3: Scap should touch symlinks when originals are touched - https://phabricator.wikimedia.org/T126306#2010715 (bd808) Doing this for the general case would be computationally expensive. There is no "what links here" functionality in the POSIX filesystem standard. All files in /srv/mediawiki-staging would n...
[04:36:47] Yippee, build fixed!
[04:36:48] Project beta-scap-eqiad build #89133: FIXED in 7 min 14 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/89133/
[06:16:48] Continuous-Integration-Infrastructure, Continuous-Integration-Scaling, Patch-For-Review: MediaWiki gate takes 20 minutes for extensions tests and 1.5 hour for at least a patch - https://phabricator.wikimedia.org/T126274#2010794 (greg) Updated graph: {F3326090}
[06:29:46] I know CI has a self-hosted puppetmaster, but is there a way I can test a puppet patch on just one slave?
[07:13:15] I will let you know when I see hashar around here
[07:13:15] @notify hashar
[07:13:21] ty :)
[09:35:51] Continuous-Integration-Config, Continuous-Integration-Infrastructure, Flow, Wikidata, and 2 others: Wikidata QUnit broken on branch REL1_25 causing other extensions to fail - https://phabricator.wikimedia.org/T126073#2010908 (JanZerebecki) Yes I'm fine with 5, too.
[10:25:59] !log pooling in integration-slave-trusty-1018
[10:26:02] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[10:26:08] * legoktm waves at hashar
[10:26:17] legoktm: morning
[10:26:24] I am still half-asleep :D
[10:27:12] hashar: I uploaded https://gerrit.wikimedia.org/r/269370, is there a good way I can test it on only one server?
[10:27:45] yup
[10:27:45] disable puppet on all slaves
[10:27:51] cherry pick patch on puppet master
[10:27:58] enable puppet on a single slave, see what happens
[10:27:58] :(
[10:28:11] integration-saltmaster instance would help
[10:28:39] you can then do: sudo su - ; salt '*' cmd.run 'puppet agent -tv "testing https://gerrit.wikimedia.org/r/269370"'
[10:28:45] that will prevent puppet agent from running
[10:29:00] PROBLEM - Puppet failure on integration-slave-trusty-1017 is CRITICAL: CRITICAL: 42.86% of data above the critical threshold [0.0]
[10:29:10] no way around disabling puppet on all the slaves? :/
[10:29:56] PROBLEM - Puppet failure on integration-slave-trusty-1013 is CRITICAL: CRITICAL: 14.29% of data above the critical threshold [0.0]
[10:30:15] not really
[10:30:58] PROBLEM - Puppet failure on integration-slave-trusty-1018 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[10:36:01] RECOVERY - Puppet failure on integration-slave-trusty-1018 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:36:05] RECOVERY - Puppet failure on integration-slave-precise-1011 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:41:48] (PS1) Hashar: bin/php: replace bash with lighter sh [integration/jenkins] - https://gerrit.wikimedia.org/r/269384
[10:41:50] (PS1) Hashar: bin/php: use exec to replace shell with php command [integration/jenkins] - https://gerrit.wikimedia.org/r/269385
[11:04:02] RECOVERY - Puppet failure on integration-slave-trusty-1017 is OK: OK: Less than 1.00% above the threshold [0.0]
[11:09:14] (CR) Hashar: "Thank you very much for this work!" [integration/config] - https://gerrit.wikimedia.org/r/268031 (https://phabricator.wikimedia.org/T125498) (owner: Legoktm)
[11:09:31] Continuous-Integration-Config, WorkType-NewFunctionality: Zuul can only apply one parameter function. Prevent us from injecting both php55 and extension dependencies - https://phabricator.wikimedia.org/T125498#2011052 (hashar) antoine-approve
[11:11:41] (CR) Hashar: "> Yes a cyclical dependency of components smells of bad architecture." [integration/config] - https://gerrit.wikimedia.org/r/268451 (owner: Dduvall)
[11:12:07] Continuous-Integration-Infrastructure, Continuous-Integration-Scaling, Patch-For-Review: MediaWiki gate takes 20 minutes for extensions tests and 1.5 hour for at least a patch - https://phabricator.wikimedia.org/T126274#2011054 (JanZerebecki) >>! In T126274#2009859, @hashar wrote: > Later ideas: > * on...
[11:13:37] !log disabling puppet on all *(trusty|precise)* slaves
[11:13:39] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[11:15:13] hashar: is there documentation somewhere on how to cherry pick a patch onto the puppetmaster?
[11:15:34] oh poor kunal :-D
[11:15:41] legoktm: yeah let me find it
[11:15:53] https://wikitech.wikimedia.org/wiki/Help:Self-hosted_puppetmaster
[11:16:09] but really:
[11:16:15] ssh integration-saltmaster.integration.eqiad.wmflabs
[11:16:17] sudo su -
[11:16:18] cd /var/lib/git/operations/puppet
[11:16:30] git fetch && git cherry-pick FETCH_HEAD
[11:17:16] the saltmaster is also the puppetmaster?
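The canary procedure hashar outlines (disable puppet everywhere, cherry-pick on the puppetmaster, enable on a single slave) can be written down as an ordered list of salt invocations. The sketch below only builds the argument lists; `canary_plan`, the targets and the reason string are hypothetical. It uses `puppet agent --disable "<reason>"`, the puppet flag that stops agent runs (assuming a Puppet version whose `--disable` accepts a message), rather than `-tv`, which triggers a run.

```python
from __future__ import annotations

import shlex


def canary_plan(all_target: str, canary: str, reason: str) -> list[list[str]]:
    """Ordered salt commands for trying a puppet change on a single host.

    Each entry is an argv list suitable for subprocess.run (no local shell);
    the last element is the command string salt runs on the minions.
    """
    return [
        # 1. Stop puppet everywhere, recording why.
        ["salt", all_target, "cmd.run", "puppet agent --disable " + shlex.quote(reason)],
        # 2. (Cherry-pick the change on the puppetmaster at this point.)
        # 3. Re-enable and run puppet on the canary only.
        ["salt", canary, "cmd.run", "puppet agent --enable"],
        ["salt", canary, "cmd.run", "puppet agent -tv"],
        # 4. Once the canary looks healthy, re-enable puppet everywhere.
        ["salt", all_target, "cmd.run", "puppet agent --enable"],
    ]
```

Building argv lists instead of one shell string keeps the salt target glob (e.g. `*slave*`) from being expanded by a local shell.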
[11:17:16] the puppet master is somehow looking for manifests in /var/lib/git/operations/puppet
[11:17:21] aregh
[11:17:23] stupid me
[11:17:36] integration-puppetmaster.integration.eqiad.wmflabs
[11:17:39] ok :)
[11:17:52] I have split the roles on both integration and beta cluster
[11:18:09] can't remember why but I guess a single instance could not handle both roles
[11:20:42] !log enabling puppet just on integration-slave-trusty-1012
[11:20:44] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[11:20:53] !log cherry-picked https://gerrit.wikimedia.org/r/#/c/269370/ on integration-puppetmaster
[11:20:55] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[11:25:22] :-)
[11:26:15] $ update-alternatives --list php
[11:26:23] /usr/bin/hhvm
[11:26:23] /usr/bin/php5
[11:32:46] I'm reading that we have to use update-alternatives install
[11:32:53] and now quickly writing puppet to do that...
[11:33:27] oh
[11:33:33] stupid update-alternatives
[11:33:51] will try to get some european ops to review the patch as well
[11:37:27] (PS1) JanZerebecki: Bump Wikidata to wmf/1.27.0-wmf.13 [tools/release] - https://gerrit.wikimedia.org/r/269397
[11:39:53] hashar: uh, silly git question, how do I get rid of the cherry-pick now?
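A bare `git fetch` only updates the remote's branches; to cherry-pick a change that is still under review, the fetch has to name the change's ref, which Gerrit publishes as `refs/changes/<last two digits>/<change number>/<patchset>`. A small helper, assuming the patchset number is known (the log does not say which patchset of 269370 was picked) and that the `remote` passed in points at the operations/puppet repository on Gerrit; `change_ref` and `cherry_pick_cmds` are hypothetical names.

```python
from __future__ import annotations


def change_ref(change: int, patchset: int) -> str:
    """Gerrit ref for a patchset: refs/changes/<change % 100, two digits>/<change>/<patchset>."""
    if change <= 0 or patchset <= 0:
        raise ValueError("change and patchset must be positive")
    return f"refs/changes/{change % 100:02d}/{change}/{patchset}"


def cherry_pick_cmds(remote: str, change: int, patchset: int) -> list[list[str]]:
    """git commands that apply one Gerrit patchset on top of the current checkout."""
    return [
        ["git", "fetch", remote, change_ref(change, patchset)],
        ["git", "cherry-pick", "FETCH_HEAD"],
    ]
```

For example, `change_ref(269370, 1)` gives `refs/changes/70/269370/1`.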
[11:40:08] legoktm: look at the HEAD commit to confirm it is your ( git show HEAD )
[11:40:21] then rollback to commit before HEAD with : git reset --hard HEAD^
[11:40:40] (where HEAD^ means "commit before HEAD" )
[11:41:27] thanks
[11:42:45] legoktm: you should get some sleep :-}
[11:44:30] probably, but I'm so close :P
[11:50:17] :-}
[11:50:21] WOOHOO
[11:50:24] (CR) Hashar: [C: 2] Bump Wikidata to wmf/1.27.0-wmf.13 [tools/release] - https://gerrit.wikimedia.org/r/269397 (owner: JanZerebecki)
[11:50:28] legoktm@integration-slave-trusty-1012:~$ php --version
[11:50:28] PHP 5.5.9-1ubuntu4.14 (cli) (built: Oct 28 2015 01:34:46)
[11:50:32] legoktm@integration-slave-trusty-1012:~$ PHP_BIN=hhvm php --version
[11:50:32] HipHop VM 3.6.5 (rel)
[11:50:40] legoktm@integration-slave-trusty-1012:~$ file /etc/alternatives/php
[11:50:40] /etc/alternatives/php: symbolic link to `/srv/deployment/integration/slave-scripts/bin/php'
[11:51:16] Release-Engineering-Team, WMF-deploy-2016-02-09_(1.27.0-wmf.13): MW 1.27.0-wmf.13 deployment blockers - https://phabricator.wikimedia.org/T125596#2011124 (hashar) Wikidata bumped to `wmf/1.27.0-wmf.13` by https://gerrit.wikimedia.org/r/#/c/269397/
[11:51:43] the only catch is if you set PHP_BIN=php it'll totally blow up, because it'll infinitely recurse
[11:56:25] eek
[11:56:54] legoktm: we might be able to catch that one somehow though
[11:56:58] I think we can make our bin/php throw an error if $PHP_BIN is set to 'php' or '/usr/bin/php'
[11:59:40] https://integration.wikimedia.org/ci/job/composer-package-php55/90/console used php55 and https://integration.wikimedia.org/ci/job/composer-package-hhvm/90/console used hhvm \o/
[11:59:45] yeah that is probably sufficient
[11:59:46] (Merged) jenkins-bot: Bump Wikidata to wmf/1.27.0-wmf.13 [tools/release] - https://gerrit.wikimedia.org/r/269397 (owner: JanZerebecki)
[11:59:47] great! !!!!
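legoktm's proposed guard (refuse to run when `$PHP_BIN` is `php` or `/usr/bin/php`) is easy to express. The sketch below is written in Python rather than the shell that the real `bin/php` uses. It adds one extra check of its own: it resolves the chosen interpreter through `PATH` and symlinks and refuses if the result lands back on the wrapper itself, which is the kind of path that recurses when `/usr/bin/php` -> `/etc/alternatives/php` -> `bin/php`. `resolve_php`, `DEFAULT_PHP` and `FORBIDDEN` are hypothetical names.

```python
from __future__ import annotations

import os
import shutil
from typing import Mapping

DEFAULT_PHP = "php5"  # mirrors the merged "bin/php: Default to php5" change
FORBIDDEN = {"php", "/usr/bin/php"}  # names that would resolve back to the wrapper


def resolve_php(env: Mapping[str, str], wrapper_path: str) -> str:
    """Pick the interpreter the wrapper should exec, refusing self-recursion."""
    target = env.get("PHP_BIN") or DEFAULT_PHP
    if target in FORBIDDEN:
        raise RuntimeError(f"PHP_BIN={target!r} would make the wrapper call itself")
    # Extra check: follow the PATH lookup and symlinks (e.g. /etc/alternatives).
    found = shutil.which(target, path=env.get("PATH"))
    if found and os.path.realpath(found) == os.path.realpath(wrapper_path):
        raise RuntimeError(f"{target!r} resolves to the wrapper itself ({found})")
    return target
```

A real wrapper would then `exec` the returned interpreter (as hashar's follow-up change does for the shell version), so the wrapper process is replaced rather than left waiting.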
[12:00:49] ok, I'm going to re-enable puppet on all the slaves now
[12:02:27] !log re-enabling puppet on all trusty/precise slaves
[12:02:30] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[12:02:31] legoktm: kudos
[12:02:45] if something goes wrong I will handle it (hopefully) :D
[12:03:12] thanks, and good night :)
[12:04:35] Continuous-Integration-Infrastructure, Patch-For-Review: Make /usr/bin/php a wrapper that picks the right PHP version on CI slaves - https://phabricator.wikimedia.org/T126211#2011129 (Legoktm) p:High>Normal >>! In T126211#2010795, @gerritbot wrote: > Change 269370 had a related patch set uploaded (b...
[12:07:50] Blocked-on-RelEng, Continuous-Integration-Infrastructure, Labs, Tool-Labs, Patch-For-Review: debian-glue tries to fetch obsolete package - https://phabricator.wikimedia.org/T125999#2011140 (scfc) Open>Resolved a:scfc Now the package installation works (https://integration.wikimedia.org/...
[12:08:03] Blocked-on-RelEng, Continuous-Integration-Infrastructure, Labs, Tool-Labs, Patch-For-Review: debian-glue tries to fetch obsolete package - https://phabricator.wikimedia.org/T125999#2011143 (scfc) a:scfc>None
[12:14:22] Blocked-on-RelEng, Continuous-Integration-Infrastructure, Labs, Tool-Labs, Patch-For-Review: debian-glue tries to fetch obsolete package - https://phabricator.wikimedia.org/T125999#2011150 (hashar) Resolved>Open I have manually updated the image somehow. Will have to ask around but we mos...
[12:22:18] PROBLEM - SSH on integration-slave-precise-1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:30:14] PROBLEM - SSH on integration-slave-precise-1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:32:28] bah slaves dieing
[12:33:00] !log all slaves dieing due to PHP looping
[12:33:03] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[12:35:12] PROBLEM - SSH on integration-slave-trusty-1014 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:38:04] PROBLEM - SSH on integration-slave-precise-1012 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:38:18] PROBLEM - SSH on integration-slave-precise-1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:39:06] !log salt -v '*' cmd.run "bash -c 'cd /srv/deployment/integration/slave-scripts; git pull'"
[12:39:09] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[12:39:56] PROBLEM - SSH on integration-slave-precise-1014 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:40:07] !log mass rebooting CI slaves from wikitech
[12:40:10] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[12:40:18] PROBLEM - SSH on integration-slave-trusty-1017 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:42:45] PROBLEM - SSH on integration-slave-trusty-1011 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:43:21] PROBLEM - SSH on integration-slave-precise-1011 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:44:47] RECOVERY - SSH on integration-slave-precise-1014 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2~wmfprecise2 (protocol 2.0)
[12:45:03] RECOVERY - SSH on integration-slave-trusty-1014 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.6 (protocol 2.0)
[12:45:05] RECOVERY - SSH on integration-slave-precise-1003 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2~wmfprecise2 (protocol 2.0)
[12:45:11] RECOVERY - SSH on integration-slave-trusty-1017 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.6 (protocol 2.0)
[12:46:06] !log Mass testing php loop of death: salt -v '*slave*' cmd.run 'timeout 2s /srv/deployment/integration/slave-scripts/bin/php --version'
[12:46:08] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[12:47:11] RECOVERY - SSH on integration-slave-precise-1002 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2~wmfprecise2 (protocol 2.0)
[12:47:33] RECOVERY - SSH on integration-slave-trusty-1011 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.6 (protocol 2.0)
[12:47:57] RECOVERY - SSH on integration-slave-precise-1012 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2~wmfprecise2 (protocol 2.0)
[12:48:11] RECOVERY - SSH on integration-slave-precise-1001 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2~wmfprecise2 (protocol 2.0)
[12:48:11] RECOVERY - SSH on integration-slave-precise-1011 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2~wmfprecise2 (protocol 2.0)
[12:50:55] PROBLEM - SSH on integration-slave-precise-1014 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:51:11] PROBLEM - SSH on integration-slave-trusty-1016 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:14:00] !log slave recurse infinitely doing /bin/bash -eu /srv/deployment/integration/slave-scripts/bin/mw-install-mysql.sh then loop over /bin/bash /usr/bin/php maintenance/install.php --confpath /mnt/jenkins-workspace/workspace/mediawiki-core-qunit/src --dbtype=mysql --dbserver=127.0.0.1:3306 --dbuser=jenkins_u2 --dbpass=pw_jenkins_u2 --dbname=jenkins_u2_mw --pass testpass TestWiki WikiAdmin https://phabricator.wikimedia.org/T126327
[13:14:02] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[13:14:10] Continuous-Integration-Infrastructure: php wrapper script cause death loop of doom - https://phabricator.wikimedia.org/T126327#2011268 (hashar)
[13:14:23] PROBLEM - Host integration-slave-trusty-1012 is DOWN: CRITICAL - Host Unreachable (10.68.18.2)
[13:15:05] Continuous-Integration-Infrastructure, Patch-For-Review: Make /usr/bin/php a wrapper that picks the right PHP version on CI slaves - https://phabricator.wikimedia.org/T126211#2011274 (hashar) Still cause a death loop of doom T126327 (should merge that ticket): ``` 8750 ? S 0:00 \_ /b...
[13:15:30] !log removing https://gerrit.wikimedia.org/r/#/c/269370/ from CI puppet master
[13:15:33] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[13:16:13] Continuous-Integration-Infrastructure: php wrapper script cause death loop of doom - https://phabricator.wikimedia.org/T126327#2011276 (hashar)
[13:16:15] Continuous-Integration-Infrastructure, Patch-For-Review: Make /usr/bin/php a wrapper that picks the right PHP version on CI slaves - https://phabricator.wikimedia.org/T126211#2011277 (hashar)
[13:17:59] !log salt -v --batch=3 '*slave*' cmd.run 'puppet agent -tv'
[13:18:01] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[13:20:15] RECOVERY - Host integration-slave-trusty-1012 is UP: PING OK - Packet loss = 0%, RTA = 0.48 ms
[13:25:46] RECOVERY - SSH on integration-slave-precise-1014 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2~wmfprecise2 (protocol 2.0)
[13:26:02] RECOVERY - SSH on integration-slave-trusty-1016 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.6 (protocol 2.0)
[13:27:50] PROBLEM - Puppet failure on integration-slave-trusty-1013 is CRITICAL: CRITICAL: 70.00% of data above the critical threshold [0.0]
[13:28:11] !log salt '*precise*' cmd.run 'update-alternatives --set php /usr/bin/php5'
[13:28:13] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[13:28:45] !log salt '*trusty*' cmd.run 'update-alternatives --set php /usr/bin/hhvm'
[13:28:47] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[13:37:49] RECOVERY - Puppet failure on integration-slave-trusty-1013 is OK: OK: Less than 1.00% above the threshold [0.0]
[13:58:10] hashar: are you restarting jenkins?
[13:58:29] * aude sees "Jenkins is going to shut down"
[13:58:34] yeah
[13:58:45] aude: i noticed your patch it failed php53 lint for some reason
[13:59:13] i can't see how
[13:59:31] especially since it passed that already
[13:59:56] oh for god sake git on precise sucks https://integration.wikimedia.org/ci/job/mediawiki-core-php53lint/1502/console
[13:59:59] times out trying to clone
[14:00:10] I am sure that is the reason
[14:00:25] :/
[14:04:25] !log Manually git fetching mediawiki-core in /mnt/jenkins-workspace/workspace/mediawiki-core-php53lint of slaves precise 1001 to 1004 (git on Precise is remarkably too slow)
[14:04:28] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[14:12:15] !log de pooling https://integration.wikimedia.org/ci/computer/integration-slave-precise-1012/ Mysql is gone somehow
[14:12:17] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[14:13:02] !log pooling https://integration.wikimedia.org/ci/computer/integration-slave-precise-1012/ Mysql is back .. Blame puppet
[14:13:08] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[14:28:11] Yippee, build fixed!
[14:28:11] Project browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #782: FIXED in 2 min 10 sec: https://integration.wikimedia.org/ci/job/browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/782/
[14:33:07] oh for god sake
[14:33:18] that integration-make-wmf-branch doesn't work :D
[14:33:43] Blocked-on-RelEng, Continuous-Integration-Infrastructure, Labs, Tool-Labs, Patch-For-Review: debian-glue tries to fetch obsolete package - https://phabricator.wikimedia.org/T125999#2011404 (akosiaris) Taking a look at this I would say that it has nothing to do with the symlink mentioned above, b...
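The salt one-liner `timeout 2s .../bin/php --version` is a quick way to tell a hung or recursing interpreter from a healthy one: anything still running after two seconds is suspect. The same probe in Python, as a hypothetical `hangs()` helper (a sketch, not part of the CI tooling):

```python
import subprocess
from typing import Sequence


def hangs(cmd: Sequence[str], seconds: float = 2.0) -> bool:
    """True if `cmd` is still running after `seconds` (it is then killed).

    subprocess.run kills only the direct child on timeout; processes it
    forked are not killed, so a wrapper that recurses by forking (rather
    than exec) can leave descendants behind.
    """
    try:
        subprocess.run(
            list(cmd),
            timeout=seconds,
            stdout=subprocess.DEVNULL,  # no pipes, so nothing to drain after a kill
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        return True
    return False
```

Running `hangs(["/srv/deployment/integration/slave-scripts/bin/php", "--version"])` on each slave would reproduce the check from the log.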
[14:45:43] hashar: what do you mean?
[14:45:55] 'git' 'clone' '-q' 'ssh://gerrit.wikimedia.org:29418/mediawiki/extensions/AbuseFilter.git' 'AbuseFilter'
[14:45:55] Permission denied (publickey).
[14:45:56] :D
[14:46:05] I created a local ssh key
[14:46:14] ah, yeah, you have to keyforward to that machine
[14:46:14] now trying to get sudo to recognize it somehow :-}
[14:46:56] oh
[14:49:18] ah
[14:49:34] !log make-wmf-branch instance: created a local ssh key pair and set the config to use User: hashar
[14:49:36] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[14:49:40] tis cloning
[14:50:33] :D https://github.com/wikimedia/mediawiki-extensions-AbuseFilter/tree/wmf/1.27.0-wmf.13
[14:50:45] nice
[14:50:51] !log pooling back integration-slave-precise1001 - 1004. Manually fetched git repos in workspace for mediawiki core php53
[14:50:55] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[14:51:09] !log ./make-wmf-branch -n 1.27.0-wmf.13 -o master
[14:51:11] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[14:51:40] thcipriani|afk: any clue how long it takes usually ? :D
[14:52:48] {"type":"ref-updated","submitter":{"name":"Hashar","email":"","username":"hashar"},"refUpdate":{"oldRev":"0000000000000000000000000000000000000000","newRev":"c0e49e1c21099a3e3d1bbe459d42b68f3a51e2cc","refName":"wmf/1.27.0-wmf.13","project":"mediawiki/extensions/MoodBar"}}
[14:53:01] watching progress via: ssh -p 29418 hashar@gerrit.wikimedia.org 'gerrit stream-events'
[14:53:13] hashar: I setup that machine after my rotation on deploy duty, so no idea :) guess: 15-20 minutes?
[14:53:57] took me 1hr 18mins locally for branch cut before I setup that box.
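`gerrit stream-events` emits one JSON object per line, and a newly created branch shows up as a `ref-updated` event whose `oldRev` is all zeros, like the MoodBar line above. A minimal filter for watching a branch cut, assuming the output of `ssh -p 29418 <user>@gerrit.wikimedia.org 'gerrit stream-events'` is fed to it line by line; `new_branch` is a hypothetical helper name.

```python
from __future__ import annotations

import json
from typing import Optional, Tuple


def new_branch(event_line: str) -> Optional[Tuple[str, str]]:
    """Return (project, ref) if the event records a newly created ref, else None."""
    event = json.loads(event_line)
    if event.get("type") != "ref-updated":
        return None
    update = event.get("refUpdate", {})
    old = update.get("oldRev", "")
    # A ref that did not exist before is reported with an all-zero old SHA-1.
    if not old or set(old) != {"0"}:
        return None
    return update.get("project", ""), update.get("refName", "")
```

Reading the ssh output with `for line in sys.stdin:` and printing each non-`None` result gives a running list of repositories that already have the new `wmf/...` branch.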
[14:54:15] so it's an improvement :P [14:57:04] remote: ERROR: In commit edd1d4ea6afa7ff9c91ab366273f0467503b0106 [14:57:04] remote: ERROR: committer email address root@integration-make-wmf-branch.integration.eqiad.wmflabs [14:57:04] remote: ERROR: does not match your user account. [14:57:08] that is going nowhere [14:57:37] that also mean I have probably sent all those commits with a crazy commiter name [14:57:56] https://github.com/wikimedia/mediawiki-extensions-UserMerge/commit/4e7d2240c80a275248dfd678ac0cdb596351df87 [14:58:16] ... [14:59:24] anyway, if it fails and you fix up what it's complaining about, you can use the --continue-from flag to pick up from the extension where you left off [15:00:14] will try --continue-from VisualEditor/VisualEditor [15:00:21] [ERROR] Could not find extension 'VisualEditor/VisualEditor' in any branched Extension list [15:00:22] eheh [15:00:38] just VisualEditor https://github.com/wikimedia/mediawiki-tools-release/blob/master/make-wmf-branch/config.json#L141 [15:01:33] * hashar tries [15:01:51] that works [15:01:57] --continue-from is nice [15:03:20] yeah :) the first time I tried to cut the branch it fell over somewhere in the middle, had to do a bunch of config.json local hacks. Would be nice to add --skip-extensions and --skip-skins at some point. [15:04:48] MediaWiki-Releasing, Release-Engineering-Team: make-wmf-branch doesn't ensure git has proper user.name and user.email - https://phabricator.wikimedia.org/T126334#2011449 (hashar) NEW [15:05:05] I am not going to fix the wrong commit authorship [15:05:14] it is doing the git submodule add dance now [15:05:38] nice. [15:07:19] it is pushing! [15:07:29] To ssh://gerrit.wikimedia.org:29418/mediawiki/core.git [15:07:29] * [new branch] wmf/1.27.0-wmf.13 -> wmf/1.27.0-wmf.13 [15:07:34] not too bad [15:09:21] awesome. kudos on the branch cut!
:) [15:10:29] MediaWiki-Releasing, Release-Engineering-Team: make-wmf-branch instance requires ssh auth forwarding - https://phabricator.wikimedia.org/T126335#2011463 (hashar) NEW [15:11:21] !log mira: /srv/mediawiki-staging/multiversion/checkoutMediaWiki 1.27.0-wmf.13 php-1.27.0-wmf.13 [15:11:23] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [15:17:26] we should automatize the patching process as well :D [15:17:30] at least to check whether they apply [15:21:39] Release-Engineering-Team, Patch-For-Review, WMF-deploy-2016-02-09_(1.27.0-wmf.13): MW 1.27.0-wmf.13 deployment blockers - https://phabricator.wikimedia.org/T125596#2011486 (hashar) Branches cut and pushed. Have to verify special extensions have been properly set. [15:22:48] Release-Engineering-Team, WMF-deploy-2016-02-09_(1.27.0-wmf.13): checkoutMediaWiki fails with Fatal error: Class undefined: MWWikiversions in /srv/mediawiki-staging/multiversion/updateBranchPointers on line 26 - https://phabricator.wikimedia.org/T126336#2011492 (hashar) NEW a:hashar [15:23:05] poor branch pointers [15:23:16] Fatal error: Class undefined: MWWikiversions in /srv/mediawiki-staging/multiversion/updateBranchPointers on line 26 ||| https://phabricator.wikimedia.org/T126336 [15:30:18] Deployment-Systems, Salt, Patch-For-Review: Provide mechanism to add/remove minions from git-deploy - https://phabricator.wikimedia.org/T74319#2011517 (ArielGlenn) a:ArielGlenn [15:36:50] Release-Engineering-Team, WMF-deploy-2016-02-09_(1.27.0-wmf.13): checkoutMediaWiki fails with Fatal error: Class undefined: MWWikiversions in /srv/mediawiki-staging/multiversion/updateBranchPointers on line 26 - https://phabricator.wikimedia.org/T126336#2011546 (Anomie) Looks like it was caused by {d052abd...
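The `--continue-from` behaviour discussed during the branch cut can be modelled as a list slice. It resumes at a named extension and rejects a name such as `VisualEditor/VisualEditor` that is not in the list. This is a hypothetical Python sketch, not the make-wmf-branch source. It assumes a single ordered list of extension names, exact name matching, and that the named extension itself is processed again.

```python
from typing import List


def resume_from(extensions: List[str], start: str) -> List[str]:
    """Return the extensions still to branch, beginning with `start`.

    Hypothetical model of a --continue-from option: names must match
    exactly as they appear in the branch list.
    """
    try:
        index = extensions.index(start)
    except ValueError:
        raise ValueError(
            f"Could not find extension '{start}' in any branched Extension list"
        ) from None
    return extensions[index:]


# Illustrative list only; the real one lives in make-wmf-branch/config.json.
print(resume_from(["AbuseFilter", "UserMerge", "VisualEditor", "Wikidata"], "VisualEditor"))
```

Slicing from the named entry keeps the original order, so a resumed run processes the remaining repositories exactly as the interrupted run would have.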
[16:04:37] MediaWiki-Releasing, MW-1.26-release: Consider maybe backporting https://gerrit.wikimedia.org/r/#/c/249054/ to last stable - https://phabricator.wikimedia.org/T126344#2011667 (Bawolff) NEW [16:06:25] !log Deleted corrupt integration-slave-precise-1003:/mnt/jenkins-workspace/workspace/mediawiki-core-php53lint/.git [16:06:27] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [16:32:11] Continuous-Integration-Infrastructure, Continuous-Integration-Scaling, Patch-For-Review: MediaWiki gate takes 20 minutes for extensions tests and 1.5 hour for at least a patch - https://phabricator.wikimedia.org/T126274#2011736 (Anomie) >>! In T126274#2010465, @hashar wrote: > * Scribunto has been remo... [16:51:17] Deployment-Systems, Release-Engineering-Team, WMF-deploy-2016-02-09_(1.27.0-wmf.13): checkoutMediaWiki fails with Fatal error: Class undefined: MWWikiversions in /srv/mediawiki-staging/multiversion/updateBranchPointers on line 26 - https://phabricator.wikimedia.org/T126336#2011793 (greg) [16:56:40] MediaWiki-Releasing, MW-1.26-release: Consider maybe backporting https://gerrit.wikimedia.org/r/#/c/249054/ to last stable - https://phabricator.wikimedia.org/T126344#2011799 (greg) @Krinkle: As author of the patch, do you agree it should be backported to 1.26? [16:58:10] Deployment-Systems: make-wmf-branch used master instead of the branch specified in special_extensions - https://phabricator.wikimedia.org/T125663#2011801 (JanZerebecki) Open>Resolved a:JanZerebecki Worked fine when branching wmf/1.27.0-wmf.13. [17:03:54] Continuous-Integration-Config, Continuous-Integration-Infrastructure, Flow, Wikidata, and 2 others: Wikidata QUnit broken on branch REL1_25 causing other extensions to fail - https://phabricator.wikimedia.org/T126073#2011818 (JanZerebecki) https://gerrit.wikimedia.org/r/#/c/269310/ is an implemen... [17:31:42] marxarelli: did you forget to merge this?
https://gerrit.wikimedia.org/r/#/c/268154/ [17:31:50] or was there a problem with it? [17:32:46] zeljkof: no problem, just forgot! [17:32:49] is Jenkins unwell? [17:33:28] marxarelli: no rush, but it is the last commit left to close T125532... ;) [17:33:33] https://gerrit.wikimedia.org/r/#/c/269440/ and https://gerrit.wikimedia.org/r/#/c/269439/ do not appear to be going through gate-and-submit [17:34:42] Krinkle, are you designated Jenkins-herder these days? [17:35:02] andrewbogott: i'll kick zuul [17:36:09] Someone mind having a look at https://gerrit.wikimedia.org/r/#/c/269444/? [17:36:15] thcipriani or marxarelli, specifically. [17:38:03] !log reloading zuul fails with "failed to kill 13660: Operation not permitted" [17:38:06] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [17:39:12] ostriches: that seems fine. Does it need to get removed from extension-list now, too? [17:39:23] !log restart of zuul fails as well. old process cannot be killed [17:39:25] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [17:39:30] thcipriani: Ah yes prolly. [17:39:32] Amending [17:40:17] !log killed old zull process manually and restarted service [17:40:19] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [17:40:37] andrewbogott, Krenair: try your +2 again [17:40:45] ostriches: cool. I can +2 if you want to deploy? [17:40:54] marxarelli, I already sent it through without waiting for jenkins [17:41:49] * andrewbogott rechecks, waits optimistically [17:42:08] actually I did the two backports, jenkins can still try the master one [17:42:45] Krenair: yeah, I just +2’d that one to give jenkins a nudge [17:42:49] I think it’s still stuck though [17:43:12] hrm, integration.wikimedia.org/zuul is saying "status.json: Service Temporarily Unavailable" [17:43:26] didn't this happen yesterday? 
[17:43:32] PROBLEM - zuul_gearman_service on gallium is CRITICAL: Connection refused [17:43:45] i'll restart gearman [17:43:57] andrewbogott: Krenair isn't official jenkins anything, though he sometimes jumps in to help (like last night). Default POC is antoine then marxarelli / thcipriani (and lego/jan z) [17:44:02] PROBLEM - zuul_service_running on gallium is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/share/python/zuul/bin/python /usr/bin/zuul-server [17:44:30] greg-g: ok, noted [17:44:48] greg-g, he asked about Krinkle [17:44:49] not me [17:45:03] gah, bad tab-complete [17:45:12] s/Krenair/Kr.inkle/ up there :) [17:45:13] I knew what he meant [17:45:14] looks like gearman is dead [17:45:42] !log "Failed: Unable to Connect" in jenkins when testing gearman connection [17:45:45] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [17:46:55] I don't/can't really help with jenkins much [17:47:10] I just moan when it breaks, especially during my deployment windows when people are expecting me to get changes through on time [17:47:51] RECOVERY - zuul_service_running on gallium is OK: PROCS OK: 2 processes with regex args ^/usr/share/python/zuul/bin/python /usr/bin/zuul-server [17:49:02] RECOVERY - zuul_gearman_service on gallium is OK: TCP OK - 0.000 second response time on port 4730 [17:49:27] (Or when I'm stealing someone else's empty deployment and want to avoid crossing into someone else's :)) [17:49:34] window and* [17:49:40] !log performed stop/start of zuul on gallium to restore zuul and gearman [17:49:43] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [17:49:58] !log disabled/enabled gearman in jenkins, connection works this time [17:50:00] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [17:50:28] andrewbogott: ok, try again. 
gearman is back in action [17:51:07] marxarelli: ok, ‘recheck' [17:52:08] marxarelli: seems better, thanks [17:52:20] andrewbogott: np [17:56:04] thanks marxarelli [17:59:07] Yippee, build fixed! [17:59:08] Project beta-scap-eqiad build #89194: FIXED in 7 min 30 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/89194/ [18:01:51] Krenair: for sure! [18:03:51] Blocked-on-RelEng, Continuous-Integration-Infrastructure, Labs, Tool-Labs, Patch-For-Review: debian-glue tries to fetch obsolete package - https://phabricator.wikimedia.org/T125999#2012014 (akosiaris) Tested and the above patch definitely solves the production problem with outdated build environ... [18:37:11] marxarelli: another reminder :) https://gerrit.wikimedia.org/r/#/q/topic:T112651,n,z [18:37:35] no rush there too, but if you have a few minutes please take a look and +1 the commits that you think make sense [18:37:58] I will squash them into one commit then [18:48:26] Deployment-Systems, Salt, operations, Patch-For-Review: [Trebuchet] Salt times out on parsoid restarts - https://phabricator.wikimedia.org/T63882#2012106 (ArielGlenn) https://gerrit.wikimedia.org/r/#/c/269450/ to fix the runner so it has a timeout option. the other two places that need fixes are in... [18:52:43] Deployment-Systems, Release-Engineering-Team, Services: `git deploy service restart` asked for sudo password - https://phabricator.wikimedia.org/T126359#2012118 (Yurik) NEW [18:54:32] Deployment-Systems, Release-Engineering-Team, Services, operations: `git deploy service restart` asked for sudo password - https://phabricator.wikimedia.org/T126359#2012131 (mobrovac) [19:05:07] jzerebecki: Hi are you aware of any patches to the unit tests in wikidata that fixex the unit tests. Since if there are i could backport. [19:07:59] hey guys [19:08:08] do you have gerrit super-powers? [19:08:08] Hi.
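The `zuul_gearman_service` alert earlier reported `Connection refused` and later `TCP OK ... on port 4730`. That is the output style of a plain TCP connect check. A rough Python equivalent is sketched below. It assumes only that a completed TCP handshake counts as healthy; it does not speak the Gearman protocol. The function name `tcp_check` is illustrative.

```python
import socket


def tcp_check(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`.

    A refused, unreachable, or timed-out connection raises an OSError
    subclass, which is reported as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A check like this only proves that something is listening on the port. As the log shows, Zuul could be reachable and still report jobs as `NOT_REGISTERED`.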
[19:08:17] we are in a silly situation here [19:08:30] we have our own mediawiki-services admin group in gerrit [19:08:33] but can't admin it [19:08:41] https://gerrit.wikimedia.org/r/#/admin/groups/630,members [19:09:32] mobrovac: Yes you have to get either QChris or an admin that has access. [19:10:27] mobrovac: what do you need? [19:10:49] mobrovac: If you leave a message on his or her user talk page here https://www.mediawiki.org/wiki/User_talk:QChrisNonWMF he may be able to do it. [19:10:58] marxarelli: i need the possibility of adding and removing people from that group [19:11:55] mobrovac: Or i think twentyafterfour has access but not sure. [19:13:04] I can add people to the group but I don't know how to grant that ability to mobrovac [19:13:44] heh [19:13:45] same here :) [19:14:07] twentyafterfour: Would adding people to that grant them permission anyways. [19:15:06] I don't know anything about gerrit [19:15:20] twentyafterfour: Ok. [19:16:15] mobrovac: Could you try asking QChris since he may be able to do that. [19:17:08] paladox: sorted it out, thnx :) [19:17:19] mobrovac: Ok. [19:17:23] * mobrovac writing an access ticket for this [19:17:40] asking people around to manage a group i used to be able to manage is not pretty [19:18:09] mobrovac: I think that is because we are migrating to phabricator. [19:18:20] could be [19:22:40] doubt it, permissions shouldn't be changing right now because of that. 
A task and a response from os.ritches will clarify all, I'm assuming :) [19:23:40] Release-Engineering-Team, Gerrit: Allow the Services team to manage the mediawiki-services gerrit group - https://phabricator.wikimedia.org/T126362#2012212 (mobrovac) NEW [19:23:56] Release-Engineering-Team, Gerrit, Services: Allow the Services team to manage the mediawiki-services gerrit group - https://phabricator.wikimedia.org/T126362#2012226 (mobrovac) [19:24:48] PROBLEM - Free space - all mounts on deployment-bastion is CRITICAL: CRITICAL: deployment-prep.deployment-bastion.diskspace._var.byte_percentfree (<12.50%) [19:25:33] Release-Engineering-Team, Gerrit, Services: Allow the Services team to manage the mediawiki-services gerrit group - https://phabricator.wikimedia.org/T126362#2012239 (demon) Open>Resolved a:demon Done, group is self-managing. [19:31:06] mobrovac: see? ^ :) [19:36:27] yay! [19:36:44] thnx ostriches, greg-g, highly appreciated [19:36:57] just gotta ask the right person :) [19:37:01] which is usually Chad [19:37:05] for most things :P [19:41:54] paladox: you mean the qunit test failures? those patches are in the individual parts that make up the wikidata build, all of them should be linked from the original bug from when the bug was added that I linked to from the rel branch related bug. [19:42:49] jzerebecki: I mean unit tests. The qunit tests were broken because of a missing qunit code. But unit tests are broken please see https://integration.wikimedia.org/ci/job/mwext-testextension-php53/1958/consoleFull [19:46:17] Beta-Cluster-Infrastructure, Release-Engineering-Team, Discovery, Discovery-Portal-Sprint, Patch-For-Review: Beta: submodule update reverts new portals commits - https://phabricator.wikimedia.org/T126061#2012333 (ksmith) @thcipriani: This is causing a lot of frustration for our team. Do you have...
[19:52:38] jenkins just -1'd my change due to an internal error: https://gerrit.wikimedia.org/r/269329 [19:55:47] hudson.plugins.git.GitException: Failed to fetch from git://gallium.wikimedia.org/mediawiki/core [19:56:21] try a 'recheck'? [19:58:26] Beta-Cluster-Infrastructure, Release-Engineering-Team, Discovery, Blocked-on-Operations, and 2 others: Beta: submodule update reverts new portals commits - https://phabricator.wikimedia.org/T126061#2012357 (greg) [20:02:12] greg-g: Am I ok to deploy ORES? (not particularly now, more in general) [20:02:17] ori: the recheck seems to work. not sure if that was multiple jenkins jobs on the same vm influencing each other or the vms on the same host... migrating everything to nodepool based jobs should fix the first in the future. [20:03:21] (influencing as in cpu or disk starve) [20:05:40] Reedy: https://www.mediawiki.org/wiki/Extension:ORES ? [20:05:48] Yup, https://phabricator.wikimedia.org/T120923 [20:06:54] Reedy: looks like t's are being dotted and i's crossed, so yeah [20:07:13] Thanks, I'll start making patches then look for a window then [20:08:50] (PS1) Reedy: Branch ORES for WMF deployments [tools/release] - https://gerrit.wikimedia.org/r/269476 (https://phabricator.wikimedia.org/T120923) [20:09:14] paladox: mh I remember this was changed in https://phabricator.wikimedia.org/T95897#1549297 but the errors look unrelated and unfamilar. [20:10:41] jzerebecki: thanks! [20:10:53] and 'grats on +2, btw [20:11:33] paladox: maybe we should just disable wikidata in the REL1_25 branch? I don't think there is even interest in using that.
[20:11:36] thx [20:13:11] (CR) Reedy: [C: 2] Branch ORES for WMF deployments [tools/release] - https://gerrit.wikimedia.org/r/269476 (https://phabricator.wikimedia.org/T120923) (owner: Reedy) [20:13:52] jzerebecki: you made him ragequit [20:14:35] Reedy: for 1 second [20:19:41] PROBLEM - Content Translation Server on deployment-cxserver03 is CRITICAL: Connection refused [20:21:42] (Merged) jenkins-bot: Branch ORES for WMF deployments [tools/release] - https://gerrit.wikimedia.org/r/269476 (https://phabricator.wikimedia.org/T120923) (owner: Reedy) [20:22:07] * Reedy wonders if we're supposed to be deploying with this pointing at labs... [20:22:20] Or whether it's for testing stuff [20:22:29] (I know, production services shouldn't be running on labs etc) [20:23:10] yeah, that part confuses me, it looks like a labs service? [20:23:17] I'll comment on task [20:23:35] Brad asked if it should be blocked on https://phabricator.wikimedia.org/T120923#1912770 [20:23:41] Well, https://phabricator.wikimedia.org/T106867 [20:24:51] That's the reason, prod shouldn't be using labs AFAIK. [20:25:29] PROBLEM - Puppet failure on deployment-bastion is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [20:25:55] Yeah [20:26:00] in like 99% of cases it shouldn't [20:26:15] I just wasn't sure if this was an "exception" because they needed the wider testing [20:27:03] Continuous-Integration-Infrastructure, Puppet: Need a better way of testing puppet patches for contint/integration stuff - https://phabricator.wikimedia.org/T126370#2012566 (Legoktm) NEW [20:28:11] Reedy: anomie thanks, commented [20:28:36] [20:27:27] It's different, ORES as a service is ores.wmflabs.org and the ORES extension uses ORES service data [20:29:23] jzerebecki: Could this patch https://gerrit.wikimedia.org/r/#/c/232272/1 be a potatial fixer meaning could that fix the issue.
Looking at the first error in https://integration.wikimedia.org/ci/job/mwext-testextension-php53/1958/consoleFull it has something to do with sitelinks but i could be wrong. [20:39:58] Deployment-Systems, Scap3: Give tasks clearer names - https://phabricator.wikimedia.org/T126372#2012629 (ori) NEW [20:40:48] touche [20:40:50] ^^ [20:41:01] * greg-g goes to get lunch [20:42:04] Continuous-Integration-Infrastructure, Labs, Tool-Labs, Patch-For-Review, WorkType-Maintenance: Change sid pbuilder image name to 'unstable' - https://phabricator.wikimedia.org/T111097#2012644 (akosiaris) >>! In T111097#2007656, @hashar wrote: > Funny side effect found on {T125999}. The labs/too... [20:47:15] * Reedy files a "Do stuff and things" task for ori [20:55:12] Deployment-Systems, Scap3: Give tasks clearer names - https://phabricator.wikimedia.org/T126372#2012711 (demon) I'd rather us do {T67827}, tbh... [20:56:07] group0 looks to be content, gonna step afk and eat something [20:59:56] (PS1) Legoktm: Don't default to `php` for $PHP_BIN [integration/jenkins] - https://gerrit.wikimedia.org/r/269524 (https://phabricator.wikimedia.org/T126211) [21:00:03] Krenair: rolling out beta cleanup now [21:00:05] via puppetmaster [21:00:09] btw, http://en.wikipedia.beta.wmflabs.org/w/wiki.phtml didn't work [21:00:15] shows source code instead xD [21:05:24] Beta-Cluster-Infrastructure: deployment-bastion.eqiad.wmflabs puppet choking on l10nupdate git clone - https://phabricator.wikimedia.org/T126377#2012737 (thcipriani) NEW [21:06:54] (PS1) Legoktm: bin/php: Prevent against infinite recursion [integration/jenkins] - https://gerrit.wikimedia.org/r/269527 (https://phabricator.wikimedia.org/T126211) [21:07:53] (CR) Legoktm: [C: 2] "Changing the default is fine because of 24d3de3d83f586fe9a66691c652a54795a4ae20c" [integration/jenkins] - https://gerrit.wikimedia.org/r/269524 (https://phabricator.wikimedia.org/T126211) (owner: Legoktm) [21:08:04] (CR)
Legoktm: [C: 2] bin/php: Prevent against infinite recursion [integration/jenkins] - https://gerrit.wikimedia.org/r/269527 (https://phabricator.wikimedia.org/T126211) (owner: Legoktm) [21:10:06] Beta-Cluster-Infrastructure: deployment-bastion.eqiad.wmflabs puppet choking on l10nupdate git clone - https://phabricator.wikimedia.org/T126377#2012756 (thcipriani) Also, noteworthy that `beta::autoupdater` is only used on deployment-bastion: ``` root@deployment-puppetmaster:/var/lib/git/operations/puppet#... [21:15:31] legoktm: So, php55… [21:15:42] I'm working on it :) [21:16:29] specifically https://phabricator.wikimedia.org/T126211#2011274 right now [21:19:17] Kk. [21:19:20] Need anything from me? [21:19:45] Krinkle, cool, doesn't surprise me much [21:20:34] (Merged) jenkins-bot: Don't default to `php` for $PHP_BIN [integration/jenkins] - https://gerrit.wikimedia.org/r/269524 (https://phabricator.wikimedia.org/T126211) (owner: Legoktm) [21:20:36] (Merged) jenkins-bot: bin/php: Prevent against infinite recursion [integration/jenkins] - https://gerrit.wikimedia.org/r/269527 (https://phabricator.wikimedia.org/T126211) (owner: Legoktm) [21:21:03] James_F: I'll ping if/when I do :) [21:21:07] Kk.
[21:24:21] (PS1) MaxSem: Add HtmlFormatter [integration/config] - https://gerrit.wikimedia.org/r/269530 (https://phabricator.wikimedia.org/T125001) [21:30:23] (PS1) Legoktm: Add flavored versions of composer-test, and rename it too [integration/config] - https://gerrit.wikimedia.org/r/269533 [21:31:51] (CR) Paladox: [C: 1] Add flavored versions of composer-test, and rename it too [integration/config] - https://gerrit.wikimedia.org/r/269533 (owner: Legoktm) [21:32:25] (CR) jenkins-bot: [V: -1] Add flavored versions of composer-test, and rename it too [integration/config] - https://gerrit.wikimedia.org/r/269533 (owner: Legoktm) [21:32:31] (PS2) Legoktm: Add flavored versions of composer-test, and rename it too [integration/config] - https://gerrit.wikimedia.org/r/269533 [21:34:26] (CR) Paladox: [C: 1] Add flavored versions of composer-test, and rename it too [integration/config] - https://gerrit.wikimedia.org/r/269533 (owner: Legoktm) [21:37:42] (CR) Legoktm: [C: 2] Add flavored versions of composer-test, and rename it too [integration/config] - https://gerrit.wikimedia.org/r/269533 (owner: Legoktm) [21:39:58] (Merged) jenkins-bot: Add flavored versions of composer-test, and rename it too [integration/config] - https://gerrit.wikimedia.org/r/269533 (owner: Legoktm) [21:40:18] !log deploying https://gerrit.wikimedia.org/r/269533 [21:40:20] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [21:50:51] !log disabling puppet on all trusty/precise CI slaves [21:50:53] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [21:51:31] This is gonna be fun.
[21:52:41] !log cherry-picked https://gerrit.wikimedia.org/r/#/c/269370/ onto integration-puppetmaster [21:52:44] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [21:53:34] !log enabling puppet on just integration-slave-trusty-1012 [21:53:38] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [21:55:16] legoktm@integration-slave-trusty-1012:~$ PHP_BIN=php php --version [21:55:16] $PHP_BIN is set to 'php', causing infinite recursion! [21:57:45] Have the composer-php53 tests been removed? [21:57:50] https://gerrit.wikimedia.org/r/#/c/269401/ [21:59:03] uhh [21:59:11] wtf [21:59:14] Not intentionally. [21:59:20] INFO:jenkins_jobs.builder:Creating jenkins job composer-php53 [21:59:34] https://integration.wikimedia.org/ci/job/composer-php53/ [22:00:58] okay wtf [22:01:33] 2016-02-09 22:00:39,273 ERROR zuul.Gearman: Job is not registered with Gearman [22:01:33] 2016-02-09 22:00:39,273 INFO zuul.Gearman: Build complete, result NOT_REGISTERED [22:01:55] ugh :/ [22:02:56] !log reloading zuul to see if it'll pickup the new composer-php53 job [22:02:58] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [22:03:45] nope, wtf [22:08:02] bd808, marxarelli: have you ever seen gearman not pick up a new jenkins job? [22:08:34] Deployment-Systems, Performance-Team, Traffic, operations, Patch-For-Review: Make Varnish cache for /static/$wmfbranch/ expire when resources change within branch lifetime - https://phabricator.wikimedia.org/T99096#2012953 (Krinkle) [22:08:42] no, but I've only deployed zuul changes like 3 times [22:09:05] hmm [22:10:36] legoktm@gallium:~$ /usr/local/bin/zuul-gearman.py status | grep composer [22:10:39] doesn't show them [22:14:28] hey, do mw/vendor changes just ride the train?
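The guard that fired on integration-slave-trusty-1012 protects a `php` wrapper that re-executes whatever `$PHP_BIN` names. Below is a minimal Python sketch of that check. The log does not confirm how the wrapper is installed; the sketch assumes it sits on `PATH` as `php`, ahead of the real interpreters, so exec'ing a bare `php` would re-enter the wrapper. The function name and the explicit `default` parameter are hypothetical, because the log does not say what the new default is.

```python
from typing import Mapping

# Assumption: the wrapper itself is installed as `php`, earlier on PATH than
# the real interpreters, so exec'ing a bare `php` would re-enter the wrapper.
WRAPPER_NAME = "php"


def resolve_php_bin(env: Mapping[str, str], default: str) -> str:
    """Return the interpreter the wrapper should exec, refusing to call itself.

    An unset or empty $PHP_BIN falls back to `default`; either source
    naming the wrapper is rejected instead of recursing forever.
    """
    php_bin = env.get("PHP_BIN") or default
    if php_bin == WRAPPER_NAME:
        raise RuntimeError(
            f"$PHP_BIN is set to '{php_bin}', causing infinite recursion!"
        )
    return php_bin


print(resolve_php_bin({"PHP_BIN": "hhvm"}, "php5"))  # hhvm
```

The default passes through the same check as an explicit `$PHP_BIN`. This mirrors why "Don't default to `php` for $PHP_BIN" and the recursion guard were merged together.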
[22:15:02] yes [22:15:36] (PS1) Legoktm: Revert "Add flavored versions of composer-test, and rename it too" [integration/config] - https://gerrit.wikimedia.org/r/269540 [22:15:44] (CR) Legoktm: [C: 2] Revert "Add flavored versions of composer-test, and rename it too" [integration/config] - https://gerrit.wikimedia.org/r/269540 (owner: Legoktm) [22:18:07] hmm [22:18:28] I should probably re-enable puppet everywhere... [22:18:33] (Merged) jenkins-bot: Revert "Add flavored versions of composer-test, and rename it too" [integration/config] - https://gerrit.wikimedia.org/r/269540 (owner: Legoktm) [22:18:53] !log re-enabling puppet on all CI slaves [22:18:55] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [22:22:04] Error 400 on SERVER: Could not find class role::ci::slave::labs for integration-slave-precise-1003.integration.eqiad.wmflabs on node integration-slave-precise-1003.integration.eqiad.wmflabs? [22:26:21] Gerrit-Migration, Phabricator, Repository-Admins, pywikibot-core: Migrate Pywikibot to Differential code review - https://phabricator.wikimedia.org/T95526#2013065 (Aklapper) p:Triage>Normal [22:28:55] legoktm: Has it been fixed? [22:29:34] sorry, doing that now [22:29:39] !log deploying https://gerrit.wikimedia.org/r/269540 [22:29:41] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [22:34:24] hoo: I reverted, so it should pass now, sorry [22:36:01] legoktm: sorry, was away. still having trouble with gearman? [22:36:10] yes [22:36:18] I reverted for now, but I'd like to see that change go out [22:36:19] there were some issues this morning and i had to restart zuul a couple of times [22:36:40] basically I deployed https://gerrit.wikimedia.org/r/269533 but zuul/gearman said "composer-php53" job didn't exist [22:36:52] legoktm: was the job running in jenkins, or did it never get created from the gerrit event?
[22:37:01] it never got created [22:37:09] [14:01:32] 2016-02-09 22:00:39,273 ERROR zuul.Gearman: Job is not registered with Gearman [22:37:09] [14:01:32] 2016-02-09 22:00:39,273 INFO zuul.Gearman: Build complete, result NOT_REGISTERED [22:37:41] and when I run `/usr/local/bin/zuul-gearman.py status | grep composer` on gallium, I don't see them either [22:38:17] the new jobs are composer-(php53|php55|hhvm), but only php53 is in the zuul config so far [22:38:21] i'm seeing two zuul-server processes on gallium [22:38:23] is that normal? [22:38:28] uhmmm [22:38:29] saw that earlier as well [22:39:29] a few of the jenkins jobs for the integration/config repo will call zuul-server to validate the config, but I don't think there should be 2 long-lived processes... [22:40:28] Deployment-Systems, Scap3, Patch-For-Review: Make puppet provider for scap3 - https://phabricator.wikimedia.org/T113072#2013126 (ArielGlenn) @akosiaris said he'd give the patch one last review, after that I'll merge it and babysit patches for phab deployment as long as @mmodell is around. [22:40:41] hmm, looks like it does fork [22:40:52] the ppid of the second process is the pid of the first [22:41:48] Deployment-Systems, Scap3, Patch-For-Review: Make puppet provider for scap3 - https://phabricator.wikimedia.org/T113072#2013132 (mmodell) @arielglenn: I'm almost done with {T125851} [22:42:07] okay... [22:42:30] I'm not really sure how gearman is expected to pick up new jobs, that should happen when zuul reloads right? [22:43:14] afaik, reloading zuul picks up changes to the layout, jobs that should be scheduled upon gerrit events [22:43:34] but if you're creating a new job, that needs to be created first [22:43:41] before adding it to the zuul layout [22:43:53] did you create the job?
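`zuul-gearman.py status` lists the functions the Gearman server currently knows about. Under the Gearman admin protocol, each line of a `status` reply holds a function name, queued jobs, running jobs and available workers, separated by tabs. The listing ends with a line holding a single `.`. The Python sketch below parses such a reply. The `build:<job>` function naming is an assumption based on how the Jenkins Gearman plugin registers jobs. The sample reply is illustrative and was not captured from gallium.

```python
from typing import Dict, Tuple


def parse_gearman_status(text: str) -> Dict[str, Tuple[int, int, int]]:
    """Map each function name to (queued, running, available_workers)."""
    functions: Dict[str, Tuple[int, int, int]] = {}
    for line in text.splitlines():
        if line.strip() == ".":  # end-of-listing marker
            break
        if not line.strip():
            continue
        name, queued, running, workers = line.split("\t")
        functions[name] = (int(queued), int(running), int(workers))
    return functions


def job_is_registered(status_text: str, job: str) -> bool:
    """True if at least one worker advertises `build:<job>` (assumed naming)."""
    counts = parse_gearman_status(status_text).get(f"build:{job}")
    return counts is not None and counts[2] > 0


# Illustrative reply, not captured from gallium.
SAMPLE = "build:composer-test\t0\t0\t12\nbuild:npm\t1\t1\t12\n.\n"
print(job_is_registered(SAMPLE, "composer-test"))   # True
print(job_is_registered(SAMPLE, "composer-php53"))  # False
```

If a job is missing from this listing, no worker has offered to run it. Zuul then reports it as `NOT_REGISTERED`, which matches the `zuul.Gearman` log lines above.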
[22:44:03] yes [22:44:19] https://integration.wikimedia.org/ci/job/composer-php53/ [22:44:36] I also deleted it and recreated it once to be extra sure [22:45:25] yeah, layout looks ok too [22:49:49] legoktm: hmm, is the `node:` specification in jjb/php.yaml right? [22:50:01] yes [22:50:20] i think it's supposed to be `contintLabsSlave && phpflavor-{phpflavor}` [22:50:26] not php-{phpflavor} [22:50:51] * legoktm facepalms [22:50:52] thus, no nodes match and it doesn't get scheduled? [22:51:12] Yep, that makes sense [22:51:18] weee! [22:51:22] jjb is super fun. [22:51:23] (PS1) Legoktm: Add flavored versions of composer-test, and rename it too (re-do) [integration/config] - https://gerrit.wikimedia.org/r/269547 [22:52:09] (PS2) Legoktm: Add flavored versions of composer-test, and rename it too (re-do) [integration/config] - https://gerrit.wikimedia.org/r/269547 [22:52:46] marxarelli: ^ wanna review that? :) [22:52:55] sure thing [22:54:55] (CR) Dduvall: [C: 2] Add flavored versions of composer-test, and rename it too (re-do) [integration/config] - https://gerrit.wikimedia.org/r/269547 (owner: Legoktm) [22:57:11] (Merged) jenkins-bot: Add flavored versions of composer-test, and rename it too (re-do) [integration/config] - https://gerrit.wikimedia.org/r/269547 (owner: Legoktm) [22:57:29] marxarelli: thanks :) [22:57:34] !log deploying https://gerrit.wikimedia.org/r/269547 [22:57:35] legoktm: np! [22:57:37] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [22:57:48] legoktm: ah, you beat me to it :) [22:58:33] marxarelli: uh, still not working :/ [22:58:34] 2016-02-09 22:57:58,817 ERROR zuul.Gearman: Job is not registered with Gearman [22:58:34] 2016-02-09 22:57:58,817 INFO zuul.Gearman: Build complete, result NOT_REGISTERED [23:01:21] marxarelli: how terrible would it be to restart zuul?
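The diagnosis raised here is that a job whose `node:` expression matches no Jenkins node can never be scheduled. The Python sketch below models that reasoning. It assumes a simplified label syntax with only `&&`; real Jenkins label expressions also allow `||`, `!` and parentheses. The node label sets are hypothetical. In this log, correcting the label did not by itself clear the `NOT_REGISTERED` error.

```python
from typing import Dict, List, Set

# Hypothetical label sets; real nodes carry more labels than this.
NODES: Dict[str, Set[str]] = {
    "integration-slave-precise-1012": {"contintLabsSlave", "phpflavor-php53"},
    "integration-slave-trusty-1012": {"contintLabsSlave", "phpflavor-hhvm"},
}


def label_matches(expression: str, labels: Set[str]) -> bool:
    """True if every `&&`-joined atom of the expression is one of the labels."""
    atoms = [atom.strip() for atom in expression.split("&&")]
    return all(atom in labels for atom in atoms)


def eligible_nodes(template: str, phpflavor: str, nodes: Dict[str, Set[str]]) -> List[str]:
    """Expand a JJB-style `{phpflavor}` placeholder, then list matching nodes."""
    expression = template.format(phpflavor=phpflavor)
    return sorted(name for name, labels in nodes.items() if label_matches(expression, labels))


# The typo: no node carries a `php-php53` label, so nothing can run the job.
print(eligible_nodes("contintLabsSlave && php-{phpflavor}", "php53", NODES))  # []
# The intended expression matches the precise node.
print(eligible_nodes("contintLabsSlave && phpflavor-{phpflavor}", "php53", NODES))
```

Expanding the template before matching mirrors how Jenkins Job Builder substitutes `{phpflavor}` per generated job. A single typo in the template therefore breaks every flavour at once.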
[23:01:47] um, not more terrible than jobs not getting created :) [23:01:52] go for it [23:02:18] you're basically risking it not picking up some gerrit event while it's restarting [23:02:29] the window is pretty small though [23:02:39] !log gracefully restarting zuul [23:02:42] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [23:03:55] * Restarting Zuul Server zuul ... waiting for jobs to complete ...................................................................................... [23:04:00] heh it just keeps adding more dots [23:05:00] moar DOTs! [23:05:24] There'll be a bunch of dots waiting for MW to merge. [23:05:36] legoktm: still going? [23:05:51] (03CR) 10Paladox: "Could the reason why the test won't work because we removed UbuntuPrecise from the test." [integration/config] - 10https://gerrit.wikimedia.org/r/269547 (owner: 10Legoktm) [23:05:55] yeah, one job left [23:06:45] 10Continuous-Integration-Config, 7Regression: Make sure mediawiki-core-phpcs job is running under HHVM - https://phabricator.wikimedia.org/T126394#2013245 (10Legoktm) 3NEW a:3Legoktm [23:09:02] BTW, whatever it is that shows the status.json file isn't compatible with whatever writes to it. [23:09:29] (It contains HTML, but it's inserted as text.) [23:09:36] (03PS1) 10Legoktm: Force mediawiki-core-phpcs to use HHVM [integration/config] - 10https://gerrit.wikimedia.org/r/269551 (https://phabricator.wikimedia.org/T126394) [23:10:29] OK, queues empty except for post-merge which we can ignore. [23:10:38] marxarelli: zuul restarted, but /usr/local/bin/zuul-gearman.py status doesn't show it still :( [23:10:54] :( [23:11:49] legoktm: Restart gearman? [23:12:07] (It'd be bad.) [23:12:12] I would've thought that restarting zuul also restarts gearman [23:13:14] I thought restarting gearman dropped all events and so was different. [23:13:25] But (a) I might be mis-remembering, and (b) it might well have changed. 
[23:15:31] (CR) Legoktm: [C: 2] Force mediawiki-core-phpcs to use HHVM [integration/config] - https://gerrit.wikimedia.org/r/269551 (https://phabricator.wikimedia.org/T126394) (owner: Legoktm) [23:15:46] legoktm: do you see anything relevant in /var/log/zuul/debug.log? [23:16:49] (Merged) jenkins-bot: Force mediawiki-core-phpcs to use HHVM [integration/config] - https://gerrit.wikimedia.org/r/269551 (https://phabricator.wikimedia.org/T126394) (owner: Legoktm) [23:17:58] !log deploying https://gerrit.wikimedia.org/r/269551 [23:18:00] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [23:18:32] marxarelli: uh, that file is really verbose :/ [23:18:34] legoktm: gah. /var/log/zuul/gearman-server.log shows "gear.Server: Exception in poll loop" [23:18:56] That's… uninformative. [23:18:58] oh nm [23:19:26] that's hella old [23:19:57] legoktm: could the error be the result of us switching from job to job-template [23:20:35] paladox__: Yes, but that was the point [23:20:37] i am seeing quite a few exceptions in /var/log/zuul/error.log however [23:21:18] marxarelli: the ones like Exception: Gerrit error executing gerrit review --project operations/mediawiki-config --message "Gate pipeline build succeeded. are because someone force-merged [23:21:30] ah, ok [23:21:44] other than that it's just the ones you mentioned already [23:22:37] is gearman a separate process from zuul-server? [23:23:45] separate process i believe, but forked from zuul-server [23:23:47] i _think_ [23:24:44] which we restarted... [23:26:01] hi hashar [23:27:17] Beta-Cluster-Infrastructure, Deployment-Systems, Scap3, Analytics, and 3 others: Set up AQS in Beta - https://phabricator.wikimedia.org/T116206#2013324 (mobrovac) [23:31:26] Legoktm: should we make that new composer job experimental meaning put it in experimental: until we fix the problem.
[23:33:26] Scap3, scap: Give tasks clearer names - https://phabricator.wikimedia.org/T126372#2013356 (greg) [23:33:28] Scap3, scap: sync-masters slow on mira - https://phabricator.wikimedia.org/T125108#2013357 (greg) [23:33:30] Scap3, scap: scap3 host restart batching should allow for delay between batches - https://phabricator.wikimedia.org/T122914#2013358 (greg) [23:33:31] bug spam incoming! [23:33:32] Scap3, scap: Allow scap3 to read target host list from stdin - https://phabricator.wikimedia.org/T122913#2013359 (greg) [23:33:34] Scap3, scap: Require sanity test to pass before syncing files to all web servers - https://phabricator.wikimedia.org/T121597#2013360 (greg) [23:33:36] Scap3, scap: sync-wikiversions not syncing wikiversions.json with mira - https://phabricator.wikimedia.org/T121585#2013361 (greg) [23:33:38] Scap3, ContentTranslation-cxserver, scap: Deploy CXServer with scap3 - https://phabricator.wikimedia.org/T120104#2013363 (greg) [23:33:40] Scap3, scap: Bring co-master / fanout capabilities to `deploy` and friends - https://phabricator.wikimedia.org/T121276#2013362 (greg) [23:33:43] Scap3, Parsoid, scap: Deploy Parsoid with scap3 - https://phabricator.wikimedia.org/T120103#2013364 (greg) [23:33:45] Scap3, Graphoid, scap: Deploy Graphoid with scap3 - https://phabricator.wikimedia.org/T120102#2013365 (greg) [23:33:48] Scap3, scap, WorkType-NewFunctionality: create a scap3 command to bootstrap a new deployment repo - https://phabricator.wikimedia.org/T118760#2013367 (greg) [23:33:50] Scap3, scap, Documentation: Document Scap3's `--limit` flag - https://phabricator.wikimedia.org/T118745#2013368 (greg) [23:33:53] Scap3, Analytics-EventLogging, scap: Move EventLogging service to scap3 - https://phabricator.wikimedia.org/T118772#2013366 (greg) [23:33:55] Scap3, scap, WorkType-NewFunctionality: Need a way to see config diffs in Scap -
https://phabricator.wikimedia.org/T118206#2013371 (greg) [23:33:57] Release-Engineering-Team, Scap3, scap, Security-General: Scap should apply security patches - https://phabricator.wikimedia.org/T118478#2013370 (greg) [23:33:59] Scap3, scap, Documentation: End user tutorial docs for Scap - https://phabricator.wikimedia.org/T118738#2013369 (greg) [23:34:01] Scap3, scap, Documentation: Document Scap3 config-deploy - https://phabricator.wikimedia.org/T116634#2013372 (greg) [23:34:03] Scap3, scap, WorkType-NewFunctionality: Remove apache dependency from scap3 deployment host - https://phabricator.wikimedia.org/T116630#2013374 (greg) [23:34:05] Scap3, scap: File ownership differences between Scap3 and Trebuchet - https://phabricator.wikimedia.org/T116632#2013373 (greg) [23:34:07] Scap3, Mathoid, scap: Deploy Mathoid with scap3 - https://phabricator.wikimedia.org/T116338#2013377 (greg) [23:34:10] Scap3, RESTBase-Cassandra, scap: Deploy Cassandra with scap3 - https://phabricator.wikimedia.org/T116340#2013376 (greg) [23:34:13] Scap3, RESTBase, scap: Deploy RESTBase with scap3 - https://phabricator.wikimedia.org/T116335#2013379 (greg) [23:34:16] Scap3, Citoid, scap: Deploy Citoid with scap3 - https://phabricator.wikimedia.org/T116337#2013378 (greg) [23:34:27] Scap3, releng-201516-q3, scap, WorkType-NewFunctionality: [keyresult] Migrate the MW weekly train deploy to scap3 - https://phabricator.wikimedia.org/T114313#2013386 (greg) [23:34:28] Scap3, scap: scap3 should repack / pack-refs git repos under /srv/deployment - https://phabricator.wikimedia.org/T112509#2013388 (greg) [23:34:31] Scap3, scap, Patch-For-Review: Support smooth transitions from Trebuchet managed deploys - https://phabricator.wikimedia.org/T113107#2013387 (greg) [23:34:39] Scap3, scap, Epic: EPIC: Scap3 should implement the services team requirements - https://phabricator.wikimedia.org/T109535#2013395 (greg) [23:34:44] heh.
today is a bad day to have a ping for "scap" [23:34:52] yeeep :) [23:34:59] * bd808 will survive [23:37:10] Scap3, Graphoid, scap: Deploy Graphoid with scap3 - https://phabricator.wikimedia.org/T120102#2013439 (Yurik) @thcipriani, service restart is a tricky one - we usually don't want to restart everything at once, just in case something is badly broken, and we want to revert. When we have 2-4 servers han... [23:37:17] legoktm: hello, sorry was busy this evening. I had to revert the php slave-script due to some infinite recursion but havent looked at the root cause [23:37:36] legoktm: one thing I noticed is that when merging it we need to do a salt to git pull the repo on all instance [23:38:18] greg-g: my gmail is shouting from your mass phab task edits [23:38:20] hashar: I figured out that global-set-env.sh was setting the default $PHP_BIN to "php" which caused the recursion, I fixed that by having it default to php5, and adding a check in bin/php to fail if it is set to "php" [23:38:30] ETOOMANYPHABMAILS [23:38:56] hashar: I re-cherry-picked it onto the puppet master and everything related to it has been fine for the past 2 hours [23:39:06] other stuff is broken though :/ [23:39:18] legoktm: ah I havent though about global-set-env good job!
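legoktm's fix for the recursion can be sketched as follows. This is a Python rendering of the decision logic only, assuming (from his description) that bin/php is a wrapper that runs whatever `$PHP_BIN` names; the real global-set-env.sh and bin/php are shell scripts not shown in the log, and `resolve_php_bin` is a hypothetical name. If `$PHP_BIN` is `php` and `php` resolves to the wrapper itself, the wrapper would call itself forever, so that value is rejected.

```python
from typing import Mapping

# Default legoktm switched global-set-env.sh to, instead of "php".
DEFAULT_PHP_BIN = "php5"


def resolve_php_bin(env: Mapping[str, str]) -> str:
    """Return the interpreter a hypothetical `php` wrapper should run.

    An unset or empty PHP_BIN falls back to php5. A value of "php" is
    refused: assuming the wrapper itself is what `php` resolves to on
    PATH, running it would just re-enter the wrapper without end.
    """
    php_bin = env.get("PHP_BIN") or DEFAULT_PHP_BIN
    if php_bin == "php":
        raise RuntimeError('PHP_BIN is "php"; the wrapper would call itself')
    return php_bin


print(resolve_php_bin({}))                   # php5
print(resolve_php_bin({"PHP_BIN": "hhvm"}))  # hhvm
```

Failing loudly turns a silent infinite loop into an immediate, readable error, which is the point of the check legoktm added.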
[23:39:27] gearman claims a job that I just deployed to jenkins doesn't exist [23:39:56] and as we say in France: you can make an Omelette without break eggs [23:39:57] marxarelli also looked into it, but I don't think he was able to figure out what is wrong either [23:40:14] i haven't yet :( [23:40:16] have you tried nuclear option: kill jenkins :) [23:40:35] Release-Engineering-Team, Services, Trebuchet, operations: `git deploy service restart` asked for sudo password - https://phabricator.wikimedia.org/T126359#2013457 (greg) [23:40:46] uh, [23:40:52] I restarted zuul [23:40:58] the Jenkins Gearman function register jobs as gearman function to the Zuul gearman server [23:41:04] but I assumed the issue would be on the zuul/gearman side, not jenkins? [23:41:06] ahhh [23:41:12] and you can get a list of registered functions with /usr/local/bin/zuul-gearman.py status [23:41:17] details at https://www.mediawiki.org/wiki/Continuous_integration/Zuul?redirect=no#Debugging [23:41:19] yeah, I've been using that :) [23:41:33] mobrovac: I just love you all soooo much [23:41:48] legoktm: marxarelli what is the job not being registerd?
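To check registration the way hashar suggests, the output of `zuul-gearman.py status` can be parsed. The sketch below assumes the standard Gearman admin `status` format: one function per line followed by the queued (including running), running and available-worker counts, terminated by a line containing `.`. `parse_status` and `is_registered` are hypothetical helpers, and treating "listed with at least one worker" as registered approximates, rather than reproduces, Zuul's own check.

```python
from typing import Dict, NamedTuple


class FunctionStatus(NamedTuple):
    queued: int   # jobs in the queue, including the ones running
    running: int  # jobs currently running
    workers: int  # workers that registered this function


def parse_status(text: str) -> Dict[str, FunctionStatus]:
    """Parse Gearman admin `status` output into {function name: counts}."""
    functions: Dict[str, FunctionStatus] = {}
    for line in text.splitlines():
        parts = line.split()  # handles tab- or space-separated columns
        if len(parts) != 4:
            continue  # skips the terminating "." and blank lines
        name, queued, running, workers = parts
        functions[name] = FunctionStatus(int(queued), int(running), int(workers))
    return functions


def is_registered(functions: Dict[str, FunctionStatus], job_name: str) -> bool:
    """Approximation: the job's build:<name> function exists and has a worker."""
    status = functions.get(f"build:{job_name}")
    return status is not None and status.workers > 0


# Status line quoted in the channel, before the Jenkins jobs were updated.
before = parse_status("build:composer-package-php53\t0\t0\t24\n.\n")
print(is_registered(before, "composer-package-php53"))  # True
print(is_registered(before, "composer-php53"))          # False
```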
[23:41:53] haha [23:42:04] hashar: composer-php53 (and composer-php55 composer-hhvm) [23:42:08] hashar: "build:composer-php53 unique: a24f3dcb62134551bbda050d73fb5988> is not registered with Gearman" [23:42:54] build:composer-package-php53 0 0 24 [23:42:55] it is there [23:42:58] with 24 available workers [23:43:28] that's composer-package-php53 though, not composer-php53 [23:43:41] eek [23:45:54] one sure thing is that Zuul scheduler ask the Gearman server whether the build:job-name-here is available and if not throw the NOT_REGISTERED [23:45:59] so it must be the Jenkins gearman plugin being confused [23:46:08] so afaict, that status request is what zuul does internally to decide whether a job is registered or not [23:46:10] WAIT WAIT [23:46:13] I'm an idiot [23:46:27] looking at https://github.com/openstack-infra/zuul/blob/master/zuul/launcher/gearman.py#L187 [23:46:30] I never updated the jobs in jenkins after marxarelli found the issue [23:46:40] ooooooooooh [23:46:43] * hashar !google the issue [23:46:46] ':) [23:46:52] legoktm@gallium:~$ /usr/local/bin/zuul-gearman.py status | grep composer-php53 [23:46:52] build:composer-php53:contintLabsSlave 0 0 23 [23:46:52] build:composer-php53:phpflavor-php53 0 0 23 [23:46:52] build:composer-php53 0 0 23 [23:47:03] * legoktm slaps himself [23:47:04] ... [23:47:17] https://integration.wikimedia.org/ci/job/composer-php53/jobConfigHistory/showDiffFiles?timestamp1=2016-02-09_22-06-42&timestamp2=2016-02-09_23-44-46 [23:47:27] >.< [23:47:30] ahhhoh [23:47:43] so the jobs node: was not matching any slave right ? [23:47:48] haha, it's cool.
i learned way more about zuul/gearman today [23:47:49] right [23:47:55] in this case the Jenkins Gearman plugin would not registered the functions [23:48:10] fun trick that I am afraid to use [23:48:24] note how the job 'composer-php53' is registered as three functions [23:48:24] 1) composer-php53 [23:48:31] 2) build:composer-php53:phpflavor-php53 [23:48:36] 3) build:composer-php53:contintLabsSlave [23:49:16] when the Zuul scheduler wanna trigger composer-php53 , if one inject a parameter ZUUL_NODE=phpflavor-php53 , it will run 'build:composer-php53:phpflavor-php53' [23:49:38] i.e. from the Zuul parameter function we can more or less select the Jenkins label to run the job on [23:50:05] but I digress [23:51:43] okay, I'm going to sit and not do anything for an hour during SWAT, and then keep going down https://www.mediawiki.org/wiki/User:Legoktm/PHP_5.5 [23:51:44] Scap3, scap: refreshCdbJsonFiles should be rewritten in python - https://phabricator.wikimedia.org/T125685#2013514 (greg) [23:51:47] Release-Engineering-Team, scap: sync-dir doesn't like Release-Engineering-Team, Scap3, scap: create an app to audibilize logstash events - https://phabricator.wikimedia.org/T123419#2013516 (greg) [23:51:54] Release-Engineering-Team, scap: scap-purge-l10n-cache hanging - https://phabricator.wikimedia.org/T122008#2013517 (greg) [23:51:56] Scap3, scap, WorkType-NewFunctionality: Build a dependency graph resolver for deployment stages and tasks - https://phabricator.wikimedia.org/T120684#2013519 (greg) [23:51:58] Scap3, scap: Need a way to restart services without deploying via scap - https://phabricator.wikimedia.org/T119449#2013521 (greg) [23:52:00] Scap3, scap: scap3 configuration selection is confusing - https://phabricator.wikimedia.org/T120410#2013520 (greg) [23:52:01] legoktm: marxarelli: kudos !
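hashar's explanation of how a job maps to Gearman functions can be sketched in Python. This follows his description in the channel, not the Jenkins Gearman plugin or Zuul source. The sketch assumes the plugin registers `build:<job>` plus one `build:<job>:<label>` per label of the nodes able to run the job, and registers nothing at all when no node matches. It also assumes Zuul asks for `build:<job>:<ZUUL_NODE>` when that parameter is set. The function names and the `LAUNCHED` result are hypothetical.

```python
from typing import AbstractSet, Mapping, Set


def registered_functions(job_name: str, eligible_labels: AbstractSet[str]) -> Set[str]:
    """Functions the Jenkins side advertises for one job (sketch).

    `eligible_labels` are the labels of the nodes allowed to run the job.
    With none (e.g. a `node:` expression nothing matches), nothing is registered.
    """
    if not eligible_labels:
        return set()
    functions = {f"build:{job_name}"}
    functions |= {f"build:{job_name}:{label}" for label in eligible_labels}
    return functions


def gearman_function_name(job_name: str, params: Mapping[str, str]) -> str:
    """Function Zuul requests; ZUUL_NODE narrows it to one Jenkins label."""
    name = f"build:{job_name}"
    node = params.get("ZUUL_NODE")
    if node:
        name = f"{name}:{node}"
    return name


def launch_result(job_name: str, params: Mapping[str, str], registered: AbstractSet[str]) -> str:
    """NOT_REGISTERED when no worker offers the function, else a placeholder."""
    if gearman_function_name(job_name, params) not in registered:
        return "NOT_REGISTERED"
    return "LAUNCHED"  # hypothetical stand-in for actually submitting the job


fixed = registered_functions("composer-php53", {"contintLabsSlave", "phpflavor-php53"})
print(sorted(fixed))
print(launch_result("composer-php53", {"ZUUL_NODE": "phpflavor-php53"}, fixed))  # LAUNCHED
print(launch_result("composer-php53", {}, registered_functions("composer-php53", set())))  # NOT_REGISTERED
```

The empty-set case mirrors what happened here: with the `php-{phpflavor}` label no node was eligible, so no function was registered and Zuul reported NOT_REGISTERED.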
[23:52:02] Scap3, scap: Scap3 needs a way to handle large binary file transport - https://phabricator.wikimedia.org/T119443#2013522 (greg) [23:52:04] Scap3, Discovery, scap: Create scripts for automatic deployment for wikimedia/portals - https://phabricator.wikimedia.org/T114694#2013523 (greg) [23:52:05] heading to bed [23:52:06] Scap3, scap: Investigate parallel-ssh library once paramiko supports hmac-256/hmac-512 - https://phabricator.wikimedia.org/T114110#2013524 (greg) [23:52:09] Scap3, scap, Patch-For-Review: Make puppet provider for scap3 - https://phabricator.wikimedia.org/T113072#2013525 (greg) [23:52:10] Release-Engineering-Team, scap: Scap should abort early when Keyholder is not armed - https://phabricator.wikimedia.org/T111062#2013526 (greg) [23:52:12] Scap3, scap, WorkType-NewFunctionality: Scap3 check to monitor logstash and detect changes in error frequency - https://phabricator.wikimedia.org/T110068#2013527 (greg) [23:52:14] Scap3, releng-201516-q2, releng-201516-q3, scap: [keyresult] Migrate all Service team owned services and MW deploys to scap3 - https://phabricator.wikimedia.org/T109926#2013528 (greg) [23:52:18] good night hashar!
[23:52:20] Scap3, scap, Patch-For-Review: Scap3 needs to be deployed on RESTBase boxes and needs a group on tin - https://phabricator.wikimedia.org/T109862#2013530 (greg) [23:52:22] Release-Engineering-Team, Scap3, scap, WorkType-NewFunctionality: Instrument scap for "scap duration" KPI - https://phabricator.wikimedia.org/T108743#2013532 (greg) [23:52:28] Scap3, scap, Epic: EPIC: Future Deployment Tooling - https://phabricator.wikimedia.org/T101023#2013536 (greg) [23:52:32] Scap3, Performance-Team, operations, scap, HHVM: Make scap able to depool/repool servers via the conftool API - https://phabricator.wikimedia.org/T104352#2013535 (greg) [23:52:35] good night hashar :) [23:52:42] Scap3, scap: [scap] Consolidate scripts as sub-commands of `scap` - https://phabricator.wikimedia.org/T67827#2013543 (greg) [23:52:43] hashar: goodnight! [23:52:51] noisy in here :) [23:52:55] sorry :( [23:52:58] greg-g: turn down your phab [23:53:05] WHAT?! [23:53:11] TURN DOWN YOUR PHAB [23:53:16] YOUR PHAB [23:53:21] MY CAB?! [23:53:25] ... [23:53:29] na [23:53:30] your fat [23:53:40] :( :( [23:53:46] just kidding [23:53:46] I was going to go for a run, you know! [23:54:25] Someone should add remembering state to the list of hardest problems in CS [23:54:54] wait, is it hard [23:55:06] int i; [23:55:27] Continuous-Integration-Config, Patch-For-Review, Regression: Make sure mediawiki-core-phpcs job is running under HHVM - https://phabricator.wikimedia.org/T126394#2013563 (Legoktm) Open>Resolved https://integration.wikimedia.org/ci/job/mediawiki-core-phpcs/buildTimeTrend looks normal again. [23:55:53] legoktm: \o/ [23:56:07] Composer tests now works [23:56:21] https://integration.wikimedia.org/ci/job/composer-php53/12/console [23:56:44] awesome [23:56:49] neat! [23:57:08] have sweet dreams! [23:58:29] legoktm: https://www.mediawiki.org/w/index.php?title=User:Legoktm/PHP_5.5&diff=2043518&oldid=2043297 BTW.
:-) [23:58:54] oh, I didn't see that [23:59:02] ima gonna celebrate with a beer, then morn akoval's departure by pouting [23:59:20] mourn