[00:00:05] btw I made good progress on the Zuul packaging :) [00:00:14] I have packages for our Precise and Trusty distros [00:00:15] Yeah, saw that. Cool stuff man. [00:00:25] and pairing with filippo tomorrow for a first review [00:00:40] today I have been packaging nodepool from scratch [00:00:45] doesn't quite work yet though :( [00:01:22] from there the puppet part will just be package { 'zuul': ensure => present } [00:01:33] hasharDinner: Is the maintenance window done / Zuul/Gearman/Jenkins back to normal state? [00:01:42] Just making sure I give it some time before I deploy stuff [00:01:49] but I will need to document how to update zuul / apply patches in the .deb packages etc [00:02:01] Krinkle: yeah it is back [00:02:15] Krinkle: it seems it reset to the default initial value when one does a zuul reload [00:02:20] (the default being 20 jobs) [00:02:33] maybe we can get it graphed at some point [00:03:28] Hm.. the queue is showing old jobs again [00:03:35] just went from "20 minutes ago" to one from "47 minutes ago" [00:03:42] One that was already aborted [00:03:50] Did it restore an old queue? [00:07:47] 10Continuous-Integration: reduce copies of mediawiki/core in workspaces - https://phabricator.wikimedia.org/T93703#1143683 (10hashar) 3NEW [00:07:57] hasharDinner: OK to restart Zuul? [00:08:05] I've snapshotted the changes, will resubmit [00:08:07] ohhhh [00:08:40] Krinkle: apparently the job at the top of the queue was broken/deadlocked/whatever [00:08:48] it has disappeared now [00:09:19] o/ [00:09:35] the jobs being processed are already merged though [00:09:42] whenever zuul/jenkins is stable, I'll re+2 all the l10n-bot jobs that got dropped due to running out of disk space [00:09:59] some of them. [00:10:04] I'll abort the ones that really are merged [00:10:24] Hm.. 
though that would dislodge the queue [00:10:31] I have aborted one [00:10:41] but that does not get the change out of the queue :( [00:11:22] hasharDinner: So regarding the auto-queue system. Is there a way to override that and do it based on project somehow? E.g. only in one direction (e.g. mwext depends on mwcore, and nothing else) [00:11:36] that is 80% of the case covered, and removes 90% of the bogus load [00:11:36] nop [00:11:45] E.g. app/ios doesn't even have jobs right now [00:11:48] that is why we had so many jobs [00:11:53] but has to wait for 4 mediawiki jobs to finish on Zend [00:12:22] I think the app/ios developers manually submit anyways so we could just remove it from zuul anyways [00:12:27] apps/ios/wikimedia has a "phplint" job [00:12:29] and services/* has no business waiting for extensions/* [00:12:34] hasharDinner: Yeah but conditionally run [00:12:37] and that job is shared with mediawiki repositories [00:12:44] so it is considered to have something in common [00:12:50] Yes, I know, but that's the automatic linking, which doesn't scale for our purposes anymore. [00:12:54] also, mwcore + mysql + zend is significantly slower than sqlite [00:13:05] Yeah, I filed a bug for that [00:13:07] well it worked fine until you broke it ? :) [00:13:09] on hhvm it's pretty good [00:13:14] but for Zend it's 5 minutes slower [00:13:27] can we just put zend back on sqlite? [00:13:40] and qunit + hhvm use mysql [00:14:01] well, that delay is not causing the issues. and making it inconsistent won't help stability I think. [00:14:12] * legoktm nods [00:14:22] But I've thought about that too. [00:14:31] or drop Zend? [00:14:33] The problem is, sqlite is itself unstable and will cause job failures. [00:14:45] can't drop zend yet [00:14:54] well ideally we would test on both mysql and sqlite [00:15:05] I'd be happy to drop Zend from Wikimedia Jenkins and defer that to low-prio testing after merge via Travis CI.
Same for all the other php versions we do already (php 5.4, 5.5, 5.6) [00:15:16] supporting sqlite is a mwcore issue, not CI. [00:15:48] right now our contract is to test mw with database, that can be mysql or sqlite. sqlite is not a priority in itself. mysql is more important to have passing (if we could only pick one) [00:15:50] we can't drop zend from CI, because people will still merge stuff with arrays like [ this ], and break anyone trying to run a maint script [00:15:54] right [00:16:00] agree [00:16:31] to support sqlite, mwcore has a lot of refactoring to do to support the proper transactions and locking issues it caused. [00:16:50] from what I understand, it was working on Precise because that sqlite version had a bug in it. Making it ignore potential locks [00:16:59] lol [00:17:15] Tim went into low level debugging and discovered it. Quite impressive. [00:17:33] I think moving mwext-zend jobs over to labs would at least make the l10n-bot load less bad since right now they're just running on gallium + lanthanum [00:17:53] Yeah, I've got a patch for that waiting to deploy. [00:18:18] (03CR) 10Krinkle: [C: 032] "Let's try this." [integration/jenkins] - 10https://gerrit.wikimedia.org/r/198757 (https://phabricator.wikimedia.org/T90836) (owner: 10Krinkle) [00:18:44] ^ is also causing slaves to break every 3 days (if I don't clear /tmp/npm-*) [00:18:54] :/ [00:19:05] https://phabricator.wikimedia.org/T90836 [00:20:39] isn't npm honoring TMPDIR or something? [00:20:45] could be made to point to $WORKSPACE/tmp [00:20:58] hasharDinner: That's what I did, exactly. [00:21:03] It does honour TMPDIR [00:21:06] but that's /tmp by default [00:21:12] I changed it for mw jobs only [00:21:28] well [00:21:30] Krinkle: are you still planning to restart zuul? or should I +2 all the l10n changes again? [00:21:44] legoktm: It seems the queue is recovering. 
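The TMPDIR scoping discussed above can be sketched as a small helper (this is a hypothetical illustration, assuming only what the log states: npm honours TMPDIR, which defaults to /tmp; `workspace` stands in for Jenkins' $WORKSPACE):

```python
import os

def npm_env(workspace):
    """Return a copy of the environment with TMPDIR pointed at a
    per-job directory under the workspace, so npm's /tmp/npm-* litter
    lands somewhere the job teardown can wipe instead of the shared /tmp."""
    tmpdir = os.path.join(workspace, 'tmp')
    os.makedirs(tmpdir, exist_ok=True)
    return dict(os.environ, TMPDIR=tmpdir)
```

A job wrapper would then launch npm with something like `subprocess.run(['npm', 'install'], env=npm_env(workspace))`.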
I haven't stopped it [00:21:47] +1 on restart [00:21:55] there are a bunch of changes that got force merged [00:21:59] there's 10 left in the queue. [00:22:04] most are not merged already [00:22:05] and for some reason zuul locks / waits on them [00:22:23] it does remove them eventually [00:22:27] it's just silly for some reason [00:22:29] the first four items in gate and submit I have aborted the jobs [00:22:41] each will take apparently at least 5 minutes to proceed [00:22:42] OK. [00:22:47] Well, then we can just stop and start it [00:23:02] so imho it is going to take half an hour or so [00:23:08] there must be something in zuul scheduler waiting [00:23:16] !log Restarted Zuul [00:23:18] Logged the message, Master [00:23:41] I think some of them weren't merged yet but if they're aborted, then we can just restart :) [00:24:35] self.log.debug("Waiting for %s to appear in git repo" % (change)) [00:24:35] if self.waitForRefSha(change.project, ref, change._ref_sha): [00:24:49] def waitForRefSha [00:24:54] time.sleep(self.replication_retry_interval) [00:25:09] >.> [00:25:11] hasharDinner: Ha, this might be related. [00:25:16] in short [00:25:24] it retries every 5 seconds for up to 300 seconds [00:25:31] or a 5 minute deadlock [00:25:42] waiting for the change that has been merged to show up somewhere [00:25:47] hasharDinner: Earlier this weekend MobileFrontend got a bad unit test of its own exposed (a value in core changed from a 16 to a 32 character string in javascript). The unit test was updated, but the production slaves were still running mediawiki core from over a month ago. [00:25:57] that is in zuul/trigger/gerrit.py [00:26:00] It seemed okay at some point, but I'm not too happy about that. [00:26:05] could it be that replication is broken?
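The waitForRefSha behaviour quoted above boils down to a poll-with-timeout loop. A simplified sketch (not Zuul's actual code; names are illustrative and the defaults follow the numbers in the log — a 5-second retry for up to 300 seconds, hence the 5-minute stall when a merged ref never appears):

```python
import time

def wait_for_ref_sha(check, retry_interval=5, timeout=300):
    """Poll check() every retry_interval seconds until it returns True
    or timeout seconds have elapsed. With the defaults, a ref that
    never shows up blocks the caller for a full 5 minutes."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if check():
            return True
        time.sleep(retry_interval)
    return False
```

Since this runs in the scheduler's processing path, every change stuck here holds up everything queued behind it — matching the half-hour estimate above for four stuck items.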
[00:26:22] zuul merger is fine, but the gerrit replag that we use for hard linking in prod slaves [00:26:55] well [00:27:02] the Jenkins git plugin just uses it as a reference [00:27:10] and gets the other objects from zuul.eqiad.wmnet [00:27:11] then checkout [00:27:24] would depend on the job that had the issue I guess [00:27:25] it should, yes. but it was no doubt running a copy of mediawiki with the Thanks extension that was over a month old. [00:27:31] recheck fixed it, but it's concerning. [00:27:43] doh [00:28:20] as for slow zend tests with mysql, we're using the database too much. Our tests are shit. [00:28:28] +3 [00:28:33] Well, that's an overstatement, but there's a few things here and there that are using it too much. [00:28:41] so back to sqlite for PHPUnit tests ? [00:28:44] On the surface it looks fine, but there's definitely room for improvement. [00:29:29] I'd rather not. It means tests will be broken again, and VisualEditor is already blocked on being able to test itself *at all* in many ways for almost 6 months now. [00:29:38] That doesn't affect phpunit, but having the two not in sync could cause other issues. [00:29:44] I mean for PHPUnit [00:29:48] not the qunit/karma [00:29:55] Yeah. [00:30:04] But the warnings and stuff we found in karma were also affecting phpunit runs. [00:30:10] The logs are full of database errors. [00:30:11] 100s of them. [00:30:14] ahhhhhhhhh [00:30:31] It somehow ignores them I guess (because most tests shouldn't actually be using the database!) [00:30:43] so only 1 in 100 actually causes the job to fail [00:31:42] It's a terrible compromise, but I'd prefer to have it stable on our end and raise it on the mediawiki-core team's agenda that tests desperately need improving and we don't have the capacity to do it for them. It's outgrown its limits. [00:31:59] the non-existent mw-core team :P [00:32:28] ;) [00:32:30] Well, I don't mean to point to individuals. I consider myself part of that team.
[00:32:36] too [00:32:45] just separating concepts for CI maintenance. [00:32:48] CI team doesn't exist either [00:32:51] muhahaha [00:32:55] We are anonymous [00:33:01] We are legions [00:33:15] ah [00:33:21] * legoktm walks away slowly [00:33:21] https://phabricator.wikimedia.org/T87781 Split mediawiki tests into unit and integration tests [00:33:25] :D [00:33:52] anything with @group Database is integration, everything else is likely unit :P [00:34:02] * hasharDinner post a masked video to legoktm using legoktm twitter account [00:34:09] yeah [00:34:43] $ dsh-ci-slaves 'ls -l /tmp | grep npm | wc -l' [00:34:44] integration-slave1001.eqiad.wmflabs: 120 [00:34:44] integration-slave1003.eqiad.wmflabs: 130 [00:34:45] integration-slave1401.eqiad.wmflabs: 280 [00:34:47] integration-slave1404.eqiad.wmflabs: 235 [00:34:48] and surely a lot of @group Database tests could use a mock instead [00:34:49] integration-slave1403.eqiad.wmflabs: 240 [00:34:51] integration-slave1402.eqiad.wmflabs: 364 [00:34:53] integration-slave1004.eqiad.wmflabs: 160 [00:34:55] integration-slave1405.eqiad.wmflabs: 315 [00:34:57] integration-slave1002.eqiad.wmflabs: 160 [00:34:59] * Krinkle rm -rfs [00:35:04] it's like 50MB+ each [00:35:05] doh [00:35:16] I'm deploying a fix now that should eliminate that [00:35:42] feel free to add the tmp reaper you talked about a while back [00:35:44] anyway [00:35:51] 1:36am, time for me to sleep [00:35:54] hasharDinner: See phabricator task. I've cancelled that for now [00:35:56] I am awake in 6 hours [00:36:01] Scoping our TMPDIR should address 90% [00:36:08] not even 5:20 hours doh :// [00:36:16] tmpreaper is dangerous [00:36:21] ok ok :) [00:36:28] now I am sleeping [00:36:33] Debian has blacklisted it for security reasons.
[00:36:36] alrighty :) [00:36:38] Seeya [00:36:51] * legoktm re-queues all the l10n-bot commits [00:38:46] thx [00:43:39] 00:42:33 Caused by: java.lang.NoClassDefFoundError: Could not initialize class jenkins.model.Jenkins [00:43:42] https://integration.wikimedia.org/ci/job/mwext-ApiFeatureUsage-testextension-zend/33/console [00:44:15] it doesn't appear to be out of space [00:45:31] Krinkle: ^ it looks like the post-build scripts are failing.... [00:45:47] did you just deploy those? [00:46:27] hm, lanthanum is passing, gallium isn't [00:46:28] legoktm: Which script? [00:46:32] IOException [00:46:34] disk space? [00:46:37] it's not disk space [00:46:44] /dev/sdb1 149G 146G 3.1G 98% /srv/ssd [00:47:13] 00:42:33 ERROR: Publisher hudson.tasks.junit.JUnitResultArchiver aborted due to exception [00:47:13] 00:42:33 java.io.IOException: remote file operation failed: /srv/ssd/jenkins-slave/workspace/mwext-ApiFeatureUsage-testextension-zend at hudson.remoting.Channel@340c2990:gallium: java.io.IOException: Remote call on gallium failed [00:47:16] This is an interesting zuul-cloner failure. Any ideas? https://integration.wikimedia.org/ci/job/wikimedia-fundraising-civicrm/48/console [00:47:23] my new scripts are not live on gallium yet, I didn't do a git deploy yet. [00:47:34] the new teardown/setup global e.g. I merged a few minutes ago [00:47:36] Only on labs [00:47:40] hm [00:47:45] The change was on a "contrib" branch, but the cloner complains that "upstream repo is missing branch None" [00:48:25] awight: the drupal repo has no master branch [00:48:27] that makes it fail [00:48:29] https://github.com/wikimedia/fundraising-crm-drupal [00:48:31] empty repo [00:48:38] Krinkle: ok, thanks! [00:49:01] Krinkle: err, but https://gerrit.wikimedia.org/r/#/admin/projects/wikimedia/fundraising/crm/drupal,branches [00:53:17] Oh? 
[00:54:06] Krinkle: I think the github mirror is misconfigured [00:54:16] repo name should be https://github.com/wikimedia/wikimedia-fundraising-crm-drupal [00:54:35] which repo do I patch to correct that? [00:54:51] awight: It's not handwritten, it's fully automated, replacing / with - [00:54:58] so.. [00:55:15] that empty repo might not be related [00:55:17] Well, the parent repo is https://github.com/wikimedia/wikimedia-fundraising-crm fwiw [00:55:34] I think you can assume it isn't replicated. Not replicated to the wrong name, that empty one is something else. I'll check that one out, but for replag I can't help you :) [00:55:39] Yeah, that seems to be working fine [00:56:48] Zuul-cloner works through gerrit though, so this is an unrelated issue... [00:56:48] awight: meanwhile, what might help is merging a master commit in that drupal repo. [00:56:56] k [00:57:00] Then we'll know for sure if it's a current or a past bug. [00:57:44] our last merge was Jan 15, 2015. Merging another commit for fun... [00:57:56] btw, the zuul failure was on a change on the "contrib" branch. [01:03:37] awight: It will try to find the same branch and then fall back to master [01:03:45] when fetching the other repos [01:10:59] Krinkle: is there a way to re-trigger a job on a specific host? [01:11:15] https://integration.wikimedia.org/ci/job/mwext-AutoCreateCategoryPages-testextension-zend/4/rebuild/parameterized for example [01:11:46] legoktm: Update job configuration, && int ....; apply, rebuild, change config back :) [01:12:02] lol [01:12:10] Krinkle: I can just do "node: lanthanum"?
[01:12:18] Yeah [01:12:26] or add it to the existing labels with && so that you don't have to remember these [01:13:07] (03PS5) 10Krinkle: Implement global-set-env, global-setup, global-teardown [integration/jenkins] - 10https://gerrit.wikimedia.org/r/198757 (https://phabricator.wikimedia.org/T90836) [01:13:56] (03CR) 10jenkins-bot: [V: 04-1] Implement global-set-env, global-setup, global-teardown [integration/jenkins] - 10https://gerrit.wikimedia.org/r/198757 (https://phabricator.wikimedia.org/T90836) (owner: 10Krinkle) [01:16:04] * legoktm waits for mw core to clone... [01:16:19] (03CR) 10Krinkle: "recheck" [integration/jenkins] - 10https://gerrit.wikimedia.org/r/198757 (https://phabricator.wikimedia.org/T90836) (owner: 10Krinkle) [01:17:22] it just seems to be gallium that is having those issues [01:17:28] I ran the same job on lanthanum and it worked fine [01:25:51] (03PS1) 10Krinkle: Run global-teardown after jobs using npm-set-env [integration/config] - 10https://gerrit.wikimedia.org/r/199180 (https://phabricator.wikimedia.org/T90836) [01:26:02] (03CR) 10Krinkle: [C: 032] Implement global-set-env, global-setup, global-teardown [integration/jenkins] - 10https://gerrit.wikimedia.org/r/198757 (https://phabricator.wikimedia.org/T90836) (owner: 10Krinkle) [01:26:53] legoktm: Yeah.. [01:34:22] legoktm: so this is one of those moments [01:34:30] There's 6 random jobs queued, nothing special. [01:34:32] Nothing is happening [01:34:40] all slaves doing absolutely nothing [01:35:43] gotta run an errand, bbl [01:46:59] !log deployed scap/scap-sync-20150324-014257 to beta cluster [01:47:02] Logged the message, Master [01:52:48] 7Blocked-on-RelEng, 6Release-Engineering, 6Multimedia, 6Scrum-of-Scrums, and 2 others: Create basic puppet role for Sentry - https://phabricator.wikimedia.org/T84956#1143911 (10Tgr) I almost forgot, but in production we should use pgsql, not mysql. The Sentry devs said mysql is not really supported - it wo...
[02:00:28] 10Continuous-Integration, 6operations: Jenkins: Re-enable lint checks for Apache config in operations-puppet - https://phabricator.wikimedia.org/T72068#1143917 (10Dzahn) example why this check is missed: https://gerrit.wikimedia.org/r/#/c/199182/1 [02:14:40] 10Continuous-Integration: testextension-zend jobs failing on gallium (work fine on lanthanum) - https://phabricator.wikimedia.org/T93713#1143934 (10Legoktm) 3NEW [02:17:52] (03PS2) 10Legoktm: Move mwext-*-testextension-zend to UbuntuPrecise slaves in labs [integration/config] - 10https://gerrit.wikimedia.org/r/198770 (https://phabricator.wikimedia.org/T93143) (owner: 10Krinkle) [02:28:00] (pending—Waiting for next available executor on contintLabsSlave&&(UbuntuPrecise&&phpflavor-zend&&phpflavor-zend)||(UbuntuTrusty&&phpflavor-hhvm&&phpflavor-zend)) [02:31:12] eh [02:31:14] zuul is stuck [02:31:15] sigh [02:31:57] !log toggling gearman off/on in jenkins [02:32:03] Logged the message, Master [02:33:23] (03Merged) 10jenkins-bot: Implement global-set-env, global-setup, global-teardown [integration/jenkins] - 10https://gerrit.wikimedia.org/r/198757 (https://phabricator.wikimedia.org/T90836) (owner: 10Krinkle) [02:33:29] (03CR) 10jenkins-bot: [V: 04-1] Run global-teardown after jobs using npm-set-env [integration/config] - 10https://gerrit.wikimedia.org/r/199180 (https://phabricator.wikimedia.org/T90836) (owner: 10Krinkle) [02:33:52] (03CR) 10Legoktm: "recheck" [integration/config] - 10https://gerrit.wikimedia.org/r/199180 (https://phabricator.wikimedia.org/T90836) (owner: 10Krinkle) [02:36:25] 10Continuous-Integration: gallium (work fine on lanthanum) - https://phabricator.wikimedia.org/T93713#1143952 (10Legoktm) [02:36:38] 10Continuous-Integration: All? jobs failing gallium (work fine on lanthanum) - https://phabricator.wikimedia.org/T93713#1143934 (10Legoktm) [02:37:01] 10Continuous-Integration: All? 
jobs failing gallium (work fine on lanthanum) - https://phabricator.wikimedia.org/T93713#1143934 (10Legoktm) Doesn't appear to be disk space this time: `/dev/sdb1 149G 147G 3.0G 99% /srv/ssd` [02:37:18] (03CR) 10jenkins-bot: [V: 04-1] Move mwext-*-testextension-zend to UbuntuPrecise slaves in labs [integration/config] - 10https://gerrit.wikimedia.org/r/198770 (https://phabricator.wikimedia.org/T93143) (owner: 10Krinkle) [02:38:53] legoktm: That's 3G free, 99% in use [02:39:12] 3GB is not nothing, but it's not exactly healthy either [02:39:26] legoktm: What have you found so far? [02:39:47] there are two types of exceptions I've seen [02:40:14] well [02:40:17] they're all "Remote call on gallium failed" [02:40:31] the phpunit ones have a common stack trace of 01:10:19 Caused by: java.lang.NoClassDefFoundError: Could not initialize class jenkins.model.Jenkins [02:40:43] the other ones are 02:33:52 Caused by: java.lang.NoClassDefFoundError: Could not initialize class hudson.security.Permission [02:41:22] That sounds like it's coming from a plugin [02:41:27] Does the stack include one? [02:41:40] Upgrading Jenkins without upgrading plugins might cause that [02:41:56] Upgrading plugins we generally avoid because it too can cause trouble, so usually kept to a minimum [02:42:02] 02:33:52 at hudson.scm.SCM.(SCM.java:688) [02:42:06] is SCM a plugin? [02:42:21] !log Restarting Zuul, wikimedia-fundraising-civicrm is stuck as of 46min ago waiting for something already merged [02:42:24] Logged the message, Master [02:42:38] and the phpunit ones are pointing to the junit processing code [02:42:49] also the same exact job works fine on lanthanum [02:43:06] er, for the phpunit one. [02:43:14] I didn't test any non-phpunit ones on lanthanum [02:44:03] legoktm: The jenkins config tester also failed. [02:44:05] similar error [02:44:12] So it's not related to jUnit or Git. 
[02:44:13] if "scm" is broken, the mwext phpunit jobs use zuul-cloner so they'll bypass that [02:44:16] I think it's the file system. [02:44:19] hmm [02:44:23] permissions issue? [02:44:23] Or a plugin yeah [02:44:28] Hm.. [02:46:06] everything looks to be owned by jenkins-slave [02:46:20] (in /srv/ssd/jenkins-slave/workspace/mwext-AutoCreateCategoryPages-testextension-zend/log) [02:51:40] Could be related: https://issues.jenkins-ci.org/browse/JENKINS-19453 [02:57:32] Updating SCM plugin just in case. That one had a few bug fixes that sound like this. [03:02:05] (03CR) 10Krinkle: "recheck" [integration/config] - 10https://gerrit.wikimedia.org/r/199180 (https://phabricator.wikimedia.org/T90836) (owner: 10Krinkle) [03:02:51] legoktm: Hm.. I don't know whether the update or the restart helped but it seems to be better [03:02:55] I won't say fixed yet [03:03:17] Huh, the config job runs in labs? [03:03:51] oh right [03:04:37] the layout job runs on gallium [03:04:41] yeah. passing [03:09:18] 10Continuous-Integration: All? jobs failing gallium (work fine on lanthanum) - https://phabricator.wikimedia.org/T93713#1143960 (10Krinkle) Upgraded Jenkins "Multiple SCMs plugin" from v0.3 to v0.4 and restarted Jenkins. Not sure which of those actions resolved it, but the failed jobs on gallium are now passing.
[03:14:19] !log Deleting old job workspaces on gallium not touched since 2013 [03:14:22] Logged the message, Master [03:19:25] (03PS1) 10Krinkle: mw-teardown: Include mw-set-env [integration/jenkins] - 10https://gerrit.wikimedia.org/r/199189 [03:19:34] (03CR) 10Krinkle: [C: 032] mw-teardown: Include mw-set-env [integration/jenkins] - 10https://gerrit.wikimedia.org/r/199189 (owner: 10Krinkle) [03:20:27] (03Merged) 10jenkins-bot: mw-teardown: Include mw-set-env [integration/jenkins] - 10https://gerrit.wikimedia.org/r/199189 (owner: 10Krinkle) [03:27:48] legoktm: The junit errors make sense now [03:28:18] legoktm: If a job fails before phpunit finishes and creates the junit XML file, then the collector in postbuild encounters the file from the previous build because it's never removed [03:28:31] because we don't bloody clear our workspaces [03:28:38] Which comes back to zuul-cloner [03:30:43] (03CR) 10Krinkle: "recheck" [integration/config] - 10https://gerrit.wikimedia.org/r/198770 (https://phabricator.wikimedia.org/T93143) (owner: 10Krinkle) [03:30:44] :| [03:30:50] (03PS2) 10Krinkle: Migrate prepare-mediawiki to MySQL (affects testextension + mwext-qunit) [integration/config] - 10https://gerrit.wikimedia.org/r/198773 (https://phabricator.wikimedia.org/T37912) [03:34:10] legoktm: The slave-scripts clones on precise slaves are all broken [03:34:11] fatal: bad object HEAD [03:34:22] ugh what [03:34:39] https://gist.github.com/Krinkle/82b671571bc2b192ce75 [03:35:06] integration-slave1002.eqiad.wmflabs: fatal: index file open failed: Permission denied [03:35:12] that part seems suspicious [03:36:12] Krinkle: is everything supposed to be owned by root?
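The stale-report failure mode described above (a post-build collector picking up last build's junit XML when the current run dies early) could be avoided by clearing leftovers at job setup. A sketch; the `log/junit*.xml` location is an assumption for illustration, not the actual job layout:

```python
import glob
import os

def clear_stale_junit(workspace):
    """Remove leftover junit XML reports from a previous build so the
    post-build collector can't pick up stale results if the current
    run fails before phpunit writes a fresh file."""
    removed = []
    for path in glob.glob(os.path.join(workspace, 'log', 'junit*.xml')):
        os.remove(path)
        removed.append(path)
    return removed
```

This is a narrow workaround; fully resetting (or freshly cloning) the workspace each build would make it unnecessary.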
Yeah, but readable by world [03:36:28] 755/644 [03:37:43] -rw------- 1 root root 1416 Jan 28 23:32 index [03:37:52] that's in /srv/deployment/integration/slave-scripts/.git/modules/tools/mwcodeutils [03:38:02] same with FETCH_HEAD [03:38:04] Yeah, found that too just a sec ago [03:38:22] I'm on 1002 [03:38:24] gonna try and fix [03:39:32] I was looking at 1002 as well [03:40:35] in 1003 index looks correct, FETCH_HEAD has wrong permissions [03:41:00] integration-slave1003:/srv/deployment/integration/slave-scripts/.git/objects <-- some random directories have wrong perms [03:42:46] I've rm -rfed the whole shebang and running puppet now [03:44:58] legoktm: don't +2 yet, they'll all fail [03:45:21] Krinkle: I was removing my +2 :P but they're running on prod slaves so they should still pass right? [03:45:26] now that the gallium issue is resolved? [03:45:30] yeah, but just making sure [03:45:38] Ah, yeah [03:45:45] although all labs slaves are now puppeting [03:45:54] I could depool, but it's quiet so just doing it [03:46:29] Yay, they're all done and back in order [03:46:30] dsh-ci-slaves 'cd /srv/deployment/integration/slave-scripts/ && git log --oneline -n1' [03:46:35] shows them all happy on the same commit [03:47:00] yay [03:47:27] $ dsh-ci-slaves 'sudo rm -rf /srv/deployment/integration' [03:47:27] $ dsh-ci-slaves 'sudo /usr/local/sbin/puppet-run' [03:47:31] that was the "fix" basically [03:47:42] although joost knows what the hell caused that [03:53:10] joost? [03:53:46] "(archaic, no longer commonly used outside of the phrase ‘Joost mag het weten’) Satan, the Devil  [quotations ▼]" [03:54:00] "Joost mag het weten" indeed [03:54:10] was my AngloDutch reference [03:54:24] It's very common in Dutch today.
[03:54:39] It's our version of "God knows" [03:54:54] which wouldn't make sense if literally said in Dutch [03:55:44] I love how I can use the english wiktionary to learn dutch :P [03:55:47] However, Joost is actually quite a common first name in Dutch, too, so there's often joking towards it "Yeah, I'm Joost" or something along those lines. Nonetheless doesn't stop people from using it. It's not like we have many alternatives. [03:55:53] Yeah! [03:56:35] Looking up words in their original language is not what I use Wikt for usually. It's good at that, but others are too. Cross-language is awesome! [03:57:23] why would you name your kid joost if it means satan? [03:57:52] I didn't know it related to the devil. According to the Dutch language reference https://onzetaal.nl/taaladvies/advies/joost-mag-het-weten it actually originates from Joos from Dejos, which is Portuguese for god. [03:58:01] It doesn't mean satan to anyone I know. [03:58:52] The dictionary can say what it wants, but it's always one step behind the reality, desperately trying to keep up with our pidgin behaviour of trying to make it easier. [03:59:00] oh interesting [04:00:18] https://en.wiktionary.org/wiki/Joost#Dutch [04:00:23] the page actually says that [04:00:28] Interesting [04:00:29] (03PS4) 10Legoktm: Only run jshint/jsonlint jobs when relevant files are touched [integration/config] - 10https://gerrit.wikimedia.org/r/198792 [04:00:36] (03CR) 10Legoktm: [C: 032] Only run jshint/jsonlint jobs when relevant files are touched [integration/config] - 10https://gerrit.wikimedia.org/r/198792 (owner: 10Legoktm) [04:00:41] the meaning of Satan is listed under etymology 2, which says it means Deus, god [04:01:58] (03Merged) 10jenkins-bot: Only run jshint/jsonlint jobs when relevant files are touched [integration/config] - 10https://gerrit.wikimedia.org/r/198792 (owner: 10Legoktm) [04:02:40] legoktm: Fixed :P Satan -> God. [04:02:45] That can't be controversial, right?
[04:02:51] haha :D [04:07:30] Yippee, build fixed! [04:07:31] Project browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #446: FIXED in 9 min 12 sec: https://integration.wikimedia.org/ci/job/browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/446/ [04:11:08] !log deploying https://gerrit.wikimedia.org/r/198792 [04:11:11] Logged the message, Master [04:14:15] Krinkle: https://integration.wikimedia.org/ci/job/mediawiki-extensions-hhvm/8536/console eh? [04:14:48] (03PS1) 10Legoktm: Use generic jshint/jsonlint jobs for more repos [integration/config] - 10https://gerrit.wikimedia.org/r/199190 [04:15:33] legoktm: Yeah, I saw that earlier on another slave. [04:15:45] The exception on the bottom is just the consequence of the build failing [04:15:56] which is because we're not resetting the workspace properly [04:16:12] in this case VE made a major change and it's unable to switch back and forth between the two versions. [04:16:30] rm -rf that workspace on that slave to resolve for now :) [04:17:29] I think we've made two major changes this year to CI that should've been worked on longer and been reverted on day one: 1: Dependent Pipeline for gate, 2: zuul-cloner. It's a minuscule advantage with a shitload of problems due to the flawed implementation. [04:18:20] it's never too late to revert ;) [04:18:28] <^d> +1 to getting rid of dependent gates [04:18:33] <^d> pisses me off daily [04:20:48] woot [04:20:48] [21:20:36] PROBLEM - Disk space on lanthanum is CRITICAL: DISK CRITICAL - free space: /srv/ssd 5765 MB (3% inode=87%): [04:22:42] <^d> Yeah I just saw that [04:22:44] <^d> ugh [04:23:14] it's probably because I tried to merge all the l10n-bot stuff again, and that's going to start doing a bunch of core clones again [04:23:30] basically dependent pipeline is for this use case, hold your horses: [04:24:25] If an upstream/downstream project (e.g. mediawiki-core and an extension) have an incompatible change (e.g.
extension Foo adds code calling a function, and core is about to remove that function), and then within that, they press +2 within the same 5min window, and within that, the mediawiki core change was +2'ed first. [04:24:46] Any other situation (e.g. they are merged an hour apart, or the extension change goes in the queue first) then they'll just merge and you'll find out on the next commit. [04:24:50] Which is Not A Big Deal. [04:25:02] Krinkle: hi. [04:25:35] anything on, https://phabricator.wikimedia.org/T93510 ? [04:27:22] kart_: Can you reproduce it locally? [04:27:59] mediawiki core + ContentTranslation + UniversalLanguageSelector + EventLogging. Nothing else, no skins. Run Special:JavaScriptTest/qunit [04:28:02] kart_: ^ [04:28:12] Let me know if that passes locally in Chrome. [04:29:51] Krinkle: sure. Disabling everything else. [04:31:17] ugh, why is pywikibot/core in the mediawiki queue??? [04:31:50] legoktm: because we generalised the jobs [04:32:04] it only has one general job in common, and that's tox-flake8 [04:32:35] legoktm: Either named or unnamed, everything has a queue in Zuul. And zuul's only way of associating things in a queue is by seeing multiple projects have triggers for the same project in any pipeline. [04:32:46] sigh [04:33:16] however by default queues don't do much, other than allowing things to sort on the Zuul dashboard and to allow load balancing nodes (which we don't use queues for but openstack does) [04:33:22] and dependent pipeline also uses it [04:37:57] we're down to 2G on lanthanum [04:39:10] Krinkle: any ideas on how to reclaim lanthanum disk space? my only idea is to move the testextension-zend jobs to labs...
[04:39:37] legoktm: I did a sweep on gallium earlier by ls -l | grep 2013 [04:39:42] and finding patterns that can be removed [04:39:50] *-pplint*, *-erblint-* [04:39:52] were some of them [04:39:59] and a bunch of phpcs-strict and lenient [04:40:11] some more recent but still dead [04:40:29] Feel free to do the same on lanthanum :) [04:43:09] http://ganglia.wikimedia.org/latest/graph_all_periods.php?c=Miscellaneous%20eqiad&h=gallium.wikimedia.org&r=hour&z=default&jr=&js=&st=1427172148&v=265.128&m=disk_free&vl=GB&ti=Disk%20Space%20Available&z=large [04:44:11] Hm.. not sure that includes ssd [04:45:15] ugh, gallium is full again [04:45:22] # /dev/sdb1 149G 149G 279M 100% /srv/ssd [04:45:39] this isn't working [04:45:51] Partial clones would be good right now [04:46:00] But zuul-cloner doesn't support that. [04:46:13] I guess we can't switch testextensions to use plain git plugin [04:46:45] legoktm: The slaves don't have a huge amount of space either [04:46:54] It's like 80GB total. [04:46:56] about 50% used. [04:47:05] :/ [04:47:15] for workspaces only that is, the /mnt [04:47:48] though it worked for hhvm jobs [04:47:59] and they're never on the same slave. [04:48:00] so... [04:48:46] 10Continuous-Integration, 10MediaWiki-extensions-ContentTranslation: QUnit tests for ContentTranslation fail - unable to merge commits - https://phabricator.wikimedia.org/T93510#1144201 (10santhosh) I am not able to reproduce this locally. Tests are passing. [04:49:13] kart_: OK. I'll try locally. [04:49:28] (based on santhosh not reproducing it) [04:49:30] Or did it work for you? [05:04:24] 10Continuous-Integration, 10MediaWiki-extensions-ContentTranslation: QUnit tests for ContentTranslation fail - unable to merge commits - https://phabricator.wikimedia.org/T93510#1144223 (10Krinkle) Approaching what Jenkins does (latest master of: MediaWiki core, ContentTranslation, UniversalLanguageSelector, E... [05:04:32] Krinkle: go ahead. My EL is messed up atm.
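The manual sweep described above (listing workspaces untouched since 2013 and matching dead-job name patterns) could be scripted roughly like this. The pattern list and directory layout are assumptions based only on the names mentioned in the log:

```python
import fnmatch
import os
import time

# Job-name patterns mentioned above as dead (assumed, adjust as needed).
DEAD_PATTERNS = ['*-pplint*', '*-erblint-*', '*phpcs-strict*', '*phpcs-lenient*']

def stale_workspaces(root, max_age_days=365):
    """List workspace directories under root that either match a
    dead-job pattern or have not been modified in max_age_days."""
    cutoff = time.time() - max_age_days * 86400
    stale = []
    for name in os.listdir(root):
        path = os.path.join(root, name)
        if not os.path.isdir(path):
            continue
        if (any(fnmatch.fnmatch(name, p) for p in DEAD_PATTERNS)
                or os.path.getmtime(path) < cutoff):
            stale.append(path)
    return stale
```

Printing the list first and deleting in a second pass keeps the sweep reviewable, which matters when the same disk also holds live job workspaces.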
[05:04:38] !log deleting workspaces of jobs that no longer exist in jjb on lanthanum [05:04:41] Logged the message, Master [05:05:38] Yippee, build fixed! [05:05:39] Project browsertests-UploadWizard-commons.wikimedia.beta.wmflabs.org-linux-firefox-sauce build #559: FIXED in 43 min: https://integration.wikimedia.org/ci/job/browsertests-UploadWizard-commons.wikimedia.beta.wmflabs.org-linux-firefox-sauce/559/ [05:06:29] and there's 6GB free [05:08:37] 10Continuous-Integration, 10MediaWiki-extensions-ImageMetrics, 6Multimedia: Karma failing on an ImageMetrics test referencing a value set in a head script - https://phabricator.wikimedia.org/T93459#1144225 (10Krinkle) @Tgr: Yep, this is part of a larger refactoring to make tests more robust. The plain-making... [05:25:37] 10Continuous-Integration, 10Wikimedia-Fundraising-CiviCRM, 3Fundraising Sprint Grandmaster Flash: Make Civi CI job run on civicrm, drupal, and vendor (DonationInterface and SmashPig) repos - https://phabricator.wikimedia.org/T91905#1144246 (10Krinkle) [05:25:38] 10Continuous-Integration, 10Wikimedia-Fundraising-CiviCRM, 3Fundraising Sprint Grandmaster Flash: Mysterious failure to zuul-clone drupal repo - https://phabricator.wikimedia.org/T93707#1144244 (10Krinkle) 5Open>3Resolved It seems the new commit (Drupal update and branch merge) fixed whatever was up. ht... [05:27:53] lanthanum is out of space again [05:29:10] I guess we just can't handle the additional load l10n-bot commits create [05:32:12] legoktm: Ah, right, that's what prompted this [05:32:44] well, we're not gc'ing the workspaces. so I guess this would have happened eventually anyway [05:32:58] but there's presumably a decent number of extensions that have never been committed to in the gerrit/jenkins age [05:33:06] but do get i18n updates [05:33:21] legoktm: Is there a way we can generalise the job?
[05:33:54] extension+core can be done with shallow-clone + mwcore archive unzip [05:33:59] but we need dependencies for many [05:34:07] That's what zuul-cloner provided, but was never tested to scale [05:34:13] we can generalize those that don't need dependencies [05:34:17] and most don't need dependencies [05:34:27] before zuul-cloner the mwcore part of that was an archive untar over git [05:34:31] not an actual clone [05:35:01] legoktm: Hm.. I guess we can do that still [05:35:22] from the testextension jobs, omit mediawiki/core from the zuul-cloner arguments and instead use a hardcoded shallow clone first, and then let zuul-cloner run [05:35:46] in the bash script before executing zuul-cloner. [05:35:51] Should work I guess. [05:36:01] will that work for wmf submodules? [05:36:20] that never worked [05:36:26] we ignore submodules [05:36:48] but it takes a branch argument, so wmf branches in general will work [05:36:54] it seems /mw-core-get macro still exists [05:37:46] the ones on production slaves could scale quite well, because they can make use of the gerrit replication and make a hard link. [05:38:24] though that logic hasn't been tested in a while [05:38:29] mw-core-get should work.
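The seeding idea discussed above (shallow-clone mediawiki/core first, then let zuul-cloner handle the remaining repos) might look like the following sketch. It is hypothetical: a throwaway local repo stands in for the Gerrit replica, and the zuul-cloner invocation is only echoed, since its real arguments depend on the job.

```shell
#!/bin/sh
# Throwaway repo standing in for the replicated mediawiki/core mirror.
mirror=$(mktemp -d)
git -C "$mirror" init -q
git -C "$mirror" -c user.name=ci -c user.email=ci@example.org \
    commit -q --allow-empty -m 'mediawiki/core stand-in: commit 1'
git -C "$mirror" -c user.name=ci -c user.email=ci@example.org \
    commit -q --allow-empty -m 'mediawiki/core stand-in: commit 2'

work=$(mktemp -d)
# --depth 1 fetches a single commit of history instead of the full repo.
git clone -q --depth 1 "file://$mirror" "$work/src"
git -C "$work/src" rev-list --count HEAD    # prints 1, not 2

# zuul-cloner would then be invoked for everything *except* mediawiki/core:
echo "zuul-cloner --workspace $work/src ... mediawiki/extensions/Example"
```

On a real slave the clone URL would point at the local Gerrit replica, and `--branch` would carry the wmf branch when applicable.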
[05:38:42] (03PS2) 10Krinkle: Run global-teardown after jobs using npm-set-env [integration/config] - 10https://gerrit.wikimedia.org/r/199180 (https://phabricator.wikimedia.org/T90836) [05:38:56] * Krinkle is deploying ^ now [05:40:08] 10Continuous-Integration, 10MediaWiki-extensions-ContentTranslation, 5Patch-For-Review: QUnit tests for ContentTranslation fail - unable to merge commits - https://phabricator.wikimedia.org/T93510#1144250 (10santhosh) a:3santhosh [05:40:19] 10Continuous-Integration, 10MediaWiki-extensions-ContentTranslation, 5Patch-For-Review: QUnit tests for ContentTranslation fail - unable to merge commits - https://phabricator.wikimedia.org/T93510#1144251 (10santhosh) 5Open>3Resolved [05:44:19] 10Continuous-Integration: reduce copies of mediawiki/core in workspaces - https://phabricator.wikimedia.org/T93703#1144254 (10Krinkle) The outage is still ongoing essentially. gallium and lanthanum are taking turns running out of space. The immediate trigger was the localisation commits which triggered testexte... [05:50:52] 10Continuous-Integration: gallium and lanthanum disks full (tracking) - https://phabricator.wikimedia.org/T91211#1144267 (10Krinkle) [05:51:03] 10Continuous-Integration: gallium and lanthanum disks full (tracking) - https://phabricator.wikimedia.org/T91211#1076705 (10Krinkle) [05:51:04] 10Continuous-Integration: reduce copies of mediawiki/core in workspaces - https://phabricator.wikimedia.org/T93703#1144271 (10Krinkle) [05:51:49] 10Continuous-Integration: gallium and lanthanum disks full (tracking) - https://phabricator.wikimedia.org/T91211#1076705 (10Krinkle) 5Resolved>3Open a:5hashar>3None [05:52:39] 10Continuous-Integration, 5Patch-For-Review: Migrate all jobs to labs slaves - https://phabricator.wikimedia.org/T86659#1144276 (10Krinkle) [05:52:40] 10Continuous-Integration: gallium and lanthanum disks full (tracking) - https://phabricator.wikimedia.org/T91211#1076705 (10Krinkle) [05:53:53] (03CR) 10Krinkle: [C: 032] "Deployed." 
[integration/config] - 10https://gerrit.wikimedia.org/r/199180 (https://phabricator.wikimedia.org/T90836) (owner: 10Krinkle) [05:58:14] (03Merged) 10jenkins-bot: Run global-teardown after jobs using npm-set-env [integration/config] - 10https://gerrit.wikimedia.org/r/199180 (https://phabricator.wikimedia.org/T90836) (owner: 10Krinkle) [06:10:05] (03PS1) 10Krinkle: Create npm-setup.sh [integration/jenkins] - 10https://gerrit.wikimedia.org/r/199195 [06:13:07] (03CR) 10Krinkle: [C: 032] Create npm-setup.sh [integration/jenkins] - 10https://gerrit.wikimedia.org/r/199195 (owner: 10Krinkle) [06:13:40] (03Merged) 10jenkins-bot: Create npm-setup.sh [integration/jenkins] - 10https://gerrit.wikimedia.org/r/199195 (owner: 10Krinkle) [06:15:18] (03PS2) 10Mattflaschen: Pass MEDIAWIKI_CAPTCHA_BYPASS_PASSWORD to GettingStarted browser test [integration/config] - 10https://gerrit.wikimedia.org/r/194749 (https://phabricator.wikimedia.org/T91220) [06:15:55] (03CR) 10Mattflaschen: "Please review. The browser tests are currently not passing due to this." [integration/config] - 10https://gerrit.wikimedia.org/r/194749 (https://phabricator.wikimedia.org/T91220) (owner: 10Mattflaschen) [06:24:19] 10Continuous-Integration, 5Patch-For-Review, 7Regression: /tmp/npm-* directories left behind on Jenkins slaves - https://phabricator.wikimedia.org/T90836#1144315 (10Krinkle) 5Open>3Resolved [06:24:20] 10Continuous-Integration: Jenkins: Figure out long term solution for /tmp management - https://phabricator.wikimedia.org/T74011#1144316 (10Krinkle) [06:24:29] 10Continuous-Integration, 7Regression: /tmp/npm-* directories left behind on Jenkins slaves - https://phabricator.wikimedia.org/T90836#1068897 (10Krinkle) [06:25:26] 10Continuous-Integration: All? jobs failing gallium (work fine on lanthanum) - https://phabricator.wikimedia.org/T93713#1144322 (10Krinkle) 5Open>3Resolved a:3Krinkle Also relevant {T91211} [06:26:05] legoktm: OK. I'm off to bed. 
[06:26:13] o/ good night [06:26:17] Maybe we should disable l10n-bot from testing again for the time being until we fix a few of those bugs we filed today [06:26:32] Don't feel like doing this every day. [06:26:58] 06:20:15 ERROR [launcher]: Cannot start Chrome [06:26:59] 06:20:15 Can not find the binary google-chrome [06:27:01] what's this? [06:27:04] https://integration.wikimedia.org/ci/job/mwext-ContentTranslation-qunit/1708/console [06:27:12] Nikerabbit: Link? [06:27:39] https://gerrit.wikimedia.org/r/#/c/199198/ patch [06:30:35] (03PS1) 10Krinkle: Move CHROME_BIN from npm-set-env to global-set-env [integration/jenkins] - 10https://gerrit.wikimedia.org/r/199199 [06:30:48] (03CR) 10Krinkle: [C: 032] Move CHROME_BIN from npm-set-env to global-set-env [integration/jenkins] - 10https://gerrit.wikimedia.org/r/199199 (owner: 10Krinkle) [06:30:54] Nikerabbit: Thx [06:31:32] (03Merged) 10jenkins-bot: Move CHROME_BIN from npm-set-env to global-set-env [integration/jenkins] - 10https://gerrit.wikimedia.org/r/199199 (owner: 10Krinkle) [06:32:13] Deploying.. [06:43:53] Do we really use Chrome rather than Chromium? [06:46:33] !log freed ~6G on lanthanum by deleting mediawiki-extensions-zend* workspaces [06:46:35] Logged the message, Master [06:52:13] Nemo_bis: what do you mean really? [06:54:50] nobody seems to know the use ratio of chrome/chromium [06:55:33] if we assume they work the same, chromium could be used for ideological reasons.
or if we assume chrome is used more, chrome should be used for testing [06:55:47] Nikerabbit: I mean that it would be weird to deploy an unfree package [06:56:27] Forbidden by Labs rules, actually, so it's probably chromium [06:56:44] Nemo_bis: don't know about that, but we do browser testing in all browsers [06:56:52] (03PS1) 10Legoktm: Revert "Don't ignore l10n-bot in gate-and-submit pipeline" [integration/config] - 10https://gerrit.wikimedia.org/r/199201 [06:57:11] in fact, in the fix: [06:57:12] +export CHROME_BIN=`which chromium-browser` [06:57:24] :) [06:57:27] # Set CHROME_BIN for projects using karma-chrome-launcher as our slaves [06:57:31] # have Chromium instead of Chrome. [06:58:42] (03CR) 10jenkins-bot: [V: 04-1] Revert "Don't ignore l10n-bot in gate-and-submit pipeline" [integration/config] - 10https://gerrit.wikimedia.org/r/199201 (owner: 10Legoktm) [07:03:09] (03PS2) 10Legoktm: Revert "Don't ignore l10n-bot in gate-and-submit pipeline" [integration/config] - 10https://gerrit.wikimedia.org/r/199201 [07:05:51] (03PS3) 10Legoktm: Revert "Don't ignore l10n-bot in gate-and-submit pipeline" [integration/config] - 10https://gerrit.wikimedia.org/r/199201 [07:05:59] (03CR) 10Legoktm: [C: 032] Revert "Don't ignore l10n-bot in gate-and-submit pipeline" [integration/config] - 10https://gerrit.wikimedia.org/r/199201 (owner: 10Legoktm) [07:07:25] (03Merged) 10jenkins-bot: Revert "Don't ignore l10n-bot in gate-and-submit pipeline" [integration/config] - 10https://gerrit.wikimedia.org/r/199201 (owner: 10Legoktm) [07:08:03] !log deploying https://gerrit.wikimedia.org/r/199201 [07:08:06] Logged the message, Master [07:15:11] (03PS1) 10Legoktm: Temporarily ignore legoktm in gate-and-submit while he cleans up l10n-bot [integration/config] - 10https://gerrit.wikimedia.org/r/199204 [07:15:57] (03CR) 10Legoktm: [C: 032] Temporarily ignore legoktm in gate-and-submit while he cleans up l10n-bot [integration/config] - 10https://gerrit.wikimedia.org/r/199204 (owner: 10Legoktm) 
[07:17:29] (03Merged) 10jenkins-bot: Temporarily ignore legoktm in gate-and-submit while he cleans up l10n-bot [integration/config] - 10https://gerrit.wikimedia.org/r/199204 (owner: 10Legoktm) [07:17:51] !log deploying https://gerrit.wikimedia.org/r/199204 [07:17:54] Logged the message, Master [07:19:34] (03PS1) 10Legoktm: Revert "Temporarily ignore legoktm in gate-and-submit while he cleans up l10n-bot" [integration/config] - 10https://gerrit.wikimedia.org/r/199205 [07:19:46] (03CR) 10Legoktm: [C: 032] Revert "Temporarily ignore legoktm in gate-and-submit while he cleans up l10n-bot" [integration/config] - 10https://gerrit.wikimedia.org/r/199205 (owner: 10Legoktm) [07:21:10] (03Merged) 10jenkins-bot: Revert "Temporarily ignore legoktm in gate-and-submit while he cleans up l10n-bot" [integration/config] - 10https://gerrit.wikimedia.org/r/199205 (owner: 10Legoktm) [07:21:25] !log deploying https://gerrit.wikimedia.org/r/199205 [07:21:27] Logged the message, Master [07:57:54] (03PS2) 10Legoktm: Use generic jshint/jsonlint jobs for more repos [integration/config] - 10https://gerrit.wikimedia.org/r/199190 [08:01:47] (03CR) 10Legoktm: [C: 032] Use generic jshint/jsonlint jobs for more repos [integration/config] - 10https://gerrit.wikimedia.org/r/199190 (owner: 10Legoktm) [08:06:22] (03Merged) 10jenkins-bot: Use generic jshint/jsonlint jobs for more repos [integration/config] - 10https://gerrit.wikimedia.org/r/199190 (owner: 10Legoktm) [08:07:13] !log deployed https://gerrit.wikimedia.org/r/199190 [08:07:16] Logged the message, Master [08:34:08] (03PS1) 10Legoktm: Convert more repos to use generic jshint/jsonlint jobs [integration/config] - 10https://gerrit.wikimedia.org/r/199216 [08:35:39] !log restarting Jenkins for some plugins upgrades [08:35:41] Logged the message, Master [08:42:54] 7Blocked-on-RelEng, 6Release-Engineering, 6Multimedia, 6Scrum-of-Scrums, and 2 others: Create basic puppet role for Sentry - https://phabricator.wikimedia.org/T84956#1144467 (10Gilles) 
Excellent, P421 updated and sentry itself builds ok. Next I'm going to submit these debs for review. I expect that there m... [08:54:50] 10Continuous-Integration, 6operations, 3Continuous-Integration-Isolation, 7Upstream: Create a Debian package for NodePool - https://phabricator.wikimedia.org/T89142#1144478 (10hashar) [08:55:04] 10Continuous-Integration, 6operations, 7Blocked-on-Operations, 3Continuous-Integration-Isolation, and 2 others: Create a Debian package for Zuul - https://phabricator.wikimedia.org/T48552#1144479 (10hashar) [09:02:58] 10Continuous-Integration, 10MediaWiki-extensions-ContentTranslation, 5Patch-For-Review: QUnit tests for ContentTranslation fail - unable to merge commits - https://phabricator.wikimedia.org/T93510#1144497 (10hashar) Thank you @Krinkle for the investigation! [09:14:59] 10Continuous-Integration, 6operations: Jenkins: Re-enable lint checks for Apache config in operations-puppet - https://phabricator.wikimedia.org/T72068#1144521 (10hashar) Nobody bothered reviewing the patches I proposed back in October / November though :( I kind of lose interest in pushing this though. Quo... [09:41:44] zeljkof: yo [09:41:46] good morning [09:41:50] thanks for the +2s [09:42:05] aharoni: you and morning and you are welcome :) [09:42:30] see the "pale green" email [09:42:36] to me it's totally weird [09:42:47] although maybe I'm just misreading something in Jenkins [09:49:13] zeljkof: ^ [09:49:48] aharoni: saw it, it is strange, I have never seen something like that, I will investigate later today [09:49:58] zeljkof: thanks [09:50:43] 7Blocked-on-RelEng, 6Release-Engineering, 6Multimedia, 6Scrum-of-Scrums, and 2 others: Create basic puppet role for Sentry - https://phabricator.wikimedia.org/T84956#1144532 (10Gilles) Of course, I overlooked one very important issue: versions. Looking at Debian Jessie, I would have to downgrade 10 stock j...
[09:50:52] zeljkof: https://integration.wikimedia.org/ci/view/BrowserTests/view/VisualEditor/job/browsertests-VisualEditor-language-screenshot-os_x_10.10-firefox/build?delay=0sec [09:51:10] I added some languages yesterday, but I still don't see them in the default line in "Build with Parameters" [09:51:23] does it require an extra update step after merging the jenkins config patch? [09:51:25] aharoni: you have to deploy the job [09:51:45] aharoni: do you know how to do that? [09:51:51] no [09:51:56] and I'm not sure that I have a permission [09:57:11] aharoni: you do [09:57:25] if you can log in to jenkins [09:57:37] zeljkof: ok, and then? [09:57:56] aharoni: would you like to pair on it for 5 minutes now? [09:58:18] in five minutes [09:58:24] finishing something [09:58:37] aharoni: ok, ping me [10:01:45] zeljkof: in the meantime, https://gerrit.wikimedia.org/r/#/c/199215/ - the first actual use of padding [10:04:23] thanks [10:07:40] 10Continuous-Integration: reduce copies of mediawiki/core in workspaces - https://phabricator.wikimedia.org/T93703#1144542 (10hashar) From a discussion I had with @legoktm yesterday: When a job needs mediawiki/core and uses zuul-cloner, we could prepopulate mw from a local mirror on the same disk using somethin... 
[10:08:11] 7Blocked-on-RelEng, 6Release-Engineering, 6Multimedia, 6Scrum-of-Scrums, and 2 others: Create basic puppet role for Sentry - https://phabricator.wikimedia.org/T84956#1144543 (10Gilles) Posted the request for repo creation here https://www.mediawiki.org/wiki/Git/New_repositories/Requests [10:20:26] 6Release-Engineering, 6Engineering-Community: Lyon -> Annecy Transportation Info to RelEng Team - https://phabricator.wikimedia.org/T93686#1144574 (10Aklapper) If this task is just about researching and not booking: There are direct trains (though the earliest one is a bus): {F103499} [10:49:09] aharoni: add me to the reviewers of the latest VE commit [10:49:16] I do not see it in my dashboard [10:49:37] aharoni: found it in VE dashboard [10:49:38] https://gerrit.wikimedia.org/r/#/c/199238/ [11:13:09] 10Continuous-Integration, 10MediaWiki-extensions-MultimediaViewer, 6Multimedia, 5Patch-For-Review: Chromium 41.0.2272 (Ubuntu) mmv.ui.ProgressBar jumpTo()/hide() FAILED / animateTo() FAILED - https://phabricator.wikimedia.org/T93540#1144704 (10Gilles) a:3Gilles [11:13:16] 10Continuous-Integration, 10MediaWiki-extensions-MultimediaViewer, 6Multimedia, 5Patch-For-Review: Chromium 41.0.2272 (Ubuntu) mmv.ui.ProgressBar jumpTo()/hide() FAILED / animateTo() FAILED - https://phabricator.wikimedia.org/T93540#1139548 (10Gilles) p:5Triage>3Normal [11:13:24] 10Continuous-Integration, 10MediaWiki-extensions-MultimediaViewer, 6Multimedia, 3Multimedia-Sprint-2015-03-18, 5Patch-For-Review: Chromium 41.0.2272 (Ubuntu) mmv.ui.ProgressBar jumpTo()/hide() FAILED / animateTo() FAILED - https://phabricator.wikimedia.org/T93540#1139548 (10Gilles) [11:21:59] is jenkins asleep? maybe it was affected by the outage which happened earlier? 
[11:22:09] the patch I've just uploaded isn't getting picked up, it seems https://gerrit.wikimedia.org/r/#/c/199247/ [11:23:10] 10Beta-Cluster: beta-scap-eqiad runtime went from less than a minute to more than 10 minutes - https://phabricator.wikimedia.org/T93737#1144710 (10hashar) 3NEW [11:23:23] it just did :) not sure if you did something, at any rate, not a problem anymore [11:23:29] !log beta-scap-eqiad keeps regenerating l10n cache https://phabricator.wikimedia.org/T93737 [11:23:33] Logged the message, Master [11:24:00] 10Beta-Cluster: beta-scap-eqiad always rebuild l10n cache since March 17th causing build to take more than 10 minutes. - https://phabricator.wikimedia.org/T93737#1144717 (10hashar) [11:41:37] Yippee, build fixed! [11:41:37] Project browsertests-VisualEditor-language-screenshot-os_x_10.10-firefox » ilo,contintLabsSlave && UbuntuTrusty build #25: FIXED in 24 min: https://integration.wikimedia.org/ci/job/browsertests-VisualEditor-language-screenshot-os_x_10.10-firefox/LANGUAGE_SCREENSHOT_CODE=ilo,label=contintLabsSlave%20&&%20UbuntuTrusty/25/ [12:01:41] aharoni: coming to the meeting? 
[12:01:48] coming [12:55:35] 10Continuous-Integration, 7Browser-test-bug: language screenshot job for Persian (fa) seems to run correctly, but marked as failure - https://phabricator.wikimedia.org/T93742#1144827 (10Amire80) 3NEW [12:55:52] 10Continuous-Integration, 6operations, 7Blocked-on-Operations, 3Continuous-Integration-Isolation, and 2 others: Create a Debian package for Zuul - https://phabricator.wikimedia.org/T48552#1144836 (10akosiaris) [13:08:06] zeljkof: more Jenkins nonsense: https://phabricator.wikimedia.org/T93742 [13:08:29] aharoni: argh [13:08:32] will take a look later [13:09:09] zeljkof: not urgent, because THE CORRECT FILES ARE ACTUALLY UPLOADED [13:09:21] but none of us like seeing red :) [13:09:22] aharoni: :) [13:38:04] (03PS10) 10Hashar: Package python deps with dh-virtualenv [integration/zuul] (debian/precise-wikimedia) - 10https://gerrit.wikimedia.org/r/195272 (https://phabricator.wikimedia.org/T48552) [13:50:19] (03CR) 10Hashar: "Follow up to a pairing session I had with Filippo, I had a few todos:" [integration/zuul] (debian/precise-wikimedia) - 10https://gerrit.wikimedia.org/r/195272 (https://phabricator.wikimedia.org/T48552) (owner: 10Hashar) [13:53:36] 10Continuous-Integration, 6operations, 7Blocked-on-Operations, 3Continuous-Integration-Isolation, and 2 others: Create a Debian package for Zuul - https://phabricator.wikimedia.org/T48552#1144922 (10hashar) The Precise package has been polished following up the 1/1 with Filippo ( https://gerrit.wikimedia.o... 
[14:05:31] Project browsertests-VisualEditor-language-screenshot-os_x_10.10-firefox » zh-hans,contintLabsSlave && UbuntuTrusty build #26: FAILURE in 1 hr 8 min: https://integration.wikimedia.org/ci/job/browsertests-VisualEditor-language-screenshot-os_x_10.10-firefox/LANGUAGE_SCREENSHOT_CODE=zh-hans,label=contintLabsSlave%20&&%20UbuntuTrusty/26/ [14:15:20] (03CR) 10Hashar: [C: 04-1] "And we need to keep the python binary in the venv :( When using a symlink as of patchset 10, python no more lookup modules in the venv si" [integration/zuul] (debian/precise-wikimedia) - 10https://gerrit.wikimedia.org/r/195272 (https://phabricator.wikimedia.org/T48552) (owner: 10Hashar) [14:25:09] 10Beta-Cluster, 10RESTBase, 5Patch-For-Review: Update / maintain Beta Cluster restbase cluster: Up & working with VE - https://phabricator.wikimedia.org/T91102#1144990 (10mobrovac) [14:26:45] 10Beta-Cluster, 10RESTBase, 5Patch-For-Review: Update / maintain Beta Cluster restbase cluster: Up & working with VE - https://phabricator.wikimedia.org/T91102#1074329 (10mobrovac) 5Open>3Resolved As a temporary solution, I have installed cron scripts on both VMs which check for changes every 3 minutes,... [14:29:40] 10Beta-Cluster, 10MediaWiki-extensions-Capiunto, 5Patch-For-Review: Deploy Capiunto on beta - https://phabricator.wikimedia.org/T93418#1145005 (10hoo) 5Open>3Resolved a:3hoo Deployed and (briefly) verified. Yay. [14:30:07] 10Beta-Cluster, 10MediaWiki-extensions-Capiunto: Deploy Capiunto on beta - https://phabricator.wikimedia.org/T93418#1145008 (10hoo) [14:41:34] 10Continuous-Integration, 7Jenkins, 7Regression: Manually starting builds in Jenkins throws "java.lang.IndexOutOfBoundsException: Index: 0, Size: 0" - https://phabricator.wikimedia.org/T93321#1145026 (10hashar) Before Zuul migrated to Gearman, the jobs had to notify Zuul on start and completion. That is done... 
[14:42:46] 10Continuous-Integration, 7Jenkins, 7Regression: Manually starting builds in Jenkins throws "java.lang.IndexOutOfBoundsException: Index: 0, Size: 0" - https://phabricator.wikimedia.org/T93321#1134346 (10hashar) Note: meanwhile the stacktrace is surely annoying but is not causing any harm. [14:45:03] 10Continuous-Integration: browsertests: triggers for MobileFrontend - https://phabricator.wikimedia.org/T59560#1145029 (10hashar) [14:50:25] 10Continuous-Integration: browsertests: triggers for MobileFrontend - https://phabricator.wikimedia.org/T59560#1145046 (10hashar) 5Open>3declined We do not have the capacity yet to add browser tests on patchset proposals. The VE (T55691) and ULS (T54120) jobs have been removed because they were too slow and... [14:50:26] 10Continuous-Integration, 7Browser-Tests, 7Tracking: [project] trigger browser tests from Gerrit (tracking) - https://phabricator.wikimedia.org/T55697#541910 (10hashar) [14:50:41] Yippee, build fixed! [14:50:41] Project browsertests-Wikidata-SmokeTests-linux-firefox-sauce build #196: FIXED in 33 min: https://integration.wikimedia.org/ci/job/browsertests-Wikidata-SmokeTests-linux-firefox-sauce/196/ [14:50:55] 10Continuous-Integration, 7Browser-Tests, 7Tracking: [project] trigger browser tests from Gerrit (tracking) - https://phabricator.wikimedia.org/T55697#541910 (10hashar) [14:50:56] 10Continuous-Integration, 7Browser-Tests, 7Epic: Make browser tests voting for all repos of WMF deployed code - https://phabricator.wikimedia.org/T91669#1145063 (10hashar) [14:51:43] 10Continuous-Integration, 7Browser-Tests, 7Tracking: [project] trigger browser tests from Gerrit (tracking) - https://phabricator.wikimedia.org/T55697#541910 (10hashar) I have closed a few tasks that were requesting browser tests to be triggered on patch proposal. There is no point in keeping them open until...
[14:53:46] 10Continuous-Integration, 10MediaWiki-Unit-tests: Support running MediaWiki PHPUnit tests via composer - https://phabricator.wikimedia.org/T89626#1145067 (10hashar) 5Open>3Resolved Per Timo, the CI infrastructure work is done so there is not much point in keeping this task open. Whenever mediawiki/core is... [15:08:01] huh, lanthanum went down to zero free space again: https://graphite.wikimedia.org/render/?width=586&height=308&_salt=1427150513.262&target=servers.lanthanum.diskspace._srv_ssd.byte_free.value (it's back up, looks like legoktm deleted more things according to SAL) [15:11:00] grrr [15:11:30] greg-g: it still has 5% left :) [15:11:48] that is all due to mediawiki-core being cloned everywhere :((( [15:11:55] we have a task for it though [15:14:14] <^d> mw/core has gotten untenably large. [15:14:43] https://phabricator.wikimedia.org/T93703 [15:15:38] ^d: maybe it still holds all the objects of the wmf branches we got rid of [15:15:44] might need to repack it from time to time as well [15:15:52] <^d> I do repack it [15:15:53] <^d> Weekly [15:16:00] <^d> And of course we keep those objects, yay tags. [15:16:11] <^d> (we should split wmf-deployment from the actual repo tbh) [15:16:17] <^d> But it's not really the wmf branches [15:16:20] <^d> It's the freaking history [15:16:21] <^d> And i18n [15:16:24] <^d> Which you never lose [15:16:26] if the old branches are removed, the commits should be unreferenced and pruned [15:16:31] but yeah i18n is cumbersome [15:16:43] we should probably stop updating i18n on a daily basis [15:16:45] <^d> The old branches turn into tags, but really the branches aren't all that heavy [15:16:49] or even extract it out from core [15:16:54] <^d> We don't check a ton of stuff into the branches that doesn't go into master [15:17:44] <^d> Honestly most of it is crappy history from the rewrite. But meh [15:18:01] <^d> Which I'd love to rewrite and compact.
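The repack-and-prune cycle touched on above (a weekly repack, with commits left unreferenced by deleted branches eventually pruned) can be sketched on a throwaway repo. The branch name is illustrative; on the server this would run against the bare mediawiki/core repository, and the aggressive `--expire=now` flags are only for the demo.

```shell
#!/bin/sh
# Throwaway repo; `g` is a tiny helper so each command fits on one line.
repo=$(mktemp -d)
git -C "$repo" init -q
g() { git -C "$repo" -c user.name=ci -c user.email=ci@example.org "$@"; }

g commit -q --allow-empty -m 'commit to keep'
main=$(g symbolic-ref --short HEAD)
g checkout -q -b old-wmf-branch
g commit -q --allow-empty -m 'commit only on the old branch'
old=$(g rev-parse HEAD)
g checkout -q "$main"
g branch -q -D old-wmf-branch

# The deleted branch's commit is now unreferenced. Expire the reflogs that
# still mention it, then repack and prune so the object is truly reclaimed.
g reflog expire --expire=now --all
g repack -Adq
g prune --expire=now
g fsck --unreachable    # prints nothing once the commit is gone
```

This also illustrates ^d's point about tags: had the old branch been turned into a tag instead of deleted, the commit would stay referenced and no amount of repacking would shrink the repo.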
[15:18:06] <^d> But people were like sha1ssssss [15:18:09] <^d> So I gave up [15:18:13] yup [15:18:16] <^d> Instead we get a slow repo [15:18:17] sha1sssssssssssss [15:18:28] sha1s are like urls, cool sha1s don't change [15:18:31] once you have cloned it, it is not much of an issue [15:18:38] :) [15:18:41] <^d> greg-g: Yeah but shitty sha1s shouldn't have existed :p [15:18:50] on CI side we "just" have to be clever and stop cloning it fully on each job workspace [15:19:07] ^d: :) [15:19:18] depth=1? [15:19:27] <^d> zuul-cloner doesn't support that iirc [15:19:41] 10Continuous-Integration, 10Fundraising Tech Backlog, 10Wikimedia-Fundraising-CiviCRM, 3Fundraising Sprint Grandmaster Flash, 5Patch-For-Review: Write Jenkins job builder definition for CiviCRM CI job - https://phabricator.wikimedia.org/T91895#1145176 (10hashar) a:5hashar>3awight Reassigning to @awig... [15:19:48] nop [15:20:01] cause the aim of zuul cloner is for it to be used in a fresh instance that has nothing yet [15:20:13] and for openstack use case in mind which need a full clone [15:20:41] I posted a proposal at https://phabricator.wikimedia.org/T93703#1144542 [15:20:51] which is to maintain a local mirror of mediawiki/core on each instance [15:20:56] then just git clone --shared [15:21:04] that generates a 108KB .git dir on my machine [15:21:11] <^d> Not shared, --reference [15:21:31] ah yeah there is --reference as well [15:21:34] <^d> --shared is dangerous because writes to it will corrupt the shared .git dir [15:22:01] <^d> (all kinds of dangers with orphan commits and the like) [15:22:04] unless we either: don't write to the mirror (unlikely) [15:22:11] or: reclone every time [15:22:30] <^d> --reference gives you the same basic benefit (don't clone crap you don't need) but doesn't actually write back to it [15:23:08] <^d> --reference ends up writing the location of the alternative gitdir into refs/info/alternates [15:23:20] <^d> --shared just changes the location of the gitdir to a shared
location :) [15:23:29] ah yeah i remember now [15:23:39] <^d> gitdirs in refs/info/alternates are checked prior to fetch [15:23:40] I tried using --reference from a git mirror that belonged to another user [15:23:46] <^d> So you avoid the transfer & space consumption [15:23:54] and you can't create hardlinks for files that belong to someone else :/ [15:25:05] git clone --reference /home/hashar/projects/mediawiki/core /home/hashar/projects/mediawiki/core !!! [15:27:23] similar output https://phabricator.wikimedia.org/P426 [15:27:35] but --reference creates hardlinks [15:28:26] while --shared does not bother [15:28:37] (the .git dir is essentially empty) [15:29:16] anyway, we could remove the mediawiki/core .git directory, and reclone [15:29:18] on each job run [15:29:35] this way even if the source / mirror repo is written to (via git repack) we would be safe [15:30:32] ^d, hashar: sounds like there is need for LUv2 service which works without going via GIT :) [15:30:51] probably [15:32:15] Is such idea filed? [15:32:34] Nemo_bis: it was gsoc project last year [15:32:46] * Nemo_bis only knows of https://phabricator.wikimedia.org/T85458 [15:32:50] Nikerabbit: not my question :) [15:33:15] Nemo_bis: then I don't know what is your question [15:34:26] Nikerabbit: https://phabricator.wikimedia.org/search/query/Q9nU2qi.Nxim/#R [15:34:55] who spells it all out?
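The `git clone --reference` mechanics debated earlier can be demonstrated with throwaway repos standing in for the per-slave mediawiki/core mirror. One small correction to the chat: the alternates file actually lives under `objects/info/`, not `refs/info/`.

```shell
#!/bin/sh
# Throwaway repo standing in for the local mediawiki/core mirror.
mirror=$(mktemp -d)
git -C "$mirror" init -q
git -C "$mirror" -c user.name=ci -c user.email=ci@example.org \
    commit -q --allow-empty -m base

ws=$(mktemp -d)
# --reference records the mirror's object store in
# .git/objects/info/alternates, so objects already present there are
# neither transferred nor duplicated in the new clone.
git clone -q --reference "$mirror" "file://$mirror" "$ws/core"
cat "$ws/core/.git/objects/info/alternates"
```

As ^d notes, unlike `--shared` this never writes back to the mirror; the danger that remains is the mirror pruning objects the clone still borrows, hence the keep-the-mirror-append-only or reclone-each-run ideas above.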
l10n :) [15:35:13] Nemo_bis: https://phabricator.wikimedia.org/T48653 [15:36:03] it starts with some misleading stuff but essentially it talks about service [15:36:32] though, it doesn't mention no-git requirement [15:36:54] Nikerabbit: definitely :) [15:37:12] it has always killed me to see Raymond manually pushing all those changes to hundreds of repos on a daily basis [15:37:24] maybe all the l10n can be held in a shared repo like mediawiki/l10n [15:38:01] I am not volunteering though :( [15:38:10] we can just drop the middleman and do json over http (with all the possible security concerns included) [15:40:26] it sounds like something that could be prototyped in a hackathon [15:40:54] "though, it doesn't mention no-git requirement" --> so the answer is "no, it's not filed" [15:41:47] Nemo_bis: I guess we have different interpretation of "it" [15:41:59] Nemo_bis: but do file it, I can then propose that as a topic [15:43:26] we can add it to the list of RelEng related hackathon ideas: https://phabricator.wikimedia.org/T92565 [15:44:31] not opposed, though I should have more time at mexico [15:44:47] after a week of offsite with the team, I would probably reach out to volunteers [15:46:51] Nikerabbit: yeah, I bet I'll pull from that task for things not done to add to mexico's list :) [15:49:14] (03CR) 10Hashar: [C: 032] Customize Jenkins top left icon [integration/docroot] - 10https://gerrit.wikimedia.org/r/197910 (owner: 10Hashar) [15:51:14] wth: "Jenkins 100K" -- https://integration.wikimedia.org/ci/jenkins100k/ [15:52:03] greg-g: they reached 100k installed Jenkins [15:52:16] so to celebrate we have that nice picture :D [15:52:21] I see that, just, modifying our UI to celebrate? :) [15:52:36] at least that image is loaded locally [15:54:21] 6Release-Engineering, 6Engineering-Community: Lyon -> Annecy Transportation Info to RelEng Team - https://phabricator.wikimedia.org/T93686#1145271 (10Rfarrand) Thanks Andre! I found those already.
:) [15:54:26] stupid tests take more than 14 minutes to complete https://integration.wikimedia.org/ci/job/mediawiki-phpunit-zend/4076/console :( [15:54:45] (03PS1) 10Manybubbles: Add job for wikidata/query/rdf project [integration/config] - 10https://gerrit.wikimedia.org/r/199273 [15:57:20] 6Release-Engineering, 6Engineering-Community: Lyon -> Annecy Transportation Info to RelEng Team - https://phabricator.wikimedia.org/T93686#1145283 (10hashar) You can use their mobile interface which is much simpler. They of course have en localization available http://m.en.voyages-sncf.com/ As Gilles pointed... [15:57:25] (03CR) 10Manybubbles: [C: 04-1] Add job for wikidata/query/rdf project [integration/config] - 10https://gerrit.wikimedia.org/r/199273 (owner: 10Manybubbles) [15:58:12] (03PS2) 10Manybubbles: Add job for wikidata/query/rdf project [integration/config] - 10https://gerrit.wikimedia.org/r/199273 [15:58:56] Project beta-code-update-eqiad build #49014: FAILURE in 56 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/49014/ [15:59:13] (03CR) 10Manybubbles: [C: 04-1] Add job for wikidata/query/rdf project [integration/config] - 10https://gerrit.wikimedia.org/r/199273 (owner: 10Manybubbles) [15:59:43] (03PS3) 10Manybubbles: Add job for wikidata/query/rdf project [integration/config] - 10https://gerrit.wikimedia.org/r/199273 [16:01:00] (03Merged) 10jenkins-bot: Customize Jenkins top left icon [integration/docroot] - 10https://gerrit.wikimedia.org/r/197910 (owner: 10Hashar) [16:01:48] twentyafterfour: hashar ^d meeting ohai [16:01:54] <^d> Yeah, finishing up swat [16:01:57] * greg-g nods [16:02:03] bad timing with that I suppose [16:02:18] yeah yeah filing tasks :D [16:02:19] https://phabricator.wikimedia.org/T93761?workflow=create [16:03:22] 10Continuous-Integration, 10MediaWiki-Codesniffer, 10Possible-Tech-Projects, 3Google-Summer-of-Code-2015, 3Outreachy-Round-10: Improving static analysis tools for MediaWiki -
https://phabricator.wikimedia.org/T89682#1145301 (10NiharikaKohli) Hello! The IRC meeting tomorrow has been shifted to #wikimedia-... [16:04:35] (03CR) 10Manybubbles: [C: 031] "I don't hate this change now." [integration/config] - 10https://gerrit.wikimedia.org/r/199273 (owner: 10Manybubbles) [16:05:25] I see that deploying jenkins changes is three steps- 1. propose ^^^, 2. sync it to jenkins using, 3. sync it to zuul. [16:05:30] I'm pretty sure I can't do step 3 [16:24:28] 10Continuous-Integration, 7Browser-test-bug: language screenshot job for Persian (fa) seems to run correctly, but marked as failure - https://phabricator.wikimedia.org/T93742#1145398 (10Amire80) [16:36:29] 10Beta-Cluster, 10Deployment-Systems: beta-scap-eqiad always rebuild l10n cache since March 17th causing build to take more than 10 minutes. - https://phabricator.wikimedia.org/T93737#1145419 (10greg) [16:51:51] 10Beta-Cluster, 10Deployment-Systems: beta-scap-eqiad always rebuild l10n cache since March 17th causing build to take more than 10 minutes. - https://phabricator.wikimedia.org/T93737#1145513 (10mmodell) Possibly related: T92823 [16:54:33] (03CR) 10JanZerebecki: [C: 04-1] "I tried it by deploying and reverting afterwards." [integration/config] - 10https://gerrit.wikimedia.org/r/198699 (owner: 10Adrian Lang) [16:54:39] hello [16:56:44] <^d> o/ [16:57:00] manybubbles: looking [16:59:03] (03PS4) 10Legoktm: Add job for wikidata/query/rdf project [integration/config] - 10https://gerrit.wikimedia.org/r/199273 (owner: 10Manybubbles) [16:59:21] (03CR) 10Legoktm: "PS4: Added zuul config" [integration/config] - 10https://gerrit.wikimedia.org/r/199273 (owner: 10Manybubbles) [16:59:40] (03CR) 10Manybubbles: "Thanks. I didn't realize that was there." 
[integration/config] - 10https://gerrit.wikimedia.org/r/199273 (owner: 10Manybubbles) [17:00:03] legoktm: thanks [17:00:22] (03CR) 10Legoktm: [C: 032] Add job for wikidata/query/rdf project [integration/config] - 10https://gerrit.wikimedia.org/r/199273 (owner: 10Manybubbles) [17:00:43] 10Beta-Cluster, 10Deployment-Systems: beta-scap-eqiad always rebuild l10n cache since March 17th causing build to take more than 10 minutes. - https://phabricator.wikimedia.org/T93737#1145526 (10mmodell) But I don't see anything in that change that _should_ cause this problem. [17:01:33] 10Beta-Cluster, 10Deployment-Systems: beta-scap-eqiad always rebuild l10n cache since March 17th causing build to take more than 10 minutes. - https://phabricator.wikimedia.org/T93737#1145529 (10mmodell) Adding Bryan because he seems to be the only one who really understands all the moving pieces of this thing... [17:04:41] (03Merged) 10jenkins-bot: Add job for wikidata/query/rdf project [integration/config] - 10https://gerrit.wikimedia.org/r/199273 (owner: 10Manybubbles) [17:06:10] !log deploying https://gerrit.wikimedia.org/r/199273 [17:06:15] Logged the message, Master [17:07:56] manybubbles: https://integration.wikimedia.org/ci/job/wikidata-query-rdf/1/console does that failure look legit? [17:14:16] twentyafterfour: hmm.... I think I noticed in my local scap that l10n started being updated every run too... [17:14:39] Oh! I bet I know what did it actually [17:14:56] forgot to mention that i need to run my dog to the vet this morning. i'll be out for a bit [17:24:30] legoktm: it's something [17:31:24] 10Beta-Cluster, 10Deployment-Systems: beta-scap-eqiad always rebuild l10n cache since March 17th causing build to take more than 10 minutes. - https://phabricator.wikimedia.org/T93737#1145620 (10bd808) I think this was an unintended side effect of . Scap is now build... [17:32:40] Is there some problem with gerrit test runner?
getting failures all over like this: https://integration.wikimedia.org/ci/job/mwext-Wikibase-repo-tests/12990/console [17:37:20] o.O [17:38:01] SMalyshev: it's because git.wikimedia.org went down [17:38:16] it should auto-restart itself soon... [17:38:16] (03PS2) 10Adrian Lang: Switch wikidata qunit jobs from qunit to qunit-karma [integration/config] - 10https://gerrit.wikimedia.org/r/198699 [17:38:26] and those jobs should be using zuul-cloner instead of git.wm.o [17:38:35] (03PS3) 10Adrian Lang: Switch wikidata qunit jobs from qunit to qunit-karma [integration/config] - 10https://gerrit.wikimedia.org/r/198699 [17:41:10] manybubbles: I don't think jenkins-bot has submit permissions on the repo [17:41:30] * manybubbles shakes fist at jenkins-bot [17:41:50] ^d: can you fix ^^^^^. I've tried doing this like 3 times with gerrit but I can never see the jenkinsbot. [17:41:59] he's like a fucking butler ninja [17:49:28] wtf jenkins bot, you totally have permission to submit https://gerrit.wikimedia.org/r/#/admin/projects/wikidata/query/rdf,access [17:51:23] <^d> I just did it a few mins ago [17:58:37] 6Release-Engineering, 10Ops-Access-Requests, 6Phabricator, 6operations, 5Patch-For-Review: Mukunda needs sudo on iridium (phab host) - https://phabricator.wikimedia.org/T93151#1145918 (10RobH) This access request has been granted via ops meeting review. I'm owning this task and implementing it later today.
[18:02:01] 10Continuous-Integration: make Jenkins voting for the wikidata/query/rdf project - https://phabricator.wikimedia.org/T93601#1145929 (10Legoktm) a:5Legoktm>3Manybubbles [18:16:44] (03PS2) 10Legoktm: Convert more repos to use generic jshint/jsonlint jobs [integration/config] - 10https://gerrit.wikimedia.org/r/199216 [18:19:01] (03CR) 10Legoktm: [C: 032] Convert more repos to use generic jshint/jsonlint jobs [integration/config] - 10https://gerrit.wikimedia.org/r/199216 (owner: 10Legoktm) [18:23:26] (03Merged) 10jenkins-bot: Convert more repos to use generic jshint/jsonlint jobs [integration/config] - 10https://gerrit.wikimedia.org/r/199216 (owner: 10Legoktm) [18:23:31] Project beta-mediawiki-config-update-eqiad build #2162: FAILURE in 1.6 sec: https://integration.wikimedia.org/ci/job/beta-mediawiki-config-update-eqiad/2162/ [18:25:44] !log deploying https://gerrit.wikimedia.org/r/199216 [18:25:48] Logged the message, Master [18:30:41] 6Release-Engineering, 10Ops-Access-Requests, 6Phabricator, 6operations, 5Patch-For-Review: Mukunda needs sudo on iridium (phab host) - https://phabricator.wikimedia.org/T93151#1146054 (10RobH) 5Open>3Resolved https://gerrit.wikimedia.org/r/#/c/197798/ is now live, and @mmodell has the same sudo right... [18:30:55] 10Continuous-Integration, 10Wikidata, 7Technical-Debt: Remove dependency on git.wikimedia.org - https://phabricator.wikimedia.org/T74001#1146060 (10JanZerebecki) This regularly ensures that a ton of jobs fail. Could this get higher priority?
[18:39:48] 10Deployment-Systems, 6Release-Engineering, 6Services, 6operations: Streamline our service development and deployment process - https://phabricator.wikimedia.org/T93428#1146106 (10GWicke) [18:42:26] (03CR) 10JanZerebecki: [C: 04-1] "Fails: https://integration.wikimedia.org/ci/job/mwext-Wikibase-qunit/8221/console" [integration/config] - 10https://gerrit.wikimedia.org/r/198699 (owner: 10Adrian Lang) [18:44:53] (03PS1) 10Legoktm: Use generic jobs for VisualEditor/VisualEditor repo [integration/config] - 10https://gerrit.wikimedia.org/r/199305 [18:50:22] (03PS2) 10Legoktm: Use generic jobs for VisualEditor/VisualEditor repo [integration/config] - 10https://gerrit.wikimedia.org/r/199305 [18:50:46] (03CR) 10Legoktm: [C: 032] Use generic jobs for VisualEditor/VisualEditor repo [integration/config] - 10https://gerrit.wikimedia.org/r/199305 (owner: 10Legoktm) [18:56:48] (03Merged) 10jenkins-bot: Use generic jobs for VisualEditor/VisualEditor repo [integration/config] - 10https://gerrit.wikimedia.org/r/199305 (owner: 10Legoktm) [18:57:53] !log deploying https://gerrit.wikimedia.org/r/199305 [18:57:57] Logged the message, Master [19:00:06] 10Beta-Cluster, 10Deployment-Systems: beta-scap-eqiad always rebuild l10n cache since March 17th causing build to take more than 10 minutes. - https://phabricator.wikimedia.org/T93737#1146164 (10mmodell) a:3mmodell [19:07:12] i'm not seeing a /data/project/logs/exception.log on deployment-bastion, is there somewhere else i should be looking? [19:13:01] Hi, can anybody help me with QUnit.start() called twice via karma (happens on jenkins and locally) [19:13:05] ? [19:16:13] I described the issue in https://phabricator.wikimedia.org/T74063#1140354 and it's now happening on jenkins for our experimental karma job, too: https://integration.wikimedia.org/ci/job/mwext-Wikibase-qunit/8221/console [19:16:58] (03CR) 10Adrian Lang: "That's what I described in https://phabricator.wikimedia.org/T74063#1140354. Need to investigate." 
[integration/config] - 10https://gerrit.wikimedia.org/r/198699 (owner: 10Adrian Lang) [19:25:57] Yippee, build fixed! [19:25:57] Project browsertests-VisualEditor-production-linux-firefox-sauce build #54: FIXED in 1 hr 58 min: https://integration.wikimedia.org/ci/job/browsertests-VisualEditor-production-linux-firefox-sauce/54/ [19:30:03] Project browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #447: FAILURE in 8 min 54 sec: https://integration.wikimedia.org/ci/job/browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/447/ [19:51:07] 10Continuous-Integration, 6operations: fix failures of jenkins job operations-puppet-puppetlint-strict - https://phabricator.wikimedia.org/T93642#1146346 (10Dzahn) This is not really done though. [19:55:21] twentyafterfour: I have a patch for the l10n problem in scap [19:55:31] It's working for me locally [19:56:27] bd808: stop you [19:58:00] (03PS1) 10BryanDavis: Copy l10n CDB files to rebuildLocalisationCache.php tmp dir [tools/scap] - 10https://gerrit.wikimedia.org/r/199318 (https://phabricator.wikimedia.org/T93737) [19:59:29] greg-g: but I got to write code! [19:59:50] bd808: is April 15th your "last day" of being a RPM? [19:59:56] bd808: I'll review it [20:00:32] twentyafterfour: you should be able to cherry-pick it to beta and see if it makes the huge lag go away [20:00:47] greg-g: nope. Not unless there is a hiring miracle between now and then [20:01:00] :/ [20:01:16] meh. 
somebody has to help with this stuff [20:01:57] I honestly don't know how rob.la survived so long without some help like this [20:02:35] you haven't seen his blood pressure [20:02:38] * greg-g doesn't know [20:06:05] 10Continuous-Integration, 6Scrum-of-Scrums, 6operations, 7Blocked-on-Operations: Jenkins: Re-enable lint checks for Apache config in operations-puppet - https://phabricator.wikimedia.org/T72068#1146416 (10greg) [20:08:01] 10Continuous-Integration, 7Upstream: [upstream] Zuul cloner fails on extension jobs against a wmf branch - https://phabricator.wikimedia.org/T73133#1146424 (10greg) [20:08:21] 10Continuous-Integration, 6Scrum-of-Scrums, 6operations, 7Blocked-on-Operations: Jenkins is using php-luasandbox 1.9-1 for zend unit tests; precise should be upgraded to 2.0-7+wmf2.1 or equivalent - https://phabricator.wikimedia.org/T88798#1146425 (10greg) [20:15:13] Project browsertests-UploadWizard-commons.wikimedia.beta.wmflabs.org-linux-firefox-sauce build #560: FAILURE in 42 min: https://integration.wikimedia.org/ci/job/browsertests-UploadWizard-commons.wikimedia.beta.wmflabs.org-linux-firefox-sauce/560/ [20:20:02] bd808: I cherry picked it on beta but testing is another story...not enough disk space to run it [20:20:15] wtf [20:23:06] twentyafterfour: I cleaned up some junk in /tmp [20:23:17] It looks like there is 2G free [20:24:19] !log Deleted junk in deployment-bastion:/tmp [20:24:26] Logged the message, Master [20:25:34] those process accounting logs are such a waste of disk space [20:25:41] 10Staging: Create staging-jobrunner (Job runners!) 
- https://phabricator.wikimedia.org/T91550#1146456 (10greg) See also: {T76999} [20:26:56] bd808: it's var that's full [20:27:01] it's been a problem on that host for a while [20:27:07] stupid /var/ mount that's way too small [20:27:59] I'm tempted to symlink /var/lib/l10nupdate to a different location [20:28:00] !log deployment-bastion -- rm -r pacct.1.gz pacct.2.gz pacct.3.gz pacct.4.gz pacct.5.gz pacct.6.gz [20:28:05] Logged the message, Master [20:28:35] oh gawd, you should [20:28:56] was that just added to deployment-bastion recently? [20:29:56] twentyafterfour: I'd move that to /srv/l10nupdate and leave a symlink [20:30:14] ok [20:30:21] that full clone of MW plus extensions has no space on that tiny /var [20:31:07] !log sudo mv /var/lib/l10nupdate/ /srv/ [20:31:11] Logged the message, Master [20:31:38] sudo ln -s /srv/l10nupdate/ /var/lib/ [20:31:40] er [20:31:42] !log sudo ln -s /srv/l10nupdate/ /var/lib/ [20:31:45] Logged the message, Master [20:32:54] 10Staging: Create staging-pc* (parsercache) - https://phabricator.wikimedia.org/T93806#1146470 (10greg) 3NEW [20:49:16] bd: I got a whole bunch of these when running l10nupdate ... [20:49:18] Warning: LU_Updater::readMessages: Unable to parse messages from file:///mnt/srv/mediawiki-staging/php-master/extensions/DonationInterface/gateway_common/interface.i18n.php in /mnt/srv/mediawiki-staging/php-master/extensions/LocalisationUpdate/Updater.php on line 63 [20:49:25] bd808: ^ [20:49:40] normal [20:50:00] ok cool [20:50:19] There are stub php i18n files left over from the conversion to json for back compat [20:50:25] they cause those warnings [20:51:12] Are you sure you want to continue connecting (yes/no)? The authenticity of host 'deployment-mediawiki02.eqiad.wmflabs (10.68.16.127)' can't be established. [20:51:14] ECDSA key fingerprint is be:3a:10:b8:c3:64:23:e4:99:0d:1c:98:f0:be:45:88. [20:51:45] from the dsh step?
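The fix above (relocating /var/lib/l10nupdate to the roomier /srv and leaving a symlink behind so existing consumers keep working) is a common pattern for a too-small mount. A minimal Python sketch of the same idea follows; the function name and the idempotency guard are my own additions (config management tends to re-run such steps), not the exact commands run on deployment-bastion:

```python
import os
import shutil

def relocate_with_symlink(src: str, dest_dir: str) -> str:
    """Move a directory off a cramped mount into dest_dir, then leave a
    symlink at the old path so existing consumers still resolve it."""
    if os.path.islink(src):
        return os.readlink(src)  # already relocated; do nothing (idempotent)
    target = os.path.join(dest_dir, os.path.basename(src.rstrip("/")))
    shutil.move(src, target)  # like: sudo mv /var/lib/l10nupdate/ /srv/
    os.symlink(target, src)   # like: sudo ln -s /srv/l10nupdate/ /var/lib/
    return target
```

The guard matters because puppet (or a second operator) may run the step again; without it, the second `mv` would push the directory down a level.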
[20:52:13] if it's from sync-dir that's normal in beta too [20:52:45] the local hack on top of the scap code is to ignore certs [20:52:47] I just ran l10nupdate [20:53:16] Inside l10nupdate it uses sync-dir and dsh [20:53:35] If it didn't halt that's the sync-dir [20:54:03] yeah [20:54:21] the patch I made won't change l10nupdate. It will only speed up a full scap [20:54:31] l10nupdate is its own weird beast [20:54:37] oh [20:54:40] and not invoked by scap [20:54:52] although we talk about the l10nupdate step in scap [20:55:02] but confusingly it's a different step [20:55:11] in a different script [20:55:14] hah [20:56:03] the /usr/local/bin/l10nupdate script merges the latest HEAD l10n messages into the deployed branches on tin and then syncs the resulting CDB files to the MW servers [20:56:13] It runs from cron at ~03:00 each day [20:56:45] The bit I touched is the l10n cache rebuild that is called during a full scap (not by sync-*) [20:57:07] It updates the same CDB cache files based on the contents of the release branches [20:58:05] so if a new branch is going out it makes the initial CDBs and otherwise it checks to see that the CDBs have messages that are not older than the ones in the local l10n json files [20:58:38] Tim's change moved this CDB build from happening inline in the php-1.XwmfY cache dirs to a tmp dir [20:59:03] and I just added copying the current cdb files to that tmp dir before trying to update them [20:59:41] otherwise the full files are rebuilt on each scap which causes a lot of disk io that is slow as crap on deployment-bastion [21:01:15] The disk io is caused by the CDB update process which is transactional by forcing a disk sync on each key update/insert [21:01:44] In a much more awesome world we would create the CDBs in a ramdisk/tmpfs [21:01:53] 10Deployment-Systems, 6Services, 6operations: Evaluate Docker as a container deployment tool - https://phabricator.wikimedia.org/T93439#1146530 (10GWicke) [21:02:00] and make the cost of that
io very very low [21:03:39] hello :) [21:03:55] o/ hashar [21:07:04] bd808: thank you for the explanations [21:07:19] there might be a problem with extension updates on deployment-bastion, in /srv/mediawiki-staging/php-master/extensions/Flow a `git log HEAD...origin/master` lists 3 patches that are not in HEAD. these are causing browser tests to fail . [21:07:26] the patches in question were merged ~4hrs ago [21:08:22] twentyafterfour: bd808 you two should pair and write down that doc somewhere :D [21:08:28] maybe as sphinx doc in scap ! [21:09:16] yeah agreed, I'll make an attempt at documenting it [21:09:22] also [21:09:31] I noticed that the scap on beta cluster no more sync to rsync proxies [21:09:40] is there just a cron job that pulls the latest HEAD for extensions? [21:09:40] I assume there is no use for a rsync proxy [21:09:56] we have a jenkins job that brute force git submodule update --init [21:10:12] https://integration.wikimedia.org/ci/view/Beta/ [21:10:38] https://integration.wikimedia.org/ci/view/Beta/job/beta-code-update-eqiad/ <--- here is the lame beast [21:11:16] and it fails bah [21:13:05] hashar: Can you help me with karma for wikibase? [21:13:34] bd808: hashar: file ownership issue on deployment-bastion: https://phabricator.wikimedia.org/P428 [21:13:53] oh nice [21:14:08] twentyafterfour: so yeah the group used for deployment has been changed on March 17 [21:14:17] I think I commented about it on the task [21:14:21] Yippee, build fixed! [21:14:22] Project beta-code-update-eqiad build #49045: FIXED in 1 min 20 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/49045/ [21:14:59] !log beta: deleted untracked file /srv/mediawiki-staging/php-master/extensions/.gitignore . 
That fixed the Jenkins job https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/ [21:15:04] Logged the message, Master [21:15:51] hieradata/labs/deployment-prep/common.yaml:"role::deployment::server::deployment_group": 'project-deployment-prep' [21:17:01] Did Jenkins get murdered again? [21:18:29] 10Beta-Cluster, 10Deployment-Systems, 5Patch-For-Review: beta-scap-eqiad always rebuild l10n cache since March 17th causing build to take more than 10 minutes. - https://phabricator.wikimedia.org/T93737#1146586 (10hashar) Earlier this morning I found a potential culprit with https://gerrit.wikimedia.org/r/#/... [21:18:43] twentyafterfour: commented on the task: https://phabricator.wikimedia.org/T93737#1146586 [21:20:51] marktraceur: what is happening ? [21:21:16] twentyafterfour: you can wipe out all info-extensions-* files and the next scap will rebuild them [21:21:44] Not sure what the right owner/perms are after all the ^d + YuviPanda changes in the last month [21:21:48] marktraceur: https://gerrit.wikimedia.org/r/#/c/198851/ has been force merged. And Zuul deadlocks in such a case [21:21:57] marktraceur: will resume in up to 5 minutes [21:22:55] -.- [21:22:56] !log Zuul gate is deadlocked for up to half an hour due to change being force merged :( [21:23:00] Logged the message, Master [21:23:17] Excuse me while I go find some cardboard upon which to take out my frustration [21:23:33] marktraceur: we noticed a bug in Zuul yesterday :D [21:24:02] marktraceur: whenever a change is force merged, Zuul has trouble merging the change :) [21:24:45] hashar: you can override the ownership setting by adding that deployment group line here: https://wikitech.wikimedia.org/wiki/Hiera:Deployment-prep [21:24:58] hashar: the scap l10n slowness was definitely caused by https://gerrit.wikimedia.org/r/#/c/197262/.
twentyafterfour is testing my patch for it now I think [21:25:01] I think it was mainly a side-effect of wanting to combine beta and prod roles [21:25:02] thcipriani: yeah though I guess the change has been for some reason :) [21:25:12] I am not sure I want to change the group back to wikidev hehe [21:25:33] bd808: awesome! [21:25:40] wikidev is the prod group which makes it all a bit confusing [21:25:50] (that the group changed) [21:26:17] right, unless the prod role was being used, I guess... [21:26:30] those weren't configurable before iirc [21:28:30] hashar: thanks for fixing that, i noticed now that /srv/mediawiki/... on deployment-bastion01 has the right code, but deployment-mediawiki01 (and maybe others) still have the wrong code. I was thinking once it moved from mediawiki-staging to mediawiki i can expect it to be on all the servers? [21:28:50] bd808: hashar Here you go: https://gerrit.wikimedia.org/r/#/c/195340/27/manifests/role/deployment.pp [21:35:12] 10Continuous-Integration: Change force merged cause a deadlock in Zuul gate-and-submit pipeline - https://phabricator.wikimedia.org/T93812#1146625 (10hashar) 3NEW [21:35:30] 10Continuous-Integration: Change force merged cause a deadlock in Zuul gate-and-submit pipeline - https://phabricator.wikimedia.org/T93812#1146632 (10hashar) [21:35:46] marktraceur: here is the issue you noticed https://phabricator.wikimedia.org/T93812 [21:37:10] ebernhardson: the code update is a two-stage process. At first we brute-force a git pull https://integration.wikimedia.org/ci/view/Beta/job/beta-code-update-eqiad/ [21:37:53] ebernhardson: that then triggers a build of the job 'beta-scap-eqiad' which syncs the code to the other instance.
The job is working but slow because it currently regenerates the whole l10n cache on each run [21:38:41] ebernhardson: we have a wiki page which should be reasonably up to date https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/How_code_is_updated :) [21:42:16] !log Reconfigured [https://integration.wikimedia.org/ci/view/Beta/job/mediawiki-core-code-coverage/ mediawiki-core-code-coverage] [21:42:21] Logged the message, Master [21:42:24] !sal [21:42:24] https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [21:45:25] ^d: can you make jenkins merge https://gerrit.wikimedia.org/r/#/c/198723/ ? he's stuck [21:45:44] he is allowed to merge, I think, but gerrit's permission model makes it impossible for me to tell [21:46:36] legoktm: he keeps trying, I think [21:47:14] ok then [21:47:23] :D [21:47:29] I suppose he just needed more +2s [21:47:48] he did apparently [21:47:55] manybubbles: when you commented "recheck" it only re-runs the tests, it doesn't try to merge again. So you just had to remove your +2 and then re-apply it [21:48:06] ah [21:48:18] yeah what kunal says :) [21:48:29] though we can tweak it [21:55:16] !log Ran trebuchet for scap to keep cherry-pick of I01b24765ce26cf48d9b9381a476c3bcf39db7ab8 on top of active branch; puppet was forcing back to prior trebuchet sync tag [21:55:21] Logged the message, Master [21:57:06] I'm not sure what changed in the puppet automation of trebuchet that keeps updating the git clone on deployment-bastion but it is annoying [21:57:38] I think it's probably that the mediawiki host role is applied there somehow but I haven't gone hunting to prove that [21:59:04] twentyafterfour: The beta-code-update-eqiad job that is running right now just caught a batch of translations from translatewiki so I expect the next scap to continue to be slow. Hopefully the one after that will go back to being fast.
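For context on bd808's fix (https://gerrit.wikimedia.org/r/199318) discussed above: the slow part of a scap run was regenerating every l10n CDB file from scratch, so the patch seeds the temporary rebuild directory with the existing CDB files first, letting the rebuild skip anything that is not stale. A simplified sketch of that copy-then-update pattern follows; the file names, the `sources` mapping, and the stand-in "rebuild" step are hypothetical, not scap's actual code:

```python
import os
import shutil

def rebuild_caches(cache_dir: str, tmp_dir: str, sources: dict) -> list:
    """Seed tmp_dir with the current cache files, then rebuild only the
    entries whose source file is newer than the staged cache file.
    `sources` maps a cache file name to its source file path (hypothetical)."""
    os.makedirs(tmp_dir, exist_ok=True)
    rebuilt = []
    for name, source in sources.items():
        cached = os.path.join(cache_dir, name)
        staged = os.path.join(tmp_dir, name)
        if os.path.exists(cached):
            # copy2 preserves the mtime, so the staleness check below works
            shutil.copy2(cached, staged)
        if not os.path.exists(staged) or os.path.getmtime(source) > os.path.getmtime(staged):
            # expensive step: stand-in for regenerating a CDB file
            with open(source) as f, open(staged, "w") as out:
                out.write(f.read().upper())
            rebuilt.append(name)
    return rebuilt
```

Without the seeding copy, every entry looks missing and gets the expensive rebuild on each run, which is exactly the disk-io storm described earlier in the channel.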
[22:12:38] (03PS3) 10Krinkle: Move mwext-*-testextension-zend to UbuntuPrecise slaves in labs [integration/config] - 10https://gerrit.wikimedia.org/r/198770 (https://phabricator.wikimedia.org/T93143) [22:12:44] (03PS3) 10Krinkle: Migrate prepare-mediawiki to MySQL (affects testextension + mwext-qunit) [integration/config] - 10https://gerrit.wikimedia.org/r/198773 (https://phabricator.wikimedia.org/T37912) [22:15:51] 10Continuous-Integration: Change force merged cause a deadlock in Zuul gate-and-submit pipeline - https://phabricator.wikimedia.org/T93812#1146782 (10hashar) I have looked at the debug log for change 198885,1 2015-03-24 21:30:19,442 DEBUG zuul.Gerrit: Checking if change is merge... [22:16:28] Krinkle: yesterday we had some changes held in gate-and-submit. It happened again fairly recently and I managed to get a trace :) [22:16:40] 10Quality-Assurance, 10MediaWiki-extensions-ZeroPortal, 6Zero-Team: ZeroPortal browser tests duplicate mediawiki_selenium's LoginPage object - https://phabricator.wikimedia.org/T85649#1146785 (10DFoy) [22:16:46] Krinkle: looked a bit at the code and something seems racy in zuul code ( https://phabricator.wikimedia.org/T93812 ) [22:17:31] Krinkle: also regenerated the mediawiki core code coverage code, the label was wrong. Presumably it got edited manually :) That is all for tonight, I am in bed [22:17:44] hashar: What about current zuul status? It looks like an hour old job is on top and stuff is blocked. [22:18:18] bah [22:18:28] mwext-Gather-jslint is queued but won't run. [22:19:16] * hashar blames jenkins [22:22:42] (03CR) 10Krinkle: "I'd like to propose reverting this. The logo is too cramped, and too dark. It doesn't belong in such a tiny space and doesn't add much value." [integration/docroot] - 10https://gerrit.wikimedia.org/r/197910 (owner: 10Hashar) [22:23:15] (03CR) 10Hashar: "That is merely a proof of concept to show the Jenkins ui can be customized. Not much more!"
[integration/docroot] - 10https://gerrit.wikimedia.org/r/197910 (owner: 10Hashar) [22:24:08] (03PS1) 10Krinkle: Revert "Customize Jenkins top left icon" [integration/docroot] - 10https://gerrit.wikimedia.org/r/199513 [22:24:36] (03CR) 10Krinkle: "Okay. Let's re-use it in the future if we have a good branding for it." [integration/docroot] - 10https://gerrit.wikimedia.org/r/197910 (owner: 10Hashar) [22:24:43] (03CR) 10Krinkle: [C: 032] Revert "Customize Jenkins top left icon" [integration/docroot] - 10https://gerrit.wikimedia.org/r/199513 (owner: 10Krinkle) [22:25:13] !log marked gallium and lanthanum slaves as temp offline, then back. Seems to have cleared some Jenkins internal state and resumed the build [22:25:17] Logged the message, Master [22:26:09] twentyafterfour: w00t -- https://integration.wikimedia.org/ci/view/Beta/job/beta-scap-eqiad/46274/console -- <2m scap again [22:26:12] 10Beta-Cluster, 6operations, 7Blocked-on-Operations, 7Puppet: Setup a mediawiki03 (or what not) on Beta Cluster that we can direct the security scanning work to - https://phabricator.wikimedia.org/T72181#1146865 (10greg) [22:26:20] (03Merged) 10jenkins-bot: Revert "Customize Jenkins top left icon" [integration/docroot] - 10https://gerrit.wikimedia.org/r/199513 (owner: 10Krinkle) [22:26:22] twentyafterfour: bd808 congratulations! [22:27:25] awesome [22:32:09] Krinkle: you should really stop being so nitpicky [22:34:10] twentyafterfour: I suppose you should merge and update in prod before you do the train deploy tomorrow. It should help some there too but likely not quite as noticeably [22:34:38] I would expect it to save 5 minutes or so on tin [22:48:44] 6Release-Engineering, 6Phabricator, 10Wikimedia-Git-or-Gerrit: Gerritbot shouldn't post "Change merged by jenkins-bot:" messages any more - https://phabricator.wikimedia.org/T91766#1146925 (10QChris) Phabricator only tells you about merged commits that mention a task //somewhere// in the commit message.... 
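A toy model of the deadlock hashar filed as T93812 above: a gate pipeline holds the head change until it can merge it, but a change that was already force merged in Gerrit will never be merged by Zuul, so everything queued behind it stalls. The sketch below is entirely hypothetical and is not Zuul's implementation; it only illustrates the defensive check (dequeue head items that turn out to be already merged) that would avoid the stall:

```python
from collections import deque

def process_gate(queue: deque, is_merged) -> list:
    """Drain a gate queue in order. A healthy head item is submitted;
    an item that is already merged (e.g. force merged in Gerrit) is
    dropped so it cannot block everything queued behind it."""
    actions = []
    while queue:
        change = queue.popleft()
        if is_merged(change):
            # without this check, the head would wait forever for a
            # merge event that can never happen
            actions.append(("dequeue", change))
        else:
            actions.append(("submit", change))
    return actions
```

`is_merged` stands in for a query against the code review system; the interesting property is simply that an already-merged head is removed rather than waited on.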
[22:56:25] ^d: / marxarelli: are you wfh on Thursdays or in office? [22:56:31] * greg-g always forgets [23:07:37] <^d> Thursdays? Office [23:08:20] grumble, there's no rooms available [23:09:57] oh, look at that, you have your wfh info in your calendar, handy [23:15:40] Is Jenkins stuck again? [23:22:50] Krinkle, when a job completes successfully, is everything behind it in the queue supposed to reset? [23:24:25] that appears to be what is happening [23:24:32] it just did it again [23:25:54] Krenair: reset in what way? [23:26:00] requeue [23:26:15] when it completes successfully, why would it need to reset? [23:26:35] I'm not sure I follow [23:26:51] Exactly [23:26:54] It's doing it anyway [23:26:59] I don't think it should be [23:27:22] The one on top of the queue, when finished, will go away and the second one will move to the top. This is purely a visualisation though, nothing actually moves anywhere. [23:27:30] right [23:27:39] but everything else resets to the beginning and starts testing again [23:28:04] Hm.. you mean a change in the gate queue finished and then the one below restarted? [23:28:15] right [23:28:16] all of the ones below it restarted [23:28:25] That should only happen in case of a failure. [23:28:40] Let's see what this wikigrok and core change do [23:28:43] * greg-g nods [23:29:06] it did it again [23:29:09] https://gerrit.wikimedia.org/r/#/c/199522/ completed [23:29:23] everything below reset [23:29:53] Hm.. [23:29:57] let's exponentially increase the work we do! :) [23:30:03] might be due to the fact that it was already merged? [23:30:04] There was a phpcs non-voting job failing [23:30:07] s/we do/jenkins does/ [23:30:10] and therefore failed to actually perform the merge? [23:30:10] should not matter but could be the bug [23:30:15] let's see what the next one does [23:30:26] this next one is also already merged [23:30:36] Well, no. Changes already merged cause Zuul to get confused. [23:30:41] This is a bug Antoine filed today.
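For background on the "reset" Krenair describes above: in a Zuul-style dependent queue, every change is tested on top of the changes ahead of it, so when a change ahead fails and is ejected, the changes behind it must restart their jobs against the new, shorter ancestor chain. A success should restart nothing; restarts after a success are the symptom of the bug being discussed. Here is a toy simulation of the intended behavior (my own sketch, not Zuul's code):

```python
def run_gate(changes, failing):
    """Simulate a dependent (gate) queue. Each change is tested on top of
    the changes ahead of it. A failure ejects the change and relaunches
    only the changes behind it; a success relaunches nothing.
    Returns every test run as (change, ancestors)."""
    queue = list(changes)
    runs = [(c, tuple(queue[:i])) for i, c in enumerate(queue)]
    bad = set(failing)
    while bad & set(queue):
        idx = min(i for i, c in enumerate(queue) if c in bad)
        bad.discard(queue[idx])
        queue = queue[:idx] + queue[idx + 1:]  # eject the failed change
        # everything that sat behind the failure restarts with new ancestors
        runs += [(c, tuple(queue[:i])) for i, c in enumerate(queue) if i >= idx]
    return runs
```

Note the cost asymmetry: an all-success window launches each job exactly once, while every failure (or a change Zuul wrongly treats as failed, such as one merged out from under it) multiplies the work for everything behind it.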
[23:30:45] Seems to be a regression somewhere. [23:30:56] So much is happening at once, I'm losing track. [23:31:02] (also, people force merging, then making the issue (slowness) even worse....) [23:31:31] Anything involving force merging is a separate story altogether. If it resets there that'd be one way for Zuul to handle that exception. [23:31:51] I'll force restart so we can try anew with those not merged [23:32:24] !log Force restart Zuul [23:32:28] Logged the message, Master [23:32:58] wait... [23:33:23] oh, ok [23:40:58] One more minute.. [23:53:42] Krenair: How did those jobs go, no reset? [23:53:50] I got distracted there for a minute. [23:54:14] I didn't see what happened to the one next in the queue [23:54:45] ok [23:57:16] (03PS4) 10Krinkle: Move mwext-*-testextension-zend to UbuntuPrecise slaves in labs [integration/config] - 10https://gerrit.wikimedia.org/r/198770 (https://phabricator.wikimedia.org/T93143)