[00:01:53] 10Gerrit, 10MediaWiki-Vagrant, 13Patch-For-Review: "index-pack failed" when installing new MediaWiki-Vagrant box - https://phabricator.wikimedia.org/T152801#2984083 (10bd808) >>! In T152801#2984055, @Paladox wrote: > Oh ok, i guess then the only thing here is to build from source as suggested here https://bu...
[00:04:03] 10Gerrit, 10MediaWiki-Vagrant, 13Patch-For-Review: "index-pack failed" when installing new MediaWiki-Vagrant box - https://phabricator.wikimedia.org/T152801#2984088 (10Paladox) @bd808 hi, we upgraded to gerrit 2.13 just near that date. See https://gerrit.wikimedia.org/r/#/c/323545/ and https://gerrit.wikimed...
[00:15:44] 10Gerrit, 10MediaWiki-Vagrant, 13Patch-For-Review: "index-pack failed" when installing new MediaWiki-Vagrant box - https://phabricator.wikimedia.org/T152801#2984115 (10Paladox) >>! In T152801#2984083, @bd808 wrote: >>>! In T152801#2984055, @Paladox wrote: >> Oh ok, i guess then the only thing here is to buil...
[00:16:21] 10Gerrit, 10MediaWiki-Vagrant, 13Patch-For-Review: "index-pack failed" when installing new MediaWiki-Vagrant box - https://phabricator.wikimedia.org/T152801#2984116 (10Paladox) Also do you curl the commit-msg too?
[00:44:25] 10Gerrit, 10MediaWiki-Vagrant, 13Patch-For-Review: "index-pack failed" when installing new MediaWiki-Vagrant box - https://phabricator.wikimedia.org/T152801#2984138 (10Tgr) I think the realistic alternatives at this point are shallow cloning and using GitHub. Shallow cloning breaks git log and blame (and ma...
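Tgr's point above about shallow clones can be demonstrated end to end with throwaway local repos (every path and repo name below is an illustration, not the actual mediawiki/core clone from the task):

```shell
# Demonstrates why a shallow clone "breaks git log and blame": only one
# commit of history is visible until the clone is unshallowed.
# Throwaway repos in a temp dir; nothing here touches gerrit.wikimedia.org.
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q src
git -C src -c user.name=t -c user.email=t@t commit -q --allow-empty -m first
git -C src -c user.name=t -c user.email=t@t commit -q --allow-empty -m second
git clone -q --depth 1 "file://$tmp/src" shallow   # file:// forces a real shallow fetch
cd shallow
git rev-list --count HEAD    # prints 1: git log/blame only see the tip
git fetch -q --unshallow     # pulls down the rest of the history
git rev-list --count HEAD    # prints 2: full history restored
```

In a real checkout, `git fetch --unshallow` against the server would be the escape hatch if the full history is needed later.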
[01:16:03] Phab admins, we've got a spammer, see #-dev
[01:16:47] MaxSem: Looks like about 50 tickets need to get purged
[01:16:58] at least these: https://phabricator.wikimedia.org/p/GuerellaNuke23/
[01:17:15] and these: https://phabricator.wikimedia.org/p/SimonWalkerAlt/
[01:17:48] greg-g: ryasmeen ^^
[01:20:14] MaxSem: I'm already on it
[01:20:32] ostriches: Thanks for blocking account creation (didn't realize you were user:demon on phab)
[01:20:43] That *should* handle it for now
[01:20:46] I hope
[01:21:17] ostriches: kinda vague question to ask, but any other wikis that you think might have this kind of loop hole?
[01:21:23] ostriches: seems like wikitech doesn't have torblock on
[01:22:14] Wikitechwiki isn't used for Phab login
[01:22:20] It's mw.org (so SUL'd wikis)
[01:24:03] ostriches: PMed you instead of chan chat
[01:53:51] !log https://integration.wikimedia.org/zuul/ showing huge backlogs but https://integration.wikimedia.org/ci/ looks mostly idle
[01:53:54] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[01:54:49] legoktm: about? I could use some help figuring out what has zuul all gummed up
[01:55:00] Ish
[01:55:07] jobs suck for 3+ hours :/
[01:55:10] Hmm
[01:55:27] There were issues with nodepool not deleting slaves I think earluer
[01:55:30] Earlier*
[01:55:42] yeah... looks like nodepool stupidity
[01:55:52] *-jessie queued
[01:56:14] and no jessie exec nodes pooled
[01:56:19] https://integration.wikimedia.org/ci/computer/
[01:56:30] No nodepool instances
[01:56:49] I fucking hate nodepool
[01:59:22] !log nodepool is full of instance stuck in "delete"
[01:59:24] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[02:00:03] https://phabricator.wikimedia.org/T156636
[02:00:40] blerg
[02:01:57] chasemp: about? We've got a pile of fscked noodppol instances stuck in delete.
Possibly related to T156636
[02:01:57] T156636: Labs instance ci-jessie-wikimedia-498353 can not be deleted - https://phabricator.wikimedia.org/T156636
[02:03:41] and of course we have made this unstable pile of nodepool critical to pretty much every zuul job queue at this point instead of ripping it out entirely
[02:05:08] It's ok, we're going to just add more nodes ;-)
[02:05:34] 10Continuous-Integration-Infrastructure, 06Labs, 10Labs-Infrastructure: Labs instance ci-jessie-wikimedia-498353 can not be deleted - https://phabricator.wikimedia.org/T156636#2981991 (10bd808) The whole pool is full of instances stuck in delete now. Here's a bit more info: ``` nodepool@labnodepool1001:~$ no...
[02:05:48] this doesn't look good "novaclient.exceptions.Unauthorized: Unauthorized (HTTP 401) (Request-ID: req-12dc7525-9b7a-4045-be36-4f4ac2dd5587)"
[02:06:38] bd808: I get teh same error actually using novaadmin
[02:06:42] so yeah, we should call andrew
[02:07:20] So we think it's an openstack issue?
[02:07:45] I'm not sure, but that slants towards yes
[02:09:53] possibly related to the problems with scaling the uwsgi container that he's been fighting?
[02:10:09] chasemp: you wanna dial or should I?
[02:10:34] talking now
[02:11:20] should I start moving CI jobs off of nodepool or wait for a bit?
[02:11:48] we should move all jobs off of nodepool forever (my $0.02USD)
[02:11:52] andrew will be online in 15 (he's out to dinner) I'm gogin to try a few things first
[02:18:35] +10000 to bd808
[02:45:10] does anyone know ( ostriches?) the last time nodepool was doing things successfully?
[02:46:26] My best guess is ~4h ago when those jobs in zuul got stuck
[02:46:36] But I haven't been doing any gerrit/jenkins/ci work today
[02:46:39] So that's just a guess
[02:46:49] kk
[02:47:30] is there any chance it was much longer, like several days?
[02:49:56] ostriches: ^
[02:50:56] I think it was fine as of thursday.
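For reference, a hedged sketch of counting the instances stuck in "delete" from `nodepool list`-style output. The sample table below is fabricated, and the exact column layout of the real command's output is an assumption based on the truncated snippet quoted in the task comment:

```shell
# Count instances stuck in "delete" from `nodepool list`-style output.
# The table below is fabricated sample data; the real command's column
# layout is an assumption, so adjust the field index to match.
cat > /tmp/np_sample.txt <<'EOF'
| 498353 | ci-jessie-wikimedia-498353 | delete | 4.20 |
| 508783 | ci-jessie-wikimedia-508783 | delete | 6.91 |
| 510456 | ci-jessie-wikimedia-510456 | ready  | 0.07 |
EOF
# Field 4 is the State column in this sample layout:
awk -F'|' '{gsub(/ /, "", $4)} $4 == "delete" {n++} END {print n+0}' /tmp/np_sample.txt
# prints 2
```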
[02:51:02] I was out friday and out of town all weekend
[02:51:33] that's… a big window :(
[02:52:27] I mean I'm pretty sure it was fine earlier today or people would've complained.
[02:52:29] andrewbogott: fwiw event I think you mean was Thu Jan 26 19:56:33 2017 +0000
[02:52:37] Granted, it took (at least) 3.5h for us to notice
[02:52:51] yeah seems like it can't be parked for any real length, but that doesn't mean some cache didn't expire or something
[03:58:57] ostriches (or anyone else following along), I'd expect the nova/keystone apis to work normally now, probably nodepool will sort itself out shortly
[03:59:13] andrewbogott: I've been watching nodepool churn nodes for awhile now
[03:59:16] seems to be building etc
[03:59:22] Q. What was wrong? A: I'm not sure but restarting the nova-api endpoint seems to have fixed it
[04:04:11] CI is moving again :)
[04:04:16] thanks andrewbogott and chasemp
[04:04:36] Yeah, +1 on the thank you
[04:08:54] This was either a cache-invalidation bug or an 'after this service works for X number of days it doesn't work no more' bug
[04:09:00] we reverted the change that may've introduced the former
[04:09:05] as for the latter… time will tell :(
[05:07:10] So guys, this is not the first time nodepool has just been the canary in a coal mine with OpenStack infra issues yet it still gets the stink eye when it acts as a better monitor of Labs health than the monitoring system (like most users do!)
[05:07:21] of course, at this hour I'm talk to myself
[06:39:54] Project selenium-Wikibase » chrome,beta,Linux,contintLabsSlave && UbuntuTrusty build #255: 04FAILURE in 1 hr 58 min: https://integration.wikimedia.org/ci/job/selenium-Wikibase/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/255/
[06:50:05] 10MediaWiki-Releasing, 10MediaWiki-Vendor: 1.27 tarball: Unnecessary library "ruflin/elastica 2.3.1" requirement - https://phabricator.wikimedia.org/T156637#2984818 (10Osnard)
[06:50:38] 10MediaWiki-Releasing, 10MediaWiki-Vendor: 1.27 tarball: Unnecessary library "ruflin/elastica 2.3.1" requirement - https://phabricator.wikimedia.org/T156637#2982017 (10Osnard) @Aklapper 1.27.1; I've added the download link in the task description
[07:12:46] (follow-ed up in -labs-admin)
[07:16:58] 10MediaWiki-Releasing, 10MediaWiki-Vendor: 1.27 tarball: Unnecessary library "ruflin/elastica 2.3.1" requirement - https://phabricator.wikimedia.org/T156637#2984861 (10Osnard) @Legoktm Thanks for the explanation. I understand now why there is the mediawiki/vendor repo [1]. The problem with this approach is tha...
[07:43:26] RECOVERY - Long lived cherry-picks on puppetmaster on deployment-puppetmaster02 is OK: OK: Less than 100.00% above the threshold [0.0]
[08:29:46] 10MediaWiki-Releasing, 10MediaWiki-Vendor: 1.27 tarball: Unnecessary library "ruflin/elastica 2.3.1" requirement - https://phabricator.wikimedia.org/T156637#2982017 (10hashar) I cant find a reference right now, but I thought mediawiki/vendor had REL branches populated via the composer merge plugin. Maybe we ca...
[08:33:33] 10Continuous-Integration-Config, 10BlueSpice, 13Patch-For-Review: Enable unit tests on BlueSpice* repos - https://phabricator.wikimedia.org/T130811#2984981 (10Osnard) Okay, it's merged.
Go ahead :smile:
[08:40:17] 10Continuous-Integration-Config, 10BlueSpice, 13Patch-For-Review: Enable unit tests on BlueSpice* repos - https://phabricator.wikimedia.org/T130811#2985012 (10Paladox) Ok thanks :)
[08:54:39] 10Continuous-Integration-Config, 10BlueSpice, 13Patch-For-Review: Enable unit tests on BlueSpice* repos - https://phabricator.wikimedia.org/T130811#2985022 (10Paladox) This is the file https://github.com/wikimedia/mediawiki-extensions-BlueSpiceExtensions/blob/28fb12d1d04557d78f1ce26c5d706a27cfec2ad6/Checklis...
[08:58:23] 10Continuous-Integration-Config, 10BlueSpice, 13Patch-For-Review: Enable unit tests on BlueSpice* repos - https://phabricator.wikimedia.org/T130811#2985025 (10Paladox) Actually it's this file https://github.com/wikimedia/mediawiki-extensions-BlueSpiceExtensions/blob/master/Checklist/tests/phpunit/BSApiCheckl...
[09:50:40] 10Continuous-Integration-Infrastructure, 06Labs, 10Labs-Infrastructure: Labs instance ci-jessie-wikimedia-498353 can not be deleted - https://phabricator.wikimedia.org/T156636#2985545 (10hashar)
[09:51:50] 10Continuous-Integration-Infrastructure, 06Labs, 10Labs-Infrastructure: Labs instance ci-jessie-wikimedia-498353 can not be deleted - https://phabricator.wikimedia.org/T156636#2981991 (10hashar)
[10:04:07] 10Continuous-Integration-Infrastructure, 06Labs, 10Labs-Infrastructure: Labs instance ci-jessie-wikimedia-498353 can not be deleted - https://phabricator.wikimedia.org/T156636#2985716 (10hashar) Not sure what happened with 508783 but eventually it has been deleted: Logs show that some other instances/proj...
[10:16:22] 06Release-Engineering-Team, 10MediaWiki-Vagrant, 06Operations, 07Epic: [EPIC] Migrate base image to Debian Jessie - https://phabricator.wikimedia.org/T136429#2985781 (10Gilles)
[10:17:57] 10Continuous-Integration-Config, 10BlueSpice, 13Patch-For-Review: Enable unit tests on BlueSpice* repos - https://phabricator.wikimedia.org/T130811#2985792 (10Osnard) Strange.
Looks like some magic word does not work. It is initiated by https://github.com/wikimedia/mediawiki-extensions-BlueSpiceExtensions/bl...
[10:38:16] 10Gerrit, 10MediaWiki-Vagrant, 13Patch-For-Review: "index-pack failed" when installing new MediaWiki-Vagrant box - https://phabricator.wikimedia.org/T152801#2985831 (10hashar) >>! In T152801#2983841, @bd808 wrote: > ... > It's still pretty suspicious that this all showed up around the same time as {T151676}....
[12:27:42] 06Release-Engineering-Team, 06Developer-Relations (Oct-Dec-2016), 07Documentation: Merge Wikimedia's "Deployment checklist for new extensions" doc pages - https://phabricator.wikimedia.org/T142081#2522000 (10Nemo_bis) For reference: https://www.mediawiki.org/w/index.php?title=Review_queue&type=revision&diff=...
[12:43:31] hashar: car to look at something git and deploy related quickly for me? :D
[12:45:06] addshore: having lunch sorry :(( paste question here
[12:45:14] will follow up once I am done
[12:45:48] okay, well hashar it relates to my notes in the reverting section @ https://wikitech.wikimedia.org/wiki/User:Addshore/Deployments , im sure there should be a git rebase in there somewhere, but I can't tell where!
[12:47:39] (03PS1) 10Aleksey Bekh-Ivanov (WMDE): Fix absence of dev dependencies for Wikibase in jenkins job [integration/config] - 10https://gerrit.wikimedia.org/r/335215
[13:10:38] PROBLEM - Puppet run on deployment-elastic08 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[13:12:50] 10Gerrit, 06Operations, 07Beta-Cluster-reproducible, 13Patch-For-Review, 07Upstream: gerrit jgit gc caused mediawiki/core repo problems - https://phabricator.wikimedia.org/T151676#2986079 (10Paladox)
[13:15:29] 10Gerrit, 06Release-Engineering-Team: Update gerrit to 2.14 - https://phabricator.wikimedia.org/T156120#2986085 (10Paladox)
[13:15:35] 10Gerrit, 06Operations, 07Beta-Cluster-reproducible, 13Patch-For-Review, 07Upstream: gerrit jgit gc caused mediawiki/core repo problems - https://phabricator.wikimedia.org/T151676#2986084 (10Paladox)
[13:17:05] 10Gerrit, 10BlueSpice, 13Patch-For-Review, 07Upstream: Merge/Submit error on Gerrit: "org.eclipse.jgit.errors.MissingObjectException: Missing unknown" for BlueSpiceExtensions' REL1_27 branch - https://phabricator.wikimedia.org/T153079#2986087 (10Paladox)
[13:17:15] 10Gerrit, 06Operations, 07Beta-Cluster-reproducible, 13Patch-For-Review, 07Upstream: gerrit jgit gc caused mediawiki/core repo problems - https://phabricator.wikimedia.org/T151676#2824332 (10Paladox)
[13:18:46] 10Gerrit, 06Operations, 07Beta-Cluster-reproducible, 13Patch-For-Review, 07Upstream: gerrit jgit gc caused mediawiki/core repo problems - https://phabricator.wikimedia.org/T151676#2986089 (10Paladox) This should hopefully be fixed in gerrit 2.14. Though I doint know if there will be permenant damage. We...
[13:25:38] RECOVERY - Puppet run on deployment-elastic08 is OK: OK: Less than 1.00% above the threshold [0.0]
[13:30:10] addshore: k back around
[13:30:29] addshore: the submodules have a setting to autorebase
[13:31:25] $ git config --list|grep Wikidata.update
[13:31:26] submodule.extensions/Wikidata.update=rebase
[13:31:59] your git log commands can be made: git log HEAD..HEAD@{u}
[13:32:16] {u} or {upstream} refers to the tracked branch
[13:32:48] portals has an extra shell script for deployment
[13:33:12] gotcha: if updating wikiversion.json, on mwdebug1001 one need to compile the wikivesion.PHP
[13:33:25] should be: mwdebug1001$ scap compile-wikiversions
[13:33:56] but yeah looks more or less fine :]
[13:42:13] (03PS1) 10Aude: Bump Wikidata to wmf/1.29.0-wmf.10 [tools/release] - 10https://gerrit.wikimedia.org/r/335225
[13:43:06] aude: that wikidata bump should happen today right?
[13:47:30] Project selenium-VisualEditor » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #291: 04FAILURE in 2 min 30 sec: https://integration.wikimedia.org/ci/job/selenium-VisualEditor/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/291/
[13:48:43] hashar: yes
[13:49:09] PROBLEM - Puppet run on buildlog is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[13:49:20] (03CR) 10Hashar: [C: 032] Bump Wikidata to wmf/1.29.0-wmf.10 [tools/release] - 10https://gerrit.wikimedia.org/r/335225 (owner: 10Aude)
[13:49:42] thanks
[13:50:17] (03Merged) 10jenkins-bot: Bump Wikidata to wmf/1.29.0-wmf.10 [tools/release] - 10https://gerrit.wikimedia.org/r/335225 (owner: 10Aude)
[13:50:33] aude: addshore: and eventually I think we should update wikidatawiki during our wednesday afternoon
[13:50:51] instead of bumping it with group1 at 20:00/21:00
[13:51:59] 10Gerrit, 06Operations, 07Beta-Cluster-reproducible, 13Patch-For-Review, 07Upstream: gerrit jgit gc caused mediawiki/core repo problems -
https://phabricator.wikimedia.org/T151676#2986228 (10hashar)
[13:52:03] 10Gerrit, 10BlueSpice, 13Patch-For-Review, 07Upstream: Merge/Submit error on Gerrit: "org.eclipse.jgit.errors.MissingObjectException: Missing unknown" for BlueSpiceExtensions' REL1_27 branch - https://phabricator.wikimedia.org/T153079#2986227 (10hashar)
[13:53:50] hashar: maybe
[13:54:07] aude: it is really all up to you :]
[13:54:16] though if there is ever a problem we find on testwikidata on tuesday
[13:54:21] then we need time to fix it
[13:54:23] but if there is an interest in bumping wikidatawiki during european day, I am all for it
[13:54:31] * aude is not in europe, btw
[13:54:38] iohhhh
[13:54:50] would have to be before european swat
[13:55:23] probably not good for hoo though
[13:55:41] well it just a random idea really :]
[13:55:48] yeah
[13:57:14] messed up my irc...
[15:09:59] 10Gerrit, 06Operations, 07Beta-Cluster-reproducible, 13Patch-For-Review, 07Upstream: gerrit jgit gc caused mediawiki/core repo problems - https://phabricator.wikimedia.org/T151676#2986472 (10Paladox) @hashar hi this task T153079 has nothing to do with the task here as the problem there was that the branc...
[15:30:25] 10Deployment-Systems, 03Scap3, 13Patch-For-Review, 07Wikimedia-Incident: Include fatal log rate check in scap canary test - https://phabricator.wikimedia.org/T154646#2986573 (10thcipriani) 05Open>03Resolved a:03thcipriani
[15:31:57] 10Deployment-Systems, 10Wikimedia-Logstash, 13Patch-For-Review, 07Wikimedia-Incident: Check same set of errors/warnings/fatals in scap logstash_checker.py as there is in `fatalmonitor` on fluorine - https://phabricator.wikimedia.org/T142784#2986592 (10thcipriani) 05Open>03Resolved New scap release is n...
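hashar's `git log HEAD..HEAD@{u}` range from the conversation above can be reproduced with throwaway local repos (all repo names below are illustrative, not the real deployment checkout):

```shell
# Reproduces `git log HEAD..HEAD@{u}`: commits the tracked upstream branch
# has that the local checkout does not yet have.
# Throwaway repos in a temp dir; names are illustrative.
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q upstream
git -C upstream -c user.name=t -c user.email=t@t commit -q --allow-empty -m one
git clone -q upstream local
git -C upstream -c user.name=t -c user.email=t@t commit -q --allow-empty -m two
git -C local fetch -q
git -C local log --oneline HEAD..HEAD@{u}   # shows the pending "two" commit
# The submodule auto-rebase setting quoted above would be set with e.g.:
#   git config submodule.extensions/Wikidata.update rebase
```

Reversing the range (`HEAD@{u}..HEAD`) shows the opposite: local commits not yet pushed to the tracked branch.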
[16:15:02] 10Continuous-Integration-Infrastructure, 06Labs, 10Labs-Infrastructure: Labs instance ci-jessie-wikimedia-498353 can not be deleted - https://phabricator.wikimedia.org/T156636#2986837 (10bd808) The nodepool issues on 2017-01-30 and 31 were very likely caused by a nova-api failure which itself may or may not...
[16:48:42] 05Gerrit-Migration, 10releng-201617-q2, 07Documentation: Document workflow and creation of CI jobs in Differential - https://phabricator.wikimedia.org/T130952#2986995 (10Aklapper) #releng-201617-q2 is over. Should this be #releng-201617-q3 or not?
[17:10:06] 03Scap3, 06Services (later), 15User-mobrovac: Delay repooling trending service after a restart - https://phabricator.wikimedia.org/T156687#2987059 (10thcipriani) >>! In T156687#2984052, @mobrovac wrote: > We need to establish if that would be possible with Scap3. I figure we could do a `sleep 30` //check// s...
[18:44:02] 10Browser-Tests-Infrastructure, 07Ruby, 15User-zeljkofilipin: Release mediawiki_api 0.7.1 - https://phabricator.wikimedia.org/T156837#2987336 (10zeljkofilipin)
[18:45:04] 10Browser-Tests-Infrastructure, 07Ruby, 15User-zeljkofilipin: Release mediawiki_api 0.7.1 - https://phabricator.wikimedia.org/T156837#2987336 (10zeljkofilipin) p:05Triage>03Normal
[18:49:38] 03Scap3, 10Parsoid: Saying yes (y) continues to all groups - https://phabricator.wikimedia.org/T156839#2987377 (10Arlolra)
[18:53:40] (03PS1) 10Zfilipin: Release patch version 0.7.1 [ruby/api] - 10https://gerrit.wikimedia.org/r/335264
[18:54:57] (03CR) 10Zfilipin: [C: 032] Release patch version 0.7.1 [ruby/api] - 10https://gerrit.wikimedia.org/r/335264 (owner: 10Zfilipin)
[18:55:20] (03Merged) 10jenkins-bot: Release patch version 0.7.1 [ruby/api] - 10https://gerrit.wikimedia.org/r/335264 (owner: 10Zfilipin)
[18:55:43] (03CR) 10jenkins-bot: Release patch version 0.7.1 [ruby/api] - 10https://gerrit.wikimedia.org/r/335264 (owner: 10Zfilipin)
[18:55:49] 10Browser-Tests-Infrastructure, 07Ruby, 15User-zeljkofilipin:
Release mediawiki_api 0.7.1 - https://phabricator.wikimedia.org/T156837#2987412 (10zeljkofilipin) Forgot to tag the task in the commit message: https://gerrit.wikimedia.org/r/#/c/335264/
[19:00:30] 10Browser-Tests-Infrastructure, 07Ruby, 15User-zeljkofilipin: Release mediawiki_api 0.7.1 - https://phabricator.wikimedia.org/T156837#2987433 (10zeljkofilipin) 05Open>03Resolved Done! https://rubygems.org/gems/mediawiki_api
[19:07:39] 05Gerrit-Migration, 10releng-201617-q2, 07Documentation: Document workflow and creation of CI jobs in Differential - https://phabricator.wikimedia.org/T130952#2987481 (10greg) It's not a quarterly goal level thing, so no ;)
[19:08:30] (03CR) 10Abartov: "Thank you! :)" [ruby/api] - 10https://gerrit.wikimedia.org/r/335264 (owner: 10Zfilipin)
[19:49:58] (03PS1) 10Ejegg: Change SmashPig tests from PHP 5.3 to 5.5 [integration/config] - 10https://gerrit.wikimedia.org/r/335271
[20:56:27] thcipriani: about?
[20:56:43] chasemp: yep, what's up?
[20:57:28] looking through the mess from last night w/ the nova-api freeze up and I was considering an alert that was some rough sanity marker for nodepool, it stores internals but nothing detailed other than age of instances and state
[20:57:39] I was thinking alert if newest nodepool instance is x old
[20:57:55] and that led me to wondering if there is any dummy jobs in the ci pipeline we could rely on
[20:58:11] to flush out failures in a predictive way and if not could there be?
[20:58:17] not sure how hard, maybe you have an idea
[20:58:25] it's also possible I've gone full ramble
[20:58:34] :)
[20:58:51] you're looking for a job that runs on nodepool frequently? Or?
well, yes is there a job that runs predictably as in every n minutes to watch for
[20:59:27] and if not and it's not a good idea
[20:59:39] thoughts on alerting if nodepool doesn't have new instances in y time
[21:00:18] hrm, I'm not sure if we have nodepool jobs that run in intervals rather than being triggered by patch sets
[21:00:40] it seems like a predictable and seeded dummy job every 10m would be a good idea
[21:00:43] I don't think it's not a good idea, I just don't think we have any right now
[21:00:50] right
[21:01:17] * thcipriani digs in integration/config
[21:01:49] I do have the beginnings of a full stack test (create/test/delete) a vm but that's sort of in the neighborhood of watching a to know about b since there are a lot of variables between any test cae and nodepool
[21:01:55] I think both paths are appropriate
[21:03:23] thcipriani: one issue we have every time and I think this will translate to any medium is since all jobs are adhoc it's difficult to know when a problem as begun
[21:03:51] and right behind that is there is no deterministic test case to lean on when you are wondering if things are ok now
[21:03:57] yup, that makes sense
[21:04:46] the only problem I could think is that if we have some dummy job we rely on that gets burried under patch sets it may not be super reliable
[21:05:08] sure that's ok though to find out I think
[21:05:22] and would possibly be the first real holistic view we have :)
[21:05:27] yeah, may be a non-issue provided we queue it correctly
[21:05:50] is there a task for this?
[21:06:08] we have tasks for nova but probably not this angle as I was just working through the idea
[21:06:15] also, I'm not sure
[21:06:30] there are nodepool tasks scattered about
[21:06:52] yeah nodepool has a mean phab presence
[21:07:11] I can make one :) doyou mind if I toss it your way even for just reasoning and some ci details on feasibility?
[21:07:58] no problem. I'll add some thoughts to the task.
Seems like it'd be trivial to make the actual job.
[21:08:28] I have to believe so
[21:08:57] * thcipriani says before really considering the weight of jjb on his soul
[21:09:19] thcipriani: I have way too many stalk words ;p
[21:09:22] jb was one of them
[21:09:24] * JustBerry removes jb
[21:09:32] well some insane person has nodepool reporting age of instances in decimal?
[21:09:33] 0.07
[21:09:48] JustBerry: heh, sorry :)
[21:09:54] nah not your fault
[21:10:07] 0.07 time
[21:10:13] time units
[21:10:40] Age (hours)
[21:10:46] it's hour in decimal?
[21:10:53] that hurts me deeply
[21:12:05] good thing this convo only took .09 of an hour
[21:12:16] could have been worse. Age (12 minutes)
[21:40:35] 10Deployment-Systems, 03Scap3, 10scap: scap wikiversions compile happening too late in scap sync - https://phabricator.wikimedia.org/T156851#2987851 (10thcipriani)
[21:41:16] 10Deployment-Systems, 03Scap3, 10scap: scap wikiversions compile happening too late in scap sync - https://phabricator.wikimedia.org/T156851#2987865 (10thcipriani) p:05Triage>03High
[21:48:37] 21:43:55 Building remotely on integration-slave-precise-1012 (phpflavor-php53 contintLabsSlave phpflavor-zend UbuntuPrecise) in workspace /srv/jenkins-workspace/workspace/mwext-testextension-php53
[21:48:43] 21:44:21 ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)
[21:59:25] hrm, mysql was stopped there
[21:59:33] I poked it, it's going now
[22:01:24] thanks
[22:06:36] thcipriani: dead on 1011 too
[22:06:52] hrm, looks like a job for salt
[22:06:57] :)
[22:07:17] even better thcipriani age is not age of the instance but age in that state only
[22:07:24] in decimal hours
[22:07:51] wat
[22:08:24] lol
[22:08:52] it resets between build and ready at least and ready adn delete
[22:09:01] which is honestly not terrible as a counter but who would hve guessed?
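The decimal-hours Age field being mocked above is still enough for the staleness alert chasemp describes (alert when even the newest instance is too old, i.e. nothing fresh has been built lately). A rough sketch, with fabricated sample rows, an invented threshold, and an assumed column layout:

```shell
# Alert if even the *newest* nodepool instance exceeds an age threshold.
# Sample data and threshold are invented; the Age column is decimal hours
# as in the conversation above.
threshold=0.5   # hours (0.5 h = 30 min); illustrative, not a tuned value
cat > /tmp/np_age.txt <<'EOF'
| ci-jessie-wikimedia-510456 | ready | 0.07 |
| ci-jessie-wikimedia-510123 | ready | 1.32 |
EOF
awk -F'|' -v t="$threshold" '
  {gsub(/ /, "", $4); age = $4 + 0; if (NR == 1 || age < min) min = age}
  END {
    if (min > t + 0) print "ALERT: newest instance is " min "h old"
    else print "OK"
  }' /tmp/np_age.txt    # prints OK: 0.07 h (about 4 minutes) is fresh enough
```

The same scan with `age > max` instead would feed the stuck-in-delete threshold discussed later in the evening.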
[22:12:18] !log started mysql on all integration precise instances via salt -- was stopped for some reason
[22:12:22] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[22:15:39] progress
[22:15:41] 22:13:57 Building remotely on ci-jessie-wikimedia-510456 (ci-jessie-wikimedia) in workspace /home/jenkins/workspace/npm-node-6-jessie
[22:15:49] 22:14:02 npm ERR! install Couldn't read dependencies
[22:15:50] etc
[22:15:53] https://integration.wikimedia.org/ci/job/npm-node-6-jessie/2859/console
[22:16:57] Potentially REL1_23 thing
[22:17:37] Roll on May 2017
[22:18:40] Oh, I see
[22:20:16] 10Continuous-Integration-Config, 10MediaWiki-extensions-LiquidThreads: npm-node-6-jessie fails on LiquidThreads on REL1_23 - https://phabricator.wikimedia.org/T156859#2988046 (10Reedy)
[22:20:24] 10Continuous-Integration-Config, 10MediaWiki-extensions-LiquidThreads: npm-node-6-jessie fails on LiquidThreads on REL1_23 - https://phabricator.wikimedia.org/T156859#2988058 (10Reedy) p:05Triage>03Lowest
[22:35:59] 10Continuous-Integration-Config, 10MediaWiki-extensions-LiquidThreads: npm-node-6-jessie fails on LiquidThreads on REL1_23 - https://phabricator.wikimedia.org/T156859#2988144 (10hashar) 05Open>03declined Yup the package.json with a test script has been introduced in a later branch. Zuul has support to ski...
[22:42:21] lol
[22:44:49] 10Deployment-Systems, 03Scap3, 10scap: scap wikiversions compile happening too late in scap sync - https://phabricator.wikimedia.org/T156851#2988199 (10bd808) Moving `tasks.sync_common` after `wikiversions-compile` certainly would have caused this. Moving the sync **before** calling wikiversions-compile was...
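For the recurring "mysql died on a precise slave" symptom above, the first check is whether the socket from the Jenkins error even exists. A small sketch (the salt one-liner in the comment is only an illustration of the !log entry, with an assumed minion glob):

```shell
# Check for the exact symptom in the Jenkins error above: a missing
# /var/run/mysqld/mysqld.sock on the slave.
check_mysql_sock() {
  if [ -S "$1" ]; then
    echo "mysql socket present"
  else
    echo "mysql down: $1 missing"
  fi
}
check_mysql_sock /var/run/mysqld/mysqld.sock
# Fleet-wide restart along the lines of the !log entry (glob is assumed):
#   salt 'integration-slave-precise-*' service.start mysql
```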
[23:09:04] 10Gerrit, 10MediaWiki-Vagrant, 13Patch-For-Review: "index-pack failed" when installing new MediaWiki-Vagrant box - https://phabricator.wikimedia.org/T152801#2988333 (10Paladox) Bug report already filled upstream on gerrit https://bugs.chromium.org/p/gerrit/issues/detail?id=2295
[23:19:59] thcipriani: there's some bug related to mysql randomly dying on precise slaves
[23:20:31] I had some half-memory of that.
[23:20:42] I don't remember if there is a resolution?
[23:22:53] [14:17:37] Roll on May 2017
[23:34:52] thcipriani: https://gerrit.wikimedia.org/r/335373
[23:40:31] chasemp: neat. Logic looks reasonable.
[23:40:48] the actual thresholds are from my limited watching just this afternoon
[23:41:07] so guaranteed to be not ideal but in theory if we tweak we will find a place that it's only outliers
[23:42:32] yup. The delete one will definitely come in handy -- I think that's where issues make themselves known. I'm not sure about the used value, but, as you say, will rough-hewn closer to correct over time.
[23:42:44] well my thinking on used is
[23:42:52] stuck tests and/or tests that are invalid in duration
[23:42:59] that's probably me being bullheaded tho
[23:43:20] I think there should be a max test run time tbh