[00:20:07] PROBLEM - Puppet run on deployment-phab02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[00:22:55] PROBLEM - Puppet run on deployment-phab01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[01:08:50] 10Gerrit, 06Release-Engineering-Team, 13Patch-For-Review: Update gerrit to 2.14 - https://phabricator.wikimedia.org/T156120#3159439 (10Paladox) Also they are updating jgit to 4.7.0, which includes sha1 collision detection. Looks like that is a security update.
[02:24:07] (03CR) 10Krinkle: [C: 04-1] Cache node_modules (031 comment) [integration/config] - 10https://gerrit.wikimedia.org/r/346152 (https://phabricator.wikimedia.org/T159591) (owner: 10Hashar)
[06:23:48] Yippee, build fixed!
[06:23:48] Project selenium-Wikibase » chrome,test,Linux,BrowserTests build #322: 09FIXED in 1 hr 43 min: https://integration.wikimedia.org/ci/job/selenium-Wikibase/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=test,PLATFORM=Linux,label=BrowserTests/322/
[06:46:05] Project selenium-Wikibase » chrome,beta,Linux,BrowserTests build #322: 04FAILURE in 2 hr 6 min: https://integration.wikimedia.org/ci/job/selenium-Wikibase/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/322/
[07:49:27] 10Continuous-Integration-Config, 06Release-Engineering-Team: Prioritize gerrit changesets that contain the comment of "SWAT" - https://phabricator.wikimedia.org/T162316#3158988 (10hashar) The underlying problem is to have MediaWiki patches for Wikimedia prioritized, or at least in a different queue than th...
[07:49:42] 10Continuous-Integration-Infrastructure (Little Steps Sprint): Create "High Priority" gate-and-submit pipeline - https://phabricator.wikimedia.org/T160668#3107085 (10hashar)
[07:49:44] 10Continuous-Integration-Config, 06Release-Engineering-Team: Prioritize gerrit changesets that contain the comment of "SWAT" - https://phabricator.wikimedia.org/T162316#3159675 (10hashar)
[07:55:35] 10Continuous-Integration-Infrastructure (Little Steps Sprint): Create "High Priority" gate-and-submit pipeline - https://phabricator.wikimedia.org/T160668#3159683 (10hashar)
[07:58:45] 10Continuous-Integration-Infrastructure (Little Steps Sprint): Create "High Priority" gate-and-submit pipeline - https://phabricator.wikimedia.org/T160668#3159689 (10hashar)
[08:24:55] hashar: o/ - if you have time, I created hack_v2 for Redis.php - https://gerrit.wikimedia.org/r/#/c/346695/
[08:25:21] if I want to test it, should I just go under /srv/deployment/mediawiki/etc.. and copy the file in there?
[08:25:29] (restarting hhvm etc.. after that)
[08:27:19] elukey: on the jobrunner02 yes
[08:27:28] though I am not sure that is /srv/deployment
[08:27:34] would expect it to be under /srv/mediawiki
[08:27:52] note there is an auto deploy job as well that runs scap every 10 minutes or so
[08:28:24] so most probably the patch will be overwritten at some point
[08:28:55] thus we would need to drop jobrunner02 from the scap targets
[08:29:12] on deployment-tin that is /etc/dsh/group/mediawiki-installation:deployment-jobrunner02.deployment-prep.eqiad.wmflabs
[08:29:13] which is populated by puppet/hiera
[08:31:46] hashar: yep sorry /srv/mediawiki, got confused :)
[08:31:50] elukey: https://gerrit.wikimedia.org/r/346705
[08:31:57] you can cherry-pick that on the puppet master
[08:32:02] run puppet on deployment-tin
[08:32:15] and jobrunner02 should then no longer be reached by scap
[08:32:19] and thus your live hack will stay
[08:32:42] thanks :)
[08:39:40] hashar: wouldn't it be cleaner to just update Hiera:deployment-prep?
[08:39:51] temporarily
[08:40:58] or maybe disable the cron
[08:41:12] ah no, the auto deploy might not be a cron
[08:53:40] modified /etc/dsh/group/mediawiki-installation on deployment-tin and disabled puppet for a bit
[09:01:42] the auto deploy is handled by jenkins
[09:01:50] there is a job that pulls the repo
[09:01:54] then triggers scap
[09:09:58] hashar: dropping jobrunner02 from /etc/dsh/group/mediawiki-installation doesn't seem to be working :(
[09:10:26] wait, the file should be handled by puppet
[09:10:26] mmmm
[09:10:55] ok maybe it was my pebkac, retrying
[09:24:35] hashar: don't see any more RSTs :)
[09:29:04] but now I am wondering if we should simply consume the OK field
[09:31:06] (03PS7) 10Hashar: mediawiki-core-selenium-jessie Jenkins job [integration/config] - 10https://gerrit.wikimedia.org/r/324719 (https://phabricator.wikimedia.org/T139740) (owner: 10Zfilipin)
[09:32:48] (03CR) 10jerkins-bot: [V: 04-1] mediawiki-core-selenium-jessie Jenkins job [integration/config] - 10https://gerrit.wikimedia.org/r/324719 (https://phabricator.wikimedia.org/T139740) (owner: 10Zfilipin)
[09:35:00] (03PS8) 10Hashar: mediawiki-core-selenium-jessie Jenkins job [integration/config] - 10https://gerrit.wikimedia.org/r/324719 (https://phabricator.wikimedia.org/T139740) (owner: 10Zfilipin)
[09:35:58] (03CR) 10jerkins-bot: [V: 04-1] mediawiki-core-selenium-jessie Jenkins job [integration/config] - 10https://gerrit.wikimedia.org/r/324719 (https://phabricator.wikimedia.org/T139740) (owner: 10Zfilipin)
[09:38:45] 10Browser-Tests-Infrastructure, 10RelatedArticles, 07JavaScript, 13Patch-For-Review, 15User-zeljkofilipin: Set up Selenium tests in Node.js for RelatedArticles extension - https://phabricator.wikimedia.org/T158052#3159918 (10zeljkofilipin)
[09:38:48] 10Browser-Tests-Infrastructure, 10MediaWiki-General-or-Unknown, 07JavaScript, 05MW-1.29-release (WMF-deploy-2017-03-21_(1.29.0-wmf.17)), and 4 others: Port Selenium tests from Ruby to Node.js - https://phabricator.wikimedia.org/T139740#3159917 (10zeljkofilipin)
[09:44:08] (03PS9) 10Hashar: mediawiki-core-selenium-jessie Jenkins job [integration/config] - 10https://gerrit.wikimedia.org/r/324719 (https://phabricator.wikimedia.org/T139740) (owner: 10Zfilipin)
[09:53:51] (03PS10) 10Hashar: Run webdriverio tests for MediaWiki core [integration/config] - 10https://gerrit.wikimedia.org/r/324719 (https://phabricator.wikimedia.org/T139740) (owner: 10Zfilipin)
[10:03:04] 10Continuous-Integration-Config: namespace webdriverio tests - https://phabricator.wikimedia.org/T162350#3159953 (10hashar)
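The scap target removal discussed above ([08:28:55] onward) boils down to deleting one hostname from the dsh group file on deployment-tin. Below is a minimal, hypothetical Python sketch of that manual step: the file path and hostname come from the log, but the script itself is not an actual scap or WMF tool, and, as hashar notes, puppet will re-add the entry on its next run unless puppet is disabled or the hiera change (https://gerrit.wikimedia.org/r/346705) is applied.

    #!/usr/bin/env python3
    """Hypothetical helper: drop a host from a dsh group file so scap
    stops syncing it (the live-hack workaround discussed above).
    Assumes one hostname per line; not an actual scap tool."""
    import sys

    DSH_GROUP = '/etc/dsh/group/mediawiki-installation'

    def drop_host(path, host):
        with open(path) as f:
            lines = f.readlines()
        kept = [line for line in lines if line.strip() != host]
        if len(kept) == len(lines):
            print('%s not found in %s' % (host, path))
            return
        with open(path, 'w') as f:
            f.writelines(kept)
        print('Removed %s; note puppet will restore it on its next run' % host)

    if __name__ == '__main__':
        drop_host(DSH_GROUP, sys.argv[1])

Run as e.g. drop_host.py deployment-jobrunner02.deployment-prep.eqiad.wmflabs; the hiera-driven removal hashar links is the cleaner, persistent route.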
[10:08:10] 10Continuous-Integration-Config: namespace webdriverio tests - https://phabricator.wikimedia.org/T162350#3159965 (10hashar)
[10:10:07] 10Continuous-Integration-Config: namespace webdriverio tests - https://phabricator.wikimedia.org/T162350#3159953 (10hashar)
[10:22:27] 10Continuous-Integration-Config: namespace webdriverio tests - https://phabricator.wikimedia.org/T162350#3159980 (10hashar)
[10:22:54] 10Continuous-Integration-Config: split mediawiki tests in unit/integration/smoke tests to speed up CI - https://phabricator.wikimedia.org/T162350#3159953 (10hashar)
[10:36:24] (03CR) 10Zfilipin: [C: 031] "Looks good to me in general. A minor formatting notice: one line in the commit message is really long." [integration/config] - 10https://gerrit.wikimedia.org/r/324719 (https://phabricator.wikimedia.org/T139740) (owner: 10Zfilipin)
[10:39:14] definitely working, I haven't seen one RST during the past hour
[10:39:30] I updated the hhvm issue, undoing my hack on deployment-tin
[10:41:13] elukey: awesome!!!
[10:41:32] elukey: don't forget to clear out the patch from the puppetmaster so jobrunner02 gets synced again :}
[10:41:50] then I guess we can polish up that RedisMonkeyPatch and send it upstream to HHVM
[10:41:56] maybe with a test, if they have any
[10:42:27] didn't apply the patch to the puppetmaster, just removed the host from the mw dsh group on deployment-tin :)
[10:42:31] ahah
[10:43:09] I listed the two possible fixes in https://github.com/facebook/hhvm/issues/7757#issuecomment-292124015
[10:44:03] elukey: potentially one has to sign a CLA with Facebook, which moritz did on https://github.com/facebook/hhvm/pull/7766
[10:44:35] and I have no idea whether Wikimedia signed the CLA
[10:47:57] * hashar heads out for lunch
[10:48:09] :O
[11:31:12] stupid mailing lists
[11:31:15] I still haven't lunched
[12:11:29] (03CR) 10Hashar: [C: 032] dib: allow Linux memory overcommit on Trusty [integration/config] - 10https://gerrit.wikimedia.org/r/346634 (https://phabricator.wikimedia.org/T125050) (owner: 10Hashar)
[12:12:25] (03Merged) 10jenkins-bot: dib: allow Linux memory overcommit on Trusty [integration/config] - 10https://gerrit.wikimedia.org/r/346634 (https://phabricator.wikimedia.org/T125050) (owner: 10Hashar)
[12:13:22] !log Updating Nodepool Trusty image to let Linux overcommit memory ( https://gerrit.wikimedia.org/r/#/c/346634/ )
[12:13:25] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[12:18:26] hashar: sorry to bother you again, but what is the usual procedure to test a change like https://gerrit.wikimedia.org/r/#/c/346508/ in deployment-prep?
[12:18:35] (this is for the logstash spam)
[12:23:42] !log Image snapshot-ci-trusty-1491480759 in wmflabs-eqiad is ready
[12:23:45] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[12:25:04] elukey: we get it merged via a -labs.php config change
[12:25:55] elukey: or use some if ( $wmfRealm == 'labs' ) { /* blabla */ }
[12:26:28] mmmm
[12:26:44] elukey: ah, wmf-config/jobqueue-labs.php
[12:27:22] jobqueue.php is not used on beta
[12:27:40] require $wmfRealm === 'labs'
[12:27:40] ? "$wmfConfigDir/jobqueue-labs.php"
[12:27:42] : "$wmfConfigDir/jobqueue.php";
[12:33:48] hashar: for prioritizing wmf branches for mediawiki and such, couldn't you just make it so that if a patchset on a wmf branch triggers a job, the job only runs in test-prio?
[12:35:48] (03PS3) 10Zppix: Add more jobs to test-prio [integration/config] - 10https://gerrit.wikimedia.org/r/346656
[12:36:27] Zppix: the thing to prioritize is a patch that received CR+2 and is in gate-and-submit
[12:36:42] where it could well be behind changes made to the master branch
[12:37:43] so we don't want wmf branches in test-prio? I thought that's what we wanted, to stop having to wait forever on tests for stuff we need sooner rather than later
[12:40:02] not really
[12:40:15] if one is in a hurry, the patch is typically sent and then immediately voted CR+2
[12:40:22] we could even skip tests entirely
[12:40:30] and just care about CR+2 / gate-and-submit
[12:40:54] but in most cases the patches are created ahead of time, so we run tests but there is no urgency to provide test results
[12:40:59] since the deployment would happen later
[12:42:11] hashar: and why can't we just do what we did with test-prio and do it with gate? what's the difference
[12:42:17] 10Continuous-Integration-Config, 10MediaWiki-extensions-Scribunto, 10Wikidata, 13Patch-For-Review: [Task] Add Scribunto to extension-gate in CI - https://phabricator.wikimedia.org/T125050#3160248 (10hashar) The Trusty instances that run the php5 jobs now allow Linux to overcommit the memory. So a `fork()`...
[12:43:36] Zppix: this morning I slightly rewrote the task description for https://phabricator.wikimedia.org/T160668
[12:43:53] it has some explanation, and you would want to grab all the context from https://docs.openstack.org/infra/zuul/gating.html
[12:44:02] which explains how the gate-and-submit thing works
[12:44:12] in short it is done per job/repo
[12:44:14] and the branch is not taken into account
[12:44:29] AND the patches are serialized / depend on each other
[12:44:49] unlike test pipelines, in which changes are more or less independent
[12:45:24] maybe we could dedicate a new instance to jobs running on a wmf branch?
[12:46:58] it is irrelevant
[12:47:27] if you +2 three patches A,B,C for the mediawiki/core master branch
[12:47:29] hashar: not really, if we do that we can speed up the jobs by making it so there's an instance that only wmf branches can use
[12:47:35] you get a queue A <- B <- C
[12:47:52] when one then +2s a patch D for a wmf branch, it is added at the end of the queue, e.g.:
[12:47:57] A <- B <- C <- D
[12:48:16] hence D (for a wmf branch) depends on the three other patches to complete/pass
[12:48:27] whereas we would really want a queue for the wmf-branch patches
[12:48:29] and e.g. have:
[12:48:36] 1st queue: A <- B <- C
[12:48:44] 2nd queue (wmf branch only): D
[12:48:55] and then D gets merged independently of the patches A B C for the master branch
[12:49:19] is there not a way to, say, have zuul wait a few minutes to search for a queued wmf branch change and, if there's one, move it to the front of the queue
[12:49:57] that can be done manually
[12:50:12] known as promoting a change (via a command line that has to be run on the server: zuul promote)
[12:50:22] hashar: if it can be done manually, couldn't we create a bot that does automatically what can be done manually
[12:50:29] so if one has: A <- B <- C <- D (that one for wmf)
[12:50:42] we could manually promote D to the head of the queue and end up with:
[12:50:48] D (wmf) <- A <- B <- C
[12:50:58] which also has the side effect of cancelling all jobs
[12:51:12] so no, really, we need a second pipeline
[12:51:23] hashar: have a bot automatically zuul promote wmf branches then requeue the other jobs?
[12:51:44] that cancels the jobs
[12:51:52] and Zuul is already a bot!
[12:52:04] "just" have to make Zuul do what we want
[12:52:29] zeljkof: so I went off on some other duties and haven't reviewed nor deployed the qunit/selenium job
[12:53:12] hashar: if we made it so that phab is where wmf branch changes were handled instead of gerrit, could that give us more room to do things
[12:53:30] that would at least take zuul out of the picture
[12:53:58] hashar: no rush, we can do it later today, or tomorrow, Friday is a good time to deploy ;)
[13:25:09] umm, can someone explain this weird behavior to me:
[13:26:08] $ php ../../../../tests/phpunit/phpunit.php insertables/TranslatablePageInsertablesSuggesterTest.php
[13:26:11] Fatal error: unknown class MediaWikiInsertablesSuggesterTest in /www/dev.translatewiki.net/docroot/w/extensions/Translate/tests/phpunit/insertables/TranslatablePageInsertablesSuggesterTest.php on line 10
[13:26:15] $ php ../../../../tests/phpunit/phpunit.php insertables
[13:26:17] OK (17 tests, 17 assertions)
[13:26:40] why doesn't it work when run via Jenkins: https://integration.wikimedia.org/ci/job/mediawiki-extensions-hhvm-jessie/10001/console ?
[13:26:58] 10Continuous-Integration-Config, 06Labs, 10MediaWiki-extensions-Scribunto, 10Wikidata: For contintcloud either add RAM or Swap to the instances - https://phabricator.wikimedia.org/T162166#3160294 (10chasemp) >>! In T162166#3158827, @hashar wrote: > Oops I forgot about overcommit_memory which I mentioned on...
[13:28:46] Nikerabbit: your include_path has . or something like that, and MediaWikiInsertablesSuggesterTest hasn't been added to wgTestAutoloader or whatever class map we use for tests?
[13:32:00] hashar: I don't populate $wgTestAutoloader at all
[13:33:42] Nikerabbit: well apparently that is ./tests/common/TestsAutoLoader.php
[13:33:46] for core
[13:34:05] and for an extension I can't remember really
[13:35:01] hashar: okay, thanks. I'll just avoid depending on other test cases
[13:35:47] Project beta-scap-eqiad build #149689: 04FAILURE in 47 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/149689/
[13:36:21] bah
[13:39:37] Project beta-scap-eqiad build #149690: 04STILL FAILING in 43 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/149690/
[13:45:48] Project selenium-VisualEditor » firefox,beta,Linux,BrowserTests build #359: 04FAILURE in 1 min 48 sec: https://integration.wikimedia.org/ci/job/selenium-VisualEditor/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/359/
[13:45:49] Project beta-scap-eqiad build #149691: 04STILL FAILING in 49 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/149691/
[14:03:34] Yippee, build fixed!
[14:03:35] Project beta-scap-eqiad build #149692: 09FIXED in 14 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/149692/
[14:10:35] 10Continuous-Integration-Config, 10MediaWiki-extensions-Scribunto, 10Wikidata, 13Patch-For-Review: [Task] Add Scribunto to extension-gate in CI - https://phabricator.wikimedia.org/T125050#3160404 (10hashar) Seems the Wikibase related jobs that use Scribunto are passing just fine. That does not close that...
[14:18:56] 10Continuous-Integration-Config, 06Labs, 10MediaWiki-extensions-Scribunto, 10Wikidata: For contintcloud either add RAM or Swap to the instances - https://phabricator.wikimedia.org/T162166#3160433 (10hashar) `vm.overcommit_memory=1` fixed the fork issue (T125050). Hence there is no need to add swap or 2GB
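The sysctl in that last comment matters because of how fork() interacts with memory accounting: a process with a large heap that forks just to exec a small command can be refused by the kernel's heuristic overcommit check, even though the child never touches most of those pages. A rough Python illustration of the pattern follows; sizes and outcomes depend entirely on the host, newer Python versions may avoid a plain fork(), and this is a sketch rather than a reproduction of the CI failure:

    """Illustration of the fork()/overcommit pattern behind T125050:
    a big-heap process spawning a tiny child. With vm.overcommit_memory=1
    the kernel always allows it; with the heuristic default on a small
    instance the fork can fail with ENOMEM. Sketch only."""
    import subprocess

    big = bytearray(1024 * 1024 * 1024)  # ~1 GiB heap, standing in for a loaded PHP/HHVM run

    try:
        # subprocess traditionally fork()+exec()s, the same pattern that
        # failed on the 2 GB Trusty instances before the sysctl change
        subprocess.check_call(['true'])
        print('fork+exec succeeded')
    except OSError as exc:
        print('fork failed: %s' % exc)

The current mode can be read from /proc/sys/vm/overcommit_memory.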
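Hashar's gate-and-submit walkthrough earlier ([12:47] to [12:52]) can also be made concrete with a toy model: in one shared dependent queue, a wmf-branch change D waits on the unrelated master changes A, B and C ahead of it, while a second wmf-only queue lets D merge as soon as its own jobs pass. This is an illustration only, not Zuul's actual scheduler; it deliberately ignores Zuul's speculative parallel testing, and the job times are made up:

    """Toy model of the shared vs. split dependent queues described above.
    Not Zuul's real implementation; it only shows the ordering dependency."""
    from collections import deque

    JOB_TIME = {'A': 10, 'B': 10, 'C': 10, 'D': 2}  # minutes, hypothetical

    def merge_time(queues, change):
        """In a dependent queue each change waits for everything ahead of it."""
        for q in queues:
            if change in q:
                ahead = list(q)[:list(q).index(change) + 1]
                return sum(JOB_TIME[c] for c in ahead)
        raise ValueError('change %s not queued' % change)

    shared = [deque(['A', 'B', 'C', 'D'])]          # D is a wmf-branch patch
    split = [deque(['A', 'B', 'C']), deque(['D'])]  # dedicated wmf queue

    print('shared queue: D merges after %d min' % merge_time(shared, 'D'))  # 32
    print('split queues: D merges after %d min' % merge_time(split, 'D'))   # 2

This serialization, plus the fact that zuul promote cancels running jobs, is why hashar concludes a second pipeline is the right fix.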
[14:20:41] 10Continuous-Integration-Config, 10MediaWiki-extensions-Scribunto, 10Wikidata, 13Patch-For-Review: [Task] Add Scribunto to extension-gate in CI - https://phabricator.wikimedia.org/T125050#3160440 (10chasemp)
[14:20:41] 10Continuous-Integration-Config, 06Labs, 10MediaWiki-extensions-Scribunto, 10Wikidata: For contintcloud either add RAM or Swap to the instances - https://phabricator.wikimedia.org/T162166#3160436 (10chasemp) 05Open>03Resolved a:03chasemp Let's visit the disk usage issue later if needed, I don't want...
[14:21:07] Project beta-update-databases-eqiad build #16183: 04FAILURE in 1 min 6 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/16183/
[14:23:38] 10Scap, 06Discovery, 10Wikimedia-Portals, 03Discovery-Portal-Sprint: Portals deployment failed - https://phabricator.wikimedia.org/T161832#3160446 (10debt) Hopefully we'll be able to get out the stats/translations update and the other bug fixes soon, yay! Thanks again, @demon!
[14:35:57] Yippee, build fixed!
[14:35:58] Project beta-update-databases-eqiad build #16184: 09FIXED in 1 min 32 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/16184/
[14:42:37] the beta failures were due to the deployment of the 3D extension
[14:42:42] (which also depends on mmv)
[15:47:00] 10Browser-Tests-Infrastructure, 10RelatedArticles, 07JavaScript, 13Patch-For-Review, 15User-zeljkofilipin: Set up Selenium tests in Node.js for RelatedArticles extension - https://phabricator.wikimedia.org/T158052#3160657 (10Jdlrobson)
[16:22:06] Project language-screenshots-VisualEditor » chrome,Windows 10,ci-jessie-wikimedia build #54: 09SUCCESS in 9 min 8 sec: https://integration.wikimedia.org/ci/job/language-screenshots-VisualEditor/BROWSER=chrome,PLATFORM=Windows%2010,label=ci-jessie-wikimedia/54/
[16:36:36] !log staging ores:554ea12
[16:36:39] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[17:05:12] PROBLEM - Free space - all mounts on integration-slave-jessie-1002 is CRITICAL: CRITICAL: integration.integration-slave-jessie-1002.diskspace._mnt.byte_percentfree (No valid datapoints found)integration.integration-slave-jessie-1002.diskspace.root.byte_percentfree (<11.11%)
[19:08:28] 10Scap (Scap3-Adoption-Phase1), 10RESTBase, 13Patch-For-Review, 06Services (doing), 15User-mobrovac: Deploy RESTBase with scap3 - https://phabricator.wikimedia.org/T116335#3161478 (10mobrovac) @akosiaris @thcipriani could we schedule a slot next week where all of us are present so that we can switch REST...
[19:10:39] 10Scap (Scap3-Adoption-Phase1), 06Operations, 10RESTBase, 13Patch-For-Review, and 2 others: Deploy RESTBase with scap3 - https://phabricator.wikimedia.org/T116335#3161485 (10mobrovac)
[19:14:07] 10Scap (Scap3-Adoption-Phase1), 06Operations, 10RESTBase, 13Patch-For-Review, and 2 others: Deploy RESTBase with scap3 - https://phabricator.wikimedia.org/T116335#3161487 (10akosiaris) Yes, fine by me. Monday after/before the Ops meeting?
[19:17:15] 10Scap (Scap3-Adoption-Phase1), 06Operations, 10RESTBase, 13Patch-For-Review, and 2 others: Deploy RESTBase with scap3 - https://phabricator.wikimedia.org/T116335#3161490 (10thcipriani) >>! In T116335#3161487, @akosiaris wrote: > Yes, fine by me. Monday after/before the Ops meeting? I could be around 3pm...
[19:18:36] 10Scap (Scap3-Adoption-Phase1), 06Operations, 10RESTBase, 13Patch-For-Review, and 2 others: Deploy RESTBase with scap3 - https://phabricator.wikimedia.org/T116335#3161503 (10mobrovac) Great! Let's settle for 15h UTC on Monday then.
[19:33:07] (03PS1) 10D3r1ck01: Add Wikimedia-Emoji-Bot repo for integration testing [integration/config] - 10https://gerrit.wikimedia.org/r/346819
[19:41:13] (03CR) 10Dereckson: [C: 04-1] "The pipeline looks good to me, except there is a typo." (031 comment) [integration/config] - 10https://gerrit.wikimedia.org/r/346819 (owner: 10D3r1ck01)
[19:43:55] (03PS2) 10D3r1ck01: Add Wikimedia-Emoji-Bot repo for integration testing [integration/config] - 10https://gerrit.wikimedia.org/r/346819
[19:45:15] (03CR) 10Dereckson: [C: 031] "We can trust the author as a former GSoC (2016 edition) student, and the pipeline templates look good to me for a Node application with " [integration/config] - 10https://gerrit.wikimedia.org/r/346819 (owner: 10D3r1ck01)
[19:47:11] (03CR) 10Paladox: Add Wikimedia-Emoji-Bot repo for integration testing (031 comment) [integration/config] - 10https://gerrit.wikimedia.org/r/346819 (owner: 10D3r1ck01)
[19:48:38] (03CR) 10Dereckson: [C: 031] Add Wikimedia-Emoji-Bot repo for integration testing (031 comment) [integration/config] - 10https://gerrit.wikimedia.org/r/346819 (owner: 10D3r1ck01)
[19:49:08] (03PS3) 10D3r1ck01: Add Wikimedia-Emoji-Bot repo for integration testing [integration/config] - 10https://gerrit.wikimedia.org/r/346819
[19:50:00] (03CR) 10Paladox: [C: 031] Add Wikimedia-Emoji-Bot repo for integration testing [integration/config] - 10https://gerrit.wikimedia.org/r/346819 (owner: 10D3r1ck01)
[19:59:33] (03PS1) 10D3r1ck01: Add email of author/contributor [integration/config] - 10https://gerrit.wikimedia.org/r/346825
[20:15:18] (03CR) 10Dereckson: [C: 031] "D3r1ck01 is a former GSoC student (2016), so I think we can trust it to run unit tests." [integration/config] - 10https://gerrit.wikimedia.org/r/346825 (owner: 10D3r1ck01)
[20:19:23] (03CR) 10Paladox: [C: 031] "Also this user has more than 5 merges https://gerrit.wikimedia.org/r/#/q/owner:%22D3r1ck01+%253Calangiderick%2540gmail.com%253E%22" [integration/config] - 10https://gerrit.wikimedia.org/r/346825 (owner: 10D3r1ck01)
[20:20:49] (03CR) 10Zppix: [C: 031] "I see no harm in allowing this user access to recheck" [integration/config] - 10https://gerrit.wikimedia.org/r/346825 (owner: 10D3r1ck01)
[20:48:29] PROBLEM - Puppet run on deployment-urldownloader is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[21:06:45] PROBLEM - Work requests waiting in Zuul Gearman server https://grafana.wikimedia.org/dashboard/db/zuul-gearman on contint1001 is CRITICAL: CRITICAL: 78.57% of data above the critical threshold [140.0]
[21:07:19] https://integration.wikimedia.org/zuul/
[21:09:18] paladox: yes, the security release today
[21:09:30] yep
[21:10:38] with that zuul backlog let's hope it doesn't deadlock
[21:11:04] We've had worse backlogs
[21:11:08] It's only ~30 changes per queue
[21:11:09] zuul is made for this I think, as openstack always has a lot of changes.
[21:12:02] to be fair zuul doesn't have a great reputation for being stable with lots of changes in queue :P
[21:12:35] Zppix: zuul uses jenkins
[21:12:56] It uses a jenkins plugin called gaemon
[21:13:10] *gearman
[21:13:13] which then finds any available instances and starts the tests.
[21:13:16] Zppix: That's nodepool's fault, not zuul
[21:13:32] RainbowSprinkles: that's what I meant, I was just about to correct myself
[21:13:46] * paladox wishes nodepool could be like a static instance and run multiple changes.
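Paladox's description above ([21:12:35] onward) matches the Zuul v2 architecture: Zuul submits build functions to a Gearman server, and the Jenkins gearman plugin executes them on a matching node. Here is a rough sketch of that hand-off using the python-gearman client library; the server address, job name and payload are assumptions for illustration, not Wikimedia's actual configuration:

    """Sketch of the Zuul -> Gearman -> Jenkins hand-off described above.
    Server address, function name and parameters are illustrative."""
    import json
    import gearman  # python-gearman client library

    client = gearman.GearmanClient(['localhost:4730'])  # hypothetical Gearman server

    # Zuul passes the change context (pipeline, change number, ref...) to the job
    params = json.dumps({
        'ZUUL_PIPELINE': 'test-prio',
        'ZUUL_CHANGE': '346819',
    })

    # 'build:<job-name>' style functions are what the Jenkins side registers
    client.submit_job('build:mediawiki-core-selenium-jessie', params,
                      background=True)  # return once queued, don't wait for the build
    print('build request queued on the Gearman server')

The Grafana alert above ([21:06:45]) is counting exactly these queued Gearman work requests.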
[21:14:13] Chad committed the changes in a way to reduce the overall load on CI
[21:14:17] paladox: I believe the trusty instances are static?
[21:14:30] Well some are, but most are nodepool
[21:14:47] paladox: jessies are meant to be non-static
[21:15:05] why isn't the castor instance ever used?
[21:15:08] We have static jessie instances too
[21:15:19] I forgot what castor does.
[21:15:29] it is only used for castor-save
[21:18:25] Are the tests meant to show cancelled?
[21:18:43] paladox: depends, what were they?
[21:18:56] gate and submit
[21:19:07] paladox: if they were v+2'd I think
[21:19:09] yes
[21:19:09] https://integration.wikimedia.org/zuul/
[21:19:10] mediawiki/core
[21:19:16] but it shows one change as passed but in zuul it shows as cancelled.
[21:19:27] v+2 should not have cancelled it.
[21:20:02] paladox: I don't know, maybe the operations patch that's in test-prio took over it and zuul plans to requeue it after?
[21:20:56] the ones on the right show queued
[21:21:09] whoops, I meant left
[21:22:19] paladox: if it was something really bad I'm sure something would have alerted someone
[21:22:47] why is ci-jessie-wikimedia-603296 offline
[21:23:22] whoops, wrong one
[21:23:33] I meant ci-trusty-wikimedia-603274
[21:24:11] it's being deleted
[21:24:26] greg-g: why?
[21:24:34] Zppix: because it is not static
[21:24:36] https://wikitech.wikimedia.org/wiki/Nodepool
[21:24:43] might want to read up on that first
[21:25:01] oh, I thought you meant deleted as in no longer an instance we use
[21:25:20] Nodepool is how instances are created. It increases security if it is done like that.
[21:25:42] paladox: I know, I didn't realise greg meant deleted as in nodepool
[21:25:50] all of those are nodepool instances
[21:25:53] so it's presumed
[21:26:17] greg-g: well I've never noticed nodepool taking that long to delete before
[21:38:20] Shit happens
[21:38:22] Things break
[21:38:24] It's also busy
[21:40:19] doesn't the zuul web UI auto-refresh usually?
[21:41:23] paladox: do you still see a puppet patch on zuul in the test-prio pipeline?
[21:42:03] Hmm, strange, it has been tested
[21:42:14] but it should have shown the results on the change
[21:42:15] paladox: I've refreshed like 20 times and it's still there
[21:42:40] Have you not got anything important to actually worry about?
[21:43:08] it's gone now.
[21:43:08] Reedy: with test-prio being a new pipeline it's easier to fix stuff now than later on when there are more things running on the pipeline
[21:43:09] Reedy: nope
[21:43:28] paladox: I think it's just zuul being busy perhaps
[21:43:31] There's a lot running now
[21:43:35] Yep
[21:43:43] paladox: I'll keep my eye on it
[21:43:45] Now isn't the time to be trying to fix things like that
[21:44:11] Reedy: we're not fixing anything, we're just making sure everything is working like it should, that's all
[21:44:18] It is working just fine
[21:44:24] Go grab a coffee, take a walk, read a book
[21:44:30] Refreshing won't make it go faster
[21:49:50] lol
[21:49:58] I was busy doing https://repo.jenkins-ci.org/snapshots/org/jenkins-ci/trilead-ssh2/build217-jenkins-9-SNAPSHOT/ :)
[23:09:46] PROBLEM - Work requests waiting in Zuul Gearman server https://grafana.wikimedia.org/dashboard/db/zuul-gearman on contint1001 is CRITICAL: CRITICAL: 35.71% of data above the critical threshold [140.0]
[23:18:57] 10Continuous-Integration-Config, 10MediaWiki-Unit-tests, 05MW-1.27-release: REL1_27 qunit karma:main tests pass intermittently - https://phabricator.wikimedia.org/T162419#3162175 (10Reedy)
[23:19:04] 10Continuous-Integration-Config, 10MediaWiki-Unit-tests, 05MW-1.27-release: REL1_27 qunit karma:main tests pass intermittently - https://phabricator.wikimedia.org/T162419#3162189 (10Reedy) p:05Triage>03Normal
[23:24:51] 10Continuous-Integration-Config, 10MediaWiki-Unit-tests, 05MW-1.27-release: REL1_27 qunit karma:main tests pass intermittently - https://phabricator.wikimedia.org/T162419#3162175 (10Paladox) I believe this is because of wikibase. https://gerrit.wikimedia.org/r/#/c/346916/ <-- I uploaded that for REL1_27.
[23:25:40] 10Continuous-Integration-Config, 10MediaWiki-Unit-tests, 05MW-1.27-release: REL1_27 qunit karma:main tests pass intermittently - https://phabricator.wikimedia.org/T162419#3162204 (10Paladox) See https://gerrit.wikimedia.org/r/#/c/329775/
[23:29:59] * RainbowSprinkles glares at Krinkle
[23:30:32] * Krinkle tries again
[23:31:22] RainbowSprinkles: It certainly appears as if the patch correctly provides qqq
[23:31:29] but the banana disagrees.
[23:32:39] I'm looking at that wikibase change you merged while I'm trying to get security patches through. Wikibase is gonna take up like 12 nodepool instances ;p
[23:32:49] * RainbowSprinkles throws a pillow at Krinkle
[23:32:58] RainbowSprinkles: Oh, it will fix the build though.
[23:33:14] * Krinkle throws another - but you're bypassing Jenkins, right?
[23:33:16] It's only intermittently failing
[23:33:17] lol
[23:33:19] (no I'm not)
[23:34:19] Krinkle: I did do something that got it all moving faster. I submitted them all with CR+2 already attached so they went straight to gate-and-submit and I didn't have to go click +2 on ~30 changes
[23:34:20] :)
[23:34:21] RainbowSprinkles: I can't imagine it being intermittent, the failure is deterministic, always has been (when we originally encountered it)
[23:34:31] Maybe, I dunno
[23:35:19] Wikibase mwext-mw-selenium-composer-jessie is failing on REL1_27
[23:35:21] what a surprise
[23:35:22] ..
[23:35:39] These old integration tests, maybe we should disable them on the older branches
[23:37:29] Wouldn't be a terrible idea
[23:37:46] Maybe we should just stop putting things into older branches full stop
[23:38:51] I'm ok with that idea too :p
[23:38:53] Old branches suck
[23:38:54] lol
[23:39:45] PROBLEM - Work requests waiting in Zuul Gearman server https://grafana.wikimedia.org/dashboard/db/zuul-gearman on contint1001 is CRITICAL: CRITICAL: 35.71% of data above the critical threshold [140.0]
[23:58:50] RainbowSprinkles: I'm cancelling a bunch of doxygen post-merge builds
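The trick RainbowSprinkles describes at [23:34:19], attaching CR+2 at submission time so changes skip straight to gate-and-submit, can also be scripted against Gerrit's documented set-review REST endpoint. A hedged sketch: the endpoint shape is Gerrit's published API, but the credentials, message and change numbers here are placeholders, and the actual security-release workflow may have used a different mechanism (such as push options):

    """Sketch: vote Code-Review+2 on a list of changes via Gerrit's
    set-review REST endpoint. Credentials and change numbers are
    placeholders; not the actual security-release tooling."""
    import json
    import requests
    from requests.auth import HTTPBasicAuth

    GERRIT = 'https://gerrit.wikimedia.org/r'
    AUTH = HTTPBasicAuth('user', 'http-password')  # HTTP password from Gerrit settings

    def approve(change_id, message='queueing for gate-and-submit'):
        resp = requests.post(
            '%s/a/changes/%s/revisions/current/review' % (GERRIT, change_id),
            auth=AUTH,
            json={'labels': {'Code-Review': 2}, 'message': message})
        resp.raise_for_status()
        # Gerrit prefixes JSON responses with )]}' to defeat XSSI
        return json.loads(resp.text[4:])

    for change in ['123456']:  # placeholder change numbers
        print(approve(change))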