[04:18:21] Yippee, build fixed!
[04:18:22] Project selenium-MultimediaViewer » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #220: FIXED in 22 min: https://integration.wikimedia.org/ci/job/selenium-MultimediaViewer/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/220/
[05:27:42] Scap3, ContentTranslation-CXserver, MediaWiki-extensions-ContentTranslation, Language-Engineering October-December 2016, and 4 others: Enable Scap3 config deploys for CXServer - https://phabricator.wikimedia.org/T147634#2837285 (KartikMistry) @akosiaris @mobrovac Should we schedule this now?
[06:56:40] Release-Engineering-Team, ChangeProp, Operations, Parsing-Team, and 4 others: Separate clusters for asynchronous processing from the ones for public consumption - https://phabricator.wikimedia.org/T152074#2837406 (Joe)
[06:59:28] Project selenium-Wikibase » chrome,beta,Linux,contintLabsSlave && UbuntuTrusty build #194: FAILURE in 2 hr 19 min: https://integration.wikimedia.org/ci/job/selenium-Wikibase/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/194/
[08:15:35] (PS4) Hashar: Switch mediawiki-extensions-* jobs to Nodepool [integration/config] - https://gerrit.wikimedia.org/r/324477 (https://phabricator.wikimedia.org/T135001)
[08:17:47] (CR) Hashar: [C: 2] Switch mediawiki-extensions-* jobs to Nodepool [integration/config] - https://gerrit.wikimedia.org/r/324477 (https://phabricator.wikimedia.org/T135001) (owner: Hashar)
[08:18:42] (Merged) jenkins-bot: Switch mediawiki-extensions-* jobs to Nodepool [integration/config] - https://gerrit.wikimedia.org/r/324477 (https://phabricator.wikimedia.org/T135001) (owner: Hashar)
[09:26:18] (PS1) Hashar: Mw extensions jobs for Nodepool [integration/config] - https://gerrit.wikimedia.org/r/324682 (https://phabricator.wikimedia.org/T137199)
[10:01:47] (PS1) Zfilipin: WIP RSpec tests for Echo Mention notificationon [integration/config] - https://gerrit.wikimedia.org/r/324687 (https://phabricator.wikimedia.org/T146916)
[10:12:08] (CR) Hashar: WIP Run experimental Node.js Selenium job for mediawiki/core in experimental pipeline (3 comments) [integration/config] - https://gerrit.wikimedia.org/r/324416 (https://phabricator.wikimedia.org/T139740) (owner: Zfilipin)
[10:22:54] Scap3, ContentTranslation-CXserver, MediaWiki-extensions-ContentTranslation, Language-Engineering October-December 2016, and 4 others: Enable Scap3 config deploys for CXServer - https://phabricator.wikimedia.org/T147634#2837804 (akosiaris) @KartikMistry Sounds fine to me
[10:23:54] (PS2) Zfilipin: WIP Run experimental Node.js Selenium job for mediawiki/core in experimental pipeline [integration/config] - https://gerrit.wikimedia.org/r/324416 (https://phabricator.wikimedia.org/T139740)
[10:24:38] (CR) Zfilipin: WIP Run experimental Node.js Selenium job for mediawiki/core in experimental pipeline (3 comments) [integration/config] - https://gerrit.wikimedia.org/r/324416 (https://phabricator.wikimedia.org/T139740) (owner: Zfilipin)
[10:35:05] Browser-Tests-Infrastructure, Continuous-Integration-Scaling, Patch-For-Review, User-zeljkofilipin: migrate mwext-mw-selenium to Nodepool instances - https://phabricator.wikimedia.org/T137112#2837830 (hashar) So the job does: ``` - prepare-mediawiki-zuul-project-no-vendor - zuul-cloner:...
[10:48:09] (PS3) Zfilipin: Run Node.js Selenium job for mediawiki/core in experimental pipeline [integration/config] - https://gerrit.wikimedia.org/r/324416 (https://phabricator.wikimedia.org/T139740)
[10:56:58] (PS4) Hashar: Run Node.js Selenium job for mediawiki/core in experimental pipeline [integration/config] - https://gerrit.wikimedia.org/r/324416 (https://phabricator.wikimedia.org/T139740) (owner: Zfilipin)
[10:59:10] (CR) Hashar: [C: 2] Run Node.js Selenium job for mediawiki/core in experimental pipeline [integration/config] - https://gerrit.wikimedia.org/r/324416 (https://phabricator.wikimedia.org/T139740) (owner: Zfilipin)
[11:00:59] (Merged) jenkins-bot: Run Node.js Selenium job for mediawiki/core in experimental pipeline [integration/config] - https://gerrit.wikimedia.org/r/324416 (https://phabricator.wikimedia.org/T139740) (owner: Zfilipin)
[11:26:24] zeljkof: and for the Echo / rspec thing, we will do the same thing we paired on this morning
[11:27:10] hashar: agreed
[11:31:56] (PS1) Tobias Gritschacher: Change reciepients for Wikibase browsertests [integration/config] - https://gerrit.wikimedia.org/r/324698 (https://phabricator.wikimedia.org/T150856)
[11:34:42] hashar: would it be hard to get Oozie checks for analytics? I'm investigating, but some hints on where to look would help :)
[11:46:19] (PS2) Tobias Gritschacher: Change reciepients for Wikibase browsertests [integration/config] - https://gerrit.wikimedia.org/r/324698 (https://phabricator.wikimedia.org/T150856)
[12:36:31] (PS2) Hashar: Mw extensions jobs for Nodepool [integration/config] - https://gerrit.wikimedia.org/r/324682 (https://phabricator.wikimedia.org/T137199)
[12:50:52] (Abandoned) Hashar: Mw extensions jobs for Nodepool [integration/config] - https://gerrit.wikimedia.org/r/324682 (https://phabricator.wikimedia.org/T137199) (owner: Hashar)
[12:51:16] (CR) Hashar: "I have missed this change. Going to squash my change https://gerrit.wikimedia.org/r/#/c/324682/ into yours :]" [integration/config] - https://gerrit.wikimedia.org/r/292509 (https://phabricator.wikimedia.org/T137199) (owner: Paladox)
[12:54:37] (PS16) Hashar: Mw extensions jobs for Nodepool [integration/config] - https://gerrit.wikimedia.org/r/292509 (https://phabricator.wikimedia.org/T137199) (owner: Paladox)
[13:04:00] (PS17) Hashar: Mw extensions jobs for Nodepool [integration/config] - https://gerrit.wikimedia.org/r/292509 (https://phabricator.wikimedia.org/T137199) (owner: Paladox)
[13:06:50] (CR) Hashar: [C: 2] "Lets do it now that we have more quota on Nodepool!" [integration/config] - https://gerrit.wikimedia.org/r/292509 (https://phabricator.wikimedia.org/T137199) (owner: Paladox)
[13:08:55] (Merged) jenkins-bot: Mw extensions jobs for Nodepool [integration/config] - https://gerrit.wikimedia.org/r/292509 (https://phabricator.wikimedia.org/T137199) (owner: Paladox)
[13:20:26] Continuous-Integration-Scaling: Speed up the time to get a Nodepool instances to achieve READY state - https://phabricator.wikimedia.org/T113342#2838100 (hashar) ``` 13:10:02,299 DEBUG nodepool.ProviderManager: Status of server in wmflabs-eqiad ce709ec4-a85c-4359-be5f-7ee2fb09a6d1: ACTIVE 13:10:02,299 DEBUG...
[13:24:23] (CR) Jonas Kress (WMDE): [C: 1] Change reciepients for Wikibase browsertests [integration/config] - https://gerrit.wikimedia.org/r/324698 (https://phabricator.wikimedia.org/T150856) (owner: Tobias Gritschacher)
[13:45:50] Project selenium-VisualEditor » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #230: FAILURE in 1 min 49 sec: https://integration.wikimedia.org/ci/job/selenium-VisualEditor/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/230/
[13:48:03] Openstack is down
[13:50:41] Continuous-Integration-Infrastructure, Labs, Labs-Infrastructure: OpenStack API refuses to launch new instances || Nodepool is out of instance / CI stalled - https://phabricator.wikimedia.org/T152096#2838157 (hashar)
[13:55:43] hashar: so what's the deal, CI is dead atm?
[13:56:12] that's a nonsensical statement, what is down exactly?
[13:56:28] Continuous-Integration-Infrastructure, Labs, Labs-Infrastructure: OpenStack API refuses to launch new instances || Nodepool is out of instance / CI stalled - https://phabricator.wikimedia.org/T152096#2838175 (hashar)
[13:56:44] chasemp: good morning!
[13:57:19] it is all in the task but roughly nova.api.openstack.extensions got an exception: ImageNotAuthorized: Not authorized for image 84f2fcfb-7ac5-4c3b-9505-ada37cbcaebf.
[13:57:45] so nodepool does some image creation overnight iirc and now it cannot access the image created
[13:58:00] hashar: is CI down entirely then? what's the impact?
[13:58:17] the new snapshots are created at 14:14UTC, so in 15 minutes from now
[13:58:30] seems the first occurrence was at 13:30 or half an hour ago
[14:01:10] Continuous-Integration-Infrastructure, Labs, Labs-Infrastructure: OpenStack API refuses to launch new instances || Nodepool is out of instance / CI stalled - https://phabricator.wikimedia.org/T152096#2838181 (hashar) The ImageNotAuthorized reference the images: 84f2fcfb-7ac5-4c3b-9505-ada37cbcaebf...
[14:01:15] hashar: are these contintcloud instances usually size 'small' or 'medium'?
[14:01:24] m1.medium iirc
[14:01:31] 2 cpu 4GB RAM 40G disk
[14:01:51] has anything changed roughly half an hour ago ?
[14:02:18] no
[14:02:25] did you get the patch in to deal w/ leaking instances yet?
[14:02:37] nop
[14:03:06] hashar: we're talking about the 'contintcloud' project, right?
[14:03:12] I just created a VM there using that image
[14:03:12] yes yes
[14:03:20] oh
[14:03:32] (PS1) Zfilipin: WIP mediawiki-core-selenium-jessie is running by default but not voting [integration/config] - https://gerrit.wikimedia.org/r/324719 (https://phabricator.wikimedia.org/T139740)
[14:04:23] well, hm, maybe I spoke too soon
[14:04:56] there is some contint-imagetest spawned
[14:05:40] which seems to work just fine
[14:06:03] yeah, that's the one
[14:06:09] maybe because you did it as "novaadmin"
[14:06:40] maybe, although it should still check project permissions
[14:06:49] going to try through horizon with nodepool credentials
[14:06:54] ok
[14:07:06] oh no
[14:07:09] I cant :D
[14:07:13] due to two-factor
[14:08:52] hashar: do you have the id of an image that worked properly, recently?
[14:09:12] trying with " openstack server create --image ci-jessie-wikimedia-1480515240 --flavor m1.medium hashar-spawn "
[14:09:23] the images are from yesterday
[14:09:41] oh, so this same image worked fine until a few minutes ago?
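A quick way to double-check the image that nova reports as unauthorized is to inspect it with the same project credentials nodepool uses. This is only a sketch: the image ID is the one from the error above, and the credential environment is assumed rather than taken from the actual nodepool host.

```
# Sketch: inspect the image from the ImageNotAuthorized error with contintcloud credentials.
# The OS_* environment is assumed to come from nodepool's credential file (not shown here).
OS_TENANT_NAME=contintcloud openstack image show 84f2fcfb-7ac5-4c3b-9505-ada37cbcaebf
# Compare the owner/visibility fields with the contintcloud project; a mismatch would
# explain why nova refuses to use the snapshot.
```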
[14:09:54] andrewbogott: I believe hashar has indicated this is the same image that was used for the previous 20+ hours and the new image isn't meant to be created for another 10m
[14:09:57] seems using the CLI tool I can get an image to spawn just fine bah
[14:11:31] I am force refreshing the jessie snapshot
[14:11:34] hashar: what happens if enough leaked instances cause no slots for the snapshot creation?
[14:11:35] maybe that will clear something out
[14:11:49] chasemp: the snapshot will fail
[14:11:59] I only see two VMs in that project
[14:12:02] and nodepool keeps using the last image that got generated in the last run (eg 24 hours ago)
[14:12:09] so presumably the others have been cleaned up properly
[14:12:12] hashar-spawn is the one I created manually
[14:12:22] ci-jessie-wikimedia-1480601060 is a snapshot being created
[14:12:22] hm, why don't I see that one?
[14:12:29] me too andrewbogott, two only
[14:12:39] ah guess I have deleted hashar-spawn already bah
[14:12:41] sorry :(
[14:12:46] nah, it's ok
[14:12:48] anyway
[14:12:52] looks like openstack works fine
[14:12:53] just confirming that it's not the leak
[14:12:56] but nodepool is somehow confused
[14:13:08] also one VM just came up, ci-jessie-wikimedia-1480601060
[14:13:12] yeah
[14:13:29] that is me forcing the creation of a new snapshot based on the ci-jessie-wikimedia image
[14:13:37] ok
[14:13:40] it spawned the instance just fine and it's provisioning right now
[14:13:43] so openstack looks fine
[14:13:49] but nodepool gets confused somehow :(
[14:14:05] seems like, although that error message is so specific
[14:14:17] right
[14:14:28] that error message is no accident and def not sourced from nodepool's end
[14:15:06] nodepool could be doing something insane to generate it tho
[14:15:32] (CR) Zfilipin: [C: 2] "Nobody complained in more than a week. One +1. Self merging." [selenium] - https://gerrit.wikimedia.org/r/312047 (https://phabricator.wikimedia.org/T146292) (owner: Zfilipin)
[14:15:40] Continuous-Integration-Infrastructure, Labs, Labs-Infrastructure: OpenStack API refuses to launch new instances || Nodepool is out of instance / CI stalled - https://phabricator.wikimedia.org/T152096#2838195 (hashar) Spawning an instance from Horizon as novaadmin works fine (tested by Andrew). Tried...
[14:16:24] !log Image ci-jessie-wikimedia-1480601060 in wmflabs-eqiad is ready | T152096
[14:16:28] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[14:17:58] (CR) Daniel Kinzler: [C: 1] "yes, this is what we want." [integration/config] - https://gerrit.wikimedia.org/r/324698 (https://phabricator.wikimedia.org/T150856) (owner: Tobias Gritschacher)
[14:18:07] created a new image ( 0d29c97d-390b-439b-9778-6c2171a7020b ) and that fails still bah
[14:19:59] I can see the api throwing the error, so it's not nodepool's imagination
[14:20:23] maybe Nodepool has lost track of its own credentials
[14:21:21] hashar: are you doing the swat?
[14:21:39] there are a few questions about it in -operations
[14:22:21] hashar: I restarted the api with more logging turned on, can you prompt another instance creation? Or is that happening constantly regardless?
[14:25:48] andrewbogott: constant spam
[14:25:54] I did force the refresh of the jessie snapshot
[14:25:56] but still the same deal
[14:26:13] it looks like nodepool is using an expired token, do you know how to force it to refresh?
[14:27:16] ohh
[14:27:27] can it be an expired keystone token?
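The manual sanity check hashar ran above can be repeated as a throwaway spawn/list/delete cycle. A sketch only: the image and flavor names are the ones quoted in the log, and additional options (network, keypair) may be needed depending on the project defaults.

```
# Spawn a throwaway instance with the image/flavor nodepool uses, confirm it reaches
# ACTIVE, then clean it up. Extra create options may be required in this project.
OS_TENANT_NAME=contintcloud openstack server create \
  --image ci-jessie-wikimedia-1480515240 --flavor m1.medium hashar-spawn
OS_TENANT_NAME=contintcloud openstack server list
OS_TENANT_NAME=contintcloud openstack server delete hashar-spawn
```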
[14:27:39] and thus ends up lacking the right to spawn an image via glance?
[14:27:50] hashar: just to make sure I understood, 322667 is blocked, but I should deploy 324401?
[14:28:05] hashar: that is not a stretch considering the error is not authorized
[14:28:22] the glance logs are full of WARNING keystonemiddleware.auth_token [-] Authorization failed for token
[14:29:12] I don't know why this would be happening today, unless we've gone a longer-than-ever time between nodepool restarts
[14:29:41] Continuous-Integration-Infrastructure, Labs, Labs-Infrastructure: OpenStack API refuses to launch new instances || Nodepool is out of instance / CI stalled - https://phabricator.wikimedia.org/T152096#2838232 (hashar) The new Jessie snapshot has ID 0d29c97d-390b-439b-9778-6c2171a7020b and fails as w...
[14:29:48] some expiry got reached ?
[14:30:01] it has been running since Mon 2016-11-07 13:30:51 UTC
[14:30:10] or 24 days ago
[14:30:24] so if our expiry is that ... it has expired exactly at 13:30
[14:30:35] andrewbogott:
[14:30:37] glance/glance-api.log:2015-12-10 04:14:21.429 30995 WARNING keystonemiddleware.auth_token [req-3d8b443d-785d-4833-aa50-2d9d4b2a17df nodepoolmanager contintcloud - - -] Identity response: {"error": {"message": "An unexpected error prevented the server from fulfilling your request: {'desc': \"Can't contact LDAP server\"} (Disable debug mode to suppress these details.)", "code": 500, "title": "Internal Server Error"}}
[14:30:47] that is a round number of days, looks like Nodepool doesn't grab new tokens
[14:30:55] is it possible nodepool tries to renew its token and keystone gets caught up in an ldap restart?
[14:31:21] chasemp: isn't that log message from a year ago?
[14:31:41] dear god it is :)
[14:32:36] In theory tokens live 7.1 days
[14:32:41] 613440 seconds
[14:33:23] also it's weird that nodepool's token worked long enough to get in the instance create but then glance hates it
[14:33:27] disregarding nostalgia errors
[14:33:29] glance/glance-api.log:2016-12-01 13:30:04.851 2223 WARNING keystonemiddleware.auth_token [-] Identity response: {"error": {"message": "An unexpected error prevented the server from fulfilling your request.", "code": 500, "title": "Internal Server Error"}}
[14:33:35] seems like right about the time this began?
[14:33:44] yes
[14:33:49] hm, yeah
[14:33:54] that started at 13:30
[14:34:28] hashar: did you attempt to force the nodepool service to re-up its token?
[14:34:34] nop
[14:34:37] I haven't touched anything
[14:34:40] beside tailing logs
[14:34:52] maybe I should restart nodepool ?
[14:35:07] I think andrewbogott is looking at things, let's coordinate if we are going to do it
[14:35:14] yeah
[14:35:37] maybe it is highlighting something deeper in openstack. But the token expiry sounds the easiest explanation so far
[14:35:51] (left wondering why Nodepool managed to refresh the token just fine until now though)
[14:36:53] Please update the topic
[14:36:59] I guess nodepool is down
[14:38:07] hashar: go ahead and restart nodepool. I'm not really learning anything from keystone at the moment.
[14:38:10] andrewbogott: seems clear the token is bad and nodepool is still retrying every 5s or so
[14:38:13] ok I'll do it
[14:39:42] it's thinking about it....
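The expiry theory above (tokens live 613440 seconds, i.e. 7.1 days, while nodepool had been running for 24 days) can be checked by issuing a token with the same credentials nodepool uses and looking at its expiry. The credential file path below is a placeholder, not the real one from the nodepool host.

```
# Issue a fresh token with nodepool's credentials and inspect its expiry.
# /etc/nodepool/openrc is a placeholder path, not the actual credential file.
source /etc/nodepool/openrc
openstack token issue -f value -c expires
# 613440 s / 86400 = 7.1 days, so a token obtained at the 2016-11-07 restart would have
# expired long before 2016-12-01 unless the client periodically re-authenticates.
```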
[14:39:55] apparently
[14:40:07] nodepool cancels builds
[14:40:11] and flags them for deletion
[14:40:49] restarted
[14:41:30] seems busy now
[14:41:44] glance is complaining less at least
[14:41:57] or not
[14:42:12] nodepool list has a lot of output
[14:42:18] OS_TENANT_NAME=contintcloud openstack server list
[14:42:19] does not
[14:42:56] nodepool seems to think 19 instances are in delete state
[14:42:57] "nodepool list" is its internal tracking table
[14:43:04] which is disjoint from the status of openstack
[14:43:08] it is going to delete all
[14:45:34] no auth denied since 2016-12-01 14:40:02.334 2223 WARNING keystonemiddleware.auth_token [-] Authorization failed for token
[14:46:00] nodepool seems to have accepted the deletions are done or invalid
[14:46:33] but now it wants 19 instances /now/
[14:46:44] and it's going to take a minute
[14:46:45] yeah since there is a lot of demand
[14:47:13] after the restart, nodepool has been stuck
[14:47:28] well. I don't know what it was doing
[14:48:01] it seems ok, its new build requests are queued up
[14:48:09] yep
[14:48:17] nothing fulfilled yet but hopefully it will
[14:48:20] I'm still confused about why that token would have failed for glance but worked for nova
[14:49:17] It was using something that looked like an actual token, not just "" or "-error" or something obviously broken
[14:49:46] andrewbogott: do we know for sure it was the same token in both nova and glance cases?
[14:50:20] hm...
[14:50:28] hard to tell, I think only keystone logged the token
[14:50:50] but the glance request is made /by/ nova
[14:50:56] and I have no idea how Nodepool manages the tokens
[14:51:44] andrewbogott: right, that's what makes it so confusing?
[14:52:24] yeah. It's possible that the image lookup is the first auth thing that nova does, but that seems unlikely
[14:52:53] it seems like we are rolling now
[14:53:57] the thing to do is produce a curl request to nova with an invalid token and see what kind of error it throws
[14:54:11] That should be possible albeit annoying… I'll try
[14:54:14] cool
[14:54:43] my wife is back home. I will be back in roughly 10 minutes
[14:54:52] ok
[14:55:46] (Merged) jenkins-bot: Helper that allows you to query whether JavaScript module has loaded [selenium] - https://gerrit.wikimedia.org/r/312047 (https://phabricator.wikimedia.org/T146292) (owner: Zfilipin)
[14:55:49] andrewbogott: I'm going to make a note about the invalid token and close that task
[14:56:10] unless you object or would like to hold it open
[14:57:35] Continuous-Integration-Infrastructure, Labs, Labs-Infrastructure: OpenStack API refuses to launch new instances || Nodepool is out of instance / CI stalled - https://phabricator.wikimedia.org/T152096#2838313 (chasemp) Open>Resolved a:chasemp Somehow nodepool was using an invalid token. We...
[14:59:56] andrewbogott: chasemp: so could it be nodepool just having some weird bug
[15:00:06] or is that some issue on the keystone/glance side ?
[15:00:27] maybe it is a one off error. Nodepool has been up for 3 weeks, 3 days
[15:01:23] hashar: I'm not sure. I'm pretty sure that the main cause is nodepool running for longer than usual between restarts
[15:01:30] but I'm still curious about why the error was in glance
[15:01:31] and there Nodepool doesn't log the low level requests it is doing. But maybe that can be enabled somehow
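The probe andrewbogott proposes, a curl to nova with a deliberately bad token replaying the same POST /v2/contintcloud/servers call nodepool makes, could look roughly like the sketch below. The endpoint host and the image/flavor IDs are placeholders, not values from this incident.

```
# Replay nodepool's call with an invalid token and compare the failure mode.
# ${NOVA_API}, IMAGE_UUID and FLAVOR_ID are placeholders.
curl -i -X POST "https://${NOVA_API}/v2/contintcloud/servers" \
  -H 'X-Auth-Token: deliberately-invalid-token' \
  -H 'Content-Type: application/json' \
  -d '{"server": {"name": "token-probe", "imageRef": "IMAGE_UUID", "flavorRef": "FLAVOR_ID"}}'
# A plain 401 here, rather than ImageNotAuthorized, would suggest the error nodepool hit
# came from the nova->glance leg rather than nova's own token validation.
```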
[15:02:06] maybe nodepool asks nova to launch an instance, then nova asks glance for access to the image
[15:02:14] and since the token is somehow expired, glance refuses it
[15:02:37] maybe glance expects fresher tokens
[15:09:50] hashar: I've confirmed that if I ask nova to create a VM with a bad token it doesn't make it as far as glance
[15:10:05] but maybe I'm overthinking this, it could be that nodepool just checks in with glance before it even makes the nova call...
[15:10:26] or, rather, asks nova to check in with glance
[15:10:38] hashar: if you can turn on better logging for the future, that'd be good
[15:13:13] from the stacktrace, seems nodepool uses the nova client
[15:13:25] and does a POST to /v2/contintcloud/servers
[15:13:28] can you tell what api call it was making when it hit the failure?
[15:13:42] yeah, ok, that's weird then :(
[15:14:02] according to the client side of the stacktrace on https://phabricator.wikimedia.org/T152096
[15:14:56] yeah, ok
[15:15:07] on server side
[15:15:14] so all I can think is that nova uses some kind of cache to avoid validating a token every time
[15:15:19] and that cache was out of sync with glance
[15:15:56] the server side stacktrace shows compute.api methods create_instance > _get_image
[15:16:02] andrewbogott: that seems likely
[15:16:07] then get_show_deleted which relies on glance and that one raises an exception
[15:16:46] so a one-off error of some sort ? :(
[15:17:51] that is terrible for a morning breakfast
[15:17:56] thanks for the diagnostic/debugging at least!
[15:18:43] yeah, looks like there's a client cache in the nova api
[15:19:09] I don't think I'm curious enough to follow this to the end :) Maybe we should just have a cron restart nodepool once/week
[15:19:55] ;D
[15:20:07] we will see next week what happens
[15:34:45] hashar thanks for merging my patch today :)
[15:36:51] Continuous-Integration-Scaling: Speed up the time to get a Nodepool instances to achieve READY state - https://phabricator.wikimedia.org/T113342#2838431 (hashar) I manually spawned an instance at `15:21:49`. Did a ping on it and network went up roughly a minute later at `15:22:52`. ssh client was apparently...
[15:39:48] PROBLEM - Puppet run on deployment-phab02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[15:48:25] paladox: I wanted to move some jobs today, but got other issues :/
[15:48:33] Oh
[15:48:39] What other issues do you have?
[15:50:17] nodepool exploded
[15:51:04] PROBLEM - Puppet run on deployment-phab01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[15:51:56] Oh
[15:52:03] hashar how did it manage to explode?
[15:52:31] i guess these are all bugs, some could be fixed in a future update, we could try to backport the patches.
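The once-a-week restart floated above would be a one-line cron entry. A sketch assuming the service unit is simply called `nodepool`, with an arbitrary Monday 06:00 UTC slot; neither detail is confirmed by the log.

```
# /etc/cron.d/nodepool-weekly-restart (sketch; unit name and schedule are assumptions)
0 6 * * 1  root  /bin/systemctl restart nodepool
```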
[16:03:14] paladox: some weird condition where nodepool somehow had a bad authentication token
[16:03:17] restarting nodepool solved it
[16:03:24] Oh
[16:03:35] on another note, eventually we will get the mw-testextension ( https://phabricator.wikimedia.org/T137199 ) migrated
[16:03:40] I saw that they are saying to do that weekly (restart on a cron)
[16:03:43] I have reused your patch from june, and tweaked it
[16:03:48] Thanks
[16:03:50] :)
[16:03:54] from the few tests I did, it is probably going to work
[16:04:05] :) :)
[16:04:16] then we can look at skins as dependencies :(
[16:04:29] Yep
[16:04:41] hashar i think i correctly fixed that
[16:04:48] yeah gotta review your patch
[16:04:55] i cleaned it up
[16:05:02] in parameter_functions.py
[16:05:09] too, makes it easy to add deps
[16:05:45] just one thing though, i am wondering will it correctly do the main ext then add the deps, or will it add extensions/ or skins/ on it
[16:05:46] ?
[16:05:48] hashar ^^
[16:09:11] paladox: no clue :]
[16:09:16] Oh ok
[16:09:17] paladox: will review it tomorrow
[16:09:22] and probably add some tests
[16:09:24] ok thanks :)
[16:09:29] hashar ive added tests
[16:09:29] too
[16:09:31] oh
[16:09:33] awesome
[16:09:34] In a separate patch
[16:09:50] Should show in the related patches side of the patch
[16:10:22] We should get the tests to also test deps to make sure they exist too
[16:12:00] I will look at both tomorrow :]
[16:13:06] Ok thanks
[16:13:07] :)
[16:18:17] hashar it should now show clearly what the deps are and should be easy to just add deps
[16:18:39] Continuous-Integration-Scaling: Speed up the time to get a Nodepool instances to achieve READY state - https://phabricator.wikimedia.org/T113342#2838534 (hashar) I noticed we have two interfaces configured: /etc/network/interfaces.d/eth0 /etc/network/interfaces.d/eth1 That is done by diskimage-buil...
[16:18:48] we need to add extensions/ or skins/ to the deps, for example instead of it being echo it will be extensions/echo
[16:18:52] and same for skins
[16:18:53] :)
[16:20:15] paladox: yeah using prefixes extensions/ skins/ is probably the easiest path
[16:20:27] Yep
[16:20:28] I thought of having skin and extension handled separately
[16:20:30] Ive added tests
[16:20:33] might be overkill
[16:20:37] so the tests will fail if you don't.
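The prefix discussion above amounts to cloning each dependency under its MediaWiki path rather than as a bare name. Purely as an illustration, and not the actual integration/config logic, a dependency named `Echo` ends up under `extensions/Echo` and a skin under `skins/Vector`:

```
# Illustration only: dependencies land under their MediaWiki paths, not bare names.
git clone https://gerrit.wikimedia.org/r/mediawiki/extensions/Echo extensions/Echo
git clone https://gerrit.wikimedia.org/r/mediawiki/skins/Vector skins/Vector
```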
[16:20:39] so will review your patch tomorrow
[16:20:43] that will unblock a bunch of things
[16:20:44] :):)
[16:20:46] Yep
[16:26:10] (PS1) Hashar: dib: remove eth1 configuration [integration/config] - https://gerrit.wikimedia.org/r/324744 (https://phabricator.wikimedia.org/T113342)
[16:26:32] Continuous-Integration-Infrastructure, Continuous-Integration-Scaling, Patch-For-Review: Speed up the time to get a Nodepool instances to achieve READY state - https://phabricator.wikimedia.org/T113342#2838560 (hashar) a:hashar
[16:27:02] speculative change
[16:27:07] I am off, gotta head to the local hacker group
[16:27:10] *wave*
[16:52:40] twentyafterfour with the phabricator puppet role i managed to create a web domain for the phabricator instance
[16:52:44] running just that role
[16:52:52] accessible from phabricator-01.wmflabs.org
[17:07:13] twentyafterfour with the fixes we did for labs we can now move on with T137928
[17:07:39] Im helping mutante with where to copy the repos from (the path the repos are on) so they can be copied to phab2001
[17:07:43] i can setup an rsync for the repos
[17:07:56] :)
[17:55:35] Staging, Patch-For-Review: Create staging-db* (databases) - https://phabricator.wikimedia.org/T91545#2838849 (fgiunchedi)
[17:58:53] Staging, Patch-For-Review: Create staging-db* (databases) - https://phabricator.wikimedia.org/T91545#2838855 (jcrespo)
[18:11:25] !log adding https://gerrit.wikimedia.org/r/#/c/305536/3 to the puppet master
[18:11:28] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[18:19:48] turned out that it is a no-op for deployment prep :(
[18:20:09] !log removing https://gerrit.wikimedia.org/r/#/c/305536 from the puppet master via rebase -i (no-op for beta)
[18:20:12] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[18:21:04] done :)
[19:05:52] Browser-Tests-Infrastructure, Reading-Web-Backlog, Browser-Tests, Patch-For-Review, and 2 others: Add helper to Selenium that allows you to query whether JavaScript module has loaded - https://phabricator.wikimedia.org/T146292#2839213 (Jdlrobson) Moving columns as the last remaining patch is for...
[19:27:00] (PS1) Awight: Add config for a new repository [integration/config] - https://gerrit.wikimedia.org/r/324779
[19:27:49] (CR) Paladox: [C: 1] Add config for a new repository [integration/config] - https://gerrit.wikimedia.org/r/324779 (owner: Awight)
[19:47:27] Deployment-Systems, Scap3, Operations: setup automatic deletion of old l10nupdate - https://phabricator.wikimedia.org/T130317#2839375 (fgiunchedi) p:High>Normal ATM there's 5 mediawiki versions on `/var/lib/l10nupdate/caches` so I suspect something/someone is cleaning up, not sure what though
[20:01:03] Release-Engineering-Team, Operations, Phabricator: reinstall iridium (phabricator) as phab1001 with jessie - https://phabricator.wikimedia.org/T152129#2839436 (Dzahn)
[20:01:44] Release-Engineering-Team, Operations, Phabricator: reinstall iridium (phabricator) as phab1001 with jessie - https://phabricator.wikimedia.org/T152129#2839452 (Dzahn) p:Triage>Normal
[20:12:44] Beta-Cluster-Infrastructure, MediaWiki-extensions-GettingStarted, Operations: GettingStarted on Beta Cluster periodically loses its Redis index - https://phabricator.wikimedia.org/T100515#2839504 (fgiunchedi) Open>Resolved a:fgiunchedi I'm not seeing the related test failing frequently, h...
[20:21:41] Release-Engineering-Team, Labs-project-Phabricator, Operations: Setup test domain for phab2001 - https://phabricator.wikimedia.org/T152132#2839539 (Paladox)
[20:24:03] Release-Engineering-Team, Labs-project-Phabricator, Operations: Setup test domain for phab2001 - https://phabricator.wikimedia.org/T152132#2839556 (Dzahn) agreed. we did it in a similar way for gerrit with "gerrit-new". but gerrit wasn't behind varnish, unlike phab.
[20:24:18] Release-Engineering-Team, Labs-project-Phabricator, Operations: Setup test domain for phab2001 - https://phabricator.wikimedia.org/T152132#2839572 (Paladox)
[20:24:19] Release-Engineering-Team, Labs-project-Phabricator, Operations: Setup test domain for phab2001 - https://phabricator.wikimedia.org/T152132#2839570 (Dzahn) please link this to the other phab2001 ticket(s) in some way
[21:04:40] twentyafterfour: how about if we make phabricator-new.wm.org work and point to phab2001 as a backend for a while
[21:04:53] like we did with gerrit-new during migration
[21:05:06] it would be nice for staging
[21:05:38] also we closed the ticket about the main::role working on labs/jessie today
[21:08:04] :), +1
[21:20:11] Deployment-Systems, Scap3, Patch-For-Review: Update Debian Package for Scap3 - https://phabricator.wikimedia.org/T127762#2839832 (demon) Resolved>Open Can we please get 3.4.1-1 built and uploaded? Thanks! Changelog: ``` scap (3.4.1-1) unstable; urgency=low * "scap deploy" no longer reports...
[21:21:56] PROBLEM - Puppet run on deployment-pdfrendertest02 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[21:24:26] RECOVERY - Long lived cherry-picks on puppetmaster on deployment-puppetmaster02 is OK: OK: Less than 100.00% above the threshold [0.0]
[21:25:35] mutante: sounds good to me
[21:25:56] twentyafterfour would you know how we can set up hiera for phab2001
[21:26:08] since we would need to set the base uri, and the other things
[21:26:11] to get it working
[21:26:18] paladox: just need to put it in the codfw hiera tree
[21:26:22] oh
[21:26:27] i have https://gerrit.wikimedia.org/r/#/c/324796/
[21:26:35] for rsync to start syncing the repos
[21:26:46] twentyafterfour ^^
[21:26:58] we don't need rsync
[21:27:03] Oh
[21:27:09] don't we need to sync the repos
[21:27:12] paladox: phabricator will take care of syncing the repos
[21:27:13] from iridium to phab2001
[21:27:20] Oh
[21:27:22] it uses git to clone them
[21:27:28] Oh
[21:27:39] How would we do that
[21:27:42] then git push --mirror to keep them in sync
[21:27:46] Oh
[21:27:57] phabricator does it automatically once we configure repo clustering
[21:28:06] yep we could try ^^
[21:28:26] would you be able to create a patch to do that on phab2001 please
[21:28:34] domain phabricator-new.wikimedia.org is live
[21:28:54] ok
[21:29:03] Thanks
[21:29:18] https://secure.phabricator.com/book/phabricator/article/cluster_repositories/
[21:29:39] lol, i was just looking at that
[21:29:40] yep
[21:29:46] that looks really nice
[21:30:10] we could switch to phab2001, reimage iridium and rename it, then switch back.
[21:32:23] twentyafterfour ./bin/repository clusterize --service
[21:32:27] paladox: https://phabricator-new.wikimedia.org/ isn't online though?
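The clone-then-mirror flow twentyafterfour describes above, before clustering takes over, would look roughly like this. The repository path and the target host path are placeholders, and once repository clustering is configured Phabricator handles this itself.

```
# Sketch of a one-off mirror from the current master to phab2001; paths are placeholders.
git clone --mirror /srv/repos/example.git
cd example.git
git push --mirror ssh://phab2001.codfw.wmnet/srv/repos/example.git
```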
[21:32:37] nope
[21:32:40] but the domain resolves
[21:32:59] just need to get phab2001 up and running, ie we need to do the hiera config for phab2001
[21:33:03] so we need to configure lvs
[21:33:07] Yep
[21:34:50] twentyafterfour how can we enable all the repos since it seems that command takes one repo
[21:34:57] so you would need to input one at a time
[21:35:40] paladox: I'll test it with one repo and then make a script that loops through all of them enabling them one at a time
[21:35:53] I'm setting it up now
[21:36:06] Ok thanks
[21:36:15] :) :)
[21:48:32] twentyafterfour should we remove metamta.maniphest.public-create-email ?
[21:48:43] it was removed in phabricator according to the warning I'm getting
[21:56:51] yeah
[21:59:01] I wonder what to replace it with
[22:01:11] twentyafterfour ^^
[22:01:53] RECOVERY - Puppet run on deployment-pdfrendertest02 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:11:04] Deployment-Systems, Scap3, Patch-For-Review: Update Debian Package for Scap3 - https://phabricator.wikimedia.org/T127762#2840068 (fgiunchedi) Open>Resolved @demon yep that's done (built+uploaded)
[22:13:30] paladox: no need to replace it with anything I think
[22:13:38] Oh ok
[22:13:39] :)
[22:29:01] Release-Engineering-Team, Operations, Phabricator, Patch-For-Review: Setup test domain for phab2001 - https://phabricator.wikimedia.org/T152132#2840082 (Paladox)
[22:43:07] twentyafterfour is the Almanac service setup? As that is required :)
[22:59:15] ah, good to know about the repo sync
[22:59:30] will not worry about the rsyncd change then
[23:06:52] paladox: working on it now
[23:07:03] Ok, thanks :)
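The loop twentyafterfour mentions for clusterizing every repository one at a time could be sketched as below. The Phabricator install path, the Almanac service name, and the exact output columns of `bin/repository list` are assumptions, not values confirmed in the log.

```
# Sketch: clusterize each repository individually. Install path, service name, and the
# first-column format of "repository list" are assumptions.
cd /srv/phab/phabricator
./bin/repository list | awk '{print $1}' | while read -r repo; do
  ./bin/repository clusterize "$repo" --service repos
done
```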