[00:28:50] oki
[00:29:03] is it a chat room, or something
[01:11:04] YuviPanda: irccloud didn’t like me today so I’ve no backscroll… can you (again) link me to the incident timeline from earlier?
[01:11:26] I promised myself I would think about this at least once more before giving up
[01:12:26] andrewbogott: moment
[01:34:07] 10PAWS: PAWS with bot accounts - https://phabricator.wikimedia.org/T120558#1861157 (10Legoktm) Given that MW doesn't have any way to link master account with bot account, I don't think PAWS should try either. Users should just use OAuth to login with the bot account IMO. I did that a few days ago and it worked o...
[01:49:04] andrewbogott: um
[01:49:07] > PING tools-worker-05.tools.eqiad.wmflabs (10.68.16.174) 56(84) bytes of data.
[01:49:12] > 64 bytes from ci-jessie-wikimedia-7821.contintcloud.eqiad.wmflabs.contintcloud.eqiad.wmflabs (10.68.16.174): icmp_seq=2 ttl=64 time=0.549 ms
[01:50:23] andrewbogott: same for another node
[01:50:37] andrewbogott: these weren't created during the instance creation outage...
[01:51:00] the ci boxes might have been
[01:51:36] andrewbogott: will that take up ips already allocated?
[01:52:01] I wouldn’t think so
[01:52:08] yeah
[01:52:20] andrewbogott: so I know that -05 and -07 were working yesterday before the outage...
[01:54:32] that instance (7821) doesn’t exist anymore
[01:54:37] and plus, what’s with that run-on name?
[01:54:44] :(
[01:54:44] contintcloud.eqiad.wmflabs.contintcloud.eqiad.wmflabs
[01:54:48] yeah
[01:55:17] guess I’ll dive into the designate db
[02:07:49] 6Labs, 7Nodepool: weird double-domained DNS entries for nodepool nodes - https://phabricator.wikimedia.org/T120792#1861191 (10Andrew) 3NEW
[02:08:30] 6Labs, 7Nodepool: weird double-domained DNS entries for nodepool nodes - https://phabricator.wikimedia.org/T120792#1861204 (10Andrew) It's pretty clear that at some point in the process nodepool is using the fqdn (ci-jessie-wikimedia-11345.contintcloud.eqiad.wmflabs) when it should be using just the name.
[02:08:47] 6Labs, 7Nodepool: weird double-domained DNS entries for nodepool nodes - https://phabricator.wikimedia.org/T120792#1861205 (10Andrew)
[02:10:18] 10PAWS: PAWS with bot accounts - https://phabricator.wikimedia.org/T120558#1861217 (10jayvdb) The same problem exists for sysop accounts - i.e. the user-config.py `sysopnames` cannot be used in PAWS. The current approach allows people to create/modify `user-config.py` in their $HOME. It would be good to prev...
[02:11:23] 6Labs, 10Tool-Labs: Dead link for "Directory NG" on ad bar - https://phabricator.wikimedia.org/T120793#1861219 (10Catrope)
[02:12:14] 10PAWS: PAWS with bot accounts - https://phabricator.wikimedia.org/T120558#1861221 (10yuvipanda) If you look at the current user-config file (https://github.com/yuvipanda/paws/blob/master/singleuser/user-config.py) it already does this by setting usernames and other things *after* the $HOME/user-config.py file i...
[03:01:21] YuviPanda: better?
[03:01:51] checking
[03:02:14] ping gives me the right rdns but I still can't ssh
[03:03:19] andrewbogott: tools-worker-07 has wrong rdns, but I can ssh
[03:03:53] YuviPanda: I have a migraine and can’t really work on this anymore.
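The bug Andrew describes above (nodepool passing the FQDN where the bare instance name is expected, so the zone gets appended twice) can be sketched as a normalization step. Everything here is hypothetical illustration, not designate's or nodepool's actual code:

```python
# Hypothetical sketch: if a caller passes the FQDN where the bare name is
# expected, blindly appending the zone again produces the doubled domain
# seen above. Normalizing first avoids that.

ZONE = "contintcloud.eqiad.wmflabs."

def record_name(hostname, zone=ZONE):
    """Build the DNS record name, stripping the zone first if the caller
    already qualified the hostname (function and zone are assumptions)."""
    short = hostname.rstrip(".")
    suffix = "." + zone.rstrip(".")
    if short.endswith(suffix):
        short = short[:-len(suffix)]
    return "%s.%s" % (short, zone)

# Both the bare name and the (buggy) FQDN normalize to one record:
assert record_name("ci-jessie-wikimedia-11345") == \
    "ci-jessie-wikimedia-11345.contintcloud.eqiad.wmflabs."
assert record_name("ci-jessie-wikimedia-11345.contintcloud.eqiad.wmflabs") == \
    "ci-jessie-wikimedia-11345.contintcloud.eqiad.wmflabs."
```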
[03:04:25] Your mission, should you choose to accept it, is to take those two ‘latest’ files and separate out all the leaks
[03:04:39] (basically just selecting only the contintcloud entries for instances that don’t exist)
[03:05:00] andrewbogott: :( take care! I'll see if I can dig into it..
[03:05:06] and then produce a file that consists of just “ "
[03:05:12] for all leaked records
[03:05:20] then tomorrow I’ll try to figure out some automated way to delete them
[03:05:21] there are hundreds
[03:05:32] oh wow.
[03:05:34] ok
[03:05:40] Probably I’ll have to hack and/or fix the designate source to do this
[03:05:46] we aren’t leaking now though
[03:05:57] I have a vague recollection of knowing about/understanding the leak
[03:05:57] that's good!
[03:06:02] but right now too foggy to remember
[03:06:30] andrewbogott: go away!
[03:07:03] 6Labs, 10Labs-Infrastructure: Clean up leaked designate entries - https://phabricator.wikimedia.org/T120797#1861262 (10Andrew) 3NEW a:3yuvipanda
[03:07:15] For your records :)
[03:07:23] Thanks — catch you later
[07:34:39] PROBLEM - Host tools-worker-04 is DOWN: CRITICAL - Host Unreachable (10.68.16.122)
[07:34:53] yes that one is actually down
[07:55:47] 6Labs, 6Discovery, 10Maps: Enable OSM Postgres machine access in labs - https://phabricator.wikimedia.org/T98382#1861466 (10akosiaris) 5Open>3Resolved a:3akosiaris I think this is done. Documentation is in https://wikitech.wikimedia.org/wiki/Help:Tool_Labs/Database#Connecting_to_OSM_via_the_official_CL...
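The cleanup task described above (keep only the contintcloud records whose instances no longer exist) could look roughly like the sketch below. The record format, the filter logic, and all names are assumptions, not the actual script that was written:

```python
# Hypothetical sketch of the leak-selection step: given a dump of DNS
# record names and the set of instance short names that still exist,
# return the leaked contintcloud entries.

def leaked_records(records, live_instances):
    """records: iterable of DNS record names (FQDNs).
    live_instances: set of short instance names that still exist."""
    leaks = []
    for name in records:
        if ".contintcloud." not in name:
            continue  # only the contintcloud zone is of interest here
        short = name.split(".")[0]
        if short not in live_instances:
            leaks.append(name)
    return leaks

records = [
    "ci-jessie-wikimedia-7821.contintcloud.eqiad.wmflabs.",
    "ci-jessie-wikimedia-11418.contintcloud.eqiad.wmflabs.",
    "tools-worker-05.tools.eqiad.wmflabs.",
]
live = {"ci-jessie-wikimedia-11418"}
assert leaked_records(records, live) == [
    "ci-jessie-wikimedia-7821.contintcloud.eqiad.wmflabs.",
]
```

The live-instance set would come from something like `openstack server list` for the contintcloud project; the deletion itself (the part Andrew says needed a designate hack) is not sketched here.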
[11:39:58] 6Labs, 10Labs-Infrastructure, 6operations, 7Puppet: ldap-yaml-enc.py fails with host_info['puppetClass'] --> KeyError: 'puppetClass' - https://phabricator.wikimedia.org/T120817#1861849 (10hashar) 3NEW
[12:59:55] 10Tool-Labs-tools-Other, 7Epic: Convert all Labs tools to use cdnjs for static libraries and fonts - https://phabricator.wikimedia.org/T103934#1861960 (10Ricordisamoa)
[13:26:06] 6Labs, 10MediaWiki-extensions-Newsletter: Internal error when creating new user in newsletter-test.wmflabs.org - https://phabricator.wikimedia.org/T119945#1862024 (1001tonythomas) root@newsletter-test:/# du -hs /srv/ 1.3G /srv/
[13:29:49] Can someone help me with mediawiki vagrant? I cannot access the mysql database
[13:37:44] 6Labs, 10Tool-Labs, 5Patch-For-Review: Redirect //stable.toolserver.org/geohack/geohack.php requests - https://phabricator.wikimedia.org/T120526#1862036 (10coren) The server already catches *.toolserver.org; so the change sufficed. Note that only HTTP is redirected (the certificate does not cover stable.too...
[13:37:53] 6Labs, 10Tool-Labs, 5Patch-For-Review: Redirect //stable.toolserver.org/geohack/geohack.php requests - https://phabricator.wikimedia.org/T120526#1862037 (10coren) 5Open>3Resolved
[13:46:51] bd808: Are you here?
[14:01:36] 6Labs, 10Labs-Infrastructure, 5Patch-For-Review: Investigate decommissioning labcontrol2001 - https://phabricator.wikimedia.org/T118591#1862064 (10mark) a:5mark>3None labcontrol2001 is out of warranty, so we're not going to repurpose it for production use. If you want to continue using it for dev/testing...
[14:06:40] 6Labs, 10Tool-Labs, 5Patch-For-Review: Redirect //stable.toolserver.org/geohack/geohack.php requests - https://phabricator.wikimedia.org/T120526#1862067 (10Nemo_bis) Thanks!
[14:29:47] 6Labs, 7Nodepool: weird double-domained DNS entries for nodepool nodes - https://phabricator.wikimedia.org/T120792#1862090 (10hashar) Indeed the DNS has the domain name twice:
```
$ dig -x 10.68.21.236
ci-jessie-wikimedia-11400.contintcloud.eqiad.wmflabs.contintcloud.eqiad.wmflabs.
$
```
The page generated o...
[14:37:51] 6Labs, 10Labs-Infrastructure: Clean up leaked designate entries - https://phabricator.wikimedia.org/T120797#1862134 (10chasemp) p:5Triage>3High
[14:38:08] YuviPanda: what's the story on https://phabricator.wikimedia.org/T120797?
[14:39:19] 6Labs, 10Tool-Labs: Dead link for "Directory NG" on ad bar - https://phabricator.wikimedia.org/T120793#1862139 (10chasemp) p:5Triage>3Normal
[14:40:59] 6Labs, 7Nodepool, 5Patch-For-Review: weird double-domained DNS entries for nodepool nodes - https://phabricator.wikimedia.org/T120792#1862146 (10chasemp) p:5Triage>3Normal a:3hashar
[14:42:03] 6Labs, 7Nodepool, 5Patch-For-Review: weird double-domained DNS entries for nodepool nodes - https://phabricator.wikimedia.org/T120792#1862147 (10hashar) From a random instance: ci-jessie-wikimedia-11406 | /etc/hostname | ci-jessie-wikimedia-11406 | hostname --fqdn | ci-jessie-wikimedia-11406.contintcloud.eq...
[14:43:25] 6Labs, 10Labs-Infrastructure: Clean up leaked designate entries - https://phabricator.wikimedia.org/T120797#1862150 (10Andrew) It required a hack in the designate-api source, but I cleaned up all those leaked contintcloud records, and we don't seem to be leaking any new ones. There are probably a few other le...
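The doubled-zone PTR names hashar shows above have a mechanical signature: the last three labels repeat. A small check like the following could flag them in a dump; the three-label zone assumption (project.site.wmflabs) and the function name are mine, not part of any real tooling:

```python
# Hypothetical detector for the doubled-zone reverse-DNS names seen above,
# assuming zones are always three labels deep (project.site.wmflabs).

def has_doubled_zone(ptr_name):
    """True if the name repeats its zone, e.g.
    'host.contintcloud.eqiad.wmflabs.contintcloud.eqiad.wmflabs.'"""
    labels = ptr_name.rstrip(".").split(".")
    if len(labels) < 6:
        return False  # too short to contain the zone twice
    return labels[-6:-3] == labels[-3:]

assert has_doubled_zone(
    "ci-jessie-wikimedia-11400.contintcloud.eqiad.wmflabs."
    "contintcloud.eqiad.wmflabs.")
assert not has_doubled_zone("tools-worker-05.tools.eqiad.wmflabs.")
```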
[14:43:41] 6Labs, 10Labs-Infrastructure: Clean up leaked designate entries - https://phabricator.wikimedia.org/T120797#1862151 (10Andrew) a:5yuvipanda>3None
[14:46:36] 6Labs, 7Nodepool, 5Patch-For-Review: weird double-domained DNS entries for nodepool nodes - https://phabricator.wikimedia.org/T120792#1862156 (10hashar) Change got deployed `openstack server list`:
| ci-jessie-wikimedia-11418 | ACTIVE | public=10.68.21.250 |
| ci-jessie-wikimedia...
[14:52:54] 6Labs, 10Labs-Infrastructure: Update /etc/hosts during labs instance first boot - https://phabricator.wikimedia.org/T120830#1862183 (10Andrew) 3NEW
[14:58:09] 6Labs, 7Nodepool, 5Patch-For-Review: weird double-domained DNS entries for nodepool nodes - https://phabricator.wikimedia.org/T120792#1862192 (10hashar) The Nodepool instances rely on cloud-init and the Ec2 metadata service. And the hostname is exposed without the tenant name: ``` $ curl http://169.254.169....
[15:00:21] I can't access any instance via bastion-01. Can someone help?
[15:00:41] oh, sorry my fault
[15:02:58] 6Labs, 10Labs-Infrastructure, 6operations, 7Puppet: ldap-yaml-enc.py fails with host_info['puppetClass'] --> KeyError: 'puppetClass' - https://phabricator.wikimedia.org/T120817#1862208 (10Andrew) We moved the default classes out of the ldap node def and into hiera; this is probably a side-effect of that....
[15:03:21] 6Labs, 7Nodepool, 5Patch-For-Review: weird double-domained DNS entries for nodepool nodes - https://phabricator.wikimedia.org/T120792#1862209 (10hashar) 5Open>3Resolved Wikitech shows https://wikitech.wikimedia.org/wiki/Nova_Resource:Ci-jessie-wikimedia-11420.contintcloud.eqiad.wmflabs | Instance Name |...
[15:03:33] anyone have any experience optimizing a mediawiki instance on tools to not be dog slow?
[15:04:05] 6Labs, 10Continuous-Integration-Infrastructure, 5Continuous-Integration-Scaling, 7Nodepool, and 2 others: weird double-domained DNS entries for nodepool nodes - https://phabricator.wikimedia.org/T120792#1862211 (10hashar)
[15:07:59] thedj: cache everything? :-}
[15:08:18] https://www.mediawiki.org/wiki/Manual:Performance_tuning && https://www.mediawiki.org/wiki/Manual:Cache
[15:11:58] http://tools.wmflabs.org/hartman/mediawiki-dev/index.php?title=Main_Page
[15:12:21] currently uses: $wgMainCacheType = CACHE_ACCEL;
[15:12:22] $wgParserCacheType = CACHE_DB;
[15:14:05] Is the Manage Web Proxies thing broken right now? I can't see any proxies for shiny-r project.
[15:20:28] bearloga: I can see the proxies for my project
[15:23:39] Looks like I can't manage instances either :\
[15:25:53] bearloga: Maybe someone removed you from that project?
[15:27:29] Luke081515: maybe? I'm still listed as an admin on https://wikitech.wikimedia.org/wiki/Nova_Resource:Shiny-r but idk if that list is dynamic or manual
[15:28:40] this list is automatic. Maybe we can ask other members of your project, I guess this is a project-specific problem
[15:28:47] YuviPanda: Can you take a look?
[15:34:47] usually if I have the same issue a logout/login works
[15:34:56] the authorization handling on wikitech has issues
[15:38:00] chasemp: will try that, thanks
[15:42:25] 6Labs, 10DBA, 5Patch-For-Review: watchlist table not available on labs - https://phabricator.wikimedia.org/T59617#1862328 (10coren) a:3jcrespo Code for the view is added, all that is missing now is the underlying data replication. @jcrespo: at your convenience, please add the `watchlist` table to replicat...
[15:45:41] 6Labs, 10Labs-Infrastructure, 6operations, 5Patch-For-Review, 7Puppet: ldap-yaml-enc.py fails with host_info['puppetClass'] --> KeyError: 'puppetClass' - https://phabricator.wikimedia.org/T120817#1862336 (10chasemp) p:5Triage>3High
[15:48:54] 6Labs, 10Labs-Infrastructure, 6operations, 5Patch-For-Review, 7Puppet: ldap-yaml-enc.py fails with host_info['puppetClass'] --> KeyError: 'puppetClass' - https://phabricator.wikimedia.org/T120817#1862351 (10hashar) a:3chasemp
[15:50:49] chasemp: you were right! logout/login worked! thanks!
[15:51:54] andrewbogott: Coren: for the LDAP stuff, neither integration nor deployment-prep had the pam fix run (afaik)
[15:51:58] not sure if it is an issue
[15:52:21] hashar: in theory we’re going to alias the old ldap servers to the new ones once we trust them
[15:52:28] so it shouldn’t matter
[15:53:01] hashar: The PAM fix should not affect anything, though self-hosted puppetmasters probably want to have it run sooner rather than later as config will diverge increasingly until then.
[15:53:26] hashar: But I thought deployment-prep used the normal puppetmaster anyways?
[15:53:30] will have to remember to poke #releng folks about it
[15:53:42] role::self::puppetmaster or something yeah
[15:53:52] * Coren checks.
[15:53:54] with puppetmaster being deployment-puppetmaster
[15:54:04] integration is similar, pm being integration-puppetmaster
[15:54:23] Aha! But their puppetmaster wisely pulls at regular intervals so they got the changes. :-)
[15:55:02] well only sort of, as instances lost the labs:instance class definition by default at some point I guess....
[15:55:04] yeah they are both set to auto-rebase
[15:55:12] though the script has not been run on instances
[15:56:06] hashar: Hm. If it rebases often enough, it will have. I did a salt run that checked for the update having been done and that ran the script. Lemme see...
[15:57:05] hashar: Ah, no they haven't [run the script].
[15:57:30] That said, I can simply do another salt run and it'll catch them.
[15:57:48] Since they got the required things via puppet.
[15:59:32] Coren: both projects also have their own salt masters :D
[15:59:44] Heh. lulz.
[15:59:47] anyway I rebased deployment-prep this morning
[15:59:53] checking integration right now
[16:03:58] both up-to-date \O/
[16:04:12] though puppet is broken/disabled on some instances
[16:06:54] 6Labs, 10Labs-Infrastructure, 6operations, 5Patch-For-Review, 7Puppet: ldap-yaml-enc.py fails with host_info['puppetClass'] --> KeyError: 'puppetClass' - https://phabricator.wikimedia.org/T120817#1862387 (10Andrew) I think the ENC is broken. It should be merging custom hiera settings with the default r...
[16:14:30] hashar: Alright; it's not a disaster since the changes have no semantic effect now - but it's important that it be done eventually.
[16:15:01] hashar: The biggest flaw of the old way of doing things is that it misses out on any security changes coming from upstream.
[16:15:43] (It also, less importantly but very visibly, breaks auxiliary things - that is why motd broke on Trusty)
[16:23:53] Coren: so not too much to worry about and that doesn't seem super urgent. Will look at running it this week
[16:24:08] hashar: Definitely not urgent.
[16:30:26] YuviPanda: Once you are awake and lucid, ping me for a brief conversation re puppet cleanup of lab roles.
[16:31:49] Coren: will you join us for our ldap window in 30 minutes? The ‘LDAP user tools’ section could use a volunteer. https://etherpad.wikimedia.org/p/opendj-migration
[16:32:34] andrewbogott: Sounds like a plan to me.
[16:33:56] thanks. I just now duplicated that section since I think we’re going to have separate prod/labs phases
[16:37:04] andrewbogott: Got a question in the labs section though; what do you mean by "Change /etc/ldap/ldap.conf on all labs instances"?
[16:39:21] Coren: that’s redundant with the puppet patch, you can remove that line
[17:16:53] is there someone here who can help me with mediawiki vagrant?
[17:21:05] I'm trying to create a new web proxy for a labs instance, and I keep getting a 'failed' message.
[17:21:59] ragesoss: there's an ongoing migration of the LDAP servers, they are currently set to read-only
[17:22:16] okay, cool.
[17:22:29] so I should just wait until that is over and try again.
[17:22:52] ragesoss: yep — it’ll be a bit, maybe try in 90 mins or so
[17:23:07] thanks much!
[17:23:26] see https://lists.wikimedia.org/pipermail/wikitech-l/2015-December/084207.html for further details
[17:25:37] !log rcm wiki-rcm: synchronized config at 16:15 UTC
[17:25:40] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Rcm/SAL, Master
[17:34:39] !log ores Deployed 1ad37c5 with ores:b745570, revscoring:, editquality:b41b7c1, wikiclass:bbfa9ce, and wb-vandalism:1075596
[17:34:42] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Ores/SAL, Master
[18:03:46] !log help
[18:03:46] I am a logbot running on tools-exec-1219.
[18:03:46] Messages are logged to wikitech.wikimedia.org/wiki/Server_Admin_Log.
[18:03:46] To log a message, type !log .
[18:03:47] just looking
[18:16:51] It doesn't listen to everyone, does it?
[18:16:54] !log help
[18:16:55] I am a logbot running on tools-exec-1219.
[18:16:55] Messages are logged to wikitech.wikimedia.org/wiki/Server_Admin_Log.
[18:16:55] To log a message, type !log .
[18:17:02] Oh, it does. :o Okay.
[18:31:26] 6Labs, 6operations, 10ops-eqiad: setup promethium in eqiad in support of T95185 - https://phabricator.wikimedia.org/T120262#1862850 (10Cmjohnson) moved to b3 connected asw-b-eqiad ge-3/0/19 updated vlan to labs-b updated dns https://gerrit.wikimedia.org/r/#/c/257654/1 currently it's set to install lvm.cfg...
[18:51:56] randomly just noticed, instances can be assigned to labvirt1010 but it's not listed in ganglia
[18:59:00] 10Tool-Labs-tools-Other, 6Community-Tech, 6Community-Tech-fixes, 7Tracking: Improving Magnus' tools (tracking) - https://phabricator.wikimedia.org/T115537#1863013 (10DannyH)
[19:01:38] YuviPanda: can you have a look at https://phabricator.wikimedia.org/T120817 ?
[19:02:22] oo yes
[19:04:02] 6Labs, 10Labs-Infrastructure, 6operations, 5Patch-For-Review, 7Puppet: ldap-yaml-enc.py fails with host_info['puppetClass'] --> KeyError: 'puppetClass' - https://phabricator.wikimedia.org/T120817#1863031 (10chasemp) >>! In T120817#1862387, @Andrew wrote: > I think the ENC is broken. It should be mergin...
[19:06:02] chasemp: hey! were they getting no role::labs::instance applied at all?
[19:06:05] chasemp: that's defined in site.pp
[19:07:22] deployment-tin.deployment-prep.eqiad.wmflabs was not afaik but then again puppet was broken there with a bad ENC anyways so it could be a consequence of that BUUUT...we had a discussion about puppetmaster self in general and the fact that you could easily screw up that site.pp fallthrough, possibly even legit
[19:07:36] and it may not be the best candidate for the wild west that is self hosted puppet master
[19:07:37] ikd
[19:07:39] idk even
[19:07:50] I... don't understand what you just said.
[19:07:55] :D
[19:08:10] :)
[19:08:16] chasemp: from that, the problem was only if you added *more* things to site.pp
[19:08:28] actually
[19:08:30] wait
[19:08:33] ok slowing down
[19:08:35] I know how to fix this!
[19:08:41] deployment-prep doesn't need this ENC
[19:08:44] so the ENC lookup for the labs host done on the labs project puppet master was failing
[19:08:45] since it isn't using it at all
[19:08:50] so it blocked puppet entirely
[19:09:08] but it was failing with no roles applied maybe from the migration from ldap to maniphest idk
[19:09:23] no, I killed that a few weeks ago
[19:09:23] and it was getting nothing entirely I assume because of ENC breakage
[19:09:35] so role::labs::instance is applied from site.pp
[19:09:38] rather than from LDAP
[19:09:40] right
[19:09:49] and I tested to make sure that puppet runs fine
[19:09:51] but that file itself is subject to change right on a self hosted puppet master?
[19:09:54] we can't rely on it can we?
[19:10:01] what do you mean by 'that file'
[19:10:06] site.pp
[19:10:20] 'change' as in 'people can take out role::labs::instance?'
[19:10:27] or do anything really
[19:10:28] if they do their instance is fucked and not much we can do :D
[19:10:30] yeah
[19:10:39] but if you use puppetmaster self you get a lot more power with your responsibility
[19:10:44] there isn't much we can do about that
[19:10:46] well yes but my point is the idea is to have a master you can manipulate and changing that
[19:10:48] would be a stretch
[19:10:56] manipulate what?
[19:11:01] so it may be more persistent to not have it be the source of truth in this case
[19:11:05] manipulate the puppet master configuration
[19:11:26] well, with role::labs::instance in wikitech/ldap you can accidentally uncheck that box too from the configure page...
[19:11:32] which applied to all hosts
[19:11:51] and I also am not sure how that relates to this bug at all? this is a crappy ENC I wrote that only 2 projects are using (integration and staging)
[19:11:54] oh — was the issue that people removed that setting from site.pp?
[19:12:19] no let's back up as I'm off in left field you guys I think
[19:12:33] the ENC was broken; how you want to fix it is the first course of action I imagine
[19:12:55] am pretty sure andrewbogott's patch will fix the ENC
[19:13:01] also why can't I ssh to deployment-puppetmaster?!
[19:13:19] YuviPanda: ldap is overloaded, probably breaking auth
[19:13:19] could be ldap
[19:13:24] it's having issues
[19:13:24] moritz is fixing it I think
[19:13:27] ah
[19:13:29] ok
[19:13:32] I'll just wait it out then
[19:13:47] alright we can totally ignore the no-role-applied-via-ldap case
[19:13:51] I'm not opposed to that
[19:14:02] but we can't guarantee even slightly the role applied via site.pp on a self hosted master
[19:14:09] by nature of it being a self hosted master
[19:14:20] it may make sense to just hard-code that in the ENC as such
[19:14:28] and then there is never a no-role-applied case at all anyways
[19:14:30] but the ENC isn't used by all self hosted puppetmasters...
[19:14:57] I hate the self hosted puppet master use case
[19:15:10] I hate the code :D I'm killing it bit by bit!
[19:15:24] > but we can't guarantee even slightly the role applied via site.pp on self hosted master
[19:15:25] the whole use case is just a can of worms with so little value
[19:15:29] chasemp: ^ I don't understand that
[19:15:49] chasemp: I disagree about 'little value'
[19:15:59] you can test everything you need via puppet apply
[19:16:03] except exported resources
[19:16:05] chasemp: maybe for you but you have +2 on operations/puppet
[19:16:16] so all this silly dance is just for that which isn't being used in labs
[19:16:28] chasemp: no
[19:16:39] we regularly have cherry-picked patches
[19:16:57] it's tangential here to the extreme, but sure we could do that in another way but we aren't so it doesn't matter
[19:17:05] if I change something and test it with puppet apply then the puppet agent comes along and puts it back the way it was
[19:17:32] assuming that's set up to happen then well yes
[19:17:54] so without a self hosted puppetmaster I can't override anything that's in operations/puppet
[19:18:26] you are not understanding what I'm lamenting at all
[19:18:40] the disconnect with using puppet masters to manage instances and puppet masters to disconnect instances for testing
[19:18:44] simultaneously
[19:19:05] I don't have an opinion at all on the cases you are communicating
[19:19:17] https://wikitech.wikimedia.org/w/index.php?title=Hiera%3ADeployment-prep&type=revision&diff=221373&oldid=211419 should fix deployment-prep
[19:19:21] at least for this discussion
[19:19:25] once puppet runs on the puppetmaster
[19:19:37] chasemp: ok I guess I don't understand
[19:20:08] > the disconnect with using puppet masters to manage instances and puppet masters to disconnect instances for testing
[19:20:15] I don't understand what that means either, chasemp
[19:20:17] YuviPanda: was that to remove the enc entirely and be done w/ it?
[19:20:18] Do you mean managing the puppetmaster with another puppetmaster?
[19:20:32] well yes and no
[19:20:35] so we have an instance in labs
[19:20:41] and it's on the primary puppet master
[19:20:44] and all is well
[19:20:59] and then someone wants to test some ops/puppet crap and the best way to do that is to set up a limited-scope puppet master
[19:21:03] and to move the client over to it
[19:21:03] chasemp: this particular ENC is to read files from nodes/labs/ in ops/puppet and 'merge' that with LDAP. since there's nothing for deployment-prep in nodes/labs, it was just using LDAP. might as well directly point it at LDAP.
[19:21:26] YuviPanda: understood, is that used anywhere else out of curiosity?
[19:21:41] chasemp: in the staging project and in the integration project, but not sure if integration actually is using it
[19:21:44] chasemp: staging is
[19:22:09] maybe it has the same breakage then and requires the allowance for no ldap roles?
[19:22:15] even though it hasn't hit I guess
[19:22:36] chasemp: yes, and i think andrewbogott's patch is the right thing...
[19:22:51] I think hashar had a patch
[19:22:53] I think hashar wrote that, actually?
[19:22:55] is that the one you mean?
[19:23:08] what?
[19:23:10] bah
[19:23:12] I clearly cannot read
[19:23:17] yes hashar's patch
[19:23:39] ah it was abandoned actually, can you un-abandon things?
[19:23:59] chasemp: since for all labs instances now, role::labs::instance comes from site.pp and anything additional comes from the LDAP 'terminus' thingy (aka if there are no additional roles the LDAP terminus returns [])
[19:24:03] we should mimic that here
[19:24:08] well I just went around the KeyError with a default
[19:24:14] yes
[19:24:19] i.e. dict.get('somekey', [])
[19:24:21] YuviPanda: but not on all instances is it the same site.pp correct?
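The fix being discussed (default to an empty role list when a host has no 'puppetClass' attribute, rather than raising KeyError) can be sketched as below. The function name and return shape are illustrative only, not the actual ldap-yaml-enc.py code:

```python
# Hypothetical sketch of the ENC fix discussed above: tolerate hosts whose
# LDAP entry has no 'puppetClass' attribute by defaulting to an empty
# role list; role::labs::instance itself comes from site.pp.

def node_classes(host_info):
    """host_info: dict of LDAP attributes for one host; may lack
    'puppetClass' entirely when the host has no extra roles."""
    # Before the fix: host_info['puppetClass'] -> KeyError: 'puppetClass'
    return {"classes": host_info.get("puppetClass", [])}

assert node_classes({"puppetClass": ["role::ci::slave"]}) == \
    {"classes": ["role::ci::slave"]}
assert node_classes({}) == {"classes": []}  # no roles: empty, not a crash
```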
[19:24:37] on some it's a site.pp on 'bobs puppet master' and on others it's from the main labcontrol site.pp
[19:24:38] but chase mentioned we should also always add role::labs::instance and went with a patch doing that
[19:24:46] but handling the KeyError with try/except
[19:24:53] > YuviPanda: but not on all instances is it the same site.pp correct?
[19:24:57] I don't understand that either
[19:25:15] are you trying to cover the use case where someone hacks up site.pp to remove role::labs::instance?
[19:25:35] 10PAWS, 10pywikibot-core: Install developer requirements into PAWS - https://phabricator.wikimedia.org/T120860#1863099 (10jayvdb) 3NEW
[19:25:38] YuviPanda: or maybe out-of-date puppetmasters?
[19:25:43] * andrewbogott thinks that case should be permitted
[19:25:51] twentyafterfour: right, but all those people are just shooting themselves in the foot
[19:26:02] right but it's still our problem?
[19:26:08] twentyafterfour: role::puppet::self auto-updates itself every 10 minutes so you need to go out of your way to turn that off
[19:26:13] there should be some level of global always-include
[19:26:16] so if you go out of your way to turn that off and then fuck stuff up...
[19:26:17] for dns changes and ldap things
[19:26:24] and right now we do not have that covered well
[19:26:28] chasemp: yes, but this is just a small ENC used in two projects only...
[19:26:41] I totally get that isn't the right place
[19:26:48] I'm not advocating that patch at all
[19:26:57] yes, so I think there's like 4-5 different conversations mixed up here...
[19:27:01] the convo shifted to self hosted puppet master theory with twentyafterfour's questions
[19:27:03] yes
[19:27:05] ah
[19:27:08] ok
[19:27:17] https://phabricator.wikimedia.org/T120159 is my only feelings on that
[19:27:55] afa that ENC or that patch or whatever, if it was used everywhere I might want to force labs::instance on ppl that way
[19:27:56] sorry I didn't mean to derail this conversation
[19:28:44] chasemp: people have root on things, so you can't really force anything.
[19:29:19] it's true but you can make it so ingrained as to be ridiculous to remove and not have the removal be in the path of any normal (maybe legit) use case
[19:29:22] if someone explicitly removes an include from site.pp, IMO they got what's coming to them. if you go out of your way to disable auto-updates in your self hosted puppetmaster and do not update it yourself, you have what is coming to you
[19:29:40] chasemp: I don't think as a team of 4ish we can support that use case :)
[19:29:53] so we should allow people to shoot themselves in the foot if they choose to, as long as the default isn't foot-shooty
[19:30:04] wait
[19:30:11] I misunderstood what you're saying
[19:30:21] I'm just saying it's all pointless. if people want to shoot themselves in the foot they will.
[19:30:28] we can't really put safeguards around it
[19:30:48] we can't prevent bad-faith use for sure
[19:30:58] and even if we do, this patch is completely unrelated :D the enc requires you to get stuff merged into ops/puppet to actually use it...
[19:31:01] but atm I would say site.pp is a valid file to modify on a self hosted master
[19:31:12] and doing so can have bizarre, more globally labsy consequences
[19:31:13] and that's not ideal
[19:31:19] 'global consequences'?
[19:31:37] you no longer get things all instances are meant to have regardless of what project they are in if you remove the wrong line
[19:32:03] yeah, but that's if you point gun at foot and pull trigger....
[19:32:04] chasemp: so you're suggesting that we separate the base labs roles from stuff people might want to customize?
[19:32:24] twentyafterfour: you can already customize it. just not remove the include role::labs::instance
[19:32:32] YuviPanda: yes agreed people can screw themselves over but we make it very easy to do is all I'm suggesting
[19:32:35] which you could do earlier by unchecking a tickbox in 'configure'
[19:32:38] probably too easy
[19:32:43] chasemp: no, I think site.pp is *far* harder than the previous setup
[19:32:51] earlier you just had to accidentally uncheck one box in the configure page
[19:32:53] now
[19:32:58] well the relativeness of foot-shooting isn't a good metric
[19:33:02] as it could go from terrible to bad
[19:33:04] and still be bad
[19:33:05] lol
[19:33:25] I don't think this is a very useful use case, but anyway, it's unrelated to the patch and I'm not sure what actionables we can get out of it
[19:33:43] totally unrelated and none atm
[19:33:48] right
[19:33:50] ok
[19:33:55] chasemp: I do get what you're saying now, at least ;)
[19:34:02] and I agree
[19:34:08] can you remove your -1 from https://gerrit.wikimedia.org/r/#/c/257606/
[19:34:20] I should kill that ENC too, staging is dead :(
[19:36:14] Minion did not return. [No response]
[19:36:14] `
[19:36:31] so how do you ops guys keep your sanity with salt being barely reliable?
[19:36:53] barely reliable? you mean totally unreliable?
[19:36:57] by never using salt :D
[19:37:03] by hating salt and that^
[19:37:08] yeah
[19:37:23] time to have scap support running arbitrary commands
[19:37:30] oh dear god no
[19:37:46] :)
[19:37:48] scap run 'whatever'
[19:37:53] good news: python's ssh client lib has been updated to support hmac256 so something like fabric should be usable again?
[19:38:03] cause 99% of my commands are just salt --timeout 10 --show-timeout '*' cmd.run
[19:38:08] do you mean paramiko?
twentyafterfour [19:38:09] hashar: yes scap run 'whatever' is something I already built actually
[19:38:14] chasemp: yeah
[19:38:19] ah interesting
[19:38:22] I thought it was mostly dead
[19:38:23] 6Labs, 6operations, 10ops-eqiad: setup promethium in eqiad in support of T95185 - https://phabricator.wikimedia.org/T120262#1863190 (10Cmjohnson) a:5Cmjohnson>3Andrew OS is installed...no puppet certs. Assigning to @andrew
[19:38:25] * YuviPanda uses pssh with a small generator script
[19:38:26] chasemp: not dead
[19:38:39] YuviPanda: did you ever check out http://mig.mozilla.org/
[19:38:48] not suggesting it but the model is super like mcollective
[19:38:54] and it's pretty neat
[19:39:19] nah, I just rage at salt and make snarky comments
[19:39:50] it's become this messy problem that nobody (esp. me!) wants to touch
[19:40:06] I was waiting for apergos to declare salt mental bankruptcy
[19:40:15] and then I was going to mcollective up on labs and ignore the rest
[19:40:19] I guess
[19:40:29] it's all terrible
[19:40:35] all software sucks
[19:40:54] I am going to move to fabric and ansible
[19:40:58] well yes :) "All models are bad, some are useful"
[19:40:59] everything is DOOOOMED
[19:41:33] 26710 ? Sl 0:17 /usr/bin/python /usr/bin/salt-minion -d
[19:41:34] 28473 ?
Ssl 0:17 /usr/bin/python /usr/bin/salt-minion [19:41:37] ^^^top cause of issues [19:41:44] we end up having some dupe process [19:41:46] chasemp: mig looks neat [19:42:25] hashar: twentyafterfour https://github.com/yuvipanda/personal-wiki/blob/master/project-dsh-generator.py [19:42:28] I use that with pssh [19:42:41] and https://github.com/yuvipanda/personal-wiki/blob/master/tools-dsh-generator.py when I want to do something tools specific [19:42:46] super low overhead and just works [19:42:56] YuviPanda: interesting [19:42:59] I may steal it [19:43:17] chasemp: :D it has an option to generate a list of *all* labs hosts too [19:43:22] has saved me in the past [19:44:31] well [19:46:24] YuviPanda: that's awesome, that's a feature I should add to iscap [19:46:33] I should just fork iscap into its own tool by now [19:46:46] that is... make it a separate tool instead of part of scap [20:06:24] just checking...a self hosted puppetmaster in labs should have the ldap-yaml-enc.py script? [20:06:57] ebernhardson: nope, not unless you’re doing something fancy [20:07:06] YuviPanda: may correct me [20:07:22] andrewbogott: ok, it doesn't so that means everything is right :) thanks [20:07:27] ebernhardson: nope [20:07:41] ebernhardson: only if you are doing fancy things :D [20:07:48] ah same as what andrewbogott said [20:26:03] 6Labs, 10Labs-Infrastructure: Update /etc/hosts during labs instance first boot - https://phabricator.wikimedia.org/T120830#1863533 (10scfc) This would also "fix" T119672. Note though that for instances in #Tool-Labs, `/etc/hosts` is overwritten by Puppet. [20:32:45] 6Labs, 10Labs-Infrastructure, 6operations, 5Patch-For-Review, 7Puppet: ldap-yaml-enc.py fails with host_info['puppetClass'] --> KeyError: 'puppetClass' - https://phabricator.wikimedia.org/T120817#1863574 (10hashar) I have rebased puppet on deployment-puppetmaster and ran puppet agent on it to update the...
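The project-dsh-generator.py approach YuviPanda describes above (generate a host list, feed it to pssh) can be sketched roughly like this. This is a hedged sketch, not the real script: the actual generator queries the OpenStack/LDAP APIs for instance names, which is stubbed out here with a hypothetical list, and the `<instance>.<project>.eqiad.wmflabs` naming follows the hostnames seen elsewhere in this log.

```python
# Sketch of a dsh-style host-list generator for use with pssh.
# The real project-dsh-generator.py pulls instance names from the
# OpenStack/LDAP APIs; here the lookup is replaced by a literal list.

def labs_fqdns(project, instances, domain="eqiad.wmflabs"):
    """Turn bare instance names into <name>.<project>.<domain> FQDNs."""
    return ["{}.{}.{}".format(name, project, domain) for name in instances]

if __name__ == "__main__":
    # Hypothetical instance list; the real script would query the API.
    hosts = labs_fqdns("tools", ["tools-worker-05", "tools-worker-07"])
    # One host per line is the format both dsh and pssh accept.
    print("\n".join(hosts))
```

Saved to a file, the output can be used as `pssh -h hosts.txt -t 10 -- uptime`, which mirrors the low-overhead `salt ... cmd.run` use case discussed earlier.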
[20:34:32] 6Labs, 10Labs-Infrastructure, 6operations, 5Patch-For-Review, 7Puppet: ldap-yaml-enc.py fails with host_info['puppetClass'] --> KeyError: 'puppetClass' - https://phabricator.wikimedia.org/T120817#1863581 (10chasemp) 5Open>3Resolved [20:34:41] chasemp: :-} [21:59:33] YuviPanda: is ldap still a bit sad? My SMW queries seem to be not right -- https://wikitech.wikimedia.org/wiki/Help:MediaWiki-Vagrant_in_Labs/Hosts [22:00:01] bd808: andrew would know except irccloud is having issues I guess [22:00:16] bd808: also http://tools.wmflabs.org/watroles/role/role::puppet::self :D [22:04:07] bd808: it should be sorted now [22:11:07] bd808, I don't think SMW talks directly to LDAP [22:15:25] Krenair: I suppose that's probably right. I wonder if maybe this was caused by the jobs being messed up for a while [22:48:57] nice /topic [23:08:14] labs hosts reject my ssh connects - could it be related to LDAP stuff? [23:08:59] chasemp: ^ can you take a look? [23:09:05] Coren: ^ you too if you're around [23:09:20] SMalyshev: what vm and what's your username? [23:09:31] smalyshev [23:09:40] db01.eqiad.wmflabs [23:10:04] can log in to bastion.wmflabs.org but not to my labs machines [23:10:25] SMalyshev: it is a client of wdqs-puppetmaster.wikidata-query.eqiad.wmflabs [23:10:37] can you update your puppet master to propagate new ldap settings I wonder? [23:10:40] looking to see here [23:11:04] although, old stuff isn't down yet [23:11:05] chasemp: no I can't since I can't access that either :) [23:11:14] so it would be weird if this didn't work for that reason I think [23:11:24] SMalyshev: :) [23:12:10] production access works fine so it's something in labs [23:12:57] hmm.
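For context on the ldap-yaml-enc.py failures discussed above: a Puppet external node classifier (ENC) is just an executable that Puppet calls with the node's certname and that prints a YAML document with a `classes` key (and optionally `parameters`). The sketch below is a minimal generic ENC, not the Wikimedia script; ldap-yaml-enc.py does the same job but resolves the node via LDAP, which is replaced here by a hypothetical lookup table.

```python
#!/usr/bin/env python
# Minimal sketch of a Puppet external node classifier (ENC).
# Puppet invokes the ENC with the node's certname as argv[1] and
# expects YAML with "classes" (and optionally "parameters") on stdout.
# ldap-yaml-enc.py looks the node up in LDAP instead of this table.
import sys

# Hypothetical classification data; the real script queries LDAP.
NODES = {
    "tools-worker-05.tools.eqiad.wmflabs": ["role::labs::instance"],
}

def classify(certname):
    """Return the YAML classification document for one node."""
    classes = NODES.get(certname, ["role::labs::instance"])
    # Emit YAML by hand to keep the sketch dependency-free.
    lines = ["classes:"] + ["  - {}".format(c) for c in classes]
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    sys.stdout.write(classify(sys.argv[1]))
```

The T120817 KeyError above is the failure mode of this pattern: if the backend lookup returns a record without the expected key, the ENC crashes instead of emitting a default classification, so defensive `.get()` access matters.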
in fact I can access wdq-redirect.eqiad.wmflabs which is in the same group [23:13:56] well I think it's indeed an ldap thing not a you thing [23:14:03] and I'm not sure where to address so I'm looking [23:14:05] {'info': 'TLS: hostname does not match CN in peer certificate', 'desc': 'Connect error'} [23:15:27] interestingly enough https://wikitech.wikimedia.org/w/index.php?title=Special:NovaInstance&action=configure&instanceid=6978ff87-d776-45fd-9a2e-4412d3926add&project=wikidata-query&region=eqiad claims that host (wdq-beta) does not exist [23:15:32] also https://wikitech.wikimedia.org/w/index.php?title=Special:NovaInstance&action=configure&instanceid=8978fa1c-c37a-4b69-bef8-6bf049d56aac&project=wikidata-query&region=eqiad for db01 [23:15:49] so something is broken in wikitech too... [23:18:43] wikitech is probably just pulling data from nova & ldap [23:20:04] Although nova knows wdq-beta exists [23:20:21] what is the error you're getting there SMalyshev? [23:33:30] it's related to old values in root@db01:~# cat /etc/ldap.yaml [23:33:36] but I'm trying to track back on why [23:33:56] really tho it's old values there and certificate issues on those old values [23:36:17] weird problem - all of a sudden I can't ssh into wikimetrics1.eqiad.wmflabs [23:36:23] I was able to just this morning [23:36:39] and I can still ssh into wikimetrics-staging1.eqiad.wmflabs (those are set up almost exactly the same) [23:37:03] milimetric: does it have a self hosted puppet master in the project? [23:37:09] yes [23:37:23] sorry, chasemp, yes [23:37:24] check to see last pull in [23:37:39] milimetric: ldap things changed and though it shouldn't be stale on the old boxes it appears it is [23:37:45] there are relevant and needed ldap changes I believe [23:37:55] SMalyshev: try again on db01.wikidata-query.eqiad.wmflabs [23:38:02] aha, so how can I login there to take care of it? [23:38:12] milimetric: :) which box is the puppetmaster?
[23:38:31] oh, chasemp, they're all their own puppetmasters [23:39:02] every vm in that project is a self puppet master? [23:39:05] it looks like wikimetrics1, limn1 don't work and wikimetrics-staging1 works [23:39:21] chasemp: no, I mean the ones that don't work, limn1 and wikimetrics1 [23:39:43] both are self hosted puppet masters then? [23:39:48] wikimetrics-staging1 is also its own puppet master, but that must've been updated today when we were doing deploys [23:39:48] chasemp: works now, thanks! [23:39:59] chasemp: yes, both are self-hosted puppet masters [23:40:06] SMalyshev: I didn't update every vm in the project but I did do that one and the master [23:40:13] and you guys will have to sort out the git things there a bit [23:40:16] no lost stuff but [23:40:22] it was wayyyyyyyyyy behind [23:40:24] and in a bad state [23:40:26] at the time [23:40:31] milimetric: ok [23:40:33] give me a sec here [23:40:46] np :) sorry us self-hosters always give you problems [23:41:03] we've had a task to fix it for a very long time :/ [23:41:15] * PopCornPanda waits for limn1 to become unrecoverable one of those days [23:41:45] PopCornPanda: that's not possible, I know the setup of that box by heart because I did it like 10 times manually [23:41:46] chasemp: yes, wdqs-puppetmaster works, but not wdq-beta :( [23:42:05] milimetric: sure, but if you can't ssh in... :) [23:42:18] delete / create instance [23:42:19] SMalyshev: it should check in in a few?
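The staleness problem being debugged here (clients still pointing at old LDAP servers via /etc/ldap.yaml, producing the TLS hostname-mismatch errors quoted earlier) could be checked mechanically along these lines. A hedged sketch only: it assumes a top-level `servers:` YAML list in the file, which is an assumption about the file's layout, and the server names in the example are hypothetical.

```python
# Sketch of the staleness check chasemp was doing by hand: read the
# LDAP server list out of /etc/ldap.yaml and flag any entry that is
# not in the expected (new) set. Parsing is done with plain string
# handling to avoid a PyYAML dependency; it only understands a simple
# top-level "servers:" block of "- host" items.

def stale_servers(yaml_text, expected):
    """Return servers listed in the file that are not in `expected`."""
    servers, in_servers = [], False
    for line in yaml_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("servers:"):
            in_servers = True
            continue
        if in_servers:
            if stripped.startswith("- "):
                servers.append(stripped[2:].strip())
            elif stripped and not line.startswith((" ", "\t")):
                in_servers = False  # left the indented servers block
    return [s for s in servers if s not in expected]
```

On a client this would run against `open('/etc/ldap.yaml').read()`; a non-empty result means puppet has not yet propagated the new settings to that box.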
I'll try to come back but I have some deeper cause stuff I need to look at [23:42:21] I'm sure you can recreate it, but that box will die sometime [23:42:27] yeah, true [23:42:32] I've faced that truth a while back [23:42:35] :) [23:42:59] but thanks to you the fab deploy tool works so it's pretty easy to recover [23:43:13] limn1 is one of the primary reasons I drove ebernhardson to make the shiny dashboard thing not be in ops/puppet [23:43:43] yeah, lessons learned [23:43:54] +1 [23:44:00] PopCornPanda: I'm interested in your thoughts on how I should puppetize dashiki dashboards [23:44:11] milimetric: what're they running on? [23:44:14] milimetric: uwsgi? [23:44:17] limn1 of course [23:44:26] milimetric: haha, no, I mean the tech stack :D [23:44:31] oh, no, they're static html sites that you build by passing wiki pages as configs [23:44:47] oh I see. so you just need a static file server? [23:45:07] basically. the deploy mechanism could do the building [23:45:11] +1 [23:45:18] and fabric seems easy enough to leverage for that [23:45:38] milimetric: there is the role::simplelamp which has a simple apache / mysql / php server, that I recommend using. it'll serve static files just fine [23:45:41] the static site config would really just need apache to point a proxy to a folder [23:45:46] and the php / mysql would just not be doing anything [23:45:54] you just put stuff in /var/www/ and be done with it [23:46:07] oh, PopCornPanda but it would need an arbitrary number of dashboards per instance [23:46:16] what does that mean? [23:46:21] different domain names? [23:46:23] milimetric: bit of git conundrum there on wikimetrics1 and I'm not sure how you want to resolve it [23:46:31] so I disabled puppet and fixed the relevant file manually [23:46:36] so you can kinda self serve there [23:46:38] can you try to login?
[23:46:40] so like edit-analysis.wmflabs.org would have a virtual host that points to /var/lib/dashiki/dist/compare-VisualEditorAndWikitext [23:46:56] (I didn't want to commit the password stuff and testing stuff but it's all kinds of conflicty) [23:46:57] * milimetric tries to login to wikimetrics [23:47:14] milimetric: ok, so that seems like a generic use case for 'serve multiple domains of static files off one server'. Does that sound right? [23:47:14] thx chasemp I logged in! [23:47:22] ok limn1 is in the same state [23:47:27] I'll go clean the git there [23:47:34] take a gander at wikimetrics1:/var/lib/git/operations/puppet [23:47:37] cool [23:47:39] but basically we just have the custom commit as HEAD [23:48:10] well I may have tripped over my own feet in trying to pull and created an unneeded rebase situation but there was a conflict there already [23:48:13] ++>>>>>>> d041010... ATENTION: DO NOT PUSH TO PRODUCTION! [23:48:23] so uh I'm gunna let you take a peek at that one :) [23:48:25] YuviPanda: yes, that generic description sounds good to me, I'm interested more in how I can leverage hiera to set up an arbitrary number of such things on the same instance [23:48:44] SMalyshev: rest of the project VMs should catch up here shortly [23:48:49] milimetric: yup, I'll have a solution for the generic case sometime this week. so you can add a host via hiera and it'll setup a folder and fab can push / symlink as it sees fit [23:48:51] yeah, I'll probably just reset [23:48:58] ooh, YuviPanda sounds fun :) [23:49:02] milimetric: can you open a bug? [23:49:03] YuviPanda: FYI stale /etc/ldap.yaml on self hosted things is causing ppl to look at old ldap [23:49:06] which should be up right [23:49:13] but it's throwing cert errors for mismatch hostname [23:49:17] YuviPanda: sure, lemme do this git cleanup first [23:49:36] chasemp: will keep in mind. limn1 / wikimetrics1 are special instances though, since they have massive commits on top that cause rebase conflicts...
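The generic case discussed above ("serve multiple domains of static files off one server", driven by hiera data) could be sketched as rendering one Apache VirtualHost per hiera entry. This is only an illustration of the idea, not YuviPanda's eventual puppet module: the domain/docroot pair is the example milimetric gave, the template is a bare-bones assumption, and in practice the rendering would be done by a puppet template rather than Python.

```python
# Sketch: take a hiera-style mapping of domain -> docroot and render
# one Apache VirtualHost stanza per entry, so an arbitrary number of
# static dashboards can live on one instance.

VHOST_TEMPLATE = """<VirtualHost *:80>
    ServerName {domain}
    DocumentRoot {docroot}
</VirtualHost>
"""

def render_vhosts(sites):
    """sites: dict of domain -> docroot (what hiera would supply)."""
    return "\n".join(
        VHOST_TEMPLATE.format(domain=d, docroot=r)
        for d, r in sorted(sites.items())
    )

if __name__ == "__main__":
    print(render_vhosts({
        "edit-analysis.wmflabs.org":
            "/var/lib/dashiki/dist/compare-VisualEditorAndWikitext",
    }))
```

Adding a dashboard then means adding one hiera entry; the deploy tool (fab, per the discussion) only has to push files and flip a symlink under the docroot.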
[23:49:42] usually they auto update and do fine.. [23:49:47] but yeah, I'll keep in mind chasemp [23:49:55] brb getting off train thingy [23:50:53] YuviPanda: fwiw ssh-key-ldap-lookup was a pleasantly well written surprise :) [23:50:54] so thanks [23:54:20] chasemp: ok, wdq-beta still not happy. Will wait for 10 mins [23:56:26] chasemp: I'm getting "sudo: ldap_start_tls_s(): Connect error" [23:56:30] SMalyshev: something else is up there [23:56:32] (when I try to sudo) [23:57:00] chasemp: ah, now it works! [23:57:17] milimetric: you probably need to puppet up on the master: run puppet agent --test on the master [23:57:24] and then puppet agent --test on any clients? [23:57:28] YuviPanda: these aren't massive commits, btw. In limn1's case it's more annoying, but wikimetrics1 really just adds a passwords file and includes it [23:57:34] chasemp: what should ldap.yaml say? [23:57:44] etc/ldap.yaml [23:57:49] chasemp: but I can't do any of that without sudo though [23:57:53] omitting the passwords :) [23:57:58] milimetric: :) [23:59:13] milimetric: puppet is borked on wikimetrics: No file(s) found for import of 'stages.pp' at /etc/puppet/manifests/site.pp:16 on node wikimetrics1.analytics.eqiad.wmflabs [23:59:19] let me peek here
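The refresh sequence chasemp describes above (agent run on the self-hosted puppetmaster first, so it serves a current catalog, then on each client) can be written down as an ordered command plan. A hedged sketch: the hostnames are illustrative, and actually executing the plan would go over ssh with sudo, which is deliberately left out here.

```python
# Sketch of the self-hosted puppetmaster refresh order: master first,
# then clients. Only builds the ordered (host, command) plan; running
# it would require ssh access and sudo on each host.

AGENT_CMD = "sudo puppet agent --test"

def refresh_plan(master, clients):
    """Return (host, command) pairs in the order they should run."""
    plan = [(master, AGENT_CMD)]
    plan.extend((c, AGENT_CMD) for c in clients)
    return plan

if __name__ == "__main__":
    for host, cmd in refresh_plan(
        "wikimetrics1.analytics.eqiad.wmflabs",  # illustrative master
        ["limn1.analytics.eqiad.wmflabs"],       # illustrative client
    ):
        print("{}: {}".format(host, cmd))
```

Ordering matters because a client agent run against a stale master just re-applies the stale catalog; that is exactly the /etc/ldap.yaml trap hit earlier in this log.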