[06:27:25] before I get into trying to figure out how to fix https://phabricator.wikimedia.org/T190111 - do people actually use Apache's mod_status? it adds a /server-status endpoint and `apache2ctl status` command [06:46:19] I have not used it even once since I've been here [07:26:50] erp, looks like prometheus uses it: https://gerrit.wikimedia.org/g/operations/puppet/+/09b32217a64378a586217494ad8018808636fa53/modules/prometheus/manifests/apache_exporter.pp#10 [10:12:20] disturbing you again, is production restbase on stretch or buster? [10:23:00] mostly stretch, but there are already initial buster nodes [10:24:04] so if you're replacing a node in deployment-prep, best to directly move to buster [10:26:54] Got a new guide for setting up a k8s cluster in our infra up at https://wikitech.wikimedia.org/wiki/Kubernetes/Clusters/New . It's pretty comprehensive although there will definitely be errors in it. I'll also document all the clusters we've got under Clusters/ since it seems there's a lot of confusion over that across the org. [10:43:59] I'll just leave this here: https://bugs.mysql.com/bug.php?id=102935 [10:54:57] https://jira.mariadb.org/browse/MDEV-25112 is more relevant to us [11:37:15] akosiaris: how did you end up there? want to join our team??? [11:37:35] marostegui: twitter [15:51:11] jbond42: fwiw I think it would be fine to just drop any notion of appletalk :) [15:57:59] but how will I get my https://en.wikipedia.org/wiki/LaserWriter working via appletalk-over-ip-over-gre-over-ssh from production? [16:02:03] 🤔 [16:08:42] changing the stock /etc/services feels pretty scary :) [16:08:55] god knows how many package postinsts etc.
make assumptions based on the contents of that file [16:10:27] there was at least libnss-extrausers for alternate (supplemental) passwd/group, but I don't think it supported services [16:10:45] I don't think many people care about services, as it's relatively easy to use the port [16:11:40] there's probably some other nss routes to do this without touching /etc/services too, but, yeah, I kinda question the cost/benefit tradeoff in customizing /etc/services for anything really [16:12:07] having the metadata for ports we actually use and configure in puppet, I totally get that end of it [16:14:26] although /etc/services is something that has annoyed me in the past, when I've realized that some common API usage patterns in high-level languages cause you to see programs constantly re-parsing the text of /etc/services at runtime on every socket creation or whatever, and it's having to scan thousands of mostly-useless-in-practice entries. [16:15:01] bblack: it would be really nice, when you're debugging something in production, to see our service names in netstat/ss output [16:16:16] I guess that's possible if either (a) the services file (or equiv) is updated differently on each machine or (b) we have some rule against re-use globally within our infra?
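The name resolution being discussed here is just a lookup against /etc/services; Python's stdlib exposes the same lookups that netstat/ss rely on, which makes the idea easy to demonstrate (results assume a stock Debian-style /etc/services):

```python
import socket

# netstat/ss map numeric ports to names by consulting /etc/services;
# these stdlib calls perform the same lookups.
print(socket.getservbyname("http", "tcp"))  # port registered for "http"
print(socket.getservbyport(443, "tcp"))     # name registered for port 443
```

With a customized /etc/services carrying WMF-specific entries, this same lookup is what would make internal service names show up in debugging tools.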
[16:16:31] I'm not talking about 80/443 [16:16:33] I haven't looked, but I would guess we have cases today with the same port number serving different protocols/purposes in different clusters [16:16:39] I'm talking about port numbers on k8s hosts, for instance [16:16:42] for which they are unique [16:16:55] and it'd be nice to have a repository of these that isn't a wikitech page https://wikitech.wikimedia.org/wiki/Service_ports [16:17:17] (they are also mostly already in the service catalog hiera, but that doesn't go far outside of puppet rn) [16:18:07] yeah, but if we're talking about a globally-synced-and-useful /etc/services, it would have to encompass more cases, and I'm betting there are existing conflicts to discover either way. [16:18:49] hmmm [16:21:29] in stock debian, is services something postinsts actually *edit*, or is it just managed centrally and conservatively for the whole distro? [16:22:01] right, "netbase" [16:22:06] so the latter [16:23:35] if we weren't editing existing entries, it would seem more comfortable [16:24:15] there's kind of a basic conflict of scopes here, between IANA, the OS level, and what our services care about, and how (like everyone) we haven't really cared in the past about port conflicts that don't matter in practice [16:26:20] I could see a path where you minimize the risks p.void is talking about as well, and keep the stock baseline file and just template our own wmf-specific ports onto the end of the file, where the stock one ends with "# Local Services" [16:26:36] we might need to update our puppet copy of the upstream base file once in a blue moon if netbase changes, or on distro upgrades [16:27:12] and then try, as a policy going forward, to avoid conflicts between the baseline/IANA stuff and our local services [16:31:15] I dunno, it's all pretty tricky to have a valid opinion about [16:31:51] there's also the conflict between how ports are "officially" assigned to protocols, but in practice they're not used that way
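The "keep the stock baseline and template our own ports onto the end" approach described above can be sketched in Python. This is a hypothetical illustration, not the actual puppet implementation; the service names and ports in it are invented:

```python
# Sketch: keep the stock netbase file intact and append local entries
# after it, refusing any port/protocol pair the baseline already claims.
# "ats-be" and port 4001 below are invented example data.
def append_local_services(stock: str, local: dict) -> str:
    taken = set()
    for line in stock.splitlines():
        fields = line.split("#")[0].split()
        if len(fields) >= 2 and "/" in fields[1]:
            port, proto = fields[1].split("/")
            taken.add((int(port), proto))
    out = [stock.rstrip("\n"), "", "# Local services (locally managed)"]
    for (port, proto), name in sorted(local.items()):
        if (port, proto) in taken:
            raise ValueError(f"conflict: {port}/{proto} already assigned")
        out.append(f"{name}\t\t{port}/{proto}")
    return "\n".join(out) + "\n"

stock = "http\t\t80/tcp\nhttps\t\t443/tcp\n# Local services\n"
print(append_local_services(stock, {(4001, "tcp"): "ats-be"}))
```

Raising on conflict (rather than silently overriding) matches the "avoid conflicts with the baseline/IANA stuff as a policy" idea.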
[16:31:58] well I think j.bond will be able to tell us if there are any conflicts presently existing [16:32:23] and if there aren't, I think we should worry about what to do about them when they come up :) I don't have the impression that many people are asking IANA for port numbers nowadays anyway [16:32:52] surprisingly many people are still shoving junk protocols at IANA, but I don't think that's relevant to us for the most part [16:33:08] https://www.iana.org/assignments/service-names-port-numbers/service-names-port-numbers.txt [16:33:26] so it's worth giving some scope of why this came about. i think there are two main use cases i have heard: 1) as cdanis mentioned, having tools like ss/netstat resolve our internal port names to something recognizable. the second is to be able to reuse service names instead of specifying port number/protocol tuples. specifically this was requested for the caprica installation, which has a [16:33:32] syntax very similar to what /etc/services [16:33:34] provides, but not quite.
as such there was an ask for a puppet way to represent the /etc/services data [16:33:54] less than a year ago, for example, Siemens decided that the whole world needed port 29000 reserved for their software licensing server: [16:33:57] saltd-licensing 29000 tcp Siemens Licensing Server [Siemens_Digital_Industries_Software] [Tony_Greatorex] 2020-06-08 [16:34:11] regarding the concern about overriding the iana ports, the intention is that the puppet module will protect against that [16:34:30] or rather, by default protect against that but allow users to prefer their own ports if they want [16:36:06] so when it comes to conflicts, the current design is set up to prefer the default list that comes from netbase, with an experimental option (all of this is experimental but this especially so) to add the name of the conflicting service as an alias to the iana definition [16:36:41] see lines 14-16 https://gerrit.wikimedia.org/r/c/operations/puppet/+/670917/14/modules/netbase/manifests/init.pp#1 [16:37:04] and the accompanying merge and munging functions that do the heavy lifting [16:37:34] as to appletalk, i think it's fine to drop; however it is only 4 ports, so the way i have structured the module is to still add it to /etc/services but ignore it for everything else [16:38:12] so tl;dr my intention is that iana ports will be preferred and conflicts will be rejected (currently silently) [16:38:13] right, we could totally ignore/drop appletalk safely I imagine, as chris mentioned [16:38:35] of course netbase from debian doesn't include anywhere near all the IANA-assigned ports I think [16:38:36] yes, my only hesitation there is why is it still in netbase?
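A rough model of the conflict handling described here (netbase/IANA entries win; a local name that collides on port/protocol becomes an alias on the existing entry rather than replacing it). This is a Python sketch with invented data, not the actual merge/munge puppet functions:

```python
# IANA/netbase entries keyed by (port, proto) take precedence; a local
# entry that collides is folded in as an alias. All data is illustrative.
def merge_services(iana, local):
    merged = {k: dict(v) for k, v in iana.items()}
    for key, entry in local.items():
        if key in merged:
            merged[key].setdefault("aliases", []).append(entry["name"])
        else:
            merged[key] = dict(entry)
    return merged

iana = {(29000, "tcp"): {"name": "saltd-licensing"}}
local = {(29000, "tcp"): {"name": "wmf-example"},  # conflict -> alias
         (4001, "tcp"): {"name": "ats-be"}}        # no conflict -> added
print(merge_services(iana, local))
```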
[16:38:42] I'm not sure what their own filtering process is [16:39:05] also i'm using the netbase services file from salsa to get my list of default ports [16:39:14] and then I guess we have to track netbase either way and keep up with changes, just to avoid creating some weird problem for debian packages, yeah [16:39:26] yes, i think iana keeps all ports forever; looks like debian removes deprecated protocols [16:39:33] bblack: yeah, the patch includes a script to update from the packaged list [16:39:39] the services list in Debian is hand-curated I think, based on bug reports and the maintainer's discretion [16:40:10] if anything is obsolete there, let's submit a merge request or make a bug [16:43:21] "git grep ip_local_port_range" is interesting in light of this, too [16:43:35] since the few customizations we have of that, also overlap where we intend to put named service ports :) [16:44:09] arguably that's an issue with or without this module; it's just that with the module we may have a better way to detect it :) [16:44:13] I know the origin of the 4001 base number in the cacheproxy case, is that we knew we were using custom ports in the 3xxx range in our own edge stack stuff [16:44:27] yeah for sure, it should be fixed on the local port range side :) [16:44:40] I'm assuming the phab one just copied from cacheproxy [16:45:08] memcached still uses mysterious_sysctl heh [16:48:08] this all branches off into another side-topic too, which is why people bother tuning the local port range and worrying about port exhaustions [16:48:13] as an aside, and since you mentioned ip_local_port_range, i did start a patch to try and move all of those performance bits into one profile https://gerrit.wikimedia.org/r/c/operations/puppet/+/662932/3/modules/profile/manifests/performance.pp [16:48:47] I know in the traffic-edge stuff, we've worried about it in the past for the special case of connections between our layers, which can stack up lots of local TIME_WAIT sockets and run out.
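The TIME_WAIT exhaustion worry can be put into rough numbers. A back-of-envelope, assuming the stock Linux ephemeral range (32768-60999) and the kernel's fixed 60-second TIME_WAIT interval; real per-host values can differ:

```python
# Back-of-envelope for TIME_WAIT port exhaustion against a single
# destination IP:port. Assumes the Linux default ip_local_port_range
# (32768-60999) and the fixed 60s TIME_WAIT interval; check the real
# range with: sysctl net.ipv4.ip_local_port_range
low, high = 32768, 60999
time_wait_secs = 60

ephemeral_ports = high - low + 1          # ports usable per (src, dst, dst port)
max_conn_rate = ephemeral_ports / time_wait_secs

print(f"{ephemeral_ports} usable ephemeral ports")
print(f"~{max_conn_rate:.0f} new connections/sec to one destination "
      "before TIME_WAIT sockets exhaust the range")
```

This is why a single-layer pair like ats-be talking to one appservers.svc IP is the case that hits the limit first: all those connections share one (source IP, destination IP, destination port) tuple.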
[16:49:25] it makes me wonder if we're in danger, in some exceptional circumstances, of running out of port numbers for e.g. a specific ats-be -> appservers.svc's singular IP, too, and things like that. [16:51:06] jbond42: I tend to be wary of those things. performance tuning via sysctls is often misguided, and often doesn't age well as kernels change, etc. Making it easier to apply such settings broadly or copy them around might be a net loss in the long run. [16:51:43] I tried to do "better" last time around by adding those huge explanatory comment blocks on a bunch of our cacheproxy ones [16:51:57] but the bottom line is even at the time they were unverified guess-work in many cases [16:53:09] bblack: the reason i started that is because there are many instances in the current repo where people have implemented one, some, or all of those parameters. moving everything to one place at least allows us to easily check if things are still valid, and to quickly tune and remove [16:53:19] I think it would be nice if going forward we really raise the bar of experimental proof or extremely solid reasoning for introducing those [16:54:39] just in a quick scan of the cacheproxy performance.pp, I'd estimate ~1/3 of those are outdated and possibly suspect already. [16:54:48] my plan (and this was i think a friday afternoon task) was to create this profile, then move anything currently doing something similar into it, configured in the same manner as currently. then we would have a way to better validate and introduce changes [16:54:49] even in cases where we know they helped in the past, everything has changed since they were last fixed [16:55:40] bblack: it'd be nice if we had a reasonably-comprehensive loadtest of an edge cache node and also of a whole edge cluster :) [16:55:49] e.g. our vm_dirty_ratio tuning and related bits: that actually did fix some real issues measurably.
But that was probably on Varnish 3.x or 4.x (vs ats-be), on a completely different kind of SSD, in a completely different era of linux kernel [16:57:05] bblack: re: "even in cases where we know they helped in the past", swift's use of net.ipv4.tcp_tw_recycle => 1 would, i think, be an option to revisit, for instance [16:57:17] cdanis: I hear you, but I also fear that being the "easy" answer and then metric-becomes-the-measure. It's not generally about optimizing the load on the machines, most of these. [16:57:33] sure :) [16:58:19] I'd like us to be in a world where every time someone introduces a setting like this, they have to back it up with "real" measurements and reasons that matter, and we re-examine those when relevant things change. That's just a lot more overhead than we've been equipped for in the past, maybe. [16:58:30] +1 to that! [16:58:36] documenting the "why" is the most important part, here [16:58:57] (this is true of a lot of engineering, but is especially true of twiddling arcane knobs) [17:26:18] fyi i had to use this for something today and thought i would just drop the link here in case others are not familiar with it [17:26:21] https://github.com/nwops/puppet-debugger [17:27:05] kormat: ^^^ i think this may help with some of the debugging (however it is still missing puppetdb context) [17:27:42] ok, i'm logging off for the day, have a nice weekend all [19:57:23] https://usercontent.irccloud-cdn.com/file/9I6NwUCQ/Americas-top-gutter-services-SPAM.png [19:58:10] thought you might like that, effie et al :) [19:58:35] LOL [19:59:12] https://emoji.slack-edge.com/T1TKDB1T5/excuseme/1e60bc0f0aa9f3b7.gif [19:59:20] they infect my TV too. Not a day goes by that I don't hear about one of the 15 companies that claims to end gutter cleaning forever. they're all scams.
[20:02:18] https://www.youtube.com/watch?v=Rt3zADlw63g <- best review of one of these systems ever [20:02:23] I didn't know there was such a scam going around [20:08:47] it's really one of those fundamentally-hard engineering problems that seems so simple [20:09:41] you need gutters so you don't have water sheeting and/or dripping off in bad ways, because that impacts the soil around the house foundation, etc. But gutters eventually clog with junk, so you have to clean them, which is annoying and/or dangerous and/or expensive. [20:10:17] and apparently many, many people have decided they think they could invent a better system that can manage the water problem without creating a new cleaning problem. [20:11:23] and a number of them have invested a lot of time on their crackpot idea and developed it into a product, then probably realized it's not actually better, but forged ahead and decided to make money selling it to people anyway, because if it was this hard for them to figure out that the problem is basically intractable, then surely it will be hard for lots of other people to figure out that the system [20:11:29] they're selling is junk, too. [20:12:13] which is really kind of a great analog of, or metaphor for, a lot of what happens with various software in our industry, too! :) [20:12:49] 1) Find really hard problem everyone is annoyed by because it can't really be solved 2) Invent crackpot commercial solution that just replaces one problem with another and sell it 3) $$$$ [20:13:56] bblack, wait, are we talking gutters or containers? [20:14:32] :-) [20:14:35] hey, that's not quite fair [20:14:37] gutters *are* containers, designed to hold tree debris, water, and squirrel nests :) [20:14:57] various kinds of snake oil have a long history in many fields, not just in tech, bblack ;) [20:17:36] How come we also have gutters in Europe but I don't recall ever having to clean them..
hmm [20:18:30] https://en.wikipedia.org/wiki/Snake_oil#/media/File:Clark_Stanley's_Snake_Oil_Liniment.png [20:18:35] Ours dumped a stone on the car the other week [20:18:36] https://en.wikipedia.org/wiki/Rain_gutter#Types_of_gutter_guards [20:18:37] ^ is pretty awesome [20:19:27] yeah, that wiki article is kinda questionable, since it says "Screen gutter guards are among the most common and most effective. They can be snapped on or mounted, made of metal or plastic. Micromesh gutter guards provide the most protection from small and large debris." [20:19:47] those screens/micromeshes are all the scam ones we see here in the US. they're expensive and always cause more problems than they solve. [20:19:50] paid editing? :) [20:20:45] yeah, it's even already pointed out a bit in https://en.wikipedia.org/wiki/Talk:Rain_gutter [20:21:00] maybe the competitor of the foam type: https://en.wikipedia.org/w/index.php?title=Rain_gutter&type=revision&diff=992027525&oldid=981041921 ) [20:22:38] anyways, having stared at this problem as a homeowner many times (but things vary by climate, region, and construction styles!), my net take is that gutters are good, and all gutter "guards" are bad, and you just have to live with cleaning the gutters. [20:27:03] irobot makes a solution too, but I've never been willing to try it :) [20:27:04] bblack: It seems you might be interested in purchasing a gutter vacuum cleaning system - gutterprovac.com [20:27:05] https://store.irobot.com/default/looj-gutter-cleaning/irobot-looj-330/L330020.html [20:28:14] hah, the robot is nicer [20:32:46] "Just put it in containers." https://fosstodon.org/@tvass/105872702236278982 [20:39:53] http://jimbly.github.io/regex-crossword/ (just noticed it on HN) [23:09:32] Anyone know how `/etc/envoy/envoy.yaml` gets provisioned for hosts that `include ::profile::tlsproxy::envoy`?
I rolled out https://gerrit.wikimedia.org/r/c/operations/puppet/+/671229 to set up envoy for wdqs test hosts `wdqs100[9,10]`, but something in the chain isn't setting up `/etc/envoy/envoy.yaml` properly [23:10:33] So for example on `wdqs1005` - a public wdqs host whose envoy, etc. work as expected - its `/etc/envoy/envoy.yaml` has stuff to set up port 9601, to listen on port 443, etc [23:11:06] Whereas on `wdqs1009` it looks like it's just got the default [23:11:11] https://www.irccloud.com/pastebin/uSUZbVhm/ [23:14:33] `s/port 9601/port 9631` in the message 2-3 lines above [23:41:06] ryankemper: I think profile::tlsproxy::envoy reads from hiera? [23:44:10] I haven't yet found the definition of `profile::tlsproxy::envoy::services` in hiera for mainline wdqs but it should be somewhere [23:46:47] ohhh, it probably uses the default definition? https://gerrit.wikimedia.org/g/operations/puppet/+/production/hieradata/common/profile/tlsproxy/envoy.yaml [23:48:58] that's actually very interesting, because that file defines the default port 443 listen, for instance [23:49:01] hm [23:49:17] right, which (to your point) is absent currently [23:51:01] hm [23:53:55] As far as the roles go, the only difference between https://gerrit.wikimedia.org/g/operations/puppet/+/production/modules/role/manifests/wdqs/test.pp and https://gerrit.wikimedia.org/g/operations/puppet/+/production/modules/role/manifests/wdqs/public.pp is the inclusion of `include ::profile::lvs::realserver` in `public`, which I don't think interacts with envoy at all
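The hiera behaviour being worked out here - a common-layer default that applies unless a more specific layer defines the key - can be sketched like this (Python, not Puppet; the layer contents are invented for illustration, only the key name is from the discussion above):

```python
# Hiera-style lookup: scan layers from most specific to least specific
# and return the first definition found. Layer data below is invented.
def hiera_lookup(key, layers):
    for layer in layers:
        if key in layer:
            return layer[key]
    raise KeyError(key)

# The common layer provides the fallback definition; a role/host layer
# that never sets the key silently inherits it.
common = {"profile::tlsproxy::envoy::services": [{"server_names": ["*"], "port": 80}]}
role_wdqs_test = {}  # no override -> the common default wins

services = hiera_lookup("profile::tlsproxy::envoy::services", [role_wdqs_test, common])
print(services)
```

That would be consistent with wdqs1009 ending up with only the common default envoy.yaml: if no wdqs-test layer defines the key, the common definition is all puppet ever sees.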