[08:08:55] I'll reimage bast1002 starting in ~ half an hour, please switch to a different bastion in the mean time [08:10:12] i assume thats prod? [08:14:13] yeah, that's one of the bastions for prod access, nothing changes for accessing Cloud VPS of Toolforge [08:14:39] one alternative is to switch to bast2002.wikimedia.org [08:16:58] thanks just wanted to confirm :), good luck [08:35:23] starting now [09:29:03] moritzm: I just opened T275599 [09:29:04] T275599: debmonitor: returns proxy error when user is in too many groups - https://phabricator.wikimedia.org/T275599 [09:29:31] ack, thanks. didn't get to it on Friday, but will merge a fix in the next days [09:30:09] great, thanks! [09:31:44] bast1002 is up again [09:37:05] \o/ [09:37:08] thanks! [09:42:19] took the opportunity to update the ssh config and use ProxyJump instead of ProxyCommand, looks cleaner now :) [10:23:01] woo hoo [12:02:11] dcaro: now that you mentioned it I did the same, thanks [12:40:53] not finished yet, but that's a good beginners introduction podcast to BGP, peering/transit, etc https://blog.ipspace.net/2020/06/bgp-navel-gazing.html [15:08:53] puppet style question: is it allowed to use lookup() in puppet functions? [15:09:55] kormat: no, see https://wikitech.wikimedia.org/wiki/Puppet_coding#Hiera [15:09:55] AIUI the only place lookup() is technically allowed is in the arguments given to profile classes [15:09:59] CI should also vote -1 [15:10:03] there are many exceptions in the codebase ofc [15:10:09] IIRC [15:10:41] (and said exceptions mostly have the wmfstyle linter disabled in a line comment) [15:10:46] mmph. so instead i'll need to put the lookup in every profile that calls the function, and pass in the hash as a param [15:11:04] or make an exception ;) [15:11:08] that partially defeats the purpose of making a function to not repeat this code multiple times [15:11:33] IMO don't let the style guide get in the way of doing the right thing when it makes sense [15:11:43] kormat: what's the use case? [15:11:46] cdanis: it's puppet, the only right thing involves napalm [15:12:17] kormat: we tried that with Arzhel, didn't work well with netbox how we wanted :-P (https://github.com/napalm-automation/napalm ) [15:12:34] volans: https://phabricator.wikimedia.org/T275497#6856476. i'm defining a hash in hiera that contains an entry per section, with 2 parameters in each entry [15:13:00] i'd like to have a few small functions to do lookups in the hiera hash and provide a simple answer to the caller [15:14:59] e.g. an `is_in_writeable_dc` function. [15:14:59] there's some precedent of that with the services hiera block [15:15:28] we do have some cases of lookupvar called in wmflib fwiw, I'd 302 to jb.ond (but he's out today) [15:16:24] volans: i don't know lookupvar. can it do "look up this variable, if the value is mw_primary, then return the value of mediawiki::state('primary_dc')"? [15:19:05] not by itself ofc, and to have them in scope you still need to pass them I think so maybe not useful [15:21:49] you're right, getting my hopes up like that was just foolish [15:22:51] Now, i know nothing about puppet, but just an idea, what if you looked up the mw_primary and mediawiki::state('primary_dc'), then combined that into some sort of string or something in a function, and then used the function for whatever you needed? 
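A minimal sketch of what that style-guide-compliant shape could look like: the profile does the lookup() and the function only ever sees the hash it is handed. The is_in_writeable_dc name comes from the discussion above; the namespaces, the hiera key, and the readonly_dc key inside the per-section hash are assumptions for illustration only.

```puppet
# Hypothetical function: answers "is this section writeable in this DC?"
# purely from the hash it is given, so no lookup() is needed inside it.
function wmflib::mariadb::is_in_writeable_dc(
    Hash   $section_config,
    String $section,
    String $site,
) >> Boolean {
    # 'readonly_dc' is an assumed key in the per-section hash
    $section_config[$section]['readonly_dc'] != $site
}

# The calling profile is the only place where lookup() happens:
class profile::mariadb::example (
    Hash $section_config = lookup('profile::mariadb::section_config'),
) {
    $writeable = wmflib::mariadb::is_in_writeable_dc($section_config, 's1', $::site)
}
```

This keeps the hiera access where CI and the style guide expect it, while the repeated decision logic still lives in exactly one place.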
[15:22:56] kormat: you said puppet and simple in the same sentence, that's just way too hopeful in itself [15:24:32] feel free to ignore me, if what I said is impossible or entirely not what you needed [15:24:59] RhinosF1: πŸ’― [15:25:51] kormat: puppet is an amazing tool when it works but getting there normally leaves me wondering if I'm speaking the right language by the end [15:29:00] that is true of every configuration management tool I've ever used [15:30:03] <_joe_> yeah no one invented one that doesn't suck [15:30:13] <_joe_> and I frankly think it's basically impossible to do [15:30:16] the problem space doesn't allow a good solution [15:31:12] <_joe_> also puppet became much easier to write (and less to debug) over the last few years [15:31:29] do you know if puppet is broken on alert1001? [15:31:45] I got an "Error 500 on SERVER" [15:32:12] modules/monitoring/functions/build_notes_url.pp, line: 22, column: 13 [15:32:26] <_joe_> kormat: I'll take a look at your problem in a few [15:32:30] https://puppetboard.wikimedia.org/node/alert1001.wikimedia.org [15:32:40] since 14:05:10 [15:32:42] <_joe_> I think I had that problem already, and somehow solved it basically [15:33:01] Error while evaluating a Function Call, The $dashboard_links and $notes_links URLs must not be URL-encoded (file: /etc/puppet/modules/monitoring/functions/build_notes_url.pp, line: 22, column: 13) (file: /etc/puppet/modules/profile/manifests/mediawiki/alerts.pp, line: 46 [15:34:01] <_joe_> effie: ^^ [15:34:33] not sure how easy it is to add a spec test for that, but, it should have been caught by pcc [15:34:35] sigh, when it wasnt url encoded, it didn' like it [15:34:41] when it is, it still doesnt like it [15:35:02] which is the patch, I may be blind, but don't see it [15:35:11] volans: I will push a patch to fix it [15:35:17] sorry I didn't see it [15:35:21] mybad [15:35:32] np [15:35:37] is it this? https://gerrit.wikimedia.org/r/c/operations/puppet/+/666614/3/modules/profile/manifests/mediawiki/alerts.pp [15:37:19] O, I see, I was looking at the function called, not the caller [15:37:40] s/function/resource/ [15:38:43] jynus: yes that is the patch tha makes this complaint [15:42:50] <_joe_> kormat: so you want to define the data structure in hiera, correct? [15:43:08] <_joe_> or well, in a specific place in puppet [15:43:16] <_joe_> and retrieve it from a function [15:43:24] <_joe_> we have precedent, IIRC, let me find it [15:43:48] _joe_: the service catalog functions :) [15:44:25] <_joe_> cdanis: it's a bit different in terms of usage, but yes [15:44:45] <_joe_> I was thinking of https://github1s.com/wikimedia/puppet/blob/HEAD/modules/role/lib/puppet/parser/functions/kafka_cluster_name.rb [15:45:29] <_joe_> (that will need to be moved to call lookup btw) [15:47:08] <_joe_> cdanis: the catalog functions use loadyaml() actually [15:47:16] hah [15:47:32] <_joe_> see https://github1s.com/wikimedia/puppet/blob/HEAD/modules/wmflib/functions/service/fetch.pp [15:47:33] Um, what is the equivalent of `racadm config -g cfgServerInfo -o cfgServerFirstBootDevice PXE` with modern idracs? That command doesn't work anymore. 
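Returning briefly to the lookup()-in-functions thread: the service-catalog precedent linked just above avoids the problem in a different way, by having the function read a data file directly with loadyaml() (a puppetlabs-stdlib function) instead of going through hiera at all. A rough sketch of that pattern follows; the function name, file path, and empty-hash default are invented for illustration.

```puppet
# Hypothetical accessor in the style of wmflib::service::fetch: load the
# per-section data straight from a YAML file on the puppetmaster.
function profile::mariadb::sections() >> Hash {
    # Assumed path; loadyaml() returns the second argument if the file is
    # missing or unparseable.
    loadyaml('/etc/puppet/hieradata/common/profile/mariadb/sections.yaml', {})
}
```

Whether that beats passing the hash in as a parameter is a matter of taste, but it does keep lookup() out of the function body.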
[15:47:52] <_joe_> klausman: I think we have updated instructions on wikitech [15:48:07] <_joe_> you just have to find the right revision of the platform specific docs [15:48:21] https://wikitech.wikimedia.org/wiki/Platform-specific_documentation/Dell_Documentation#Reboot_and_boot_from_network_then_console shows the old stuff still [15:48:43] <_joe_> oh that's been completely reorganized [15:48:56] Great! Where? [15:49:05] <_joe_> I don't know! [15:49:09] haha [15:49:24] <_joe_> I meant the platform-specific docs have been reorganized, we used to have multiple pages for dell hardware [15:51:09] godog: [15:51:13] And dell's docs that I can find are either a) vague or b) paywalled. [15:51:18] <_joe_> yeah [15:51:19] <_joe_> :/ [15:51:37] I wonder if my .edu can get past some of the paywalls [15:51:42] <_joe_> klausman: I'd go ask in #-dcops [15:53:21] done. [15:56:14] effie: "you called?" [15:57:03] godog: yes I am waiting on a pcc [15:57:07] give me 1s [15:58:51] I am wondering if this is a CI issue [15:58:56] I am testing running pcc for https://gerrit.wikimedia.org/r/c/operations/puppet/+/666663/ [15:59:33] and I am still getting https://puppet-compiler.wmflabs.org/compiler1002/28199/alert1001.wikimedia.org/prod.alert1001.wikimedia.org.err [15:59:43] which is the error I am trying to fix [16:02:54] interesting, I don't understand it either [16:06:24] I think it is a CI problem than an actual one [16:06:50] are we sure that will actually happen, and it is not marking that is the "old" error? [16:07:38] I'd say to merge as is to check, and then refine [16:08:01] or with the previous value, up to you [16:12:44] if godog says yes [16:12:46] I will do so [16:12:50] ofc [16:12:52] effie: that's the production error. your patch (666663) looks like it resolves the error. [16:13:14] shdubsh: since you are here, then I will merge [16:13:21] +1 [16:13:29] awesome, thanks ! [16:13:41] effie: as shdubsh says you were looking at the wrong file -- https://puppet-compiler.wmflabs.org/compiler1002/28199/alert1001.wikimedia.org/change.alert1001.wikimedia.org.err verifies it is fixed [16:14:19] sorry, I didn't notice from the link earlier [16:14:39] I used the link that pcc from my cmd gave [16:14:41] mmmm [16:14:45] yes [16:14:50] and then you clicked 'production errors/warnings' [16:14:55] not 'change errors/warnings' [16:14:57] :) [16:14:58] ah ! [16:15:15] maybe "before" and "after" are better names [16:15:15] I didn't noticed I clicked production [16:15:26] yeah yeah, too much ado for nothing [16:15:28] thank you [16:16:17] I am running puppet on alert1001 now [16:16:42] jynus: I am running too [16:16:48] oh [16:16:51] probably mine will wait for yours [16:16:56] FWIW, Icinga does its own url mangling which is why that gate exists. A `%20` in the url would be itself url-escaped. [16:17:30] lots of stagged changes applying now [16:17:49] Profile::Mediawiki::Alerts including [16:17:49] check_prometheus rules are also, like, three layers of indirection of quoting, and that is also somewhat unavoidable :/ [16:18:02] yay! [16:18:12] thank you, effie, it worked [16:18:20] and others that helped too [16:18:53] shdubsh: thanks [16:19:16] check icinga, there may be changes that may not be applied for some time [16:23:06] _joe_: hey, so I have a question about the partman recipe for kubernetes nodes (partman/custom/kubernetes-node.cfg). Is it *meant* to be semi-manual? 
[16:24:46] (that's my theory based on the fact that it doesn't create any filesystems/mountpoints) [16:36:28] <_joe_> klausman: 301 to jayme (in a meeting) [16:36:58] * jayme looks up [16:37:36] 408 [16:38:12] Do you *want* me to spam requests until one goes through? :) [16:38:34] eheh, nono. I'll have a look [16:39:29] thanks :) [16:42:09] is there anything specific you are missing? [16:42:52] It should create / ofc and all the docker volumes will be created by docker in lvm directly [16:44:49] please bear with me as I don't really speak partman [16:47:12] hmm..but compared to other files is indeed does not look as if it would create a root-fs. I wonder why it did the last time we set up nodes... [16:47:42] did you actually try klausman? And ended up without root-fs? [16:48:34] No, I just hit return and it seems to have made a good install [16:50:39] so you needed to hit return in the installer interface? [16:51:01] <_joe_> yeah I think we had that issue originally, "to be fixed" later [16:51:11] <_joe_> and apparently akosiaris never did [16:51:19] Alex has mentioned that this may be due to the machines getting Buster, not Stretch. [16:51:28] <_joe_> also that, yes [16:51:41] That in turn is a bit of a bigger topic, since for AMD GPUs, using an ancient kernel is not so greatβ„’ [16:52:07] <_joe_> oh I think we should move to buster too ftr [16:52:18] <_joe_> although buster has already an ancient kernel overall :) [16:52:21] ah, okay. Yeah...we're not using buster currently - unfortunately :/ [16:52:37] _joe_ oh it was fixed, if you reimage a kubernetes node now it's hands off [16:52:48] but apparently it doesn't work on buster [16:53:00] <_joe_> heh [16:53:07] <_joe_> good grief, partman [16:53:31] <_joe_> klausman: thankfully we have one of the greatest partman experts worldwide in our ranks [16:53:42] * _joe_ stares at kormat [16:53:44] you're being really cruel to her today _joe_ [16:53:50] Oh I've been talking to her behind the scenes already [16:53:53] <_joe_> cdanis: this is *true* [16:54:02] some truths are better left unsaid [16:54:05] <_joe_> cdanis: there are 3 people who understand partman in the world [16:54:10] <_joe_> 2 are the authors [16:54:51] <_joe_> cdanis: the mtail thing was a cheap joke, but this is actual admiration [16:55:48] * jayme returns to grafana clicking [16:57:40] jynus: FYI as I'm not sure if you're aware, database-backups-snapshots.service is in failed state on cumin1001 [16:59:23] kubernetes -node.cfg btw does create filesystems and mountpoints. See https://github.com/wikimedia/puppet/blob/production/modules/install_server/files/autoinstall/partman/custom/kubernetes-node.cfg#L29 [16:59:27] ERROR - Backup process completed, but some backups finished with error codes, it triggers the systemd check, not sure if it's the desired behaviour [17:00:31] I 'd happily never have to deal again with partman for the rest of my life though [17:00:42] * akosiaris sorry kormat :-( [17:02:24] πŸ₯€ [17:06:58] πŸ§‘β€πŸŒΎ [17:07:17] the amount of emojis that exist continues to amaze me [17:07:41] I still like my ascii emotes [17:07:49] also, no way all these are used only the way they were justified as [17:08:34] no they arent ;) [17:41:31] <_joe_> def not :D [17:42:54] what te heck does the "farmer emoji" have to do with anything? [17:51:38] farmers-only :P [17:58:25] <_joe_> 🀌 [18:16:14] apergos: farmer emoji == all tech is terrible; time to change career and be a farmer instead [18:16:23] hahahahahaha [18:16:32] because the farmers are doing so well career-wise... 
[18:19:13] when i had the "all tech is terrible" moment my alternative job was always "zookeeper", you know, feed the penguins sounded better than farming [18:23:56] nah it should always be a goat [18:24:40] you wouldn't be limited to one or the other [18:28:13] I gotta use sprintf() or something in puppet to add leading zeros to Integers, if I want to use them in systemd calendar events/timers: [18:28:20] '/usr/bin/systemd-analyze calendar *-*-1 0:0:00' returned 1: Failed to parse calendar specification ' *-*-1 0:0:00': [18:28:52] turns into a puppet error [18:28:53] Original form: *-*-1 0:0:00 [18:28:53] Normalized form: *-*-01 00:00:00 [18:30:01] mutante: are you sure it isn't the leading space before the first asterisk? [18:30:07] when running the systemd-analyze command locally, it just normalizes it and still understands "from now: 4 days left" [18:30:19] in puppet code.. it fails [18:30:21] '*-*-1 0:0:00' works for me but ' *-*-1 0:0:00' doesn't [18:31:01] rzl: ooh.. yes, looks like it [18:31:14] thanks, let me try fixing that [18:33:46] still requires something ugly to allow for either leaving $weekday completely undefined or defining but and get the spaces right [18:34:19] but was already helpful to say it out loud [18:34:43] join() with a space delimiter, no? [18:34:51] but, sweet, glad it worked πŸ‘ [18:39:42] hello hello, anybody knows what's happening to restbase? [18:40:34] it seems throttling busts of wikitext to html [18:42:52] I am very ignorant, does restbase call parsoid? I don't see pressure from the RED dashboard [18:45:04] if the page doesn't have a current rendered revision then I expect parsoid would be called one way or another [18:45:12] *revision stored in restbase that is [18:45:55] and I assume there would be an attempt somewhere to check the parser cache to see if something good is in there first before trying a re-render but I don't know what the flow of that would be [18:47:46] I am checking logstash for the throttling :) [18:48:25] as we rememmber from a few days ago restbase will retry a few times on failure [18:48:31] so there's that as well [18:50:01] seems all traffic for bcl.wikipedia.org, with VisualEditor as UA [18:52:36] an example is https://logstash.wikimedia.org/app/discover#/doc/logstash-*/logstash-restbase-2021.02.24?id=H0tT1XcBsCn0xdb8Djvf [18:55:27] now, how to get to external ips is not clear to me [18:55:40] (without waiting for analytics webrequest data) [18:55:51] I checked sampled-1000 on centrallog but cant find much [18:56:05] why is the root req url from another wiki [18:56:45] translation tools? [18:58:28] huh [18:58:49] yeah no idea the difference [18:59:09] also the UA is Visual Editor [19:00:13] does VE have some translation aid to load an article from one wiki for translation to another? I know that seems odd, but [19:00:32] no idea [19:00:56] need to go now sorry, will check later (the impact was brief, nothing ongoing) [19:04:14] yeah 20 minutes of quite still, so maybe we're ok [19:04:26] *quiet [19:04:44] I'm going to drift off too as it's definitely my late evening here [19:04:46] this is the second time in recent memory that I've incidentally noticed these restbase-involved ::ffff:10.64.0.100 client IPs [19:05:01] https://www.mediawiki.org/wiki/Content_translation ? [19:05:21] I still think they seem 'wrong', and I'm not sure why they're in that form. 
it's likely a mysterious misconfiguration, and it could have some real impact [19:05:43] I wonder how the rate gets limited anyways, on what basis [19:05:52] dang it I was going to drift off :-P [19:06:23] (because we probably have tooling that parses XFF and client IPs (maybe even for ratelimiting?) and doesn't recognize these IPs as legitimately-internal the way they would the corresponding true IPv4 or the appropriate standard wmf-mapped ipv6) [19:07:12] yeah tat would be not great [19:07:34] these look the result of something using ipv4 internal source address to reach a server's port, which is answering an ipv4 request using an ipv6 listening socket without IPV6_ONLY, and thus getting this auto-mapped fake IPv6 IP that we don't normally see in our infra and don't have revdns or hieradata about these IP ranges, etc [19:08:07] I think "::ffff:10.64.0.100" is what the client IP would appear to be in such a case [19:09:40] same curiosity as last time: [19:10:33] restbase1019 has 10.64.0.100 + 2620:0:861:101:10:64:0:100 on eno1 [19:10:43] dns only knows the ipv4 forward+rev, not the mapped-ipv6 [19:11:01] actually, rb1019's eno1 has 3 other ipv4s too [19:11:13] almost looks like an LVS service IP setup, but they're not on the loopback [19:11:56] there's 10.64.0.10[123] IPs for restbase1019-a, restbase1019-b, restbase1019-c [19:12:00] all with /32 masks... [19:12:22] I don't know how much that is really part of the novel mystery vs just some standard well-understood part of its setup [19:12:29] is that some artifact of however Cassandra is sharded for restbase? [19:13:09] it sounds like it, but why eno1 IPs with /32 masks? [19:14:25] really out of my area of knowledge... but. maybe hnowlan (probably also off though) would have a clue about the restbase piece of things [19:14:29] answering myself: maybe to make sure it doesn't make outbound connections with those IPs [19:14:50] but in that case, it might've been better to define them on loopback like the LVS case [19:15:00] it sounds like maybe we need a refresher SRE session on our restbase setup :) [19:15:21] there are't any presentations on these details are there? I think [19:15:39] I'm not sure, but I do know that I barely know the basics [19:15:43] same [19:16:11] well hugh, if you're around later and read this scrollback, wanna present at a meeting? :-) [19:16:24] well so far I can't even quickly grep up what mechanism in puppet is even creating those IPs [19:17:20] bblack: https://netbox.wikimedia.org/ipam/ip-addresses/?q=restbase [19:19:17] restbase1019.eqiad.wmnet huh [19:20:09] yeah I found it in puppet now [19:20:35] modules/cassandra/manifests/instance.pp -> $instance_rpc_address [19:20:41] which uses interface::alias [19:21:22] why it's setup like that is mostly historical reasons that don't hold anymore [19:21:25] which adds secondary IPs to the primary interface using a fixed /32 or /128 mask as appropriate [19:21:33] as inbound-only that aren't selected for outbound traffic [19:21:40] so that part all makes some kind of sense [19:21:45] and probably isn't part of the problem here [19:21:50] I forgot if it got documented somewhere when we did the Netbox import and tried to remove those odd ducks [19:22:32] instead of have one service per port, they do one per IP [19:24:34] ah ok, I think I found the ::ffff: part, and as I suspected last time around, it's envoy [19:24:42] oh? 
[19:24:51] in modules/envoyproxy/manifests/tls_terminator.pp : [19:25:04] # @param listen_ipv6 [19:25:05] # Listen on IPv6 adding ipv4_compat allow both IPv4 and IPv6 connections, [19:25:07] # with peer IPv4 addresses mapped into IPv6 space as ::FFFF: [19:25:26] bblack@haliax:~/repos/puppet$ git grep 'listen_ipv6: true' [19:25:26] hieradata/role/common/idp.yaml:profile::tlsproxy::envoy::listen_ipv6: true [19:25:29] hieradata/role/common/idp_test.yaml:profile::tlsproxy::envoy::listen_ipv6: true [19:25:32] hieradata/role/common/parsoid/testreduce.yaml:profile::tlsproxy::envoy::listen_ipv6: true [19:25:35] hieradata/role/common/restbase/dev_cluster.yaml:profile::services_proxy::envoy::listen_ipv6: true [19:26:15] the problem with this mode of listening for v4+v6 on one socket, as our envoy can apparently be configured to do [19:26:40] is it's going to produce these fake ipv6 client/source IPs that the rest of our infra doesn't expect in various network ACLs or XFF-parsing or whatever-else [19:26:43] I'm not yet caught up with backlog, bblack if you need some context on the cassandra IP setup I can help [19:26:57] volans: no I think I got that part, it's not the issue [19:27:05] as I had to make netbox work for it (as it's an exception) [19:27:18] and proposed also patches to fix the netmask on the host, but not deployed them yet [19:27:44] what's the issue? [19:27:48] volans: I think the /32 netmasks on the hosts are probably correct per the intent (whether the design intent is right is out of scope I guess) [19:28:08] because that prevents the host from using those alias IPs as source addreses for random outbound connections [19:28:56] is anything related to T253173 ? [19:28:57] T253173: Some clusters do not have DNS for IPv6 addresses (TRACKING TASK) - https://phabricator.wikimedia.org/T253173 [19:29:07] restbase are not v6 ready AFAIK [19:29:11] possibly, tangentially [19:29:30] restbase are in the class of hosts which have mapped-ipv6 defined on their $interface_primary, but do not have it delcared in DNS [19:29:33] *declared [19:29:47] but I don't think that particular quirk is causing this [19:30:00] it's intended [19:30:04] they are not v6 ready [19:30:08] the quirk we're looking at here is this: [19:30:44] restbase is connecting to some other service which intentionally only has an IPv4 service address, like our lvs'd ones that are all in 10.2.2.x or whatever [19:31:13] but that service is configured with an envoy listener, and that envoy listener only has an IPv6 listen socket, which is configured to accept traffic from both ipv4+ipv6 [19:31:30] ok [19:31:33] so when any client connects to this service, they'll use Ipv4 because there's only an IPv4 service address [19:32:04] but on the envoy side, it gets received on a universal-style ipv6 listen address, and thus the client IP gets recorded as ::ffff:a.b.c.d [19:32:25] which is not the client host's actual ipv4 nor its ipv6, nor will it match various ACLs embedded all over our infra, etc... [19:33:07] got it [19:33:29] brb, quick meeting, but I'm not even sure if this is causing any other problem at present [19:33:34] I've just run into it twice, and it smells fishy [19:36:34] :) [20:10:04] question about decommissioning a host: cescout1001 -- a physical host and not a VM because of the disk requirements for the database replica that we were syncing -- is no longer required because the database updates have been deprecated; [20:10:41] so I am thinking of moving this to a VM. 
question: since I have never done this before, the process includes opening a ticket and then running the decom cookbook. correct? [20:11:34] sukhe: yes, that is correct [20:11:39] there is a template for that kind of ticket [20:12:21] it starts with transition on https://wikitech.wikimedia.org/wiki/Server_Lifecycle#Server_transitions [20:12:36] mutante: thanks, yes I saw! on the puppet side, I noticed the dry-run output in the decom cookbook says, "DRY-RUN: Removed from Puppet master and PuppetDB [20:12:39] " [20:12:53] from there you get to https://wikitech.wikimedia.org/wiki/Server_Lifecycle#Remove_from_production [20:13:05] on the puppet repo side, what else needs to be done? should I remove the role from site.pp? [20:13:17] and there you get the actual phab link https://phabricator.wikimedia.org/project/profile/3364/ [20:13:23] no wait :) [20:13:36] https://phabricator.wikimedia.org/maniphest/task/edit/form/52/ [20:13:43] this is the link you should use [20:14:05] thanks! [20:14:42] sukhe: yes, remove it from site.pp but _after_ running the cookbook [20:14:49] got it [20:14:52] while the cookbook will also warn you that it is .. still in site.pp [20:14:59] then you say "yea, i know, right" [20:15:16] and it will be in DHCP [20:15:22] if only we could have automated gerrit edits :) [20:15:23] you can do that before [20:15:35] you don't have to worry about DNS anymore though [20:16:21] possibly an edit in netboot.cfg / partman recipe [20:16:46] possibly remove from cumin aliases [20:17:03] thanks, I will make read the wikitech link once again to make sure I have everything covered [20:17:06] cdanis: what do something similar to like how dbctl works? [20:17:29] or something that can upload a patchset that a human could then +2 and merge [20:17:34] sukhe: just "grep -r hostname *" in puppet/repo to check [20:17:35] that would be neat [20:18:13] mutante: yeah! [20:18:19] is the db script in the software repo cdanis ? [20:18:54] dbctl? yeah https://wikitech.wikimedia.org/wiki/dbctl links to its repo [20:18:56] i could look into hacking something up [20:19:47] ill get back to you on what i can figure out cdanis [20:20:34] I'm not sure that dbctl is the best example here, but sure :) [20:20:58] Well, i was going to use it to get a understanding on how it "commits" and kinda fork it from there so to speak [20:21:29] plus its written in python, a lang i understand [20:21:39] anyway imma stop rambling [20:41:06] cdanis: So im looking at how dbctl sends the paste to phab, i think if gerrit had a similar backend script to phab (the phaste) we could basically fork up a version of conftool/dbctl and make a way to convert whatever into a .patch or .diff file and use a backend script to upload to gerrit [20:41:26] I dont know if gerrit has such script [20:46:35] (i hope that made sense) [20:49:05] FWIW, I've recorded the ::ffff: issue in https://phabricator.wikimedia.org/T255568#6858439 for now [20:58:14] uhm.. I need to take that into account as well [20:59:13] and basically duplicate a lot of envoy config :_) [21:00:19] I will revert the "let envoy listen on Ipv6" for the testreduce machine. Because it did not work anyways. [21:00:20] <_joe_> vgutierrez: you know you have... templates right? 
[21:00:54] _joe_: I'm slightly aware, yes [21:01:04] <_joe_> :D [21:04:48] at least v4mapped is one of the less-crazy of these auto-ipv6 schemes to have to deal with :) [21:06:01] the 6 I know of (from having to do best-effort support for dns geoip lookups) are (copypasta from docs): [21:06:04] ::0000:NNNN:NNNN/96 # RFC 4291 - v4compat (deprecated) [21:06:06] ::ffff:NNNN:NNNN/96 # RFC 4291 - v4mapped [21:06:09] ::ffff:0000:NNNN:NNNN/96 # RFC 2765 - SIIT (obsoleted) [21:06:11] 64:ff9b::NNNN:NNNN/96 # RFC 6052 - Well-Known Prefix [21:06:14] 2001:0000:X:NNNN:NNNN/32 # RFC 4380 - Teredo (IPv4 bits are flipped) [21:06:17] 2002:NNNN:NNNN::/16 # RFC 3056 - 6to4 [21:06:28] they're all a mess! :) [22:18:28] IPv6 a mess? that can't be :P [22:18:45] * akosiaris couldn't resist [22:19:45] root_req.headers.x-client-ip [22:19:45] ::ffff:10.64.0.100 [22:19:47] wait what? [22:21:58] nodejs 127896 restbase 13u IPv6 109896146 0t0 TCP *:7233 (LISTEN) [22:21:58] nodejs 127896 restbase 14u IPv6 109896147 0t0 TCP *:7231 (LISTEN) [22:22:41] yeah, it looks like restbase is opening up just the IPv6 socket and relying on the ipv4 compat behavior, but this should be happening for years now [22:34:07] Yeah I can see entries in logstash going back to at least Dec 2020, chances are it's been around for ever (tm) [22:58:39] I guess I don't understand restbase internal loopiness [22:58:55] I still read that as restbase1019's IP as the client side of some connection [22:59:25] maybe it's an rb->rb request? [23:03:04] I do agree, it does look like nodejs is only listening on the v6-any, which implies nodejs is doing what I'm complaining about heh [23:03:53] still, you'd think we'd have noticed this a long time ago. maybe something else subtle has changed (on the rb nodes) more-recently [23:04:55] or maybe something has changed in how envoy works [23:05:43] well for the nodejs listener to be the reason, it has to be something connecting to RB's nodejs [23:05:48] and yet it's also an RB client IP [23:06:00] so it's probably rb->rb [23:06:15] unless there's rb->envoy->rb? [23:07:30] why would rb route to rb?
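To make the envoy half of the ::ffff: discussion concrete (the issue recorded in T255568): a bare-bones sketch of how the listen_ipv6 flag quoted from modules/envoyproxy/manifests/tls_terminator.pp plausibly reaches the terminator. Only the listen_ipv6 parameter, its documented ipv4_compat behaviour, and the profile::tlsproxy::envoy::listen_ipv6 hiera key appear in the log above; the class body, the resource title, and the default value are assumptions.

```puppet
# Sketch only: the real profile::tlsproxy::envoy and envoyproxy::tls_terminator
# take many more parameters than shown here.
class profile::tlsproxy::envoy (
    Boolean $listen_ipv6 = lookup('profile::tlsproxy::envoy::listen_ipv6', Boolean, 'first', false),
) {
    envoyproxy::tls_terminator { '443':  # assumed title/port
        # With listen_ipv6 => true, envoy binds a single IPv6 listener with
        # ipv4_compat, so IPv4 peers get recorded as ::ffff:a.b.c.d (the form
        # seen in the x-client-ip header above) rather than as the plain IPv4
        # address that ACLs and XFF parsing generally expect.
        listen_ipv6 => $listen_ipv6,
    }
}
```

Listening on separate IPv4 and IPv6 sockets, or normalising the v4-mapped form before it is logged or forwarded, would avoid the surprise, at the cost of the duplicated envoy config complained about at 20:59.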