[04:03:48] In 1h I will be switching enwiki db master
[04:33:57] In 30 minutes I will be switching enwiki db master
[07:03:43] hello folks, some kafka-main200x hosts are missing the AAAA records in the DNS, going to add them
[07:03:59] one at a time, all the interfaces, firewall etc.. are already in place
[07:05:41] ack, thx!
[07:06:49] nice
[07:26:22] anybody available for a quick code review? https://gerrit.wikimedia.org/r/c/operations/puppet/+/683232/1
[07:26:31] (IPs to check :)
[07:27:55] I can
[07:28:07] <_joe_> elukey: shouldn't our newly appointed zookeeper/kafka expert handle those?
[07:28:21] _joe_ this is a great point
[07:28:23] <_joe_> i mean jayme
[07:28:32] yes yes :D
[07:29:07] I feel bullied :D
[07:29:19] <_joe_> rightfully so
[07:29:35] <_joe_> let me tell you the story of how i was named the zuul expert
[07:29:58] oh, you are the zuul expert! I did not know about that ;)
[07:30:04] <_joe_> I WAS
[07:30:08] <_joe_> not anymore
[07:30:48] you have to prove that by pointing the finger at someone else I suppose
[07:31:04] <_joe_> exactly
[07:31:33] <_joe_> so now you need to wait for a new, innocent prey, someone willing to learn new stuff
[07:31:41] <_joe_> and lure them in with "we'll do it together"
[07:31:48] <_joe_> on that note, how's etcd? :P
[07:32:17] already said this has been a pretty bad week for me - ownership wise :D
[07:32:28] elukey: there's a typo, checking the others
[07:32:40] jayme: I fell into Joe's trap in the past, I know the feeling
[07:32:54] _joe_: why does that feel suspiciously similar to how you helped me with deployment-prep etcd
[07:33:44] volans: <3
[07:33:59] <_joe_> Majavah: you're mistaken, s/ etcd// :P
[07:35:08] elukey: while we
[07:35:28] *'re at it. I would purge zookeeper from the old nodes and merge https://gerrit.wikimedia.org/r/c/operations/puppet/+/682669
[07:35:45] jayme: +2!
[07:37:31] volans: fixed the cr!
[07:40:33] elukey: {done} thanks a lot for taking care of it!
[07:41:37] thanks for the review!
[07:52:18] jayme: ok to merge https://gerrit.wikimedia.org/r/c/operations/puppet/+/683232 ?
[07:54:44] <_joe_> jayme: shouldn't we just move the etcd clients too and then decom the nodes?
[07:55:05] elukey: +1ed ... to the full extent of my knowledge :P
[07:56:26] _joe_: Now that the zookeeper stuff is done, yes. I would like to complete writing the two scorecards from yesterday first, though
[07:56:50] <_joe_> do you want me to do the client-side migrations?
[07:57:36] actually I would like "we'll do it together" :P
[07:58:45] scorecards won't take ages. I'll get back to you in a bit
[07:59:14] akosiaris: I'm waiting for a puppet merge, let me know when you are done please :)
[07:59:45] dcaro: I was asking in -operations. Do I merge yours as well?
[08:00:50] I assume yes (the thirdparty/ceph-octopus stuff).
[08:02:01] yep
[08:02:14] done
[08:02:22] thanks!
[08:56:29] contint2001, the server running Jenkins will be rebooted in two minutes, CI will be interrupted until it's back up
[09:29:45] herron: https://lists.wikimedia.org/postorius/lists/ \o/ FINALLY
[11:22:32] volans: you may be interested in this backtrace from the reimage script
[11:22:35] https://www.irccloud.com/pastebin/PPQVbBhc/
[11:24:27] arturo: ack, that's from the netbox script though, not the reimage script
[11:24:51] looking
[11:25:03] the server ended in good shape anyway
[11:25:15] I can open a phab task if you prefer
[11:26:51] nah, no need, I'm looking at it right now
[11:28:12] at least for now
[11:30:25] arturo: is 185.15.56.237/30
[11:30:29] a floating IP?
[11:30:35] or assigned to a specific host?
[11:30:56] let me double check. I think it is a VIP
[11:31:15] virtual_ipaddress {
[11:31:15] 185.15.56.237/30 dev ens2f1np1.1107
[11:31:15] }
[11:31:15] it's marked as vip in netbox, but I wanted to check in reality
[11:31:27] volans: netbox is right
[11:31:48] ok so the script is trying to assign it to the iface although it shouldn't
[11:36:07] arturo: so, the problem is that the address was created in netbox (and the host FWIW) as a /30, while we consider VIPs only /32 or /128 when getting data from puppetdb. XioNoX thoughts?
[11:36:36] now, the code clearly has a bug, but that's unrelated to the fact it didn't detect this as a VIP
[11:39:12] ac
[11:39:14] ack*
[11:40:11] (so it has 2 bugs :) )
[11:42:51] arturo: for what concerns the reimage it goes ahead anyway so this is not a problem. I'll re-run the script on netbox once those things are fixed. As for the /30 vs /32 I'll wait for Arz.hel's reply to decide what's the best course of action.
[11:43:05] but first, lunch
[11:43:23] 👍 thanks!
[11:46:16] arturo: shouldn't the VIP be a /32?
[11:46:30] unless it's a peculiarity of how linux works
[11:46:40] I don't think the netmask alone can tell you much, because there are probably cases where a "VIP" in the metadata sense needs config with a netmask like that at the host level
[11:46:48] I could check how it affects routing
[11:46:53] and we can't just call all /30's VIPs either
[11:48:00] maybe we could break it up in config though? Call it a /32 in netbox and specify the /30 netmask separately in hieradata somewhere?
[11:48:55] bblack: yeah that could work. We have the /30 in hiera anyway. I don't mind if netbox has the /32 if that makes live easier
[11:48:59] life*
[11:49:03] (or create some kind of artificial subnet object in netbox for such cases, too)
[11:49:19] but puppetdb would report what's on the host anyway
[11:49:23] another such "artificial" subnet is the LVS ranges
[11:49:30] but we don't configure them the same way
[11:49:58] now, if the VIP has already been created on netbox I guess we could add a check on its role variable, but if it doesn't exist the script will happily create it "wrongly"
[11:51:32] wait, why is the reimage script (or netbox) assigning the VIP in the first place? I would expect the VIP role to tell netbox that the VIP is managed using some software and we don't need any kind of management by it
[11:52:15] what's the difference between VIP and VRRP roles in netbox? perhaps we're just using the wrong role
[11:52:50] there is no functional difference
[11:53:31] Netbox is pulling the IP info so it has the whole picture, and we don't double assign IPs for example
[11:55:05] ok
[11:56:00] my point was: is there a way to tell netbox that a given IP address is not to be configured on a server? Like, the IP address is managed by an additional software component (like keepalived)
[11:57:33] arturo: netbox doesn't configure the IP
[11:57:56] it documents it
[11:58:55] well I was referring to this line in the log above:
[11:59:01] 11:13:09 | cloudgw1001.eqiad.wmnet | [info] Assigning 185.15.56.237/30 to cloudgw1001:ens2f1np1.1107
[11:59:29] that is clearly wrong. It should be keepalived managing that VIP, not the reimage script or anything else
[12:00:15] it's in the context of netbox
[12:00:16] that's on netbox
[12:00:33] it assigns the netbox object IP to the netbox object interface
[12:00:38] ok! I see
[12:00:44] nevermind then :-)
[12:01:34] it does have a limitation where netbox will get out of sync with the real world if the IP is moved to the backup VRRP
[12:02:01] what I did on the network devices is create the VRRP/VIP IP on both members
[12:02:27] to indicate that they are both potentially the user of that IP
[12:15:12] makes sense
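The "document the VIP as a /32 in Netbox, keep the /30 prefix length in hieradata" idea bblack floats above could look roughly like the sketch below. This is only a sketch: the profile name and hiera keys are hypothetical, not the actual cloudgw puppetization, and keepalived still owns the address at runtime while Netbox merely documents it.

    # A minimal sketch, assuming hypothetical hiera keys and class name.
    class profile::cloudgw::vrrp (
      # The bare address, recorded as a /32 VIP in Netbox.
      Stdlib::IP::Address::V4::Nosubnet $vrrp_vip       = lookup('profile::cloudgw::vrrp_vip'),
      # The on-host prefix length lives only in hiera.
      Integer[0,32]                     $vrrp_prefixlen = lookup('profile::cloudgw::vrrp_prefixlen'),
      String                            $vrrp_interface = lookup('profile::cloudgw::vrrp_interface'),
    ) {
      # keepalived claims the address on the active member; this only renders
      # the virtual_ipaddress line pasted earlier in the conversation.
      $virtual_ipaddress = "${vrrp_vip}/${vrrp_prefixlen} dev ${vrrp_interface}"
    }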
[12:19:36] pcc
[12:19:42] * jbond42 wrong window
[12:23:32] it's this time of the week - looking for an SRE session this coming Monday
[12:24:02] arturo: ^ I see you on the list, is that something you would want to do this Monday?
[12:24:36] paravoid: no time, could it be sometime next month?
[12:24:49] I would like to prepare some slides etc
[12:25:26] sure thing
[12:25:44] thanks! I'll let you know as soon as I have something prepared
[12:26:28] no worries :)
[12:27:11] anyone else? jynus, akosiaris/jayme, I see you on the list as well
[12:27:54] (we also have shdubsh for ECS, but we had godog from o11y in our last meeting, so if possible it'd be better to not have folks from the same team twice in a row I think)
[12:29:02] +1, also Cole is out this week
[12:33:17] paravoid: don't want to volunteer a.kosiaris but I think we can put something together for monday
[12:34:08] nothing fancy, though. Just a bit of YAML and some Q&A maybe
[12:35:37] I can go in 3 weeks, I don't intend to do a presentation, just a demo + Q&A
[12:36:17] paravoid: jayme: no can do, it's Easter Monday and a holiday over here.
[12:36:31] paravoid, fill me in next time you don't have anyone queued -- mine is not time sensitive, so I can be slotted in as a wildcard
[12:37:04] jynus: "we don't have anyone queued"
[12:37:05] :P
[12:37:27] yeah, but jayme volunteered
[12:37:43] ask him, otherwise I can go
[13:04:34] paravoid: I can take the slot without akosiaris around
[13:31:27] Amir1: nice!
[13:35:33] ^^
[13:38:40] so when you create a new shell account by adding to modules/admin/data/data.yaml there's a uid value to come up with
[13:38:53] I vaguely remember something to take into account
[13:39:07] I don't remember what that something was
[13:39:40] ema: you need to get the uidNumber value from ldap on mwmaint1002
[13:39:46] I think you should look up the WMCS uid from LDAP
[13:39:55] ah my heroes
[13:39:58] thank you
[13:59:57] jayme: sold, thank you!
[14:00:01] We've got reports that some instant commons files aren't loading
[14:00:04] Not all though
[14:02:32] https://meta.miraheze.org/wiki/File:Test.svg and https://meta.miraheze.org/wiki/File:01-01-2014_-_Messeturm_-_trade_fair_tower_-_Frankfurt-_Germany_-_01.jpg work
[14:02:41] https://meta.miraheze.org/wiki/File:Replacement_filing_cabinet.svg doesn't
[14:04:38] File:Replacement filing cabinet.svg works on my local wiki, you sure that it's not some caching combined with the incident yesterday on your side?
[14:07:07] Majavah: it looks like it, now to work out how to purge cache
[14:07:13] ?action=purge did not work
[14:49:31] it's fixed
[15:11:06] jynus: I've an unexpected homer diff while running decom regarding backup2006, is that okay to merge?
[15:13:51] jayme: what's the unexpected part?
[15:14:22] XioNoX: it adds an interface, I guess
[15:14:37] jayme: can you share the diff?
[15:14:46] https://phabricator.wikimedia.org/P15621 sure
[15:16:37] XioNoX: related task is probably https://phabricator.wikimedia.org/T277323
[15:16:48] jayme: you should ping papaul but it looks safe
[15:16:55] https://netbox.wikimedia.org/dcim/devices/3415/changelog/
[15:18:19] XioNoX: okay, thanks
[15:52:28] Just to close the loop, for the earlier netbox script issue on reimage I've sent a patch that should fix it, added related people to it.
[17:09:59] <_joe_> is anyone else having trouble using pcc?
[17:12:47] I'm seeing zuul quite full and CI slow
[17:12:50] so might be related
[17:16:36] <_joe_> yep, worked now
[18:49:34] Do we consider "puppetdb_query" inside puppet code an antipattern? We have all these lists of hosts in Hiera, but... while thinking about just another case where I would need to have "one random host out of the list of hosts using this class" or "all hosts using this class", I was thinking maybe I should just query the puppetdb right from the manifest to get such lists. like $debian_nodes_query =
[18:49:40] ["from", "nodes", ["=",.. something ? And then avoid any future Hiera changes when host names change?
[18:50:31] $debian_nodes = puppetdb_query($debian_nodes_query).each |$value| { ...
[18:51:30] oh, that is from the puppet6 API docs though.. ok
[21:43:17] mutante: just a thing to be aware of with puppetdb hard dependencies, most of Cloud VPS does not have puppetdb because it is not multi-tenant safe. Deployment-prep does have a puppetdb though, so it's not 100% a deal breaker depending on the role/profile/module it affects
[22:02:49] bd808: *nod* ok, ack
[22:50:22] herron: shdubsh: fyi, logstash-beta has been down for a couple of weeks. I was hoping some of the people I told this had since reached out, but given they haven't I'm pinging y'all the usual way as before this quarter. https://phabricator.wikimedia.org/T233134
[22:51:02] Any help to get it back up would be much appreciated. Various things are blocked right now as a result for myself and others, or degraded in favour of finding out more problems in prod instead of beta.
[22:53:20] hey Krinkle ok I'll try to have a look at it tomorrow
[22:54:53] thx <3
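For reference, the puppetdb_query() idea mutante sketches at 18:49 could look something like the following in a manifest. The class name is hypothetical and this is just a sketch of the stock PuppetDB function (PQL form rather than the AST array quoted above), not a statement about how the WMF repo does or should do it; per bd808's caveat, it would hard-fail on Cloud VPS hosts without a PuppetDB.

    # A minimal sketch, assuming a hypothetical Profile::Kafka::Broker class.
    $kafka_query = 'resources[certname] { type = "Class" and title = "Profile::Kafka::Broker" }'
    $kafka_hosts = sort(unique(puppetdb_query($kafka_query).map |$r| { $r['certname'] }))

    # "One random host out of the list": fqdn_rand() keeps the pick stable for
    # a given node across runs instead of reshuffling on every compile.
    $one_host = $kafka_hosts[fqdn_rand($kafka_hosts.length, 'kafka-pick')]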