[10:12:52] volans: one question regarding host renaming and decommissioning. In https://gerrit.wikimedia.org/r/c/operations/puppet/+/1102710 I want to rename kubernetes nodes to wikikube-worker.
[10:12:52] The wikikube-worker numbers were previously used by decommissioned hosts and my rename fails with "Device name must be unique per site.". The old hosts are still in netbox as "offline" (https://netbox.wikimedia.org/dcim/devices/2525/).
[10:12:52] I assume I have to use higher numbers instead of re-using the old ones? Or would it be possible to re-use the names for the wikikube-workers?
[10:13:24] checking
[10:14:31] the standard policy is to never re-use numbers when new hosts are replacing old hosts
[10:15:11] https://wikitech.wikimedia.org/wiki/SRE/Infrastructure_naming_conventions#Name_reuse
[10:16:13] okay that makes sense, I'll use the next free wikikube-worker numbers. Thank you!
[10:16:57] if there are special circumstances for this specific case to reuse numbers we can discuss them, but if the old hosts are offline it means they are still in the DC
[10:18:49] no I think the only reason is our automation script does not support this yet (skipping numbers). But that can be fixed, no need to break this policy
[10:19:16] ack, thx
[10:28:26] integers are cheap :)
[12:26:45] ugh that's my fault volans, jelto, I specifically made a patch to reuse numbers for tidiness and didn't remember that policy
[12:26:47] I'm sorry
[12:26:57] I'll patch the script
[12:31:35] no problem! I already talked with jayme about a patch. It probably makes sense to just block/ignore a list of certain numbers (the decommed ones), because for eqiad we can still use the numbers between 1083 and 1240 afaik. Alternatively we can leave a gap and go back to just adding +1 to the highest number (like in the beginning). But that's a bit ugly
[12:32:31] yeah, I
[12:33:32] I'm not sure of a "general solution" to that problem, apart from hardcoding that range
[12:41:51] or getting it from netbox
[12:42:14] that's slow :p
[12:42:22] (but yeah, that's a possibility as well)
[12:42:52] graphql enters the chat :)
[12:44:10] topranks: do you have a code example somewhere I can look at using netbox graphql?
[12:44:20] Is it through pynetbox too?
[12:45:03] I've only been playing with it recently but it's way faster, esp if you have more than one lookup with the REST API / Pynetbox
[12:45:26] there is a simple example here where I grab all IP addresses and their dns name if it's not empty
[12:45:27] https://github.com/topranks/random_wmf/blob/main/netbox_dns/gen_zonefile_includes.py
[12:46:08] there is an example here of grabbing devices:
[12:46:09] https://gerrit.wikimedia.org/r/plugins/gitiles/operations/software/homer/+/refs/heads/master/homer/netbox.py#229
[12:46:29] <3
[12:46:31] and a pretty neat interactive web UI where you can play with queries here:
[12:46:36] https://netbox-next.wikimedia.org/graphql/
[12:46:41] Awesome
[12:46:58] I don't think you need graphql for this, it's a single api call
[12:47:19] yes, quite possibly you don't, I am not really sure exactly what you'd be looking to do
[12:47:21] give me the devices with name starting with 'foo1', order by name desc, limit 1
[12:47:40] if you need to check also VMs that's 2 API calls
[12:47:54] where does this script live and run?
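(For reference, a minimal sketch of the "single api call" volans describes just above: devices whose name starts with a prefix, ordered by name descending, limit 1. The endpoint and filter names match the URL volans pastes a bit further down; the token handling and the helper function are hypothetical.)

```python
# Sketch only: ask the Netbox REST API for the highest-numbered device whose
# name starts with a given prefix. NETBOX_TOKEN is a placeholder; a real
# script would read it from its own configuration.
import requests

NETBOX_API = "https://netbox.wikimedia.org/api"
NETBOX_TOKEN = "..."  # placeholder

def highest_device(prefix: str) -> str | None:
    """Return the name of the highest-numbered device matching prefix, if any."""
    resp = requests.get(
        f"{NETBOX_API}/dcim/devices/",
        headers={"Authorization": f"Token {NETBOX_TOKEN}"},
        # ordering=-name is string ordering; it yields the numerically highest
        # host only because the eqiad numbers are all four digits wide.
        params={"name__isw": prefix, "ordering": "-name", "limit": 1},
        timeout=30,
    )
    resp.raise_for_status()
    results = resp.json()["results"]
    return results[0]["name"] if results else None

print(highest_device("wikikube-worker1"))  # e.g. "wikikube-worker1327"
```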
[12:48:15] This script lives in https://gitlab.wikimedia.org/repos/sre/serviceops-kitchensink and runs locally
[12:49:03] but yeah I basically just need to grab a list of all devices matching wikikube-worker
[12:49:08] https://w.wiki/CTHD
[12:49:24] ^^ unfortunately graphql doesn't allow querying by a custom field
[12:49:32] That's unfortunate yes
[12:49:32] of which the bgp attribute is one
[12:49:41] Because that would make this part of the script a lot faster
[12:49:52] But we're not doing 50 hosts at a time either so it's ok
[12:50:21] yeah, I'd hoped to use it when trying to speed up building the BGP config for the CRs but found it wasn't a possibility
[12:53:29] claime: something like https://netbox.wikimedia.org/api/dcim/devices/?name__isw=wikikube-worker1&ordering=-name&limit=1 for eqiad
[12:53:50] you can do it via pynetbox ofc
[12:54:41] with filter right?
[12:57:29] gotta love pynetbox :/
[12:57:45] you get it with: next(n.api.dcim.devices.filter(name__isw="wikikube-worker1", ordering="-name", limit=1))["name"]
[12:57:53] BUT it's much slower than the HTTP API
[12:57:58] for no reason at all
[12:58:06] is wikikube-worker1327 the right answer?
[12:59:01] to the question "the biggest id of eqiad", yes
[12:59:13] although nb.dcim.devices.filter('wikikube-worker1') works
[12:59:18] wikikube-worker1327.eqiad.wmnet is the highest numbered wikikube worker in eqiad yes, but there is a gap between 1083 and 1240
[12:59:38] but what I'll do is get all ids from netbox, and remove those from the set of possible ids
[12:59:42] and we'll be fine
[13:01:24] why not last one +1?
[13:02:04] or do you mean there is a gap that was never used
[13:02:14] yep
[13:02:22] there's a gap we left for renames
[13:09:30] if you have to grab all the names it's much better with graphql, as you care only about the name and nothing else and it will be much quicker
[14:23:15] Hello. I'm trying to re-provision a server that is currently in a decommissioning state in netbox for T382410
[14:23:15] T382410: Re-use an-presto100[1-5] hosts as temporary hadoop workers an-worker106[5-9] - https://phabricator.wikimedia.org/T382410
[14:24:11] I think that I need to run https://netbox.wikimedia.org/extras/scripts/9/ to assign it a primary IP, before I can run `sre.hosts.rename` against it.
[14:25:27] Do I need to update the status in netbox to active first? https://netbox.wikimedia.org/dcim/devices/2016/
[14:28:16] btullis: o/ I'd suggest to jump on #wikimedia-dcops to discuss it, so people can chime in
[14:28:26] not 100% sure what is the best procedure
[14:28:30] elukey: Ack, thanks.
[14:29:21] I can follow up in there too
[14:29:38] btullis: changing to active is what I did in such cases and it seemed to work
[14:30:10] kamila_: Ack, thanks.
[14:30:38] I also found https://wikitech.wikimedia.org/wiki/Server_Lifecycle#Decommissioned_-%3E_Active
[14:30:52] that seems to point to rename directly
[16:41:11] TIL udp2irc used to live in mediawiki-core with file paths hardcoded :D
[16:41:12] https://github.com/wikimedia/mediawiki/blob/1.4.7/irc/mxircecho.py
[16:41:18] sys.path.append('/home/kate/pylib/lib/python2.2/site-packages')
[16:41:36] from ircbot import SingleServerIRCBot
[16:42:18] <_joe_> monorepo FTW
[16:42:48] https://github.com/wikimedia/mediawiki/blob/9e639c3af8d5143dd892f80c8d60e8db34c1346b/irc/rcdumper.php
[16:42:54] Yeah :D
[16:43:57] ahhh
[16:44:04] did someone say udp2irc
[16:44:17] 👀
[16:44:24] I am still wondering why IRC came to be used as the protocol to dispatch the updates. It is not a bad choice, but I do wonder :)
[16:44:37] then we got RCStream to replace it, and now EventStream
[16:45:07] but of course irc.wikimedia.org remains! [mandatory https://xkcd.com/1782/ ]
[16:46:51] it's a strange application of https://bash.toolforge.org/quip/AVDp9SU-1oXzWjit6VkB ;)
[16:47:18] I want to link https://wikitech.wikimedia.org/wiki/Videoscaling into the https://wikitech.wikimedia.org/wiki/Template:Navigation_Wikimedia_infrastructure sidebar, but I am not sure if it should go in the "MediaWiki SRE" section, the "Multimedia" section, or maybe both?
[16:49:00] hnowlan: ^ maybe you have an opinion?
[16:53:32] If there are no objections, I will upgrade Cassandra on sessionstore in ~30 minutes. It's a minor upgrade, well tested, and already rolled out elsewhere, but just in case... :)
[16:57:03] Krinkle: https://github.com/paravoid/ircstream/blob/feature/sse/ircstream/rcfmt.py
[16:58:59] paravoid: heh, this is not used currently right? We currently have MW format the channel and message still, right?
[16:59:03] Ah I see, different branch.
[16:59:12] correct
[16:59:37] too many bugs/limitations (on the server-side) for this to be used
[17:00:56] bd808: I would say both, especially if thumbor is already in multimedia
[17:01:12] (but it mostly works for e.g. #en.wikipedia)
[17:01:32] hnowlan: ack. I will poke at it later today. :)
[17:01:46] thank you!
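(For reference, a sketch of the numbering approach claime and volans discuss earlier in the log, around 12:59-13:09: take the candidate eqiad numbers, drop every number that already exists in Netbox (decommissioned hosts are still there as "offline", so their numbers stay blocked), and hand out the lowest remaining one, which naturally falls in the 1083-1240 gap left for renames. The prefix and the pynetbox filter() call are quoted from the log; the 1001-1999 candidate range, the token handling, and the helper are assumptions. Grabbing just the names via GraphQL, as volans suggests, would be the faster variant.)

```python
# Sketch only: list the free wikikube-worker numbers for eqiad by removing
# every number already present in Netbox (whatever its status) from an
# assumed 1xxx candidate range. Checking VMs too would need a second query
# via nb.virtualization.virtual_machines.
import pynetbox

nb = pynetbox.api("https://netbox.wikimedia.org", token="...")  # placeholder

PREFIX = "wikikube-worker"
EQIAD_RANGE = range(1001, 2000)  # assumed numbering space for eqiad (1xxx)

def free_numbers() -> list[int]:
    """Candidate numbers not currently taken by any device in Netbox."""
    taken = {
        int(dev.name.removeprefix(PREFIX))
        for dev in nb.dcim.devices.filter(name__isw=f"{PREFIX}1")
    }
    return sorted(set(EQIAD_RANGE) - taken)

# Lowest free number, e.g. something inside the 1083-1240 rename gap.
print(f"{PREFIX}{free_numbers()[0]}.eqiad.wmnet")
```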