[10:12:52] volans: one question regarding host renaming and decommissioning. In https://gerrit.wikimedia.org/r/c/operations/puppet/+/1102710 I want to rename kubernetes nodes to wikikube-worker.
[10:12:52] The wikikube-worker numbers were previously used by decommissioned hosts and my rename fails with "Device name must be unique per site.". The old hosts are still in netbox as "offline" (https://netbox.wikimedia.org/dcim/devices/2525/).
[10:12:52] I assume I have to use higher numbers instead of re-using the old ones? Or would it be possible to re-use the names for the wikikube-workers?
[10:13:24] checking
[10:14:31] the standard policy is to never re-use numbers when new hosts are replacing old hosts
[10:15:11] https://wikitech.wikimedia.org/wiki/SRE/Infrastructure_naming_conventions#Name_reuse
[10:16:13] okay that makes sense, I'll use the next free wikikube-worker numbers. Thank you!
[10:16:57] if there are special circumstances for this specific case to reuse numbers we can discuss them, but if the old hosts are offline it means they are still in the DC
[10:18:49] no I think the only reason is our automation script does not support this yet (skipping numbers). But that can be fixed, no need to break this policy
[10:19:16] ack, thx
[10:28:26] integers are cheap :)
[12:26:45] ugh that's my fault volans, jelto, I specifically made a patch to reuse numbers for tidiness and didn't remember that policy
[12:26:47] I'm sorry
[12:26:57] I'll patch the script
[12:31:35] no problem! I already talked with jayme about a patch. It probably makes sense to just block/ignore a list of certain numbers (the decommed ones), because for eqiad we can still use the numbers between 1083 and 1240 afaik. Alternatively we can leave a gap and go back to just adding +1 to the highest number (like in the beginning). But that's a bit ugly
[12:32:31] yeah, I
[12:33:32] I'm not sure of a "general solution" to that problem, apart from hardcoding that range
[12:41:51] or getting it from netbox
[12:42:14] that's slow :p
[12:42:22] (but yeah, that's a possibility as well)
[12:42:52] graphql enters the chat :)
[12:44:10] topranks: do you have a code example somewhere I can look at using netbox graphql?
[12:44:20] Is it through pynetbox too?
[12:45:03] I've only been playing with it recently but it's way faster, esp if you have more than one lookup with the REST API / Pynetbox
[12:45:26] there is a simple example here where I grab all IP addresses and their dns name if it's not empty
[12:45:27] https://github.com/topranks/random_wmf/blob/main/netbox_dns/gen_zonefile_includes.py
[12:46:08] there is an example here of grabbing devices:
[12:46:09] https://gerrit.wikimedia.org/r/plugins/gitiles/operations/software/homer/+/refs/heads/master/homer/netbox.py#229
[12:46:29] <3
[12:46:31] and a pretty neat interactive web UI where you can play with queries here:
[12:46:36] https://netbox-next.wikimedia.org/graphql/
[12:46:41] Awesome
[12:46:58] I don't think you need graphql for this, it's a single api call
[12:47:19] yes, quite possibly you don't, I am not really sure exactly what you'd be looking to do
[12:47:21] give me the devices with name starting with 'foo1', order by name desc, limit 1
[12:47:40] if you need to check also VMs that's 2 API calls
[12:47:54] where does this script live and run?
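(For reference, a minimal sketch of the "single api call" volans describes just above: devices whose name starts with a prefix, ordered by name descending, limit 1. The endpoint and filter names match the URL volans pastes a bit further down; the token handling and the helper function are hypothetical.)

```python
# Sketch only: ask the Netbox REST API for the highest-numbered device whose
# name starts with a given prefix. NETBOX_TOKEN is a placeholder; a real
# script would read it from its own configuration.
import requests

NETBOX_API = "https://netbox.wikimedia.org/api"
NETBOX_TOKEN = "..."  # placeholder

def highest_device(prefix: str) -> str | None:
    """Return the name of the highest-numbered device matching prefix, if any."""
    resp = requests.get(
        f"{NETBOX_API}/dcim/devices/",
        headers={"Authorization": f"Token {NETBOX_TOKEN}"},
        # ordering=-name is string ordering; it yields the numerically highest
        # host only because the eqiad numbers are all four digits wide.
        params={"name__isw": prefix, "ordering": "-name", "limit": 1},
        timeout=30,
    )
    resp.raise_for_status()
    results = resp.json()["results"]
    return results[0]["name"] if results else None

print(highest_device("wikikube-worker1"))  # e.g. "wikikube-worker1327"
```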
[12:48:15] This script lives in https://gitlab.wikimedia.org/repos/sre/serviceops-kitchensink and runs locally
[12:49:03] but yeah I basically just need to grab a list of all devices matching wikikube-worker
[12:49:08] https://w.wiki/CTHD
[12:49:24] ^^ unfortunately graphql doesn't allow querying by a custom field
[12:49:32] That's unfortunate yes
[12:49:32] of which the bgp attribute is one
[12:49:41] Because that would make this part of the script a lot faster
[12:49:52] But we're not doing 50 hosts at a time either so it's ok
[12:50:21] yeah, I'd hoped to use it when trying to speed up building the BGP config for the CRs but found it wasn't a possibility
[12:53:29] claime: something like https://netbox.wikimedia.org/api/dcim/devices/?name__isw=wikikube-worker1&ordering=-name&limit=1 for eqiad
[12:53:50] you can do it via pynetbox ofc
[12:54:41] with filter right?
[12:57:29] gotta love pynetbox :/
[12:57:45] you get it with: next(n.api.dcim.devices.filter(name__isw="wikikube-worker1", ordering="-name", limit=1))["name"]
[12:57:53] BUT it's much slower than the HTTP API
[12:57:58] for no reason at all
[12:58:06] is wikikube-worker1327 the right answer?
[12:59:01] to the question "the biggest id of eqiad", yes
[12:59:13] although nb.dcim.devices.filter('wikikube-worker1') works
[12:59:18] wikikube-worker1327.eqiad.wmnet is the highest numbered wikikube worker in eqiad yes, but there is a gap between 1083 and 1240
[12:59:38] but what I'll do is get all ids from netbox, and remove those from the set of possible ids
[12:59:42] and we'll be fine
[13:01:24] why not last one +1?
[13:02:04] or do you mean there is a gap that was never used
[13:02:14] yep
[13:02:22] there's a gap we left for renames
[13:09:30] if you have to grab all the names it's much better with graphql, as you care only about the name and nothing else and it will be much quicker
[14:23:15] Hello. I'm trying to re-provision a server that is currently in a decommissioning state in netbox for T382410
[14:23:15] T382410: Re-use an-presto100[1-5] hosts as temporary hadoop workers an-worker106[5-9] - https://phabricator.wikimedia.org/T382410
[14:24:11] I think that I need to run https://netbox.wikimedia.org/extras/scripts/9/ to assign it a primary IP, before I can run `sre.hosts.rename` against it.
[14:25:27] Do I need to update the status in netbox to active first? https://netbox.wikimedia.org/dcim/devices/2016/
[14:28:16] btullis: o/ I'd suggest to jump on #wikimedia-dcops to discuss it, so people can chime in
[14:28:26] not 100% sure what is the best procedure
[14:28:30] elukey: Ack, thanks.
[14:29:21] I can follow up in there too
[14:29:38] btullis: changing to active is what I did in such cases and it seemed to work
[14:30:10] kamila_: Ack, thanks.
[14:30:38] I also found https://wikitech.wikimedia.org/wiki/Server_Lifecycle#Decommissioned_-%3E_Active
[14:30:52] that seems to point to rename directly
[16:41:11] TIL udp2irc used to live in mediawiki-core with file paths hardcoded :D
[16:41:12] https://github.com/wikimedia/mediawiki/blob/1.4.7/irc/mxircecho.py
[16:41:18] sys.path.append('/home/kate/pylib/lib/python2.2/site-packages')
[16:41:36] from ircbot import SingleServerIRCBot
[16:42:18] <_joe_> monorepo FTW
[16:42:48] https://github.com/wikimedia/mediawiki/blob/9e639c3af8d5143dd892f80c8d60e8db34c1346b/irc/rcdumper.php
[16:42:54] Yeah :D
[16:43:57] ahhh
[16:44:04] did someone say udp2irc
[16:44:17] 👀
[16:44:24] I am still wondering why IRC came to be used as the protocol to dispatch the updates. It is not a bad choice, but I do wonder :)
[16:44:37] then we got RCStream to replace it, and now EventStream
[16:45:07] but of course irc.wikimedia.org remains! [mandatory https://xkcd.com/1782/ ]
[16:46:51] it's a strange application of https://bash.toolforge.org/quip/AVDp9SU-1oXzWjit6VkB ;)
[16:47:18] I want to link https://wikitech.wikimedia.org/wiki/Videoscaling into the https://wikitech.wikimedia.org/wiki/Template:Navigation_Wikimedia_infrastructure sidebar, but I am not sure if it should go in the "MediaWiki SRE" section, the "Multimedia" section, or maybe both?
[16:49:00] hnowlan: ^ maybe you have an opinion?
[16:53:32] If there are no objections, I will upgrade Cassandra on sessionstore in ~30 minutes. It's a minor upgrade, well tested, and already rolled out elsewhere, but just in case... :)
[16:57:03] Krinkle: https://github.com/paravoid/ircstream/blob/feature/sse/ircstream/rcfmt.py
[16:58:59] paravoid: heh, this is not used currently right? We currently have MW format the channel and message still, right?
[16:59:03] Ah I see, different branch.
[16:59:12] correct
[16:59:37] too many bugs/limitations (on the server-side) for this to be used
[17:00:56] bd808: I would say both, especially if thumbor is already in multimedia
[17:01:12] (but it mostly works for e.g. #en.wikipedia)
[17:01:32] hnowlan: ack. I will poke at it later today. :)
[17:01:46] thank you!
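(For reference, a sketch of the numbering approach claime and volans discuss earlier in the log, around 12:59-13:09: take the candidate eqiad numbers, drop every number that already exists in Netbox (decommissioned hosts are still there as "offline", so their numbers stay blocked), and hand out the lowest remaining one, which naturally falls in the 1083-1240 gap left for renames. The prefix and the pynetbox filter() call are quoted from the log; the 1001-1999 candidate range, the token handling, and the helper are assumptions. Grabbing just the names via GraphQL, as volans suggests, would be the faster variant.)

```python
# Sketch only: list the free wikikube-worker numbers for eqiad by removing
# every number already present in Netbox (whatever its status) from an
# assumed 1xxx candidate range. Checking VMs too would need a second query
# via nb.virtualization.virtual_machines.
import pynetbox

nb = pynetbox.api("https://netbox.wikimedia.org", token="...")  # placeholder

PREFIX = "wikikube-worker"
EQIAD_RANGE = range(1001, 2000)  # assumed numbering space for eqiad (1xxx)

def free_numbers() -> list[int]:
    """Candidate numbers not currently taken by any device in Netbox."""
    taken = {
        int(dev.name.removeprefix(PREFIX))
        for dev in nb.dcim.devices.filter(name__isw=f"{PREFIX}1")
    }
    return sorted(set(EQIAD_RANGE) - taken)

# Lowest free number, e.g. something inside the 1083-1240 rename gap.
print(f"{PREFIX}{free_numbers()[0]}.eqiad.wmnet")
```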