[13:57:00] jbond42: thanks for all the reviews :) [13:58:10] no probs. [14:11:06] i deployed a few db-related puppet changes recently [14:11:26] is there work going on? some rack maybe? [14:11:27] but that wouldn't explain the non-db hosts [14:11:33] dumpsdata1003 for example [14:12:01] `● nic-saturation-exporter.service loaded failed failed Prometheus network interface saturation exporter` [14:12:06] yes, it is known [14:12:09] j.bond is fixing [14:12:11] right , there are some codfw in there too [14:12:13] ok good [14:12:26] phew [14:14:09] is context for this page storm in some other channel somewhere? [14:14:21] both above and in #-observability [14:14:45] -operations, yeah [14:14:58] apergos: that _too_, yes :) [14:15:11] 🙃 [14:15:32] pages or irc noise? [14:15:42] the latter [14:16:06] puppet running now which should clear the alerts [14:16:15] cool, thank you [14:16:21] awesome [14:16:33] sorry for the noise :) [14:16:39] no worries! [15:04:15] paravoid, volans, elukey, better here I guess so I don't proxy :) [15:04:26] yeah 4-layer proxy doesn't seem optimal [15:05:05] lol [15:05:21] elukey: this is paravoid, paravoid this is elukey... feel free to interact directly, XioNoX and I will just watch :D [15:05:29] what's up? [15:05:54] so apparently when elukey upgraded the kernel on some stretch hosts the iface names changed [15:06:11] (I am in meeting sorry) [15:06:14] oh lovely [15:06:30] yes, expected :) [15:06:41] I think 4.9.x to 4.19.x [15:06:44] yup [15:07:04] so much for "predictable" :D [15:07:16] did you read https://github.com/systemd/systemd/issues/12261 ? [15:07:20] yeah [15:07:26] https://github.com/torvalds/linux/commit/c124a62ff2dde9eaa9e8083de8206a142535c04e would be one of the causes [15:09:13] so what's the question? [15:10:22] I don't have any, I thought you had some :) [15:10:30] XioNoX proxy didn't work very well :-P [15:11:21] hahaha... 
[15:11:26] but if we plan to upgrade hosts with those kernels without a reimage we need to plan for syncing netbox too [15:12:04] and how to keep a mapping of old -> new names [15:13:09] also the hosts would not boot I think given you have to update /e/n/interfaces [15:13:10] do we need to map them, or just reimage in these cases? [15:13:16] IMO systemd cannot be the single source of truth [15:13:20] no need to "map" [15:13:43] linux+systemd really, because it depends on both [15:13:53] it's an unfortunate compatibility barrier, but surely we can just reimage as we hit it. We have to at least reboot for a kernel change anyways, and a reimage is like a very slow reboot :) [15:14:23] "predictable" for systemd developers AIUI means "can be independently named in a race-free way", not predictable across releases etc. [15:14:54] so I think relying on other sources of truth like iDRAC/iLO would be the more prudent way here [15:15:11] and then ideally, we could generate a udev rule out of that, and have our own linux naming scheme to avoid any sort of mappings [15:16:23] [I still kinda fail to see the light at the end of the tunnel on this issue for the general case and automatic naming. Overall, we might've been better off just living with eth0-style naming as the default, and we still could've custom-renamed them in udev for our particular environment] [15:17:34] https://github.com/dell/biosdevname was Dell's approach to the problem [15:17:43] relying on SMBIOS to extract the vendor's naming scheme [15:17:50] I think they even had HP support in there as well [15:18:12] Ubuntu was using that for a few years, I don't think they do anymore [15:20:16] another thought is that most things could key off of interface_primary and/or lldp facts to avoid hardcoding interface naming at the puppet level [15:20:29] isn't that the case right now?
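[The "old -> new names" mapping discussed above could in principle be derived by matching MAC addresses across the reboot, since the MAC survives a kernel/systemd rename. A minimal sketch, not WMF tooling; interface names and MACs here are made up, and the dict shape only loosely mimics facter's networking data:]

```python
# Sketch: match interfaces across a kernel upgrade by MAC address.
# Inputs are {iface_name: mac} snapshots taken before and after the upgrade.

def rename_map(old_ifaces, new_ifaces):
    """Return {old_name: new_name} for interfaces whose MAC survived a rename."""
    by_mac = {mac: name for name, mac in new_ifaces.items()}
    return {
        old_name: by_mac[mac]
        for old_name, mac in old_ifaces.items()
        if mac in by_mac and by_mac[mac] != old_name
    }

# Hypothetical before (4.9 kernel) and after (4.19 kernel) views of one host:
old = {"eth0": "aa:bb:cc:dd:ee:00", "eth1": "aa:bb:cc:dd:ee:01"}
new = {"ens2f0np0": "aa:bb:cc:dd:ee:00", "ens2f1np1": "aa:bb:cc:dd:ee:01"}

print(rename_map(old, new))
```

[Such a map could then drive both the netbox sync and an /e/n/interfaces rewrite, though as noted below the reimage path sidesteps the problem entirely.]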
[15:20:36] the complication that's left in that worldview is our legacy "write /e/n/i at install-time" bit [15:20:40] I did a sprint of fixes to that end some years ago [15:20:47] not sure if we regressed :) [15:21:14] /e/n/i doesn't get fixed by puppet, and even if it did, there's a chicken and egg problem maybe [15:21:44] there's also the possibility that the installer kernel has different ideas about the device names than the install_ed_ kernel [15:22:03] lvs hieradata still has hardcoded interface names [15:22:14] maybe they could be derived from lldp facts, but nobody's hooked that up yet [15:22:57] $ git log --author=faidon --grep=eth0 --since="Feb 10 2017" [15:23:02] are the fixes of the time, fwiw [15:23:49] make that Jan 1 2017 to cover the net.ifnames=0 change of the time [15:23:53] https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/refs/heads/production/hieradata/common/lvs/interfaces.yaml [15:24:07] ^ this clearly could be reworked, it already has the vlan ids too [15:25:02] just needs some manifest magic to pull the iface names from the lldp facts based on the id [15:25:13] we have all that info in netbox fwiw :) [15:26:00] is netbox a data source for puppet config too? [15:26:19] not _yet_, aiui. ;) [15:26:26] cf. https://gerrit.wikimedia.org/r/c/operations/puppet/+/563186 [15:26:39] and T229397 [15:26:40] T229397: Puppet: get row/rack info from Netbox - https://phabricator.wikimedia.org/T229397 [15:27:07] ^^^ looking for reviews (cough volans ;)) [15:27:19] jbond42: indeed! [15:27:20] fascinating! [15:27:33] it's on my short TODO [15:27:37] but the current interface naming in netbox comes from... ? [15:27:47] puppetdb :D [15:27:50] that said there was worry about what we use from netbox in puppet as netbox is public and has no stop gaps currently [15:27:52] which comes from ... ? 
[15:27:54] yeah [15:27:55] OS [15:27:58] facter [15:28:09] we want to prevent someone who finds an exploit in netbox escalating to puppet [15:28:13] facter networking and net_driver [15:28:41] it seems like there's a full circle in here somewhere [15:28:47] somewhere :) [15:29:04] that's why I said we need to think in a "source of truth" way [15:29:08] what's our source of truth :) [15:29:09] bblack: im not sure i get what you mean by "pull the iface names from the lldp facts based on the id" [15:29:17] then we can find ways to propagate the information [15:29:25] but, meeting now, ttyl :) [15:29:51] jbond42: right now, "facter -p" output includes stuff like: [15:29:53] lldp => { [15:29:53] ens2f0np0 => { [15:29:53] neighbor => "asw2-b-eqiad", [15:29:53] port => "xe-2/0/36", [15:29:56] descr => "authdns1001", [15:29:58] mtu => 9192, [15:30:01] vlans => { [15:30:03] untagged_vlan => 1002, [15:30:06] mode => "access" [15:30:28] so you could in theory refactor the hieradata I linked earlier and just remove the interface names completely, and have the consuming manifests generate them from the facts by linking up to the vlan ids [15:31:19] also fwiw/iirc netbox added a "label" field to interfaces in 2.9 [15:31:40] interesting [15:31:57] ganeti hosts have sub-interfaces named directly private/public [15:32:11] (although it might be simpler if we had an alternate representation that keyed on vlan instead of interface name. that would be easy enough to do at the facter level. just turn that data structure around as available_vlans => public1-a-eqiad => { id => 1019, iface => ens2f0np0 } [15:32:16] or something like that [15:32:26] bblack: if i follow that would work for the vlan interfaces e.g.
ens2f0np0.1002 but doesn't help with the physical (ens2f0np0) one right [15:32:42] it should work for the physical as well [15:32:57] the example I pasted above is a plain physical port with lldp vlan info [15:33:14] also how robust would it be, like would LLDP breaking cause an LVS outage? [15:33:32] yeah it would [15:33:34] im guessing that lldp is just reading the interface name as configured on the system [15:33:54] jbond42: right, the point of my suggestion was just that it was a way to avoid hardcoding the iface name in LVS puppetization [15:34:09] (which is one of the few cases where iface names are still hardcoded in puppet) [15:34:41] but yes, the problem is lldp is unreliable (and thus makes me worry about lldp breakage killing anything in puppet that uses it, really) [15:34:46] oh sorry, issue with coming in midway through the conversation, thought it was a more general discussion of who is the source of truth for iface names [15:34:59] it is, this is just a side branch :) [15:35:04] ack :) [15:35:38] I think it does make sense to think about netbox as the source of truth on all such things [15:36:05] and have the puppetmasters pulling from the netbox api to derive stable facts that are present for host agent runs [15:36:17] yes i agree i think linux has made things worse in an effort to improve things and it would be better to just name the interface in netbox and configure udev rules [15:36:30] bblack: note that I'm working on importing and managing cables and switch ports in Netbox, so the same data as LLDP should be in there [15:36:36] but... that worldview gets tricky when the os is controlling interface naming [15:36:38] (including vlans) [15:37:26] if we naively treat netbox as the only truth on all such matters, then we have to make a manual netbox change when a kernel update changes interface names, too.
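[The "turn that data structure around" idea could be sketched as below. The lldp fact shape follows the facter paste earlier in the conversation; the vlan id -> name table is a hypothetical stand-in for whatever hiera/netbox would actually provide:]

```python
# Sketch: invert a facter-style lldp fact into a vlan-keyed structure, so
# manifests can ask "which iface carries vlan X" without hardcoded names.

VLAN_NAMES = {1002: "public1-b-eqiad"}  # hypothetical id -> name table

def available_vlans(lldp):
    """Build {vlan_name: {id, iface}} from per-interface lldp data."""
    out = {}
    for iface, info in lldp.items():
        vid = info.get("vlans", {}).get("untagged_vlan")
        if vid is not None:
            out[VLAN_NAMES.get(vid, str(vid))] = {"id": vid, "iface": iface}
    return out

# Data shaped like the "facter -p" paste above:
lldp_fact = {
    "ens2f0np0": {
        "neighbor": "asw2-b-eqiad",
        "port": "xe-2/0/36",
        "vlans": {"untagged_vlan": 1002, "mode": "access"},
    }
}

print(available_vlans(lldp_fact))
```

[With a fact like this, the lvs hieradata could keep only vlan ids and drop the interface-name column entirely, subject to the LLDP-reliability concern raised above.]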
[15:37:43] or the interface naming from the os has to be predictable in the ways that really matter [15:38:37] what if we skipped over all the physical port stuff (effectively giving up on the ideal of ever having interface names that clearly map to physical reality on the back of the box), and instead constructed our interface names out of vlan info? [15:38:42] bblack: if we just have a mac address to name mapping in udev then kernel updates or systemd updates shouldn't matter [15:38:56] bblack: yes thats more what im thinking [15:39:17] public.1001/private.2001/frack.3001/dmz.4001 [15:39:49] it gets tricky if we allow the edge case of multiple interfaces from a single host to a single vlan, e.g. for bonding [15:39:56] I think we've had one or two such cases in the past [15:40:14] yes this is when we remember naming is hard lol [15:40:24] but if you assume we can be rid of those cases or handle them as some exceptional case [15:40:48] you could have host interface names like "eth-public1-a-eqiad" [15:42:02] and then we have lvs with its vlan tagging [15:42:37] mac-address.vlanid [15:43:59] yeah, it makes little distinction between the base and the vlan-tagged interfaces, but I guess that's just a human comfort that doesn't matter [15:44:00] one thing that would be nice as we are considering this is to have the interface with the default gw named something predictable i.e. primary.
oftentimes all you want is this interface so having a way to predict it is useful hence base/lib/facter/interface_primary.rb [15:45:02] not sure I understand [15:45:20] fwiw we don't track mac addresses and I would really prefer to keep it that way, the next step in the plan for provisioning is to get rid of the mac in the dhcp config [15:46:07] there are probably 88 ways we could do this that would "work" for the common case: single interface to a single vlan, nothing tricky or virtual, and the only complications are defining the source of truth, defining a name we like, and bootstrapping it all reliably in the face of kernel/systemd changes [15:46:30] mapping out the edge cases we wish to support and how we'd handle those is what really narrows it down [15:46:38] if we take lvs1013 as an example the interface with the gw is enp4s0f0. i think it would be nice if it was named something like `primary` so i can do `ip addr show primary` on any server to get the primary ip [15:48:40] we apparently don't have any live interfaces left named "bond0", but maybe they're just more-creatively named these days? [15:48:49] so I guess the short term fix is to properly rename interfaces in netbox if they change on the servers [15:49:16] bblack: they're used in frack, and there are wmcs plans to use some [15:49:22] XioNoX: that and, reimage too [15:49:37] (so that the reimage fixes /e/n/i) [15:49:48] fixes? [15:50:02] puppet won't change the interface name hardcoded in /e/n/i from install-time [15:50:22] and thus /e/n/i won't even work right when you first reboot with a kernel+systemd that changes the interface name [15:50:32] I see [15:50:55] for the short term, I think a kernel/systemd upgrade that changes the naming has to require a reimage [15:51:09] and then, yes, unless the reimage script already takes care of it (?), change it in netbox too [15:51:53] I'm not sure how that works for reimages now.
I assume netbox gets initial interface name data from the installation process somehow, or one of the imaging automation steps [15:52:48] yeah, not sure how the "update Netbox" script can know which old interface has been renamed to the new existing names [15:53:10] what happens now for a fresh new install with no renaming? [15:53:12] bblack: we trigger the import from puppetdb after the reimage [15:53:18] ah ok [15:53:19] so that bit is covered* [15:53:30] but it causes interfaces to be duplicated [15:53:31] *there is a tricky thing in the process of being fixed [15:53:43] Netbox has both the old and new interfaces [15:53:47] the script for mass importing was very conservative and didn't delete anything [15:54:00] right [15:54:06] I'm planning now that we have all the data to change it and be more aggressive, but there is an issue with the cables [15:54:17] that XioNoX plans to mass-add [15:54:35] and you could say "well then the reimage should first wipe existing interfaces before pulling the new stuff from puppetdb", which would sorta sanitize parts of this right now, but is very much in opposition with the long-term goal of netbox being the truth source [15:54:43] tl;dr but we should be able to fix it in a way that a reimage fixes netbox [15:55:51] the core conflict my brain keeps running into here is that netbox is what we want to be the source of truth for all things, but the kernel/systemd is the actual source that creates the truth about interface names [15:56:02] and isn't necessarily predictable in its choices [15:56:34] which I guess is why pvoid was talking in the direction of udev rules for idrac/ilo, etc [15:57:21] maybe such udev rules could give us consistency across kernel/systemd (including installtime vs runtime, etc) [15:57:52] and then we could pull that info once at install-time up to netbox, and all puppet data flows from netbox api -> puppetmaster -> facts -> agent for everything else derived [15:58:13] (so we don't end up with a
puppetdb->netbox->puppetdb loop for some parts of this) [15:59:11] this would name only the ifaces we configure? [15:59:17] the remaining will keep the systemd names? [15:59:44] if we need to have all interfaces in netbox before install time, who's adding them? [15:59:52] I think the idea is that idrac/ilo/biosdevname/? (whatever it is we plug up to udev rules at install+run time) [16:00:24] would name them all in some consistent fashion, maybe it calls them Ethernet0 and so on, I have no idea, but it matches some vendor interface naming concept [16:00:41] so we allocate like we do now with a placeholder, then at reimage time we get those names from HW and push them to netbox? [16:00:55] and from there it's turtles all the way down :D [16:01:03] yeah the placeholder part is tricky [16:01:31] we already have that :) [16:01:49] right now the way the installer works, IIRC, is that it tries dhcp and then checks which interface ended up with the default-gw and decides that's primary and writes /e/n/i [16:02:07] and that determines the truth of $interface_primary for everything that follows afterwards [16:02:08] yes, but on a new host we allocate IPs to an interface named ##PRIMARY## in Netbox, in order to generate the DNS [16:02:26] that are required pre-reimage [16:02:27] and this drives dhcp config as well? [16:02:35] no because that's based on MAC [16:02:46] the plan is to get rid of MAC and get some form of autoconf there [16:02:52] but no immediate ETA [16:03:30] yeah I don't know what that would look like [16:03:40] back [16:04:06] (the "some form of autoconf" that would get the netbox-allocated ##PRIMARY## IP set for some interface at first-install time, without a macaddr?)
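[The ##PRIMARY## placeholder lookup mentioned above might eventually be a Netbox API query. The following is only a sketch of the parsing side: the payload is canned and illustrative, the address is from the TEST-NET range, and the field names approximate the Netbox REST API (which vary by version); real code would do an authenticated HTTP GET against /api/ipam/ip-addresses/:]

```python
# Sketch: resolve a host's placeholder IP allocation from a Netbox-style
# API response. No network I/O; 'response' stands in for the JSON a query
# like GET /api/ipam/ip-addresses/?device=<host> might return.

def primary_ip(response, placeholder="##PRIMARY##"):
    """Pick the address attached to the placeholder interface, if any."""
    for entry in response["results"]:
        iface = entry.get("assigned_object") or {}
        if iface.get("name") == placeholder:
            return entry["address"]
    return None

response = {
    "count": 1,
    "results": [
        {
            "address": "192.0.2.10/24",  # made-up allocation
            "assigned_object": {"name": "##PRIMARY##"},
        }
    ],
}

print(primary_ip(response))
```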
[16:04:14] wrt bootstrapping [16:04:27] I think we will eventually need a "discovery" stage [16:04:34] or inventorizing stage [16:04:44] yeah, that kinda makes sense [16:04:59] boot some minimal image on the hardware, which inventories this stuff and reports to netbox [16:04:59] where a server is plugged in to mgmt, then something runs that polls iDRAC and populates all interfaces [16:05:08] iDRAC should have everything these days [16:05:09] possibly with udev rules already doing their custom things, and lldp vlan info [16:05:25] idrac won't know what's plugged to what [16:05:37] but a minimal inventory-stage linux image could [16:06:03] we need that information before, to configure switches as well [16:06:22] so source of truth there is dc ops putting the cable (with its ID) into netbox [16:06:51] and the idrac info can match with dcops physical reality? [16:07:08] I... think so? :D [16:07:18] that's the part that always seems to be at the bottom of the rabbithole [16:07:40] even if we go down idrac/ilo-specific solutions that plug into udev and/or polling them over mgmt interface (probably both in combination?) [16:08:10] can we get to a world where, when dcops plugs cable 5123A in on the back of a new machine, they know what the interface name will end up being at the linux level? [16:08:40] that's the vision :P [16:09:23] 18:31 < paravoid> also fwiw/iirc netbox added a "label" field to interfaces in 2.9 [16:09:28] so we can also use that [16:09:36] ok [16:09:45] the interface name is named whatever dc ops thinks the interface name is [16:10:07] not sure :) [16:10:45] yeah the real ground truth is probably that even dell can't tell us programmatically how to now the idrac name of a physical port we're staring at [16:10:51] s/now/know/ [16:11:09] let's check, what's the most complicated box you can think of? [16:11:17] lvs [16:11:24] say lvs1013 [16:11:26] let's see what a) idrac b) smbios c) systemd think [16:12:29] so smbios is dmidecode?
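[On the "smbios is dmidecode?" question: `dmidecode -t 41` dumps the SMBIOS "Onboard Devices Extended Information" records, which carry the vendor's port labels and PCI bus addresses. A parsing sketch; the sample text below is fabricated (real output varies by vendor/BIOS), and the record layout is only approximated:]

```python
# Sketch: pull onboard-device labels out of `dmidecode -t 41`-style text.
# SAMPLE is made up but mirrors the usual record layout.

SAMPLE = """\
Handle 0x002A, DMI type 41, 11 bytes
Onboard Device 1 Information
\tReference Designation: Integrated NIC 1
\tType: Ethernet
\tStatus: Enabled
\tBus Address: 0000:01:00.0
"""

def parse_onboard_devices(text):
    """Return {pci_bus_address: label} for each onboard-device record."""
    devices, label = {}, None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Reference Designation:"):
            label = line.split(":", 1)[1].strip()
        elif line.startswith("Bus Address:") and label:
            devices[line.split(": ", 1)[1]] = label
            label = None
    return devices

print(parse_onboard_devices(SAMPLE))
```

[The bus address is what could bridge the SMBIOS label to the kernel's view of the NIC (via /sys/class/net/*/device), without ever touching the MAC.]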
[16:13:37] yeah, and/or sysfs [16:14:04] there's also smbios-utils but I don't recall anything nic-related there [16:16:07] lvs1013 is an older-gen Dell, so iDRAC 8 [16:16:31] but, still says [16:16:47] Embedded NIC 1 [16:16:49] NIC Slot 1 [16:16:51] NIC Slot 2 [16:17:00] right [16:17:03] and for NIC Slot 1, sees Ports 1 and 2 [16:17:21] including their mac addresses fwiw [16:17:23] in this case, Embedded is disabled (doesn't show up at the kernel/systemd level) [16:17:34] but yeah we're using ports 1+2 of nic slot 1+2 [16:17:42] embedded says 4 ports [16:17:44] how can you know if Port 1/2 are left/right, top/bottom? [16:17:51] in the physical world of a back of a server [16:17:54] OS Driver State Non Operational [16:18:01] they're numbered usually [16:18:03] hopefully there's a little "1" and "2" next to them [16:18:06] yeah that [16:18:15] but I donno how slot 1 and slot 2 play out [16:18:28] would it become "NIC Slot 3" if we moved the PCIe card? [16:18:43] that is a good question [16:18:47] yeah but is it that readable in real life with all the other cables? is it practical? maybe we should check :) [16:19:06] is there an alternative? :) [16:19:14] well [16:19:48] the alternative would be to throw out physical naming in this sense, and detect lldp wherever there's a link, and udev-rule the name of the interface to match the vlan it's plugged into [16:20:00] but that gets complicated for lvs-like cases [16:20:45] so [16:20:50] lvs2010, iDRAC 9 [16:20:54] but you could imagine such a scenario in which you end up with linux interface names like "eth-public1-a-eqiad", and you could shut the box down, randomly swap the cables on the back, and reimage, and everything would get fixed [16:21:03] Embedded NIC 1, NIC Slot 2 and NIC Slot 3 [16:21:06] i.e.
no NIC Slot 1 [16:21:36] (by looking at link+lldp info at initial install time and exporting that to the netbox source of truth and to persistent udev rules) [16:22:02] (actually maybe in that world netbox never needs any interface name info) [16:23:21] netbox needs ifaces to attach IPs to devices and to attach cables [16:23:32] and an iface needs a name [16:23:34] fwiw [16:24:31] yeah, but you could already know (before any physical work happens) that fooserver1003 is going to have its only interface plugged into a certain switch port configured for the public1-a-eqiad vlan, and you could write the interface name into netbox right then as "eth-public1-a-eqiad" [16:24:45] and dcops does that when they document the cable number [16:25:17] and then some stage of the installer process figures out where public1-a-eqiad is hooked up via link+lldp info and hardcodes a local udev rule for that macaddr to that custom interface name [16:25:25] faidon@lvs1013:~$ sudo ./biosdevname -i enp4s0f0 [16:25:25] p1p1 [16:25:25] faidon@lvs1013:~$ sudo ./biosdevname -i enp5s0f1 [16:25:25] p2p2 [16:25:25] faidon@lvs1013:~$ sudo ./biosdevname -i enp4s0f1 [16:25:27] p1p2 [16:25:32] faidon@lvs2010:~$ sudo ./biosdevname -i ens2f0np0 [16:25:32] p2p1 [16:26:19] so pp [16:26:23] ok [16:26:25] and em [16:26:28] not bad! [16:26:58] so, let's assume we can make some udev script that can figure that out for all of our hardware. when new generations/vendors come along, we fix the script.
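[The "hardcodes a local udev rule" step above ultimately boils down to emitting persistent-naming rules. A sketch of that rule generation; the MAC addresses and target names are invented (the names echo the biosdevname/vlan-based schemes discussed earlier), the rule syntax is standard udev, and the exact match keys we would want are an open question:]

```python
# Sketch: render persistent udev naming rules from a desired mac -> name map,
# e.g. as an installer step exporting biosdevname/netbox naming decisions.

def udev_rules(mac_to_name):
    """One udev rule line per interface, matching on MAC, assigning NAME."""
    lines = []
    for mac, name in sorted(mac_to_name.items()):
        lines.append(
            'SUBSYSTEM=="net", ACTION=="add", '
            f'ATTR{{address}}=="{mac}", NAME="{name}"'
        )
    return "\n".join(lines)

# Hypothetical mapping, e.g. derived from biosdevname output like p1p1/p1p2:
wanted = {
    "aa:bb:cc:dd:ee:00": "eth-public1-a-eqiad",
    "aa:bb:cc:dd:ee:01": "p1p2",
}

print(udev_rules(wanted))
```

[Note this sketch keys on MAC purely as the local udev match mechanism; that doesn't require netbox itself to track MACs, per the concern raised earlier.]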
[16:27:14] and that the udev logic can also be run during the installer too I guess [16:28:15] you'd still have to export this from the installer to netbox at some point in the process [16:28:34] before the netbox truth about interface names could be an api fact source for the agent run on the installed host [16:29:01] we can run any command from the reimage into the d-i or after d-i before puppet fwiw [16:29:09] get it back to the reimage and push it to netbox [16:29:18] unless [16:29:35] I mean, the other alternative is we trust that we can predict the udev rules reliably, too, as humans [16:30:06] so before the first boot, dcops enters the cable#, and enters the interface name as eth-p1p1 or whatever right off, knowing based on physical reality that that's how udev will end up naming it [16:30:18] I'd do iDRAC -> netbox, and then netbox -> d-i -> running system [16:30:43] ok [16:30:44] in that case biosdevname wouldn't even be necessary [16:31:13] so idrac->netbox is just going to give it a list of all the names? [16:31:14] well, maybe [16:31:39] depends on whether we capture the mac address from iDRAC [16:32:05] sounds like we maybe need a doc to capture all these ideas :) [16:32:09] we don't really, at the end of the day, care about the mac, other than maybe as a temporary link for matching up two sets of names [16:32:37] yeah, the bridge can be either mac, or smbios I suppose [16:32:50] but if dcops already knows (physically) that it's hooking up nic slot 1 port 1 to public1-a-eqiad, there might not need to be any idrac->netbox step [16:32:57] that can be specified when they put the cable ID in [16:33:02] right [16:34:10] can we create a doc to capture the different paths here? [16:34:11] and then how does "netbox->d-i" work to configure the interface IP? [16:34:14] maybe volans? 
:) [16:34:30] (at least initially) [16:34:43] not sure i get the question [16:34:55] 16:30 < paravoid> I'd do iDRAC -> netbox, and then netbox -> d-i -> running system [16:34:56] paravoid: I can try, lots of ideas, to expand on... [16:35:11] the netbox->d-i step there, which I assume gives d-i the netbox-assigned IP for a certain interface [16:35:13] I guess that is netbox -> dns -> d-i [16:35:27] no, we can just query the netbox API from d-i [16:35:47] I'd prefer some proxy in the middle, but sure [16:35:48] over... I guess an auto-assigned ipv6 initially? [16:36:04] yeah that was the idea, ipv6 autoconf [16:36:14] I meant to configure the *names* for _all_ the interfaces [16:36:30] but it could be smbios too, sure :) [16:36:54] yeah IP allocation and pxe bootstrapping is another big chapter [16:36:58] netbox is also where we assign the IP addrs now, and d-i needs the IP info to do the config I guess [16:37:05] since we're moving off of macaddrs+dhcp? [16:37:34] that is another very long conversation :P [16:37:39] yeah [16:38:08] you can see why some people like the idea of just racking 600 identical servers with autoconf and virtualizing everything else on top :P [16:38:31] VM IP provisioning is actually another curve ball to all of these plans [16:38:56] yeah but it simplifies the bottom layer if there are no edge cases down there to care about [16:39:20] eh, modern virtualization is with SRIOV and NIC virtual functions and all that [16:39:23] so not sure about that ;) [16:39:45] by edge cases, I mean things like lvs or bonding or whatever [16:40:06] this whole thing is much simpler if you can start with the assertion "every one of these hundreds of boxes just has one 10G NIC port, done" [16:40:30] (or whatever the case may be) [16:41:54] the naming problem is only complicated for the edge cases [16:42:07] which is multi-port boxes and special add-in cards, etc [16:43:44] not so sure about that [16:43:56] depending on the dell model there are different options
for example [16:44:11] yeah [16:44:12] some boxes have these 2x1G 2x10G NICs, others need a PCIe 10G [16:44:32] some *models* rather, so it's not even edge cases, is what I'm trying to say [16:44:36] but at the end of the day, if dcops hooks up only one cable, and only one interface has link, it's not hard to have software figure things out regardless of naming [16:44:40] given the infrequency of special boxes changes, we could also have something totally automated for the common cases that covers 95% of the fleet and allow to specify something manually for the true corner cases [16:46:12] the two cases that come to mind are router boxes like LVS, and the few cases in the past and/or future where we've done bonded ports [16:47:53] (it's hard to justify bonded ports imho, vs, say, supporting bigger interfaces for rare cases, or software solutions to spread the load over multiple machines) [16:49:09] and it does seem crazy if we push our design of how to automate all this stuff to extremes just to support these few rare cases [16:50:18] even lvs doesn't necessarily have to be designed the way it is, which we're rethinking anyways [16:51:06] there are probably valid versions of an L4LB design that don't involve hooking one machine up to 4 physical row networks [16:53:22] wmcs uses tagged vlan stuff like lvs, too [16:53:31] their needs might be very different [17:16:20] yeah, I'll start a doc to summarize options and workflows where we can iterate and get consensus on next steps [17:16:46] fyi compiler1002 has run out of disk space so i need to clean some of the old reports [17:17:57] go ahead, I thought we had some cron to cleanup older ones [17:18:27] i think we do but i have run a few large pcc tasks today [17:19:00] start deleting those if you know the IDs and are not needed anymore :D [17:19:16] yep :) [17:19:25] anything older than 2 weeks anyway can probably be nuked [17:19:55] would be nice to check the gerrit patch, if merged/abandoned delete [17:22:23] it's
fine, removing the 4 big jobs has reclaimed 25% of the space :) [17:25:25] maybe it's a tad underprovisioned :D [17:28:21] lucky that very few people run with an empty `hosts:`. currently we clean at 31 days which could be reduced. however unrelated to this it did occur to me that the "keep this build forever" button in jenkins (i.e.: https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/25929/) has no effect on pcc [17:29:14] if we could somehow get that button to work then i think reducing the retention period to something like 7 days or even 1 day would be fine [17:30:31] * jbond42 thinks the button probably needs to be updated to say "keep this build for $n weeks/months/years" as i doubt its kept forever [17:35:08] please not as low as 1 day, that's usually not enough to get a review and we regularly link to compiler results to show others [17:35:44] not realistic but in a perfect world they would all keep working forever ... [17:37:25] tbh right now the 31 day period is fine and it normally is me that fills the disk when doing something puppet related that needs testing on everything e.g. https://gerrit.wikimedia.org/r/c/operations/puppet/+/634278. however might still be nice to make the "keep this build forever" button work as jenkins expects it to [17:38:22] yea, sometimes i just need "*" but I try to avoid it [17:38:54] we can manually delete those on a case by case basis..
just need to remember it [17:40:45] yes i think if more people start using the check experimental function we could get issues as by default if there is no 'Hosts: ' section in the commit message it runs a pcc assuming all hosts [17:42:24] tbh, I don't use it because it once fooled me [17:42:34] it made it look like stuff worked when it did not [17:43:11] so went back to [17:43:19] manually running the compiler and pasting URL [17:44:19] lol :) [17:44:23] my favorite is using "C:class" now [17:44:45] it just picks one node per group [17:46:03] yes tbh i dont use check experimental either but i have the following alias which does a similar thing [17:46:06] function pcc { cd ${HOME}/git/puppet ./utils/pcc.py last parse_commit cd - [17:46:22] function pcc { cd ${HOME}/git/puppet; ./utils/pcc.py last parse_commit; cd -; } [18:17:22] .
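[The retention sweep discussed above (age out compiler reports after N days) could be sketched roughly as follows. The base path and per-job directory layout are hypothetical, not the actual compiler1002 setup:]

```python
# Sketch: prune pcc report directories older than a retention window.

import shutil
import time
from pathlib import Path

RETENTION_DAYS = 31

def prune_reports(base, now=None, retention_days=RETENTION_DAYS):
    """Delete per-job output dirs whose mtime is past the retention window."""
    now = time.time() if now is None else now
    cutoff = now - retention_days * 86400
    removed = []
    for job_dir in Path(base).iterdir():
        if job_dir.is_dir() and job_dir.stat().st_mtime < cutoff:
            shutil.rmtree(job_dir)
            removed.append(job_dir.name)
    return removed
```

[A "keep this build" feature would then just be an exclusion list checked before the rmtree, which is essentially what the jenkins button would need to feed into.]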