[06:25:42] 10Traffic, 10netops, 10Operations, 10ops-eqiad: lvs1016 enp5s0f0 interface errors - https://phabricator.wikimedia.org/T264227 (10ayounsi) No errors on the switch side. `lang=bash lvs1016:~$ sudo ethtool -S enp5s0f0 | grep crc rx_crc_errors: 27387518 lvs1016:~$ sudo ethtool -S enp5s0f0 | grep crc... [07:26:29] 10Traffic, 10Operations, 10Patch-For-Review: Upgrade production cache nodes to Varnish 6 - https://phabricator.wikimedia.org/T263557 (10ema) 05Open→03Resolved a:03ema All production nodes are now running Varnish 6.0.6-1wm1. Closing! [07:54:53] 10Traffic, 10Analytics, 10Operations: varnishkafka 1.1.0 CPU usage increase - https://phabricator.wikimedia.org/T264074 (10elukey) I am not an expert in `perf` but I tried to do the following on cp5012: `sudo perf record -F 99 -p 29945 --call-graph dwarf sleep 10` (the pid is varnishkafka-webrequest) And I... [11:02:55] 10Traffic, 10Operations, 10observability, 10User-fgiunchedi: Aggregated metrics for ats-tls <-> clients ttfb percentiles - https://phabricator.wikimedia.org/T263536 (10fgiunchedi) Added a panel to https://grafana.wikimedia.org/d/000000479/frontend-traffic to showcase the top p95 offenders: {F32369902} I'... [11:53:59] 10netops, 10Operations, 10observability: active/active links monitoring - https://phabricator.wikimedia.org/T264300 (10ayounsi) p:05Triage→03Medium [11:57:16] 10Traffic, 10netops, 10Operations, 10Epic: Capacity planning for (& optimization of) transport backhaul vs edge egress - https://phabricator.wikimedia.org/T263275 (10ayounsi) [11:57:21] 10netops, 10Operations: Consider balancing VRRP primaries to cr1/cr2 - https://phabricator.wikimedia.org/T263212 (10ayounsi) 05Open→03Resolved Monitoring discussion moved to T264300. Balancing is done. [12:41:46] vgutierrez: Hey! Would you have a few minutes to check https://gerrit.wikimedia.org/r/c/operations/puppet/+/629829 ? I'm never entirely sure about LVS configs. 
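For context on the ticket above: the `rx_crc_errors` check amounts to sampling the NIC counter from `ethtool -S` and seeing whether it grows between samples. A minimal sketch of that logic, assuming you've captured the text output twice (the counter value and interface come from the ticket; `parse_ethtool_stat` is a hypothetical helper, not an existing tool):

```python
import re

def parse_ethtool_stat(output: str, stat: str) -> int:
    """Extract a single counter value from `ethtool -S <iface>` text output."""
    match = re.search(rf"^\s*{re.escape(stat)}:\s*(\d+)$", output, re.MULTILINE)
    if match is None:
        raise KeyError(stat)
    return int(match.group(1))

def crc_errors_growing(before: str, after: str) -> bool:
    """True if rx_crc_errors increased between two ethtool -S samples."""
    return (parse_ethtool_stat(after, "rx_crc_errors")
            > parse_ethtool_stat(before, "rx_crc_errors"))

# Two samples; the first value is the one quoted in T264227:
sample1 = "NIC statistics:\n     rx_crc_errors: 27387518\n"
sample2 = "NIC statistics:\n     rx_crc_errors: 27390012\n"
print(parse_ethtool_stat(sample1, "rx_crc_errors"))  # 27387518
print(crc_errors_growing(sample1, sample2))          # True
```

A growing CRC counter on one side with a clean switch side (as ayounsi notes) usually points at the cable/SFP on the host side, which is what the ticket ultimately concluded.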
[12:42:09] Also, are there special steps that need to be taken when deploying changes to LVS configs? [12:46:10] sure.. one sec [12:46:23] I was into varnishland and that drains my tiny brain [12:48:00] vgutierrez: I can help fill that brain with LVS :) [12:50:26] so, profile::lvs::realserver::pools is affecting your servers [12:50:44] s/your/cloudelastic/ [12:50:59] yep, that's my understanding [12:51:26] if that's the missing bit, basically cloudelastic servers are missing the service IP attached to localhost [12:51:40] so they're ignoring the traffic that the LVS is routing their way [12:53:07] but from hieradata/common/service.yaml [12:53:33] you can see that all chi, psi and omega are referring to the same VIP, tagged as id004 on service.yaml [12:53:45] We already have one service exposed via LVS on those servers, but not the other 2 [12:53:52] cloudelasticlb: 208.80.154.241 [12:53:52] cloudelasticlb6: 2620:0:861:ed1a::3:241 [12:56:12] so basically that part from the LVS point of view is going to be a NOOP I believe [12:56:20] but not from the conftool side of things [12:57:02] as you can see from modules/profile/manifests/lvs/realserver.pp [12:57:07] LVS is only at IP layer? It does not care about TCP. But conftool will add additional service checks? [12:59:00] what I mean is that lvs::realserver doesn't have anything to do with the load balancers [12:59:12] but with the backend servers handling the traffic [13:00:01] psi and omega are already configured on the lvs [13:00:15] * gehel is now confused :) [13:00:34] quick check: `gehel@elastic2058:~$ curl https://cloudelastic.wikimedia.org:9643` [13:00:41] that already works as expected. [13:01:17] yup [13:01:22] https://www.irccloud.com/pastebin/K6Vwa6sg/ [13:02:11] so the only thing that this change would bring is the additional pool/depool scripts for each service [13:02:21] right [13:02:40] * gehel should have done more reading before pinging vgutierrez [13:02:47] np :) [13:02:57] ok, thanks a lot!
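Background for the exchange above: with direct-routing LVS, the real servers must have the service VIP configured on a local (loopback) interface, or they silently drop the packets the load balancer forwards to them — which is what "missing the service IP attached to localhost" means. A rough sketch of the kind of check involved, fed with `ip -o addr`-style one-line output (the sample lines are invented around the cloudelasticlb VIP quoted above; this is not how the puppet profile actually verifies it):

```python
import ipaddress

def has_vip(ip_addr_output: str, vip: str) -> bool:
    """Check whether a VIP appears among the addresses in `ip -o addr` output."""
    want = ipaddress.ip_address(vip)
    for line in ip_addr_output.splitlines():
        fields = line.split()
        # `ip -o addr` lines look like: "1: lo    inet 10.2.2.10/32 scope global lo"
        if "inet" in fields or "inet6" in fields:
            idx = fields.index("inet6") if "inet6" in fields else fields.index("inet")
            addr = fields[idx + 1].split("/")[0]  # drop the prefix length
            if ipaddress.ip_address(addr) == want:
                return True
    return False

sample = (
    "1: lo    inet 127.0.0.1/8 scope host lo\n"
    "1: lo    inet 208.80.154.241/32 scope global lo\n"
)
print(has_vip(sample, "208.80.154.241"))  # True
print(has_vip(sample, "10.2.2.10"))       # False
```

If the second check were the real situation, traffic for 10.2.2.10 routed to the host would be ignored, exactly as described.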
[13:50:33] hi traffic o/ - I would like to remove two LVS services (just a heads-up) [13:51:50] we like [14:08:53] \o/ [14:08:59] kill them all jayme ;P [14:09:28] working on it :) [14:37:01] bblack: for when you're around, we would be ready to migrate esams (also includes a few knams records) to Netbox if today is deemed a good day and not too close to eqsin migration. [14:37:05] the patch is: https://gerrit.wikimedia.org/r/c/operations/dns/+/630647 [14:46:57] bblack: so you still need puppet disabled on lvs1016? [14:59:03] bblack: s/so/do/ :) ... if you re-enable at some point you will probably see "Services in IPVS but unknown to PyBal: set([10.2.2.10:8081, 10.2.2.47:8889])". Feel free to remove them (ipvsadm -D -t 10.2.2.10:8081; ipvsadm -D -t 10.2.2.47:8889) [14:59:26] jayme: is it just a removal? [14:59:52] lvs1016 is kinda broken, we should probably note that somewhere for now, because it's not a great idea to restart the pybals elsewhere and fail over to it, if we can help it [14:59:59] there's a ticket started yesterday [15:00:20] https://phabricator.wikimedia.org/T264227 [15:00:23] needs some dcops [15:00:24] bblack: yeah. It's done on the others. Just needs puppet run, pybal restart and "ipvsadm -D ... " [15:00:46] jayme: if you've already done the others, go ahead and re-enable puppet and go for it there [15:01:39] bblack: uh. Did not know about that. Should I re-disable puppet afterwards? [15:01:53] no, that was mostly for some testing I did, it can be left re-enabled [15:02:12] what I'm left pondering is whether we should (after this one) take a pause on all lvs service changes in eqiad until we get past this hw issue [15:02:35] (because they all involve a short failover to lvs1016 while a primary pybal is restarting, and lvs1016 has packet loss) [15:03:05] Okay.
Will do lvs1016 after meeting (~30m) [15:03:12] ack, sounds good [15:05:27] volans: from my pov, you're good to go for esams [15:18:51] ack thanks a lot [15:27:29] bblack: I'm currently unable to run puppet on lvs1016 because "puppetmaster1001.eqiad.wmnet [10.64.16.73] 8140 (puppet) : No route to host" [15:40:14] don't know if I should/can leave it like this...ema/vgutierrez maybe ^ [15:40:53] uh... that's related to your testing bblack? [15:41:15] fwiw it's not reporting to debmonitor either since 15h [15:41:19] jayme: I'd say that's far from ideal :) [15:41:21] it's maybe more related to !log re-enable and run puppet on I guess [15:41:30] oops [15:41:35] to https://phabricator.wikimedia.org/T264227 [15:49:44] I don't think I did anything strange there [15:49:46] looking [15:52:19] ah the link is flapping now [15:52:27] oh, we've lost route to one row [15:52:32] awesome [15:55:41] 10Traffic, 10netops, 10Operations, 10ops-eqiad: lvs1016 enp5s0f0 interface errors - https://phabricator.wikimedia.org/T264227 (10BBlack) The link has gotten worse and began flapping up and down rapidly since last update, causing a loss of routing to the row. I've downtimed the whole host now in icinga, di... [15:56:14] (which ironically will fix the proxyfetch errors for now, because now they'll just check over other interfaces and get routed) [15:58:28] bblack: So you think I can apply my changes now and run? :) [16:00:06] hmm yeah, I guess you can try [16:00:14] it can probably reach puppet now [16:00:27] but it does put the host now in a dangerously unusable state for real LVS traffic [16:00:53] (it being the state of affairs with the dead link, not your changes) [16:02:22] that's clear. I just want to get all in the same state [16:05:49] bblack: okay, I'm fine. Should I re-disable puppet this time? (as you had it disabled again) [16:07:52] yeah may as well for now [16:07:55] jayme: ^ [16:08:35] will see what happens on the dcops side, we might have a quick resolution.
if not, I'll send some irc/email updates about not messing with LVS until we get this fixed. [16:12:11] bblack: okay. Disabled again. Thanks! [16:12:53] jayme: we'll not forget that it's all your fault!™ :-p [16:45:04] * jayme as blame proxy hereby forwards your blame to kormat. x-blame-forwarded-for: volans [16:45:46] lol [17:19:10] 10Traffic, 10netops, 10Operations, 10ops-eqiad: lvs1016 enp5s0f0 interface errors - https://phabricator.wikimedia.org/T264227 (10BBlack) 05Open→03Resolved a:03Cmjohnson @Cmjohnson replaced the SFPs on both ends of this link before my reboot above. Since the reboot, we don't seem to have any abnormal... [17:20:40] esams migrated too, so far so good, we'll keep an eye ofc [17:22:51] \o/ [17:26:27] bblack: fyi ar.zhel opened T264273 today, you might have an opinion too :) [17:26:27] T264273: DNS: per prefix zone-file limitation - https://phabricator.wikimedia.org/T264273 [17:27:38] let me also comment on what we can add to the current approach [17:31:19] yeah that's tricky [17:31:43] added https://phabricator.wikimedia.org/T264273#6509764 [17:31:58] I think a desirable end-state is one include per zonefile, but obviously we're not doing that during the rollout, so that we can attack little pieces at a time [17:32:03] the bottom line is whether we want flexibility or simplification [17:34:01] if we ignore this transitional period for now and look at the end-state, is there any desirable flexibility in having multiple separate netbox includes for a single zoneflie?
[17:34:05] *zonefile [17:34:54] if we want to be able to not manage something with netbox [17:35:24] well [17:35:44] you mean something that netbox is exporting, but we want to ignore the export and provide a manual version of the records instead [17:36:29] or a whole subzone, like svc if we decide to manage that in another way (I hope not) [17:37:22] but even with a single include per zonefile, if netbox doesn't export a thing (like svc), and we define manual records in the zonefile, we're good [17:37:34] yes as long as netbox doesn't export them [17:37:52] so for NS records right now we've done this: [17:37:52] https://netbox.wikimedia.org/search/?q=ns0 [17:37:54] so as a thought experiment, we could control that on export [17:38:18] through config to the exporter or something, to filter out some otherwise-exported records. [17:38:30] but then transitions get tricky too [17:38:52] if we had exporter config filtering ns[012], and had a manual set of records for those, and a single-include-per-zone style [17:39:03] and then later we wanted to let netbox manage them, there's no clean way to get there [17:39:19] if you remove the filter first you get duplicate definitions, and if you remove the dns side first you lose the records till netbox pushes again [17:40:40] even without contemplating the single include per zonefile, this sort of thing is already a potential problem [17:41:06] there could be subsets of records in the existing netbox export includes that we want to transition back to manual, or vice-versa [17:42:34] for those we just need to deploy both manual and auto-generated stuff with the same gdnsd "reload", it's not that hard [17:42:51] just a matter of allowing the tools to support it [17:42:54] do we have a mechanism to do that?
[17:43:08] yeah I guess we could make one, but it's a "special" transition time [17:43:37] tell everyone to stop other dns changes, push the ops/dns change without authdns-update, then let the netbox-side change also pull in the latest authdns git at the same time. [17:44:33] (or some reversed equivalent, where netbox pushes new files but doesn't reload, and then authdns-update manual run picks up both changes) [17:45:02] yeah [17:45:50] another more "natural" way could be to make the change so that there is a duplicate record and make the current zone-validator (or equivalent) check for those and abort the reload [17:46:18] so you merge one change, deploy fails because of duplicate (expected), you merge the other one and deploy succeeds [17:46:20] yeah that's tricky too, since they're not illegal in most common cases [17:46:28] but yeah, we could explicitly check for matching data [17:46:47] zone validator was already failing on totally duplicated records (within the manual repo) [17:47:04] it's just a matter of making it check in the autogenerated one too and be a bit more flexible [17:47:13] ok, so rewinding back to your "svc" example [17:47:14] as I guess if we want to run it manually we might change something [17:47:16] sure [17:47:31] basically if svc.wmnet is a separate export, it's easier to turn it off and switch to manual records if we needed to [17:48:23] because we wouldn't have to (a) create some filtering mechanism to exclude svc from the whole-wmnet export + (b) do whatever simul-deploy magic like above. [17:48:43] we would still need (b) [17:48:48] but not (a) [17:49:15] we wouldn't need (b) either, because a single ops/dns change can supply the new manual records and comment out the include line for the svc.wmnet-specific include.
assuming we picked those export file boundaries well and they match what we need to switch [17:50:02] right, yeah I was thinking the mixed case [17:50:16] as long as you replace the whole thing [17:50:40] the mixed case is something like "stop exporting foo.svc.wmnet from netbox, and create a manual record for it?" [17:51:06] what's the mechanism for stopping the export if not the filtering mechanism in (a)? is there already a flag in netbox or something? [17:51:37] emptying the dns name field in netbox makes it non-exportable [17:51:41] ok [17:51:50] that's how the ns* records are "blacklisted" [17:52:02] but then if we do that for many records it defeats the whole purpose a bit [17:52:17] yeah it does [17:52:32] I think the exceptions will be rare [17:52:58] ns[012] are the only ones currently right? [17:54:00] and a few others because of different TTL [17:54:01] https://netbox.wikimedia.org/ipam/ip-addresses/?q=keep [17:54:18] because a host-prefix IP was used instead of a service one [17:54:24] gerrit and lists [17:54:32] right [17:54:50] so currently the TTL differs on a subnet level basically? [17:55:20] it's all 1H flat [17:55:24] maybe we could add a TTL override field. It might be nice for some transitions anyways, I know we've used that before (turning TTLs down low on special names and then moving them, etc) [17:55:51] it was proposed, but fa.idon was kinda against it and more towards fixing the oddities instead [17:56:50] but what about actual service IPs that are in service subnets? [17:57:16] like wmnet:blubberoid 1H IN A 10.2.2.31 ?
[17:57:41] the sort TTL ones are almost all CNAMEs AFAIK [17:57:45] *short [17:58:28] *CNAMEs or discovery records [17:58:34] right, but what if we needed to change the blubberoid IP [17:58:42] I guess that's what I mean [17:59:07] in this case, i think it has a discovery record anyways, and hopefully nobody's hitting the direct one [17:59:24] which is analogous to the text-lb.ulsfo vs en.wikipedia.org scenario [17:59:33] (enwiki has a short DYNA, text-lb has a 1H A) [18:00:13] right now for the generated ones we can't change it on a per-record basis, but would be pretty easy to do that if needed [18:00:29] so "fixing the oddities" means anything that's a real service IP should use one of the DYNA mechanisms, and not have to worry about a fast transition of a netbox-exported A-record [18:00:39] it's all a matter of where to save that data, if it's worth having it in netbox or, if it's just for migrations, have it in some more temporary place [18:01:07] I guess so yeah [18:01:42] but I think for gerrit/lists it's more tricky [18:01:51] [out of scope for netbox, but I wonder how we audit/prevent/whatever the case that some internal service uses the blubberoid.svc.eqiad.wmnet hostname to reach a service, when we didn't want it to] [18:02:06] [since that's really more of a placeholder/documentation hostname than anything, given discovery] [18:02:55] [eh, I think it's a non-totally-solved problem yet, code search, checking logs, dunno] [18:04:34] yeah I'm really lost now on these side threads [18:04:46] I think for now what you're doing makes sense, because transition and care [18:05:19] in the long run post-transition, we might be better off consolidating the includes more, so there's not so many of them.
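To make the include consolidation being discussed concrete: the zonefiles pull in the netbox-exported data via RFC 1035-style $INCLUDE directives, today roughly one per exported prefix. A hypothetical sketch of the two shapes being compared (file names and records invented for illustration; the real repo layout differs):

```
; wmnet zonefile -- hand-maintained records first
ns0     1H  IN A    198.51.100.1    ; placeholder, not the real address

; current style: one include per exported prefix
$INCLUDE netbox/10.64.0.0-24.wmnet
$INCLUDE netbox/10.64.16.0-24.wmnet
; ...many more...

; proposed end-state: a single consolidated include per zone
; $INCLUDE netbox/wmnet
```

The flexibility trade-off in the conversation is exactly this: many small includes let you comment out one prefix's worth of records in a single ops/dns change, while one big include is simpler but forces the filter-on-export plus coordinated-deploy dance described above.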
[18:05:37] the transition from many-includes to more-consolidated includes itself might be difficult, too [18:06:27] the simplest way would be to temporarily export both versions (export the 10 smaller includes of some zone, and a new combined include, then do the ops/dns include switch, then stop the smaller export copies) [18:06:42] I don't know how hard that would be on the exporter side [18:07:39] some work but probably not that much [18:07:41] it does reduce some flexibility, but really we shouldn't be aiming to support very much flexibility. it will get abused :) [18:08:09] yeah, I was asking arzhel though how far to go up in the prefix chain [18:08:22] and it's not totally clear yet what the conditions are [18:08:29] I think that's unknowable [18:09:00] and I think really if you dig into the practicalities of the problem he's stating, it comes down to the same transition problems we've talked about earlier [18:09:29] (about having some special mechanism for coordinated change, in the case that we didn't consolidate includes enough to cover a given case for some future subnet change he's talking about) [18:10:34] in any case we can't consolidate further than the zonefile level [18:10:56] so there's no way to make it all happen in one include if a /31 interface subnet moves to a different /24 [18:11:20] sure [18:11:50] but it also seems needlessly-complex, the scenario at the end of the current path with tons of includes [18:12:01] right now we do /64 for v6 and /24 for v4 *unless* the IP has a larger prefixlen in netbox, in that case we pick the prefix with the higher prefixlen [18:12:49] to change that I need to know which parent prefix to pick (or forcibly assume /24, dunno) [18:12:55] there is a certain organizational sanity to that, since for the common cases it will create a file per actual subnet (e.g.
row vlans) [18:13:11] but yeah creating tons of /31 for links seems iffy [18:13:22] indeed [18:14:34] so in the ulsfo example, above all the /31 there is: [18:14:34] but we also have a bunch of things in the middle /25, /26 /27 /28 [18:14:37] ; 198.35.26.192/27 (192-223) - Infrastructure Space [18:15:02] that's https://netbox.wikimedia.org/ipam/prefixes/15/ [18:15:06] and is marked as container [18:15:08] which is inaccurate actually [18:15:25] well maybe not, depends on your pov [18:15:29] lol [18:15:30] but the office subnet is outside that space [18:16:12] office is so 2019... [18:16:35] maybe there's some logic that makes sense here and can work? [18:17:08] like "if the cidr mask is >= 29 and there's a container above it, use the container instead"? [18:17:38] (and some v6 equivalent of the same) [18:18:21] either way we'll probably have to solve the coordinated-change problem for some edge cases [18:18:39] and I don't know if this will or won't solve most problems ay is predicting either [18:19:17] surely not arzhel's problem of creating/deleting many new /31 [18:19:27] I'm sure there will be others in the long run, too [18:19:28] as this new Q work [18:19:45] like if we renumber an existing vlan to a whole new /24 or whatever [18:20:27] (other scenarios/cases that will require upping our automation/deploy game wrt netbox+dns I mean, when we reach the need) [18:21:05] yeah [18:25:29] and let's not forget that for an emergency there is always the manual modification of the exported repo on the netbox hosts + push + deploy [18:27:15] what are the potential failure conditions that you're worrying about though?
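The "if the cidr mask is >= 29 and there's a container above it, use the container instead" heuristic proposed above maps neatly onto Python's `ipaddress` module. A toy sketch of it, using the ulsfo Infrastructure Space prefix quoted in the discussion (this implements only the proposal from this conversation, not the exporter's actual behavior, and the v6 threshold is a guess):

```python
import ipaddress

def grouping_prefix(ip_iface: str, containers: list) -> str:
    """Pick the prefix whose export file a record should land in: the
    interface's own subnet, unless it's tiny (>= /29 for v4, >= /125 for v6)
    and some container prefix covers it, in which case use the narrowest
    covering container."""
    net = ipaddress.ip_interface(ip_iface).network
    small = net.prefixlen >= (29 if net.version == 4 else 125)
    if small:
        covering = [ipaddress.ip_network(c) for c in containers
                    if net.subnet_of(ipaddress.ip_network(c))]
        if covering:
            # narrowest container = highest prefixlen
            return str(max(covering, key=lambda c: c.prefixlen))
    return str(net)

containers = ["198.35.26.192/27"]  # "Infrastructure Space" from the paste
print(grouping_prefix("198.35.26.202/31", containers))  # 198.35.26.192/27
print(grouping_prefix("10.64.0.15/24", containers))     # 10.64.0.0/24
```

Under this rule the many /31 link subnets all collapse into one file per container, while ordinary row vlans keep a file per actual subnet, which is the organizational sanity being argued for.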
[18:27:32] that can be easily added as a use case of the dns.netbox cookbook with a flag [18:27:52] chaomodus: like we need to change a single record in a non-standard way and we need to do it *now* [18:28:04] hm [18:28:15] different TTL for example [18:28:15] 10netops, 10DC-Ops, 10Operations, 10ops-eqiad, 10cloud-services-team (Hardware): (Need By: 2020-06-12) rack/setup/install WMCS 10G switches - https://phabricator.wikimedia.org/T251632 (10Cmjohnson) 05Open→03Resolved updated em0 for both...resolving [18:28:41] yeah I see little need for non-emergency overrides, or any "going back" [18:29:01] but in strange unpredictable emergency scenarios, we might need some unpredictable random changes to something about some generated records [18:29:46] that's super easy to implement with the current cookbook, I'll generate everything in the tmp dir as it does now, pause and give you the path [18:30:03] you go do your changes, commit --amend and then tell the cookbook to continue [18:30:10] awesome [18:30:19] what would gdnsd do if you put a record after the include it also appears in? :) [18:30:42] it depends, but for the most part it would accept it and serve both, because most records allow multiple data [18:30:57] hm like the same a record with different ttl [18:31:14] yah i guess having two a records with different addresses would be a problem [18:31:25] depends on your POV [18:31:41] so you couldn't just override things by adding them to the manual part of the dns [18:31:43] but from the pure pov of dns software and protocols, two A records with different IPs for the same name are fine.
[18:31:46] it will serve both [18:32:15] mixed TTLs are a different matter, though [18:32:18] it'd be a problem from the perspective of only wanting one of them [18:32:50] well yeah, but what I mean is gdnsd-level validation wouldn't fail, and the result would not be what you wanted/intended [18:33:02] right [18:33:21] I'd have to look to remember how it treats the edge case of mixed TTLs on zonefile load. it has changed over time. [18:33:40] it's possible to express mixed TTLs, even on the wire, but you're not supposed to do it, by the standards [18:34:47] looking at the code, the zonefile loader issues a warning about it, and forces all TTLs to the first one it encountered [18:35:00] but there's a flag to upgrade warnings to errors, and we use that flag, so the load would fail [18:35:36] then you go to netbox, reset the dns name and be happy :) [18:36:03] interesting [18:36:21] if we turned off the warning-upgrades, you could use this to do an emergency TTL change without an address change, with the result being a double-A record that "works" [18:36:24] in the sense that running the cookbook will remove the generated record and gdnsd should be happy to reload [18:37:27] right that might involve munging the records in netbox altho i think that's nbd a known quantity [18:37:33] e.g. if netbox was exporting "foo 1H A 192.0.2.1", and you defined a manual record "foo 30 A 192.0.2.1" above the include, it would load with a warning and serve 2x A records on the wire, both with the shorter (first) TTL [18:38:06] which "works", but it's kinda wonky [18:38:44] the warnings are going away in gdnsd-4.x anyways, to be replaced by "please do non-fatal sanity checks with external tooling, and maybe we'll ship a simplistic one as an example" [18:38:48] I prefer the fail, then modify netbox and run the cookbook, more explicit and we get only 1 record [18:39:28] gdnsd-4.x is due to be released at least 30 days before the heat death of the universe, but it will eventually happen.
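A small illustration of the "first TTL wins" loader behavior bblack describes for the foo example above: when a manual record sits above the include, both A records get served, but every TTL in the RRset is forced to the first one seen. This only simulates the semantics as described in the conversation; it is not gdnsd code:

```python
def load_rrset(records):
    """Simulate the described loader semantics for one RRset:
    keep all data, warn on mixed TTLs, force every TTL to the first one."""
    warnings = []
    first_ttl = records[0][0]
    if any(ttl != first_ttl for ttl, _ in records):
        warnings.append("mixed TTLs; forcing all to %d" % first_ttl)
    rrset = [(first_ttl, data) for _, data in records]
    return rrset, warnings

# manual record (TTL 30) above the netbox-generated one (TTL 3600 = 1H):
rrset, warnings = load_rrset([(30, "192.0.2.1"), (3600, "192.0.2.1")])
print(rrset)     # [(30, '192.0.2.1'), (30, '192.0.2.1')]
print(warnings)  # ['mixed TTLs; forcing all to 30']
```

With the warnings-upgraded-to-errors flag mentioned above, any non-empty `warnings` would instead abort the load entirely, which is the failure mode volans says he prefers.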
[18:40:28] ahahah [18:41:08] (the basic plan is that 4.x is a simplification release that gets rid of all the plugin junk and replaces it with external-to-the-daemon tooling that we can do in convenient scripting languages for stuff like GeoIP and friends, and all data loads are explicit rather than auto-detected, and then 5.x is the version that implements DNSSEC on top of the simpler daemon) [18:41:33] it's good to have a deadline for your work. [18:42:21] I've already done a ton of work for 4.x on the design front, but only a handful of real commits in the right directions. Maybe by the end of the year I'll at least start pushing up some WIP branches. [18:44:55] lines of C code will be greatly reduced, which is always a win :) [18:47:03] +1 [18:55:50] sukhe: Stdlib::IP::Address::V4::CIDR ? [18:56:09] I don't think there's a blended V4::CIDR + V6::CIDR though, like there is for non-CIDR [18:57:08] someone should maybe define a Stdlib::IP::Address::CIDR as Variant over them [18:57:32] bblack: but the documentation says for Stdlib::IP:Address, "Match any string consisting of an IPv4 address in the quad-dotted decimal format, with or without a CIDR prefix" [18:59:09] 10netops, 10Cognate, 10Growth-Team, 10Language-Team, and 6 others: Rack/cable/configure asw2-a-eqiad switch stack - https://phabricator.wikimedia.org/T187960 (10Cmjohnson) [19:01:44] 10netops, 10DBA, 10Operations, 10ops-eqiad, and 2 others: Upgrade eqiad rack D4 to 10G switch - https://phabricator.wikimedia.org/T196487 (10Cmjohnson) 05Open→03Resolved This has been completed [20:16:55] 10netops, 10Cognate, 10Growth-Team, 10Language-Team, and 6 others: Rack/cable/configure asw2-a-eqiad switch stack - https://phabricator.wikimedia.org/T187960 (10ayounsi) [20:18:46] 10netops, 10DBA, 10Operations, 10ops-eqiad, and 2 others: Upgrade eqiad rack D4 to 10G switch - https://phabricator.wikimedia.org/T196487 (10ayounsi) 05Resolved→03Open From the task description: > [DCops] Update Netbox At least the status and 
name are incorrect (should be asw2-d4 for consistency) > [D... [21:11:44] 10netops, 10DBA, 10Operations, 10ops-eqiad, and 2 others: Upgrade eqiad rack D4 to 10G switch - https://phabricator.wikimedia.org/T196487 (10wiki_willy) Related to Arzhel's previous comment, getting these Netbox errors: test_missing_assets_from_accounting asw3-d4-eqiad Device with s/n TA3716160376 (WMF542... [21:38:03] 10Domains, 10Traffic, 10Operations: URL to redirect to upcoming Wikipedia Birthday page on wikimediafoundation.org - https://phabricator.wikimedia.org/T264367 (10hdothiduc) [23:07:06] 10Traffic, 10Operations: ATS-BE Lua mitigations for cacheable responses w/ Set-Cookie seemingly not working - https://phabricator.wikimedia.org/T264378 (10CDanis)