[08:28:39] Traffic, Operations, Patch-For-Review, Puppet: Remove old letsencrypt puppet module - https://phabricator.wikimedia.org/T221268 (Krenair)
[08:28:51] Krenair: that was fast :)
[08:29:45] Acme-chief, Beta-Cluster-Infrastructure, Patch-For-Review: Write designate integration script for certcentral DNS challenges - https://phabricator.wikimedia.org/T206922 (Krenair) Open→Resolved
[08:30:38] :)
[08:33:22] Traffic, Operations: Puppetize ATS TLS configuration for incoming traffic - https://phabricator.wikimedia.org/T221594 (Vgutierrez)
[08:33:33] Traffic, Operations: Puppetize ATS TLS configuration for incoming traffic - https://phabricator.wikimedia.org/T221594 (Vgutierrez) p:Triage→Normal
[11:42:46] Traffic, Beta-Cluster-Infrastructure, DNS, Operations, and 4 others: Ferm's upstream Net::DNS Perl library questionable handling of NOERROR responses without records causing puppet errors when we try to @resolve AAAA in labs - https://phabricator.wikimedia.org/T153468 (MoritzMuehlenhoff) All bust...
[12:55:29] Krenair: you were right regarding your concerns using get_fld(), I've fixed the CR using the suggested approach: https://gerrit.wikimedia.org/r/c/operations/software/acme-chief/+/504512
[13:01:08] great
[13:03:02] and got rid of the extra dependency in the process :)
[13:08:20] vgutierrez, for i in range(0, len(parts)-2):
[13:08:30] is the -2 to prevent TLDs from setting CAA records?
[13:09:15] or at least, to prevent them being recognised
[13:09:22] the latter
[13:11:45] I wonder if catching DNSNoAnswerError is enough
[13:12:52] NoAnswer will occur if it returns an empty set of records right?
[13:15:20] looks ok
[13:18:11] yes
[13:18:22] hmmm empty set of records?
[13:18:33] you mean if it lacks the record?
[13:19:05] yes
[13:19:39] I'm thinking of the case where we issue somemiscservice.wikimedia.org and it has to be able to find wikimedia.org's CAA
[13:19:50] it should check for somemiscservice.wikimedia.org's CAA first and get nothing, then move on
[13:20:13] one question I have is whether it should look for `org` itself having a CAA record
[13:20:35] well... I don't know what the RFC says regarding that
[13:20:46] RFC 6844 appears... self-conflicting on the subject
[13:20:56] " The search for a CAA record climbs the DNS name tree from the
[13:20:56] specified label up to but not including the DNS root '.'."
[13:20:57] but it would be a mess if .org. could set a default CAA policy for the whole *.org.
[13:21:07] That is to say, a TLD can have CAA set
[13:21:41] hmmm I'd personally exploit that to set a default CAA record denying certificate issuance from any CA
[13:21:44] But then it says: If X is not a top-level domain, then R(X) = R(P(X)), otherwise R(X) is empty.
[13:22:03] which is to say that a TLD cannot have CAA set
[13:22:19] I wonder what's the CA implementation regarding that
[13:22:25] let's check boulder source code
[13:22:34] I'm not going mad here am I?
[13:22:41] The RFC is self-contradictory?
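The rule quoted from RFC 6844 can be read as a simple climb: query the name itself, then each parent up to and including the TLD, and stop at the first non-empty CAA set. A minimal Python sketch of that reading follows, with alias (CNAME/DNAME) handling deliberately left out. The `lookup` callable, `FAKE_CAA` and its record values are hypothetical stand-ins so the logic runs offline; `include_tld=False` only approximates the acme-chief behaviour described in the log, where the `len(parts)-2` bound keeps TLD records from being recognised.

```python
from typing import Callable, Dict, List

# A lookup returns the CAA record values published at exactly `name`,
# or an empty list when there are none (hypothetical interface).
CAALookup = Callable[[str], List[str]]


def relevant_caa_set(fqdn: str, lookup: CAALookup, include_tld: bool = True) -> List[str]:
    """Return the first non-empty CAA set found climbing from fqdn towards the root.

    Reading of the RFC 6844 rule quoted in the log, without alias handling:
      if CAA(X) is non-empty, R(X) = CAA(X);
      otherwise, if X is not a TLD, R(X) = R(P(X));
      otherwise R(X) is empty.
    include_tld=False never queries the TLD (approximating acme-chief's loop).
    """
    labels = fqdn.rstrip(".").lower().split(".")
    # Most specific name first; the last candidate is the TLD itself.
    candidates = [".".join(labels[i:]) for i in range(len(labels))]
    if not include_tld:
        candidates = candidates[:-1]
    for name in candidates:
        records = lookup(name)
        if records:
            return records
    return []


# Hypothetical offline data: made-up records, not real zone contents.
FAKE_CAA: Dict[str, List[str]] = {
    "wikimedia.org": ['0 issue "letsencrypt.org"'],
    "org": ['0 issue ";"'],  # imagines a TLD-wide "deny every CA" policy
}


def fake_lookup(name: str) -> List[str]:
    return list(FAKE_CAA.get(name, []))


# somemiscservice.wikimedia.org has no CAA of its own, so wikimedia.org's set applies.
print(relevant_caa_set("somemiscservice.wikimedia.org", fake_lookup))
```

With dnspython, a name that exists but has no CAA records raises `dns.resolver.NoAnswer`, while a name that does not exist raises `dns.resolver.NXDOMAIN`; a resolver-backed `lookup` would probably need to map both to an empty list for the climb to continue.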
[13:23:18] oh wait
[13:23:35] P(X) is the name immediately above X in the hierarchy
[13:24:03] that means it cannot go above a TLD, but it will check a TLD for CAA records
[13:24:43] so, https://github.com/letsencrypt/boulder/blob/cc4ce59d7d7c43733027deb93cf870616d6de410/va/caa.go#L142-L153
[13:25:55] this comment is interesting: https://github.com/letsencrypt/boulder/blob/cc4ce59d7d7c43733027deb93cf870616d6de410/va/caa.go#L163-L167
[13:26:30] there is errata regarding that part of the RFC you were mentioning Krenair: https://www.rfc-editor.org/errata/eid5065
[13:28:28] that looks ok
[13:31:26] vgutierrez, approved
[13:31:50] yey \o/
[13:34:27] Traffic, Operations: Puppet broken on two VMs in the 'traffic' project - https://phabricator.wikimedia.org/T221454 (ema) p:Triage→Normal
[13:36:37] Traffic, Operations: Puppet broken on two VMs in the 'traffic' project - https://phabricator.wikimedia.org/T221454 (ema) Open→Resolved Fixed the former, deleted the latter. Thanks for the reminder!
[13:59:49] Traffic, Operations, Goal, Patch-For-Review: Replace Varnish backends with ATS on cache upload nodes in ulsfo - https://phabricator.wikimedia.org/T219967 (ops-monitoring-bot) Script wmf-auto-reimage was launched by ema on cumin1001.eqiad.wmnet for hosts: ` ['cp4021.ulsfo.wmnet'] ` The log can be...
[14:28:19] ema, cumin is unable to contact traffic-upload_ats.traffic.eqiad.wmflabs — does that VM just need deleting or should I mount a rescue effort?
[14:29:50] andrewbogott: I've created it a few minutes ago and it never got registered in DNS
[14:30:03] ema: oh, sorry :)
[14:30:14] I keep re-running this same query, must not have noticed this one was new
[14:30:20] andrewbogott: perhaps the underscore in the name made things go south?
[14:30:30] ema: it might just need another minute
[14:30:40] I'll check if it doesn't ever come up
[14:30:49] andrewbogott: thanks!
[14:33:14] ema: while you're here… is T221531 something you know how to handle?
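Host names are limited to letters, digits and hyphens (the LDH rule of RFC 952 as relaxed by RFC 1123), so `traffic-upload_ats` is not a valid host name label. The log never confirms that this is why the instance failed to get a hostname from DHCP; the sketch below is only a hypothetical pre-flight check one could run on a proposed instance name.

```python
import re
from typing import List

# One RFC 1123 host name label: 1-63 letters, digits or hyphens,
# starting and ending with a letter or digit. Underscores are not allowed.
_LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def invalid_hostname_labels(fqdn: str) -> List[str]:
    """Return the labels of fqdn that are not valid host name labels."""
    labels = fqdn.rstrip(".").split(".")
    return [label for label in labels if not _LDH_LABEL.fullmatch(label)]


# Only the first label of the instance name from the log is flagged.
print(invalid_hostname_labels("traffic-upload_ats.traffic.eqiad.wmflabs"))
```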
[14:33:15] T221531: Update RIPE about changes in WMCS auth servers - https://phabricator.wikimedia.org/T221531
[14:35:19] netops, Operations: Netbox report to validate network equipment data - https://phabricator.wikimedia.org/T221507 (herron) p:Triage→Normal
[14:40:01] ema: I'm not sure what's going on with traffic-upload_ats.traffic.eqiad.wmflabs; do you mind deleting and trying again?
[14:40:13] It looks like it failed to get a hostname from dhcp, but I don't see that failure happening anywhere else
[14:43:14] Traffic, Operations, Patch-For-Review, Performance-Team (Radar): Refactor public-facing DYNA scheme for primary project hostnames in our DNS - https://phabricator.wikimedia.org/T208263 (Cwek) I am not sure if it is influential, but I still have to report it. I'm from mainland China. As we all kno...
[14:44:48] heya folks, I’ve got a patch to change the scheduler on kibana.svc to source hash that I’d like to roll out in the near future. are there any timing considerations or concerns related to doing that?
[14:44:52] patch is at https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/504590/
[14:45:50] andrewbogott: busy right now, but I'll take a look later
[15:04:25] Traffic, Operations, Goal, Patch-For-Review: Replace Varnish backends with ATS on cache upload nodes in ulsfo - https://phabricator.wikimedia.org/T219967 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['cp4021.ulsfo.wmnet'] ` and were **ALL** successful.
[16:17:02] netops, Operations, fundraising-tech-ops, Patch-For-Review: Revoke production prometheus fundraising access - https://phabricator.wikimedia.org/T217355 (cwdent) a:cwdent
[16:34:27] netops, Operations, fundraising-tech-ops: Network setup for frmon2001.frack.codfw.wmnet - https://phabricator.wikimedia.org/T221475 (cwdent) commit 606f45371334528bbbd51a4daa17805f1fddd7e4 (HEAD -> master, origin/master, origin/HEAD) Author: Casey Dentinger Da...
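Assuming kibana.svc sits behind LVS/IPVS, "source hash" refers to a scheduler (IPVS calls it `sh`) that picks the real server from a hash of the client's source address, so a given client keeps reaching the same backend. The sketch below only illustrates that property using SHA-256 and a modulo; it is not the hash IPVS uses, and the backend names are hypothetical.

```python
import hashlib
import ipaddress
from typing import Sequence

# Hypothetical backend names, for illustration only.
HYPOTHETICAL_BACKENDS = ["kibana-backend-a", "kibana-backend-b", "kibana-backend-c"]


def pick_backend(client_ip: str, backends: Sequence[str]) -> str:
    """Deterministically map a client source address to one backend.

    Not IPVS's algorithm: just SHA-256 of the packed address, modulo the
    number of backends, so the same client always gets the same backend
    while the backend list stays unchanged.
    """
    if not backends:
        raise ValueError("no backends to choose from")
    packed = ipaddress.ip_address(client_ip).packed  # works for IPv4 and IPv6
    digest = hashlib.sha256(packed).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]


# Repeated requests from one client land on the same backend.
print(pick_backend("10.64.0.1", HYPOTHETICAL_BACKENDS))
```

In this naive modulo version, adding or removing a backend remaps most clients to a different server.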
[16:47:29] so we've got cp4021 reimaged as Varnish/ATS and it seems to be looking kind-of OK
[16:48:16] it is however still depooled as I haven't had the chance to look deeply at everything, and surely certain things are missing (like prometheus metrics not showing up in grafana yet)
[16:49:29] also, we assume that all nodes in cache::upload::nodes need to both be listed as backends for varnish-fe and be involved in ipsec shenanigans
[16:49:40] the latter isn't true anymore with ATS, so that needs to be fixed too
[16:49:49] (I've just ack'ed the alerts for now)
[16:50:23] but we're getting close! :)
[16:58:22] nice!
[18:50:29] netops, Analytics-Kanban, EventBus, Operations: Allow analytics VLAN to reach schema.svc.$site.wmnet - https://phabricator.wikimedia.org/T221690 (Ottomata)
[19:40:40] jynus: https://phabricator.wikimedia.org/T208263#5130752
[20:05:36] Sorry to ping twice in one day about this, but can anyone advise me about https://phabricator.wikimedia.org/T221531 ? I'm not sure if that falls under 'traffic team' or is someone else.
[20:10:07] XioNoX might know, but I'm pretty sure para.void was the person who last updated that.
[20:11:13] yeah, I think I can update that
[20:15:03] andrewbogott: good to edit https://apps.db.ripe.net/db-web-ui/#/lookup?source=ripe&key=56.15.185.in-addr.arpa&type=domain anytime?
[20:16:05] and both at once or one after the other?
[20:16:05] XioNoX: the new nameservers are up and synced. The forward authority for .wmflabs.org has been updated but has I think a 24h ttl
[20:16:53] XioNoX: and honestly you're probably in a better position to notice if it breaks than I am :) A good test is tools.wmflabs.org 185.15.56.5
[20:19:50] looks good to me, for both A and PTR
[20:20:57] oh, if you dig you get the new cloud-ns0 in the auth section?
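A PTR check for 185.15.56.5 exercises the 56.15.185.in-addr.arpa delegation because the address is looked up as the name 5.56.15.185.in-addr.arpa, which falls inside that zone. A small sketch using Python's standard `ipaddress` module, assuming an octet-aligned IPv4 delegation like the /24 named in the RIPE object:

```python
import ipaddress
from typing import Tuple


def reverse_zone_for(ip: str, prefix_len: int = 24) -> Tuple[str, str]:
    """Return (PTR owner name, enclosing reverse zone) for an IPv4 address.

    Only octet-aligned delegations (/8, /16, /24) are handled, like the /24
    behind 56.15.185.in-addr.arpa; classless (RFC 2317) setups are not.
    """
    if prefix_len not in (8, 16, 24):
        raise ValueError("only /8, /16 and /24 IPv4 delegations are handled")
    addr = ipaddress.IPv4Address(ip)
    ptr_name = addr.reverse_pointer  # octets reversed, under in-addr.arpa
    network_octets = str(addr).split(".")[: prefix_len // 8]
    zone = ".".join(reversed(network_octets)) + ".in-addr.arpa"
    return ptr_name, zone


print(reverse_zone_for("185.15.56.5"))
```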
[20:22:36] I did `host 185.15.56.5 cloud-ns1.wikimedia.org` and similar
[20:22:59] getting a RIPE error:
[20:22:59] Parent has nameserver(s) not listed at the child (cloud-ns0.wikimedia.org; cloud-ns1.wikimedia.org).
[20:22:59] None of the nameservers listed at the parent are listed at the child.
[20:23:05] looking at what it means exactly
[20:26:45] Maybe it wants us to update our end first XioNoX ?
[20:27:04] Right now pri.authdns.ripe.net serves this:
[20:27:05] 56.15.185.in-addr.arpa. 172800 IN NS labs-ns0.wikimedia.org.
[20:27:05] 56.15.185.in-addr.arpa. 172800 IN NS labs-ns1.wikimedia.org.
[20:27:13] Whereas we serve:
[20:27:14] 56.15.185.in-addr.arpa. 3599 IN NS labs-ns1.wikimedia.org.
[20:27:14] 56.15.185.in-addr.arpa. 3599 IN NS labs-ns2.wikimedia.org.
[20:27:14] 56.15.185.in-addr.arpa. 3599 IN NS labs-ns0.wikimedia.org.
[20:27:14] 56.15.185.in-addr.arpa. 3599 IN NS labs-ns3.wikimedia.org.
[20:27:46] Maybe to get RIPE to add cloud-ns0 we have to add the cloud-ns0 on our end etc.?
[20:29:39] Krenair: possibly, can you do it now?
[20:29:51] I can't but andrewbogott could probably
[20:30:42] I'm not 100% sure I know what you mean but let me look...
[20:31:50] um… ok, now I officially don't know how to do that. Is it something in our dns repo?
[20:32:54] no, this would be a setting in designate somewhere I think
[20:33:20] it might be referred to as a pool
[20:33:27] hm...
[20:34:22] Traffic, Operations, Patch-For-Review, Performance-Team (Radar): Refactor public-facing DYNA scheme for primary project hostnames in our DNS - https://phabricator.wikimedia.org/T208263 (CDanis) `authdns-update` complete as of ~20:33:56 UTC.
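RIPE's two messages amount to set comparisons between the nameservers submitted for the parent-side delegation and the NS records the child zone itself serves. The function below is a hypothetical reconstruction of that kind of check, not RIPE's actual code, fed with the NS sets pasted in the log: while the child still serves only labs-ns*, the two sets are disjoint, which produces both warnings.

```python
from typing import Dict, Iterable, List, Set, Union

# NS sets copied from the log (trailing dots as in the dig-style output).
PROPOSED_AT_RIPE = ["cloud-ns0.wikimedia.org", "cloud-ns1.wikimedia.org"]
SERVED_BEFORE = [
    "labs-ns1.wikimedia.org.", "labs-ns2.wikimedia.org.",
    "labs-ns0.wikimedia.org.", "labs-ns3.wikimedia.org.",
]
SERVED_AFTER = ["cloud-ns0.wikimedia.org.", "cloud-ns1.wikimedia.org."]


def _normalise(names: Iterable[str]) -> Set[str]:
    # DNS names compare case-insensitively; ignore the trailing root dot.
    return {name.lower().rstrip(".") for name in names}


def delegation_report(parent_ns: Iterable[str],
                      child_ns: Iterable[str]) -> Dict[str, Union[List[str], bool]]:
    """Compare the parent-side NS set with the NS set the child zone serves."""
    parent, child = _normalise(parent_ns), _normalise(child_ns)
    return {
        "parent_not_at_child": sorted(parent - child),
        "child_not_at_parent": sorted(child - parent),
        "disjoint": not (parent & child),
    }


print(delegation_report(PROPOSED_AT_RIPE, SERVED_BEFORE))
print(delegation_report(PROPOSED_AT_RIPE, SERVED_AFTER))
```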
[20:38:10] I can't find that set anywhere in designate, but I do need to update this pool anyway
[20:38:15] so will do that and see what shakes out
[20:41:30] At some point you presumably did it to add ns2 and ns3 andrewbogott
[20:42:18] possibly with novaadmin credentials, 'designate server-list', 'designate server-update', or the openstackclient equivalents?
[20:51:12] andrewbogott, possibly something under https://docs.openstack.org/designate/pike/admin/designate-manage.html
[20:51:23] yep, I upgraded that just now
[20:51:46] although I failed to check if ptr records were working properly before the change :(
[20:52:03] mitaka docs: https://docs.openstack.org/designate/mitaka/pools.html#designate-manage-pools-command-reference
[20:52:41] Krenair: do things look any different on your end?
[20:53:07] I changed the cloud-ns0 servers but the labs-ns0/ns1 servers don't know about it so if you have those cached you'll still see the old results
[20:53:24] yes:
[20:53:24] 56.15.185.in-addr.arpa. 3599 IN NS cloud-ns0.wikimedia.org.
[20:53:24] 56.15.185.in-addr.arpa. 3599 IN NS cloud-ns1.wikimedia.org.
[20:53:27] that looks righght
[20:53:34] right* excuse my keyboard
[20:53:42] oh, great
[20:53:50] ok, so now… XioNoX did that warning go away?
[20:54:03] It also looks fine when I query labs-ns*
[20:55:00] oh, that's right, they share a db
[20:55:06] so I updated it everywhere
[20:56:27] Your object has been successfully modified
[20:57:23] great! So now I just have to wait for the markmonitor ttl and we're ready to shut those down
[20:57:33] (and break those deployment-prep VMs)
[20:57:37] thank you XioNoX
[20:58:00] got another task I can close with this too
[20:58:14] Traffic, Operations, cloud-services-team (Kanban): Update RIPE about changes in WMCS auth servers - https://phabricator.wikimedia.org/T221531 (ayounsi) Open→Resolved a:ayounsi Updated. `lang=diff @@ -7,7 +7,7 @@ zone-c: WMF-RIPE -nserver: labs-ns0.wikimedia.org -nserver:...
[21:01:53] andrewbogott, note RIPE's TTL of 172800
[21:01:53] Traffic, Cloud-VPS, DNS, Operations, cloud-services-team (Kanban): Inconsistent lists of labs-ns* nameservers - https://phabricator.wikimedia.org/T205344 (Krenair) Open→Resolved a:Andrew With the shutting down of labs-ns* looming, to make {T221531} possible @andrew made a change w...
[21:02:15] that's 2 days
[21:04:34] Krenair: ok, so… Thursday
[21:05:24] at 20:56:27 UTC
[21:07:05] andrewbogott, it looks like the MarkMonitor change already went through, though I don't know when exactly
[21:07:50] it was this morning. So everything from them will be up-to-date by the time the RIPE ttl expires
[21:09:52] netops, Operations, ops-eqiad: Replace eqiad mgmt switches with EX4200s - https://phabricator.wikimedia.org/T213128 (ayounsi) Filed T221675 for the aggregation switches. I agree, it doesn't make sense to re-purpose such old gear into "production". I guess we're down to: 1/ Buy new EX2300, more expe...
[21:38:24] andrewbogott, you could still do the recursors
[21:38:47] also you could update labs-ns* A records to point to the new nameserver IPs, the TTL of that is only 1 hour
[21:39:27] though I'm not 100% sure everything would handle that as one might expect
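The "Thursday at 20:56:27 UTC" estimate is the RIPE modification time plus the 172800-second NS TTL, that is, 48 hours: a resolver that cached the old delegation just before the change may keep serving it until then, assuming it honours the TTL as published. The log does not give the calendar date, so the example uses a placeholder Tuesday (2019-04-23) purely to show the arithmetic.

```python
from datetime import datetime, timedelta, timezone

RIPE_NS_TTL = 172800  # seconds, TTL on the NS records served by pri.authdns.ripe.net
WMF_NS_TTL = 3599     # seconds, TTL on the NS records served by the child zone

# Placeholder date (the log only gives the time of day); 2019-04-23 is a Tuesday.
ripe_modified_at = datetime(2019, 4, 23, 20, 56, 27, tzinfo=timezone.utc)


def cache_expiry(changed_at: datetime, ttl_seconds: int) -> datetime:
    """Latest time a resolver that cached the old record just before the
    change could still serve it, assuming it honours the TTL unchanged."""
    return changed_at + timedelta(seconds=ttl_seconds)


print(timedelta(seconds=RIPE_NS_TTL))  # 2 days, 0:00:00
print(cache_expiry(ripe_modified_at, RIPE_NS_TTL).strftime("%A %H:%M:%S UTC"))
```

The same function applied to the child's 3599-second TTL gives an expiry just under an hour after the change, which is why the child-side update showed up almost immediately while the RIPE delegation needs the full two days.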