[08:33:35] Traffic, Beta-Cluster-Infrastructure, DNS, Operations, and 4 others: Ferm's upstream Net::DNS Perl library questionable handling of NOERROR responses without records causing puppet errors when we try to @resolve AAAA in labs - https://phabricator.wikimedia.org/T153468 (fgiunchedi)
[11:08:44] Certcentral: Provision unique LE accounts for each cercentral node - https://phabricator.wikimedia.org/T208212 (Vgutierrez)
[11:09:04] Certcentral: Provision unique LE accounts for each certcentral node - https://phabricator.wikimedia.org/T208212 (Vgutierrez) p:Triage>Normal
[11:44:54] Traffic, Operations: Varnish won't purge thumbnails of specific file - https://phabricator.wikimedia.org/T207615 (BBlack) Most likely, this is related to URI normalization rules (note %-encoded chars in the relevant titles) and/or the generation of purges at the origins (tracking known thumbnails for pur...
[15:28:46] afternoon vgutierrez
[15:28:50] oh okay you already merged the release
[15:29:02] hi Krenair
[15:29:53] Krenair: BTW, I've splitted the LE staging accounts here: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/470404/
[15:30:30] ok
[15:34:48] vgutierrez, we got the pinkunicorn cert issued by staging in the end right?
[15:35:02] yup
[15:35:18] but I'm going to get rid of it before upgrading to 0.4 in certcentral[12]001
[15:35:23] to check how it behaves
[15:35:55] it should be able to automatically solve the issues that we've seen so far
[15:36:44] great
[15:37:05] and then you'll set up accounts for LE Prod and get a test cert issued from that?
[15:37:11] indeed
[15:37:28] BTW, are you creating regular or annotated tags for cercentral releases?
[15:37:35] regular
[15:37:37] *certcentral, damn t
[15:37:41] ack
[15:37:44] need to do it for the new one actually, one sec
[15:37:53] oh, ok thx
[15:38:41] done
[15:46:24] * volans cough chough... annotated+signed... cough cough...
[15:46:51] * vgutierrez hands some water to volans
[16:13:47] volans, what would you like them annotated with?
[16:17:01] Krenair: lightweight tags on git are basically just pointers to a commit, while annotated tags save an object in git with author, message, optional GPG sign, etc...
[16:17:27] yeah
[16:17:31] do you want something in the message?
[16:17:32] so what I usually do for releases is to create an annotated-signed tag with the release version in the message something like
[16:17:38] and from git-tag documentation... https://git-scm.com/docs/git-tag -- Annotated tags are meant for release while lightweight tags are meant for private or temporary object labels.
[16:17:40] Release v0.1.2
[16:17:43] or similar
[16:17:49] uh ok
[16:17:56] for cumin I also set the comment to link the changlog file
[16:18:48] but that's a nitpick ;)
[16:19:35] even just the version as message might be enough I guess, just find an agreement between you and be consistent ;)
[16:33:03] netops, Operations, ops-eqiad: asw2-a-eqiad FPC7 faulty PEM0 - https://phabricator.wikimedia.org/T206972 (Cmjohnson) I boxed the broken pem and will be shipping it today
[17:18:35] Traffic, Operations, Performance-Team: Investigate using RFC 7838 Alternate Services to better optimize edge connections - https://phabricator.wikimedia.org/T208242 (BBlack)
[17:20:26] Traffic, Operations, Performance-Team: Investigate using RFC 7838 Alternate Services to better optimize edge connections - https://phabricator.wikimedia.org/T208242 (BBlack)
[17:48:15] netops, Cloud-VPS, Operations: ntp broken in new region - https://phabricator.wikimedia.org/T208244 (Dzahn)
[17:50:26] netops, Cloud-VPS, Operations: ntp broken in new region - https://phabricator.wikimedia.org/T208244 (Dzahn) @ayounsi @faidon are there router ACLs to allow udp/123 NTP and these don't have the new cloud IP ranges but do have old cloud IP ranges ?
[17:53:05] netops, Cloud-VPS, Operations, Patch-For-Review: ntp broken in new region - https://phabricator.wikimedia.org/T208244 (Andrew) I've attached patches that propose running a cloud-specific NTP server. I'd also be OK with changing the network ACLs to allow the new region to access the standard NTP...
[17:53:09] netops, Cloud-VPS, Operations, Patch-For-Review: ntp broken in new region - https://phabricator.wikimedia.org/T208244 (faidon) Can we set up a couple of NTP servers within VPS e.g. in the cloudinfra project instead? Should be just a couple of generic instances with `role::ntp` applied, right?
[17:56:12] netops, Cloud-VPS, Operations, Patch-For-Review: ntp broken in new region - https://phabricator.wikimedia.org/T208244 (Krenair) I just noticed profile::ntp also contains some ACLs of it's own which restrict source addresses and don't use ferm::service @faidon: Well, standard::ntp and profile::n...
[18:12:32] netops, Cloud-VPS, Operations, Patch-For-Review: ntp broken in new region - https://phabricator.wikimedia.org/T208244 (BBlack) Yeah I'd agree that's the direction we should go. We don't offer our ntp servers to the globe for good reasons, and we similarly probably shouldn't be offering them to W...
[18:13:34] netops, Cloud-VPS, Operations, Patch-For-Review: ntp broken in new region - https://phabricator.wikimedia.org/T208244 (BBlack) (or alternatively, we could look at this as one of the clear examples where a separate WMCS puppetization would be far simpler).
[18:13:37] bblack: hi! I am wondering if I could use interface::rps { $facts['interface_primary']: } as test for mc1035, the memcached shard that it is causing timeouts to mediawiki/mcrouter
[18:13:50] IIRC there was some work ongoing
[18:19:39] it's not a panacea for all network issues, but it can help in some situations!
[18:20:15] there's a few patches missing right now, though, before it would be wise to spread it very far
[18:20:26] because otherwise everyone who applied it before those patches will need manual runtime fixups
[18:22:22] ack, I'll wait then :)
[19:09:32] bblack: FYI, reboots of authdns1001/authdns2001 for kernel sec updates are forthcoming, Arzhel is handling the router redirection
[19:10:00] moritzm: ack, thanks
[20:01:49] Traffic, Operations: Refactor public-facing DYNA scheme for primary project hostnames in our DNS - https://phabricator.wikimedia.org/T208263 (BBlack) p:Triage>Normal
[20:02:08] XioNoX: something not working right with authdns routing?
[20:03:07] netops, Cloud-VPS, Operations, Patch-For-Review: ntp broken in new region - https://phabricator.wikimedia.org/T208244 (ayounsi) >>! In T208244#4703730, @Krenair wrote: > I just noticed profile::ntp also contains some ACLs of it's own which restrict source addresses and don't use ferm::service Goo...
[20:03:17] (re: !log redirect ns1 to authdns1001 - try 3)
[20:04:54] bblack: yeah, I figured it out. The backup route for ns1 on cr1/2-eqiad was still pointing to radon, so when I pointed ns1 to eqiad from codfw, traffic got blackholed
[20:05:53] netops, Cloud-VPS, Operations, Patch-For-Review: ntp broken in new region - https://phabricator.wikimedia.org/T208244 (Andrew) Running an ntp server or two on a cloud VM is probably not a big deal. But, before I go down that road... does anyone want to argue against us just using pool.ntp.org fo...
[20:07:02] ouch!
[20:40:30] netops, Cloud-VPS, Operations, Patch-For-Review: ntp broken in new region - https://phabricator.wikimedia.org/T208244 (BBlack) So, a few things I can say along those lines: * Production ultimately derives its clock sources from various pool.ntp.org sources (mediated by our per-DC server pools)....
[20:40:44] netops, Cloud-VPS, Operations, Patch-For-Review: ntp broken in new region - https://phabricator.wikimedia.org/T208244 (Krenair) >>! In T208244#4704231, @Andrew wrote: > what is the external source of ntp authority that the production NTP servers use? seems it's all 0.*.pool.ntp.org. for eqiad it...
[21:12:52] Traffic, Operations, Performance-Team (Radar): Investigate using RFC 7838 Alternate Services to better optimize edge connections - https://phabricator.wikimedia.org/T208242 (Imarlier) @BBlack We're moving this to our radar for now (on the assumption that we don't need to do the actual work), but woul...
[21:14:54] netops, Cloud-VPS, Operations, Patch-For-Review: ntp broken in new region - https://phabricator.wikimedia.org/T208244 (faidon) >>! In T208244#4704313, @BBlack wrote: > But then this all reminds me of something I should've thought about earlier: NTP and VMs haven't historically blended well anyway...
[21:19:50] I think it is KVM yeah bblack
[21:20:16] based on a hack I saw a few days ago checking $::virtual == 'kvm'
[21:23:32] seems it gets that from facter which gets it from virt-what
[21:26:23] krenair@deployment-deploy01:~$ /usr/lib/x86_64-linux-gnu/virt-what-cpuid-helper
[21:26:23] KVMKVMKVM
[22:25:47] netops, Operations, ops-codfw: codfw row C recable and add QFX - https://phabricator.wikimedia.org/T208272 (ayounsi) p:Triage>Normal
[22:47:15] netops, Operations, ops-codfw: codfw row C recable and add QFX - https://phabricator.wikimedia.org/T208272 (ayounsi)
[23:12:13] netops, Operations, ops-codfw: codfw row C recable and add QFX - https://phabricator.wikimedia.org/T208272 (ayounsi)