[06:20:35] netops, Analytics, Operations, LDAP: LDAP ldap-ro.eqiad.wikimedia.org not reachable from Analytics VLAN - https://phabricator.wikimedia.org/T227611 (elukey) From puppet I can see that the change for ldap-ro was reverted: ` elukey@notebook1003:~$ sudo grep ldap /var/log/puppet.log Jul 9 17:46:07...
[06:33:51] Traffic, MediaWiki-extensions-CentralAuth, Operations, TimedMediaHandler, and 3 others: Consistent HTTP 503 Error on some urls for some logged-in users (CentralAuth Set-Cookie storm) - https://phabricator.wikimedia.org/T226840 (TheDJ) Maybe an update to the class documentation to make it easier t...
[07:12:14] netops, Analytics, Operations, LDAP: LDAP ldap-ro.eqiad.wikimedia.org not reachable from Analytics VLAN - https://phabricator.wikimedia.org/T227611 (MoritzMuehlenhoff) There are two issues here: 1. We'll need to fix the ACLs so that the analytics VLAN can access the ldap-ro replicas, there's a w...
[07:36:00] netops, Analytics, Operations, LDAP: LDAP ldap-ro.eqiad.wikimedia.org not reachable from Analytics VLAN - https://phabricator.wikimedia.org/T227611 (elukey) About 1. ` elukey@re0.cr1-eqiad# show | compare [edit firewall family inet filter analytics-in4 term ldap from destination-address]...
[10:19:53] netops, Analytics, Operations, LDAP, Patch-For-Review: LDAP ldap-ro.eqiad.wikimedia.org not reachable from Analytics VLAN - https://phabricator.wikimedia.org/T227611 (elukey) I am a little bit lost with LDAP config, since we use: 1) ldap-labs.eqiad.wikimedia.org in Jupyterhub's config withou...
[10:56:41] HTTPS, Traffic, Operations, Goal, Patch-For-Review: Create a secure redirect service for large count of non-canonical / junk domains - https://phabricator.wikimedia.org/T133548 (Vgutierrez)
[14:52:32] so the last message from wikibugs here was at 10:56
[14:52:36] that can't be right
[14:58:45] ema: check https://www.mediawiki.org/wiki/Wikibugs
[14:59:21] I thought the bots were supposed to replace us
[14:59:45] it seems they're making us work more instead
[15:03:25] every single time there's a problem with wikibugs I look at that page and find that there's a problem with the page too
[15:03:32] ffs
[15:04:34] we need a wikiwikibugs
[15:09:21] Traffic, Operations: Upgrade Varnish to 5.1.3-1wm11 - https://phabricator.wikimedia.org/T227672 (ema)
[15:09:53] wikibugs: I'm happy to see you're pleased now
[17:40:05] bblack: are you the right person to ask about the ::dnsrecursor pdns-recursor config? I've got a bug report in Cloud VPS that I think I have convinced myself is due to pdns-recursor 4.0.4 and the default setting of "dnssec=process-no-validate" -- https://phabricator.wikimedia.org/T226088#5321746
[17:40:50] mostly what I'm wanting to chat with someone about is if the dnssec stuff is useful in the prod recursor or not
[17:50:39] It was easy enough to add a feature flag in Puppet. I'll add bblack to the review. :)
[18:08:30] bd808: yeah seems sane. Arguably we could/should turn it off for prod too, for now, but we haven't known of any direct issues from it so far.
[18:09:24] bblack: *nod* T227415 may be that prod confirmation
[18:09:24] T227415: Cite for PubMed article URLs or IDs triggers HTTP 400 error for ncbi.nlm.nih.gov - https://phabricator.wikimedia.org/T227415
[18:10:07] hmm yeah :)
[18:10:36] bblack: while I have you here though, should I change the patch to make this a complete no-op unless the param is explicitly set? Or do you think I can trust the docs on the default value?
[18:10:37] (although still, it's hard to say whether the real problem is on pdns-4.0's side or NIH's authservers without investigating deeply)
[18:11:16] yeah. I've found other reports online of nih.gov's authoritative servers being flakey
[18:11:29] bd808: actually, given the above and the warning about pdns-4.0 vs 4.1, I'd say default it off and leave it off for both prod and cloud.
[18:11:32] but DNSSEC in general is flakey last I knew :)
[18:11:51] bblack: ok! I can easily do that
[18:11:51] yeah
[18:12:06] maybe it will soon be a new entry in the long list at https://ianix.com/pub/dnssec-outages.html
[18:19:28] bblack: andrew was quick on the merge. I didn't get my change to set it off everywhere in before he merged. I can do a follow up to set DNSSEC off for the prod recursors too if you'd like.
[18:20:04] or... let's see if this fixes Cloud VPS space first actually
[18:23:24] sure
[18:24:28] I'll make the followup right quick
[18:27:16] probably good to keep your separate 'off' setting though. prod + wmcs may upgrade or change plans on this at different future times
[18:28:26] bd808: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/521921/
[18:28:28] lookups for eutils.wip.ncbi.nlm.nih.gov are working again inside Cloud VPS
[18:29:13] we may eventually move ours to Knot actually, but plans aren't imminent yet
[18:30:35] interesting. We could too I suppose if it is a better stack. We are "stuck" with pdns on the primary side to work with Designate I think, but the recursor could be anything
[18:34:50] we'll see how it goes. primary driver for us is going to be supporting DoH clients. we'll probably use Knot to build a separate public DoH resolver (not shared with infra resolver), but once we've done that, I feel like we may as well coalesce on a single recdns solution if we like it.
[18:35:07] it seems fairly high-quality all things considered (e.g. that it implements the kitchen sink)
[19:47:17] netops, Operations: RPKI Validation - https://phabricator.wikimedia.org/T220669 (ayounsi) > This has been fixed now in https://www.juniper.net/documentation/en_US/junos/topics/topic-map/bgp-origin-as-validation.html Good news, thanks. > Next step is to review/merge https://gerrit.wikimedia.org/r/c/52033...
[21:28:58] on cp4031: varnishmtail-backend.service: Failed
[21:29:08] just starting it doesn't cut it
[21:29:20] maybe mtail issues are known
[21:32:20] cp4031 varnishmtail-backend[12858]: flag provided but not defined: -logfds
[22:16:37] looks like mtail was upgraded (3.0.0~rc5-1~bpo9+1wmf1 -> 3.0.0~rc24.1-1+wmf1) and the flag is gone?
[22:20:16] sounds like that was the test host for the upgrade then
[22:40:05] https://phabricator.wikimedia.org/T225604#5322783