[08:14:29] Traffic, Analytics, Analytics-Kanban, Operations, and 2 others: Add Accept header to webrequest logs - https://phabricator.wikimedia.org/T170606 (JAllemandou) @Ottomata - After standup please, as today is kids-day for me :)
[12:40:34] Traffic, Operations, Patch-For-Review: puppetize http purging for ATS backends - https://phabricator.wikimedia.org/T204208 (ema) Open>Resolved a:ema Done, `profile::trafficserver::backend` now installs and configures vhtcpd.
[13:15:33] vgutierrez, so, anything else to do on the puppet patch?
[13:37:28] netops, Operations, fundraising-tech-ops: Qualys scans causing problematic pfw logspam - https://phabricator.wikimedia.org/T206431 (Jgreen) Open>Resolved a:Jgreen The underlying problem was that bellatrix was logging to the root partition rather than the /srv data partition as it should....
[13:52:04] Krenair: hmm build the certcentral package and upload it to our repo
[13:52:11] Krenair: so we need to merge the latest changes in the debian branch
[13:57:40] HTTPS, Traffic, Fundraising-Backlog, Operations: Re-evaluate use of EV certificates for payments.wm.o? - https://phabricator.wikimedia.org/T204931 (Jgreen) EV certs do seem to have lost almost all their value. That said the cost difference over an OV cert is under $100. Also, I'm not sure whose d...
[14:01:39] HTTPS, Traffic, Fundraising-Backlog, Operations: Re-evaluate use of EV certificates for payments.wm.o? - https://phabricator.wikimedia.org/T204931 (Krenair) >>! In T204931#4648564, @Liuxinyu970226 wrote: > @krenair please, no more DV certs, that's the reason why jawiki, ugwiki, wuuwiki, zhwiki, z...
[14:02:07] bblack, ^ is there any basis to what Liuxinyu is writing there?
[14:04:05] HTTPS, Traffic, Fundraising-Backlog, Operations: Re-evaluate use of EV certificates for payments.wm.o? - https://phabricator.wikimedia.org/T204931 (BBlack) >>! In T204931#4654860, @Krenair wrote: >>>! In T204931#4648564, @Liuxinyu970226 wrote: >> @krenair please, no more DV certs, that's the reas...
[14:05:31] vgutierrez, so I'm a little confused
[14:05:45] vgutierrez, did we need to have the latest changes covered by the 0.1 tag, or did they need to be on the debian branch?
[14:05:46] both?
[14:06:20] both
[14:06:24] AFAIK
[14:06:45] why both?
[14:07:35] it'd be nice if there was standardized signalling (especially a non-TOFU mechanism) to tell clients that SNI is useless and can be set to arbitrary values
[14:08:14] what's actually being packaged is the debian branch, and tags are useful to keep track in the repo of previous versions
[14:08:20] the whole functional purpose of SNI is to offer multiple different certs with different CNs/SNIs on the same IP address.
[14:09:10] it would be useful therefore, to declare e.g. "for all hostnames in the wikipedia.org domain, you don't need to send SNI to the server", (because on the server-side, all servers for hostnames within wikipedia answer with the same wildcard certificate matching them all)
[14:09:39] because if that were available, we could totally make that a constraint in our environment, and client browsers could just send random SNI data, or pick random unrelated legit domainnames to cycle through, etc.
[14:09:50] vgutierrez, alright well I updated the tag already
[14:10:09] vgutierrez, so we just need to merge stuff into debian
[14:10:40] nice, create the change and I'll approve it
[14:13:23] had to resolve some conflicts for some reason
[14:13:29] https://gerrit.wikimedia.org/r/465633
[14:14:00] nice.. let's mention version 0.1 in the commit message
[14:14:36] done
[14:14:37] maybe tack it on as an extension to HSTS? strict-transport-security: max-age=106384710; includeSubDomains; preload; nosni=604800 <- You can send meaningless SNI values for at least 1 week into the future.
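A rough sketch of how a client might read the header floated at [14:14:37], in Python. The nosni directive is hypothetical: it exists only in this conversation and is not part of HSTS as specified. The function names parse_sts and nosni_seconds are made up for illustration, and the parsing is deliberately simplified (no quoted-string edge cases).

```python
def parse_sts(header: str) -> dict:
    """Split a Strict-Transport-Security value into its directives."""
    directives = {}
    for part in header.split(";"):
        part = part.strip()
        if not part:
            continue
        name, sep, value = part.partition("=")
        # Directive names are case-insensitive; valueless ones map to True.
        key = name.strip().lower()
        directives[key] = value.strip().strip('"') if sep else True
    return directives


def nosni_seconds(header: str) -> int:
    """Lifetime of the HYPOTHETICAL nosni directive in seconds, 0 if absent or invalid."""
    value = parse_sts(header).get("nosni")
    if isinstance(value, str) and value.isascii() and value.isdigit():
        return int(value)
    return 0


# Header value taken from the message above.
example = "max-age=106384710; includeSubDomains; preload; nosni=604800"
print(parse_sts(example))
print(nosni_seconds(example))
```

A client following this idea would treat a non-zero result as permission to send arbitrary SNI values to that host for that many seconds, and fall back to normal SNI when the directive is missing.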
[14:15:43] plus if everyone sees how easy it is to get rid of SNI problems this way, maybe they'll all start dis-aggregating TLS cert:IP mappings (using separate server IPs for every cert)
[14:15:56] which will ramp up the consumption of IPv4 and force faster IPv6 adoption rates again :P
[14:17:23] maybe needs server-side code patches for e.g. nginx/apache/etc to ignore SNI when only one cert is present, but that's not hard.
[14:18:00] (they may already do it by default)
[14:30:27] Krenair: lets take this into account before merging the puppet patch: https://gerrit.wikimedia.org/r/#/c/operations/dns/+/465636/
[14:31:15] did we decide to do that in the end?
[14:31:39] I think not, but really it's all bikeshedding
[14:31:45] well
[14:32:27] vgutierrez: you left yesterday, we decided without you :-P
[14:32:35] I can fill you in in a bit, after switch stuff is over
[14:32:37] (vs putting certcentral1001.eqiad.wmnet in a hieradata variable for "which puppet fileserver name to use for the file source=>foo", and switching the hieradata with a commit if we need to switch)
[14:33:27] so layer8 issue here, sorry about that :)
[14:33:42] we've still got to use a hieradata variable
[14:33:56] don't want to hardcode wmnet stuff in there
[14:37:00] bblack: there are a few icinga warnings on all authdns
[14:37:02] Stale template error files present for '/var/lib/gdnsd/discovery-videoscaler.state'
[14:37:12] stuff like this ^
[14:37:12] ema: ignore
[14:37:21] those will be cleared by the script once we revert the TTL
[14:37:35] k
[14:37:39] I'm holding the last 2 steps maintenance and TTL revert
[14:37:56] the first because of load, the 2nd just in case we need to do something that involves those endpoints
[14:41:57] Oct 10 14:41:37 authdns1001 nrpe[153865]: Host 208.80.154.84 is not allowed to talk to us!
[14:42:30] icinga1001, I guess unrelated for now
[14:43:03] ema: those warnings should go away now (icinga time ofc)
[14:43:50] bblack: yeah not yet authorized AFAIK
[14:43:55] volans: confirmed, they're going away
[14:48:53] I don't think we have a lot to discuss for the upcoming meeting, I say we just cancel/async it
[14:49:19] there's a note from volans about the DNS checker script
[14:49:40] I'm here if needed :)
[14:49:56] which I think is a wonderful idea in general, but no I haven't made time to try to mentally vet whether every rule makes sense :)
[14:50:22] I assume there's currently still outstanding violations and we're not sure if they should be corrected or the rule relaxed?
[14:51:34] I think if we can resolve those (which means maybe making some internal-standards bikeshedding decisions, and/or just fixing obviously-wrong things in the data)
[14:51:54] bblack: ok, no meeting then?
[14:52:12] my first question would be, is there anything that blocks us from starting to merge it as just a script in the repo, not in CI
[14:52:14] then there's no reason to stall out on quibbles about whether the rules are perfect. If we find a legitimate use-case to break a rule in a future DNS commit and decide the rule needs relaxing, we can always fix it at that time. I'm sure it will have to evolve over time in general
[14:52:30] volans: as a script for manual use, no blocker I don't think
[14:52:33] and start using it on-demand on our laptops and slowly fix the errors (ignore the warnings for now)
[14:52:49] ema: ack, no meeting
[14:53:02] we already discovered a big misconfiguration in our DNS with the script: https://gerrit.wikimedia.org/r/#/c/operations/dns/+/451614/
[14:53:23] yeah
[14:53:31] * vgutierrez still waiting for a +1 there BTW
[14:53:32] ;P
[14:53:33] I'm sure it won't be the last of the fixups we find by far :)
[14:54:05] if it can help I can re-run the script on top of master and update the 2 phab pastes
[14:54:38] eh
[14:54:49] if we're just talking about merging the script for manual use, I say go for it
[14:55:06] the more complicated question is when we're happy enough with the rules (and the data is also clean against them) to turn it on for CI
[14:55:38] sure that's something for later ofc, right now it will fail all the time
[14:56:19] volans: I haven't looked at the code... is it fairly general, or more WMF-specific?
[14:57:00] because also, sometime in some future gdnsd release, I really want gdnsd to ship a separate zone data linting tool that does a lot of similar stuff, maybe with flags to turn on/off various checks, etc.
[14:57:23] so that I can kick lint-checking stuff out of the daemon's C source completely and have it just validate the bare minimum for "legal data that can be loaded at all"
[14:58:13] so at some point, that probably means writing a (?python?) script in gdnsd's repo that may have substantial overlap with the script you just wrote! :)
[14:58:35] bblack: it's kinda wmf-specific, ofc some things can be taken out
[14:58:41] right
[14:59:11] let me re-have a look at the code :D
[14:59:22] or maybe can start with that, and abstract a bit where the WMF-checker subclasses the same code and adds more checks, or whatever.
[15:00:00] or maybe... ship a not-wmf-specific extensible linter with gdnsd in python, and allow it to have configurable extra checks loaded from python code too
[15:00:04] the more general part is the parsing part, that in a gdnsd-specific script might not be needed as coming directly from gdnsd parsing
[15:00:18] gdnsd won't be parsing, the script's on its own
[15:00:30] gdnsd's parsing is in C, and the linter would be python or some scripting language
[15:00:59] ok
[15:01:38] (which yes, technically opens a validation hole: the two could differ on some crazy corner case and the linter says ok but gdnsd rejects, but in the bigger picture that's ok)
[15:01:43] but yeah it was a kinda-weekend project done in the middle of other things, not planned or anything, so if we want something nice will need quite some refactoring :)
[15:02:06] (because gdnsd won't fail or stop or wipe out the previous data if a zone fails to load, and it does inform the reloader of the failure)
[15:02:56] mostly I want the linter to validate higher-level properties that don't matter to correct authserver operation, but probably do matter in the sense of users making stupid data mistakes
[15:02:57] bblack: also this script is checking only the wmnet, wikimedia.org and ipv4/6 reverse zones
[15:03:04] (e.g. matching up forward/reverse)
[15:05:50] volans: for context: https://phabricator.wikimedia.org/P7658
[15:06:03] ^ those are all the non-fatal lint-like checks in the current C source, executed when loading a zone
[15:06:33] which are unnecessary complexity and slowdown in the daemon, and also painful to reasonably expand the set of checks to other things, hence the desire to kick it all out to a script
[15:06:52] hi XioNoX... meeting has been cancelled/made async
[15:07:04] indeed, interesting checks, we could add them too
[15:07:05] just in case you were waiting for us there
[15:07:17] ah, okay, yeah, was a bit lonely in that room :)
[15:07:34] sorry about that, I didn't realize you weren't here
[15:09:23] no problem, looks like freenode had issues overnight :)
[15:24:55] vgutierrez, so, package building?
[15:29:27] triggering the package building process in boron...
[15:29:40] netops, Operations, fundraising-tech-ops: icinga reports frbast2001.frack.eqiad.wmnet as host down - https://phabricator.wikimedia.org/T206637 (Jgreen)
[15:32:04] lintian looks happy
[15:32:06] W: certcentral source: newer-standards-version 4.2.1 (current is 3.9.8)
[15:32:17] that's ok
[15:32:19] yup
[15:32:25] lintian is just too old in boron
[15:35:02] Traffic, Operations, ops-eqiad: cp1076 hardware failure - https://phabricator.wikimedia.org/T206394 (Cmjohnson) @bblack is there any action item for me?
[15:35:07] so is debhelper
[15:36:14] Depends: init-system-helpers (>= 1.18~), python3-acme, python3-cryptography, python3-dnspython, python3-flask, python3-openssl, python3-requests, python3-yaml, python3:any (>= 3.3.2-2~), adduser
[15:36:20] that seems right :)
[15:46:10] HTTPS, Traffic, Fundraising-Backlog, Operations: Re-evaluate use of EV certificates for payments.wm.o? - https://phabricator.wikimedia.org/T204931 (Jgreen) >>! In T204931#4654860, @Krenair wrote: > > Presumably whoever would be responsible for purchasing a renewal has to consider this. It's one...
[15:49:13] HTTPS, Traffic, Fundraising-Backlog, Operations: Re-evaluate use of EV certificates for payments.wm.o? - https://phabricator.wikimedia.org/T204931 (BBlack) The kicker probably wouldn't be the monetary cost. It would be that if you didn't require EV, you could auto-issue certs from LetsEncrypt an...
[16:05:51] bblack, now I'm wondering, what exactly is the draw of an OV cert over DV?
[16:19:39] HTTPS, Traffic, Fundraising-Backlog, Operations: Re-evaluate use of EV certificates for payments.wm.o? - https://phabricator.wikimedia.org/T204931 (Jgreen) >>! In T204931#4655176, @BBlack wrote: > The kicker probably wouldn't be the monetary cost. It would be that if you didn't require EV, you c...
[16:35:53] HTTPS, Traffic, Fundraising-Backlog, Operations: Re-evaluate use of EV certificates for payments.wm.o? - https://phabricator.wikimedia.org/T204931 (cwdent) @BBlack I have been exploring options and it sounds like the DNS TXT record challenge would allow us to issue certs without disturbing the hos...
[16:48:25] HTTPS, Traffic, Fundraising-Backlog, Operations: Re-evaluate use of EV certificates for payments.wm.o? - https://phabricator.wikimedia.org/T204931 (Krenair) That is being set up for prod at the moment actually, but it relies on trusted servers SSHing to prod auth DNS machines. I'm not sure frack...
[17:17:45] Traffic, Gerrit, Operations, Patch-For-Review: Enable avatars in gerrit - https://phabricator.wikimedia.org/T191183 (greg) >>! In T191183#4653436, @thcipriani wrote: > This is probably something we should enforce somehow (jenkins? some tool to be created to upload?) before exposing this feature b...
[17:21:31] HTTPS, Traffic, Fundraising-Backlog, Operations: Re-evaluate use of EV certificates for payments.wm.o? - https://phabricator.wikimedia.org/T204931 (Jgreen) >>! In T204931#4655430, @Krenair wrote: > How complex is the payments site? Is it possible to do http challenges there? Off the top of my he...
[17:47:22] HTTPS, Traffic, Fundraising-Backlog, Operations: Re-evaluate use of EV certificates for payments.wm.o? - https://phabricator.wikimedia.org/T204931 (Krenair) >>! In T204931#4655504, @Jgreen wrote: >>>! In T204931#4655430, @Krenair wrote: >> How complex is the payments site? Is it possible to do ht...
[17:49:33] HTTPS, Traffic, Fundraising-Backlog, Operations: Re-evaluate use of EV certificates for payments.wm.o? - https://phabricator.wikimedia.org/T204931 (Jgreen) >>! In T204931#4655857, @Krenair wrote: > > Oh wow, okay - I was expecting you to say it was behind LVS or something but not that. Ha, well...
[17:54:55] netops, Operations, fundraising-tech-ops: icinga reports frbast2001.frack.eqiad.wmnet as host down - https://phabricator.wikimedia.org/T206637 (Jgreen) Open>Resolved a:Jgreen This is fixed. - fix nagios_nsca.conf in prod puppet for frbast2001's new IP - fix modules/network/data/data.yaml...
[18:08:07] hi. if i check the SOA serial numbers on ns0/ns1/ns2 i am getting different results for: wikipedia.is
[18:08:37] the .IS registry apparently checks this and complains about it.. then tells MarkMonitor
[18:08:52] and they tell WMF Legal.. and they send emails to me.. (great workflow)
[18:09:11] looks like i can confirm what they are saying.. the numbers are not in sync
[18:09:25] they say if we don't fix it within 6 weeks they are suspending the domain
[18:09:42] i will open a ticket for that
[18:21:32] Traffic, Operations: SOA serial numbers returned by authoritative nameservers differ - https://phabricator.wikimedia.org/T206688 (Dzahn)
[18:22:20] Traffic, Operations: SOA serial numbers returned by authoritative nameservers differ - https://phabricator.wikimedia.org/T206688 (Dzahn)
[18:26:07] Domains, Traffic, Operations: SOA serial numbers returned by authoritative nameservers differ - https://phabricator.wikimedia.org/T206688 (Dzahn)
[19:05:22] netops, Operations: Intermittent connectivity issues in eqiad's row C - https://phabricator.wikimedia.org/T201139 (ayounsi) There are 2 parallel issues here. 1/ IPv6 neighbor discovery randomly broken when igmp-snooping is enabled. This has been worked-around by disabling igmp-snooping yesterday T201039...
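The manual SOA serial comparison described at [18:08:07] could be sketched roughly as follows, in Python. This only groups already-collected answers by serial; how the answers are gathered (for instance one `dig +short SOA <zone> @<nameserver>` per server) is left out. The nameserver names and serial values below are invented for illustration and are not the values seen for wikipedia.is, and the helper names soa_serial and group_by_serial are hypothetical. Whether a mismatch actually matters is a separate question that the code does not answer.

```python
def soa_serial(rdata: str) -> int:
    """Extract the serial (third field) from SOA rdata in presentation format.

    Expected layout: mname rname serial refresh retry expire minimum
    """
    fields = rdata.split()
    if len(fields) != 7:
        raise ValueError(f"expected 7 SOA fields, got {len(fields)}")
    return int(fields[2])


def group_by_serial(answers: dict) -> dict:
    """Map each serial to the sorted list of nameservers that returned it."""
    groups = {}
    for nameserver, rdata in answers.items():
        groups.setdefault(soa_serial(rdata), []).append(nameserver)
    return {serial: sorted(names) for serial, names in groups.items()}


# Invented sample answers, one SOA rdata string per nameserver.
answers = {
    "ns0.example.org": "ns0.example.org. hostmaster.example.org. 2018101001 43200 7200 1209600 3600",
    "ns1.example.org": "ns0.example.org. hostmaster.example.org. 2018101001 43200 7200 1209600 3600",
    "ns2.example.org": "ns0.example.org. hostmaster.example.org. 2018100901 43200 7200 1209600 3600",
}

groups = group_by_serial(answers)
in_sync = len(groups) <= 1
for serial, names in sorted(groups.items()):
    print(serial, ", ".join(names))
print("in sync" if in_sync else "serials differ")
```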
[19:17:30] Traffic, Fundraising-Backlog, Operations, fundraising-tech-ops: SSL cert for links.email.wikimedia.org - https://phabricator.wikimedia.org/T188561 (CCogdill_WMF) @Jgreen @BBlack thanks for bumping this and continuing to check. It does look like the SSL rating has bumped up to an A: https://www.ss...
[19:42:38] Domains, Traffic, Operations: SOA serial numbers returned by authoritative nameservers differ - https://phabricator.wikimedia.org/T206688 (BBlack) SOA Serial values only have meaning to the administrators of a zone, and to servers with which they authorize legacy zone transfers. The registrar is nei...
[20:16:56] https://blogs.dropbox.com/tech/2018/10/dropbox-traffic-infrastructure-edge-network/
[20:17:44] and matching HN link: https://news.ycombinator.com/item?id=18186503
[20:18:18] " https://github-debug.com/ and https://www.fastly-debug.com/ and https://dropbox-debug.com/ " is pretty cool, a more automated version of https://wikitech.wikimedia.org/wiki/Reporting_a_connectivity_issue :)
[20:32:19] Traffic, Gerrit, Operations, Patch-For-Review: Enable avatars in gerrit - https://phabricator.wikimedia.org/T191183 (Paladox) We won't be able to use git lfs on cobalt as it is using jessie whereas the git-lfs package is in stretch-backports+. We could enforce it so users can only upload there i...
[20:38:33] http://s2geometry.io/ is really neat (from dropbox's CDN article)
[20:38:38] I might have some uses for it!
[21:32:50] netops, Operations: Enable access from icinga1001 to mgmt interfaces - https://phabricator.wikimedia.org/T206704 (colewhite)
[21:37:07] heh dropbox's CDN article includes a lot of the stuff we've talked about doing ourselves
[21:58:28] netops, Operations: Enable access from icinga1001 to mgmt interfaces - https://phabricator.wikimedia.org/T206704 (ayounsi) Open>Resolved a:ayounsi Management firewall policies updates.
[22:20:36] Traffic, Gerrit, Operations, Patch-For-Review: Enable avatars in gerrit - https://phabricator.wikimedia.org/T191183 (greg) >>! In T191183#4656413, @Paladox wrote: > We won't be able to use git lfs on cobalt as it is using jessie whereas the git-lfs package is in stretch-backports+. That's simply...
[23:15:43] Traffic, Gerrit, Operations, Patch-For-Review: Enable avatars in gerrit - https://phabricator.wikimedia.org/T191183 (Tgr) UX-wise a single central place for profile images is obviously preferable, so using Phabricator makes a lot of sense. (Having some way to store a profile image in your Wikimed...