[01:22:59] From the place I'm at right now, en.wikipedia.org GeoDNS sends me to eqiad, while if I do the `gdnsd_geoip_test` for my IP (63.140.88.225), or my ISP's resolver (216.67.0.2), they both say ulsfo first. I have a ~60ms latency difference between eqiad/ulsfo. According to https://www.highspeedinternet.com/ak, Alaska Communications is the #1 ISP in Alaska, so there is a risk of most of Alaska having a sub-optimal latency.
[01:25:56] bblack: if there isn't more that we can do on our side, we should probably send an email to the contacts listed on https://www.peeringdb.com/net/1426 ?
[08:10:35] netops, Operations: IPv6 packet loss registered by the Ripe Atlas anchor in eqsin - https://phabricator.wikimedia.org/T228015 (elukey) p:Triage→High
[08:10:45] Traffic, Operations, Goal, Patch-For-Review, User-fgiunchedi: Deprecate python varnish cachestats - https://phabricator.wikimedia.org/T184942 (fgiunchedi)
[08:13:54] netops, Operations: IPv6 packet loss registered by the Ripe Atlas anchor in eqsin - https://phabricator.wikimedia.org/T228015 (elukey)
[08:46:49] netops, Operations: IPv6 packet loss registered by the Ripe Atlas anchor in eqsin - https://phabricator.wikimedia.org/T228015 (elukey) From my home ipv6 address (removed the first hops): ` [..] 6. AS6939 100ge9-2.core1.par2.he.net 0.0% 10 46.0 49.6 40.9 67.5 9.2 7. AS6939...
[09:49:24] bblack: in https://gerrit.wikimedia.org/r/c/operations/dns/+/523114/1/templates/ncredir-parking is CNAMEing to ncredir-lb.wikimedia.org. the way to go, or would it be better to use the good old IN DYNA geoip!ncredir-addrs?
[10:19:46] Traffic, Operations: Wikipedia is unavailable on Symbian phone's browsers - https://phabricator.wikimedia.org/T227828 (ema) p:Triage→Normal
[10:44:43] Traffic, Operations: ATS: log mode cannot depend on log filters being configured - https://phabricator.wikimedia.org/T224397 (Vgutierrez) Open→Resolved a:Vgutierrez
[12:20:42] XioNoX: you might check a lookup on "reflect.wikimedia.org" from that network if you're still there, it will tell you how the authdns really sees the client or recursor IP
[12:24:12] vgutierrez: I could make several bikesheddy arguments for either direction really, but I guess CNAMEs like you have it now probably have slight advantages.
[12:24:42] ack
[15:06:43] bblack: reflect.wikimedia.org has address 162.158.105.159, which is a cloudflare IP
[15:06:57] wait no
[15:07:03] I changed my DNS
[15:09:04] reflect.wikimedia.org has address 216.67.109.147
[15:09:36] which is only geolocated to "USA" in maxmind's website
[15:19:13] right
[15:19:30] for US (or North america and other such enclosures), if we have no more-specific info eqiad is the defalut
[15:19:34] *default
[15:20:05] bblack: next step is to email the ISP I guess?
[15:20:40] cloudflare has claimed before they aim to keep their dns cache exit IPs well-located in e.g. maxmind specifically to help with these issues (because they choose not to use edns-client-subnet in the name of privacy)
[15:21:06] you can send updates directly to maxmind as well
[15:21:21] https://support.maxmind.com/geoip-data-correction-request/
[15:21:47] yeah, they ask for my email, so I wasn't sure I'm legitimate in filling it
[15:21:49] (or ask the ISP to update maxmind for all their stuff, I guess)
[15:22:00] they haven't in the past cared if 3rd parties send in legit updates for others
[15:22:43] haven't cared as in didn't do anything with the request? or acted on the request regardless?
[15:22:57] yeah I guess that was ambiguous :)
[15:23:12] I mean, they do accept 3rd party updates that seem legitimate, and have acted on them in the past.
[15:23:22] ah cool, will do it then
[15:23:58] seems like some IPs in their range 216.67.0.0/17 properly geolocate to Alaska
[15:24:31] 216.67.0.0/17 -> Alaska "The location you submitted for correction already matches the location in both our GeoIP Legacy and GeoLite databases."
[15:25:22] and even if I do it with the specific 216.67.109.147
[15:25:41] maybe it's corrected upstream but hasn't reached us in an update yet?
[15:27:05] but it's not correct on maxmind's online lookup
[15:27:18] maybe there is some lag there too
[15:27:41] I added https://wikitech.wikimedia.org/wiki/DNS#Know_which_IP_the_AuthDNS_is_seeing_a_query_from to the doc too, I didn't know about reflect.wikimedia.org
[15:36:36] sent an email to maxmind as the bug seems to be in their DB
[15:49:04] thanks!
[15:49:46] < X-SSL-Reused: 0
[15:49:46] < X-SSL-Cipher: ECDHE-ECDSA-AES256-GCM-SHA384
[15:49:54] ATS reporting TLS stats via HTTP headers \o/
[15:54:36] nice!
[15:55:25] yeah.. hopefully I'll submit a PR to upstream tomorrow
[16:50:59] < X-SSL-Reused: 0
[16:50:59] < X-SSL-Protocol: TLSv1.2
[16:50:59] < X-SSL-Cipher: ECDHE-ECDSA-AES256-GCM-SHA384
[16:50:59] < X-SSL-Curve: X25519
[16:51:20] looking good :)
[16:51:52] I think we can implement proxy_set_header X-Connection-Properties "H2=$h2; SSR=$session_reused; SSL=$ssl_protocol; C=$ssl_cipher; EC=$ssl_ecdhe_curve;"; in ATS
[16:51:55] .D
[16:51:56] :D
[16:54:07] nice!
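(Editor's sketch, not part of the log.) The X-Connection-Properties value proposed at 16:51:52 packs several `KEY=value;` pairs into one string. As a rough illustration of how a consumer might split that single string back into separate fields, here is a minimal Python sketch. The function name `parse_connection_properties` is hypothetical, and the `H2=1` value in the sample is an assumption; the remaining sample values are taken from the X-SSL-* headers pasted above.

```python
def parse_connection_properties(value: str) -> dict:
    """Split a 'K=V; K=V;' header value into a dict of strings.

    Hypothetical helper: assumes keys and values never contain ';'
    and that each pair uses the first '=' as its separator.
    """
    fields = {}
    for part in value.split(";"):
        part = part.strip()
        if not part:
            continue  # skip the empty piece after the trailing ';'
        key, sep, val = part.partition("=")
        if not sep:
            continue  # ignore malformed pieces that lack '='
        fields[key.strip()] = val.strip()
    return fields


# Sample value: H2=1 is assumed; the rest mirrors the headers in the log.
example = "H2=1; SSR=0; SSL=TLSv1.2; C=ECDHE-ECDSA-AES256-GCM-SHA384; EC=X25519;"
props = parse_connection_properties(example)
print(props)
```

Splitting on `;` and `=` avoids a regex entirely, which may matter if the string is parsed at high volume; sending the fields separately would remove the parsing step altogether.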
[16:54:25] and then yeah perhaps we'll talk yo analytics about whether/how best we could export that to webrequest usefully
[16:54:55] (I think that may turn out a lot better than prometheus for this, but some Q's about whether we send them a single string we can regex or it's better to send them the ~4 fields separately)
[16:55:04] s/yo/to/
[17:16:20] if you care about correlating what sources of traffic (geo, UAs, etc) are using which ciphers, with granularity beyond a handful of options, prometheus won't be good for that :)
[17:20:07] yup.. IMHO we should submit that info to analytics
[17:20:43] it was rejected in the past by them, at least to append that data to webrequest
[19:26:42] netops, Analytics, Operations, hardware-requests, and 2 others: Upgrade kafka-jumbo100[1-6] to 10G NICs (if possible) - https://phabricator.wikimedia.org/T220700 (wiki_willy) a:Cmjohnson
[19:33:39] Traffic, Operations, ops-esams: cp3035 PS Redundancy Lost - https://phabricator.wikimedia.org/T225035 (wiki_willy) Server will be refreshed in late Q1 / early Q2, along with a hardware refresh of the entire site.
[20:42:48] Traffic, DC-Ops, Operations, ops-codfw: (OoW) lvs2006 Embedded Flash/SD-CARD iLO errors - https://phabricator.wikimedia.org/T192082 (wiki_willy)
[20:42:59] Traffic, DC-Ops, Operations, ops-codfw: (OoW) lvs2006 Embedded Flash/SD-CARD iLO errors - https://phabricator.wikimedia.org/T192082 (wiki_willy) a:Papaul
[20:46:18] Traffic, Operations, ops-codfw, Patch-For-Review: (OoW) lvs2006 crashed into (what it seems) an unrecoverable state - https://phabricator.wikimedia.org/T209337 (wiki_willy)
[20:46:31] Traffic, Operations, ops-codfw, Patch-For-Review: (OoW) lvs2006 crashed into (what it seems) an unrecoverable state - https://phabricator.wikimedia.org/T209337 (wiki_willy) a:Papaul
[20:59:57] netops, Operations: IPv6 packet loss registered by the Ripe Atlas anchor in eqsin - https://phabricator.wikimedia.org/T228015 (ayounsi) Seems like HE in eqsin is having a bad time. I depref all AS paths that go through HE and packet loss stopped. Emailed HE's NOC.
[21:01:34] netops, Operations: mr1-eqsin.oob IPv6 connectivity flapping - https://phabricator.wikimedia.org/T227967 (ayounsi) > So far I don't think there is a link between the ripe alerts and the oob alerts. Well, seems like they are, as the return path from mr1 -> icinga1001 goes through HE, nothing we can do th...
[21:09:10] netops, Operations: IPv6 packet loss registered by the Ripe Atlas anchor in eqsin - https://phabricator.wikimedia.org/T228015 (ayounsi) Open→Resolved a:ayounsi They were very quick to reply and fix the issue. >RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK
[21:12:21] netops, Operations: mr1-eqsin.oob IPv6 connectivity flapping - https://phabricator.wikimedia.org/T227967 (ayounsi) Open→Resolved Seems like fixing T228015 fixed that issue as well.
[22:14:01] netops, Operations, ops-codfw: Cable mr1-codfw<->cr1/2-codfw through asw-a-codfw - https://phabricator.wikimedia.org/T228112 (ayounsi) p:Triage→Normal
[22:14:15] netops, Operations, ops-codfw: Setup new msw1-codfw - https://phabricator.wikimedia.org/T224250 (ayounsi)
[22:14:18] netops, Operations, ops-codfw: Cable mr1-codfw<->cr1/2-codfw through asw-a-codfw - https://phabricator.wikimedia.org/T228112 (ayounsi)
[22:15:45] netops, Operations, ops-eqiad: (Need By: Sept 30) upgrade msw1-eqiad from EX4200 to EX4300 - https://phabricator.wikimedia.org/T225121 (ayounsi)
[22:15:49] netops, Operations, ops-codfw: Setup new msw1-codfw - https://phabricator.wikimedia.org/T224250 (ayounsi)
[22:19:43] netops, Operations, ops-codfw: Cable mr1-codfw<->cr1/2-codfw through asw-a-codfw - https://phabricator.wikimedia.org/T228112 (ayounsi)
[22:37:04] netops, Operations: Cleanup confed BGP peerings and policies - https://phabricator.wikimedia.org/T167841 (ayounsi) > Finally, the more user-visible issue that we have right now is that we're underutilizing eqord: we currently do not announce our supernets from eqord. The reason for this is that I hadn't...
[23:42:30] Traffic, Operations, Patch-For-Review, discovery-system, services-tooling: Figure out a security model for etcd - https://phabricator.wikimedia.org/T97972 (CDanis) I think we likely want to revisit this. * Right now the `guest` user has access to `/eventlogging` which I don't think we actual...