[11:35:49] <wikibugs>	 Traffic, Data-Engineering-Radar, SRE: Lock-in Varnish and VarnishKafka versions - https://phabricator.wikimedia.org/T304617 (jbond) p:Triage→Medium
[11:54:57] <jinxer-wm>	 (EdgeTrafficDrop) firing: 62% request drop in text@drmrs during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=drmrs&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DEdgeTrafficDrop
[11:59:57] <jinxer-wm>	 (EdgeTrafficDrop) resolved: 62% request drop in text@drmrs during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=drmrs&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DEdgeTrafficDrop
[12:00:18] <_joe_>	 uhhh what was that?
[12:00:44] <_joe_>	 ah we actually had a spike of requests
[12:54:52] <bblack>	 in general, drmrs's traffic volume is low enough to make that EdgeTrafficDrop thing unreliable/flaky (known issues)
[14:57:52] <XioNoX>	 laptop:~$ host en.wikipedia.org
[14:57:52] <XioNoX>	 en.wikipedia.org is an alias for dyna.wikimedia.org.
[14:57:52] <XioNoX>	 dyna.wikimedia.org has address 91.198.174.192
[14:57:58] <XioNoX>	 bblack: ^
[14:58:06] <XioNoX>	 from France of course
[14:58:09] <bblack>	 :)
[14:59:00] <XioNoX>	 host reflect.wikimedia.org
[14:59:00] <XioNoX>	 reflect.wikimedia.org has address 145.100.185.15
[14:59:06] <bblack>	 XioNoX: if you do "host reflect.wikimedia.org" it will give some insight on what IP address our geoip is seeing (likely a recursor exit)
[14:59:11] <bblack>	 doh, you typed faster :)
[14:59:22] <XioNoX>	 looks like something in NL
[14:59:35] <XioNoX>	 that's from the coworking space I'm in
[15:00:29] <bblack>	 yeah
[15:00:41] <XioNoX>	 the set DNS to 8.8.8.8
[15:00:45] <XioNoX>	 they*
[15:00:50] <bblack>	 this can confirm the other part of it too:
[15:00:56] <bblack>	 bblack@dns1002:~$ gdnsd_geoip_test generic-map 145.100.185.15 2>/dev/null
[15:00:59] <bblack>	 generic-map => 145.100.185.15/10 => esams, eqiad, codfw, ulsfo, eqsin
[15:01:30] <XioNoX>	 https://wikitech.wikimedia.org/wiki/DNS#Know_which_IP_the_AuthDNS_is_seeing_a_query_from
[15:01:48] <XioNoX>	 and the block above
[15:02:04] <bblack>	 ah yeah, nice
[15:02:10] <XioNoX>	 I'm a bit surprised that 8.8.8.8 in France exits in the NL
[15:02:25] <elukey>	 TIL reflect.w.o :)
[15:02:32] <bblack>	 probably depends a bit on whatever ISP is supplying the co-working space
[15:03:10] <bblack>	 maybe they're regional and they get all their upstream access out of NL or something
[15:04:01] <XioNoX>	 laptop:~$ host reflect.wikimedia.org 8.8.8.8 -> reflect.wikimedia.org has address 217.128.133.0
[15:05:15] <bblack>	 yeah that maps to drmrs
[15:05:43] <bblack>	 I wonder why you get a different answer when specifying 8.8.8.8 directly?
[15:05:50] <XioNoX>	 yeah, no idea :)
[15:46:01] <wikibugs>	 netops, Infrastructure-Foundations, SRE, Sustainability (Incident Followup): Add linecard diversity to the router-to-router interconnect in codfw - https://phabricator.wikimedia.org/T248506 (ayounsi)
[16:07:04] <btullis>	 Quick question: When you depool a cp-* server you currently either use `confctl` on a puppetmaster or `depool` on the host itself, is that right? No cookbook at the moment.
[16:10:02] <btullis>	 I'm asking because I've drafted a new Alertmanager check for varnishkafka throughput (https://gerrit.wikimedia.org/r/c/operations/alerts/+/773801) T300246 - but I think the alert will trigger when hosts are intentionally depooled. I was wondering about integrating a 'create silence' in Alertmanager or some other way of preventing this from firing.
[16:10:03] <stashbot>	 T300246: Add alert for varnishkafka low/zero messages per second to alertmanager - https://phabricator.wikimedia.org/T300246
[16:19:06] <godog>	 IIRC pooled status (as seen by etcd/conftool) isn't in prometheus yet as a metric (it is from pybal though IIRC), the former should be simple enough to add these days when/if needed
[16:19:37] <cdanis>	 +1 to putting pooledness in prometheus
[16:19:45] <elukey>	 +1 would be really cool
[16:20:34] <cdanis>	 is it confd that maintains what's on https://config-master.wikimedia.org/ ?
[16:20:44] <cdanis>	 could write a template in node_exporter textfile format ;)
[16:23:05] <btullis>	 Oh yeah, that would be a very neat solution.
[16:23:09] <godog>	 IIRC yeah that's it, also yes it'd be textfile indeed
[16:23:09] <wikibugs>	 Traffic, DC-Ops, SRE, ops-eqiad: cp1090.mgmt ssh port not accessible - https://phabricator.wikimedia.org/T304589 (Cmjohnson) Open→Resolved a:Cmjohnson re-seated the mgmt cable. no issues logging into mgmt interface  root@cp1090.mgmt.eqiad.wmnet's password: /admin1->
[16:23:38] <godog>	 a slight variation on the theme for "reasons" but I did the work already with "mini textfile exporter" for the network probes
[16:23:52] <godog>	 basically because we need to be able to write an arbitrary "instance" label
[16:24:03] <godog>	 anyways that's a detail, point being that it should be easy
[16:37:03] <btullis>	 godog: Thanks. I have tagged you on the ticket and the patch. Feel free to let me know if I can help implement the pooled/depooled metric.
[16:37:08] <topranks>	 godog: nice 
[16:37:11] <topranks>	 which network probes are those?
[16:38:30] <godog>	 topranks: for now the work I did at https://phabricator.wikimedia.org/T291946 though possibly any network level check
[16:38:52] <godog>	 btullis: for sure, I don't have the bandwidth to implement the metric but happy to assist/brainstorm
[16:39:25] <godog>	 btullis: my understanding is that it should be a variation of what confd does on config-master as cdanis was pointing out
[16:39:50] <topranks>	 godog: super I'll dig in and check it out :)
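
Editor's note: the idea discussed above (publishing pooled/depooled state as a Prometheus metric via a textfile-style exporter) could look roughly like the sketch below. It is only an illustration, not the implementation used on config-master or in the "mini textfile exporter". The metric name `conftool_pooled`, the `host` label, the function names, and the input mapping of host to pooled state are all hypothetical; the state values "yes"/"no"/"inactive" are assumed here for the example.

```python
from typing import Mapping


def escape_label_value(value: str) -> str:
    """Escape a label value for the Prometheus text exposition format."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_pooled_metrics(
    states: Mapping[str, str],
    metric: str = "conftool_pooled",  # hypothetical metric name
) -> str:
    """Render host pooled states as gauge samples (1 = pooled, 0 = not pooled).

    `states` maps a hostname to an assumed state string such as
    "yes", "no", or "inactive"; anything other than "yes" is rendered as 0.
    """
    lines = [
        f"# HELP {metric} 1 if the host is pooled, 0 otherwise.",
        f"# TYPE {metric} gauge",
    ]
    # Sort hosts so the output is stable between runs.
    for host in sorted(states):
        value = 1 if states[host] == "yes" else 0
        lines.append(f'{metric}{{host="{escape_label_value(host)}"}} {value}')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example hostnames are made up for illustration.
    print(render_pooled_metrics({"cp-example-a": "yes", "cp-example-b": "no"}), end="")
```

With such a metric available, an alert like the varnishkafka throughput check could, in principle, be restricted to hosts whose pooled value is 1, so that intentionally depooled hosts do not fire it.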
[17:34:05] <bblack>	 even with just a few countries mapped, you can see the reduction in peak esams traffic, nice view here:
[17:34:08] <bblack>	 https://w.wiki/4zGW
[19:29:51] <cdanis>	 already more traffic than ulsfo at peak :)
[21:06:24] <wikibugs>	 Traffic, Data-Engineering, SRE, Trust-and-Safety, and 2 others: Disable GeoIP Legacy Download - https://phabricator.wikimedia.org/T303464 (Dzahn) a:Dzahn