[00:05:12] HTTPS, Traffic, SRE, Performance-Team (Radar): Enable QUIC support on Wikimedia servers - https://phabricator.wikimedia.org/T238034 (Bugreporter)
[00:05:45] HTTPS, Traffic, SRE, Performance-Team (Radar): Enable HTTP/3 (QUIC) support on Wikimedia servers - https://phabricator.wikimedia.org/T238034 (Bugreporter)
[08:00:28] Traffic: ATS backend origin server certificate validation behavior - https://phabricator.wikimedia.org/T281673 (ema)
[08:01:27] Traffic: ATS backend origin server certificate validation behavior - https://phabricator.wikimedia.org/T281673 (ema) p:Triage→Medium
[10:53:08] Hi, I have a traffic-related question. We are trying to figure out the cache hit/miss ratio at the Varnish level for requests to the maps infrastructure (maps.wikimedia.org). Do we somehow keep track of this kind of metric? What would be the right data source to query?
[12:05:58] nemo-yiannis: hey, I don't think we have a breakdown per Host header (e.g. Host: maps.wikimedia.org)
[12:06:25] we do have one per cache cluster, so you can distinguish between the "upload" cluster (upload.wikimedia.org + maps.wikimedia.org) and everything else
[12:06:31] but that doesn't tell you much I suppose
[12:07:24] I think one reason to avoid having that distinction in Prometheus is that
[12:07:49] we could easily end up with an explosion of metrics (the variance of Host values is pretty high)
[12:09:26] Yeah, I am not sure how this can help. Currently I am looking at "webrequest_sampled_128" in Turnilo for `Host: maps.wikimedia.org`, where I filter on the `X-Cache` header
[12:09:37] right, that's a good idea
[12:09:42] Maybe that's good enough in our case
[12:11:24] X-Cache is a bit harder to interpret though
[12:12:27] compared to something like https://grafana.wikimedia.org/d/000000500/varnish-caching which tells you the "terminal layer"
[12:28:23] is it safe to ignore the `x-cache: ... int` entries?
[12:41:25] nemo-yiannis: the meaning of those depends on the status code, but in general, for the purpose of cache hit rate, it is safe to ignore them
[12:41:56] sounds good, thanks for the info
[12:42:07] nemo-yiannis: https://wikitech.wikimedia.org/wiki/Caching_overview#Headers might be useful too
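(For reference, a minimal sketch of the X-Cache interpretation discussed above. It assumes the header is a comma-separated list of `host status[/hits]` entries with the client-facing cache last, in line with the Caching_overview page linked at 12:42:07; `classify_x_cache` and `hit_rate` are illustrative names, not an existing library.)

    # Sketch: classify one X-Cache value as hit/miss/int for hit-rate purposes.
    # Assumes entries like "cp1066 miss, cp3043 hit/5" (client-facing layer last).
    def classify_x_cache(x_cache: str) -> str:
        entries = [e.strip() for e in x_cache.split(",") if e.strip()]
        # Walk from the client-facing layer inward; the first layer that
        # answered is the "terminal layer" in varnish-caching dashboard terms.
        for entry in reversed(entries):
            status = entry.split()[-1]  # "hit/5", "miss", "pass", or "int"
            if status.startswith("hit"):
                return "hit"
            if status.startswith("int"):
                return "int"  # generated internally; meaning depends on status code
        return "miss"  # only miss/pass layers: the origin served it

    def hit_rate(x_cache_values: list[str]) -> float:
        # Ignore "int" responses when computing the hit rate, as suggested at 12:41:25.
        classified = [classify_x_cache(v) for v in x_cache_values]
        relevant = [c for c in classified if c != "int"]
        return sum(c == "hit" for c in relevant) / len(relevant) if relevant else 0.0

    # e.g. hit_rate(["cp1066 miss, cp3043 hit/5", "cp1080 pass, cp3050 miss"]) -> 0.5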
[15:49:15] I changed the `site.pp` entry to switch `wdqs2004.codfw.wmnet` over from the internal wdqs to the public-facing one. pybal still shows this instance as an internal node though (https://config-master.wikimedia.org/pybal/codfw/wdqs-internal). Any idea how to get pybal to see the new state of the world? Do I just manually change it with `conftool`?
[15:50:50] ryankemper: https://github.com/wikimedia/puppet/blob/production/conftool-data/node/codfw.yaml#L290 should somehow be moved to wdqs I think?
[15:51:22] ah! that makes sense to me :) thanks
[17:11:01] sukhe_: should we send "all the mail" to Marc? root@, peering@ ... and anything else for traffic?
[17:11:48] or the other option is to leave that to him as an example of "how to do a private repo change" and to confirm shell access works
[17:13:05] probably not peering@
[17:13:37] ack, not netops
[17:14:36] dns-admin@
[17:16:01] I see a few special ones; could just do "everything that Sukhbir has"
[17:19:04] mutante: thanks, that works for now
[17:19:30] mmandere: ^ if you want to receive more email :)
[17:20:43] mmandere: oh, hi, I missed that you are already on IRC. Welcome. I am the guy commenting on your onboarding ticket
[17:21:24] sukhe: ok, ops/ops-private done, and exim private coming up
[17:21:48] mutante: thank you! I think the critical item left in that is pwstore
[17:21:58] I am working with him on that
[17:25:56] sukhe: sounds great! Yeah, also just part of clinic duty now
[17:53:33] done. Expect a mail avalanche in your inbox
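(Going back to the pybal/conftool question at 15:49–15:51: the suggested fix was to move the host's entry between clusters in conftool-data rather than change the pooled state by hand. A hypothetical sketch of what that change in conftool-data/node/codfw.yaml might look like — the layout and service names here are assumptions for illustration, not the actual file contents:)

    --- a/conftool-data/node/codfw.yaml (hypothetical excerpt)
    +++ b/conftool-data/node/codfw.yaml
     codfw:
       wdqs:
    +    wdqs2004.codfw.wmnet: [wdqs]
       wdqs-internal:
    -    wdqs2004.codfw.wmnet: [wdqs-internal]

(Once puppet applies a change like this, pybal's node lists at config-master.wikimedia.org would reflect the host under the public-facing wdqs pool instead of wdqs-internal.)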