[13:53:01] the ipv6 internet is just not as reliable as the ipv4 one https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas?orgId=1&var-datasource=eqiad%20prometheus%2Fops&var-target_site=ulsfo&var-ip_version=All&var-country_code=All&var-asn=All&from=now-7d&to=now
[14:35:13] elukey: looks like things finally quieted down on the memcache front https://grafana.wikimedia.org/d/000000316/memcache?orgId=1&from=now-2d&to=now the shape of the traffic does suggest to me not reparse jobs, but rather, pages needing to be re-rendered after being (for the first time in months?) effectively purged from the CDN quickly
[14:41:22] interestingly, we also had a bunch of micro-bursts of saturation on mc1027, despite the usual TX metric reporting 440Mbps or less
[16:26:19] <_joe_> cdanis: we purged 4.8M pages out of 6M?
[16:26:22] <_joe_> something like that
[16:26:51] <_joe_> also, we purged a lot of broken urls :D
[16:55:05] indeed
[16:55:10] I think more like 4.5M
[16:55:12] but... still.
[16:56:23] vgutierrez: more fun times with ats-tls?
[16:56:45] yey
[16:57:10] I need to compile it with ASAN support and see what's going on
[16:57:50] ugh :(
[17:38:51] cdanis: one question though - why did we see that sqlblob thing hammering mc1028? Just because it was contained in a ton of pages? (the ones getting re-rendered)
[17:39:31] also the rise in bw usage from the slab point of view matches with the change in the module
[17:39:36] elukey: yeah, the template in question was the Lua for one of the two styles of citations used on enwiki
[17:39:48] so I think what's different about this event is that purges in esams were working
[17:40:07] that blob is transcluded in something like 4.5M enwiki pages
[17:42:16] cdanis: so to understand, the change in the lua template/module caused all those 4.5M pages to be purged and hence re-rendered, hammering memcached
[17:44:29] I see that https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/587902/ got merged, after the next train we'll have some new mw metrics to play with :)
[17:44:43] elukey: that's my theory, given the shape of the tx traffic from that host
[17:44:51] I don't know how to actually verify this, ofc
[17:45:26] but if it was jobqueue stuff I'd expect something flatter, not something that looked diurnal
[17:45:40] makes a lot of sense
[17:46:10] yeah here look at this --
[17:46:12] https://grafana.wikimedia.org/d/000000607/cluster-overview?panelId=84&fullscreen&orgId=1&var-datasource=eqiad%20prometheus%2Fops&var-cluster=appserver&var-instance=All&from=now-7d&to=now
[17:46:34] network rx on the appserver machines spikes around 12:50UTC that day, which is the right time
[17:46:44] and then tapers off to normal nadir
[17:47:47] https://grafana.wikimedia.org/d/000000607/cluster-overview?panelId=84&fullscreen&orgId=1&from=now-7d&to=now&var-datasource=eqiad%20prometheus%2Fops&var-cluster=api_appserver&var-instance=All
[17:47:55] there's some effect on the apiservers, but not nearly as pronounced
[17:48:09] https://grafana.wikimedia.org/d/000000607/cluster-overview?panelId=84&fullscreen&orgId=1&from=now-7d&to=now&var-datasource=eqiad%20prometheus%2Fops&var-cluster=jobrunner&var-instance=All
[17:48:15] and basically no effect on the jobrunners, which is actually kind of odd
[17:48:24] but anyway, I think that mostly confirms this theory
[17:50:50] it was strange this time since the key was relatively small, like 80k
[17:51:01] in the past I saw problems with 200k+
[17:51:34] (or even jumbo keys like 400/500K)
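Note: to put rough numbers on the hot-key discussion above, a back-of-envelope Python sketch of the TX bandwidth a single ~80 KB value can generate from one memcached host. The ~80 KB value size comes from the conversation; the request rates and the 1 Gbps NIC capacity are illustrative assumptions, not measurements from this incident.

    # Back-of-envelope: egress generated by a single hot memcached key.
    # The ~80 KB value size is from the discussion above; the request rates
    # and the 1 Gbps NIC figure are assumptions for illustration only.

    VALUE_BYTES = 80 * 1024          # ~80 KB hot value (e.g. the citation module blob)
    NIC_CAPACITY_MBPS = 1000         # assumed 1 Gbps NIC on the mc10xx host

    def tx_mbps(requests_per_second: int, value_bytes: int = VALUE_BYTES) -> float:
        """Egress in Mbps if every request returns the full value."""
        return requests_per_second * value_bytes * 8 / 1_000_000

    for rps in (100, 500, 700, 1_000, 2_000):
        mbps = tx_mbps(rps)
        print(f"{rps:>5} req/s -> {mbps:8.1f} Mbps "
              f"({mbps / NIC_CAPACITY_MBPS:6.1%} of an assumed 1 Gbps NIC)")

Under these assumptions, roughly 700 req/s for an 80 KB value already lands around the ~440 Mbps TX level mentioned above, and short bursts beyond that can saturate the link without moving the averaged TX metric much, which would fit the reported micro-bursts of saturation.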
[17:59:38] hm, I'm not finding anything that looks super-obvious in either CDN cache hit rate (in either varnish or ats-be stats) or in parsercache (there's *some* effect on pc hit rate, but not hugely dramatic)
[18:10:09] (afk! will read later :)
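Note: for double-checking the diurnal-vs-flat argument above outside Grafana, a minimal sketch for pulling the appserver receive-bandwidth series straight from Prometheus. The base URL below is a placeholder, and the metric/label names (node_network_receive_bytes_total with a cluster="appserver" label) are assumptions based on standard node_exporter naming and the dashboard URLs above.

    # Minimal sketch: fetch cluster-wide appserver network rx from Prometheus
    # and print it as Mbps over time, so the shape (spike around the purge,
    # then a taper back to the normal diurnal nadir) can be eyeballed.
    # PROM is a placeholder URL; metric and label names are assumptions.

    import datetime
    import requests

    PROM = "http://prometheus.example.org/ops"   # placeholder, not the real endpoint
    QUERY = ('sum(rate(node_network_receive_bytes_total'
             '{cluster="appserver",device!="lo"}[5m])) * 8')

    end = datetime.datetime.now(datetime.timezone.utc)
    start = end - datetime.timedelta(days=7)

    resp = requests.get(
        f"{PROM}/api/v1/query_range",
        params={
            "query": QUERY,
            "start": start.timestamp(),
            "end": end.timestamp(),
            "step": "15m",
        },
        timeout=30,
    )
    resp.raise_for_status()

    # One series (the cluster-wide sum); print timestamp -> Mbps.
    for ts, value in resp.json()["data"]["result"][0]["values"]:
        when = datetime.datetime.fromtimestamp(ts, datetime.timezone.utc)
        print(f"{when:%Y-%m-%d %H:%M} {float(value) / 1e6:10.1f} Mbps")

The same query with cluster="api_appserver" or cluster="jobrunner" would give the comparison series discussed above, assuming those label values match the dashboards.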