[14:30:42] we just had a burst of memcache requests to mc1034, up to saturation of RX bandwidth
[14:30:53] mcrouter TKOs seem to be all from appservers
[14:31:32] two things happened:
[14:31:43] 1) RX bandwidth saturation for mc1034
[14:31:44] https://grafana.wikimedia.org/d/000000316/memcache?panelId=59&fullscreen&orgId=1&from=1587737709843&to=1587738572343
[14:31:54] 2) almost TX bandwidth saturation for mc1019
[14:32:05] https://grafana.wikimedia.org/d/000000316/memcache?panelId=56&fullscreen&orgId=1&from=1587737709843&to=1587738572343
[14:33:11] the problematic slab seems to be mc1034:180
[14:33:30] drum roll... key size 700K :D
[14:33:31] https://grafana.wikimedia.org/d/000000317/memcache-slabs?orgId=1&var-datasource=eqiad%20prometheus%2Fops&var-cluster=memcached&var-instance=mc1034&var-slab=180
[14:33:35] jumbo key
[14:34:59] there are 3 keys in the slab, saved on mc1034 in /home/elukey/slab_180
[14:37:27] mc1019 was hammered by get requests for slab 164, ~330K
[14:37:32] https://grafana.wikimedia.org/d/000000317/memcache-slabs?orgId=1&from=1587737730527&to=1587738694219&var-datasource=eqiad%20prometheus%2Fops&var-cluster=memcached&var-instance=mc1019&var-slab=164
[14:38:25] saved that slab's content as well, but there are a lot more keys
[14:42:54] elukey: something that's been on the back of my mind has been probabilistic sampling of memcached gets / top-k hottest keys tracking
[14:46:47] cdanis: in theory there are some patches from Aaron that should give us some metrics from MediaWiki, I hope those will give us more insight
[14:47:40] and I see two problems in general:
[14:47:46] 1) identification of hot keys
[14:48:00] 2) following up with whatever creates the bursts
[14:48:23] the latter is still a big problem, even if we come up with metrics :(
[14:48:40] yeah...
[14:48:53] i just feel it's hard to know what to look at for sure, right now
[14:49:12] aside from that, local memcached might help
[14:49:19] agreed
[14:50:14] even if local memcached (which is still a ways out AIUI) does help, it would be good to know _why_ -- like, are some flows in mediawiki just requesting the same key N times in one request?
[14:50:21] (i don't know if anyone knows that for sure)
[14:50:57] oh, also elukey, i made some edits to how the nic bw saturation panels are defined, i hope they make sense to you
[14:51:23] there's some help text behind the 🛈
[14:51:40] super, thanks a lot
[14:52:43] ah snap, then I completely misunderstood the metric
[14:53:15] so the 80% for mc1019 was the % of time that it worked at >90% bw saturation
[14:53:35] then it is worse than I described, both shards with bw saturated :D
[14:53:43] thanks for the clarification!
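A minimal sketch of the probabilistic sampling / top-k hot key tracking cdanis floats at 14:42, assuming something can observe the stream of memcached get keys (e.g. a tap near mcrouter); the class name, parameters, and pruning strategy are all illustrative, not an existing Wikimedia tool:

```python
import random
from collections import Counter


class HotKeyTracker:
    """Approximate top-k tracking of memcached get keys via sampling.

    Each observed get is counted with probability `sample_rate`; the
    table is pruned back to the current heavy hitters whenever it grows
    past `max_entries`, which bounds memory at the cost of accuracy on
    the long tail (the hottest keys survive pruning).
    """

    def __init__(self, sample_rate=0.01, k=50, max_entries=5000):
        self.sample_rate = sample_rate
        self.k = k
        self.max_entries = max_entries
        self.counts = Counter()

    def observe_get(self, key):
        if random.random() < self.sample_rate:
            self.counts[key] += 1
            if len(self.counts) > self.max_entries:
                # keep only the current top-k to bound memory
                self.counts = Counter(dict(self.counts.most_common(self.k)))

    def top_k(self):
        # scale sampled counts back up to estimated real request counts
        return [(key, count / self.sample_rate)
                for key, count in self.counts.most_common(self.k)]
```

This only addresses problem 1) from the discussion (identifying hot keys); it says nothing about 2), finding whatever code path creates the bursts.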
[14:54:55] yeah, it's a ratio of seconds per second, so it comes out unitless
[14:55:06] not the most intuitive to understand
[14:55:24] any sustained nonzero value there is pretty bad
[14:55:30] ahahha yes yes
[14:55:38] that could be a good description
[14:55:42] :D :D :D
[14:56:09] you described it more politely
[14:56:18] in the panel
[14:56:37] another thing I was wondering about is https://netbox.wikimedia.org/search/?q=mc10&obj_type=
[14:57:29] now if we pick A6, the top-of-rack switch should have 1G links to the mc10xx hosts and 10G links to other switches (don't know if it is a leaf or a spine)
[14:58:05] one of the things to do when we have the gutter pool ready is to spread the hosts across multiple racks
[14:59:05] (A6's switch should be a leaf IIUC from netbox)
[16:08:59] elukey: A6 has 2*40G links to the spines
[16:11:07] elukey: not sure if it would help with uplink saturation, but have you looked into jumbo frames? It might remove some overhead with large data transfers
[16:26:45] that's a good thought, it would probably give us like 5% more capacity on the larger values
[16:26:47] XioNoX: ah nice, I thought 10G, 40G looks very nice :)
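A back-of-the-envelope check of the "~5% more capacity" estimate for jumbo frames, assuming plain TCP/IPv4 with no options and standard Ethernet framing overhead (adjust the header sizes if the real traffic uses TCP options or VLAN tags):

```python
# Goodput as a fraction of bytes on the wire, per frame.
ETH_OVERHEAD = 7 + 1 + 14 + 4 + 12   # preamble, SFD, Ethernet header, FCS, inter-frame gap
IP_TCP_HEADERS = 20 + 20             # IPv4 + TCP, no options


def efficiency(mtu):
    payload = mtu - IP_TCP_HEADERS
    wire_bytes = mtu + ETH_OVERHEAD
    return payload / wire_bytes


std = efficiency(1500)    # ~0.949
jumbo = efficiency(9000)  # ~0.991
print(f"MTU 1500: {std:.1%}  MTU 9000: {jumbo:.1%}  gain: {jumbo / std - 1:.1%}")
```

This comes out to roughly a 4-5% goodput gain on a saturated link, consistent with the estimate at 16:26:45, and it only helps values large enough to span multiple standard frames.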