[07:42:57] arturo: I have pushed: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/534577/ which should be transparent for everyone, but please let me know if you see something (especially Quarry related)
[07:46:01] arturo: I have run some queries from Quarry and it worked, but let me know if you see or notice something
[08:48:44] ack
[09:02:42] <_joe_> I am looking at the puppet CI image
[09:03:04] <_joe_> just rebuilding it for buster makes it 1.12 GB
[09:03:21] <_joe_> of which 585 MB are tox and bundler caches, to speed up execution
[09:03:44] <_joe_> I'm asking myself if it wouldn't make more sense to publish such caches somewhere instead
[09:03:52] <_joe_> and make the script download them
[09:04:11] <_joe_> like having a periodic job that recreates them every N hours
[09:05:02] <_joe_> at the same time, this means that every time you want to run a CI job you need to re-download ~350 MB of data, or to rebuild your tox/bundler caches from scratch
[11:43:51] _joe_: can you package the pre-populated cache as part of the image?
[11:56:53] <_joe_> cdanis: that's what we're doing now
[11:57:06] ohh sorry, I misunderstood
[15:55:37] https://lore.kernel.org/wireguard/CAHmME9qOpDeraWo5rM31EWQW574KEduRBTL-+0A2ZyqBNDeYkg@mail.gmail.com/T/#u 🎉
[15:55:48] [ANNOUNCE] WireGuard 1.0.0 for Linux 5.6 Released
[15:55:56] 🎉
[15:56:15] yeah! great news
[15:56:30] I use it every day and love it. I wish it had TCP support, but hey
[15:56:35] has anyone played around with Tailscale at all?
[15:59:50] I have seen discussions about it on HN. it does solve the key management problem in a clean way
[16:01:38] the latest linux upload to Debian unstable also backports the WireGuard kernel support from 5.6 to 5.5: https://packages.qa.debian.org/l/linux/news/20200330T081008Z.html
[16:01:59] "Add WireGuard driver and required crypto changes from 5.6-rc7 and cryptodev-2.6, thanks to Jason A. Donenfeld (Closes: #953569)"
[16:47:18] Heads-up: I'm switching the CI job for operations/puppet over to buster. Looks OK in testing, but if anything breaks, shout and I can trivially revert.
[16:54:36] interesting, I was checking memcached metrics and noticed this trend in the get hit ratio
[16:54:39] https://grafana.wikimedia.org/d/000000316/memcache?orgId=1&from=now-90d&to=now&fullscreen&panelId=37
[16:55:00] that seems to line up with
[16:55:01] https://grafana.wikimedia.org/d/000000316/memcache?orgId=1&from=now-90d&to=now&fullscreen&panelId=42
[16:55:55] on some shards the hits dropped
[16:56:13] (I am checking mc1029 now)
[16:58:52] on a separate note, there are some TKO events registered for wtp nodes
[16:58:53] https://grafana.wikimedia.org/d/000000549/mcrouter?orgId=1&from=now-2d&to=now&fullscreen&panelId=9
[16:59:56] the winner seems to be mc1027
[17:01:24] brb and then I'll open a task
[17:01:56] elukey: please subscribe me as well
[17:05:59] ack!
[17:07:58] the slab is 154, ~200KB
[17:29:14] ok so tx bandwidth saturation on mc1027
[17:29:14] 17:13:01 981662.1
[17:29:14] 17:13:02 981972.2
[17:29:15] 17:13:03 981631.8
[17:29:32] (those are Kbps)
[17:30:12] so now the problem is finding the key that does it :D
[17:59:28] I had meant to add even half-reasonably-done saturation monitoring for NIC rx/tx on memcache hosts this quarter, but it didn't happen
[17:59:43] (as opposed to the heinous shell one-liner I wrote under duress one of the last times this happened)
[18:03:41] cdanis: how dare you not do another thing on top of the 1000 that you already follow :D
[18:04:52] jokes aside, it is super fine, might be good in the future to alert on (even for other hosts!), but we are also missing mcrouter alerts etc.
[18:06:23] of course the minute I start running tcpdump on mc1027 coincides with the end of the bursts
[18:06:34] * elukey cries in a corner
[18:08:47] * volans reminds everyone to be careful before adding high-frequency fleet-wide checks
[18:15:55] volans: this would have been implemented by a small python daemon running on each machine, either running an http server to export to prom, or writing to a textfile for node_exporter
[18:16:14] and then one check_prometheus ;)
[18:31:46] I know, it was more to spread awareness :D
[19:14:19] <_joe_> elukey: I think in those situations it's ok to use memkeys tbh
[20:18:49] cdanis: that'd be relatively easy to implement...
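For context on the numbers pasted at 17:29: ~982,000 Kbps is roughly 0.98 Gbps, which is consistent with a saturated 1 GbE uplink on mc1027. The per-host daemon cdanis describes at 18:15 could look roughly like the sketch below: sample the NIC byte counters from /sys, derive a tx/rx saturation ratio against the reported link speed, and publish it through node_exporter's textfile collector. This is only an illustrative sketch under stated assumptions, not an existing WMF implementation; the interface name, textfile directory, and metric names are all made up for the example, and the prometheus_client http-server variant mentioned in the log would work the same way, just exposing the gauges over HTTP instead of writing a file.

    #!/usr/bin/env python3
    """Hypothetical sketch of a small per-host NIC saturation collector.

    Samples tx/rx byte counters, computes bytes/s over the link speed, and
    writes the result as gauges for node_exporter's textfile collector.
    """
    import os
    import time

    IFACE = 'eth0'                               # assumption: primary NIC name
    TEXTFILE_DIR = '/var/lib/prometheus/node.d'  # assumption: node_exporter textfile dir
    INTERVAL = 5                                 # seconds between samples


    def read_bytes(iface, direction):
        # Kernel-provided cumulative counters; direction is 'tx' or 'rx'.
        with open(f'/sys/class/net/{iface}/statistics/{direction}_bytes') as f:
            return int(f.read())


    def link_speed_bytes(iface):
        # /sys reports link speed in Mbit/s (may be -1 on virtual interfaces).
        with open(f'/sys/class/net/{iface}/speed') as f:
            return int(f.read()) * 1_000_000 / 8


    def write_metrics(tx_rate, rx_rate, speed):
        # Write to a temp file and rename so node_exporter never reads a partial file.
        tmp = os.path.join(TEXTFILE_DIR, '.nic_saturation.prom.tmp')
        final = os.path.join(TEXTFILE_DIR, 'nic_saturation.prom')
        with open(tmp, 'w') as f:
            f.write(f'nic_tx_saturation_ratio{{device="{IFACE}"}} {tx_rate / speed}\n')
            f.write(f'nic_rx_saturation_ratio{{device="{IFACE}"}} {rx_rate / speed}\n')
        os.rename(tmp, final)


    def main():
        prev_tx, prev_rx = read_bytes(IFACE, 'tx'), read_bytes(IFACE, 'rx')
        speed = link_speed_bytes(IFACE)
        while True:
            time.sleep(INTERVAL)
            tx, rx = read_bytes(IFACE, 'tx'), read_bytes(IFACE, 'rx')
            write_metrics((tx - prev_tx) / INTERVAL, (rx - prev_rx) / INTERVAL, speed)
            prev_tx, prev_rx = tx, rx


    if __name__ == '__main__':
        main()

The "one check_prometheus" follow-up would then just be a threshold on the resulting gauge. Note also that newer node_exporter releases already expose per-device counters such as node_network_transmit_bytes_total, so a pure PromQL rate() over the built-in metrics, compared against link speed, is another way to get the same alert without any custom daemon.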