[08:25:00] sigh, definitely a bug for check puppetrun to go critical after being reenabled, I'm opening a task
[08:26:02] <_joe_> godog: heh it's not easy to solve without history
[08:26:28] <_joe_> also it's partially my fault - I should've reenabled puppet yesterday evening
[08:27:03] <_joe_> godog: ok one thing - the grace time for disabling puppet should be shorter than the one for having puppet not running (not failed, but not run)
[08:27:52] _joe_: yeah definitely
[08:31:13] https://phabricator.wikimedia.org/T263720
[08:32:02] brb
[08:37:26] <_joe_> godog: <3
[09:37:37] elukey, klausman: there's an icinga warning for space on stat1008:/srv, just fyi
[09:37:54] Yep, aware
[09:38:03] 👍
[09:38:07] I am waiting for Luca to come back from an errand, to discuss next step
[09:38:24] We did the backups of stat1004/6 or reimaging to 1008, so that's why it's running low.
[09:38:30] for*
[15:58:46] Hey, anyone has a reason _not_ to create a wiki when we're running via codfw? It would be T262812.
[15:58:47] T262812: Create private arbcom-ru wiki - https://phabricator.wikimedia.org/T262812
[16:00:11] Urbanecm: that should be fine, just note it should be mwmaint2001, not 1002
[16:00:48] thanks rzl
[16:22:13] _joe_: heyo, if you'd like, I'll gladly share the preliminary results of the mcrouter-on-k8s tests we were discussing - either in this channel or someplace else till we get an official venue for this :D
[16:22:47] <_joe_> here or #wikimedia-serviceops is ok I think :)
[16:23:00] alright :)
[16:23:56] <_joe_> so, what did you experiment with? running daemonsets?
[16:25:56] <_joe_> as usual - I'm not sure that's the best solution. Maybe having a k8s service is. Maybe it's running it in the pod. Maybe again - this is when we start running separate mcrouter proxies
[16:26:03] Haven't tried DaemonSet yet - tested a 2-replica normal Deployment and a sidecar in the MW deployment. Baseline here was twemproxy which is sitting outside of the cluster.
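(A minimal sketch of how such setups could be compared against the twemproxy baseline: time repeated requests to a memcached-heavy page and report percentiles. The `fetch` stub and URL are hypothetical stand-ins, not the harness actually used for the results discussed below.)

```python
"""Sketch of a latency benchmark: hit a memcached-heavy URL repeatedly
and report p50/p99. The fetch() stub and the URL are hypothetical
stand-ins, not the real test harness from this conversation."""
import time
import statistics

def fetch(url):
    # Stand-in for a real HTTP request
    # (e.g. urllib.request.urlopen(url).read()).
    time.sleep(0.001)

def benchmark(url, runs=50):
    samples = []
    for _ in range(runs):
        start = time.monotonic()
        fetch(url)
        samples.append((time.monotonic() - start) * 1000)  # milliseconds
    # statistics.quantiles with n=100 yields the 1st..99th percentile
    # cut points, so index 49 is p50 and index 98 is p99.
    pcts = statistics.quantiles(samples, n=100)
    return {"p50": pcts[49], "p99": pcts[98]}

result = benchmark("https://example.org/wiki/Special:RecentChanges")
print(result)
```

Running each configuration (external proxy, Deployment, sidecar) through the same harness with the same URL set makes the percentile numbers directly comparable.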
[16:27:09] <_joe_> also for context - we plan to run an on-host memcached, which would be part of the DaemonSet, possibly
[16:27:25] yeah, I think I saw that task - to replace APCu, right?
[16:28:10] <_joe_> not completely, for now it would mostly be a way to offload some of the busiest key classes from hammering the memcached servers
[16:28:17] <_joe_> we're running out of network capacity
[16:28:22] got it
[16:29:03] so the initial benchmark results are here: https://gist.github.com/mszabo-wikia/0e93c44e13d49f44cb07954847c7f315 . I used Special:RecentChanges as it's a page that relies heavily on memcached - a good next step to refine the benchmark would probably be to use a more representative URL set based on a snapshot of real requests or something similar
[16:30:02] and of course running a test with mcrouter deployed as DaemonSet - just need to get Rob's stamp of approval first ;)
[16:30:55] the Deployment seems like a no-go as it seems to trigger significantly higher latency spikes than what we've been seeing with the external twemproxy
[16:30:59] -_-
[16:31:09] wait, do we have multiple robs again?
dammmmmmmmn
[16:31:20] I thought i waited them all out ;D
[16:32:21] sorry, I meant Robert Jerzak from our side :D
[16:33:01] no worries ;D i was just curious =] (no matter what im totally keeping rob@ though hehe)
[16:46:49] also, it might be nice to test a unix socket mounted from host volume for communication mcrouter <-> mediawiki in either the DaemonSet or sidecar options, I recall there being a phab task to use socket for mcrouter on current appservers
[16:57:51] <_joe_> mszabo: the issue with using the unix socket mounted thing is
[16:57:59] <_joe_> you cannot have replication if you do that
[16:58:09] <_joe_> as mcrouter either works with tcp or a unix socket
[16:58:11] <_joe_> not both
[16:58:19] ohhhh, right
[17:00:23] <_joe_> so either you think of a more complex replication strategy
[17:00:29] <_joe_> and it might be worth it for k8s
[17:00:31] <_joe_> uhmmmm
[17:00:39] <_joe_> yes please, do a test with unix sockets :P
[17:00:51] <_joe_> that might be interesting if it gives you a big advantage
[17:00:58] <_joe_> btw, php-fpm does
[17:01:34] ah I see - so the mcrouter on appservers broadcasts the replicated ops to the mcrouters on the appservers in the remote DC if I'm reading https://github.com/wikimedia/puppet/blob/production/hieradata/common/profile/mediawiki/mcrouter_wancache.yaml right
[17:01:54] for some reason I thought the broadcast target was a different pool
[17:05:04] <_joe_> yes
[17:05:14] <_joe_> but we can do something different ofc
[17:05:26] <_joe_> and have the proxies run with a different configuration
[17:05:41] <_joe_> while the mcrouters run in the pods, via unix socket
[17:06:58] yeah
[17:09:44] yeah unix sockets are nice for stuff like this.
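(The broadcast replication scheme described above can be modelled as a toy: writes are mirrored to the local pool and the remote-DC pool, while reads stay local. Plain dicts stand in for the memcached pools here; this sketches the routing idea only, not mcrouter's actual implementation.)

```python
"""Toy model of cross-DC replication as discussed above: sets are
broadcast to the local and remote-DC pools, gets hit only the local
pool. Dicts stand in for memcached pools; an illustrative sketch,
not mcrouter's real routing code."""

class ReplicatingRouter:
    def __init__(self, local_pool, remote_pools):
        self.local = local_pool
        self.remotes = remote_pools

    def set(self, key, value):
        # Mirror the write to every pool (AllSyncRoute-style fan-out).
        self.local[key] = value
        for pool in self.remotes:
            pool[key] = value

    def get(self, key):
        # Reads never cross the DC boundary.
        return self.local.get(key)

eqiad, codfw = {}, {}
router = ReplicatingRouter(local_pool=eqiad, remote_pools=[codfw])
router.set("WANCache:v:example", 42)
print(router.get("WANCache:v:example"), codfw["WANCache:v:example"])
```

The constraint from the conversation follows from this shape: if the local hop were a unix socket only, the router could no longer also speak TCP to the remote pools from the same listener.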
It avoids some of the bullshit things TCP does in the name of real networks that are senseless overhead on localhost, but more importantly if you end up opening tons and tons of connections, you don't have to worry about the exhaustion of port numbers and TIME_WAIT, etc
[17:16:50] 👍 - deploying it now
[18:00:29] interesting - ran the tests a few times and latencies with mcrouter sidecar using unix socket in an emptyDir volume with memory backing are consistently higher than with connecting to mcrouter sidecar over tcp
[18:00:33] updated the gist
[18:40:01] <_joe_> it might mean that there is a bug in mcrouter
[18:52:30] shocker! :DD
[18:54:32] I'll troubleshoot my build a bit to try to rule out the most likely cause - me
[18:58:13] how bad is the latency increase?
[18:59:16] around 120% increase in p50 (mcrouter_sidecar.txt vs mcrouter_sidecar_unix_socket.txt)
[18:59:31] granted, this is on a page that uses memcache a *lot*
[19:00:06] (lot = 1k+ calls)
[19:59:59] I forget, is there a usual apt component where we put trivially-backported stuff?
[20:00:53] not sure if it's 'main' or something else
[20:11:19] I donno if it's outdated, but wikitech says:
[20:11:20] main: for Wikimedia native packages, as well as Debian/Ubuntu packages that have had source-modifications
[20:11:23] universe: for existing Ubuntu packages that just have been recompiled or backported for the given distribution.
[20:11:34] (found in main: for Wikimedia native packages, as well as Debian/Ubuntu packages that have had source-modifications
[20:11:37] oops
[20:11:44] (found in https://wikitech.wikimedia.org/wiki/Reprepro )
[20:16:13] Ubuntu :)
[20:16:46] anyway it does look like ipvsadm trivially backports
[20:18:03] yea, universe is ubuntu-only and we just have COMPONENTS="main backports thirdparty" when looking at installer files.
so maybe "backports" it is
[20:20:59] I don't think there's a 'backports' component since jessie, actually
[20:21:14] judging by modules/aptrepo/files/distributions-wikimedia, IIUC
[20:21:20] so main it is
[20:21:53] oh if you're updating ipvsadm, maybe it will finally close a couple other tickets too
[20:22:25] yea, fair enough. the source of that is just an "else" branch that starts with "if ubuntu"
[20:23:49] cdanis: I see your T263788 and raise you a T171850 + T82849#3519518
[20:23:50] T263788: backport ipvsadm>=1.30 to buster-wikimedia or buster-backports - https://phabricator.wikimedia.org/T263788
[20:23:50] T82849: lvs servers report 'Memory allocation problem' on bootup - https://phabricator.wikimedia.org/T82849
[20:23:51] T171850: Backport ipvsadm - https://phabricator.wikimedia.org/T171850
[20:24:12] sorry but I'm scared of any task number with fewer than six digits
[20:24:48] those two tasks should almost certainly be closed now right?
[20:24:59] I suspect your 1.31 will have the patch e.ma references in it
[20:25:15] whatever we currently have on buster, it still does the "memory allocation problem" thing
[20:25:35] but maybe 1.31 will finally be new enough
[20:26:30] 24 Dec 2019
[20:28:18] yeah Buster's version is 1.29
[20:28:39] and the patch e.ma refs to get those other two tickets closed, isn't in a release till 1.30+
[20:29:18] (because 1.29 was released in late 2016, there was a huge gap after that until 1.30 in july 2019)
[20:38:18] just use the main component for importing the backport, the component/foo are mostly for libraries or similar non-leaf packages (or for things where we selectively upgrade some systems, but not all, like the memcached 1.6 backport which is only used on the IDPs)
[20:38:42] since the new ipvsadm will reach all LVSes, simply using "main" is fine
[20:39:40] ack!
[20:39:51] thanks Moritz
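(Going back to the TCP-vs-unix-socket point from earlier in the log: the difference can be illustrated with two tiny echo servers, one on a loopback TCP port and one on an AF_UNIX socket. A TCP connection consumes an ephemeral port and leaves a TIME_WAIT entry on close; a unix socket is addressed by a filesystem path, so neither applies. This is an illustrative sketch with hypothetical names, not code from the conversation.)

```python
"""Loopback TCP vs AF_UNIX round-trip, illustrating the earlier
discussion. Both servers echo bytes back once; per-request timing is
omitted to keep the sketch deterministic. All names are illustrative."""
import os
import socket
import tempfile
import threading

def echo_server(server_sock):
    # Accept one connection, echo what we receive, then shut down.
    conn, _ = server_sock.accept()
    with conn:
        conn.sendall(conn.recv(1024))
    server_sock.close()

def round_trip(client_sock, payload=b"get mykey\r\n"):
    with client_sock:
        client_sock.sendall(payload)
        return client_sock.recv(1024)

# TCP over loopback: each connection uses an ephemeral port and
# leaves a TIME_WAIT entry behind when closed.
tcp_srv = socket.create_server(("127.0.0.1", 0))
threading.Thread(target=echo_server, args=(tcp_srv,)).start()
tcp_reply = round_trip(socket.create_connection(tcp_srv.getsockname()))

# AF_UNIX: addressed by a filesystem path, so no ports, no TIME_WAIT.
path = os.path.join(tempfile.mkdtemp(), "mcrouter.sock")
unix_srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
unix_srv.bind(path)
unix_srv.listen()
threading.Thread(target=echo_server, args=(unix_srv,)).start()
unix_cli = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
unix_cli.connect(path)
unix_reply = round_trip(unix_cli)

print(tcp_reply == unix_reply == b"get mykey\r\n")
```

That unix sockets nonetheless benchmarked *slower* in the tests above is what pointed the conversation toward a possible mcrouter bug rather than a transport-level explanation.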