[11:36:35] <_joe_> XioNoX: did we change something re: esams-eqiad routing today?
[11:36:48] _joe_: not afaik, let me check
[11:37:12] <_joe_> I see this jump in kafka RTT in esams https://grafana.wikimedia.org/d/RvscY1CZk/purged?viewPanel=36&orgId=1&from=1615935497437&to=1615981017642&var-datasource=esams%20prometheus%2Fops&var-cluster=cache_text&var-instance=cp3058
[11:37:53] <_joe_> but not in any other DC
[11:38:02] _joe_: I see it there too https://smokeping.wikimedia.org/?target=esams.Hosts.bast3005
[11:38:28] <_joe_> that also means an additional 20 ms of rtt on every uncached request from esams
[11:41:04] yeah, brief cut there too https://librenms.wikimedia.org/graphs/to=1615980900/id=6835/type=port_bits/from=1615894500/ maybe they re-routed our wavelength, let me check the maintenance calendar
[11:42:14] _joe_: there was a maintenance, but it ended, let me check more
[11:44:01] "Click here to open a case for assistance on this scheduled maintenance via the Lumen Customer Portal. "
[11:44:05] perfect
[11:51:10] Your ticket #20890103 has been successfully created.
[11:51:47] _joe_: is it causing an issue? there is the option of failing over to the backup circuit, but it can become expensive if used for a long period of time
[11:58:49] <_joe_> XioNoX: just a perf degradation, nothing more
[12:00:28] ok! will open a task
[12:00:54] _joe_: did something alert, or did you find out about it randomly?
[12:01:22] <_joe_> XioNoX: I was looking at that dashboard, but I swear I had a good reason to do so
[12:01:24] <_joe_> :P
[12:01:29] :)
[12:02:20] Lumen says it's a set of multiple maintenances, so my guess is that they diverted our wavelength while they fixed something, but let's see
[12:10:46] https://phabricator.wikimedia.org/T277654
[14:52:29] o/ can someone merge https://gerrit.wikimedia.org/r/c/operations/puppet/+/669477/? thanks!
[14:57:11] Majavah: will do
[14:59:47] thanks moritzm!
[22:23:34] effie: is a daemonset more or less the same as a "sidecar"?
[22:26:43] Hm.. seems they are similar, but the difference is that when a k8s worker runs multiple pods for the same app, a daemonset effectively deduplicates the sidecar demand
[22:26:56] I assumed sidecars already worked that way, so cool, better :)
[22:27:20] a daemonset guarantees that each node will run one copy of a pod, in this case the pod where mcrouter will live
[22:28:57] and if I understand your breakdown correctly, unlike a sidecar, a daemonset would make mcrouter its own pod, and thus they can die separately
[22:29:05] and pods running on that node can potentially access this pod
[22:29:22] a sidecar lives within the pod
[22:29:51] I'm not sure I understand the distinction between e.g. the mcrouter process dying within a given MW pod, vs the separate mcrouter daemonset pod being able to die.
[22:29:52] yes
[22:30:02] is it more likely to die as a daemonset?
[22:30:49] if it is running as a daemonset, all pods running on this node will lose access to a working mcrouter
[22:31:37] while in the other case, the specific pod will lose access to mcrouter
[22:32:12] ah, I see. so it's not per se that we're worried about the container's own failure likelihood
[22:32:13] on the other hand, mcrouter has caused us little trouble when it comes to daily operation
[22:32:17] but just the impact of failure in general
[22:32:50] yeah, it would be amplified in that case.
[22:33:09] if it's just one, I suppose that would quickly cause that pod to be killed or restarted as a whole or otherwise recover, and thus affect fewer ongoing requests.
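For reference, a minimal sketch of the two shapes discussed above; all names, images, and ports here are illustrative placeholders, not the actual Wikimedia manifests. A DaemonSet schedules exactly one mcrouter pod per node, which co-located app pods reach via the node, while a sidecar is an extra container declared inside each app pod's own template.

# Sketch A (assumed layout): mcrouter as a DaemonSet -- one pod per node,
# shared by every app pod scheduled on that node
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: mcrouter                           # hypothetical name
spec:
  selector:
    matchLabels:
      app: mcrouter
  template:
    metadata:
      labels:
        app: mcrouter
    spec:
      containers:
      - name: mcrouter
        image: example.org/mcrouter:latest # placeholder image
        ports:
        - containerPort: 11211             # illustrative port
          hostPort: 11211                  # app pods on the node connect via the node address
---
# Sketch B (assumed layout): mcrouter as a sidecar -- an extra container
# inside each app pod, living and dying with that pod
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mediawiki                          # hypothetical name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: mediawiki
  template:
    metadata:
      labels:
        app: mediawiki
    spec:
      containers:
      - name: mediawiki
        image: example.org/mediawiki:latest # placeholder image
      - name: mcrouter                      # sidecar copy, one per app pod
        image: example.org/mcrouter:latest  # placeholder image

The tradeoff in the conversation falls directly out of these shapes: in sketch A a failed mcrouter pod affects every app pod on that node, while in sketch B it affects only the one pod that contains it.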
[22:33:24] I do not recall an mcrouter on an mw* host being unavailable and in need of a restart unless there was a bad config deployment
[22:33:53] for things that are as light as mcrouter appears to be though, maybe it's okay to duplicate it as a sidecar alongside each pod. The on-host memcached, with its memory consumption and improved cache reuse, seems to benefit more from the daemonset approach.
[22:34:02] (and has a better failure scenario anyway, just a cache miss basically)
[22:34:23] yes, an unavailable on-host memcached is not a problem
[22:35:39] does k8s health-track each process in a pod (e.g. the app and the sidecar) separately, such that it gets no traffic if either of them dies as a process?
[22:35:53] or does that only apply to the "main" process
[22:36:01] e.g. the nginx or apache process I guess
[22:37:15] k8s uses the readiness probes to know when a container can accept traffic
[22:38:52] and the liveness probes to check whether a container needs to be restarted or not
[22:40:21] we do not make changes to mcrouter frequently, well, apart from these days, with the upcoming TLS work and server refresh
[22:40:45] once those are done, I don't know when we will change the mcrouter config again
[22:41:26] * effie off
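A note on the probe question above: Kubernetes probes are defined per container, not per process, so a sidecar container gets its own readiness and liveness checks independently of the app container. A hedged sketch of what that could look like for an mcrouter container; the port, probe type, and timings are illustrative, not production values.

      # Assumed container spec fragment; readiness gates Service traffic,
      # liveness triggers a restart of just this container by the kubelet
      containers:
      - name: mcrouter
        image: example.org/mcrouter:latest   # placeholder image
        ports:
        - containerPort: 11211               # illustrative port
        readinessProbe:                      # failing: pod is removed from Service endpoints
          tcpSocket:
            port: 11211
          periodSeconds: 5
        livenessProbe:                       # failing repeatedly: kubelet restarts the container
          tcpSocket:
            port: 11211
          initialDelaySeconds: 10
          failureThreshold: 3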