[13:44:45] could I get a quick review, volans so I can test your patches? https://gerrit.wikimedia.org/r/c/operations/puppet/+/1080707
[13:44:54] sure
[13:45:09] as currently test-s4 is not a valid alias
[13:45:27] can be reverted later if needed, I just want to test it
[13:46:30] {done}
[13:46:33] yeah make sense
[13:46:44] and I don't see any real problem to leave a test alias around too
[13:47:06] I think it should be in too, but people can disagree on the ticket if they want later
[13:47:20] is not that the list of alias is used as a source for anything
[13:47:23] ack
[13:47:26] T374933
[13:47:26] T374933: Add section alias for databases in the test-s1 and test-s4 sections - https://phabricator.wikimedia.org/T374933
[13:49:08] volans: quicky on source of truth- I think we have a problem, don't get me wrong
[13:50:26] but I feel our team is forgotten when we ask for stuff that is hard for us
[14:04:55] I saw minor defects on the proposed patch- I think there are some edge cases of running in dry mode failing expenctations, but will compile all rather than report one by one
[14:05:03] *expectations
[14:06:02] as you prefer, also one by one here it's fine
[14:06:08] or maybe I will put the outputs on the ticket but not expecting actionables
[14:12:22] I will have an interview soon so I may not be able to finish all testing today (sorry)
[14:12:32] no prob
[14:36:54] elukey: re: T363996, when was mesh enabled for echostore?
[14:36:54] T363996: Sessionstore's discovery TLS cert will expire before end of May 2024 - https://phabricator.wikimedia.org/T363996
[14:37:44] urandom: o/ IIRC last week, the tls cert was expiring and Scott deployed the new version
[14:38:27] urandom: https://sal.toolforge.org/production?p=0&q=echostore&d=2024-10-10 on the 10th I'd say
[14:38:54] elukey: yeah, just found the same :)
[14:40:35] elukey: umm...https://grafana-rw.wikimedia.org/d/IfJykaTZk/echostore?orgId=1&from=now-7d&to=now&viewPanel=50
[14:40:54] am I reading that right, would that be...better latency?
[14:41:59] a smaller share of the 2.5-5ms bucket, larger share of the 1-2.5ms *after* the deployment?
[14:42:12] never seen a graph like this before :D but yes I read that we jumped the baseline that ranges between 1-2.5 ms
[14:42:39] that before was occupied by 2.5-5ms
[14:43:56] elukey: here's another way of looking at that: https://grafana-rw.wikimedia.org/d/IfJykaTZk/echostore?orgId=1&from=now-9d&to=now&viewPanel=47
[14:44:28] looks a very nice improvement to me
[14:44:35] same thing though, a higher share of the lower latency requests / lower share of the bucket right above
[14:44:43] elukey: why is that, you think?
[14:45:03] why is it improved, I mean?
[14:45:15] urandom: almost surely envoy, tuned for TLS termination
[14:45:35] so we disabled TLS in Kask?
[14:46:02] exactly yes, it listens for plaintext on a specific port that envoy proxies to
[14:46:02] sorry, I should probably try harder to reverse engineer the chart changes
[14:46:09] nono please I can summarize :)
[14:46:53] Ok, so this an extra network hop, but Envoy vs Kask doing TLS?
[14:46:58] when we enable mesh envoy gets deployed as sidecar and takes over ingress/egress traffic (egress if explicitly used, like setting a proxy to localhost:port etc..)
[14:47:17] gotcha
[14:47:30] it fetches a TLS cert from cfssl automatically, sets up buffers and connection pools, etc..
[14:48:08] I guess that kask can reach the same level of performance but it would need to be tuned for tls termination like we do for envoy
[14:48:43] yeah, I never really made any attempts to optimize there
[14:48:44] but with mesh we don't have to care so it is a win
[14:48:48] right
[14:48:53] let me update the ticket
[14:52:31] elukey: https://grafana-rw.wikimedia.org/d/IfJykaTZk/echostore?orgId=1&from=now-9d&to=now&viewPanel=23 (mean latency)
[14:55:50] nice drop :)
[14:56:14] so far it seems that session store would benefit from the change
[14:57:42] I'm not sure what else we could do to better establish that, short of rolling back sessionstore staging, testing, re-deploying mesh, and testing again
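
The bucket-share reading discussed at 14:41:59 to 14:44:35 can be sketched roughly as follows. This is only an illustration, assuming the latency histogram is stored as cumulative counters per upper bound (Prometheus style). The bucket bounds mirror the ones mentioned in the chat; the counts are made-up numbers, not echostore data, and `bucket_shares` is a hypothetical helper name.

```python
from typing import Dict, List, Tuple


def bucket_shares(cumulative: List[Tuple[float, int]]) -> Dict[str, float]:
    """Turn cumulative (upper_bound_ms, count) pairs into per-bucket shares.

    Each share is the fraction of all requests whose latency fell between
    the previous upper bound and this one.
    """
    ordered = sorted(cumulative)
    total = ordered[-1][1]
    if total == 0:
        raise ValueError("histogram has no observations")
    shares: Dict[str, float] = {}
    prev_bound, prev_count = 0.0, 0
    for bound, count in ordered:
        label = f"{prev_bound:g}-{bound:g}ms"
        shares[label] = (count - prev_count) / total
        prev_bound, prev_count = bound, count
    return shares


# Hypothetical cumulative counts before and after a deployment (illustrative only).
before = [(1.0, 100), (2.5, 400), (5.0, 900), (10.0, 1000)]
after = [(1.0, 100), (2.5, 700), (5.0, 950), (10.0, 1000)]

before_shares = bucket_shares(before)
after_shares = bucket_shares(after)

for label in before_shares:
    print(f"{label:>10}  before={before_shares[label]:.2f}  after={after_shares[label]:.2f}")
```

With these invented counts the 1-2.5ms share grows while the 2.5-5ms share shrinks, which is the same shape of shift described in the chat. Comparing shares rather than raw counts keeps the comparison meaningful when request volume differs between the two windows.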