[00:06:16] @inflatador: Will do. I thought we should be able to see it in Grafana, too, so thank you for confirming that that's anomalous.
[00:07:22] @swfrench-wmf: For the memory issue, https://phabricator.wikimedia.org/T400515. For the other, we don't (yet) have a direct task, but some of the symptomatic logs can be seen here: https://phabricator.wikimedia.org/T400757. We seem to have an undocumented 100kB request size limit in our backend services.
[00:39:42] apine: got it, thanks. from a quick glance, a default heap size of 512 MiB sounds plausible given the container memory limit you're using on orchestrator - i.e., 50% of 1GiB, which is coming from the default value in the orchestrator chart [0].
[00:39:42] if needed, you should be able to override those defaults in the helmfile values for the service - e.g., in [1] - though, depending on what value you have in mind, some other changes may be necessary on our end to allow it (there are limits on how much you can request).
[00:39:42] in any case, I can ask around a bit tomorrow about the other one (the request size limit doesn't sound familiar offhand).
[00:39:42] [0] https://gerrit.wikimedia.org/r/plugins/gitiles/operations/deployment-charts/+/refs/heads/master/charts/function-orchestrator/values.yaml#21
[00:39:42] [1] https://gerrit.wikimedia.org/r/plugins/gitiles/operations/deployment-charts/+/refs/heads/master/helmfile.d/services/wikifunctions/values-main-orchestrator.yaml
[00:44:51] @swfrench-wmf: Thank you! Okay, that makes sense. In local testing, 2GiB seems safe, but this issue is hard to repro - it's ultimately related to GC, so very spiky. I'd want to try bumping the chart to 2 CPUs and 4GiB. Would that be reasonable, or beyond the limits for what we can request?
[00:52:22] that cpu limit should be fine, but I think the memory limit might require a change on our side to allow it. I'll take a look to confirm tomorrow, unless someone else from serviceops does in the interim :)
[20:26:26] Are the ES hosts for trace.wikimedia.org directly queryable somewhere, rather than going via trace.wikimedia.org/api/?
[20:37:24] addshore: you can also get at the raw spans storage via the kibana "discover" interface - I can help you with that tomorrow if you need
[20:37:49] ooooo
[20:38:02] how long are they kept for? and what's the request sampling rate?
[20:43:09] oooooh, `select * from jaeger-span-2025.08.06 limit 10`
[21:00:00] oooh, and found it in discover, thanks for the pointer
[21:06:51] addshore: sampling rate depends on the service, you can find it in deployment-charts; for mediawiki I think it's 0.1% iirc
[21:07:00] Retention is 90d
[21:07:04] 👍
[21:07:38] oh except mw-debug is 100% :)
[21:11:36] right, time for a nap
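
As a follow-up to the undocumented ~100kB request size limit mentioned early in the log: a minimal sketch of how one might bisect where requests start failing. The target URL is a placeholder, and the actual failure mode (413, connection reset, an envoy or service-level error) is an open question, not something confirmed in the chat.

```python
# Rough probe for an apparent request size limit: POST progressively
# larger payloads and report where behaviour changes.
# TARGET is hypothetical - substitute the real backend endpoint.
import requests

TARGET = "https://example-backend.discovery.wmnet/endpoint"  # placeholder

for size_kb in (50, 90, 95, 99, 100, 101, 105, 110, 150):
    payload = "x" * (size_kb * 1024)  # size_kb kilobytes of dummy data
    try:
        r = requests.post(TARGET, data=payload, timeout=10)
        print(f"{size_kb:>4} kB -> HTTP {r.status_code}")
    except requests.RequestException as exc:
        print(f"{size_kb:>4} kB -> error: {exc}")
```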
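
For the raw-spans question at the end of the log, a minimal sketch of querying the Jaeger span index directly, roughly what the `select * from jaeger-span-2025.08.06 limit 10` quip and the Kibana "discover" pointer amount to. The Elasticsearch endpoint below is a placeholder (the real cluster address and any auth are not covered in the chat); the index name is the one mentioned above, and the field names follow the standard Jaeger Elasticsearch span schema.

```python
# Fetch the 10 most recent spans from a daily jaeger-span index.
# ES_HOST is hypothetical - point it at the actual spans cluster.
import requests

ES_HOST = "https://jaeger-es.example.wikimedia.org:9200"  # placeholder
INDEX = "jaeger-span-2025.08.06"  # daily index, as named in the chat

query = {
    "size": 10,  # the "limit 10"
    "sort": [{"startTimeMillis": {"order": "desc"}}],
    "query": {"match_all": {}},  # e.g. swap in {"term": {"process.serviceName": "mediawiki"}}
}

resp = requests.post(f"{ES_HOST}/{INDEX}/_search", json=query, timeout=30)
resp.raise_for_status()
for hit in resp.json()["hits"]["hits"]:
    span = hit["_source"]
    print(span.get("traceID"), span.get("operationName"), span.get("duration"))
```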