[08:05:31] Do we know at which frequency a prometheus endpoint should be scraped?
[08:12:46] the reason I'm asking is that the metrics exposed by the airflow scheduler can be a bit expensive to compute, and the recompute period is set to 5s by default. If we scrape metrics every minute, I'd rather align the recompute period with the statsd rollup period
[08:27:26] every min seems good enough in my opinion
[08:27:50] if the metrics are expensive, I mean
[08:28:03] how long does it take to scrape?
[08:28:22] because it needs to be well under a min, otherwise you'll start to see lagging etc.
[08:30:28] when I curl, I get an immediate response, if that's what you're asking
[08:30:38] the metrics generation happens asynchronously
[08:30:41] okok, what do you mean by "expensive" then?
[08:31:02] ah ok, and you get the last computed result immediately
[08:31:13] when the new one is ready, then the next scrape will get the data
[08:31:20] did I get it correctly?
[08:31:41] I'm trying to reduce overall load on the scheduler, in order to free up as many cpu cycles as possible for scheduling the dumps v1 tasks, and according to the documentation, some pool-related metrics can be "a bit onerous to generate"
[08:31:50] yep, that's exactly right
[08:32:26] okok, then I'd tune the scrape interval to account for that, so you don't risk fetching the same data twice because the new one is still computing
[08:32:27] so what I'm really trying to do here is align the metrics generation interval with the scraping interval, otherwise the generated metrics will get rolled up into statsd without any additional value
[08:32:36] exactly yes
[08:32:39] same thought
[08:32:46] it's currently at 60 minutes and it's improbable that it will be reduced any time soon
[08:33:05] wait, 60 minutes?
[08:33:50] I'd have expected something like 60s
[08:34:41] 🤦
[08:34:45] 60 seconds ofc
[08:34:53] sorry about that
[08:34:53] okk, haha no worries
[08:35:06] ok, so let's increase that metrics generation period to 60s as well
[08:35:08] thanks folks!
[08:38:19] np!
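A minimal sketch of the alignment agreed on above, assuming a plain Prometheus scrape config and Airflow's [scheduler] pool_metrics_interval option (the 5s default mentioned at 08:12:46); the job name, target host, and port are illustrative, not the actual WMF setup:

    # Prometheus side: scrape the scheduler's exporter once a minute.
    scrape_configs:
      - job_name: airflow-scheduler                 # illustrative job name
        scrape_interval: 60s                        # matches the statsd rollup period discussed above
        static_configs:
          - targets: ['an-airflow1001.example.wmnet:9102']   # hypothetical exporter target
    # Airflow side, in airflow.cfg (option name assumed, default is 5.0 seconds),
    # so each scrape picks up exactly one freshly computed sample:
    #   [scheduler]
    #   pool_metrics_interval = 60.0
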
[08:38:25] Filed a change to move citoid to ingress https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1135378
[08:42:45] +1ed
[08:47:10] wow super quick thanks!
[08:47:36] as far as rolling out goes:
[08:47:49] 1) I can deploy the patch as is since it will keep the node port
[08:47:54] 2) test ingress calls etc.
[08:48:17] 3) create a citoid.k8s-ingress-wikikube.discovery.wmnet CNAME
[08:48:27] 4) point clients to it
[08:48:43] no idea where to look for 4), I guess puppet and mw-config?
[09:15:52] it's exposed solely under RESTBase
[09:16:36] ah lovely, so I'll ping Hugh :D
[09:18:43] and it's being migrated to the rest-gateway, which shouldn't change too much
[09:20:51] it's already fully behind the rest gateway thankfully
[09:21:11] so we just need to change the hostname/port/networkpolicy for the gateway
[09:23:06] hello hnowlan :D
[09:23:25] do you have any test URLs that I can use by any chance?
[09:23:29] otherwise I'll find them
[09:25:56] yep, with the gateway or without?
[09:26:00] with the gateway: https://rest-gateway.discovery.wmnet:4113/en.wikipedia.org/v1/data/citation/mediawiki/10.1038%2Fs41586-021-03470-x
[09:27:03] citoid directly: https://citoid.discovery.wmnet:4003/mediawiki/10.1038%2Fs41586-021-03470-x
[09:29:07] hnowlan: awesome, I tested in staging that the nodeport still works, going to test the ingress setup. Would you be available to assist if I roll out the change to prod? Just to make sure that everything works with the actual setup
[09:30:55] yeah sgtm!
[09:31:19] the staging rest-gateway is pretty representative of what prod will look like as far as internal testing goes
[09:37:41] super, I am still battling with testing ingress in staging :D
[09:45:34] elukey: the rest-gateway is in wikikube fwiw
[09:45:55] so what I was talking about yesterday (egress rules, SNI not rewriting the host header) might apply
[09:48:20] ah snap :(
[09:48:58] akosiaris: re: staging, I see that wikifunctions has staging: true
[09:49:19] and in fact, the gateway resource has host: wikifunctions.k8s-staging.discovery.wmnet
[09:49:29] meanwhile for citoid I have the prod svc
[09:49:41] and the istio gateway refuses my conn
[09:50:09] IIRC you mentioned yesterday that you found why staging: true wasn't needed with the most recent istio ingress template
[09:50:26] basically I am trying with curl https://citoid.k8s-staging.discovery.wmnet:30443/mediawiki/10.1038%2Fs41586-021-03470-x
[09:53:01] https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/981333/4/modules/ingress/istio_1.1.0.tpl
[09:58:04] I think that {{- $domains := .Values.ingress.gatewayHosts.domains | default $certmanager_domains | default $fallback_domains -}} may not work
[10:00:26] ok I got it, sigh
[10:00:49] so in the default values.yaml we have gatewayHosts->domains defined
[10:01:09] I added it in the chart's values.yaml, so it gets inherited
[10:01:19] for staging, it needs to be "~"
[10:03:30] or better, we probably don't need to set it at all
[10:09:00] https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1135396 basically
[10:13:39] the diff looks good
[10:14:10] I'd be inclined to comment out those values in the module's values.yaml as well
[10:19:09] I'll have a look in a few
[10:24:14] works now!
[10:33:32] https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1135402 for that
[10:33:56] and also opened https://phabricator.wikimedia.org/T391457 to track the migration of services to ingress
[12:59:22] citoid changes went out earlier on apparently: https://sal.toolforge.org/production?p=0&q=citoid&d=
[14:04:00] so something that I didn't take into account
[14:04:44] (scratch that, need some rework in my head first :D)
[14:06:21] okok, so to make the citoid transition I'd need something like https://gerrit.wikimedia.org/r/c/operations/dns/+/1135433
[14:06:39] so citoid-ingress.discovery.wmnet CNAME -> k8s-ingress-wikikube.discovery.wmnet
[14:06:48] because I cannot modify citoid.discovery.wmnet yet
[14:19:03] and https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1135449 + next
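For reference, a minimal values.yaml sketch of the staging issue debugged above (09:48–10:03); only ingress.gatewayHosts.domains and the quoted template line come from the chat, the surrounding keys are assumed:

    # chart values.yaml (layout assumed). Setting `domains` here short-circuits the
    # template's fallback chain
    #   {{- $domains := .Values.ingress.gatewayHosts.domains | default $certmanager_domains | default $fallback_domains -}}
    # and pins the prod hostnames, so the staging istio gateway refuses the connection.
    # Leaving it null (or omitting it entirely) lets the per-environment default apply:
    ingress:
      enabled: true        # assumed key
      gatewayHosts:
        domains: ~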