[08:01:06] greetings
[08:36:18] morning!
[11:11:14] quick review adding coverage report/integration for jobs-api (as first repo) https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/283 https://gitlab.wikimedia.org/repos/cloud/cicd/gitlab-ci/-/merge_requests/83
[12:23:08] very cool
[13:19:30]
[13:19:54] I'm surprised that actually works on first page load for me on chrome, on subsequent loads CSP intervenes
[13:20:24] actually sometimes the correct url is issued
[13:20:52] i.e. on next load I got
[13:21:30] does that ring a bell and/or a task? if not I'll open one
[13:24:00] godog: I guess that's why I keep seeing styling issues on quips too. I haven't made a task yet though.
[13:24:23] RhinosF1: ack, will open one now and cc you
[13:24:27] happens to me too, weird
[13:27:14] Perfect
[13:27:53] {{done}} T422829
[13:27:53] T422829: Toolforge HTML head links sometimes are issued as http://.toolforge:443 - https://phabricator.wikimedia.org/T422829
[13:28:04] godog: might be an issue with how haproxy and nginx pass some header through
[13:28:30] I upgraded ingress-nginx this morning, though it was a minor version upgrade so no big changes
[13:29:13] taavi: could be, yeah, would that explain the apparent randomness too?
[13:30:11] taavi@tools-k8s-haproxy-8:~$ curl --connect-to ::tools-k8s-gateway-1.tools.eqiad1.wikimedia.cloud:30000 http://sal.toolforge.org 2>&1 | grep stylesheet
[13:30:11]
[13:30:11] taavi@tools-k8s-haproxy-8:~$ curl --connect-to ::tools-k8s-ingress-7.tools.eqiad1.wikimedia.cloud:30002 http://sal.toolforge.org 2>&1 | grep stylesheet
[13:30:11]
[13:30:13] I'd guess different config/software in the lbs?
[13:30:15] or maybe not
[13:30:26] dcaro: this has been happening for at least a few days so it won't have been an upgrade this morning
[13:30:33] ack
[13:31:19] godog: if you could get the full headers of a broken response that'd be useful
[13:31:50] for sure, I'll try to reproduce
[13:32:09] well, apparently I just did as well
[13:32:45] hmm, with curl I don't seem to get it, only in the browser
[13:34:19] wait
[13:34:21] it is an istio bug
[13:34:31] got it
[13:34:42] look at the protocols there: https://phabricator.wikimedia.org/P90341
[13:34:42] https://www.irccloud.com/pastebin/rAS4MMDM/
[13:35:00] yep, one is http + port 443, the other https
[13:35:34] I guess istio might be overriding x-forwarded-proto to http since the hop to it is not encrypted?
[13:35:51] hmmm, maybe yep
[13:38:22] please ping me if I can help with further debugging, I'll go back to popping and filing tasks from the stack I have
[13:38:53] the cause seems pretty clear, now I just need to find a knob to tune that behaviour
[14:16:01] bah, just spent a while wondering why istio doesn't handle a header coming from haproxy before I realized I was curling it directly, so haproxy would not have had any effect
[14:17:41] at least that means https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1218 does work
[14:17:54] lol
[14:18:11] typical aha moment, you need a break ;)
[14:18:37] i may have had to get up rather early today
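A minimal sketch of the header capture godog was asked for, reusing the hostnames and ports from taavi's curl commands above (assumed still valid); it dumps the full response headers from both the gateway and the ingress hops, so an X-Forwarded-Proto/scheme mismatch between the two paths becomes visible:

```
# Dump full response headers from both hops; hostnames/ports are taken
# from the transcript above and may need adjusting.
for target in tools-k8s-gateway-1.tools.eqiad1.wikimedia.cloud:30000 \
              tools-k8s-ingress-7.tools.eqiad1.wikimedia.cloud:30002; do
  echo "== $target =="
  # -D - prints received headers to stdout, -o /dev/null discards the body
  curl -sD - -o /dev/null --connect-to "::$target" http://sal.toolforge.org
done
```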
[14:24:11] dhinus: did you wind up getting a chance to look at the tofu failure?
[14:24:24] I tried a few things yesterday but didn't learn much
[14:24:26] andrewbogott: nope, sorry
[14:24:48] I only checked the last failure quickly and it was starting from an unclean state
[14:24:54] so I had some hopes for today's run :)
[14:25:59] ok, I will stand back for now
[14:26:46] today's run should have completed by now, but I didn't check the result
[14:30:53] 2 failures: openstack_containerinfra_cluster_v1 and openstack_db_instance_v1 (postgres db)
[14:32:01] :(
[15:50:53] godog: for ceph metrics, we pull them from the monitor nodes
[15:51:27] looking, but I think I remember that we configured it to not export everything, as it was using too much memory
[15:51:35] (it was hanging when pulling the stats too)
[15:52:05] dcaro: ah! thank you, that's good info, I'll see if I can pull more history from phab and/or file a task
[15:53:17] https://www.irccloud.com/pastebin/0L2NvQex/
[15:53:23] from ceph config dump
[15:55:08] nice, ok, I'll dig deeper starting from there
[15:55:39] on puppet: https://gerrit.wikimedia.org/g/operations/puppet/+/e73f466dfa92abd758299ecbb9fd0f69f2c304d9/modules/profile/manifests/prometheus/cloud.pp#128
[15:55:54] maybe all we adjusted was the internal ceph scrape interval
[15:57:15] yeah, makes sense that all of the osd metrics would be too much, interesting
[16:02:10] taavi: I have to go shortly, though https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/1218 LGTM
[17:51:51] Raymond_Ndibe: please take care of https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/163
[17:51:53] * dcaro off
[17:51:57] cya!
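A hedged sketch of the ceph-side knobs discussed above: `ceph config dump` is where the pasted mgr/prometheus settings came from, and `mgr/prometheus/scrape_interval` is the standard mgr module option for the internal scrape interval mentioned in the puppet link; verify the exact option names against the paste before changing anything:

```
# Inspect the mgr prometheus module settings (source of the paste above)
ceph config dump | grep mgr/prometheus

# Read the internal scrape interval referenced in the puppet discussion
ceph config get mgr mgr/prometheus/scrape_interval

# Raising it reduces load on the mgr; 60 is an example value, not the
# value actually used in the Toolforge/cloud deployment
ceph config set mgr mgr/prometheus/scrape_interval 60
```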