[08:00:02] Cteam: welcome to today 🦄! Don’t forget to post your update in thread.
[08:00:02] Feel free to include:
[08:00:02] 1. 🕫 Anything you'd like to share about your work
[08:00:02] 2. ☏ Anything you'd like to get help with
[08:00:02] 3. ⚠ Anything you're currently blocked on
[08:00:02] (this message is from a toolforge job under the admin project)
[08:08:36] Done:
[08:08:36] * big and little fires everywhere after PTO
[08:08:36] * [toolforge dns] dns was flaky from some worker nodes (~10%) and was breaking many tools and processes in random ways, took those workers out and provisioned new ones that don't show the issue (cause still unknown)
[08:08:36] * [toolforge upgrade/calico] did a bit of debugging in the morning, blancadesal took over (and fixed it!)
[08:08:36] Doing:
[08:08:37] * [toolforge prometheus] prometheus is crashing with OOM periodically, and more and more frequently, causing pages and alerts (as it should). I'm working on lowering the memory footprint by reducing the amount of metrics it collects (dropping the most expensive unused ones, promising so far, https://gerrit.wikimedia.org/r/c/operations/puppet/+/1067220)
[08:08:37] * [toolsadmin/striker] it went down yesterday too (reason unknown, not sure why it would be related to the dns issue but the timing matches), restarted the container and it's working ok so far, will close if nothing breaks again
[08:08:38] * [wmcs-cookbooks,openstack] fixed a couple of issues with the project specification for some openstack cli calls (I'm sure others might be broken too, we'll fix on the fly)
[08:08:38] * catching up, ~1k new emails to go
[08:08:50] might take a short day today (yesterday was an 11+ hour day)
[12:13:35] Done:
[12:13:36] * [k8s] deployed a fix for Calico. The k8s 1.26 upgrade can now proceed. An upgrade window for toolsbeta has been scheduled for tomorrow 12 UTC.
[12:13:36] * [k8s] tested the new version of Calico also with k8s 1.26 in lima-kilo
[12:13:36] * cleared my inboxes :tada:
[12:13:36] Doing:
[12:13:36] * some more backlog from PTO
[12:13:36] * preparing Wikimania presentation for next week's team sync (many folks still out this week)
[12:13:37] * [k8s] dealing with the ingress-nginx vuln https://phabricator.wikimedia.org/T37304
[12:13:37] ** testing v1.11.2 w/ k8s 1.26 in lima-kilo
[14:47:18] Current status of toolforge prometheus before closing for the day: it's running stable with ingress-nginx stats disabled. I'm trying to find the right config to filter out all the stats we don't need from ingress, but have not found it yet
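
A rough sketch of the kind of filtering discussed for toolforge prometheus, for illustration only. Prometheus can drop scraped series by name with a `metric_relabel_configs` rule using `action: drop` and a regex on `__name__`, and that regex has to match the whole name. The Python below only mimics that matching so a candidate pattern can be checked against a list of metric names. The helper `drop_metrics`, the sample names, and the pattern are assumptions chosen for the example. They are not taken from the gerrit patch or from the actual toolforge config.

```python
import re


def drop_metrics(metric_names, drop_regex):
    """Return the names that survive a drop rule.

    Mimics a fully anchored match: a name is dropped only when the
    regex matches the entire name, not just a substring.
    """
    pattern = re.compile(drop_regex)
    return [name for name in metric_names if pattern.fullmatch(name) is None]


# Hypothetical sample of series names, for illustration only.
scraped = [
    "nginx_ingress_controller_requests",
    "nginx_ingress_controller_request_duration_seconds_bucket",
    "nginx_ingress_controller_response_size_bucket",
    "nginx_ingress_controller_nginx_process_connections",
]

# Hypothetical rule: drop every histogram bucket series from ingress.
kept = drop_metrics(scraped, r"nginx_ingress_controller_.*_bucket")
print(kept)
```

Checking a pattern this way before putting it in a relabel rule may help avoid dropping series that dashboards or alerts still depend on. Prometheus uses RE2 syntax, so patterns relying on Python-only regex features would not carry over.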