[08:37:13] serviceops, CirrusSearch: Requesting permission to enable kafka log compaction for page_rerender on kafka-main - https://phabricator.wikimedia.org/T354794#10542969 (Gehel)
[08:39:27] serviceops, collaboration-services, SRE, Patch-For-Review, Technical-Debt: Sunset search.wikimedia.org service - https://phabricator.wikimedia.org/T316296#10543018 (Gehel)
[08:40:54] serviceops, CirrusSearch, envoy, Infrastructure-Foundations, and 4 others: Half a million of CirrusSearch jobqueue execution errors per hour since 2021-09-30 16:02 - https://phabricator.wikimedia.org/T292291#10543055 (Gehel)
[13:39:55] serviceops, Deployments, Release-Engineering-Team, Wikimedia-production-error: httpb fails upon deployment of 1.44.0-wmf.5 - https://phabricator.wikimedia.org/T380958#10544343 (Aklapper) Got this again when deploying `1.44.0-wmf.16` to `group1` for the second time today, but with a 302: ` 13:32:3...
[14:40:17] serviceops, PHP 8.1 support: Update PCRE in PHP 8.1 images to PCRE 10.39 or newer - https://phabricator.wikimedia.org/T386006#10544704 (Scott_French) Following up, @MatthewVernon is the debian maintainer for pcre2, and (my understanding is) already has package builds running on our CI. It sounds like it...
[15:19:17] Hi folks, it looks like k8s staging is undeployable right now.
[15:19:17] k8s staging seems to be out of IP addresses
[15:19:17] - https://phabricator.wikimedia.org/T386107
[15:19:27] anything we can do? i'm blocked upgrading eventgate to node20
[15:41:01] serviceops, Kubernetes: k8s staging seems to be out of IP addresses - https://phabricator.wikimedia.org/T386107#10545098 (cmassaro) We've run into the same issue while deploying today: ` $ kubectl get events LAST SEEN TYPE REASON OBJECT...
[15:42:18] ottomata: We just ran into the same issue.
[15:45:23] serviceops, Kubernetes: k8s staging seems to be out of IP addresses - https://phabricator.wikimedia.org/T386107#10545111 (Jdforrester-WMF) > No more free affine blocks and strict affinity enabled Are we (and data systems) set with affinity to some particular set of hosts, whereas others (who've had no i...
[15:45:45] trying to look at it, but we're majority offsite so doing the best I can
[16:16:05] ottomata: Does eventgate-analytics need canary in staging?
[16:16:39] I don't have a simple way to add ip blocks right now so trying to free up by scaling down some things
[16:24:17] ottomata: eventgate isn't deploying rn because the container is in crashloopbackoff, not because it can't get an ip
[16:24:59] James_F: can you retry deploying wf please?
[16:25:08] I think I've made enough room to unblock
[16:25:28] claime: Sure.
[16:30:02] claime: Looks good, just deployed to staging successfully. Thanks! Didn't think you all would have time until next week. <3
[16:30:37] James_F: This is really a stopgap measure, we need to actually add pod ip blocks to make sure it doesn't happen again
[16:30:47] Ack.
[16:31:02] but that's not something I'm comfortable doing with all the rest of my team on the other side of the atlantic
[16:31:09] Totally.
[16:31:10] And make the crashloop issue more visible, I suppose.
[16:31:52] We should have a way of alerting when we get close to ip exhaustion yeah
[16:34:58] serviceops, Kubernetes: k8s staging seems to be out of IP addresses - https://phabricator.wikimedia.org/T386107#10545352 (Clement_Goubert) Open→Resolved a:Clement_Goubert Problem solved temporarily by removing some workloads from staging-eqiad. Creating subtasks for longer term action items.
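Editor's note on the alerting idea raised at 16:31:52: the check could be as simple as comparing allocated pod IP blocks against the number of blocks the pool can hold. The sketch below is only an illustration under stated assumptions. It assumes the pod IP pool is carved into fixed-size blocks (as the "No more free affine blocks" message suggests), and the pool CIDR, block size, allocated count, and 0.8 threshold are all hypothetical values, not taken from the staging cluster. The function names are invented for this example and do not correspond to any existing tool.

```python
import ipaddress


def pool_usage(pool_cidr: str, block_prefix_len: int, allocated_blocks: int) -> float:
    """Return the fraction of fixed-size blocks in the pool that are allocated."""
    pool = ipaddress.ip_network(pool_cidr)
    if block_prefix_len < pool.prefixlen:
        raise ValueError("block is larger than the pool")
    # A pool of prefix length P holds 2^(B - P) blocks of prefix length B.
    total_blocks = 2 ** (block_prefix_len - pool.prefixlen)
    if not 0 <= allocated_blocks <= total_blocks:
        raise ValueError("allocated block count out of range")
    return allocated_blocks / total_blocks


def should_alert(pool_cidr: str, block_prefix_len: int, allocated_blocks: int,
                 threshold: float = 0.8) -> bool:
    """True when block usage reaches the (hypothetical) alert threshold."""
    return pool_usage(pool_cidr, block_prefix_len, allocated_blocks) >= threshold


if __name__ == "__main__":
    # Hypothetical numbers: a /24 pool split into /26 blocks holds 4 blocks.
    print(pool_usage("10.0.0.0/24", 26, 3))
    print(should_alert("10.0.0.0/24", 26, 4))
```

In practice the allocated block count would have to come from the cluster's IPAM data, and how to obtain it is outside what this log covers.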
[16:38:51] serviceops, Kubernetes: Add pod ip address blocks to staging-eqiad - https://phabricator.wikimedia.org/T386232 (Clement_Goubert) NEW
[16:39:58] serviceops, Kubernetes: Create alerting when nearing pod ip exhaustion on kubernetes - https://phabricator.wikimedia.org/T386234 (Clement_Goubert) NEW
[16:40:07] serviceops, Kubernetes: Create alerting when nearing pod ip exhaustion on kubernetes - https://phabricator.wikimedia.org/T386234#10545416 (Clement_Goubert) p:Triage→Medium
[16:40:22] serviceops, Kubernetes: Add pod ip address blocks to staging-eqiad - https://phabricator.wikimedia.org/T386232#10545419 (Clement_Goubert) p:Triage→High
[18:05:37] claime: okay thank you!
[18:05:37] canary is not needed in staging, it was there more for developing the helm chart with canary support long ago
[18:07:24] ottomata: ok, cool, would you be open to removing that release from staging? it would be a couple pods less
[18:11:27] serviceops, Content-Transform-Team, MediaWiki-Engineering, MW-Interfaces-Team, and 3 others: Migrate parsoidtest functionality to kubernetes - https://phabricator.wikimedia.org/T386246 (jijiki) NEW
[18:36:01] can do
[21:31:45] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: Rename wikikube worker nodes during OS reimage - https://phabricator.wikimedia.org/T365571#10546839 (ops-monitoring-bot) pool host wikikube-worker[1124-1128].eqiad.wmnet by kamila@cumin1002 with reason: reimage complete
[21:31:49] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: Rename wikikube worker nodes during OS reimage - https://phabricator.wikimedia.org/T365571#10546840 (ops-monitoring-bot) Cookbook cookbooks.sre.k8s.pool-depool-node started by kamila@cumin1002 pool for host wikikube-worker[1124-1128]....