[08:37:13] <wikibugs>	 serviceops, CirrusSearch: Requesting permission to enable kafka log compaction for page_rerender on kafka-main - https://phabricator.wikimedia.org/T354794#10542969 (Gehel)
[08:39:27] <wikibugs>	 serviceops, collaboration-services, SRE, Patch-For-Review, Technical-Debt: Sunset search.wikimedia.org service - https://phabricator.wikimedia.org/T316296#10543018 (Gehel)
[08:40:54] <wikibugs>	 serviceops, CirrusSearch, envoy, Infrastructure-Foundations, and 4 others: Half a million of CirrusSearch jobqueue execution errors per hour since 2021-09-30 16:02 - https://phabricator.wikimedia.org/T292291#10543055 (Gehel)
[13:39:55] <wikibugs>	 serviceops, Deployments, Release-Engineering-Team, Wikimedia-production-error: httpb fails upon deployment of 1.44.0-wmf.5 - https://phabricator.wikimedia.org/T380958#10544343 (Aklapper) Got this again when deploying `1.44.0-wmf.16` to `group1` for the second time today, but with a 302: ` 13:32:3...
[14:40:17] <wikibugs>	 serviceops, PHP 8.1 support: Update PCRE in PHP 8.1 images to PCRE 10.39 or newer - https://phabricator.wikimedia.org/T386006#10544704 (Scott_French) Following up, @MatthewVernon is the debian maintainer for pcre2, and (my understanding is) already has package builds running on our CI.  It sounds like it...
[15:19:17] <ottomata>	 Hi folks, it looks like k8s staging is undeployable right now.  
[15:19:17] <ottomata>	 k8s staging seems to be out of IP addresses
[15:19:17] <ottomata>	  - https://phabricator.wikimedia.org/T386107
[15:19:27] <ottomata>	 anything we can do?  i'm blocked upgrading eventgate to node20
[15:41:01] <wikibugs>	 serviceops, Kubernetes: k8s staging seems to be out of IP addresses - https://phabricator.wikimedia.org/T386107#10545098 (cmassaro) We've run into the same issue while deploying today:  ` $ kubectl get events LAST SEEN   TYPE      REASON                   OBJECT...
[15:42:18] <James_F>	 ottomata: We just ran into the same issue.
[15:45:23] <wikibugs>	 serviceops, Kubernetes: k8s staging seems to be out of IP addresses - https://phabricator.wikimedia.org/T386107#10545111 (Jdforrester-WMF) > No more free affine blocks and strict affinity enabled  Are we (and data systems) set with affinity to some particular set of hosts, whereas others (who've had no i...
[15:45:45] <claime>	 trying to look at it, but we're majority offsite so doing the best I can
[16:16:05] <claime>	 ottomata: Does eventgate-analytics need canary in staging?
[16:16:39] <claime>	 I don't have a simple way to add ip blocks right now so trying to free up by scaling down some things
[16:24:17] <claime>	 ottomata: eventgate isn't deploying rn because the container is in CrashLoopBackOff, not because it can't get an ip
[16:24:59] <claime>	 James_F: can you retry deploying wf please?
[16:25:08] <claime>	 I think I've made enough room to unblock
[16:25:28] <James_F>	 claime: Sure.
[16:30:02] <James_F>	 claime: Looks good, just deployed to staging successfully. Thanks! Didn't think you all would have time until next week. <3
[16:30:37] <claime>	 James_F: This is really a stopgap measure, we need to actually add pod ip blocks to make sure it doesn't happen again
[16:30:47] <James_F>	 Ack.
[16:31:02] <claime>	 but that's not something I'm comfortable doing with all the rest of my team on the other side of the Atlantic
[16:31:09] <James_F>	 Totally.
[16:31:10] <James_F>	 And make the crashloop issue more visible, I suppose.
[16:31:52] <claime>	 We should have a way of alerting when we get close to ip exhaustion yeah
[16:34:58] <wikibugs>	 serviceops, Kubernetes: k8s staging seems to be out of IP addresses - https://phabricator.wikimedia.org/T386107#10545352 (Clement_Goubert) Open→Resolved a:Clement_Goubert Problem solved temporarily by removing some workloads from staging-eqiad. Creating subtasks for longer term action items.
[16:38:51] <wikibugs>	 serviceops, Kubernetes: Add pod ip address blocks to staging-eqiad - https://phabricator.wikimedia.org/T386232 (Clement_Goubert) NEW
[16:39:58] <wikibugs>	 serviceops, Kubernetes: Create alerting when nearing pod ip exhaustion on kubernetes - https://phabricator.wikimedia.org/T386234 (Clement_Goubert) NEW
[16:40:07] <wikibugs>	 serviceops, Kubernetes: Create alerting when nearing pod ip exhaustion on kubernetes - https://phabricator.wikimedia.org/T386234#10545416 (Clement_Goubert) p:Triage→Medium
[16:40:22] <wikibugs>	 serviceops, Kubernetes: Add pod ip address blocks to staging-eqiad - https://phabricator.wikimedia.org/T386232#10545419 (Clement_Goubert) p:Triage→High
[18:05:37] <ottomata>	 claime: okay thank you! 
[18:05:37] <ottomata>	 canary is not needed in staging, it was there more for developing the helm chart with canary support long ago
[18:07:24] <claime>	 ottomata: ok, cool, would you be open to removing that release from staging? it would be a couple pods less
[18:11:27] <wikibugs>	 serviceops, Content-Transform-Team, MediaWiki-Engineering, MW-Interfaces-Team, and 3 others: Migrate parsoidtest functionality to kubernetes - https://phabricator.wikimedia.org/T386246 (jijiki) NEW
[18:36:01] <ottomata>	 can do
[21:31:45] <wikibugs>	 serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: Rename wikikube worker nodes during OS reimage - https://phabricator.wikimedia.org/T365571#10546839 (ops-monitoring-bot) pool host wikikube-worker[1124-1128].eqiad.wmnet by kamila@cumin1002 with reason: reimage complete
[21:31:49] <wikibugs>	 serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: Rename wikikube worker nodes during OS reimage - https://phabricator.wikimedia.org/T365571#10546840 (ops-monitoring-bot) Cookbook cookbooks.sre.k8s.pool-depool-node started by kamila@cumin1002 pool for host wikikube-worker[1124-1128]....