[09:33:09] Hi, there is this changeprop charts change I would like to deploy: https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1064013. cc @hnowlan is this a good time or folks are still busy with other things?
[09:39:18] nemo-yiannis: works for me - anything we should be watching as you roll it out?
[09:40:13] i can monitor the change on PCS (i deployed some observability yesterday) other than that i guess we need to make sure there are no error spikes etc
[09:40:30] and there is no drop on requests to restbase
[09:41:26] ack
[09:41:43] ok deploying now
[09:45:08] hm indentation looks a bit broken in the diff: https://phabricator.wikimedia.org/P70571
[09:47:03] hmmm yeah
[09:48:45] it's the use of "-}}" in your change
[09:51:25] yeah
[09:51:56] https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1082430
[09:52:58] CI diff looks better
[10:07:00] Apologies i should have checked the CI output diff first :/ https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1082435
[10:41:23] hnowlan: the thumbor error rate has gone up a bit since 03:30 in codfw; don't suppose you could check it out, please? [I saw the rise on the swift graphs, but I think it's thumbor errors being passed on]
[10:48:15] Emperor: looking
[10:48:24] <3
[10:49:56] looks like https://phabricator.wikimedia.org/T374350
[10:50:05] deleted the pod, should improve from now
[10:50:35] I'll try to look into a fix for this this week
[10:55:19] hnowlan: done
[10:56:17] Thanks!
[11:00:49] hnowlan: objections on killing the sessionstore pod that's on the wrong node?
[11:02:42] claime: go for it, thank you
[11:03:38] hmm
[11:03:46] it respawns on the wrong node again
[11:03:52] Did we change sizing or something?
[11:04:37] ah it's got pods that it shouldn't have, namely cxserver, media-analytics and shellbox
[11:04:40] I'll evict them
[11:07:31] hmm no, that's not it, I checked the wrong node...
[15:15:03] hnowlan, swfrench-wmf: What's the plan for codfw sessionstore? I've got a thing in ~15min lasting ~1hour, but free afterward
[15:15:35] urandom: I'm happy to just roll ahead and do codfw without doing siege again, does that work for you?
[15:15:42] it does yeah
[15:16:01] I was about to say: Not sure I'm needed for this, but happy to be around and help watch graphs or whatnot
[15:31:40] hnowlan: I have a meeting at 17:00 UTC, but am otherwise around and happy to help
[15:32:35] swfrench-wmf: cool, thanks! let's try at 1600?
[15:32:57] ack, sounds good
[15:33:27] 👍
[16:07:47] swfrench-wmf: if you're okay for me to start, I'll depool codfw. change is https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1082441
[16:08:22] hnowlan: I'm around and ready when you are :)
[16:09:07] cool, thanks
[16:20:08] okay, we're at zero in codfw. Destroying
[16:23:10] alright, deployment looks good
[16:23:29] I'll repool unless there's any objections
[16:23:37] SGTM
[16:30:40] all done. Logins/sessions look okay
[16:31:27] hnowlan: nice! yeah and latency is settling back to normal
[17:11:12] has there been any changes made to iptables stuff today?
[17:14:29] sukhe: some hosts in prod have begun using nftables
[17:15:33] cdanis: yeah thanks
[17:15:57] oh sorry you said 'today', I misread at first
[17:16:18] I don't know of anything directly changed today
[17:16:45] yeah we are having this weird issue on the dns hosts where authdns-update is failing as SSH as it is trying over v6
[17:19:38] actually scratch that, still seems to be v4 but timing out nonetheless