[07:21:05] _joe_ ack thanks, IIUC poolcounter doesn't have any special settings on its side, it is the client that specifies (for a given lock) the max-workers/queue-size. Everything is event based, so pretty quick, TIL. Should I open a task anyway for the "known" ES slowdown? Not sure if it is tracked elsewhere
[07:21:27] anyway, for the on-callers - I am going to decom the old poolcounter VMs, they are not receiving any traffic now
[07:28:17] ack
[08:41:54] effie: o/ I've just approved some dns changes for wikikube-worker2124
[08:42:25] interesting, I ran the cookbooks
[08:43:04] yes but I am decomming some vms
[08:44:17] elukey: sorry I meant, I had already run the cookbook that should have taken care of this
[08:44:45] ah okok
[08:45:14] maybe there is something in our renaming process that leaves some things lingering and takes care of them later as we go
[08:45:24] there was stuff like
[08:45:25] -wikikube-worker2124 1H IN A 10.192.0.79
[08:45:29] -wikikube-worker2124 1H IN AAAA 2620:0:860:101:10:192:0:79
[08:45:32] +wikikube-worker2124 1H IN A 10.192.8.23
[08:45:36] +wikikube-worker2124 1H IN AAAA 2620:0:860:109:10:192:8:23
[08:45:37] it is being re-numbered
[08:45:39] so it seemed to be changing the A/AAAA records
[08:45:42] yes
[08:45:50] okok then we are good
[08:45:55] cool, tx
[08:47:21] elukey: I am also seeing -poolcounter1005 1H IN A 10.64.32.236
[08:47:30] is that correct?
[08:50:14] elukey ^ :)
[08:54:44] effie: https://netbox.wikimedia.org/search/?q=poolcounter1005&obj_type= looks like it's not in netbox anymore, so I guess yes
[08:55:09] https://netbox.wikimedia.org/extras/changelog/192399/
[08:55:33] sorry I was afk for a moment, yes I am decomming it
[08:56:37] I am also going to decom poolcounter2003 and 2004
[09:00:42] I was reading T332015 and I was slightly confused, thank you!
[09:00:43] T332015: Migrate poolcounter hosts to bookworm - https://phabricator.wikimedia.org/T332015
[10:38:16] elukey: moritzm - feel free to merge my pending puppet patch.
[10:38:25] dc8f9ad59c
[10:40:39] btullis: ack, might take a few mins, currently sorting out an issue with one of the patches
[10:40:48] I'll ping you when it's merged
[10:41:17] Cool, no hurry.
[10:56:30] btullis: patches are merged
[10:57:41] Ack, thanks.
[10:58:13] sorryyy I forgot to puppet-merge sigh
[11:03:15] we've all done it (more than once in my case)
[12:29:09] swfrench-wmf and people oncall this week/helping with dc switchover: we had a hw issue on pc3, we will discuss it with the dbas and then update you, we may have to do a failover before the switchover (TBD)
[12:29:49] Context: T375382
[12:29:50] T375382: Post pc1013 crash - https://phabricator.wikimedia.org/T375382
[12:32:55] in other news, someone asked about domain redirects on wikitech over the weekend, not something urgent, and I personally have more or less an idea, but maybe traffic wants to give a more authoritative answer?
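
(Aside on the PoolCounter point from 07:21 above: a minimal client sketch, assuming the daemon's plain-text protocol in which every acquire request carries its own per-lock limits. The port number, the exact command/response strings, and the lock key used here are assumptions for illustration, not taken from the log.)

import socket

def with_pool_lock(key, workers, maxqueue, timeout, work, host="localhost", port=7531):
    """Run `work()` only if we obtain the PoolCounter lock for `key`.

    The limits (workers, maxqueue, timeout) travel with each request,
    which is why there is nothing to tune on the poolcounter hosts themselves.
    """
    with socket.create_connection((host, port)) as s:
        s.sendall(f"ACQ4ME {key} {workers} {maxqueue} {timeout}\n".encode())
        reply = s.recv(1024).decode().strip()
        if reply == "LOCKED":
            try:
                # the lock is tied to this connection, so keep it open while working
                work()
            finally:
                s.sendall(f"RELEASE {key}\n".encode())
            return True
        # e.g. QUEUE_FULL or TIMEOUT: the caller should degrade gracefully
        return False

# hypothetical usage, with a made-up key and limits
got_lock = with_pool_lock("enwiki:pcache:SomeExpensivePage", 4, 50, 10,
                          work=lambda: print("doing the expensive parse"))

Because the limits are sent per request rather than configured on the daemon, two clients can use different limits for the same key, and the effective behaviour is whatever the callers ask for.
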
[12:35:13] jynus: Noted, linking to switchover task
[13:28:29] I think CI might be broken
[13:28:32] 09:22:54 error during compilation: Function lookup() did not find a value for the name 'profile::configmaster::server_name' (file: /srv/workspace/puppet/modules/profile/manifests/configmaster.pp, line: 8) on node 03c6fe97ef0b.integration.eqiad1.wikimedia.cloud
[13:30:54] stepping out for a bit but I see b32c2b919a1eab14756a29b9641297cd9143fe78 so we should just update the spec file I think
[13:43:17] https://gerrit.wikimedia.org/r/c/operations/puppet/+/1075013
[13:43:19] needs a review
[13:43:27] hashar: ^ the failing CI
[13:44:33] then I guess you wanna remove modules/profile/manifests/configmaster.pp as well?
[13:45:02] I am not sure :]
[13:45:24] ah no
[13:45:30] the spec uses the role "puppetmaster/frontend"
[14:13:07] Update on the pc3 crash: so after the meeting, it seems the most likely course of action will be to move the spare host to pc3 before the switchover
[14:13:30] jynus: ack, thanks for highlighting that!
[14:14:07] this would be the normal course of action, it just happens that today we don't have a lot of dbas around!
[16:11:01] for the on-callers: the docker registry now has two new VMs in service, registry[12]005, and registry[12]003 are now pooled=no
[16:11:25] the VMs are on bookworm and run the same sw, so no issues are expected
[16:11:54] but if you notice any weirdness in deployments etc. you can easily revert (depool *005, pool *003)
[16:12:03] more info https://phabricator.wikimedia.org/T332016
[16:12:15] if nothing comes up I'll reimage the other VMs to Bookworm :)
[18:21:31] I have a dumb question: during the pc1013 death, I noticed text sometimes flakily loading but images never loading (logged out, multiple devices). Why would a parser cache problem affect images more than text? Or was that random?
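
(Aside on the registry change from 16:11 above: the revert "depool *005, pool *003" would normally just be typed as confctl commands on a cluster-management host; the sketch below wraps those commands in Python purely for illustration. The FQDNs and the Python wrapper are assumptions, not taken from the log.)

import subprocess

# Hypothetical revert of the docker-registry pooling change:
# depool the new VMs (registry[12]005) and repool the old ones (registry[12]003).
revert = [
    ("registry1005.eqiad.wmnet", "no"),
    ("registry2005.codfw.wmnet", "no"),
    ("registry1003.eqiad.wmnet", "yes"),
    ("registry2003.codfw.wmnet", "yes"),
]

for host, pooled in revert:
    subprocess.run(
        ["sudo", "confctl", "select", f"name={host}", f"set/pooled={pooled}"],
        check=True,
    )
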