[07:38:01] \o I have a mildly urgent change for the API GW that I need reviewed, any takers? https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1062368
[08:47:10] serviceops, DC-Ops, ops-codfw, SRE: Install (2) 960GB SSDs each in kafka-main20[06-10] - https://phabricator.wikimedia.org/T371423#10063581 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jayme@cumin1002 for host kafka-main2010.codfw.wmnet with OS bullseye
[08:53:29] serviceops, DC-Ops, ops-codfw, SRE: Install (2) 960GB SSDs each in kafka-main20[06-10] - https://phabricator.wikimedia.org/T371423#10063597 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jayme@cumin1002 for host kafka-main2010.codfw.wmnet with OS bullseye executed with error...
[09:13:56] klausman: done
[09:14:00] merci!
[12:17:48] serviceops, Data-Engineering, Data-Platform-SRE, observability, and 3 others: Upgrade Kafka to from 1.x to later version - https://phabricator.wikimedia.org/T300102#10063948 (elukey) @brouberol after T355550 do we have any plans to start testing the upgrade on kafka-test or similar? I can help if...
[12:38:11] hello folks
[12:38:35] I know that this will bring you joy https://gerrit.wikimedia.org/r/c/operations/software/thumbor-plugins/+/1062702
[12:38:52] I'd need to deploy thumbor for https://phabricator.wikimedia.org/T372466
[12:39:02] better - test it in staging, then prod
[12:39:55] go for it, I have some tests I can hit it with in staging
[12:41:30] thanks!
[12:57:52] serviceops, All-and-every-Wikisource, Thumbor: Elevated 429 responses from Thumbor on codfw starting August 14th 00:00 UTC - https://phabricator.wikimedia.org/T372470 (Poslovitch) NEW
[13:07:34] serviceops, All-and-every-Wikisource, Thumbor: Elevated 429 responses from Thumbor on codfw starting 2024-08-14 00:00 UTC - https://phabricator.wikimedia.org/T372470#10064066 (Aklapper)
[13:29:08] filed https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1062706 :)
[13:56:23] hnowlan: thumbor updated on staging :)
[14:02:46] elukey: thanks! looks good to me
[14:09:50] serviceops, All-and-every-Wikisource, Thumbor: Elevated 429 responses from Thumbor on codfw starting 2024-08-14 00:00 UTC - https://phabricator.wikimedia.org/T372470#10064176 (hnowlan) I'm investigating this issue, but it does not appear to be the same issue as T337649 on first glance. Restarts and...
[14:10:25] hnowlan: all right! Proceeding with codfw :)
[14:10:42] serviceops, Kubernetes, SRE Observability (FY2024/2025-Q1): Alert on unscrapable pods - https://phabricator.wikimedia.org/T372242#10064183 (lmata)
[14:33:20] hnowlan: thumbor deployed!
[14:33:44] thanks!
[16:58:32] serviceops, DC-Ops, ops-codfw, SRE: Install (2) 960GB SSDs each in kafka-main20[06-10] - https://phabricator.wikimedia.org/T371423#10064707 (JMeybohm) They all failed because the installer tried to bring up some old mdadm arrays and failed doing so, Maybe they where on the new disks, or it is bec...
[17:00:30] serviceops, DC-Ops, ops-codfw, SRE: Install (2) 960GB SSDs each in kafka-main20[06-10] - https://phabricator.wikimedia.org/T371423#10064712 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jayme@cumin1002 for host kafka-main2010.codfw.wmnet with OS bullseye
[17:50:50] serviceops, DC-Ops, ops-codfw, SRE: Install (2) 960GB SSDs each in kafka-main20[06-10] - https://phabricator.wikimedia.org/T371423#10064969 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jayme@cumin1002 for host kafka-main2010.codfw.wmnet with OS bullseye executed with error...
[19:52:14] serviceops: Prepare WMF PHP 8.1 packages for Bullseye - https://phabricator.wikimedia.org/T372507 (Scott_French) NEW
[22:48:52] serviceops, Wikimedia-production-error: PHP Notice: Undefined index: min_avail_workers in /srv/monitoring/lib.php on line 334 - https://phabricator.wikimedia.org/T372521 (Krinkle) NEW
[22:56:08] serviceops, Wikimedia-production-error: PHP Notice: Undefined index: min_avail_workers in /srv/monitoring/lib.php on line 334 - https://phabricator.wikimedia.org/T372521#10065600 (RLazarus) a:hnowlan This looks related to https://gerrit.wikimedia.org/r/c/operations/docker-images/production-images/+/1...