[07:01:27] serviceops, DC-Ops, ops-codfw, SRE, Patch-For-Review: Install (2) 960GB SSDs each in kafka-main20[06-10] - https://phabricator.wikimedia.org/T371423#10068522 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jayme@cumin1002 for host kafka-main2007.codfw.wmnet with OS bu...
[07:34:15] serviceops, DC-Ops, ops-codfw, SRE: Install (2) 960GB SSDs each in kafka-main20[06-10] - https://phabricator.wikimedia.org/T371423#10068548 (JMeybohm) >>! In T371423#10064707, @JMeybohm wrote: > They all failed because the installer tried to bring up some old mdadm arrays and failed doing so, May...
[07:41:10] serviceops, DC-Ops, ops-codfw, SRE: Install (2) 960GB SSDs each in kafka-main20[06-10] - https://phabricator.wikimedia.org/T371423#10068556 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jayme@cumin1002 for host kafka-main2007.codfw.wmnet with OS bullseye completed: - kafka-...
[07:55:00] serviceops, DC-Ops, ops-codfw, SRE: Install (2) 960GB SSDs each in kafka-main20[06-10] - https://phabricator.wikimedia.org/T371423#10068582 (JMeybohm) Open→Resolved All of these have been reimaged with raid10-6dev now, thanks!
[07:59:52] serviceops, DC-Ops, ops-eqiad, SRE: Install (2) 960GB SSDs each in kafka-main10[06-10] - https://phabricator.wikimedia.org/T371422#10068587 (JMeybohm) From T371423#10068548 I did: - Partition the new disks in current state: sgdisk -R /dev/sde /dev/sda; sgdisk -R /dev/sdf /dev/sda; sgdisk -G...
[08:00:30] serviceops, DC-Ops, ops-codfw, SRE: Install (2) 960GB SSDs each in kafka-main20[06-10] - https://phabricator.wikimedia.org/T371423#10068589 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jayme@cumin1002 for host kafka-main1006.eqiad.wmnet with OS bullseye
[08:01:51] serviceops, DC-Ops, ops-codfw, SRE: Install (2) 960GB SSDs each in kafka-main20[06-10] - https://phabricator.wikimedia.org/T371423#10068591 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jayme@cumin1002 for host kafka-main1007.eqiad.wmnet with OS bullseye
[08:02:45] serviceops, DC-Ops, ops-codfw, SRE: Install (2) 960GB SSDs each in kafka-main20[06-10] - https://phabricator.wikimedia.org/T371423#10068592 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jayme@cumin1002 for host kafka-main1008.eqiad.wmnet with OS bullseye
[08:04:03] serviceops, DC-Ops, ops-codfw, SRE: Install (2) 960GB SSDs each in kafka-main20[06-10] - https://phabricator.wikimedia.org/T371423#10068593 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jayme@cumin1002 for host kafka-main1009.eqiad.wmnet with OS bullseye
[08:05:24] serviceops, DC-Ops, ops-codfw, SRE: Install (2) 960GB SSDs each in kafka-main20[06-10] - https://phabricator.wikimedia.org/T371423#10068594 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jayme@cumin1002 for host kafka-main1010.eqiad.wmnet with OS bullseye
[08:47:41] serviceops, DC-Ops, ops-codfw, SRE: Install (2) 960GB SSDs each in kafka-main20[06-10] - https://phabricator.wikimedia.org/T371423#10068647 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jayme@cumin1002 for host kafka-main1006.eqiad.wmnet with OS bullseye executed with e...
[08:48:52] serviceops, DC-Ops, ops-codfw, SRE: Install (2) 960GB SSDs each in kafka-main20[06-10] - https://phabricator.wikimedia.org/T371423#10068648 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jayme@cumin1002 for host kafka-main1007.eqiad.wmnet with OS bullseye executed with e...
[08:49:54] serviceops, DC-Ops, ops-codfw, SRE: Install (2) 960GB SSDs each in kafka-main20[06-10] - https://phabricator.wikimedia.org/T371423#10068649 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jayme@cumin1002 for host kafka-main1008.eqiad.wmnet with OS bullseye executed with e...
[08:50:56] serviceops, DC-Ops, ops-codfw, SRE: Install (2) 960GB SSDs each in kafka-main20[06-10] - https://phabricator.wikimedia.org/T371423#10068650 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jayme@cumin1002 for host kafka-main1009.eqiad.wmnet with OS bullseye executed with e...
[08:52:24] serviceops, DC-Ops, ops-codfw, SRE: Install (2) 960GB SSDs each in kafka-main20[06-10] - https://phabricator.wikimedia.org/T371423#10068651 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jayme@cumin1002 for host kafka-main1010.eqiad.wmnet with OS bullseye executed with e...
[09:15:27] serviceops, API Platform, CirrusSearch, MediaWiki-Configuration, Discovery-Search (Current work): Provide a method for internal services to run api requests for private wikis - https://phabricator.wikimedia.org/T345185#10068712 (Gehel) Open→Resolved
[09:33:47] serviceops, DC-Ops, ops-eqiad, SRE: Install (2) 960GB SSDs each in kafka-main10[06-10] - https://phabricator.wikimedia.org/T371422#10068778 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jayme@cumin1002 for host kafka-main1006.eqiad.wmnet with OS bullseye
[09:34:34] serviceops, DC-Ops, ops-eqiad, SRE: Install (2) 960GB SSDs each in kafka-main10[06-10] - https://phabricator.wikimedia.org/T371422#10068779 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jayme@cumin1002 for host kafka-main1007.eqiad.wmnet with OS bullseye
[09:34:41] serviceops, DC-Ops, ops-eqiad, SRE: Install (2) 960GB SSDs each in kafka-main10[06-10] - https://phabricator.wikimedia.org/T371422#10068780 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jayme@cumin1002 for host kafka-main1008.eqiad.wmnet with OS bullseye
[09:35:12] serviceops, DC-Ops, ops-eqiad, SRE: Install (2) 960GB SSDs each in kafka-main10[06-10] - https://phabricator.wikimedia.org/T371422#10068783 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jayme@cumin1002 for host kafka-main1009.eqiad.wmnet with OS bullseye
[09:35:43] serviceops, DC-Ops, ops-eqiad, SRE: Install (2) 960GB SSDs each in kafka-main10[06-10] - https://phabricator.wikimedia.org/T371422#10068784 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jayme@cumin1002 for host kafka-main1010.eqiad.wmnet with OS bullseye
[10:10:37] serviceops, DC-Ops, ops-eqiad, SRE: Install (2) 960GB SSDs each in kafka-main10[06-10] - https://phabricator.wikimedia.org/T371422#10068995 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jayme@cumin1002 for host kafka-main1006.eqiad.wmnet with OS bullseye completed: - kafka-...
[10:14:13] serviceops, DC-Ops, ops-eqiad, SRE: Install (2) 960GB SSDs each in kafka-main10[06-10] - https://phabricator.wikimedia.org/T371422#10069004 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jayme@cumin1002 for host kafka-main1008.eqiad.wmnet with OS bullseye completed: - kafka-...
[10:16:53] serviceops, DC-Ops, ops-eqiad, SRE: Install (2) 960GB SSDs each in kafka-main10[06-10] - https://phabricator.wikimedia.org/T371422#10069019 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jayme@cumin1002 for host kafka-main1010.eqiad.wmnet with OS bullseye completed: - kafka-...
[10:20:02] serviceops, DC-Ops, ops-eqiad, SRE: Install (2) 960GB SSDs each in kafka-main10[06-10] - https://phabricator.wikimedia.org/T371422#10069027 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jayme@cumin1002 for host kafka-main1009.eqiad.wmnet with OS bullseye completed: - kafka-...
[10:22:00] serviceops, DC-Ops, ops-eqiad, SRE: Install (2) 960GB SSDs each in kafka-main10[06-10] - https://phabricator.wikimedia.org/T371422#10069031 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jayme@cumin1002 for host kafka-main1007.eqiad.wmnet with OS bullseye completed: - kafka-...
[10:23:55] serviceops, DC-Ops, ops-eqiad, SRE: Install (2) 960GB SSDs each in kafka-main10[06-10] - https://phabricator.wikimedia.org/T371422#10069033 (JMeybohm) Open→Resolved a:VRiley-WMF All of these have been reimaged with raid10-6dev now, thanks!
[11:18:47] serviceops, All-and-every-Wikisource, Thumbor: Elevated 429 responses from Thumbor on codfw starting 2024-08-14 00:00 UTC - https://phabricator.wikimedia.org/T372470#10069145 (hnowlan) To better trace this issue, could I get a sample of failing URLs please?
[14:54:55] serviceops, Datacenter-Switchover: Audit / update switchover-related cookbooks - https://phabricator.wikimedia.org/T372649 (Scott_French) NEW
[14:56:10] serviceops, Datacenter-Switchover, Patch-For-Review: Audit / update switchover-related cookbooks - https://phabricator.wikimedia.org/T372649#10069713 (Scott_French)
[14:56:12] serviceops, Datacenter-Switchover: Southward Datacenter Switchover (September 2024) - https://phabricator.wikimedia.org/T370962#10069714 (Scott_French)
[15:18:42] serviceops, MediaWiki-Engineering: EtcdConfig using stale data: lost lock in /srv/mediawiki/php-1.42.0-wmf.1/includes/config/EtcdConfig.php on line 218 - https://phabricator.wikimedia.org/T349376#10069762 (Krinkle) This message is still the most frequent in php-fpm error logs (nowadays, Logstash: mediawi...
[17:00:34] serviceops, MW-on-K8s, Datacenter-Switchover: Control mw-on-k8s periodic maintenance jobs with an etcd value - https://phabricator.wikimedia.org/T367118#10070121 (Scott_French) a:Scott_French +1 to option #3 as the most sensible / obvious one: adding something more complex than a single global bo...
[17:01:37] serviceops, MW-on-K8s, Datacenter-Switchover: Control mw-on-k8s periodic maintenance jobs with an etcd value - https://phabricator.wikimedia.org/T367118#10070124 (Scott_French) Reminder to self: once live, wire this into the 01-stop-maintenance.py and 08-start-maintenance.py cookbooks.
[17:22:13] serviceops, MW-on-K8s, Datacenter-Switchover: Control mw-on-k8s periodic maintenance jobs with an etcd value - https://phabricator.wikimedia.org/T367118#10070189 (RLazarus) See also T359130 for the cookbook work. We aren't as far as I expected we'd be, so we can revisit which of those steps for cronj...
[23:15:17] serviceops, MW-on-K8s, Datacenter-Switchover: Control mw-on-k8s periodic maintenance jobs with an etcd value - https://phabricator.wikimedia.org/T367118#10070811 (Scott_French) Thanks, Reuven! Agreed, yeah: Some subset of those items will need done before the switchover, but exactly which subset dep...
[23:32:40] serviceops, MW-on-K8s, Datacenter-Switchover: Update DC switchover cookbooks to handle maintenance scripts on k8s - https://phabricator.wikimedia.org/T359130#10070850 (Scott_French) Thanks for writing this up, Reuven. At a minimum, completing "Update 01-stop-maintenance.py to wait for Jobs to termin...