[08:18:20] morning! I'm back from vacation, currently catching up on email, meeting recordings, etc.
[08:18:26] let me know if there is something that requires more immediate attention!
[08:26:30] welcome back!
[08:29:52] blancadesal: I hope you had time to disconnect and recharge :). I had little success getting https://gitlab.wikimedia.org/repos/cloud/toolforge/api-gateway/-/merge_requests/17 reviewed, so feel free to take it back/review whenever you are tired of reading email xd
[08:32:51] is anyone looking at "DegradedArray event on /dev/md/0:cloudcephosd1031"?
[08:33:28] taavi: no, and I don't see that in alerts.w.o or in the -feed channel
[08:33:41] that's an email I got via the root@ alias
[08:33:43] https://usercontent.irccloud-cdn.com/file/2nMospJB/image.png
[08:34:25] it seems I did not get that email either
[08:34:46] T364060 also it seems
[08:34:47] T364060: Degraded RAID on cloudcephosd1031 - https://phabricator.wikimedia.org/T364060
[08:36:07] ceph is HEALTH_OK so maybe there is not much to do, other than wait for the failed drive to be replaced by DCops
[08:40:00] it's the OS raid, it's been a while since that though :/ I guess I missed it too
[08:44:59] dcaro: I just approved the oapi patch
[08:47:33] thanks!
[11:17:41] * arturo brb
[13:55:03] andrewbogott: puppet runs on instances using the central puppet server are saying this, which makes me think the puppet code update is stuck somehow:
[13:55:04] > Info: Applying configuration version '(0631e78bec) Andrew Bogott - puppetserver-deploy-code: bail out if current branch is not 'production''
[13:55:19] since that commit is now a week old
[13:56:19] taavi: yes, it's broken. That's T364492, which I'm working on this morning (but I've just swerved away from the solution I thought was the right one).
[13:56:20] T364492: Ownership confusion on cloud-local puppet servers - https://phabricator.wikimedia.org/T364492
[15:44:34] join #rapid7
[15:46:13] * bd808 waves to jbond
[15:47:44] * jbond o/
[16:00:17] * arturo offline
[16:28:11] Where are the documents on the last toolforge rebuild?
[16:31:59] Rook: what do you mean by rebuild? a k8s upgrade?
[16:34:14] An upgrade could be useful, though I'm looking for the last time it was fully rebuilt. I remember seeing docs about it somewhere on wikitech but I don't remember where they were
[16:36:41] I have not seen it fully rebuilt, the upgrade docs are https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Kubernetes/Upgrading_Kubernetes
[16:37:05] https://phabricator.wikimedia.org/T363683 might be relevant too
[16:38:27] this has some info on how you would bootstrap a new cluster: https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Kubernetes/New_cluster
[16:38:29] It was an older doc, I think it was Brooke and Artur.o who were rebuilding it...
[16:38:33] only on the k8s side though
[16:39:23] Regardless, I think what you linked should get me somewhere. Thanks!
[16:39:51] 👍 feel free to ask for specifics
[16:39:53] gtg though
[16:40:00] cya tomorrow
[16:45:15] Rook: you might also find the lima-kilo project useful for setting up toolforge locally, though it doesn't set up everything (https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo), not sure what you are looking for but it might give you better insight into the components themselves
[16:45:33] Thanks
[16:59:37] Rook: There may be things of interest linked from T214513. That was the epic for the 2020 cluster build () which was the last time we started a new kubernetes cluster in Toolforge from scratch.
[16:59:38] T214513: Deploy and migrate tools to a Kubernetes v1.15 or newer cluster - https://phabricator.wikimedia.org/T214513
[17:01:09] Neat, I'll look at that. Thanks
[18:27:15] Do we have any plans to add lbaasv2 to neutron?
[20:59:24] Rook, I don't know about lbaasv2 specifically but after Taavi migrates us to OVS it will at least be possible to adopt some more normal neutron features.
[21:01:56] Alrighty, thanks
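
For context on the "DegradedArray event on /dev/md/0:cloudcephosd1031" thread above: that kind of mail comes from mdadm's monitor when a software-RAID array loses a member. Below is a minimal sketch of the underlying check, assuming a Linux host with /proc/mdstat; the function name and regexes are illustrative only and are not anything the actual alerting stack in the log uses.

```python
#!/usr/bin/env python3
"""Sketch: flag md (Linux software RAID) arrays that look degraded by
parsing /proc/mdstat -- the condition behind a DegradedArray mail."""

import re
import sys

MDSTAT = "/proc/mdstat"  # standard location on Linux


def degraded_arrays(path: str = MDSTAT) -> list[str]:
    """Return names of arrays whose status string (e.g. [UU] vs [U_])
    shows a missing or failed member."""
    degraded = []
    current = None
    with open(path) as mdstat:
        for line in mdstat:
            # Array header lines look like: "md0 : active raid1 sda1[0] sdb1[1]"
            header = re.match(r"^(md\d+)\s*:", line)
            if header:
                current = header.group(1)
                continue
            # Status lines contain e.g. "[2/2] [UU]"; "_" marks a missing member.
            status = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", line)
            if status and current:
                wanted, active, flags = int(status.group(1)), int(status.group(2)), status.group(3)
                if active < wanted or "_" in flags:
                    degraded.append(current)
                current = None
    return degraded


if __name__ == "__main__":
    bad = degraded_arrays()
    if bad:
        print("degraded:", ", ".join(bad))
        sys.exit(1)
    print("all md arrays OK")
```

As noted in the log, the degraded array in this case was the OS RAID (md0) rather than the ceph data disks, which is why ceph stayed HEALTH_OK and the remaining action is the DCops drive replacement tracked in T364060.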