[04:41:54] ceph in codfw1dev is in a bad way thanks to the mon nodes being in a weird chicken/egg situation. The cloudbackup100x-dev alerts are from that.
[04:42:06] Feel free to ignore and I'll keep prodding it when I'm awake
[08:09:09] andrewbogott: I think mons might not be upgradable in-place, but need to be readded, the db format they use changed in quincy
[08:10:07] https://pve.proxmox.com/wiki/Ceph_Pacific_to_Quincy
[08:12:50] the official docs don't mention it in the upgrade section though :/ https://docs.ceph.com/en/latest/releases/quincy/#upgrading-non-cephadm-clusters
[08:21:49] the new KernelErrors on cloudcephosd1041 are a bit worrying T400222
[08:21:52] T400222: KernelErrors Server cloudcephosd1041 logged kernel errors - https://phabricator.wikimedia.org/T400222
[08:22:13] a quick google suggests it is an "error on one of your RAM modules, which was not able to be corrected"
[08:25:37] but another kernel message suggests it //was// corrected, so maybe not so critical
[15:27:39] taavi: did you already switch us over to the new acme server in project-proxy? Seems like the alerts went away.
[15:27:48] yes!
[15:28:23] great, then I'll continue wrestling with codfw1 ceph. thanks!
[17:14:39] I //thought// I understood how Cinder snapshots worked, but I probably don't :) T400285
[17:14:40] T400285: [cinder] Clean up unused linkwatcher volumes in "trove" project - https://phabricator.wikimedia.org/T400285
[17:15:35] not urgent at all, but andrewbogott I'm curious if you have an explanation for that behavior (a snapshot apparently vanished without me deleting it...)
[17:16:15] if it was a snapshot managed by the backup service then it might've been cleaned up during incremental backup
[17:16:19] otherwise I don't have a guess
[17:17:14] it was not
[17:17:28] but that's an interesting guess, I wonder if it's possible the "cleanup" process we have
[17:17:34] can delete other snapshots
[17:18:04] even more mysterious, because that snapshot had a child, so I thought I could not even delete that snapshot myself :D
[17:23:49] I'll stop here for today and see if I can make sense of it tomorrow with a fresh mind :)
[17:24:08] * dhinus off
[20:30:27] Does the iops multiplier on a flavor (e.g. g4.cores2.ram4.disk20.ephemeral20.4xiops) change the iops for a volume attached to an instance of that flavor too?
[20:32:06] I'm trying to think out the quota request for scaling the zuul magnum cluster up and wondering if there is a way to get the volumes that Magnum sets up for working space to be faster. Mostly because we have had issues with CI being slow when disk is slow before.
[21:08:32] andrewbogott: do you have some time to chat about magnum and flavors? I'm wondering if you can help me reason about things like ephemeral storage and ceph IOPS for the zuul test runner cluster I'm working on.
[23:54:09] bd808: sorry, I was out on a wild goose chase.
[23:55:10] As far as I know, the iops is a characteristic of the volume and not of the VM that it's mounted on. I would have to test to be sure.
[23:55:25] I think we can create new volume types with different iops but I'm not 100% sure
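
On the Pacific-to-Quincy mon question in the log: the Proxmox page linked above boils down to checking whether each mon still uses the old leveldb store, which Quincy no longer supports; mons still on leveldb have to be destroyed and recreated rather than upgraded in place. A minimal sketch of that check, assuming the default /var/lib/ceph/mon layout on a mon host; nothing here is specific to the codfw1dev hosts:

```python
#!/usr/bin/env python3
"""Report which key/value backend each local ceph-mon uses (run on a mon host)."""
from pathlib import Path

MON_ROOT = Path("/var/lib/ceph/mon")  # default mon data directory layout

for mon_dir in sorted(MON_ROOT.glob("ceph-*")):
    kv_file = mon_dir / "kv_backend"
    backend = kv_file.read_text().strip() if kv_file.exists() else "unknown"
    if backend == "rocksdb":
        print(f"{mon_dir.name}: {backend} (fine for Quincy)")
    else:
        print(f"{mon_dir.name}: {backend} (recreate this mon before upgrading)")
```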
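
On the corrected-vs-uncorrected question for the cloudcephosd1041 KernelErrors: the kernel's EDAC subsystem keeps corrected errors (ce_count, ECC fixed them) separate from uncorrected ones (ue_count, data was lost), and a non-zero ue_count is the genuinely worrying case. A small sketch of reading those counters, assuming the standard EDAC sysfs layout:

```python
#!/usr/bin/env python3
"""Summarise corrected vs. uncorrected memory errors from the kernel's EDAC counters."""
from pathlib import Path

EDAC_MC = Path("/sys/devices/system/edac/mc")

for mc in sorted(EDAC_MC.glob("mc[0-9]*")):
    corrected = int((mc / "ce_count").read_text())    # ECC corrected these; log and watch the trend
    uncorrected = int((mc / "ue_count").read_text())  # not correctable; treat as a hardware fault
    status = "ok" if uncorrected == 0 else "check/replace DIMM"
    print(f"{mc.name}: corrected={corrected} uncorrected={uncorrected} ({status})")
```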
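
On the flavor-iops question at the end of the log: if, as guessed above, the flavor's iops multiplier does not follow attached Cinder volumes, the usual Cinder mechanism for per-volume limits is a QoS spec attached to a volume type, which is the "new volume types with different iops" idea in the last message. A hedged sketch of what that could look like, driving the stock openstack CLI from Python; the names and the 5000-iops figure are invented for illustration and do not reflect the actual Cloud VPS configuration:

```python
#!/usr/bin/env python3
"""Hedged sketch: give Cinder volumes their own iops ceiling via a QoS spec on a volume type.

All names and numbers (high-iops, total_iops_sec=5000, the 20 GiB size) are illustrative
placeholders, not anything configured in the deployment discussed in the log.
"""
import subprocess


def openstack(*args: str) -> None:
    """Run one `openstack` CLI command and fail loudly if it errors."""
    subprocess.run(["openstack", *args], check=True)


# A front-end QoS spec is enforced on the hypervisor side when the volume is attached.
openstack("volume", "qos", "create", "high-iops",
          "--consumer", "front-end",
          "--property", "total_iops_sec=5000")

# A volume type carrying that QoS spec; every volume of this type inherits the limit,
# independent of the flavor of the instance it ends up attached to.
openstack("volume", "type", "create", "high-iops")
openstack("volume", "qos", "associate", "high-iops", "high-iops")

# Example volume that would get the 5000-iops ceiling.
openstack("volume", "create", "--type", "high-iops", "--size", "20", "zuul-scratch")
```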