[08:06:09] can I start the s8 codfw master flip? https://phabricator.wikimedia.org/T397164
[08:25:59] Amir1: is the rolling restart script trusted for es* hosts? I see it has support for them https://gitlab.wikimedia.org/repos/data_persistence/dbtools/scripts/-/blob/main/rolling_restart.py#L90
[09:19:01] federico3: for es, it works on RW sections (es6/es7), but for RO sections it's a bit weird; auto_schema needs some changes too
[09:19:20] I made those changes locally but haven't pushed them
[09:38:05] Amir1: ah, because yesterday Manuel and I were looking at doing it for RO sections. If you have some WIP changes, can you push them to a branch?
[09:38:52] we have the glue to fetch https://noc.wikimedia.org/dbconfig/eqiad.json etc. and find out who the RO masters are
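Not the actual glue referenced above, just a minimal sketch of what that dbconfig lookup could look like. It assumes the dbctl-style layout in which `sectionLoads` maps each section to a list of load groups with the master in the first group; the key names, the datacenter handling, and the `es` section filter are illustrative assumptions, not taken from the real script.

```python
# Sketch: find the current master of each external-store section from the
# public dbconfig dump. Assumes the dbctl-style "sectionLoads" layout in which
# the first load group of a section holds the master -- unverified here.
import json
import urllib.request

DBCONFIG_URL = "https://noc.wikimedia.org/dbconfig/eqiad.json"  # or codfw.json

def section_masters(url: str = DBCONFIG_URL) -> dict[str, str]:
    with urllib.request.urlopen(url) as resp:
        config = json.load(resp)
    masters = {}
    # The top level may be keyed by datacenter; fall back to the document itself.
    for dc_config in (config.values() if "sectionLoads" not in config else [config]):
        for section, groups in dc_config.get("sectionLoads", {}).items():
            if groups and groups[0]:
                # The first group is assumed to contain exactly the master.
                masters[section] = next(iter(groups[0]))
    return masters

if __name__ == "__main__":
    for section, master in sorted(section_masters().items()):
        if section.startswith("es"):  # external-store sections only
            print(f"{section}: {master}")
```

Pointing the same lookup at codfw.json instead of eqiad.json would give the codfw masters for comparison.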
[10:25:43] Emperor: I've been following https://wikitech.wikimedia.org/wiki/Debian_packaging/Package_your_software_as_deb but there seems to be some duplication with https://wikitech.wikimedia.org/wiki/Debian_packaging_with_dgit_and_CI#Create_Repository_and_set_up_CI_for_package_builds and https://wikitech.wikimedia.org/wiki/Debian_packaging/Tutorial#Building_with_CI
[10:27:16] federico3: yes, that first link is a process that _joe _ set up because they didn't think the process documented in the latter links was quite what they wanted
[10:27:35] oh, https://wikitech.wikimedia.org/wiki/Debian_packaging#Upload_to_Wikimedia_Repo is also duplicated
[10:32:39] now `sudo -i reprepro --noskipold --restrict vopsbot checkupdate bookworm-wikimedia` is not showing any pending update for bookworm, and yet the new package does not seem to be pulled in
[10:32:41] I don't think that's duplication? Anyhow, it needs to talk about the staging repo too at some point
[10:36:49] hum, in the CI run I see "dcmd cp ../*.changes WMF_BUILD_DIR/" - do you know what that is?
[10:38:28] federico3: documented in https://wikitech.wikimedia.org/wiki/Debian_packaging_with_dgit_and_CI#Package_Building
[10:39:09] (FWIW, I think their workflow is less good than the one I documented ;) )
[12:18:30] Emperor: so... as I've mentioned in past meetings, I've been working on a JBOD setup for Cassandra. I got it all working right —even reimaged a couple of production machines with it— but now I'm thinking it's not so right. It's creating labels and mount points that are based on device names (sda, sdb, sdc, etc)... I reckon you know where I'm going with this...
[12:20:58] functionally it's fine, I did use labels, so any reordering won't create any actual problems, other than confusion
[12:21:35] as long as puppet is happy (which was the real issue with ms-be nodes, puppet would get confused by the label/drive mismatch)
[12:22:56] I'm glad you brought that up
[12:23:29] Part of my "approach" was to not try and do this with Puppet
[12:23:57] how do you envisage managing storage post-install, then?
[12:23:58] I have a script, and was thinking it would be run from a cookbook
[12:24:25] once after install, and again if any drives were added or replaced
[12:24:36] it's idempotent
[12:24:47] (I guess you could even run it from Puppet)
[12:24:57] ...that was going to be my next question :)
[12:25:20] I guess I should say, I've tried to make it idempotent... pending testing & review
[12:25:29] sure.
[12:27:35] So does swift use puppet for this because that's the way you inherited it, or did you choose that (and would you still)?
[12:27:44] we seem to have examples of both
[12:28:23] the hadoop cluster uses a cookbook for setting up hdfs drives
[12:28:34] prometheus too, I think
[12:31:36] swift I inherited thus; apus uses puppet to template out cephadm config files (and then cephadm does the actual storage management).
[12:31:56] I'm not sure what Current Preferred Practice looks like
[12:32:30] I'm guessing it varies depending on who you ask
[12:33:44] surely not ;)
[12:33:45] Emperor: I'm looking at ms-be1095, I guess it's state of the art? I see it is no longer using filesystem labels
[12:34:34] urandom: yeah, if you look at fstab, the swift drives are all specified as /dev/disk/by-path
[12:34:46] what was the reason for that?
[12:34:56] so they appear consistently regardless of whether they turn up as sdc, sdd, or whatever
[12:35:22] but earlier incarnations used labels, no?
[12:35:38] doesn't that achieve the same?
[12:35:58] urandom: yes, but then puppet was sad if the label and drive didn't match, and if you had to swap a disk puppet would try and write swift-sdf1 onto /dev/sdf1 even if another drive was already labelled swift-sdf1
[12:36:22] and if sda and sdb weren't exactly so, puppet could never run to completion OK
[12:36:23] auh, ok
[12:37:35] whereas an individual disk will always have the same /dev/disk/by-path entry
[12:41:07] Hm, checking all the thumbnail dbs is going to take a while
[12:43:46] I think I'm going to go ahead and update this to use something less likely to cause confusion than "device name at the time the script ran", probably something like `storage-{some random bits}`, document the (intended) semantics, and then see if I can't bother you for a review
[12:45:02] urandom: you might like the newer-swift approach where we turn disk-path into objects17 or whatever
[12:45:25] (you might not, also, but it did mean the device names weren't too horrid)
[12:46:19] I assume the sequence is probably something puppet just derives each time it runs?
[12:47:12] if I take the one-shot script approach, that would require grokking the original sequence on subsequent runs
[12:47:37] which is doable of course, but one more thing
[12:48:15] also (playing devil's advocate) I wonder if it doesn't suffer the same problem of tempting someone to think that that sequence has meaning
[12:49:41] probably easier to remember/disambiguate when plodding around manually from the command line though
[12:50:29] it sorts them based on the hardware path, which is (should be?) stable
[12:50:42] see the swift_disks fact for the gory details :)
[12:51:06] auh, will do
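Not the swift_disks fact itself, only a rough sketch of the idea under discussion: enumerate /dev/disk/by-path (which stays stable across reorderings), sort on the hardware path, and derive the mount names from that sequence instead of from whatever sdX the kernel assigned at probe time. The `-part` filter and the `objectsN` naming are assumptions for illustration.

```python
# Sketch of deriving stable device names from /dev/disk/by-path, roughly the
# idea behind the newer swift layout discussed above (objects0, objects1, ...).
# The "-part" filter and the naming scheme are illustrative assumptions.
from pathlib import Path

BY_PATH = Path("/dev/disk/by-path")

def stable_device_map(prefix: str = "objects") -> dict[str, str]:
    """Map a stable name (objects0, objects1, ...) to the current kernel device.

    Sorting on the by-path entry keeps the sequence consistent across boots,
    because it reflects the physical slot rather than probe order (sda, sdb...).
    """
    entries = sorted(
        p for p in BY_PATH.iterdir()
        if "-part" not in p.name  # whole disks only, skip partition symlinks
    )
    return {
        f"{prefix}{i}": str(p.resolve())  # e.g. objects3 -> /dev/sdd (today)
        for i, p in enumerate(entries)
    }

if __name__ == "__main__":
    for name, dev in stable_device_map().items():
        print(f"{name}\t{dev}")
```

The same sorting could just as well feed a `storage-{some random bits}` scheme instead, if the goal is to avoid implying that the sequence itself carries meaning.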
[12:52:25] oh hell, I just remembered. 🤦‍♂️
[12:53:22] so (unlike swift), cassandra is working from the same set of drives for everything
[12:54:09] it is therefore based on the raid1-2dev preseed, and there is a JBOD partition on each of the two "base" drives
[12:54:47] but the reuse preseed requires that you supply a mount point
[12:54:55] I couldn't find a way around that
[12:55:18] the swift backends have their own preseed (as do the Ceph nodes, I think, let me check)
[12:55:46] yeah, I created one for cassandra too, sorry I created confusion there...
[12:56:09] the point is there are always (at least) two drives, in a raid1
[12:56:31] the preseed assumes as much, and those two drives are also used for the jbod
[12:56:42] right, ms-be have partman/custom/ms-be_simple.cfg and run a small script in preseed to identify the OS drives to use
[12:57:09] ceph has a similar thing where it builds raid1 for the OS and leaves all LVM alone (as those are the Ceph things)
[12:58:09] small script in preseed?
[12:58:53] these preseed files are impenetrable
[13:00:07] (not the swift ones, all of them ;))
[13:00:52] urandom: modules/install_server/files/autoinstall/scripts/partman_early_command.sh
[13:01:32] essentially spits out a partman fragment with the correct OS drives in it, which is then used by the rest of the preseed setup; it makes it rather easier to figure out which disks you want to use
[13:04:43] mind blown.
[16:37:37] Is someone working with the test host?
[16:37:40] anybody doing heavy traffic on db2230?
[16:37:41] https://grafana.wikimedia.org/goto/H4vm7BPHg?orgId=1
[16:37:46] db2230
[16:37:54] it has weird patterns
[16:38:19] ...in the last 30 minutes it's been lagging a lot
[16:38:39] it's the replica that is doing that
[16:39:25] maybe there is phabricator maintenance, it seems to be replicating from it
[16:40:36] could be just https://phabricator.wikimedia.org/T397726
[16:40:46] its source is db1176.eqiad.wmnet https://grafana.wikimedia.org/goto/46YHnBPHR?orgId=1
[16:40:55] yeah, I think that is a leftover
[16:41:03] I will create a task to clean up test-s4
[16:41:11] unless it is being used
[16:41:28] not going to clean it up without making sure that replication is a mistake
[16:42:00] 99% sure it just needs to be wiped, but I will create the task for now and wait to be 100% sure tomorrow
[16:44:03] I created https://phabricator.wikimedia.org/T397746
[16:44:33] will ask the other DBAs tomorrow, it seems a mistake/leftover, but better to be sure. No worries, as it is just replication, no other access
[16:46:10] afaik db2230 is just a test host that gets wiped etc. as needed, so we might be able to drop the leftovers in a few days
[16:48:03] yeah, what is weird is the replication
[16:48:16] I mentioned potential reasons at https://phabricator.wikimedia.org/T397746#10943852
[16:48:30] but it is not a worry right now, so will ask around
[16:48:57] e.g. maybe I set up replication to test recovery, then forgot to remove it
[16:49:14] but as it is not urgent, better to double-check with the rest of the team
[16:51:09] lag also recovered
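As a footnote to the db2230 discussion: confirming where a replica is pulling from (and how far behind it is) comes down to reading its replication status on the host. A minimal sketch, assuming pymysql and a credentials file that are purely illustrative:

```python
# Sketch: report replication source and lag for a replica such as db2230.
# The connection details (credentials file, hostname) are illustrative
# assumptions; SHOW SLAVE STATUS fields are standard MariaDB/MySQL ones.
import pymysql

def replication_status(host: str) -> None:
    conn = pymysql.connect(
        host=host,
        read_default_file="~/.my.cnf",  # assumed credentials file
        cursorclass=pymysql.cursors.DictCursor,
    )
    try:
        with conn.cursor() as cur:
            cur.execute("SHOW SLAVE STATUS")
            row = cur.fetchone()
            if row is None:
                print(f"{host}: not replicating from anywhere")
                return
            print(f"{host}: replicating from {row['Master_Host']}, "
                  f"lag={row['Seconds_Behind_Master']}s, "
                  f"IO={row['Slave_IO_Running']} SQL={row['Slave_SQL_Running']}")
    finally:
        conn.close()

if __name__ == "__main__":
    replication_status("db2230.codfw.wmnet")
```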