[00:57:04] FIRING: SystemdUnitFailed: wmf_auto_restart_prometheus-mysqld-exporter.service on db1246:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[04:57:04] FIRING: SystemdUnitFailed: wmf_auto_restart_prometheus-mysqld-exporter.service on db1246:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[07:36:13] checking
[07:36:43] also, db2202, which is not pooled (and is an s1 host), has had a corruption issue; will deal with it after
[07:42:05] FIRING: MysqlReplicationThreadCountTooLow: MySQL instance db2202:9104 has replication issues. - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=db2202&var-port=9104 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationThreadCountTooLow
[07:48:26] db2202 has pagelinks repairing
[09:37:46] Data engineering: whatever is happening to an-redactdb1001, I doubt it will be fixable by the end of the month, as it will take around as long to catch up as its current lag (now 8 days)
[13:13:22] Emperor: do you have a host with the original new RAID controller?
[13:15:35] I'll restart mariadb on zarcillo, FYI
[13:18:47] jynus: plenty, yes, none yet in service.
[13:19:02] ...why?
[13:19:12] Emperor: could I borrow one for a few minutes?
[13:19:53] e.g. ms-be2082
[13:20:14] jynus: sure, pp elukey and jhathaway for awareness
[13:21:10] Can I write to a disk, if I later drop everything I created?
[13:22:16] sure
[13:22:55] ok, taking ms-be2082; will ping you when done and cleaned up
[13:24:14] ack
[14:29:04] ack!
[14:41:19] Emperor: FYI https://gerrit.wikimedia.org/r/c/operations/puppet/+/1091597 is merged, so docker-registry-wise we should be good with hammering Swift
[14:43:21] Emperor: re: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1092345 and the lack of... dryness... :) Are you referring to the duplication between the seeds (which I think has come up before) and the Cassandra instance configs, or the repetitiveness of the Cassandra instance configs?
[14:45:18] kind of both, really. Almost that entire CR looks like it should be doable from roughly "hostname of the node & that it has -a -b -c"
[14:47:34] so... the seeds aren't actually a Cassandra cluster thing, they're a restbase thing (and everything there is called "restbase", but there is a distinction). I'm not sure what you would/could do there without creating unwanted coupling, BUT all of that should be going away soon (because restbase-the-service is being removed from that cluster).
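As a rough illustration of the repetition being discussed (the hiera keys, hostnames, addresses and rack values below are invented for the example, not the actual production hieradata), a per-host stanza in a change like this tends to restate the same seeds list and near-identical instance blocks on every node:

```yaml
# Hypothetical sketch of the repeated hieradata pattern under discussion;
# key names, hostnames, IPs and rack labels are illustrative only.
profile::cassandra::seeds:
  - restbase2001-a.codfw.wmnet
  - restbase2002-a.codfw.wmnet
  - restbase2003-a.codfw.wmnet
profile::cassandra::instances:
  a: { listen_address: 10.192.0.10, rack: a }
  b: { listen_address: 10.192.0.11, rack: a }
  c: { listen_address: 10.192.0.12, rack: a }
cassandra::jbod_devices: ['sda4', 'sdb4', 'sdc4']
```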
[14:48:48] Emperor: I actually didn't need the host, I didn't even touch it
[14:49:04] for the rest of it, you're configuring the instance(s), and while in a perfect world that would always be the same, they aren't necessarily the same
[14:49:19] (I just ran lspci on it)
[14:49:23] jynus: FE
[14:49:41] as in, we would endeavor to keep them the same because consistency is nice, but they can be (and have been) different
[14:50:01] and it would be an error for any of that to change (sans a reimage)
[14:50:36] which is not to say it couldn't be better, only that I'm not sure what that would look like
[14:50:50] Let me spitball a couple of things, and you can tell me I'm wrong :)
[14:53:19] in the hieradata where we assign jbod_devices - presumably we already know what those should be, since the reimaging process has to know that (so there's already a risk of hiera being out of sync with the reimage); likewise the rack is knowable from either netbox or the IP address...
[14:53:56] are the "seeds" you mention the lengthy stanza in hieradata/role/common/restbase/production.yaml? It'd be good if that were going away :)
[14:54:43] no, that's hieradata/role/codfw/restbase/production.yaml
[14:55:22] Ok, so jbod_devices does not necessarily map to devices
[14:55:49] it does here, but isn't constrained to, and it doesn't at all in other clusters.
[14:56:54] and for restbase, the rack (so far) matches one-to-one with the concept of a _row_ in the data-center, but as we run out of room to provision nodes in a row, we've been using others and creating... "meta rows", I guess
[14:57:14] basically saying: Ok, rows A & D will be "rack A"
[14:58:43] jbod_devices> `git grep cassandra::jbod_devices` only produces lines of exactly the form `cassandra::jbod_devices: ['sda4', 'sdb4', 'sdc4']` ?
[14:59:07] urandom: row> so is the question then how I should have checked the rows were correct?
[14:59:46] yes, that's a good question!
[15:00:43] those row equivalencies haven't happened (yet) for restbase; for the other clusters where they have, I've left comments in all of those hosts' hiera files that document the mappings
[15:00:46] [ for the very lazy> git grep 'cassandra::jbod_devices' | cut -d ':' -f 2- | sort -u ]
[15:00:58] that is terrible, I know, a comment
[15:01:10] it's also on wikitech, but I'm not sure what else to do there
[15:01:33] sorry, I'm not trying to be difficult, nor saying this should all get fixed now! Just reviewing this has made me ask how we could make it easier/better
[15:01:44] Emperor: yes, for the restbase cluster, right now, we are fortunate in that all of the hosts have the same disk config (they didn't always)
[15:02:12] no no, I also hope there is a way to make it better!
[15:02:31] It's awful no matter the reason
[15:03:07] urandom: with swift we encode the disk mapping in one place, and then just set hiera for "it's a host of this sort" in regex.yaml
[15:03:31] like a profile or alias?
[15:03:36] likewise, for racks I would be tempted to have a mapping of physical rack -> cassandra rack in one place [like we do with swift ring zones]
[15:04:30] urandom: yeah, or set a hiera variable (e.g. like we did when we toggled servers_per_port for swift)
[15:05:20] hah, though I see a bug in how that's done for the new thanos h/w, which I will now fix 🤦
[15:07:20] So... you'd take the machine's actual location in netbox, evaluate that against a mapping of real to virtual row/rack in puppet, and use that to automatically assign the rack in Cassandra?
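A minimal sketch of the idea being summarised here, assuming a single shared hash rather than per-host values (the key name, row labels and groupings are hypothetical, not the real puppet interface): the physical-to-virtual mapping lives in one place, and per-host config derives its Cassandra rack from the host's netbox location (or network) via that hash.

```yaml
# Hypothetical hiera sketch: define the physical row/rack -> Cassandra rack
# mapping once, instead of hard-coding a rack in every host's hiera file.
# Row names and groupings are illustrative only.
profile::cassandra::physical_to_virtual_rack:
  row_a: rack_a
  row_d: rack_a   # the "meta row" case: rows A & D are both treated as rack A
  row_b: rack_b
  row_c: rack_c
```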
[15:07:42] Yep (or do the lookup by network if that's easier)
[15:08:02] In that case, anyone who updated the location in netbox would break a cluster
[15:08:39] even if they actually moved the machine and had updated netbox accurately, that would create breakage
[15:09:05] bad breakage actually, topology breakage
[15:09:51] unless you could restrict that to a one-time deployment action
[15:10:20] Is "relocate cassandra storage nodes" a thing we expect to do?
[15:11:10] [without going through the cassandra puppetry in detail I don't know how hard/routine/impossible it would be to not change a rack once deployed]
[15:11:18] no, but I suspect that accidentally changing a location in netbox is an easier mistake to make than committing a puppet changeset
[15:11:37] that would be an argument for doing it by network instead, certainly
[15:11:42] and having more than one vector to destroy a cluster is probably not ideal either :)
[15:12:00] urandom: while I'm wasting your time, would you mind looking at https://gerrit.wikimedia.org/r/c/operations/puppet/+/1092847 please? to fix the bug I spotted
[15:12:17] sure
[15:12:28] (and this is not a waste of time)
[15:13:50] Thanks :)
[15:15:29] Anyway, so I think that if we were to inadvertently change a host's network, we probably have bigger/more problems, but insofar as topology goes this would create a problem where none existed before. Meaning: you could change a node's network without breaking the cluster, but if you change the effective rack you will most definitely break it.
[15:16:18] Again, I don't know if that is a valid argument against, because if we're accidentally re-networking cluster hosts, we're probably already toast.
[15:17:07] Mmm
[15:17:38] again, if we could make it deployment-only, and immutable thereafter, that's a non-issue
[15:18:31] maybe if that properties file weren't a template, if it were generated wholesale, and you used file existence as a condition
[15:20:38] I think the approach depends on what you expect to happen if you renumber a cassandra host currently
[15:23:48] (I think the answer is 🔥 since renumbering it will change the set of seeds in the cluster, which is presumably Bad(TM)?)
[15:24:05] It's not
[15:24:52] Everything here is pretty resilient when compared to changing the rack, which is pretty much guaranteed data loss
[15:25:39] I mean, if the actual location were wrong, then you might have data unavailable in a data-center failure mode that you didn't expect, but no actual loss
[15:25:59] I think I've reached the part of this discussion where my brain is ready to go into "stop trying to put a square peg in a round hole, and make the peg round" mode
[15:26:11] Fair enough
[15:26:31] Or, "redefine the problem"
[15:26:51] Like when you can't quite remember how to spell a word, so you choose an alternative :)
[15:27:57] Networking wants us to stop using secondary IPs, and that's now possible (by binding to different ports on the same host IP).
[15:28:39] And the unit of failure in the data-center is no longer the row; everything is now (or will soon be) interconnected, making by-rack (1:1 with netbox) feasible
[15:30:54] And I would love to simplify the disk configs of all the clusters, and we're being asked to (where possible) increase density and do more/the same with fewer hosts, so actually using jbod might be The Answer™ there too.
[15:31:08] :)
[15:31:26] None of which could easily be done with the existing clusters. :(
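A hedged sketch of the "no secondary IPs" future state described above, assuming per-instance ports on a single host IP (the key names, ports and layout are invented for illustration, not the real puppet interface):

```yaml
# Hypothetical sketch: all instances bind the host's primary IP and are
# distinguished only by port, so no secondary IPs are needed.
# Key names and port numbers are illustrative only.
profile::cassandra::instances:
  a: { storage_port: 7000, native_transport_port: 9042 }
  b: { storage_port: 7001, native_transport_port: 9043 }
  c: { storage_port: 7002, native_transport_port: 9044 }
```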
[15:32:24] But we could probably come up with migration plans that convert them in place, decommissioning and rebootstrapping everything
[15:33:15] Combined with attrition to slowly change hardware profiles...
[15:33:57] * urandom reaches for a brown paper bag as he begins to hyperventilate
[15:34:16] la la la, T123918
[15:34:16] T123918: 'swift' user/group IDs should be consistent across the fleet - https://phabricator.wikimedia.org/T123918
[15:37:23] (never mind how long the disk_by_path migration is going to take...)
[15:39:29] it's like a river carving out a canyon sometimes
[16:10:45] let's start calling this the oxbow approach to change management, then?
[23:39:43] urandom, Emperor: I quipped y'all at https://bash.toolforge.org/quip/zBrKRpMBFk7ipym_NyjP, because wow, https://commons.wikimedia.org/wiki/File:Meander_Oxbow_development.svg seems very much to describe how we do some of the bigger migrations.