[03:11:26] FIRING: SystemdUnitFailed: wmf_auto_restart_prometheus-mysqld-exporter.service on db1246:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[05:06:26] FIRING: [3x] SystemdUnitFailed: wmf_auto_restart_prometheus-mysqld-exporter.service on db1246:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[07:36:49] I will extend the downtime of db1246 for a week or so, ok?
[09:03:56] yeah, thanks
[09:08:37] I just started the schema change on s8 on revision table in eqiad, that's going to take a while
[09:28:11] from a UI point of view for a pool cookbook would you prefer to pass as argument the percentage increment (with a sane default) or the number of steps to get to 100%? Or anything else like a non-linear progression
[09:45:55] percentage increment is more aligned to what we do these days
[09:46:00] 10 25 75 100
[09:46:05] I know them by heart now
[09:46:42] why this progression? doesn't make a lot of sense to me the double curve
[09:47:32] but maybe you have a reason
[09:47:37] cache warming I think
[09:47:50] you increment by 10, 15, 50 ... and then 25? :D
[09:48:51] the 25 is I think because 100 is limit, otherwise we could have gone higher :D
[09:51:35] could we change it to something mathematic more than hardcoded?
[09:52:04] as I said, it should be done based on queuing/hit rate, not percentage
[09:55:14] yeah, let's go with this, until Manuel comes back and then we can rethink them
[09:56:02] to give examples in 4 steps: linear -> 20/50/75/100, power -> 6/25/56/100, exponential -> 6/31/63/100
[09:56:31] jynus: one thing doesn't exclude the other no? the live hit rate or queueing will be instead of the sleep
[09:56:54] to know WHEN we can increase the pooled percentage, but how you decide by how much based on those parameters?
[09:58:02] as long as it's gradual, it really doesn't matter
[09:58:49] volans: from prometheus, if hit_ratio > 99.999 and processes < 100 -> next step
[13:36:21] Amir1: what's the flexibility you're looking for in the pool cookbook? We ofc want the defaults to be the safe right ones so that it will just DTRT, but I'm wondering how much flexibility you're looking for in terms of how much you need to tweak the parameters for special cases.
[13:37:13] Because one option could also be to just offer a normal way (with the slow pool) and a force way for emergencies or cases in which you can safely pool in one step without the slow progress.
[13:37:54] The other option being ofc that of offering all the parameters for a free tweaking of them, but I'm not sure if they will be useful in your day by day.
[13:46:39] volans: for quick pool we can use dbctl directly. I'd say keep it simple
[13:47:25] ack, we can always add them after :)
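For reference, a minimal sketch of the two ideas from the 09:51-09:58 exchange: computing the pooled percentages from a formula instead of hardcoding them, and gating each increase on live metrics instead of a sleep. This is not the cookbook's actual code. The names `pool_steps` and `can_advance` and their parameters are hypothetical; only the thresholds (hit_ratio > 99.999, processes < 100) come from the log. With exponent 2 the sketch reproduces the "power" example 6/25/56/100. With exponent 1 it gives evenly spaced steps, 25/50/75/100. The formula behind the "exponential" example is not stated in the log, so it is not attempted here.

```python
from typing import List


def pool_steps(num_steps: int, exponent: float = 1.0) -> List[int]:
    """Return the pooled percentage for each step, always ending at 100.

    Hypothetical helper: exponent=1.0 gives a linear progression,
    exponent=2.0 gives the quadratic ("power") progression.
    """
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    return [
        round(100 * (step / num_steps) ** exponent)
        for step in range(1, num_steps + 1)
    ]


def can_advance(
    hit_ratio: float,
    processes: int,
    min_hit_ratio: float = 99.999,  # threshold quoted in the log
    max_processes: int = 100,       # threshold quoted in the log
) -> bool:
    """Decide WHEN to move to the next step, based on live metrics.

    How hit_ratio and processes are fetched from Prometheus is left out;
    the caller is assumed to pass in already-queried values.
    """
    return hit_ratio > min_hit_ratio and processes < max_processes


if __name__ == "__main__":
    print("linear:", pool_steps(4))
    print("power: ", pool_steps(4, exponent=2.0))
    print("advance?", can_advance(hit_ratio=99.9995, processes=42))
```

The step list answers "by how much", and `can_advance` answers "when", so the two can be combined: iterate over the steps and only move on once the check passes.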