[03:06:48] FIRING: [2x] MysqlReplicationLagPtHeartbeat: MySQL instance db1169:9104 has too large replication lag (8h 11m 35s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=db1169&var-port=9104 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLagPtHeartbeat
[03:46:05] FIRING: MysqlReplicationThreadCountTooLow: MySQL instance db1169:9104 has replication issues. - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=db1169&var-port=9104 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationThreadCountTooLow
[07:06:05] checking
[07:07:37] pagelinks corrupted on enwiki on db1169
[07:18:48] $ sudo fdfind pagelinks -x du -hs
[07:18:48] 105G ./enwiki/pagelinks.ibd
[07:18:55] takes a while, no doubt x)
[07:52:46] There will be T378267 (db1234) from the weekend too, I'm afraid - probably dud hardware?
[07:52:47] T378267: db1234 crashed - https://phabricator.wikimedia.org/T378267
[07:53:03] thanks Emperor will check it!
[07:53:12] (thanks for handling it as well 🙏)
[10:32:28] Could I get a +1 on https://gerrit.wikimedia.org/r/c/operations/puppet/+/1083769/1 (sysctl settings for OSD nodes, pinched from upstream) and https://gerrit.wikimedia.org/r/c/operations/puppet/+/1083770/1 (no functional change, fix a typo in a comment), please?
[10:37:13] I've done the second. I wonder what kind of validation you are asking for the first, because I have no idea about that
[11:04:03] jynus: you might observe that they're the same as set in the upstream code? ( https://github.com/ceph/ceph/blob/9c287ec40e0c83b16258936fd33ebccfdd607f69/src/cephadm/cephadm.py#L430-L437 )
[11:04:37] I guess really I'm hoping for a gentle sanity-check that this is a halfway-plausible change to put through puppet
[11:07:11] then let me do it now, I am happy that will have the consequences expected
[11:07:29] it was just that I didn't have the context to know what those were
[11:08:53] Sure, and thanks for the reviews :)
[11:08:59] Sometimes I would love to have two separate +1s "this looks fine and it will do what you want to do" and "this is a change that, if it does what was expected, should be deployed into production now"
[11:09:41] sometimes it is important to separate between idea and execution, or one has the skills to +1 one of them
[11:10:24] Maybe I should add a "and this is a right way to do it" to the first sentence
[15:23:05] !5
[15:23:08] err :)
[15:53:27] elukey: 120 HTH
[15:57:12] ??? :D
[16:01:40] 5! === 120