[02:06:55] FIRING: [3x] SystemdUnitFailed: cassandra-a.service on sessionstore1006:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[03:01:55] FIRING: [4x] SystemdUnitFailed: cassandra-a.service on sessionstore1004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[03:36:55] FIRING: [4x] SystemdUnitFailed: cassandra-a.service on sessionstore1004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[03:41:55] RESOLVED: [4x] SystemdUnitFailed: cassandra-a.service on sessionstore1004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[07:48:34] federico3: good morning! Answering here to some questions you had on friday in another channel regarding the support of multiple hosts when calling list_host_instances(). What elu.key suggested is the way if you need it right now. As for the general support I suggested a multi-host implementation in the original CR in [1] (see the comment) that would return something like [2] (expand
[07:48:40] the comment), but was deemed not ...
[07:48:43] ... necessary at the time. It can surely be added if needed.
[07:48:46] [1] https://gerrit.wikimedia.org/r/c/operations/software/spicerack/+/1005531/43..78/spicerack/mysql_legacy.py#b122
[07:48:49] [2] https://gerrit.wikimedia.org/r/c/operations/software/spicerack/+/1005531/32..78/spicerack/mysql_legacy.py#b108
[08:01:29] thanks, I put together a workaround for now, in the long term the mysql module might need some tweaks
[08:04:25] everything needs continuous improvement indeed, last year sprint effort on the mysql module added a lot of features to it, but the work was supposed to continue after that effort but some team changes got in the way
[08:21:51] volans: BTW thanks for https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/1130977 - can I merge it?
[08:34:40] t your will, it's all yours. I didn't live-tested as I didn't know if there was a host I could use and also I didn't want to cause conflicts if there were other CRs for the same cookbook.
[08:34:44] *At
[08:44:29] ok, merging it, thanks
[08:46:43] anytime
[11:16:40] PROBLEM - MariaDB sustained replica lag on s8 on db1211 is CRITICAL: 935.8 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1211&var-port=9104
[11:20:40] RECOVERY - MariaDB sustained replica lag on s8 on db1211 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1211&var-port=9104
[13:22:31] <_joe_> i might be a couple minutes late to the meeting
[13:32:40] sobanski: you joining our meeting today?
[14:21:58] <_joe_> urandom: tbh, I think this bot might be the cause of the issues https://logstash.wikimedia.org/goto/c2d778382850a85389191ee174cceb79
[14:22:10] <_joe_> the timing is striking
[14:24:23] _joe_: so... this would manifest as a session being overwritten at a high rate?
[14:24:57] and the high storage utilization then being unreclaimed/tombstoned data?
[14:25:16] <_joe_> yes that's kind of where I'm going
[14:25:35] <_joe_> or maybe we write multiple sessions for the same user!
[14:25:53] <_joe_> I've written to the bot author