[01:13:05] PROBLEM - MariaDB sustained replica lag on s1 on db1232 is CRITICAL: 11 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1232&var-port=9104
[01:13:27] PROBLEM - MariaDB sustained replica lag on s1 on db1207 is CRITICAL: 17 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1207&var-port=9104
[01:13:43] PROBLEM - MariaDB sustained replica lag on s1 on db1234 is CRITICAL: 14.4 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1234&var-port=9104
[01:14:05] PROBLEM - MariaDB sustained replica lag on s1 on db1196 is CRITICAL: 11.4 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1196&var-port=9104
[01:16:03] PROBLEM - MariaDB sustained replica lag on s1 on db1219 is CRITICAL: 35.6 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1219&var-port=9104
[01:17:05] PROBLEM - MariaDB sustained replica lag on s1 on db1169 is CRITICAL: 11.6 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1169&var-port=9104
[01:17:05] PROBLEM - MariaDB sustained replica lag on s1 on db1163 is CRITICAL: 10.8 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1163&var-port=9104
[01:17:11] PROBLEM - MariaDB sustained replica lag on s1 on db1218 is CRITICAL: 22 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1218&var-port=9104
[01:17:51] PROBLEM - MariaDB sustained replica lag on s1 on db1186 is CRITICAL: 16.2 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1186&var-port=9104
[01:18:01] PROBLEM - MariaDB sustained replica lag on s1 on db1184 is CRITICAL: 28 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1184&var-port=9104
[01:18:01] PROBLEM - MariaDB sustained replica lag on s1 on db1195 is CRITICAL: 11 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1195&var-port=9104
[01:18:05] PROBLEM - MariaDB sustained replica lag on s1 on db1206 is CRITICAL: 15.6 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104
[01:18:05] RECOVERY - MariaDB sustained replica lag on s1 on db1163 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1163&var-port=9104
[01:18:11] PROBLEM - MariaDB sustained replica lag on s1 on db1235 is CRITICAL: 34 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1235&var-port=9104
[01:18:51] RECOVERY - MariaDB sustained replica lag on s1 on db1186 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1186&var-port=9104
[01:19:01] RECOVERY - MariaDB sustained replica lag on s1 on db1195 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1195&var-port=9104
[01:19:05] RECOVERY - MariaDB sustained replica lag on s1 on db1206 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104
[01:19:05] RECOVERY - MariaDB sustained replica lag on s1 on db1169 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1169&var-port=9104
[01:19:05] RECOVERY - MariaDB sustained replica lag on s1 on db1232 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1232&var-port=9104
[01:19:11] RECOVERY - MariaDB sustained replica lag on s1 on db1218 is OK: (C)10 ge (W)5 ge 0.6 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1218&var-port=9104
[01:19:27] RECOVERY - MariaDB sustained replica lag on s1 on db1207 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1207&var-port=9104
[01:20:01] RECOVERY - MariaDB sustained replica lag on s1 on db1184 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1184&var-port=9104
[01:20:03] RECOVERY - MariaDB sustained replica lag on s1 on db1219 is OK: (C)10 ge (W)5 ge 0.2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1219&var-port=9104
[01:20:05] RECOVERY - MariaDB sustained replica lag on s1 on db1196 is OK: (C)10 ge (W)5 ge 4.2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1196&var-port=9104
[01:20:11] RECOVERY - MariaDB sustained replica lag on s1 on db1235 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1235&var-port=9104
[01:20:43] RECOVERY - MariaDB sustained replica lag on s1 on db1234 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1234&var-port=9104
[09:00:07] swfrench-wmf: Hm, I am going to kick them
[09:00:59] <_joe_> Emperor: that's what scott avoided doing to allow you to debug the issue :)
[09:02:23] I mean, they started two days ago and are now wedged. All they do is call head on the relevant account/container, which should return almost immediately.
[09:06:21] the script itself is a locally-written one, and keeps python swiftclient.Connection's default timeout (None)
[17:23:44] Emperor: thanks for taking a look! so, it looks like these are once again stuck in the same state - e.g., strace on swift-container-stats (1015836) shows recvfrom fd=3, which is a TCP connection to 10.2.1.27:80 (presumably via lo)
[17:23:44] is it possible that something is up with swift-proxy on ms-fs2009?
[17:25:40] if I look at the host overview (https://grafana.wikimedia.org/goto/I_mR70SNR?orgId=1), "something" definitely changed toward the end of the 5:00 hour on 12/16 (which correlates with a drop in utilization on various metrics)
[17:29:04] It's possible, swift-proxy does sometimes get into a range of odd states :(
[17:29:09] I've restarted it on that node
[17:29:39] aaaaand I think that might have done it?
[17:31:37] I'll check in again later today, but yeah ... at least for the moment, these are all making progress again. thank you :)
[17:32:12] brb replacing wikitech Swift/Howto with "just restart the swift proxies" :(
[17:33:01] lol
[17:37:29] * Emperor has a bad feeling that's going to end up on bash
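
Note on the 09:06:21 and 17:23:44 messages: a client with no timeout (None) blocks in recv indefinitely if the peer accepts the connection but never answers, which matches the recvfrom seen in strace. The sketch below reproduces that situation with only the Python standard library and shows a finite timeout turning the hang into a recoverable error. It is an illustration, not the stats script from the log: `head_with_timeout`, `demo`, and the 0.5 s value are hypothetical names and numbers chosen here, and the silent local listener merely stands in for a wedged proxy. Passing a finite `timeout` to swiftclient.Connection would presumably have a similar effect, but that has not been verified here.

```python
import socket


def head_with_timeout(host, port, timeout_s):
    """Send a bare HEAD request; return the reply bytes, or None on timeout.

    Hypothetical helper for illustration only.
    """
    # The timeout passed here applies to connect and to later send/recv calls.
    with socket.create_connection((host, port), timeout=timeout_s) as sock:
        sock.sendall(b"HEAD / HTTP/1.0\r\n\r\n")
        try:
            return sock.recv(4096)
        except socket.timeout:
            # With timeout=None this recv would block forever instead.
            return None


def demo():
    """Run the helper against a listener that never replies."""
    # The listener completes the TCP handshake via its backlog but never
    # accepts or answers, imitating a wedged backend.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)
    host, port = server.getsockname()
    try:
        return head_with_timeout(host, port, timeout_s=0.5)
    finally:
        server.close()


if __name__ == "__main__":
    print(demo())
```

With a timeout in place, a stuck request surfaces as an error the script can log or retry, rather than a process that sits wedged for days until someone restarts the proxy.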