[07:22:29] some connections dropped yesterday due to a sudden increase in read connections on s1 (db1073) at 9:39
[07:24:05] lag at the same time on s3
[07:25:29] ^9:39 pm, that is 2015-07-23 21:39 UTC
[08:30:18] I've enabled es2004 on tendril, but got 1 error:
[08:30:21] mysql> alter event es2004_codfw_wmnet_3306_usage enable;
[08:30:22] ERROR 1539 (HY000): Unknown event 'es2004_codfw_wmnet_3306_usage'
[09:30:39] MostlinkedPage::reallyDoQuery is not causing any issues (it gets executed on a separate host by wikiadmin), but I wonder if it could be done differently, with triggers or a column-based engine, instead of taking 14 hours to execute
[10:50:44] we had some spikes in connection failures, again I think because of lag on s3
[16:25:48] db1035 just notified of 90% disk space used, checking other nodes
[16:34:05] https://phabricator.wikimedia.org/P1057
[16:34:31] codfw is not affected, it has 3TB
[16:42:02] http://ganglia.wikimedia.org/latest/graph.php?c=MySQL%20eqiad&h=db1035.eqiad.wmnet&r=hour&z=small&jr=&js=&st=1437755887&v=88.9&m=part_max_used&vl=%25&ti=Maximum%20Disk%20Space%20Used&trend=1&z=xlarge&trendhistory=6&trendhistory=6&trendhistory=1&trendhistory=2&trendhistory=2&trendhistory=2&trendrange=6
[16:59:37] created T106847
[17:01:24] also speak now against T105843#1479238 before it is too late!
[17:53:24] also, learn something useful this weekend! https://mariadb.com/blog/five-things-you-must-know-about-parallel-replication-mariadb-10x