[00:01:31] ori: https://phabricator.wikimedia.org/T78589
[00:02:20] killed it
[00:04:01] i knew it was trouble when it walked in
[00:08:29] haha
[03:07:58] MediaWiki-Core-Team: InfoSec Taylor Swift bot for #mediawiki-core - https://phabricator.wikimedia.org/T78589#993527 (MZMcBride) https://twitter.com/SwiftOnSecurity has become rather busy... the IRC bot noise was too much.
[06:29:00] bd808: is logstash doing any better?
[06:29:13] hmmmm... let's look
[06:30:26] there is mediawiki log data for the last 5 minutes so that's better than before
[06:30:52] i changed a setting
[06:31:15] (digging up its name)
[06:31:34] i set index.merge.scheduler.max_thread_count to 3 (was 1)
[06:31:44] and restarted elasticsearch
[06:31:58] then i reverted the file so that puppet doesn't refresh the service
[06:33:00] hope that's cool -- i figured it was already slammed so it couldn't get much worse
[06:33:00] cool. The redis queue depth on logstash1001 is 0 and things are getting in, so that sounds like a good thing
[06:33:32] i changed it to 3 based on https://github.com/elasticsearch/elasticsearch/issues/469
[06:34:24] so our config had it pegged at 1?
[06:35:35] * bd808 reads http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/indexing-performance.html#segments-and-merging
[06:36:02] yeah
[06:36:31] "Elasticsearch defaults here are conservative: you don't want search performance to be impacted by background merging. But sometimes (especially on SSD, or logging scenarios), the throttle limit is too low."
[06:37:15] yeah. I think we actually want to optimize for indexing over searching
[06:37:25] i got there by reading and following some of the links
[06:39:35] note that i didn't change it in puppet so it'll reset to 1 if the service is restarted
[06:41:34] well it seems to be doing well at the moment, which is awesome.
[06:42:13] cool
[06:43:03] I'm guessing that if it causes a problem it will be IOPS exhaustion
[06:44:03] we should be able to see if that will be a problem by tomorrow morning as the segments in the shard get larger
[06:44:40] right now there are only 2 max-size segments in today's shard
[06:46:42] 18G index for 29M log entries since 00:00Z
[06:49:13] looks like the puppet code needs some surgery to be able to change that setting permanently
[06:49:57] the right way to do it would probably be to introduce hiera for the settings in that file
[07:03:48] ori: tracked in https://phabricator.wikimedia.org/T87526
[07:31:49] cool
[07:32:28] bd808: how do you interpret "18G index for 29M log entries"? I haven't spent much time looking at logstash, so I don't know if that's good or bad or neutral.
[07:37:01] easier I guess when looking at http://localhost:9200/_plugin/whatson/ tunneled to one of the logstash hosts. The biggest day we have had was 77G for 153M events. With the quoted stats at about 1/4 of a day it looks to be on par or a little bigger
[07:38:34] nod
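A quick sanity check of that comparison, using only the figures quoted in the log (18G for 29M events at roughly a quarter of the day, vs. the record day's 77G for 153M events):

```python
# Back-of-the-envelope check of "18G index for 29M log entries" against the
# biggest day on record (77G / 153M), given that 29M is roughly 1/4 of a day.
per_event_today = 18e9 / 29e6    # ~620 bytes per event
per_event_record = 77e9 / 153e6  # ~503 bytes per event
full_day_estimate = 29e6 * 4     # naive extrapolation: ~116M events

print(f"today:  ~{per_event_today:.0f} B/event")
print(f"record: ~{per_event_record:.0f} B/event")
print(f"projected events for the day: ~{full_day_estimate / 1e6:.0f}M")
```

That works out to roughly 620 vs. 503 bytes per event, so "on par or a little bigger" per event, with the naive full-day volume still under the record day.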
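For readers following along, a minimal sketch of reading that merge-scheduler setting back over the Elasticsearch HTTP API. It assumes port 9200 is reachable (for example over the same kind of tunnel used for the whatson plugin above) and that indices follow the stock Logstash logstash-YYYY.MM.DD naming; neither is stated outright in the log.

```python
# Sketch: read back index.merge.scheduler.max_thread_count via the ES
# settings API. Assumes a reachable port 9200 (e.g. an SSH tunnel) and the
# default Logstash "logstash-YYYY.MM.DD" index naming -- both assumptions.
import datetime
import json
import requests

ES = "http://localhost:9200"
index = "logstash-" + datetime.date.today().strftime("%Y.%m.%d")

resp = requests.get(f"{ES}/{index}/_settings")
resp.raise_for_status()
# The response layout differs across ES versions (flat keys on 1.x, nested
# later), so just print it and look for the merge scheduler entry.
print(json.dumps(resp.json(), indent=2))
```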
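The redis queue-depth check bd808 mentions takes only a couple of lines. The list key "logstash" is the conventional default for Logstash's redis input/output plugins, and the hostname and port are assumptions here, not taken from the log:

```python
# Sketch: check the Logstash broker queue depth, as in "the redis queue depth
# on logstash1001 is 0". Key name, FQDN, and port are all assumptions.
import redis

r = redis.StrictRedis(host="logstash1001.eqiad.wmnet", port=6379)
print("queue depth:", r.llen("logstash"))
```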
[07:38:58] but elasticsearch in the meantime looks to have pooped out :/
[07:39:42] on 1003?
[07:40:06] 02 and 03 are re-building from 01
[07:40:30] I guess they both OOM'd but I haven't looked yet at their logs
[07:41:09] now 01 has 1M log events in redis again as it is prioritizing updating the replicas over indexing
[07:42:15] I was playing with some old dashboards that had long default search windows, so I probably caused them to die
[07:46:25] 02 looks to have OOM'd, started to resync, and then OOM'd again in the middle
[07:48:19] 03 OOM'd as well
[07:48:23] poop
[20:29:16] MediaWiki-Core-Team, MediaWiki-extensions-TitleBlacklist: Title blacklist intermittently failing, allowing users to edit things they shouldn't be able to - https://phabricator.wikimedia.org/T85428#993871 (Magog_the_Ogre) Seems to still be happening: https://commons.wikimedia.org/w/index.php?title=Special%3ALo...
[20:48:09] MediaWiki-Core-Team, MediaWiki-extensions-TitleBlacklist: Title blacklist intermittently failing, allowing users to edit things they shouldn't be able to - https://phabricator.wikimedia.org/T85428#993889 (Anomie) It all seems to be on mw1118. The new log file is reporting 243 entries for all non-mw1118, and m...
[20:50:35] MediaWiki-Core-Team, MediaWiki-extensions-TitleBlacklist: Title blacklist intermittently failing, allowing users to edit things they shouldn't be able to - https://phabricator.wikimedia.org/T85428#993894 (ori) I depooled mw1118 for now so we can investigate it.
[20:50:39] anomie: ^
[20:50:42] thanks for debugging
[20:51:03] ori: No problem. Thanks for doing something about it, so I don't have to find someone to do something about it ;)
[20:53:28] * anomie uses eval.php to poke the bad cache entry out of memc
[21:14:33] manybubbles: https://en.wikipedia.org/wiki/Wikipedia:Village_pump_%28technical%29#Search_utterly_fails_on_dashes (whenever you're looking at stuff)
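A rough sketch of how one might confirm the 07:46-07:48 OOMs from the Elasticsearch server log. The path follows the stock <cluster-name>.log packaging convention; the cluster name is a placeholder, not something given in the log above:

```python
# Sketch: scan the ES server log for OutOfMemoryError to confirm the OOMs on
# logstash1002/1003. The path is the stock /var/log/elasticsearch/<cluster>.log
# convention; CLUSTER_NAME is a placeholder, not taken from the log.
from pathlib import Path

LOG = Path("/var/log/elasticsearch/CLUSTER_NAME.log")

for line in LOG.read_text(errors="replace").splitlines():
    if "OutOfMemoryError" in line:
        print(line)
```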
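anomie's eval.php step at 20:53 amounts to deleting the stale cache entry so MediaWiki regenerates it on the next read. Done from outside MediaWiki it might look like the sketch below; the host, port, and key name are all hypothetical, since the actual TitleBlacklist cache key isn't shown in the log:

```python
# Sketch of "poke the bad cache entry out of memc": delete the stale key so it
# is rebuilt on the next read. Host, port, and key are hypothetical -- the
# real TitleBlacklist cache key is not given in the log above.
from pymemcache.client.base import Client

mc = Client(("127.0.0.1", 11211))  # placeholder memcached host/port
mc.delete("hypothetical:title-blacklist-entry")
```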