[00:00:09] bd808: The replacement is probably whatever the deprecated function calls...
[00:00:31] bd808: Checking $wgInitialSession or something like that, IIRC.
[00:00:52] anomie: are we double fetching the session? SessionManager::loadSessionInfoFromStore() fetches from $this->store and so does the SessionBackend constructor.
[00:01:30] bd808: Possibly. Although what I'm doing now will fix that anyway by caching it in-process after it's loaded the first time.
[00:02:22] oh. loadFromStore is fetching SessionInfo, which is then used to construct a Session
[00:02:32] double-fetching the session doesn't explain the sharp increase in the dataset size in redis
[00:02:39] * anomie is probably not going to get this done in time, and so will have to deal with another weekend being sad because this got rolled back again due to people finding issues last-minute on Friday.
[00:03:27] what would explain it is if one of the inputs that is used to construct the hash part of the key is inconsistent
[00:05:40] in fact, something like that has to be the case
[00:06:23] I'm going to undo the hot patch. It's not making things better or worse
[00:09:02] bd808: So removing the saving didn't fix things at all?
[00:09:53] it made a tiny difference in bytes_in but nothing significant in bytes_out
[00:10:53] this graph is trending down slightly but the weekly trend shows that's normal seasonality -- https://ganglia.wikimedia.org/latest/graph.php?r=hour&z=xlarge&c=Memcached+eqiad&m=cpu_report&s=by+name&mc=2&g=network_report
[00:13:30] ok, when I run redis monitor on mc1001 and filter for a particular app server, I see requests for each particular session come in groups of 7
[00:13:41] the new "empty" session data isn't like 2x the old data size, is it?
[00:14:31] there must be some code to prevent the session from being re-retrieved once it has been retrieved once in a particular request, right?
[00:14:34] that must not be working
[00:14:59] ori: Brad said "what I'm doing now will fix that anyway by caching it in-process after it's loaded the first time."
[00:15:09] so possibly there was not
[00:15:25] that sounds like the right fix
[00:15:25] I hate to interrupt, but it's 4:15, and I'd love us to be clear about where we're going to end today
[00:15:37] in the old code it was just $_SESSION
[00:15:43] so magically cached
[00:15:51] yep
[00:15:55] greg-g: *nod*
[00:15:58] so yeah, there's your issue
[00:16:30] greg-g: I think they're close... maybe let's explain the situation to ops and ask one of them to stick around for another hour?
[00:16:43] bd808: The new "empty" session data still has a bunch of metadata, so it probably is a good bit bigger.
[00:16:54] yeah, I was thinking that too
[00:16:56] anomie: how much longer until you turn into a pumpkin?
[00:16:56] it's not the key size
[00:16:59] bd808, ori: Ok, my new version of the patch is up
[00:17:03] the "safety" data
[00:17:05] it's the multiple fetches per request
[00:17:10] I'm sure of it
[00:17:11] greg-g: I have nothing to do all evening.
[00:17:15] k
[00:18:18] ori, tgr: want to help review https://gerrit.wikimedia.org/r/#/c/267387/3 ?
[00:18:24] * ori looks
[00:20:43] anomie: > This introduces an in-process cache (using a HashBagOStuff) for session data, and only saves to the external cache when the session is persisted.
[00:20:55] can you separate the two? I think the former is the crucial fix
[00:21:54] ori: Sure. The second part is just one 'if'.
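
For readers following along: the "crucial fix" ori refers to is read-through caching of session data, so a given session blob is fetched from redis/memcached at most once per request. Below is a minimal sketch of that idea, assuming the MediaWiki BagOStuff/HashBagOStuff API; the wrapper class name is hypothetical, not the actual code in Gerrit change 267387:

```php
<?php
// Hypothetical sketch of the read-through idea under review -- not the
// actual patch. A per-request HashBagOStuff sits in front of the shared
// session store so repeat fetches stay in-process.

class InProcessCachedStore {
    /** @var BagOStuff Shared backend store (e.g. the redis session store) */
    private $backend;
    /** @var HashBagOStuff Per-request, in-memory cache */
    private $procCache;

    public function __construct( BagOStuff $backend ) {
        $this->backend = $backend;
        $this->procCache = new HashBagOStuff();
    }

    public function get( $key ) {
        $value = $this->procCache->get( $key );
        if ( $value === false ) {
            // In-process miss: hit the shared store once, then remember
            // the result so later reads in this request never leave PHP.
            $value = $this->backend->get( $key );
            if ( $value !== false ) {
                $this->procCache->set( $key, $value );
            }
        }
        return $value;
    }

    public function set( $key, $value, $exptime = 0 ) {
        // Write through to both layers so later in-process reads stay
        // consistent with what the shared store holds.
        $this->procCache->set( $key, $value, $exptime );
        return $this->backend->set( $key, $value, $exptime );
    }
}
```

This is essentially the shape of the "CachedBagOStuff" follow-up anomie proposes just below.
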
[00:22:02] don't bother, in that case
[00:22:13] it's the part that bd808 synced, too, so we know it's fine (even if it doesn't fix the root issue)
[00:23:43] anomie: the tests should use two different bags, shouldn't they?
[00:23:58] bd808: Probably. But I was in a hurry ;)
[00:24:00] looks ok, haven't tested. if there was no time pressure I'd whine about the tiered BagOStuff caching having to be repeated in several places rather than being centralized somewhere
[00:24:31] It could use some followup, certainly.
[00:24:40] I propose we merge it, sync it to mw1017
[00:25:08] then one of you logs in and does some session-y stuff and I'll see how many gets it's performing via tcpdump
[00:25:25] ori: If it works, I'll probably make a "CachedBagOStuff" and change it to use that in a followup.
[00:26:26] anomie: there's MultiWriteBagOStuff
[00:26:35] /**
[00:26:35]  * A cache class that replicates all writes to multiple child caches. Reads
[00:26:35]  * are implemented by reading from the caches in the order they are given in
[00:26:37]  * the configuration until a cache gives a positive result.
[00:26:39]  *
[00:26:41]  * @ingroup Cache
[00:26:43]  */
[00:26:50] but like bd808 said, that's for a follow-up
[00:27:28] ori: I saw that, but I couldn't figure out what it was doing in doGet(), whether it's storing in [0] when it finds it in [1] or whether it would also be storing it in [1] when it finds it in [0].
[00:27:28] I'm testing a bit locally. 5 minutes +/- won't matter right now
[00:29:52] oh ffs gerrit
[00:31:21] ok. let's try ori's plan. Merge, sync mw1017 and verify
[00:32:07] doit
[00:32:31] it's in zuul's hands now
[00:32:40] all praise the gatekeeper
[00:34:25] anomie: thank you for staying late and working hard and all the other things I forget to thank you for because they are "normal"
[00:39:44] fucking qunit fucking fuck fucker
[00:52:19] sorry, I came back because I realized something
[00:52:24] we still don't have a theory about the memory usage
[00:52:44] but that doesn't have to be hash churn
[00:53:12] it could be that the expiry is not getting set correctly, or the repeat fetches cause things to stay in redis longer than they would otherwise
[00:53:17] I'd check the redis stats if I have time
[00:53:19] * ori runs off again
[01:04:43] ori: My theory about memory usage is that it's saving another session's data for every reader view.
[07:31:21] hello
[09:43:15] bd808: logstash timestamp seconds are displayed to three decimal digits but are actually all integers
[09:43:48] is that intentional? microsecond precision would be nice when trying to find out the order of events
[17:29:07] tgr|away: we had microsecond precision (or an attempt at it) until https://gerrit.wikimedia.org/r/#/c/249309/ For more info see https://github.com/Seldaek/monolog/issues/657
[18:25:26] bd808: so just use {datetime}.{microtime} in the format string? on my machine at least, that's still fairly fast
[18:26:07] 8us vs 5us for second-precision datetime and 25us for createFromFormat
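
tgr's last two messages sketch the cheap alternative: skip the DateTime::createFromFormat() round trip (the ~25us path that the reverted change used) and splice microtime() output into a plain date() string (~8us, vs ~5us for second precision). A rough illustration of that approach, assuming nothing about the eventual Monolog/MediaWiki patch; the function name is made up:

```php
<?php
// Rough sketch of the "{datetime}.{microtime}" idea: build a
// microsecond-precision timestamp from date() plus microtime() by hand,
// avoiding the slower DateTime::createFromFormat() path.

function microsecondTimestamp() {
    $now = microtime( true );                      // float seconds since epoch
    $seconds = (int)$now;
    $micros = (int)round( ( $now - $seconds ) * 1e6 );
    if ( $micros === 1000000 ) {                   // guard against rounding up
        $seconds++;
        $micros = 0;
    }
    // date() is second-precision but fast; splice the microseconds in,
    // then append the timezone offset, e.g. "2016-02-01T00:22:02.123456+00:00".
    return sprintf(
        '%s.%06d%s',
        date( 'Y-m-d\TH:i:s', $seconds ),
        $micros,
        date( 'P', $seconds )
    );
}
```
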