[14:18:02] gwicke: Around? [16:33:06] tgr, anomie: how are the two of you feeling about the roll to group1 today? anything that we should be looking closely at? [16:34:07] the need_token stats, I guess [16:34:17] bd808: I haven't gotten through all my email yet to see if people reported new interesting bugs overnight, but absent that I don't see any reason to worry. When's the "go" time for group1? [16:34:37] ~19:00Z [16:34:50] baring other fubar problems like we had yesterday [16:34:50] but all problems we have heard of so far have a credible fix deployed [20:00:57] anomie: "MediaWiki\Session\SessionManager::getSessionById: failed to create empty session: Session ID already exists" seems to show up a lot in the MW logs. -- https://logstash.wikimedia.org/#dashboard/temp/AVKErJ8_ptxhN1XaIBmi [20:11:03] bd808: That happens if someone comes in with a session cookie but they logged out centrally or something like that. Not terribly unexpected, perhaps the log level on it is too high. [20:15:35] these are mostly crosswiki requests from group2 to meta or commons, or centralauth edge login requests from the group1 wikipedias [20:18:07] the most frequent log message is actually "Metadata merge failed: Key "CentralAuthSource" changed" [20:18:26] probably followed by "User token mismatch" [20:18:49] these just don't cluster so well because the session ID is included in the message [20:20:29] "Metadata merge failed" is because of the fix for T124409, that's a case where it would have been a reproduction of the bug. "User token mismatch" happens when the user_token database field was changed. [21:05:35] this seems vaguely SessionManager-related, but I have no idea what the error message even means: https://logstash.wikimedia.org/#dashboard/temp/AVKE5u0vptxhN1XaLHkP [21:07:01] would sure be nice if the method logging these had some documentation [21:07:38] I guess something that's expected to run in <1s takes longer than that? [21:10:52] hm, no, it does not expect any master DB connections to be opened [21:11:00] where does that expectation get set? [21:12:17] tgr: Looks like stuff AaronSchulz did not too long ago. [21:14:22] tgr: He added in all that expecting stuff for his cross-DB stuff, and he's got $wgCentralAuthUseSlaves to have CA avoid hitting master for looking stuff up but it's still false everywhere except testwiki so his expectations are going to fail. [21:14:46] yeah, I haven't rolled it out beyond test, mw.org, and loginwiki [21:15:25] also not sure when I want to push that with all the auth stuff going on, I might wait till next week [21:15:26] tgr: Also, I see the same sort of log entries two days ago when everything was on wmf.10. [21:15:44] ah, ok [21:49:04] bd808: is it possible to push the message template as a separate key into log events? [21:50:03] tgr: not sure I'm understanding the question [21:50:19] you want the unexpanded "message" in the log event? [21:50:19] e.g. if I say $logger->warning( 'Error connecting to {ip}', array( 'ip' => '1.2.3.4' ) ) then have that string with the literal '{ip}' as message_template or something [21:50:39] it would be much nicer for fatalmonitor counts, for example [21:50:47] it would need a filter to do that, but I suppose it could be done [21:51:28] right now the most frequent errors don't necessarily show up because they have some random parameters in the message [21:51:32] username, for example [21:52:57] A filter before the psr3 expansion filter could make a copy of message as message_template or something. Then we could adjust the logstash rules that create normalized_message to use that field [21:53:33] consider the overhead [21:53:50] I suppose the trick is really that sometimes we'd want it one way and other times we'd want the other [21:54:05] ori: string copy should be pretty cheap no? [21:54:41] if it's just that [21:54:42] when you want it the other way, you can just construct the message string manually [21:54:59] inelegant and developers need to be aware of the convention, but not horrible [21:55:21] and I don't think we would want it the other way too often [21:55:26] "Error connecting to 10.64.48.26: Can't connect to MySQL server on '10.64.48.26' (4)" [21:55:34] that one you want expanded [21:55:39] I think [21:56:31] do you? [21:56:50] right now it has 14 entries in the fatalmonitor top20 [21:57:27] it would be the third or fourth most frequent entry if it wasn't fragmented [21:58:26] yeah I guess you'd mostly want the expanded versions in a drilldown [21:59:12] naively I think it would be easy to do the config side [21:59:46] I think that for ori's perf concern that the filters don't run unless a message is going to be emitted too [22:00:19] That wasn't always the case but I'm pretty sure that got fixed upstream [22:00:52] discussion about updating the PHP version requirement starting now in #wikimedia-office [22:22:10] tgr, bd808: Fix for T124971 is in Gerrit. [22:22:30] thanks, looking [22:24:13] tgr, bd808: Also, I'm not seeing any spike in need_token (: yet, anyway [22:25:05] anomie: yeah there's a couple of very small bumps but nothing like we saw last week :) [22:32:57] anomie: when is the new persist flag useful? doesn't the mere fact a session has its metadata stored somewhere mean it's persisted? [22:35:18] tgr: It's possible in SessionManager to have a session saved without actually being marked as persisted. But the more pressing reason for saving the flag in metadata is because otherwise the unit test added in 266923 incorrectly passes without the actual fix because getSessionById() comes out not-persisted, despite whatever /rpc/RunJobs.php does coming out as persisted because the SessionBackend is still live. [22:37:29] tgr: I don't know if there's actually a use case for ->save() rather than ->persist() on a session; I suspect the distinction exists because it's possible for an incoming session to be persisted (via the "Token" cookie) even though it's not saved because it expired from memc.