[00:02:43] Krinkle_: the stack traces I've seen so far are fixed by my patch to RL [00:03:01] the one that got rid of checking $wgUser's username [00:03:19] gj fb [00:04:19] bd808: we'll know soon. [00:04:27] after wmf14 patches are out we can re-patch load.php on mw1017 [00:04:37] this is among them [00:09:00] TimStarling: the output buffer patch had to be reverted. Fortunately, we have a detailed description of the problem, so we should be able to reproduce it easily: We found some internal issues that were not known. ...and it was breaking things that we can't have broken. [00:09:14] did you get that? [00:10:49] you mean did joelm talk to me or send me something? no [00:11:10] no, I'm just making a joke about how little that is to go on [00:12:18] you probably noticed that my patience for this kind of thing wore out a while ago [00:12:31] they should just open-source their shit [00:12:36] then I'll fix it for them [00:12:55] yeah, it's ridiculous [00:13:06] I fixed proxygen for them within a couple of weeks of them open-sourcing it [00:14:21] "I'm TimStarling and I'll fix your shitty code." <-- campaign slogan [00:15:14] maybe fred moving positions will help with this in the long run [00:16:34] I am not optimistic. They don't seem to grok the severity of this. :-/ [00:19:02] merges finished [00:19:06] tgr: staging now [00:19:53] Krinkle_: can you also put it on mw1017? the really nasty CA bugs only tend to happen once the patch is on loginwiki [00:20:07] Yep [00:20:17] though loginwiki won't trigger even with debug header [00:20:19] since it's on wmf13 [00:20:52] set everything on mw1017 to .14 [00:21:06] we need a helper script for that [00:21:35] I just sed -i 's/wmf.13/wmf.14' wikiversions.php [00:21:38] * bd808 just found another nasty message->user interaction [00:22:35] bd808: would be cool to make mw1017 read the wikiversion from a header, though [00:22:44] this one comes from Cite -- https://phabricator.wikimedia.org/P2661 [00:22:54] tgr: oh htat could be cute [00:23:58] * bd808 would `perl -pi -e 's/13/14/'` [00:24:02] isn't that https://phabricator.wikimedia.org/T127864 ? [00:24:05] loading wikiversions for loginwiki or commons or wikidata prematurely or wrongly can have side effects though [00:24:31] tgr: it's up on mw1017 now [00:24:35] wmf14 with all patches [00:24:42] thx [00:25:21] tgr: this one is Cite, not Citoid but probably similar [00:25:34] Krinkle_: can you also flip the wikiversion numbers? [00:25:42] tgr: k [00:26:43] done [00:27:52] Failed to create empty session: exception 'InvalidArgumentException' with message 'Session ID already exists' in /srv/mediawiki/php-1.27.0-wmf.14/includes/session/SessionManager.php:246 [00:28:00] that's normal? [00:28:46] not sure [00:29:00] it won't break things, it will just pick another ID [00:29:20] not sure if there is a normal browser condition it corresponds to, though [00:30:20] that was a log message, not a proper exception, right? [00:32:00] bd808: Got another one from load.php after your patch [00:32:03] https://logstash.wikimedia.org/#/dashboard/temp/AVMQeLecptxhN1Xael9V [00:32:13] MobileFrontend/includes/MobileContext.ph [00:33:18] Krinkle_: Oh. I've fixed that one already [00:34:55] looks like some kind of weird race condition? SessionManager tries to load the session with the given id, but it's not in redis, so it tries to create a new one with that id, but by that time it already exists [00:35:13] Krinkle_: these fix mobilefe -- https://gerrit.wikimedia.org/r/#/c/272810/ ; https://gerrit.wikimedia.org/r/#/c/272819/ ; https://gerrit.wikimedia.org/r/#/c/272822/ [00:36:12] tgr: hmmm... is that a side effect of my caching empty responses patch [00:36:38] not sure, I remember us looking at this weeks ago [00:37:28] the test for is_array() would fail on a cached false [00:37:29] maybe it happens when you send two http requests simultaneously with the same session id and there is no session for that id [00:39:37] shouldn't this be impossible with miss caching? [00:39:52] Are these all new telemetry into existing issues pre-SessionManager? Or are these conflicts e.a. new issues in SM? [00:40:05] oh, yeah, the is_array thing [00:40:36] no, that's the wrong way around [00:40:42] was that patch deployed? [00:40:45] it's an is_array() check both times [00:41:20] yes, but if gets are cached, how can they give different results? [00:41:31] false !== array() both times if cached [00:41:47] so the second one isn't cached [00:42:22] ah, ok, that logic is weird [00:42:52] this happens if the get returns an array but it cannot be turned into a session [00:43:04] the CA key conflict thing, maybe [00:43:10] that should be handled better [00:43:43] so loadSessionInfoFromStore() is fialing [00:44:00] yes [00:44:09] which is probably not a big deal [00:44:17] a changed user token, or such [00:44:44] my bet would be on "Metadata merge failed" [00:44:59] since we are seeing that in the logs a lot [00:45:06] at that point we should just return null since creating the session with that id will fail anyway [00:45:11] I'm still not completely clear on what it means [00:46:04] that's when there is a valid local session in the cookies and a central session in the backend [00:46:20] not sure why it happens as often as it does [00:46:37] anyway, not related to the current patches so lets focus on testing that [00:46:56] yeah. there's a 'Key "CentralAuthSource" changed' that goes with it [00:47:48] User token mismatch; Key "CentralAuthSource" changed; Session ID already exists -- all in the same request [00:48:58] You can see them by filtering on uid:0231bc4 [00:49:06] "Failed to create empty session: exception 'InvalidArgumentException' with message 'Session ID already exists'" generally happens when something tries to use SessionManager::getSessionById() but the token changed: it can't load the existing session because of the bad token, but it can't create a new session with that ID either because one exists. See https://phabricator.wikimedia.org/T124126. [00:59:47] things generally seem to work for me [01:01:22] I've got my local wiki working with define( 'MW_NO_SESSION', 1 ) now. [01:01:49] I'm sure there are more extensions in prod that will need ->inLanguage( $context->getLanguage() ) changes [01:14:22] bd808: Yeah [01:14:33] I haven't really thought about how to do this in RL [01:14:43] ideally we wouldn't need inLanguage(context-.getLang) every where [01:14:51] the default should be user language, which for RL is that [01:14:58] it's arbitrary either way [01:15:15] in api/index, the user is the session user and is filled via $wgLang StubUserLang [01:15:24] in load the user lang is from RLContext->getLanguage() [01:15:37] ideally they'd both be accesed by the Message constructor [01:15:41] seems doable? [01:16:05] To set the default language for Messages, set $wgLang. It seems somewhat unexpected to me though. [01:23:17] Sorry, I lost track of deployments. Is the fix for WANCache:v:centralauth-user-* keys getting hit multiple times per request out to wmf13? [01:24:45] Krinkle / tgr ^ [01:24:47] anomie: bd808: Ideas at https://phabricator.wikimedia.org/T127920 [01:24:59] ori: https://etherpad.wikimedia.org/p/perf-20160223 [01:25:04] not on 13, those were all NO_SESSION related [01:25:09] I don't think that was merged to wmf13 [01:25:36] https://gerrit.wikimedia.org/r/#/c/272642/ for wmf14 should fix it [01:25:50] can that go out to wmf13? [01:26:36] it's nice to have risky changes ride the train [01:27:55] it should be a clean backport, but with all the race condition related stuff in CA, it is hard to tell for sure it won't have side effects [01:28:05] fair enough [01:28:12] which I guess is a long way of saying yes, but only if it's important [01:28:34] not important enough to merit compromising the safety or stability of the site [01:29:41] redis traffic is back to normal, but memcached usage is still elevated [01:32:10] * tgr wanders off to find lunch [01:32:23] I have my eye on https://phabricator.wikimedia.org/T124356 [01:32:39] legoktm: is the RejectParserCacheValue hook handler in MFE for that bug still needed? [08:18:07] ori: yes and no. MF changed their link format, so the hook stopped purging stuff, and there are new bad cache entries with the new style links, indicating that *something* is still causing bad new entries to be created, meaning we haven't solved the root cause. So once that's found (I haven't had much luck so far) we'll need to fix the hook and have it actually purge stuff [08:18:29] legoktm: I am making some progress on it, actually [08:18:49] I depooled mw1099 and disabled its nutcracker service, so that it can't read from or write to the parser cache [08:19:23] i'm using 'güirro' on eswikitionary [08:19:35] wmf9 renders it correctly; wmf10, incorrectly [08:20:02] i disabled MF entirely so it can't possibly be it [08:20:14] ditto Cirrus (it was showing up in some traces) [08:25:00] err, how do you reproduce the bug if MF is gone? [08:25:17] that's the only skin that changes edit section links [08:26:14] I'm looking at the TOC: https://phabricator.wikimedia.org/T124356#2050285 [08:29:18] have I misunderstood the bug? [08:31:04] oh, I hadn't gotten into the TOC stuff yet. I looked at one of my old pastes from a month ago and they also had the TOC bug in them [08:31:23] https://phabricator.wikimedia.org/T124356#1954372 [08:32:19] I was working with the assumption that https://phabricator.wikimedia.org/T124356#1954680 is still causing the bug [18:38:00] Krinkle, bd808, tgr: I mentioned the load.php MW_NO_SESSION thing in Scrum of Scrums. yurik suggests an announcement of some sort to wikitech-l. [18:38:46] Hm.. I dont want to overdo it though. Session restriction is not new to RL [18:38:53] The way we do it is a bit different however. [18:39:10] But we havent' settled on the changes yet. Once that's round up, I'll announce it in a perf update, or separately [18:44:28] anomie: on https://gerrit.wikimedia.org/r/#/c/272952/ would calling Message::exists still be ok? [18:45:09] aude: As long as you do the ->inLanguage() first. [18:45:12] ok [18:45:35] it's probably best to not change behavior in the patch and then would be more ok to bcakport [19:28:01] anomie: for the most part MW_NO_SESSION will just be a return to the bits.wikimedia.org days where you couldn't have a session anyway [19:30:38] tgr: Yeah. Though the main difference is that there it fell back to content-lang/anonymous wgUser without session cookies present at all [19:30:53] Whereas now there are session cookies there (though not supposed to look at) [19:56:34] tgr, anomie: heading out to grab lunch and find a place to join the hangout for the next meeting. talk to you in 30m [22:00:27] RFC meeting starting now in #wikimedia-office: Standardise on how to access/register JavaScript interfaces [23:18:29] tgr: what is an empty session? [23:28:55] tgr: rather more specifically: in if ( $this->provider->getManager()->isUserSessionPrevented( $this->user->getName() ) ) { ... } [23:29:13] User::getName() is expensive for anons, because the IP has to be sanitized yadda yadda yadda [23:29:29] it's for system users [23:29:43] 'Maintanance script' and such [23:29:53] would it be OK to re-write the condition as: !$this->user->isAnon() && $this->provider->getManager()->isUserSessionPrevented( $this->user->getName() ) ? [23:30:11] that would short-circuit the getName call in the case of anons [23:30:37] not sure about that exact formulation without context, but something like that is doable [23:30:56] I'll submit a patch for your consideration [23:31:32] this is basically there to prevent you from registering a user that will be used by an extension, before that extension is enabled, and then keep a valid session somehow [23:32:18] bd808: how would you rate your experience with your phabricator user project? your last on T555 isn't exactly a glowing endorsement, but that was quite some time ago. [23:32:18] T555: Per-user projects for personal work in progress tracking - https://phabricator.wikimedia.org/T555