[16:00:53] anomie: any idea where we are at now with .11? I haven't plowed through all of the tickets yet today
[16:04:13] bd808: Not really. I posted some (IMO) improved versions of tgr|away's patch for the NeedToken deal, and as far as I know the rest of the major issues were already fixed on Friday.
[16:04:35] Except the wikisource login issue, that we rolled back before I could look at it.
[16:04:57] the auto login thing still happening was the killer on Friday evening. We didn't want that going on all weekend
[16:05:44] Did anyone report it after it was patched, besides Gergő on the one account?
[16:07:25] I honestly can't remember. I'll look in irc logs in a minute
[16:07:48] Whoever is the greenish color in https://etherpad.wikimedia.org/p/SessionManagerRolloutFailure wasn't sure either.
[16:16:54] ah
[16:16:59] anomie: that is me I believe
[16:20:40] anomie: looking at my irc logs I see tgr report the re-login at 2016-01-23T00:53 and then we all decided to rollback
[16:21:51] my notes are merely because on Jan 21st, we had a surge of PURGE requests coming
[16:21:56] ( can be seen at https://grafana.wikimedia.org/dashboard/db/varnish-aggregate-client-status-codes )
[16:22:16] and eventually we think it comes from some wikibase jobs doing too many cdnPurges / htmlCacheUpdate jobs
[16:22:52] haven't gotten confirmation, but it seems the cron got killed and restarted, possibly with an update; the purge requests seem to be throttled
[16:23:26] and
[16:23:36] WebResponse has an undefined variable :( https://phabricator.wikimedia.org/T124641
[16:26:12] hashar: bd808 already fixed that
[16:29:30] yeah
[16:29:35] seen his reply
[16:29:45] the test is a bit funnier and kept me busy this morning
[16:34:51] anomie: I think we should queue up your patches for backporting to .11
[16:35:18] bd808: Sounds good to me
[16:35:41] I want to look at https://phabricator.wikimedia.org/T124510 right now. I think I have an idea of what is happening but I need to find it in the code
[16:36:04] hmm, we're not going straight to .12? are we delaying .12 by a week, then?
[16:36:20] MatmaRex: not decided yet AFAIK
[16:36:57] I'd rather not hold up .12 for a week
[16:37:07] switching all wikis from .10 to .11 today/tomorrow, and then to .12 tomorrow/day-after-tomorrow, sounds kind of wasteful. :)
[16:37:26] but testing .11 on at least testwikis today would be nice
[16:49:02] anomie: Do you remember removing a session_cache_limiter() call anywhere when dropping SessionManager in?
[16:51:29] found it
[16:53:52] * anomie sees bd808 found it
[16:54:03] https://phabricator.wikimedia.org/rMWa73c5b7395a07d490f7052fd3b2491ebd656b190#ee0e1394
[16:54:32] we need to put `session_cache_limiter( 'private, must-revalidate' );` back somewhere
[16:55:09] Setup.php, just before the session_start().
[16:55:18] (well, "MediaWiki\quietCall( 'session_start' );")
[16:55:19] easy enough
[16:55:51] bd808: Could also do in GlobalFunctions.php for anything using the deprecated wfSetupSession
[16:56:10] yeah. I'll put it back there too
[16:57:26] bd808: Also SessionBackend.php near line 646 wouldn't hurt. Line 226 should be ok since it only gets reached if the session was already started before that function got called.
[16:59:55] anomie: how about line 646 in that same file too?
[17:00:10] should we pretty much put it right before any call to session_start?
[17:00:32] bd808: SessionBackend.php line 646 is where I suggested putting it. Line 226 probably doesn't need it.
[17:00:41] bd808: I haven't read full scrollback, but just saw "testing wmf.11 on testwikis would be nice", agreed. Releng is going into an hour of annual planning right now, but let's sync up (hah, get it?) later
[17:01:00] greg-g: *nod*
[17:40:03] anomie: is https://phabricator.wikimedia.org/T124367 the new "Recursion detected in RequestContext::getLanguage"?
[17:40:42] bd808: Yeah. It's on my list to look into, but IMO it's "normal" rather than "unbreak now".
[17:41:20] *nod* I vaguely remembered us talking about it at some point
[17:42:10] bd808: One thing to check right away, are we getting stack traces in logstash for that log entry? If not, how do we make that happen?
[17:44:11] anomie: we'd need to put an exception into the context to get a trace. We don't add traces to all messages by default because that would be slow and painful
[17:45:06] bd808: Re the session_cache_limiter stuff, should we wrap it in a quietCall() to avoid warnings, like is done with the session_start()s? Or do we not really care about that?
[17:46:05] anomie: hmm... probably should actually or we may get "headers already sent" spam
[17:46:12] I'll amend
[17:49:16] bd808: Is there an example of someplace we add the backtrace that I can see how to do it?
[17:50:58] anomie: look for getLogContext in MWExceptionHandler.
[18:21:02] tgr: time for a quick hangout with me and anomie?
[18:21:19] give me 10 mins
[18:21:30] k
[18:22:21] anomie: re: Except the wikisource login issue, that we rolled back before I could look at it. -- I find mw1017 really useful for things like that
[18:23:01] you can locally hack wikisource to be on wmf11 on mw1017, and then use the X-Wikimedia-Debug header to route all your requests to mw1017
[18:23:09] so from your point of view wikisource is on wmf11
[18:23:38] if it's a trick you already know well, feel free to disregard, but if not, i'm mentioning it because it can be very handy
[18:24:12] ori: I hadn't thought of hacking the version assignment locally...
[18:44:02] ori: Any documentation as to how exactly to hack wikiversions? There are too many different similar files there.
[18:44:04] ori: so we just edit wikiversions.json and then run multiversion/updateWikiversions to do that right?
[18:44:35] umm
[18:44:39] i forget, let me take a quick look
[18:44:53] thanks. we are trying to remember too :)
[18:45:36] I think you can just edit wikiversions.php; I don't think the cdb file is used any more. And no point changing the JSON since this isn't a "proper" change anyways
[18:46:07] * ori is grepping operations/mediawiki-config.git for references to the cdb file; we may be able to get rid of it
[18:46:08] ok. we didn't know if the cdb or the php was the "real" wikiversions data
[18:47:02] yeah, that's my fault, i should have removed the code to generate the cdb file
[18:47:44] it doesn't look like scap actually updates the cdb anymore
[18:48:02] maybe it's just a matter of git rm-ing it
[18:48:51] yeah. wikiversions.cdb hasn't changed since October
[18:49:10] so if anything is reading it it's already broken badly
[18:50:46] oh
[18:50:48] it's not in git
[18:51:10] yeah it just gets built on the deploy server
[18:51:30] deleting on mira before the next scap will clean it up
[18:51:43] we don't really have a "sync-rm" script to call
[18:51:51] you have to sync-dir or scap
[18:58:56] {{done}}
[18:59:01] did editing wikiversions.php work?
[18:59:25] It did
[18:59:30] cool
[19:26:49] legoktm: is the logout script still running?
[19:55:45] bd808: nope
[19:55:46] Resetting user_token for "AppleJack-7": DB connection error: MySQL server has gone away (10.64.48.26)
[19:58:13] legoktm: so mid-"A" it croaked?
[20:00:09] no it went in order by gu_id
[20:00:34] so that was 17225036
[20:00:45] out of 45332056 total
[20:01:07] ah. can we get it restarted from where it stopped?
[20:01:25] yes
[20:01:29] sweet
[20:01:52] but it's been running since friday and only made it what, a quarter of the way?
[20:02:03] yeah which is not awesome
[20:02:11] bd808: seems like /srv/images/lockdir should have www-data as owner
[20:02:24] AaronSchulz: probably...
[20:02:25] * AaronSchulz wonders why he never ran into that before
[20:02:34] anyways, restarted
[20:02:34] I was trying to test some upload patch
[20:03:01] AaronSchulz: what host?
[20:03:26] vagrant, forgot to mention that
[20:04:09] can you file a bug to remind me to look into it? (or someone to)
[20:06:29] yep
[20:12:58] AaronSchulz: thanks
[20:13:38] greg-g: what channel do you want to hash out what to do with the wikis in? Here, operations, releng?
[20:24:29] bd808: -operations, I suppose
[20:24:40] sounds good
[20:25:05] I have to wander to a coffee shop, be back online shortly
[20:25:29] but rope in the usual suspects from releng :)
[20:30:51] maybe I should work from a coffee house
[20:34:48] there's free coffee in the office
[20:34:54] and lots of quiet space if you don't want to socialize
[20:40:28] bd808, anomie: prompted by our conversation earlier, I sent an email to engineering@ with some proposed modifications to the mw1017 setup
[20:44:55] ori: cool. I'll look for it in a bit
[21:00:27] * greg-g drinks OJ instead of coffee, has a cold :/
[21:07:22] tgr|away: With the hack I just put on mw1017, can you reproduce the wikisource.org thing anymore?
[21:13:23] * csteipp drinks some of the free coffee, and tries to imagine what kind of coffee shop would serve coffee that tastes like this...
[21:27:00] * ori will pitch in for an office espresso machine if it means AS comes in more frequently
[21:36:45] anomie: https://gerrit.wikimedia.org/r/#/c/266262/1/common/ScribuntoContent.php,cm ParserOutput::$mText is public, so why not use that instead of getText()?
[21:37:43] legoktm: Eew, direct access to internal fields. I suspect mText being public is for hysterical raisins and shouldn't be perpetuated.
[21:38:20] I meant only in the back/compat case
[21:38:43] but it's fine
[21:38:58] greg-g: what branches do I need to backport the mobile parser cache corruption stuff to?
[21:40:02] legoktm: wmf.11 is rolling out this week as though it were last week
[21:40:05] so, wmf.11
[21:40:19] (and 10 if you need it out there quickly)
[21:40:22] er, but if I want to backport it today, .10?
[21:40:22] yeah
[21:40:26] ie before thursday
[21:40:26] right
[21:47:46] anomie, MaxSem: could you take a look at https://gerrit.wikimedia.org/r/#/c/266399/ ? 'Add RejectParserCacheValue hook for edit section link cache corruption'
[21:49:29] legoktm: Seems like it would work, although I'm slightly surprised it's in MobileFrontend instead of operations/mediawiki-config.
[21:50:53] #wikimedia : 13:46 Why isn't cross-wiki login working?
[21:51:29] anomie: Because it was caused by (though not the fault of) Minerva? It doesn't really matter to me though
[21:51:45] sigh
[21:51:50] 13:51 oh wait nm 13:51 Lirodon has left ("Leaving")
[21:53:58] bd808: One more for the backport list: https://gerrit.wikimedia.org/r/#/c/266400/
[21:54:29] bd808: Also, https://gerrit.wikimedia.org/r/#/c/266268/ would be nice so we can start that cleanup right away.
[21:56:44] anomie: Cool. I've got them open now for review
[21:57:09] * bd808 tries to get all the things open in one tab set
[22:13:57] greg-g: can I deploy https://gerrit.wikimedia.org/r/#/c/266401/, https://gerrit.wikimedia.org/r/#/c/266406/ and https://gerrit.wikimedia.org/r/266410 now?
[22:15:01] anomie: that seems to have fixed it, thanks!
[22:15:37] I still see the weird duplicate cookie thing, but now the wikisource.org and .wikisource.org ones are keeping in sync
[22:15:49] legoktm: ah, sure
[23:55:57] anomie, tgr, marxarelli, greg-g: I just sent an email with links to the backports and a suggestion for how to stage it tomorrow morning for testing before we roll to group0
[23:56:29] * bd808 has done more software work since mid-day Friday than he did for all of December