[00:08:06] legoktm, ori: I bet https://gerrit.wikimedia.org/r/#/c/216012/ will lower a bit of redis cpu too [01:12:09] * @var array [01:12:12] */ [01:12:12] private $_kw_replace_group = 0; [14:33:34] sooo, latest hhvm of yours do not support connecting to mysql via socket... but mediawiki always tries to connect via socket and I can't figure out how to change that [14:59:35] Nikerabbit: what is your $wgDBserver ? [15:01:22] Nikerabbit: 'localhost' causes mysql to connect to a socket. Maybe 127.0.0.1 [15:02:49] hashar: okay I didn't try ip [15:03:07] Nikerabbit: I am just quoting http://www.mediawiki.org/wiki/Manual:$wgDBserver :-} [15:15:03] legoktm: :(( hexmode just won't drop that bone. It's very much like the "salt and docker will fix it" meme that is going around about packaging [15:22:33] * greg-g sighs [15:22:37] g'morning [15:50:01] greg-g: cheer up! it's a gorgeous day (at least in Boise) and these random config debates are won with working code and not blog post driven ideas ;) [15:56:39] :) [15:56:43] it is a beautiful day [17:11:08] anomie: I wonder if I can tap you to help me isolate FSS crashes again. We still get segfaults, but they're different now. There's a lot of global environment and request context things that I am not able to isolate the bug in a concise snippet of code. [17:12:53] ori: Sure. Where do I start looking to see it? [17:13:33] fluorine:/a/mw-log/hhvm-fatal.log [17:13:46] (warning: huge file) [17:15:45] Wow, huge stacktraces [17:17:38] yeah [17:17:42] grep -Po ' #\d+[^(]+.{1,120}' hhvm-fatal.log [17:17:48] makes it a tad more readable [17:20:36] I've attempted to repro with: [17:20:38] require_once '/srv/mediawiki/multiversion/MWVersion.php'; [17:20:39] require_once getMediaWiki( 'maintenance/commandLine.inc', 'zhwiki' ); [17:20:40] $r = new ReplacementArray; [17:20:42] $r->setPair('约瑟夫', '约瑟夫'); [17:20:44] echo "OK.\n"; [17:20:47] but that works fine, no crashes [17:22:46] ori: So far I can see it's another crash during the destructor. [17:27:28] php extension + destructor crash almost always == double free [17:27:43] yeah, that was the bug yesterday [17:27:49] ref counting is hard [17:28:45] my yaml extension came into being because I couldn't get the ref counting right in the older yaml binding [17:29:31] maybe it's about res->set [17:47:35] kwsalloc() will call kwsfree((kwset_t) kwset); if it fails to allocate kwset->trie and will return null [17:47:41] we don't check for that [17:47:49] and we call kwsfree() again in that case [17:47:54] so maybe this happens when we run out of memory [17:48:08] which is why a minimal repro case is hard to come up with [17:52:48] ori: Seeming reproduction via eval.php (I'll work on paring that down now): ``$rev = Revision::newFromId( 35732049 ); $rev->getContent()->getParserOutput($rev->getTitle(), 35732049, new ParserOptions); $rev->getContent()->getParserOutput($rev->getTitle(), 35732049, new ParserOptions);`` [18:10:44] anomie: Let me know when you have time for a few authmanager questions [18:53:49] ori: I'm about to agree with you on memory issues of some sort. I have mw1017:/home/anomie/4 as a repro (include it in eval.php), but even stuff like removing the useless preg_replace on line 19 makes it stop segfaulting. [18:54:04] csteipp: ok [18:58:13] anomie: Cool... first up, "usingGlobalSession". I think that's basically a way to flag between using AuthManager or local php sessions? [18:58:38] I was thinking AuthManager would be used for both [18:59:00] And would have a different session provider depending on which you wanted to use. [18:59:15] But was there a reason for not doing that? [19:00:16] csteipp: It's basically because User::loadFromSession() is getting used with FauxRequest in weird ways, populating the FauxRequest session to fool User::loadFromSession() to return a particular user. I don't remember if it's only unit tests or if some extensions are doing it too. [19:01:25] It's hard to track down because it can go through a RequestContext made with a FauxRequest, too. [19:01:44] nasty [19:03:41] Uhg, yeah, I see why that would be a pain to refactor in this. [19:05:05] At some point I may try again to kill that hackiness. Maybe I can grep for the session keys the hacky code path checks or something. [19:06:06] That would be great. I think were some unit tests with @TODO, don't use $_SESSION... might be able to fix a few. [19:06:22] Next: getAuthenticationRequestTypes [19:06:37] RequestContext::importScopedSession() looks scary though [19:07:29] Can you point to how that gets used? The way it's defined is confusion me. [19:09:10] Or maybe, can you explain what that's supposed to do in general? [19:09:26] The ApiLogin will call getAuthenticationRequestTypes( 'all' ) and then iterate over the returned types' fieldInfo() to figure out what fields it needs to support. Special:Login would use 'login' to decide what fields to show at the start, then 'login-continue' for continuation. Special:CreateAccount would do the same for its flow. Special:ChangePassword (or whatever it's called) would use 'change' to figure out what fields it should show. [19:11:08] Ah, that's right. But then in something like AuthManagerAuthPlugin::authenticate, are you pulling that again to find out what fields were shown to get the right data from the request? [19:12:45] AuthManagerAuthPlugin is weirdness to start with... What it's doing is basically what Special:Login etc would do once I get to rewriting them. [19:13:33] i.e. fetch the list of types that might have had fields filled in, ask AuthenticationRequest::requestsFromSubmission() to actually create the ones that did have fields filled in, then pass those to beginAuthentication() [19:15:35] (still trying to get the mental picture of all this, so I could be way off) [19:16:16] In that case, would it make more sense to mock up a password-based AuthenticationRequest, since the username and password are passed in? [19:17:55] That's what AuthenticationRequest::requestsFromSubmission() is basically doing, except instead of making a PasswordAuthenticationRequest it's asking for any request types that can be made with "username" and "password" (and maybe "domain") fields. [19:19:14] So that handles LDAPAuth / OATHAuth before they get converted. Gatcha. [19:20:33] Uhg, I feel dirty putting more hacks in... but I guess we're still cleaning up the flow, so we can clean these up as we migrate extensions... [19:21:52] Anything in AuthManagerAuthPlugin and AuthPluginPrimaryAuthenticationProvider is BC code of some sort, the former to handle stuff that tries to call AuthPlugin methods when core is using AuthManager and the latter to handle AuthPlugin extensions that haven't been updated yet. [19:23:38] anomie: I think I have it [19:23:54] ori: Good, because I have no idea! [19:25:08] anomie: can I walk you through my thinking and ask you if it's sane? It should be quick [19:25:34] ori: Sure, I think I can multitask that in [19:25:38] oh, sorry, if you're still in the middle of working something out with csteipp I'll hang back [19:26:00] So I noticed you're having most providers extend AbstractAuthenticationProvider and implement an interface that specifies where they go in the flow. Would it make sense to have an abstract PreAuthenticationProvider, PrimaryAuthnProvider, etc, so that the concrete classes can just show their specific bits of the implementation?> [19:26:44] For example, all the return Status::newGood() functions in AccountCreationThrottlePreAuthenticationProvider. [19:27:36] csteipp: I didn't really think it made sense or I'd have done it ;) AbstractAuthenticationProvider is handling really trivial stuff like setConfig(), while most of the stuff in the more specific interfaces really should be implemented directly. [19:28:32] Yeah, I was just thinking a LoginThrottlePreAuthenticationProvider would have to duplicate most of AccountCreationThrottlePreAuthenticationProvider [19:29:42] csteipp: Why not just make one "ThrottlePreAuthenticationProvider"? [19:30:29] Well, that would be too easy ;) [19:30:44] Yeah, that would make sense [19:33:37] anomie: And last question for now, I think.. You have a comment in the config that the last password provider needs to be authoritative. If it wasn't how badly would that hurt security? [19:34:37] I'm worried someone is going to not set that up, and then they have authentication that fails open... which would be bad. [19:34:53] csteipp: I don't think it'd hurt security at all. But it'd tell the user authmanager-authn-no-primary ("The supplied credentials could not be authenticated.") rather than a "wrong password" error of some sort. [19:35:16] Gatcha [19:35:44] Should the class just default to authoritative, and then let anything that's not the last one override that? [19:36:29] Just trying to figure out how to make admins always do the right thing.. [19:37:07] csteipp: Yeah, that might be a good idea. I thought I had done that, but I see not now that I look at it. [19:38:05] * anomie changes that locally so it'll be in the next patchset [19:38:06] I'll add a comment on the patchset [19:39:14] Or that works. Also, maybe comment that it's so the error message is right, so I don't freak out and ask you about it again when I forget why it's that way again... :) [19:42:06] csteipp: Updated that comment [19:45:52] anomie, bd808: https://gerrit.wikimedia.org/r/216294 [19:47:49] ori: ... Yeah, it's obvious now that you point it out. [19:48:46] your repro case runs successfully with that patch applied [19:49:12] (as in: it doesn't crash) [19:49:26] ori: Although... is that the right fix, or should be be post-filling the array? In other words, how do indexes in res->replace correspond to the successive calls to kwsincr()? [19:50:41] no, this way is correct. each run through the loop uses a different key [19:51:02] the fact that there are explict NULLs makes me think the slot was supposed to be left empty [19:51:03] if we don't increment i, and post-fill the array, the replacement array will have mismatched keys and values [19:51:17] yeah [19:51:47] ori: What host has your patch applied so I can poke at it? [19:51:48] see 196-197 [19:53:30] anomie: mw1041 [19:54:09] ori: hhvm --php -r '$x = fss_prep_replace( array( "a" => "1", "" => "2", "c" => "3" ) ); echo fss_exec_replace( $x, "abcde" )."\n";' should return "1b2de". But on mw1041 it returns "1bde". [19:55:19] "1b3de" right? [19:55:47] err, yeah [19:56:30] *nod* that's why my local hhvm gives -- 1b3de [19:56:39] good point [19:58:31] so maybe 151 wasn't incrementing on purpose? [19:59:07] actually maybe both and the = NULLs are a redherring [19:59:58] which would mean that the correct fix would be to NULL fill from i..pairs.size() after the for(;;) [20:00:46] there almost certainly should be some tests added for this [20:00:48] That's what I was thinking, since i wasn't getting passed into kwsincr() [20:01:38] or we could initialize replace_size to 0 and increment it as we add items [20:02:02] no, that's trickier to get right [20:02:08] you guys are right [20:02:15] Or just assign replace_size = i after the loop. [20:02:47] go back in time and change the spec for malloc to NULL out all allocated blocks? ;) [20:03:26] calloc does that ;) Although C doesn't guarantee that all-bits-0 is the same as NULL, not sure about C++. [20:05:39] http://www.cplusplus.com/reference/cstring/NULL/ -- 0L should always be treated as equal to NULL but NULL is not necessarily == 0L -- what a spec [20:06:53] transitivity is too boring [20:07:06] you want people to stay on their toes [20:07:18] the hardware nerds have to have wiggle room to make things interesting [20:07:19] ok [20:07:21] update patch [20:07:23] and applied it on mw1041 [20:07:33] no segfault for test case 1, correct result for test case 2 [20:07:48] bd808: http://c-faq.com/null/machexamp.html [20:07:58] it's causing segfaults quite frequently in prod, so if it looks OK please +2 so godog can rebuild the package [20:08:14] (godog: this is https://gerrit.wikimedia.org/r/#/c/216294/ ) [20:08:31] ori: can you do a followup to add unit tests? [20:08:47] ori: kk, I've added myself [20:08:50] php modules without tests are a tarpit [20:09:19] bd808: if not unit tests, then replace with a pure php extension, which i've started looking into [20:09:21] i'll do one or the other [20:09:42] won't that make things slower for non-hhvm users? [20:09:47] like a lot slower [20:10:09] the non-hhvm build of this extension has been reliable for a long time [20:10:14] ah [20:10:14] we're not touching it now [20:10:22] so it should still get unit tests, but that's less urgent [20:11:19] thanks guys [20:11:21] those res->replace[i] = NULL; lines preserve confusion [20:11:31] godog: patch merged, so repackage when you can plz [20:12:28] ori: Now that I look at it, the PHP version looks like it probably has this same bug. [20:13:07] err, Zend version [20:13:40] Or, not the same bug, but the bug with returning "1bde" [20:16:58] anomie: I think it would. That version uses i++ in the for loop so it always increments [20:17:50] bd808: I tested it on mw1041, same test with "php5 -r" instead of "hhvm --php -r". It returns "1bde". [20:18:24] *nod* the logic is the same as with the i++ additions in ori's first patch [20:26:05] ori: any objection to me killing the deployment-fluoride instance in deployment-prep? I'm trying to make enough room in the quote to build a new jessie logstash host and I think that box never really got setup properly [20:26:12] *quota [20:26:48] mwerrors seems to not be running there and if it was the statsd host it points to doesn't exist [20:27:04] ori: ffs 1.1.7 is on apt.wikimedia.org [20:27:19] bd808, yeah the <replace[i] = NULL>> is confusing, since it's not adding the the search set and the NULL filling is handled elsewhere [20:28:14] * bd808 tries not to get tricked into adding tests and doing some cleanup on that code [20:29:07] AaronSchulz: it makes sense in the .c version but also has the bug anomie noticed with an empty key [20:29:09] I guess it's still needed to fill gaps [20:30:33] fss_prep_replace( array( "a" => "1", "" => "2", "c" => "3" ) ); -- that showed that the gaps in the replace array shouldn't exist [20:31:20] Or that if they do there is a separate bug [20:32:34] using that prep with fss_exec_replace( $x, "abcde" ) should either give you "1b3de" or "212b232d2e" [20:32:48] more reasonably the first result [20:39:10] hmm, res->replace[i++] would just override the last NULL ones, heh [20:39:32] * AaronSchulz was thinking of ++i [20:41:42] bd808, it's amazing how terse 200 lines can be [20:44:05] bd808: go for it [20:45:44] legoktm (?): What's the situation with fixing regressions in 1.25 now that 1.25 is released? T101555 is a duplicate of T100635, but if I just close it as such I expect I'd get "But what about 1.25‽". [20:46:39] anomie: the fix should probably be backported to REL1_25 then with release notes, and it'll go out with the next point release (cc ostriches) [20:47:18] "cc ostriches"? [20:48:47] anomie: I was just trying to ping him [20:49:07] Oh, that's an irc nick [20:50:30] legoktm: https://gerrit.wikimedia.org/r/216311 [20:50:31] Hmm? [20:50:38] Backport stuff, sure [21:22:21] ori, https://gerrit.wikimedia.org/r/#/c/216140/1 [21:56:33] bd808: i have a Zend PHP extension question. when I run 'make test' for fss, the test gets skipped. The skip condition is , which I see is a standard pattern (your YAML extension does it too). But where do I tell the test runner to load the extension? [21:58:19] ori: I do it like this -- https://github.com/bd808/pecl-file_formats-yaml/blob/master/dev-tools/test.sh [21:59:21] the important part in there is `run-tests.php -n -d extension_dir=modules -d extension=yaml.so ${TEST}` [21:59:43] which tells the run-tests.php script to activate the extension [22:00:20] cool, thanks [22:44:06] bd808: how does https://github.com/legoktm/at-ease look? (the general concept, I'll put it up for review in gerrit for all the details and stuff) [22:46:23] concept looks good. Don't forget to add copyright headers to the php files [23:16:51] greg-g: I'd like to formally catch up with you about security in the releng team, and goal alignment (oh crap, I'm I manager now) for next quarter. Can I pick a free spot on your calendar and schedule you? [23:17:20] csteipp: omg! manager speak! (yeah, take whatever time works) [23:17:49] warning: next 2 weeks are rough with annual review meetings [23:18:03] (and my normal stupid schedule)