[00:00:39] anomie: great tutorial! [00:01:17] probably not on-wiki, but in the email the $wgDisableAuthManager hacks would be worth showing/mentioning [00:01:33] + the need for extension registration [00:03:57] anomie: Did you run a bunch of api logins in the last few hours? [00:04:51] Trying figure out the recent spike on https://grafana.wikimedia.org/dashboard/db/authentications [00:24:31] it coincides with 20:53 logmsgbot: twentyafterfour@tin rebuilt wikiversions.php and synchronized wikiversions files: group0 to 1.28.0-wmf.2 [11:12:28] bd808: are we logging whether OAuth is used for action API requests? [11:12:43] I found https://wikitech.wikimedia.org/wiki/Analytics/Data/ApiAction but I vaguely remember there being more tables [13:15:35] ori: Where does "MediaWiki.authmanager.login.api.success.count" on https://grafana.wikimedia.org/dashboard/db/authentications come from? I note the increase csteipp was asking about on that graph matches the rise in NeedToken (versus need_token) responses on https://grafana.wikimedia.org/dashboard/db/authentication-metrics?panelId=13&fullscreen, which makes me suspect something is analyzing the log stream and needs updating for changes to the [13:15:35] 'status' field in the log event generated at the end of ApiLogin::execute(). [13:21:44] anomie: https://github.com/wikimedia/mediawiki-extensions-WikimediaEvents/blob/master/includes/AuthManagerStatsdHandler.php#L39 [13:27:15] tgr: ok, that looks like it's good. I'll need a new theory. [13:28:37] those discrete steps in NeedToken count look really weird [13:29:01] good morning :-) [13:29:09] then again, it's just group0 so maybe a few high-traffic bots switching on and off? [13:29:51] tgr: I note that TaxonBot@TaxonBot (yay BotPasswords) and Fabian Flöck are having high numbers of logins currently. The former on test2wiki and the latter on enwiki. [13:29:52] if you get some spare idle cycles, Ori has sent a patch to remove a sleep( 3 ) from PHPSessionHandlerTest.php ends up being invoked three times or 9 seconds sleeping total https://gerrit.wikimedia.org/r/#/c/289360/ :D [13:31:06] hashar: I see he's not just removing a sleep, he's removing a test. Do we care about testing that session expiry works correctly? [13:31:22] If not, go ahead and merge. [13:31:30] I have no idea to be honest :( [13:32:01] hashar: can test be moved post-merge? [13:32:25] I don't think anyone would mind +10 secs if it is not blocking the merge [13:32:42] I havent deal with session management in a very long time and I would assume session_start() to create a brand new session unless it is passed a session id [13:33:01] potentially we could have post merge [13:33:03] err [13:33:10] potentially we could have tests to only run on post merge [13:33:33] but history has shown that the job will quickly start failing and nobody would bother looking at its failure since ... "it always fails anyway" [13:33:44] we have been discussing about producing daily/hourly jobs though [13:33:56] so we could drop a bunch of tests from CI [13:34:13] and have folks catch up on a daily basis after merge to keep the build green. [13:34:53] we need to reorganize our tests in groups / tiers. Running everything everytime is starting to be annoying :D [13:40:19] hashar: session_start() opens the session that was previously set with session_id(), or creates a new one if there was no session previously set or it was last set to the empty string. [13:41:19] hashar: I commented on the patch, pointing out the incorrect assumptions in the commit summary and leaving it to others to decide whether saving 9 seconds is worth not covering the thing being tested. [13:46:50] tgr: The stair-step pattern in the graph is a good indicator that it's mainly due to a small number of users/processes, so when one starts hitting the problem or gets shutoff/fixed it makes a big difference. [13:47:02] anomie: neat thank you very much! [13:48:21] hashar: For the record, I'm slightly biased towards code coverage on this one, but not enough to argue over it (as long as the reasons for removing are accurate) or be upset if the patch gets merged. [13:58:08] anomie: your explanation on the patch looks like good argument to keep it around [13:58:28] anomie: and maybe we will find out how to optimize it / or tests it without requiring a sleep :-) [13:59:25] hashar: The difficulty is that the session expiry mechanism works with second resolution, so we can't say "expire after 1 microsecond" [13:59:29] anomie: that patch is part of a serie ori pushed to speed the tests run . So some are not ready [13:59:36] yup [14:00:38] looking at the code, one would suppose it just test PHP built-in but you clarified it actually tests SessionHandlerInterface [14:01:00] so we should keep that test :-D [14:13:43] tgr: I posted a message on TaxonBot's user page. It looks like the issues with the Fabian Flöck account started on the 15th or 16th, so even more clearly not connected to 1.28.0-wmf.2. [14:17:33] Oh, and his problem seems to be bad token handling, his tokens being submitted end in "%20%5C" (" \") rather than "%2B%5C" ("+\")... [14:17:49] Hah, comment I hadn't noticed prior. [14:17:51] "It's only slightly flawed. Don't use for anything important." [14:17:54] On generalizeSQL [14:19:53] it's not a very good test, in my opinion [14:21:32] if we were not so concerned with protecting ourselves from other programmers we could just make PHPSessionHandler::$instance not private, which would make the class more testable [14:23:13] Go back to PHP4 days and make all fields public? [14:24:27] that would be progress, yes [14:25:21] Hmm, permissions on a fresh vagrant box are all busted to where I can't run a ton of unit tests [14:25:24] Fannnnn-tastic [14:26:08] i keep this in my fortune file: [14:26:09] "At some point though, someone is going to need to have access to the data. And if you have a notion of "private", you need corresponding notions of privilege and trust. And that adds a whole ton of complexity and little value, creates rigidity in a system, and often forces things to live in places they shouldn’t. " -- Rich Hickey [14:27:57] breakfast time [14:28:31] * ostriches fumes a little [14:28:53] ori: I left a comment on the userDefaultOptions() patch, otherwise it can land [14:28:59] Testing the shallow copy patch now [14:29:23] I'd like to backport as many of these to REL1_27 after we're done, too [14:31:36] the biggest impact on test run time is https://gerrit.wikimedia.org/r/#/c/289369/ [14:31:41] the other ones may not worth be backporting [14:34:07] That and the serializing. [14:39:31] anomie: Have I mentioned (this week, yet) how much I hate hooks? [14:39:49] ostriches: If you have, it wasn't where I saw it. [14:40:17] * anomie likes hooks, after dealing with how phpbb2 did extensions. [14:46:35] Oh man phpbb2 [14:47:21] Progress! [14:47:22] Tests: 11753, Assertions: 70685, Errors: 13, Skipped: 476. [14:48:35] Heh, had to https://gerrit.wikimedia.org/r/#/c/289422/ first [15:28:46] tgr: no, no explicit logging of OAuth usage is done. The raw data collected is https://wikitech.wikimedia.org/wiki/Analytics/Data/ApiAction . The other tables you may be thinking of are all derived from that data and shamefully only exist in the "bd808" database on the hive cluster. [15:29:19] I never finished setting up the oozie job pipeline to build the derived tables into the "wmf" database [15:29:46] I do have nightly builds via cron into the bd808 db though [15:30:43] Early on there were plans to log more data but I abandoned that after getting embroiled in the various user tracking discussions [15:36:16] Lots of test skips in master from $wgDisableAuthManager :) [15:53:15] tgr: I think I figured out the login issue on test2wiki, the extension.json-ization of CentralAuth boosted CentralAuthSessionProvider's priority above BotPasswordSessionProvider's. Fix should be https://gerrit.wikimedia.org/r/#/c/289437/ [15:53:44] nasty [16:10:56] Yay, only one actual failure! [16:10:58] https://phabricator.wikimedia.org/P3124 [16:11:06] anomie: Might be important, might be noise ^ [16:15:14] ostriches: Strange, it passes for me locally. [16:15:39] This is with a pretty fresh vagrant. [16:16:57] I can't get mediawiki-vagrant running (T132449), so I'm testing on my usual local checkout of master. Sanity check: does it still fail for you if you run only tests/phpunit/includes/api/ApiLoginTest.php instead of all tests? [16:16:57] T132449: Mediawiki-vagrant setup.sh fails - https://phabricator.wikimedia.org/T132449 [16:20:24] Lemme check [16:20:49] Nope, all passes. Something's leaking probably [16:21:16] Ugh, I hate when that happens. [16:22:20] Let's narrow it a tad, running all includes/api/* tests [16:23:52] Fails for me if I run all tests. [16:24:10] And just includes/api/ [16:24:46] Yep [16:25:41] Looks like the login state (from a previous api test) is leaking over. So it's still got the username on hand and is like "Just give me a token you're already half logged in" or something? [16:26:06] Vs. should just fail outright when we give no username. [16:52:34] ostriches: If I remove the $user->checkPassword() call added to tests/phpunit/includes/TestUser.php in I423f09f, it passes. If I change it to load the password directly from the database and do $passwordFactory->newFromCiphertext( $row->user_password )->equals( $password ), it passes. [16:58:06] Clearly it's not the underlying code at fault here, it's the tests themselves. [16:58:09] Hmm [16:59:57] What seems to be going on is that, under AuthManager, the checkPassword() call is doing something to the session that overrides the session trickery that ApiLoginTest is trying to do. Since checkPassword() has been deprecated since October, it shouldn't have been added to TestUser in the first place. [17:15:49] anomie, tgr: globalrename is broken right now presumably due to authmanager... https://phabricator.wikimedia.org/T135656, (I just woke up, haven't looked into it yet) [17:17:53] legoktm: Since AuthManager isn't enabled yet, "due to AuthManager" isn't terribly likely. "Due to some change made by the AuthManager patches", maybe, although some change to make something use DeferredUpdates is as likely. [17:19:17] right sorry, I titled the bug "presumably due to authmanager changes" :) [17:21:30] the traceback points to https://gerrit.wikimedia.org/r/#/c/288093/ [17:24:17] legoktm: technically that's a sessionmanager change :) [17:25:03] legoktm: Ugh. I suspect adding $user->load( User::READ_LATEST ) in RenameuserSQL.php before the call to invalidateSessionsForUser() will fix it. Although perhaps we should add a $flags parameter to User::newFromId(). [17:29:24] [10:29:20] (PS1) Legoktm: Load User with READ_LATEST before invalidating sessions [extensions/Renameuser] - https://gerrit.wikimedia.org/r/289461 (https://phabricator.wikimedia.org/T135656) [17:31:15] legoktm: Want to live-hack that onto mw1017 and see if it fixes things? [17:31:49] tgr: http://authmanager.wmflabs.org/w/index.php?title=Main_Page&diff=166&oldid=165 (: [17:34:34] will do in a few [17:35:22] anomie, tgr: congrats on getting so much merged. :) [17:35:42] legoktm: I tried to test the patch and got a warning saying "User AFtest5 has been migrated to the unified login system. Renaming it will cause the local account to be detached from the global one. [17:35:56] I suppose that's not the right rename function to use? [17:36:14] tgr: uh, use Special:GlobalRenameUser [17:39:23] anomie: Testing 289453 now [17:41:48] Works, will merge if Jenkins agrees. [17:42:08] Oh yeah ci is dead [17:42:19] legoktm: verified working [17:42:25] tgr: locally or on mw1017? [17:42:31] locally [17:42:45] but I could repro the error locally [17:43:11] ok, thanks [17:43:12] anomie: Is there any reason we can't turn AuthMgr stuff on in master? [17:43:47] (And in fact, we should preemptively shut it off in prod so we don't accidentally something) [17:44:10] ostriches: You mean setting $wgDisableAuthManager = false by default? No reason, except we'd want to do a config change first to make sure it doesn't get enabled in production yet. [17:44:24] that was https://gerrit.wikimedia.org/r/#/c/289413/ [17:44:34] haven't gotten around to the master patch yet [17:44:34] Yeah, I wanna do that anyway, but it would be nice to let it start baking in people's masters, REL1_27 and beta. [17:44:47] but will do today [17:45:36] We'll actually want an accompanying change for the -labs config that'll turn it back on [17:47:08] ostriches: we can do that, but some extensions might break [17:47:28] Ones in beta or people's masters generally? [17:47:42] beta: https://phabricator.wikimedia.org/T110282 [17:47:56] masters too but that will happen anyway [17:48:02] we can live with it [17:49:03] ostriches: We might want to wait for T110283 (CentralAuth) first. Possibly T110448 (AbuseFilter), T110765 (DisableAccount), T110457 (OATHAuth). I don't think LdapAuthentication (T110453) is used in Beta Labs? [17:49:21] Not in beta, just on wikitech [17:49:22] T110283: Update CentralAuth to use AuthManager - https://phabricator.wikimedia.org/T110283 [17:49:23] T110448: Update AbuseFilter to use AuthManager - https://phabricator.wikimedia.org/T110448 [17:49:23] T110453: Update LdapAuthentication to use AuthManager - https://phabricator.wikimedia.org/T110453 [17:49:23] T110765: Update DisableAccount to use AuthManager - https://phabricator.wikimedia.org/T110765 [17:49:23] T110457: Update OATHAuth to use AuthManager - https://phabricator.wikimedia.org/T110457 [17:49:35] The rest of the extensions on T110282 at first glance seem like they'll just be using deprecated hooks but won't actually break. [17:49:35] T110282: Update extensions which are deployed on the Wikimedia cluster to use AuthManager - https://phabricator.wikimedia.org/T110282 [17:50:27] actually merging the centralauth patch and enabling on beta sounds like a good idea, that could certainly use some extended testing time [17:50:46] and the patch shouldn't affect the behavior with AM off [19:42:11] PHP Warning: require_once(/var/www/wiki/mediawiki/vendor/composer/autoload_static.php): failed to open stream: No such file or directory in /var/www/wiki/mediawiki/vendor/composer/autoload_real.php on line 32 [19:43:33] There's no autoload_static in https://github.com/wikimedia/mediawiki-vendor/tree/master/composer [19:44:11] $useStaticLoader = PHP_VERSION_ID >= 50600 && !defined('HHVM_VERSION'); [19:44:11] if ($useStaticLoader) { [19:44:11] require_once __DIR__ . '/autoload_static.php'; [19:46:40] https://phabricator.wikimedia.org/T135533 [19:52:36] Ah, already fixed [19:53:30] * Reedy hugs bd808 [20:14:32] tgr, ostriches: Want to merge https://gerrit.wikimedia.org/r/#/c/289453/ now, so we can see what failures are left in https://gerrit.wikimedia.org/r/#/c/289498/? [20:19:49] merged [20:38:58] anomie: Stupid simple patch I came across when poking tests earlier: https://gerrit.wikimedia.org/r/#/c/289465/ [20:39:14] Wait.... [20:39:18] That has syntax errors. [20:39:25] See this is why ci is useful! [20:39:41] I was about to say that line 115 didn't look right [20:40:09] Amending [20:40:29] (Also, xdiff unit tests that nobody can ever test are kinda silly...) [20:40:38] (considering we don't have it anywhere on the cluster these days...) [20:40:57] Fixed [20:42:31] ty [20:43:21] tgr, ostriches: Even better than I hoped, it fixed all the failures in https://gerrit.wikimedia.org/r/#/c/289498/. [20:43:33] \o/ [20:59:21] RFC meeting starting now in #wikimedia-office: Requirements for change propagation [20:59:42] ostriches: so, are we ready to merge that? [21:00:25] I think so [21:00:39] It'll either go wonderfully, or beta will fall over and crap itself! [21:00:41] :D [21:03:39] robla: TimStarling: Sorry, missed the meeting today. Catching up with notes now.