[00:34:46] anyone brave around? :P https://gerrit.wikimedia.org/r/293247
[12:57:49] anomie: can you look at https://phabricator.wikimedia.org/T135656 ? it's the worst blocker so far and I am out of ideas
[13:02:08] tgr: I don't see anything related to AuthManager in the backtrace in T135656#2363474...
[13:02:09] T135656: GlobalRename is broken, presumably due to authmanager changes - https://phabricator.wikimedia.org/T135656
[13:10:21] tgr: Does the user rename trigger LocalUserCreated when renaming a local user? That could have caused the database situation behind T135656#2363474, although the mechanism I'm thinking of there (i.e. the change to CentralAuthHooks::onLocalUserCreated() in 5782d8577) should have been happening in 1.28.0-wmf.4 as well.
[13:10:21] T135656: GlobalRename is broken, presumably due to authmanager changes - https://phabricator.wikimedia.org/T135656
[13:13:14] anomie: timing puts the blame on the train, just added a comment to the task
[13:15:48] anomie: the exact one hour difference between identical log records makes me think the rename job fails due to a CAS error and then retries later even though localuser is already updated
[13:16:04] what I can't figure out is what triggers the CA CAS error
[13:17:20] tgr: The trace in #2363474 is because the renamed user got added to the localnames table, so the attempt to update the old localnames record to the new one fails. If nothing else, we could just change that job to delete the old row if the new row already exists.
[13:17:37] tgr: The first one in #2363543 should be fixed by https://gerrit.wikimedia.org/r/#/c/293285/
[13:18:06] anomie: re: testing on labtestwiki: I check out /srv/mediawiki/php-authmanager, copy LocalSettings and StartProfiler from some other branch, switch wikiversions.php over, and then get an HTTP 500
[13:18:13] no logstash error either
[13:18:22] what am I missing there?
[13:18:23] tgr: The second one there, I have no idea unless it's state corruption due to the first one.
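The workaround floated at 13:17:20 (delete the old localnames row when the new row already exists, instead of failing on the update) could look roughly like the sketch below. This is only an illustration, not the CentralAuth job code: it uses Python with an in-memory SQLite database, the two-column `localnames` schema is a simplified assumption, and `rename_localname` and `make_demo_db` are hypothetical helper names.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory stand-in for the localnames table (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE localnames ("
        " ln_wiki TEXT NOT NULL,"
        " ln_name TEXT NOT NULL,"
        " PRIMARY KEY (ln_wiki, ln_name))"
    )
    return conn


def rename_localname(conn, wiki, old_name, new_name):
    """Move a localnames row from old_name to new_name, tolerating retries.

    Returns "updated" if the old row was renamed, "deleted_old" if the new
    row already existed and the stale old row was removed, or "noop" if
    there was nothing to do.
    """
    if old_name == new_name:
        return "noop"

    with conn:  # one transaction: commit on success, roll back on error
        new_exists = conn.execute(
            "SELECT 1 FROM localnames WHERE ln_wiki = ? AND ln_name = ?",
            (wiki, new_name),
        ).fetchone() is not None

        if new_exists:
            # The new row is already there (e.g. an earlier attempt or a
            # hook inserted it), so updating the old row would violate the
            # primary key. Drop the old row instead.
            cur = conn.execute(
                "DELETE FROM localnames WHERE ln_wiki = ? AND ln_name = ?",
                (wiki, old_name),
            )
            return "deleted_old" if cur.rowcount else "noop"

        cur = conn.execute(
            "UPDATE localnames SET ln_name = ? WHERE ln_wiki = ? AND ln_name = ?",
            (new_name, wiki, old_name),
        )
        return "updated" if cur.rowcount else "noop"
```

Written this way the operation is safe to repeat, which matters if the job really is being retried after a CAS failure as suspected at 13:15:48: a second run finds no old row and does nothing.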
[13:20:03] let's deploy the first then and see if that fixes it
[13:20:11] I'll add it to SWAT
[13:23:51] ...if CI recovers back before SWAT
[13:28:07] tgr: I forget where logs end up, but I remembered that I had needed to create /srv/mediawiki/wmf-config/ExtensionMessages-authmanager.php to get it to work.
[13:32:04] thanks, that's what I was missing
[14:29:03] anomie: can you try to remember the log location? :)
[14:29:18] or merge https://gerrit.wikimedia.org/r/#/c/293130/ as it is and I'll fix it in a separate patch later
[14:31:22] tgr: Some logs end up on fluorine (production fluorine, not one in labs). And in logstash too, https://logstash.wikimedia.org/#dashboard/temp/AVUwbJTQivsygkP3YgE4
[14:32:09] I don't know where the log for the ExtensionMessages file being missing was, maybe I just found that one if eval.php was failing in the same way.
[14:41:17] thanks
[14:41:29] apparently fatals end up on fluorine but not logstash
[14:41:43] I thought those always go to both places
[14:42:11] tgr: which kind of fatals? fatal.log or things in hhvm.log?
[14:42:23] I think both should end up in logstash/kibana
[14:42:38] bd808: looks like this:
[14:42:41] 2016-06-08 14:40:48 [ee81c6c49d93ec8efa9cb7f0] labtestweb2001 labtestwiki 1.28.0-wmf.5 fatal ERROR: [1d15a428] PHP Fatal Error: syntax error, unexpected '{' {"exception_id":"1d15a428"}
[14:42:52] so fatal.log I think
[14:44:15] * bd808 looks
[14:46:01] hmmm... I'm not finding type:mediawiki AND channel:fatal data. That doesn't seem right
[14:47:19] it is enabled in InitialiseSettings as expected -- 'fatal' => 'debug',
[14:49:02] tgr: Huh. I was trying to narrow it down, and suddenly it just started working. Did you fix it?
[14:49:37] yes
[14:50:29] thanks for the help; now that I know the errors are on fluorine it should be easy
[14:59:53] tgr: I opened T137316
[14:59:53] T137316: Log events from the MediaWiki fatal channel not appearing in Logstash/Kibana - https://phabricator.wikimedia.org/T137316
[15:00:03] thanks bd808
[15:00:40] ugh, why is AuthPluginPrimaryAuthenticationProvider still used on labtestwiki? $wgAuth is AuthManagerAuthPlugin
[15:06:55] tgr: Did you figure that out already? I'm not seeing it.
[15:12:03] anomie: no. Try to register with an existing shell name and the error you get is 'authmanager-authplugin-create-fail' (untranslated, but I guess that's just a side effect of the hackiness of the test branch)
[15:13:06] tgr: LdapPrimaryAuthenticationProvider uses that message too, if calling LdapAuthenticationPlugin::addUser() fails.
[15:13:50] ahh
[15:14:24] I guess that means I should be using a preauth provider instead
[15:14:40] Remember, LdapPrimaryAuthenticationProvider is just a degeneralized version of AuthPluginPrimaryAuthenticationProvider :(
[15:15:01] nvm, it doesn't mean that at all
[15:15:27] just need a testAccountCreation
[15:30:47] anomie: LdapAuthPlugin::addUser calls the LDAPSetCreationValues hook to "Let other extensions modify the user object before creation" and then the OpenStack implementation reads the shell name from the request
[15:31:35] I don't see much chance of changing that to use an auth request
[15:32:27] eh, I guess I can just fetch AuthManager session data from the hook
[15:35:06] tgr: That doesn't seem likely to be why it would be failing, though. I adjusted the logging so the ldap printDebug() calls should go into logstash from labtestweb2001, but it looks like you have things in a broken state again so I can't try it.
[15:35:40] right now it's failing because it's half-done
[15:36:08] I realized there is another hook call hidden behind an LdapAuth-specific proxy hook that needs to be converted
[15:50:52] anomie: are there other places for an HTTP 500 to be logged apart from fatal.log/exception.log/hhvm.log?
[15:51:22] tgr: No idea.
[16:24:10] anomie: https://gerrit.wikimedia.org/r/#/c/293130/ is as far as I got, any thoughts about the last problem?
[16:24:50] * anomie looks
[16:28:25] tgr: Sanity check: Are you sure that line is even being reached, rather than line 34 skipping it entirely?
[16:32:23] tgr: Commented some more on the patch.
[16:37:21] anomie: I copied that verbatim from the AbortNewAccount hook, I have no idea what it does
[16:38:16] it could be that the connection was initiated somewhere else
[16:38:21] I'll test
[16:38:28] tgr: When I try it in eval.php, it works after I add a call to $ldap->connect() to ensure it's connected and change that to $ldap->getBaseDN( USERDN ).
[16:39:53] indeed
[16:51:16] anomie: is there anything we still need to do or fix before group1?
[16:51:29] apart from the mobile style stuff
[16:53:28] Nothing I know of.
[19:34:19] tgr: `php maintenance/createAndPromote.php --force some_user new_password` runs without complaint but does not reset the password, unless I set $wgDisableAuthManager = true;
[19:34:41] am I doing something dumb?
[19:47:52] ori: no; that script does not handle errors, I'll update it
[19:48:13] ah, thanks
[19:48:19] which does not explain why it doesn't work, but that's easier to figure out once errors are displayed
[22:09:29] tgr: Are we waiting for slow Jenkins again this evening?
[22:10:31] anomie: no (I hope), I was just re-checking that I am not forgetting any patch
[22:15:07] nice thing about the disk failure is that the queues are completely empty now, the merge took under a minute
[23:47:50] tgr: uh, I'll unstall the renames in a few hours, I have a script for it