[15:29:56] tgr, anomie: it woulod be helpful to have the edit.php thing merged https://gerrit.wikimedia.org/r/c/mediawiki/core/+/460490 [15:30:35] oops, I reviewed that last week and forgot to merge [15:31:08] done [15:31:15] thanks! [15:31:29] that makes it way easier to test mcr undo/restore [15:32:52] tgr: I#m not sure what to do about https://gerrit.wikimedia.org/r/#/c/435822/ [15:33:10] Krinkle probably isn't awake yet [15:37:30] I think that patch should be merged [15:37:57] fixing Wikimedia data and gracefully handling broken data in MediaWiki are different issues [15:38:28] the current patch using a potentially-existing username is scary, though [15:38:40] tgr: also, revisionStore is strict about the user name not being ''. That'S what caused this error in the first palce [15:39:11] tgr: can you commeent ang maybe give a +1, asking krinkle if he's ok with this being merged? [15:50:08] on second thought, the database is used by other things than MediaWiki, and that might be true for non-Wikimedia wikis as well, so Krinkle's suggestion of running a maintenance script in the installer is probably better [15:50:41] so maybe do that and remove the catch clauses (and put some reference to the maintenance script in the exception text)? [15:52:01] but if you need a short term-fix for this which does not take too much effort, the patch is fine, just avoid conflicting with an existing user [15:52:30] tgr: i think the long term fix will arrive automatically with actor migration [15:53:10] you mean if this is in place actor migration will put the provided username in the database? [15:53:56] tgr: it'll have to put *something* into the database when encountering an empty username in the old schema. it should indeed change that to "". [15:54:17] anomie will know if that's the case [15:54:27] does that script exist yet? [15:54:50] I'm not a fan of using a valid username [15:55:01] did you know enwiki has an 'Unknown User'? [15:55:09] not the same but still... [15:55:54] DanielK_WMDE_, tgr: Note "Unknown user" is reserved by default in $wgReservedUsernames. But if you don't want to rely on that config setting for some reason, use an "interwiki" username rather than making up something else invalid. [15:56:01] I guess if WikiImporter already uses it, that concern is somewhat moot [15:56:48] in general reserving a 'system' interwiki prefix would be a good idea, I think [15:57:06] that would be neat, yes [16:02:02] mcrundo seems to be missing messages [16:06:24] * Krinkle has awoken [16:09:54] tgr: DanielK_WMDE_: fwiw, agreed that it should be fixed in the db, not rely on casting X to Y on every read. However, I think it makes sense to do the cast in the short term first. Also, it seems like this patch isn't actually a plain cast, I imagine there could be other reasons for fromFromAnyId to throw, and catching that with a fallback to unknown seems fair. [16:10:10] unless we want such scenarios to fatal, which would also be fair. [16:10:23] after the conversion [16:11:42] After the conversion, an empty user name could probably be fatal. it's a db inconsistency. [16:11:51] though historically, we have been pretty lenient about that kind of thing [16:11:59] it should definitly be logged, whatever we do [16:12:40] yea, actually, about that. Should we use wfLogWarning instaed of wfWarn? [16:22:01] anomie: how is the preview for McrUndoAction supposed to work? it seems to me like it does not. [16:22:27] I in fact see no code for doing a preview at all. [16:23:14] * anomie looks [16:27:53] DanielK_WMDE_: https://github.com/wikimedia/mediawiki/blob/master/includes/actions/McrUndoAction.php#L234-L236, looks like it was broken by https://github.com/wikimedia/mediawiki/commit/4835a75ec56444c61c5d0cfbc9e98ceb420fc513#diff-6ce4bad77eaf1d8bd47947c6b8ccc3e4R910 [16:30:01] DanielK_WMDE_: preferably neither in my opinion. Use $logger->warn() instead. [16:30:38] I have log been confused about the hundred different wf*(Log|Warn|Debug|)*() functions we have. [16:30:59] I think after 7 years, I'm getting the hang of it, but also avoid them like a plague. [16:31:31] I can now tell with confidence which one I need when looking at their code for a second, but the name along doesn't tell me. [16:32:37] Krinkle: my problem is that $logger-warn doesn't make unit tests fail [16:33:18] DanielK_WMDE_: Right. But we do have a way to make it fail the build. [16:33:22] Is that enough? [16:33:32] Not really [16:34:00] Why? [16:34:21] We assert that various log files are empty/non-existant after each build before reporting to Jenkins [16:34:25] becauese it's useful to see early when new code triggers a warning. [16:34:29] also, consistency [16:34:30] If we switched everything to $logger-warn, then deprecation wouldn't fail unit tests either [16:34:36] Incorrect. [16:34:43] Trigger a deprecatin and Jenkins will fail the build [16:34:46] per what I just said. [16:34:52] the build, yes [16:35:00] i want unit tests to fail immediately [16:35:29] OK. So create a PsrLogger sub class that throws on level >= WARN and inject it? [16:35:33] I'd +2 that [16:35:37] CindyCicaleseWMF: Should we archive the Phabricator projects for MCR-SDC phases 1 and 2? AFAICT they're done. [16:35:49] Krinkle: i tried that, and got *really* confused [16:35:57] perhaps the phpunit wrapper could force the logs-must-not-have-bad-stuff check [16:36:18] The default Psr loggers should be null and/or got to debug.log [16:36:40] Not during unit tests [16:36:43] but we can make the default groups in MWTestCase set to this thrower, and for unit tests you'll probablt want to inject it instead. [16:37:00] Because if you instantiate a class fresh, the service wiring won't apply [16:37:09] and your class will porbably (good) default to NullLogger [16:37:12] so need injection either way [16:37:21] for unit tests related to the class [16:38:23] Our Jenkins jobs set wgDebugLogFile, which means all groups are enable with all severities and go to debug.log as artefact. [16:38:41] in a perfect world that would work, but we have to make sure warnigns triggered via global state also cause tests to fail [16:38:47] We can then have MWTestCase switch the default Spi to throw on >= warning instead. [16:39:05] yes, that would be good [16:39:25] James_F: yes, I was doing a bit of cleanup this morning which moved an epic from each of phase 2 and phase 3 to the deployment milestone, so phase 3 can be archived, too. I will archive all 3. [16:39:35] but for unit tests that instantie new class, the throw-logger will need to be injected either way, given those will default to NullLogger and bypass the default [16:39:49] CindyCicaleseWMF: Aha, excellent. Congratulations! [16:39:52] Krinkle: https://gerrit.wikimedia.org/r/c/423153/ [16:40:10] should probably have a separate ticket [16:40:41] Krinkle: this is one of the few cases where i'd say it's probably not a good thing to rely on CI for this [16:40:55] err, not to rely on *DI* for this [16:41:17] this is essentialy equivalent to throw XYZ [16:43:54] tgr: since this has a +1 from krinkle now, do you want to give a +2? is there any reason to hold this back? https://gerrit.wikimedia.org/r/c/mediawiki/core/+/435822 [16:44:09] James_F: Done! \o/ [16:44:36] Now we just need to migrate all the data. Simple! [16:44:50] But seriously, excellent work all. [16:46:39] \o/ [16:49:45] On the other side of the fence, things are coming together: https://usercontent.irccloud-cdn.com/file/fjXaKims/Screenshot.png [16:50:06] (Forgive the mundanity of my test edits.) [16:52:32] James_F: that's nto using MCR yet, is it? [16:52:57] (and should that section really be called "structured data"?) [16:53:15] Nope, no MCR yet, all stashed into a namespace. [16:53:47] Yeah, the designs are evolving. Amongst other things, we'll want the captions will go ahead of the wikitext content rendering. [16:54:21] uuuhhhhh.... [16:54:36] that'S actually one of the thigns that MCR currently doesn't support. [16:55:00] it's not *impossible*, but MCR won't help yopu with that. [16:55:15] MCR assumes that the output of one slot is one lump of HTML that goes into one place [16:55:17] not two places [16:56:24] exposing output of the slots to the skind separately, and maybe even bits and pieces of that output, is something we have discussed before, but that's rather far away. [16:57:48] CindyCicaleseWMF: --^^^ [16:58:31] DanielK_WMDE_: I think the fact that patch will cause 'Unknown user' to be written into the database when actor migration happens merits a mention somewhere [17:00:15] tgr: but this patch does not cause this to happen. It would cause "unknown user" to be written to the database only during undeletion. [17:00:30] DanielK_WMDE_: noted, thanks [17:01:20] DanielK_WMDE_: why? it changes newRevisionFromRow too, not just newRevisionFromArchiveRow [17:01:23] I think actor migration should replace empty user names with "unknown user", but this patch does not make that happen. Well, not directly anyway. [17:01:42] or does the actor migration operate on a lower level? [17:03:43] tgr: I'd assume that it would not load revision records. but i have not checked. anomie? [17:03:50] I'm not really familiar with the internals of that project but I thought it has to go through the revisions to convert non-local usernames ino actors [17:03:57] neither am i [17:04:40] i assumed it would not go through revisions, since it also needs to process logs and other stuff. easier to directly hit the database for all cases, and just look at strings. [17:05:20] tgr: anyway, "unknown user" is a system user, there is no reaso for it not to exist in the database. i see nothing problematic about that [17:05:41] The actor migration uses the user ID if one is present. If user ID is 0 and the user name is empty, the migration script report an error. [17:06:15] anomie: but does it use RevisionStore to load revisions when copying data from the revision table? [17:07:22] tgr: i'm fine with adding whatever warnings, I just want to make sure they are accurate [17:07:36] DanielK_WMDE_: It doesn't copy data from the revisions table at all. It looks at rev_user, rev_user_text, and revision_actor_temp.revactor_actor. [17:12:23] anomie: rev_user_text is on the revision table, so I supposed it does look at the table. But it doesn't use RevisionStore. That'S what I thought. [17:13:09] DanielK_WMDE_: anomie: so basically, this patch is a short-term fix for production fatals, the long-term fix is a (still TBD) patch to the actor migration script so that when it encounters an empty user name it uses the actor ID of the 'Unknown user' system user? After which this patch is probably not needed and RevisionStore should just hard fail? [17:13:36] that's my take on it, yea [17:13:59] (is there an assumption that any wiki with a sufficiently modern MediaWiki version will have already migrated actors? or do we intend to support both schemas for a long time?) [17:14:23] DanielK_WMDE_: then the commit summary should explain that, IMO [17:15:03] DanielK_WMDE_: Did the parser object changes (the ones that introduced the notices/serialiation format change) also make the object bigger? [17:15:16] tgr: Not necessarily "to the actor migration script", it could be its own script. ... Actually, I may have misread my own code earlier. The actor migration script might go ahead and create an actor for the invalid empty-string username., [17:15:34] I'm seeing strict increase in "ITEM TOO BIG" errors, of which 99% correlates with ApiStashEdit failure to store a key by the same name as the one in the logs with "ITEM TOO BIG" [17:16:06] Starting Sept 12 [17:16:23] There are <150 "ITEM TOO BIG" errors in the 30 days until Sept 12. and there are 3,500 in the last 6 days. [17:16:40] * Krinkle files task [17:18:17] DanielK_WMDE_: Is it now time to do T198309? [17:18:18] T198309: Enable MCR migration stage "write both, read new" on testwiki - https://phabricator.wikimedia.org/T198309 [17:20:07] James_F: i told dba that we'd do it on wednesday, but i assume that wmf22 has landed on group0, so why not? we should ping dbas anyway, though [17:21:40] DanielK_WMDE_: did you see T204475? [17:21:41] T204475: update.php terminates without messages after MCR migration - https://phabricator.wikimedia.org/T204475 [17:25:30] tgr: i updated the commit message [17:26:09] DanielK_WMDE_: filed as https://phabricator.wikimedia.org/T204742 [17:27:10] CindyCicaleseWMF: no... that sounds odd. [17:32:07] Krinkle: that would point to something dragging a closure into ParserOutput. [17:32:09] that would be bad [17:35:39] uh. there is no origin/wmf/1.32.0-wmf.21 branch? [17:39:15] DanielK_WMDE_: switchover week was skipped [17:39:20] say hello to wmf.22 [17:41:50] Krinkle: yea, i fugured it out. so the branch that showes the error is wmf.20 [17:43:01] DanielK_WMDE_: I see save timing regress starting with wmf.20. Not sure why the memc errors only started Sept 12. [17:44:38] DanielK_WMDE_: I'm triaging a number of different issues for different components. Can you look at it further? [17:44:55] Krinkle: not sure how to investigate this [17:45:00] Let me know if you need anything wrt Logstash/channels etc. [17:45:06] I can't think of any change in MCR that would cause ParserOutput to bloat [17:45:42] Krinkle: I'm completely clueless about Logstash/channels etc :) [17:45:53] DanielK_WMDE_: OK. Let's PM. [18:52:26] Reedy wondering if you could review https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/CreateRedirect/+/461187/ (which fixes a extension) please? :) [18:52:30] i've tested the fix [19:10:53] Amir1: Do you know what a common reason could be for ORESFetchScoreJob to error? [19:11:17] Krinkle: has it started yesterday? [19:11:30] if so, it's probably because of the pool counter but it shouldn't [19:11:32] It's one of <3 job types in prod that quite regularly fail. If it is intended to be normal/best-effort only, they should probably be caught and ignored in the code. [19:11:58] Oh, interesting. Started exactly on Sept 12. [19:12:00] 0 hits before then [19:12:08] wmf.20 regression [19:12:17] * Krinkle files task [19:12:18] that's interesting [19:14:51] https://phabricator.wikimedia.org/T204753 [19:34:04] Krinkle: perhaps related, an old bug about the same thing https://phabricator.wikimedia.org/T196076 [20:22:29] anomie: if you are around, PageTriage fails with Unknown column 'actor_rev_user.actor_user' https://phabricator.wikimedia.org/T204764 [20:22:58] sounds related to the $wgActorTableSchemaMigrationStage flag [20:25:29] * anomie sees RoanKattouw claimed it [20:25:40] Yeah I'm trying something [20:27:24] OK it worked, patch incoming [20:27:37] (I tested on mwdebug2002 to avoid having to set a bunch of weird config stuff locally) [20:29:36] anomie: https://gerrit.wikimedia.org/r/461208 [20:30:17] and greg noticed another one [20:30:34] Error: 1054 Unknown column 'rc_actor' in 'on clause' (10.192.32.110) | https://phabricator.wikimedia.org/T204767 [20:30:39] reaching https://test.wikipedia.org/wiki/Special:ActiveUsers?username=&groups[]=sysop&wpFormIdentifier=specialactiveusers [20:31:04] we will want to take a decision as to whether we keepo the feature enabled or not [20:34:11] ^ yeah, just noticed that [20:37:22] and maybe we could have a CI job that runs with feature flags turned on [20:38:28] Not sure CI would catch these [20:38:33] Could we move this feature flag to test2 maybe? [20:38:38] So as to keep test usable for testing [20:39:45] OTOH, finding these is part of testing... [20:41:43] OTOOH, I'm considering turning the feature flag off everywhere for the moment to redo things per https://phabricator.wikimedia.org/T204669#4593893 [20:41:44] I mean, testing other software. PageTriage is fatally broken right now (although my patch would fix that) [20:41:53] I wrote a quick summary on the train task at https://phabricator.wikimedia.org/T191068#4595680 [20:42:09] seems we can remove those tasks from blocking the train and maybe create a new tracking tasks for them? [20:42:46] anomie: or since it is feature flagged, keep it just on test2 [20:43:02] and maybe we could get it enabled on beta which might gather some infos as well [20:46:42] It seems to already be enabled on beta [20:46:52] Or at least the same PageTriage error happens there [20:50:48] RoanKattouw: Feel free to deploy https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/461209 for me. [20:52:53] On it [21:37:09] > You do not have permission to edit this page, for the following reason: [21:37:09] > ⧼interfaceadmin-info⧽ [21:37:12] Hm.. undefined message? [21:37:15] Seeing it on wikitech [21:37:53] Krinkle: It's in WikimediaMessages as wikimedia-interfaceadmin-info I thought? [21:38:12] is that installed on wikitech? [21:38:27] "interfaceadmin-info": "$1\n\nPermissions for editing of sitewide CSS/JS/JSON files were recently separated from the editinterface right. If you do not understand why you are getting this error, see [[mw:MediaWiki_1.32/interface-admin]].", [21:38:29] That's in MW core [21:38:33] Krinkle: https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/WikimediaMessages/+/master/i18n/wikimediaoverrides/en.json [21:38:51] the message "wikimedia-interfaceadmin-info" is not defined on wikitech, nor other wikis [21:39:00] Oh, wikitech. Probably the right's not configured there? [21:39:00] maybe in the next branch? [21:39:22] Yeah [21:39:23] defined here [21:39:24] https://www.mediawiki.org/wiki/MediaWiki:Wikimedia-interfaceadmin-info [21:39:32] It's in wmf.19 at least. [21:39:33] not defined here: https://en.wikipedia.org/wiki/MediaWiki:Wikimedia-interfaceadmin-info [21:39:54] (and thus on wikitech neither) [21:40:23] Oh, yes, we screwed up the definition. [21:40:41] https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/WikimediaMessages/+/457809/ says included in master and wmf.22 [21:40:51] But not wmf.20. [21:40:55] yeah [21:41:04] backport for next swat? [21:41:15] Lame, it merge conflicts. [21:42:04] also, if you could grant me interface-admin on wikitech, that'd be great :) [21:42:14] (I see you're one of the bureaus there) [21:43:06] BRITISH [21:43:07] a writing desk with drawers and typically an angled top opening downwards to form a writing surface. [21:43:45] Gah. [21:46:03] Krinkle: https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/WikimediaMessages/+/461230 [21:47:40] Krinkle: I'm not a crat on wikitechwiki with my accessible account. https://wikitech.wikimedia.org/w/index.php?title=Special%3AUserRights&user=JForrester vs. https://wikitech.wikimedia.org/w/index.php?title=Special%3AUserRights&user=Jforrester (aka screw you, gerrit/LDAP integration). [21:48:27] Hm.. both are crat according to https://wikitech.wikimedia.org/wiki/Special:Listusers/bureaucrat [21:48:33] I'm doing it [21:48:37] I figured there's to James's there, one of them should work. [21:48:43] Wikitech is slow as fuck for userright changes though [21:48:44] But you have a third account [21:48:52] (change visibility) 21:48, 18 September 2018 Reedy (talk | contribs | block) changed group membership for Krinkle from autopatroller, content administrator, shell user and administrator to autopatroller, content administrator, shell user, administrator and interface administrator (trololol) [21:49:01] thx [21:49:06] two* [21:49:22] Oh, hmm, no, I'm wrong, it's "just" that +crat is insufficient to grant IA. [21:50:02] I can grant it somehow [21:50:11] Reedy: Are you +steward/ [21:50:21] https://wikitech.wikimedia.org/w/index.php?title=Special:ListUsers&group=steward [21:50:22] No one is [21:50:43] * James_F shrugs. [21:50:52] Member of groups: Cloud administrators, Bureaucrats, Check users, Content administrators, OAuth administrators, Shell users, Administrators, Autoconfirmed users, Users [21:51:08] cloud admin possibly [21:51:10] "Edit all user rights (userrights)" [21:51:19] Ha. That'd do it. [22:05:03] cloudamin has userrights [22:05:18] but as usual I arrive late to the party [22:05:39] oh, wikitech apparently does not inherit the wgAddGroups defaults [22:06:54] Because it's an odd semi-prod wiki. [22:07:42] note that sysop has editinterface but not interface-admin [22:07:59] and content-admin was created to have everything admins have except editinterface [22:08:26] iirc wikitech isn't listed as '+wikitechwiki' on InitialiseSettings.php James_F [22:08:38] so you have to configure everything [22:09:09] Ideally we'd just kill 99% of the wiki. [22:09:35] It's diverged very far from the "emergency back-up wiki that has content we can read to fix production when the data centre falls over". [22:09:38] and what's preventing you to get rid of the remaining 1% :P ? [22:09:55] * Hauskatze quoting somewhat the code coverage rules [22:10:52] https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/461240 [22:11:20] tgr: but doesn't allow removal