[05:14:44] PROBLEM - puppet on ORES-redis02.experimental is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[05:42:44] RECOVERY - puppet on ORES-redis02.experimental is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[08:41:17] (CR) Ladsgroup: Service wrapper to prevent misspellings (1 comment) [extensions/JADE] - https://gerrit.wikimedia.org/r/448085 (owner: Awight)
[08:48:47] Scoring-platform-team (Current), revscoring, User-Ladsgroup, artificial-intelligence: Rewrite scoring libraries to replace pywikibase with mwbase - https://phabricator.wikimedia.org/T194758 (Ladsgroup) https://github.com/mediawiki-utilities/python-mwbase/pull/4
[08:54:47] (CR) Ladsgroup: [C: 1] "This looks good to me, I will merge it if there is no objection by the next couple of days."
[extensions/ORES] - https://gerrit.wikimedia.org/r/449274 (https://phabricator.wikimedia.org/T199357) (owner: Sbisson)
[08:56:48] (CR) Ladsgroup: [C: 2] AppendCreator for adding judgments [extensions/JADE] - https://gerrit.wikimedia.org/r/447927 (owner: Awight)
[09:02:09] (Merged) jenkins-bot: AppendCreator for adding judgments [extensions/JADE] - https://gerrit.wikimedia.org/r/447927 (owner: Awight)
[09:05:43] (CR) jenkins-bot: AppendCreator for adding judgments [extensions/JADE] - https://gerrit.wikimedia.org/r/447927 (owner: Awight)
[09:06:43] wiki-ai/ores#963 (privacy_policy - b6f6a21 : Amir Sarabadani): The build passed. https://travis-ci.org/wiki-ai/ores/builds/410709357
[09:13:45] (CR) Ladsgroup: [C: 1] Maintenance script to backfill scores in PageTriage queue (2 comments) [extensions/ORES] - https://gerrit.wikimedia.org/r/449475 (https://phabricator.wikimedia.org/T198982) (owner: Sbisson)
[09:14:02] (CR) Ladsgroup: [C: 1] "I will merge this if there is no objection by tomorrow." [extensions/ORES] - https://gerrit.wikimedia.org/r/449475 (https://phabricator.wikimedia.org/T198982) (owner: Sbisson)
[10:16:29] (PS3) Sbisson: Maintenance script to backfill scores in PageTriage queue [extensions/ORES] - https://gerrit.wikimedia.org/r/449475 (https://phabricator.wikimedia.org/T198982)
[10:16:52] (CR) Sbisson: Maintenance script to backfill scores in PageTriage queue (1 comment) [extensions/ORES] - https://gerrit.wikimedia.org/r/449475 (https://phabricator.wikimedia.org/T198982) (owner: Sbisson)
[10:17:18] Scoring-platform-team, DBA, JADE, Operations, TechCom-RFC: Introduce a new namespace for collaborative judgments about wiki entities - https://phabricator.wikimedia.org/T200297 (daniel) To me it still seems the easiest solution would be to put this on a separate wiki. That way, we can observe...
[10:20:45] Hi Amir1, thanks for the reviews.
I would like to deploy the cleanupParent config change soon so it is in place when the PageTriage/draftquality code reaches betalabs. See https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/449437/ any objections? [10:21:49] stephanebisson: hey, I wrote something there, no strong feeling about it though. What do you think? [10:22:44] You wrote: "PageTriage is not yet enabled on enwiki. At the end it doesn't matter though." I'm not sure what you mean... PageTriage is in fact enabled on enwiki. [10:23:52] stephanebisson: I meant the ORES part basically [10:24:07] is it? if yes, that's amazing [10:24:48] The ORES features are there but hidden behind a url param [10:26:10] oh okay [10:26:15] stephanebisson: so go on :) [10:26:31] Amir1: thanks [10:27:12] Amir1: can you +1 the patch so your comment doesn't make the deployer doubt [10:27:26] stephanebisson: a question that is a little bit unrelated and it's just for curiosity. I thought you are making a brand new extension to replace https://en.wikipedia.org/wiki/Special:NewPagesFeed. Are you rewriting the extension? [10:27:47] I knew PageTriage extension but I thought it's getting phased out [10:29:05] the icons in the special page make me cringe :/ [10:29:12] Amir1: We are not rewriting it, just adding a few features. PageTriage is an enwiki-specific extension. We are not expanding it to new wiki but not retiring it from enwiki either. [10:30:20] The UI is dated for sure but updating it would be a major fight with very little benefit to this community. [10:30:37] oh yeah. I feel you [10:30:47] been there, done that [11:30:55] 10Scoring-platform-team, 10MediaWiki-extensions-PropertySuggester, 10Wikidata, 10artificial-intelligence: [Spike] Use suggested properties to get signal for completeness - https://phabricator.wikimedia.org/T158430 (10hoo) > (1) Run the set of scripts for generating propertypairs against a database dump.... 
[12:36:33] Scoring-platform-team (Current), Wikidata, User-Ladsgroup: Run analysis of revert time and number changes over time for wikidata - https://phabricator.wikimedia.org/T189962 (Ladsgroup) Ready to review
[12:55:55] Scoring-platform-team, ORES, Research Ideas: Analyze the effects of ORES deployments on counter-vandalism behavior - https://phabricator.wikimedia.org/T200898 (Halfak)
[12:56:17] Scoring-platform-team, ORES, Research Ideas: Analyze the effects of ORES deployments on counter-vandalism behavior - https://phabricator.wikimedia.org/T200898 (Halfak)
[12:56:20] Scoring-platform-team (Current), Wikidata, User-Ladsgroup: Run analysis of revert time and number changes over time for wikidata - https://phabricator.wikimedia.org/T189962 (Halfak)
[12:58:27] Scoring-platform-team, ORES, Research Ideas: Perform revert time analysis on two more non-enwiki wikis. - https://phabricator.wikimedia.org/T200899 (Halfak)
[12:59:17] Scoring-platform-team, ORES, Research Ideas: Use an interrupted time series analysis on time-to-revert data. - https://phabricator.wikimedia.org/T200900 (Halfak)
[13:53:28] Scoring-platform-team, ORES, Research Ideas: Split revert-time analysis by agent type (bot, tool, registered, anon) - https://phabricator.wikimedia.org/T200905 (Halfak)
[13:54:27] Scoring-platform-team (Current), Wikidata, User-Ladsgroup: Run analysis of revert time and number changes over time for wikidata - https://phabricator.wikimedia.org/T189962 (Halfak) OK I made an epic to cover other work we should do before we speak publicly about what we found. See {T200898} I thin...
[13:55:26] Scoring-platform-team (Current), Wikidata, User-Ladsgroup: Run analysis of revert time and number changes over time for wikidata - https://phabricator.wikimedia.org/T189962 (Halfak) Oh! And to the point of reviewing this specific task, please limit your aggregate analysis to 12 months. This will he...
[14:00:51] Technical Advice IRC meeting starting in 60 minutes in channel #wikimedia-tech, hosts: @CFisch_WMDE & @chiborg - all questions welcome, more infos: https://www.mediawiki.org/wiki/Technical_Advice_IRC_Meeting
[14:03:50] PROBLEM - puppet on ORES-worker04.experimental is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/bin/apt2xml]
[14:31:55] RECOVERY - puppet on ORES-worker04.experimental is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[14:51:02] Technical Advice IRC meeting starting in 10 minutes in channel #wikimedia-tech, hosts: @CFisch_WMDE & @chiborg - all questions welcome, more infos: https://www.mediawiki.org/wiki/Technical_Advice_IRC_Meeting
[15:08:53] Was in meetings. Am around now.
[15:16:42] Scoring-platform-team, ORES: ORES should not ckeck highly rusted user's edits - https://phabricator.wikimedia.org/T200908 (Bencemac)
[15:28:05] Scoring-platform-team, ORES: ORES should not ckeck highly rusted user's edits - https://phabricator.wikimedia.org/T200908 (Bencemac) I forgot to mention, but this was not the first time; I got likely and very likely before.
[15:33:50] Scoring-platform-team, Edit-Review-Improvements-RC-Page, Growth-Team, MediaWiki-extensions-ORES, and 3 others: Index on oresc_probability, temporarily or permanently - https://phabricator.wikimedia.org/T175778 (Ladsgroup) Open>Resolved This index is already in place.
[15:33:54] Scoring-platform-team, Edit-Review-Improvements-RC-Page, Growth-Team, MediaWiki-extensions-ORES, Patch-For-Review: Very long search times on RC Page for "Very likely good faith" + "Likely have problems" (on en.wiki only?)
- https://phabricator.wikimedia.org/T164796 (Ladsgroup)
[15:45:17] Scoring-platform-team, ORES: ORES should not check highly trusted user's edits - https://phabricator.wikimedia.org/T200908 (Reedy)
[16:47:43] Amir1: I messed up: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/449756/
[16:48:46] stephanebisson: if it helps you, I did it too. We need to find a better way for that later
[16:49:11] yep, I'll swat this one out tomorrow
[16:53:30] Amir1, https://en.wikipedia.org/wiki/Slurpee
[16:54:09] trashy shaved ice
[16:54:20] :)))
[17:26:51] (PS1) Ladsgroup: Join decomposition on maintenance/PurgeScoreCache.php [extensions/ORES] - https://gerrit.wikimedia.org/r/449768 (https://phabricator.wikimedia.org/T200680)
[17:27:24] Amir1: Thanks for the E:JADE review!
[17:27:57] awight: Thank you for working on it!
[17:27:59] https://gerrit.wikimedia.org/r/#/c/449768/
[17:28:00] :D
[17:28:03] sure thing
[17:28:52] Apologies, all, for the lateness this morning. I usually just rely on the family to wake me up, but I looked so pathetic that they let me stay asleep this morning.
[17:29:07] * awight continues with the sloow process of waking up
[17:29:30] awight: Please don't +2
[17:29:33] I found some bugs
[17:29:35] :D
[17:29:37] kk
[17:29:40] (CR) jerkins-bot: [V: -1] Join decomposition on maintenance/PurgeScoreCache.php [extensions/ORES] - https://gerrit.wikimedia.org/r/449768 (https://phabricator.wikimedia.org/T200680) (owner: Ladsgroup)
[17:29:49] hehe, so did jerkins
[17:31:00] (CR) jerkins-bot: [V: -1] Join decomposition on maintenance/PurgeScoreCache.php [extensions/ORES] - https://gerrit.wikimedia.org/r/449768 (https://phabricator.wikimedia.org/T200680) (owner: Ladsgroup)
[17:31:05] (PS2) Ladsgroup: Join decomposition on maintenance/PurgeScoreCache.php [extensions/ORES] - https://gerrit.wikimedia.org/r/449768 (https://phabricator.wikimedia.org/T200680)
[17:34:33] (CR) jerkins-bot: [V: -1] Join decomposition on maintenance/PurgeScoreCache.php [extensions/ORES] - https://gerrit.wikimedia.org/r/449768 (https://phabricator.wikimedia.org/T200680) (owner: Ladsgroup)
[17:36:22] (CR) jerkins-bot: [V: -1] Join decomposition on maintenance/PurgeScoreCache.php [extensions/ORES] - https://gerrit.wikimedia.org/r/449768 (https://phabricator.wikimedia.org/T200680) (owner: Ladsgroup)
[17:37:25] (PS3) Ladsgroup: Join decomposition on maintenance/PurgeScoreCache.php [extensions/ORES] - https://gerrit.wikimedia.org/r/449768 (https://phabricator.wikimedia.org/T200680)
[17:39:06] AFK for lunch
[17:42:14] awight: oh this easy merge: https://github.com/wiki-ai/ores/pull/257/files
[17:42:25] cool, after SoS.
[17:42:43] and the gerrit patch is ready for review now
[18:00:38] Scoring-platform-team (Current), ORES, User-Ladsgroup: Enable wp10 and draftquality models for testwiki - https://phabricator.wikimedia.org/T198997 (Ladsgroup) a:Ladsgroup Just putting it in my plate, hopefully will get to it rather quick.
[18:10:00] 10Scoring-platform-team, 10Growth-Team, 10MediaWiki-extensions-ORES, 10MediaWiki-extensions-PageCuration, and 3 others: PageTriage requires ORES to be installed - https://phabricator.wikimedia.org/T200412 (10awight) @SBisson Should we tag https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/PageTriage/+... [18:16:43] 10Scoring-platform-team, 10Growth-Team, 10MediaWiki-extensions-ORES, 10MediaWiki-extensions-PageCuration, and 3 others: PageTriage requires ORES to be installed - https://phabricator.wikimedia.org/T200412 (10SBisson) >>! In T200412#4469921, @awight wrote: > @SBisson Should we tag https://gerrit.wikimedia.o... [18:37:46] o/ [19:12:42] awight, I was meaning to ask you why you didn't see MCR as a solution to table size in MW [19:13:00] Table size isn’t the issue, it’s the metadata. [19:13:12] Table contains the metadata? [19:13:13] MCR doesn’t seem to solve anything, AFAICT [19:13:18] When I said MCR earlier, I was thinking of the normalization of the revision table. [19:13:31] MCR revisions will show up in page and revision table metadata [19:13:52] Right. But the revision table is split into several smaller tables. [19:14:52] There are still records for every page and revision we create, it’s only the content which is shifted by MCR [19:21:04] awight, I don't think that is right. I think that a big part of MCR is splitting apart the revision table so that no individual sub-table is as big. [19:21:16] And that this allows for multiple content references and better scalability. [19:21:40] That’s right, but the “big” you’re talking about is data length, not row count. [19:21:41] * halfak looks for a schema [19:21:51] Row count is unchanged in our case. [19:21:54] Right. Data length *per table* [19:22:10] Data length is not a limiting factor, per this comment: [19:22:26] > Data (external storage) storage is never a concern, because it is a key-value storage and is already sharded. 
In general it is never an issue- although growth requires to be blocked on extra hardware resource allocation, but in an ideal world where hardware resouces are not a limiting factor it has almost indistinguishable scalability with infinity. [19:22:28] By splitting one big table into many smaller tables (that add up to a bit more in the end) we get around some of the issues DBAs are worried about. [19:22:39] Yes. I'm not talking about external storage. [19:22:41] nah I don’t believe so [19:22:42] Metadata amount. [19:22:58] This is an issue that Mark clearly raised in our meeting this morning. [19:23:10] The size of the table should ultimately be under 100GB [19:23:40] If I take a 100GB table and I split up the columns using a foreign key, then I could maybe get two 51GB tables (accounting for overhead). [19:23:51] Let’s look at what happens when a judgment is made: a Judgment: page is created, so we add a row to page and revision. If using MCR, we add a slot content row to a slot table. If not using MCR, we add a row to the text content table. [19:24:33] https://www.mediawiki.org/wiki/User:Brion_VIBBER/Compacting_the_revision_table_round_2 [19:24:48] Right. [19:25:00] In the MCR case, we’re avoiding a row in the text table, but that doesn’t matter because it shards. We’re still adding to page and revision tables, and to their metadata. [19:25:18] ty for the link [19:25:28] Right. But the revision table contains a foreign key instead of a text comment. [19:25:36] Which more than halves its size right there. [19:25:55] Take out the text of username/IP and you get another size-able chunk. [19:26:18] Ultimately, the database is bigger, but each individual table is less big. [19:26:19] ? text isn’t stored in the revision table [19:26:38] There’s no change to revision table under MCR, as I see it [19:26:47] In that link above, see the line about MCR: > not related to compaction [19:27:21] Ah yeah I think I see the confusion. 
> But the revision table contains a foreign key instead of a text comment [19:27:30] The comment is stored in the revision table. The username is stored in the revision table. [19:27:31] That’s already how content is stored in production [19:27:45] It's not. [19:27:52] The comment is in the revision table. [19:28:00] cool, I’d be happy to be wrong about that. [19:28:17] That page is from 2017 fwiw [19:28:34] halfak: https://www.mediawiki.org/wiki/Manual:Comment_table [19:28:46] https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/357892/ [19:29:10] Huh. That's news to me. [19:29:30] It was done to allow for longer comments [19:30:11] If I look at the prod replicas (not the labsdb views) then it still puts rev_comment in the revision table. [19:30:12] Also, this is where non-MCR content is stored in production: https://www.mediawiki.org/wiki/Manual:Text_table [19:30:22] u should file a bug ;-) [19:30:36] I don't think we're running that comment table in prod. [19:30:41] Not for the big wikis anyway. [19:30:49] lemme look [19:31:02] either way, are you saying that MCR displaces the comment somehow? [19:32:04] Yeah. Same idea. MCR requires refactoring the revision table. That refactoring gets us scalability whether we use MCR for what it was designed to do or not. [19:33:02] enwiki replica on stat1005: | comment | TokuDB | 10 | Dynamic | 15510120 | 97 | 1505447482 | 9223372036854775807 | [19:33:06] We should probably stop saying MCR and start saying "revision table compaction" [19:33:25] Not sure what that line is saying. [19:33:57] just that the comment table is indeed being used [19:34:07] I’m trying to tell whether rev_comment is deprecated, too. [19:34:25] https://gist.github.com/halfak/959a7e26381e5dab994d34dd4f6c1c59 [19:34:50] There's no rev_comment_id column. Is there a revision_comment table? [19:35:11] Aha! "revision_comment_temp" [19:36:20] Looks like comment text still appears in the revision table. [19:36:26] For recent revisions. 
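The column-split arithmetic from 19:23:40 can be sketched as below. This is only an illustration under stated assumptions: the function name `split_table_gb`, the 50% share of bytes moved out, and the 1 GB of per-table overhead are hypothetical values picked to reproduce the "100GB table becomes two 51GB tables" example. They are not measurements of any production table.

```python
def split_table_gb(total_gb, moved_fraction, overhead_gb):
    """Return (remaining_gb, new_table_gb) after moving columns into a new table.

    total_gb: size of the original table.
    moved_fraction: share of the bytes moved to the new table (assumed value).
    overhead_gb: extra per-table cost of the key columns (assumed value).
    """
    if not 0 <= moved_fraction <= 1:
        raise ValueError("moved_fraction must be between 0 and 1")
    new_table = total_gb * moved_fraction + overhead_gb
    remaining = total_gb * (1 - moved_fraction) + overhead_gb
    return remaining, new_table


# Hypothetical inputs matching the example in the chat.
remaining, new_table = split_table_gb(100.0, 0.5, 1.0)
total_after = remaining + new_table
print(f"largest table: {max(remaining, new_table)} GB, database total: {total_after} GB")
```

With these inputs the database total goes up while the largest single table gets smaller, which is the trade-off described at 19:26:18.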
[19:36:41] I think I’m lost: our goal right now is to estimate impact on table size and row count, and find optimizations?
[19:37:10] No. Revision table compaction, if successful, minimizes the impact of JADE on scalability concerns.
[19:37:46] Revision table compaction is related to MCR work. But it seems that it is not necessarily the same.
[19:37:58] And evidence suggests that compaction is currently underway.
[19:38:00] This sounds like we’re finding optimizations...
[19:38:13] https://phabricator.wikimedia.org/T189234
[19:38:28] https://www.mediawiki.org/wiki/Manual:Database_layout
[19:39:07] It seems that comments are being moved to the comment table, rev_comment will be deleted eventually, and we can assume the future schema for our purposes.
[19:39:21] Right. This hasn't been part of our conversation yet.
[19:39:29] kk
[19:39:53] If we're blocking on the revision table size and the revision table size is being addressed in the general case, then we should make estimates assuming that the revision table will be compressed.
[19:40:05] Unless failure is imminent and will definitely happen before compression finishes.
[19:40:25] okay, thanks for explaining.
[19:40:38] I don’t think we have to get this low-level, though.
[19:40:49] Now, I'm not quite sure what this means, but it's all way farther ahead than I expected (which is awesome for us)
[19:41:17] Well, right now, we're at this low of a level when discussing JADE and deployments. According to Mark, it's all a matter of scaling the revision table.
[19:41:31] My estimates are that we get an additional 1% of content when enabling all of the workflow integrations I’ve identified. So we can just multiply current data length, row count, and growth by 1%
[19:41:47] If we can keep individual table size under 100GB, then jynus' stress level goes down.
[19:42:08] The important part IMO is that we have a very fine-grained way to dial up volume from 0-1%
[19:42:17] awight, an important question: How big is revision now? And do we need to count the index size when measuring table size?
[19:42:32] awight, right now, no growth is acceptable in enwiki or wikidatawiki.
[19:42:37] We aren’t even in the business of dealing with 100GB table sizes, unless it’s 99.1GB today...
[19:42:39] Not even 1%
[19:42:53] We should let the editors know :p
[19:43:04] Right. I brought that up with Mark and he got nervous.
[19:43:30] I’m not joking when I say that I have the utmost sympathy for everyone responsible for our DB
[19:43:35] "We must be on a dead end road if no growth is acceptable"
[19:43:47] But yeah, zero growth is clearly not going to work for anyone.
[19:44:12] If that’s being stated publicly, we can have a bigger discussion about it
[19:44:28] Right. that's where I want techcom to be coming into this discussion.
[19:44:29] But if we’re just hearing through back channels “find a way to make it zero growth”, we… can't.
[19:44:39] I don't want them talking about The Right Way To Implement JADE(TM)
[19:44:58] I want them to talk about What Are We Going To Do So That We Can Keep Growing(TM)
[19:45:04] +1
[19:45:14] Well, with revision compression, maybe we can.
[19:45:32] that really shouldn’t be our problem?
[19:45:32] Revision compression is a big bunch of negative growth from a per-table perspective.
[19:45:36] s/?/./
[19:45:37] Yes. I agree.
[19:45:40] Not our problem.
[19:45:45] But our problem anyway.
[19:45:46] s/[?]/./
[19:46:03] Well, there’s nothing we can do to “enable” revision compression AFAIK
[19:46:12] It’s just an ongoing process
[19:46:37] Right. But it's a discussion point. "The current DB is unmanageable" --> "OK but efforts are being made to make it manageable right now. Don't block us."
[19:47:50] I hope we aren’t the mini-blackhole that tears apart the wikis’ fragile spacetime continuum.
It’s feeling extremely grim at the moment.
[19:48:10] Right. I think we're the straw that broke the camel's back though.
[19:48:19] 'cause JADE is pretty tiny as far as content goes.
[19:48:46] Or metadata
[19:49:00] What I’d really like is for anyone to give us a competent assessment of “you can grow to 0.1% or less of current volume”
[19:49:20] over the next year. that kind of thing.
[19:50:05] Right. Well the answer I got from Mark is that no growth is acceptable. But what we get from "revision compression" is that there *must* be some breathing room.
[19:50:20] Can Mark write that down...
[19:50:25] Good Q.
[19:50:34] cos it’s absurd and it should be addressed entirely separately from our deal
[19:50:36] like you said
[19:50:43] this is a major RFC waiting to happen...
[19:51:25] Or, it’s just an impulsive thing that someone said in a private meeting in order to cover for his guy a bit?
[19:51:49] I feel like the real truth is “Please don’t create any more content because our DBAs are close to the melting point”
[19:52:01] rather than, “our DBs can’t handle any more data”
[19:52:24] Probably not far off.
[19:52:34] I like your instinct to nail down some constraints we can operate in.
[19:52:46] Have the link to your FAQ handy?
[19:52:59] https://etherpad.wikimedia.org/p/JADE_scalability_FAQ
[19:53:26] harej did some fantastic gardening in there, I need to re-read
[19:57:04] We need a section for describing our strategy for limiting scaling.
[19:57:17] line 63
[19:57:22] E.g. limited permissions to start off. Then limited deployment to tools.
[19:57:53] yeah. Needs a better header.
[19:58:00] Here’s my current best explanation, https://phabricator.wikimedia.org/T200297#4462154
[19:58:41] +1, that’s a better header.
[20:03:41] I'm done in that section if you want to figure out how to wrap bots into that.
[20:03:57] halfak: k, I’m going to reorder also, holler if it looks wrong.
[20:04:05] awight, re.
deletion of an entire namespace, we *could* do that and save space, right?
[20:04:11] nope
[20:04:19] I'm thinking that deleting a page requires adding it to the archive table.
[20:04:36] But if there's no situation where one would undelete the page, there's no reason to put it into the archive table at all.
[20:04:44] But like I was rambling about this morning, the likely “limit new growth” emergency is a lot different than “migrate to new storage”.
[20:04:57] I don’t see us ever doing the latter, therefore never having a reason to do this bulk delete.
[20:05:23] If we migrate to new storage, it’s because there was a positive development in MediaWiki content storage, and all content is enjoying the same migration.
[20:05:59] Okay I'm now caught up on this
[20:06:21] What I want for my purposes is a list of wikis where ORES currently operates, sorted by size (in GBs of revision table)
[20:07:41] harej: fyi, we aren’t necessarily deploying to ORES wikis
[20:07:59] but I’m getting u the list
[20:08:03] Thank you
[20:08:21] (Do we have a list of wikis we're deploying to? I agree it doesn't have to be ORES wikis but they seem like a logical choice for early adoption.)
[20:08:31] +1
[20:08:40] no list yet, that would be a great thing to work on
[20:08:54] Also, a list of which workflows we want to integrate with, though it would be nicest to do the user testing first.
[20:09:24] harej: https://github.com/wikimedia/operations-mediawiki-config/blob/master/wmf-config/InitialiseSettings.php#L19587-L19617
[20:09:25] I would like -- but do not necessarily require -- that they be wikis with ORES. To the extent that we want someone to go first, it helps if they're already familiar with ORES and our work in general.
[20:10:24] Agreed, that makes sense. Also makes sense because JADE + ORES is a particularly interesting combination that was part of the incentive for launching this project in the first place.
[20:10:57] All I meant was that, we’re also free to deploy to a wiki without ORES, if they’re the right size and are inclined to help with this. [20:12:40] revision table on enwiki is ~239GB [20:12:49] If you include indexes. [20:13:09] 104GB with just metadata. [20:15:53] Do you know how to get this information systematically? [20:16:00] show table status? [20:16:29] or you mean, for all wikis simultaneously? [20:18:43] That would be nice, or one wiki at a time at least [20:18:43] 10Scoring-platform-team, 10DBA, 10JADE, 10Operations, 10TechCom-RFC: Introduce a new namespace for collaborative judgments about wiki entities - https://phabricator.wikimedia.org/T200297 (10awight) @daniel: The separate, single wiki alternative is still on my radar, IMO it's the only alternative which ca... [20:19:10] harej: for one wiki at a time, just issue “show table status” in mysql [20:28:42] hmm, is that data available in the wiki replicas? i just get a bunch of NULLs. [20:31:30] not on labsDB. Gotta use the internal replicas. [20:31:35] I could get this done. [20:31:40] Give me a few minutes. [20:32:04] A list of wikis that use ORES sorted by revision table size in GB. [20:32:13] Thank you for your help; I'm afraid I don't have production access. [20:35:11] oof, halfak sorry I missed that cue. I can produce the list if you’re not already on it. [20:35:20] started work [20:35:36] k [20:36:59] https://gist.github.com/halfak/17a8bc86968f0be39307e9bb3381e32c [20:37:00] Done [20:37:20] Nice! [20:37:24] \o/ [20:37:28] Faster than expected :D [20:37:31] Can you include the query… that’s rad. [20:37:50] I’d be curious about # of rows as well [20:39:26] On another topic from Ops, I’d like to engage more seriously with the idea of a runaway success. [20:39:54] As silly as that might be to expect in the real world, it does seem useful to at least work through together. 
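As a rough sketch, the enwiki figures quoted at 20:12:40 to 20:13:09 can be combined with the 1% growth estimate and the 100GB per-table target mentioned earlier in the chat. The names `TABLE_LIMIT_GB`, `JADE_GROWTH`, `projected_gb` and `report` are hypothetical, the sizes are the approximate values from the chat, and nothing in the log settles whether ops measures the data-only or the data-plus-index size.

```python
TABLE_LIMIT_GB = 100.0  # per-table target mentioned in the chat
JADE_GROWTH = 0.01      # the 1% estimate from the chat (an estimate, not a measurement)

# Approximate enwiki revision table sizes quoted in the chat.
ENWIKI_REVISION_GB = {"data_only": 104.0, "data_plus_indexes": 239.0}


def projected_gb(size_gb, growth=JADE_GROWTH):
    """Size after applying a fractional growth estimate."""
    return size_gb * (1 + growth)


def report(sizes):
    """Summarize current size, projected size, and whether it stays under the limit."""
    rows = {}
    for label, size in sizes.items():
        after = round(projected_gb(size), 2)
        rows[label] = {
            "now_gb": size,
            "after_gb": after,
            "added_gb": round(after - size, 2),
            "under_limit": after < TABLE_LIMIT_GB,
        }
    return rows


result = report(ENWIKI_REVISION_GB)
for label, row in result.items():
    print(label, row)
```

On these numbers enwiki is over the 100GB target on either measure before any JADE content is added, so the 1% estimate does not change which side of the line it falls on.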
[20:40:14] added the query to the gist
[20:40:28] One big event that would change everything is if a third-party integration really took off.
[20:40:56] halfak: haha, maybe I didn’t want to know. Nice text processing :-)
[20:41:02] lol right :)
[20:41:04] REGEX!
[20:41:35] Took the output of [1] and applied grep and sed
[20:41:36] [1] https://www.mediawiki.org/wiki/Extension:JADE
[20:41:37] 1. https://ores.wikimedia.org/v3/scores/
[20:41:44] AsimovBot, wat
[20:41:44] Error: Command “wat” not recognized. Please review and correct what you’ve written.
[20:41:56] Quick lying about my footnotes
[20:41:59] *quit
[20:42:04] [2]
[20:42:04] Buffer 2 is empty.
[20:42:06] [3]
[20:42:06] Buffer 3 is empty.
[20:42:08] [4]
[20:42:08] Buffer 4 is empty.
[20:42:15] [[Foo]]
[20:42:15] [2] https://meta.wikimedia.org/wiki/Foo
[20:42:18] [2]
[20:42:18] [2] https://meta.wikimedia.org/wiki/Foo
[20:42:21] [1]
[20:42:21] [1] https://www.mediawiki.org/wiki/Extension:JADE
[20:42:24] weird.
[20:43:05] SELECT table_schema as wiki, round(((data_length) / 1024 / 1024), 2) as table_data_MB, round(((data_length + index_length) / 1024 / 1024), 2) AS index_data_MB FROM information_schema.TABLES WHERE table_schema in ('arwiki', 'bswiki', 'cawiki', 'cswiki', 'enwiki', 'eswiki', 'eswikibooks', 'etwiki', 'euwiki', 'fawiki', 'fiwiki', 'frwiki', 'hewiki', 'huwiki', 'lvwiki', 'nlwiki', 'plwiki', 'ptwiki', 'rowiki', 'ruwiki', 'simplewiki', 'sqwiki', 'srwiki',
[20:43:05] 'svwiki', 'trwiki', 'wikidatawiki', 'testwiki', 'test2wiki') AND table_name = "revision"
[20:45:00] select table_schema from information_schema.TABLES WHERE table_name = "ores_classification";
[20:45:32] Replace that "IN" constant with a query :D
[20:46:37] harej: https://phabricator.wikimedia.org/P7416
[20:46:41] sorted
[20:46:52] Thank you!
[20:47:15] :100%:
[20:47:57] halfak: niice
[20:48:47] So in the runaway third-party integration scenario, what *does* happen when we have to protect the namespace?
Why would that ever be a thing? What’s done for other namespaces? [20:48:49] https://docs.google.com/spreadsheets/d/1xISxyeZ2yCUOCrkywpq2NAtRYSXxJU-Gwyl6Z-meeBE/edit?usp=sharing [20:49:03] ah yeah I was thinking I should have given you TSV [20:49:18] so, which column is Ops most concerned about? [20:49:20] halfak: ^ [20:49:27] (03CR) 10jenkins-bot: Localisation updates from https://translatewiki.net. [extensions/ORES] - 10https://gerrit.wikimedia.org/r/449844 (owner: 10L10n-bot) [20:49:34] awight, I'm not entirely sure. [20:49:38] k [20:50:13] so if 100 GB revision table is the maximum, this rules out English Wikipedia and Wikidata, makes us slightly uncomfortable with German and French Wikipedias, but rules everyone else in. [20:50:18] I think there’s some deal where *all* of some type of metadata needs to live on every db node. [20:50:25] in a cluster, at least. [20:51:07] so the 100 GB figure includes redundancy? [20:51:10] the content can be sharded, but metadata for joining needs to be in-memory or nearly so… I’m making it up obviously and should do that homework. [20:51:57] harej: That sounds great, not much of a hit to our project cos we wouldn’t have dared start with a top-five wiki anyway. [20:59:27] Let's do simple english :D [21:02:14] I like that we have confederates on all of these wikis, like harej was saying earlier… [21:03:44] yup. Focus group is already primed! [21:08:50] OK that's it for me for the day. I got through email, but I still haven't really addressed the review column. [21:08:56] I'll get on that first thing tomorrow. [21:46:15] harej: I’m going to move the scalability FAQ to mediawikiwiki, heads-up in case you were editing. [21:46:25] Sounds great. You think it's ready? [21:48:03] Hmm, I wasn’t thinking of it that way [21:49:02] It’s already being referenced by stakeholders in the discussion, so my idea was to make the document more durable and watch-able [21:49:51] That is a good point [21:50:04] (Have they referenced the changes?) 
[21:53:36] Naw, not sure anyone would actually watch :) [22:03:08] Technical Advice IRC meeting starting in 60 minutes in channel #wikimedia-tech, hosts: @bd808 & @mooeypoo - all questions welcome, more infos: https://www.mediawiki.org/wiki/Technical_Advice_IRC_Meeting [22:52:30] Technical Advice IRC meeting starting in 10 minutes in channel #wikimedia-tech, hosts: @bd808 & @mooeypoo - all questions welcome, more infos: https://www.mediawiki.org/wiki/Technical_Advice_IRC_Meeting