[13:01:14] #startmeeting TechCom RFC discussion [13:01:34] huh, doesn't work apparently [13:02:05] o/ [13:02:24] #startmeeting [13:02:29] grr [13:02:33] no meetbot [13:02:35] needs to be kicked again? [13:02:36] hey Amir1 [13:02:57] duesen: hello, haven't seen you for a while [13:03:42] Amir1: ;) [13:03:51] Nikerabbit: i'm trying to restart the bot now [13:04:00] hi all [13:04:00] going by the docs. i havn't done this before [13:04:12] I tried the same, but looks like I dont' have access [13:04:26] Nikerabbit: I have access but also haven't done it yet, reading [13:04:58] restarted [13:05:07] #startmeeting TechCom RFC discussion [13:05:07] Meeting started Wed Jul 17 13:05:07 2019 UTC and is due to finish in 60 minutes. The chair is Nikerabbit. Information about MeetBot at http://wiki.debian.org/MeetBot. [13:05:08] Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. [13:05:08] The meeting name has been set to 'techcom_rfc_discussion' [13:05:33] #topic Abstract schemas [13:05:44] and schema changes [13:05:48] #link https://phabricator.wikimedia.org/T191231 [13:06:23] So, who's here for the rfc discussion? [13:06:27] o/ [13:06:29] o/ [13:06:32] o/ [13:06:32] * anomie is here [13:06:55] excellent ;) [13:07:02] o/ [13:07:18] Amir1: want to give a brief intro? [13:07:35] Sure [13:08:32] So the reason I am asking for this IRC meeting is to see if it's okay to use Doctrine DBAL for the first implementation of abstraction [13:09:08] plus, talk about extensibility of the abstraction and whether should we move support of oracale/mssql to extensions [13:09:31] and a basic bikesheding if we should go with json or yaml [13:10:10] let's keep that one for the end of the meeting ;) [13:10:43] duesen: sure, one thing should I go through the rfc itself (why we need abstraction, problems, etc.)? [13:11:07] has anyone here doesn't know what this RFC is about in general? [13:11:37] About Oracle and MSSQL: my understanding is that a) supporting them via DBAL is tricky and may require thema changes for these DBs and b) we (WMF) are not going to do that [13:12:19] Amir1: Re the first point, I like what you suggested in one of our emails: Have an abstraction layer and use DBAL to implement it. That way if DBAL turns out to be insufficient in some way (generally or for one particular DB) we can easily change it without breaking the rest of the system. [13:12:20] So approving the RFC means dropping support for these databases without replacement in furture versions of mediawiki, until someone does the necessary work to support these databases in an extension [13:12:32] Yes, plus we don't have a way to test such DBMSes [13:13:14] As we don't install non-free software in our infra as matter of principal [13:13:39] CindyCicaleseWMF, you have been in touch with the maintainers of the Oracle and MSSQL code. What'S their take? [13:14:38] Yes, I spoke to Skizzerz regarding MSSQL and Freakolowsky regarding Oracle. [13:15:00] anomie: having an isolation layer between "our" code and dbal seems like a good thing, if that layer doesn't become too complex. how much we'd want to invest in it depends on how much code binds against it. As far as I understand, for now, only the database updater would be binding to DBAL (or the layer wrapping DBAL). [13:15:03] is that correct? [13:15:28] duesen: From the private emails, it seems to me that the tentative plan is to drop Oracle and MSSQL from core, but also to make sure extensions can cleanly provide DB implementations. [13:15:30] anomie: yes, it's generally good idea to isolate third party application and frameworks in adaptive layer for better testability and LSP (for example for Oracle because Doctrine is not working very well with that) [13:16:06] duesen: Installer and updater. And probably CI testing that infrastructure. [13:16:07] They are in support of an abstract schema and moving alternative database code out of core. They are willing but have limited availability to help in creating extensions for the database code they currently maintain. [13:16:10] duesen: database updater and installer [13:16:43] installer will bind to Schema object of Doctrine DBAL, updater will bind to SchemaDiff object [13:17:08] CindyCicaleseWMF: Do they have an idea how many people actually use these DBs? IIRC, our own idea is "a one digit number"... [13:17:51] Amir1: but we are not going to use "automatic" schmea diffs, right? We'll define every schema change explicitly? [13:17:59] duesen: Also, for example, your getSchemaOverrides() stuff in MW's test framework would likely make use of it. [13:18:01] if we make it possible for extensions to cleanly provide DB implementations, would we make an extension that provides PostgreSQL support? [13:18:31] is it unknown how hard it would be to support those databases with dbal, or do we already know that it would require a lot of work like Amir1 hints? [13:18:43] duesen: the stats on usage are apparently wrong because pingback was not available in the versions that are actually using MSSQL and Oracle [13:18:49] milimetric: I opposed that suggestion when it was made previously. PG support in core also helps to keep us honest rather than writing MySQL-specific code with $db->query(). [13:18:50] anomie: oh yes, good point! [13:18:52] duesen: I can't say for sure but given things like https://gerrit.wikimedia.org/r/c/mediawiki/core/+/328377/22/maintenance/mssql/archives/patch-user_groups-ug_expiry.sql#4 this mistake has been in MSSQL for two years and no one noticed, I guess no one actually uses it [13:18:59] Both believe there is a non-trivial number of people using them. Many of the environments in which those databases would commonly be used would be the same environments that would be less likely to participate in pingback due to privacy/security concerns behind corporate firewalls. [13:19:33] duesen: Regarding "automatic" schmea diffs, yes we are not using that thing [13:20:27] milimetric: I didn't get your question sorry [13:21:02] milimetric: it seems like a good idea for WMF to maintain at least one extension that provides support for a database, to ensure the extension point actually works. However, as anomie said, it may also be good to maintain support for more than one db in core, to keep us honest about not binding to specifics of a single db system. [13:21:05] Amir1: anomie answered, but basically I'm wondering if we're planning on maintaining an extension that would be a good example of an alternative DB implementation [13:21:28] because if so, then it seems to me the argument for dropping MSSQL and Oracle support out of core is pretty strong [13:21:39] Nikerabbit: As far as I know, Doctine DBAL has a proper support of MSSQL but not Oracle [13:22:07] I mean, I have lots and lots of experience with MSSQL, and it would be kind of interesting to write such an extension, maybe I can help? [13:22:27] I don't see the argument for keeping any specific database implementations in core, since we can always add an extension to our gated CI build. [13:22:37] * anomie notes that, done right, the abstraction layer would allow an Oracle extension to not use DBAL if that's determined to be better than trying to add it to DBAL. [13:22:45] Amir1, anmie: do I remember correctly that even though DBAL supports MSSQL, it doesn't work around the fact that our schema for MySQL isn't compatible with MSSQL, and would require structural changes? I thought that was the main problem. [13:23:04] milimetric: Sure but we can't officially support it as there's no way to test it in CI [13:23:04] milimetric: I'm sure Skizzerz would appreciate any help. [13:23:12] milimetric: skizzerz is in favor of moving it to an extension and perhaps you could collaborate with him on that [13:23:15] anomie: yeah, the stack doesn't have to be the same for all DBs, DBAL would just be an umbrella over the DBs it supports [13:23:19] awight: might surprise some users who expect core mediawiki to be installable though [13:23:37] also, hexmode [13:23:47] also hexmode is working on an extension to support percona [13:23:57] anomie: good point [13:23:57] * Amir1 seconds anomie on "notes that, done right, the abstraction layer would allow an Oracle extension to not use DBAL if that's determined to be better than trying to add it to DBAL." [13:24:10] he has proposed one small patch to the installer, but otherwise it has the support it needs in core [13:24:46] duesen: I don't remember what Amir1 determined with respect to things like null values in unique indexes, but I think he did determine something about that one. [13:25:04] awight: I agree, but anomie seems to think that it would be too easy to let that slip. OTOH, we need to explicitly test "the other" DB in CI anyway, extension or no... [13:25:18] anomie: regarding null values in indexes, Doctrine DBAL properly handles it in MSSQL but not in Oracle [13:25:20] hexmode also was working on a proof of concept patch to investigate what it would take to move sqlite into an extension [13:25:27] CindyCicaleseWMF: FYI, I had to give that patch a -1 yesterday since it seems to depend on what we decided against in T467. [13:25:28] T467: RfC: Extension management with Composer - https://phabricator.wikimedia.org/T467 [13:25:38] anomie: good to know [13:25:54] ok, so seems like some consensus on: abstraction layer done right (TM), DBAL supporting MySQL, PostgreSQL, and MSSQL, and loosely collaborating on an extension for MSSQL, is that right? [13:26:03] awight: so yea, there seems to be no advantage to keepting support for two DBs in core. [13:26:03] Can I see the patch? [13:26:19] milimetric: DBAL also for SQLite, I believe. [13:26:27] duesen: On that note, the "other" DB will need to be tested in a separate CI job anyway, so that we can provide the matching db server. [13:26:29] #info done right, the abstraction layer would allow an Oracle extension to not use DBAL if that's determined to be better than trying to add it to DBAL [13:26:49] maybe then we can make an extension for SQLite support [13:26:58] Amir1: https://gerrit.wikimedia.org/r/c/mediawiki/core/+/522614 [13:27:21] anI would be in favor of keeping sqlite in core because it's fast on small scales and we can move some tests to use that instead (Thinking out loud) [13:27:27] Amir1: https://gerrit.wikimedia.org/r/c/mediawiki/core/+/519681/, https://gerrit.wikimedia.org/r/c/mediawiki/core/+/522614, and https://github.com/MWStake/SQLiteDB I believe are the patches you asked about? [13:27:28] *and I [13:27:34] Minor question, would it be better to supply concrete db implementations as composer libs or MediaWiki extensions? [13:28:04] Also, here is hexmode's installer patch: https://gerrit.wikimedia.org/r/c/mediawiki/core/+/519681 [13:28:10] thanks [13:28:18] I suspect we shouldn't be using hooks, which suggests that composer libs are a good fit. [13:28:33] awight: Probably the part that would go into includes/libs/rdbms as composer libs, with an extension for the MW installer bits and runtime registration bits. [13:28:35] awight: I would think extension so an administrator doesn't have to code to get mediawiki up on database X [13:28:49] awight: I think there are plans to decouple RDBMS lib out of core [13:29:01] I don't know if they fit into this RFC though [13:29:25] milimetric: loosely collaborating on an extension for MSSQL, and Oracle, yea. [13:29:46] Amir1: Just to the extent that I hope we can have the non-MW-specific parts of the abstract schema code in includes/libs/rdbms, IMO. [13:29:56] composer as used right now by us mostly ships static dependencies, not something that the user can add to add functionality [13:31:01] anomie: Agreed [13:31:19] awight: libs can't know about MediaWiki, so they can't plug in/register. extensions have their dependencies going the other way - they know all about mediawiki, and can register and interact. [13:31:22] TemplateStyles may be a decent analogy for the composer vs extension question: The sanitizer library is a composer lib, while the extension has the glue to get MW to use it. [13:31:44] awight: so db support always needs an extension. that extension can use a lib if it wants to. [13:33:08] anomie: i hope the only mediawiki-specific part of the abstract schema code would be the mechanism for extensions to plug in.. [13:33:27] Sorry for the tangent--replies make sense. [13:33:30] duesen: except if it is in core, right? [13:34:22] Nikerabbit: what do you mean? [13:34:36] duesen: Also the schema (json or whatever) files themselves, and the code in the installer and updater that finds the files and passes them to the RDBMS code in the proper manner. [13:34:37] * duesen notes that we have passed the half hour mark [13:35:48] should the json files stay in maintenance/? [13:35:50] anomie: right. in my mind, that's aprt of the updater logic, not part of the schema abstraction. but you are right that this needs to be in core. [13:36:02] "db support always needs an extension", just checking because we earlied discussed whether some should stay in core or should all be extensions [13:36:16] It's always was weird to me why but since all other sql files are there [13:36:26] ok, so is anomie convinced by awight/duesen and others that we can put all alternative db support in extensions and still involve the PostgreSQL in CI? [13:36:49] Nikerabbit: ah, right. yes, I meant they can't be a "pure" library, because they at least have to know how to plug into mediawiki. [13:36:52] Amir1: Location of the schema files seems like a bikeshed question to me. Do we need to use meeting time to try to brainstorm it? [13:37:04] duesen: ok, understood [13:37:06] anomie: sure, for later [13:37:26] milimetric: We *can*. I don't think we *should*. [13:38:19] what's the downside [13:38:35] milimetric, anomie: while i think it's relevant to explore that question i nthe context of the rfc, i don't think it needs to be decided. the rfc just calls for the creation of the abstraction layer based on dbal, and implies the removal of support for MSSQL and Oracle from core, for now without replacement. [13:38:44] that's the critical point. [13:38:58] duesen: I agree with your statement. [13:40:06] milimetric: If we have only MySQL in core, then that may encourage people to write MySQL-only code which is something we've been trying to get away from for several years now. I'd prefer to not make it easier to do the wrong thing. [13:40:39] but wouldn't any patch that did that fail CI, if we always test against the PostgreSQL extension? [13:40:49] anomie: this can only be prevented by running against PG in CI. and for that it doesn't matter whether PG is in core or not. [13:40:53] ah, I see, not if we miss a test case, etc. [13:41:21] yea. but we are again diving into the discussion that we just said isn't necessary to decide right now :) [13:41:35] yes, sorry, I'm done :) [13:41:52] * anomie refrains from replying to the discussion (: [13:42:00] anomie: so my suggested way to do it for start is to replace *.sql file that represents the schema change with *.json file that is used to build the SchemaDiff object [13:42:00] I'm just as guilty ;) [13:42:05] Should this be noted somehow for the meeting notes? [13:42:36] Amir1: to clarify: for new schema updates, or also retroactively? [13:42:38] Amir1: Any thoughts on https://www.mediawiki.org/wiki/User:Anomie/Abstract_schema#Schema_change_format as the format of that JSON file? [13:42:50] basically replacing all .sql files with .json files [13:43:04] duesen: ideally retroactively to have a cleaner(TM) repo [13:43:11] Nikerabbit: we'll have to start using more #info, or someone(tm) has to go through the log later and summarize... [13:43:35] duesen: agreed [13:43:43] Amir1: I'd leave that for later. though it may be good for making sure the json format is covering all cases, and works as intended. [13:44:12] Amir1: Isn't the schema change in DBAL determined by simply having a final .json with the desired schema, rather than a .json for each migration? [13:44:42] So our refactor would actually be, replace all .sql files with a single .json, or a .json for each table... [13:45:01] awight: that only works for the most trivial schema changes [13:45:17] sort of what I'm worried about ;-) [13:45:18] duesen: maybe one or two, yes, I don't want to introduce schema changes for the sake of testing if the abstraction works [13:45:18] As for retroactively, for MySQL it would probably be pretty simple. PG may need more consideration, as somewhere in there we'll likely need a change bringing the old PG schema in line with the new abstract schema. Or declare a "MW/PG 2.0 schema" that requires a dump to XML and reimport rather than a database upgrade. [13:45:22] anomie: what about rollback? Should that be included along with the updates array? [13:45:57] milimetric: Rollback as in the Laravel feature mentioned in T191231#5035256? [13:45:58] duesen: agreed, number of schema changes is too big to keep in one file [13:46:00] T191231: RFC: Abstract schemas and schema changes - https://phabricator.wikimedia.org/T191231 [13:46:01] anomie: re-import from XML loses user accounts. not really an option, i think [13:46:16] ok, let's check what we agree on. [13:46:20] #info not deciding now about which DBs should be implemented as an extension as opposed to being in core [13:46:24] So, is ther eagreement that we want this, and want to try and implement this based on dbal, wrapped up nicely? [13:46:43] And is there agreement that we can drop MSSQL and Oracle from core? [13:47:07] I support both [13:47:14] me too [13:47:22] duesen: +1 from me on both questions. The latter since I know from other communication that the current volunteer maintainers of MSSQL and Oracle support it. [13:47:28] anomie: yeah, I was wondering if maybe the Schema_change_format should allow for it [13:47:53] anomie: I like your design of the abstraction, I can use it to write the SchemaBuilder class [13:48:17] Ok, so I propose the following: [13:48:21] #info volunteers maintainers of MSSQL and Oracle support moving support for these DBs into extensions [13:48:29] 1) Amir1 updates the RFC to reflect what was discussed here [13:48:30] milimetric: I suspect it would be significant added complexity for something rarely used. I already replied to that point in T191231#5338651, shall we continue that on Phab? [13:48:38] 2) once that is done, the RFC goes on last call [13:48:44] does that sound good? [13:49:10] anomie: not necessary, I'm just not up to speed, and that was the only thought I had looking through the format, otherwise looks good [13:49:53] duesen: Sounds good to me. [13:50:34] We have 10 minutes left, should we start bikeshedding JSON vs YAML vs something else? ;) Personally, at the moment I lean towards JSON with comments (using FormatJson::stripComments() or code like it). [13:50:44] yea, bikeshed time :) [13:51:23] Amir1: once you have updated the RFC, please move it to the RFC inbox with a comment saying that it's ready for last call, per the discussion in this meeting. [13:51:36] I'm in favor of json for two reasons: 1- It has native support in php 2- comments can be done in the similar way extension.json does for config variables [13:51:45] duesen: Sure [13:51:52] excellent :) [13:51:54] 3- yaml errors are nasty [13:52:13] #action Amir to update the RFC to reflect discussion and reply it's ready for last call [13:52:18] also, comments really should be _in the column definitions_ <-- this is something I feel pretty strongly about [13:52:26] I like yaml, because it's easier to edit by hand, easier to read by humans, and you can write multi-line comments without going insane. [13:52:31] 4- using json seems sorta standard in mediawiki [13:52:36] I note PHP support for parsing YAML is not bundled by default, which IMO is a point against it. [13:52:42] * Amir1 realizes two is not equal to four [13:53:09] I'm a yaml fan, but it seems a simple implementatio detail to switch between json and yaml later. Symfony has yaml library in pure php if that is a concern. [13:53:27] https://github.com/krakjoe/uopz/commit/e94fe6cace79edc700e520045b485f97605b24d2 [13:53:28] anomie, Amir1: YAML needing a composer dependency is kind of a moot point in the context of discussing DBAL wich is itself such a depßendency, and pulls in several more. [13:53:55] The most annoying part is when you make a mistake in json, it explodes, it gives you error "Unable to parse json" but when you make a mistake in yaml, it works but the output is changed in a sneaky way and it explodes when you deploy it :) [13:54:10] Nikerabbit: indeed, switching layter should be trivial, since converting between yaml and json is easy. [13:54:17] there are validators that work for both yaml and json to catch errors [13:54:33] Amir1: but you can get around that with for example having a schema [13:54:35] I use those for yaml files in Translate, helps a lot [13:54:45] +1 Nikerabbit, we need to validate the schema regardless of (isomorphic) language representation [13:54:55] Another option would be that suggested in T212460. [13:54:55] T212460: Adopt static array files for local disk storage of values (tracking) - https://phabricator.wikimedia.org/T212460 [13:55:13] it definitely needs to be interchangeable without much effor [13:55:17] *effort [13:56:26] but what should be used for the first iteration? [13:56:45] I think the implementer can decide (or we can have a vote...just kidding) [13:56:57] FWIW, json-schema can be used to validate YAML documents. [13:57:24] Is json_schame in core? [13:57:34] We use the justinrainbow validator, yes [13:57:58] https://github.com/justinrainbow/json-schema [13:58:23] that looks nice [13:58:30] Amir1: We could push the format decision to the MW side, and have the RDBMS code take the already-loaded data structure. [13:58:55] I think we should definitly have a schema, no matter whether we use yaml. [13:59:43] #info Should have a schema, no matter whether we use yaml or json or something else [13:59:47] We are almost at time [13:59:50] Our validator accepts an already-loaded structure too, so the on-disk representation is just a detail. [13:59:54] yeah, that should prevent most insanity that can happen with yaml and also allow for easy checking if we use json [14:00:42] with a json-schema I think it's really up to the implementers what the format is, whatever yall like :) [14:00:54] any last thoughts, or notes to add? [14:00:57] yeah that's also possible but I don't have preference where it should reside [14:01:33] Not a great location, but FYI https://github.com/wikimedia/mediawiki/blob/master/docs/extension.schema.v2.json [14:02:19] awight: thanks for the link, probably we will make something like that [14:02:38] https://github.com/wikimedia/mediawiki-extensions-Translate/blob/master/data/group-yaml-schema.yaml for Translate [14:02:58] one last question, do we have an agreement that each schema change should be represented with one file ? [14:03:05] there is a task to discuss whther to use a single validator for all mw stuff [14:03:59] Amir1: it's not clear to me what would be the pros and cons [14:04:00] Amir1: I suppose it depends what you mean by "change". For something like patch-comment-table.sql, I'd keep it in one file but have multiple "changes" in the file. [14:04:33] anomie: I mean in a sense that they are "immutable" put one change in one commit [14:04:36] ok, i have to run [14:04:44] one file per commit [14:04:57] Nikerabbit: will you close the meeting? otherwise, no minutes get written ;) [14:04:59] One file per commit sounds like a generally good goal to me. [14:05:04] duesen: I will [14:05:08] Good question, IMO the most important thing to consider is how these migrations be deployed. A single file should be something you would deploy on its own, I think. [14:05:50] more like a "transaction" I guess [14:06:07] * anomie wishes MySQL actually supported transactions for DDL. [14:06:10] it can be one schema change or several but they will share the same checks [14:06:18] run or fail togther [14:06:31] Amir1: I'd have them each have their own checks, as in the wiki page I linked. [14:07:18] Amir1: To make it more natural to avoid problems like T227662. [14:07:18] T227662: Update.php failure when upgrading from 1.32.1 to 1.33 - https://phabricator.wikimedia.org/T227662 [14:07:33] anomie: I can see both pros and cons, sharing checks bit me before [14:08:01] we are way out of time [14:08:31] yeah, so no decision on this last question? [14:08:43] Seems like not at the moment. [14:09:05] Nikerabbit: I think it's clear they should be one file per commit [14:09:17] how they do checks and stuff is more of details [14:09:20] I offer a humble +1 [14:09:39] *one file per commit/deploy [14:09:48] okay, I'm going to end this meeting with the bot [14:10:00] thanks Nikerabbit [14:10:07] #endmeeting [14:10:07] Meeting ended Wed Jul 17 14:10:07 2019 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) [14:10:07] Minutes: https://tools.wmflabs.org/meetbot/wikimedia-office/2019/wikimedia-office.2019-07-17-13.05.html [14:10:08] Minutes (text): https://tools.wmflabs.org/meetbot/wikimedia-office/2019/wikimedia-office.2019-07-17-13.05.txt [14:10:08] Minutes (wiki): https://tools.wmflabs.org/meetbot/wikimedia-office/2019/wikimedia-office.2019-07-17-13.05.wiki [14:10:08] Log: https://tools.wmflabs.org/meetbot/wikimedia-office/2019/wikimedia-office.2019-07-17-13.05.log.html [14:10:25] Thank you everyone [14:12:39] thank you Amir1 [14:38:34] https://meta.wikimedia.org/wiki/Special:MyLanguage/IRC_office_hours/Office_hours_2019-07-17 Log posted