[22:00:22] #startmeeting RFC meeting [22:00:27] * anomie waves [22:00:51] TimStarling could you kick the bot? [22:01:01] yes [22:01:18] O/ [22:02:14] #startmeeting RFC meeting [22:02:14] Meeting started Wed Mar 6 22:02:14 2019 UTC and is due to finish in 60 minutes. The chair is KateChapman. Information about MeetBot at http://wiki.debian.org/MeetBot. [22:02:14] Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. [22:02:14] The meeting name has been set to 'rfc_meeting' [22:02:40] #topic: RFC: Abstract schemas and schema changes https://phabricator.wikimedia.org/T191231 [22:02:53] who is here to talk about abstract schemas and schema changes? [22:03:14] so far anomie and addshore it looks like [22:03:45] o/ [22:03:52] I'm lurking [22:04:38] o/ [22:05:07] in the committee meeting duesen was interested in doctrine as an option [22:05:07] anomie do you want to kick things off with a brief summary? It would also be good to know your preference on the two proposals. [22:05:57] Having to write almost-but-not-quite the same code 4-5 times for every schema change is a huge pain. It'd be nice if we could write it once and have code translate to the different DBs. [22:07:11] I like Option 1, mainly because that way we'd know it would work well with our existing LoadBalancer, IDatabase, and so on. [22:07:48] to use doctrine we would have to use doctrine connection objects? [22:08:04] Option 1 sounds like quite a bit more work, though. And we'd have to write abstraction layers for database systems we can't readily test. [22:08:07] how would we do that? [22:08:07] I'd think so. I'm no expert on doctrine though. 
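[Editor's note: the RFC proposes a single abstract definition translated per DBMS. As a purely illustrative sketch of what such a file could look like — this is not the RFC's actual format, and the table and field names are hypothetical — a JSON table definition might be:]

```json
{
    "name": "example_table",
    "columns": [
        { "name": "ex_id", "type": "integer", "options": { "autoincrement": true, "notnull": true } },
        { "name": "ex_text", "type": "binary", "options": { "length": 255, "notnull": true } }
    ],
    "indexes": [
        { "name": "ex_text_idx", "columns": [ "ex_text" ], "unique": false }
    ],
    "pk": [ "ex_id" ]
}
```

[A per-DBMS schema generator would then render this once-written definition into the appropriate `CREATE TABLE` statement for MySQL, SQLite, PostgreSQL, etc., instead of maintaining 4-5 hand-written variants.]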
[22:08:23] Back in 2014, for some prototype relating to Wikibase, we wrote https://github.com/wmde/WikibaseDatabase [22:08:50] I'm not convinced JSON would be the best format for the abstraction [22:09:05] there was a previous proposal to use a DDL [22:09:08] It started off essentially implementing much of what doctrine did, before we found doctrine, as a mediawiki extension. The PR removing the functionality provided by doctrine is https://github.com/wmde/WikibaseDatabase/pull/48 [22:09:20] Just as an interesting point [22:09:30] duesen: Either drop MSSQL and Oracle if volunteers don't show up to help, or I could locally develop them and people could trust they work after static analysis I suppose. [22:10:00] Taking the MSSQL/Oracle burden off random devs doing schema changes would be great, regardless of how that's done [22:10:13] Platonides: Me neither. I'm not opposed to taking the advice of T212460. [22:10:13] T212460: Adopt static array files for local disk storage of values (tracking) - https://phabricator.wikimedia.org/T212460 [22:10:17] SQLite is also a bit of a pain though so I would be excited for us to have a define-once-use-everywhere kind of system [22:11:19] In terms of the proposal, #2 sounds great to me, mainly because it is worded as "Try to integrate". If that ends up not working, we can always reevaluate [22:11:32] RoanKattouw: Having to write "create a new table, copy everything, drop, and rename" for sqlite a bunch of times is another reason I finally got around to proposing this. ;) [22:12:07] anomie: I like the idea of dropping support for MSSQL and Oracle from core at the same time as we introduce the abstraction layer. The question is whether we would write the extensions that provide continued support for them, or leave that to volunteers. [22:12:34] Yeah that's a pain too.
I'm personally more bothered by Oracle/MSSQL because at least SQLite is easy to understand for MySQL speakers, whereas when I read the Oracle/MSSQL table defs I have no idea what's going on [22:13:19] (It's still easy to accidentally try to do something in SQLite that it can't do though, because it has so many weird limitations especially with ALTER) [22:13:48] is there pingback data about the usage of db types? [22:14:13] addshore: I'm afraid that even if we did get Option 2 to work, it'd turn out to be fragile and hacky trying to match the different assumptions of Doctrine and MediaWiki. [22:14:18] anomie: it seems to me like we should try first to use an existing solution, like doctrine. only if that proves too annoying should we write our own. [22:14:30] Re pingback data, the answer to my own question is yes, https://pingback.wmflabs.org/#database-type [22:15:05] duesen: I like that approach, if it is decided to be too fragile or hacky then stop running with it [22:15:34] so, MSSQL is at 8 instances, Oracle at 0. [22:15:39] indeed [22:15:40] doctrine has e.g. Migrator::getSql(), so you don't need a doctrine connection to execute the migration [22:15:56] not surprising, since Oracle installs would probably be stuck on an old version of mediawiki that doesn't do pingbacks [22:15:59] #link https://pingback.wmflabs.org/#database-type [22:16:27] addshore, duesen: OTOH, it would mean we wasted a bunch of effort trying it. Is anyone familiar enough with both MediaWiki's DB layer and Doctrine to estimate the chances of it not being fragile and hacky? [22:16:43] although Migrator requires a MigrationRepository which requires a Connection, so you would need to connect with doctrine in order to generate the SQL [22:17:05] I'm familiar enough with MediaWiki's DB layer that I can say that it's fragile and hacky ;) [22:17:10] TimStarling: For schema changes, though, you often need to query the current state before knowing whether you need to make a change. 
Without a connection, can it do that? [22:17:48] yeah, it needs to query the current state via a connection in order to generate the SQL string [22:17:54] Skizzerz: You can see the discussion so far at https://wm-bot.wmflabs.org/logs/%23wikimedia-office/20190306.txt [22:18:38] why is creating a doctrine connection a big problem? [22:18:49] For anyone who doesn't know, Skizzerz is my go-to person for MSSQL questions. [22:19:18] I mean, mapping our config to this should be simple enough: https://www.doctrine-project.org/projects/doctrine-dbal/en/2.9/reference/configuration.html [22:19:18] the question is just how intrusive it is [22:20:08] table prefixes could be a problem [22:20:14] so there is a Connection interface https://www.doctrine-project.org/api/dbal/2.5/Doctrine/DBAL/Driver/Connection.html [22:20:15] (can we just kill that feature?) [22:20:32] we could provide our own implementation of that interface [22:20:39] duesen: We'd have to get the right data out of LoadBalancer and munge it into the right format. Maybe easy, maybe hard edge cases. [22:22:20] you see what I mean right? if we only want doctrine for migrations, we can just wrap a connection from the existing LoadBalancer with a doctrine interface [22:22:28] that's a way of limiting the project scope [22:22:39] then there's no need to map configuration or establish a separate connection during installation [22:23:12] TimStarling: I fear our own munging of SQL and messing with table prefixes and such may get in the way if we try to do that. [22:23:32] I think mapping configuration is easier and more reliable [22:23:51] but both options exist and should be evaluated [22:25:33] if doctrine doesn't have table prefixes then how would you map table prefix configuration to doctrine? [22:26:04] Besides connection info, we'd also have to look into type mapping.
MediaWiki definitely needs binary-blob fields, and everything except MySQL and SQLite also seems to try to use real character fields for Unicode text too. In some places we have problems because a field was created as text on non-MySQL and then someone started putting gzipped data into it. [22:26:17] Doctrine's schema manager/migration can be used to just generate SQL strings that can be passed through our own layers, rather than going through their connection objects [22:26:47] Skizzerz: yes, but as we said above, you need a doctrine connection to inspect the current schema [22:27:07] TimStarling: for that, your approach would work better of course. [22:27:36] anomie: re wasting time on trying to use doctrine.. that's a risk, but then, there's also the risk of wasting time by not using doctrine. [22:27:37] Also would Doctrine handle stuff like munging unique index definitions as mentioned at https://www.mediawiki.org/wiki/User:Anomie/Abstract_schema/DB_Requirements#Unique_indexes ? [22:28:01] in my mind, standing on the shoulders of giants should be tried first, before we start to build our own ladder. [22:28:36] anomie: no idea, but maybe we could add that if it doesn't support that yet. [22:28:42] that's an unhelpful analogy [22:29:26] there is e.g. MigrationRepository::registerMigrationsFromDirectory() [22:30:02] it seems to me that it would be difficult to customise the migration file format in doctrine in order to add features that we need [22:31:08] the question is, once you add a few odd b/c features, is it still worthwhile to try to use doctrine's migration file format? [22:31:24] and if not, what is left in doctrine that is of value? [22:34:58] the community and active work. For example, they may support databases we decide to use in the future [22:36:24] TimStarling: don't get me wrong, i'm not totally sold on the doctrine idea.
i think we should spend some time investigating it thoroughly, not go for it at all cost [22:36:36] (I have no skin in this game, but I do love sqlalchemy/alembic which does this for python.) [22:38:25] * anomie wonders if we'll ever get around to finishing librarizing Wikimedia\Rdbms\*, so other projects could use our LoadBalancer and IDatabase abstractions [22:39:11] I'm looking through the doctrine migrations source, it is kind of hard to pin down where the substantial bits are [22:39:12] anomie: do you anticipate doing most of the work on this? [22:39:35] what's your preference? investigate doctrine first, or go straight for home grown? [22:39:56] duesen: On this RFC? With Option 1 I was planning on it. With "trying Doctrine" it'd be nice if someone else would look at that part. [22:40:42] So if there is nobody volunteering to do that, it would be option 1 by default [22:41:21] I'm willing to put in some effort as well to help move the RFC forwards. I'd appreciate guidance on what, though, so that I don't step on any toes [22:41:32] but maybe it is possible to have our own file format, and to generate a doctrine Version from that, after mapping types etc., and then to use Doctrine's MigrationPlanCalculator to give us the necessary ALTER queries [22:42:10] we only have about 20 minutes left. Please tag things as #info if you think it is important they get into the minutes [22:42:12] but you see how, considering the b/c restrictions, the advantages of using doctrine get smaller and smaller [22:43:13] TimStarling: which b/c restrictions do you mean?
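[Editor's note: Skizzerz's point above — that Doctrine's schema tooling can emit SQL strings to be run through MediaWiki's own layers, rather than through Doctrine connection objects — can be sketched roughly as follows. This assumes Doctrine DBAL 2.x; the table and columns are hypothetical, and a real integration would introspect the live schema instead of using an empty one:]

```php
<?php
use Doctrine\DBAL\Schema\Schema;
use Doctrine\DBAL\Schema\Comparator;
use Doctrine\DBAL\Platforms\MySqlPlatform;

// Target schema, built from whatever abstract format MediaWiki adopts
$to = new Schema();
$t = $to->createTable( 'example_table' );
$t->addColumn( 'ex_id', 'integer', [ 'autoincrement' => true ] );
$t->addColumn( 'ex_text', 'binary', [ 'length' => 255 ] );
$t->setPrimaryKey( [ 'ex_id' ] );

// Current schema; empty here, i.e. a fresh install. In practice this is
// where a connection is needed, to introspect the existing tables.
$from = new Schema();

// Diff the two and render to plain SQL strings for a given platform --
// the statements can then be executed through IDatabase as usual
$comparator = new Comparator();
$diff = $comparator->compare( $from, $to );
foreach ( $diff->toSql( new MySqlPlatform() ) as $sql ) {
    echo $sql, "\n";
}
```

[Swapping `MySqlPlatform` for another DBAL platform class would yield the equivalent statements for PostgreSQL, SQLite, etc. from the same definition.]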
[22:43:51] #info if we only want doctrine for migrations, we can just wrap a connection from the existing LoadBalancer with a doctrine interface [22:44:12] #info it is possible to have our own file format, and to generate a doctrine Version from that, after mapping types etc., and then to use Doctrine's MigrationPlanCalculator to give us the necessary ALTER queries [22:44:40] #info but you see how, considering the b/c restrictions, the advantages of using doctrine get smaller and smaller [22:45:03] we have been talking about the pros and cons of using doctrine. are there any other questions we should be discussing? [22:45:08] b/c as in legacy DBMS-dependent index name mapping, DBMS-dependent table name mapping [22:45:31] legacy DBMS-dependent type mapping and random arbitrary differences in types between DBMSs [22:45:58] so basically, the fact that we have four different db schemas to begin with [22:46:14] OTOH, the random arbitrary differences between DBMSs is going to be a problem no matter what. :( [22:46:18] the type mapping is quite important, you can't just arbitrarily change the type of a column in most cases without dropping and re-creating the database [22:46:26] er, *table [22:47:55] #info the fact that MW has effectively different schemas for each DBMS is going to be problematic, no matter what [22:48:03] anomie: I'm starting to think this may be a show stopper [22:48:23] is it even possible to work around that? won't we end up with writing updates multiple times again, once for each schema? [22:49:07] if there is no way to map between the schemas programmatically, i don't see how the abstraction layer will help us [22:49:08] duesen: My gut feeling is that it'll be the hardest part, yes. The plan would be to transition each one to a schema derived from the one standard, somehow.
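[Editor's note: TimStarling's scope-limiting idea — wrapping an existing LoadBalancer connection with Doctrine's driver-level `Connection` interface — might look roughly like the following hypothetical adapter. This is a sketch only, assuming Doctrine DBAL 2.x; the class name is invented, `query()` would really need to return a Doctrine `Statement`, and `prepare()` is exactly where the impedance mismatch shows up:]

```php
<?php
use Doctrine\DBAL\Driver\Connection as DoctrineConnection;
use Wikimedia\Rdbms\IDatabase;

/**
 * Hypothetical adapter exposing a MediaWiki IDatabase handle through
 * Doctrine's driver-level Connection interface, so Doctrine's schema
 * tools could introspect the live schema without a second connection.
 */
class RdbmsDoctrineConnection implements DoctrineConnection {
	/** @var IDatabase */
	private $db;

	public function __construct( IDatabase $db ) {
		$this->db = $db;
	}

	public function query() {
		$args = func_get_args();
		// A real adapter must wrap this result in a Doctrine Statement
		return $this->db->query( $args[0], __METHOD__ );
	}

	public function exec( $statement ) {
		$this->db->query( $statement, __METHOD__ );
		return $this->db->affectedRows();
	}

	public function quote( $input, $type = \PDO::PARAM_STR ) {
		return $this->db->addQuotes( $input );
	}

	public function lastInsertId( $name = null ) {
		return $this->db->insertId();
	}

	public function prepare( $sql ) {
		// IDatabase exposes no client-side prepared statements
		throw new \BadMethodCallException( 'Not implemented in this sketch' );
	}

	public function beginTransaction() {
		$this->db->begin( __METHOD__ );
		return true;
	}

	public function commit() {
		$this->db->commit( __METHOD__ );
		return true;
	}

	public function rollBack() {
		$this->db->rollback( __METHOD__ );
		return true;
	}
}
```

[Note that MediaWiki's own SQL munging (table prefixes, index name mapping) would still happen inside `IDatabase::query()`, which is the fragility anomie and duesen worry about below.]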
[22:49:34] there is T164898 proposing to reduce inconsistencies with postgresql, but it doesn't completely eliminate them [22:49:35] T164898: PostgreSQL schema change for consistency with MySQL - https://phabricator.wikimedia.org/T164898 [22:49:43] I'm afraid that "somehow" is going to be the crux of the matter [22:50:05] #info The plan would be to transition each one to a schema derived from the one standard, somehow. [22:50:15] *** 10 minutes remaining **** [22:50:26] #info there is T164898 proposing to reduce inconsistencies with postgresql, but it doesn't completely eliminate them [22:50:32] #info "somehow", in the worst case, being some sort of "dump, reinstall, and import" [22:51:54] so I think the summary is that we want the doctrine option to be more fully explored [22:52:08] we at least want to have a rationale for not using it [22:52:15] for MSSQL at least, given that it's been broken for quite some time already (not sure the most recent version that works with that dbms), I think it may be worthwhile to examine the option of ignoring b/c and starting fresh [22:52:33] anomie: that would lose all user accounts... [22:52:46] not that sort of dump [22:53:27] #info so I think the summary is that we want the doctrine option to be more fully explored. we at least want to have a rationale for not using it [22:53:31] duesen: Not the existing XML-format dump. Maybe not even a dump at all, just a maintenance script that copies old tables to new ones (much like sqlite does already for almost all schema changes) [22:54:06] anomie: one last manual migration to a compatible schema? [22:54:18] that doesn't sound too terrible, actually [22:54:23] duesen: Exactly.
[22:54:47] if we have a strong rationale for doing our own thing, then we can go ahead and discuss the details [22:55:03] if you all need the latest version of mediawiki that's sending pingback and using MSSQL, that should be easily found from the EventLogging data [22:55:12] #info would need one last manual migration to a schema shared by all DBMS [22:55:29] TimStarling: Being honest, if I'm the one looking at Doctrine I'll probably be biased towards finding it insufficient. That's why I'd prefer someone else do that part. [22:55:41] *** 5 minutes remaining **** [22:56:00] anyone want to volunteer to explore the doctrine option fully? [22:56:01] #info Skizzerz> for MSSQL at least, given that it's been broken for quite some time already (not sure the most recent version that works with that dbms), I think it may be worthwhile to examine the option of ignoring b/c and starting fresh [22:56:51] anomie: so, that manual migration would need to happen for postgres. sqlite is already automatically derived from mysql. and mssql and oracle can be dropped from core and wait for volunteer commitment. [22:56:54] is that right? [22:57:30] duesen: That sounds like a decent summary to me. [22:58:18] sqlite is automatically derived from mysql, although maybe not in the same way that doctrine would derive it [23:00:10] #info that manual migration would need to happen for postgres. sqlite is already automatically derived from mysql. and mssql and oracle can be dropped from core and wait for volunteer commitment. [23:00:40] and....we....are....out of time! thanks all! [23:00:56] RFC remains under discussion I guess [23:01:44] The one pressing question: Who's going to investigate the feasibility of using Doctrine? [23:01:57] anomie: ask on wikitech-l? [23:02:38] Does TechCom want to handle the asking? It'd be more official-sounding that way. [23:02:45] I suppose this could go on last call soon-ish. But it should have some commitment from management to actually make room for the work.
[23:02:53] #info The one pressing question: Who's going to investigate the feasibility of using Doctrine? [23:03:12] anomie: we can do that, but i'm not convinced it will work better :p [23:03:23] duesen is that Corey or another team? [23:03:35] KateChapman: can you include the question in the radar blurb about this meeting? [23:03:38] duesen: I'm sure it won't work worse ;) [23:03:43] Corey, I suppose [23:04:04] okay, I'm going to close this meeting on that note. I'll take on talking to Corey [23:04:18] #endmeeting [23:04:18] Meeting ended Wed Mar 6 23:04:18 2019 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) [23:04:18] Minutes: https://tools.wmflabs.org/meetbot/wikimedia-office/2019/wikimedia-office.2019-03-06-22.02.html [23:04:19] Minutes (text): https://tools.wmflabs.org/meetbot/wikimedia-office/2019/wikimedia-office.2019-03-06-22.02.txt [23:04:19] Minutes (wiki): https://tools.wmflabs.org/meetbot/wikimedia-office/2019/wikimedia-office.2019-03-06-22.02.wiki [23:04:19] Log: https://tools.wmflabs.org/meetbot/wikimedia-office/2019/wikimedia-office.2019-03-06-22.02.log.html