[15:00:42] == Language Engineering Office Hour - September 2015 == [15:00:46] We are starting now [15:00:52] * arrbee is not sure if meetbot is active [15:00:58] We are hosting an online session today [15:01:52] Please follow here: https://plus.google.com/events/c1c0msurhua7enopsu3q8l42j3s [15:02:01] and ping me for the hangout url [15:02:13] We will be monitoring the IRC channel too [15:02:20] So please ask here as well [15:02:58] hello [15:03:34] evening [16:06:13] == Language Engineering office hour has ended == [16:06:22] We will post the notes on metawiki [17:00:32] aharoni: is TwnMainPage repo abandoned? [17:00:54] No. [17:01:19] ok, I am updating rubygems in repos, just checking [17:01:30] https://phabricator.wikimedia.org/T112748 [17:01:33] The browser tests haven't run for a while, but the extension is being maintained.. [17:02:13] well, it is a minute to update, so while I am there... [17:02:13] I will just update all of them [17:02:53] aharoni: github says there isn't a lot of activity https://github.com/wikimedia/mediawiki-extensions-TwnMainPage/pulse/monthly [17:02:57] that is why I ask [17:05:15] aharoni: also, https://www.mediawiki.org/wiki/Extension:TwnMainPage redirects to https://www.mediawiki.org/wiki/Translatewiki.net [17:10:42] and I could not find any mention of it here https://phabricator.wikimedia.org/diffusion/OMWC/browse/master/wmf-config/InitialiseSettings.php [17:10:49] oh, aharoni is gone :) [17:10:52] nevermind then [18:32:18] Hey folks, if you are looking for the Research Showcase, see #wikimedia-research [18:32:23] The stream is at https://www.youtube.com/watch?v=eJk6mxJZhH8 [18:41:05] joining the stream [21:00:36] #startmeeting RFC meeting [21:00:36] Meeting started Wed Sep 16 21:00:36 2015 UTC and is due to finish in 60 minutes. The chair is TimStarling. Information about MeetBot at http://wiki.debian.org/MeetBot. [21:00:36] Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. [21:00:36] The meeting name has been set to 'rfc_meeting' [21:01:03] #topic Increase the strictness of mediawiki SQL code and leverage database code blockers for scalability | Please note: Channel is logged and publicly posted (DO NOT REMOVE THIS NOTE) | Logs: http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-office/ [21:01:28] o/ jynus [21:01:46] hi there [21:01:58] o/ [21:02:03] hello [21:02:36] jynus: thanks for putting together a great writeup so quickly! I hope this didn't catch you (or anyone else) off guard [21:02:55] #link https://phabricator.wikimedia.org/T112637 [21:03:35] hey [21:03:42] Hello everyone [21:04:33] so, some items to discuss [21:04:49] 1) the migration period -- one year seems excessively long to me; I think we can (and should) be more aggressive. [21:05:35] Agree, 3-6 months should be enough. [21:05:36] the GROUP BY case was already handled for PG [21:05:44] lke, 10 years ago [21:05:45] agree, "till it shines in CI/vagrant" is enough [21:06:07] 2) how do we identify existing violations? can we start enforcing these constraints in beta? [21:06:10] domas, it happened again [21:06:17] domas, I <3 mysql's group by so muuuuch:P [21:06:25] PG is pain in the ass [21:06:29] inorite [21:06:34] (PG?) [21:06:39] postgres [21:06:46] oh [21:06:50] core is in good shape [21:06:59] vacuum for the win :-) [21:06:59] some wmf-only extensions are not [21:07:02] so the proposal is to make MediaWiki compatible with the TRADITIONAL mode [21:07:09] having mediawiki in strict'able form is good [21:07:23] jynus: which? [21:07:34] there are examples on the RFC [21:07:48] last one I can recall is centralauth [21:08:16] We fixed CentralAuth didn't we? [21:08:19] anomie -- are you aware of this, and do you have a sense of how much work it would be to fix? [21:08:25] At least, the Special:GlobalUsers one [21:08:36] yes, want everyone's compromise to keep doing it :-) [21:08:48] I'm not sure I entirely buy the arguments for not setting the mode in the session [21:09:05] ori: What am I being aware of? [21:09:36] CentralAuth being potentially incompatible with TRADITIONAL mode [21:09:41] we already have $wgSQLMode [21:09:53] it would just be adding a primary key right? [21:10:07] Where are these examples of extensions violating it? [21:10:22] I do not feel strongly about that part, TimStarling [21:10:51] $wgSQLMode is empty by default, which means by default, sql_mode is set to an empty string on connect [21:10:53] but setting it on session [21:11:06] means that extensions can choose to disable it? [21:11:12] ori: I don't know anything about CentralAuth being or not being compatible. [21:11:14] TimStarling: I used to remove that :) [21:11:30] plus 1 extra roundtrip on every connection? [21:11:32] Where are these examples of extensions violating it? <-- I can't find them either. [21:11:38] no, extensions shouldn't be setting that kind of configuration [21:11:47] no, it's not an extra roundtrip [21:12:02] there is a single SET statement done on connect, unconditionally [21:12:04] there's a list of tables at https://phabricator.wikimedia.org/T17441#1420166 [21:12:10] ori, Krenair I gave links of examples of issues found in the past [21:12:18] but that only includes those that are on enwiki [21:12:18] variables to set are concatenated, but there is always at least one at present [21:12:25] $set = array( 'group_concat_max_len = 262144' ); [21:12:30] if ( is_string( $wgSQLMode ) ) { [21:12:36] $set[] = 'sql_mode = ' . $this->addQuotes( $wgSQLMode ); [21:12:36] I did not create a report of extensions incompatible [21:12:41] okay, have we identified any remaining ones? [21:13:13] Krenair, it is not easy. sql mode is not something that limits static code, but dynamic queries [21:13:25] hence the time on beta to check it [21:13:32] It's fine if you haven't of course [21:13:35] #action Core is already in good shape; some extensions are not. We should create a report of incompatible extensions. [21:13:44] yeah, I think you can assign that to developers [21:14:13] my inclination would be to have an active migration period, where we review extensions and fix them [21:14:20] re: primary keys, there is a list on the linked ticket [21:14:26] followed by setting the default $wgSQLMode to TRADITIONAL [21:14:45] but this can and probably should be overridden in production to not set it, relying on the server configuration instead [21:15:01] my priority is to block new features from adding more issues [21:15:02] that way, developers will all be running TRADITIONAL [21:15:03] if RPC is there, it doesn't cost much to have it there [21:15:33] #info TimStarling in favor of setting the mode in the session. $wgSQLMode is empty by default, which means by default, sql_mode is set to an empty string on connect. We should have an active migration period, where we review extensions and fix them, followed by setting the default $wgSQLMode to TRADITIONAL. [21:15:52] if someone needs to run an old unmigrated extension they can always set $wgSQLMode = '' in their LocalSettings.php [21:16:12] what about settings in beta? [21:16:24] and vagrant [21:16:27] TimStarling: re: active migration period....I think that's a good thing, but we do need to figure out how to get this in WMF dev team's respective roadmaps [21:16:32] Heh. And vagrant... [21:16:37] #info I have no idea what this means. [21:16:44] /o\ [21:17:24] do we use mysql in jenkins now? or is it still sqlite? [21:17:37] domas: notetaking trolling! awesome...I leave it to you to innovate in the trolling dept :-) [21:17:47] Krinkle: ^ [21:18:00] TimStarling: mysql [21:18:05] domas: http://meetbot.debian.net/Manual.html#commands [21:18:20] so we can at least set it to TRADITIONAL in jenkins [21:18:32] #action Update MediaWiki-Vagrant, Beta Cluster and Jenkins to run with $wgSQLMode = 'TRADITIONAL'; [21:18:48] by the way, how does this make things more scalable? [21:18:51] ;-) [21:19:01] I hear "scalability' is the reason [21:19:37] domas: I saw a meme generator, and added scalability and leverage, which souded cool [21:19:37] Jenkins runs all tests on mysql now, and it's temporary scoped TMPDIR, as well as mysql data mounts are all in tmpfs RAM [21:19:50] (for performance reasons) [21:19:51] jynus: is 'TRADITIONAL' enough, or would 'ONLY_FULL_GROUP_BY' be needed as well? [21:19:56] both [21:20:19] #action Make that $wgSQLMode = 'TRADITIONAL, ONLY_FULL_GROUP_BY'; [21:21:06] can we clarify when we would allow to relax that? [21:21:20] jynus: re: my priority is to block new features from adding more issues -- I should think that having the RfC approved here, and then following up with a quick note to engineering saying violators will be reverted should do the trick. [21:21:34] or wikitech-l [21:21:40] Wikitech-l, please. [21:22:03] jynus: do you think we need to allow exemptions? [21:22:12] no, I would prefer not [21:22:21] that is why I am asking [21:22:24] so let's not allow any [21:23:49] we have a hard time finding good DBAs because the cognitive load of maintaining our databases is enormous. [21:23:49] so, to summarize [21:23:58] So... What is still left to deal with? [21:24:01] Is there a way to give these options to sql mode in a warning-only away? [21:24:13] PK on all tables (as soon as possible) [21:24:13] I'd like to avoid having to run the whole test suite twice just to vary on sql option [21:24:23] SQL strict [21:24:29] (since we want it non-voting at first) [21:24:39] there are a couple of other small issues [21:24:49] and I used should in RFC meaning [21:24:51] we can have a non-voting capture of the warning stream, that'd be more scalable for Ci [21:25:08] trying to avoid unsecure/undeterministic statements [21:25:13] PK on all tables (as soon as possible) <-- what do you need from developers to make that happen? [21:25:45] well, I need to be able to ban new features to create tables without pks [21:25:52] I did not feel backed about that [21:25:55] should have been done years ago [21:26:10] I can handle the transition myself, with time [21:26:16] is someone arguing against it? [21:26:46] isn't DBA's approval already required for schema changes? [21:26:47] let's say there are some developers pushing strongly schema changes I do not 100% agree with [21:26:51] #info New tables without PKs will not be tolerated from now on. [21:27:15] jynus, you -2 it and go on ;) [21:27:22] Indeed. [21:27:32] * robla agrees with MaxSem [21:27:40] that is exactly why I needed this :-) [21:27:59] so, back to the pending topics [21:28:08] jynus: I imagine some of this affects strictness on queries as well, not just schemas. Which are less-strongly reviewed, but should also ideally be enforced socially for new code. [21:28:20] recommend against undeterministic queries [21:28:25] jynus, are you willing to point out any particular changes you think are harmful? [21:28:38] because they are causing data corruption in labs [21:29:23] because I'm sure many people don't understand "nondeterministic queries", this needs to be documented well on mw.org [21:29:28] jynus: Hm.. replication copies the query, not the impact? I thought binlogs weren't in sql. Or is it different for slave repl and labs repl. (I'll defer outside the meeting, personal curiousity) [21:29:48] yes, hence the ticket to change to row-based replication [21:30:08] but until that, try to minimize the impact with current production configuration [21:30:38] I say here "should" and not "must", because this is the hardest one to detect and fix [21:31:01] but I know that it happens because the slave drift on the labs slaves [21:31:05] nondeterministic write queries? [21:31:26] (outlined at https://phabricator.wikimedia.org/T112637) [21:31:36] something like "UPDATE... limit without order" [21:31:41] we do not do that [21:31:49] but there are other things that we do [21:32:03] like insert ... select with auto_increment fields [21:32:09] well, this is a "must" recommendation for code review [21:32:11] resulting on different ids on some slaves [21:32:17] you are only worried about accidentally letting one through, right? [21:32:23] they are very subtle to detect [21:32:29] anyone who sees such a thing should give -2 [21:32:44] well, -1, I guess, if it is correctable [21:32:57] yes, sure, but the only real solution is row based replication [21:33:11] until I can do that, try to minimize impact [21:33:26] I can create a list of offenders to check [21:33:43] and some if them are in core [21:34:14] I think that is all [21:34:43] We should be able to make core warn/throw an exception on those, assuming they're not just completely raw SQL. [21:34:49] #info We need to be strict with queries, not just schemas. There are nondeterministic write queries running in production -- we know that this happens because now that it happens because we see slave drift on the labs slaves. They are not always easy to spot in code review. [21:34:54] what is the procedure in gerrit btw? do you want to be added as a reviewer on all database-related changes? [21:34:58] actually, sql thouws a warning for those [21:35:00] Assuming it currently supports that. . . [21:35:03] ok [21:35:08] we can detect them dynamicly [21:35:25] #action TimStarling: Anyone who sees such a thing [i.e., nondeterministic write queries] should give -1 / -2 (depending on whether the issue is correctable). [21:35:39] that's not an action [21:35:42] the throw "[Warning] Unsafe statement written to the binary log using statement format since BINLOG_FORMAT = STATEMENT" [21:35:51] TimStarling: yeah, my bad. [21:36:09] do we get these warnings and the bad query sent to the logs on fluorine? [21:36:12] but I think warnings may not be logged, and there may be too many of them? [21:36:16] did anyone respond to Krinkle's "Is there a way to give these options to sql mode in a warning-only way?" [21:36:19] * robla wants to know the answer to TimStarling's question about gerrit procedure [21:36:50] in general, thigs like truncations [21:36:56] produce warnings [21:36:58] too [21:37:05] only errors when in strict mode [21:37:14] so that is a way to check those [21:37:27] ok so someone needs to volunteer to figure out where the logs go and where they should go and how they can be routinely reviewed [21:37:33] jynus: Ok. so we'll need a way for jenkins to extract those warnings from wherever they're sent. We can add them to the assertion that already checks mediawiki exception and warning logs to be empty etc. [21:37:39] #link https://dev.mysql.com/doc/refman/5.0/en/sql-mode.html#sql-mode-strict [21:37:51] I think https://gerrit.wikimedia.org/r/#/c/198661/ is related? "[WIP] Optionally log warnings generated by MySQL" [21:38:01] check the code on the RFC for "Query OK, 1 row affected, 4 warnings" [21:38:14] Tricky bit is that Jenkins slaves share one local db instance. They have separate username/databases, but it's only one server, so no job-specific error log [21:38:23] legoktm: yes, looks like it [21:38:24] But if we can extract them from mediawiki in resposne to a query, that'd be cool [21:38:40] uploaded in march? [21:38:46] we suck [21:39:25] TimStarling: could you review it? :) [21:39:38] * ori itches for an #action. [21:39:52] I added jynus and myself as reviewers [21:40:12] #action TimStarling and jynus to review https://gerrit.wikimedia.org/r/#/c/198661/ (Optionally log warnings generated by MySQL). [21:40:18] \o/ [21:40:30] be careful [21:40:33] jynus: that's what confuses me. the first run in T112637 prints "Query OK, 1 row affected, 4 warnings (0.00 sec)", but no warnings appear. In addition to jenkins, we want a default developer configuration (in MediaWiki-vagrant and in the mw.org debugging page) that loudly prints these warnings [21:40:43] warnings are disabled on mysql's error log [21:40:58] because they may fill up the disks :-) [21:41:10] in the client command line [21:41:11] jynus: ok, so the change proposed should be what you want then [21:41:17] Do we generate a huge number of warnings? [21:41:27] I was going to ask whether we should do it via the server but that sounds like a no [21:41:37] the change proposed is to log via MediaWiki's debug log system [21:41:41] something like enabling it for a server first, check volumne [21:41:56] so we don't actually know what the volume is at the moment? [21:42:07] we can just enable it and see [21:42:09] *size, fix huge issues, before enabling it globally [21:42:21] MW logging supports sampling if its really bad [21:42:22] jynus: MediaWiki can turn log channels on / off and apply a sampling factor to reduce the rate [21:42:35] so it's pretty safe to try it and see [21:42:38] yeah we can manage noisy logs, we have plenty of experience with that [21:42:39] yes, just I am thinking as an op here, being careful [21:42:42] :-) [21:42:55] MediaWiki-Vagrant's config is different than production, though it's reused for some labs machines [21:42:55] spagewmf, command line client [21:43:07] does not show warnings by default [21:43:17] either you SHOW WARNINGS [21:43:28] or change mysql command line client config [21:43:39] I don't mind merging it, but I don't want it to be my job to push it through to production and check log volume and all that [21:43:45] connectors tend to do the same, not returning warnings by default [21:43:48] TimStarling: I can do that part [21:43:56] thanks ori [21:44:15] #action Once https://gerrit.wikimedia.org/r/#/c/198661/ is merged, Ori to deploy and monitor log volume. [21:44:30] does the beta cluster have a slave database server? [21:44:47] yes [21:45:40] but I know that it happens because the slave drift on the labs slaves [21:45:45] (btw the #action command should start with an IRC nickname parameter, meetbot organises action items by name) [21:45:49] do we have monitoring for that?) [21:45:53] yes, ori, there is a ticket [21:46:09] ack [21:46:27] https://phabricator.wikimedia.org/T111371 [21:46:42] labs slaves (sanitarium, actually) [21:46:51] got heavely desynced [21:47:14] ok, was there anything else to discuss? [21:47:21] #link https://phabricator.wikimedia.org/T111371 [21:48:05] we won't need a wrap-up period if we've covered everything already [21:48:15] I think it would be useful to pronounce this formally accepted, to put some muscle behind the enforcement [21:48:37] yes, I would like to add the accepted bits the the mediawiki documentation [21:48:55] jynus: hi, I am Technical Riter :) [21:49:06] :-) [21:49:30] #action spagewmf to update MediaWiki documentation [21:50:32] jynus: really great writeup and effort on T112637 man, thank you [21:50:34] ori: do you think that phab change is enough to make these warnings more visible on MediaWiki-Vagrant instances? [21:51:05] no, there are some additional tweaks can make to the configuration [21:51:11] (maybe it's premature if the volume of warnings is overwhelming) [21:51:25] the volume wouldn't be overwhelming for vagrant [21:51:41] chasemp++ [21:51:41] maybe jynus wants to give the task description another edit before I mark it accepted? [21:51:54] yes, I can do that [21:52:00] ori, deployment-db2 is the slave of deployment-db1 in beta, by the way [21:52:05] ori: great, ping me on the changes and I'll put them in the mw.org debugging page for people not using MW-Vagrant [21:52:07] "Stop fucking with my databases. --jynus" [21:52:10] basically it was changing the session thing [21:52:22] yeah [21:52:31] I think it'd be ok to trust jynus to get this stuff done after its approved, fwiw [21:52:40] ori Krenair : I think wmf-config/db-labs.php is the config [21:52:45] 3-6 moths we said? [21:52:47] yes [21:53:25] let's say no more than 6, and leave the option of making the timeline narrower if the number of issues is not unmanageable [21:53:48] will we need to schedule another of these meetings on this topic? if no, then I will move it to the approved column [21:53:58] (yes was to spagewmf) [21:54:16] I don't think we need to schedule another meeting on this unless jynus wants to clear some things up [21:54:34] no, just help with the review [21:54:47] ok [21:54:50] I think having a check-in meeting sometime in the future would be good, but not to debate the proposal, but evaluate progress. So not a blocker to approval. [21:55:18] (And perhaps these meetings aren't the right venue for check-ins, dunno.) [21:55:23] I look forward to wmgDBJynusNoF___Mode = true in wmf-config [21:55:25] ori++ [21:55:28] #agreed RFC approved, assuming decisions from this meeting will be incorporated into the phabricator task [21:55:47] I did a first edit [21:55:54] 5 minutes to review it :-) [21:56:07] I can add the log thingy, too [21:56:15] nice work jynus! I think this is the first RfC that was entirely done by a member of the TechOps team [21:56:25] so it's an important milestone [21:56:41] yup, excellent job jynus! [21:57:07] are T-shirts still a thing [21:57:18] \o [21:58:01] ha ha [21:58:07] #endmeeting [21:58:07] Meeting ended Wed Sep 16 21:58:07 2015 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) [21:58:07] Minutes: https://tools.wmflabs.org/meetbot/wikimedia-office/2015/wikimedia-office.2015-09-16-21.00.html [21:58:07] Minutes (text): https://tools.wmflabs.org/meetbot/wikimedia-office/2015/wikimedia-office.2015-09-16-21.00.txt [21:58:07] Minutes (wiki): https://tools.wmflabs.org/meetbot/wikimedia-office/2015/wikimedia-office.2015-09-16-21.00.wiki [21:58:08] Log: https://tools.wmflabs.org/meetbot/wikimedia-office/2015/wikimedia-office.2015-09-16-21.00.log.html [21:58:37] I know this was a bit unusual, but the empowerment thing is something that I needed [21:59:18] and I can assure you it will revert back in avoiding security issues very soon