[01:05:19] 10DBA, 10MediaWiki-extensions-Linter: DBA review of Linter extension - https://phabricator.wikimedia.org/T148866#2743664 (10Legoktm) >>! In T148866#2735836, @Volans wrote: > @Legoktm, here a few personal questions/comments, not a DBA review :wink: > > # All the queries against this table will be performed b... [01:10:13] 10DBA, 10MediaWiki-extensions-Linter: DBA review of Linter extension - https://phabricator.wikimedia.org/T148866#2743670 (10Legoktm) >>! In T148866#2737877, @jcrespo wrote: > should the foreign key here a page or a revision Missed this part. It should be a page because it isn't revision. More clearly, the lin... [05:42:32] 10DBA: codfw: Fix S4 commonswiki.templatelinks partitions - https://phabricator.wikimedia.org/T149079#2743863 (10Marostegui) db2058 finished correctly ``` Table: templatelinks Create Table: CREATE TABLE `templatelinks` ( `tl_from` int(8) unsigned NOT NULL DEFAULT '0', `tl_namespace` int(11) NOT NULL... [05:44:51] 10DBA, 13Patch-For-Review: Reimage dbstore2001 as jessie - https://phabricator.wikimedia.org/T146261#2743864 (10Marostegui) Another compression for a bunch of big tables just finished: ``` root@dbstore2001:/srv/sqldata/enwiki# ls -lh externallinks.ibd templatelinks.ibd text.ibd -rw-rw---- 1 mysql mysql 22G Oc... 
[07:08:26] jynus: I have good news regarding multisource + gtid :) [07:08:47] jynus: I need to do a few more tests to make sure it will actually work in our scenario [07:09:02] I will update the ticket with the findings :) [07:09:40] ok [07:15:20] 10DBA: codfw: Fix S4 commonswiki.templatelinks partitions - https://phabricator.wikimedia.org/T149079#2744015 (10Marostegui) db2051 is now running the ALTER [07:18:33] 10DBA, 10MediaWiki-extensions-Linter: DBA review of Linter extension - https://phabricator.wikimedia.org/T148866#2744016 (10jcrespo) > I just added an autoincrement primary key If you are going to write and delete multiple rows every time things are parsed, an autoincrement will likely overflow, and not sure h... [08:48:41] 10DBA, 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: Implement a frontend failover solution for labsdb replicas - https://phabricator.wikimedia.org/T141097#2744088 (10jcrespo) [08:48:43] 10DBA, 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: Implement proxysql both for labs and for later production usage - https://phabricator.wikimedia.org/T148500#2744086 (10jcrespo) 05Open>03Resolved This technically works, the only thing missing is having multiple proxysql instances per server an... [09:01:27] 10DBA, 06Labs, 10Labs-Infrastructure: Move dbproxy1010 and dbproxy1011 to labs-support network, rename them to labsdbproxy1001 and labsdbproxy1002 - https://phabricator.wikimedia.org/T149170#2744093 (10jcrespo) [09:01:38] ^ [09:04:57] :) [09:05:58] get a cup of coffee, I am about to update the ticket now, with a long text [09:06:01] :p [09:10:17] 10DBA, 13Patch-For-Review: Reimage dbstore2001 as jessie - https://phabricator.wikimedia.org/T146261#2744128 (10Marostegui) I have started the tests from scratch in a completely new vm. (Conclusions at the end of the post) This is what I did: Started the three instances with a gtid_domain_id from the start (... 
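jcrespo's overflow concern in the Linter review above can be made concrete with a quick back-of-the-envelope calculation. This is only a sketch: the churn figures below are hypothetical, and the Linter table's actual column type is not stated here.

```python
# Rough estimate of how quickly an AUTO_INCREMENT column is exhausted
# when rows are deleted and re-inserted on every parse, as raised in
# the Linter review. Deleting rows does not give the ids back, so the
# counter only ever grows. Both rate figures below are hypothetical,
# chosen purely for illustration.

UNSIGNED_INT_MAX = 2**32 - 1        # MySQL INT UNSIGNED ceiling
ROWS_PER_REPARSE = 50               # hypothetical lint rows written per parse
REPARSES_PER_DAY = 2_000_000        # hypothetical parses/day across all wikis

ids_burned_per_day = ROWS_PER_REPARSE * REPARSES_PER_DAY
days_until_overflow = UNSIGNED_INT_MAX / ids_burned_per_day

print(f"~{days_until_overflow:.0f} days until INT UNSIGNED overflows")
```

Under these made-up rates the counter runs out in well under two months, which is why a BIGINT or a reuse-friendly scheme tends to come up in such reviews.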
[09:11:09] 10DBA, 10CirrusSearch, 06Discovery, 06Discovery-Search (Current work), and 2 others: MySQL chooses poor query plan for link counting query - https://phabricator.wikimedia.org/T143932#2744143 (10jcrespo) @EBernhardson Did you check if this worked? If yes, we should move it back from vslow to general traffic... [09:16:16] 10DBA, 06Labs, 10Labs-Infrastructure: Move dbproxy1010 and dbproxy1011 to labs-support network, rename them to labsdbproxy1001 and labsdbproxy1002 - https://phabricator.wikimedia.org/T149170#2744167 (10Marostegui) From my side - I am fine with the hostnames. [09:16:52] well, it is just that we need to discuss with labs the details [09:17:19] with my findings about proxysql and the actual organization/seamless failover [09:17:22] Sure, we can do that in the meeting about the proxy that you want to organize for when chase is back [09:17:27] exactly [09:42:31] https://i.imgflip.com/1czys5.jpg [09:43:00] I know I know [09:43:02] I am guilty [09:43:04] XD [09:43:14] I will clean them up XD [09:43:29] I will be doing T139090 [09:43:30] T139090: Deploy I2b042685 to all databases - https://phabricator.wikimedia.org/T139090 [09:43:52] Your fav ticket :) [09:43:52] I already put it on Deployments and will log it when it starts [09:44:07] \o/ [09:48:44] say what you want, but those patches on a regular mediawiki installation will not work [09:49:14] this will fail on any installation: https://gerrit.wikimedia.org/r/#/c/190774/3/maintenance/archives/patch-il_from_namespace.sql [09:49:46] why? 
[09:50:05] it is overwriting a migration, you never do that [09:50:30] you just create a new migration dropping the index and creating another [09:50:51] maybe that was never released [09:50:54] I do not know [09:58:38] imagelinks is 10 times smaller than the other altered tables, it should* be fast [10:20:14] This panel is really nice to know how long an alter table took: https://grafana.wikimedia.org/dashboard/db/mysql?panelId=37&fullscreen&var-dc=codfw%20prometheus%2Fops&var-server=db2042&from=1477475529338&to=1477476827479 [10:20:48] oh that is nice [10:21:05] I was actually looking for something (in tendril) to see past queries to see how long db2058's alter took [10:21:15] It is clearer if you click on the legend "alter table" [10:21:25] *altering table [10:21:44] long running queries could be another way [10:22:50] 20 minutes per host, ~100 hosts, around 2 days? [10:23:04] probably it will not scale linearly for s3 [10:24:04] but we could create a dashboard for mysql ongoing changes [10:24:37] BTW, I just created https://grafana.wikimedia.org/dashboard/db/mysql-replication-lag [10:25:21] es and pc hosts, despite doing cross-dc replication never go over 1 second of lag [10:25:44] which confirms it is not a technology problem, only an application problem [10:29:09] Ah I see [10:29:20] We had that kind of thing on the graphs [10:29:23] in Tuenti I believe [10:29:38] and it was matching the time when the master was flushing logs and the slaves getting that [10:29:44] Just for the record [10:29:52] They were actually never showing 1 second lag [10:29:58] or we couldn't capture it :) [10:32:45] github showed off how they could do schema changes and control lag with 0.1 seconds accuracy [10:41:35] 10DBA, 06Labs, 10Labs-Infrastructure: Move dbproxy1010 and dbproxy1011 to labs-support network, rename them to labsdbproxy1001 and labsdbproxy1002 - https://phabricator.wikimedia.org/T149170#2744292 (10jcrespo) a:03jcrespo [10:48:12] 
https://grafana.wikimedia.org/dashboard/db/mysql?var-dc=codfw%20prometheus%2Fops&var-server=db2058&from=1477389599998&to=1477475999997 [10:48:25] Interesting there the ALTER doesn't appear on the graph if I click the blue line [10:48:28] but the tmp tables do [10:51:23] ah [10:51:55] that is because it is processlist monitoring, so it changes the state from alter to creating tmp table [10:52:03] it is too granular [10:52:10] Aaaaaah [10:52:16] Right right [10:52:33] we will create some graphs for queries [10:52:53] but it has to be private, as you saw when we wanted to enable that kind of monitoring [10:55:11] marostegui, https://tendril.wikimedia.org/report/slow_queries_checksum?checksum=f75f8bd7f8201bc9d69f8782ce7d6813&host=^db2058&user=root&schema=&hours=48 should work for now [10:57:02] Ah I see [10:57:13] over 13 hours [10:57:16] However, that is strange, as it took more than 8 hours to run [11:10:19] It is interesting to see how such an alter increases response time from 30ms to 6 seconds; and disk IO read latency from 52 ms to 2 seconds - it is really taxing [11:57:07] hey [11:58:19] https://phabricator.wikimedia.org/T148822 [11:58:50] if the FCGI limit doesn't seem to work, and the mysqli interface doesn't offer timeouts as suggested, then it doesn't really sound like something other teams can really help with [12:00:58] so could we do something on the infrastructure side? [12:01:36] do you have any suggestion on how to proceed, or just let it go (because mysql limits are already enforced)? 
[12:02:24] i think we should test why the fcgi limit doesn't work, within Ops I guess [12:02:30] ok [12:02:41] that's more of an ops issue than a dev one [12:02:58] that can be done on a lower priority task [12:03:28] I will open a new one with clearer goals [12:03:45] and close this one with is query specific [12:03:50] *which is [12:04:48] yeah [12:04:53] let's discuss it on monday [12:04:56] sure [12:07:18] <_joe_> @dbas: I am decommissioning mw1152 and found that there are some grants defined for it, while they're missing for wasat [12:07:21] <_joe_> https://gerrit.wikimedia.org/r/#/c/318077/1/templates/mariadb/grants-wikitech.sql.erb [12:07:40] <_joe_> is that enough or should I also perform the new grants/remove the old ones manually? [12:08:15] sorry, please remind me what wasat is? [12:08:22] <_joe_> terbium in codfw [12:09:03] please deploy the change as is [12:09:23] <_joe_> ok! [12:09:43] I will do a general check of the actual grants on the server [12:10:03] <_joe_> thanks [12:12:26] 10DBA: Check wikitech has the right db grants - https://phabricator.wikimedia.org/T149186#2744605 (10jcrespo) [12:13:50] _joe_, one thing [12:14:09] <_joe_> yes? 
[12:14:11] I think access is already granted on the firewall for those servers [12:14:25] <_joe_> It's done in another point of the patch [12:14:34] yes, what I mean is [12:14:40] that may not be necessary [12:14:58] I will check that, too [12:15:10] you can deploy that anyway, it will not be [12:15:15] harmful [12:15:30] just redundant [12:15:33] <_joe_> yep, if it's redundant, we can just yank that out [12:15:45] we can do that in a later patch [12:15:52] so I do not block the rest [12:22:16] I am glad we got rid of tokudb for the user_groups tables, mariadb has 0 interest in the bug we opened, they assigned it to someone who hasn't even replied [12:22:27] very disappointing [12:26:14] "MariaDB Foundation followed you" just now on my personal twitter [12:27:13] what did you do XDD [12:38:57] 10DBA: Fix PK on S5 dewiki.revision - https://phabricator.wikimedia.org/T148967#2744633 (10Marostegui) I am running this on dbstore2002: ``` ./software/dbtools/osc_host.sh --host=dbstore2002.codfw.wmnet --port=3306 --db=dewiki --table=revision --method=ddl --no-replicate "DROP index rev_id, DROP PRIMARY KEY, A... [13:01:50] jynus: I'm ready for a chat whenever you are! [13:01:57] addshore, nice [13:02:05] I just returned from lunch [13:02:13] perfect! [13:02:14] let me open the task [13:03:19] so the main blocker is "is this table private" which you already answered [13:03:32] I have some questions about foreign keys [13:03:58] I will go in order, it will be easier [13:04:06] okay! [13:04:22] cgti_site: nothing to object to, now that I understand you only want to create one of these [13:04:34] (I thought first you wanted to create one per wiki) [13:04:52] nope, just 1 table for all wiktionaries, and wiktionaries are also the only use case we have! [13:04:59] before we continue [13:05:02] where? [13:05:15] where? 
[13:05:40] yes, as in, which set of servers you think at first this table should be in [13:05:52] because you will not be able to join it with all wikis [13:06:06] In theory it can be anywhere! No joins are needed. [13:06:16] I have a suggestion, but maybe you had a first idea [13:06:27] normally, shared content between wikis [13:06:32] goes to the x1 shard [13:06:47] yeh, I think that is where I was thinking (but of course it's up to you) [13:06:48] which I think is where wikishared is [13:07:02] does that make sense? [13:07:04] yup [13:07:26] it is just that you will have to change the code or the config slightly [13:07:41] not sure how the shard is controlled right now [13:07:44] Yup, the cluster & db are already configurable in the extension (I saw this coming) [13:07:49] ok, good [13:08:11] now this is the part I do not fully understand [13:08:23] namespace and title [13:08:27] and key [13:08:46] I probably need to understand what these refer to [13:08:57] okay, so namespace and title are exactly namespace & dbkey title as in the regular page table [13:09:01] is this really referring to a title or a page or something else? [13:09:08] ok [13:09:10] let's stop here [13:09:14] the question is [13:09:19] why not page_id? [13:09:57] because if the page is renamed, maybe it will go wrong? [13:09:58] well, page_id would require joins for all queries (then this table would need to be able to be joined against all page tables) [13:10:17] the table will get updated on page moves, deletes, creates etc. [13:10:26] so I do not care as much about size [13:10:34] but for consistency [13:11:24] I think in theory having page_id instead of the ns & title should be fine [13:11:30] we can go back to that later [13:11:36] as long as it can be somewhere that the joins are possible. [13:11:39] the other question is [13:11:54] what is cgti_key, is it an arbitrary key? 
[13:12:23] en, 0, Berlin, Berlin / fr, 0, Berlin, Berlin is too small of a sample to really understand [13:12:39] can you give an example where those differ? [13:12:45] so, it is a slightly normalized version of the title, for example normalizing various apostrophes, let me come up with a better example, 2 secs [13:13:19] yes, think about the title "house" [13:13:31] So "Foo’bar" would be normalized to "Foo'bar" [13:13:34] fr:maison es:casa [13:13:53] or is it a term including a language? [13:14:19] e.g. en:house / fr:house ? [13:14:22] So, the point of the extension is to link pages on different wiktionaries that have the same title, as in literally the same title. [13:14:30] so yes, en:house -> fr:house [13:14:34] ok [13:14:43] now I am getting it a bit better [13:14:59] my suggestion would be [13:15:04] to put there an id [13:15:06] So on some sites they use different quotes, and other chars, hence the normalization step. [13:15:09] and a separate table [13:15:16] with id normalized_term [13:15:24] then you can join [13:15:31] hmm, okay, yup, makes sense [13:15:32] but the same term is not repeated 27 times [13:15:41] note we are using here [13:15:43] house [13:15:57] but beware of "Supercalifagilisticexpialidosus" [13:16:00] :-) [13:16:10] yup, I like this idea. [13:16:27] if later you want to add more metadata [13:16:34] it is easier on a separate table [13:16:52] plus it makes things smaller -> easier to make other changes, too [13:17:09] back to the title [13:17:16] if the second is normalized [13:17:28] I do not have much of a problem with the title not being [13:17:32] if that helps [13:17:38] In your opinion would it also be a great idea to reference the site by id and then have a separate table containing the site data needed by the extension? 
[13:17:40] fewer connections, fewer queries [13:17:54] addshore, if it was on core, yes [13:18:06] but being an extension, and the id being so small [13:18:11] I think it is not worth it [13:18:14] okay [13:18:26] I can accept the denormalization of site and title [13:18:37] How about the second version of the schema that includes cgti_site and cgti_interwiki? [13:18:44] let me see [13:19:03] In this case _site would be say "enwiktionary" so the dbname, and _interwiki would be "en" [13:19:09] assuming this is only for wiktionaries [13:19:12] yup [13:19:17] I do not think that is needed [13:19:45] Yeh, so currently the code has a nasty assumption in it, but it should work for all wiktionary cases so far. [13:19:58] offtopic, but why don't interwikis/wikidata work here? [13:20:04] that assumption is that $wgLanguageCode + $wgCognateGroup = $dbname [13:20:38] yes, but I do not think that is a horrible assumption, and could be fixed later [13:20:44] if you wanted the second option [13:20:46] So, because it is a different type of data, and the pages that should be linked are actually the same words, using wikibase/wikidata is a bit overkill [13:20:53] I would definitely normalize [13:21:04] site_id [13:21:07] etc. [13:21:31] another supposition [13:21:35] would be [13:21:41] okay! I just have to go through the current wiktionaries and ensure that assumption actually holds true so far! :) [13:21:42] if this is only for terms [13:21:53] not for discussions, etc. [13:22:02] why put the namespace in the first place? [13:22:17] is there content outside of the main namespace on such wikis? 
[13:22:17] So, there are a couple of other namespaces that we want to cover [13:22:21] not discussions though [13:22:23] ok [13:22:37] I am not a wiktionary guy myself [13:22:44] *too much of a [13:23:04] so here are a couple of general recommendations [13:23:20] keep the sizes per row low - normalize where it makes sense [13:23:38] and keep the PK low in size [13:24:16] maybe the PK is not right in this case? [13:24:51] cgti_site (normalized or not), cgti_key (normalized) may be a better option? [13:25:17] That is not 100% unique though :/ [13:25:19] then you can create a secondary key with site, namespace and title [13:25:38] but the keys are arbitrary! [13:25:53] you create those - you can create a different one per namespace [13:26:07] think: en 0 house [13:26:13] the keys are not arbitrary! [13:26:16] and en 301 house [13:26:26] key is 1234567 [13:26:35] nooo, in that case key is "house" [13:26:38] 1234567 0 house [13:26:53] 1234568 301 house [13:27:13] addshore, you just said separating the key was a good idea! [13:27:17] so if a site has 2 articles "Foo…" and "Foo..." they will both be normalised to a key of "Foo..." [13:27:42] yes, which will have an internal id 1234235234 [13:27:57] and you will put that on the original table [13:28:09] when searching for Foo... [13:28:20] Oooh, true, I was thinking both cases would refer to the same row in the normalized table! [13:28:24] you will get a key and join with the table, giving you several articles [13:28:43] the relationship is up to you [13:28:56] I would suggest creating another draft? [13:29:05] so it doesn't get too complex? [13:29:12] Okay, I'll work on what has been discussed over the coming days and get back to you (probably next week) [13:29:18] no rush [13:29:30] hope that is helpful and makes sense [13:29:35] Yup! :) [13:29:39] and does not create too much overhead for you [13:29:56] I can put a summary on the ticket if you need it [13:30:00] Nope, I have a flight on Monday and will likely look at it all then! 
[13:30:05] Yes, a summary in the ticket would be great! [13:30:12] great, doing now [13:43:43] 10DBA, 10Cognate, 10Wikidata, 15User-Addshore: Initial Cognate DB review - https://phabricator.wikimedia.org/T148988#2744793 (10jcrespo) So, with the feedback @addshore gave me on IRC, I would suggest: Converting, in the first case: ``` CREATE TABLE IF NOT EXISTS /*_*/cognate_titles ( cgti_site VARBINA... [13:48:19] 10DBA, 10Cognate, 10Wikidata, 15User-Addshore: Initial Cognate DB review - https://phabricator.wikimedia.org/T148988#2744802 (10jcrespo) p:05Triage>03Normal a:03jcrespo I will put this in progress, but I will be waiting for further feedback. [13:51:26] 10DBA: Check wikitech has the right db grants - https://phabricator.wikimedia.org/T149186#2744826 (10jcrespo) a:03jcrespo [14:32:41] 10DBA, 13Patch-For-Review: Reimage dbstore2001 as jessie - https://phabricator.wikimedia.org/T146261#2744973 (10Marostegui) I have finished importing s4 commonswiki and it is now replicating fine. So there is now s1, s3 and s4 replicating. Obviously not using GTID ``` root@dbstore2001:/srv# mysql --skip-ssl -... [15:36:27] 10DBA, 10MediaWiki-Special-pages, 13Patch-For-Review: Filtering Special:NewPages by page size - https://phabricator.wikimedia.org/T12817#2745183 (10jcrespo) I've tested this query on all shards, several wikis, with special attention to all large wikis (enwiki, dewiki, commons, wikidata, frwiki, zhwiki) both... [15:42:34] 10DBA, 10MediaWiki-API, 07Performance: Certain ApiQueryRecentChanges::run api query is too slow, slowing down dewiki - https://phabricator.wikimedia.org/T149077#2745199 (10Anomie) I wouldn't bother trying to make Database:select somehow magically do it, I'd just build the subqueries with Database::selectSQLT... 
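The two-table layout that came out of the Cognate conversation above — a titles table holding the raw (site, namespace, title) plus a pointer into a separate table of normalized keys — can be mocked up with sqlite3. The cognate_titles name and cgti_ prefix follow the draft quoted in the ticket; the key table, its cgki_ columns, and the normalization rule here are illustrative assumptions, not the reviewed schema:

```python
import sqlite3

# Sketch of the suggested layout: each normalized key is stored once,
# and titles on every site point at it, so "house" on 27 wiktionaries
# shares a single key row. cgki_* names and normalize() are made up
# for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cognate_keys (
    cgki_id  INTEGER PRIMARY KEY,
    cgki_key TEXT NOT NULL UNIQUE
);
CREATE TABLE cognate_titles (
    cgti_site      TEXT NOT NULL,
    cgti_namespace INTEGER NOT NULL,
    cgti_title     TEXT NOT NULL,
    cgti_key       INTEGER NOT NULL REFERENCES cognate_keys(cgki_id),
    PRIMARY KEY (cgti_site, cgti_namespace, cgti_title)
);
""")

def normalize(title):
    # Hypothetical normalization: fold typographic variants together.
    return title.replace("\u2019", "'").replace("\u2026", "...")

def add_title(site, ns, title):
    key = normalize(title)
    conn.execute("INSERT OR IGNORE INTO cognate_keys (cgki_key) VALUES (?)",
                 (key,))
    (key_id,) = conn.execute("SELECT cgki_id FROM cognate_keys WHERE cgki_key = ?",
                             (key,)).fetchone()
    conn.execute("INSERT INTO cognate_titles VALUES (?, ?, ?, ?)",
                 (site, ns, title, key_id))

# "Foo…" and "Foo..." collapse to one key but stay distinct titles --
# the collision case discussed near the end of the conversation.
add_title("enwiktionary", 0, "Foo\u2026")
add_title("enwiktionary", 0, "Foo...")
add_title("frwiktionary", 0, "Foo...")

# Looking up a key joins back to every matching page on every site.
rows = conn.execute("""
    SELECT t.cgti_site, t.cgti_title
    FROM cognate_titles t
    JOIN cognate_keys k ON k.cgki_id = t.cgti_key
    WHERE k.cgki_key = 'Foo...'
    ORDER BY t.cgti_site, t.cgti_title
""").fetchall()
print(rows)  # three title rows sharing one normalized key
```

This is what "you will get a key and join with the table, giving you several articles" looks like in practice: the key table stays small, and the relationship between visually equivalent titles is resolved at query time.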
[15:56:12] 10DBA, 10CirrusSearch, 06Discovery, 06Discovery-Search (Current work), and 2 others: CirrusSearch SQL query for locating pages for reindex performs poorly - https://phabricator.wikimedia.org/T147957#2710120 (10Anomie) FWIW, while EXPLAIN shows the sane plan, SHOW EXPLAIN on the actual running query doesn't... [16:01:58] 10DBA, 06Labs: Make watchlist table available as curated foo_p.watchlist_count on labsdb - https://phabricator.wikimedia.org/T59617#2745255 (10Dispenser) Several short comings compared to the previous implementation # No active watcher count ([[https://www.mediawiki.org/wiki/API:Info|visitingwatchers]], i.e.... [16:16:48] 10DBA, 06Operations, 10ops-codfw: db2011 disk media errors - https://phabricator.wikimedia.org/T149099#2745287 (10Papaul) @RobH I think we are okay on using the disks from the decommissioned es servers. Please see below for disk information Dell ST3600057SS 3.5: SAS 15K [16:17:59] 10DBA, 06Operations, 10ops-codfw: db2011 disk media errors - https://phabricator.wikimedia.org/T149099#2745290 (10RobH) I agree, the 15k and the larger size typically means they can replace smaller capacity disks without issues. Since they are larger, they'll likely be re-added to the raid array and only ma... [16:29:30] 10DBA, 06Operations, 10ops-codfw: db2011 disk media errors - https://phabricator.wikimedia.org/T149099#2745352 (10jcrespo) I am cool with this, this worked last time we tried. [16:54:12] 10DBA, 06Operations, 10ops-codfw: db2011 disk media errors - https://phabricator.wikimedia.org/T149099#2745447 (10Papaul) Disk replacement complete [16:59:44] 10DBA: Check wikitech has the right db grants - https://phabricator.wikimedia.org/T149186#2744605 (10Krenair) Even if the grant has been updated, looks like there's some firewall issues. [17:02:34] 10DBA: Check wikitech has the right db grants - https://phabricator.wikimedia.org/T149186#2745460 (10jcrespo) The grants have not been updated yet (probably). 
[17:03:08] 10DBA, 06Operations, 10ops-codfw: db2011 disk media errors - https://phabricator.wikimedia.org/T149099#2745461 (10jcrespo) 05stalled>03Open a:03jcrespo The disks are unconfigured, they need to be put into the RAID still: ``` Raw Size: 558.911 GB [0x45dd2fb0 Sectors] Non Coerced Size: 558.411 GB [0x45c... [17:06:32] 10DBA, 06Operations, 10ops-codfw: db2011 disk media errors - https://phabricator.wikimedia.org/T149099#2745478 (10jcrespo) p:05Triage>03Normal [17:45:17] 10DBA: Fix PK on S5 dewiki.revision - https://phabricator.wikimedia.org/T148967#2745635 (10Marostegui) dbstore2002 is done ``` MariaDB DBSTORE localhost dewiki > show create table revision; +----------+---------------------------------------------------------------------------------------------------------- | T... [21:05:27] tendril is public now? [21:07:36] no it is not [21:45:26] 10DBA, 10Beta-Cluster-Infrastructure, 10Continuous-Integration-Infrastructure, 10MediaWiki-Database, 07WorkType-NewFunctionality: Enable MariaDB/MySQL's Strict Mode - https://phabricator.wikimedia.org/T108255#2746607 (10Krinkle)