[00:14:37] DBA, SRE, Epic, Performance-Team (Radar), Sustainability (Incident Followup): Decide how to improve parsercache replication, sharding and HA - https://phabricator.wikimedia.org/T133523 (aaron) Playing around with ` mwscript shell.php aawiki ` ...I noticed that SHOW SLAVE STATUS is empty in...
[00:26:23] DBA, SRE, Epic, Performance-Team (Radar), Sustainability (Incident Followup): Decide how to improve parsercache replication, sharding and HA - https://phabricator.wikimedia.org/T133523 (aaron) Ideally the SqlBagOStuff hashing would use HashRing, though any naive transition would involve a lo...
[05:39:38] Blocked-on-schema-change, DBA: Drop default of oldimage.oi_timestamp - https://phabricator.wikimedia.org/T272511 (Marostegui)
[05:39:49] Blocked-on-schema-change, DBA: Drop default of oldimage.oi_timestamp - https://phabricator.wikimedia.org/T272511 (Marostegui) Open→Resolved All done
[05:42:37] DBA, SRE, Epic, Performance-Team (Radar), Sustainability (Incident Followup): Decide how to improve parsercache replication, sharding and HA - https://phabricator.wikimedia.org/T133523 (Marostegui) >>! In T133523#6884890, @aaron wrote: > Playing around with > ` > mwscript shell.php aawiki >...
[05:53:44] DBA, Patch-For-Review, Performance-Team (Radar): Productionize x2 databases - https://phabricator.wikimedia.org/T269324 (Marostegui) Open→Resolved a: Marostegui Thanks @Kormat for getting this last bit done. @Krinkle I believe everything is now in place. Both masters are writable and repli...
[06:52:08] DBA, Patch-For-Review: Productionize db21[45-52] and db11[76-84] - https://phabricator.wikimedia.org/T275633 (Marostegui) db2146 recloned and replication started. Corruption showed up again, which is not entirely unexpected as I copied it from db2145, which was cloned from db2116, which also showed corrup...
[09:01:40] DBA, Patch-For-Review: Productionize db21[45-52] and db11[76-84] - https://phabricator.wikimedia.org/T275633 (Marostegui) db2145 now replicating
[09:14:01] DBA, Patch-For-Review: Productionize db21[45-52] and db11[76-84] - https://phabricator.wikimedia.org/T275633 (Marostegui) And after a restart... corruption
[09:25:06] DBA, Patch-For-Review: Productionize db21[45-52] and db11[76-84] - https://phabricator.wikimedia.org/T275633 (Marostegui) db2116 is now replicating
[10:30:19] DBA, Patch-For-Review: Productionize db21[45-52] and db11[76-84] - https://phabricator.wikimedia.org/T275633 (Marostegui) db2146 is now replicating (cloned from a backup snapshot)
[10:51:02] DBA, Patch-For-Review: Productionize db21[45-52] and db11[76-84] - https://phabricator.wikimedia.org/T275633 (Marostegui) db2145 is now replicating
[10:54:25] DBA, Patch-For-Review: Productionize db21[45-52] and db11[76-84] - https://phabricator.wikimedia.org/T275633 (Marostegui) I am running a check table across all the tables on db2092, db2116, db2145 and db2146
[11:14:43] jynus: do you recall, from memory, how long it takes to restore the s1 logical dump? I might want to test something with it
[11:29:28] DBA, Patch-For-Review: Evaluate the impact of changing innodb_change_buffering to inserts - https://phabricator.wikimedia.org/T263443 (10Marostegui) I have set: `innodb_change_buffering = none` on pc1007 and db1134 temporarily to check their performance.
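The comment above only shows the my.cnf-style syntax for the change-buffering test; as a minimal sketch, the same setting can also be applied and verified at runtime, since it is a dynamic global variable in MariaDB. Whether it was changed dynamically or only in the config file is not stated in the log, and the revert value assumes the MariaDB default of that era ('all'):

```
# Minimal sketch, assuming the runtime route; the log only shows config-file syntax.
sudo mysql -e "SET GLOBAL innodb_change_buffering = 'none';"

# Verify the running value:
sudo mysql -e "SHOW GLOBAL VARIABLES LIKE 'innodb_change_buffering';"

# Revert once the performance comparison is done (assumes 'all' was the previous value):
sudo mysql -e "SET GLOBAL innodb_change_buffering = 'all';"
```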
[11:48:46] s1, cannot say offhand
[11:49:17] I can try to search for how much it took last time
[11:50:17] but is it like 2-3 days? or like 12h?
[11:50:24] I don't even remember the order of magnitude :)
[11:50:44] 12h, but for commons it would be more in the days
[11:50:54] it depends on the table format
[11:51:07] compressed takes like 2-3 times more
[11:51:11] Cool, I might start it today and it should be ready by Monday then
[11:51:27] try to move it first to the ssd
[11:51:45] that way it won't break if a new one is generated
[11:52:22] that's ongoing/, right?
[11:52:28] yes, on dumps
[11:52:37] yep, I will create ongoing/tmp and place it there
[11:52:59] and for enwiki, use like 14 threads or so
[11:53:07] excellent, thanks
[11:55:57] can I use the db2102 (core_test) host and restore the dump there, or are you using it for something else?
[12:00:06] I am currently using it for backup testing
[12:00:43] ok, I will take one of the new racked hosts then
[12:00:50] We've got plenty :)
[12:05:14] I can move the db, but it would take me some time
[12:05:45] nah
[12:05:50] no worries, I am taking db2145
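The restore plan sketched above (copy the dump out of ongoing/ first, then load with ~14 threads for enwiki) would look roughly like the following with generic mydumper/myloader tooling. The paths, dump directory name and target host are assumptions for illustration, and WMF's own backup wrapper scripts may differ:

```
# Sketch only: copy the logical dump out of ongoing/ so a newly generated
# dump cannot replace it mid-restore, then load it with multiple threads.
# Paths and the dump directory name are hypothetical; connection options
# are assumed to come from the default option file.
mkdir -p /srv/tmp
cp -a /srv/backups/dumps/ongoing/dump.s1.2021-02-26 /srv/tmp/

# enwiki is large, so use a high thread count as suggested above (~14).
myloader --directory /srv/tmp/dump.s1.2021-02-26 \
         --threads 14 \
         --host localhost \
         --overwrite-tables
```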
[12:05:56] actually, related questions
[12:06:22] I am thinking of setting up the final db on an ms-backup host
[12:06:32] what final db?
[12:06:53] the mediabackups db, with image backup metadata
[12:07:10] I would use misc, but maybe it would take too many resources
[12:07:24] disk space usage, you mean?
[12:07:27] any suggestion?
[12:07:32] more like iops
[12:08:05] let me see current usage
[12:08:29] it'd be nice to have all the backup-related databases together if that's possible from a resource point of view
[12:08:36] yes
[12:08:52] m1 has backup, dbbackups, and it would be nice to have the mediabackups db too
[12:09:05] and bacula is there too, isn't it?
[12:09:05] but mediabackups is way more resource-intensive
[12:09:23] yes, sorry, with backup I meant bacula9
[12:09:29] ah ok ok
[12:09:44] so currently it has 42GB
[12:10:00] but it will be larger when we have full backups
[12:10:05] like 200GB or so
[12:10:36] disk wise I am not too worried, we've got pleeeenty of space on the master and on db1117
[12:10:46] but you said iops might be a concern?
[12:10:53] yeah, it is iops that worries me
[12:10:56] also performance
[12:11:06] the db is one of the bottlenecks on backup speed
[12:11:32] I will find metrics for you
[12:12:28] sure
[12:16:17] during some of the tests, we reached 4K IOPS: https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2102&var-port=9104&from=1614071600306&to=1614111041717
[12:17:01] so it's not that we cannot achieve that on m1, but maybe it would be too impactful?
[12:17:04] you think the ms-host would be able to handle it? like, does it have better HW?
[12:17:15] no, it will be worse hw
[12:17:31] but 1) less latency, as it will be local for one of the workers
[12:17:45] 2) it will not disturb other running processes on m1
[12:17:57] Yeah, I can see pros and cons for both approaches
[12:18:10] another option would be to move m1's unrelated things to m2 or somewhere else
[12:18:36] Or place it on m5 and move labswiki away for good
[12:19:00] yes, I give you the problem, there are many solutions
[12:19:04] like moving all backup* related things to m5 and moving labswiki away
[12:19:58] the first parametrized thing on the ongoing puppetization: https://gerrit.wikimedia.org/r/c/operations/puppet/+/668380
[12:20:15] is the db location, precisely because it is an open question
[12:20:31] it is not a big issue now
[12:20:43] so, how hard would it be to place it on an ms-host and move it later? so you don't get blocked on this?
[12:20:45] https://gerrit.wikimedia.org/r/c/operations/puppet/+/668380/3/modules/mediabackup/manifests/orchestrator.pp
[12:21:04] ^precisely because of that, very easy
[12:21:38] And the other backup-related databases? from m1 to m5?
[12:21:39] in fact, I am working with db2102 now because it won't impact existing work for now
[12:21:55] I would assume bacula would be a pain, no?
[12:21:57] not very hard, mostly time
[12:22:09] but I wonder if the main blocker for that wouldn't be labswiki?
[12:22:16] Yeah
[12:22:23] We need to move it anyway, sooner rather than later
[12:22:32] And once that is done, m5 is mostly empty
[12:22:49] So maybe we can make m5 the backup misc section
[12:22:49] we can also purchase an m6 for next year
[12:22:53] (and use the proxy again!)
[12:23:17] let's maybe talk Monday
[12:23:28] I think you are now aware of the question
[12:23:46] we can discuss it better then, once we both give it a thought
[12:23:49] sure, let's keep talking about it
[12:23:49] yeah
[12:24:09] I wouldn't want to get blocked on labswiki :-)
[12:24:47] :)
[12:24:57] happily, I think we have enough resources
[12:25:09] it is just a question of distributing them intelligently
[12:25:29] yes, I'd rather not buy an m6
[12:25:36] We have plenty of resources on misc, I think
[12:25:39] yeah
[12:26:08] the main problem is, while bacula or dbbackups being slower is a non-issue
[12:26:34] mediabackups, because of the millions of operations, will need more dedication, at least for the initial run
[12:26:47] yeah, let's give it a thought
[13:25:14] DBA, Patch-For-Review: Productionize db21[45-52] and db11[76-84] - https://phabricator.wikimedia.org/T275633 (Marostegui) I have started a check on db1134
[14:14:54] PROBLEM - MariaDB sustained replica lag on db1143 is CRITICAL: 25.4 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1143&var-port=9104
[14:17:00] RECOVERY - MariaDB sustained replica lag on db1143 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1143&var-port=9104
[14:17:21] downtime expiring?
[14:17:41] ah, no, it is the other check, I thought it was -operations
[15:55:04] Hi database team, I'm stopping mariadb on labsdb1012 to reimage it and rename it to clouddb1021, full procedure is at https://phabricator.wikimedia.org/T269211#6883946
[16:05:27] good luck!
[18:40:55] marostegui: hey, is this file our current ParserCache table schema? https://github.com/wikimedia/mediawiki/blob/master/maintenance/archives/patch-parsercache.sql
[18:41:13] The file doesn't seem to be mentioned in core anywhere, so I'm wondering if it's needed or not
[20:44:11] Amir1: no, that's not the current schema
[20:44:30] This is the current one: https://phabricator.wikimedia.org/P14647
[22:42:06] thanks
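On the ParserCache schema question at the end: the authoritative production DDL is the one in the paste linked above. As a rough illustration only, SqlBagOStuff shards follow the generic MediaWiki objectcache table shape; the shard name, database name and exact column types here are assumptions and may not match P14647:

```
# Illustrative sketch of the classic objectcache/SqlBagOStuff table shape
# used by parsercache shards. The shard name (pc000), database name and
# table options are hypothetical; the real WMF definition is in P14647.
sudo mysql parsercache <<'SQL'
CREATE TABLE pc000 (
  keyname VARBINARY(255) NOT NULL DEFAULT '' PRIMARY KEY,
  value   MEDIUMBLOB,
  exptime DATETIME,
  KEY exptime (exptime)
) ENGINE=InnoDB DEFAULT CHARSET=binary;
SQL
```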