[00:14:37] DBA, SRE, Epic, Performance-Team (Radar), Sustainability (Incident Followup): Decide how to improve parsercache replication, sharding and HA - https://phabricator.wikimedia.org/T133523 (aaron) Playing around with ` mwscript shell.php aawiki ` ...I noticed that SHOW SLAVE STATUS is empty in...
[00:26:23] DBA, SRE, Epic, Performance-Team (Radar), Sustainability (Incident Followup): Decide how to improve parsercache replication, sharding and HA - https://phabricator.wikimedia.org/T133523 (aaron) Ideally the SqlBagOStuff hashing would use HashRing, though any naive transition would involve a lo...
[05:39:38] Blocked-on-schema-change, DBA: Drop default of oldimage.oi_timestamp - https://phabricator.wikimedia.org/T272511 (Marostegui)
[05:39:49] Blocked-on-schema-change, DBA: Drop default of oldimage.oi_timestamp - https://phabricator.wikimedia.org/T272511 (Marostegui) Open→Resolved All done
[05:42:37] DBA, SRE, Epic, Performance-Team (Radar), Sustainability (Incident Followup): Decide how to improve parsercache replication, sharding and HA - https://phabricator.wikimedia.org/T133523 (Marostegui) >>! In T133523#6884890, @aaron wrote: > Playing around with > ` > mwscript shell.php aawiki >...
[05:53:44] DBA, Patch-For-Review, Performance-Team (Radar): Productionize x2 databases - https://phabricator.wikimedia.org/T269324 (Marostegui) Open→Resolved a: Marostegui Thanks @Kormat for getting this last bit done. @Krinkle I believe everything is now in place. Both masters are writable and repli...
[06:52:08] DBA, Patch-For-Review: Productionize db21[45-52] and db11[76-84] - https://phabricator.wikimedia.org/T275633 (Marostegui) db2146 recloned and replication started. Corruption showed up again, which is not entirely unexpected as I copied it from db2145, which was cloned from db2116, which also showed corrup...
[09:01:40] DBA, Patch-For-Review: Productionize db21[45-52] and db11[76-84] - https://phabricator.wikimedia.org/T275633 (Marostegui) db2145 now replicating
[09:14:01] DBA, Patch-For-Review: Productionize db21[45-52] and db11[76-84] - https://phabricator.wikimedia.org/T275633 (Marostegui) And after a restart... corruption
[09:25:06] DBA, Patch-For-Review: Productionize db21[45-52] and db11[76-84] - https://phabricator.wikimedia.org/T275633 (Marostegui) db2116 is now replicating
[10:30:19] DBA, Patch-For-Review: Productionize db21[45-52] and db11[76-84] - https://phabricator.wikimedia.org/T275633 (Marostegui) db2146 is now replicating (cloned from a backup snapshot)
[10:51:02] DBA, Patch-For-Review: Productionize db21[45-52] and db11[76-84] - https://phabricator.wikimedia.org/T275633 (Marostegui) db2145 is now replicating
[10:54:25] DBA, Patch-For-Review: Productionize db21[45-52] and db11[76-84] - https://phabricator.wikimedia.org/T275633 (Marostegui) I am running a check table across all the tables on db2092, db2116, db2145 and db2146
[11:14:43] jynus: do you recall, from memory, how long it takes to restore the s1 logical dump? I might want to test something with it
[11:29:28] DBA, Patch-For-Review: Evaluate the impact of changing innodb_change_buffering to inserts - https://phabricator.wikimedia.org/T263443 (10Marostegui) I have set: `innodb_change_buffering = none` on pc1007 and db1134 temporarily to check their performance.
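The comment above only shows the my.cnf-style syntax for the change-buffering test; as a minimal sketch, the same setting can also be applied and verified at runtime, since it is a dynamic global variable in MariaDB. Whether it was changed dynamically or only in the config file is not stated in the log, and the revert value assumes the MariaDB default of that era ('all'):

```
# Minimal sketch, assuming the runtime route; the log only shows config-file syntax.
sudo mysql -e "SET GLOBAL innodb_change_buffering = 'none';"

# Verify the running value:
sudo mysql -e "SHOW GLOBAL VARIABLES LIKE 'innodb_change_buffering';"

# Revert once the performance comparison is done (assumes 'all' was the previous value):
sudo mysql -e "SET GLOBAL innodb_change_buffering = 'all';"
```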
[11:48:46] s1, cannot say offhand
[11:49:17] I can try to search for how much it took last time
[11:50:17] but is it like 2-3 days? or like 12h?
[11:50:24] I don't even remember the order of magnitude :)
[11:50:44] 12h, but for commons it would be more in the days
[11:50:54] it depends on the table format
[11:51:07] compressed takes like 2-3 times more
[11:51:11] Cool, I might start it today and it should be ready by Monday then
[11:51:27] try to move it first to the ssd
[11:51:45] that way it won't break if a new one is generated
[11:52:22] that's ongoing/, right?
[11:52:28] yes, on dumps
[11:52:37] yep, I will create ongoing/tmp and place it there
[11:52:59] and for enwiki, use like 14 threads or so
[11:53:07] excellent, thanks
[11:55:57] can I use the db2102 (core_test) host and restore the dump there, or are you using it for something else?
[12:00:06] I am currently using it for backup testing
[12:00:43] ok, I will take one of the new racked hosts then
[12:00:50] We've got plenty :)
[12:05:14] I can move the db, but it would take me some time
[12:05:45] nah
[12:05:50] no worries, I am taking db2145
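The restore plan sketched above (copy the dump out of ongoing/ first, then load with ~14 threads for enwiki) would look roughly like the following with generic mydumper/myloader tooling. The paths, dump directory name and target host are assumptions for illustration, and WMF's own backup wrapper scripts may differ:

```
# Sketch only: copy the logical dump out of ongoing/ so a newly generated
# dump cannot replace it mid-restore, then load it with multiple threads.
# Paths and the dump directory name are hypothetical; connection options
# are assumed to come from the default option file.
mkdir -p /srv/tmp
cp -a /srv/backups/dumps/ongoing/dump.s1.2021-02-26 /srv/tmp/

# enwiki is large, so use a high thread count as suggested above (~14).
myloader --directory /srv/tmp/dump.s1.2021-02-26 \
         --threads 14 \
         --host localhost \
         --overwrite-tables
```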
[12:05:56] actually, related questions
[12:06:22] I am thinking of setting up the final db on an ms-backup host
[12:06:32] what final db?
[12:06:53] the mediabackups db, with image backup metadata
[12:07:10] I would use misc, but maybe it would take too many resources
[12:07:24] disk space usage, you mean?
[12:07:27] any suggestion?
[12:07:32] more like iops
[12:08:05] let me see current usage
[12:08:29] it'd be nice to have all the backup-related databases together if that's possible from a resource point of view
[12:08:36] yes
[12:08:52] m1 has backup, dbbackups, and it would be nice to have the mediabackups db too
[12:09:05] and bacula is there too, isn't it?
[12:09:05] but mediabackups is way more resource-intensive
[12:09:23] yes, sorry, with backup I meant bacula9
[12:09:29] ah ok ok
[12:09:44] so currently it has 42GB
[12:10:00] but it will be larger when we have full backups
[12:10:05] like 200GB or so
[12:10:36] disk wise I am not too worried, we've got pleeeenty of space on the master and on db1117
[12:10:46] but you said iops might be a concern?
[12:10:53] yeah, it is iops that worries me
[12:10:56] also performance
[12:11:06] the db is one of the bottlenecks on backup speed
[12:11:32] I will find metrics for you
[12:12:28] sure
[12:16:17] during some of the tests, we reached 4K IOPS: https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2102&var-port=9104&from=1614071600306&to=1614111041717
[12:17:01] so it's not that we cannot achieve that on m1, but maybe it would be too impactful?
[12:17:04] you think the ms-host would be able to handle it? like, does it have better HW?
[12:17:15] no, it will be worse hw
[12:17:31] but 1) less latency, as it will be local for one of the workers
[12:17:45] 2) it will not disturb other running processes on m1
[12:17:57] Yeah, I can see pros and cons for both approaches
[12:18:10] another option would be to move m1's unrelated things to m2 or somewhere else
[12:18:36] Or place it on m5 and move labswiki away for good
[12:19:00] yes, I give you the problem, there are many solutions
[12:19:04] like moving all backup* related things to m5 and moving labswiki away
[12:19:58] the first parametrized thing on the ongoing puppetization: https://gerrit.wikimedia.org/r/c/operations/puppet/+/668380
[12:20:15] is the db location, precisely because it is an open question
[12:20:31] it is not a big issue now
[12:20:43] so, how hard would it be to place it on an ms-host and move it later? so you don't get blocked on this?
[12:20:45] https://gerrit.wikimedia.org/r/c/operations/puppet/+/668380/3/modules/mediabackup/manifests/orchestrator.pp
[12:21:04] ^precisely because of that, very easy
[12:21:38] And the other backup-related databases? from m1 to m5?
[12:21:39] in fact, I am working with db2102 now because it won't impact existing work for now
[12:21:55] I would assume bacula would be a pain, no?
[12:21:57] not very hard, mostly time
[12:22:09] but I wonder if the main blocker for that wouldn't be labswiki?
[12:22:16] Yeah
[12:22:23] We need to move it anyway, sooner rather than later
[12:22:32] And once that is done, m5 is mostly empty
[12:22:49] So maybe we can make m5 the backup misc section
[12:22:49] we can also purchase an m6 for next year
[12:22:53] (and use the proxy again!)
[12:23:17] let's maybe talk Monday
[12:23:28] I think you are now aware of the question
[12:23:46] we can discuss it better then, once we both give it a thought
[12:23:49] sure, let's keep talking about it
[12:23:49] yeah
[12:24:09] I wouldn't want to get blocked on labswiki :-)
[12:24:47] :)
[12:24:57] happily, I think we have enough resources
[12:25:09] it is just a question of distributing them intelligently
[12:25:29] yes, I'd rather not buy an m6
[12:25:36] We have plenty of resources on misc, I think
[12:25:39] yeah
[12:26:08] the main problem is, while bacula or dbbackups being slower is a non-issue
[12:26:34] mediabackups, because of the millions of operations, will need more dedication, at least for the initial run
[12:26:47] yeah, let's give it a thought
[13:25:14] DBA, Patch-For-Review: Productionize db21[45-52] and db11[76-84] - https://phabricator.wikimedia.org/T275633 (Marostegui) I have started a check on db1134
[14:14:54] PROBLEM - MariaDB sustained replica lag on db1143 is CRITICAL: 25.4 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1143&var-port=9104
[14:17:00] RECOVERY - MariaDB sustained replica lag on db1143 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1143&var-port=9104
[14:17:21] downtime expiring?
[14:17:41] ah, no, it is the other check, I thought it was -operations
[15:55:04] Hi database team, I'm stopping mariadb on labsdb1012 to reimage it and rename it to clouddb1021, full procedure is at https://phabricator.wikimedia.org/T269211#6883946
[16:05:27] good luck!
[18:40:55] marostegui: hey, is this file our current ParserCache table schema? https://github.com/wikimedia/mediawiki/blob/master/maintenance/archives/patch-parsercache.sql
[18:41:13] The file doesn't seem to be mentioned in core anywhere, so I'm wondering if it's needed or not
[20:44:11] Amir1: no, that's not the current schema
[20:44:30] This is the current one: https://phabricator.wikimedia.org/P14647
[22:42:06] thanks
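On the ParserCache schema question at the end: the authoritative production DDL is the one in the paste linked above. As a rough illustration only, SqlBagOStuff shards follow the generic MediaWiki objectcache table shape; the shard name, database name and exact column types here are assumptions and may not match P14647:

```
# Illustrative sketch of the classic objectcache/SqlBagOStuff table shape
# used by parsercache shards. The shard name (pc000), database name and
# table options are hypothetical; the real WMF definition is in P14647.
sudo mysql parsercache <<'SQL'
CREATE TABLE pc000 (
  keyname VARBINARY(255) NOT NULL DEFAULT '' PRIMARY KEY,
  value   MEDIUMBLOB,
  exptime DATETIME,
  KEY exptime (exptime)
) ENGINE=InnoDB DEFAULT CHARSET=binary;
SQL
```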