[08:03:10] DBA, User-DannyS712: Drop flagged revs tables on mediawikiwiki - https://phabricator.wikimedia.org/T248298 (DannyS712)
[08:31:44] DBA, Parsing-Team: testreduce_vd database in m5 still in use? - https://phabricator.wikimedia.org/T245408 (Marostegui) @ssastry any follow up on this task?
[09:41:55] DBA: Drop wb_terms in production from s4 (commonswiki, testcommonswiki), s3 (testwikidatawiki), s8 (wikidatawiki) - https://phabricator.wikimedia.org/T248086 (Marostegui) >>! In T248086#5983887, @Marostegui wrote: > Current stats > ` > root@cumin1001:/home/marostegui# ./section s8 | grep eqiad | egrep -v "l...
[09:44:03] DBA: Drop wb_terms in production from s4 (commonswiki, testcommonswiki), s3 (testwikidatawiki), s8 (wikidatawiki) - https://phabricator.wikimedia.org/T248086 (Marostegui)
[10:05:00] DBA, Patch-For-Review: Deploy gtid_domain_id flag in our mysql hosts - https://phabricator.wikimedia.org/T149418 (Marostegui) Given that the original bug report we submitted for this issue (https://jira.mariadb.org/browse/MDEV-12012) was closed with a fix for 10.1.30 (but we never tested that workaround)...
[10:16:01] don't get too aggressive with the deletion of dbs 0:-D
[10:16:18] or people are going to think you want them gone :-)
[10:16:30] but it is nice to see some cleanup efforts!
[10:16:59] hehe
[10:17:09] it is mostly to be able to enable GTID on multisource!
[10:25:25] quick sanity check on?: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/582776/1/modules/install_server/files/autoinstall/netboot.cfg
[12:00:10] jynus: ^ forgot to ask you in the meeting!
[12:00:18] db1077 Replication: No,Yes expected?
[12:00:24] FYI
[12:00:24] expected
[12:00:41] I will send you another for +1, but that can wait
[12:00:45] I checked yours now
[12:01:03] https://gerrit.wikimedia.org/r/c/operations/puppet/+/582791
[12:08:25] DBA, Patch-For-Review: Deploy gtid_domain_id flag in our mysql hosts - https://phabricator.wikimedia.org/T149418 (ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts: ` ['db1077.eqiad.wmnet'] ` The log can be found in `/var/log/wmf-auto-reimage/202003...
[12:34:38] DBA, Patch-For-Review: Deploy gtid_domain_id flag in our mysql hosts - https://phabricator.wikimedia.org/T149418 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['db1077.eqiad.wmnet'] ` and were **ALL** successful.
[12:52:52] DBA, Patch-For-Review: Deploy gtid_domain_id flag in our mysql hosts - https://phabricator.wikimedia.org/T149418 (Marostegui) I have been able to get GTID also running on buster and 10.4 and multi-source. The only corner case to be able to fully replicate the original issue was to have the masters with s...
[14:21:10] DBA, cloud-services-team (Kanban): Drop nova and nova_api databases from m5 - https://phabricator.wikimedia.org/T248313 (Marostegui)
[14:22:17] DBA, cloud-services-team (Kanban): Drop nova and nova_api databases from m5 - https://phabricator.wikimedia.org/T248313 (Marostegui) p:Triage→Medium I guess we can start by renaming the tables.
[14:27:42] 27379 QPS on db1118, processes: 201, latency: 0.0080
[14:28:03] should we tune it down a bit?
[14:28:10] that's s1, no?
[14:28:14] yes
[14:29:02] e.g. more load on the rcs?
[14:29:07] ah, that one has 500 yeah
[14:29:15] yeah, let's maybe give 100 to each rc?
[14:29:21] 50+100
[14:29:38] +1, do I do it
[14:29:39] ?
[14:29:47] yep, go for it, thanks
[14:29:52] on it
[14:30:59] apparently there are 4 hosts with s1 rcs
[14:31:16] yes, because of the 10.4 testing
[14:31:22] feel free to remove db1089 and db1107
[14:31:49] ok, maybe I will do that and tune things after it
[14:31:49] db1107 can probably get 100 more in main too
[14:35:52] I will do it slowly, expect a few commits
[14:36:08] cool, thanks
[14:44:58] my guiding principle is mostly to try to equalize latency across the cluster, except for those special ones that may have less or more
[14:49:04] db1118 latency decreased substantially now
[14:51:16] nice
[14:51:26] let's let the dust settle maybe?
[14:51:51] db1089 has the worst latency right now
[14:52:09] while special rc ones are the best ones
[14:52:16] I would like to depool db1089 from rc
[14:52:49] try to keep all replicas in single digit milliseconds?
[14:53:33] let me know if you prefer to wait
[14:55:19] I will do that one change and leave it there
[14:55:25] yeah, db1089 can be removed
[14:55:27] from rc
[14:55:36] it was used for comparison with db1107
[14:55:37] so go ahead
[14:55:37] it is also an ideal situation
[14:55:47] less rcs == better buffer pool usage
[14:55:55] yep
[14:55:59] while they are right now the least used servers
[14:56:19] and I promise to stop tweaking for some time 0:-)
[14:57:30] haha
[14:57:40] it is addictive now that it is so easy with dbctl eh
[14:57:53] I am actually mostly using edit
[14:58:02] rather than using 5 commands
[14:58:05] yeah, that makes sense for big edits
[14:59:10] so apparently if you set groups:
[14:59:23] it fails json validation
[14:59:29] jsonschema.exceptions.ValidationError: None is not of type 'object'
[14:59:44] but it is captured and still lets you edit to correct it
[15:02:18] there is something else weird happening
[15:02:33] for over a month there were constant timeouts waiting for replication
[15:02:48] but it seems it got fixed at 14:43?
[15:03:15] am I crazy to think that: https://logstash.wikimedia.org/goto/6d01096f1ecc42a81e5b2676a6e1adea
[15:03:18] ?
[15:07:44] as a side note, db1089 latency increased a bit, not decreased
[15:13:41] :-/
[15:13:43] that is weird
[15:13:54] was something pushed?
[15:14:18] no, I think just the overall throughput grew again
[15:14:39] so all the changes I did just momentarily stopped the latency increase
[15:16:15] e.g. db1118 went back to over 25K QPS
[15:17:17] note we just surpassed the 500K QPS on all of eqiad
[15:22:39] https://grafana.wikimedia.org/d/000000278/mysql-aggregated?orgId=1&from=now-24h&to=now&fullscreen&panelId=1 almost..
[15:23:42] we actually did, just need to zoom in: https://grafana.wikimedia.org/d/000000278/mysql-aggregated?orgId=1&from=1584972760877&to=1584973386312&fullscreen&panelId=1&var-dc=eqiad%20prometheus%2Fops&var-group=All&var-shard=All&var-role=All
[15:23:45] :-P
[15:57:24] marostegui: jynus https://phabricator.wikimedia.org/phame/post/view/195/coming_to_terms_with_change/
[16:00:04] :-)
[16:07:28] DBA, Patch-For-Review: Deploy gtid_domain_id flag in our mysql hosts - https://phabricator.wikimedia.org/T149418 (Marostegui) >>! In T149418#5991662, @Marostegui wrote: > I have been able to get GTID also running on buster and 10.4 and multi-source. > The only corner case to be able to fully replicate...
[16:34:50] marostegui: let's rename it in the cloud next Monday
[16:35:18] and then drop it a couple of weeks later
[17:07:50] Amir1: not sure if I get what you mean. you mean renaming it only on labs hosts next Monday but not in prod?
[17:09:15] marostegui: the prod doesn't matter for them
[17:09:51] as long as no bugs explode in our face we are fine
[17:09:58] but it does matter for production errors :)
[17:10:16] that's why I want to rename it on production, on at least one host
[17:10:19] just to be fully sure
[17:11:12] sure, I thought we would do it this week
[17:12:22] yeah, I was looking for your confirmation that we can go ahead :)
[17:12:28] I will do it tomorrow if that's good
[17:13:20] Sure
[17:25:24] someone subscribed to wikidata should forward the mail to cloud-l
[17:27:17] looks like Lea has tried to cross post to cloud-announce. I will approve that mail...
[17:28:09] cool, thanks
[17:28:43] https://lists.wikimedia.org/pipermail/cloud-announce/2020-March/000269.html
[17:53:46] Blocked-on-schema-change, DBA: Review schema changes for T218446 - https://phabricator.wikimedia.org/T248333 (TK-999)
[17:54:06] Blocked-on-schema-change, DBA: Review schema changes for T218446 - https://phabricator.wikimedia.org/T248333 (TK-999)
[20:06:38] DBA, Parsing-Team: testreduce_vd database in m5 still in use? - https://phabricator.wikimedia.org/T245408 (ssastry) >>! In T245408#5991238, @Marostegui wrote: > @ssastry any follow up on this task? Sorry .. got distracted by everything else. So, yes, can you create the database and tables (same schema a...
[21:51:50] Blocked-on-schema-change, DBA: Review schema changes for T218446 - https://phabricator.wikimedia.org/T248333 (Krinkle)
[21:51:58] Blocked-on-schema-change, DBA: Review schema changes for T218446 - https://phabricator.wikimedia.org/T248333 (Krinkle)
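Note on the 14:59 messages about `groups:` failing validation: a rough sketch of why an empty key might produce "None is not of type 'object'". It assumes the edit buffer is YAML, where a bare `groups:` parses as null, and that the schema requires `groups` to be an object. `validate_groups` and `is_valid` are hypothetical stand-ins written for this note, not dbctl or jsonschema code, and JSON null is used here in place of YAML null to stay within the standard library.

```python
import json


def validate_groups(instance: dict) -> None:
    """Hypothetical stand-in for a schema rule: 'groups' must be an object."""
    groups = instance.get("groups", {})
    if not isinstance(groups, dict):
        # Mirrors the wording of the error quoted in the log.
        raise TypeError(f"{groups!r} is not of type 'object'")


def is_valid(instance: dict) -> bool:
    """Return True if the instance passes the hypothetical check."""
    try:
        validate_groups(instance)
    except TypeError:
        return False
    return True


# A key with no value carries null, which Python sees as None.
empty = json.loads('{"weight": 100, "groups": null}')
# Writing an explicit empty mapping gives the validator an object instead.
fixed = json.loads('{"weight": 100, "groups": {}}')

print(is_valid(empty))
print(is_valid(fixed))
```

If that assumption holds, writing `groups: {}` (or dropping the key) instead of a bare `groups:` would probably avoid the error.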