[05:33:16] DBA, Operations, Patch-For-Review: eqiad rack/setup 11 new DB servers - https://phabricator.wikimedia.org/T162233#3438408 (Marostegui) And db1106 got installed finely
[05:33:54] DBA, Operations, Patch-For-Review: eqiad rack/setup 11 new DB servers - https://phabricator.wikimedia.org/T162233#3438409 (Marostegui) Open>Resolved
[05:42:27] DBA, cloud-services-team: labsdb1009 crashed while doing an alter table on templatelinks - https://phabricator.wikimedia.org/T170657#3438411 (Marostegui)
[05:42:54] DBA, cloud-services-team: labsdb1009 crashed while doing an alter table on templatelinks - https://phabricator.wikimedia.org/T170657#3438425 (Marostegui) p:Triage>Normal I will leave this open until the alters are done
[06:08:01] DBA, Commons, MW-1.30-release-notes, MediaWiki-Special-pages, and 4 others: Special:ShortPages does not load in Wikimedia Commons: "Read timeout is reached" - https://phabricator.wikimedia.org/T168010#3438431 (Marostegui) After yesterday's release, this no longer fails for me. @Jcb can you confir...
[07:00:32] DBA, Cloud-Services, User-Urbanecm: Prepare and check storage layer for dinwiki - https://phabricator.wikimedia.org/T169193#3438447 (Marostegui) Open>Resolved After another run of the check_private_data on those hosts and sanitarium2, as everything was fine I have created the views on those h...
[07:30:01] DBA, Epic, Patch-For-Review: Decouple roles from mariadb.pp into their own file - https://phabricator.wikimedia.org/T150850#3438474 (Marostegui)
[07:30:03] DBA, Patch-For-Review: Moving eventlogging mariadb role into its own .pp - https://phabricator.wikimedia.org/T152081#3438471 (Marostegui) Open>Resolved a:Marostegui>jcrespo This was done by Jaime: https://gerrit.wikimedia.org/r/#/c/342014/
[08:14:38] DBA, Operations, Patch-For-Review: eqiad rack/setup 11 new DB servers - https://phabricator.wikimedia.org/T162233#3438514 (jcrespo) Thank you all people for the help.
[08:15:49] DBA, cloud-services-team: labsdb1009 crashed while doing an alter table on templatelinks - https://phabricator.wikimedia.org/T170657#3438515 (jcrespo) Should we copy 1009 from 1010?
[08:35:06] DBA, cloud-services-team: labsdb1009 crashed while doing an alter table on templatelinks - https://phabricator.wikimedia.org/T170657#3438520 (Marostegui) Let's wait and see if the alters finish fine this time I would say.
The server recovered fine after the crash, replication had no issues or anything
[09:19:14] DBA: Productionize 22 new codfw database servers - https://phabricator.wikimedia.org/T170662#3438592 (jcrespo)
[09:19:45] DBA: Productionize 22 new codfw database servers - https://phabricator.wikimedia.org/T170662#3438592 (jcrespo) p:Triage>Normal
[09:59:24] DBA: Refactor prometheus-mysqld-exported to support multi-instance hosts - https://phabricator.wikimedia.org/T170666#3438730 (jcrespo)
[09:59:41] DBA, Patch-For-Review: Refactor puppet mariadb class to support multi-instance hosts - https://phabricator.wikimedia.org/T169514#3400308 (jcrespo) a:jcrespo>None
[10:12:02] DBA, cloud-services-team: labsdb1009 crashed while doing an alter table on templatelinks - https://phabricator.wikimedia.org/T170657#3438781 (Marostegui) templatelinks went thru finely, now pagelinks is being altered (121G)
[11:05:30] DBA, Commons, MW-1.30-release-notes, MediaWiki-Special-pages, and 4 others: Special:ShortPages does not load in Wikimedia Commons: "Read timeout is reached" - https://phabricator.wikimedia.org/T168010#3438882 (Josve05a) I don't think file pags are wanted for that query on Commons. It is locally n...
[11:28:40] DBA, Commons, MW-1.30-release-notes, MediaWiki-Special-pages, and 4 others: Special:ShortPages does not load in Wikimedia Commons: "Read timeout is reached" - https://phabricator.wikimedia.org/T168010#3438892 (jcrespo) @Josve05a That is a completely different matter (I am not saying you are not r...
[11:29:29] DBA, Commons, MW-1.30-release-notes, MediaWiki-Special-pages, and 3 others: Special:ShortPages does not load in Wikimedia Commons: "Read timeout is reached" - https://phabricator.wikimedia.org/T168010#3438894 (jcrespo)
[11:42:59] I am cloning db2062->db2072 while having lunch
[11:43:10] the idea is to setup a new s1 host on codfw
[11:43:14] failover the master
[11:43:29] and populate dbstores with single-instance
[11:43:52] so I will use the faster host (72) for compression
[11:44:00] just FYI
[11:44:09] great :)
[11:44:53] if you have spare time
[11:45:10] which you don't :-)
[11:45:15] root@db2062:~$ systemctl status mariadb
[11:45:21] quite strange ^
[11:46:02] I have seen that there are 40G of temporary files on dbstore1002:/srv/tmp
[11:46:15] which are probably temporary tables that were not cleaned up when the server crashed doing the alter table
[11:46:21] -rw-rw---- 1 mysql mysql 20G Jul 14 11:45 #sql_5e0b_0.MAD
[11:46:21] -rw-rw---- 1 mysql mysql 24G Jul 14 11:45 #sql_5e0b_0.MAI
[11:46:45] fuse them and we can delete them
[11:46:52] there is also the patch I added
[11:47:04] there are horrible aria tables everywhere
[11:47:09] fuse them?
[11:47:18] * marostegui not getting the meaning
[11:47:19] :)
[11:47:28] sorry
[11:47:31] missed an r
[11:47:37] aah
[11:47:39] yeah :)
[11:47:42] fuser / lsof
[11:47:45] whatever
[11:47:47] heheh yeah yeah
[11:48:02] totally my fault
[11:48:05] I thought it was a verb or something
[11:48:11] And I was like: what does that mean?
[11:48:13] :)
[11:48:15] of course
[11:48:23] use a fuse filesystem to delete tehm
[11:48:31] hahaha
[11:48:31] it was all clear
[11:48:43] maybe you should add that to urbandictionary
[11:51:21] I would need to shut the server down, as they appear to be used by mysql
[11:51:41] It should be safe to restart mysql on that host anyways
[11:53:18] But we can do that on monday too
[12:18:57] see my comment
[12:20:35] Oh
[12:20:36] good one
[13:10:10] DBA, Commons, MW-1.30-release-notes, MediaWiki-Special-pages, and 3 others: Special:ShortPages does not load in Wikimedia Commons: "Read timeout is reached" - https://phabricator.wikimedia.org/T168010#3439036 (Jcb) Please remove the file pages from the results, the report is completely useless now.
[13:15:37] DBA, Commons, MW-1.30-release-notes, MediaWiki-Special-pages, and 3 others: Special:ShortPages does not load in Wikimedia Commons: "Read timeout is reached" - https://phabricator.wikimedia.org/T168010#3439057 (Jcb) In https://commons.wikimedia.org/wiki/Commons:Village_pump#Should_content_pages_co...
[13:19:56] DBA, Commons, MW-1.30-release-notes, MediaWiki-Special-pages, and 3 others: Special:ShortPages does not load in Wikimedia Commons: "Read timeout is reached" - https://phabricator.wikimedia.org/T168010#3439058 (jcrespo) @Jcb Adding images as "content page"s did that. It can be reverted with no pro...
[13:49:52] quick question marostegui, do we have user db's on the new labsdb boxes (templatetiger?) ala https://phabricator.wikimedia.org/T170657?
[13:50:04] nope
[13:50:12] they are read only
[13:50:34] we only have production shards
[13:50:39] (if that is what you are asking)
[13:51:49] marostegui: my early morning brain misread something
[13:51:49] :)
[13:51:52] sorry
[13:52:07] haha
[13:52:13] Did I answer your question though?
[13:52:15] 61G is the exact size of a user db and I misread templatelinks as templatetiger
[13:52:17] yes thanks!
[13:52:22] hahah
[13:52:33] embarrassing
[13:52:53] Not at all! We are lucky you guys are not around in our mornings :-)
[23:22:39] DBA, Operations, Performance-Team, Availability (Multiple-active-datacenters): Apache <=> mariadb SSL/TLS for cross-datacenter writes - https://phabricator.wikimedia.org/T134809#3440777 (aaron) Reading things like https://www.percona.com/blog/2013/10/10/mysql-ssl-performance-overhead/ I think thi...