[01:27:12] hi, does anyone know where the global blocks are in the labs db replicas? there is no globalblocks table in metawiki_p and other databases
[01:36:39] I found it, it is in centralauth_p
[07:58:26] DBA: Test InnoDB compression - https://phabricator.wikimedia.org/T139055#2775561 (Marostegui) The compression finished and the dataset of commonswiki went from `1.3T` to `467G` ``` root@dbstore2001:/srv# mysql --skip-ssl information_schema -e "select TABLE_NAME,ROW_FORMAT,ENGINE,TABLE_ROWS,DATA_LENGTH from...
[08:21:46] I tested 8.0 and 10.2
[08:21:53] ran into several bugs
[08:21:54] Aaaaaand
[08:22:11] https://bugs.mysql.com/bug.php?id=83706
[08:22:41] https://jira.mariadb.org/browse/MDEV-11242
[08:23:06] https://jira.mariadb.org/browse/MDEV-10540
[08:24:39] https://jira.mariadb.org/browse/MDEV-11242 -> mark's comment XDDDD
[08:25:32] did you get to test some performance on 8.0?
[08:26:24] jynus: btw: https://gerrit.wikimedia.org/r/320166
[08:26:36] marostegui, indeed
[08:26:48] I will publish a blog post with the results
[08:26:54] oh nice!!!
[08:26:56] too long and complex to explain
[08:26:59] <_joe_> have you seen the mysql vulnerabilities?
[08:27:02] sure, I can wait :)
[08:27:08] <_joe_> I think they might be important for labsdb
[08:27:09] _joe_, last week
[08:27:24] <_joe_> heh I read about those on friday evening
[08:27:43] I was in conversation with moritz a long time ago
[08:28:39] _joe_, they actually have little to no impact for our particular configuration
[08:29:37] we have no users with FILE permissions, and we ship our own mysqld_safe
[08:30:50] plus write permissions were reviewed
[08:38:00] <_joe_> jynus: ok
[09:23:02] DBA, Patch-For-Review: Deploy gtid_domain_id flag in our mysql hosts - https://phabricator.wikimedia.org/T149418#2775607 (Marostegui) @jcrespo I would like to apply this change to `db1020` (db2011`s master) today: ``` root@db1020:~# cat /etc/my.cnf | grep "^gtid" gtid_domain_id = 171970569 ``` Any obj...
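The gtid_domain_id check quoted above (grep on /etc/my.cnf, later compared against `select @@gtid_domain_id`) could be scripted roughly as follows. This is only a sketch: the function names are hypothetical, the config is passed in as text rather than read from a host, and the live value is assumed to have been fetched separately.

```python
import re
from typing import Optional


def configured_gtid_domain_id(my_cnf_text: str) -> Optional[int]:
    """Return the gtid_domain_id found in my.cnf-style text, or None if absent."""
    for line in my_cnf_text.splitlines():
        # Same idea as `grep "^gtid"`: the option must start the line,
        # so commented-out or indented entries are ignored.
        match = re.match(r"gtid_domain_id\s*=\s*(\d+)\s*$", line)
        if match:
            return int(match.group(1))
    return None


def domain_id_matches(my_cnf_text: str, live_value: int) -> bool:
    """True when the config file and the running server report the same id."""
    return configured_gtid_domain_id(my_cnf_text) == live_value


# Hypothetical config text; the id is the one quoted in the log for db1020.
sample = "[mysqld]\n# gtid_domain_id = 1\ngtid_domain_id = 171970569\n"
print(configured_gtid_domain_id(sample))
print(domain_id_matches(sample, 171970569))
```

Comparing both values catches the case where the flag was set live but never written to the config, or the other way round.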
[09:25:11] DBA, Patch-For-Review: Deploy gtid_domain_id flag in our mysql hosts - https://phabricator.wikimedia.org/T149418#2775608 (jcrespo) No objection. I would also apply it on config and live to all misc servers if no issues are found.
[09:26:40] DBA, Operations, ops-eqiad: Multiple hardware issues on db1073 - https://phabricator.wikimedia.org/T149728#2760850 (jcrespo) p:Triage>Normal
[09:32:53] DBA, Collaboration-Team-Triage, Flow, Patch-For-Review, Schema-change: Drop flow_subscription table - https://phabricator.wikimedia.org/T149936#2769308 (jcrespo) Looks good, waiting on code deployment for production deploy- then this or a subticket should be added as a subtask of ticket: {T54...
[10:06:11] DBA, Patch-For-Review: db2034: investigate its crash and reimage - https://phabricator.wikimedia.org/T149553#2775688 (Marostegui) I have bad news. While doing the nc transfer, this server went down again and this was the state of it... ``` hpiLO-> power status=0 status_tag=COMMAND COMPLETED Mon No...
[10:08:18] Wow, a query that has been running for two days in dbstore1002
[10:09:00] investigate the user from the port
[10:09:22] once you know who is doing it, kill it and create a ticket
[10:09:29] oki!
[10:12:54] jynus: I got it
[10:13:03] jynus: The ticket goes to the DBA board or something?
[10:13:16] nope
[10:13:39] search the user on phabricator, report to it
[10:13:44] ah ok :;)
[10:13:45] thanks
[10:14:06] add the user's group tag if it is staff
[10:14:18] or research if it is research
[10:24:53] https://phabricator.wikimedia.org/T150163
[10:33:42] DBA, Labs: Labs database replica drift - https://phabricator.wikimedia.org/T138967#2775777 (jcrespo) @russblau Thanks for the report- it is 5 as we speak, but it is indeed wrong. This and hopefully all drift issues are fixed on the imports on the new labsdb servers T147052, that I hope they will be soon...
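For the two-day query on dbstore1002 mentioned at 10:08, the first step (spot the query and get the client host and port to trace the user) might look like the sketch below. It assumes rows have already been fetched from information_schema.PROCESSLIST as dictionaries keyed by the standard column names; the function name, threshold constant, and sample rows are made up for illustration.

```python
from typing import Dict, List

# Threshold taken from the log: a query running for two days.
TWO_DAYS_SECONDS = 2 * 24 * 60 * 60


def long_running_queries(processlist: List[Dict],
                         min_seconds: int = TWO_DAYS_SECONDS) -> List[Dict]:
    """Return id, user, client host and port of queries older than min_seconds."""
    hits = []
    for row in processlist:
        # Skip idle connections and replication threads; only real queries count.
        if row.get("COMMAND") != "Query":
            continue
        if int(row.get("TIME", 0)) < min_seconds:
            continue
        # PROCESSLIST.HOST is usually "client_host:port"; the port helps
        # trace which process on the client side opened the connection.
        host_field = str(row.get("HOST", ""))
        if ":" in host_field:
            host, _, port = host_field.rpartition(":")
        else:
            host, port = host_field, ""
        hits.append({"id": row.get("ID"), "user": row.get("USER"),
                     "host": host, "port": port, "seconds": int(row["TIME"])})
    return hits


# Hypothetical rows, not taken from the real server.
rows = [
    {"ID": 11, "USER": "research", "HOST": "10.0.0.5:51234",
     "COMMAND": "Query", "TIME": 180000, "INFO": "SELECT ..."},
    {"ID": 12, "USER": "system user", "HOST": "",
     "COMMAND": "Sleep", "TIME": 900000, "INFO": None},
]
for hit in long_running_queries(rows):
    print(hit)
```

Killing the query and filing the ticket stay manual steps, as in the conversation; the sketch only covers finding who to report to.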
[10:41:59] DBA, Labs, Tool-Labs: enwiki_p replica on s1 is corrupted - https://phabricator.wikimedia.org/T134203#2775785 (jcrespo) @russblau See my comment at T138967#2775777. Expanding on that, import "in place" created lots of disruption (replication lag, which other users complained about). The decision take...
[12:49:52] DBA, Patch-For-Review: db2034: investigate its crash and reimage - https://phabricator.wikimedia.org/T149553#2776069 (Marostegui) This crashed again with the same symptoms but now I did see the same error we said the first time: ``` hpiLO-> show record2 status=0 status_tag=COMMAND COMPL...
[14:48:30] DBA: Fix PK on S5 dewiki.revision - https://phabricator.wikimedia.org/T148967#2776344 (Marostegui) I will not do this today, but tomorrow Tuesday as the ALTER of https://phabricator.wikimedia.org/T149079#2749171 (db2019 - master) is still running and I do not want to have two masters running long ALTER table...
[15:12:39] DBA, Patch-For-Review: Deploy gtid_domain_id flag in our mysql hosts - https://phabricator.wikimedia.org/T149418#2776412 (Marostegui) This is done. ``` MariaDB MISC m2 localhost (none) > select @@gtid_domain_id; +------------------+ | @@gtid_domain_id | +------------------+ | 171970569 | +--------...
[15:14:58] re: filtering see https://phabricator.wikimedia.org/T132838
[15:15:52] Ah thanks
[15:16:43] DBA, Epic, Tracking: Database tables to be dropped on Wikimedia wikis and other WMF databases (tracking) - https://phabricator.wikimedia.org/T54921#2776450 (jcrespo)
[15:16:45] DBA, Release-Engineering-Team: Missing / Dropped databases? - https://phabricator.wikimedia.org/T132838#2776449 (jcrespo)
[15:52:25] DBA, Cognate, Wikidata, User-Addshore, WMDE-QWERTY-Team-Board: Cognate DB review - https://phabricator.wikimedia.org/T148988#2776571 (Lydia_Pintscher)
[15:53:15] DBA, MediaWiki-extensions-UserMerge: UserMerge using SELECT img_name FROM image WHERE img_user = ...
but img_user doesn't have an index - https://phabricator.wikimedia.org/T105395#1442866 (jcrespo) I would close this as resolved, since Aaron's advice is correct in the current situation, and further optimiz...
[15:57:36] DBA, GeoData, Wikimedia-log-errors: Uncommitted DB writes in GeoData::getAllCoordinates() - https://phabricator.wikimedia.org/T105698#2776594 (jcrespo) Open>Resolved a:jcrespo I am going to close this based on being an old report, with no more recent occurrences that I am aware of in the las...
[16:01:35] DBA, MediaWiki-extensions-GlobalUsage, Multimedia, Wikimedia-log-errors: Uncommitted DB writes in GlobalUsage::getLinksFromPage() - https://phabricator.wikimedia.org/T105699#2776603 (jcrespo) Open>Resolved a:jcrespo Same as: T105699
[16:03:11] DBA, Beta-Cluster-Infrastructure, Continuous-Integration-Infrastructure, MediaWiki-Database, and 2 others: Enable MariaDB/MySQL's Strict Mode - https://phabricator.wikimedia.org/T108255#2776611 (jcrespo)
[16:15:06] DBA, Patch-For-Review: Unify commonswiki.revision - https://phabricator.wikimedia.org/T147305#2776687 (Marostegui) codfw master (db2019) now has the correct table schema ``` CREATE TABLE `revision` ( `rev_id` int(8) unsigned NOT NULL AUTO_INCREMENT, `rev_page` int(8) unsigned NOT NULL DEFAULT '0',...
[16:16:32] jynus: I was wondering if an ALTER table that removes partitions can be done online or not...
[16:16:54] it is easy to say- test it
[16:17:03] Basically db2019 (master in codfw) needs the partitions removed
[16:17:15] you can look at the manual too, but I would trust a test, too
[16:17:20] *more
[16:17:36] yeah, yeah, I was asking to see if you happened to know :)
[16:23:20] Looks like it doesn't
[16:23:41] although the doc is not entirely clear
[16:23:51] ha
[16:24:09] at least it is codfw :p
[16:24:48] maybe we should pool a different server as the master
[16:24:56] and keep that and the other as rc?
[16:25:25] There are two rc servers
[16:25:33] all ok?
[16:25:39] Yeah, they do have partitions
[16:25:47] I guess the master was cloned from one of those
[16:25:48] interesting
[16:26:38] So this alter will only generate delay on codfw master (and of course all the slaves)
[16:26:45] It takes around 6 hours to run
[16:27:27] I would do a failover anyway
[16:27:47] especially if we are going to renew some servers there
[16:27:59] prepare for that, and do several changes at the same time
[16:28:20] Which changes do you have in mind for s4 codfw?
[16:28:25] like: server-wise
[16:28:31] all pending schema changes
[16:28:44] plus pool a master that is not going to be decommissioned
[16:28:53] db2019 will go away?
[16:29:13] probably, based on the number, but I have not done the number yet
[16:29:18] ah right
[16:29:21] then this can indeed wait :)
[16:34:35] Seeing the latest bugs, I would actually vote right now for haproxy: https://github.com/sysown/proxysql/releases
[16:35:55] https://github.com/sysown/proxysql/issues/744
[16:35:58] I think there is a lot of potential there, but needs more maturity
[16:35:58] uff
[16:36:11] and not because the software
[16:36:19] but L4 routing is simpler anyway
[16:36:44] I would do haproxy now for labs
[16:36:53] and proxysql for misc
[16:37:00] nothing for now for production
[16:37:08] maybe it is worth sharing that on the ticket?
[16:37:10] (the bugs)
[16:37:19] well, I was just seeing it
[16:37:28] and commenting it real time
[16:37:37] yeah yeah, sorry I didn't mean _now_ XD
[16:37:49] to see if you agreed with me
[16:38:12] I would still like to have it deployed somewhere so we can follow the project and see how it works with our environment
[16:38:15] so misc sounds good :)
[16:38:18] yes
[16:38:28] misc would work nicely for easier failover
[16:38:38] and automatic master-slave switch
[16:39:10] plus for places like phabricator, we could install it on localhost
[16:39:16] and fewer moving parts
[16:41:54] DBA: codfw: Fix S4 commonswiki.templatelinks partitions - https://phabricator.wikimedia.org/T149079#2776840 (Marostegui) Looks like removing partitions isn't an online DDL operation. Jaime and myself have agreed on not altering the master until we've done a failover to another host. I will continue unifying...
[16:44:53] DBA, Labs, Labs-Infrastructure, Availability: Decide between proxysql and haproxy for labsdbproxy service - https://phabricator.wikimedia.org/T149844#2776848 (jcrespo)
[17:16:00] DBA, Labs, Labs-Infrastructure, Operations, and 2 others: Move dbproxy1010 and dbproxy1011 to labs-support network, rename them to labsdbproxy1001 and labsdbproxy1002 - https://phabricator.wikimedia.org/T149170#2776977 (mark) Approved.
[17:24:24] DBA, Labs, Labs-Infrastructure, Operations, and 2 others: Move dbproxy1010 and dbproxy1011 to labs-support network, rename them to labsdbproxy1001 and labsdbproxy1002 - https://phabricator.wikimedia.org/T149170#2777041 (jcrespo) a:jcrespo>None @RobH You mentioned it may not need a physica...
[17:28:04] DBA, Labs, Labs-Infrastructure, Operations, and 2 others: Move dbproxy1010 and dbproxy1011 to labs-support network, rename them to labsdbproxy1001 and labsdbproxy1002 - https://phabricator.wikimedia.org/T149170#2777052 (RobH) I said that in reply to the labs to db server transition, not the trans...
[17:30:10] DBA, Labs, Labs-Infrastructure, Operations, and 2 others: Move dbproxy1010 and dbproxy1011 to labs-support network, rename them to labsdbproxy1001 and labsdbproxy1002 - https://phabricator.wikimedia.org/T149170#2777055 (jcrespo) Sorry for the misunderstanding!
[21:18:04] DBA, Operations, ops-eqiad: labsdb1009 boot issues (power supply and controller?) - https://phabricator.wikimedia.org/T150211#2778014 (jcrespo)
[21:28:25] Blocked-on-schema-change, DBA: Deploy I2b042685 to all databases - https://phabricator.wikimedia.org/T139090#2778068 (jcrespo) After running over 30 000 alter tables, this is nominally done; however, it is highly likely that some failed and could not be retried; for example if they were under maintena...
[23:20:27] DBA, CirrusSearch, Discovery, Discovery-Search (Current work), and 2 others: CirrusSearch SQL query for locating pages for reindex performs poorly - https://phabricator.wikimedia.org/T147957#2778316 (EBernhardson) There isn't a big rush here, this query is incredibly rare. It's part of a reindexi...