[02:24:18] Yes, hello.
[02:24:48] > jcrespo removed jcrespo as the assignee of this task.
[02:24:48] Ouch.
[06:09:50] what is the deal with db1073?
[06:11:22] marostegui, did you stop replication there?
[06:12:23] maybe it crashed
[06:13:28] yeah, it crashed
[06:51:41] DBA: db1073 mysql crashed - https://phabricator.wikimedia.org/T149876#2767407 (jcrespo)
[06:54:08] DBA, Operations, Wikimedia-Site-requests, Patch-For-Review: Private wiki for Project Grants Committee - https://phabricator.wikimedia.org/T143138#2767421 (jcrespo) a:jcrespo>None My part is done, process can continue. This will not reach labs.
[07:10:13] DBA: db1073 mysql crashed - https://phabricator.wikimedia.org/T149876#2767438 (Marostegui) I wouldn't trust this server too much, especially after the behaviour it has shown with the RAID controller - where the disk was failed but it wasn't detected (or marked) until it was actually hit for reads. I would ten...
[07:12:38] DBA, Labs, Labs-Infrastructure, Tool-Labs, Wikimedia-Developer-Summit (2017): Labsdbs for WMF tools and contributors: get more data, faster - https://phabricator.wikimedia.org/T149624#2767439 (jcrespo)
[07:35:19] DBA: db1073 mysql crashed - https://phabricator.wikimedia.org/T149876#2767448 (jcrespo) Certainly that would fit the lag issues - but also compression would!
[07:35:37] It is my plan to start the server again
[07:35:42] depooled
[07:35:48] and see what happens
[07:35:59] sounds good to me
[07:36:13] we will see if the dbstores get delay because of compression
[07:36:22] then reimage it anyway
[07:36:37] but if it is disks, it could happen again, even after reimage
[07:37:54] $ mysql mysql mysqlbinlog mysqldump mysql_upgrade
[07:38:00] ^look what's on the path!
[07:38:24] mysqlbinlog finally!
:p
[07:39:17] I am going to do a dump of all contents and see if I can trigger the error again
[07:40:57] DBA, Patch-For-Review: Reimage dbstore2001 as jessie - https://phabricator.wikimedia.org/T146261#2767451 (Marostegui) Thanks - that sounds good. But we need to keep in mind labsdb1008 space if we want to place the three shards (s1,s3 and s4) Right now dbstore2001 and dbstore2002 (with s1 compressed) are...
[07:51:55] DBA, Operations, Patch-For-Review: Decommission old coredb machines (<=db1050) - https://phabricator.wikimedia.org/T134476#2767459 (jcrespo)
[07:51:57] DBA: Decommission db1042 - https://phabricator.wikimedia.org/T149793#2767458 (jcrespo)
[07:58:09] I will leave db1073 dumping its contents and see if it crashes
[07:58:17] cool
[07:58:28] then I will reimage from db1052?
[07:58:28] I am going to reimage db2034, did you enable replication back in that shard?
[07:58:57] do you mean what I just SAL'ed, or something else?
[07:59:13] oh, I missed it
[07:59:13] sorry
[07:59:16] :)
[07:59:37] DBA, Labs, Labs-Infrastructure, Availability: Decide between proxysql and haproxy for labsdbproxy service - https://phabricator.wikimedia.org/T149844#2767472 (Marostegui) I will also add my thoughts about this. As I expressed during the meeting, if possible I would try to go for proxySQL. Jaime...
[07:59:38] I was wondering if we should do 2034 and 1073 at the same time?
[07:59:41] I was doing this ^ :-)
[07:59:47] No
[07:59:51] no?
[07:59:53] I can wait for db2034 :)
[08:00:17] I do not understand
[08:00:20] I mean, if you want the whole throughput for db1073 I can wait
[08:00:31] but I am dumping it now
[08:00:44] meaning
[08:00:52] I am running mysqldump in db1073
[08:01:33] I just realised that db2034 is a jessie, so maybe it doesn't need reimage but just a fresh mysql copy or would you completely reimage it?
[08:01:53] reimage, it takes no time now
[08:02:00] ok :-)
[08:02:02] will do now
[08:02:07] well, it takes time
[08:02:11] but need no attention
[08:02:17] sure thing :)
[08:02:41] if it is jessie maybe it failed before?
[08:03:03] what do you mean?
[08:03:23] reimaged recently == issues not a long time ago (maybe=)
[08:03:35] ah
[08:03:39] https://phabricator.wikimedia.org/T117858
[08:03:57] interesting, because this time it failed when it was executing an alter
[08:04:15] like: heavy io operation
[08:04:40] didn't we corrupt its binlog?
[08:05:08] I skipped 2 transactions, but when doing the first alter it crashed
[08:05:14] ah
[08:05:24] We did corrupt its data, but that is unrelated (somehow)
[08:05:25] check its temperature/kernel throttling
[08:05:36] the logs showed power fault
[08:05:48] there were no temperature errors on the logs at least
[08:06:15] oh, yes, I remember
[08:06:33] this is https://phabricator.wikimedia.org/T149553 right?
[08:06:42] https://phabricator.wikimedia.org/T149553#2755916
[08:06:48] that merits a hardware investigation
[08:06:59] I will ping papaul again
[08:07:10] You think I should not reimage meanwhile?
[08:07:23] it failed 3 times, actually
[08:07:29] let me comment there
[08:07:36] ok :)
[08:07:42] thank you
[08:08:22] DBA, Patch-For-Review: db2034: investigate its crash and reimage - https://phabricator.wikimedia.org/T149553#2767483 (jcrespo) probably related to T117858 and T137084. Sounds like power supply failure every time.
[08:08:25] ^
[08:08:53] I will ping papaul again to see if he can take a look
[08:09:00] Would you rather leave it in this state or reimage it?
[08:09:26] first one is a controller error
[08:10:34] yes, and the second one is pretty similar to the last one
[08:19:48] for some reason, someone has been creating UNIQUE KEYs instead of primary keys
[08:19:59] on a lot of tables
[08:20:30] meaning you have all the disadvantages of primary keys, and none of the advantages
[08:20:57] I do not think db1073 is recoverable
[08:21:05] Totally corrupted?
[08:21:18] no, but it complained before about old_id index page
[08:21:32] and I said to myself, well, it is only an index, we can reconstruct it
[08:21:51] but it happens to be the (virtual) primary key, which means the data tree
[08:22:08] so unsafe to recover
[08:22:29] :(
[08:22:38] I am pretty sure that is related to HW
[08:22:39] it could be that it is not compression
[08:22:50] I enabled parallel replication on this server recently
[08:22:55] because of the lag
[08:23:06] and we had issues in the past with it
[08:23:36] while it was enabled, compression worked for months
[08:23:49] without problem
[08:24:06] until a few days ago then?
[08:24:19] until this morning
[08:24:26] so then I think we have the answer
[08:24:48] don't jump too soon into conclusions
[08:24:58] this is just pure speculation
[08:25:15] I know, but it sounds weird that all of a sudden we get a corruption
[08:25:27] And it kinda matches when the controller is behaving funny
[08:26:09] I checked and there are no media errors aside from the shunned disk
[09:07:24] DBA, Operations, ops-codfw: db2011 disk media errors - https://phabricator.wikimedia.org/T149099#2767615 (Marostegui) I have completed the megacli documentation: https://wikitech.wikimedia.org/wiki/MegaCli This is the diff: https://wikitech.wikimedia.org/w/index.php?title=MegaCli&type=revision&dif...
[09:10:18] really nice^
[09:10:31] thanks :)
[09:14:35] DBA, Operations, ops-codfw: Degraded RAID on db2052 - https://phabricator.wikimedia.org/T149377#2750453 (Marostegui) This is now in good state and can probably be closed: ``` Current Status: OK (for 0d 14h 21m 14s) Status Information: OK: Slot 0: OK: 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 1I:1:5, 1I:1:...
[09:15:34] just close it
[09:15:40] ^
[09:16:15] sure
[09:16:21] didn't want to close tickets not assigned to me :)
[09:16:41] DBA, Operations, ops-codfw: Degraded RAID on db2052 - https://phabricator.wikimedia.org/T149377#2767625 (Marostegui) Open>Resolved
[09:17:17] DBA, Operations, ops-codfw: db2011 disk media errors - https://phabricator.wikimedia.org/T149099#2767627 (Marostegui) Open>Resolved
[09:34:14] DBA: Fix PK on S5 dewiki.revision - https://phabricator.wikimedia.org/T148967#2767668 (Marostegui) db2059 finished ``` MariaDB MARIADB db2059.codfw.wmnet dewiki > show create table revision\G *************************** 1. row *************************** Table: revision Create Table: CREATE TABLE `rev...
[09:39:13] DBA, Operations, hardware-requests, ops-eqiad, Patch-For-Review: Decommission db1042 - https://phabricator.wikimedia.org/T149793#2767673 (jcrespo) This is ready to go.
[09:40:24] DBA, Operations, hardware-requests, ops-eqiad: db1019: Decommission - https://phabricator.wikimedia.org/T146265#2767680 (jcrespo) This is ready to go.
[09:43:18] DBA, Operations, Patch-For-Review: Decommission old coredb machines (<=db1050) - https://phabricator.wikimedia.org/T134476#2767687 (jcrespo) I am assuming we want to decom all of those, based on these are the old ones that replaced the newly purchased ones. I have marked {T149793} {T146265} are read...
[09:47:58] DBA, Cognate, Wikidata, User-Addshore, WMDE-QWERTY-Team-Board: Cognate DB review - https://phabricator.wikimedia.org/T148988#2767692 (Addshore) Okay, I'm getting very confused as to where these 'tags' have come from >>! In T148988#2764212, @jcrespo wrote: >> the NULL values in the title fiel...
[09:48:35] DBA, Operations, Patch-For-Review, Prometheus-metrics-monitoring: implement performance_schema for mysql monitoring - https://phabricator.wikimedia.org/T99485#2767698 (jcrespo)
[09:48:39] DBA, Operations, Prometheus-metrics-monitoring: Decide storage backend for performance schema monitoring stats - https://phabricator.wikimedia.org/T119619#2767695 (jcrespo) Open>stalled Half of this went to public prometheus. We cannot hold there query data as it can contain PII. The solutio...
[09:48:42] DBA, Operations, Traffic, WMF-Legal, and 2 others: dbtree loads third party resources (from jquery.com and google.com) - https://phabricator.wikimedia.org/T96499#2767699 (jcrespo)
[09:55:08] DBA, Operations, Patch-For-Review, Prometheus-metrics-monitoring: implement performance_schema for mysql monitoring - https://phabricator.wikimedia.org/T99485#2767736 (jcrespo) Pending ones: ``` $ sudo salt --output=txt -C 'G@mysql_group:core' cmd.run 'mysql --defaults-file=/root/.my.cnf --batch...
[10:18:47] DBA, Operations, ops-eqiad: Degraded RAID on db1073 - https://phabricator.wikimedia.org/T149728#2767773 (Volans)
[11:35:08] DBA, Patch-For-Review: Check wikitech has the right db grants - https://phabricator.wikimedia.org/T149186#2767934 (jcrespo) Open>Resolved Certainly, the grants were wrong, I have applied the latest changes, here it is the current status: https://gerrit.wikimedia.org/r/#/c/319546/1/templates/mari...
[11:41:53] DBA, Collaboration-Team-Triage, Flow, Patch-For-Review, Schema-change: Add primary keys to remaining Flow tables - https://phabricator.wikimedia.org/T149819#2767952 (jcrespo) Thank you, this is actually important because even if internally, InnoDB treats those as PKs, from the SQL point of vi...
[11:48:21] DBA, Operations: db1065 paged for NRPE timeout - https://phabricator.wikimedia.org/T149633#2767959 (jcrespo) Open>Resolved a:jcrespo There have been several API issues in the last weeks. While the report is certainly helpful, I am resolving this because the long-term fixes are identified and...
[11:50:21] DBA: Fix PK on S5 dewiki.revision - https://phabricator.wikimedia.org/T148967#2767969 (Marostegui) ALTER running in db2066
[11:56:05] DBA, Patch-For-Review: Unify commonswiki.revision - https://phabricator.wikimedia.org/T147305#2767977 (Marostegui) According to T148967#2741095 and the final schema table we are missing: ``` KEY `page_user_timestamp` (`rev_page`,`rev_user`,`rev_timestamp`), KEY `rev_page_id` (`rev_page`,`rev_id`) ``` I...
[12:02:22] DBA, Operations, Wikimedia-Site-requests, Patch-For-Review: Private wiki for Project Grants Committee - https://phabricator.wikimedia.org/T143138#2767991 (Dereckson) a:Dereckson Thanks. Next step is Apache and DNS. If all is ready, I'll create the wiki Monday 2016-11-07.
[12:09:43] DBA, Patch-For-Review: Unify commonswiki.revision - https://phabricator.wikimedia.org/T147305#2767998 (Marostegui) The alter is running on db2065
[12:32:51] DBA: Test InnoDB compression - https://phabricator.wikimedia.org/T139055#2768055 (Marostegui) I am compressing S4 now in dbstore2001, without compression it is currently `1.3T` We will see how it decreases. I am compressing in ascending order, so the smallest tables first.
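The "smallest tables first" ordering mentioned in the compression task can be sketched roughly as below. This is only an illustration, not the procedure actually run on dbstore2001: the helper name `compression_statements`, the sample table sizes, and the choice of `ROW_FORMAT=COMPRESSED` with `KEY_BLOCK_SIZE=8` are all assumptions, since the log does not say which options were used.

```python
def compression_statements(table_sizes, key_block_size=8):
    """Build ALTER TABLE statements, smallest table first.

    table_sizes maps a table name to its size in bytes.
    ROW_FORMAT=COMPRESSED and KEY_BLOCK_SIZE are assumed options;
    the log does not state which ones were used in practice.
    """
    # Sort by size so that quick, low-risk tables are compressed first.
    ordered = sorted(table_sizes.items(), key=lambda item: item[1])
    return [
        f"ALTER TABLE `{name}` ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE={key_block_size};"
        for name, _size in ordered
    ]


# Hypothetical sizes, for illustration only.
example_sizes = {"text": 900, "user": 50, "revision": 300}
for statement in compression_statements(example_sizes):
    print(statement)
```

Going smallest first means any problem with the settings shows up on a table that is cheap to rebuild, before the large tables are touched.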
[12:41:33] DBA, Labs, Labs-Infrastructure, Patch-For-Review: Initial setup and provision of labsdb1009, labsdb1010 and labsdb1011 - https://phabricator.wikimedia.org/T140452#2465396 (Marostegui) How are you planning to migrate (or populate data) in these new servers?
[12:41:52] lol
[12:42:05] I could ask the same question!
[12:42:09] haha
[12:42:25] No, I am just like: mmm are we going to migrate from 10.0 -> 10.1 and then run upgrade?
[12:42:35] wait, wait
[12:42:41] I am with https://phabricator.wikimedia.org/T149422
[12:42:46] then we can thing about that
[12:42:53] *think
[12:43:01] "While we do not intend to do any kind of mass-migration for now, we need to make sure we support it properly, specially thinking about new labsdbs."
[12:43:08] clear :)
[12:43:13] prepare puppet
[12:43:15] shall I remove my comment then to avoid confusion?
[12:43:18] then we will see
[12:43:22] no, the question is legit
[12:43:34] it is just I cannot answer it now
[12:43:42] No, me neither XD
[12:43:46] I honestly do not know yet
[12:44:14] we need 10.1 on sanitarium, too
[12:44:25] for the slave-side triggers
[12:44:33] true
[12:44:42] and we need to test the roles :)
[12:44:42] well, on db1095
[12:44:51] but we need a lot of planning still
[12:45:04] going to lunch
[12:45:10] emjoy!
[12:45:11] enjoy
[12:50:57] DBA, Patch-For-Review: Unify commonswiki.revision - https://phabricator.wikimedia.org/T147305#2768109 (Marostegui) db2065 is finished: ``` root@neodymium:~# mysql -hdb2065.codfw.wmnet -A commonswiki -e "show create table revision\G" *************************** 1. row ***************************...
[13:19:04] DBA: Fix PK on S5 dewiki.revision - https://phabricator.wikimedia.org/T148967#2768208 (Marostegui) db2066 is now finished: ``` root@neodymium:/home/marostegui/git/software/dbtools# mysql -hdb2066.codfw.wmnet -A dewiki -e "show create table revision\G" *************************** 1. row *********************...
[13:42:20] DBA, Patch-For-Review: Unify commonswiki.revision - https://phabricator.wikimedia.org/T147305#2768315 (Marostegui) db2058 is finished ``` root@neodymium:~# mysql -hdb2058.codfw.wmnet -A commonswiki -e "show create table revision\G" *************************** 1. row *************************** Ta...
[14:20:01] DBA, Patch-For-Review: Unify commonswiki.revision - https://phabricator.wikimedia.org/T147305#2768425 (Marostegui) db2051 is now finished ``` root@neodymium:/home/marostegui# mysql -hdb2051.codfw.wmnet -A commonswiki -e "show create table revision\G" *************************** 1. row ******************...
[14:43:09] mysqldump: Error 2013: Lost connection to MySQL server during query when dumping table `text` at row: 250114191
[14:43:32] you know where that came from don't you? ^
[14:43:38] db1073? :_(
[14:43:52] it was a good move to do a mysqldump
[14:44:11] even if it took "272GiB 5:14:58 [14.8MiB/s]"
[14:44:38] Nov 3 12:55:29 db1073 kernel: [19794.912375] sd 0:2:0:0: [sda] tag#0 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
[14:44:41] Nov 3 12:55:29 db1073 kernel: [19794.912387] sd 0:2:0:0: [sda] tag#0 CDB: Read(16) 88 00 00 00 00 01 9d b8 05 88 00 00 00 10 00 00
[14:44:44] did you see that?
[14:44:50] oh
[14:45:03] I didn't check the kernel
[14:45:06] root@db1073:~# date
[14:45:06] Thu Nov 3 14:44:59 UTC 2016
[14:45:16] so looks like it was not happy 2 hours ago
[14:45:44] oh
[14:45:55] do you think controller or the non-mirrored disk?
[14:46:25] actually
[14:46:26] 161103 12:55:29 [ERROR] InnoDB: File ./enwiki/text.ibd: 'Linux aio' returned OS error 0. Cannot continue operation
[14:46:26] 161103 12:55:34 mysqld_safe Number of processes running now: 0
[14:46:26] 161103 12:55:34 mysqld_safe mysqld restarted
[14:46:26] was that you?
[14:46:38] s/was that you?/did you see that/g
[14:46:38] no it probably crashed
[14:46:47] it crashed at the same time then
[14:46:48] when reached the faulty page
[14:46:51] when the kernel logged that
[14:46:52] yep
[14:46:54] yes
[14:46:58] that makes sense
[14:47:02] it didn't crashed
[14:47:07] *crash
[14:47:19] innodb is intelligent enough to shut down when a page checksum fails
[14:47:47] which clearly is hardware-caused
[14:48:04] yes, that controller isn't healthy
[14:48:20] (I still think it is the controller as per the first symptoms, a disk not being detected as failed but was failed)
[14:48:23] could be double disk-failure only?
[14:48:52] There are several: Nov 1 18:55:45 db1073 kernel: [6147809.670026] CPU0: Package temperature above threshold, cpu clock throttled (total events = 1)
[14:51:20] There is one disk which is not part of the raid because it has a foreign config
[14:51:57] what?
[14:54:25] this is weird
[14:54:25] Look at this
[14:54:25] DBA, Operations, ops-eqiad: Multiple hardware issues on db1073 - https://phabricator.wikimedia.org/T149728#2768489 (jcrespo)
[14:54:25] According to ^ the disk 4 is failed
[14:54:26] and in that report it shows Failed
[14:54:26] but now it shows: Firmware state: Unconfigured(good), Spun Up
[14:54:26] In the ticket I don't see that chris changed it
[14:54:49] maybe it added it just now?
[14:54:52] i did not change it yet...new disk should arrive today
[14:55:00] oh, hello
[14:55:02] :-)
[14:55:14] we have worse issues than a disk :-(
[14:55:24] I am updating the task
[14:55:57] https://phabricator.wikimedia.org/T149728#2768494
[14:56:37] the problem is how to check everything is correct after a reimage?
[14:56:51] because filesystem is not reliable
[14:57:42] hey cmjohnson1 o/ :)
[14:57:43] Ah, I was doing it too
[14:57:43] I will wait for your update
[14:57:43] as we might say the same
[14:57:43] DBA, Operations, ops-eqiad: Multiple hardware issues on db1073 - https://phabricator.wikimedia.org/T149728#2760850 (jcrespo) https://phabricator.wikimedia.org/T149728#2768489 See kernel.log: {P4360} Both the thermal issues happening and the I/O errors, causing data corruption: ``` Nov 3 12:55:29...
[14:57:43] It is weird that it is back as "good"
[14:57:44] DBA, Labs, Labs-Infrastructure, Patch-For-Review: Initial setup and provision of labsdb1009, labsdb1010 and labsdb1011 - https://phabricator.wikimedia.org/T140452#2768500 (chasemp) >>! In T140452#2768084, @Marostegui wrote: > How are you planning to migrate (or populate data) in these new servers...
[14:58:03] DBA, Operations, ops-eqiad: Multiple hardware issues on db1073 - https://phabricator.wikimedia.org/T149728#2760850 (Marostegui) The original report raid report showed this information for disk 32:4 ``` Raw Size: 558.911 GB [0x45dd2fb0 Sectors] Firmware state: =====> Failed <===== Media Type:...
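The reasoning that the kernel I/O error and the InnoDB error belong to the same incident rests on their timestamps matching. A minimal sketch of that comparison follows, using the two log lines quoted earlier in the channel. The function names, the two-second tolerance, and passing the year explicitly (syslog lines omit it; 2016 is taken from the `date` output above) are assumptions made for illustration.

```python
from datetime import datetime

# Log lines as quoted in the channel.
KERNEL_LINE = ("Nov 3 12:55:29 db1073 kernel: [19794.912375] sd 0:2:0:0: [sda] "
               "tag#0 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK")
MYSQL_LINE = ("161103 12:55:29 [ERROR] InnoDB: File ./enwiki/text.ibd: "
              "'Linux aio' returned OS error 0. Cannot continue operation")


def parse_kernel_stamp(line, year):
    """Parse a syslog-style prefix such as 'Nov 3 12:55:29' (no year in the log)."""
    month, day, clock = line.split()[:3]
    return datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")


def parse_mysql_stamp(line):
    """Parse a MySQL error log prefix such as '161103 12:55:29' (YYMMDD)."""
    date_part, clock = line.split()[:2]
    return datetime.strptime(f"{date_part} {clock}", "%y%m%d %H:%M:%S")


def same_incident(kernel_line, mysql_line, year, tolerance_seconds=2):
    """Return True when both entries fall within the given tolerance."""
    delta = parse_kernel_stamp(kernel_line, year) - parse_mysql_stamp(mysql_line)
    return abs(delta.total_seconds()) <= tolerance_seconds


print(same_incident(KERNEL_LINE, MYSQL_LINE, year=2016))
```

A matching timestamp only shows that the two events coincide; it does not by itself say whether the controller or the disk is at fault.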
[14:58:19] jynus: the heating problems are quite frequent, I opened an epic ticket for the mw servers recently: https://phabricator.wikimedia.org/T149287
[14:58:35] should be solved via refreshing thermal paste
[14:58:59] yes
[14:59:04] cpu is clear
[14:59:16] I am only worried about disks, those are mysql
[14:59:41] it is very suspicious that the controller didn't mark that disk as broken and only when it got activity (when it was pooled) it was detected
[14:59:44] and now it goes back to "good"
[15:00:05] let's check hw logs
[15:00:20] marostegui, I am taking care of this, do not want to disturb you
[15:00:32] will update as I know more
[15:00:57] sure :)
[15:01:02] If you need any help let me know, I like hardware :p
[15:15:54] lol
[15:19:23] DBA, Operations, ops-eqiad: Multiple hardware issues on db1073 - https://phabricator.wikimedia.org/T149728#2768544 (jcrespo) ```lines=10 Severity Date and Time Message ID Summary Comment 2016-11-03T15:15:04-0500 USR0030 Successfully logged in using root, from 10.64.48.28 and GUI. 2016...
[15:19:36] do you know when people say "RAID is not a backup"
[15:20:23] if you do not believe, believe the RAID literally says "then restore the data from the backup to the disk"
[15:20:25] nice
[15:20:54] *believe it when the RAID says
[15:21:05] haha
[15:21:16] interesting message
[15:22:15] so one is broken and another one about to
[15:22:32] no, I only see mention to a single disk
[15:22:39] but with uncorrectable error
[15:22:58] Ah, the first message is from 1st October, sorry
[15:22:59] yes
[15:27:23] DBA, Operations, ops-eqiad: Multiple hardware issues on db1073 - https://phabricator.wikimedia.org/T149728#2768559 (jcrespo) So the plan is: #1 install new disk #2 correct cpu thermal issues #3 redo the RAID from 0 because of this "uncorrectable error" Ok with that plan @Cmjohnson ? This is NOT an...
[15:28:13] DBA, Operations, ops-eqiad: Multiple hardware issues on db1073 - https://phabricator.wikimedia.org/T149728#2768564 (Cmjohnson) @jcrespo works for me.
[15:28:32] DBA, Patch-For-Review: db2034: investigate its crash and reimage - https://phabricator.wikimedia.org/T149553#2768565 (Marostegui) @Papaul Do you know if there are some logs or something you can check onsite to see if we can find out why this server turned off itself?
[15:33:06] once db1073 is down, the only server with a bad pattern is db1051
[15:33:33] https://grafana.wikimedia.org/dashboard/db/mysql-replication-lag?panelId=1&fullscreen&from=1478165603920&to=1478187203920
[15:44:29] DBA: db1051 disk is about to fail - https://phabricator.wikimedia.org/T149908#2768614 (jcrespo)
[15:51:12] DBA, Operations, ops-eqiad: db1051 disk is about to fail - https://phabricator.wikimedia.org/T149908#2768636 (jcrespo) ``` Enclosure Device ID: 32 Slot Number: 8 Drive's position: DiskGroup: 0, Span: 4, Arm: 0 Enclosure position: 1 Device Id: 8 WWN: 5000C5005ABB3B30 Sequence Number: 2 Media Error Cou...
[15:57:51] DBA, Operations, ops-eqiad: db1051 disk is about to fail - https://phabricator.wikimedia.org/T149908#2768655 (MoritzMuehlenhoff) a:Cmjohnson
[16:01:41] DBA, Collaboration-Team-Triage, Flow, Patch-For-Review, Schema-change: Add primary keys to remaining Flow tables - https://phabricator.wikimedia.org/T149819#2768660 (Catrope) Matt pointed out that the `flow_subscription` table is unused and should just be dropped entirely (I checked and it's...
[16:32:57] DBA, Patch-For-Review: db2034: investigate its crash and reimage - https://phabricator.wikimedia.org/T149553#2768786 (Papaul) @Marostegui no everything looks good.
All lids are green
[17:20:05] DBA, Labs, Labs-Infrastructure, Patch-For-Review: Initial setup and provision of labsdb1009, labsdb1010 and labsdb1011 - https://phabricator.wikimedia.org/T140452#2768940 (jcrespo)
[17:20:07] DBA, Patch-For-Review: Prepare for mariadb 10.1 - https://phabricator.wikimedia.org/T149422#2768939 (jcrespo) Open>Resolved
[17:29:41] DBA, Labs, Labs-Infrastructure, Patch-For-Review: Initial setup and provision of labsdb1009, labsdb1010 and labsdb1011 - https://phabricator.wikimedia.org/T140452#2768980 (jcrespo) Socket authentication allow for easier administration: ``` $ sudo mysql Welcome to the MariaDB monitor. Commands...
[18:14:03] DBA, Labs, Labs-Infrastructure, Patch-For-Review: Implement a frontend failover solution for labsdb replicas - https://phabricator.wikimedia.org/T141097#2769216 (jcrespo)
[18:14:06] DBA, Labs, Labs-Infrastructure, Patch-For-Review: Initial setup and provision of labsdb1009, labsdb1010 and labsdb1011 - https://phabricator.wikimedia.org/T140452#2769214 (jcrespo) Open>Resolved Monitoring of the new hosts can be seen at: https://grafana.wikimedia.org/dashboard/db/mysql?...
[18:19:58] DBA, Labs, Labs-Infrastructure: Migrate existing labs users from the old servers, if possible using roles and start maintaining users on the new database servers, too - https://phabricator.wikimedia.org/T149933#2769241 (jcrespo)
[18:28:25] DBA, Collaboration-Team-Triage, Flow, Schema-change: Drop flow_subscription table - https://phabricator.wikimedia.org/T149936#2769308 (Catrope)
[18:28:25] DBA, Collaboration-Team-Triage, Flow, Patch-For-Review, Schema-change: Add primary keys to remaining Flow tables - https://phabricator.wikimedia.org/T149819#2769339 (Catrope) >>! In T149819#2768660, @Catrope wrote: > Matt pointed out that the `flow_subscription` table is unused and should jus...
[18:28:26] DBA, Collaboration-Team-Triage, Flow, Patch-For-Review, Schema-change: Drop flow_subscription table - https://phabricator.wikimedia.org/T149936#2769308 (Catrope) SQL: https://gerrit.wikimedia.org/r/#/c/319642/1/db_patches/patch-drop-flow_subscription.sql
[19:02:52] DBA, Operations, ops-eqiad: Multiple hardware issues on db1073 - https://phabricator.wikimedia.org/T149728#2769513 (jcrespo)
[19:02:55] DBA: db1073 mysql crashed - https://phabricator.wikimedia.org/T149876#2769515 (jcrespo)
[19:57:37] DBA, MediaWiki-extensions-Linter: DBA review of Linter extension - https://phabricator.wikimedia.org/T148866#2769661 (Legoktm) >>! In T148866#2744016, @jcrespo wrote: >> I just added an autoincrement primary key > > If you are going to write and delete multiple rows everytime things are parsed, an autoi...
[20:34:26] Blocked-on-schema-change, DBA, Wikimedia-Site-requests, Wikisource, and 2 others: Schema change for page content language - https://phabricator.wikimedia.org/T69223#2769742 (matmarex)