[01:56:24] DBA: Drop MoodBar tables from all wikis - https://phabricator.wikimedia.org/T153033#2867045 (Reedy)
[01:56:33] DBA: Drop MoodBar tables from all wikis - https://phabricator.wikimedia.org/T153033#2867057 (Reedy)
[01:58:59] DBA: Drop MoodBar tables from all wikis - https://phabricator.wikimedia.org/T153033#2867065 (Reedy)
[01:59:53] DBA, Epic, Tracking: Database tables to be dropped on Wikimedia wikis and other WMF databases (tracking) - https://phabricator.wikimedia.org/T54921#2867068 (Reedy)
[01:59:55] DBA: Drop MoodBar tables from all wikis - https://phabricator.wikimedia.org/T153033#2867045 (Reedy)
[02:01:04] DBA: Drop MoodBar tables from all wikis - https://phabricator.wikimedia.org/T153033#2867070 (Reedy) No rush to remove this one, but it should eventually. Need to check if the data has any use for anyone (Analytics or research, maybe?) before dropping it completely
[02:02:00] DBA, Epic, Tracking: Database tables to be dropped on Wikimedia wikis and other WMF databases (tracking) - https://phabricator.wikimedia.org/T54921#2867072 (Reedy)
[02:24:29] DBA: Drop MoodBar tables from all wikis - https://phabricator.wikimedia.org/T153033#2867113 (Peachey88)
[02:24:45] DBA: Drop MoodBar tables from all wikis - https://phabricator.wikimedia.org/T153033#2867045 (Peachey88)
[07:15:55] DBA, Patch-For-Review: Fix PK on S5 dewiki.revision - https://phabricator.wikimedia.org/T148967#2867253 (Marostegui) This was started around 10 minutes ago.
[07:17:26] DBA, Patch-For-Review: Import S2,S6,S7,m3 and x1 to dbstore2001 and dbstore2002 - https://phabricator.wikimedia.org/T151552#2867254 (Marostegui) s2 is now catching up in dbstore2001. Stopping/starting all slaves worked fine. Later I will stop MySQL and start MySQL to make sure nothing got corrupted. Once...
[07:50:49] DBA, Patch-For-Review: db2034: investigate its crash and reimage - https://phabricator.wikimedia.org/T149553#2867309 (Marostegui) I have started the transfer from db2048 to db2034
[07:52:25] DBA, Patch-For-Review: db2034: investigate its crash and reimage - https://phabricator.wikimedia.org/T149553#2867326 (Marostegui) I have also cleared the logs, by the way, the controller showed this error for the two different disks we inserted in slot 2 ``` description=POST Error: 1720-Slot X Drive Arr...
[08:03:14] DBA, Patch-For-Review: Import S2,S6,S7,m3 and x1 to dbstore2001 and dbstore2002 - https://phabricator.wikimedia.org/T151552#2867351 (Marostegui) Thinking about it and given that dbstore2001 has enough disk now, I will keep importing shards this week as next week deployments will not be allowed. So I will...
[08:33:20] DBA, MediaWiki-Database, Operations: db1028 increased lag after extensions/CentralAuth/maintenance/populateLocalAndGlobalIds.php - https://phabricator.wikimedia.org/T152761#2867409 (jcrespo) @kaldari What is the status. Has the script finished? Is it running still? This is to make the maintenance win...
[08:36:35] DBA, MediaWiki-Database, Operations: db1028 increased lag after extensions/CentralAuth/maintenance/populateLocalAndGlobalIds.php - https://phabricator.wikimedia.org/T152761#2867427 (kaldari) @jcrespo: The script is still running, but I expect it to finish by the end of the window (10 hours from now).
[08:39:06] DBA, MediaWiki-Database, Operations: db1028 increased lag after extensions/CentralAuth/maintenance/populateLocalAndGlobalIds.php - https://phabricator.wikimedia.org/T152761#2867428 (jcrespo) Good.
[08:41:44] DBA, Patch-For-Review: Import S2,S6,S7,m3 and x1 to dbstore2001 and dbstore2002 - https://phabricator.wikimedia.org/T151552#2867432 (Marostegui) All the files from s6 are now being transferred to dbstore2001
[08:42:04] db1051 is depooled, I will do some upgrades as it seems to be an old server
[08:42:24] heads up if you want to do something else when the server finishes the alter table
[08:42:39] i was checking indeed, but no, my pending alters are all in s5
[08:42:41] thanks though :)
[08:44:46] you are/were touching 2064 and 2067, right?
[08:45:04] yep
[08:45:35] db2064 will be back in the pool today, hopefully in a couple of hours, or 3
[08:45:37] I am trying once again an alter on labsdb1001
[08:46:18] oh, no problem, just wanted to see there was no lagged server we were not on top of
[08:46:26] ah sure :)
[08:48:14] there are 3 servers on s3 eqiad that are close to being full
[08:48:43] they would need defragmentation, but also are scheduled for decommission
[08:49:05] we can try defrag+compression
[08:49:11] on the top3 tables
[08:49:18] per wiki
[08:49:24] yeah, I am not sure it is worth the time
[08:50:00] will they survive xmas time without any issue?
[08:50:31] maybe we should run pt-table-checksum on them and retire them forever
[08:51:30] After xmas, right?
[08:52:35] https://grafana.wikimedia.org/dashboard/file/server-board.json?panelId=17&fullscreen&var-server=db1015&var-network=eth0&from=now-90d&to=now
[08:52:56] they will survive, but maybe we should ack its space alerts
[08:53:28] indeed
[08:53:42] there is a spike of growth lately
[08:54:00] oh
[08:54:04] it is already ack'ed
[08:54:22] same on the other s3 servers
[08:54:33] maybe a wiki was imported?
[08:55:15] ERROR 1062 (23000) at line 1: Duplicate entry '0-The_Rutles_Archaeology' for key 'name_title'
[08:55:23] labs?
[08:55:48] or db1069 again?
[08:55:55] labsdb1001
[08:56:08] :(
[08:56:20] I am going to try ALGORITHM=COPY
[08:56:30] although I am not sure toku will like that
[08:56:41] oh, toku
[08:56:45] I keep forgetting about it
[09:00:03] enwiki master was done with no issue
[09:28:50] DBA, Patch-For-Review: db2034: investigate its crash and reimage - https://phabricator.wikimedia.org/T149553#2867494 (Marostegui) And as expected, it died: ``` time=09:26 description=System Power Fault Detected (XR: 14 00 MID: FF 4D FC CE C0 FF FF 32 32 0C 0C 00 9C 00 04 01 03 47 00 00 00 00 00...
[09:30:16] (╯°□°)╯︵ ┻━┻
[09:31:00] hahaha no idea what that means, very complex emoticon :p
[09:31:06] I guess it means you are sad?
[09:31:21] It means I am altering a table!
[09:31:37] marostegui: he's flipping the table technically :-P
[09:31:40] hahahaha
[09:31:45] oh
[09:31:52] the right thing is the table upside down I guess?
[09:32:24] (ノ^_^)ノ┻━┻ ┬─┬ ノ( ^_^ノ)
[09:32:29] hahahaha
[09:32:37] now I do see the tables
[09:33:04] I was actually reacting to your latest phab update
[09:33:31] that is what I thought!!!
[09:33:52] you were angry and throwing a table away? :)
[09:34:56] ┬─┬ ︵ /(.□. \)
[09:35:30] this is even harder than regex!
[09:43:45] DBA, Patch-For-Review: Import S2,S6,S7,m3 and x1 to dbstore2001 and dbstore2002 - https://phabricator.wikimedia.org/T151552#2867546 (Marostegui) Transfer is done and I am now importing: `frwiki jawiki and ruwiki`
[10:00:37] I am going to introduce 10.0.28 on the first enwiki eqiad slave
[10:01:34] good!
[10:21:51] should we repoint labsdb1001 and labsdb1003 to use db1095 or not yet?
[10:29:22] we can do it
[10:29:27] i mean
[10:29:33] there is no more maintenance needed on db1095, right?
[10:29:54] my only concern would be RBR on db1095 but not on labsdb1001 right?
[10:30:36] there is no binary log on labs
[10:30:45] and they are run as IDEMPOTENT
[10:31:03] ah
[10:31:06] then \o/
[10:31:35] we can create some followup tickets (not part of the goal)
[10:44:54] we need to see how to stop both db1069 and db1095 at the same time
[10:47:05] stop?
[10:47:15] replication
[10:47:19] ah, for repointing
[10:47:25] yep
[10:47:37] there is a script for that
[10:47:46] oooh is it?
[10:48:10] but I think it only works if both servers are slaves of the same master
[10:48:45] the ./repl.pl script has that functionality, give it a look
[10:48:51] but it may need changes
[10:48:54] I will take a look
[10:49:43] I think that + gtid (without gtid being enabled) could work
[10:51:46] now that I think about it
[10:51:59] it is easier, as we only need to stop db1052 (master of db1095) and db1069
[10:52:04] which are both under the same master
[10:54:04] yes
[10:54:08] that would work
[10:54:50] let's create a general ticket for planning, it will probably be done next year
[10:54:56] ok
[10:55:03] Let me start it
[10:55:17] I would prefer to do dbstore1001 first
[10:55:37] but we are still waiting for the disks :(
[10:55:56] it is blocked on mark, let's nag him
[10:56:04] XD
[10:58:06] DBA: Pending things in the labs infra - https://phabricator.wikimedia.org/T153058#2868122 (Marostegui)
[11:10:15] DBA, Operations: Drop the tables old_growth, hitcounter, click_tracking, click_tracking_user_properties from enwiki, maybe other schemas - https://phabricator.wikimedia.org/T115982#1737574 (TTO) Do these tables remain on any wikis? From T132837 it seems like at least the `hitcounter` ones have been delet...
[11:19:39] DBA, Operations: Drop the tables old_growth, hitcounter, click_tracking, click_tracking_user_properties from enwiki, maybe other schemas - https://phabricator.wikimedia.org/T115982#1737574 (Marostegui) `hitcounter` tables were deleted indeed. I will check the other ones.
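Editor's note on the 10:44-10:52 exchange: the idea of stopping db1052 and db1069 "at the same time" because both replicate from the same master can be sketched as follows. This is a minimal illustration, not the ./repl.pl script mentioned in the log (its behaviour is not shown here). It assumes each replica has already been stopped and its executed coordinates read from `Relay_Master_Log_File` and `Exec_Master_Log_Pos` in `SHOW SLAVE STATUS`. The binlog file names and positions are hypothetical, and so are the helper names `binlog_key`, `common_stop_point` and `until_statements`.

```python
from typing import Dict, List, Tuple

# (master binlog file name, byte position) as executed by a replica.
Coord = Tuple[str, int]


def binlog_key(coord: Coord) -> Tuple[int, int]:
    """Order coordinates by the numeric suffix of the file, then by position."""
    name, pos = coord
    return (int(name.rsplit(".", 1)[1]), pos)


def common_stop_point(executed: Dict[str, Coord]) -> Coord:
    """Pick the furthest coordinate any replica has executed.

    Every replica can reach this point by replaying more events,
    so it is a safe common place to stop.
    """
    return max(executed.values(), key=binlog_key)


def until_statements(executed: Dict[str, Coord]) -> Dict[str, List[str]]:
    """Build, per host, the SQL that replays up to the common coordinate."""
    log_file, log_pos = common_stop_point(executed)
    sql = (
        f"START SLAVE UNTIL MASTER_LOG_FILE = '{log_file}', "
        f"MASTER_LOG_POS = {log_pos};"
    )
    return {host: ["STOP SLAVE;", sql] for host in executed}


if __name__ == "__main__":
    # Hypothetical coordinates; real values come from SHOW SLAVE STATUS.
    status = {
        "db1052": ("master-bin.000123", 500),
        "db1069": ("master-bin.000124", 42),
    }
    for host, statements in until_statements(status).items():
        print(host, statements)
```

Once both SQL threads have stopped at that coordinate, the hosts below them hold the same data as of the same master position, which is what makes the repointing safe. How a replica that is already at the target coordinate reacts to `START SLAVE UNTIL` should be checked on a test host before relying on this.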
[12:16:32] DBA, Patch-For-Review: Fix PK on S5 dewiki.revision - https://phabricator.wikimedia.org/T148967#2868420 (Marostegui) The master db1049 is done ``` root@neodymium:/home/marostegui/git/software# mysql -hdb1049 -A dewiki -e "show create table revision\G" *************************** 1. row *****************...
[12:29:57] DBA, Patch-For-Review: Fix PK on S5 dewiki.revision - https://phabricator.wikimedia.org/T148967#2868434 (Marostegui) The following servers are missing the page_user_timestamp index: ``` db1045 db1082 db1087 db1049 (master) ``` I am currently altering db1087 ``` ./software/dbtools/osc_host.sh --host=db1...
[12:31:50] what is blocked on me?
[12:32:07] what?
[12:33:58] some disk purchases
[12:34:15] oh, I read jynus instead of mark XD
[12:34:28] if someone writes on this channel I just assume it is jaime :-)
[12:34:44] https://phabricator.wikimedia.org/T143874#2860166
[12:35:37] approved
[12:35:53] https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=dbstore1001&service=MariaDB+disk+space
[12:37:48] DBA, Operations: Drop the tables old_growth, hitcounter, click_tracking, click_tracking_user_properties from enwiki, maybe other schemas - https://phabricator.wikimedia.org/T115982#2868461 (Marostegui) What Jaime posted on T115982#1807646 is still the situation we have with the exception of the `hitcount...
[12:37:58] jynus: db1087 is all yours
[12:38:15] thanks
[12:38:19] will repool it
[12:38:24] when I am done
[12:38:47] DBA, Patch-For-Review: Fix PK on S5 dewiki.revision - https://phabricator.wikimedia.org/T148967#2868463 (Marostegui) db1087 is done.
[12:38:50] great
[13:14:22] I will run the schema change on db1082 now so you do not have to wait
[13:15:16] ok, I am done with mine
[13:16:21] DBA, Patch-For-Review: Fix PK on S5 dewiki.revision - https://phabricator.wikimedia.org/T148967#2868694 (Marostegui) db1082 is done
[13:17:04] in reality, we could pool as soon as we start the alters
[13:17:13] once you are done, let me know, I will revert and depool db1045
[13:17:27] as the only issue, aside from extra load, would be the metadata locks, which only happen at the start
[13:18:18] I only need to do db1092
[13:18:21] not the others
[13:18:27] ah
[13:18:45] I do not need db1092
[13:18:49] it was only the large ones that failed pooled
[13:19:03] yeah, but we need not to step over
[13:19:11] do you want to do yours first?
[13:19:15] I don't mind waiting
[13:19:19] I will wait for you to do your things
[13:19:29] I have more pending on enwiki, I think
[13:19:29] ok, it takes around 5 minutes only
[13:19:32] ok
[13:23:08] 82 is done
[13:23:17] ok - great
[13:23:19] will revert
[13:35:20] jynus: https://gerrit.wikimedia.org/r/#/c/326935/ you fine with this change?
[13:37:26] ok, but check the processlist
[13:37:35] yes :)
[13:37:36] there is a wikiadmin process on many hosts
[13:37:54] doing a localuser fill-in
[13:38:36] I think it is not on Deployments, maybe I should bark to some people
[13:39:08] yes, there is nothing there
[14:08:29] DBA, Patch-For-Review: Fix PK on S5 dewiki.revision - https://phabricator.wikimedia.org/T148967#2868853 (Marostegui) db1045 is done The only pending one is the master - will be done tomorrow morning.
[14:10:57] I am done with db1045 but I will wait for SWAT to finish before reverting
[14:28:35] DBA: Pending things in the labs infra - https://phabricator.wikimedia.org/T153058#2868926 (jcrespo)
[15:24:44] DBA, Patch-For-Review: Import S2,S6,S7,m3 and x1 to dbstore2001 and dbstore2002 - https://phabricator.wikimedia.org/T151552#2869039 (Marostegui) The last wiki (ruwiki) is getting imported and it will take a few more hours to finish. db2067 will remain with replication stopped until it is done.
[16:14:23] DBA, Patch-For-Review: db2034: investigate its crash and reimage - https://phabricator.wikimedia.org/T149553#2869205 (Marostegui) We are going to compare the BIOS power settings of both db2034 and db2048
[16:16:15] Blocked-on-schema-change, DBA, Wikimedia-Site-requests, Wikisource, and 3 others: Schema change for page content language - https://phabricator.wikimedia.org/T69223#2869215 (jcrespo) Only 3 servers left: ``` labsdb1001.eqiad.wmnet:enwiki db1089.eqiad.wmnet:enwiki db1092.eqiad.wmnet:wikidatawiki...
[16:46:04] DBA, Patch-For-Review: db2034: investigate its crash and reimage - https://phabricator.wikimedia.org/T149553#2869347 (Marostegui) Power settings are the same on both servers. We will contact the vendor to see if we can get a new raid controller or a technician onsite (again)
[16:56:04] DBA, Patch-For-Review: Import S2,S6,S7,m3 and x1 to dbstore2001 and dbstore2002 - https://phabricator.wikimedia.org/T151552#2869387 (Marostegui) s6 has been imported into dbstore2001 and it is now catching up with the master.
[16:58:58] DBA, Patch-For-Review: db2034: investigate its crash and reimage - https://phabricator.wikimedia.org/T149553#2869394 (Marostegui) db2034 is back up rebuilding the RAID with the disk (which will be marked as predictive failure by the controller for sure)
[17:09:34] DBA, MediaWiki-Database, Operations: db1028 increased lag after extensions/CentralAuth/maintenance/populateLocalAndGlobalIds.php - https://phabricator.wikimedia.org/T152761#2869435 (Marostegui) @kaldari how is it going? you reckon it will end soon? Just asking to see if we need to extend the downtime...
[17:34:31] DBA, MediaWiki-Database, Operations: db1028 increased lag after extensions/CentralAuth/maintenance/populateLocalAndGlobalIds.php - https://phabricator.wikimedia.org/T152761#2869535 (kaldari) @Marostegui: Apparently, it's taking a long time to complete the script on loginwiki, which I had completely f...
[17:44:18] DBA, Patch-For-Review: db2034: investigate its crash and reimage - https://phabricator.wikimedia.org/T149553#2869573 (Papaul) Server Temperature {F5049624}
[18:45:16] DBA, Operations, ops-eqiad: Multiple hardware issues on db1073 - https://phabricator.wikimedia.org/T149728#2869872 (Cmjohnson) a:Cmjohnson>jcrespo @jcrespo The new disk is installed. Assigning to you, resolve once complete or back to me if there are any issues.
[18:45:48] DBA, Operations, ops-eqiad: Multiple hardware issues on db1073 - https://phabricator.wikimedia.org/T149728#2869875 (Cmjohnson) Return part tracking number 9202 3946 5301 2434 9841 72
[18:47:30] DBA, MediaWiki-Database, Operations: db1028 increased lag after extensions/CentralAuth/maintenance/populateLocalAndGlobalIds.php - https://phabricator.wikimedia.org/T152761#2869878 (fgiunchedi) @kaldari ok!
downtimed db1028 for another 12h
[19:55:03] DBA, MediaWiki-Database, Operations: db1028 increased lag after extensions/CentralAuth/maintenance/populateLocalAndGlobalIds.php - https://phabricator.wikimedia.org/T152761#2870225 (Marostegui) Thanks @fgiunchedi!