[05:51:32] DBA, Operations, Wikimedia-Site-requests: Global rename of Smuconlaw → Sgconlaw: supervision needed - https://phabricator.wikimedia.org/T168109#3358878 (Marostegui) >>! In T168109#3358211, @alanajjar wrote: > @Marostegui can we doing it now? Hi, Sorry, I wasn't around on Sunday :-) Ping me today...
[06:00:10] DBA, Patch-For-Review: Convert unique keys into primary keys for some wiki tables on s2 - https://phabricator.wikimedia.org/T166205#3358881 (Marostegui)
[06:01:34] I am going to try to fix dbstore1002 replication
[06:02:01] what happened again?
[06:02:24] Error 'Index recentchanges is corrupted' says
[06:02:27] I will see
[06:02:38] Oh, I see
[06:02:59] Was there a crash or something?
[06:04:19] no
[06:05:33] InnoDB: (index "rc_namespace_title" of table "srwiki"."recentchanges")
[06:05:39] InnoDB: Database page corruption on disk or a failed
[06:08:06] yeah the backtrace is horrible
[06:08:30] the disks look fine though
[06:08:35] Don't see anything broken on the dmesg
[06:08:39] let me check the hw logs
[06:10:24] You have that session taken?
[06:10:41] root@dbstore1002[srwiki]> ALTER TABLE recentchanges ENGINE=InnoDB, FORCE;
[06:10:42] ERROR 1712 (HY000): Index PRIMARY is corrupted
[06:11:58] pffffffff
[06:13:03] I guess that table needs to get reimported :_(
[06:13:34] if you want to set a replication filter to ignore it and let it catch up, I can reimport it later
[06:14:02] but i am worried about that corruption...
[06:14:21] hopefully it is just that and a punctual thing
[06:16:45] I cannot ignore it, it has replication filters on the do part already
[06:17:12] Ah yes, damn
[06:18:18] Well, then I will reimport the whole s3 :(
[06:18:56] mysqldump: Error 1712: Index recentchanges is corrupted when dumping table `recentchanges` at row: 0
[06:19:32] it is totally screwed :(
[06:20:50] Table 'srwiki.recentchanges' doesn't exist in engine
[06:21:15] how can all that happen without a crash?
[06:21:18] :|
[06:21:25] Did you see something on the hw logs?
[06:24:07] I have restarted replication on all shards, I will drop and reimport all of srwiki
[06:24:52] You sure? I can do it if you like, it is my last day of clinic duty, but I am sure I can have some time for it
[06:25:30] I can use db1015
[06:26:24] we may want to restart dbstore1002 anyway
[06:30:52] Yeah I was going to suggest to restart the host
[06:30:58] To make sure it is well...
[06:37:27] DBA, Patch-For-Review: Convert unique keys into primary keys for some wiki tables on s2 - https://phabricator.wikimedia.org/T166205#3358941 (Marostegui) #cloud-services-team I have started to alter labsdb1001 for the s2 shard - this might result on labsdb1001 being delayed for a couple of days.
[06:49:22] Blocked-on-schema-change, DBA, Patch-For-Review: Convert unique keys into primary keys for some wiki tables on s4 - https://phabricator.wikimedia.org/T166206#3358960 (Marostegui) dbstore1001 is done: ``` root@neodymium:/home/marostegui# for i in `cat s4_tables`; do echo $i; mysql --skip-ssl -hdbstore...
[06:49:37] Blocked-on-schema-change, DBA, Patch-For-Review: Convert unique keys into primary keys for some wiki tables on s4 - https://phabricator.wikimedia.org/T166206#3358961 (Marostegui)
[06:49:49] Blocked-on-schema-change, DBA: Convert unique keys into primary keys for some wiki tables on s1, s2, s4, s5 and s7 (eqiad) - https://phabricator.wikimedia.org/T164185#3358963 (Marostegui)
[06:49:51] Blocked-on-schema-change, DBA, Patch-For-Review: Convert unique keys into primary keys for some wiki tables on s4 - https://phabricator.wikimedia.org/T166206#3288358 (Marostegui) Open>Resolved
[07:09:32] Blocked-on-schema-change, DBA: Convert unique keys into primary keys for some wiki tables on s5 - https://phabricator.wikimedia.org/T166207#3358981 (Marostegui)
[07:27:42] db1047 alter…12 days and…56.3% of stage done
[07:27:46] elukey ^ XD
[07:28:34] come ooonnnnnnnn :D
[07:28:45] it is the enwiki revision table
[07:29:41] I wanted to ask you if it was possible in these days to test one alter in the EL db on db1047, to check how much time it will take
[07:30:22] Sure, we can try. At this point i am thinking I will kill the alter on db1047 as it is never going to finish…:(
[07:30:32] But I am worried about the rollback time it will take
[07:30:42] Although the other times when it died after 2 days, it was pretty fast
[07:30:49] So maybe tokudb is good for these cases
[07:31:14] as in: killing alters XD
[07:32:19] :D
[08:20:22] DBA, Schema-change: Drop titlekey table from all wmf databases - https://phabricator.wikimedia.org/T164949#3359158 (Marostegui)
[08:32:40] DBA, Operations, ops-eqiad: db1047 BBU RAID issues (was: Investigate db1047 replication lag) - https://phabricator.wikimedia.org/T159266#3359207 (elukey) >>! In T159266#3351373, @Ottomata wrote: > @elukey might have other opinions, but I'm inclined to try our best to expedite the ordering of new hard...
[08:34:57] DBA, Operations, ops-eqiad: db1047 BBU RAID issues (was: Investigate db1047 replication lag) - https://phabricator.wikimedia.org/T159266#3359235 (Marostegui) stalled>Resolved Let's close this then for now as nothing will be done at this point (and I agree with what you guys think - not worth)
[08:40:08] DBA, Schema-change: Drop titlekey table from all wmf databases - https://phabricator.wikimedia.org/T164949#3359246 (Marostegui)
[08:45:41] DBA, Operations, cloud-services-team, Patch-For-Review, Wikimedia-log-errors: Terbium cronjobs attempting to connect to labstestweb2001 - https://phabricator.wikimedia.org/T167961#3359253 (jcrespo)
[08:46:54] DBA, Operations, cloud-services-team, Patch-For-Review, Wikimedia-log-errors: Terbium cronjobs attempting to connect to labstestweb2001 - https://phabricator.wikimedia.org/T167961#3351232 (jcrespo) I have added 2 temporary accounts to connect from terbium and wasat as the admin users.
[09:13:01] DBA, Labs: Prepare and check storage layer for kbp.wikipedia.org - https://phabricator.wikimedia.org/T160869#3113206 (Marostegui) Just leaving the comment here for the record: ping us (DBAs) when the tables are created so we can sanitize them on sanitarium hosts and labs.
[09:23:34] DBA, Analytics-Kanban, Operations, ops-eqiad, User-Elukey: db1046 BBU looks faulty - https://phabricator.wikimedia.org/T166141#3359355 (Marostegui) @elukey looks like the BBU is now almost completely dead. After Jaime's relearn attempt, almost 3 hours ago the battery status hasn't changed: ``...
[09:35:55] thanks marostegui for --^, now we have a clear idea about the WB vs WT impact on writes
[09:36:01] (for db1046)
[09:36:05] :)
[09:36:53] as discussed before, would it be ok for you guys if we (analytics) create a task to order new hardware for db104[67] replacements with your hw suggestions/recommendations?
[09:37:29] Then we'll drive the paperwork
[09:37:59] having the new hosts racked by the end of Q1 would be good
[09:38:17] (config etc.. will of course be done later on as planned)
[09:38:17] create a subtask of T156844 on procurement
[09:38:18] T156844: Prep to decommission old dbstore hosts (db1046, db1047) - https://phabricator.wikimedia.org/T156844
[09:38:52] okok!
[09:42:49] ./repl.pl --stop-siblings-in-sync --host1=dbstore1002.eqiad.wmnet:3306 --host2=db1015.eqiad.wmnet:3306 --parent-set="default_master_connection='s3'"
[09:43:00] I used this, writing it here for the record
[09:43:12] (I may search it on the logs later)
[10:59:06] DBA, Labs: Prepare and check storage layer for kbp.wikipedia.org - https://phabricator.wikimedia.org/T160869#3359887 (Dereckson) OK. Will be probably this week.
[11:18:18] DBA, Operations, Wikimedia-Site-requests: Global rename of Smuconlaw → Sgconlaw: supervision needed - https://phabricator.wikimedia.org/T168109#3360091 (alanajjar) >>! In T168109#3358878, @Marostegui wrote: >>>! In T168109#3358211, @alanajjar wrote: >> @Marostegui can we doing it now? > > Hi, > >...
[11:22:43] DBA, Wikimedia-Hackathon-2017, Wikimedia-Site-requests, Documentation, Mediawiki SWAT Deployments: Create summary templates on Wikitech wiki to stop to write the same things everywhere everytime - https://phabricator.wikimedia.org/T165756#3360111 (jcrespo)
[11:22:54] DBA, Operations, Wikimedia-Site-requests: Global rename of Smuconlaw → Sgconlaw: supervision needed - https://phabricator.wikimedia.org/T168109#3360112 (Marostegui) >>! In T168109#3360091, @alanajjar wrote: >>>! In T168109#3358878, @Marostegui wrote: >>>>! In T168109#3358211, @alanajjar wrote: >>> @M...
[11:23:17] DBA, Wikimedia-Hackathon-2017, Wikimedia-Site-requests, Documentation, Mediawiki SWAT Deployments: Create summary templates on Wikitech wiki to stop writing the same things everywhere, everytime - https://phabricator.wikimedia.org/T165756#3276060 (jcrespo)
[11:26:20] DBA, Operations, Wikimedia-Site-requests: Global rename of Smuconlaw → Sgconlaw: supervision needed - https://phabricator.wikimedia.org/T168109#3360127 (alanajjar) @Marostegui commons ([[ https://meta.wikimedia.org/w/index.php?title=Special:CentralAuth&target=Smuconlaw | see here]])
[11:27:53] DBA, Patch-For-Review: Migrate parsercache hosts to file per table - https://phabricator.wikimedia.org/T167567#3360158 (jcrespo) Open>Resolved This is done, continuing service monitoring at T167784
[11:28:18] DBA, Operations, Wikimedia-Site-requests: Global rename of Smuconlaw → Sgconlaw: supervision needed - https://phabricator.wikimedia.org/T168109#3360161 (Marostegui) >>! In T168109#3360127, @alanajjar wrote: > @Marostegui > > commons ([[ https://meta.wikimedia.org/w/index.php?title=Special:CentralAut...
[11:31:42] DBA, Operations, Wikimedia-Site-requests: Global rename of Smuconlaw → Sgconlaw: supervision needed - https://phabricator.wikimedia.org/T168109#3360164 (alanajjar) @Marostegui All thanks for you. And [[https://meta.wikimedia.org/wiki/Special:GlobalRenameProgress/Sgconlaw |we start]]!
[11:40:58] DBA, Operations, Wikimedia-Site-requests: Global rename of Smuconlaw → Sgconlaw: supervision needed - https://phabricator.wikimedia.org/T168109#3360208 (Marostegui) commons finished without any major delays :-) Next big one will be enwiki (11k)
[11:46:29] DBA, Operations, Wikimedia-Site-requests: Global rename of Smuconlaw → Sgconlaw: supervision needed - https://phabricator.wikimedia.org/T168109#3360224 (alanajjar) >>!
In T168109#3360208, @Marostegui wrote: > commons finished without any major delays :-) > Next big one will be enwiki (11k) Thanks @M...
[11:47:09] DBA, Operations, Wikimedia-Site-requests: Global rename of Smuconlaw → Sgconlaw: supervision needed - https://phabricator.wikimedia.org/T168109#3360226 (Marostegui) Let's wait until everything is finished, no? As per: https://meta.wikimedia.org/wiki/Special:GlobalRenameProgress/Sgconlaw there is stil...
[11:49:04] DBA, Operations, Wikimedia-Site-requests: Global rename of Smuconlaw → Sgconlaw: supervision needed - https://phabricator.wikimedia.org/T168109#3360241 (alanajjar) >>! In T168109#3360226, @Marostegui wrote: > Let's wait until everything is finished, no? > As per: https://meta.wikimedia.org/wiki/Speci...
[11:50:42] DBA, Operations, Wikimedia-Site-requests: Global rename of Smuconlaw → Sgconlaw: supervision needed - https://phabricator.wikimedia.org/T168109#3360244 (Marostegui) Yes, let's wait until it is fully finished - shouldn't take long anyways as most of them are pretty small indeed Thanks :-)
[12:18:36] DBA, Operations, Wikimedia-Site-requests: Global rename of Smuconlaw → Sgconlaw: supervision needed - https://phabricator.wikimedia.org/T168109#3360329 (alanajjar) >>! In T168109#3360244, @Marostegui wrote: > Yes, let's wait until it is fully finished - shouldn't take long anyways as most of them are...
[12:35:46] DBA, Operations, Wikimedia-Site-requests: Global rename of Smuconlaw → Sgconlaw: supervision needed - https://phabricator.wikimedia.org/T168109#3360352 (Marostegui) Open>Resolved a:Marostegui Thanks @alanajjar!!
[13:07:02] DBA, Labs, User-bd808, cloud-services-team (Kanban): setup dewiki and wikidatawiki on the labsdb1009, 1010 and 1011 - https://phabricator.wikimedia.org/T168021#3360443 (Marostegui) @jcrespo @bd808 I have tried to run: `sudo /usr/local/sbin/maintain-views --databases dewiki` on labsdb1009, 1010 an...
[13:18:01] DBA, Labs, User-bd808, cloud-services-team (Kanban): setup dewiki and wikidatawiki on the labsdb1009, 1010 and 1011 - https://phabricator.wikimedia.org/T168021#3360473 (jcrespo) a:jcrespo>Marostegui Assigned to to for now, ping it back when done.
[13:19:01] DBA, Labs, Patch-For-Review: Add and sanitize s2, s4, s5, s6 and s7 to sanitarium2 and new labsdb hosts - https://phabricator.wikimedia.org/T153743#3360477 (Marostegui)
[13:19:03] DBA, Labs, User-bd808, cloud-services-team (Kanban): setup dewiki and wikidatawiki on the labsdb1009, 1010 and 1011 - https://phabricator.wikimedia.org/T168021#3360476 (Marostegui) Open>Resolved
[14:12:52] DBA, Operations, cloud-services-team, Patch-For-Review, Wikimedia-log-errors: Terbium cronjobs attempting to connect to labstestweb2001 - https://phabricator.wikimedia.org/T167961#3360676 (jcrespo)
[14:59:54] DBA, Operations, cloud-services-team, Patch-For-Review, Wikimedia-log-errors: Terbium cronjobs attempting to connect to labstestweb2001 - https://phabricator.wikimedia.org/T167961#3360789 (Marostegui) Open>Resolved a:jcrespo After Jaime added the grants manually, I have talked to...
[15:15:06] Blocked-on-schema-change, DBA: Convert unique keys into primary keys for some wiki tables on s5 - https://phabricator.wikimedia.org/T166207#3360849 (Marostegui) codfw master (db2023) has been finished, and now the changes are getting replicated to the slaves. db2023: ``` root@neodymium:/home/marostegui#...