[09:35:27] DBA, Patch-For-Review: Drop tag_summary table - https://phabricator.wikimedia.org/T212255 (Ladsgroup) The reason being is that dropping tag_summary wasn't made it to wmf.9 ([[https://github.com/wikimedia/mediawiki/blob/wmf/1.33.0-wmf.9/maintenance/tables.sql| here]]). I think the reason valid_tag doesn't...
[10:06:20] DBA, Goal: Implement database binary backups into the production infrastructure - https://phabricator.wikimedia.org/T206203 (jcrespo)
[10:15:31] DBA, Goal: Implement database binary backups into the production infrastructure - https://phabricator.wikimedia.org/T206203 (jcrespo) I have modified the wording to reuse the meta task for the new goal, which has already solved the decision part, but still needs some design for the architecture, purchase...
[10:20:38] DBA, Goal: Implement database binary backups into the production infrastructure - https://phabricator.wikimedia.org/T206203 (jcrespo) p:Normal→High
[10:25:31] DBA: Design the final architecture for the database binary backups - https://phabricator.wikimedia.org/T213404 (jcrespo) p:Triage→High
[10:39:40] DBA: Purchase remaining hosts for database backups - https://phabricator.wikimedia.org/T213406 (jcrespo) p:Triage→High
[10:39:50] marostegui: it definitely didn't make it to the branch cut.
It was merged at 10pm, by then the branch is already cut: https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/479266/
[10:40:24] DBA: Design the final architecture for the database binary backups - https://phabricator.wikimedia.org/T213404 (jcrespo)
[10:40:31] DBA: Purchase remaining hosts for database backups - https://phabricator.wikimedia.org/T213406 (jcrespo)
[10:42:31] Blocked-on-schema-change, DBA, Patch-For-Review, Schema-change, User-Banyek: Dropping user.user_options on wmf databases - https://phabricator.wikimedia.org/T85757 (Banyek)
[10:42:54] Blocked-on-schema-change, DBA, Patch-For-Review, Schema-change, User-Banyek: Dropping user.user_options on wmf databases - https://phabricator.wikimedia.org/T85757 (Banyek) >>! In T85757#4862113, @Banyek wrote: > on `db1062` (s7 master) every database is done, except `eswiki` I have to retry...
[11:53:46] DBA, Operations, ops-codfw, User-Banyek: db2042 (m3) master RAID battery failed - https://phabricator.wikimedia.org/T202051 (jcrespo) Resolved→Open
[11:54:19] DBA, Operations, ops-codfw, User-Banyek: db2042 (m3) master RAID battery failed - https://phabricator.wikimedia.org/T202051 (jcrespo) Leaving it open and acking it on icinga so we don't forget about it.
[11:56:50] DBA, User-Banyek: BBU problems dbstore2002 - https://phabricator.wikimedia.org/T205257 (jcrespo) Resolved→Open Failing again, acking on icinga, reopening to not forget about it.
[12:27:43] DBA, MediaWiki-extensions-WikibaseMediaInfo, SDC Engineering, StructuredDataOnCommons, and 3 others: MediaInfo extension should not use the wb_terms table - https://phabricator.wikimedia.org/T208330 (Addshore) These will be backported in a slot before the deployment of media info to real commons...
[12:51:49] DBA, Operations, ops-codfw, User-Banyek: db2042 (m3) master RAID battery failed - https://phabricator.wikimedia.org/T202051 (Marostegui) Let's merge this then T209261 with this task or the other way around
[12:54:25] DBA, Operations, ops-codfw, User-Banyek: db2042 (m3) master RAID battery failed - https://phabricator.wikimedia.org/T202051 (jcrespo) Sorry, I searched but I didn't find the other one, as on your above comment you probably meant that but linked to itself by mistake. I am ok with any method, as lo...
[12:55:17] marostegui: T213422
[12:55:17] T213422: es1019 IPMI and its management interface are unresponsive (again) - https://phabricator.wikimedia.org/T213422
[12:59:16] es2019 :*
[12:59:18] :(
[12:59:34] 2019 ?
[13:01:04] banyek: what is wrong with es2019?
[13:01:26] nothing, I just uncapable of reading I guess
[13:01:28] :/
[13:01:44] ah, so you were referring to my ticket, right?
[13:01:52] yes
[13:02:03] ah, I got worried, I though it has failed or something
[13:02:04] but I mis-read it
[13:02:07] np
[13:02:14] *had
[13:02:35] last week we had an outage on es2019 I just seen "es" and "19" and I thought it repeated
[13:02:50] "Assumption is the mother of all f*ckups"
[13:16:36] DBA, Patch-For-Review: Drop tag_summary table - https://phabricator.wikimedia.org/T212255 (Ladsgroup) Let me elaborate more why I said we should drop it after wmf.9 is everywhere (T212255#4849367). It was because the patch that stops writing to the table was made it to the branch cut (wmf.9) but the patc...
[13:24:58] I think zhwiki backups are running, so es1018 maintenance will have to wait
[13:55:35] DBA, Patch-For-Review: Drop tag_summary table - https://phabricator.wikimedia.org/T212255 (Marostegui) Thanks for the explanation! That makes sense :-) All clear now!
[14:16:54] jynus: Again es1019 :(
[14:17:41] DBA, Operations, ops-codfw, User-Banyek: db2042 (m3) master RAID battery failed - https://phabricator.wikimedia.org/T202051 (Marostegui)
[14:17:43] DBA: Failover m3 codfw master - https://phabricator.wikimedia.org/T209261 (Marostegui)
[14:18:18] DBA, Operations, ops-codfw, User-Banyek: db2042 (m3) master RAID battery failed - https://phabricator.wikimedia.org/T202051 (Marostegui) I have merged this into T209261 as that other one has a more "important" title so we don't forget! :)
[14:41:51] I have cleaned up the screen from s3/m3 backup I created yesterday on es2001 as it was all good
[14:43:29] so my theory is that there could be some overload
[14:43:40] which is not that strange for tendril
[14:43:51] but there is not a real, concrete bug
[14:44:09] just we have to see if it reappears and a way to minimize it
[14:44:48] yeah
[14:44:57] It is the first time I have seen it to be honest
[14:45:13] I don't recall another time where we had that "ongoing" thing but the backup was actually finished and rotated
[15:15:07] marostegui: jynus: Hey is there a way to measure performance improvements caused by dropping tag_summary and change_tag.ct_tag column? I like to report on that to my boss
[15:15:56] Surely dropping tables/columns makes little difference to perf.... It's when you stop using them?
;)
[15:17:36] yeah, I meant that :D
[15:17:55] * Amir1 blames it on the headache
[15:20:33] Amir1: The only thing I could think of would be the performance time and/or the amount of rows read/written
[15:21:39] yeah, that would be great, maybe hits on innodb buffer pool would be useful too
[15:22:11] I would suggest you take one of the slaves from commons or enwiki and check its graphs for when it stopped being used
[15:22:25] And check if there are some drops somewhere :)
[15:25:42] DBA, MediaWiki-extensions-WikibaseMediaInfo, SDC Engineering, StructuredDataOnCommons, and 4 others: MediaInfo extension should not use the wb_terms table - https://phabricator.wikimedia.org/T208330 (Addshore) Open→Resolved a:Addshore The extension no longer writes into the table
[15:26:35] DBA, MediaWiki-extensions-WikibaseMediaInfo, SDC Engineering, StructuredDataOnCommons, and 4 others: MediaInfo extension should not use the wb_terms table - https://phabricator.wikimedia.org/T208330 (Marostegui) Thanks so much!
[15:27:52] okay
[15:28:31] Amir1, Reedy it is all related- dropping is mostly indirect performance
[15:28:52] e.g. not having to back them up speeds up also backup and recovery for example
[15:29:03] Well, sure
[15:29:30] evicting them from the buffer pool has also some impact, but it gets lost in a sea of many variables
[15:33:28] DBA, MediaWiki-extensions-WikibaseMediaInfo, SDC Engineering, StructuredDataOnCommons, and 4 others: MediaInfo extension should not use the wb_terms table - https://phabricator.wikimedia.org/T208330 (Jdforrester-WMF) This is only going to be Resolved for a couple of weeks until we enable Properti...
[15:37:01] DBA, MediaWiki-extensions-WikibaseMediaInfo, SDC Engineering, StructuredDataOnCommons, and 3 others: MediaInfo extension should not use the wb_terms table - https://phabricator.wikimedia.org/T208330 (Jdforrester-WMF)
[15:40:36] jynus: https://phabricator.wikimedia.org/T213422#4869949 let me know if you want me to coordinate that so you don't have to stay online late :)
[15:40:43] DBA, MediaWiki-extensions-WikibaseMediaInfo, SDC Engineering, StructuredDataOnCommons, and 3 others: MediaInfo extension should not use the wb_terms table - https://phabricator.wikimedia.org/T208330 (Addshore) >>! In T208330#4869968, @Jdforrester-WMF wrote: > This is only going to be Resolved for...
[15:57:25] Blocked-on-schema-change, DBA, MediaWiki-Database, Scoring-platform-team, and 2 others: Schema change for rc_this_oldid index - https://phabricator.wikimedia.org/T202167 (Marostegui)
[15:57:31] Blocked-on-schema-change, DBA, Patch-For-Review, Schema-change: Dropping page.page_counter on wmf databases - https://phabricator.wikimedia.org/T86338 (Marostegui)
[16:17:11] Blocked-on-schema-change, DBA, Patch-For-Review, Schema-change, User-Banyek: Dropping user.user_options on wmf databases - https://phabricator.wikimedia.org/T85757 (Marostegui) a:Banyek→Marostegui >>! In T85757#4868982, @Banyek wrote: >>>! In T85757#4862113, @Banyek wrote: >> on `db10...
[16:17:45] Blocked-on-schema-change, DBA, Patch-For-Review, Schema-change, User-Banyek: Dropping user.user_options on wmf databases - https://phabricator.wikimedia.org/T85757 (Banyek) Thank you <3
[16:17:59] DBA, Analytics, Analytics-Kanban, Patch-For-Review, User-Banyek: Migrate dbstore1002 to a multi instance setup on dbstore100[3-5] - https://phabricator.wikimedia.org/T210478 (Marostegui) a:Banyek→elukey Assigning it to Luca, as he is coordinating this.
[16:18:56] DBA, Analytics, Analytics-Kanban, User-Elukey: Review dbstore1002's non-wiki databases and decide which ones needs to be migrated to the new multi instance setup - https://phabricator.wikimedia.org/T212487 (Marostegui) a:Banyek→elukey Assigning this to @elukey so he can follow up what is...
[17:10:37] DBA, Schema-change, Tracking: [DO NOT USE] Schema changes for Wikimedia wikis (tracking) [superseded by #Blocked-on-schema-change] - https://phabricator.wikimedia.org/T51188 (Marostegui)
[17:10:43] Blocked-on-schema-change, DBA, Patch-For-Review, Schema-change: Dropping page.page_counter on wmf databases - https://phabricator.wikimedia.org/T86338 (Marostegui) Open→Resolved >>! In T86338#4812696, @Marostegui wrote: > db1068 (s4 master) has too much concurrency on the `page` table to...
[17:10:58] Blocked-on-schema-change, DBA, Patch-For-Review, Schema-change: Dropping page.page_counter on wmf databases - https://phabricator.wikimedia.org/T86338 (Marostegui)
[17:13:43] DBA, Operations, ops-eqiad: rack/setup/install pc1007-pc1010 - https://phabricator.wikimedia.org/T207258 (Cmjohnson) a:Cmjohnson→RobH @robh @Marostegui I went through the very long and painful Dell troubleshooting and it's one of those cases where it actually worked. The server is ready to...
[17:14:53] DBA, Operations, ops-eqiad: rack/setup/install pc1007-pc1010 - https://phabricator.wikimedia.org/T207258 (Marostegui) Good job!! Thank you! Is the RAID 5 made already? If it is only OS install pending, I can take it from there
[17:15:53] DBA, Operations, ops-eqiad: rack/setup/install pc1007-pc1010 - https://phabricator.wikimedia.org/T207258 (jcrespo) @Cmjohnson you are the best, the worse Dell is, the more superb you are to cover for their mess. How many beers do I own you already? XD Thanks again.
[17:17:56] DBA, Operations, ops-eqiad: rack/setup/install pc1007-pc1010 - https://phabricator.wikimedia.org/T207258 (Marostegui) Ah no, I think all the mgmt entries, vlan and all those steps are pending so I cannot proceed until those are set up. (Just tried to access mgmt, which was not successful).
[17:18:11] DBA, Operations, ops-eqiad: rack/setup/install pc1007-pc1010 - https://phabricator.wikimedia.org/T207258 (Cmjohnson)
[17:19:22] DBA, Operations, ops-eqiad: rack/setup/install pc1007-pc1010 - https://phabricator.wikimedia.org/T207258 (Marostegui) mgmt works now :-) I will wait for the green light from @Cmjohnson to proceed with the install Thank you for getting this almost done!
[17:19:43] DBA, Operations, ops-eqiad: rack/setup/install pc1007-pc1010 - https://phabricator.wikimedia.org/T207258 (Cmjohnson) *update not ready for install. I set the wrong raid. I am updating the driver now and will fix to raid 5 once the update is complete. @marostegui odd...may have somethign to do with t...
[17:19:56] Blocked-on-schema-change, DBA, MediaWiki-Database, Scoring-platform-team, and 2 others: Schema change for rc_this_oldid index - https://phabricator.wikimedia.org/T202167 (Marostegui) Open→Resolved This is all done
[17:23:20] marostegui: I don't have any issues with mgmt.
[17:23:34] https://www.irccloud.com/pastebin/3cins6sX/
[17:24:01] it does need production dns
[17:24:19] typically robh handles that portion (post DC ops)
[17:24:27] post on-site DC ops
[17:26:15] yeah, it works now
[17:26:31] It wasn't working and all of a sudden it started replying to ping cmjohnson1
[17:26:42] okay..it looks like the raid f/w is still updating on my end
[17:26:58] should take 5mins or so
[17:27:30] cmjohnson1: Awesome!
I will wait for robh to get the dns up then too
[17:41:39] Blocked-on-schema-change, MediaWiki-Change-tagging, User-Ladsgroup: Drop change_tag.ct_tag column in production - https://phabricator.wikimedia.org/T210713 (Marostegui) This alter (`ALTER TABLE /*_*/change_tag MODIFY ct_tag_id int unsigned NOT NULL, LOCK=NONE;`) requires the following session change...
[18:06:57] marostegui: I leave you with es2019 depooled because dumps running (context T213422)
[18:06:57] T213422: es1019 IPMI and its management interface are unresponsive (again) - https://phabricator.wikimedia.org/T213422
[18:07:32] you mean you leave es1019 depooled or es2019? :)
[18:07:46] I don't know now
[18:08:18] so es1018
[18:08:24] es1019 is depooled
[18:08:43] DBA, Operations, ops-eqiad: rack/setup/install pc1007-pc1010 - https://phabricator.wikimedia.org/T207258 (Cmjohnson) return shipping info for parts USPS 9202 3946 5301 2440 4873 91 Fedex 9611918 2393026 77237414
[18:08:58] DBA, Operations, ops-eqiad: rack/setup/install pc1007-pc1010 - https://phabricator.wikimedia.org/T207258 (Cmjohnson)
[18:09:28] es1018 and es1019
[18:09:45] what do you mean with dumps running? I don't see anything on es1019
[18:10:00] well, they were running a few minutes ago
[18:10:14] they may come back, I am not going to risk it
[18:10:37] I repooled the ones on codfw
[18:10:37] Sure! Just wanted to make sure I understood the whole thing :)
[18:10:53] it is the wikiadmin doing long-running queries
[18:10:56] ah ok
[18:11:12] well, not long-running, but connected for a long time
[18:11:32] sometimes dumps, sometimes wikidat dumps, sometimes something else
[18:11:32] you want to leave es1019 depooled till tuesday?
[18:11:52] the problem is that if I repool it, by tuesday the connections may come back
[18:12:17] yep
[18:12:21] that is why I was asking :)
[18:12:45] do whatever you think is ok, even if you want to work with chris yourself
[18:12:48] I am out
[18:12:50] :-)
[18:13:24] enjoy!!
[18:13:24] there is still redundancy with the master, so that doesn't worry me
[18:13:31] nah, I will leave it depooled :)
[18:13:45] As we already agreed on doing it on Tuesday
[18:36:20] I am ;-)
[18:37:09] :)
[18:43:47] Blocked-on-schema-change, DBA, Patch-For-Review, Schema-change, User-Banyek: Dropping user.user_options on wmf databases - https://phabricator.wikimedia.org/T85757 (Marostegui) s8 (wikidatawiki) doesn't have the column. As that wiki is relatively "new" I guess it was created after the patch w...
[18:44:36] Blocked-on-schema-change, DBA, Patch-For-Review, Schema-change, User-Banyek: Dropping user.user_options on wmf databases - https://phabricator.wikimedia.org/T85757 (Marostegui)
[19:12:59] Blocked-on-schema-change, DBA, Patch-For-Review, Schema-change, User-Banyek: Dropping user.user_options on wmf databases - https://phabricator.wikimedia.org/T85757 (Marostegui)
[19:14:12] Blocked-on-schema-change, DBA, Patch-For-Review, Schema-change, User-Banyek: Dropping user.user_options on wmf databases - https://phabricator.wikimedia.org/T85757 (Marostegui)
[19:19:59] Blocked-on-schema-change, DBA, Patch-For-Review, Schema-change, User-Banyek: Dropping user.user_options on wmf databases - https://phabricator.wikimedia.org/T85757 (Marostegui)
[23:06:16] Blocked-on-schema-change, DBA, Patch-For-Review, Schema-change, User-Banyek: Dropping user.user_options on wmf databases - https://phabricator.wikimedia.org/T85757 (Marostegui)