[06:09:55] DBA, MediaWiki-extensions-WikibaseRepository, Wikidata, MW-1.32-release-notes (WMF-deploy-2018-05-22 (1.32.0-wmf.5)), Patch-For-Review: Clean up indexes of wb_terms table - https://phabricator.wikimedia.org/T194273#4227533 (Marostegui) I have repooled db1092 and it looks better, but we still...
[07:58:50] DBA, Operations, ops-eqiad: Degraded RAID on db1065 - https://phabricator.wikimedia.org/T195444#4227660 (Marostegui) a:Cmjohnson @Cmjohnson let's get this disk replaced Thanks!
[08:42:09] DBA, MediaWiki-extensions-WikibaseRepository, Wikidata, Epic: Make wb_terms table fancy - https://phabricator.wikimedia.org/T188992#4227745 (Marostegui)
[08:42:14] DBA, MediaWiki-extensions-WikibaseRepository, Wikidata, Patch-For-Review: Drop 'tmp1' index from wb_terms table in production - https://phabricator.wikimedia.org/T194270#4227743 (Marostegui) Open>Resolved This index has been dropped everywhere in s8
[08:42:37] Blocked-on-schema-change, DBA, Multi-Content-Revisions, Patch-For-Review, User-Addshore: Change DEFAULT 0 for rev_text_id on production DBs - https://phabricator.wikimedia.org/T190148#4227746 (Marostegui)
[08:42:52] Blocked-on-schema-change, DBA, Patch-For-Review, User-Ladsgroup, Wikidata-Ministry-Of-Magic: Schema change for rc_namespace_title_timestamp index - https://phabricator.wikimedia.org/T191519#4227748 (Marostegui)
[08:43:10] Blocked-on-schema-change, DBA, Data-Services, MediaWiki-Platform-Team (MWPT-Q4-Apr-Jun-2018), Patch-For-Review: Schema change for refactored actor storage - https://phabricator.wikimedia.org/T188299#4227762 (Marostegui)
[08:51:38] DBA, Operations, ops-codfw, Patch-For-Review: rack/setup/install db209[45].codfw.wmnet (sanitarium expansion) - https://phabricator.wikimedia.org/T194781#4227813 (Marostegui)
[09:52:55] marostegui: hey, just got to the office. Let me double check
[09:53:00] sorry for the mess
[09:53:07] no mess!
[09:53:08] :)
[11:11:20] Blocked-on-schema-change, DBA, Multi-Content-Revisions, Patch-For-Review, User-Addshore: Change DEFAULT 0 for rev_text_id on production DBs - https://phabricator.wikimedia.org/T190148#4228305 (Marostegui)
[11:11:35] Blocked-on-schema-change, DBA, Patch-For-Review, User-Ladsgroup, Wikidata-Ministry-Of-Magic: Schema change for rc_namespace_title_timestamp index - https://phabricator.wikimedia.org/T191519#4228306 (Marostegui)
[11:11:38] Blocked-on-schema-change, DBA, Data-Services, MediaWiki-Platform-Team (MWPT-Q4-Apr-Jun-2018), Patch-For-Review: Schema change for refactored actor storage - https://phabricator.wikimedia.org/T188299#4228307 (Marostegui)
[11:12:37] Blocked-on-schema-change, DBA, Data-Services, MediaWiki-Platform-Team (MWPT-Q4-Apr-Jun-2018), Patch-For-Review: Schema change for refactored actor storage - https://phabricator.wikimedia.org/T188299#4185475 (Marostegui) s1 eqiad progress [] labsdb1009 [] labsdb1010 [] labsdb1011 [] db1095 []...
[11:13:01] Blocked-on-schema-change, DBA, Multi-Content-Revisions, Patch-For-Review, User-Addshore: Change DEFAULT 0 for rev_text_id on production DBs - https://phabricator.wikimedia.org/T190148#4181115 (Marostegui) s1 eqiad progress [] labsdb1009 [] labsdb1010 [] labsdb1011 [] db1095 [] dbstore1001 []...
[11:13:03] Blocked-on-schema-change, DBA, Patch-For-Review, User-Ladsgroup, Wikidata-Ministry-Of-Magic: Schema change for rc_namespace_title_timestamp index - https://phabricator.wikimedia.org/T191519#4181129 (Marostegui) s1 eqiad progress [] labsdb1009 [] labsdb1010 [] labsdb1011 [] db1095 [] dbstore1...
[12:16:42] marostegui: I see what's going on here, I need to make a patch and get it merged.
We can move forward and diverge them for now (we are already diverged) It needs to be a covering index
[12:44:55] DBA, MediaWiki-extensions-WikibaseRepository, Wikidata, MW-1.32-release-notes (WMF-deploy-2018-05-22 (1.32.0-wmf.5)), Patch-For-Review: Clean up indexes of wb_terms table - https://phabricator.wikimedia.org/T194273#4228564 (jcrespo) TermSqlIndex::getMatchingTerms is still failing, it is right...
[12:45:19] DBA, MediaWiki-extensions-WikibaseRepository, Wikidata, MW-1.32-release-notes (WMF-deploy-2018-05-22 (1.32.0-wmf.5)), Patch-For-Review: Clean up indexes of wb_terms table - https://phabricator.wikimedia.org/T194273#4228565 (jcrespo) p:Normal>High
[12:51:28] DBA, MediaWiki-extensions-WikibaseRepository, Wikidata, MW-1.32-release-notes (WMF-deploy-2018-05-22 (1.32.0-wmf.5)), Patch-For-Review: Clean up indexes of wb_terms table - https://phabricator.wikimedia.org/T194273#4228567 (Ladsgroup) Wait a second, I thought it's depooled again. if that's no...
[13:10:04] marostegui: jynus can I use the isolated host to find out which index is better to add in here?
[13:11:15] not sure what you mean
[13:12:07] if you mean db2083, you can do whatever you want there
[13:14:20] Amir1: yeah, feel free to go for it
[13:14:45] Amir1: Keep in mind that adding an index on that table will take probably a day per host (just a rough guess)
[13:21:21] Blocked-on-schema-change, DBA, Data-Services, MediaWiki-Platform-Team (MWPT-Q4-Apr-Jun-2018), Patch-For-Review: Schema change for refactored actor storage - https://phabricator.wikimedia.org/T188299#4228603 (Marostegui)
[13:22:09] Blocked-on-schema-change, DBA, Patch-For-Review, User-Ladsgroup, Wikidata-Ministry-Of-Magic: Schema change for rc_namespace_title_timestamp index - https://phabricator.wikimedia.org/T191519#4228604 (Marostegui)
[13:23:12] Blocked-on-schema-change, DBA, Multi-Content-Revisions, Patch-For-Review, User-Addshore: Change DEFAULT 0 for rev_text_id on production DBs - https://phabricator.wikimedia.org/T190148#4228607 (Marostegui)
[13:38:45] DBA: Decommission db1051 - https://phabricator.wikimedia.org/T195484#4228629 (jcrespo) p:Triage>Normal
[13:51:15] DBA, Patch-For-Review: Failover s2 primary master - https://phabricator.wikimedia.org/T194870#4211428 (Marostegui)
[13:51:53] DBA, Patch-For-Review: Failover s2 primary master - https://phabricator.wikimedia.org/T194870#4211428 (Marostegui) We have agreed this will be done the 13th of June 2018 from 06:00AM UTC till 06:30AM UTC
[13:52:35] DBA, Patch-For-Review: Failover s2 primary master - https://phabricator.wikimedia.org/T194870#4228714 (Marostegui)
[14:00:43] DBA, Operations-Software-Development, Patch-For-Review: Debmonitor: request for misc DB allocation - https://phabricator.wikimedia.org/T192875#4228726 (Volans) @jcrespo `naos` has been reimaged to `deploy2001.codfw.wmnet`, so I guess it can now be added to the grants. Mentioning it here just for not...
[14:38:48] DBA, Operations, decommission, ops-codfw, Patch-For-Review: db2064 crashed and totally broken - decommission it - https://phabricator.wikimedia.org/T195228#4228800 (Marostegui) I had a chat with @mark and for now we will not buy a replacement. If we have some more issues with other servers an...
[14:38:55] DBA, MediaWiki-extensions-WikibaseRepository, Wikidata, MW-1.32-release-notes (WMF-deploy-2018-05-22 (1.32.0-wmf.5)), Patch-For-Review: Clean up indexes of wb_terms table - https://phabricator.wikimedia.org/T194273#4194132 (Lucas_Werkmeister_WMDE) >>! In T194273#4228564, @jcrespo wrote: > Ter...
[15:17:02] DBA, Operations, ops-eqiad: Degraded RAID on db1065 - https://phabricator.wikimedia.org/T195444#4228898 (Marostegui) Storage crashed: ``` root@db1065:~# df -hT -bash: /bin/df: Input/output error root@db1065:~# dmesg -bash: /bin/dmesg: Input/output error ```
[15:20:22] DBA, Operations, ops-eqiad: Degraded RAID on db1065 - https://phabricator.wikimedia.org/T195444#4228905 (Marostegui) @Cmjohnson can you visually check if there are more than 1 disk broken?
[16:03:28] DBA, Operations, ops-eqiad, Patch-For-Review: db1065 storage crash - https://phabricator.wikimedia.org/T195444#4229050 (jcrespo)
[16:21:16] DBA, Operations, ops-eqiad, Patch-For-Review: rack/setup/install db112[45].eqiad.wmnet (sanitarium expansion) - https://phabricator.wikimedia.org/T194780#4229116 (Cmjohnson)
[16:53:26] DBA, Operations, ops-codfw: Swith port information for db209[4-5] - https://phabricator.wikimedia.org/T195507#4229282 (Papaul) p:Triage>Normal
[17:02:45] DBA, Operations, ops-codfw, Patch-For-Review: rack/setup/install db209[45].codfw.wmnet (sanitarium expansion) - https://phabricator.wikimedia.org/T194781#4229379 (Papaul)
[17:06:03] DBA, Operations, netops, ops-codfw: Swtich port information for db209[4-5] - https://phabricator.wikimedia.org/T195507#4229394 (Papaul)
[17:07:25] DBA, Operations, netops, ops-codfw: switch port information for db209[4-5] - https://phabricator.wikimedia.org/T195507#4229282 (Papaul)
[17:29:34] DBA, Operations, ops-codfw, Patch-For-Review: rack/setup/install db209[45].codfw.wmnet (sanitarium expansion) - https://phabricator.wikimedia.org/T194781#4229475 (Papaul)
[17:31:18] DBA, Operations, ops-codfw, Patch-For-Review: rack/setup/install db209[45].codfw.wmnet (sanitarium expansion) - https://phabricator.wikimedia.org/T194781#4229479 (RobH)
[17:35:55] DBA, Operations, ops-eqiad, Patch-For-Review: rack/setup/install db112[45].eqiad.wmnet (sanitarium expansion) - https://phabricator.wikimedia.org/T194780#4229486 (Cmjohnson)
[17:36:43] DBA, Operations, ops-eqiad, Patch-For-Review: rack/setup/install db112[45].eqiad.wmnet (sanitarium expansion) - https://phabricator.wikimedia.org/T194780#4208342 (Cmjohnson) a:Cmjohnson>Marostegui @Marostegui These are installed and ready for you to take over.
Assigning to you
[18:33:37] DBA, Operations, Goal, Patch-For-Review: Convert all sanitarium hosts to multi-instance and increase its reliability/redundancy - https://phabricator.wikimedia.org/T190704#4081506 (Marostegui)
[18:33:39] DBA, Operations, ops-eqiad, Patch-For-Review: rack/setup/install db112[45].eqiad.wmnet (sanitarium expansion) - https://phabricator.wikimedia.org/T194780#4229603 (Marostegui) Open>Resolved Thanks! I have confirmed I can access both servers and they look good. Going to continue the final...
[18:37:01] DBA, Operations, ops-codfw, Patch-For-Review: rack/setup/install db209[45].codfw.wmnet (sanitarium expansion) - https://phabricator.wikimedia.org/T194781#4229627 (Papaul) I can not pxe boot both servers >>Start PXE over IPv4. Station IP address is 10.192.0.101 Server IP address is 208.80.1...
[18:40:12] DBA, Operations, netops, ops-codfw, Patch-For-Review: rack/setup/install db209[45].codfw.wmnet (sanitarium expansion) - https://phabricator.wikimedia.org/T194781#4229634 (Marostegui) Adding #netops to see if they can help out
[18:46:38] DBA, Operations, netops, ops-codfw, Patch-For-Review: rack/setup/install db209[45].codfw.wmnet (sanitarium expansion) - https://phabricator.wikimedia.org/T194781#4229648 (Marostegui) I can the requests arriving fine (this is db2094) but looks like it is not going past that? : ``` May 24 18:25...