[06:20:57] DBA, MediaWiki-Platform-Team (MWPT-Q3-Jan-Mar-2018), Patch-For-Review, Schema-change: Fix WMF schemas to not break when comment store goes WRITE_NEW - https://phabricator.wikimedia.org/T187089#4063825 (Marostegui) >>! In T187089#4063822, @Stashbot wrote: > {nav icon=file, name=Mentioned in SAL (#...
[06:35:25] DBA, Operations, Patch-For-Review: Setup newer machines and replace all old misc (m*) and x1 eqiad machines - https://phabricator.wikimedia.org/T183469#4063852 (Marostegui) db1106 is now catching up after being recloned from db1065. Once it has been replicating for another 24h, I would say we can cha...
[08:02:15] https://gerrit.wikimedia.org/r/#/c/420645/
[08:05:07] if you are going for the pc*s, I will go for labsdb
[08:05:27] sounds good :)
[08:09:54] note it may take some time for those to be depooled, keep an eye for processlist
[08:17:16] yeah, I was monitoring it
[09:05:35] DBA, ContentTranslation, Language-2018-Jan-Mar, MW-1.31-release-notes (WMF-deploy-2018-03-20 (1.31.0-wmf.26)), Schema-change: CX2: Register the version used to start a translation - https://phabricator.wikimedia.org/T187986#4064007 (Etonkovidova) Open>Resolved Checked in betalabs: -...
[09:06:33] DBA, Operations, Epic: DB meta task for next DC failover issues - https://phabricator.wikimedia.org/T189107#4064012 (Marostegui)
[09:31:41] ok to restart db1095 and db1102? not sure in which order
[09:40:49] ok to me
[09:40:56] I prefer to restart always db1102
[09:40:59] as it is multiinstance XD
[09:41:12] in case some goes wrong, it is easier to fix, but those are just "manias"
[09:44:32] DBA: Generate report of disk health for database masters and master candidates - https://phabricator.wikimedia.org/T190035#4064080 (Marostegui) s2 db1054 master: ``` root@db1054:~# megacli -LDPDInfo -aAll | egrep -i "slot|error|failure count|s.m.a.r.t" Slot Number: 0 Media Error Count: 0 Other Error Count:...
[09:48:15] s/always/first/?
[09:48:40] yeah, always first :)
[09:48:41] doing that
[10:00:02] DBA, Data-Services, Goal, Patch-For-Review, cloud-services-team (FY2017-18): Migrate all users to new Wiki Replica cluster and decommission old hardware - https://phabricator.wikimedia.org/T142807#4064108 (Marostegui) This can probably be closed I assume?
[10:40:06] Blocked-on-schema-change, DBA, Language-2018-Jan-Mar: Deploy translation_cx_version schema change to production - https://phabricator.wikimedia.org/T190133#4064172 (Nikerabbit)
[10:43:49] Blocked-on-schema-change, DBA, Language-2018-Jan-Mar: Deploy translation_cx_version schema change to production - https://phabricator.wikimedia.org/T190133#4064186 (jcrespo) a:jcrespo As a note, we do not replicate x1 data to wikireplicas, as we store mostly on x1 non-public data, but it is inter...
[10:50:03] is db1106 the new s1 host?
[10:51:08] I would try to pool it with some small amount of traffic and depool db1065 (which can take 1 day to be depooled)
[10:51:13] to check no errors happen
[12:00:49] I will deploy T190133 after lunch
[12:00:49] T190133: Deploy translation_cx_version schema change to production - https://phabricator.wikimedia.org/T190133
[12:30:22] yeah, db1106 is the new one
[12:30:30] it is already pooled on vslow (shared with db1065)
[12:31:39] Blocked-on-schema-change, DBA, Language-2018-Jan-Mar: Deploy translation_cx_version schema change to production - https://phabricator.wikimedia.org/T190133#4064361 (jcrespo) p:Triage>Normal
[12:55:49] DBA, ContentTranslation, Language-2018-Jan-Mar, MW-1.31-release-notes (WMF-deploy-2018-03-20 (1.31.0-wmf.26)), Schema-change: CX2: Register the version used to start a translation - https://phabricator.wikimedia.org/T187986#4064458 (KartikMistry)
[13:07:22] Blocked-on-schema-change, DBA, Language-2018-Jan-Mar: Deploy translation_cx_version schema change to production - https://phabricator.wikimedia.org/T190133#4064513 (jcrespo) Please check and resolve when happy: ```lines=20 $ grep -v dbstore1001 x1.hosts | while read host port; do echo "$host:$port";...
[13:41:35] Blocked-on-schema-change, DBA: Change DEFAULT 0 for rev_text_id on production DBs - https://phabricator.wikimedia.org/T190148#4064600 (Addshore)
[13:41:50] Blocked-on-schema-change, DBA, Multi-Content-Revisions: Change DEFAULT 0 for rev_text_id on production DBs - https://phabricator.wikimedia.org/T190148#4064610 (Addshore)
[13:42:16] Blocked-on-schema-change, DBA, Multi-Content-Revisions: Change DEFAULT 0 for rev_text_id on production DBs - https://phabricator.wikimedia.org/T190148#4064600 (Addshore)
[13:44:24] Blocked-on-schema-change, DBA, Multi-Content-Revisions, User-Addshore: Change DEFAULT 0 for rev_text_id on production DBs - https://phabricator.wikimedia.org/T190148#4064616 (Addshore)
[13:48:19] Blocked-on-schema-change, DBA, Multi-Content-Revisions, User-Addshore: Change DEFAULT 0 for rev_text_id on production DBs - https://phabricator.wikimedia.org/T190148#4064625 (Marostegui) p:Triage>Normal
[13:48:58] super fast triage there marostegui :)
[13:53:10] addshore: it will take a while to get that schema change done I am afraid :(
[13:53:17] how long? :(
[13:53:21] (just curious)
[13:53:23] We are doing 3 at the same time right now
[13:53:36] Okay
[13:53:36] And I expect to get it done in around 2 months or a bit less
[13:53:39] then I can start with that one
[13:54:00] And to get that one done it might take another 3-4 months probably as the revision table is huge and needs to be done server by server
[13:54:02] It would probably be nice to get it on testwiki at least before then, or testwiki2
[13:54:20] marostegui: okay, any chance you could comment that on the ticket?
[13:54:22] Yeah, I can do those first that is no problem
[13:54:40] sure, I will comment there
[13:54:46] thanks!
[14:05:50] Blocked-on-schema-change, DBA, Multi-Content-Revisions, Structured-Data-Commons, Wikidata: In the slots table, replace slot_inherited with slot_origin - https://phabricator.wikimedia.org/T190153#4064699 (daniel) p:Triage>Normal
[14:06:15] Blocked-on-schema-change, DBA, Multi-Content-Revisions, Structured-Data-Commons, Wikidata: In the slots table, replace slot_inherited with slot_origin - https://phabricator.wikimedia.org/T190153#4064712 (daniel) a:daniel>None
[14:12:04] Blocked-on-schema-change, DBA, Multi-Content-Revisions, Structured-Data-Commons, Wikidata: In the slots table, replace slot_inherited with slot_origin - https://phabricator.wikimedia.org/T190153#4064699 (Marostegui) If we are sure it is completely empty, it might be easier and faster to drop...
[14:16:14] Blocked-on-schema-change, DBA, Multi-Content-Revisions, Structured-Data-Commons, Wikidata: In the slots table, replace slot_inherited with slot_origin - https://phabricator.wikimedia.org/T190153#4064762 (daniel) If that's fine with you, that's fine with me.
[14:17:07] Blocked-on-schema-change, DBA, Multi-Content-Revisions, Structured-Data-Commons, Wikidata: In the slots table, replace slot_inherited with slot_origin - https://phabricator.wikimedia.org/T190153#4064764 (Marostegui) >>! In T190153#4064762, @daniel wrote: > @Marostegui If that's fine with you,...
[14:17:54] Blocked-on-schema-change, DBA, Multi-Content-Revisions, Structured-Data-Commons, Wikidata: In the slots table, replace slot_inherited with slot_origin - https://phabricator.wikimedia.org/T190153#4064699 (jcrespo) Whoever deployed the table, violated the advice when creating new tables: "Howev...
[14:18:37] Blocked-on-schema-change, DBA, Multi-Content-Revisions, Structured-Data-Commons, Wikidata: In the slots table, replace slot_inherited with slot_origin - https://phabricator.wikimedia.org/T190153#4064794 (daniel)
[14:20:12] Blocked-on-schema-change, DBA, Multi-Content-Revisions, Structured-Data-Commons, Wikidata: In the slots table, replace slot_inherited with slot_origin - https://phabricator.wikimedia.org/T190153#4064811 (Anomie) >>! In T190153#4064780, @jcrespo wrote: > Whoever deployed the table, violated th...
[14:22:29] Blocked-on-schema-change, DBA, Multi-Content-Revisions, Structured-Data-Commons, Wikidata: In the slots table, replace slot_inherited with slot_origin - https://phabricator.wikimedia.org/T190153#4064825 (daniel) @jcrespo I had actually forgotten that we had already created these tables. But a...
[14:22:37] Blocked-on-schema-change, DBA, Multi-Content-Revisions, Structured-Data-Commons, Wikidata: In the slots table, replace slot_inherited with slot_origin - https://phabricator.wikimedia.org/T190153#4064836 (jcrespo) Ok, Manuel haven't communicated to me and I wasn't on that ticket. Apparently th...
[14:24:03] Blocked-on-schema-change, DBA, Multi-Content-Revisions, Structured-Data-Commons, Wikidata: In the slots table, replace slot_inherited with slot_origin - https://phabricator.wikimedia.org/T190153#4064845 (daniel) @Marostegui wrote: > Sure, I can drop the table in core (I will check to make sur...
[14:25:12] Blocked-on-schema-change, DBA, Multi-Content-Revisions, Structured-Data-Commons, Wikidata: In the slots table, replace slot_inherited with slot_origin - https://phabricator.wikimedia.org/T190153#4064852 (daniel) @Anomie, @aude: let's wait with re-creating the table until we need it, before ru...
[14:27:34] Blocked-on-schema-change, DBA, Multi-Content-Revisions, Structured-Data-Commons, Wikidata: In the slots table, replace slot_inherited with slot_origin - https://phabricator.wikimedia.org/T190153#4064862 (jcrespo) +1 in case there are more changes
[14:27:39] Blocked-on-schema-change, DBA, Multi-Content-Revisions, Structured-Data-Commons, Wikidata: In the slots table, replace slot_inherited with slot_origin - https://phabricator.wikimedia.org/T190153#4064863 (Marostegui) >>! In T190153#4064836, @jcrespo wrote: > Ok, Manuel haven't communicated to...
[14:30:44] Blocked-on-schema-change, DBA, Multi-Content-Revisions, Structured-Data-Commons, Wikidata: In the slots table, replace slot_inherited with slot_origin - https://phabricator.wikimedia.org/T190153#4064867 (jcrespo) "too soon" in that the structure was not stable, even if it was merged.
[14:31:08] Blocked-on-schema-change, DBA, Multi-Content-Revisions, Structured-Data-Commons, Wikidata: In the slots table, replace slot_inherited with slot_origin - https://phabricator.wikimedia.org/T190153#4064868 (Marostegui) >>! In T190153#4064867, @jcrespo wrote: > "too soon" in that the structure wa...
[14:39:31] Blocked-on-schema-change, DBA, Multi-Content-Revisions, Structured-Data-Commons, Wikidata: DROP unused 'slots' table (WAS: In the slots table, replace slot_inherited with slot_origin) - https://phabricator.wikimedia.org/T190153#4064884 (Marostegui)
[14:40:39] Blocked-on-schema-change, DBA, Multi-Content-Revisions, Structured-Data-Commons, Wikidata: DROP unused 'slots' table (WAS: In the slots table, replace slot_inherited with slot_origin) - https://phabricator.wikimedia.org/T190153#4064889 (Marostegui)
[14:49:00] Blocked-on-schema-change, DBA, Multi-Content-Revisions, Structured-Data-Commons, Wikidata: DROP unused 'slots' table (WAS: In the slots table, replace slot_inherited with slot_origin) - https://phabricator.wikimedia.org/T190153#4064901 (Marostegui)
[14:53:45] Blocked-on-schema-change, DBA, Multi-Content-Revisions, Structured-Data-Commons, Wikidata: DROP unused 'slots' table (WAS: In the slots table, replace slot_inherited with slot_origin) - https://phabricator.wikimedia.org/T190153#4064923 (Marostegui)
[14:56:55] DBA, Data-Services, Goal, Patch-For-Review, cloud-services-team (FY2017-18): Migrate all users to new Wiki Replica cluster and decommission old hardware - https://phabricator.wikimedia.org/T142807#4064927 (bd808) >>! In T142807#4064108, @Marostegui wrote: > This can probably be closed I assum...
[14:57:00] Blocked-on-schema-change, DBA, Multi-Content-Revisions, Structured-Data-Commons, Wikidata: DROP unused 'slots' table (WAS: In the slots table, replace slot_inherited with slot_origin) - https://phabricator.wikimedia.org/T190153#4064928 (Marostegui)
[15:05:01] Blocked-on-schema-change, DBA, Multi-Content-Revisions, Structured-Data-Commons, Wikidata: DROP unused 'slots' table (WAS: In the slots table, replace slot_inherited with slot_origin) - https://phabricator.wikimedia.org/T190153#4064950 (Marostegui)
[15:05:14] Blocked-on-schema-change, DBA, Multi-Content-Revisions, User-Addshore: Change DEFAULT 0 for rev_text_id on production DBs - https://phabricator.wikimedia.org/T190148#4064951 (Anomie) >>! In T190148#4064667, @Marostegui wrote: > Once we get on this task, please keep in mind it can take around 4-5...
[15:08:59] Blocked-on-schema-change, DBA, Multi-Content-Revisions, User-Addshore: Change DEFAULT 0 for rev_text_id on production DBs - https://phabricator.wikimedia.org/T190148#4064600 (jcrespo) If it was a metadata-only change, not requiring a table rebuild, it would take hours or less to deploy (read days...
[15:11:28] Blocked-on-schema-change, DBA, Multi-Content-Revisions, User-Addshore: Change DEFAULT 0 for rev_text_id on production DBs - https://phabricator.wikimedia.org/T190148#4064977 (Marostegui) >>! In T190148#4064951, @Anomie wrote: >>>! In T190148#4064667, @Marostegui wrote: >> Once we get on this...
[15:18:12] Blocked-on-schema-change, DBA, Multi-Content-Revisions, User-Addshore: Change DEFAULT 0 for rev_text_id on production DBs - https://phabricator.wikimedia.org/T190148#4064999 (jcrespo) The other problem is metadata locking- if it is instant but has to be blocked for selects on the table to finish,...
[15:19:10] Blocked-on-schema-change, DBA, Multi-Content-Revisions, User-Addshore: Change DEFAULT 0 for rev_text_id on production DBs - https://phabricator.wikimedia.org/T190148#4065000 (Anomie) Wouldn't hurt to check it again, although I think we confirmed SET DEFAULT previously.
[15:19:40] Blocked-on-schema-change, DBA, Multi-Content-Revisions, User-Addshore: Change DEFAULT 0 for rev_text_id on production DBs - https://phabricator.wikimedia.org/T190148#4065016 (Marostegui) >>! In T190148#4064999, @jcrespo wrote: > The other problem is metadata locking- if it is instant but has to b...
[15:23:04] Blocked-on-schema-change, DBA, Multi-Content-Revisions, Structured-Data-Commons, Wikidata: DROP unused 'slots' table (WAS: In the slots table, replace slot_inherited with slot_origin) - https://phabricator.wikimedia.org/T190153#4065027 (Marostegui) I have started the drops on s3, the last sec...
[15:41:34] do we do like last day or do we invert?
[15:41:47] I am fine with the same process :)
[15:41:53] I will take care of proxies and killing connections
[15:43:42] I will try to merge the patches now, then
[15:44:46] oki!
[16:22:53] DBA: Decommission db1016 - https://phabricator.wikimedia.org/T190179#4065400 (Marostegui) p:Triage>Normal
[16:23:21] DBA: Decommission db1016 - https://phabricator.wikimedia.org/T190179#4065400 (Marostegui)
[16:23:24] DBA, Operations, Goal, Patch-For-Review: Decommission old coredb machines (<=db1050) - https://phabricator.wikimedia.org/T134476#4065416 (Marostegui)
[16:32:10] Blocked-on-schema-change, DBA, Multi-Content-Revisions, Structured-Data-Commons, Wikidata: DROP unused 'slots' table (WAS: In the slots table, replace slot_inherited with slot_origin) - https://phabricator.wikimedia.org/T190153#4064699 (Marostegui) a:Marostegui
[16:41:24] linking db1016 to db1063, just in case
[16:41:55] yeah, so we can clone db1065 from it :)
[16:42:08] I agree
[16:42:11] better than 1001
[16:42:14] yeah
[16:42:23] great work
[16:42:35] should we change master for db1095 from db1065 to db1106 tomorrow then?
[16:42:53] yes, tomorrow is ok
[16:42:57] coolio
[16:43:05] I have to finish some stuff related to m1 first
[16:43:09] monitoring and so
[16:43:11] sure :)
[16:43:22] it was pretty smooth, well done :)
[16:44:23] db1016 now catching up
[17:12:14] DBA, Operations, Patch-For-Review: Setup newer machines and replace all old misc (m*) and x1 eqiad machines - https://phabricator.wikimedia.org/T183469#4065652 (jcrespo)
[17:12:18] DBA, Operations, Patch-For-Review: Switchover m1 master from db1016 to db1063 - https://phabricator.wikimedia.org/T189655#4065649 (jcrespo) Open>Resolved a:jcrespo Considered done: ``` == m1 == * Disable GTID on db1063, connect db2078 and db1001 to db1063 DONE * Disable puppet @db1016,...
[17:15:19] DBA, Operations, hardware-requests, ops-eqiad, Patch-For-Review: Decommission db1009 - https://phabricator.wikimedia.org/T189216#4065674 (Marostegui)
[17:16:11] DBA, Operations, hardware-requests, ops-eqiad, Patch-For-Review: Decommission db1011 - https://phabricator.wikimedia.org/T184703#4065679 (Marostegui)
[17:16:31] DBA, Operations, hardware-requests, ops-codfw, Patch-For-Review: Decommission db2011 - https://phabricator.wikimedia.org/T187886#4065683 (Marostegui)
[17:23:51] DBA: Generate report of disk health for database masters and master candidates - https://phabricator.wikimedia.org/T190035#4065695 (Marostegui)
[17:26:07] DBA: Generate report of disk health for database masters and master candidates - https://phabricator.wikimedia.org/T190035#4065715 (Marostegui) s3 hosts have HP controller so there is no way to see errors per disk.
[17:27:35] DBA: Generate report of disk health for database masters and master candidates - https://phabricator.wikimedia.org/T190035#4065721 (Marostegui) s4 db1068 master ``` root@db1068:~# megacli -LDPDInfo -aAll | egrep -i "slot|error|failure count|s.m.a.r.t" Slot Number: 0 Media Error Count: 0 Other Error Count:...
[19:12:44] DBA, Operations, hardware-requests, ops-eqiad, Patch-For-Review: Decommission db1009 - https://phabricator.wikimedia.org/T189216#4066328 (RobH)
[19:23:23] DBA, Operations, hardware-requests, ops-eqiad: Decommission db1009 - https://phabricator.wikimedia.org/T189216#4066353 (RobH)
[19:23:46] DBA, Operations, hardware-requests, ops-eqiad: Decommission db1009 - https://phabricator.wikimedia.org/T189216#4035423 (RobH) a:RobH>Cmjohnson ready for onsite disk wipe and completion of steps