[07:05:31] DBA, Operations, ops-codfw: pc2006 rebooted itself - https://phabricator.wikimedia.org/T200641 (Marostegui) Open>Resolved I have repooled the host so going to consider this resolved as there is not much else we can do - I am going to create a task to get pc2004 and pc2005's BIOS upgrade befo...
[07:08:28] DBA, Operations, ops-codfw: Upgrade pc2004 and pc2005 BIOS - https://phabricator.wikimedia.org/T201387 (Marostegui)
[07:08:41] DBA, Operations, ops-codfw: Upgrade pc2004 and pc2005 BIOS - https://phabricator.wikimedia.org/T201387 (Marostegui) p:Triage>Normal
[07:35:32] DBA, Patch-For-Review, Schema-change: Convert UNIQUE INDEX to PK in Production - https://phabricator.wikimedia.org/T199368 (Marostegui)
[07:35:56] Blocked-on-schema-change, DBA, Patch-For-Review, Schema-change: Truncate SHA-1 indexes - https://phabricator.wikimedia.org/T51190 (Marostegui)
[07:36:10] Blocked-on-schema-change, DBA, Patch-For-Review, Schema-change: Truncate SHA-1 indexes - https://phabricator.wikimedia.org/T51190 (Marostegui) Open>Resolved This is all done
[07:36:27] Blocked-on-schema-change, DBA, Wikidata, Patch-For-Review, Schema-change: Drop eu_touched in production - https://phabricator.wikimedia.org/T144010 (Marostegui)
[07:37:16] DBA, Schema-change, Tracking: [DO NOT USE] Schema changes for Wikimedia wikis (tracking) [superseded by #Blocked-on-schema-change] - https://phabricator.wikimedia.org/T51188 (Marostegui)
[07:37:22] Blocked-on-schema-change, DBA, Wikidata, Patch-For-Review, Schema-change: Drop eu_touched in production - https://phabricator.wikimedia.org/T144010 (Marostegui) Open>stalled Blocking this until we have done DC failover so we can do s4 primary master (T144010#4449908)
[07:38:17] DBA, MediaWiki-Database, PostgreSQL, Schema-change: Some tables lack unique or primary keys, may allow confusing duplicate data - https://phabricator.wikimedia.org/T17441 (Marostegui)
[07:38:19] DBA, Patch-For-Review, Schema-change: Convert UNIQUE INDEX to PK in Production - https://phabricator.wikimedia.org/T199368 (Marostegui) Open>Resolved This is all done - more PK in production! \o/
[07:38:29] can you see es1014 on https://grafana.wikimedia.org/dashboard/db/mysql ?
[07:38:48] I see it
[07:39:00] Some graphs have no data
[07:39:03] But some others do
[07:39:12] oh, I think it only has issues on one of the prometheus servers
[08:09:55] Blocked-on-schema-change, DBA, Schema-change: Dropping rc_moved_to_title/rc_moved_to_ns on wmf databases - https://phabricator.wikimedia.org/T51191 (Marostegui) a:Marostegui
[08:10:51] Blocked-on-schema-change, DBA, Schema-change: Dropping rc_cur_time on wmf databases - https://phabricator.wikimedia.org/T67448 (Marostegui) a:Marostegui
[08:11:04] marostegui: I will pool the es3 master with 33% of the load just in case
[08:11:12] when depooling es1019
[08:11:24] sounds good!
[08:12:52] not sure if depooling now or waiting
[08:14:40] I would depool now, so the other servers start getting the new load slowly
[08:14:45] as the traffic ramps up
[08:15:15] thanks, also later mediawiki-config can get crowded
[08:15:47] yeah, true, with swat, train...
[08:23:06] DBA, Schema-change: Drop externallinks.el_from_namespace on wmf databases - https://phabricator.wikimedia.org/T114117 (Marostegui) a:Marostegui This also includes the drop of `KEY ``el_backlinks_to` (`el_from_namespace`,`el_to``(60),``el_from`), which is not being used really: s1: ``` root@db1089.eq...
[08:38:28] Blocked-on-schema-change, DBA, Schema-change: Dropping rc_moved_to_title/rc_moved_to_ns on wmf databases - https://phabricator.wikimedia.org/T51191 (Marostegui)
[08:38:30] DBA, Schema-change: Drop externallinks.el_from_namespace on wmf databases - https://phabricator.wikimedia.org/T114117 (Marostegui)
[08:38:32] Blocked-on-schema-change, DBA, Schema-change: Dropping rc_cur_time on wmf databases - https://phabricator.wikimedia.org/T67448 (Marostegui)
[09:01:05] I have other more important priorities, but being a bit stuck, I may setup the testing hosts as its actual intended role (backup sources)
[09:01:31] unless you suggest something alternative (I finished reimaging all I could reimage now)
[09:02:52] I can also prepare the es1011 switchover :-)
[09:02:57] haha
[09:03:10] I think we can do that one after I am back from holidays
[09:03:18] I don't think we are in a rush for that one
[09:03:31] so ok for me to advance backups on eqiad?
[09:03:42] +100
[09:03:49] I can setup 4 or maybe more core backups
[09:14:55] by any chance, do you remember the history of db1116 ?
[09:15:09] I thought it was sent to x1?
[09:15:22] no, db1120 was sent to x1
[09:15:35] to have all the hosts in completely different rows
[09:15:49] https://phabricator.wikimedia.org/T196376#4440220
[09:17:02] interesting
[09:17:21] I will take for now just db1095 and db1102
[09:17:40] sounds good
[09:17:58] maybe we can use db1116 and db1118 as testing hosts later
[09:18:37] I guess we will still need extra for x1 too
[09:19:10] DBA, Patch-For-Review: Productionize old/temporary eqiad sanitariums - https://phabricator.wikimedia.org/T196376 (Marostegui)
[09:32:02] DBA: Finish eqiad metadata database backup setup (s1-s8, x1) - https://phabricator.wikimedia.org/T201392 (jcrespo)
[09:32:17] DBA: Finish eqiad metadata database backup setup (s1-s8, x1) - https://phabricator.wikimedia.org/T201392 (jcrespo)
[09:32:19] DBA, Epic, Wikimedia-Incident: Improve regular production database backups handling - https://phabricator.wikimedia.org/T138562 (jcrespo)
[09:32:40] DBA: Finish eqiad metadata database backup setup (s1-s8, x1) - https://phabricator.wikimedia.org/T201392 (jcrespo) p:Triage>Normal a:jcrespo
[09:44:52] one last question, I see db1111|db1112|db1120 on hosts allowed to be reimaged, probably they were already?
[09:45:01] correct
[09:45:08] they can be removed if you are editing the file
[09:45:18] oh, db1111|db1112 are tests
[09:45:21] so not big deal
[09:45:23] the other?
[09:45:32] it is x1
[09:45:35] yeah, get rid of db1120
[09:45:36] I will remove them all
[09:45:37] it is in x1 now
[09:59:01] DBA, Patch-For-Review: Productionize old/temporary eqiad sanitariums - https://phabricator.wikimedia.org/T196376 (jcrespo)
[10:05:04] Blocked-on-schema-change, DBA, Schema-change: Dropping rc_moved_to_title/rc_moved_to_ns on wmf databases - https://phabricator.wikimedia.org/T51191 (Marostegui)
[10:05:08] Blocked-on-schema-change, DBA, Schema-change: Dropping rc_cur_time on wmf databases - https://phabricator.wikimedia.org/T67448 (Marostegui)
[10:13:15] Blocked-on-schema-change, DBA, Patch-For-Review, Schema-change: Dropping rc_moved_to_title/rc_moved_to_ns on wmf databases - https://phabricator.wikimedia.org/T51191 (Marostegui)
[10:13:22] Blocked-on-schema-change, DBA, Patch-For-Review, Schema-change: Dropping rc_cur_time on wmf databases - https://phabricator.wikimedia.org/T67448 (Marostegui)
[10:19:06] DBA, Schema-change: Drop externallinks.el_from_namespace on wmf databases - https://phabricator.wikimedia.org/T114117 (Marostegui) I have removed the column from db2075 (codfw - s5) and I am going to leave it like that for a few days, to make sure nothing writes to that column. If that happens replicatio...
[10:19:19] Blocked-on-schema-change, DBA, Patch-For-Review, Schema-change: Dropping rc_moved_to_title/rc_moved_to_ns on wmf databases - https://phabricator.wikimedia.org/T51191 (Marostegui) I have removed the column from db2075 (codfw - s5) and I am going to leave it like that for a few days, to make sure n...
[10:19:31] Blocked-on-schema-change, DBA, Patch-For-Review, Schema-change: Dropping rc_cur_time on wmf databases - https://phabricator.wikimedia.org/T67448 (Marostegui) I have removed the column from db2075 (codfw - s5) and I am going to leave it like that for a few days, to make sure nothing writes to that...
[10:24:48] DBA, Operations, ops-eqiad, Patch-For-Review: es1019 mgmt interface DOWN - https://phabricator.wikimedia.org/T201132 (jcrespo) @Cmjohnson es1019 is fully depooled, alerts disabled and shutdown, please proceed directly with any task you need to do it, ping us here when finished.
[13:07:04] DBA, Operations, Epic: DB meta task for next DC failover issues - https://phabricator.wikimedia.org/T189107 (Marostegui)
[13:07:07] DBA, Operations, ops-eqiad: db1069 (x1 master) memory errors - https://phabricator.wikimedia.org/T201133 (Marostegui) stalled>Resolved It recovered itself: ``` ˜/icinga-wm 15:02> RECOVERY - Memory correctable errors -EDAC- on db1069 is OK: (C)4 ge (W)2 ge 1 https://grafana.wikimedia.org/dashb...
[13:55:34] marostegui and jynus es1019 is coming back up now
[13:55:45] \o/
[13:55:57] And the mgmt works!
[13:55:58] \o/
[13:56:02] Thanks cmjohnson1
[13:56:32] thank you, cmjohnson1
[13:58:10] marostegui: I am putting it up, repooling, etc
[13:58:17] Awesome!
[13:58:18] Thanks
[14:07:34] DBA, Operations, ops-eqiad: db1069 (x1 master) memory errors - https://phabricator.wikimedia.org/T201133 (fgiunchedi) Indeed it can happen since the alert is errors over four days, if no new errors come in the alert will recover
[14:10:54] marostegui: regarding db1120, shouldn't we want most of the x1 traffic there?
[14:11:09] it currently has 33% of the reads
[14:11:26] e.g. as when warming up
[14:13:17] jynus: yeah, makes sense, fine if you change it!
[14:13:20] (im in a meeting)
[14:13:55] sorry
[14:14:12] no worries at all!
[14:14:25] if you change the weight that's cool, otherwise just leave it for me and I can do it later :)
[14:46:57] DBA, Operations, ops-eqiad, Patch-For-Review: es1019 mgmt interface DOWN - https://phabricator.wikimedia.org/T201132 (jcrespo) Open>Resolved Everything looking ok now.
[16:24:55] DBA, Patch-For-Review: Finish eqiad metadata database backup setup (s1-s8, x1) - https://phabricator.wikimedia.org/T201392 (jcrespo) db1095:s2 and db1102:s4 are currently compressing (and with replication stopped), when they finish tomorrow, I will import s3 and s5 too.
[17:04:04] DBA, Operations, decommission, ops-eqiad, Patch-For-Review: Decommission db1051 - https://phabricator.wikimedia.org/T195484 (Cmjohnson)
[17:04:09] DBA, Patch-For-Review: Decommission db1051-db1060 (DBA tracking) - https://phabricator.wikimedia.org/T186320 (Cmjohnson)
[17:04:14] DBA, Operations, decommission, ops-eqiad, Patch-For-Review: Decommission db1051 - https://phabricator.wikimedia.org/T195484 (Cmjohnson) Open>Resolved
[17:04:42] DBA, Operations, decommission, ops-eqiad: Decommission db1053 - https://phabricator.wikimedia.org/T194634 (Cmjohnson)
[17:04:54] DBA, Patch-For-Review: Decommission db1051-db1060 (DBA tracking) - https://phabricator.wikimedia.org/T186320 (Cmjohnson)
[17:04:56] DBA, Operations, decommission, ops-eqiad: Decommission db1053 - https://phabricator.wikimedia.org/T194634 (Cmjohnson) Open>Resolved
[17:05:37] DBA, Operations, decommission, ops-eqiad: Decommission db1054 - https://phabricator.wikimedia.org/T197063 (Cmjohnson)
[17:05:51] DBA, Patch-For-Review: Decommission db1051-db1060 (DBA tracking) - https://phabricator.wikimedia.org/T186320 (Cmjohnson)
[17:05:56] DBA, Operations, decommission, ops-eqiad: Decommission db1054 - https://phabricator.wikimedia.org/T197063 (Cmjohnson) Open>Resolved
[17:06:12] DBA, Operations, decommission, ops-eqiad: Decommission db1055 - https://phabricator.wikimedia.org/T194118 (Cmjohnson)
[17:06:22] DBA, Patch-For-Review: Decommission db1051-db1060 (DBA tracking) - https://phabricator.wikimedia.org/T186320 (Cmjohnson)
[17:06:25] DBA, Operations, decommission, ops-eqiad: Decommission db1055 - https://phabricator.wikimedia.org/T194118 (Cmjohnson) Open>Resolved
[17:06:46] DBA, Operations, decommission, ops-eqiad: Decommission db1056 - https://phabricator.wikimedia.org/T193736 (Cmjohnson)
[17:06:51] DBA, Operations, decommission, ops-eqiad: Decommission db1056 - https://phabricator.wikimedia.org/T193736 (Cmjohnson) Open>Resolved
[17:06:53] DBA, Patch-For-Review: Decommission db1051-db1060 (DBA tracking) - https://phabricator.wikimedia.org/T186320 (Cmjohnson)
[17:07:19] DBA, Operations, decommission, ops-eqiad: Decommission db1059 - https://phabricator.wikimedia.org/T196606 (Cmjohnson)
[17:07:25] DBA, Operations, decommission, ops-eqiad: Decommission db1059 - https://phabricator.wikimedia.org/T196606 (Cmjohnson) Open>Resolved
[17:07:27] DBA, Patch-For-Review: Decommission db1051-db1060 (DBA tracking) - https://phabricator.wikimedia.org/T186320 (Cmjohnson)
[17:07:45] DBA, Operations, decommission, ops-eqiad: Decommission db1060 - https://phabricator.wikimedia.org/T193732 (Cmjohnson)
[17:07:58] DBA, Patch-For-Review: Decommission db1051-db1060 (DBA tracking) - https://phabricator.wikimedia.org/T186320 (Cmjohnson)
[17:08:01] DBA, Operations, decommission, ops-eqiad: Decommission db1060 - https://phabricator.wikimedia.org/T193732 (Cmjohnson) Open>Resolved
[17:08:06] Sorry for all the spam...closing out the decom tasks
[17:15:08] DBA, JADE, Operations, Scoring-platform-team (Current), User-Joe: Extension:JADE scalability concerns - https://phabricator.wikimedia.org/T196547 (Halfak)
[17:19:48] cmjohnson1: Don't be sorry - we love seeing those tasks done! More rack space available!
[17:21:13] DBA, JADE, Operations, Scoring-platform-team (Current), User-Joe: Extension:JADE scalability concerns - https://phabricator.wikimedia.org/T196547 (awight)
[18:10:21] DBA, Growth-Team, MediaWiki-Watchlist, Wikimedia-log-errors: Deleting large watchlist takes > 4 seconds causing rollback due to write time limit - https://phabricator.wikimedia.org/T171898 (MMiller_WMF) Leaving this in To Triage for now, as we continue to decide how the Growth Team will do triage.
[18:21:49] DBA, JADE, Operations, TechCom-RFC, Scoring-platform-team (Current): Introduce a new namespace for collaborative judgments about wiki entities - https://phabricator.wikimedia.org/T200297 (awight)
[18:34:25] DBA, JADE, Operations, TechCom-RFC, Scoring-platform-team (Current): Introduce a new namespace for collaborative judgments about wiki entities - https://phabricator.wikimedia.org/T200297 (Nemo_bis) > To me it still seems the easiest solution would be to put this on a separate wiki. This was...
[18:54:47] DBA, JADE, Operations, Scoring-platform-team (Current), User-Joe: Extension:JADE scalability concerns - https://phabricator.wikimedia.org/T196547 (awight) a:awight
[18:55:02] DBA, JADE, Operations, TechCom-RFC, Scoring-platform-team (Current): Introduce a new namespace for collaborative judgments about wiki entities - https://phabricator.wikimedia.org/T200297 (awight) a:awight
[18:58:07] DBA, JADE, Operations, TechCom-RFC, Scoring-platform-team (Current): Introduce a new namespace for collaborative judgments about wiki entities - https://phabricator.wikimedia.org/T200297 (Halfak) @Nemo_bis, thanks for chiming in. There are a lot of concerns I have about a central wiki from a...