[04:47:17] DBA, Wiki-Loves-Monuments-Database: mysqldump is timing out preventing all tables from being included in the dump - https://phabricator.wikimedia.org/T138517 (Marostegui) Excellent news! @JeanFred can this task be closed then? Thanks
[04:58:40] DBA, Goal: Switchover codfw primary database masters to new hosts - https://phabricator.wikimedia.org/T230106 (Marostegui)
[05:19:48] DBA, Operations: Drop puppet database from m1 - https://phabricator.wikimedia.org/T231539 (Marostegui) a:Marostegui
[05:22:00] DBA, Operations: Drop puppet database from m1 - https://phabricator.wikimedia.org/T231539 (Marostegui) I have left a backup of this DB at: ` cumin1001:/home/marostegui/T231539 `
[05:24:42] DBA, Operations: Drop puppet database from m1 - https://phabricator.wikimedia.org/T231539 (Marostegui) I have renamed the tables on the `puppet` DB, I will leave them for a few hours before dropping the database: ` # mysql.py -hdb1063 puppet -e "show tables" -BN TO_DROP_auth_group TO_DROP_auth_group_perm...
[05:53:45] DBA, Goal, Patch-For-Review: Switchover codfw primary database masters to new hosts - https://phabricator.wikimedia.org/T230106 (Marostegui) s7 codfw master swapped from db2047 to db2118
[05:53:53] DBA, Goal, Patch-For-Review: Switchover codfw primary database masters to new hosts - https://phabricator.wikimedia.org/T230106 (Marostegui)
[06:05:59] DBA, conftool: set min_replicas on database sections in dbctl - https://phabricator.wikimedia.org/T231019 (Marostegui) s7 done - set it to 4: ` # dbctl -s eqiad section s7 get ; dbctl -s codfw section s7 get { "s7": { "master": "db1062", "min_replicas": 4, "readonly": false,...
[06:06:46] DBA, conftool: set min_replicas on database sections in dbctl - https://phabricator.wikimedia.org/T231019 (Marostegui)
[06:30:03] DBA, Operations: Predictive failures on disk S.M.A.R.T. status - https://phabricator.wikimedia.org/T208323 (Marostegui)
[06:31:18] DBA, Operations: Decommission db2047.codfw.wmnet - https://phabricator.wikimedia.org/T231852 (Marostegui)
[06:31:34] DBA, Operations: Decommission db2047.codfw.wmnet - https://phabricator.wikimedia.org/T231852 (Marostegui) p:Triage→Normal
[06:31:53] DBA, Operations: Decommission db2043-db2069 - https://phabricator.wikimedia.org/T228258 (Marostegui)
[06:36:14] DBA, Operations, Patch-For-Review: Decommission db2047.codfw.wmnet - https://phabricator.wikimedia.org/T231852 (Marostegui)
[06:40:25] DBA, Operations, wikitech.wikimedia.org, Patch-For-Review, cloud-services-team (Kanban): Switchover m5 primary master: db1073 to db1133: Tuesday 3rd Sept at 13:00 UTC - https://phabricator.wikimedia.org/T229657 (Marostegui)
[07:17:38] DBA, conftool: set min_replicas on database sections in dbctl - https://phabricator.wikimedia.org/T231019 (Marostegui) s1 done - set it to 6 ` # dbctl -s eqiad section s1 get ; dbctl -s codfw section s1 get { "s1": { "master": "db1067", "min_replicas": 6, "readonly": false,...
[07:17:58] DBA, conftool: set min_replicas on database sections in dbctl - https://phabricator.wikimedia.org/T231019 (Marostegui)
[07:18:07] DBA, conftool: set min_replicas on database sections in dbctl - https://phabricator.wikimedia.org/T231019 (Marostegui) Open→Resolved Everything is now done.
[09:17:29] marostegui: morning, We are going to start writing both on very small parts of items (up to Q1000) for wb_terms
[09:17:44] slowly we increase the number
[09:19:53] Amir1: Sounds good, just to mention, next week we are switching over s8 master - https://phabricator.wikimedia.org/T230762
[09:56:07] noted
[09:56:34] Amir1: you are not migrating old stuff to the new tables right?
[09:56:37] not yet, I mean
[09:57:13] the properties (1GB) are done already. The plan is to start migration next week
[09:57:40] yeah, I mean the big migration
[11:24:00] marostegui: that's supposed to be next week. Is it okay if we do it after the failover? the same day I mean
[11:25:31] Amir1: I am not the one to answer that, but note you risk getting delayed if there is any s8 issue
[11:26:06] I think it's fine if it gets delayed
[11:26:39] and with "issue" don't think of anything major - it could be jobqueue errors or performance changes
[11:26:56] that we would like to research without any other change
[11:30:41] It's fine
[11:32:11] Amir1: after the failover is fine, but please do ask us just in case we have issues during the failover (I will update the task once it is fully done as well)
[11:32:34] marostegui: You probably need to merge the patches (it'll be a cron in puppet)
[11:33:28] Amir1: sure, that's fine, let's coordinate that day
[11:49:22] DBA, Operations, wikitech.wikimedia.org, Patch-For-Review, cloud-services-team (Kanban): Switchover m5 primary master: db1073 to db1133: Tuesday 3rd Sept at 13:00 UTC - https://phabricator.wikimedia.org/T229657 (Marostegui)
[11:53:42] did you create a mediawiki-config patch? are we still doing that?
[11:53:50] I will do that later
[11:53:56] yeah
[11:54:00] In a few mins I mean
[11:54:11] but we are doing it for now, true?
[11:54:16] I mean not now
[11:54:17] I prefer doing it yep
[11:54:30] "we have not stop doing it" :-D
[11:54:36] *stopped
[11:58:54] DBA, Operations, wikitech.wikimedia.org, Patch-For-Review, cloud-services-team (Kanban): Switchover m5 primary master: db1073 to db1133: Tuesday 3rd Sept at 13:00 UTC - https://phabricator.wikimedia.org/T229657 (Marostegui)
[12:07:32] DBA, Operations, wikitech.wikimedia.org, Patch-For-Review, cloud-services-team (Kanban): Switchover m5 primary master: db1073 to db1133: Tuesday 3rd Sept at 13:00 UTC - https://phabricator.wikimedia.org/T229657 (Marostegui)
[12:07:57] DBA, Operations, wikitech.wikimedia.org, Patch-For-Review, cloud-services-team (Kanban): Switchover m5 primary master: db1073 to db1133: Tuesday 3rd Sept at 13:00 UTC - https://phabricator.wikimedia.org/T229657 (Marostegui) @JHedden all the PRE steps are done.
[12:11:31] A few "Wikimedia\Rdbms\LoadMonitor::getServerStates: host {db_server} is not replicating?" at 12:00
[12:11:46] I am guessing that is during the topology change
[12:11:58] most likely
[12:12:11] as it stops it for some time
[12:12:25] for larger sections maybe it can be depooled
[12:12:49] there is nothing pooled on wikitech
[12:12:55] just db1133 with weight 0
[12:13:01] interesting
[12:13:31] not sure if that error should happen, then
[12:13:49] I guess so, because db1133 is still being checked by mediawiki (I assume)
[12:14:09] yes, but it shouldn't complain if it has weight 0?
[12:14:45] there was a recent change I believe on the replication checker
[12:15:10] maybe loadbysection is 100% on the replica now?
[12:15:17] if it is not configured?
[12:15:52] yeah, I think it is pooled "invisibly"
[12:16:28] e.g. I would expect api calls to go to the replica
[12:54:20] if dns fails to pull that may be an issue
[12:54:33] cobalt is very loaded at the time
[12:56:22] I did a pull test a few minutes ago and it went fine, fingers crossed
[13:13:11] DBA, Operations, wikitech.wikimedia.org, Patch-For-Review, cloud-services-team (Kanban): Switchover m5 primary master: db1073 to db1133: Tuesday 3rd Sept at 13:00 UTC - https://phabricator.wikimedia.org/T229657 (Marostegui)
[13:14:36] DBA, Operations, wikitech.wikimedia.org, Patch-For-Review, cloud-services-team (Kanban): Switchover m5 primary master: db1073 to db1133: Tuesday 3rd Sept at 13:00 UTC - https://phabricator.wikimedia.org/T229657 (Marostegui)
[13:15:15] DBA, Operations, wikitech.wikimedia.org, Patch-For-Review, cloud-services-team (Kanban): Switchover m5 primary master: db1073 to db1133: Tuesday 3rd Sept at 13:00 UTC - https://phabricator.wikimedia.org/T229657 (Marostegui)
[13:21:40] DBA, Operations, wikitech.wikimedia.org, Patch-For-Review, cloud-services-team (Kanban): Switchover m5 primary master: db1073 to db1133: Tuesday 3rd Sept at 13:00 UTC - https://phabricator.wikimedia.org/T229657 (Marostegui) This was done successfully. wikitech read only start: 13:08:40 wikit...
[13:41:43] DBA, Operations, wikitech.wikimedia.org, Patch-For-Review, cloud-services-team (Kanban): Switchover m5 primary master: db1073 to db1133: Tuesday 3rd Sept at 13:00 UTC - https://phabricator.wikimedia.org/T229657 (JHedden) Cloud VPS OpenStack has been fully switched over and all services are ba...
[13:57:16] DBA, Operations, wikitech.wikimedia.org, Patch-For-Review, cloud-services-team (Kanban): Switchover m5 primary master: db1073 to db1133: Tuesday 3rd Sept at 13:00 UTC - https://phabricator.wikimedia.org/T229657 (Marostegui)
[13:58:04] marostegui: any thoughts on when we should clear out the old lines in db-eqiad/codfw.php? think we should have a deployment window?
[13:58:12] tomorrow maybe?
[13:58:42] cdanis: Thu might work better for me
[13:58:49] ok
[13:59:43] I might go offline around 14:30 UTC tomorrow....
[13:59:52] And thursday my afternoon looks cleaner :)
[14:07:08] marostegui: thanks for all the switch over planning and work! It went really well :)
[14:07:32] jeh: glad to hear! thank you for being around for us :)
[14:07:41] I don't forget you, jeh, thanks for the help too!
[14:45:04] DBA, Operations, wikitech.wikimedia.org, cloud-services-team (Kanban): Switchover m5 primary master: db1073 to db1133: Tuesday 3rd Sept at 13:00 UTC - https://phabricator.wikimedia.org/T229657 (Marostegui)
[14:45:41] DBA, Goal: Address Database infrastructure blockers on datacenter switchover & multi-dc deployment - https://phabricator.wikimedia.org/T220170 (Marostegui)
[14:45:43] DBA, Operations: Decommission db1061-db1073 - https://phabricator.wikimedia.org/T217396 (Marostegui)
[14:45:45] DBA, Operations, wikitech.wikimedia.org, cloud-services-team (Kanban): Switchover m5 primary master: db1073 to db1133: Tuesday 3rd Sept at 13:00 UTC - https://phabricator.wikimedia.org/T229657 (Marostegui) Open→Resolved This is all done - db1073 will be decommissioned in a few days (most...
[14:49:16] DBA, Operations: Decommission db1073.eqiad.wmnet - https://phabricator.wikimedia.org/T231892 (Marostegui)
[14:50:05] DBA, Operations: Decommission db1073.eqiad.wmnet - https://phabricator.wikimedia.org/T231892 (Marostegui)
[14:50:09] DBA, Operations, wikitech.wikimedia.org, cloud-services-team (Kanban): Switchover m5 primary master: db1073 to db1133: Tuesday 3rd Sept at 13:00 UTC - https://phabricator.wikimedia.org/T229657 (Marostegui)
[14:50:19] DBA, Operations: Decommission db1073.eqiad.wmnet - https://phabricator.wikimedia.org/T231892 (Marostegui)
[14:50:21] DBA, Operations: Predictive failures on disk S.M.A.R.T. status - https://phabricator.wikimedia.org/T208323 (Marostegui)
[14:50:57] DBA, Operations: Decommission db1073.eqiad.wmnet - https://phabricator.wikimedia.org/T231892 (Marostegui) p:Triage→Normal This host was just removed from being a master [T229657] let's give it a few more days before actually starting its decommissioning process.
[18:19:19] DBA, Wikimedia-Rdbms, Core Platform Team (Needs Cleaning - Security, stability, performance, and scalability (TEC1)), Performance-Team (Radar): SHOW SLAVE STATUS as a health check should have a low timeout - https://phabricator.wikimedia.org/T129093 (WDoranWMF) @aaron Is this a task Performance i...
[18:19:28] DBA, Wikimedia-Rdbms, Performance-Team (Radar): SHOW SLAVE STATUS as a health check should have a low timeout - https://phabricator.wikimedia.org/T129093 (WDoranWMF)
[20:00:00] DBA, Performance-Team, Wikimedia-Rdbms: SHOW SLAVE STATUS as a health check should have a low timeout - https://phabricator.wikimedia.org/T129093 (Krinkle)
[20:12:58] DBA, Performance-Team, Wikimedia-Rdbms: SHOW SLAVE STATUS as a health check should have a low timeout - https://phabricator.wikimedia.org/T129093 (Gilles) a:aaron
[21:53:54] DBA, MediaWiki-General, PostgreSQL, Schema-change, Wikimedia-database-error: Some tables lack unique or primary keys, may allow confusing duplicate data - https://phabricator.wikimedia.org/T17441 (TK-999)
[22:33:11] DBA, Operations: Migrate MySQLs to use ROW-based replication - https://phabricator.wikimedia.org/T109179 (TK-999) >>! In T109179#3952629, @jcrespo wrote: > This is important, but not a goal for this quarter- we are still blocked on mediawiki extension maintainers to be compatible with it; however, all da...