[05:25:25] DBA, Operations, ops-codfw, Patch-For-Review: Move masters away from codfw C6 - https://phabricator.wikimedia.org/T191193#4122342 (Marostegui) @Papaul next one will be db2042 Thanks!
[05:42:06] DBA, Patch-For-Review: Prepare and indicate proper master db failover candidates for all codfw database sections (s1-s8, x1) - https://phabricator.wikimedia.org/T191275#4099551 (ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on neodymium.eqiad.wmnet for hosts: ``` ['db2069.codfw.w...
[05:51:32] DBA, Patch-For-Review: Prepare and indicate proper master db failover candidates for all codfw database sections (s1-s8, x1) - https://phabricator.wikimedia.org/T191275#4122367 (ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on neodymium.eqiad.wmnet for hosts: ``` ['db2069.codfw.w...
[06:17:44] DBA, Patch-For-Review: Prepare and indicate proper master db failover candidates for all codfw database sections (s1-s8, x1) - https://phabricator.wikimedia.org/T191275#4122377 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['db2069.codfw.wmnet'] ``` and were **ALL** successful.
[07:20:25] one schema change and one cloning/role movement ongoing, right?
[07:20:46] yep
[07:20:55] All SAL'ed I think
[07:21:08] cool, just wanted a quick recap
[07:21:20] I can now look at details, etc.
[07:21:36] :)
[07:26:06] and of course eqiad backups failed due to grants missing
[07:27:16] codfw are still going strong
[08:01:27] DBA, Patch-For-Review: Prepare and indicate proper master db failover candidates for all codfw database sections (s1-s8, x1) - https://phabricator.wikimedia.org/T191275#4122462 (Marostegui)
[08:27:27] DBA, Patch-For-Review: Prepare and indicate proper master db failover candidates for all codfw database sections (s1-s8, x1) - https://phabricator.wikimedia.org/T191275#4122513 (ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on neodymium.eqiad.wmnet for hosts: ``` ['db2033.codfw.w...
[08:54:49] DBA, Patch-For-Review: Prepare and indicate proper master db failover candidates for all codfw database sections (s1-s8, x1) - https://phabricator.wikimedia.org/T191275#4122548 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['db2033.codfw.wmnet'] ``` and were **ALL** successful.
[09:12:04] "YES-replication-T187089.sql" not going to ask...
:-D
[09:12:04] T187089: Fix WMF schemas to not break when comment store goes WRITE_NEW - https://phabricator.wikimedia.org/T187089
[09:12:36] yeah, you should've seen the mess in s3 :)
[09:12:44] That's why
[09:13:13] there is a nice --replicate flag on osc_host
[09:13:25] I know and I normally use it
[09:14:01] But that schema change is quite hard because of how messy the tables are, that's why
[09:14:13] It is not as easy with all the wikis
[09:14:20] So I had to script many things
[09:14:25] Depending on the section
[09:14:26] that is why I said I am not going to ask :-)
[09:14:31] :)
[09:14:43] we need the host schema monitoring utility
[09:14:48] yeah
[09:15:13] It took me almost 3 days to get everything ready to deploy the schema change on s3, and the schema change only took around 3 hours per host XD
[09:38:06] DBA, Patch-For-Review: Prepare and indicate proper master db failover candidates for all codfw database sections (s1-s8, x1) - https://phabricator.wikimedia.org/T191275#4122652 (Marostegui) Open>Resolved a:Marostegui This is all done.
[10:01:43] Blocked-on-schema-change, DBA: Schema changes to site_stats - https://phabricator.wikimedia.org/T190780#4122721 (Marostegui) >>! In T190780#4119392, @jcrespo wrote: > Yes, we could do this on the masters even with a table reconstruction- but we should check if a table reconstruction is needed only for a...
[10:13:45] backups on eqiad finished after the grant correction
[10:13:48] Blocked-on-schema-change, DBA: Schema changes to site_stats - https://phabricator.wikimedia.org/T190780#4122733 (Marostegui) The only concern about doing it directly on the master could be the fact that this table is being updated almost every second.
[10:13:51] nice!
[10:14:27] the ones on codfw are about to finish
[10:17:48] Re: site_stats the metadata locking is only problematic with lots of read and write activity, like revision, page.
Check on a smaller wiki and go up for that
[10:18:30] yeah, I was checking dewiki
[10:18:37] I guess the alter will take just 1 second or less
[10:18:48] but we might have small pile ups on the master for that second
[10:18:58] because it is being updated every second or even less
[10:19:20] the problem would not be 1 second, but if it continues more after that
[10:19:31] what do you mean?
[10:19:49] or if many of those on a single server (s3) affect the performance (like drops)
[10:20:07] yeah, for s3 that would need to be done with big sleeps or host by host
[10:20:29] if metadata locking prevents the alter from going through
[10:20:37] it will be more than 1 second
[10:21:07] yep
[10:21:11] and if that happens over all replicas, even worse
[10:21:21] for enwiki I think we definitely have to do it host by host
[10:21:31] I say test it
[10:21:40] testwiki is the ideal candidate
[10:21:58] check if there is lag, then abort immediately
[10:22:23] nah, testwiki I checked already, that doesn't get updated that much :(
[10:22:34] actually nothing for minutes
[10:22:42] I think the one to test could be dewiki
[10:22:48] or s6
[10:22:56] sure
[10:23:19] when I do those, I do them when there is less activity early in the morning
[10:23:26] less chance for interactions
[10:23:29] yeah
[10:23:35] it is quite "late" now :)
[10:29:10] DBA: Delete prefstats tables - https://phabricator.wikimedia.org/T154490#4122776 (Marostegui)
[10:29:38] DBA: Delete prefstats tables - https://phabricator.wikimedia.org/T154490#2912996 (Marostegui)
[10:32:03] DBA: Delete prefstats tables - https://phabricator.wikimedia.org/T154490#4122780 (Marostegui)
[10:33:42] DBA: Delete prefstats tables - https://phabricator.wikimedia.org/T154490#4122782 (Marostegui)
[11:00:32] DBA: Delete prefstats tables - https://phabricator.wikimedia.org/T154490#4122836 (Marostegui)
[11:08:17] will continue reimaging after the commercial break
[11:08:31] aka "lunch"
[11:11:58]
Blocked-on-schema-change, DBA: Schema changes to site_stats - https://phabricator.wikimedia.org/T190780#4122850 (EddieGP) >>! In T190780#4122733, @Marostegui wrote: > The only concern about doing it directly on the master could be the fact that this table is being updated almost every second. If your wo...
[11:14:35] DBA: Delete prefstats tables - https://phabricator.wikimedia.org/T154490#4122853 (Marostegui)
[11:14:52] DBA: Delete prefstats tables - https://phabricator.wikimedia.org/T154490#2912996 (Marostegui) a:Marostegui
[11:15:16] Blocked-on-schema-change, DBA: Schema changes to site_stats - https://phabricator.wikimedia.org/T190780#4122857 (jcrespo) @EddieGP No queries will be lost, but if they pile up blocking the wiki's activity, it will be a worse issue (actual outage or edit outage).
[11:28:42] Blocked-on-schema-change, DBA: Schema changes to site_stats - https://phabricator.wikimedia.org/T190780#4122879 (EddieGP) Alright, ignore me then. :-)
[13:14:24] DBA: Delete prefstats tables - https://phabricator.wikimedia.org/T154490#4123055 (Marostegui)
[13:41:56] DBA, Patch-For-Review: Delete prefstats tables - https://phabricator.wikimedia.org/T154490#4123120 (Marostegui)
[13:45:24] DBA, Patch-For-Review: Delete prefstats tables - https://phabricator.wikimedia.org/T154490#4123125 (Marostegui)
[13:45:40] DBA, Epic, Tracking: Database tables to be dropped on Wikimedia wikis and other WMF databases (tracking) - https://phabricator.wikimedia.org/T54921#4123128 (Marostegui)
[13:45:42] DBA, Patch-For-Review: Delete prefstats tables - https://phabricator.wikimedia.org/T154490#2912996 (Marostegui) Open>Resolved All done. Thanks!
[13:46:17] DBA, Epic, Tracking: Database tables to be dropped on Wikimedia wikis and other WMF databases (tracking) - https://phabricator.wikimedia.org/T54921#3349798 (Marostegui)
[13:46:46] DBA, Epic, Tracking: Database tables to be dropped on Wikimedia wikis and other WMF databases (tracking) - https://phabricator.wikimedia.org/T54921#3350578 (Marostegui)
[14:07:45] DBA, MediaWiki-Platform-Team (MWPT-Q4-Apr-Jun-2018), Patch-For-Review, Schema-change: Fix WMF schemas to not break when comment store goes WRITE_NEW - https://phabricator.wikimedia.org/T187089#4123193 (Marostegui) s8 progress: [] dbstore1002 [] labsdb1011 [] labsdb1010 [] labsdb1009 [] db1095 []...
[14:08:01] Blocked-on-schema-change, DBA, MediaWiki-Database, Multi-Content-Revisions, and 3 others: Schema change to prepare for dropping archive.ar_text and archive.ar_flags - https://phabricator.wikimedia.org/T185128#4123194 (Marostegui) s8 progress: [] dbstore1002 [] labsdb1011 [] labsdb1010 [] labsdb1...
[14:08:16] Blocked-on-schema-change, DBA, Patch-For-Review, Schema-change: Perform schema change to add externallinks.el_index_60 to all wikis - https://phabricator.wikimedia.org/T153182#4123195 (Marostegui) s8 progress: [] dbstore1002 [] labsdb1011 [] labsdb1010 [] labsdb1009 [] db1095 [] db1109 [] db1104...
[14:08:32] DBA, MediaWiki-Platform-Team (MWPT-Q4-Apr-Jun-2018), Patch-For-Review, Schema-change: Fix WMF schemas to not break when comment store goes WRITE_NEW - https://phabricator.wikimedia.org/T187089#4123196 (Marostegui)
[14:08:45] Blocked-on-schema-change, DBA, MediaWiki-Database, Multi-Content-Revisions, and 3 others: Schema change to prepare for dropping archive.ar_text and archive.ar_flags - https://phabricator.wikimedia.org/T185128#4123197 (Marostegui)
[14:09:06] Blocked-on-schema-change, DBA, Patch-For-Review, Schema-change: Perform schema change to add externallinks.el_index_60 to all wikis - https://phabricator.wikimedia.org/T153182#4123198 (Marostegui)
[14:17:42] DBA, MediaWiki-Platform-Team (MWPT-Q4-Apr-Jun-2018), Patch-For-Review, Schema-change: Fix WMF schemas to not break when comment store goes WRITE_NEW - https://phabricator.wikimedia.org/T187089#4123218 (Marostegui)
[14:17:57] Blocked-on-schema-change, DBA, MediaWiki-Database, Multi-Content-Revisions, and 3 others: Schema change to prepare for dropping archive.ar_text and archive.ar_flags - https://phabricator.wikimedia.org/T185128#4123219 (Marostegui)
[14:18:00] Blocked-on-schema-change, DBA, Patch-For-Review, Schema-change: Perform schema change to add externallinks.el_index_60 to all wikis - https://phabricator.wikimedia.org/T153182#4123220 (Marostegui)
[14:26:51] DBA, Operations, ops-codfw: remote ipmi doesn't work for es2013 - https://phabricator.wikimedia.org/T191977#4123236 (jcrespo)
[14:30:59] DBA, Operations, ops-codfw: remote ipmi doesn't work for es2013 - https://phabricator.wikimedia.org/T191977#4123270 (jcrespo) T150160 suggests `racadm reset` may fix it.
[16:05:10] DBA: db1114 connection issues - https://phabricator.wikimedia.org/T191996#4123759 (jcrespo)
[16:07:26] DBA: db1114 connection issues - https://phabricator.wikimedia.org/T191996#4123783 (jcrespo)
[16:14:28] DBA: db1114 connection issues - https://phabricator.wikimedia.org/T191996#4123816 (Marostegui) I would suggest to depool it for a few minutes to see if errors stop. If they don't we can assume it might be an internal service failing to connect? Like we had with that other server - I can't remember which host...
[16:16:11] DBA: db1114 connection issues - https://phabricator.wikimedia.org/T191996#4123823 (jcrespo) We know it is mediawiki, I discovered through application logs on logstash.
[22:26:42] marostegui: I am getting access denied to my own database when running: "ALTER SCHEMA logging DEFAULT CHARACTER SET utf8mb4 DEFAULT COLLATE utf8mb4_unicode_ci;"
[23:10:43] DBA, Patch-For-Review: Drop localisation and localisation_file_hash tables, l10nwiki databases too - https://phabricator.wikimedia.org/T119811#4125340 (Bstorm)