[00:22:34] DBA, Data-Services, Datasets-General-or-Unknown, User-notice: Archive and drop education program (ep_*) tables on all wikis - https://phabricator.wikimedia.org/T174802 (NickK) Is it there a guide on how to make a wikitext export? I would need it for ukwiki. Please also note that wikis that used...
[00:51:48] DBA, Epic, Tracking: Database tables to be dropped on Wikimedia wikis and other WMF databases (tracking) - https://phabricator.wikimedia.org/T54921 (Zoranzoki21)
[00:51:50] DBA, MediaWiki-extensions-FlaggedRevs, Wikimedia-Site-requests, User-Zoranzoki21: Drop FlaggedRevs tables in database for srwikinews - https://phabricator.wikimedia.org/T209761 (Zoranzoki21) stalled>Open >>! In T209761#4757341, @jcrespo wrote: > @Zoranzoki21 just to be 100% sure everybody...
[05:21:33] DBA, Epic, Tracking: Database tables to be dropped on Wikimedia wikis and other WMF databases (tracking) - https://phabricator.wikimedia.org/T54921 (tstarling) >>! In T54921#3382335, @Marostegui wrote: > @tstarling As a background task for myself I am slowly cleaning up all the tables listed here to...
[05:22:18] DBA, Epic, Tracking: Database tables to be dropped on Wikimedia wikis and other WMF databases (tracking) - https://phabricator.wikimedia.org/T54921 (tstarling)
[07:06:28] Blocked-on-schema-change, DBA, Patch-For-Review, Schema-change: Dropping site_stats.ss_total_views on wmf databases - https://phabricator.wikimedia.org/T86339 (Marostegui)
[07:10:56] DBA, foundation.wikimedia.org: Drop the petition_data table from production - https://phabricator.wikimedia.org/T208979 (Marostegui) Open>Resolved Table dropped on s3 master with replication. I have left a backup at: `db1075:/srv/tmp/T208979/foundationwiki_petition_data.sql`
[07:10:58] DBA, Epic, Tracking: Database tables to be dropped on Wikimedia wikis and other WMF databases (tracking) - https://phabricator.wikimedia.org/T54921 (Marostegui)
[07:20:26] Blocked-on-schema-change, DBA, Patch-For-Review, Schema-change: Dropping site_stats.ss_total_views on wmf databases - https://phabricator.wikimedia.org/T86339 (Marostegui)
[07:20:39] Blocked-on-schema-change, DBA, Patch-For-Review, Schema-change: Dropping site_stats.ss_total_views on wmf databases - https://phabricator.wikimedia.org/T86339 (Marostegui) s4 eqiad progress [] labsdb1011 [] labsdb1010 [] labsdb1009 [] dbstore1002 [] db1125 [] db1121 [] db1103 [] db1102 [] db1097...
[07:20:55] Blocked-on-schema-change, DBA, Patch-For-Review, Schema-change: Dropping site_stats.ss_total_views on wmf databases - https://phabricator.wikimedia.org/T86339 (Marostegui)
[07:25:48] DBA, Operations: Predictive failures on disk S.M.A.R.T. status - https://phabricator.wikimedia.org/T208323 (Marostegui) db2044 came up with predictive failure today: ` root@db2044:~# hpssacli controller all show config Smart Array P420i in Slot 0 (Embedded) (sn: 0014380264FFFB0) Port Name: 1I...
[07:26:02] DBA, Operations: Predictive failures on disk S.M.A.R.T. status - https://phabricator.wikimedia.org/T208323 (Marostegui)
[07:36:53] DBA, Operations, ops-codfw: Degraded RAID on db2044 - https://phabricator.wikimedia.org/T210049 (Marostegui) p:Triage>Normal a:Papaul @Papaul you have any disks to replace this? Even if it is not a new one? Thanks
[07:37:20] DBA, Operations: Predictive failures on disk S.M.A.R.T. status - https://phabricator.wikimedia.org/T208323 (Marostegui) db2044's disk finally failed {T210049}
[07:37:29] DBA, Operations: Predictive failures on disk S.M.A.R.T. status - https://phabricator.wikimedia.org/T208323 (Marostegui)
[07:47:10] Blocked-on-schema-change, DBA, Patch-For-Review, Schema-change: Dropping site_stats.ss_total_views on wmf databases - https://phabricator.wikimedia.org/T86339 (Marostegui)
[07:51:23] Blocked-on-schema-change, DBA, Patch-For-Review, Schema-change: Dropping site_stats.ss_total_views on wmf databases - https://phabricator.wikimedia.org/T86339 (Marostegui)
[07:51:56] Blocked-on-schema-change, DBA, Patch-For-Review, Schema-change: Dropping site_stats.ss_total_views on wmf databases - https://phabricator.wikimedia.org/T86339 (Marostegui) s8 eqiad progress: [] labsdb1011 [] labsdb1010 [] labsdb1009 [x] dbstore1002 [] db1124 [] db1116 [] db1109 [] db1104 [] db11...
[07:52:16] Blocked-on-schema-change, DBA, Patch-For-Review, Schema-change: Dropping site_stats.ss_total_views on wmf databases - https://phabricator.wikimedia.org/T86339 (Marostegui)
[08:44:58] Blocked-on-schema-change, DBA, Patch-For-Review, Schema-change: Dropping site_stats.ss_total_views on wmf databases - https://phabricator.wikimedia.org/T86339 (Marostegui)
[08:49:25] Blocked-on-schema-change, DBA, Patch-For-Review, Schema-change: Dropping site_stats.ss_total_views on wmf databases - https://phabricator.wikimedia.org/T86339 (Marostegui)
[08:53:09] Blocked-on-schema-change, DBA, Patch-For-Review, Schema-change: Dropping site_stats.ss_total_views on wmf databases - https://phabricator.wikimedia.org/T86339 (Marostegui) s7 eqiad progress [] labsdb1011 [] labsdb1010 [] labsdb1009 [x] dbstore1002 [] db1125 [x] db1116 [] db1101 [] db1098 [] db10...
[08:53:49] Blocked-on-schema-change, DBA, Patch-For-Review, Schema-change: Dropping site_stats.ss_total_views on wmf databases - https://phabricator.wikimedia.org/T86339 (Marostegui)
[09:29:28] Blocked-on-schema-change, DBA, Patch-For-Review, Schema-change, User-Banyek: Dropping user.user_options on wmf databases - https://phabricator.wikimedia.org/T85757 (Banyek)
[09:43:49] DBA, MediaWiki-Database, TechCom-RFC: RFC: Proposal to add wl_addedtimestamp attribute to the watchlist table - https://phabricator.wikimedia.org/T209773 (Marostegui) Does this has something to do with {T125991}? Is that some sort of duplicate?
[10:06:01] marostegui: are you touching db2095? I was going to reboot it
[10:06:50] nope
[10:06:54] you can go ahead
[10:07:05] I asked because I saw it downtimed this morning
[10:10:34] yeah, I ran some schema changes, so I downtimed it for 2h
[10:11:22] banyek: are you going to work on dbstore2002:3313 today?
[10:11:59] s3 you mean?
[10:12:33] yes
[10:13:08] actually I don't really know what to do with it
[10:13:55] ?
[10:16:19] what we know is that replication caught up
[10:16:19] 11:15 I like Jaime's comment from yesterday about removing s2 and giving its resources to s3
[10:23:00] DBA, Operations, Availability (MediaWiki-MultiDC), Performance-Team (Radar): Investigate solutions for MySQL connection pooling - https://phabricator.wikimedia.org/T196378 (jcrespo) > Could you elaborate on how this would work a bit more? I will install a proxy on each master pointing to the mas...
[10:24:07] DBA, Operations, Availability (MediaWiki-MultiDC), Performance-Team (Radar): Investigate solutions for MySQL connection pooling - https://phabricator.wikimedia.org/T196378 (jcrespo) Also in terms of prioritization, I asked if I should put this on top of other things and the answer was no due to u...
[10:30:46] Blocked-on-schema-change, DBA, Patch-For-Review, Schema-change, User-Banyek: Dropping user.user_options on wmf databases - https://phabricator.wikimedia.org/T85757 (Banyek)
[10:47:53] Blocked-on-schema-change, DBA, Patch-For-Review, Schema-change, User-Banyek: Dropping user.user_options on wmf databases - https://phabricator.wikimedia.org/T85757 (Banyek)
[11:09:54] DBA, Epic, Tracking: Database tables to be dropped on Wikimedia wikis and other WMF databases (tracking) - https://phabricator.wikimedia.org/T54921 (jcrespo)
[11:14:49] hi! heads up: when you get a moment, this one is up for review (thanks banyek!) https://gerrit.wikimedia.org/r/c/operations/puppet/+/467264
[11:15:59] 3 mins tops
[11:21:17] godog: ah, that one
[11:21:43] yeah I wanted to make sure we're not missing anything
[11:50:13] Blocked-on-schema-change, DBA, Patch-For-Review, Schema-change, User-Banyek: Dropping user.user_options on wmf databases - https://phabricator.wikimedia.org/T85757 (Banyek)
[12:02:39] Blocked-on-schema-change, DBA, Patch-For-Review, Schema-change, User-Banyek: Dropping user.user_options on wmf databases - https://phabricator.wikimedia.org/T85757 (Banyek)
[12:21:18] Blocked-on-schema-change, DBA, Patch-For-Review, Schema-change, User-Banyek: Dropping user.user_options on wmf databases - https://phabricator.wikimedia.org/T85757 (Banyek)
[12:30:02] I'm preparing to run the T85757 schema change on db1085 with replication enabled
[12:30:02] T85757: Dropping user.user_options on wmf databases - https://phabricator.wikimedia.org/T85757
[12:30:45] It will break replication on db1125 (the user table has triggers on it which affect the user_options column)
[12:31:58] the sequence will be: depool db1085
[12:32:09] stop replication on db1085
[12:32:17] execute schema change
[12:32:26] fix triggers on db1125
[12:32:36] restart replication on db1125
[12:32:42] start replication on db1085
[12:32:56] repool db1085
[12:42:34] marostegui: jynus: tomorrow I'll be 'away' as my son is ill and has to stay at home (nothing serious, but the doctor isn't sure whether what he's got is contagious), and there's no one who could help us right now. (Yesterday and today my wife is here, but tomorrow she has to go to work.) Mark gave me a sick day. I'll be around if something pops up with something I'm involved in, and I also plan to show up at the DBA triaging meeting. I just don't want to put a YouTube-capable tablet into his hands for a full day (even if he'd be into it, I believe); that would be pretty bad.
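The depool/alter/repool sequence listed at [12:31:58]-[12:32:56] is essentially a set of ordering constraints. As a minimal sketch (the `Step` class, the action names and the rule list are hypothetical, not part of any WMF tool), the constraints can be written down and checked mechanically before running anything. The rule that matters most here, going by [12:30:45], is that the triggers on db1125 must be fixed after the ALTER and before db1085 resumes replicating writes to the `user` table.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One operational action on one host (action names are illustrative only)."""
    action: str
    host: str


def plan(replica: str, sanitarium: str) -> list[Step]:
    """The sequence from the log, for one replica feeding one sanitarium host."""
    return [
        Step("depool", replica),
        Step("stop_replication", replica),
        Step("alter", replica),                  # replicated, so it reaches sanitarium/labs
        Step("fix_triggers", sanitarium),        # triggers on `user` touch user_options
        Step("restart_replication", sanitarium),
        Step("start_replication", replica),
        Step("repool", replica),
    ]


def precedence(replica: str, sanitarium: str) -> list[tuple[Step, Step]]:
    """Pairs (a, b) meaning step a must happen before step b."""
    return [
        (Step("depool", replica), Step("stop_replication", replica)),
        (Step("stop_replication", replica), Step("alter", replica)),
        (Step("alter", replica), Step("fix_triggers", sanitarium)),
        (Step("fix_triggers", sanitarium), Step("start_replication", replica)),
        (Step("start_replication", replica), Step("repool", replica)),
    ]


def violations(steps: list[Step], rules: list[tuple[Step, Step]]) -> list[tuple[Step, Step]]:
    """Return every rule that is broken: a step is missing or comes too late."""
    index = {step: i for i, step in enumerate(steps)}
    broken = []
    for before, after in rules:
        if before not in index or after not in index or index[before] > index[after]:
            broken.append((before, after))
    return broken
```

For example, `violations(plan("db1085", "db1125"), precedence("db1085", "db1125"))` returns an empty list, while dropping the `fix_triggers` step from the plan reports two broken rules.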
[13:17:52] DBA, Operations, User-Banyek: BBU Fail on dbstore2002 - https://phabricator.wikimedia.org/T208320 (Banyek)
[13:22:41] DBA, Operations, User-Banyek: BBU Fail on dbstore2002 - https://phabricator.wikimedia.org/T208320 (Banyek) I prepare a patch to remove s2 instance, and give it's resources to s3 to see how it works.
[13:48:35] banyek: cool, thanks for the heads up
[13:49:10] add it to your calendar if you can
[13:55:23] ok, I added it to the WMF Sick/Vacation calendar too
[13:55:32] jynus: can I ask for a CR on https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/473546/ ?
[13:59:44] banyek: I need more context, are those the first 2 being set up?
[14:01:20] yes, they are. Originally I tried to add all of them, but marostegui suggested adding only those which are not in production first. I checked, and these are the ones. The mapping is 1:1; the hieradata file was created based on the originals
[14:10:23] Before anything else: I'll do the schema change on db1085 first, as that one affects the live service and I don't want to leave it to the end of the day
[14:10:35] sounds like a good idea
[14:10:49] I'm downtiming db1085, db1125 and the labsdb hosts in Icinga, as replication will complain
[14:11:10] (labsdb hosts in this case are 1009, 1010, and 1011)
[14:11:12] If you stop replication on db1085, db1125 shouldn't break, I think
[14:12:02] and the replication lag?
[14:12:12] I mean that could alert
[14:12:13] That will of course
[14:12:19] I meant the breakage because of triggers
[14:22:00] I downtimed the hosts; starting the change now
[14:24:00] DBA, Operations, ops-codfw: Degraded RAID on db2044 - https://phabricator.wikimedia.org/T210049 (Papaul) @Marostegui I have 2 more new disks
[14:24:41] DBA, Operations, ops-codfw: Degraded RAID on db2044 - https://phabricator.wikimedia.org/T210049 (Marostegui) Let's try one here! Thanks!
[14:36:24] I am executing the schema change on db1085 with
[14:36:28] ```./wmfmariadbpy/wmfmariadbpy/osc_host.py --method=ddl --host db1085.eqiad.wmnet --dblist mediawiki-config/dblists/s6.dblist --table user --debug "DROP COLUMN IF EXISTS user_options"```
[14:36:40] when it's finished I'll fix the triggers
[14:36:46] banyek: one sec
[14:37:01] ok, I am holding my horses
[14:37:11] banyek: is `--replicate` the default? if so, it will go to sanitarium and labs (which is what we want)
[14:37:17] I cannot remember, hence the question :)
[14:38:21] as it broke the codfw labs, I think it is the default, but I see no reason not to add it
[14:38:39] ```./wmfmariadbpy/wmfmariadbpy/osc_host.py --method=ddl --host db1085.eqiad.wmnet --dblist mediawiki-config/dblists/s6.dblist --table user --debug --replicate "DROP COLUMN IF EXISTS user_options"```
[14:38:55] banyek: sure, just checking: if replication is stopped on db1085, you want to use replication for that, then fix the triggers once it has caught up, and then enable replication on db1085
[14:39:49] marostegui: caught up?
[14:40:06] replication is stopped on db1085
[14:40:18] so nothing changes "below" it now
[14:40:20] banyek: yeah, I meant for the alter tables, once all the alter tables have been executed on sanitarium
[14:40:45] 👍 ah, ok
[14:41:01] good
[14:41:07] I am executing the alter now
[14:44:11] alter finished
[14:44:23] now fixing the triggers
[14:59:37] triggers are in place
[14:59:43] restarting replication on db1085
[15:01:20] slaves are catching up
[15:01:28] it seems they didn't break
[15:01:29] yay
[15:01:38] :)
[15:01:39] I'll wait until they catch up
[15:02:26] they're caught up, now I am repooling db1085
[15:02:53] only the master (db1061) is left in s6
[15:03:09] And I still don't know if I can execute the schema change there
[15:04:24] I think in the initial plan you put together we discussed it, didn't we?
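The "wait until they catch up, then repool" step at [15:01:39]-[15:02:26] can be sketched as a polling loop. This is a minimal sketch, assuming a hypothetical `get_lag(host)` callback that returns the replica's lag in seconds (for instance `Seconds_Behind_Master` from `SHOW SLAVE STATUS`) or `None` when replication is not running; the thresholds are arbitrary example values, and `sleep`/`clock` are injectable so the loop can be exercised without a database.

```python
import time
from typing import Callable, Iterable, Optional


def wait_for_catch_up(
    hosts: Iterable[str],
    get_lag: Callable[[str], Optional[float]],
    max_lag: float = 1.0,
    timeout: float = 600.0,
    poll_interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll hosts until each reports lag <= max_lag; return False on timeout.

    get_lag is a hypothetical callback; None (replication not running) is
    treated as "not caught up", so a broken replica never looks healthy.
    """
    pending = set(hosts)
    deadline = clock() + timeout
    while True:
        still_lagging = set()
        for host in pending:
            lag = get_lag(host)
            if lag is None or lag > max_lag:
                still_lagging.add(host)
        pending = still_lagging  # hosts that caught up are not polled again
        if not pending:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval)
```

Only once this returns `True` for the sanitarium and labsdb hosts would the replica be repooled; a `False` result means something (such as a trigger error) still needs a human to look at it.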
[15:04:38] My recommendation would be
[15:04:47] - re-check the MariaDB docs about DDL operations
[15:05:08] The alter table takes like 10 minutes? did you see lag building up during the schema change?
[15:05:50] 7-8 minutes for all three databases, 2-3 minutes each
[15:05:58] no, I didn't see lag building up
[15:06:15] Also you can dig around Phabricator for similar schema changes and see what was done
[15:06:36] but the documentation says `Starting with MariaDB 10.4, ADD COLUMN, DROP COLUMN, and reordering columns can be performed instantaneously, without rebuilding the table.`
[15:06:41] If lag wasn't building up, it means the table is either not in use or not blocked, right? ;)
[15:07:27] true-true
[15:08:03] Starting MariaDB 10.4?
[15:08:07] but then why were we depooling those host
[15:08:10] That sentence is right, but you can also rebuild the table and still permit concurrent DML
[15:08:10] hosts?
[15:08:12] do we have 10.4 on the master?
[15:08:21] of course not
[15:08:40] banyek: why were we depooling the hosts? that was also discussed in the plan, I am sure
[15:09:13] to avoid errors with replication lag
[15:09:36] banyek: that is one thing; metadata locking is the other reason
[15:09:54] which can hit hard for tables that are hit massively
[15:10:30] banyek: so going back to your sentences, adding a column instantaneously is different from allowing/blocking the whole table
[15:10:37] and there's no metadata locking issue if we do the change on the master?
[15:11:55] banyek: check this out: https://dev.mysql.com/doc/refman/5.7/en/metadata-locking.html
[15:12:17] I'm repooling db1085, then I'll read it!
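Why metadata locking "can hit hard for tables that are hit massively" ([15:09:54]) can be shown with a toy queue model. This is a deliberately simplified sketch, not how MariaDB implements MDL: it collapses all lock types into "S" (shared, standing in for ordinary SELECT/DML) and "X" (exclusive, which ALTER TABLE needs at least briefly), and assumes strict FIFO granting. Under that assumption, one long-running query holding a shared lock is enough for a waiting ALTER to stall every query that arrives after it.

```python
from __future__ import annotations


def granted_immediately(holders: list[str], requests: list[str]) -> list[bool]:
    """Toy metadata-lock queue: which arriving requests are granted without waiting?

    holders: modes already held, e.g. ["S"] for a long-running transaction.
    requests: modes arriving in order. Simplification: strict FIFO, so once
    any request has to wait, every later request queues behind it.
    """
    held = list(holders)
    someone_waiting = False
    result = []
    for mode in requests:
        # Shared is compatible only with shared; exclusive with nothing.
        compatible = all(mode == "S" and h == "S" for h in held)
        if compatible and not someone_waiting:
            held.append(mode)
            result.append(True)
        else:
            someone_waiting = True
            result.append(False)
    return result
```

With a long SELECT holding the table, `granted_immediately(["S"], ["S", "S"])` grants both new queries, but `granted_immediately(["S"], ["X", "S", "S"])` grants none of them: the queued ALTER blocks everything behind it, which is the pile-up that depooling (or a short lock timeout) avoids.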
[15:12:21] ok
[15:13:25] My advice would be:
[15:14:20] - read the docs about metadata locking and MariaDB online DDL operations, check Phabricator for similar alter tables that have been performed, and then evaluate whether, for the master, you want to try with the smallest table first and reduce the lock_wait_time_out to mitigate any possible locks caused by the schema change
[15:14:49] I wrote that flag from the top of my head; it is probably not spelled correctly :)
[15:15:02] the osc utility does those changes automatically
[15:15:13] it is basically why it was built
[15:15:22] Yeah, but I think it is 30 by default
[15:15:44] Blocked-on-schema-change, DBA, Patch-For-Review, Schema-change, User-Banyek: Dropping user.user_options on wmf databases - https://phabricator.wikimedia.org/T85757 (Banyek)
[15:16:25] It does set the innodb lock wait timeout and the lock wait timeout, but I think it is set to 30 by default
[15:16:39] so my suggestion was to reduce it for the master in case he's not sure about it
[15:47:08] DBA, Operations, ops-codfw: Degraded RAID on db2044 - https://phabricator.wikimedia.org/T210049 (Papaul) a:Papaul>Marostegui Disk replaced
[15:51:40] DBA, MediaWiki-Database, TechCom-RFC: RFC: Proposal to add wl_addedtimestamp attribute to the watchlist table - https://phabricator.wikimedia.org/T209773 (Anomie) >>! In T209773#4765225, @Marostegui wrote: > Does this has something to do with {T125991}? Is that some sort of duplicate? Yes, it looks...
[16:29:55] I am leaving soon for today
[16:40:01] DBA, Operations, ops-codfw: Degraded RAID on db2044 - https://phabricator.wikimedia.org/T210049 (Marostegui) a:Marostegui>Papaul @Papaul disk failed, can you pull out and pull in again?
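The advice at [15:14:20] (lower the lock timeouts before altering on the master) amounts to running a couple of session-level `SET` statements before the `ALTER TABLE`. The real variables are `lock_wait_timeout` (waits for metadata locks) and `innodb_lock_wait_timeout` (waits for InnoDB row locks), both settable per session. The log says osc_host.py already sets these itself ([15:15:02], [15:16:25]); the helper below is a hypothetical illustration of the idea, not that tool's code, and its 5-second default is an arbitrary example value.

```python
from __future__ import annotations

import re

# Plain unquoted identifiers only, so the table name cannot smuggle in SQL.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def guarded_alter(
    table: str,
    alter_clause: str,
    lock_wait_timeout: int = 5,
    innodb_lock_wait_timeout: int = 5,
) -> list[str]:
    """Build the statements for one session: lower the lock timeouts, then ALTER.

    Hypothetical helper; the statements must run on the same connection,
    because SET SESSION only affects the current session.
    """
    if not _IDENTIFIER.match(table):
        raise ValueError(f"unexpected table name: {table!r}")
    for name, value in (("lock_wait_timeout", lock_wait_timeout),
                        ("innodb_lock_wait_timeout", innodb_lock_wait_timeout)):
        if value < 1:
            raise ValueError(f"{name} must be >= 1, got {value}")
    return [
        f"SET SESSION lock_wait_timeout = {lock_wait_timeout}",
        f"SET SESSION innodb_lock_wait_timeout = {innodb_lock_wait_timeout}",
        f"ALTER TABLE `{table}` {alter_clause}",
    ]
```

For the change in this log, `guarded_alter("user", "DROP COLUMN IF EXISTS user_options", lock_wait_timeout=3)` yields the two `SET SESSION` statements followed by ``ALTER TABLE `user` DROP COLUMN IF EXISTS user_options``. If the ALTER gives up on a lock wait, the table is left unchanged and the statement can be retried at a quieter moment, instead of queueing production traffic behind it.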
[16:47:07] bye
[16:56:53] DBA, Operations, ops-codfw: Degraded RAID on db2044 - https://phabricator.wikimedia.org/T210049 (Papaul) a:Papaul>Marostegui Done
[17:38:02] DBA, MediaWiki-General-or-Unknown, MW-1.33-notes (1.33.0-wmf.3; 2018-11-06), Patch-For-Review, Readers-Web-Backlog (Readers-Web-Kanbanana-Board-2018-19-Q2): [Bug] Update old nonuniformly distributed page_random values - https://phabricator.wikimedia.org/T208909 (Milimetric) Random thought rel...
[17:41:51] DBA, JADE, Operations, TechCom-RFC, Scoring-platform-team (Current): Introduce a new namespace for collaborative judgments about wiki entities - https://phabricator.wikimedia.org/T200297 (kchapman) Reminder that there is an IRC meeting today (Wednesday November 21st) at 11pm PST(November 22nd...
[19:11:07] DBA, MediaWiki-General-or-Unknown, MW-1.33-notes (1.33.0-wmf.3; 2018-11-06), Patch-For-Review, Readers-Web-Backlog (Readers-Web-Kanbanana-Board-2018-19-Q2): [Bug] Update old nonuniformly distributed page_random values - https://phabricator.wikimedia.org/T208909 (Niedzielski) @milimetric, for...
[19:20:15] DBA, Analytics, Analytics-Kanban, Core Platform Team, and 2 others: Not able to scoop comment table in labs for mediawiki reconstruction process - https://phabricator.wikimedia.org/T209031 (Anomie) >>! In T209031#4763909, @Bawolff wrote: > I was assuming bases on this comment > >>>! In T209031#4...