[02:48:37] 10DBA: Move more wikis from s3 to s5 - https://phabricator.wikimedia.org/T226950 (10Zoranzoki21) Ok, thanks everyone for explaining!
[02:53:19] 10DBA: Move more wikis from s3 to s5 - https://phabricator.wikimedia.org/T226950 (10Zoranzoki21) What is the situation today?
[05:20:51] 10DBA, 10Core Platform Team Workboards (Clinic Duty Team), 10mariadb-optimizer-bug: SELECT /* Title::getFirstRevision */ sometimes using page_user_timestamp index instead of page_timestamp - https://phabricator.wikimedia.org/T236376 (10Marostegui) It is still an issue - once we've migrated more hosts to 10....
[05:31:20] 10DBA, 10Operations, 10ops-codfw: Degraded RAID on es2004 - https://phabricator.wikimedia.org/T251017 (10Marostegui) a:03jcrespo @jcrespo this host is scheduled for decommission {T222592} however it is (or it was) being used for backups, what would you like to do with this task? Do you want the disk replac...
[05:36:39] 10DBA: Move more wikis from s3 to s5 - https://phabricator.wikimedia.org/T226950 (10Marostegui) >>! In T226950#6083247, @Zoranzoki21 wrote: > What is the situation today? For loginwiki? Top used tables: ` -rw-rw---- 1 mysql mysql 2.3G Apr 27 05:35 spoofuser.ibd -rw-rw---- 1 mysql mysql 2.6G Apr 27 05:35 user_...
[05:41:14] 10DBA, 10MediaWiki-User-management, 10Core Platform Team Workboards (Clinic Duty Team), 10MW-1.35-notes (1.35.0-wmf.30; 2020-04-28), and 2 others: Rename ipb_address index on ipb_address to ipb_address_unique - https://phabricator.wikimedia.org/T250071 (10Marostegui) 05Resolved→03Open We have to alter...
[05:47:31] 10DBA, 10Cognate, 10ContentTranslation, 10Growth-Team, and 10 others: Restart extension1 (x1) database primary master (db1120) - https://phabricator.wikimedia.org/T250701 (10Marostegui) >>! In T250701#6078884, @Addshore wrote: > Hmm, Cognate should be in the lists in the description? > Or am I confusing so...
[06:06:50] 10DBA, 10Patch-For-Review: Reimage labsdb1011 to Buster and 10.4 - https://phabricator.wikimedia.org/T249188 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts: ` ['labsdb1011.eqiad.wmnet'] ` The log can be found in `/var/log/wmf-auto-reimage/2020042706...
[06:45:14] 10DBA: Reimage labsdb1011 to Buster and 10.4 - https://phabricator.wikimedia.org/T249188 (10Marostegui) I think we might have hit a bug similar to https://jira.mariadb.org/browse/MDEV-12463 The innodb purger thread had lots of pending things to purge when we upgraded, so it is possible that we've hit a bug whe...
[06:48:12] 10DBA, 10Operations, 10ops-codfw: Degraded RAID on es2004 - https://phabricator.wikimedia.org/T251017 (10jcrespo) 05Open→03Declined
[06:48:44] 10DBA: Decommission es2001, es2002, es2003, es2004 - https://phabricator.wikimedia.org/T222592 (10jcrespo) a:03jcrespo
[07:02:27] https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?host=db1115
[07:02:48] ^ Do I set a warning threshold at 3%?
[07:02:51] 10DBA: Reimage labsdb1011 to Buster and 10.4 - https://phabricator.wikimedia.org/T249188 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['labsdb1011.eqiad.wmnet'] ` and were **ALL** successful.
[07:03:32] 10DBA: Reimage labsdb1011 to Buster and 10.4 - https://phabricator.wikimedia.org/T249188 (10Marostegui) mysql started fine in the end, but I think I am going to file a bug against MariaDB anyways.
[07:03:56] or do I set individual percentages per instance?
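A note on the 3% warning threshold being discussed above for the db1115 backup checks: the comparison it implies is against the size of the previous backup of the same section. The snippet below is only a sketch of that logic, not the actual check; the `backups` metadata table and its columns are hypothetical names used for illustration.

  -- Hypothetical metadata table and columns, for illustration only.
  -- Fetch the two most recent finished backups of a section and compare sizes;
  -- the check would WARN when they differ by more than 3%.
  SELECT start_date, total_size
  FROM backups
  WHERE section = 's8' AND status = 'finished'
  ORDER BY start_date DESC
  LIMIT 2;
  -- WARN if ABS(newest - previous) / previous > 0.03
  -- (3% of a ~1 TB snapshot is roughly 30 GB).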
[07:13:42] Amir1: we still have SELECT /* SpecialFewestRevisions::reallyDoQuery */ messing up :(
[07:13:44] I am going to kill it
[07:15:44] done and commented on the task
[07:19:52] jynus: let's do 3% yeah for now
[07:20:06] I think individual % will be hard to maintain in the long run
[07:20:29] let me send a patch and you can comment further there, if needed
[07:20:34] ta!
[07:29:37] 3% of 1TB is 30 GB, is that acceptable to not warn?
[07:30:49] so that's if the backup is 30GB different from the previous one, no?
[07:31:29] marostegui: ooh sorry, I already started a ticket waiting for PM, can you comment there? T245818
[07:31:29] T245818: Audit query pages on Wikidata - https://phabricator.wikimedia.org/T245818
[07:31:54] Amir1: On my way!
[07:32:14] marostegui: purge+optimize completed successfully for pc1010
[07:32:22] kormat: cool, let's check spaces?
[07:32:26] disk spaces :)
[07:33:09] we're at 71% used on pc1010, which sounds right. i'll go compare with grafana
[07:33:23] marostegui: I have some good news and bad news regarding the drifts
[07:33:27] kormat: we should be around 60% or so
[07:33:36] which ones do you want to hear first
[07:33:52] kormat: I have checked some tables and I think we still have old data from there :(, so we should just truncate+optimize and forget about this
[07:33:59] Amir1: Bad ones first, always!
[07:34:10] Since you asked
[07:34:42] 1- the ip address unique index is not different in name only, it lacks the fourth column in production
[07:34:45] Amir1: commented on that task
[07:35:02] Amir1: So code has 4 columns and prod has 3?
[07:35:06] yup
[07:35:10] sweeeeet
[07:35:51] 2- the abusefilter tables are a mess, with only three tables, they have more drifts than core (I think people there don't know they should report schema changes and they think it happens automatically)
[07:36:15] ~1500 drifts
[07:36:32] yeah, abusefilter tables are a total disaster :(
[07:36:51] 3- the flaggedrevs drifts only happen on s1 and s5 for reasons unknown to me, some only on one node
[07:37:17] 4- the ip address unique also needs changing on s8
[07:37:42] (not only a couple of wikis on s3)
[07:38:01] marostegui: That was the bad news, do you want to hear the good news now?
[07:39:48] yes!
[07:39:55] (can we get tasks for those things?)
[07:40:06] I ran the script on flaggedrevs and abusefilter
[07:40:14] That was it, the whole good news
[07:40:23] Happy Monday!
[07:40:42] I will make one
[07:41:02] hahaha
[07:41:09] thanks :p
[07:43:51] 10DBA: Reimage labsdb1011 to Buster and 10.4 - https://phabricator.wikimedia.org/T249188 (10Marostegui) s8 gave a duplicate entry on: ` Last_Error: Could not execute Write_rows_v1 event on table wikidatawiki.wbc_entity_usage; Duplicate entry 'Q83873593-L.ru-19441465' for key 'eu_entity_id', E...
[07:47:14] 10DBA: Reimage labsdb1011 to Buster and 10.4 - https://phabricator.wikimedia.org/T249188 (10Marostegui) And MySQL crashed all of a sudden. This host is corrupted.
[07:57:44] 10DBA: Reimage labsdb1011 to Buster and 10.4 - https://phabricator.wikimedia.org/T249188 (10Marostegui) Created: https://jira.mariadb.org/browse/MDEV-22373 I am going to also talk to Analytics to see if we can stop labsdb1012 and re-clone labsdb1011 today.
[07:57:52] 10DBA: Reimage labsdb1011 to Buster and 10.4 - https://phabricator.wikimedia.org/T249188 (10Marostegui) p:05Medium→03High
[08:11:49] Amir1: so we need to "rename": https://phabricator.wikimedia.org/T250071#6083365 and also change the rest of production wikis to include the new column, right?
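For reference on the alter being discussed here (T250071, and the four-versus-three-column drift from the bad-news list): on MariaDB versions without RENAME INDEX (added in 10.5), the usual form is to drop the old index and re-add it under the new name with the full column list. This is only a sketch; the exact definition, including the assumed fourth column ipb_anon_only, has to be taken from MediaWiki's tables definition, not from here.

  -- Sketch only: verify the index name and column list against MediaWiki's
  -- maintenance/tables.sql before running. ipb_anon_only as the fourth
  -- column is an assumption, not confirmed by this log.
  ALTER TABLE ipblocks
      DROP INDEX ipb_address,
      ADD UNIQUE INDEX ipb_address_unique
          (ipb_address(255), ipb_user, ipb_auto, ipb_anon_only);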
[08:12:29] marostegui: no need to rename anymore
[08:12:32] (except s8)
[08:13:36] and the s3 wikis?
[08:14:06] those are called: ipb_address
[08:14:18] The list I pasted there, only
[08:14:24] https://phabricator.wikimedia.org/T250071#6051598
[08:14:37] those need renaming to ipb_address_unique
[08:17:51] all backup checks are now green, except for es* ones, which will run for a second time tonight
[08:18:18] nice!
[08:18:57] still not 100% happy with the method
[08:19:48] but enough to tick T138562
[08:19:49] T138562: Improve regular production database backups handling - https://phabricator.wikimedia.org/T138562
[08:20:56] 10DBA, 10Epic, 10Patch-For-Review: Improve regular production database backups handling - https://phabricator.wikimedia.org/T138562 (10jcrespo)
[08:26:44] I updated: https://wikitech.wikimedia.org/wiki/MariaDB/Backups#Alerting
[08:27:12] 10DBA, 10Schema-change: Remove image.img_deleted column from production - https://phabricator.wikimedia.org/T250055 (10Marostegui)
[08:29:40] marostegui: yup
[08:30:14] Amir1: cool, I will finish the s3+s8 task, and if you could create a task for adding the extra column where it is needed, I would be grateful :)
[08:30:35] 10DBA, 10Schema-change: Remove image.img_deleted column from production - https://phabricator.wikimedia.org/T250055 (10Marostegui) s5 eqiad [] labsdb1012 [] labsdb1011 [] labsdb1010 [] labsdb1009 [x] dbstore1003 [] db1130 [] db1124 [] db1113 [] db1110 [x] db1102 [] db1100 [x] db1097 [x] db1096 [] db1082
[08:31:12] 10DBA, 10Schema-change: Remove image.img_deleted column from production - https://phabricator.wikimedia.org/T250055 (10Marostegui)
[08:36:12] 10DBA, 10Operations, 10Wikimedia-Incident: investigate pc1008 for possible hardware issues / performance under high load - https://phabricator.wikimedia.org/T247787 (10Kormat) 05Open→03Resolved Work is now complete.
[08:37:29] Sure!
[08:40:42] thank you Amir1
[08:41:43] 10DBA, 10Operations, 10Wikimedia-Incident: investigate pc1008 for possible hardware issues / performance under high load - https://phabricator.wikimedia.org/T247787 (10Marostegui) Thank you! For the record: the incident report for this is at: https://wikitech.wikimedia.org/wiki/Incident_documentation/2020031...
[08:48:17] nah, sorry for creating a shit ton of work for you
[08:49:34] No way! :)
[08:49:49] We have fixed looots of drifts over the years, but it is a constant battle :)
[08:51:38] so snapshots are taking around 14 hours to run
[08:51:55] with additional servers we should be able to bring that back down to 9h or so
[08:52:26] last ones running at 9am UTC is not ideal
[08:54:02] partially, it is also because we are under lower redundancy due to hw failure
[08:55:13] I just marked db1140 as "failed"
[08:55:18] on netbox
[08:55:48] what does that imply?
[08:56:01] hopefully only temporary failure
[08:56:01] 10DBA, 10Schema-change: Remove image.img_deleted column from production - https://phabricator.wikimedia.org/T250055 (10Marostegui)
[08:56:15] but "Netbox report puppetdb_physical" alert was complaining
[08:58:04] not documented on https://wikitech.wikimedia.org/wiki/Netbox#Report_Conventions
[08:59:38] It is, here: https://wikitech.wikimedia.org/wiki/Server_Lifecycle#Active_-%3E_Failed
[09:00:06] so right move, marostegui ^ FYI
[09:05:01] Just got paged for labsdb1011. Is that expected?
[09:05:11] paged?
[09:05:13] I downtimed it
[09:05:19] Ah right, but the reimage drops it from icinga
[09:05:21] sorry
[09:05:36] Going to disable notifications for it
[09:05:38] sorry bstorm_
[09:05:40] Oh whew!
[09:05:55] OK back to bed thx
[09:06:37] sleep well and sorry!
[09:15:28] 10DBA, 10Operations, 10Patch-For-Review: Cleanup or remove mysql puppet module; repurpose mariadb module to cover misc use cases - https://phabricator.wikimedia.org/T162070 (10Dzahn) Projects currently still using the "simplelamp" class: Project: glampipe All project instances Project: gratitude...
[09:16:45] 10DBA, 10Patch-For-Review: Decommission es2001, es2002, es2003, es2004 - https://phabricator.wikimedia.org/T222592 (10jcrespo) 05Stalled→03Open
[09:17:45] 10DBA, 10Patch-For-Review: Decommission es2001, es2002, es2003, es2004 - https://phabricator.wikimedia.org/T222592 (10jcrespo)
[09:19:45] 10DBA, 10Patch-For-Review: Decommission es2001, es2002, es2003, es2004 - https://phabricator.wikimedia.org/T222592 (10jcrespo)
[09:20:40] 10DBA, 10Patch-For-Review: Decommission es2001, es2002, es2003, es2004 - https://phabricator.wikimedia.org/T222592 (10jcrespo)
[09:27:16] 10DBA: Reimage labsdb1011 to Buster and 10.4 - https://phabricator.wikimedia.org/T249188 (10Marostegui) labsdb1012 replication positions: https://phabricator.wikimedia.org/P11040
[09:29:25] 10DBA, 10Patch-For-Review: Decommission es2001, es2002, es2003, es2004 - https://phabricator.wikimedia.org/T222592 (10jcrespo)
[09:32:30] 10DBA, 10Patch-For-Review: Decommission es2001, es2002, es2003, es2004 - https://phabricator.wikimedia.org/T222592 (10ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by jynus@cumin2001 for hosts: `es2001.codfw.wmnet` - es2001.codfw.wmnet (**PASS**) - Downtimed host on Icinga - Found physical...
[09:33:18] 10DBA, 10Patch-For-Review: Decommission es2001, es2002, es2003, es2004 - https://phabricator.wikimedia.org/T222592 (10ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by jynus@cumin2001 for hosts: `es2002.codfw.wmnet` - es2002.codfw.wmnet (**PASS**) - Downtimed host on Icinga - Found physical...
[09:34:06] 10DBA, 10Patch-For-Review: Decommission es2001, es2002, es2003, es2004 - https://phabricator.wikimedia.org/T222592 (10ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by jynus@cumin2001 for hosts: `es2003.codfw.wmnet` - es2003.codfw.wmnet (**PASS**) - Downtimed host on Icinga - Found physical...
[09:34:58] 10DBA, 10Patch-For-Review: Decommission es2001, es2002, es2003, es2004 - https://phabricator.wikimedia.org/T222592 (10ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by jynus@cumin2001 for hosts: `es2004.codfw.wmnet` - es2004.codfw.wmnet (**PASS**) - Downtimed host on Icinga - Found physical...
[09:35:47] 10DBA, 10Patch-For-Review: Decommission es2001, es2002, es2003, es2004 - https://phabricator.wikimedia.org/T222592 (10jcrespo)
[09:39:33] 10DBA, 10Core Platform Team Workboards (Clinic Duty Team), 10mariadb-optimizer-bug: SELECT /* Title::getFirstRevision */ sometimes using page_user_timestamp index instead of page_timestamp - https://phabricator.wikimedia.org/T236376 (10daniel) >>! In T236376#6083347, @Marostegui wrote: > It is still an issu...
[09:40:11] 10DBA, 10Core Platform Team Workboards (Clinic Duty Team), 10mariadb-optimizer-bug: SELECT /* Title::getFirstRevision */ sometimes using page_user_timestamp index instead of page_timestamp - https://phabricator.wikimedia.org/T236376 (10Marostegui) Let's wait for now I would say - I will try to upgrade a cou...
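On the T236376 thread above (Title::getFirstRevision sometimes picking page_user_timestamp instead of page_timestamp): the real query text is not quoted in this log, so the statements below are an assumed reconstruction based on the index names involved, showing how the plan can be inspected and, if needed, pinned with an index hint.

  -- Assumed shape of the query; 12345 is a placeholder page id.
  EXPLAIN
  SELECT rev_id, rev_timestamp
  FROM revision
  WHERE rev_page = 12345
  ORDER BY rev_timestamp ASC
  LIMIT 1;

  -- If the optimizer chooses page_user_timestamp, one possible workaround
  -- is to pin the intended index at the query level:
  SELECT rev_id, rev_timestamp
  FROM revision FORCE INDEX (page_timestamp)
  WHERE rev_page = 12345
  ORDER BY rev_timestamp ASC
  LIMIT 1;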
[09:55:45] 10DBA: Decommission es2001, es2002, es2003, es2004 - https://phabricator.wikimedia.org/T222592 (10jcrespo) a:05jcrespo→03Papaul
[09:57:30] 10DBA: Decommission es2001, es2002, es2003, es2004 - https://phabricator.wikimedia.org/T222592 (10jcrespo) One year later, this is ready for full decommissioning. Note: @Papaul These hosts used to contain all Wiki content, so disk should be wiped. Some, at least 2, may have failed already and were never replac...
[09:59:04] 10DBA: Decommission es2001, es2002, es2003, es2004 - https://phabricator.wikimedia.org/T222592 (10jcrespo)
[10:02:31] jynus: meeting?
[10:02:45] on it
[10:02:48] camera issues
[10:15:55] no, can't be, your camera has always been the one with the best quality in all our meetings!
[10:50:24] 10DBA, 10Schema-change: Remove image.img_deleted column from production - https://phabricator.wikimedia.org/T250055 (10Marostegui)
[10:51:46] 10DBA, 10Schema-change: Remove image.img_deleted column from production - https://phabricator.wikimedia.org/T250055 (10Marostegui) s1 eqiad [] labsdb1012 [] labsdb1011 [] labsdb1010 [] labsdb1009 [x] dbstore1003 [] db1139 [] db1134 [] db1124 [] db1119 [] db1118 [] db1107 [] db1106 [] db1105 [] db1099 [x] db108...
[11:11:08] 10DBA: Move more wikis from s3 to s5 - https://phabricator.wikimedia.org/T226950 (10Zoranzoki21) >>! In T226950#6083361, @Marostegui wrote: >>>! In T226950#6083247, @Zoranzoki21 wrote: >> What is the situation today? > > For loginwiki? > > Top used tables: > ` > -rw-rw---- 1 mysql mysql 2.3G Apr 27 05:35 spoo...
[11:11:32] just commenting it here because I think the innodb purge issues might be specific to whichever host serves the analytics part (or quarry), as labsdb1010, which is serving it while labsdb1011 is depooled, is showing it: https://grafana.wikimedia.org/d/000000273/mysql?panelId=11&fullscreen&orgId=1&var-dc=eqiad%20prometheus%2Fops&var-server=labsdb1010&var-port=9104&from=now-7d&to=now
[11:12:16] 10DBA: Move more wikis from s3 to s5 - https://phabricator.wikimedia.org/T226950 (10Marostegui) We haven't moved any more s3 wikis to s5. It is a complex procedure and it is normally best tried when we have a DC switchover scheduled.
[11:12:17] jynus: ^
[11:12:27] the labsdb1011, not the s3 to s5
[11:12:37] Maybe quarry is causing those things
[11:12:56] Anyways, I am going for lunch
[11:15:55] 10DBA: Move more wikis from s3 to s5 - https://phabricator.wikimedia.org/T226950 (10Zoranzoki21) >>! In T226950#6084473, @Marostegui wrote: > We haven't moved any more s3 wikis to s5. > It is a complex procedure and it is normally best tried when we have a DC switchover scheduled. Yes, I know that it is compli...
[11:37:41] 10DBA: Move more wikis from s3 to s5 - https://phabricator.wikimedia.org/T226950 (10Aklapper) What does "What is with..." mean exactly? What's your actual underlying question / intention? (This task is about moving from `s3` to `s5`, not `s2`. For general info, please see https://wikitech.wikimedia.org/wiki/Mari...
[11:42:11] marostegui: aren't we supposed to drop some tables? :D
[12:09:19] we are!
[12:11:07] Amir1: I assume the ones in commons and testcommonswiki (empty) can be dropped from the wikireplicas, right?
[12:11:20] Amir1: Only the testwikidata and wikidata ones need to stay in the replicas?
[12:11:25] (wikireplicas)
[12:12:04] Yeah, by wikireplica you mean labs?
[12:12:15] I'm confused by the terms
[12:12:30] labs doesn't exist
[12:12:55] Amir1: yes, labsdb hosts
[12:13:09] https://wikitech.wikimedia.org/wiki/Help:Labs_labs_labs
[12:13:19] I'm from an old time where we had toolserver and everything was simple
[12:13:50] if those hosts were created now, they would be called clouddbs
[12:13:57] Amir1: so green light to proceed on commons and testwikis all the way from the master? they are empty anyways
[12:14:04] Yup
[12:14:09] ok
[12:14:11] No one should use them
[12:14:18] I will let you do the first drop on an s8 host though :)
[12:14:25] Yesss
[12:14:27] ha
[12:14:30] Let me know
[12:17:20] Honorary drop 😆🧡🤩❤😍💛🌈🏝⭐🗿🛠🎉🎁♥️🎆
[12:17:35] 10DBA: Drop wb_terms in production from s4 (commonswiki, testcommonswiki), s3 (testwikidatawiki), s8 (wikidatawiki) - https://phabricator.wikimedia.org/T248086 (10Marostegui) Table gone in s4 (commonswiki and testcommonswiki): ` root@cumin1001:/home/marostegui# ./section s4 | grep -v labs | while read host port;...
[12:17:51] 10DBA: Drop wb_terms in production from s4 (commonswiki, testcommonswiki), s3 (testwikidatawiki), s8 (wikidatawiki) - https://phabricator.wikimedia.org/T248086 (10Marostegui)
[12:21:24] Amir1: want to go for db1104? that host is depooled
[12:21:35] Amir1: let me know how long it takes (you might want to run it from a screen)
[12:22:10] sure
[12:28:09] https://www.irccloud.com/pastebin/g0fBySEq/
[12:28:17] I think it's because it's depooled
[12:28:22] I'll do it manually
[12:30:43] The MariaDB server is running with the --read-only option so it cannot execute this statement
[12:30:45] hmm
[12:31:18] ah no
[12:31:22] it is because you are not root :)
[12:31:26] sorry!
[12:31:38] :(
[12:31:41] It's fine :D
[12:31:46] you drop, I watch
[12:31:48] haha
[12:31:57] I will drop codfw first then
[12:33:24] no super_read_only on mariadb 10.4 :-( https://jira.mariadb.org/browse/MDEV-9458
[12:33:41] https://phabricator.wikimedia.org/P11042
[12:34:15] jynus: and they don't even answer when :(
[12:34:37] 20 seconds, I am going to guess more on production hosts, and probably causing extra contention?
[12:34:43] btw https://dbtree.wikimedia.org/ is empty
[12:34:55] Amir1: WFM
[12:34:56] jynus: It will only be done with replication on codfw
[12:35:05] no, I mean on eqiad
[12:35:20] jynus: yes, that's why we'll depool hosts first
[12:35:28] Amir1: with empty you mean blank page?
[12:35:30] jynus: suddenly got fixed
[12:35:40] jynus: yeah, it was blank. since Friday
[12:35:40] yeah, that's "normal"
[12:36:04] hopefully will be fixed soon
[12:36:41] 10DBA: Drop wb_terms in production from s4 (commonswiki, testcommonswiki), s3 (testwikidatawiki), s8 (wikidatawiki) - https://phabricator.wikimedia.org/T248086 (10Marostegui)
[12:37:18] https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&from=1587990245899&to=1587990997723&fullscreen&panelId=12&var-server=db2079&var-datasource=codfw%20prometheus%2Fops&var-cluster=mysql the storage usage in codfw master went from 50% to 36%
[12:37:56] 10DBA: Drop wb_terms in production from s4 (commonswiki, testcommonswiki), s3 (testwikidatawiki), s8 (wikidatawiki) - https://phabricator.wikimedia.org/T248086 (10Marostegui)
[12:37:57] performance should go up too
[12:38:07] Amir1: yeah, check the paste above :)
[12:38:58] \o/
[12:39:15] labsdb and dbstore will be excluded
[12:39:17] in s8
[12:39:25] in s3 it can be dropped as far as I know, can you confirm?
[12:39:35] testwikidatawiki?
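For context on the wb_terms drops above (T248086): the task updates show a cumin loop over the section's hosts, and the per-host statement would look roughly like the sketch below. Disabling binary logging for the session is an assumption here (so the drop does not replicate to hosts that still need the table, such as labsdb and dbstore); the exact commands used are not fully quoted in the log.

  -- Sketch only, run per host. SUPER is needed both for SET sql_log_bin
  -- and to write on a read_only replica; the "--read-only option" error
  -- pasted above came from a non-root account.
  SET SESSION sql_log_bin = 0;
  DROP TABLE IF EXISTS wikidatawiki.wb_terms;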
[12:39:37] Sure
[12:39:39] yep
[12:40:05] ok, dropping it on testwikidatawiki everywhere then
[12:42:53] 10DBA: Drop wb_terms in production from s4 (commonswiki, testcommonswiki), s3 (testwikidatawiki), s8 (wikidatawiki) - https://phabricator.wikimedia.org/T248086 (10Marostegui) Table gone in s3 (testwikidatawiki): ` root@cumin1001:/home/marostegui# ./section s3 | grep -v labs | while read host port; do echo "$host...
[12:43:15] 10DBA: Drop wb_terms in production from s4 (commonswiki, testcommonswiki), s3 (testwikidatawiki), s8 (wikidatawiki) - https://phabricator.wikimedia.org/T248086 (10Marostegui)
[12:46:07] Amir1: you've got any approximate idea when we will be able to drop the table from labsdb hosts and dbstore?
[12:46:13] just curious
[12:46:59] in one or two weeks
[12:47:10] it should not be a big issue
[12:47:14] nice!
[12:49:00] I'll ask our PM and give you a date
[12:49:15] sure, no rush
[12:49:17] just curious
[12:49:47] 10DBA, 10Schema-change: Remove image.img_deleted column from production - https://phabricator.wikimedia.org/T250055 (10Marostegui)
[12:54:47] 10DBA: Drop wb_terms in production from s4 (commonswiki, testcommonswiki), s3 (testwikidatawiki), s8 (wikidatawiki) - https://phabricator.wikimedia.org/T248086 (10Marostegui) The effect of dropping the table on s8 (wikidatawiki) master: {P11042}
[12:56:17] 10DBA: Drop wb_terms in production from s4 (commonswiki, testcommonswiki), s3 (testwikidatawiki), s8 (wikidatawiki) - https://phabricator.wikimedia.org/T248086 (10Marostegui)
[12:56:21] https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&from=1587984400093&to=1587992133132&fullscreen&panelId=12&var-server=db1104&var-datasource=eqiad%20prometheus%2Fops&var-cluster=mysql hmm, we had a 2% drop this morning, was it the img_deleted?
[12:56:43] Amir1: no, that's because I am compressing that host
[12:56:50] Which was the last one (apart from the master) to get innodb compressed
[12:56:58] ooooh, compress all the things
[12:57:01] yep
[12:57:09] but check the last drop, AMAZING
[12:57:49] YES
[12:57:53] I know
[13:18:48] 10DBA, 10Schema-change: Remove image.img_deleted column from production - https://phabricator.wikimedia.org/T250055 (10Marostegui)
[13:35:48] 10DBA, 10Schema-change: Remove image.img_deleted column from production - https://phabricator.wikimedia.org/T250055 (10Marostegui) s7 eqiad [] labsdb1012 [] labsdb1011 [] labsdb1010 [] labsdb1009 [x] dbstore1003 [] db1136 [] db1125 [x] db1116 [x] db1101 [x] db1098 [x] db1094 [x] db1090 [] db1086 [] db1079
[13:46:39] 10DBA, 10Schema-change: Remove image.img_deleted column from production - https://phabricator.wikimedia.org/T250055 (10Marostegui)
[13:54:11] didn't we cancel T250423 ?
[13:54:34] in exchange for extra db hosts
[13:54:48] yes
[13:54:49] or am I confusing that with something else
[13:54:57] I wasn't even aware of that task
[13:55:02] me neither
[13:55:04] We are not tagged
[13:55:12] We did cancel it, yes
[13:55:12] should I ping them about that?
[13:55:19] just double checking
[13:55:20] sure, can you CC me?
[13:55:28] We cancelled it with m4rk
[13:55:32] cool
[13:55:33] Not sure if it made it to rob
[13:55:41] spreadsheet :-D
[13:55:57] haha
[13:56:52] do we have and sdc/s4 purchase ticket?
[13:56:54] *an
[13:57:05] yes, let me look for it
[13:57:37] T246007 and T245137
[13:57:50] thanks
[13:57:53] commenting
[13:57:57] ta
[13:58:04] 10DBA, 10Schema-change: Remove image.img_deleted column from production - https://phabricator.wikimedia.org/T250055 (10Marostegui)
[14:02:13] 10DBA: Drop wb_terms in production from s4 (commonswiki, testcommonswiki), s3 (testwikidatawiki), s8 (wikidatawiki) - https://phabricator.wikimedia.org/T248086 (10Marostegui)
[14:09:08] 10DBA, 10Schema-change: Remove image.img_deleted column from production - https://phabricator.wikimedia.org/T250055 (10Marostegui) s3 eqiad [] labsdb1012 [] labsdb1011 [] labsdb1010 [] labsdb1009 [x] dbstore1004 [] db1124 [] db1123 [] db1112 [] db1095 [] db1078 [] db1075
[15:06:43] 10DBA, 10Dumps-Generation, 10MediaWiki-extensions-CodeReview, 10Security-Team: Publish SQL dumps of CodeReview tables - https://phabricator.wikimedia.org/T243055 (10Reedy) >>! In T243055#6074582, @Marostegui wrote: > I guess this needs a review from #security-team to double check if it needs redaction or i...
[15:09:32] 10DBA, 10Privacy Engineering, 10Security-Team: Drop (and archive?) aft_feedback - https://phabricator.wikimedia.org/T250715 (10JFishback_WMF)
[15:09:58] 10DBA, 10Privacy Engineering, 10Security-Team: Drop (and archive?) aft_feedback - https://phabricator.wikimedia.org/T250715 (10JFishback_WMF) a:03JFishback_WMF
[15:12:41] 10DBA, 10Operations: Upgrade and restart s5 and s6 primary DB master - https://phabricator.wikimedia.org/T251154 (10Marostegui)
[15:12:54] 10DBA, 10Operations: Upgrade and restart s5 and s6 primary DB master - https://phabricator.wikimedia.org/T251154 (10Marostegui)
[15:13:01] 10DBA, 10Operations: Upgrade and restart s5 and s6 primary DB master - https://phabricator.wikimedia.org/T251154 (10Marostegui) p:05Triage→03Medium
[15:19:30] 10DBA, 10Operations: Upgrade and restart s3 and s7 primary DB master - https://phabricator.wikimedia.org/T251158 (10Marostegui)
[15:19:44] 10DBA, 10Operations: Upgrade and restart s3 and s7 primary DB master - https://phabricator.wikimedia.org/T251158 (10Marostegui) p:05Triage→03Medium
[15:20:40] 10DBA, 10Operations: Upgrade and restart s5 and s6 primary DB master - https://phabricator.wikimedia.org/T251154 (10Marostegui)
[15:22:51] 10DBA, 10Dumps-Generation, 10MediaWiki-extensions-CodeReview, 10Security-Team: Publish SQL dumps of CodeReview tables - https://phabricator.wikimedia.org/T243055 (10Marostegui) @jcrespo do you want to take over this?
[15:49:26] 10DBA, 10Dumps-Generation, 10MediaWiki-extensions-CodeReview, 10Security-Team: Publish SQL dumps of CodeReview tables - https://phabricator.wikimedia.org/T243055 (10jcrespo) a:03jcrespo
[15:51:03] 10DBA, 10Dumps-Generation, 10MediaWiki-extensions-CodeReview, 10Security-Team: Publish SQL dumps of CodeReview tables - https://phabricator.wikimedia.org/T243055 (10jcrespo) @ArielGlenn I will be doing this, but let's find some time tomorrow (or another day) to double check what I will be doing.
[17:13:29] 10DBA: Reimage labsdb1011 to Buster and 10.4 - https://phabricator.wikimedia.org/T249188 (10Marostegui) The copy has finished, so far mysql started with 0 errors as opposed to when the host had the previous 10.1 data (and the huge purge pending). I am going to run mysql_upgrade and do a few more tests before con...
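On the labsdb1011 reclone above (T249188): the wiki-replica hosts replicate every section over multi-source replication, so pointing the rebuilt host at the coordinates recorded from labsdb1012 (P11040) looks roughly like the sketch below, repeated once per section. Host name, binlog file and position are placeholders, not the real values from the paste.

  -- Placeholders only; repeat per section connection (s1..s8).
  -- Credentials are omitted from this sketch.
  CHANGE MASTER 's8' TO
      MASTER_HOST = 'db1109.eqiad.wmnet',
      MASTER_LOG_FILE = 'db1109-bin.001234',
      MASTER_LOG_POS  = 56789;
  START SLAVE 's8';
  SHOW SLAVE 's8' STATUS\G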
[17:54:37] 10DBA: Remove deprecated status options from grafana in mariadb 10.4 - https://phabricator.wikimedia.org/T244696 (10Marostegui) Some of these will be back in 10.5 (like innodb_history_list_length and the ibuf* ones): https://mariadb.com/kb/en/innodb-status-variables/#innodb_ibuf_merged_deletes https://jira.maria...
[18:00:48] 10DBA: Reimage labsdb1011 to Buster and 10.4 - https://phabricator.wikimedia.org/T249188 (10Marostegui) I have stopped and started the server a couple of times with no errors. Replication is configured already and the host is catching up. Tomorrow morning I will enable `innodb_purge_threads = 10` and start/stop...
[18:30:24] I must remember to say that maroste_gui needs https://phabricator.wikimedia.org/badges/view/14/ after that nerd snipe of Amir_1 the other day :P
[19:41:04] 10Blocked-on-schema-change, 10DBA: ipb_address_unique has an extra column in the code but not in production - https://phabricator.wikimedia.org/T251188 (10Ladsgroup)
[19:57:09] 10DBA: FlaggedRevs has lots of database drifts but only in s1 and s5 - https://phabricator.wikimedia.org/T251191 (10Ladsgroup)
[20:03:18] 10DBA: FlaggedRevs has lots of database drifts but only in s1 and s5 - https://phabricator.wikimedia.org/T251191 (10Ladsgroup) Also rEFLR1bbd5fa1cd681ee0221e45945234476826f142ed is not deployed either. How this thing hasn't exploded yet.
[20:19:38] 10DBA, 10MediaWiki-extensions-FlaggedRevs: FlaggedRevs has lots of database drifts but only in s1 and s5 - https://phabricator.wikimedia.org/T251191 (10DannyS712)
[21:19:35] 10DBA: FlaggedRevs has lots of database drifts but only in s1 and s5 - https://phabricator.wikimedia.org/T251191 (10Aklapper) @DannyS712: There seems to be nothing to do in the FlaggedRevs code base, hence removing tag.
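A closing note on the purge-lag and deprecated-counter threads above (T249188, T244696): with Innodb_history_list_length no longer exposed in SHOW GLOBAL STATUS on 10.4, the purge backlog can still be read from the InnoDB metrics table or from the engine status text. The metric name below (trx_rseg_history_len) is the one these servers are assumed to expose; verify it on the target version.

  -- History list length (purge backlog) on MariaDB 10.4:
  SELECT name, count
  FROM information_schema.INNODB_METRICS
  WHERE name = 'trx_rseg_history_len';

  -- Alternative: the "History list length" line in the engine status output.
  SHOW ENGINE INNODB STATUS\G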