[05:26:22] DBA, decommission-hardware: decommission es2012.codfw.wmnet - https://phabricator.wikimedia.org/T263613 (Marostegui)
[05:30:26] DBA, decommission-hardware, Patch-For-Review: decommission es2012.codfw.wmnet - https://phabricator.wikimedia.org/T263613 (Marostegui)
[05:30:34] DBA, decommission-hardware, Patch-For-Review: decommission es2018.codfw.wmnet - https://phabricator.wikimedia.org/T263615 (Marostegui)
[05:41:12] DBA, decommission-hardware, Patch-For-Review: decommission es2012.codfw.wmnet - https://phabricator.wikimedia.org/T263613 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by marostegui@cumin1001 for hosts: `es2012.codfw.wmnet` - es2012.codfw.wmnet (**PASS**) - Downtimed host on Icinga...
[07:29:37] DBA, Wikidata, Wikidata-Campsite: Investigate indexes of wb_changes - https://phabricator.wikimedia.org/T262856 (Marostegui) So, I have captured a lots of queries involving `wb_changes` and I haven't found any single query that has a crazy query plan as a result of deleting `wb_changes_change_type wb...
[07:37:03] DBA, Operations, ops-codfw, Patch-For-Review, User-Kormat: db2125 crashed - mgmt iface also not available - https://phabricator.wikimedia.org/T260670 (Marostegui) Looks like it arrived \o/: ` Delivered Wednesday 9/23/2020 at 9:57 am `
[08:06:56] FYI, installing mariadb-10.1 and mariadb-10-3 updates from Stretch/Buster point releases, it's the packaged versions from Debian, so only affects the client side libs and tools
[08:07:30] moritzm: thanks for the heads up
[08:08:56] DBA, decommission-hardware, Patch-For-Review: decommission es2018.codfw.wmnet - https://phabricator.wikimedia.org/T263615 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by volans@cumin1001 for hosts: `es2018.codfw.wmnet` - es2018.codfw.wmnet (**PASS**) - Downtimed host on Icinga -...
[08:11:55] DBA, Operations, decommission-hardware, ops-codfw: decommission es2018.codfw.wmnet - https://phabricator.wikimedia.org/T263615 (Marostegui) a:Marostegui→Papaul Ready for #dc-ops
[08:42:15] DBA, Operations, ops-codfw, Patch-For-Review, User-Kormat: db2125 crashed - mgmt iface also not available - https://phabricator.wikimedia.org/T260670 (Marostegui) @Papaul can you coordinate with @Kormat for this? I will be off from today's evening till Monday, so if you need something from us...
[08:55:37] marostegui: i just spent 3 minutes trying to figure out why db2125 wasn't showing up in https://noc.wikimedia.org/dbconfig/codfw.json :)
[08:57:07] haha
[09:38:08] I am running a full backup test right now (weird hours)
[09:38:31] you may see the backup replicas lagging, that's normal (alerts are disabled)
[11:31:17] DBA: Failover s6 master, db1093 to db1131 - https://phabricator.wikimedia.org/T263227 (Marostegui) Steps and checklist: **Preparation** NEW master: db1131 OLD master: db1093 [] Check configuration differences between new and old master ` pt-config-diff h=db1093.eqiad.wmnet,F=/root/.my.cnf h=db1131.eqiad.w...
[12:11:57] DBA, Cloud-Services, MW-1.35-notes (1.35.0-wmf.36; 2020-06-09), Patch-For-Review, and 3 others: Apply updates for MCR, actor migration, and content migration, to production wikis. - https://phabricator.wikimedia.org/T238966 (Marostegui)
[12:26:18] DBA, Operations, ops-codfw, Patch-For-Review, User-Kormat: db2125 crashed - mgmt iface also not available - https://phabricator.wikimedia.org/T260670 (Papaul) I will be on site today the only thing i need for now is depool the server and power it down if it is not done yet. Thanks.
[12:32:06] kormat: ^ I will do that
[12:32:11] Oh
[12:32:14] you did it already
[12:32:20] thank you <3
[12:32:55] DBA, Operations, ops-codfw, Patch-For-Review, User-Kormat: db2125 crashed - mgmt iface also not available - https://phabricator.wikimedia.org/T260670 (Kormat) @Papaul : server is depooled and powered down now. Cheers :)
[12:33:38] marostegui: too slow, old man
[12:33:50] * kormat refusing to think about the fact that marostegui is younger
[12:33:55] XDDDDD
[12:37:03] No self-inflicted ageism please, it hits too close to home
[12:39:22] DBA, Operations, ops-codfw, Patch-For-Review, User-Kormat: db2125 crashed - mgmt iface also not available - https://phabricator.wikimedia.org/T260670 (Papaul) @Kormat thanks
[12:39:56] sobanski: :)
[13:57:24] PROBLEM - MariaDB sustained replica lag on db1081 is CRITICAL: 15 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1081&var-port=9104
[13:59:04] there was a 5-minute period of lag on db1081, up to 45s max. it's since recovered
[14:00:30] RECOVERY - MariaDB sustained replica lag on db1081 is OK: (C)2 ge (W)1 ge 0.4 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1081&var-port=9104
[14:02:39] DBA, Operations, ops-eqiad: db1139 memory errors on boot 2020-08-27 - https://phabricator.wikimedia.org/T261405 (Cmjohnson) @Jclark-ctr Have you had any discussion with HPE about this?
[14:04:27] the rate of UPDATES went up by 50% on the s4 master in codfw, so i'm guessing this was just catching up to that
[14:05:26] Yeah, s4 has been having more activity lately
[14:05:36] I checked a few days ago and it looked like batch uploads
[14:05:43] maybe a contest is going on or something like that
[14:06:06] "New contest: confuse the DBAs!"
[14:06:09] And yes, db1081 is probably coping with it: https://grafana.wikimedia.org/d/000000278/mysql-aggregated?viewPanel=7&orgId=1&from=now-24h&to=now&var-site=codfw&var-group=core&var-shard=s4&var-role=All
[14:18:13] DBA, User-Kormat: Switchover s8 primary database master db1109 -> db1104 - 2020-09-29 08:00 UTC - https://phabricator.wikimedia.org/T239238 (Kormat)
[14:40:54] DBA, User-Kormat: Switchover s8 primary database master db1109 -> db1104 - 2020-09-29 08:00 UTC - https://phabricator.wikimedia.org/T239238 (Kormat) Steps and checklist: **Preparation** NEW master: db1104 OLD master: db1109 [] Check configuration differences between new and old master ` pt-config-diff...
[14:55:53] DBA, Cloud-Services, MW-1.35-notes (1.35.0-wmf.36; 2020-06-09), Platform Team Initiatives (MCR Schema Migration), and 2 others: Apply updates for MCR, actor migration, and content migration, to production wikis. - https://phabricator.wikimedia.org/T238966 (Marostegui) Open→Stalled s3 is a...
[15:02:03] DBA, Operations, netops, ops-eqiad, and 3 others: Upgrade eqiad rack D4 to 10G switch - https://phabricator.wikimedia.org/T196487 (ayounsi)
[15:05:04] DBA, Operations, netops, ops-eqiad, and 3 others: Upgrade eqiad rack D4 to 10G switch - https://phabricator.wikimedia.org/T196487 (ayounsi) a:ayounsi→Cmjohnson
[15:18:16] DBA, Data-Persistence, PM: Update the DBA task tracking workflow - https://phabricator.wikimedia.org/T263463 (LSobanski) I created #data-persistence and #data-persistence-backup projects.
[16:01:34] DBA, Cloud-Services, MW-1.35-notes (1.35.0-wmf.36; 2020-06-09), Platform Team Initiatives (MCR Schema Migration), and 2 others: Apply updates for MCR, actor migration, and content migration, to production wikis. - https://phabricator.wikimedia.org/T238966 (daniel) >>! In T238966#6491558, @Maroste...
[16:41:04] DBA, Operations, ops-codfw, Patch-For-Review, User-Kormat: db2125 crashed - mgmt iface also not available - https://phabricator.wikimedia.org/T260670 (Papaul) main board replaced and upgrade BIOS and IDRAC on the new board. @Kormat you can repool the server and resolve this task for now when...
[19:00:10] DBA, Growth-Structured-Tasks, Growth-Team: Add a link engineering: Determine format for accessing and storing link recommendations - https://phabricator.wikimedia.org/T261411 (Tgr) >>! In T261411#6486319, @Marostegui wrote: > What I don't really see is us storing this throwaway data on MySQL. What is...