[04:44:40] DBA, Core Platform Team Legacy (Watching / External), Core Platform Team Workboards (Clinic Duty Team), MW-1.34-notes (1.34.0-wmf.24; 2019-09-24), Performance: Review special replica partitioning of certain tables by `xx_user` - https://phabricator.wikimedia.org/T223151 (Marostegui) Thanks -...
[05:50:20] DBA: Change PK and remove partitions from the logging table - https://phabricator.wikimedia.org/T233625 (Marostegui)
[05:51:16] DBA: Change PK and remove partitions from the logging table - https://phabricator.wikimedia.org/T233625 (Marostegui)
[06:17:39] https://jira.mariadb.org/browse/MDEV-20644
[06:18:29] the answer to that last question from that user is clear: money
[06:20:05] don't worry, I am sure the foundation will be more reasonable... oh wait
[06:20:47] haha
[07:22:14] I was reading the (great) DB crash incident report and one actionable came to my mind; okay if I add https://phabricator.wikimedia.org/T233774 there?
[07:23:28] moritzm: sure thing!
[07:27:31] ack, added to wikitech
[07:28:13] thank you
[08:17:16] marostegui: thanks for the follow-up on T233766
[08:17:16] T233766: labsdb1011 mariadb crashed - https://phabricator.wikimedia.org/T233766
[08:17:45] np!
[10:13:09] DBA, Operations, ops-eqiad, Wikimedia-Incident: db1075 (s3 master) crashed - BBU failure - https://phabricator.wikimedia.org/T233534 (Marostegui) BBU has arrived to the DC, I am trying to coordinate with @Cmjohnson and @Jclark-ctr to see if we can replace this asap.
[12:47:43] DBA: Change PK and remove partitions from the logging table - https://phabricator.wikimedia.org/T233625 (Marostegui)
[12:49:27] DBA: Change PK and remove partitions from the logging table - https://phabricator.wikimedia.org/T233625 (Marostegui) db2085 enwiki is done: ` root@cumin1001:/home/marostegui/T233625# mysql.py -hdb2085:3311 enwiki -e "show create table logging\G" *************************** 1. row ***************************...
[13:01:57] DBA, Operations, ops-eqiad, Wikimedia-Incident: db1075 (s3 master) crashed - BBU failure - https://phabricator.wikimedia.org/T233534 (Marostegui) I can see the battery now after @Jclark-ctr has installed the new one: ` Battery/Capacitor Count: 1 Battery/Capacitor Status: OK `
[13:05:57] DBA, Operations: Switchover s3 primary database master db1075 -> db1123 - 24th Sept @05:00 UTC - https://phabricator.wikimedia.org/T230783 (Jclark-ctr)
[13:06:00] DBA, Operations, ops-eqiad, Wikimedia-Incident: db1075 (s3 master) crashed - BBU failure - https://phabricator.wikimedia.org/T233534 (Jclark-ctr) Open→Resolved a:Cmjohnson→Jclark-ctr replaced battery. resolving ticket
[13:53:55] DBA, Operations, ops-eqiad, Wikimedia-Incident: db1075 (s3 master) crashed - BBU failure - https://phabricator.wikimedia.org/T233534 (Marostegui) db1075 is now fully pooled back. Thanks John!
[14:01:41] DBA: Change PK and remove partitions from the logging table - https://phabricator.wikimedia.org/T233625 (Marostegui)
[15:28:21] DBA, Operations, ops-eqiad: db1114 crashed due to memory issues (server under warranty) - https://phabricator.wikimedia.org/T229452 (Cmjohnson) we're on the schedule to get the board swapped for 9/26
[15:30:23] DBA, Operations, ops-eqiad: db1114 crashed due to memory issues (server under warranty) - https://phabricator.wikimedia.org/T229452 (Marostegui) Cool, I will have the host down for you tomorrow. Thanks for the heads up
[15:45:01] Blocked-on-schema-change, DBA, Patch-For-Review: Schema change for refactored actor storage - https://phabricator.wikimedia.org/T188299 (Bstorm)
[17:32:44] hello! is there any maintenance going on with the replica dbs? starting about 19 hours ago, I'm seeing a lot of errors from my tools about queries timing out. These are queries that shouldn't time out
[17:34:26] I guess there's quite a bit of lag, too https://tools.wmflabs.org/replag/
[17:34:55] musikanimal: there was T233766 but it should be fixed
[17:34:55] T233766: labsdb1011 mariadb crashed - https://phabricator.wikimedia.org/T233766
[17:35:34] please comment there any issue you have after restarting your connections
[17:36:59] (I am going to assume a secondary overload for temporary limited resources, that will go away soon)
[17:37:48] jynus: so the current replication lag is probably because of that?
[17:38:36] I don't know honestly, please comment what you are seeing and someone may help you
[17:38:43] sure thing, thanks
[17:38:43] Thanks for looking into labsdb1011! I appreciate it.
[17:40:14] musikanimal: yes, it is most likely because of the overload of labsdb1010 handling all the load
[17:40:59] okee doke :) If the issue persists I'll comment on that task. Thank you!
[17:44:09] musikanimal: please do, yeah
[17:45:57] DBA, Operations, ops-eqiad: db1074 crashed: Broken BBU - https://phabricator.wikimedia.org/T231638 (jcrespo) Reminder to move sanitarium (T231638#5453802) back here (or somewhere else on eqiad) before closing this ticket.
[18:13:54] DBA, Data-Services, Operations, Patch-For-Review: Prepare and check storage layer for nqowiki - https://phabricator.wikimedia.org/T230543 (Ladsgroup) @Marostegui The wiki is up, please do what needs to be done 🔨