[05:06:08] DBA, Operations, ops-eqiad: SMART alerts on db1069 - https://phabricator.wikimedia.org/T222507 (Marostegui) Thanks @jijiki for creating the task. We are no longer creating tasks for predictive failures, we let them fail so the task gets created automatically. We track the predictive failures at {T208...
[05:06:58] DBA, Operations, ops-eqiad: SMART alerts on db1069 - https://phabricator.wikimedia.org/T222507 (Marostegui)
[05:07:01] DBA, Operations: Predictive failures on disk S.M.A.R.T. status - https://phabricator.wikimedia.org/T208323 (Marostegui)
[05:08:34] DBA, Operations: Predictive failures on disk S.M.A.R.T. status - https://phabricator.wikimedia.org/T208323 (Marostegui)
[05:09:50] DBA, Operations: Predictive failures on disk S.M.A.R.T. status - https://phabricator.wikimedia.org/T208323 (Marostegui) >>! In T208323#5158076, @jcrespo wrote: > T222526 db2049 (again?) You might be confused with db2047, I don't recall db2049 having a disk replaced lately
[05:26:45] DBA, MediaWiki-API, Patch-For-Review: Slow query "ApiQueryLogEvents::execute" after actor rollout - https://phabricator.wikimedia.org/T220999 (Marostegui) a:Anomie Closing this as resolved as this patch fixed it (same as T221511#5153193).
There are also no more reports on Tendril
[05:26:51] DBA, MediaWiki-API, Patch-For-Review: Slow query "ApiQueryLogEvents::execute" after actor rollout - https://phabricator.wikimedia.org/T220999 (Marostegui) Open→Resolved
[05:32:19] DBA, Patch-For-Review: questions about standalone wmf-mariadb103 - https://phabricator.wikimedia.org/T221463 (Marostegui) Open→Resolved Resolving this for now - please re-open if needed
[06:20:03] DBA, MediaWiki-Cache, Patch-For-Review, Performance-Team (Radar), User-Marostegui: Replace parsercache keys to something more meaningful on db-XXXX.php - https://phabricator.wikimedia.org/T210725 (Marostegui) I have reserved a window for Tuesday 14th of May to change the second parsercache ke...
[07:05:35] DBA, Goal, Patch-For-Review: Decommission dbstore1001, dbstore2001, dbstore2002 and es2001-4 hosts* - https://phabricator.wikimedia.org/T220002 (Marostegui) Would you mind leaving dbstore1001 as the final host? I have emailed Chase and John to sync-up about something temporary being stored on dbstore...
[08:09:34] DBA, Patch-For-Review: BBU issues on codfw - https://phabricator.wikimedia.org/T214264 (Marostegui)
[08:10:11] DBA, Patch-For-Review: BBU issues on codfw - https://phabricator.wikimedia.org/T214264 (Marostegui)
[08:48:05] DBA, Operations: Predictive failures on disk S.M.A.R.T. status - https://phabricator.wikimedia.org/T208323 (jcrespo) > You might be confused with db2047, I don't recall db2049 having a disk replaced lately //Marostegui updated the task description. Feb 12 2019, 07:40:// https://phabricator.wikimedia.or...
[08:50:09] DBA, Operations: Predictive failures on disk S.M.A.R.T. status - https://phabricator.wikimedia.org/T208323 (Marostegui) >>! In T208323#5159099, @jcrespo wrote: >> You might be confused with db2047, I don't recall db2049 having a disk replaced lately > > //Marostegui updated the task description. Feb 12...
[08:57:37] DBA, Goal, Patch-For-Review: Decommission dbstore1001, dbstore2001, dbstore2002 and es2001-4 hosts* - https://phabricator.wikimedia.org/T220002 (jcrespo) Does T220002#5158901 conflict with setting it as spare? I wanted to set it as spare soon-ish, decom later.
[08:58:11] DBA, Goal, Patch-For-Review: Decommission dbstore1001, dbstore2001, dbstore2002 and es2001-4 hosts* - https://phabricator.wikimedia.org/T220002 (Marostegui) >>! In T220002#5159137, @jcrespo wrote: > Does T220002#5158901 conflict with setting it as spare? I wanted to set it as spare soon-ish, decom la...
[09:32:57] DBA: Decommission es2001, es2002, es2003, es2004 - https://phabricator.wikimedia.org/T222592 (jcrespo)
[09:33:18] DBA: Decommission es2001, es2002, es2003, es2004 - https://phabricator.wikimedia.org/T222592 (jcrespo) Open→Stalled Blocked on bacula setup.
[09:33:56] DBA, Goal, Patch-For-Review: Decommission dbstore1001, dbstore2001, dbstore2002 - https://phabricator.wikimedia.org/T220002 (jcrespo)
[10:02:06] jynus: meeting?
[10:02:42] I'm coming
[13:09:36] DBA, Operations, ops-eqiad, Patch-For-Review: db1099 memory issues - https://phabricator.wikimedia.org/T221502 (Marostegui) Open→Resolved This host recovered itself, so closing for now as nothing is to be done.
[13:27:07] Looks like 10.1.39 is out
[13:27:17] https://mariadb.com/kb/en/library/mariadb-10139-release-notes/
[13:29:12] there is a fix for mariadbackup: Revision #9a8b8ea66b 2019-03-27 14:37:14 +0100
[13:29:12] MDEV-19060 : mariabackup continues, despite failing to open a tablespace
[13:29:18] which I don't think we have ever suffered?
[13:29:27] And also: MDEV-12711 mariabackup --backup is refused for multi-file system tablespace
[13:30:26] There are some more fixes to mariabackup
[14:32:41] DBA, Wikimedia-Site-requests: Global rename of Shadowxfox → Milenioscuro: supervision needed - https://phabricator.wikimedia.org/T222184 (1997kB) @Marostegui Can we start?
[14:35:10] DBA, Wikimedia-Site-requests: Global rename of Shadowxfox → Milenioscuro: supervision needed - https://phabricator.wikimedia.org/T222184 (Marostegui) Yep
[14:52:05] DBA, Wikimedia-Site-requests: Global rename of Shadowxfox → Milenioscuro: supervision needed - https://phabricator.wikimedia.org/T222184 (1997kB) Open→Resolved a:1997kB
[14:54:24] Those doesn't affect us currently
[15:31:29] DBA, Operations, ops-codfw, Goal, Patch-For-Review: rack/setup/install db2[103-120].codfw.wmnet (18 hosts) - https://phabricator.wikimedia.org/T221532 (Marostegui)
[15:32:40] DBA, Operations, ops-eqiad, Goal, User-Marostegui: rack/setup/install db11[26-38].eqiad.wmnet - https://phabricator.wikimedia.org/T211613 (RobH)
[15:33:00] DBA, Operations, ops-eqiad, Goal, User-Marostegui: rack/setup/install db11[26-38].eqiad.wmnet - https://phabricator.wikimedia.org/T211613 (RobH) a:RobH→Marostegui All set!
[15:52:07] DBA, Operations, ops-eqiad, Goal, and 2 others: rack/setup/install db11[26-38].eqiad.wmnet - https://phabricator.wikimedia.org/T211613 (Marostegui) Thanks, now that ^ has been merged I will take over Note: db1127 is still not present on the netboot.cfg because it is not accessible yet via idrac s...
[15:54:12] DBA, Operations, ops-eqiad, Goal, and 2 others: rack/setup/install db11[26-38].eqiad.wmnet - https://phabricator.wikimedia.org/T211613 (Marostegui)
[16:13:47] DBA, Operations, ops-eqiad, Patch-For-Review: db1093 (s6 candidate master) went down - broken BBU - https://phabricator.wikimedia.org/T222127 (RobH) This is only under warranty until later this month, and was brought up in the SRE weekly meeting. This needs to be high priority! Supposedly warra...
[16:31:18] DBA, Operations, ops-eqiad, Patch-For-Review: db1093 (s6 candidate master) went down - broken BBU - https://phabricator.wikimedia.org/T222127 (Cmjohnson) I created a task for this with HPE.
Case ID: 5338390467 Case title: Failed BBU Severity 3-Normal Product serial number: MXQ616071T Product...
[16:32:46] i opened a task with HPE.....i do have a spare bbu on-site that I can replace it with but I do want to wait until tomorrow in case HP needs something first.
[16:35:23] cmjohnson1: sounds good - let us know so we can depool it for you
[16:35:25] cmjohnson1: thanks a lot
[16:38:38] https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-dc=codfw%20prometheus%2Fops&var-server=db2102&var-port=9104 <--- version
[16:39:22] version?
[16:41:34] DBA, Analytics, Analytics-EventLogging, Operations, ops-eqiad: db1107 (eventlogging db master) possibly memory issues - https://phabricator.wikimedia.org/T222050 (Marostegui) @elukey @Ottomata what do you guys want to do with this?
[16:42:37] ah
[16:42:38] Server version: 10.1.39-MariaDB MariaDB Server
[16:42:40] I get it now XD
[16:42:41] nice!
[16:45:59] DBA, Analytics, Analytics-EventLogging, Operations, ops-eqiad: db1107 (eventlogging db master) possibly memory issues - https://phabricator.wikimedia.org/T222050 (elukey) @Marostegui sorry I was under the impression that we'd have needed to wait for a feedback from Chris/Rob about how to proc...
[16:49:42] DBA, Analytics, Analytics-EventLogging, Operations, ops-eqiad: db1107 (eventlogging db master) possibly memory issues - https://phabricator.wikimedia.org/T222050 (Marostegui) @elukey sorry, I realised that I didn't sent the first sentence: "The errors corrected themselves and Icinga is now al...
[16:50:02] DBA, Analytics, Analytics-EventLogging, Operations, ops-eqiad: db1107 (eventlogging db master) possibly memory issues - https://phabricator.wikimedia.org/T222050 (Cmjohnson) I now have h/w log entries. I will need the server to be taken offline so I can relocate the DIMM and check to see if t...
[16:50:44] DBA, Analytics, Analytics-EventLogging, Operations, ops-eqiad: db1107 (eventlogging db master) possibly memory issues - https://phabricator.wikimedia.org/T222050 (Marostegui) @elukey can you coordinate with Chris? ^
[16:50:56] marostegui i am confused over db1007...is there an issue or not an issue? There is a h/w log entry from 4/29 but nothing since
[16:52:02] cmjohnson1: yeah, that is my confusion too, it was a correctable error, which got corrected itself
[16:52:08] so not sure if we should go for the dimm exchange or not
[16:52:13] cmjohnson1: what do you advise?
[16:52:51] let's move the dimm from A3 to B3 and clear the log...if the error returns I will know what to do next.
[16:53:01] sounds good
[16:53:15] I will paste that on the ticket so luca can coordinate (as he needs to stop the service)
[16:54:06] DBA, Analytics, Analytics-EventLogging, Operations, ops-eqiad: db1107 (eventlogging db master) possibly memory issues - https://phabricator.wikimedia.org/T222050 (Marostegui) ` [18:50:55] marostegui i am confused over db1007...is there an issue or not an issue? There is a h/w l...
[17:03:27] aside from the test hosts, I am going to update the root clients and the xtrabackup hosts
[17:04:28] talk to me if you plan to provision hosts soon as I may be able to help (with better tooling, WIP)
[17:25:13] I will install hosts tomorrow for now only
[17:25:24] as in not populating them with data yet
[23:16:26] DBA, Operations, ops-codfw, Goal, Patch-For-Review: rack/setup/install db2[103-120].codfw.wmnet (18 hosts) - https://phabricator.wikimedia.org/T221532 (Papaul)