[04:52:57] DBA, Operations, ops-codfw: Degraded RAID on db2058 - https://phabricator.wikimedia.org/T225902 (Marostegui) p:Triage→Normal a:Papaul Can we get the disk replaced? Thanks!
[04:57:58] DBA: Create a recovery/provisioning script for database binary backups - https://phabricator.wikimedia.org/T219631 (Marostegui) Leaving this for the record here. Today we had two alerts about disk space on `dbprov1001`: ` [00:47:58] <+icinga-wm> PROBLEM - Disk space on dbprov1001 is CRITICAL: DISK CRITICAL -...
[05:04:29] DBA, Analytics, Analytics-EventLogging, Operations, ops-eqiad: db1107 (eventlogging db master) possibly memory issues - https://phabricator.wikimedia.org/T222050 (Marostegui) Open→Resolved Closing this for now as it self-recovered and never showed up again.
[06:04:02] DBA, Operations: db2084 temporary correctable hardware errors - https://phabricator.wikimedia.org/T225884 (Marostegui) Some more errors from yesterday evening: ` [Sun Jun 16 21:33:58 2019] {7}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 4 [Sun Jun 16 21:33:58 2019] {7}[Hardwa...
[06:20:33] DBA, Operations: db2084 temporary correctable hardware errors - https://phabricator.wikimedia.org/T225884 (Marostegui) Host rebooted. No new logs on HW side.
[07:17:16] DBA, Operations, ops-eqiad, Patch-For-Review: db1077 crashed - https://phabricator.wikimedia.org/T225391 (Marostegui) db1077 has had its BBU in charging status for around 30h now. I have taken a look at the HW logs and: ` /system1/log1/record20 Targets Properties number=20 severity=C...
[07:21:33] DBA, Operations, ops-eqiad, Patch-For-Review: db1077 crashed - https://phabricator.wikimedia.org/T225391 (Marostegui) Also db1114 (test-s1) can be a host we can place instead of db1077 and move db1077 to be test-s1?
[07:24:24] DBA, Operations, ops-eqiad, Patch-For-Review: db1077 crashed - https://phabricator.wikimedia.org/T225391 (jcrespo) note db1114 was a host we removed from production because it was unstable. I would vote for another. Did you try depooling and forcing a learning cycle?
[07:25:44] DBA, Operations, ops-eqiad, Patch-For-Review: db1077 crashed - https://phabricator.wikimedia.org/T225391 (Marostegui) >>! In T225391#5261673, @jcrespo wrote: > note db1114 was a host we removed from production because it was unstable. I would vote for another. Did you try depooling and forcing a...
[09:42:51] DBA, Operations, ops-eqiad: eqiad: rack/setup/install (4) dbproxy systems. - https://phabricator.wikimedia.org/T225704 (Marostegui) So, 2 of these should go to replace dbproxy1010 and dbproxy1011, right? If so, we can rack 2 of them on the same racks as those (C5) and put them on that same VLAN to do a...
[09:43:57] DBA, Operations, ops-eqiad: eqiad: rack/setup/install (4) dbproxy systems. - https://phabricator.wikimedia.org/T225704 (jcrespo) I don't know if 2 or 3, depending on the needs of the others. There was discussion with cloud about whether to also put a proxy in front of toolsdb.
[09:45:45] DBA, Operations, ops-eqiad: eqiad: rack/setup/install (4) dbproxy systems. - https://phabricator.wikimedia.org/T225704 (Marostegui) a:Cmjohnson→Marostegui Assigning this to myself to let Chris know that this is still blocked on DBAs to decide. So for now 2 of them will go to replace 1010 and...
[09:47:06] DBA, Operations, ops-eqiad: eqiad: rack/setup/install (4) dbproxy systems. - https://phabricator.wikimedia.org/T225704 (jcrespo) More like: we need 1 for m5, something else?
[09:49:12] DBA, Operations, ops-eqiad: eqiad: rack/setup/install (4) dbproxy systems.
- https://phabricator.wikimedia.org/T225704 (Marostegui) m5 at the moment doesn't use the proxies (I know it should but they are not being used at the moment) (T202367#5252689)
[10:25:26] DBA, Operations: Predictive failures on disk S.M.A.R.T. status - https://phabricator.wikimedia.org/T208323 (Marostegui)
[10:26:06] DBA, Operations, ops-eqiad: eqiad: rack/setup/install (4) dbproxy systems. - https://phabricator.wikimedia.org/T225704 (jcrespo) Thinking more, as toolsdb was cannibalized by openstack, maybe its potential proxies should too. I guess 2/2 is the safe option right now. Sorry, but I didn't think too much...
[10:27:47] DBA, Operations, ops-eqiad: eqiad: rack/setup/install (4) dbproxy systems. - https://phabricator.wikimedia.org/T225704 (Marostegui) >>! In T225704#5262147, @jcrespo wrote: > Thinking more, as toolsdb was cannibalized by openstack, maybe its potential proxies should too. I guess 2/2 is the safe option...
[10:31:45] DBA, Operations, ops-eqiad: eqiad: rack/setup/install (4) dbproxy systems. - https://phabricator.wikimedia.org/T225704 (jcrespo) > 2/2 meaning 2 for cloud (to replace 1010 and 1011) and 2 for other usages (misc, core..)? Yes.
[10:32:22] DBA, Operations, ops-eqiad: eqiad: rack/setup/install (4) dbproxy systems. - https://phabricator.wikimedia.org/T225704 (Marostegui) Thanks! I will update the task accordingly to reflect this discussion on top so it is easier for Chris
[10:33:15] it took 2 hours just to prepare s3
[10:33:50] in that time I prepared and compressed 2 s8 instances
[10:34:01] (of course, with no transactions on the logs)
[10:36:24] DBA, Operations, ops-eqiad: eqiad: rack/setup/install (4) dbproxy systems. - https://phabricator.wikimedia.org/T225704 (Marostegui)
[10:37:15] DBA, Operations, ops-eqiad: eqiad: rack/setup/install (4) dbproxy systems.
- https://phabricator.wikimedia.org/T225704 (Marostegui) a:Marostegui→Cmjohnson @Cmjohnson I have updated the task with the racking proposal at the beginning. Thanks!
[10:37:28] DBA, Operations, ops-eqiad: eqiad: rack/setup/install (4) dbproxy systems. - https://phabricator.wikimedia.org/T225704 (jcrespo) No name change? I do not mind, just want to make sure it is a conscious decision.
[10:37:41] jynus: 2h for s3 is reasonable or is it more than you initially expected?
[10:38:18] that is not accounting for transmission or compression
[10:38:27] just xtrabackup --prepare
[10:38:45] DBA, Operations, ops-eqiad: eqiad: rack/setup/install (4) dbproxy systems. - https://phabricator.wikimedia.org/T225704 (Marostegui) >>! In T225704#5262204, @jcrespo wrote: > No name change? I do not mind, just want to make sure it is a conscious decision. I would prefer not to change them for now as...
[10:44:03] BTW, did you fix manually something about the backups last week?
[10:44:23] I want to know if it worked much better lately or it was you fixing it
[10:45:02] nope
[10:45:06] I didn't touch anything :)
[10:45:11] well, that is good news
[10:46:02] 2 failures out of 54 runs only
[13:23:11] DBA, Patch-For-Review: Productionize dbproxy101[2-7].eqiad.wmnet and dbproxy200[1-4] - https://phabricator.wikimedia.org/T202367 (Marostegui)
[16:11:07] DBA, Operations, ops-codfw: db2097 (codfw s1&s6 source backups) mariadb@s6 *process* (10.1.39) crashed on 2019-06-08 - https://phabricator.wikimedia.org/T225378 (jcrespo) I am going to run a data check, no matter what we do in the end with the instance.
[16:52:40] DBA, Operations, ops-eqiad: eqiad: rack/setup/install (4) dbproxy systems. - https://phabricator.wikimedia.org/T225704 (bd808) >>! In T225704#5261978, @Marostegui wrote: > @Bstorm @bd808 any comments on T225704#5261972? I think that if we need a proxy in front of ToolsDB we should probably do that w...
[17:09:26] marostegui around?
[17:10:11] I need to confirm vlan for new dbproxies? i'm assuming private
[17:11:40] cmjohnson1: don't know, the same as the dbproxy1010 and 1011. I don't really know much about the vlans details :(
[17:12:01] oh..1010 is cloud support
[17:12:08] okay....thx
[17:13:17] 1010 and 1011 are the ones to replace yeah
[18:04:54] DBA, Operations, ops-eqiad, Patch-For-Review: eqiad: rack/setup/install (4) dbproxy systems. - https://phabricator.wikimedia.org/T225704 (Cmjohnson)
[18:11:09] DBA, Operations, ops-eqiad: eqiad: rack/setup/install (4) dbproxy systems. - https://phabricator.wikimedia.org/T225704 (Cmjohnson) a:Cmjohnson→ayounsi Assigning to @ayounsi to add cloud-support1-d-eqiad. Once that is done, the vlan for dbproxy1020 and 1021 will need to be set up. Switch port...
[18:11:29] DBA, Operations, ops-eqiad: eqiad: rack/setup/install (4) dbproxy systems. - https://phabricator.wikimedia.org/T225704 (Cmjohnson)
[18:13:22] DBA, Operations, ops-eqiad: eqiad: rack/setup/install (4) dbproxy systems. - https://phabricator.wikimedia.org/T225704 (Marostegui) @cmjohnson which ones will go in the cloud vlan finally? 1018 and 1019 or 1020 and 1021? I'm fine either way but I'm confused with your last comment :)
[18:18:08] DBA, Operations, ops-eqiad: eqiad: rack/setup/install (4) dbproxy systems. - https://phabricator.wikimedia.org/T225704 (Cmjohnson) @marostegui: do they all go to the cloud vlan? if they do then 1020 and 1021 are in row D...that support-cloud vlan is not available on row D yet. I need Arzhel to copy...
[18:18:11] DBA, Operations, ops-eqiad: eqiad: rack/setup/install (4) dbproxy systems. - https://phabricator.wikimedia.org/T225704 (ayounsi) a:ayounsi→Cmjohnson The cloud support vlan/network is legacy, so I'd rather not create a new one (in a new row). As we already have cloud-support1-a-eqiad and cloud...
[18:19:51] DBA, Operations, ops-eqiad: eqiad: rack/setup/install (4) dbproxy systems.
- https://phabricator.wikimedia.org/T225704 (Marostegui) Yep! Not a problem, I don't mind which hosts as long as we have two on that VLAN, whichever ones work best for you
[18:22:12] DBA, Operations, ops-eqiad: eqiad: rack/setup/install (4) dbproxy systems. - https://phabricator.wikimedia.org/T225704 (Marostegui) 1018 and 1019 are ok to go to cloud VLAN from my side (as they are in row C) We just need two hosts on that vlan
[18:39:28] DBA, Operations, ops-eqiad: eqiad: rack/setup/install (4) dbproxy systems. - https://phabricator.wikimedia.org/T225704 (Cmjohnson) @ayounsi I'd rather not move the servers...I racked them based on the instructions and they're already in racks and set up
[19:49:39] DBA, MediaWiki-Cache, Performance-Team (Radar), User-Marostegui: Replace parsercache keys to something more meaningful on db-XXXX.php - https://phabricator.wikimedia.org/T210725 (Marostegui) pc1008 tables optimization finished: ` root@pc1008:~# df -hT /srv Filesystem Type Size Used A...
[20:42:03] DBA, Operations, ops-codfw: db2097 (codfw s1&s6 source backups) mariadb@s6 *process* (10.1.39) crashed on 2019-06-08 - https://phabricator.wikimedia.org/T225378 (jcrespo) Check finished, no differences found.