[04:53:17] 10DBA, 10Epic, 10Tracking-Neverending: Database tables to be dropped on Wikimedia wikis and other WMF databases (tracking) - https://phabricator.wikimedia.org/T54921 (10Marostegui)
[04:53:20] 10DBA, 10Beta-Cluster-Infrastructure, 10Reading-Infrastructure-Team-Backlog, 10WikimediaEditorTasks, and 2 others: Drop the `wikimedia_editor_tasks_entity_description_exists` table - https://phabricator.wikimedia.org/T226326 (10Marostegui)
[04:56:20] 10DBA, 10Beta-Cluster-Infrastructure, 10Reading-Infrastructure-Team-Backlog, 10WikimediaEditorTasks, and 2 others: Drop the `wikimedia_editor_tasks_entity_description_exists` table - https://phabricator.wikimedia.org/T226326 (10Marostegui) a:03Marostegui So for now I have renamed the table on db1092 and...
[05:00:01] 10DBA, 10Beta-Cluster-Infrastructure, 10Reading-Infrastructure-Team-Backlog, 10WikimediaEditorTasks, and 3 others: Drop the `wikimedia_editor_tasks_entity_description_exists` table - https://phabricator.wikimedia.org/T226326 (10Marostegui) Same has been done on testwikidatawiki on s3: ` root@db1123.eqiad.w...
[05:23:05] 10DBA, 10Goal, 10Patch-For-Review: Productionize db11[26-38] - https://phabricator.wikimedia.org/T222682 (10Marostegui)
[05:23:35] 10DBA, 10Goal, 10Patch-For-Review: Productionize db11[26-38] - https://phabricator.wikimedia.org/T222682 (10Marostegui)
[05:41:19] 10DBA, 10Goal, 10Patch-For-Review: Productionize db11[26-38] - https://phabricator.wikimedia.org/T222682 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts: ` ['db1135.eqiad.wmnet'] ` The log can be found in `/var/log/wmf-auto-reimage/201906240541_mar...
[05:59:51] 10DBA, 10Goal, 10Patch-For-Review: Productionize db11[26-38] - https://phabricator.wikimedia.org/T222682 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['db1135.eqiad.wmnet'] ` and were **ALL** successful.
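The 04:56:20 update mentions renaming the table rather than dropping it outright. The usual idea behind that pattern is to keep the data recoverable for a grace period in case something still reads from it, and only drop it once nothing breaks. A minimal sketch of the pattern, using SQLite as a local stand-in for MariaDB; the renamed-table prefix and the sample columns are illustrative assumptions, not the actual production schema or procedure:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical sample schema for the table named in the log.
conn.execute(
    "CREATE TABLE wikimedia_editor_tasks_entity_description_exists "
    "(wetede_entity_id TEXT, wetede_language TEXT)"
)

# Step 1: rename instead of dropping, keeping the data recoverable
# while watching for errors from anything that still uses the table.
conn.execute(
    "ALTER TABLE wikimedia_editor_tasks_entity_description_exists "
    "RENAME TO T226326_wikimedia_editor_tasks_entity_description_exists"
)

tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
print(tables)  # only the renamed table remains

# Step 2: after a quiet grace period, drop it for real.
conn.execute(
    "DROP TABLE T226326_wikimedia_editor_tasks_entity_description_exists")
```

On MariaDB the rename would be `RENAME TABLE old TO new;` instead of the SQLite `ALTER TABLE ... RENAME TO` form.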
[06:37:17] 10DBA, 10Goal: Productionize db11[26-38] - https://phabricator.wikimedia.org/T222682 (10Marostegui)
[07:59:25] 10DBA, 10Cognate, 10Growth-Team, 10Language-Team, and 2 others: Failover x1 master: db1069 to db1120 3rd July at 06:00 UTC - https://phabricator.wikimedia.org/T226358 (10Marostegui)
[07:59:35] 10DBA, 10Cognate, 10Growth-Team, 10Language-Team, and 2 others: Failover x1 master: db1069 to db1120 3rd July at 06:00 UTC - https://phabricator.wikimedia.org/T226358 (10Marostegui) p:05Triage→03Normal
[08:37:24] 10DBA, 10Analytics, 10Analytics-EventLogging, 10Operations, 10ops-eqiad: db1107 (eventlogging db master) possibly memory issues - https://phabricator.wikimedia.org/T222050 (10jcrespo) 05Resolved→03Open {P8645}
[08:41:02] 10DBA, 10Analytics, 10Analytics-EventLogging, 10Operations, 10ops-eqiad: db1107 (eventlogging db master) possibly memory issues - https://phabricator.wikimedia.org/T222050 (10Marostegui) @Cmjohnson as per the error @jcrespo pasted above, is that enough to get Dell to send a new DIMM, you think?
[08:53:07] what are your thoughts on https://phabricator.wikimedia.org/T225378 ? If there were no issues, I would say close it? But if you want to fully rebuild the host, that is also fine, I don't have strong opinions either way
[08:54:22] I would like to work on that, but didn't have the time yet
[08:54:27] I have to check backups first
[08:54:31] sure! :)
[09:01:23] did you restart the host when upgrading the x1 hosts?
[09:02:36] you mean restart mysql or reboot the host?
[09:02:40] I rebooted them
[09:03:03] thanks
[09:03:29] normally if there is a new kernel I always reboot them
[09:05:22] thanks
[09:06:02] backups worked ok except s3 today, which seems to be taking a lot of time
[09:06:11] but it has not failed yet
[09:08:19] yeah, the eqiad one will succeed
[09:09:12] the codfw one actually worked, but it didn't update the database
[09:57:39] jynus: can we delay the meeting? I'm helping with some unexpected stuff
[09:57:52] ok, fixed time or will you ping me?
[10:01:45] I will ping you in a sec
[13:12:37] 10DBA, 10Operations: Predictive failures on disk S.M.A.R.T. status - https://phabricator.wikimedia.org/T208323 (10Marostegui)
[14:06:25] 10DBA, 10Cognate, 10Growth-Team, 10Language-Team, and 2 others: Failover x1 master: db1069 to db1120 3rd July at 06:00 UTC - https://phabricator.wikimedia.org/T226358 (10Marostegui)
[14:32:59] 10DBA, 10Operations, 10ops-codfw: Degraded RAID on db2043 - https://phabricator.wikimedia.org/T225889 (10Papaul) a:05Papaul→03Marostegui Disk replaced
[14:35:46] 10DBA, 10Operations, 10ops-codfw: Degraded RAID on db2043 - https://phabricator.wikimedia.org/T225889 (10Marostegui) a:05Marostegui→03Papaul It failed already :( ` physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SAS, 600 GB, Failed) `
[15:26:51] marostegui I am working on db1133 ...should have it for you today (i hope)
[16:00:06] cmjohnson1: you made my day!
[16:00:27] don't get too excited...i am having a new issue with it now
[16:00:40] good news....raid issue is fixed
[16:00:43] hahaha
[16:01:02] of course, it sounded too good :p
[16:22:00] 10DBA, 10Operations, 10ops-codfw: Degraded RAID on db2043 - https://phabricator.wikimedia.org/T225889 (10RobH) Are the disks being installed (and then failing) new disks or old decom disks? If new spare disks are failing, we need to return them for replacement. Please let me know!
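The controller status line pasted at 14:35:46 is machine-checkable: the trailing field of an HP Smart Array `physicaldrive` line carries the drive state (`OK`, `Failed`, `Predictive Failure`, ...). A small sketch of picking out non-OK drives from such output; the regex and the helper name are my own illustration, not an actual WMF monitoring script:

```python
import re

# The exact line quoted in the log above.
SAMPLE = "physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SAS, 600 GB, Failed)"

# Capture the drive ID and the last comma-separated field (the status).
DRIVE_RE = re.compile(
    r"physicaldrive\s+(?P<id>\S+)\s+\(.*,\s*(?P<status>[^,)]+)\)"
)

def failed_drives(output: str) -> list:
    """Return IDs of drives whose status field is anything but OK."""
    bad = []
    for m in DRIVE_RE.finditer(output):
        if m.group("status").strip() != "OK":
            bad.append(m.group("id"))
    return bad

print(failed_drives(SAMPLE))  # ['1I:1:3']
```

The same check would flag the "predictive failure" state mentioned at 17:57:30, since that status is also not `OK`.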
[16:24:22] 10DBA, 10Operations, 10ops-codfw: Degraded RAID on db2043 - https://phabricator.wikimedia.org/T225889 (10Papaul) Those are decom disks and not new disks. We have no more new disks on site.
[16:31:38] 10DBA, 10Goal, 10Patch-For-Review: Implement database binary backups into the production infrastructure - https://phabricator.wikimedia.org/T206203 (10jcrespo) I am going to close this as resolved and move the minor pending things to T138562 with lower priority.
[16:44:38] 10DBA, 10Goal, 10Patch-For-Review: Implement database binary backups into the production infrastructure - https://phabricator.wikimedia.org/T206203 (10Marostegui) \o/
[17:21:45] marostegui i think you're okay now..i have installed it 2x and rebooted many times...not seeing any errors
[17:24:03] yaaaaay
[17:24:08] cmjohnson1: thanks! I will take it from there!
[17:24:22] I will reimage it tomorrow too, to be fully sure :)
[17:57:30] 10DBA, 10Operations, 10ops-codfw: Degraded RAID on db2043 - https://phabricator.wikimedia.org/T225889 (10Marostegui) 05Open→03Resolved The RAID finished correctly, although the disk came with predictive failure. I am going to close this task as resolved as the ops-monitoring will open a new one once it...
[17:58:09] 10DBA, 10Operations: Predictive failures on disk S.M.A.R.T. status - https://phabricator.wikimedia.org/T208323 (10Marostegui)
[23:22:54] 10DBA, 10MediaWiki-API, 10Core Platform Team Backlog (Watching / External), 10Core Platform Team Kanban (Waiting for Review), and 2 others: Certain ApiQueryRecentChanges::run api query is too slow, slowing down dewiki - https://phabricator.wikimedia.org/T149077 (10Krinkle) >>! In T149077#3355760, @gerritbo...
[23:23:08] 10DBA, 10MediaWiki-API, 10Core Platform Team Backlog (Watching / External), 10Core Platform Team Kanban (Waiting for Review), and 2 others: Certain ApiQueryRecentChanges::run api query is too slow, slowing down dewiki - https://phabricator.wikimedia.org/T149077 (10Krinkle) p:05Triage→03Normal