[07:05:55] DBA, decommission-hardware, Patch-For-Review: decommission es2017.codfw.wmnet - https://phabricator.wikimedia.org/T264386 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by marostegui@cumin1001 for hosts: `es2017.codfw.wmnet` - es2017.codfw.wmnet (**PASS**) - Downtimed host on Icinga...
[07:07:16] DBA, decommission-hardware: decommission es2017.codfw.wmnet - https://phabricator.wikimedia.org/T264386 (Marostegui)
[07:07:29] DBA, decommission-hardware: decommission es2017.codfw.wmnet - https://phabricator.wikimedia.org/T264386 (Marostegui) @Papaul this is ready for you!
[07:10:36] DBA, decommission-hardware: decommission es2015.codfw.wmnet - https://phabricator.wikimedia.org/T264700 (Marostegui)
[07:13:01] DBA, Data-Persistence, decommission-hardware: decommission es2015.codfw.wmnet - https://phabricator.wikimedia.org/T264700 (Marostegui)
[07:34:36] DBA, Operations, Sustainability (Incident Followup), Wikimedia-Incident: S5 replication issue, affecting watchlist and probably recentchanges - https://phabricator.wikimedia.org/T263842 (Marostegui) IR is at: https://wikitech.wikimedia.org/wiki/Incident_documentation/20200925-s5-replication-lag
[07:34:47] DBA, Operations, Sustainability (Incident Followup), Wikimedia-Incident: S5 replication issue, affecting watchlist and probably recentchanges - https://phabricator.wikimedia.org/T263842 (Marostegui) Open→Resolved I am going to close this as resolved as the incident is over and the table w...
[08:05:47] DBA: Evaluate the impact of changing innodb_change_buffering to inserts - https://phabricator.wikimedia.org/T263443 (Marostegui)
[08:54:53] DBA, Operations, Patch-For-Review, User-Kormat, User-jbond: Refactor mariadb puppet code - https://phabricator.wikimedia.org/T256972 (Kormat)
[09:14:50] DBA: Enable replication eqiad -> codfw and other checks - https://phabricator.wikimedia.org/T261914 (Marostegui)
[09:14:52] DBA: Enable DB replication eqiad -> codfw before the switchover - https://phabricator.wikimedia.org/T243374 (Marostegui)
[09:27:45] DBA, Operations: Refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (Marostegui)
[09:56:46] elukey: you ok with https://gerrit.wikimedia.org/r/632451 ?
[09:59:53] ouch thanks :(
[10:00:05] elukey: thank you!, green light to quickly restart the processes?
[10:00:11] +1
[10:00:34] elukey: grazie
[10:04:06] elukey: all done
[10:05:03] thanks a lot!
[10:05:08] :*
[13:02:15] DBA: Evaluate the impact of changing innodb_change_buffering to inserts - https://phabricator.wikimedia.org/T263443 (Marostegui) I haven't found anything weird on this so far, so I am going to deploy it to more hosts on s5 and s2 for now after deploying it to a bunch on s6 this morning.
[13:05:54] DBA: Evaluate the impact of changing innodb_change_buffering to inserts - https://phabricator.wikimedia.org/T263443 (Marostegui)
[14:03:35] DBA, Data-Persistence, Operations: db1076 crashed - BBU failure - https://phabricator.wikimedia.org/T264755 (Marostegui)
[14:07:09] DBA, Data-Persistence, Operations: db1076 crashed - BBU failure - https://phabricator.wikimedia.org/T264755 (Marostegui) The battery is gone: ` root@db1076:~# hpssacli controller all show detail | grep Battery No-Battery Write Cache: Disabled Battery/Capacitor Count: 0 `
[14:07:42] DBA, Data-Persistence, Operations: db1076 crashed - BBU failure - https://phabricator.wikimedia.org/T264755 (Marostegui) p:Triage→Medium
[14:11:57] DBA, Data-Persistence, Operations, Patch-For-Review: db1076 crashed - BBU failure - https://phabricator.wikimedia.org/T264755 (Marostegui) I have rebooted the host to make sure it boots up cleanly and to get it on the newest kernel. Let's leave the controller on `write through` policy and if we s...
[14:33:05] DBA, Data-Persistence, Operations, Patch-For-Review: db1076 crashed - BBU failure - https://phabricator.wikimedia.org/T264755 (Marostegui)
[14:34:00] DBA, Data-Persistence, Operations, Patch-For-Review: db1076 crashed - BBU failure - https://phabricator.wikimedia.org/T264755 (Marostegui) Comparison between the master and this host started for the following tables: ` actor actor_id archive ar_id change_tag ct_id comment comment_id logging log_i...
[16:52:30] DBA, Epic: All sorts of random drifts in wikis in s3 - https://phabricator.wikimedia.org/T260111 (Ladsgroup) Non-abstract ones (half of the tables) are clean. Running on abstract tables now.
[23:37:20] DBA, Operations, serviceops: Hourly read spikes against s8 resulting in occasional user-visible latency & error spikes - https://phabricator.wikimedia.org/T264821 (CDanis)