[02:32:48] PROBLEM - MariaDB sustained replica lag on db1089 is CRITICAL: 2.8 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1089&var-port=9104
[02:39:44] RECOVERY - MariaDB sustained replica lag on db1089 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1089&var-port=9104
[06:26:51] DBA, SRE, ops-eqiad: Memory errors on clouddb1019 - https://phabricator.wikimedia.org/T272125 (Marostegui) Could we have an ETA on when this server can be worked on? We were aiming to open a new infrastructure this server is part of to users 1st of Feb
[06:37:44] DBA, Orchestrator: Cleanup heartbeat.heartbeat on all production instances - https://phabricator.wikimedia.org/T268336 (Marostegui) x2 cleaned.
[06:39:54] Blocked-on-schema-change, DBA: Schema change for timestamp field of uploadstash - https://phabricator.wikimedia.org/T270055 (Marostegui)
[06:47:55] DBA, Patch-For-Review: Productionize x2 databases - https://phabricator.wikimedia.org/T269324 (Marostegui)
[06:48:40] DBA, Patch-For-Review: Productionize x2 databases - https://phabricator.wikimedia.org/T269324 (Marostegui) Open→Resolved This is all done - hosts are ready to start getting data.
[07:07:29] DBA, Phabricator: Restart m3 (phabricator) database master db1132 - https://phabricator.wikimedia.org/T272596 (Marostegui) Window reserved on the deployments page
[07:12:40] DBA: Switchover s4 (commonswiki) from db1081 to db1138 - https://phabricator.wikimedia.org/T271427 (Marostegui)
[07:15:12] marostegui: I have a couple more schema changes sending on your way
[07:15:14] * Amir1 hides
[07:15:50] Amir1: That is good news though! It means we are getting closer to finish the abstract schemas!
[07:16:06] yeah, it's 90% done
[07:16:12] 89% to be exact
[07:31:21] DBA, Orchestrator, User-Kormat: Enable report_host for mariadb - https://phabricator.wikimedia.org/T266483 (Marostegui) s4 progress: [] labsdb1012.eqiad.wmnet:3306 [] labsdb1011.eqiad.wmnet:3306 [] labsdb1010.eqiad.wmnet:3306 [] labsdb1009.eqiad.wmnet:3306 [x] dbstore1004.eqiad.wmnet:3314 [x] db11...
[07:31:36] DBA, Orchestrator, User-Kormat: Enable report_host for mariadb - https://phabricator.wikimedia.org/T266483 (Marostegui)
[08:04:52] DBA, SRE: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (Marostegui)
[08:15:08] DBA, Patch-For-Review: Switchover s4 (commonswiki) from db1081 to db1138 - https://phabricator.wikimedia.org/T271427 (Marostegui)
[08:28:10] DBA, Patch-For-Review: Switchover s4 (commonswiki) from db1081 to db1138 - https://phabricator.wikimedia.org/T271427 (Marostegui)
[09:30:52] DBA, Patch-For-Review: Productionize x2 databases - https://phabricator.wikimedia.org/T269324 (Marostegui) This was also needed: https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/658218 (added this to the dbctl documentation so we don't forget about it when adding new external clusters): htt...
[14:01:31] can I get a review on https://gerrit.wikimedia.org/r/c/operations/puppet/+/658211?
[14:02:47] please give me a few minutes - I am guessing as long as it is done today there is no rush there
[14:02:52] no rush
[14:02:57] yep!
[14:02:59] thanks
[14:03:24] but thanks for the heads up because I think I hadn't been notified about that, or I was and didn't notice
[14:03:50] :)
[16:09:08] <_joe_> marostegui: we really need to get to think about a new parsercache design
[16:53:38] DBA, SRE, ops-eqiad: Memory errors on clouddb1019 - https://phabricator.wikimedia.org/T272125 (Cmjohnson) I am working on it, I am dependent on Dell.
I do need to update all the f/w and idrac today.
[18:05:20] DBA, SRE, ops-eqiad: Memory errors on clouddb1019 - https://phabricator.wikimedia.org/T272125 (Cmjohnson) I failed to re-connect the mgmt cable after getting it to power on and was not able to remotely access the server to get the logs for the Dell tech. I connected everything, updated the bios and...
[18:06:03] DBA, SRE, ops-eqiad: Memory errors on clouddb1019 - https://phabricator.wikimedia.org/T272125 (Marostegui) Thanks Chris, any chances that we can get the host to boot up at least so MySQL replication can catch up a bit. Thank you!
[18:06:47] DBA, SRE, ops-eqiad: Memory errors on clouddb1019 - https://phabricator.wikimedia.org/T272125 (Cmjohnson) @marostegui it should be accessible now
[18:07:33] DBA, SRE, ops-eqiad: Memory errors on clouddb1019 - https://phabricator.wikimedia.org/T272125 (Cmjohnson) There is just less memory at the moment
[18:16:56] DBA, SRE, ops-eqiad: Memory errors on clouddb1019 - https://phabricator.wikimedia.org/T272125 (Cmjohnson) Dell ticket number SR1049824647
[18:31:36] DBA, SRE, ops-eqiad: Memory errors on clouddb1019 - https://phabricator.wikimedia.org/T272125 (Marostegui) Thanks Chris - I can now access the server and will start mysql so it can catch up on replication! Let's coordinate to install the new memory once it arrives. Thanks again
[19:50:26] DBA, Platform Engineering Roadmap Decision Making, SRE, Performance-Team (Radar), User-Kormat: Remove groups from db configs - https://phabricator.wikimedia.org/T263127 (Krinkle)