[06:26:25] federico3: is the progress on https://phabricator.wikimedia.org/T399728 real? Because the work on https://wikitech.wikimedia.org/wiki/Map_of_database_maintenance says something different, and I need to know if I can run my schema change on s1 or s8
[06:27:05] By the way, that change is also safe to run on masters without any switchovers (so once all the replicas are done you can use --dc-masters $DATACENTER)
[06:29:23] marostegui: I'm updating the progress on the task async after the run. The data on https://wikitech.wikimedia.org/wiki/Map_of_database_maintenance is generated by the auto schema script
[06:30:51] BTW s3 on codfw is done, can I start s4?
[06:41:20] also I'm aware of --dc-masters, but I thought of starting with the replicas as they are the biggest/slowest changes to roll out and less risky
[08:14:17] federico3: Yes, that is fine. Also, --dc-masters will ONLY do the dc masters, so always start with the replicas
[08:14:37] What I normally do is the replicas of a section and then --dc-masters, so the full section is finished. But that's just how I approach it
[08:15:44] ok, I can add it to the script right now
[08:15:56] To which script?
[08:16:32] the helper here https://gitlab.wikimedia.org/repos/sre/schema-changes/-/merge_requests/42
[08:16:51] But this is going to be a full wrapper of the schema change
[08:16:51] also, do you want the task to reflect the current/ongoing operation?
[08:17:48] You mean add it to def write_summary, right?
[08:17:56] Not change the logic of how the schema change performs things, right?
[08:18:01] Like a purely reporting thing, am I right?
[08:19:56] I mean 1) add the --dc-masters step here https://gitlab.wikimedia.org/repos/sre/schema-changes/-/merge_requests/42/diffs#7329d389feef6faed22f45ef93afd8d94da66ec0_0_45 after the schema change on the replicas is completed
[08:19:56] 2) tweak write_summary https://gitlab.wikimedia.org/repos/sre/schema-changes/-/merge_requests/42/diffs#7329d389feef6faed22f45ef93afd8d94da66ec0_0_118 to also flag "(ongoing)" so it is reported to phabricator
[08:20:55] (auto_schema and the schema-change scripts remain unchanged)
[08:22:15] I feel we are almost creating a total wrapper of the auto schema
[08:22:38] yes, that's the idea
[08:22:42] Who's idea?
[08:22:45] whose
[08:22:50] mine :D
[08:23:08] Right, but that needs some discussion because we are changing the logic of how we run schema changes
[08:23:12] so we can run it one section at a time and get the reports out
[08:24:08] Sure, but there are some things we need to double check (eg: dc-masters could be critical enough that we need to discuss whether we want it to be fully automated, even with an optional --dc-masters argument)
[08:26:21] maybe I should describe better what it does: I was instructed to apply schema changes one section at a time and one dc at a time, asking for confirmation at each section-dc, and to update the task as I go: the script implements the same manual work that I've been doing. It still asks the user for manual confirmation before doing each step
[08:29:41] to clarify: the script is not making decisions by itself on when to run - if you want we can schedule a meeting and look at it together: I think even a 20-minute meeting looking at the same codebase together is often more efficient than discussing on the PR and IRC
[08:41:15] sure, we can talk about it next week, no problem
[09:13:58] marostegui: can I start the schema change in s4?
[09:14:04] yep
[09:14:50] ok thanks
[22:21:48] FIRING: [8x] MysqlReplicationLagPtHeartbeat: MySQL instance db2186:9104 has too large replication lag (11m 24s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLagPtHeartbeat
[22:25:52] PROBLEM - MariaDB sustained replica lag on x1 on db2215 is CRITICAL: 628 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2215&var-port=9104
[22:26:48] RESOLVED: [8x] MysqlReplicationLagPtHeartbeat: MySQL instance db2186:9104 has too large replication lag (15m 24s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLagPtHeartbeat
[22:26:54] PROBLEM - MariaDB sustained replica lag on x1 on db2191 is CRITICAL: 476.5 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2191&var-port=9104
[22:27:26] PROBLEM - MariaDB sustained replica lag on x1 on db2196 is CRITICAL: 51 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2196&var-port=9104
[22:27:54] RECOVERY - MariaDB sustained replica lag on x1 on db2191 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2191&var-port=9104
[22:29:52] RECOVERY - MariaDB sustained replica lag on x1 on db2215 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2215&var-port=9104
[22:30:28] RECOVERY - MariaDB sustained replica lag on x1 on db2196 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2196&var-port=9104
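
A minimal sketch of the per-section flow discussed in the 08:14-08:29 exchange above: replicas first, then --dc-masters, with an operator confirmation before each step and write_summary flagging "(ongoing)" on the task. This is not the actual auto_schema / schema-changes code; run_on_replicas, run_on_dc_masters and the write_summary stub below are illustrative placeholders, and only the --dc-masters flag, the write_summary name and the "(ongoing)" marker come from the conversation.

```python
#!/usr/bin/env python3
"""Hypothetical sketch of the wrapper loop described in the log above."""


def run_on_replicas(section: str, dc: str) -> None:
    # Placeholder: the real helper would apply the schema change to the
    # replicas of `section` in `dc`.
    print(f"applying schema change to {section} replicas in {dc}")


def run_on_dc_masters(section: str, dc: str) -> None:
    # Placeholder: the real helper would re-run the change with --dc-masters,
    # which ONLY touches the DC masters of the section.
    print(f"applying schema change with --dc-masters {dc} on {section}")


def write_summary(section: str, dc: str, status: str) -> None:
    # Placeholder: the real write_summary reports progress to the Phabricator task.
    print(f"{section}/{dc}: {status}")


def confirm(prompt: str) -> bool:
    """Ask the operator before every step; the script never acts on its own."""
    return input(f"{prompt} [y/N] ").strip().lower() == "y"


def run_section(section: str, dc: str) -> None:
    # 1) Replicas first: the slowest part of the rollout and the least risky.
    if confirm(f"Run schema change on {section} replicas in {dc}?"):
        run_on_replicas(section, dc)
        write_summary(section, dc, status="(ongoing)")

    # 2) Then the DC masters, so the section is fully finished
    #    (this particular change is safe on masters without a switchover).
    if confirm(f"Run --dc-masters {dc} for {section}?"):
        run_on_dc_masters(section, dc)
        write_summary(section, dc, status="done")


if __name__ == "__main__":
    for section in ("s1", "s4", "s8"):   # one section at a time
        for dc in ("codfw", "eqiad"):    # one datacenter at a time
            run_section(section, dc)
```

Keeping a confirmation prompt in front of both steps preserves the point made at 08:26:21 and 08:29:41: the wrapper sequences the work and reports progress, but it does not decide by itself when to run.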