[11:03:00] marostegui: thanks for the email about wiki replicas. I'm adding a few more details in wikitech.
[11:03:21] question: isn't an-redacteddb already "productionized"? see https://wikitech.wikimedia.org/wiki/MariaDB/PII#Notify_Relevant_Teams
[11:04:02] if yes, I think you'll need to run check_private_data.py on that server as well?
[11:05:51] dhinus: we treat that host as a normal clouddb host, yes
[11:06:12] I guess it will be covered by the automation, but I will remove that note then
[11:06:13] So that doc needs to be updated indeed
[11:07:32] another thing: I think it might be easier/cleaner to create a sub-task for WMCS (assigned to me for now, to the team in the future) instead of reassigning the same task
[11:07:59] I can create a link that pre-fills the task
[11:08:05] what do you think?
[11:11:30] To add the views?
[11:11:49] That may be more overhead, no? Like right now it's just assigning a task
[11:14:38] it's definitely quicker to just reassign... but I think it can be confusing to have a task that contains many steps that must be performed by different people
[11:14:49] not a big issue, we can keep it like that if you prefer
[11:16:15] right now we already have a "create wiki" task (not sure who does it), with a subtask "prepare and check storage layer" (data persistence), so I imagined we could add a third for WMCS
[11:20:08] actually it looks like the data-persistence task is created automatically by Maintenance_bot, so maybe I can tweak the bot to add two tasks and assign them to the right teams/people
[11:30:19] Yeah, we create none of those tasks
[11:30:26] It's all done by a script
[11:30:40] Maybe we can ask them to create a views task
[11:41:23] Creating the wikis doesn't have an owner and is currently being done by volunteers or staff in their volunteer capacity
[11:44:47] Amir1: ack. wdyt of this patch? https://gitlab.wikimedia.org/ladsgroup/Phabricator-maintenance-bot/-/merge_requests/18/diffs
[11:46:32] dhinus: lgtm. I'll merge it soon
[11:46:33] PROBLEM - MariaDB sustained replica lag on s1 on db1206 is CRITICAL: 56.2 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104
[11:47:02] I need to understand what would be the difference with the "prepare" task, though
[11:47:46] sure, I'll try updating the steps in the docs to clarify
[11:47:52] (So the plan is to create two tickets now? Sorry, I haven't read the discussions in depth)
[11:48:23] yes, so right now the "prepare and check storage layer" task is handled by data-persistence, then gets re-assigned to WMCS for the last step
[11:49:00] my suggestion is to move that "last step" (basically running the cookbook sre.wikireplicas.add-wiki) to its own task
[11:49:33] Ah cool
[11:51:36] RECOVERY - MariaDB sustained replica lag on s1 on db1206 is OK: (C)10 ge (W)5 ge 0.2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104
[11:58:17] I've updated https://wikitech.wikimedia.org/wiki/Add_a_wiki#Wiki_Replicas
[12:07:37] PROBLEM - MariaDB sustained replica lag on s1 on db1206 is CRITICAL: 14.8 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104
[12:10:39] PROBLEM - MariaDB sustained replica lag on s1 on db1206 is CRITICAL: 17 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104
[12:23:39] RECOVERY - MariaDB sustained replica lag on s1 on db1206 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104
[12:35:37] PROBLEM - MariaDB sustained replica lag on s1 on db1206 is CRITICAL: 13.4 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104
[12:46:39] PROBLEM - MariaDB sustained replica lag on s1 on db1206 is CRITICAL: 11.6 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104
[12:54:39] RECOVERY - MariaDB sustained replica lag on s1 on db1206 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104
[13:14:08] dumps time
[13:23:39] PROBLEM - MariaDB sustained replica lag on s1 on db1206 is CRITICAL: 24.6 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104
[13:24:01] Dumps again :(
[13:24:57] btullis: ^ do you think you could find some time after the holidays to work on https://phabricator.wikimedia.org/T368098#10392790 ? This has been going on for months
[13:32:39] RECOVERY - MariaDB sustained replica lag on s1 on db1206 is OK: (C)10 ge (W)5 ge 2.4 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104
[13:54:16] marostegui: Yes, definitely. I will be working extensively on dumps after the holidays, as we have committed to migrating it to Kubernetes by the end of March. I will try my best to get it to switch sources to the analytics replicas at the same time.
[13:54:55] btullis: Great! Let me know if I can help
[13:56:52] Cool. Will do. I'll try to update that ticket with some details of the plan, too.
[13:58:48] Also dhinus - many thanks for taking care of T381079 for us.
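The "link that pre-fills the task" idea mentioned above can be sketched with Phabricator's task edit form, which accepts field values as query parameters. This is an illustrative sketch, not what Maintenance_bot actually does: the form id, title, and description below are made up for the example.

```python
from urllib.parse import urlencode

PHAB = "https://phabricator.wikimedia.org"

def prefilled_task_url(title: str, description: str, form_id: int = 1) -> str:
    """Build a 'create task' link with pre-filled title/description.

    form_id and the fields used here are hypothetical; a real bot
    would also pre-fill projects/owner for the target team.
    """
    params = urlencode({"title": title, "description": description})
    return f"{PHAB}/maniphest/task/edit/form/{form_id}/?{params}"

url = prefilled_task_url(
    "Add wiki replica views for idwikivoyage",
    "Run the sre.wikireplicas.add-wiki cookbook once the storage layer is ready.",
)
```

Sharing such a link lets whoever finishes the storage-layer step file the WMCS subtask with one click, instead of reassigning the original task.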
[13:58:49] T381079: Prepare and check storage layer for idwikivoyage - https://phabricator.wikimedia.org/T381079
[14:03:35] btullis: we should probably sync after the holidays, I had a chat with marostegui about how to improve the process for new wikis
[14:03:39] PROBLEM - MariaDB sustained replica lag on s1 on db1206 is CRITICAL: 18.2 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104
[14:04:21] tl;dr I will take care of running the "add-wiki" cookbook for clouddb*; at the moment the cookbook also runs on an-redacted, so you get that "for free" :)
[14:04:42] Fabulous! Thanks. Yes, let's catch up in the New Year.
[14:04:44] but maybe we could split it if you prefer to have more control over when views are updated on an-redacted
[14:04:54] I'll send an invite for a sync in Jan
[14:06:00] No, I think that approach is fine, but I should probably make sure that decision is written down somewhere. Maybe here? https://www.mediawiki.org/wiki/Data_Platform_Engineering/Data_Platform_SRE/Decisions_and_Design_Docs - I'll check with g.ehel.
[14:07:58] there's also https://wikitech.wikimedia.org/wiki/Portal:Data_Services/Admin/Wiki_Replicas#Who_admins_what but some lines in that table are not crystal clear
[14:08:39] RECOVERY - MariaDB sustained replica lag on s1 on db1206 is OK: (C)10 ge (W)5 ge 3.8 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104
[14:32:39] PROBLEM - MariaDB sustained replica lag on s1 on db1206 is CRITICAL: 28.4 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104
[14:43:39] RECOVERY - MariaDB sustained replica lag on s1 on db1206 is OK: (C)10 ge (W)5 ge 1 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104
[14:57:39] PROBLEM - MariaDB sustained replica lag on s1 on db1206 is CRITICAL: 27.8 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104
[15:18:41] RECOVERY - MariaDB sustained replica lag on s1 on db1206 is OK: (C)10 ge (W)5 ge 0.4 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104
[15:36:41] PROBLEM - MariaDB sustained replica lag on s1 on db1206 is CRITICAL: 18.2 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104
[15:50:41] RECOVERY - MariaDB sustained replica lag on s1 on db1206 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104
[16:12:41] PROBLEM - MariaDB sustained replica lag on s1 on db1206 is CRITICAL: 17.6 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104
[16:24:41] RECOVERY - MariaDB sustained replica lag on s1 on db1206 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104
[16:51:41] PROBLEM - MariaDB sustained replica lag on s1 on db1206 is CRITICAL: 12.4 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104
[16:54:41] RECOVERY - MariaDB sustained replica lag on s1 on db1206 is OK: (C)10 ge (W)5 ge 4.6 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104
[17:20:41] PROBLEM - MariaDB sustained replica lag on s1 on db1206 is CRITICAL: 37.4 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104
[17:33:41] RECOVERY - MariaDB sustained replica lag on s1 on db1206 is OK: (C)10 ge (W)5 ge 0.8 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104
[17:48:17] non-urgent curiosity question: when resolving the issue with huwiki.recentchanges on db2168 just now, the result of optimize included a message that said:
[17:48:17] `Table does not support optimize, doing recreate + analyze instead`
[17:49:31] is that an operation that's as safe as a "native" optimize? particularly to perform against a host that is still pooled
[17:51:59] swfrench-wmf: What the optimize does is an alter table, which recreates the table. I personally don't run optimize, but alter table blablabla engine=innodb,force;
[17:53:15] marostegui: ah, got it - yes, I see this now in the upper right hand corner of the index corruption spreadsheet
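For reference, the statement marostegui describes (a no-op ALTER that forces InnoDB to rebuild the table, which is what OPTIMIZE falls back to with "recreate + analyze") can be sketched as a small helper. This is illustrative only; the quoting is naive and does not escape identifiers.

```python
def rebuild_statement(db: str, table: str) -> str:
    """Return the ALTER that rebuilds an InnoDB table in place,
    as an alternative to OPTIMIZE TABLE.

    Illustrative sketch: identifiers are backtick-quoted without
    escaping, so only use with trusted names.
    """
    return f"ALTER TABLE `{db}`.`{table}` ENGINE=InnoDB, FORCE;"

print(rebuild_statement("huwiki", "recentchanges"))
# ALTER TABLE `huwiki`.`recentchanges` ENGINE=InnoDB, FORCE;
```

Either form rewrites the whole table, so on a pooled replica both carry the same cost: the table is copied and replication stalls behind the DDL for the duration.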
[18:17:42] PROBLEM - MariaDB sustained replica lag on s1 on db1206 is CRITICAL: 13.4 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104
[18:24:06] swfrench-wmf: in any case, if you don't upgrade the host you can probably skip the depool; just fix it and it will catch up with the master when done
[18:31:19] marostegui: ah, that's very good to know! I don't have a good mental model of how "quick" the fix is generally expected to be, so I wasn't sure whether one strictly /needs/ to depool first in order to "stop the pain"
[18:31:19] (though if replication is borked, I guess clients will start avoiding the host due to lag anyway IIRC)
[18:31:48] FIRING: MysqlReplicationLag: MySQL instance db1206:9104@s1 has too large replication lag (3m 37s). Its replication source is db1163.eqiad.wmnet. - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=db1206&var-port=9104 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[18:31:48] FIRING: [2x] MysqlReplicationLagPtHeartbeat: MySQL instance db1206:9104 has too large replication lag (3m 37s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=db1206&var-port=9104 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLagPtHeartbeat
[19:36:48] RESOLVED: MysqlReplicationLag: MySQL instance db1206:9104@s1 has too large replication lag (1m 14s). Its replication source is db1163.eqiad.wmnet. - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=db1206&var-port=9104 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[19:36:48] RESOLVED: [2x] MysqlReplicationLagPtHeartbeat: MySQL instance db1206:9104 has too large replication lag (1m 15s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=db1206&var-port=9104 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLagPtHeartbeat
[19:56:48] FIRING: MysqlReplicationLag: MySQL instance db1206:9104@s1 has too large replication lag (2m 49s). Its replication source is db1163.eqiad.wmnet. - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=db1206&var-port=9104 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[19:56:48] FIRING: [2x] MysqlReplicationLagPtHeartbeat: MySQL instance db1206:9104 has too large replication lag (2m 49s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=db1206&var-port=9104 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLagPtHeartbeat
[20:06:48] RESOLVED: MysqlReplicationLag: MySQL instance db1206:9104@s1 has too large replication lag (1m 9s). Its replication source is db1163.eqiad.wmnet. - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=db1206&var-port=9104 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[20:06:48] RESOLVED: [2x] MysqlReplicationLagPtHeartbeat: MySQL instance db1206:9104 has too large replication lag (1m 16s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=db1206&var-port=9104 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLagPtHeartbeat
[20:21:48] FIRING: MysqlReplicationLag: MySQL instance db1206:9104@s1 has too large replication lag (3m 6s). Its replication source is db1163.eqiad.wmnet. - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=db1206&var-port=9104 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[20:21:48] FIRING: [2x] MysqlReplicationLagPtHeartbeat: MySQL instance db1206:9104 has too large replication lag (3m 6s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=db1206&var-port=9104 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLagPtHeartbeat
[20:35:54] RECOVERY - MariaDB sustained replica lag on s1 on db1206 is OK: (C)10 ge (W)5 ge 4.8 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104
[20:36:48] RESOLVED: MysqlReplicationLag: MySQL instance db1206:9104@s1 has too large replication lag (1m 12s). Its replication source is db1163.eqiad.wmnet. - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=db1206&var-port=9104 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[20:36:48] RESOLVED: [2x] MysqlReplicationLagPtHeartbeat: MySQL instance db1206:9104 has too large replication lag (1m 12s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=db1206&var-port=9104 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLagPtHeartbeat
[20:46:54] PROBLEM - MariaDB sustained replica lag on s1 on db1206 is CRITICAL: 134.2 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104
[20:53:48] FIRING: MysqlReplicationLag: MySQL instance db1206:9104@s1 has too large replication lag (4m 23s). Its replication source is db1163.eqiad.wmnet. - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=db1206&var-port=9104 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[20:53:48] FIRING: [2x] MysqlReplicationLagPtHeartbeat: MySQL instance db1206:9104 has too large replication lag (4m 23s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=db1206&var-port=9104 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLagPtHeartbeat
[21:33:48] RESOLVED: MysqlReplicationLag: MySQL instance db1206:9104@s1 has too large replication lag (1m 1s). Its replication source is db1163.eqiad.wmnet. - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=db1206&var-port=9104 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[21:33:48] RESOLVED: [2x] MysqlReplicationLagPtHeartbeat: MySQL instance db1206:9104 has too large replication lag (1m 1s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=db1206&var-port=9104 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLagPtHeartbeat
[21:48:48] FIRING: [2x] MysqlReplicationLagPtHeartbeat: MySQL instance db1206:9104 has too large replication lag (2m 41s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=db1206&var-port=9104 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLagPtHeartbeat
[21:49:30] hello data-persistence friends - FYI, I've downtimed the `MariaDB Replica Lag: s1` service check on db1206 through Monday morning - tracked in T382625
[21:49:31] T382625: Repeated replication lag pages for db1206 - https://phabricator.wikimedia.org/T382625
[21:49:48] FIRING: MysqlReplicationLag: MySQL instance db1206:9104@s1 has too large replication lag (2m 59s). Its replication source is db1163.eqiad.wmnet. - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=db1206&var-port=9104 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[22:28:48] RESOLVED: [2x] MysqlReplicationLagPtHeartbeat: MySQL instance db1206:9104 has too large replication lag (1m 16s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=db1206&var-port=9104 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLagPtHeartbeat
[22:29:48] RESOLVED: MysqlReplicationLag: MySQL instance db1206:9104@s1 has too large replication lag (1m 7s). Its replication source is db1163.eqiad.wmnet. - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=db1206&var-port=9104 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[22:33:54] RECOVERY - MariaDB sustained replica lag on s1 on db1206 is OK: (C)10 ge (W)5 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104
[22:45:54] PROBLEM - MariaDB sustained replica lag on s1 on db1206 is CRITICAL: 25 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104
[23:00:56] RECOVERY - MariaDB sustained replica lag on s1 on db1206 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104
[23:19:56] PROBLEM - MariaDB sustained replica lag on s1 on db1206 is CRITICAL: 114.8 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104
[23:27:48] FIRING: MysqlReplicationLag: MySQL instance db1206:9104@s1 has too large replication lag (1m 54s). Its replication source is db1163.eqiad.wmnet. - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=db1206&var-port=9104 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[23:27:48] FIRING: [2x] MysqlReplicationLagPtHeartbeat: MySQL instance db1206:9104 has too large replication lag (1m 55s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=db1206&var-port=9104 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLagPtHeartbeat
[23:32:48] RESOLVED: MysqlReplicationLag: MySQL instance db1206:9104@s1 has too large replication lag (1m 2s). Its replication source is db1163.eqiad.wmnet. - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=db1206&var-port=9104 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[23:32:48] RESOLVED: [2x] MysqlReplicationLagPtHeartbeat: MySQL instance db1206:9104 has too large replication lag (1m 2s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=db1206&var-port=9104 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLagPtHeartbeat
[23:49:56] RECOVERY - MariaDB sustained replica lag on s1 on db1206 is OK: (C)10 ge (W)5 ge 2.8 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104
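The thresholds that drive all of the PROBLEM/RECOVERY lines above can be read off the check output itself: "CRITICAL: 56.2 ge 10" means the sustained lag met or exceeded the critical threshold, and "(C)10 ge (W)5 ge 0.2" in a RECOVERY line means crit >= warn >= current lag. A minimal sketch of that classification, assuming the 10s/5s values seen in the log (taken from the output, not from the check's actual configuration):

```python
def classify_lag(lag_seconds: float, warn: float = 5.0, crit: float = 10.0) -> str:
    """Classify sustained replica lag the way the check output reads:
    CRITICAL when lag >= crit, WARNING when lag >= warn, else OK.
    Threshold defaults mirror the '(C)10 ge (W)5' seen in the log."""
    if lag_seconds >= crit:
        return "CRITICAL"
    if lag_seconds >= warn:
        return "WARNING"
    return "OK"

# Values from the log: 56.2 paged as CRITICAL, 0.2 recovered as OK.
print(classify_lag(56.2), classify_lag(0.2))
```

This is why the host flaps all day: dumps push the lag just past 10s, it drains back under 5s, and the cycle repeats until the source of the lag (or the check's downtime, as done for T382625) intervenes.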