[04:10:35] DBA, Data-Services, MW-1.36-notes (1.36.0-wmf.34; 2021-03-09), Patch-For-Review, and 3 others: Create Wikipedia Kari Seediq - https://phabricator.wikimedia.org/T276246 (Dcljr) @Urbanecm: please see my previous comment, immediately above this one.
[04:56:03] DBA, Data-Services, MW-1.36-notes (1.36.0-wmf.34; 2021-03-09), Patch-For-Review, and 3 others: Create Wikipedia Kari Seediq - https://phabricator.wikimedia.org/T276246 (Dcljr) Has "CX Config" (in the post-install checklist) really not been done yet? Is that important? I notice that wiki-creation...
[06:33:32] PROBLEM - MariaDB sustained replica lag on db1159 is CRITICAL: 500 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1159&var-port=9104
[06:46:48] RECOVERY - MariaDB sustained replica lag on db1159 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1159&var-port=9104
[11:44:39] PROBLEM - MariaDB sustained replica lag on db2132 is CRITICAL: 2 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2132&var-port=9104
[11:48:47] RECOVERY - MariaDB sustained replica lag on db2132 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2132&var-port=9104
[12:09:09] DBA: m2 codfw master crashed - https://phabricator.wikimedia.org/T272614 (Kormat) As suggested by @LSobanski, this looks to have been https://jira.mariadb.org/browse/MDEV-22563.
[12:09:13] sobanski: you were right :) ^
[12:11:47] PROBLEM - MariaDB sustained replica lag on pc2008 is CRITICAL: 2.2 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2008&var-port=9104
[12:13:57] RECOVERY - MariaDB sustained replica lag on pc2008 is OK: (C)2 ge (W)1 ge 0.2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2008&var-port=9104
[12:57:40] PROBLEM - MariaDB sustained replica lag on pc2008 is CRITICAL: 2 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2008&var-port=9104
[12:58:36] kormat, did you see the recently opened pc-related ticket? could explain some of the above?
[12:58:56] RECOVERY - MariaDB sustained replica lag on pc2008 is OK: (C)2 ge (W)1 ge 0.6 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2008&var-port=9104
[12:59:41] jynus: are you thinking of https://phabricator.wikimedia.org/T267404 ?
[13:00:01] there is a new one, let me find it
[13:00:23] ah yes, it was the one at the of that T278655
[13:00:23] T278655: Appservers latency spike / parser cache growth 2021-03-28 - https://phabricator.wikimedia.org/T278655
[13:00:54] *end
[13:04:20] i hadn't seen it, no. i don't think i understand enough to evaluate it, but thanks for the heads-up
[13:04:48] yeah, I didn't go deeply into that, but seemed interesting to bring it up
[13:04:54] 👍
[14:58:07] lots of errors on es1 since ~14h? https://grafana.wikimedia.org/d/000000278/mysql-aggregated?viewPanel=10&orgId=1&var-site=eqiad&var-group=core&var-shard=es1&var-role=All&from=1617094681377&to=1617116281377
[15:00:41] they seem to be not webrequest errors, but wmf_slave_wikiuser_sleep runs
[15:02:14] no deployments around that time
[15:02:37] I think there is no immediate action needed but I will ping service ops
[15:05:33] it is mainly es1, but it is affecting almost all sections
[15:16:43] oh, it may have started earlier, so it could be a deploy: https://logstash.wikimedia.org/goto/8c440ea2592b0406e4483b1f01345ca9
[15:22:56] DBA, Growth-Team, StructuredDiscussions, User-DannyS712: Clean up OneStepUserNameQuery/TwoStepUserNameQuery - https://phabricator.wikimedia.org/T278660 (EBernhardson) >>! In T278660#6954045, @LSobanski wrote: > @DannyS712 could you provide links to the queries? > > Side note, if we can track dow...
[15:32:23] after going down the metrics rabbithole, it looks like a jobqueue overload, not something caused by dbs
[16:47:50] DBA, Platform Engineering, SRE, Wikimedia-Incident: Appservers latency spike / parser cache growth 2021-03-28 - https://phabricator.wikimedia.org/T278655 (jijiki) p:Triage→Medium
[18:56:42] DBA, Platform Engineering, SRE, Wikimedia-Incident: Appservers latency spike / parser cache growth 2021-03-28 - https://phabricator.wikimedia.org/T278655 (matmarex) Thanks for the ping, it doesn't seem like there's anything for me or @Esanders to do here at the moment? Let us know if there's some...