[00:12:06] PROBLEM - MariaDB sustained replica lag on db2132 is CRITICAL: 2.6 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2132&var-port=9104
[00:28:22] PROBLEM - MariaDB sustained replica lag on db2132 is CRITICAL: 2.4 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2132&var-port=9104
[00:33:04] RECOVERY - MariaDB sustained replica lag on db2132 is OK: (C)2 ge (W)1 ge 0.8 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2132&var-port=9104
[00:47:36] PROBLEM - MariaDB sustained replica lag on db2132 is CRITICAL: 2.4 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2132&var-port=9104
[01:04:38] RECOVERY - MariaDB sustained replica lag on db2132 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2132&var-port=9104
[01:06:23] DBA, Release-Engineering-Team, Continuous-Integration-Config, Patch-For-Review, User-Kormat: Run wmfmariadbpy integration test suite on CI - https://phabricator.wikimedia.org/T261098 (thcipriani)
[04:05:46] DBA, Release-Engineering-Team, Epic: Implement a system to automatically deploy schema changes without needing DBA intervention - https://phabricator.wikimedia.org/T121857 (thcipriani)
[05:44:10] DBA, Epic, Patch-For-Review: Upgrade WMF database-and-backup-related hosts to buster - https://phabricator.wikimedia.org/T250666 (Marostegui)
[05:44:14] DBA, Patch-For-Review: Migrate codfw sanitarium hosts (db2094/db2095) to Buster and 10.4 - https://phabricator.wikimedia.org/T275112 (Marostegui) Open→Resolved db2095 is fixed
[05:47:58] DBA, Add-Link, Growth-Team (Current Sprint): Grant ALTER privileges to adminlinkrecommendation user on m2 - https://phabricator.wikimedia.org/T279053 (Marostegui) >>! In T279053#6971658, @kostajh wrote: > @Marostegui it’s a one off. You can see the alter statements we want to run here https://github...
[05:49:21] DBA: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (Marostegui)
[05:52:01] DBA: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts: ` ['db2105.codfw.wmnet'] ` The log can be found in `/var/log/wmf-auto-reimage/202104200551_marostegui_...
[05:52:08] DBA: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts: ` ['db2073.codfw.wmnet'] ` The log can be found in `/var/log/wmf-auto-reimage/202104200552_marostegui_...
[05:55:17] DBA: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (Marostegui)
[05:56:09] DBA: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (Marostegui) >>! In T280492#7018422, @ops-monitoring-bot wrote: > Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts: > ` > ['db2105.codfw.wmnet'] > ` > The log can be found i...
[05:57:07] DBA: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts: ` ['db2073.codfw.wmnet'] ` The log can be found in `/var/log/wmf-auto-reimage/202104200556_marostegui_...
[05:57:33] DBA: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (Marostegui) >>! In T280492#7018428, @Marostegui wrote: >>>! In T280492#7018422, @ops-monitoring-bot wrote: >> Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts: >> ` >> ['db...
[05:58:06] DBA: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts: ` ['db2074.codfw.wmnet'] ` The log can be found in `/var/log/wmf-auto-reimage/202104200557_marostegui_...
[06:19:06] DBA, Patch-For-Review: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts: ` ['db2127.codfw.wmnet'] ` The log can be found in `/var/log/wmf-auto-reimage/20...
[06:21:29] DBA, Patch-For-Review: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['db2074.codfw.wmnet'] ` and were **ALL** successful.
[06:23:29] DBA, Patch-For-Review: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['db2073.codfw.wmnet'] ` and were **ALL** successful.
[06:42:46] DBA, Patch-For-Review: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['db2105.codfw.wmnet'] ` and were **ALL** successful.
[06:44:56] DBA, Patch-For-Review: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (Marostegui) I reimaged by mistake db2105 which is s3 codfw master and as I made the initial mistake on the list of hosts to reimage, the patch I pushed yesterday included it to be buster...
[06:47:52] DBA, Patch-For-Review: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['db2127.codfw.wmnet'] ` and were **ALL** successful.
[06:50:10] DBA, Patch-For-Review: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts: ` ['db2074.codfw.wmnet'] ` The log can be found in `/var/log/wmf-auto-reimage/20...
[06:52:05] DBA, Patch-For-Review: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (Marostegui) Checking tables on: db2073, db2074, db2105, db2127
[07:08:09] PROBLEM - MariaDB sustained replica lag on db2133 is CRITICAL: 2 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2133&var-port=9104
[07:09:25] RECOVERY - MariaDB sustained replica lag on db2133 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2133&var-port=9104
[07:13:23] DBA, Patch-For-Review: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['db2074.codfw.wmnet'] ` and were **ALL** successful.
[07:22:26] DBA, WMDE-Analytics-Engineering, Wikidata, Wikidata-Campsite, and 5 others: [Story] Monitor size of some Wikidata database tables - https://phabricator.wikimedia.org/T68025 (Addshore) This is going to be looked at by a focus group of the #wikidata-campsite starting soon
[07:26:50] DBA: New database request: image_matching - https://phabricator.wikimedia.org/T280042 (Marostegui) >>! In T280042#7007224, @gmodena wrote: > Hey @Marostegui, > > Thanks for detailed reply and constructive feedback. > >> - Is MySQL the best place to store this materialized data? Have you considered hadoop p...
[07:29:36] DBA, Add-Link, Growth-Team (Current Sprint): Grant ALTER privileges to adminlinkrecommendation user on m2 - https://phabricator.wikimedia.org/T279053 (kostajh) >>! In T279053#7018419, @Marostegui wrote: >>>! In T279053#6971658, @kostajh wrote: >> @Marostegui it’s a one off. You can see the alter sta...
[07:32:30] DBA, Add-Link, Growth-Team (Current Sprint): Grant ALTER privileges to adminlinkrecommendation user on m2 - https://phabricator.wikimedia.org/T279053 (Marostegui) >>! In T279053#7018577, @kostajh wrote: >>>! In T279053#7018419, @Marostegui wrote: >>>>! In T279053#6971658, @kostajh wrote: >>> @Marost...
[07:52:57] DBA, Patch-For-Review: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts: ` ['db2128.codfw.wmnet'] ` The log can be found in `/var/log/wmf-auto-reimage/20...
[08:21:55] DBA, Patch-For-Review: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['db2128.codfw.wmnet'] ` and were **ALL** successful.
[08:22:40] DBA, Patch-For-Review: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (Marostegui) Checking tables on db2128
[09:23:22] DBA, Add-Link, Growth-Team (Current Sprint): Grant ALTER privileges to adminlinkrecommendation user on m2 - https://phabricator.wikimedia.org/T279053 (kostajh) >>! In T279053#7018584, @Marostegui wrote: >>>! In T279053#7018577, @kostajh wrote: >>>>! In T279053#7018419, @Marostegui wrote: >>>>>! In T2...
[09:32:00] DBA, Add-Link, Growth-Team (Current Sprint): Grant ALTER privileges to adminlinkrecommendation user on m2 - https://phabricator.wikimedia.org/T279053 (Marostegui) Can we remove the alter grant or do you think you'll need it at some point in the future?
[09:39:27] DBA, Add-Link, Growth-Team (Current Sprint): Grant ALTER privileges to adminlinkrecommendation user on m2 - https://phabricator.wikimedia.org/T279053 (kostajh) >>! In T279053#7019173, @Marostegui wrote: > Can we remove the alter grant or do you think you'll need it at some point in the future? I thi...
[09:45:42] DBA, Add-Link, Growth-Team (Current Sprint): Grant ALTER privileges to adminlinkrecommendation user on m2 - https://phabricator.wikimedia.org/T279053 (Tgr) If it's not too much trouble to re-grant it every time we need a schema change (which is hopefully zero times, but...), sure.
[09:49:39] DBA, Add-Link, Growth-Team (Current Sprint): Grant ALTER privileges to adminlinkrecommendation user on m2 - https://phabricator.wikimedia.org/T279053 (Marostegui) I will remove it for now. If we see we start needing it many times we can evaluate leaving it permanently. We are trying to narrow grants...
[09:52:49] not sure how relevant patches like https://gerrit.wikimedia.org/r/c/operations/puppet/+/681317 are to you, marostegui. I have added you on CC, but let me know whether to tune your spam level down or up O:-)
[09:54:21] jynus: that's useful, keep in mind that it might take me 3-4 days to actually see it, so CC is a good idea so you aren't blocked on me :)
[09:54:57] yeah, I normally CC someone when I think it is relevant to let them know a change happened, but I am not asking for a review
[09:55:24] for things like 681103 I definitely want your ok
[09:55:43] but in any case, I never expect a quick response!
[09:56:41] just feel free to tell me if I am generating too much noise for you at times
[09:59:03] haha thanks - will do :)
[10:22:19] DBA, Patch-For-Review: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (jcrespo) >>! In T280492#7018497, @Marostegui wrote: time, @jcrespo, would you have time to reimage db2098 to Buster today or tomorrow? Not sure I can reimage db2098, but I certainly can...
[10:23:48] DBA, Data-Persistence-Backup, Patch-For-Review: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (jcrespo) Adding a tag or I won't be able to find this task later.
[10:26:09] Data-Persistence-Backup: export_smart_data_dump.service failed on dbprov2001 because of a timeout in the raid facter - https://phabricator.wikimedia.org/T271821 (jcrespo) Open→Declined I am going to close this for inactivity, because I haven't seen it happen again since last reported a few months ago...
[10:28:29] DBA, Data-Persistence-Backup, Patch-For-Review: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (Marostegui) Thanks, that would help!
[10:49:44] DBA, Add-Link, Growth-Team (Current Sprint), Patch-For-Review: Grant ALTER privileges to adminlinkrecommendation user on m2 - https://phabricator.wikimedia.org/T279053 (Marostegui) Removed from puppet and from the DB.
[11:04:10] DBA, Data-Persistence-Backup, Patch-For-Review: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (jcrespo) So the current plan is to set up s3 on buster at db2139, move backups to dbprov2003, and then drop the db2098 s3 section. It will take me a bit to bac...
[11:04:56] DBA, Data-Persistence-Backup, Patch-For-Review: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (Marostegui) Thank you, that sounds very good to me. I am still checking tables on all the hosts I upgraded today in s3, so they won't be ready till tomorrow a...
[12:40:21] DBA, Data-Persistence-Backup, Patch-For-Review: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (Marostegui)
[12:43:14] DBA, Data-Persistence-Backup, Patch-For-Review: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (Marostegui)
[12:47:35] DBA, Data-Persistence-Backup, Patch-For-Review: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts: ` ['db2076.codfw.wmnet'] ` The log can be found in...
[13:12:55] DBA, Data-Persistence-Backup, Patch-For-Review: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['db2076.codfw.wmnet'] ` and were **ALL** successful.
[13:53:22] DBA, Data-Persistence-Backup, Patch-For-Review: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (Marostegui) tables being checked on db2076
[15:02:40] backups of s4's commonswiki.image are starting to get a bit silly
[15:03:36] commonswiki.image.sql.gz | 258545284318
[15:04:15] ~14 hours to back up serially, much more to recover
[15:04:46] PROBLEM - MariaDB sustained replica lag on db2076 is CRITICAL: 20 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2076&var-port=9104
[15:09:28] RECOVERY - MariaDB sustained replica lag on db2076 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2076&var-port=9104
[15:47:49] would a DBA suggest updating mysql-server or mysql-client first? I'm updating servers and for a short while they'd be running two different MariaDB versions
[16:04:29] I imagine it won't matter too much. I don't think the client/server API changes much, so it shouldn't break
[16:11:45] I was guessing not but wondered if anyone had advice
[16:11:57] It's my first time having to think about it
[16:33:37] PROBLEM - MariaDB sustained replica lag on db2076 is CRITICAL: 51.2 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2076&var-port=9104
[16:35:55] DBA, GrowthExperiments-MentorDashboard, Growth-Team (Current Sprint), MW-1.37-notes (1.37.0-wmf.1; 2021-04-13), User-Urbanecm_WMF (Engineering): Create database table to cache data about mentees - https://phabricator.wikimedia.org/T279587 (Rileych)
[16:36:01] RECOVERY - MariaDB sustained replica lag on db2076 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2076&var-port=9104
[16:44:03] DBA, Add-Link, Growth-Structured-Tasks, Growth-Team (Current Sprint), and 2 others: Add Link engineering: Provide a mechanism for storing data about which link recommendations were rejected by the user - https://phabricator.wikimedia.org/T266446 (Rileych)
[16:56:57] DBA, Data-Persistence-Backup, Patch-For-Review: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (jcrespo) I finished setting up db2139 with an s3 instance on buster- as soon as I merge the above patch (https://gerrit.wikimedia.org/r/681439) db2139:s3 wil...
[17:31:14] PROBLEM - MariaDB sustained replica lag on db2076 is CRITICAL: 215 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2076&var-port=9104
[17:42:20] RECOVERY - MariaDB sustained replica lag on db2076 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2076&var-port=9104
[18:10:49] PROBLEM - MariaDB sustained replica lag on db2076 is CRITICAL: 1136 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2076&var-port=9104
[18:42:27] RECOVERY - MariaDB sustained replica lag on db2076 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2076&var-port=9104
[18:52:37] DBA, Data-Persistence-Backup, Patch-For-Review: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (Marostegui)
[18:56:16] PROBLEM - MariaDB sustained replica lag on db2076 is CRITICAL: 5.8 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2076&var-port=9104
[18:57:24] RECOVERY - MariaDB sustained replica lag on db2076 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2076&var-port=9104
[19:21:38] PROBLEM - MariaDB sustained replica lag on db2076 is CRITICAL: 129.6 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2076&var-port=9104
[19:41:50] RECOVERY - MariaDB sustained replica lag on db2076 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2076&var-port=9104
[20:15:56] PROBLEM - MariaDB sustained replica lag on db2076 is CRITICAL: 24.4 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2076&var-port=9104
[20:28:27] RECOVERY - MariaDB sustained replica lag on db2076 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2076&var-port=9104
[21:16:39] DBA: New database request: image_matching - https://phabricator.wikimedia.org/T280042 (Eevans) >>! In T280042#7007224, @gmodena wrote: > Hey @Marostegui, > > Thanks for detailed reply and constructive feedback. > >> - Is MySQL the best place to store this materialized data? Have you considered hadoop perha...
[23:30:01] DBA, GrowthExperiments-MentorDashboard, Growth-Team (Current Sprint), MW-1.37-notes (1.37.0-wmf.1; 2021-04-13), User-Urbanecm_WMF (Engineering): Create database table to cache data about mentees - https://phabricator.wikimedia.org/T279587 (Urbanecm_WMF) Open→Resolved Table should be l...