[00:11:33] PROBLEM - MariaDB sustained replica lag on db2132 is CRITICAL: 2.4 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2132&var-port=9104
[00:29:54] PROBLEM - MariaDB sustained replica lag on db2132 is CRITICAL: 2.6 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2132&var-port=9104
[00:44:04] PROBLEM - MariaDB sustained replica lag on db2132 is CRITICAL: 2.6 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2132&var-port=9104
[01:02:37] RECOVERY - MariaDB sustained replica lag on db2132 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2132&var-port=9104
[05:06:51] DBA, Data-Persistence-Backup, Patch-For-Review: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (Marostegui)
[05:12:25] DBA, Data-Persistence-Backup, Patch-For-Review: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (Marostegui)
[05:29:11] DBA: Upgrade s6 to Debian Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T280751 (Marostegui)
[05:29:21] DBA: Upgrade s6 to Debian Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T280751 (Marostegui) p:Triage→Medium
[05:30:03] DBA: Upgrade s6 to Debian Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T280751 (Marostegui)
[05:41:20] DBA, decommission-hardware: decommission db1086.eqiad.wmnet - https://phabricator.wikimedia.org/T278229 (Marostegui) a:Marostegui→wiki_willy
[05:41:37] DBA, decommission-hardware, ops-eqiad: decommission db1086.eqiad.wmnet - https://phabricator.wikimedia.org/T278229 (Marostegui) This is ready for #dc-ops
[05:42:26] DBA, SRE, Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (Marostegui)
[05:47:02] DBA, Data-Persistence-Backup, Patch-For-Review: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts: ` ['db2077.codfw.wmnet'] ` The log can be found in...
[05:47:58] DBA, Data-Persistence-Backup, Patch-For-Review: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (Marostegui)
[05:49:23] DBA, Data-Persistence-Backup, Patch-For-Review: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts: ` ['db2082.codfw.wmnet'] ` The log can be found in...
[06:07:02] DBA: Upgrade s6 to Debian Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T280751 (Marostegui)
[06:08:36] DBA, Data-Persistence-Backup, Patch-For-Review: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (Marostegui)
[06:13:19] DBA, Data-Persistence-Backup, Patch-For-Review: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['db2077.codfw.wmnet'] ` and were **ALL** successful.
[06:13:48] DBA, SRE, Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (Marostegui) Transfer started from db1074 to db1156
[06:15:42] DBA, Data-Persistence-Backup, Patch-For-Review: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (Marostegui) Tables being checked on db2077
[06:16:10] DBA, Data-Persistence-Backup, Patch-For-Review: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['db2082.codfw.wmnet'] ` and were **ALL** successful.
[06:18:59] DBA, Data-Persistence-Backup, Patch-For-Review: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (Marostegui) Tables being checked on db2082
[06:21:08] DBA, Data-Persistence-Backup, Patch-For-Review: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (Marostegui) >>! In T280492#7021446, @jcrespo wrote: > I finished setting up db2139 with an s3 instance on buster- as soon as I merge the above patch (https://...
[06:23:01] DBA, Data-Persistence-Backup, Patch-For-Review: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (Marostegui) @jcrespo - also started replication on db2098:3313 too, let's see what happens.
[06:24:13] Blocked-on-schema-change, DBA: Schema change for renaming new_name_timestamp to rc_new_name_timestamp in recentchanges - https://phabricator.wikimedia.org/T276292 (Marostegui)
[06:26:25] Blocked-on-schema-change, DBA: Schema change for renaming new_name_timestamp to rc_new_name_timestamp in recentchanges - https://phabricator.wikimedia.org/T276292 (Marostegui) s5 eqiad progress [] labsdb1011 [] labsdb1010 [] labsdb1009 [x] dbstore1003 [] db1161 [] db1154 [] db1150 [x] db1145 [] db1144 [...
[06:40:40] DBA, SRE, Wikimedia-Mailing-lists: Create production databases for mailman3 - https://phabricator.wikimedia.org/T278614 (Ladsgroup) Hi! Can it be done? We are planning to deploy early next week.
[06:43:07] DBA, SRE, Wikimedia-Mailing-lists: Create production databases for mailman3 - https://phabricator.wikimedia.org/T278614 (Marostegui) Hey, I thought this was meant to be done in 2-3 weeks :) (T278614#6992922) Also, I thought we wanted to delete the temporary databases before proceeding. Keep in mind...
[06:45:44] Blocked-on-schema-change, DBA: Schema change for renaming new_name_timestamp to rc_new_name_timestamp in recentchanges - https://phabricator.wikimedia.org/T276292 (Marostegui) s5 fully done
[06:46:03] Blocked-on-schema-change, DBA: Schema change for renaming new_name_timestamp to rc_new_name_timestamp in recentchanges - https://phabricator.wikimedia.org/T276292 (Marostegui)
[06:50:39] Blocked-on-schema-change, DBA: Schema change for renaming new_name_timestamp to rc_new_name_timestamp in recentchanges - https://phabricator.wikimedia.org/T276292 (Marostegui)
[06:51:41] DBA, SRE, Wikimedia-Mailing-lists: Create production databases for mailman3 - https://phabricator.wikimedia.org/T278614 (Ladsgroup) >>! In T278614#7022974, @Marostegui wrote: > Hey, > > I thought this was meant to be done in 2-3 weeks :) (T278614#6992922) Well, it was one week ago :D. But it's sti...
[06:53:35] DBA, SRE, Wikimedia-Mailing-lists: Create production databases for mailman3 - https://phabricator.wikimedia.org/T278614 (Marostegui) >>! In T278614#7022981, @Ladsgroup wrote: > > hmm, that's a tough one. So we are enabling mailman3 on lists1001.wikimedia.org (lists.wikimedia.org) but we are not upgr...
[06:53:43] DBA, SRE, Wikimedia-Mailing-lists: Create production databases for mailman3 - https://phabricator.wikimedia.org/T278614 (Marostegui) Stalled→Open
[06:58:06] DBA, SRE, Wikimedia-Mailing-lists: Create production databases for mailman3 - https://phabricator.wikimedia.org/T278614 (Ladsgroup) Sure. I will make sure to remind you to delete it. Yes. `mailman3` with user `mailman3` and `mailman3web` with user `mailman3web`. They should bound to a different host...
[07:02:54] DBA, SRE, Wikimedia-Mailing-lists: Create production databases for mailman3 - https://phabricator.wikimedia.org/T278614 (Ladsgroup) oh and this one needs backups but that can happen later.
[07:09:50] marostegui: <3 Thanks!
[07:15:11] DBA, SRE, Wikimedia-Mailing-lists, Patch-For-Review: Create production databases for mailman3 - https://phabricator.wikimedia.org/T278614 (Marostegui) Databases created on m5 (same IP where the test databases are) Ferm needs updating to be able to reach m5-master: ` root@lists1001:~# telnet db11...
[07:17:07] Amir1: done :)
[07:17:26] DBA, SRE, Wikimedia-Mailing-lists, Patch-For-Review: Create production databases for mailman3 - https://phabricator.wikimedia.org/T278614 (Marostegui) >>! In T278614#7023003, @Ladsgroup wrote: > oh and this one needs backups but that can happen later. @jcrespo can you handle this? Thank you!
[07:17:41] Amazing! Thanks. Sorry for bothering you.
[07:18:07] Amir1: ferm needs updating, and the password needs to be puppetized like it was done with the test databases :)
[07:18:15] other than that, you are good to go!
[07:18:48] \o/ We will get rid of that massive crap soon
[07:19:02] :)
[07:29:11] DBA, Phabricator: Upgrade mysql on db1132 (phabricator db master) - https://phabricator.wikimedia.org/T279625 (mmodell) @Marostegui Confirmed, those are the commands to set read-only and then restore read-write.
[07:31:18] DBA, Phabricator: Upgrade mysql on db1132 (phabricator db master) - https://phabricator.wikimedia.org/T279625 (Marostegui) Thank you! I will get this done early in the UTC morning next week. Thanks again!
[07:37:49] DBA, SRE, Wikimedia-Mailing-lists, Patch-For-Review: Create production databases for mailman3 - https://phabricator.wikimedia.org/T278614 (jcrespo) >>! In T278614#7023031, @Marostegui wrote: > @jcrespo can you handle this? Thank you! Which section is this in? I cannot find it on the task.
[07:38:25] DBA, Data-Persistence-Backup, SRE, Wikimedia-Mailing-lists, Patch-For-Review: Create production databases for mailman3 - https://phabricator.wikimedia.org/T278614 (jcrespo)
[07:39:58] DBA, Data-Persistence-Backup, SRE, Wikimedia-Mailing-lists, Patch-For-Review: Create production databases for mailman3 - https://phabricator.wikimedia.org/T278614 (Marostegui) This lives on m5: T278614#7023029
[07:40:16] DBA, Data-Persistence-Backup, SRE, Wikimedia-Mailing-lists, Patch-For-Review: Create production databases for mailman3 - https://phabricator.wikimedia.org/T278614 (Marostegui)
[08:01:33] new buster backups on codfw-s3 worked nicely. Do you want to keep db2098:3313 for a bit longer for experimentation or not really useful?
[08:03:31] let's keep it till monday
[08:03:39] cool
[08:29:54] DBA, Data-Persistence-Backup, Patch-For-Review: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (Marostegui) Tables being checked on db1156 as it was just productionized at T258361
[08:30:27] DBA, SRE, Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (Marostegui) Replication started on db1156 and checking tables.
[08:33:09] DBA, Data-Persistence-Backup, SRE, Wikimedia-Mailing-lists, Patch-For-Review: Create production databases for mailman3 - https://phabricator.wikimedia.org/T278614 (jcrespo) >>! In T278614#7023003, @Ladsgroup wrote: > oh and this one needs backups but that can happen later. @Ladsgroup please...
[09:11:54] FYI librenms upgrade is running an alter table on m1-master
[09:13:46] godog: thanks for the heads up
[09:14:59] np! I'm glad the 'progress' column in show processlist is a % and not number of rows
[09:17:00] godog: don't trust it really, it can go from 0 to 99 in 10 seconds and stay at 99 for 12h :)
[09:18:18] haha good to know marostegui ! yeah kinda what happened, copying to tmp table step took a few minutes to get to 50% and then the rest completed relatively quickly
[09:23:54] PROBLEM - MariaDB sustained replica lag on db2132 is CRITICAL: 196.6 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2132&var-port=9104
[09:25:13] godog: downtimed the m1 slaves as they were about to alert (irc alert only)
[09:26:21] marostegui: ack, thank you! upgrade is done on the librenms end fwiw
[09:28:36] RECOVERY - MariaDB sustained replica lag on db2132 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2132&var-port=9104
[09:52:12] Blocked-on-schema-change, DBA: Schema change for renaming new_name_timestamp to rc_new_name_timestamp in recentchanges - https://phabricator.wikimedia.org/T276292 (Marostegui)
[09:54:15] Blocked-on-schema-change, DBA: Schema change for renaming new_name_timestamp to rc_new_name_timestamp in recentchanges - https://phabricator.wikimedia.org/T276292 (Marostegui) s2 eqiad progress [] labsdb1011 [] labsdb1010 [] labsdb1009 [x] dbstore1004 [] db1182 [] db1171 [x] db1170 [] db1162 [] db1156 [...
[09:55:07] Blocked-on-schema-change, DBA: Schema change for renaming new_name_timestamp to rc_new_name_timestamp in recentchanges - https://phabricator.wikimedia.org/T276292 (Marostegui)
[10:10:32] DBA: Upgrade s6 to Debian Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T280751 (Kormat)
[10:17:34] DBA, Data-Persistence-Backup, SRE, Wikimedia-Mailing-lists, Patch-For-Review: Create production databases for mailman3 - https://phabricator.wikimedia.org/T278614 (Ladsgroup) >>! In T278614#7023147, @jcrespo wrote: >>>! In T278614#7023003, @Ladsgroup wrote: >> oh and this one needs backups bu...
[10:19:44] DBA, Data-Persistence-Backup, SRE, Wikimedia-Mailing-lists, Patch-For-Review: Create production databases for mailman3 - https://phabricator.wikimedia.org/T278614 (jcrespo) >>! In T278614#7023524, @Ladsgroup wrote: > `mailman3` should be small but `mailman3web` can get rather big. According t...
[10:41:55] DBA: Upgrade s6 to Debian Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T280751 (Marostegui) @jcrespo this is the next section we are going to fully switch on both DCs. Following the procedure written on the doc.
[11:04:14] DBA: Upgrade s6 to Debian Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T280751 (jcrespo) I have db1140:s6 and db2141:s6 with buster already ready to substitute db1139:s6 and db2097:s6, respectively, stretch instances whenever you are ready. Will prepare a patch.
[11:04:53] DBA: Upgrade s6 to Debian Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T280751 (Marostegui) Thanks - Stevie will be driving this task, so let's coordinate with her.
[12:01:32] DBA, SRE, Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts: ` ['db1158.eqiad.wmnet'] ` The log ca...
[12:25:09] DBA, SRE, Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['db1158.eqiad.wmnet'] ` and were **ALL** successful.
[14:25:15] DBA, Data-Persistence-Backup, Patch-For-Review: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (Marostegui)
[14:25:30] DBA, Data-Persistence-Backup, Patch-For-Review: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (Marostegui) db1165 is ready to take over db1085 in s6
[14:27:10] DBA, Patch-For-Review: Upgrade s6 to Debian Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T280751 (Marostegui) db1165 is ready to take over db1085 in s6 as candidate master. They both need to be stopped at the same time and move db1155 (sanitarium for s2) under it. db1125 (old sanitarium) d...
[14:27:25] ^that was meant to be "sanitarium master" I just edited the comment.
[14:41:48] DBA, Privacy Engineering, SRE, WMF-Legal, and 3 others: dbtree loads third party resources (from google.com/jsapi) - https://phabricator.wikimedia.org/T96499 (Ladsgroup)
[16:44:07] DBA, Security-Team, Security, Vuln-Infoleak: Verify if the dbtree password exposed in Gerrit is still in use - https://phabricator.wikimedia.org/T280812 (sbassett) Open→Resolved a:Marostegui
[16:44:10] DBA, Security-Team, Security, Vuln-Infoleak: Verify if the dbtree password exposed in Gerrit is still in use - https://phabricator.wikimedia.org/T280812 (sbassett)
[16:46:41] DBA, Security-Team, Security, Vuln-Infoleak: Verify if the dbtree password exposed in Gerrit is still in use - https://phabricator.wikimedia.org/T280812 (sbassett)
[17:49:27] DBA, Data-Persistence-Backup, SRE, Wikimedia-Mailing-lists, Patch-For-Review: Create production databases for mailman3 - https://phabricator.wikimedia.org/T278614 (jcrespo) Backups have been enabled and access seem correct. I saw the dbs are right now empty, but please ping me at some point i...
[20:17:31] DBA: New database request: image_matching - https://phabricator.wikimedia.org/T280042 (gmodena) @Marostegui @Eevans thanks for the input! I should have stats re dataset sizes of the 300+ wikis towards the end of this week. Crunching is still in progress; it takes a while to cycle through all languages. I'l...
[22:38:50] DBA, SRE, Wikimedia-Mailing-lists: Improve workflow for mailman database bootstrapping and updates - https://phabricator.wikimedia.org/T278499 (Legoktm) Open→Resolved a:Legoktm Sounds good, thanks for the input!
[22:54:27] DBA, Wikimedia-Mailing-lists: Upgrade lists-next to bullseye mailman versions - https://phabricator.wikimedia.org/T280887 (Legoktm)
[22:59:25] DBA, Epic, Patch-For-Review: Upgrade WMF database-and-backup-related hosts to buster - https://phabricator.wikimedia.org/T250666 (Jdforrester-WMF)
[23:04:39] DBA, Wikimedia-Mailing-lists: Upgrade lists-next to bullseye mailman versions - https://phabricator.wikimedia.org/T280887 (Legoktm)
[23:04:45] DBA, Data-Persistence-Backup, SRE, Wikimedia-Mailing-lists, Patch-For-Review: Create production databases for mailman3 - https://phabricator.wikimedia.org/T278614 (Legoktm) It slipped my mind that we need to test the new packages first, I filed {T280887} for that. If we can get that upgrade...
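The db2132 alerts in this log use a compact threshold notation. "2.4 ge 2" appears to read as "measured lag 2.4 is greater than or equal to the critical threshold 2", and the recovery line "(C)2 ge (W)1 ge 0" lists the critical threshold, the warning threshold and the current value. The Python sketch below illustrates that reading only. It assumes the lag is in seconds and that the "sustained" aggregation over time has already been applied upstream; `State`, `classify_lag` and the threshold constants are hypothetical names, not the real check's code.

```python
from enum import Enum


class State(Enum):
    OK = 0
    WARNING = 1
    CRITICAL = 2


# Hypothetical constants, read off the recovery message "(C)2 ge (W)1 ge 0".
# Units are assumed to be seconds of replica lag.
CRITICAL_THRESHOLD = 2.0
WARNING_THRESHOLD = 1.0


def classify_lag(lag: float,
                 critical: float = CRITICAL_THRESHOLD,
                 warning: float = WARNING_THRESHOLD) -> State:
    """Return the alert state for an already-aggregated ("sustained") lag value.

    Both comparisons are inclusive, matching the "ge" (greater than or
    equal) wording used in the alert text.
    """
    if lag >= critical:
        return State.CRITICAL
    if lag >= warning:
        return State.WARNING
    return State.OK


if __name__ == "__main__":
    # Lag values taken from the db2132 PROBLEM/RECOVERY lines in the log.
    for value in (2.4, 2.6, 196.6, 0.0):
        print(value, classify_lag(value).name)
```

Keeping both comparisons inclusive mirrors the "ge" wording. A production check would also need the time-window logic behind "sustained", which this sketch deliberately leaves out.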