[05:01:28] DBA, SRE, Wikimedia-Mailing-lists: Delete lists-next.wikimedia.org - https://phabricator.wikimedia.org/T281548 (Marostegui) Sure, that works. Just let me know when we can proceed. I assume we'd need to delete the following databases: ` testmailman3 testmailman3web ` And the following users: ` | test...
[05:05:13] DBA, SRE, Wikimedia-Mailing-lists: Delete lists-next.wikimedia.org - https://phabricator.wikimedia.org/T281548 (Ladsgroup) I confirm that :)
[05:19:18] DBA, Data-Persistence-Backup, Patch-For-Review: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts: ` ['db1121.eqiad.wmnet'] ` The log can be found in...
[05:23:36] Blocked-on-schema-change, DBA: Schema change for dropping default of ar_timestamp - https://phabricator.wikimedia.org/T282371 (Marostegui) a:Marostegui→None Yes, this can be done anytime. It doesn't require master swaps, it is an only change (metadata locking can arise though)
[05:23:54] Blocked-on-schema-change, DBA: Schema change for dropping default of page_touched - https://phabricator.wikimedia.org/T282372 (Marostegui) a:Marostegui→None Yes, this can be done anytime. It doesn't require master swaps, it is an only change (metadata locking can arise though)
[05:24:14] Blocked-on-schema-change, DBA: Schema change for dropping default of user_touched - https://phabricator.wikimedia.org/T282373 (Marostegui) a:Marostegui→None Yes, this can be done anytime. It doesn't require master swaps, it is an only change (metadata locking can arise though)
[05:27:02] DBA, DiscussionTools, Editing-team, Performance-Team, and 2 others: Reduce parser cache retention temporarily for DiscussionTools - https://phabricator.wikimedia.org/T280605 (Marostegui) >>! In T280605#7075501, @DLynch wrote: > If the purge script is getting unreliable, should we look into that b...
[05:40:15] DBA, Data-Persistence-Backup, Patch-For-Review: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['db1121.eqiad.wmnet'] ` and were **ALL** successful.
[05:42:45] DBA, Data-Persistence-Backup, Patch-For-Review: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (Marostegui) db1121 has been reimaged to Buster. I am checking the tables now, this means `commonswiki` will show lag on wikireplicas.
[05:45:03] DBA, decommission-hardware, Patch-For-Review: decommission db1082.eqiad.wmnet - https://phabricator.wikimedia.org/T281794 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by marostegui@cumin1001 for hosts: `db1082.eqiad.wmnet` - db1082.eqiad.wmnet (**PASS**) - Downtimed host on Icinga...
[05:45:05] DBA, decommission-hardware, Patch-For-Review: decommission db1082.eqiad.wmnet - https://phabricator.wikimedia.org/T281794 (Marostegui) a:Marostegui→wiki_willy
[05:46:01] DBA, SRE, Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (Marostegui)
[06:50:37] DBA: Re-import some tables on db2094:3318 - https://phabricator.wikimedia.org/T282514 (Marostegui)
[06:51:03] DBA: Re-import some tables on db2094:3318 (sanitarium host) - https://phabricator.wikimedia.org/T282514 (Marostegui) p:Triage→Medium
[07:02:06] DBA: Re-import some tables on db2094:3318 (sanitarium host) - https://phabricator.wikimedia.org/T282514 (Marostegui) mydumper in progress
[07:21:01] m5 backups grew 50%, acking those as I guess it is due to mailman import
[08:30:42] DBA, Data-Persistence-Backup, Patch-For-Review: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (Marostegui)
[08:31:10] DBA, Data-Persistence-Backup, Patch-For-Review: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (Marostegui)
[08:46:35] DBA, DiscussionTools, Editing-team, Performance-Team, and 2 others: Reduce parser cache retention temporarily for DiscussionTools - https://phabricator.wikimedia.org/T280605 (Marostegui) Progress: 18.24%
[08:54:06] DBA, decommission-hardware: decommission labsdb1009.eqiad.wmnet - https://phabricator.wikimedia.org/T282522 (Marostegui)
[08:54:33] DBA, decommission-hardware: decommission labsdb1010.eqiad.wmnet - https://phabricator.wikimedia.org/T282523 (Marostegui) Open→Stalled
[08:54:47] DBA, decommission-hardware: decommission labsdb1009.eqiad.wmnet - https://phabricator.wikimedia.org/T282522 (Marostegui) Open→Stalled
[08:55:29] DBA, decommission-hardware: decommission labsdb1011.eqiad.wmnet - https://phabricator.wikimedia.org/T282524 (Marostegui) Open→Stalled
[09:06:36] Blocked-on-schema-change, DBA: Alter objectcache.exptime - https://phabricator.wikimedia.org/T272512 (Marostegui) @Ladsgroup did you find something about this?
[09:08:22] Blocked-on-schema-change, DBA: Alter objectcache.exptime - https://phabricator.wikimedia.org/T272512 (Ladsgroup) I haven't had time to look at it with all of the craziness of the world but it has been on my (very long) todo list since February.
[09:20:43] Blocked-on-schema-change, DBA: Alter objectcache.exptime - https://phabricator.wikimedia.org/T272512 (Marostegui) No worries, just double checking tasks we want/can prepare before the switchover :)
[09:41:03] DBA, Data-Services, decommission-hardware, Patch-For-Review, cloud-services-team (Kanban): decommission labsdb1011.eqiad.wmnet - https://phabricator.wikimedia.org/T282524 (Marostegui)
[09:41:19] DBA, Data-Services, decommission-hardware, Patch-For-Review, cloud-services-team (Kanban): decommission labsdb1010.eqiad.wmnet - https://phabricator.wikimedia.org/T282523 (Marostegui)
[09:41:55] DBA, Data-Services, decommission-hardware, Patch-For-Review, cloud-services-team (Kanban): decommission labsdb1009.eqiad.wmnet - https://phabricator.wikimedia.org/T282522 (Marostegui)
[09:46:59] is dbtree.w.o expected to be unavailable?
[09:47:38] nope
[09:47:39] checking
[09:47:52] tendril works fine for me
[09:47:54] fwiw
[09:48:10] indeed
[09:48:11] I don't recall if dbtree gets the data from tendril or not
[09:48:14] it is just dbtree
[09:49:37] volans: dbtree is a bit of black magic to me but yes, it does get data from tendril
[09:49:53] ehehe, sorry for the trouble then
[09:51:10] moritzm: any FW changes on dbmonitor that you recall?
[09:59:29] Not sure if this is correct: https://phabricator.wikimedia.org/P15906
[10:01:58] from that I gather that dbtree is going through the CDN
[10:03:28] let me check with valentin to see if there's something there
[10:05:02] Looks like this is it: https://phabricator.wikimedia.org/T281673
[10:16:19] thanks for reporting this volans :*
[10:16:31] valentin is following up
[10:17:25] np, sorry for the additional work :)
[10:18:13] DBA, SRE, Traffic: dbtree.wm.o stopped working after enforcing Puppet CA issued certs for ATS backend origin servers - https://phabricator.wikimedia.org/T282531 (Marostegui)
[11:00:43] Blocked-on-schema-change, DBA: Schema change to turn user_last_timestamp.user_newtalk to binary(14) - https://phabricator.wikimedia.org/T266486 (Marostegui) s2 is done, pending the master.
[11:00:46] Blocked-on-schema-change, DBA: Schema change for dropping default of img_timestamp and making it binary(14) - https://phabricator.wikimedia.org/T273360 (Marostegui) s2 is done, pending the master.
[11:00:49] Blocked-on-schema-change, DBA: Schema change for watchlist.wl_notificationtimestamp going binary(14) from varbinary(14) - https://phabricator.wikimedia.org/T268392 (Marostegui) s2 is done, pending the master.
[11:01:03] Blocked-on-schema-change, DBA: Schema change to turn user_last_timestamp.user_newtalk to binary(14) - https://phabricator.wikimedia.org/T266486 (Marostegui)
[11:01:20] Blocked-on-schema-change, DBA: Schema change for dropping default of img_timestamp and making it binary(14) - https://phabricator.wikimedia.org/T273360 (Marostegui)
[11:01:26] Blocked-on-schema-change, DBA: Schema change for watchlist.wl_notificationtimestamp going binary(14) from varbinary(14) - https://phabricator.wikimedia.org/T268392 (Marostegui)
[11:12:35] DBA, SRE, Wikimedia-Mailing-lists, Patch-For-Review: Delete lists-next.wikimedia.org - https://phabricator.wikimedia.org/T281548 (Marostegui) Ready to merge this whenever you review it: https://gerrit.wikimedia.org/r/688975 Once merged, this requires removing the users manually too in the DB.
[11:19:16] Blocked-on-schema-change, DBA, AbuseFilter: Rename AbuseFilter indexes for consistency - https://phabricator.wikimedia.org/T281058 (Marostegui) @Daimona did you do a search on code to make sure there's no FORCE INDEX for any of those?
[11:30:34] DBA: Move db2108 from s2 to s7 - https://phabricator.wikimedia.org/T282535 (Marostegui)
[11:30:46] DBA: Move db2108 from s2 to s7 - https://phabricator.wikimedia.org/T282535 (Marostegui) p:Triage→Medium
[11:32:09] DBA: Productionize db114[1-9] - https://phabricator.wikimedia.org/T252512 (Marostegui)
[11:32:22] DBA, Patch-For-Review: Relocate "old" s4 hosts - https://phabricator.wikimedia.org/T253217 (Marostegui) Stalled→Resolved This is mostly finished, the last move should be: T282535
[11:45:41] Blocked-on-schema-change, DBA, AbuseFilter: Rename AbuseFilter indexes for consistency - https://phabricator.wikimedia.org/T281058 (Daimona) >>! In T281058#7077561, @Marostegui wrote: > @Daimona did you do a search on code to make sure there's no FORCE INDEX for any of those? Yes, there are no FORCE...
[11:46:32] Blocked-on-schema-change, DBA, AbuseFilter: Rename AbuseFilter indexes for consistency - https://phabricator.wikimedia.org/T281058 (Marostegui) Excellent, thank you!
[12:23:27] Blocked-on-schema-change, DBA: Schema change to turn user_last_timestamp.user_newtalk to binary(14) - https://phabricator.wikimedia.org/T266486 (Marostegui)
[12:23:33] Blocked-on-schema-change, DBA: Schema change for dropping default of img_timestamp and making it binary(14) - https://phabricator.wikimedia.org/T273360 (Marostegui)
[12:23:40] Blocked-on-schema-change, DBA: Schema change for watchlist.wl_notificationtimestamp going binary(14) from varbinary(14) - https://phabricator.wikimedia.org/T268392 (Marostegui)
[12:49:09] Blocked-on-schema-change, DBA: Schema change for watchlist.wl_notificationtimestamp going binary(14) from varbinary(14) - https://phabricator.wikimedia.org/T268392 (Marostegui) s7 eqiad [] labsdb1011 [] labsdb1010 [] labsdb1009 [x] dbstore1003 [] db1181 [] db1174 [] db1170 [] db1158 [] db1155 [] db1136...
[12:49:12] Blocked-on-schema-change, DBA: Schema change for dropping default of img_timestamp and making it binary(14) - https://phabricator.wikimedia.org/T273360 (Marostegui) s7 eqiad [] labsdb1011 [] labsdb1010 [] labsdb1009 [x] dbstore1003 [] db1181 [] db1174 [] db1170 [] db1158 [] db1155 [] db1136 [] db1127 []...
[12:49:15] Blocked-on-schema-change, DBA: Schema change to turn user_last_timestamp.user_newtalk to binary(14) - https://phabricator.wikimedia.org/T266486 (Marostegui) s7 eqiad [] labsdb1011 [] labsdb1010 [] labsdb1009 [x] dbstore1003 [] db1181 [] db1174 [] db1170 [] db1158 [] db1155 [] db1136 [] db1127 [] db1124...
[13:46:27] PROBLEM - MariaDB sustained replica lag on pc2009 is CRITICAL: 43.8 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2009&var-port=9104
[13:55:55] RECOVERY - MariaDB sustained replica lag on pc2009 is OK: (C)2 ge (W)1 ge 0.8 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2009&var-port=9104
[14:03:48] DBA, Data-Persistence-Backup, Patch-For-Review: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (Cmjohnson)
[14:03:51] DBA, SRE, Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (Cmjohnson)
[14:06:15] DBA: Failover m1 master: db1080 -> db1159 Wed 14th April at 10 AM UTC - https://phabricator.wikimedia.org/T276448 (Cmjohnson)
[14:45:46] DBA, SRE, Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (Cmjohnson)
[14:46:19] DBA, Data-Persistence-Backup, Patch-For-Review: Productionize db21[45-52] and db11[76-84] - https://phabricator.wikimedia.org/T275633 (Cmjohnson)
[14:46:24] DBA, SRE, Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (Cmjohnson)
[14:46:45] DBA, SRE, Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (Cmjohnson)
[15:14:51] PROBLEM - MariaDB sustained replica lag on pc2009 is CRITICAL: 2.6 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2009&var-port=9104
[15:32:41] Data-Persistence-Backup: Setup backup1003 and backup2003 as the storage location for es bacula backups - https://phabricator.wikimedia.org/T282249 (jcrespo) es-rw dumps ran correctly today. They should be backed up on bacula tonight. Tomorrow we will start the es-ro backup process.
[15:35:53] I feel like there should be some retirement party for labsdb10[09|10|11]. jayme, marostegui, and bstorm have certainly spent a lot of time getting to know those servers over the years.
[15:37:50] bd808, I have the gasoline :-)
[15:38:03] i have the lighter
[15:38:16] https://www.youtube.com/watch?v=N9wsjroVlu8
[15:38:28] *PC LOAD LETTER*
[15:41:08] hahahhahaa
[15:51:20] RECOVERY - MariaDB sustained replica lag on pc2009 is OK: (C)2 ge (W)1 ge 0.8 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2009&var-port=9104
[16:02:39] do you happen to know what is cloudmetrics1002, bd808?
[16:52:40] lol
[16:56:26] normally bacula is quire resilient, but when it gets silly, it get really silly until you restart all director and storage stuck daemons
[16:56:29] *quite
[17:37:00] jynus: cloudmetrics1002 is graphite, prometheus, grafana for WMCS backplane (and some tenants?)
[17:37:52] I think the tenant connection is only graphite, but I might be wrong about that
[18:28:25] DBA, MediaWiki-Cache, MediaWiki-Revision-backend, Performance-Team, and 2 others: SqlBlobStore no longer caching blobs (DBConnectionError Too many connections) - https://phabricator.wikimedia.org/T281480 (Krinkle)
[18:28:36] DBA, MediaWiki-Cache, MediaWiki-Revision-backend, Performance-Team, and 3 others: SqlBlobStore no longer caching blobs (DBConnectionError Too many connections) - https://phabricator.wikimedia.org/T281480 (Krinkle)
[18:28:47] DBA, MediaWiki-Cache, MediaWiki-Revision-backend, Performance-Team, and 3 others: SqlBlobStore no longer caching blobs (DBConnectionError Too many connections) - https://phabricator.wikimedia.org/T281480 (Krinkle)
[21:56:19] DBA, SRE, Wikimedia-Mailing-lists, Schema-change: Mailman3 schema change: change utf8 columns to utf8mb4 - https://phabricator.wikimedia.org/T282621 (Legoktm)
[22:14:32] DBA, SRE, Wikimedia-Mailing-lists, Patch-For-Review: Delete lists-next.wikimedia.org - https://phabricator.wikimedia.org/T281548 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by legoktm@cumin1001 for hosts: `lists1002.wikimedia.org` - lists1002.wikimedia.org (**PASS**) - Downti...
[22:22:14] DBA, SRE, Wikimedia-Mailing-lists, Patch-For-Review: Delete lists-next.wikimedia.org - https://phabricator.wikimedia.org/T281548 (Legoktm) >>! In T281548#7080171, @ops-monitoring-bot wrote: > - COMMON_STEPS (**FAIL**) > - **Failed to run the sre.dns.netbox cookbook**: Cumin execution failed (ex...
[22:42:20] DBA, SRE, Wikimedia-Mailing-lists, Patch-For-Review: Delete lists-next.wikimedia.org - https://phabricator.wikimedia.org/T281548 (Legoktm) I deleted the VM, DNS, private puppet, labs/private. @Marostegui all ready for you to do database deletion!