[04:53:46] DBA, Patch-For-Review: Upgrade s6 to Debian Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T280751 (Marostegui) >>! In T280751#7052806, @Kormat wrote: > Steps to swap db1085 -> db1165: > [] Silence db1085, db1165 & db1155. > [] Depool instances: > ` > sudo dbctl instance db1085 depool > sudo...
[05:16:56] DBA, DiscussionTools, OWC2020, Editing-team (FY2020-21 Kanban Board), Patch-For-Review: DBA review: conversation subscriptions - https://phabricator.wikimedia.org/T263817 (Marostegui) I have restarted sanitarium hosts to filter the tables on the wiki replicas. The tables can now be created in...
[06:07:06] Ping https://phabricator.wikimedia.org/T163532#7018585 :D
[06:21:08] Amir1: what am I supposed to do next?
[06:21:21] ah
[06:21:24] remove the index?
[06:21:40] yeah :D
[06:21:42] I have a pile of things waiting on me, I don't know if I will have time for this this week
[06:21:56] all good. I'm here to ping you all year
[06:21:57] removing the index is easy, monitoring it isn't :-)
[06:21:58] and beyond
[06:22:20] snoozed until next week, I'll ping you next week. Would that work?
[06:22:29] that works thanks
[06:22:39] I can help with monitoring too
[07:04:25] Blocked-on-schema-change, DBA: Schema change to turn user_last_timestamp.user_newtalk to binary(14) - https://phabricator.wikimedia.org/T266486 (Marostegui)
[07:04:34] Blocked-on-schema-change, DBA: Schema change for dropping default of img_timestamp and making it binary(14) - https://phabricator.wikimedia.org/T273360 (Marostegui)
[07:04:41] Blocked-on-schema-change, DBA: Schema change for watchlist.wl_notificationtimestamp going binary(14) from varbinary(14) - https://phabricator.wikimedia.org/T268392 (Marostegui)
[07:15:53] DBA, Data-Persistence-Backup, Patch-For-Review: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (Marostegui)
[07:16:10] DBA, Data-Persistence-Backup, Patch-For-Review: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (Marostegui) s5 sanitarium master switched: db1154 now replicates from db1161 (10.4)
[07:17:41] DBA, decommission-hardware: decommission db1082.eqiad.wmnet - https://phabricator.wikimedia.org/T281794 (Marostegui)
[07:18:02] DBA, decommission-hardware: decommission db1082.eqiad.wmnet - https://phabricator.wikimedia.org/T281794 (Marostegui) Open→Stalled Waiting a few days to see if its replacement (db1161) works fine.
[07:18:25] DBA, decommission-hardware: decommission db1082.eqiad.wmnet - https://phabricator.wikimedia.org/T281794 (Marostegui)
[07:18:28] DBA, Data-Persistence-Backup, Patch-For-Review: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (Marostegui)
[07:18:42] DBA, decommission-hardware: decommission db1082.eqiad.wmnet - https://phabricator.wikimedia.org/T281794 (Marostegui)
[07:18:44] DBA, SRE, Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (Marostegui)
[07:18:58] DBA, SRE, Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (Marostegui)
[07:48:20] did you see pt-heartbeat failing on tendril/orchestration dbs since 4 days ago?
[07:48:37] nop
[07:48:56] I have been off for 4 days :)
[07:49:00] he
[07:49:00] I will take a look later
[07:52:40] fixed
[08:03:31] DBA, Data-Persistence-Backup, Patch-For-Review: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (Marostegui) Checking tables on db1106
[09:00:44] DBA, Patch-For-Review: Productionize db21[45-52] and db11[76-84] - https://phabricator.wikimedia.org/T275633 (Marostegui) db1178 is now replicating
[09:35:56] DBA, AbuseFilter: Check whether `FORCE INDEX page_timestamp` is still needed in LazyVariableComputer.php - https://phabricator.wikimedia.org/T281579 (LSobanski) p:Triage→Medium
[09:37:54] DBA, SRE-tools, Sustainability (Incident Followup): Create or modify an existing tool that quickly shows the db replication status in case of master failure - https://phabricator.wikimedia.org/T281249 (LSobanski) p:Triage→Medium
[09:42:08] DBA, Patch-For-Review: Productionize db21[45-52] and db11[76-84] - https://phabricator.wikimedia.org/T275633 (Marostegui) Checking tables on db1178
[09:55:23] DBA: New database request: image_matching - https://phabricator.wikimedia.org/T280042 (hnowlan) >>! In T280042#7045385, @gmodena wrote: > > Life would be easier if we could reach RESTBase Cassandra from the Hadoop network. For the right usecase I imagine access could be authorised - we currently have [[ ht...
[11:48:42] DBA, decommission-hardware: decommission db1083.eqiad.wmnet - https://phabricator.wikimedia.org/T281445 (Marostegui) db1118 has been pooled today. Let's give it a week
[12:11:47] DBA, Orchestrator: Cleanup heartbeat.heartbeat on s2 - https://phabricator.wikimedia.org/T281826 (LSobanski)
[12:12:42] DBA, Orchestrator: Cleanup heartbeat.heartbeat on s3 - https://phabricator.wikimedia.org/T281827 (LSobanski)
[12:15:01] DBA, Orchestrator: Cleanup heartbeat.heartbeat on s5 - https://phabricator.wikimedia.org/T281828 (LSobanski)
[12:15:37] DBA, Orchestrator: Cleanup heartbeat.heartbeat on s6 - https://phabricator.wikimedia.org/T281829 (LSobanski)
[12:16:09] DBA, Orchestrator: Cleanup heartbeat.heartbeat on s8 - https://phabricator.wikimedia.org/T281830 (LSobanski)
[12:16:47] DBA, Orchestrator: Cleanup heartbeat.heartbeat on s8 - https://phabricator.wikimedia.org/T281830 (LSobanski) p:Triage→Medium
[12:17:09] DBA, Orchestrator: Cleanup heartbeat.heartbeat on s6 - https://phabricator.wikimedia.org/T281829 (LSobanski) p:Triage→Medium
[12:17:13] DBA, Orchestrator: Cleanup heartbeat.heartbeat on s5 - https://phabricator.wikimedia.org/T281828 (LSobanski) p:Triage→Medium
[12:17:15] DBA, Orchestrator: Cleanup heartbeat.heartbeat on s3 - https://phabricator.wikimedia.org/T281827 (LSobanski) p:Triage→High
[12:17:19] DBA, Orchestrator: Cleanup heartbeat.heartbeat on s3 - https://phabricator.wikimedia.org/T281827 (LSobanski) p:High→Medium
[12:17:22] DBA, Orchestrator: Cleanup heartbeat.heartbeat on s2 - https://phabricator.wikimedia.org/T281826 (LSobanski) p:Triage→Medium
[12:26:01] DBA: QPS rate of change alarming - https://phabricator.wikimedia.org/T281833 (LSobanski)
[12:26:23] DBA: QPS rate of change alarming - https://phabricator.wikimedia.org/T281833 (LSobanski) p:Triage→Medium
[12:52:04] DBA, Patch-For-Review: Upgrade s6 to Debian Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T280751 (Kormat)
[12:52:28] DBA, Data-Persistence-Backup, Patch-For-Review: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (Kormat)
[12:53:47] DBA, Cognate, ContentTranslation, Growth-Team, and 9 others: Restart x1 database master (db1103) - https://phabricator.wikimedia.org/T281212 (Marostegui) All slaves have been upgraded to 10.4.18, so the master is ready.
[13:45:50] DBA: New database request: image_matching - https://phabricator.wikimedia.org/T280042 (gmodena) >>! In T280042#7056379, @hnowlan wrote: >>>! In T280042#7045385, @gmodena wrote: >> >> Life would be easier if we could reach RESTBase Cassandra from the Hadoop network. > > For the right usecase I imagine acces...
[14:41:37] DBA, Analytics-Clusters, Analytics-Kanban, Data-Services, and 2 others: Convert labsdb1012 from multi-source to multi-instance - https://phabricator.wikimedia.org/T269211 (Bstorm)
[14:46:45] hi DBAs, the pki database currently stores all certificates ever issued, including expired certificates; however, expired certs have very little use to us, so I'm going to create a cleanup job which does the following
[14:46:58] delete from certificates where expiry < now();
[14:47:27] I planned to do this in a small python script run from a systemd::timer::job; however, I wondered if there is a more DBA-native way to do this?
[14:47:53] jbond42: where does pki live?
[14:48:12] found it
[14:48:14] m1-master
[14:48:45] jbond42: it is just 4k rows
[14:48:58] jbond42: I can issue the delete now if you want, so you don't wast time doing all that
[14:49:02] *waste
[14:49:03] that's probably about right
[14:49:34] jbond42: so up to you :)
[14:49:41] wait, let me check backups
[14:50:07] marostegui: the issue is the database will grow by ($number_of_hosts * number_of_certs)/expiry length
[14:50:18] jbond42: ah right, so it is not a one time thing
[14:50:19] e.g. now it's ~1600 * 1/4 weeks
[14:50:23] no
[14:50:39] jbond42: then yeah, a cronjob sounds good to me, if it always points to m1-master.eqiad.wmnet
[14:51:06] it should always point there, and actually that's fine, I already have some maint scripts so not a big issue, thanks
[14:51:16] cool :)
[14:51:17] thanks
[14:51:23] that's currently 2.6M compressed on backups
[14:51:30] it's not a lot
[14:51:45] jynus: fyi once the certs are expired they are of absolutely no use
[14:53:08] it's unlikely that this db will grow massive in size, but it still seems to make sense to clean up entries we don't use. also note this could have a much bigger impact if we have multiple certs with much shorter ttls, e.g. 1 day ttls and ~5 certs per host means an additional 8000 rows/day
[14:53:55] yeah, that's some rows...
[15:05:23] DBA: New database request: image_matching - https://phabricator.wikimedia.org/T280042 (hnowlan) >>! In T280042#7057407, @gmodena wrote: >>>! In T280042#7056379, @hnowlan wrote: >>>>! In T280042#7045385, @gmodena wrote: >>> >>> Life would be easier if we could reach RESTBase Cassandra from the Hadoop network...
[15:35:30] * marostegui happy to see kormat explaining how innodb works - mission accomplished!
[15:35:43] * kormat swears furiously
[15:35:46] kormat you are officially a dba
[15:36:14] marostegui: where, where?
[15:36:18] Here's your badge and your gun
[15:36:28] there is only one bullet
[15:36:42] And it only fires if pointed towards own foot
[16:19:49] Data-Persistence-Backup, Goal: Upgrade pending stretch backup hosts to buster - https://phabricator.wikimedia.org/T280979 (jcrespo)
[16:20:21] Data-Persistence-Backup, Goal: Upgrade pending stretch backup hosts to buster - https://phabricator.wikimedia.org/T280979 (jcrespo) New s2 and s3 instances: db1102:3313, db2101:3312 checking tables right now.
[17:04:33] PROBLEM - MariaDB sustained replica lag on pc2009 is CRITICAL: 6.4 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2009&var-port=9104
[17:06:49] RECOVERY - MariaDB sustained replica lag on pc2009 is OK: (C)2 ge (W)1 ge 0.6 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2009&var-port=9104
[17:45:24] Data-Persistence-Backup, GitLab (Initialization), Patch-For-Review, User-brennen: Backups for GitLab - https://phabricator.wikimedia.org/T274463 (thcipriani) After talking this through with contractors + @brennen and @wkandek I think we'd like to do a full daily backup. Starting from the use-cas...
[20:07:09] DBA, SRE, ops-codfw: codfw: Relocate servers in 10G racks - https://phabricator.wikimedia.org/T281135 (Papaul)
[20:08:12] DBA, SRE, ops-codfw: codfw: Relocate servers in 10G racks - https://phabricator.wikimedia.org/T281135 (Papaul)
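Note on the pki cleanup discussed from 14:46 to 14:53: the job described there is a periodic `delete from certificates where expiry < now();` against m1-master, and the growth estimate is roughly hosts * certs per host / TTL. The following is only a rough sketch of what such a script might look like, not the script that was actually written. It uses an in-memory SQLite database as a stand-in for the MariaDB connection so that it runs anywhere; the `serial` column, the text timestamp format, the function names and the demo date are all assumptions made for illustration. Only the `certificates` table, the `expiry` column and the figures (~1600 hosts, ~5 certs, 1 day TTL) come from the log.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Assumed timestamp format for the demo; the real column type is not in the log.
FMT = "%Y-%m-%d %H:%M:%S"


def purge_expired(conn, now=None):
    """Delete certificates whose expiry is in the past; return rows removed.

    Mirrors `delete from certificates where expiry < now();` from the chat,
    but passes the cutoff as a parameter so it can be tested.
    """
    now = now or datetime.now(timezone.utc)
    cur = conn.execute(
        "DELETE FROM certificates WHERE expiry < ?", (now.strftime(FMT),)
    )
    conn.commit()
    return cur.rowcount


def rows_per_day(hosts, certs_per_host, ttl_days):
    """Rough issuance rate from the chat: hosts * certs per host / TTL."""
    return hosts * certs_per_host / ttl_days


def demo():
    # SQLite stands in for m1-master here; the schema is hypothetical.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE certificates (serial TEXT PRIMARY KEY, expiry TEXT)")
    now = datetime(2021, 5, 1, 15, 0, 0, tzinfo=timezone.utc)  # arbitrary demo date
    rows = [
        ("expired-1", now - timedelta(days=30)),
        ("expired-2", now - timedelta(seconds=1)),
        ("valid-1", now + timedelta(days=28)),
    ]
    conn.executemany(
        "INSERT INTO certificates (serial, expiry) VALUES (?, ?)",
        [(serial, expiry.strftime(FMT)) for serial, expiry in rows],
    )
    removed = purge_expired(conn, now)
    remaining = [r[0] for r in conn.execute("SELECT serial FROM certificates")]
    return removed, remaining


if __name__ == "__main__":
    print(demo())
    print(rows_per_day(1600, 5, 1))
```

With the figures quoted in the chat, `rows_per_day(1600, 5, 1)` gives the 8000 rows/day mentioned at 14:53. A real job would swap the SQLite connection for a MariaDB client pointed at m1-master.eqiad.wmnet and would be scheduled from the systemd timer mentioned at 14:47.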