[04:53:46] <wikibugs>	 DBA, Patch-For-Review: Upgrade s6 to Debian Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T280751 (Marostegui) >>! In T280751#7052806, @Kormat wrote: > Steps to swap db1085 -> db1165: > [] Silence db1085, db1165 & db1155. > [] Depool instances: > ` > sudo dbctl instance db1085 depool > sudo...
[05:16:56] <wikibugs>	 DBA, DiscussionTools, OWC2020, Editing-team (FY2020-21 Kanban Board), Patch-For-Review: DBA review: conversation subscriptions - https://phabricator.wikimedia.org/T263817 (Marostegui) I have restarted sanitarium hosts to filter the tables on the wiki replicas. The tables can now be created in...
[06:07:06] <Amir1>	 Ping https://phabricator.wikimedia.org/T163532#7018585 :D
[06:21:08] <marostegui>	 Amir1: what am I supposed to do next?
[06:21:21] <marostegui>	 ah
[06:21:24] <marostegui>	 remove the index?
[06:21:40] <Amir1>	 yeah :D
[06:21:42] <marostegui>	 I have a pile of things waiting on me, I don't know if I will have time for this this week
[06:21:56] <Amir1>	 all good. I'm here to ping you all year
[06:21:57] <marostegui>	 removing the index is easy, monitoring it isn't :-)
[06:21:58] <Amir1>	 and beyond
[06:22:20] <Amir1>	 snoozed until next week, I'll ping you next week. Would that work?
[06:22:29] <marostegui>	 that works thanks
[06:22:39] <Amir1>	 I can help with monitoring too
[07:04:25] <wikibugs>	 Blocked-on-schema-change, DBA: Schema change to turn user_last_timestamp.user_newtalk to binary(14) - https://phabricator.wikimedia.org/T266486 (Marostegui)
[07:04:34] <wikibugs>	 Blocked-on-schema-change, DBA: Schema change for dropping default of img_timestamp and making it binary(14) - https://phabricator.wikimedia.org/T273360 (Marostegui)
[07:04:41] <wikibugs>	 Blocked-on-schema-change, DBA: Schema change for watchlist.wl_notificationtimestamp going binary(14) from varbinary(14) - https://phabricator.wikimedia.org/T268392 (Marostegui)
[07:15:53] <wikibugs>	 DBA, Data-Persistence-Backup, Patch-For-Review: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (Marostegui)
[07:16:10] <wikibugs>	 DBA, Data-Persistence-Backup, Patch-For-Review: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (Marostegui) s5 sanitarium master switched: db1154 now replicates from db1161 (10.4)
[07:17:41] <wikibugs>	 DBA, decommission-hardware: decommission db1082.eqiad.wmnet - https://phabricator.wikimedia.org/T281794 (Marostegui)
[07:18:02] <wikibugs>	 DBA, decommission-hardware: decommission db1082.eqiad.wmnet - https://phabricator.wikimedia.org/T281794 (Marostegui) Open→Stalled Waiting a few days to see if its replacement (db1161) works fine.
[07:18:25] <wikibugs>	 DBA, decommission-hardware: decommission db1082.eqiad.wmnet - https://phabricator.wikimedia.org/T281794 (Marostegui)
[07:18:28] <wikibugs>	 DBA, Data-Persistence-Backup, Patch-For-Review: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (Marostegui)
[07:18:42] <wikibugs>	 DBA, decommission-hardware: decommission db1082.eqiad.wmnet - https://phabricator.wikimedia.org/T281794 (Marostegui)
[07:18:44] <wikibugs>	 DBA, SRE, Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (Marostegui)
[07:18:58] <wikibugs>	 DBA, SRE, Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (Marostegui)
[07:48:20] <jynus>	 did you see pt-heartbeat failing on tendril/orchestration dbs since 4 days ago?
[07:48:37] <marostegui>	 nop
[07:48:56] <marostegui>	 I have been off for 4 days :)
[07:49:00] <jynus>	 he
[07:49:00] <marostegui>	 I will take a look later
[07:52:40] <marostegui>	 fixed
[08:03:31] <wikibugs>	 DBA, Data-Persistence-Backup, Patch-For-Review: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (Marostegui) Checking tables on db1106
[09:00:44] <wikibugs>	 DBA, Patch-For-Review: Productionize db21[45-52] and db11[76-84] - https://phabricator.wikimedia.org/T275633 (Marostegui) db1178 is now replicating
[09:35:56] <wikibugs>	 DBA, AbuseFilter: Check whether `FORCE INDEX page_timestamp` is still needed in LazyVariableComputer.php - https://phabricator.wikimedia.org/T281579 (LSobanski) p:Triage→Medium
[09:37:54] <wikibugs>	 DBA, SRE-tools, Sustainability (Incident Followup): Create or modify an existing tool that quickly shows the db replication status in case of master failure - https://phabricator.wikimedia.org/T281249 (LSobanski) p:Triage→Medium
[09:42:08] <wikibugs>	 DBA, Patch-For-Review: Productionize db21[45-52] and db11[76-84] - https://phabricator.wikimedia.org/T275633 (Marostegui) Checking tables on db1178
[09:55:23] <wikibugs>	 DBA: New database request: image_matching - https://phabricator.wikimedia.org/T280042 (hnowlan) >>! In T280042#7045385, @gmodena wrote: >  > Life would be easier if we could reach RESTBase Cassandra from the Hadoop network.  For the right usecase I imagine access could be authorised - we currently have [[ ht...
[11:48:42] <wikibugs>	 DBA, decommission-hardware: decommission db1083.eqiad.wmnet - https://phabricator.wikimedia.org/T281445 (Marostegui) db1118 has been pooled today. Let's give it a week
[12:11:47] <wikibugs>	 DBA, Orchestrator: Cleanup heartbeat.heartbeat on s2 - https://phabricator.wikimedia.org/T281826 (LSobanski)
[12:12:42] <wikibugs>	 DBA, Orchestrator: Cleanup heartbeat.heartbeat on s3 - https://phabricator.wikimedia.org/T281827 (LSobanski)
[12:15:01] <wikibugs>	 DBA, Orchestrator: Cleanup heartbeat.heartbeat on s5 - https://phabricator.wikimedia.org/T281828 (LSobanski)
[12:15:37] <wikibugs>	 DBA, Orchestrator: Cleanup heartbeat.heartbeat on s6 - https://phabricator.wikimedia.org/T281829 (LSobanski)
[12:16:09] <wikibugs>	 DBA, Orchestrator: Cleanup heartbeat.heartbeat on s8 - https://phabricator.wikimedia.org/T281830 (LSobanski)
[12:16:47] <wikibugs>	 DBA, Orchestrator: Cleanup heartbeat.heartbeat on s8 - https://phabricator.wikimedia.org/T281830 (LSobanski) p:Triage→Medium
[12:17:09] <wikibugs>	 DBA, Orchestrator: Cleanup heartbeat.heartbeat on s6 - https://phabricator.wikimedia.org/T281829 (LSobanski) p:Triage→Medium
[12:17:13] <wikibugs>	 DBA, Orchestrator: Cleanup heartbeat.heartbeat on s5 - https://phabricator.wikimedia.org/T281828 (LSobanski) p:Triage→Medium
[12:17:15] <wikibugs>	 DBA, Orchestrator: Cleanup heartbeat.heartbeat on s3 - https://phabricator.wikimedia.org/T281827 (LSobanski) p:Triage→High
[12:17:19] <wikibugs>	 DBA, Orchestrator: Cleanup heartbeat.heartbeat on s3 - https://phabricator.wikimedia.org/T281827 (LSobanski) p:High→Medium
[12:17:22] <wikibugs>	 DBA, Orchestrator: Cleanup heartbeat.heartbeat on s2 - https://phabricator.wikimedia.org/T281826 (LSobanski) p:Triage→Medium
[12:26:01] <wikibugs>	 DBA: QPS rate of change alarming - https://phabricator.wikimedia.org/T281833 (LSobanski)
[12:26:23] <wikibugs>	 DBA: QPS rate of change alarming - https://phabricator.wikimedia.org/T281833 (LSobanski) p:Triage→Medium
[12:52:04] <wikibugs>	 DBA, Patch-For-Review: Upgrade s6 to Debian Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T280751 (Kormat)
[12:52:28] <wikibugs>	 DBA, Data-Persistence-Backup, Patch-For-Review: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (Kormat)
[12:53:47] <wikibugs>	 DBA, Cognate, ContentTranslation, Growth-Team, and 9 others: Restart x1 database master (db1103) - https://phabricator.wikimedia.org/T281212 (Marostegui) All slaves have been upgraded to 10.4.18, so the master is ready.
[13:45:50] <wikibugs>	 DBA: New database request: image_matching - https://phabricator.wikimedia.org/T280042 (gmodena) >>! In T280042#7056379, @hnowlan wrote: >>>! In T280042#7045385, @gmodena wrote: >>  >> Life would be easier if we could reach RESTBase Cassandra from the Hadoop network. >  > For the right usecase I imagine acces...
[14:41:37] <wikibugs>	 DBA, Analytics-Clusters, Analytics-Kanban, Data-Services, and 2 others: Convert labsdb1012 from multi-source to multi-instance - https://phabricator.wikimedia.org/T269211 (Bstorm)
[14:46:45] <jbond42>	 hi DBAs, the pki database currently stores all certificates ever issued, including expired certificates. however, expired certs have very little use to us, so I'm going to create a cleanup job which does the following
[14:46:58] <jbond42>	 delete from certificates where expiry < now();
[14:47:27] <jbond42>	 I planned to do this in a small python script run from a systemd::timer::job, however I wondered if there is a more DBA-native way to do this?
[14:47:53] <marostegui>	 jbond42: where does pki live?
[14:48:12] <marostegui>	 found it
[14:48:14] <jbond42>	 m1-master
[14:48:45] <marostegui>	 jbond42: it is just 4k rows
[14:48:58] <marostegui>	 jbond42: I can issue the delete now if you want, so you don't wast time doing all that
[14:49:02] <marostegui>	 *waste
[14:49:03] <jbond42>	 that's probably about right
[14:49:34] <marostegui>	 jbond42: so up to you :)
[14:49:41] <jynus>	 wait, let me check backups
[14:50:07] <jbond42>	 marostegui: the issue is the database will grow by ($number_of_hosts * number_of_certs)/expiry length
[14:50:18] <marostegui>	 jbond42: ah right, so it is not a one time thing
[14:50:19] <jbond42>	 e.g. now it's ~1600 * 1/4 weeks
[14:50:23] <jbond42>	 no
[14:50:39] <marostegui>	 jbond42: then yeah, a cronjob sounds good to me, if it always points to m1-master.eqiad.wmnet
[14:51:06] <jbond42>	 it should always point there, and actually that's fine, I already have some maint scripts so not a big issue, thanks
[14:51:16] <marostegui>	 cool :)
[14:51:17] <marostegui>	 thanks
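A minimal sketch of the kind of cleanup script discussed above, not the script jbond42 actually wrote. It assumes a `certificates` table with an `expiry` column (both taken from the DELETE statement in the chat); the `serial` column, the function name `delete_expired`, the batch size, and the use of an in-memory SQLite database as a stand-in for the MariaDB instance on m1-master are all hypothetical, chosen so the example runs on its own.

```python
import sqlite3
from datetime import datetime, timezone


def delete_expired(conn, now=None, batch_size=1000):
    """Delete rows whose expiry is before `now`, in small batches.

    Returns the total number of rows deleted. Batching keeps each
    transaction short; against MariaDB the equivalent statement would
    presumably be `DELETE FROM certificates WHERE expiry < NOW() LIMIT n`.
    """
    now = now or datetime.now(timezone.utc)
    # Assumption: expiry is stored as 'YYYY-MM-DD HH:MM:SS' text in UTC,
    # so a plain string comparison orders timestamps correctly.
    cutoff = now.strftime("%Y-%m-%d %H:%M:%S")
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM certificates WHERE rowid IN ("
            "SELECT rowid FROM certificates WHERE expiry < ? LIMIT ?)",
            (cutoff, batch_size),
        )
        conn.commit()
        total += cur.rowcount
        if cur.rowcount < batch_size:
            # Last batch was not full, so nothing expired is left.
            return total


if __name__ == "__main__":
    # Hypothetical demo data; the real schema is not shown in the chat.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE certificates (serial TEXT, expiry TEXT)")
    conn.executemany(
        "INSERT INTO certificates VALUES (?, ?)",
        [("a", "2000-01-01 00:00:00"), ("b", "2999-01-01 00:00:00")],
    )
    print("deleted", delete_expired(conn), "expired rows")
```

Run from a timer, a script like this would only need the connection swapped for one pointing at m1-master.eqiad.wmnet, as marostegui suggests.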
[14:51:23] <jynus>	 that's currently 2.6M compressed on backups
[14:51:30] <jynus>	 it's not a lot
[14:51:45] <jbond42>	 jynus: fyi once the certs are expired they are of absolutely no use
[14:53:08] <jbond42>	 it's unlikely that this db will grow massive in size, but it still seems to make sense to clean up entries we don't use. also note this could have a much bigger impact if we have multiple certs with a much shorter ttl, e.g. 1 day ttls and ~5 certs per host means an additional 8000 rows/day
[14:53:55] <marostegui>	 yeah, that's quite some rows...
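The growth figures quoted above can be checked with a short calculation. This is only a sketch of the formula jbond42 gives, (hosts * certs per host) / expiry length; the function name `rows_per_day` is hypothetical, and it assumes every certificate is reissued exactly once per TTL.

```python
def rows_per_day(hosts, certs_per_host, ttl_days):
    """Steady-state number of new certificate rows per day.

    Assumes each certificate is renewed once per TTL, so every host
    contributes certs_per_host new rows every ttl_days days.
    """
    return hosts * certs_per_host / ttl_days


if __name__ == "__main__":
    # Current setup from the chat: ~1600 hosts, 1 cert each, 4-week expiry.
    print(rows_per_day(1600, 1, 28) * 7, "rows/week")
    # Hypothetical short-TTL setup from the chat: 1-day TTL, ~5 certs per host.
    print(rows_per_day(1600, 5, 1), "rows/day")
```

With the current numbers that is roughly 400 rows per week, against 8000 rows per day in the short-TTL scenario.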
[15:05:23] <wikibugs>	 DBA: New database request: image_matching - https://phabricator.wikimedia.org/T280042 (hnowlan) >>! In T280042#7057407, @gmodena wrote: >>>! In T280042#7056379, @hnowlan wrote: >>>>! In T280042#7045385, @gmodena wrote: >>>  >>> Life would be easier if we could reach RESTBase Cassandra from the Hadoop network...
[15:35:30] * marostegui happy to see kormat explaining how innodb works - mission accomplished!
[15:35:43] * kormat swears furiously
[15:35:46] <marostegui>	 kormat you are officially a dba
[15:36:14] <volans>	 marostegui: where, where?
[15:36:18] <sobanski>	 Here's your badge and your gun
[15:36:28] <volans>	 there is only one bullet
[15:36:42] <sobanski>	 And it only fires if pointed towards own foot
[16:19:49] <wikibugs>	 Data-Persistence-Backup, Goal: Upgrade pending stretch backup hosts to buster - https://phabricator.wikimedia.org/T280979 (jcrespo)
[16:20:21] <wikibugs>	 Data-Persistence-Backup, Goal: Upgrade pending stretch backup hosts to buster - https://phabricator.wikimedia.org/T280979 (jcrespo) New s2 and s3 instances: db1102:3313, db2101:3312 checking tables right now.
[17:04:33] <icinga-wm>	 PROBLEM - MariaDB sustained replica lag on pc2009 is CRITICAL: 6.4 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2009&var-port=9104
[17:06:49] <icinga-wm>	 RECOVERY - MariaDB sustained replica lag on pc2009 is OK: (C)2 ge (W)1 ge 0.6 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2009&var-port=9104
[17:45:24] <wikibugs>	 Data-Persistence-Backup, GitLab (Initialization), Patch-For-Review, User-brennen: Backups for GitLab - https://phabricator.wikimedia.org/T274463 (thcipriani) After talking this through with contractors + @brennen and @wkandek I think we'd like to do a full daily backup.  Starting from the use-cas...
[20:07:09] <wikibugs>	 DBA, SRE, ops-codfw: codfw: Relocate servers in 10G racks - https://phabricator.wikimedia.org/T281135 (Papaul)
[20:08:12] <wikibugs>	 DBA, SRE, ops-codfw: codfw: Relocate servers in 10G racks - https://phabricator.wikimedia.org/T281135 (Papaul)