[04:55:33] PROBLEM - MariaDB sustained replica lag on db2090 is CRITICAL: 2.6 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2090&var-port=9104
[04:56:43] RECOVERY - MariaDB sustained replica lag on db2090 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2090&var-port=9104
[06:16:32] DBA, SRE, ops-eqiad, Patch-For-Review: Investigate and repool db1134 - https://phabricator.wikimedia.org/T274472 (Marostegui) Thanks John - I have powered up the host and it looks good now. I will take it from here ` root@db1134:~# free -g total used free shared...
[06:20:30] DBA: Reimage db1134 to Buster and repool it - https://phabricator.wikimedia.org/T275343 (Marostegui)
[06:20:41] DBA: Reimage db1134 to Buster and repool it - https://phabricator.wikimedia.org/T275343 (Marostegui) p:Triage→Medium
[06:21:06] DBA: Reimage db1134 to Buster and repool it - https://phabricator.wikimedia.org/T275343 (Marostegui)
[06:22:17] DBA, SRE, ops-eqiad, Patch-For-Review: Investigate and repool db1134 - https://phabricator.wikimedia.org/T274472 (Marostegui) Open→Resolved I am going to close this - the task to track next steps is: T275343 Thanks everyone for helping out here
[06:33:22] DBA, Commons: Avoid capacity issues from the image table holding the text of pdf/djvu files as part of their metadata - https://phabricator.wikimedia.org/T275268 (Marostegui) Just to comment that compressing the table would give some benefit, but in the long run, it won't give us much. I think the whole...
[07:13:44] DBA, SRE, ops-eqiad: db1162 crashed - https://phabricator.wikimedia.org/T275309 (wiki_willy) a:wiki_willy→Cmjohnson
[07:46:19] Blocked-on-schema-change, DBA: Schema change for renaming name_title_timestamp on archive table - https://phabricator.wikimedia.org/T273359 (Marostegui)
[07:52:09] DBA, decommission-hardware, Patch-For-Review: decommission db1090.eqiad.wmnet - https://phabricator.wikimedia.org/T274333 (Marostegui)
[08:16:38] Blocked-on-schema-change, DBA: Schema change for renaming name_title_timestamp on archive table - https://phabricator.wikimedia.org/T273359 (Marostegui) s1 eqiad progress [] labsdb1012 [] labsdb1011 [] labsdb1010 [] labsdb1009 [x] dbstore1003 [] db1169 [] db1163 [] db1154 [x] db1140 [x] db1139 [] db1135...
[08:16:52] Blocked-on-schema-change, DBA: Schema change for renaming name_title_timestamp on archive table - https://phabricator.wikimedia.org/T273359 (Marostegui)
[08:20:16] Data-Persistence-Backup, SRE, SRE-swift-storage, Traffic, netops: Depool codfw swift cluster - https://phabricator.wikimedia.org/T267338 (fgiunchedi) Links came back over the weekend, looks like we can proceed when ready
[08:21:15] DBA, Commons: Avoid capacity issues from the image table holding the text of pdf/djvu files as part of their metadata - https://phabricator.wikimedia.org/T275268 (daniel) >>! In T275268#6847540, @Marostegui wrote: > Just to comment that compressing the table would give some benefit, but in the long run,...
[08:35:12] DBA, Commons: Avoid capacity issues from the image table holding the text of pdf/djvu files as part of their metadata - https://phabricator.wikimedia.org/T275268 (Marostegui) >>! In T275268#6847710, @daniel wrote: >>>! In T275268#6847540, @Marostegui wrote: >> Just to comment that compressing the table w...
[09:34:40] Blocked-on-schema-change, DBA: Schema change for renaming name_title_timestamp on archive table - https://phabricator.wikimedia.org/T273359 (Marostegui)
[09:50:40] Blocked-on-schema-change, DBA: Schema change for renaming name_title_timestamp on archive table - https://phabricator.wikimedia.org/T273359 (Marostegui)
[09:54:31] jynus: are you working on db1150 and db1102 or should I go ahead and restart the prometheus exporter there?
[09:54:53] I must have forgotten to restart it after restart
[09:54:57] you can go ahead
[09:55:01] ok
[09:56:14] Blocked-on-schema-change, DBA: Schema change for renaming name_title_timestamp on archive table - https://phabricator.wikimedia.org/T273359 (Marostegui) s3 progress [] labsdb1012 [] labsdb1011 [] labsdb1010 [] labsdb1009 [x] dbstore1004 [] db1175 [x] db1171 [] db1166 [] db1157 [] db1154 [] db1124 [] db1...
[09:56:32] Blocked-on-schema-change, DBA: Schema change for renaming name_title_timestamp on archive table - https://phabricator.wikimedia.org/T273359 (Marostegui)
[09:58:41] isn't it "funny" that metrics themselves don't get affected?
[10:25:19] db1141 just had a weird state: https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1141&var-port=9104&from=1613988652096&to=1613989447740
[10:46:53] DBA, Commons: Avoid capacity issues from the image table holding the text of pdf/djvu files as part of their metadata - https://phabricator.wikimedia.org/T275268 (daniel) >>! In T275268#6847723, @Marostegui wrote: > Let's discuss further steps on T28741, and not hijack this task :) I commented there, s...
[10:49:03] Blocked-on-schema-change, DBA, Wikidata, Technical-Debt, and 2 others: Make wb_changes_dispatch.chd_seen unsigned in production - https://phabricator.wikimedia.org/T273874 (Lucas_Werkmeister_WMDE)
[13:43:53] heads up, I will be using dbprov2003, db2102 and a spare swift host for media backups workflow/performance testing
[13:58:21] Data-Persistence-Backup, SRE, SRE-swift-storage, Traffic, netops: Depool codfw swift cluster - https://phabricator.wikimedia.org/T267338 (fgiunchedi) Just depooled swift from codfw (for reads) `confctl --object-type discovery select 'dnsdisc=swift,name=codfw' set/pooled=false`
[13:59:16] ^and this is happening now
[15:41:40] mariadb-dump's --system option looks interesting: https://mariadb.com/kb/en/mysqldump/
[15:42:42] cute
[16:02:35] DBA, OTRS: OTRS database is "too large" - https://phabricator.wikimedia.org/T138915 (grin) We are using FS storage since the beginning, and it usually goes well without large problems. So far I remember only one problem which was present in OTRSv4: attachments with bad encoding sometimes were saved uing...
[16:08:25] marostegui: heads-up, creating multiple dbs now, will ping once I'm done
[16:11:13] thanks Urbanecm , I'm out tomorrow. kormat if you have time tomorrow for it, remember there are 4 sanitariums in eqiad, the two stretch and the two busters, codfw has the normal two
[16:11:38] ack, enjoy your time out marostegui !
[16:20:51] DBA, Data-Services: Prepare and check storage layer for altwiki - https://phabricator.wikimedia.org/T271982 (Urbanecm) I just created the database. Thanks!
[16:21:25] DBA, Data-Services: Prepare and check storage layer for mniwiki - https://phabricator.wikimedia.org/T273465 (Urbanecm) The database was just created. Thanks!
[17:32:07] DBA, Data-Services: Prepare and check storage layer for mniwiktionary - https://phabricator.wikimedia.org/T273459 (Urbanecm) The database was just created
[22:57:32] DBA, SRE, ops-eqiad: Degraded RAID on db1103 - https://phabricator.wikimedia.org/T275266 (Jclark-ctr) @Marostegui Swapped Bad SSD @wiki_willy we did have one new in box same size same model ect. it originally came from HP
[23:11:19] DBA, SRE, ops-eqiad: Degraded RAID on db1103 - https://phabricator.wikimedia.org/T275266 (wiki_willy) Thanks @Jclark-ctr >>! In T275266#6850886, @Jclark-ctr wrote: > @Marostegui Swapped Bad SSD @wiki_willy we did have one new in box same size same model ect. it originally came from HP