[05:15:51] DBA, SRE, Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts: ` ['db1124.eqiad.wmnet'] ` The log ca...
[05:17:56] DBA, SRE, Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['db1124.eqiad.wmnet'] ` Of which those **FAILED**: ` ['db1124.eqiad.wmnet'] `
[05:18:25] DBA, SRE, Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts: ` ['db1124.eqiad.wmnet'] ` The log ca...
[05:25:27] DBA, Data-Persistence-Backup, Patch-For-Review: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (Marostegui)
[05:29:26] Blocked-on-schema-change, DBA: Schema change for renaming new_name_timestamp to rc_new_name_timestamp in recentchanges - https://phabricator.wikimedia.org/T276292 (Marostegui)
[05:37:24] Blocked-on-schema-change, DBA: Schema change for renaming new_name_timestamp to rc_new_name_timestamp in recentchanges - https://phabricator.wikimedia.org/T276292 (Marostegui) s1 eqiad [] labsdb1011 [] labsdb1010 [] labsdb1009 [x] dbstore1003 [] db1184 [] db1169 [] db1164 [] db1163 [] db1154 [x] db1140 [...
[05:42:41] DBA, SRE, Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['db1124.eqiad.wmnet'] ` and were **ALL** successful.
[05:48:36] Blocked-on-schema-change, DBA: Schema change for renaming new_name_timestamp to rc_new_name_timestamp in recentchanges - https://phabricator.wikimedia.org/T276292 (Marostegui) s1 is just waiting for the master, which will be done once the switchover on Wednesday is done (T278214)
[05:48:50] Blocked-on-schema-change, DBA: Schema change for renaming new_name_timestamp to rc_new_name_timestamp in recentchanges - https://phabricator.wikimedia.org/T276292 (Marostegui)
[05:50:00] Blocked-on-schema-change, DBA, AbuseFilter: Rename AbuseFilter indexes for consistency - https://phabricator.wikimedia.org/T281058 (Marostegui) p:Triage→Medium
[05:59:27] DBA, decommission-hardware: decommission db1077.eqiad.wmnet - https://phabricator.wikimedia.org/T281075 (Marostegui)
[06:00:03] DBA, SRE, Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (Marostegui)
[06:00:30] DBA, SRE, Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (Marostegui)
[06:00:32] DBA, decommission-hardware: decommission db1077.eqiad.wmnet - https://phabricator.wikimedia.org/T281075 (Marostegui)
[06:07:05] DBA, Commons: Avoid capacity issues from the image table holding the text of pdf/djvu files as part of their metadata - https://phabricator.wikimedia.org/T275268 (tstarling) This is apparently a duplicate of T32906. Has anyone got any thoughts on the internal API? Ideally PdfHandler should not have to k...
[06:20:59] DBA, Commons: Avoid capacity issues from the image table holding the text of pdf/djvu files as part of their metadata - https://phabricator.wikimedia.org/T275268 (tstarling) Regarding ApiQueryImageInfo and ForeignAPIRepo: * ApiQueryImageInfo is already depending on the metadata being JSON-serializable,...
[07:13:21] DBA, Patch-For-Review: Productionize db21[45-52] and db11[76-84] - https://phabricator.wikimedia.org/T275633 (Marostegui)
[07:39:23] DBA, SRE, Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (Marostegui) checking tables on db1124 after the transfer from db1158
[08:56:37] Data-Persistence-Backup: xtrabackup --prepare hits open_files_limit on buster - https://phabricator.wikimedia.org/T281094 (jcrespo)
[09:13:22] DBA, DiscussionTools, OWC2020, Editing-team (FY2020-21 Kanban Board): DBA review: conversation subscriptions - https://phabricator.wikimedia.org/T263817 (LSobanski) p:Triage→Medium
[09:17:51] DBA, Patch-For-Review: Upgrade s6 to Debian Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T280751 (Kormat)
[09:18:17] DBA, Patch-For-Review: Upgrade s6 to Debian Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T280751 (Kormat) a:Kormat
[09:19:38] Data-Persistence-Backup: Upgrade pending stretch backup hosts to buster - https://phabricator.wikimedia.org/T280979 (jcrespo)
[09:19:40] Data-Persistence-Backup, Patch-For-Review: xtrabackup --prepare hits open_files_limit on buster - https://phabricator.wikimedia.org/T281094 (jcrespo)
[09:19:59] Data-Persistence-Backup, Patch-For-Review: xtrabackup --prepare hits open_files_limit on buster - https://phabricator.wikimedia.org/T281094 (jcrespo) p:Triage→High
[10:01:18] I will connect in some minutes, I was distracted by the swift latencies incident
[10:01:44] continue without me
[10:01:52] jynus: the team meeting is later today
[10:04:01] oh, ok
[10:04:05] that's good :-)
[10:07:18] DBA, Cloud-Services: Querying the logging table on labs is slow - https://phabricator.wikimedia.org/T131266 (LSobanski) p:Low→Lowest
[10:07:40] DBA, Pybal, SRE, Sustainability: Create a backend check for pybal to monitor the MySQL
protocol being up - https://phabricator.wikimedia.org/T165677 (LSobanski) p:Medium→Low
[10:07:52] DBA: Show transfer time once successfully completed - https://phabricator.wikimedia.org/T258559 (LSobanski) p:Low→Lowest
[10:08:01] DBA, Privacy Engineering, SRE, WMF-Legal, and 3 others: dbtree loads third party resources (from google.com/jsapi) - https://phabricator.wikimedia.org/T96499 (LSobanski) p:Medium→Lowest
[10:08:21] DBA, Data-Services, Patch-For-Review: Make watchlist table available as curated foo_p.watchlist_count on labsdb - https://phabricator.wikimedia.org/T59617 (LSobanski) p:Medium→Lowest
[10:08:26] DBA, Pybal, SRE, Sustainability: Create a backend check for pybal to monitor the MySQL protocol being up - https://phabricator.wikimedia.org/T165677 (LSobanski) p:Low→Lowest
[10:08:37] DBA, Patch-For-Review: transferpy: Multiprocess the transfers - https://phabricator.wikimedia.org/T259327 (LSobanski) p:Triage→Lowest
[10:09:14] FYI, I'm merging the "backlog (help needed) into "backlog", but setting the tasks to "lowest" priority.
[10:09:47] DBA: Cleanup empty and bogus databases/tables from external storage - https://phabricator.wikimedia.org/T245732 (LSobanski) p:Lowest→Low
[10:10:17] DBA, Cloud-Services: Provide dynamic report of differences between replica databases and production databases - https://phabricator.wikimedia.org/T57455 (LSobanski) p:Lowest→Low
[10:10:33] DBA, MediaWiki-General, Patch-For-Review, Schema-change: Convert primary key integers and references thereto from int to bigint (unsigned) - https://phabricator.wikimedia.org/T63111 (LSobanski) p:Lowest→Low
[10:12:53] DBA, Tool-Database-Queries, Toolforge: How to correctly read multibyte characters from text fields on DB replicas on Toolforge?
- https://phabricator.wikimedia.org/T257103 (LSobanski) p:Triage→Low
[10:14:22] DBA, Data-Services: Provide dynamic report of differences between replica databases and production databases - https://phabricator.wikimedia.org/T57455 (Majavah)
[10:15:46] DBA: Setup a global admin account that can only read/have limited privileges to databases for safer debugging - https://phabricator.wikimedia.org/T254756 (LSobanski) p:Triage→Medium
[10:17:11] DBA, SRE, Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts: ` ['db1125.eqiad.wmnet'] ` The log ca...
[10:17:51] DBA, cloud-services-team (Kanban): Add visitingwatchers to watchlist_count - https://phabricator.wikimedia.org/T150547 (LSobanski) p:Triage→Lowest
[10:22:27] DBA, Data-Persistence-Backup, Patch-For-Review: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (jcrespo) db1156 should be almost ready for handover to core production, but I am going to take the opportunity to setup a new s2 backup source, now that it ha...
[10:22:54] DBA, Data-Persistence-Backup, Patch-For-Review: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (Marostegui) Thanks!
[10:25:21] Blocked-on-schema-change, DBA: Schema change for renaming new_name_timestamp to rc_new_name_timestamp in recentchanges - https://phabricator.wikimedia.org/T276292 (Marostegui)
[10:35:44] DBA, DiscussionTools, OWC2020, Editing-team (FY2020-21 Kanban Board): DBA review: conversation subscriptions - https://phabricator.wikimedia.org/T263817 (Marostegui) >>! In T263817#7030301, @matmarex wrote: > Info for DBA review: > > Note that the code is already merged after we reviewed it in...
[10:38:15] Blocked-on-schema-change, DBA: Schema change for renaming new_name_timestamp to rc_new_name_timestamp in recentchanges - https://phabricator.wikimedia.org/T276292 (Marostegui) s4 progress [] labsdb1011 [] labsdb1010 [] labsdb1009 [x] dbstore1003 [] db1184 [] db1169 [] db1164 [] db1163 [] db1154 [] db114...
[10:39:39] DBA, SRE, Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['db1125.eqiad.wmnet'] ` and were **ALL** successful.
[12:18:26] jynus: Hi, for when you have time (if it's in your radar). Can you provide the list of biggest watchlist tables across all wikis? I assume getting from backups should be easier. If not, tell me and I find another way context T258098
[12:18:26] T258098: Purge unused watchlist rows - https://phabricator.wikimedia.org/T258098
[12:18:33] sorry for bothering you
[12:19:23] Amir1: if enwiki is done, we should probably optimize the table in codfw to see how much on-disk space we got back
[12:19:53] marostegui: for enwiki it was only 3-5% so I'm not sure if it's that useful
[12:20:02] wikidatawiki on the other hand should be cut to at least half
[12:20:05] Amir1: ah, I thought it was more: )
[12:20:08] Amir1: oh nice
[12:20:13] Amir1: is that done too?
[12:20:27] I can get it on the backup stats
[12:20:43] yup wikidatawiki is mostly, another 10M rows left
[12:20:51] let me just do it
[12:21:05] Amir1: great!
[12:21:16] jynus: Thanks!
[12:24:08] it may take me a bit, as the stats table is not optimized for reports (indexed) yet
[12:25:13] marostegui: started. It'll take 1-2 hours https://grafana.wikimedia.org/d/000000278/mysql-aggregated?viewPanel=7&orgId=1&from=now-3h&to=now
[12:25:43] Amir1: excellent, ping me on irc once it is done so I can see what difference it makes on a codfw host
[12:25:51] sure!
[12:25:54] thanks!
[12:29:47] Amir1, this is probably enough for what you want? https://phabricator.wikimedia.org/P15523
[12:30:04] Yes
[12:30:07] Awesome.
[12:30:17] Thanks.
[12:31:02] let check by logical size, in case the entropy is different than the actual size
[12:37:25] nah, it is mostly the same
[12:42:57] commons will be nice clean up given its resource constraints
[12:43:13] I think wikidata is pretty healthy atm
[12:45:46] DBA, Orchestrator, Patch-For-Review, User-Kormat: Enable report_host for mariadb - https://phabricator.wikimedia.org/T266483 (Marostegui) s1 is fully done only pending the master (T278214)
[12:46:36] marostegui: nice :)
[13:04:37] DBA: generate-mysqld-exporter-config fails on prometheus eqiad - https://phabricator.wikimedia.org/T281128 (fgiunchedi)
[13:05:12] FYI ^ can't tell from the backtrace what's wrong right away, but sth definitely wrong
[13:07:12] godog: thanks we are in a meeting
[13:07:42] marostegui: ack
[13:08:23] I can have a look later
[13:08:41] prometheus scrapping itself should be working normally meanwhile
[13:08:49] jynus: thanks <3
[13:09:20] "unorderable types: NoneType() < str()" probably some query is returning null instead of an expected string
[13:10:11] something wrong on mysql db or tendril being naughty
[13:11:13] I have been touching tendril to update test-s4 shard and all that, so maybe I caused it?
[13:11:43] yeah, could be, a corner case that wasn't expected
[13:11:52] s/tendril/zarcillo/g
[13:12:05] I also think I updated zarcillo and gave db1183, db1124 "testing" shard, so maybe that
[13:12:08] although testing is a string!
[13:12:13] yep, meaning db1115 data in general here
[13:25:37] marostegui: Ran some numbers. Commons will be one fourth smaller.
https://commons.wikimedia.org/wiki/Commons:Village_pump/Technical#Clean_up_watchlist_of_bots
[13:26:05] if some users want to, it might get even smaller
[13:26:17] i'll keep you posted on that
[13:28:16] still deleting https://grafana.wikimedia.org/d/000000278/mysql-aggregated?viewPanel=7&orgId=1&from=now-3h&to=now
[13:35:34] DBA: generate-mysqld-exporter-config fails on prometheus eqiad - https://phabricator.wikimedia.org/T281128 (Marostegui) I think I fixed it, I changed "testing" as the section assigned to db1183 and db1125 to "test-s4" ` Apr 26 13:31:03 prometheus1003 systemd[1]: Started generates prometheus-mysqld-exporter t...
[13:36:52] DBA, Wikimedia-Rdbms, Performance-Team (Radar), Services (watching), Sustainability (Incident Followup): Fix mediawiki heartbeat model, change pt-heartbeat model to not use super-user, avoid SPOF and switch automatically to the real master without puppe... - https://phabricator.wikimedia.org/T172497
[13:37:07] DBA: generate-mysqld-exporter-config fails on prometheus eqiad - https://phabricator.wikimedia.org/T281128 (Marostegui) Open→Resolved a:Marostegui It runs well now: ` root@prometheus1003:/lib# /usr/local/sbin/mysqld_exporter_config.py eqiad '/srv/prometheus/ops/targets' root@prometheus1003:/lib# `
[13:48:52] DBA: New database request: image_matching - https://phabricator.wikimedia.org/T280042 (gmodena) Hey @eevans, @Marostegui, The full dataset for ImageMatching, generated on 321 wikis, is 2.6GB. It contains `23585365` records. In prod we might want to store multiple snapshots (prev/current months), and possibl...
[14:20:46] DBA, Data-Persistence-Backup, Patch-For-Review: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (jcrespo) I am running a compare on db1156: ` # tail -n +2 mediawiki-config/dblists/s2.dblist | while read db; do while read table; do echo "$db.$table"; db-c...
[14:24:32] marostegui: Done https://grafana.wikimedia.org/d/000000278/mysql-aggregated?viewPanel=7&orgId=1&from=1619439593690&to=1619445809299
[14:34:59] thanks Amir1 - in a meeting now!
[14:35:26] no worries. it has been like this for years :D
[14:46:27] DBA: New database request: image_matching - https://phabricator.wikimedia.org/T280042 (gmodena) Maybe premature optimisation, but this dataset stores text fields (part of a potential primary key) that can be relatively long (page titles, image names). Do we have guidelines for hashing/storing long keys? Bel...
[15:22:01] "Last snapshot for s2 at eqiad (db1156.eqiad.wmnet) taken on 2021-04-26 11:58:44 is 985 GB, but previous one was 834 GB, a change of 18.2%"
[15:22:42] ^18.2% size reduction after logical restore (defragmentation) :-)
[15:22:50] DBA, Data-Persistence-Backup, Patch-For-Review: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (Marostegui) Excellent, thanks. It will take around a day I'd guess.
[15:59:10] DBA: generate-mysqld-exporter-config fails on prometheus eqiad - https://phabricator.wikimedia.org/T281128 (jcrespo) I had to add db1156 and db1102:s2 to zarcillo, and FYI, I changed the group ofdb1183 and db1125 (not the section, I didn't touch that from manuel's) from 'core' to 'test'- as it may not be ad...
[16:24:10] Data-Persistence-Backup, Patch-For-Review: Upgrade pending stretch backup hosts to buster - https://phabricator.wikimedia.org/T280979 (jcrespo)
[18:11:25] marostegui: if you're still around, I'm ready to do the lists-next upgrade
[18:14:23] or kormat ^
[21:28:29] DBA: New database request: image_matching - https://phabricator.wikimedia.org/T280042 (Eevans) >>! In T280042#7034010, @gmodena wrote: > Hey @eevans, @Marostegui, > > The full dataset for ImageMatching, generated on 321 wikis, is 2.6GB. It contains `23585365` records. To be clear, a //record// as it is ref...
[21:37:48] DBA: New database request: image_matching - https://phabricator.wikimedia.org/T280042 (Eevans) >>! In T280042#7034256, @gmodena wrote: > Maybe premature optimisation, but this dataset stores text fields (part of a potential primary key) that can be relatively long (page titles, image names). Do we have guide...
[22:01:33] PROBLEM - MariaDB sustained replica lag on pc2007 is CRITICAL: 3 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2007&var-port=9104
[22:03:57] RECOVERY - MariaDB sustained replica lag on pc2007 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2007&var-port=9104