[06:11:16] DBA, Data-Services, Quarry: Quarry: Lost connection to MySQL server during query - https://phabricator.wikimedia.org/T246970 (Marostegui) >>! In T246970#5977262, @Mike_Peel wrote: > Looking at CPU usage at https://grafana.wikimedia.org/d/000000607/cluster-overview?orgId=1&from=now-30d&to=now I can't...
[06:22:15] DBA, Operations, Patch-For-Review, codfw-rollout: [RFC] improve parsercache replication, sharding and HA - https://phabricator.wikimedia.org/T133523 (Marostegui)
[06:47:46] DBA, Operations, Patch-For-Review, Wikimedia-Incident: investigate pc1008 for possible hardware issues / performance under high load - https://phabricator.wikimedia.org/T247787 (Marostegui) >>! In T247787#5977227, @wiki_willy wrote: > Sure, that works for me @Marostegui . Feel free to shoot open...
[06:54:43] DBA, Operations, Patch-For-Review, Wikimedia-Incident: investigate pc1008 for possible hardware issues / performance under high load - https://phabricator.wikimedia.org/T247787 (ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts: ` ['pc1008.e...
[08:16:07] DBA, Operations, Wikimedia-Incident: investigate pc1008 for possible hardware issues / performance under high load - https://phabricator.wikimedia.org/T247787 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['pc1008.eqiad.wmnet'] ` and were **ALL** successful.
[08:39:15] DBA, Upstream, mariadb-optimizer-bug: Slow query on 10.4: SpecialRecentChanges::doMainQuery - https://phabricator.wikimedia.org/T246069 (Marostegui) >>! In T246069#5914694, @Marostegui wrote: > Created this: https://jira.mariadb.org/browse/MDEV-21813 There's been movement on this bug: https://jira.m...
[08:48:40] did you test disk performance after reimage?
[08:53:06] I am on it :)
[08:53:11] About to finish it
[08:59:51] DBA, Operations, Wikimedia-Incident: investigate pc1008 for possible hardware issues / performance under high load - https://phabricator.wikimedia.org/T247787 (Marostegui) @wiki_willy I was able to destroy and recreate the RAID myself. So no further needs are expected at this point. While the raid w...
[09:10:45] DBA, Operations, Wikimedia-Incident: investigate pc1008 for possible hardware issues / performance under high load - https://phabricator.wikimedia.org/T247787 (Marostegui) So, given that pc1008 looks ok from those tests my proposal is: - Let pc1008 (pc2008 replicates from pc1008) replicate for maybe...
[09:40:44] DBA: Install 1 buster+10.4 host per section - https://phabricator.wikimedia.org/T246604 (Marostegui)
[09:40:55] DBA: Install 1 buster+10.4 host per section - https://phabricator.wikimedia.org/T246604 (Marostegui) Open→Resolved This is done
[11:16:17] DBA, Operations, Wikimedia-Incident: investigate pc1008 for possible hardware issues / performance under high load - https://phabricator.wikimedia.org/T247787 (Marostegui) p:Triage→Medium a:Marostegui
[12:12:48] DBA, wikitech.wikimedia.org: Move databases for wikitech (labswiki) and labstestwiki to a main cluster section (s5?) - https://phabricator.wikimedia.org/T167973 (Marostegui) `labtestwiki` no longer lives in m5 {T233236} ` root@cumin1001:/home/marostegui# mysql.py -hdb1133 -e "show databases like '%wik%'"...
[12:14:20] DBA, wikitech.wikimedia.org: Move database for wikitech (labswiki) to a main cluster section (s5?) - https://phabricator.wikimedia.org/T167973 (Marostegui)
[12:14:37] DBA, wikitech.wikimedia.org: Move database for wikitech (labswiki) to a main cluster section - https://phabricator.wikimedia.org/T167973 (Marostegui)
[12:21:18] DBA, Wikidata, wikidata-tech-focus: Set wb_changes_dispatch ROW_FORMAT=COMPRESSED on install and update - https://phabricator.wikimedia.org/T207006 (Marostegui) Stalled→Declined Going to decline this for now. We are compressing everything as part of T232446
[12:21:25] DBA, MediaWiki-General, Operations, Wikidata, and 6 others: Investigate decrease in wikidata dispatch times due to eqiad -> codfw DC switch - https://phabricator.wikimedia.org/T205865 (Marostegui)
[13:33:11] DBA, OTRS, Operations, Recommendation-API, Research: Upgrade and restart m2 primary database master (db1132) - https://phabricator.wikimedia.org/T246098 (Marostegui) m2 eqiad proxies that will require reload: dbproxy1015: active dbproxy1013: passive m2 codfw proxy requires no action. Hosts...
[14:01:41] marostegui: you can start with test wikidatawiki (s3 I think)
[14:02:10] I can rename it on codfw on s8 and s3
[14:04:17] Awesome
[14:04:42] Should I !log to that EPIC task?
[14:04:55] yeah
[14:04:57] why not :D
[14:06:59] I will as well include s4 anyways
[14:10:24] the most important thing is that replicas in labs should have it as tool builders are still using it (we announced deprecation a year ago though but you know)
[14:12:40] yeah, that's no problem
[14:25:04] DBA, Wikidata, wikidata-tech-focus, User-Addshore: [EPIC] Kill the wb_terms table - https://phabricator.wikimedia.org/T208425 (Marostegui) Following up the conversation on IRC with @Ladsgroup I have renamed `wb_terms` to `T208425_wb_terms` on the following hosts and wikis (all in codfw): ` root@...
[14:35:56] DBA, Data-Services, Quarry: Quarry: Lost connection to MySQL server during query - https://phabricator.wikimedia.org/T246970 (Mike_Peel) >>! In T246970#5978191, @Marostegui wrote: > Not sure if you were looking at the right side, the host involved here is labsdb1011, which has a high CPU usage @Maro...
[14:37:08] DBA, Quarry, Performance Issue: Quarry query became work much slower - https://phabricator.wikimedia.org/T247978 (Marostegui) Keep in mind that you are using a shared resource, which is currently being used a lot: https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&from=now-12h&to=now&var-dc=eqia...
[14:37:25] DBA, Data-Services, Quarry: Quarry query became work much slower - https://phabricator.wikimedia.org/T247978 (Marostegui)
[14:38:39] DBA, Data-Services, Quarry: Quarry query became work much slower - https://phabricator.wikimedia.org/T247978 (MBH) May it be the same as T246970?
[14:39:01] DBA, Data-Services, Quarry: Quarry: Lost connection to MySQL server during query - https://phabricator.wikimedia.org/T246970 (Marostegui) >>! In T246970#5979745, @Mike_Peel wrote: >>>! In T246970#5978191, @Marostegui wrote: >> Not sure if you were looking at the right side, the host involved here is...
[14:40:14] DBA, Data-Services, Quarry: Quarry query became work much slower - https://phabricator.wikimedia.org/T247978 (Marostegui) >>! In T247978#5979761, @MBH wrote: > May it be the same as T246970? Yes, most likely. This might be useful for your situation too: T246970#5977729
[15:12:17] DBA, Data-Services, Quarry: Quarry query became work much slower - https://phabricator.wikimedia.org/T247978 (Marostegui) I have tried your query on the other hosts and it takes around 2-3 minutes to complete. But it is also true that they are way less loaded. Just a quick command: ` [15:11:29] maros...
[15:28:42] DBA, Cloud-Services, CPT Initiatives (API Gateway): Prepare and check storage layer for dev.wikimedia.org - https://phabricator.wikimedia.org/T246946 (CCicalese_WMF)
[15:31:15] DBA, Data-Services, Quarry: Quarry query became work much slower - https://phabricator.wikimedia.org/T247978 (bd808) >>! In T247978#5979931, @Marostegui wrote: > The only thing I can think of to discard labsdb1011's specific host issues, would be to point quarry to another host for a few days (we'd n...
[15:36:36] DBA, Data-Services, Quarry: Quarry query became work much slower - https://phabricator.wikimedia.org/T247978 (Marostegui) >>! In T247978#5980018, @bd808 wrote: >>>! In T247978#5979931, @Marostegui wrote: >> The only thing I can think of to discard labsdb1011's specific host issues, would be to point...
[17:21:00] DBA, MediaWiki-General, PostgreSQL, Schema-change, Wikimedia-database-error: Some tables lack unique or primary keys, may allow confusing duplicate data - https://phabricator.wikimedia.org/T17441 (Reedy)
[17:21:13] Blocked-on-schema-change, GlobalUsage, Patch-For-Review, Schema-change, User-DannyS712: GlobalUsage table `globalimagelinks` lacks a primary key - https://phabricator.wikimedia.org/T243987 (Reedy) Stalled→Open Patch C+2'd, marking open. Noting it's already a PK in WMF prod for common...
[17:32:02] Blocked-on-schema-change, GlobalUsage, Schema-change, User-DannyS712: GlobalUsage table `globalimagelinks` lacks a primary key - https://phabricator.wikimedia.org/T243987 (DannyS712)
[17:47:12] We should look at this once we drop it from master: https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&refresh=5m&var-server=db1109&var-datasource=eqiad%20prometheus%2Fops&var-cluster=mysql&fullscreen&panelId=12&from=now-90d&to=now
[17:57:15] marostegui: ^ Can I jfdi applying the patch to testcommonswiki? As it's an empty table etc
[17:58:42] Reedy: we should just verify the PK is not existing on any host, for whatever reason, as that would break replication
[17:58:50] I assume not, but probably worth double checking
[17:58:52] heh
[17:59:05] I mean, for commons itself, that wouldn't surprise me
[17:59:14] for testcommons, it would
[17:59:40] I guess probably not, but should be a quick check
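The "quick check" discussed at the end of the log (confirming that no host already has a primary key on the table before the patch is applied) could be sketched roughly as below. This is only an illustration, not the tooling used in the channel: it assumes the `SHOW CREATE TABLE` output has already been collected from each host by some other means, and the function name, host names and table definitions are all hypothetical.

```python
import re

# Matches a PRIMARY KEY clause in SHOW CREATE TABLE output.
# Caveat: a column COMMENT containing the words "primary key" would also match.
_PK_RE = re.compile(r"\bPRIMARY\s+KEY\b", re.IGNORECASE)


def hosts_with_primary_key(create_statements):
    """Return the sorted host names whose table definition already has a PK.

    create_statements maps a host name to the text returned by
    SHOW CREATE TABLE on that host (hypothetical input, gathered elsewhere).
    """
    return sorted(
        host for host, ddl in create_statements.items() if _PK_RE.search(ddl)
    )


if __name__ == "__main__":
    # Made-up hosts and definitions, for illustration only.
    sample = {
        "host-a": "CREATE TABLE `t` (\n  `a` int NOT NULL,\n  PRIMARY KEY (`a`)\n)",
        "host-b": "CREATE TABLE `t` (\n  `a` int NOT NULL,\n  UNIQUE KEY `a_idx` (`a`)\n)",
    }
    already_done = hosts_with_primary_key(sample)
    if already_done:
        print("PK already present on:", ", ".join(already_done))
    else:
        print("No host has the PK yet")
```

An empty result would mean the change is safe to replicate everywhere; any host in the list would need to be handled separately before the patch goes out.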