[01:30:22] DBA, Operations, Phabricator: Upgrade m3 (phabricator) db servers - https://phabricator.wikimedia.org/T138460#2581152 (Dzahn)
[05:34:01] Blocked-on-schema-change, DBA, ArchCom-RfC, Wikimedia-Site-requests, and 2 others: Schema change for page content language - https://phabricator.wikimedia.org/T69223#2581292 (RobLa-WMF) During E263, Jaime (@jcrespo) put the following choice to us. Should he: a. Apply this change to all wikis b...
[07:08:58] Blocked-on-schema-change, DBA, ArchCom-RfC, Wikimedia-Site-requests, and 2 others: Schema change for page content language - https://phabricator.wikimedia.org/T69223#2581426 (Nikerabbit) > 21:33:56 jynus: I'm not sure why people seemed to want this optional, perhaps it was only a sugg...
[07:12:29] Blocked-on-schema-change, DBA, ArchCom-RfC, Wikimedia-Site-requests, and 2 others: Schema change for page content language - https://phabricator.wikimedia.org/T69223#2581428 (jcrespo) I hope with this, (a ratification of Mediawiki's Architecture committee), that is perfectly //documented and annou...
[07:16:56] DBA: s3 throughput tripled since 24 august - https://phabricator.wikimedia.org/T143862#2581442 (jcrespo)
[07:20:58] DBA: s3 throughput tripled since 24 august - https://phabricator.wikimedia.org/T143862#2581459 (jcrespo) ``` mysql -BN -h db1077 -e "SHOW PROCESSLIST" | awk '{print $3}' | cut -d':' -f1 | sort | uniq -c | sort -nr | head -n10 12 10.64.32.34 12 10.64.32.33 11 10.64.32.149 9 10.64.48.141...
[07:23:08] DBA: s3 throughput tripled since 24 august - https://phabricator.wikimedia.org/T143862#2581460 (jcrespo) For starters, snapshot1006.eqiad.wmnet is accessing a non-dump host; this is a bug, but I do not think is the problem here.
[07:33:38] DBA, Discovery, Services: s3 throughput tripled since 24 august - https://phabricator.wikimedia.org/T143862#2581464 (jcrespo) The queries seem to be, at least in part Title::loadRestrictions from job runners.
Potential offenders: * RestbaseUpdateJobOnDependencyChange * cirrusSearchCheckerJob Please...
[07:59:01] DBA, Discovery, Services: s3 throughput tripled since 24 august - https://phabricator.wikimedia.org/T143862#2581551 (ArielGlenn) >>! In T143862#2581460, @jcrespo wrote: > For starters, snapshot1006.eqiad.wmnet is accessing a non-dump host; this is a bug, but I do not think is the problem here. Maybe...
[08:04:36] DBA, Discovery, Services: s3 throughput tripled since 24 august - https://phabricator.wikimedia.org/T143862#2581556 (Gehel) On the CirrusSide, here is what I know: * 18:44 UTC: [[ https://wikitech.wikimedia.org/w/index.php?title=Server_Admin_Log&diff=818236&oldid=818235 | Config change ]] to send Mo...
[08:16:03] DBA, Discovery: s3 throughput tripled since 24 august - https://phabricator.wikimedia.org/T143862#2581572 (jcrespo) ``` jynus: T143862 is likely related to the saneitizer issue. dcausse is looking into it. ```
[08:59:16] DBA, Dumps-Generation: Some dump hosts are accessing main traffic servers - https://phabricator.wikimedia.org/T143870#2581645 (jcrespo)
[11:10:40] DBA, Dumps-Generation: Some dump hosts are accessing main traffic servers - https://phabricator.wikimedia.org/T143870#2581968 (ArielGlenn) I camped on one of the s3 slave dbs and watched for any queries from snapshot1005/6 but nada. Can you grab an example for me? Or give me a cheater's way to grab an...
[11:11:21] DBA, Dumps-Generation: Some dump hosts are accessing main traffic servers - https://phabricator.wikimedia.org/T143870#2581969 (ArielGlenn) p:Triage>Normal a:ArielGlenn
[11:12:11] DBA, Dumps-Generation: Some dump hosts are accessing main traffic servers - https://phabricator.wikimedia.org/T143870#2581982 (jcrespo) Could it be a reverse dns-caching bug? I will check the statistics to give you suspected IPs.
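Editor's sketch: the per-client tally that jcrespo ran at 07:20:58 can be wrapped in a small reusable function, which may help when hunting for suspected IPs as discussed above. This is only an illustration, not the command actually used on the cluster. The function name `top_clients` is hypothetical, and it assumes tab-separated `SHOW PROCESSLIST` rows (as `mysql -BN` emits them) with the Host column in `address:port` form, so IPv6 addresses would not be handled.

```shell
# Hypothetical helper (not from the log): count connections per client host.
# Reads tab-separated SHOW PROCESSLIST rows on stdin; column 3 is Host,
# assumed to look like "address:port" (or a bare name such as "localhost").
top_clients() {
    limit="${1:-10}"    # number of hosts to print, default 10
    awk -F'\t' '{ split($3, hp, ":"); count[hp[1]]++ }
         END { for (h in count) printf "%d %s\n", count[h], h }' |
        sort -k1,1nr -k2,2 |
        head -n "$limit"
}

# Possible use against a live replica (host name taken from the log):
#   mysql -BN -h db1077 -e "SHOW PROCESSLIST" | top_clients 10
```

Splitting on tabs instead of whitespace keeps the Host column in position 3 even if another column is empty, and the secondary sort key makes the order of ties deterministic.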
[11:12:29] DBA, Dumps-Generation: Some dump hosts are accessing main traffic servers - https://phabricator.wikimedia.org/T143870#2581984 (ArielGlenn) I've claimed this bug for now for anything that is not related to wikidata dumps out of cron. That means any queries not coming from snapshot1007. When I've got that...
[11:13:00] DBA, Discovery, Patch-For-Review: s3 throughput tripled since 24 august - https://phabricator.wikimedia.org/T143862#2581442 (mobrovac) Around that time (2016-08-24T19:30Z) we deployed #changeprop also, which was going through a backlog with elevated speed (~2k reqs/sec), but most of them were Varnish pu...
[11:32:39] DBA, Dumps-Generation: Some dump hosts are accessing main traffic servers - https://phabricator.wikimedia.org/T143870#2582030 (jcrespo) These are the stats since last restart for db1077 (s3), I can check other hosts, too: ``` $ mysql -h db1077 sys -e "SELECT * FROM host_summary_by_statement_type \ WHERE...
[12:10:01] DBA, Discovery, Discovery-Search-Sprint, Patch-For-Review: s3 throughput tripled since 24 august - https://phabricator.wikimedia.org/T143862#2582068 (dcausse) p:Triage>Unbreak!
[12:11:06] DBA, Discovery, Discovery-Search-Sprint, Patch-For-Review: s3 throughput tripled since 24 august - https://phabricator.wikimedia.org/T143862#2582071 (dcausse) raising to UBN, https://gerrit.wikimedia.org/r/306649 should be deployed before wmf16 reaches group2.
[12:11:31] DBA, Discovery, Discovery-Search-Sprint, Patch-For-Review: s3 throughput tripled since 24 august - https://phabricator.wikimedia.org/T143862#2582072 (dcausse) a:dcausse
[12:49:15] DBA, Dumps-Generation: Some dump hosts are accessing main traffic servers - https://phabricator.wikimedia.org/T143870#2582135 (ArielGlenn) db1077 was the host I was watching. so that's perfect. I really want to know what those selects were. Can we get a full process entry on one of those?
[12:52:55] DBA, Dumps-Generation: Some dump hosts are accessing main traffic servers - https://phabricator.wikimedia.org/T143870#2582139 (jcrespo) I think so, it is just it is not very immediate because `149.29 ms` queries are not caught by the current monitoring, and the new one is still WIP (its backend active, b...
[13:45:18] Blocked-on-schema-change, DBA, ArchCom-RfC, Wikimedia-Site-requests, and 2 others: Schema change for page content language - https://phabricator.wikimedia.org/T69223#2582349 (MZMcBride) Legoktm made this edit yesterday:
DBA, Operations: Display lag on grafana (prometheus) and dbtree from pt-heartbeat instead (or in addition) of Seconds_Behind_Master - https://phabricator.wikimedia.org/T141968#2582454 (jcrespo)
[14:15:28] DBA, Monitoring, Operations: Display lag on grafana (prometheus) and dbtree from pt-heartbeat instead (or in addition) of Seconds_Behind_Master - https://phabricator.wikimedia.org/T141968#2518246 (jcrespo)
[14:16:39] DBA, Monitoring, Operations: Display lag on grafana (prometheus) and dbtree from pt-heartbeat instead (or in addition) of Seconds_Behind_Master - https://phabricator.wikimedia.org/T141968#2582477 (jcrespo)
[16:36:50] DBA, Discovery, Discovery-Search-Sprint, Patch-For-Review: s3 throughput tripled since 24 august - https://phabricator.wikimedia.org/T143862#2582873 (EBernhardson) I've deleted the relevant job queues across all wikis which should reduce the load for now. Until the above patch is deployed though...
[16:53:23] DBA, Discovery, Discovery-Search-Sprint, Patch-For-Review: s3 throughput tripled since 24 august - https://phabricator.wikimedia.org/T143862#2582929 (jcrespo) I confirm it worked: {F4400172}
[17:00:37] DBA, Discovery, Discovery-Search-Sprint, Patch-For-Review, WMF-deploy-2016-08-30_(1.28.0-wmf.17): s3 throughput tripled since 24 august - https://phabricator.wikimedia.org/T143862#2582988 (jcrespo) You can ignore the subtask and close this independently, I just wanted to write a follow-up to...
[17:22:40] DBA, Tracking: Database tables to be dropped on Wikimedia wikis and other WMF databases (tracking) - https://phabricator.wikimedia.org/T54921#2583070 (jcrespo) >>! In T54921#2444234, @Nemo_bis wrote: >>>! In T54921#2428438, @jcrespo wrote: >> The following tables haven't yet been updated this month, some...
[17:39:25] DBA: hitcounter and _counter tables are on the cluster but were deleted/unsused? - https://phabricator.wikimedia.org/T132837#2211884 (demon) Confirmed: the `hitcounter` table, `page.page_counter` field and `site_stats.ss_total_views` field were all completely removed from MW core (about 2 major release cycl...
[17:51:48] DBA: hitcounter and _counter tables are on the cluster but were deleted/unsused? - https://phabricator.wikimedia.org/T132837#2583193 (jcrespo) Thank you demon, this will help solving one of the most important issues with unused tables (due to its specific nature, causing outages).
[19:07:53] DBA, Discovery, Discovery-Search-Sprint, Patch-For-Review, and 2 others: s3 throughput tripled since 24 august - https://phabricator.wikimedia.org/T143862#2583568 (dcausse) Open>Resolved
[20:03:55] DBA, CirrusSearch, Discovery, Discovery-Search-Backlog, Discovery-Search-Sprint: MySQL chooses poor query plan for link counting query - https://phabricator.wikimedia.org/T143932#2583671 (EBernhardson)
[20:04:47] DBA, CirrusSearch, Discovery, Discovery-Search-Backlog, Discovery-Search-Sprint: MySQL chooses poor query plan for link counting query - https://phabricator.wikimedia.org/T143932#2583686 (EBernhardson) @jynus Is there anything we can do to help mysql generate better query plans here? I can ad...
[20:16:05] DBA, Labs: s2 replag currently 8 hours - https://phabricator.wikimedia.org/T143934#2583734 (valhallasw)
[20:20:09] DBA, CirrusSearch, Discovery, Discovery-Search-Backlog, and 2 others: MySQL chooses poor query plan for link counting query - https://phabricator.wikimedia.org/T143932#2583753 (EBernhardson) example queries can be sourced from: https://logstash.wikimedia.org/goto/e58270ea1df1dd8d31b6b262674326fa
[20:34:20] DBA, CirrusSearch, Discovery, Discovery-Search-Sprint, Patch-For-Review: MySQL chooses poor query plan for link counting query - https://phabricator.wikimedia.org/T143932#2583829 (ksmith)
[21:16:08] DBA, Labs: s2 replag currently 8 hours - https://phabricator.wikimedia.org/T143934#2584028 (AlexMonk-WMF)
[21:16:13] DBA, Labs, Operations, Tracking: Database replication services (tracking) - https://phabricator.wikimedia.org/T50930#2584027 (AlexMonk-WMF)
[23:54:58] DBA, Community-Tech-Tool-Labs, Striker, Patch-For-Review: Create production database and users for Striker - https://phabricator.wikimedia.org/T142545#2584649 (bd808) Open>Resolved a:jcrespo