[02:58:45] DBA, MediaWiki-Platform-Team, MediaWiki-Special-pages, Wikimedia-Site-requests, and 2 others: "Invalid DB key" errors on various special pages - https://phabricator.wikimedia.org/T155091#3204719 (TTO) I understand this week is going to be very busy for the #MediaWiki-Platform-Team due to the data...
[06:14:07] DBA, AbuseFilter, Performance-Team, Patch-For-Review: AFComputedVariable::compute query timeouts - https://phabricator.wikimedia.org/T116557#3204792 (Marostegui) >>! In T116557#3204426, @jcrespo wrote: > ``` > root@db2062[enwiki]> ANALYZE TABLE revision;...
[06:49:54] DBA, Wikidata: Repeated reports of wikidatawiki (s5) API going read only - https://phabricator.wikimedia.org/T123867#3204830 (Ladsgroup) Shard and datacenter is wrong in the grafana link. I think the correct one is: https://grafana.wikimedia.org/dashboard/db/mysql-aggregated?panelId=6&fullscreen&orgId=1&...
[07:01:38] DBA, AbuseFilter, Performance-Team, Patch-For-Review: AFComputedVariable::compute query timeouts - https://phabricator.wikimedia.org/T116557#3204852 (jcrespo) I don't see that: ``` root@db1080[enwiki]> EXPLAIN SELECT /* AFComputedVariable::{closure} */ rev_user_text FROM `revision` WHERE...
[07:02:05] ^ wtf!!!!!!
[07:02:10] How's that even possible?!
[07:02:56] db1067 is as you say
[07:03:01] DBA, AbuseFilter, Performance-Team, Patch-For-Review: AFComputedVariable::compute query timeouts - https://phabricator.wikimedia.org/T116557#3204853 (Marostegui) That is _so strange_ same query, just some time later and now it filesorts?!!!
[07:04:23] but how is that even possible, db1080 wasn't filesorting an hour ago?!
[07:04:45] it is not filesorting for me now ?????
[07:05:40] mm it is not the same query, the rev_page is different
[07:06:42] DBA, AbuseFilter, Performance-Team, Patch-For-Review: AFComputedVariable::compute query timeouts - https://phabricator.wikimedia.org/T116557#3204856 (Marostegui) It is not the same query: ``` rev_page = '1743794' ``` vs ``` rev_page= '17437194' ```
[07:08:45] DBA, Wikidata: Repeated reports of wikidatawiki (s5) API going read only - https://phabricator.wikimedia.org/T123867#3204857 (Ladsgroup) Thanks. [[ https://tendril.wikimedia.org/report/slow_queries?host=%5Edb&user=wikiuser&schema=wikidatawiki&qmode=eq&query=&hours=30 | This is a sample of slow queries in...
[07:56:24] DBA, MediaWiki-General-or-Unknown: Timeout in WikiPage::insertRedirectEntry after move - https://phabricator.wikimedia.org/T163597#3204920 (Marostegui) >>! In T163597#3203277, @Umherirrender wrote: > It is punctual, the user gets an error 3 times, on the 4. move it works If this is a one time issue, sha...
[08:16:00] DBA, MediaWiki-API, Performance: action=query&list=pagepropnames really slow on a big wiki, got error with ppnlimit=500 - https://phabricator.wikimedia.org/T115825#1733673 (jcrespo)
[08:49:44] DBA: Reclone db1062 from db1041 (s7 master) - https://phabricator.wikimedia.org/T163665#3205105 (Marostegui)
[09:29:38] DBA: Reclone db1062 from db1041 (s7 master) - https://phabricator.wikimedia.org/T163665#3205237 (Marostegui) Copying db1062's sqldata to dbstore1001:/srv/tmp as a backup
[10:23:04] DBA, MediaWiki-API, Performance: action=query&list=pagepropnames really slow on a big wiki, got error with ppnlimit=500 function: /* ApiQueryPagePropNames::execute */ - https://phabricator.wikimedia.org/T115825#3205333 (Ladsgroup) From what I see in the query plan it looks like a [[ https://dev.mysql...
[10:32:30] DBA, Wikimedia-Site-requests, Patch-For-Review: Recreate a wiki for Wikimedia Portugal - https://phabricator.wikimedia.org/T126832#3205347 (Dereckson)
[10:32:38] DBA, Wikimedia-Site-requests, Patch-For-Review: Recreate a wiki for Wikimedia Portugal - https://phabricator.wikimedia.org/T126832#2212857 (Dereckson) In a few minutes, pt.wikimedia.org will temporarily redirect to pt.wikipedia.org (as handled again by our application server, with an entrypoint happy...
[10:32:43] DBA, Wikimedia-Site-requests, Patch-For-Review: Recreate a wiki for Wikimedia Portugal - https://phabricator.wikimedia.org/T126832#3205349 (Dereckson)
[11:38:58] jynus marostegui: Hey, regarding https://phabricator.wikimedia.org/T115825 It's a known problem in optimizer: https://dba.stackexchange.com/questions/59347/why-does-mysql-fail-to-use-an-index-effectively-when-limit-offset-is-large http://stackoverflow.com/questions/4481388/why-does-mysql-higher-limit-offset-slow-the-query-down
[11:38:58] https://explainextended.com/2009/10/23/mysql-order-by-limit-performance-late-row-lookups/
[11:39:23] I want to drop limit from the db query and handle it in php instead (even though it sounds horrible)
[11:39:52] Before doing so, I just wanted to ask if you have any suggestions that might help
[11:39:57] no, it is not that
[11:40:14] the query mentioned doesn't have an offset
[11:40:21] * jynus in a meeting
[11:40:25] I actively tested it, by removing limit it uses the loose index
[11:40:50] you linked to answers about an offset
[11:40:51] it is not that
[11:41:22] hmm, yeah, sorry
[11:41:28] let me check and come back to you
[11:43:54] https://bugs.mysql.com/bug.php?id=61517
[11:44:00] This looks related
[11:57:31] can you get the EXPLAIN / Handler stats to compare it?
[12:03:12] jynus: can you double check: db2018.codfw.wmnet (s3) master, stop slave; reset slave all;
[12:04:08] no rush, I am going to get lunch now
[12:04:11] just leaving it here :)
[12:07:16] yes, that should be ok
[12:07:47] I always do SHOW SLAVE STATUS just in case I mistype on the wrong host, but I assume you do the same
[12:07:56] hehe yes
[12:08:00] i also do select @@hostname
[12:18:53] jynus: the explains are identical to https://phabricator.wikimedia.org/T115825#2424974 (I pasted one in the duplicated phab card) when I remove the limit, it uses the loose index
[12:19:04] if you still need it, I can get it
[12:31:09] maybe we can upgrade to 5.6 and test if it works (labs is 5.5) someone there said they can't reproduce it in 5.6
[13:05:03] Blocked-on-schema-change, Wikidata, Patch-For-Review, Wikidata-Sprint: Deploy schema change for adding term_full_entity_id column to wb_terms table - https://phabricator.wikimedia.org/T162539#3205837 (Marostegui) db1087 is done: ``` root@neodymium:/home/marostegui/git/software/dbtools# mysql --sk...
[13:16:01] DBA, MediaWiki-API, Performance: action=query&list=pagepropnames really slow on a big wiki, got error with ppnlimit=500 function: /* ApiQueryPagePropNames::execute */ - https://phabricator.wikimedia.org/T115825#3205925 (Ladsgroup) ``` ladsgroup@naos:~$ mysql --help mysql Ver 14.14 Distrib 5.5.54, fo...
[13:16:38] DBA, Wikimedia-log-errors: metawiki: Error: 1146 Table 'dtywiki.linter' doesn't exist (10.192.32.110) - https://phabricator.wikimedia.org/T163688#3205935 (hashar)
[13:20:35] DBA, MediaWiki-extensions-Linter, Wikimedia-log-errors: metawiki: Error: 1146 Table 'dtywiki.linter' doesn't exist (10.192.32.110) - https://phabricator.wikimedia.org/T163688#3205949 (hashar) Seems that is due to #linter being deployed an hour or so ago https://gerrit.wikimedia.org/r/#/c/347217/5
[13:23:13] DBA, MediaWiki-API, Performance: action=query&list=pagepropnames really slow on a big wiki, got error with ppnlimit=500 function: /* ApiQueryPagePropNames::execute */ - https://phabricator.wikimedia.org/T115825#3205954 (jcrespo) @Ladsgroup - there is no mysql installed on naos, what you are seeing is...
[13:23:59] jynus: Mysql 5.7 released only 13 days ago
[13:24:10] https://en.wikipedia.org/wiki/MySQL
[13:24:21] (I didn't change the date, check history :D)
[13:24:27] Amir1, are you a bot?
[13:24:30] Amir1: https://dev.mysql.com/doc/relnotes/mysql/5.7/en/
[13:24:42] no, why?
[13:24:56] I can say I pass the Turing test
[13:24:57] https://en.wikipedia.org/wiki/MySQL#cite_note-42
[13:25:22] replication on db2018 has been reset, so db2018 (s3) is no longer a slave of eqiad
[13:25:52] I am going to ask you to calm down, and double check the patches you are sending for review
[13:25:57] "Document generated on: 2017-04-22 (revision: 11471)"
[13:26:06] marostegui, I think he is a bot
[13:26:38] Maybe I'm missing something obvious here
[13:27:47] Anyway.
I rest and try to see what I'm doing wrong
[13:46:21] DBA, Labs, Labs-Infrastructure: ug_expiry column of the user_groups table is not present on Labs - https://phabricator.wikimedia.org/T160686#3206005 (Marostegui) Better to be tracked here: T155605
[14:21:48] DBA, Operations, ops-codfw, Patch-For-Review: pdu phase inbalances: ps1-a3-codfw, ps1-c6-codfw, & ps1-d6-codfw - https://phabricator.wikimedia.org/T163339#3206135 (Marostegui) @Papaul you can do the maintenance on db2043 and db2061 now. They have been depooled. Please let me know when it is done,...
[14:31:20] jynus: going to deploy alter table on db1075 on etwiki.watchlist
[14:32:07] +1
[14:32:29] done
[14:32:45] checking labs, dbstores and all that now
[14:34:43] dbstore looks good, watchlist isn't replicated to labs
[14:35:55] true
[14:36:08] I have a horrible terbium script summarizing it
[14:36:17] that I have yet to puppetize
[14:43:46] Blocked-on-schema-change, DBA, Expiring-Watchlist-Items, MediaWiki-Watchlist, and 3 others: Add wl_id to watchlist tables on production dbs - https://phabricator.wikimedia.org/T130067#3206224 (Marostegui) We have done the first test deploying this directly on the master in eqiad. What we have don...
[14:45:06] of all the things I said I would do, any preference? dbstore1002?
[14:45:27] I mean dbstore1001
[14:45:38] I would go for dbstore1001
[14:46:18] we didn't talk about db1031 :-/
[14:46:28] x1-eqiad-master
[14:46:34] and master of dbstore1001
[14:46:35] oh
[14:46:40] that is true
[14:46:46] not sure we have nodes...
[14:47:10] yeah, we lost db1057…or whatever hostname it was
[14:47:14] we'd need to double check yes
[14:47:17] if we have nodes
[14:47:35] planning: # missing 2 (smaller) servers for x1
[14:47:47] 57 was going to be for m1
[14:48:24] db1050 ?
[14:48:46] at least until next fiscal year
[14:49:00] oh, that could be a good one
[14:49:09] but it is more work
[14:49:16] reimage it, etc.
[14:49:30] we can try to see if we have time for it
[14:49:51] it is going to be tight
[14:49:54] for now I will leave dbstore1001 replicating from it
[14:50:02] yeah
[15:24:49] DBA, Operations, ops-codfw, Patch-For-Review: pdu phase inbalances: ps1-a3-codfw, ps1-c6-codfw, & ps1-d6-codfw - https://phabricator.wikimedia.org/T163339#3206426 (Papaul) @Marostegui we are clear for db2061 Tower A Loads: X 11.16 Y 8.61 Z 10.46 Tower B Loads: X 11.03 Y 7.93 Z 10.66 no more w...
[15:25:32] DBA, Operations, ops-codfw, Patch-For-Review: pdu phase inbalances: ps1-a3-codfw, ps1-c6-codfw, & ps1-d6-codfw - https://phabricator.wikimedia.org/T163339#3206434 (Marostegui) >>! In T163339#3206426, @Papaul wrote: > @Marostegui we are clear for db2061 > > Tower A Loads: X 11.16 Y 8.61 Z 10.46...
[15:27:13] do you have an eta for db1062 to be up again?
[15:27:26] with the new data?
[15:27:32] I can bring it up now really
[15:27:38] Still mysqldumping on db1041
[15:27:40] in whatever state it is
[15:27:43] sure
[15:27:46] let me bring it back
[15:27:54] what I mean is
[15:28:05] I need to put dbstore1001 pointing to it
[15:28:16] should I wait?
[15:28:17] ah, yeah, but not yet, because it doesn't have the right data :(
[15:28:20] ok
[16:11:19] DBA, AbuseFilter, Performance-Team, Patch-For-Review: AFComputedVariable::compute query timeouts - https://phabricator.wikimedia.org/T116557#3206807 (Anomie) >>! In T116557#3204495, @jcrespo wrote: > CC @Anomie because I think he was hoping 10.1 would solve this or very similar issues, but it doe...
[16:18:23] DBA, AbuseFilter, Performance-Team, Patch-For-Review: AFComputedVariable::compute query timeouts - https://phabricator.wikimedia.org/T116557#3206859 (jcrespo) Let's add a force- If someone tells me where that code is, I can add it.
[16:59:03] DBA, AbuseFilter, Performance-Team, Patch-For-Review: AFComputedVariable::compute query timeouts - https://phabricator.wikimedia.org/T116557#3206980 (Anomie) >>! In T116557#3206859, @jcrespo wrote: > Let's add a force- If someone tells me where that code is, I can add it. Looks like it's [[https...
[17:17:59] DBA, AbuseFilter, Performance-Team, Patch-For-Review: AFComputedVariable::compute query timeouts - https://phabricator.wikimedia.org/T116557#3207084 (jcrespo) Thank you- seeing the code, I think there may be a change for a more elegant solution later- as the query is not really what it is wanted,...
[19:24:01] DBA, Expiring-Watchlist-Items, MediaWiki-Watchlist, TCB-Team, and 3 others: Add wl_timestamp to the watchlist table - https://phabricator.wikimedia.org/T125991#2002327 (jcrespo) How stable is this? Stable enough to deploy the schema on WMF before merging, or are there people that doesn't agree th...