[11:33:54] <_joe_> ema, vgutierrez users reporting TLS errors starting today on #wikimedia-tech
[11:34:03] <_joe_> can you please join?
[13:26:29] Waiting:
[13:26:29] 9715 ms
[13:26:32] while loading WP
[13:26:36] from here, at home
[13:27:01] in interview right now, but seems problematic and potentially correlated with the above reports
[13:27:06] ema, vgutierrez ^
[13:27:22] :?
[13:27:32] loading en.wp takes 10 seconds here
[13:27:49] ack, could you check the cp server that's assigned to your IP?
[13:27:55] and network-wise (traceroute etc.) looks fine
[13:27:57] x-cache header should point it out
[13:28:53] cp1087 pass, cp3032 pass, cp3032 pass
[13:29:08] server: mw1253.eqiad.wmnet
[13:29:11] thx :)
[13:34:56] paravoid: is it a persistent issue? cp3032 got upgraded ~50 minutes ago
[13:35:03] it is
[13:35:16] (still in interview)
[13:35:23] ok sorry
[13:36:10] 12173 ms now
[13:44:58] seeing that it's persistent for you.. I'm going to downgrade cp3032 and check if that helps
[13:45:11] but it doesn't look good :_)
[14:01:55] ok, done
[14:02:24] what's the current status?
[14:02:55] vgutierrez: ^
[14:48:10] <_joe_> Krinkle is not here though
[14:50:34] _joe_: o/
[14:50:39] hey
[14:50:49] so a few users have been complaining of slowness in #-tech
[14:50:59] <_joe_> so, some users, and just some users, among which paravoid, see pages consistently slow
[14:51:01] so we have a small fraction of users seeing a very high Backend-Timing (10+ seconds), reproducible on mwdebug with profiles on
[14:51:03] and I experienced it myself, and I've been trying to troubleshoot
[14:51:18] https://performance.wikimedia.org/xhgui/run/view?id=5d541e293f3dfa731b8c5810
[14:51:21] I get ~8-9s consistently for one particular page ( https://en.wikipedia.org/wiki/Greece ), and as high as 15s
[14:51:22] <_joe_> looking at profile data, I think it's that for those users we re-parse the pages
[14:51:40] <_joe_> paravoid: all pages or just some?
[14:51:43] the profiling one took 29.9s, and timed out
[14:51:53] just this one (Greece) is the one I've reproduced it with
[14:52:13] <_joe_> try the obama page?
[14:52:46] <_joe_> Krinkle: normal profile: https://performance.wikimedia.org/xhgui/run/view?id=5d541ead3f3dfa731b00e268
[14:52:59] <_joe_> profile when run by paravoid https://performance.wikimedia.org/xhgui/run/view?id=5d541e293f3dfa731b8c5810
[14:53:12] also from that profiling run: https://tty.gr/s/enwiki-greece-timeout.png
[14:53:35] <_joe_> that's probably mwdebug being awesomely underpowered
[14:53:48] <_joe_> my fault then :)
[14:54:00] we're going to re-upgrade cp3032 nginx btw, since it was the only downgraded one in the fleet and doesn't seem to be causative
[14:54:40] Krinkle: another data point is that I can reproduce with just -H Cookie: enwikiUserName= enwikiUserID= centralauth_User= centralauth_Token= enwikiSession= centralauth_Session= and nothing else
[14:54:41] <_joe_> the point is - it shouldn't be re-parsed in the first place
[14:56:20] I'm looking at some graphs
[14:56:30] https://grafana.wikimedia.org/d/000000580/apache-backend-timing?orgId=1
[14:57:39] <_joe_> Krinkle: if, as I think, this is limited to some users, it's probably not noticeable much there
[14:57:51] https://grafana.wikimedia.org/d/000000038/navigation-timing-by-platform?refresh=5m&orgId=1&var-metric=responseStart&var-platform=desktop&var-users=authenticated
[14:58:30] https://grafana.wikimedia.org/d/000000106/parser-cache?orgId=1&from=now-24h&to=now
[14:58:32] are you looking at graphs to assess impact?
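For reference, the header check suggested above (x-cache to identify the cp servers, server and backend-timing to identify the appserver and its response time) can be done with a plain curl; a minimal sketch, assuming only standard curl options:

```bash
# Minimal sketch of the check suggested above: fetch the page, discard the
# body, and keep only the routing/timing response headers.
curl -s -o /dev/null -D - 'https://en.wikipedia.org/wiki/Greece' \
  | tr -d '\r' \
  | grep -iE '^(x-cache|server|backend-timing):'
```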
[14:58:41] Yes, and also for possible clues to narrow it down
[14:58:59] btw there's a sync-wikiversions entry there on the graph at 13:31. I thought I didn't see any deploys in ops logs
[14:59:21] those graphs are yet to render for me
[14:59:41] bblack: the metrics from the appservers began to be elevated at 12:54 or so
[14:59:43] It seems latest Chrome has a problem with grafana, it keeps locking my laptop completely
[14:59:45] using Firefox instead
[15:00:04] https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=10&fullscreen&edit&orgId=1&from=now-6h&to=now
[15:00:30] That increase, cdanis, aligns with the drop in parser cache
[15:00:30] https://grafana.wikimedia.org/d/000000106/parser-cache?panelId=3&fullscreen&orgId=1&from=now-2d&to=now
[15:00:38] which underwent a sharp drop for the first time in several days
[15:00:40] unsure why
[15:00:52] parsercache disk free space?
[15:01:07] oh, it's inverted, ugh
[15:01:11] thought it was space used
[15:01:34] pc hit rate did go down some
[15:01:41] I have no idea if related, but in the interest of highlighting timing correlations:
[15:01:46] 12:24 < James_F|Away> !log We're going to try making a new wiki. T212881
[15:01:46] T212881: addWiki.php broken creating ES tables - https://phabricator.wikimedia.org/T212881
[15:01:49] it seems more like a symptom though
[15:01:56] 12:47 < James_F> !log Wiki creation is still not working correctly, unfortunately.
[15:02:20] 13:50 < James_F> (We're still testing in prod, sorry anyone that wants to do something.)
[15:02:44] Yeah, sorry, deploys all done.
[15:03:06] and then as part of that: https://phabricator.wikimedia.org/T212881 went out around 13:50
[15:03:19] or something
[15:03:23] some patch referencing that
[15:03:23] * James_F looks at Reedy.
[15:03:40] ok, stupid question but... did anyone check logstash for anything out of the ordinary?
[15:03:56] paravoid: I quickly glanced and nothing stood out, but I'd welcome others to do so
[15:04:13] <_joe_> I did as well earlier for unrelated reasons
[15:05:13] <_joe_> it's pretty clear imho that we're considering parsercache invalid for some users
[15:05:27] <_joe_> paravoid: do you have a specific language setting for your user?
[15:05:32] no
[15:05:44] yeah, nothing stands out at logstash/mediawiki-errors. Both with and without PHP timeouts included, no spike
[15:05:45] <_joe_> I hoped a non-latin language was in play :/
[15:06:25] <_joe_> paravoid: I would be tempted to suggest to log out / log in
[15:06:27] The parser cache miss category did not go up for rejection/invalid.
[15:06:30] It only went up for expired
[15:06:42] <_joe_> or maybe log in from a private session
[15:06:58] <_joe_> and see if the problem persists
[15:07:01] and some for absent.
[15:07:18] So that might suggest a bulk of content just expired all at once, possibly due to being populated around the same time N days in the past?
[15:07:26] I think it'd be better not to, so as to have someone with a reproducible case :)
[15:07:33] That doesn't connect with the curl example people mentioned earlier though
[15:07:41] paravoid: you could log in again in an incognito window?
[15:07:43] it's 100% reproducible here fwiw
[15:07:54] paravoid: can you show again which one?
[15:07:58] which one what?
[15:08:04] the repro case you are using and how
[15:08:08] <_joe_> ok so
[15:08:49] I'd look for logstash under the hostname of the server that served the request (non-debug), and maybe also via requestId.
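The Logstash lookup Krinkle suggests here (by serving host and request id) can be scripted against the same curl output; a rough sketch, where the x-request-id header name is an assumption about what the edge actually returns rather than something confirmed in this log:

```bash
# Hedged sketch: capture the serving appserver and the request id from one
# response, then print a query string of the form used later in this log
# (host:... AND reqId:"..."). Assumes the request id comes back in an
# x-request-id response header.
hdrs=$(curl -s -o /dev/null -D - 'https://en.wikipedia.org/wiki/Greece' | tr -d '\r')
host=$(printf '%s\n' "$hdrs" | awk -F': ' 'tolower($1) == "server" {print $2}')
reqid=$(printf '%s\n' "$hdrs" | awk -F': ' 'tolower($1) == "x-request-id" {print $2}')
printf 'host:%s AND reqId:"%s"\n' "$host" "$reqid"
```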
[15:08:54] curl -vv 'https://en.wikipedia.org/wiki/Greece' -H 'Cookie: enwikiUserName=... enwikiUserID=...; centralauth_User=...; centralauth_Token=...; enwikiSession=...; centralauth_Session=...' 2>&1 |grep backend-timing
[15:08:58] < backend-timing: D=10822692 t=1565795317648302
[15:09:06] multiple MWs
[15:09:06] <_joe_> I just logged in in a private browser window, and I cannot reproduce the problem
[15:09:33] <_joe_> paravoid: can you try to log in from another session and see if the problem is tied to your session or to your user?
[15:10:24] just did, still reproducible from incognito
[15:10:28] https://grafana.wikimedia.org/d/000000278/mysql-aggregated?orgId=1&from=1565786732399&to=1565801943823&var-dc=eqiad%20prometheus%2Fops&var-group=core&var-shard=es1&var-shard=es3&var-role=All&panelId=8&fullscreen
[15:10:31] <_joe_> wtf
[15:10:32] I'm getting a very quick response with that example paravoid
[15:10:33] < server: mw1323.eqiad.wmnet
[15:10:34] < backend-timing: D=237033 t=1565795372190859
[15:10:34] < date: Wed, 14 Aug 2019 15:09:32 GMT
[15:10:34] < x-cache: cp1087 pass, cp3042 pass, cp3030 pass
[15:10:43] less than a second
[15:10:47] ^ this means content is being retrieved at a high rate since 15:00
[15:10:53] Krinkle: I know, everyone else is, but note that we also have multiple people in #-tech saying they're experiencing that kind of issue
[15:11:03] which means parsercache is not working as it should
[15:11:08] paravoid: which server are you hitting?
[15:11:12] maybe some kind of IP sharding?
[15:11:15] <_joe_> jynus: or well that it's not being used
[15:11:20] yeah, that
[15:11:23] Krinkle: all of them I think, it was even reproducible in mwdebug
[15:11:23] <_joe_> Krinkle: multiple backends
[15:11:30] Sounds like it would be network related in that case maybe? not MW. If it's consistent for you on multiple app servers but not for me e.g.
[15:11:40] right?
[15:11:41] basically, something is wrong with the normal cache workflow
[15:11:42] <_joe_> Krinkle: those are backend timings
[15:11:42] Krinkle: backend-timing is MW-set
[15:11:47] no I think it is something strange about specific user sessions
[15:12:02] Backend-Timing comes from the Apache directly in front of the appserver, Krinkle, it's basically Apache-measured PHP TTFB
[15:12:06] <_joe_> cdanis: users, we verified paravoid can reproduce
[15:12:07] paravoid: are you using literal ... or using an actual session?
[15:12:14] an actual session, my session
[15:12:20] strangely, it doesn't happen on es2, only on es1 and es3
[15:12:25] for all of those "..." values
[15:12:37] <_joe_> jynus: uh, well, that's indeed very strange
[15:12:58] cdanis: was something related to dbctl deployed at 15:00?
[15:13:00] paravoid: php7?
[15:13:10] no
[15:13:14] I guess not if using curl without cookies
[15:13:22] <_joe_> Krinkle: it shows up independently of the engine
[15:13:23] correct
[15:13:29] jynus: no
[15:13:51] I don't see anything relevant on SAL
[15:13:51] jynus: dbctl still does not touch es or parsercache settings, and all dbctl commits go to SAL
[15:14:10] I know, maybe it started to do so :-D, but there is nothing on SAL
[15:14:11] OK. Now it's slow for me as well on that page, paravoid, using curl and my actual session
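The D= field in those backend-timing headers is the Apache-measured request duration in microseconds, so D=10822692 above is ~10.8 s, matching the reported slowness; a small helper to convert a captured header line, as a sketch:

```bash
# Convert the D= value (microseconds) of a backend-timing header to seconds.
to_seconds() {
  sed -n 's/.*D=\([0-9][0-9]*\).*/\1/p' | awk '{ printf "%.3f s\n", $1 / 1000000 }'
}
echo 'backend-timing: D=10822692 t=1565795317648302' | to_seconds   # prints: 10.823 s
```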
[15:14:17] ~ 11 seconds
[15:14:22] awesome, I'm not crazy :)
[15:14:37] jynus' graph above is the first smoking gun we have
[15:14:41] < server: mw1323.eqiad.wmnet
[15:14:41] < x-cache: cp1087 pass, cp3042 pass, cp3030 pass
[15:14:41] < backend-timing: D=237033 t=1565795372190859
[15:14:47] you could still be crazy, we just lack proof now!
[15:14:48] my second guess would be higher load
[15:15:04] note my db focus is biased
[15:15:23] <_joe_> paravoid: well we have the data from the profiling sessions too
[15:15:38] <_joe_> the difference is in your case the page gets fully re-parsed
[15:15:44] <_joe_> so the parsercache gets skipped
[15:15:47] < server: mw1271.eqiad.wmnet
[15:15:47] < backend-timing: D=8474522 t=1565795702982884
[15:15:55] I would like to see the purge load too
[15:16:29] broken-record-mode: there was the creation of a new wiki, which was apparently problematic, around the right timeframe, and eventually a related config patch about something to do with ES and/or DB stuff later on
[15:16:41] oh, that could affect that
[15:16:48] but wasn't that several hours ago?
[15:17:10] <_joe_> bblack: that seems very related yes
[15:17:10] 14:01 reedy@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache + some expiration?
[15:17:15] The only message logged in Logstash for that request (+host:mw1271 AND reqId:"XVQldgpAAEIAAA66LfAAAADR") is: [slow-parse] 8.08s Greece
[15:17:22] From the PoolCounter layer
[15:17:38] <_joe_> so you are parsing that page, yes
[15:17:40] no info/warning/error or anything besides that
[15:17:41] Yep
[15:17:45] <_joe_> jynus: yeah it sounds like a candidate
[15:17:57] in fact, there were issues with es hosts on addWiki
[15:17:59] the very first logs about creating the wiki were ~12:24, and there was some dblists update from it around 13:50
[15:18:03] 13:53 <+logmsgbot> !log reedy@deploy1001 Synchronized dblists/: T212881 (duration: 00m 48s)
[15:18:04] T212881: addWiki.php broken creating ES tables - https://phabricator.wikimedia.org/T212881
[15:18:06] 13:53 <+stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:18:08] which relates to my previous graph
[15:18:09] 13:53 <+stashbot> T212881: addWiki.php broken creating ES tables - https://phabricator.wikimedia.org/T212881
[15:18:30] https://phabricator.wikimedia.org/T212881
[15:18:36] that one :-)
[15:19:29] <_joe_> how can creating a new wiki create issues on the other wikis, but only for some users?
[15:19:47] <_joe_> I'm willing to bet it happens to some users up to a certain ID or something equally horrific
[15:19:57] <_joe_> because some db update failed midway through
[15:20:00] well, hypothetically
[15:20:18] it could be stuck in a loop trying to store content for the new wiki
[15:20:21] <_joe_> yes, I'm trying to evolve that hypothesis
[15:20:22] and parsing
[15:20:33] or some other problem affecting other wikis
[15:20:37] <_joe_> but we have no errors
[15:20:38] varnish purges went up at ~12:00, though the rate itself doesn't seem unusual if you look back e.g. a week: https://grafana.wikimedia.org/d/000000180/varnish-http-requests?panelId=7&fullscreen&orgId=1&from=now-12h&to=now
[15:21:01] do you know the name of the purge job on the queue?
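The anonymous-versus-logged-in comparison being run here can be made side by side; a sketch, with $COOKIE standing in for a real session cookie string (deliberately not filled in):

```bash
# Fetch the same page anonymously and with a session cookie, then compare the
# backend-timing durations (microseconds). $COOKIE is a placeholder.
page='https://en.wikipedia.org/wiki/Greece'
extract_d() { tr -d '\r' | sed -n 's/.*[Bb]ackend-[Tt]iming: D=\([0-9][0-9]*\).*/\1/p'; }
anon=$(curl -s -o /dev/null -D - "$page" | extract_d)
auth=$(curl -s -o /dev/null -D - -H "Cookie: ${COOKIE:?set COOKIE first}" "$page" | extract_d)
echo "anonymous: ${anon} us, logged-in: ${auth} us"
```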
[15:21:07] just to rule that out too
[15:21:42] I don't know offhand, no; I hope it has purge in the name tho
[15:21:48] <_joe_> that would affect everyone
[15:21:56] <_joe_> not just paravoid and a handful of other users
[15:22:03] I know
[15:22:12] <_joe_> well, not knowing how many "a handful" is
[15:22:12] nothing strange on the job queue that I can see
[15:22:23] <_joe_> jynus: can you try with your historical wiki user?
[15:22:25] <_joe_> the private one
[15:22:37] <_joe_> not the wmf one
[15:22:38] sorry, didn't get that
[15:22:46] ah, my personal mw user?
[15:22:50] <_joe_> you have a wiki user from before joining the wmf
[15:22:51] <_joe_> yes
[15:22:58] checking
[15:22:59] <_joe_> that's bound to be an older account
[15:23:02] Greece?
[15:23:07] <_joe_> yep, for instance
[15:23:22] also are the problems for enwiki only or other wikis too?
[15:23:28] <_joe_> godog: others too
[15:23:36] users in #-tech reported a Serbian wiki
[15:23:37] <_joe_> godog: the original report was for srwiki IIRC
[15:23:39] instant load
[15:23:44] (not sure if it was wp)
[15:23:45] trying to reproduce as well with my user on multiple wikis, so far no luck
[15:23:48] thanks
[15:23:51] <_joe_> jynus: ok
[15:24:05] <_joe_> paravoid: you see the same slowness on every page?
[15:24:09] hitting random now
[15:24:09] no
[15:24:16] jobrunner errors from mediawiki: https://logstash.wikimedia.org/goto/b35ca9258e09e717fa6c97f89c524314
[15:24:20] (nothing stands out)
[15:24:24] <_joe_> if not, can you post the profiling data for a page where you don't see slowness?
[15:24:39] I don't see it on any other page I've checked
[15:24:49] <_joe_> obama loads fine?
[15:24:58] yeah I have a 10+yr old volunteer account, have never repro'd there
[15:24:59] <_joe_> wth is going on :P
[15:25:10] can reproduce it with Obama
[15:25:16] <_joe_> ok
[15:25:19] the number of rows read from parsercache is decreasing
[15:25:22] <_joe_> so I suspect it happens on every page
[15:25:26] <_joe_> but other pages are light
[15:25:29] <_joe_> so you don't notice
[15:25:29] yeah, possibly
[15:25:41] which means we are probably losing cache entries slowly?
[15:25:42] <_joe_> can you post the profiling for a page you don't see the problem on?
[15:25:53] <_joe_> jynus: more users with the problem
[15:26:05] <_joe_> ok, this is now a serious incident I think.
[15:26:11] actually never mind, I just repro'd backend-timing: D=8834809 t=1565796340524996
[15:26:13] <_joe_> if it's spreading
[15:26:19] on Greece
[15:26:26] _joe_: https://performance.wikimedia.org/xhgui/?url=XVQoDQpAIHsAAGeRUu4AAACX
[15:26:33] <_joe_> cdanis: can you repro consistently?
[15:26:51] no
[15:27:15] given the number of people-hours involved in debugging this, I'd say it's already a pretty serious incident :)
[15:27:49] only intermittently, _joe_
[15:27:50] Backend Save Timing does not appear to be affected; https://grafana.wikimedia.org/d/000000429/backend-save-timing-breakdown?refresh=5m&orgId=1&from=now-2d&to=now
[15:28:13] backend-timing: D=6015811 t=1565796420801192 on my first load of https://en.wikipedia.org/wiki/Massachusetts?xx=yy but not when reloading or changing the garbage query parameter
[15:28:44] <_joe_> oh so without cookies?
[15:28:55] no, logged in as an old volunteer account
[15:28:58] <_joe_> ok
[15:29:03] shdubsh and I have a 1:1 coming up, do you need me?
[15:29:04] backend-timing: D=2967710 t=1565796513387756 on another page
[15:29:07] <_joe_> you shouldn't need to do cache-busting then
[15:29:11] ah true
[15:29:20] <_joe_> in that case you are forcing a reparse
[15:29:26] * Krinkle editing PHP files on mwdebug1001
[15:29:28] <_joe_> no, scratch that
[15:30:47] chanserv changed the topic? lol
[15:31:01] paravoid: yes, +t in here and no one can get ops
[15:31:16] but I have special dispensation to change the topic via chanserv, we really need to fix the perms here
[15:34:00] would adding a new wiki change parsercache keys in any way?
[15:34:11] it still does not make sense that it is for a subset of users, ofc
[15:34:30] cdanis: it would only change it in the trivial way: by adding cached content for the new one
[15:39:51] so new guess: the deployment of the wiki had some collateral config change which doesn't break the parsing, but makes it less efficient
[15:39:58] [DEBUG] CdnCacheUpdate::HTCPPurge Purging URL https://en.m.wikipedia.org/wiki/Greece via HTCP
[15:40:04] Each time I request it via mwdebug, this also happens
[15:40:04] I will check the diffs
[15:40:05] which is odd
[15:40:17] But in line with what we're thinking
[15:40:36] (X-Wikimedia-Debug: … ; log)
[15:40:54] uff https://phabricator.wikimedia.org/T212881#5413761
[15:41:05] I have never managed to reproduce it on my WMF account, but can get it to happen intermittently on my ancient volunteer account
[15:41:09] "The ids look weird. It is using the wrong first two ids... possibly something to do with aawiki? Why does it choose 256 and 257 for the first text rows for aawiki?"
[15:41:11] I have no idea if that is even a data point
[15:41:51] I've dumped the cacheKey used from PoolWorkArticleView.php, which is "PoolWorkArticleView::__construct cache key: enwiki:pcache:idhash:12108-0!canonical!tmh-videojs" for me on Greece.
[15:42:01] and it's the same on repeat views.
[15:42:08] So it's not churning the key for example
[15:42:17] ok, my theory is invalidated
[15:42:29] so I've run out of ideas
[15:42:35] Krinkle: do you happen to know what was in the updates to T212881 that Reedy deleted? or what Reedy is up to now? :)
[15:42:43] T212881: addWiki.php broken creating ES tables - https://phabricator.wikimedia.org/T212881
[15:42:47] cdanis: I just !log-d on the wrong task
[15:42:53] it was some sync-files for the wiki creation
[15:42:54] ah ok
[15:44:39] Reedy is now next to me
[15:48:43] I managed to reproduce once while logged out, but only once; it happens much more often when logged into my volunteer account
[15:49:24] One hunch is that maybe poolcounter is misbehaving
[15:49:28] but we don't have a dash for it
[15:49:40] I'll be back in a bit, hackathon running out of food
[15:53:53] <_joe_> I don't think poolcounter can misbehave for paravoid and not for me
[16:01:51] infra foundations is having our weekly meeting now
[16:02:00] let me know if some or all of us are needed
[16:03:13] _joe_: right.
[16:03:22] I'm assuming for now most of the debugging is happening in-person with Krinkle & Reedy
[16:03:49] It is not. We are having pizza AFK as of 10min
[16:03:54] ok :)
[16:04:22] I can pretty consistently repro on my volunteer acct, lmk if that is helpful
[16:04:51] <_joe_> and not on your work acct?
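The "(X-Wikimedia-Debug: … ; log)" note above refers to routing a single request through a debug appserver with verbose logging enabled, which is what the WikimediaDebug browser extension does; a hedged sketch of doing the same from curl, with the backend=/log attribute syntax given as my best recollection of that header's format rather than as something confirmed in this log:

```bash
# Route one request through mwdebug1001 with logging enabled, then show where
# it landed and how long it took. The attribute syntax is an assumption, and
# $COOKIE is a placeholder for the affected session cookie string.
curl -s -o /dev/null -D - \
  -H 'X-Wikimedia-Debug: backend=mwdebug1001.eqiad.wmnet; log' \
  -H "Cookie: ${COOKIE:?set COOKIE first}" \
  'https://en.wikipedia.org/wiki/Greece' \
  | tr -d '\r' | grep -iE '^(server|backend-timing|x-cache):'
```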
[16:05:13] <_joe_> btw
[16:05:16] yes, have never repro'd from my WMF acct
[16:05:22] <_joe_> the timing spike we saw has recovered
[16:05:28] <_joe_> this problem had nothing to do with it
[16:05:38] <_joe_> so the time correlation with anything isn't so sure anymore
[16:05:47] <_joe_> we only noticed it now, it can have been ongoing for days
[16:05:56] you mean the earlier graphs taking off at ~12:54 thing?
[16:05:57] <_joe_> or well this morning or yesterday evening for all we know
[16:06:00] <_joe_> yes
[16:06:03] ok
[16:06:11] <_joe_> https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-12h&to=now&var-datasource=eqiad%20prometheus%2Fops&var-cluster=appserver&var-method=GET&var-code=200
[16:14:19] I have not been much involved with the incident but ping me if an extra SRE is needed
[16:14:39] I will be around either way
[16:19:34] <_joe_> Krinkle: I think we really need your help figuring out what's going on
[16:19:48] <_joe_> or anyone else who knows mw internals well enough
[16:20:52] <_joe_> what I have is: at some point today, after 15:30 I guess, for some logged in users we're reparsing (all?) pages consistently
[16:21:51] <_joe_> this https://grafana.wikimedia.org/d/000000278/mysql-aggregated?orgId=1&from=1565786732399&to=1565801943823&var-dc=eqiad%20prometheus%2Fops&var-group=core&var-shard=es1&var-shard=es3&var-role=All&panelId=8&fullscreen hints the problem started at 15:00
[16:22:28] <_joe_> the positive part is this is not degrading, afaict
[16:23:42] <_joe_> I would also add this latter graph can be a red herring, we have such steep rises in es reads all the time
[16:24:20] <_joe_> and we don't have users consistently taking forever to get a page because they're not hitting the parsercache ever
[16:25:37] paravoi.d's first report was ~13:26 here, and -tech reports go back to 10:37
[16:25:50] I think
[16:27:37] Hi again. I'll get a desk somewhere in 5 min
[16:28:18] <_joe_> bblack: ugh you're right
[16:28:21] Campus is closing down in 15 min.. shit
[16:28:32] <_joe_> I was somehow reading that completely wrong :/
[16:29:42] <_joe_> ok I guess I have to call up other people too
[16:29:56] also the -tech report was srwiki, which is s5, and the other was enwiki on s1
[16:30:01] so not shard-specific
[16:30:02] I'll have to get to the hotel first. There in 18-20 min. I'll keep thinking and be on IRC meanwhile
[16:30:53] One thing I'd like to know is whether the time is spent mostly in parsing, or also halted somewhere outside that
[16:31:10] <_joe_> Krinkle: it seems all in parsing
[16:31:14] E.g. compare backend timing to the slow-parse log entry
[16:31:33] <_joe_> but why is paravoid constantly parsing that page and I'm not?
[16:31:41] <_joe_> it's user-specific
[16:31:50] <_joe_> and that's inexplicable to me
[16:32:03] <_joe_> my user => use parsercache, fast response
[16:32:04] hey
[16:32:23] <_joe_> paravoid's user => reparses every page, slow response on large pages
[16:32:25] _joe_: note that i may be a mediawiki expert, but i'm no expert on debugging the live system
[16:32:29] you want Krinkle for that
[16:32:39] <_joe_> duesen: in a gist, this is our problem
[16:32:43] If we know that, it means there must be some cookie-, geo- or user-ID-dependent logic in the decision to consider a parser cache hit valid, or in the poolcounter interaction
[16:33:20] <_joe_> Krinkle: cdanis confirmed his volunteer account is affected, his wmf account is not
[16:33:26] <_joe_> so I don't think it's geoip
[16:33:29] yeah
[16:33:36] <_joe_> it's tied to the user
[16:33:37] same machine, just different Chrome profiles
[16:34:24] <_joe_> how can user-related data affect the decision to consider the parsercache valid or not?
[16:35:22] <_joe_> duesen: if you can answer this question ^^, that is basically what we need for now :P
[16:35:35] second question, is there any per-user data that can influence the parsercache key?
[16:35:38] parser cache entries vary on user options
[16:35:56] <_joe_> ok, which ones specifically?
[16:35:59] there is a "canonical" entry for every page, but there may be entries for e.g. output with a different thumbnail size (yes, really)
[16:36:18] The list is buried in ParserOptions, let me check
[16:36:20] <_joe_> ok so let me try to alter my settings
[16:36:22] anomie is the expert on this
[16:36:55] private static $inCacheKey = [
[16:36:55] 'dateformat' => true,
[16:36:55] 'numberheadings' => true,
[16:36:55] 'thumbsize' => true,
[16:36:55] 'stubthreshold' => true,
[16:36:56] 'printable' => true,
[16:36:58] 'userlang' => true,
[16:37:00] ];
[16:37:08] <_joe_> userlang I expected
[16:37:17] <_joe_> that's why I immediately asked paravoid about it
[16:37:27] but only effective if the page actually makes use of it, e.g. with {{int:xyz}}
[16:37:40] dateformat?
[16:38:05] probably only effective if used, i.e. the page contains {{DATEFORMAT:...}}
[16:38:10] thumbsize is the nasty one
[16:38:39] also, another little piece of evidence was all the script timeout errors in pvoid's browser view (although I don't remember if that was only on mwdebug)
[16:38:45] https://tty.gr/s/enwiki-greece-timeout.png
[16:38:55] bblack: I think that was just mwdebug
[16:39:01] oh ok
[16:39:05] <_joe_> bblack: that's because mwdebug dies under heavy parsing, yes
[16:39:18] what is the default thumbsize?
[16:39:35] we should just kill the thumbsize and numberheadings settings
[16:39:41] I have 200px on my volunteer acct and 220px on my WMF account
[16:39:53] I don't recall ever adjusting the size on either but my volunteer acct is so old I don't really know
[16:39:55] did you choose those, or did the defaults just change over time for new accounts?
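Since the question at this point is which of those $inCacheKey options differ between an affected and an unaffected account, one way to compare them directly is to print each user's parser-cache options hash from a maintenance shell. This is a sketch only: it assumes mwscript/eval.php access, and the ParserOptions method names (newFromUser, allCacheVaryingOptions, optionsHash) are quoted from memory of MediaWiki core of that era rather than from anything in this log.

```bash
# Print the parser-cache options hash for a given account so two accounts can
# be diffed. "SomeUserName" is a placeholder; method names are assumptions.
echo '$u = User::newFromName( "SomeUserName" ); $po = ParserOptions::newFromUser( $u ); print $po->optionsHash( ParserOptions::allCacheVaryingOptions() ) . "\n";' \
  | mwscript eval.php --wiki=enwiki
```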
[16:39:55] <_joe_> cdanis: ok so you can do this
[16:40:07] <_joe_> go to view-source:https://en.wikipedia.org/wiki/Greece
[16:40:16] <_joe_> and search for NewPP limit report
[16:40:35] <_joe_> I have Cached time: 20190814160703
[16:40:55] https://phabricator.wikimedia.org/P8909
[16:40:57] $wgThumbLimits is the relevant setting
[16:40:58] <_joe_> if you use an account that has a modified thumb size
[16:41:03] though i don't know which of them is the default
[16:41:05] <_joe_> you should see a different time
[16:41:13] <_joe_> but on reload, you should see the same time
[16:41:20] _joe_: I got a reparse *after* changing my thumbsize to 220px
[16:41:32] <_joe_> cdanis: uhm
[16:41:39] <_joe_> but I see the cached time is the same as mine
[16:41:59] 220px seems to be the default
[16:42:04] <_joe_> so that's not a reparse
[16:42:08] <_joe_> the one you pasted
[16:42:08] oh you are right
[16:42:12] funny, that's not even in the vanilla list in mw core
[16:42:26] <_joe_> so lemme try to debug this myself
[16:43:01] <_joe_> ok
[16:43:08] *snort*
[16:43:11] 'wgThumbLimits' => [
[16:43:11] 'default' => [ 120, 150, 180, 200, 220, 250, 300, 400 ],
[16:43:11] '+itwikiquote' => [ 360 ],
[16:43:11] 'svwiki' => [ 120, 200, 250, 300, 360 ],
[16:43:11] ],
[16:43:12] <_joe_> now with thumb size changed
[16:43:19] <_joe_> I see a reparse
[16:43:48] <_joe_> and on a reload, I see the same parsercache version
[16:44:01] <_joe_> so thumb size uses a different parsercache
[16:44:03] so why was paravoid seeing consistent reparses then?
[16:44:11] _joe_: yeah, you can see the key change
[16:44:12] <_joe_> yeah not thumbsize
[16:44:30] <_joe_> cdanis: so with your old user you keep seeing reparses?
[16:44:57] no
[16:45:46] if I go to arbitrary pages I can often get a reparse
[16:46:06] but reloading a given page once I have loaded it once does not seem to trigger a reparse
[16:46:10] <_joe_> which is expected given your thumb size
[16:46:22] <_joe_> ok so it's not so strange.
[16:46:25] yes, although I don't remember it being slow like this in the past
[16:46:32] paravoid: what thumbsize do you have?
[16:46:57] <_joe_> what is the db table where I find user preferences?
[16:47:13] in the HTML output, there should be a line like this:
[16:47:13] Saved in parser cache with key mediawikiwiki:pcache:idhash:99540-0!canonical and timestamp 20190813143837 and revision id 3067163
[16:47:20] this will tell us the parser cache key
[16:47:28] FWIW, in my logged-in WMF account from chromium, I get:
[16:47:28] <_joe_> ok
[16:47:30] which will tell us which options made it vary
[16:47:32] Parsed by mw1320
[16:47:32] Cached time: 20190814160703
[16:47:41] and the cached time seems to remain stable, I assume that means no reparse
[16:47:42] yeah in my case duesen it is the thumbsize key that's different from canonical
[16:47:42] it's in an HTML comment
[16:47:46] <_joe_> paravoid: we need you to check the html you get for the Greece page :)
[16:47:50] bblack: do you have a canonical key?
[16:47:51] but the page load time is ~1000ms
[16:47:56]
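To close the loop on the view-source check _joe_ and duesen describe above, the same inspection can be done non-interactively; a sketch that just pulls the relevant HTML comments out of the rendered page (the comment wording is quoted from the log itself), with $COOKIE again a placeholder for the affected session:

```bash
# Pull the parser cache key, cached-time, and serving-host comments out of the
# rendered HTML, so runs with different accounts can be compared directly.
curl -s -H "Cookie: ${COOKIE:?set COOKIE first}" 'https://en.wikipedia.org/wiki/Greece' \
  | grep -oE 'Saved in parser cache with key [^ ]+|Cached time: [0-9]+|Parsed by mw[0-9]+'
```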