[01:12:42] <EdePopede>	 hi. i just stumpled upon https://de.wikipedia.org/w/index.php?title=Spezial:Hochladen&wpDestFile=Warentestlink.jpg which looks like a standard mediawiki message. only... the grammar (link text) is a complete mess. couldn't see any -de channel so i thought i'd leave it here.
[01:14:14] <p858snake|L>	 #wikipedia-de should be the channel for that
[01:17:09] <EdePopede>	 was my first thought, but would this get upstream to the software base?
[01:17:44] <EdePopede>	 since that's not about some wp article but one of the software dialogs
[01:20:20] <Krenair>	 maybe ask on translatewiki?
[01:21:09] <EdePopede>	 ah, it's here. will do it. thanks for the hint.
[03:43:27] <asdfghjk>	 hello
[03:43:41] <asdfghjk>	 i would like to know if anyone has any idea about what happened with liriki
[16:15:21] <Vulpix>	 Hello. Does anybody know if the LinksUpdate job saves the contents of the page it touches to the parser cache, or instead this task is left to the HtmlCacheUpdate job?
[16:17:03] <saper>	 background jobs are us admitting failure
[16:19:24] <Vulpix>	 I'm fine with background jobs. What I'm not fine with is when executing some specific jobs after a high-usage template edit causes the cache to be invalidated, and pages got parsed on the following web requests instead of being parsed by the job queue itself, thus bringing the server down during some minutes
[16:19:31] * urugway[m] sent a long message:  <  >
[16:21:12] <bawolff>	 urugway[m]: Just fyi, you message did not actually go to irc
[16:21:56] <bawolff>	 Vulpix: Historically, LinksUpdate did not, but I'm not sure about the current status
[16:22:40] <bawolff>	 If memory serves
[16:23:28] <Vulpix>	 running LinksUpdate jobs works fine. It invokes Lua, and I though it would generate the HTML of pages and save them into parser cache. During the run of those jobs, everything is stable
[16:25:17] <Vulpix>	 However, after I cleaned the queue from those jobs and I started to run HtmlCacheUpdate, all new web requests starting parsing the affected pages and invoking lua *again*. Of course, instead of a sequence of 1 concurrent update for page, it went to N concurrent updates, one for each php web process it can handle, which is disastrous
[16:26:00] <urugway[m]>	 bawolff: so what can I do ?
[16:26:41] <bawolff>	 Well HTMLCacheUpdate just marks cache entries expired, it doesn't insert new parser cache entries
[16:26:51] <bawolff>	 urugway[m]: Well I can't see your message, so I have no idea
[16:27:34] <bawolff>	 Split your message up into smaller chunks, or use an irc client that is not matrix that actually handles long messages sanely
[16:28:26] <bawolff>	 It doesn't look like LinksUpdate stores the resulting parser output, unless one of the functions it calls does it behind the scenes
[16:28:57] <bawolff>	 Maybe that's to avoid filling up the cache with unpopular items causing popular cached items to be evicted or something
[16:30:16] <Vulpix>	 HTMLCacheUpdate not only marks cached pages as expired (in the parser cache?) but also on varnish, and that's the problem. I want a background job to parse and store the results on the parser cache, and only then purge varnish, so new requests only need to pick the HTML from the parser cache and not reparse everything
[16:30:17] <bawolff>	 Using Extension:PoolCounter can help reduce the effect of cache stampedes from a popular page getting its cache purged
[16:31:56] <bawolff>	 hmm, yeah, I could see why that would be useful for smaller sites, but it doesn't look like its supported
[16:32:08] <Vulpix>	 PoolCounter won't help me, because I have several of "popular pages". The problem here is that parsing one page may take 4 seconds, and I have more than 1000 concurrent users viewing the wiki. If I get 1 request per second on different pages, and the server has only 2 cores, this is going to cause very bad contention issues
[16:33:12] <bawolff>	 Well it might help a bit to deduplicate. I can't imagine all your pages are equally popular
[16:33:19] <Vulpix>	 Uhh, that's sad... I guess I'll have to see how I can tune those jobs to actually save the resulting HTML into the cache
[16:33:52] <bawolff>	 And it does kind of do what you want, in so much as if more than X threads are trying to render the page, it will show the outdated parser cache instead of keep trying to render
[16:35:41] <Vulpix>	 Actually, that is handled by varnish very well. It doesn't let more than 1 request for the same page hit the application server until the first one get a response, then all of them get the same reply
[16:36:01] <Vulpix>	 (only for anons, of course)
[16:40:54] <bawolff>	 I guess the other approach would be to run the HTMLCacheUpdate jobqueue with --sleep option, to slow them down and spread out the cache purging
[16:41:58] <bawolff>	 For reference, if you're looking for the update parser cache logic, see poolcounter/PoolWorkArticleView.php around line 213
[16:44:17] <Vulpix>	 again, this only works when you have like 5 popular pages. I have only 18% of the current active readers in the 10 most popular pages... which means 50 different pages will be requested in 1 minute
[16:48:15] <Vulpix>	 A single HtmlCacheUpdate job invalidates 300 pages. Maybe reducing $wgUpdateRowsPerJob to a very small number like 5 or even 2 would help on this situation. However, that drastic reduction may cause the job queue to take ages to be processed 
[16:49:21] <bawolff>	 Ah, forgot about that
[16:50:20] <bawolff>	 Adjust $wgUpdateRowsPerJob as well?
[16:51:14] <Vulpix>	 There's also $wgJobBackoffThrottling, maybe it can do a better job here, since apparently it can be configured for specific job types
[16:51:58] <Vulpix>	 The documentation is not very clear, though
[16:52:30] <bawolff>	 I think it would make it average out to be a certain rate, but still might be spikey
[16:53:32] <bawolff>	 So if a job clears 300 pages, and you had a rate of 1 page/second, that just means it will wait 300 seconds before running the next job
[16:54:17] <bawolff>	 Maybe a combination of backoffThrottling and $wgUpdateRowsPerJob would be the best bet
[16:55:21] <Vulpix>	 looks like WMF uses it for HtmlCacheUpdate https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/265438/
[17:30:24] <jubo2>	 Hi and thanks for the awesome free software and free support. I'm trying to move a couple of wikis to another server. When I run 'composer update --no-dev' it complains it cannot delete things in vendor/pear/console_getopt. The normal user owns the dir and things in them, so there appears to be write protection
[17:32:01] <jubo2>	 I tried moving one of the files without sudo and it says the file is protected. I'll go search on how to remove the protection. Or if I can just remove the whole directory and it get rebuilt gimme a holler
[17:32:32] <Skizzerz>	 chmod -R o+w vendor
[17:32:58] <Skizzerz>	 er
[17:32:59] <Skizzerz>	 u+w
[17:33:01] <Skizzerz>	 derp
[17:33:09] <Skizzerz>	 jubo2: chmod -R u+w vendor
[17:33:14] <jubo2>	 ok. so recursively give write rights to ... me?
[17:33:17] <Skizzerz>	 yes
[17:33:27] <Skizzerz>	 presumably some files are set to read-only, even for the owner
[17:33:27] <jubo2>	 ok. I try that now. Won't hurt
[17:33:45] <Skizzerz>	 there's other forms of write protection but that is the most common
[17:34:07] <jubo2>	 Skizzerz: but I already have rw on everythin
[17:34:17] <Skizzerz>	 even the files it was complaining about?
[17:35:26] <jubo2>	 yeah. Checked again before came here to seek some free help
[17:36:05] <jubo2>	 could it be this 'chattr -i' given on this website https://serverfault.com/questions/648573/linux-is-there-a-way-to-prevent-protect-a-file-from-being-deleted-even-by-root/648884
[17:36:16] <Skizzerz>	 if that doesn't do anything I think your plan B is solid: rename vendor to something else like vendor-backup, delete/rename your composer.lock file, then try composer install --no-dev to rebuild a fresh vendor directory
[17:36:28] <Skizzerz>	 I would be very surprised if it was chattr +i because that needs root to set
[17:36:47] <jubo2>	 Skizzerz: ok.. I was thinking this could apply to normal users
[17:36:52] <Skizzerz>	 selinux is another culprit
[17:37:33] <jubo2>	 Skizzerz: I try the rebuild route
[17:43:45] <jubo2>	 I noticed what's wrong. Like almost always.. It's my bad
[17:43:55] <jubo2>	 I had accidentally expanded them as the wrong user
[17:44:07] <jubo2>	 Sorry about the channel noise 
[17:50:52] <jubo2>	 yes. I am past that now.
[18:00:46] <bewb>	 i have an issue logging in
[18:00:53] <bewb>	 Fatal error: Class 'QuestyCaptcha' not found in C:\inetpub\wwwroot\wiki\extensions\ConfirmEdit\includes\ConfirmEditHooks.php on line 18
[18:01:05] <bewb>	 i go there and see
[18:01:06] <bewb>	 			$wgCaptcha = new $wgCaptchaClass;
[18:01:17] <bewb>	 i have no idea how to go from here
[18:13:15] <Vulpix>	 bewb: Looks like you have set the captcha to use as QuestyCaptcha, but the QuestyCaptcha is not installed/enabled
[18:13:27] <Vulpix>	 !e QuestyCaptcha
[18:13:27] <wm-bot>	 https://www.mediawiki.org/wiki/Extension:QuestyCaptcha
[18:15:34] <Vulpix>	 The page had the correct line commented out...
[18:15:41] <Vulpix>	 Be sure you have wfLoadExtensions([ 'ConfirmEdit', 'ConfirmEdit/QuestyCaptcha' ]);
[19:27:35] <bewb>	 Vulpix it worked for a while, i'm not sure why it broke
[19:30:18] <bewb>	 there we go, working again
[19:30:23] <Vulpix>	 IDK but that is how it's supposed to be installed
[19:36:42] <bewb>	 thanks for your help
[19:56:09] <bewb>	 i'm looking at how to delete page entries directly from the database
[19:56:33] <bewb>	 got quite a few spam pages
[20:00:00] <jubo2>	 The wiki family of 2 is now happily migrated
[20:00:37] <jubo2>	 I have a question. Do I need to get all the extensions again for the 1.33.0 -> 1.33.1 upgrade or do they remain the same?
[20:05:50] <bewb>	 or just generally building one page
[20:28:31] <Vulpix>	 bewb: Do not mess with the database unless you're absolutely sure about what you're doing (and be prepared to the consequences of breaking it)
[20:29:10] <Vulpix>	 In the /maintenance folder you'll find some maintenance scripts that may be useful
[20:29:56] <Vulpix>	 jubo2: It's not necessary. However, some of them may get updates, but there's no easy way to track them unfortunately
[20:31:01] <bewb>	 i'm pretty well versed in sql and php, i'm working my way through it
[20:31:19] <bewb>	 the idea of manually deleting 4000 posts doesn't appeal to me though
[20:34:11] <bewb>	 probably more...
[20:34:23] <bewb>	 page 1, 1-500 takes me to the B's
[22:25:01] <doneu>	 Hello. I need help collecting a large set of articles from wikipedia in 10 different languages. I was thinking of making some script that would crawl wikipedia content, but just learned they don't want people crawling and scraping their sites - which makes sense. I don't think downloading the whole database might make sense, plus I looked a bit around and failed to find the db in other languages. And thinking about using the API to get 
[22:25:02] <doneu>	 content through a script sounds a bit like crawling in the eyes of wikipedia. Any better solution to any of these?
[22:29:54] <Platonides>	 hello doneu
[22:30:14] <Platonides>	 the other languages is available at the side of wikipedia in English language
[22:30:35] <Platonides>	 so you have https://dumps.wikimedia.org/enwiki/ for English Wikipedia
[22:30:48] <Platonides>	 https://dumps.wikimedia.org/eswiki/ for Wikipedia in Spanish, etc.
[22:37:59] <doneu>	 Ah didn't look for the obvious. Thanks a lot