[04:45:40] Anyone here experienced in importing complete dumps of enwiktionary or enwikipedia?
[04:46:12] I am new to MediaWiki and am having an odd issue when importing enwiktionary into a local VM.
[05:05:33] Hi jmorrison.
[05:05:39] "Odd issue" is difficult to diagnose. Can you be more specific?
[05:05:50] In general, "just importing" two large and old wikis is not a trivial task.
[05:06:00] So many quotation marks!
[05:06:55] Hi Leah
[05:07:44] Sure - I have a TurnKey Linux VM running MediaWiki. I began importing with maintenance/importDump.php... that worked perfectly but was too slow for my purposes.
[05:08:16] I moved on to mwdump, using a direct connection to my SQL db... but the result is a lot of pages that have titles but are missing content.
[05:08:55] Hi ori.
[05:09:07] Or content that is incomplete/inconsistent compared to what is seen on the Wiktionary site.
[05:09:40] Missing content like the current revision?
[05:09:43] Or do you mean past revisions?
[05:10:10] The English Wiktionary in particular uses a lot of templates.
[05:10:31] So until you have the templates imported and working (you'll need to install the ParserFunctions and Scribunto MediaWiki extensions),
[05:10:37] lots of page content will look broken.
[05:10:41] That was another issue - a lot of templates were missing.
[05:10:50] You also need to make sure the dump you're using has templates.
[05:10:53] I think most do.
[05:11:07] Well, I thought this might be the issue, but for example the "computer" page would show the etymology for German but no other languages.
[05:11:22] Is the wikitext there?
[05:11:33] For example, you see a ton of languages in the content pane on this page.
[05:11:33] https://en.wiktionary.org/wiki/computer
[05:12:02] Mine only contained German... and a ton of wiki markup referencing templates that were non-existent.
[05:12:43] I was mostly perplexed by the existence of pages that don't actually have content... the random page tool would bring me to these pages after beginning the import, but many of them never had content at all.
[05:12:51] https://en.wiktionary.org/w/index.php?title=computer&action=edit
[05:12:56] Once again, I didn't observe this when using importDump.
[05:13:07] Do you have the same wikitext on your wiki?
[05:13:23] I did not, no.
[05:13:35] I've since wiped that VM and am starting over now, but there was very little content.
[05:13:45] Oh.
[05:13:49] You're having title conflict issues.
[05:14:02] By default, MediaWiki capitalizes the first letter of page titles.
[05:14:09] You need to disable that if you're importing a dictionary.
[05:14:14] You're looking at https://en.wiktionary.org/wiki/Computer
[05:14:23] Which is different from https://en.wiktionary.org/wiki/computer on Wiktionaries.
[05:14:26] The setting is...
[05:14:31] !wg CapitalLinks
[05:14:31] https://www.mediawiki.org/wiki/Manual:%24wgCapitalLinks
[05:14:35] omg....
[05:14:37] LOL
[05:14:43] I owe you a beer.
[05:14:46] :-)
[05:15:30] https://en.wiktionary.org/wiki/Computer was exactly what I was seeing, except of course with template references instead of the templates in use.
[05:15:33] For the templates, like I said, you'll need ParserFunctions and Scribunto, probably, to get them to render/parse/output properly.
[05:15:47] This is such a relief. I thought I was going crazy.
[05:15:51] And maybe a bit of CSS and JavaScript from the local wiki, depending on how much you care about an exact replica.
[05:16:04] Oh, I can show you the en.wiktionary.org config.
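A minimal LocalSettings.php sketch pulling together the settings discussed above, assuming MediaWiki 1.25 or newer (so wfLoadExtension is available) and that the ParserFunctions and Scribunto extensions have already been downloaded into extensions/; older Scribunto snapshots may still need the require_once style of loading instead.

    # Minimal LocalSettings.php sketch (assumptions noted above).

    # Let page titles start with a lowercase letter, matching Wiktionary,
    # so "computer" and "Computer" are distinct pages.
    $wgCapitalLinks = false;

    # Extensions needed for most Wiktionary templates and modules to render.
    wfLoadExtension( 'ParserFunctions' );
    wfLoadExtension( 'Scribunto' );

    # Scribunto needs a Lua engine; the standalone binaries bundled with the
    # extension are the simplest option for a local VM.
    $wgScribuntoDefaultEngine = 'luastandalone';

Note that $wgCapitalLinks only affects titles going forward; pages imported while it was still true keep their capitalized titles, so it is easiest to set this before (re-)running the import.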
[05:16:06] I am just importing everything so I can use the API for a one-off task :)
[05:16:12] https://noc.wikimedia.org/conf/InitialiseSettings.php.txt
[05:16:48] For example, you can see 'wgCapitalLinks' => [ 'wiktionary' => false, ] in that file.
[05:17:13] That file is overwhelming, but if you need to look up how the English Wiktionary is configured, it'll mostly be there or...
[05:17:29] https://noc.wikimedia.org/conf/CommonSettings.php.txt
[05:18:02] I am going to revert my VM and give this a shot.
[05:18:12] Cool.
[05:18:18] Thanks again for the help.
[05:18:24] There's a fairly robust API, available at api.php on your wiki installation.
[05:18:37] That exposes a lot of the internals in a sane way, so you can compare SHA-1s of the wikitext and such.
[05:18:38] I'll stick around to ask more stupid questions... and yes, I have been looking at the API docs already.
[05:18:45] Nice. No problem.
[05:19:19] After I get this working it looks like I will be using TextExtracts as well.
[05:19:58] What are you trying to do, ultimately?
[05:21:59] I would like to be able to scrape wiki pages for "#English" to make a reasonable determination as to whether a word is indeed English or not.
[05:22:40] I have several lists of 400,000-600,000 "words", most of which are English but many are not - words like Hola, Amigo, Danke.
[05:23:16] Many of these words, while not English, are still included in various English dictionaries that are freely available online. My goal is to eliminate a lot of those words from my lists.
[05:23:21] Identifying English words is hard.
[05:23:40] Yes, it is. Especially without the context of a sentence.
[05:23:49] So if I were going to do this, I would use the categories.
[05:23:53] Because those are indexed.
[05:23:59] And available in SQL.
[05:24:14] Every English word on the English Wiktionary will be in some English subcategory, I guess.
[05:24:20] Really?
[05:24:40] https://en.wiktionary.org/wiki/computer is in a bunch of categories.
[05:24:47] > Categories: English words suffixed with -er, English terms with IPA pronunciation, English terms with audio links, English lemmas, English nouns, English countable nouns
[05:24:56] Nice paste, but you get the idea. They're at the bottom.
[05:25:05] I am new to SQL and MediaWiki, but I did have to get familiar with the DB to fix a bug earlier, so I will take another look around.
[05:25:06] You can query those with SQL directly. We have replicas available.
[05:25:20] select * from categorylinks limit 1; or whatever.
[05:25:24] Or you can use api.php.
[05:25:30] To get any of the category members in JSON.
[05:30:47] I'm still just in shock that I overlooked the capitalization issue.
[05:30:52] Resolving that now.
[05:50:06] Leah - got the capitalization issue resolved... the import is running now and pages are showing as they should.
[05:51:22] Cool.
[05:51:24] Thank you so much again.
[05:51:34] I wasted so many hours on this today...
[06:23:24] :-)
[09:04:32] Is there anyone who can help me? I am new here.
[09:05:13] Hello, how do I get an anchor to a part of a wiki page? It's quite large and I just want to show a part of it to somebody.
[09:05:47] !ask | DilipPuri
[09:05:47] DilipPuri: Please feel free to ask your question: if anybody who knows the answer is around, they will surely reply. Don't ask for help or for attention before actually asking your question, that's just a waste of time - both yours and everybody else's.
:)
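Going back to jmorrison's word-list cleanup earlier in the log: one way to act on the category suggestion is to ask api.php whether each word's entry carries an English category. A rough sketch only, assuming the local import answers at http://localhost/w/api.php and using "Category:English lemmas" (seen in the category list above) as the marker; the API URL and the word list are placeholders.

    <?php
    // Rough sketch: check which words from a list have an entry that sits in
    // "Category:English lemmas" on the imported Wiktionary.
    // Assumption: the local wiki's API lives at this URL - adjust as needed.
    $api = 'http://localhost/w/api.php';

    $words = [ 'computer', 'hola', 'danke' ];

    foreach ( $words as $word ) {
        // prop=categories with clcategories filters the result down to the one
        // category we care about, so a non-empty "categories" array is a match.
        $url = $api . '?' . http_build_query( [
            'action'       => 'query',
            'titles'       => $word,
            'prop'         => 'categories',
            'clcategories' => 'Category:English lemmas',
            'format'       => 'json',
        ] );
        $data = json_decode( file_get_contents( $url ), true );
        $page = reset( $data['query']['pages'] );
        $isEnglish = !empty( $page['categories'] );
        echo $word . ': ' . ( $isEnglish ? 'English' : 'not tagged as English' ) . "\n";
    }

Once the SQL import is in place, the same check can be run in bulk straight against the categorylinks table (cl_to = 'English_lemmas'), which avoids one HTTP request per word.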
[09:06:33] Like, I've found a bug, so I want to fix that bug.
[09:06:49] DilipPuri: have you checked https://www.mediawiki.org/wiki/How_to_become_a_MediaWiki_hacker already?
[09:08:20] andre__: you mean me?
[09:09:01] lol, the table of contents entries at the top of the page are links... #epicfailquestion
[14:41:00] Hello! I have Parsoid installed and I'm trying to make it work.
[14:41:08] I have followed the instructions on the setup page, but when I launch curl -L http://localhost:8142/localhost/v3/page/html/Main_Page/ the request is refused.
[14:41:26] The error is: curl: (7) Failed to connect to localhost port 8142: Connection refused. How can I fix it?
[14:48:57] Where are you launching curl from?
[14:49:35] It will likely only respond on the same host where Parsoid is running. You likely won't be able to run curl from another host.
[14:50:19] Yes, I'm launching curl on the same host.
[14:51:25] Leah: yes, I'm on the same host
[14:57:14] All right.
[14:57:20] How are you running Parsoid?
[14:57:35] You're sure it's on port 8142?
[14:57:49] cheip_: ^
[15:18:03] Leah: I'm pretty sure. I trust the message "starting Parsoid on port 8142" printed during service parsoid restart.
[15:47:07] Does nobody else have any ideas?
[15:49:26] cheip_: are you sure Parsoid is actually running? perhaps it crashed
[15:54:55] It sounds like it isn't running. Can you visit that URI in a browser to verify?
[16:09:13] MatmaRex: Heh.
[16:09:57] Leah: speaking from experience. We used to have configuration documented on-wiki that crashed Parsoid when you tried to use it. :)
[16:36:44] I am getting an exception caused by Scribunto while using MW 1.19. https://dpaste.de/QkPx
[16:36:58] I don't know why it wants 1.20 if I am using 1.1.9.
[16:39:51] Edit: this is only happening when I have $wgCapitalLinks = false; set.
[17:32:54] jmorrison, because 20 > 19
[17:33:04] jmorrison, you're running 1.19. Scribunto requires 1.20.
[17:33:32] (Note that 1.19 and 1.20 are not supported anymore - see https://www.mediawiki.org/wiki/Download )
[18:10:23] @andre__ So it is impossible to use Scribunto on 19? Because I downloaded a .119 release.
[18:15:46] I guess the better solution is to upgrade anyway.
[18:20:08] 1.19, 1.1.9, and .119 are all very different things.
[18:23:07] jmorrison: there is no .119 and no 1.1.9 I am aware of
[18:23:26] https://www.mediawiki.org/wiki/Download lists supported versions
[18:51:14] When is MediaWiki 1.27 going to be released?
[18:52:23] Hello. Is there something like index.php?title=Example&action=edit&section=new, but to create a new subsection under an existing section?
[18:55:21] Or even just to change the level of the created heading?
[19:04:43] theGamer93, sometime around now: https://www.mediawiki.org/wiki/MediaWiki_1.27
[19:05:37] So I can keep my scheduled upgrade on 01.06.
[19:05:38] Okay, thanks.
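The subsection question above never got an answer in the channel. As far as I know there is no index.php parameter for it, but one workaround is to append the subsection's wikitext to the end of the existing section through the edit API, which also lets you pick the heading level yourself. A rough sketch only: the API URL, page title, and section number are placeholders, and it assumes the wiki allows anonymous edits (a logged-in edit would additionally need login and cookie handling).

    <?php
    // Rough sketch: add a "=== New subsection ===" under an existing section by
    // appending wikitext to that section through the edit API.
    // The API URL, page title, and section number below are placeholders.
    $api     = 'http://localhost/w/api.php';
    $page    = 'Example';
    $section = 2; // same section numbering as index.php?action=edit&section=N

    // 1. Fetch a CSRF token (for anonymous edits this is the generic "+\" token).
    $tokenUrl = $api . '?' . http_build_query( [
        'action' => 'query',
        'meta'   => 'tokens',
        'type'   => 'csrf',
        'format' => 'json',
    ] );
    $token = json_decode( file_get_contents( $tokenUrl ), true )['query']['tokens']['csrftoken'];

    // 2. POST an edit that appends a level-3 heading to the end of that section;
    //    the heading level is whatever markup you put in the appended wikitext.
    $post = http_build_query( [
        'action'     => 'edit',
        'title'      => $page,
        'section'    => $section,
        'appendtext' => "\n=== New subsection ===\nNew content goes here.",
        'summary'    => 'Add subsection',
        'token'      => $token,
        'format'     => 'json',
    ] );
    $context = stream_context_create( [ 'http' => [
        'method'  => 'POST',
        'header'  => 'Content-Type: application/x-www-form-urlencoded',
        'content' => $post,
    ] ] );
    echo file_get_contents( $api, false, $context );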