[03:38:19] !ops
[03:38:28] !admin
[03:38:28] To recreate the admin user, run "php maintenance/createAndPromote.php" on the command line.
[03:38:50] Krassnine: yes?
[03:57:44] !admin
[03:57:44] To recreate the admin user, run "php maintenance/createAndPromote.php" on the command line.
[03:58:30] A L V A R O M O L I N A
[04:11:59] Do they really have nothing better to do than spam in here?
[04:12:47] sadly :(
[04:12:58] I am not even sure what they are spamming. :P
[04:32:12] they're trying to harass some users
[06:40:37] I am listed in the Croatian part of Wiki but the biography is quite spotty and needs to be redone. I have a proposal but want to discuss it with somebody who could advise on my own proposal
[08:26:34] Hello all
[10:44:58] hi
[10:45:34] i have an api issue using python, is there anyone who can help me please :)
[10:46:32] jseng001: python? do you mean using pywikibot?
[10:47:11] oh no, it's a script i have written, i have posted it on stackoverflow at this link: http://stackoverflow.com/questions/40066414/re-extracting-content-from-api-stops-halfway
[10:47:58] what's the error?
[10:49:26] well, if you're querying 15,000 keywords one by one, you probably got rate-limited
[10:49:41] it used to work well before August
[10:49:58] i wrote this script in july and it worked, have there been any changes?
[10:50:16] what error do you get?
[10:50:18] i mean, how do i overcome this problem? now after 50 articles it just stops
[10:50:29] after extracting 50 articles it just stopped
[10:50:37] no error message, nothing, like the connection stopped
[10:51:08] timeout im guessing, because i dont seem to get any error message
[10:51:23] is there a way for me to fix this code? please advise
[10:51:35] yeah, you don't seem to do any special treatment of errors...
[10:51:51] im kinda new to python, there's a lot i dont know how to do yet
[10:52:09] besides that, you should wait a few seconds between each query
[10:53:02] i would really appreciate it if u can help me out a little on how to modify my script please, i have no idea how to
[10:53:03] doing 15,000 queries without any wait time is not considerate to the servers
[10:55:08] I'd suggest adding a delay of 8-10 seconds between each query http://stackoverflow.com/questions/510348/how-can-i-make-a-time-delay-in-python
[10:55:09] vulpix can you advise me on where to modify my script please? :) i am still learning python, im a little slow sorry
[10:55:27] i see
[10:55:41] the servers probably rate-limited you because of so many queries
[10:56:07] before july this script worked fine, were there some changes to the server?
[10:57:53] sorry just one question
[10:58:33] where is the best place to put this "time.sleep(5)" in my script?
[11:00:06] for i in data1['query']['pages']: f3.write((data1['query']['pages'][i]['extract']).encode('utf8')+"\n"+"\n"+"\n") time.sleep(5)
[11:00:28] there's a for loop in your script, it should be anywhere in the body of that for loop
[11:02:31] thank you vulpix
[11:02:35] yw
[11:02:44] vulpix can i trouble you for a big favour pls
[11:03:09] explain what you need and I'll decide :)
[11:03:16] do you think you can run my script with the keywords please? :)
[11:04:17] Vulpix im wondering, do you know what "Service disabled: DBLoadBalancerFactory" is when running the install.php script?
[11:04:45] jseng001: no, sorry
[11:05:07] oh ok no worries thanks
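As an illustration of the advice above, here is a minimal sketch of a query loop with a delay between requests. It is not jseng001's actual script: the file names are invented, it uses Python 3 with the requests library rather than urllib2, and it assumes the extracts are fetched through the TextExtracts API (prop=extracts).

    import time
    import requests

    API = "https://en.wikipedia.org/w/api.php"

    def fetch_extract(title):
        # Ask the TextExtracts API for the plain-text extract of one page.
        params = {
            "action": "query",
            "prop": "extracts",
            "explaintext": 1,
            "titles": title,
            "format": "json",
        }
        data = requests.get(API, params=params, timeout=60).json()
        pages = data["query"]["pages"]
        # The result is keyed by page id; a missing page simply has no extract.
        return "\n".join(p.get("extract", "") for p in pages.values())

    with open("keywords.txt") as f, open("extracts.txt", "w", encoding="utf8") as out:
        for line in f:
            title = line.strip()
            if not title:
                continue
            out.write(fetch_extract(title) + "\n\n\n")
            time.sleep(8)  # pause 8-10 seconds between queries, as suggested above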
[11:06:38] paladox: that's a ServiceDisabledException... no idea why it's thrown, though
[11:06:47] Me either
[11:07:05] I am trying postgresql, to see what i can do, but it just errors out on that
[11:07:12] but strange that it works in travis
[11:07:46] It stops at setting up the database
[11:09:16] maybe it's a not so nice way to say "we do not support postgres"? :D
[11:09:55] Oh, nope Vulpix, postgres is supported
[11:10:18] They have a special db config that only works for postgres and mssql
[11:10:34] !class ServiceContainer
[11:10:34] See https://doc.wikimedia.org/mediawiki-core/master/php/html/classServiceContainer.html
[11:10:50] thanks
[11:10:51] !class MediaWiki\Services\ServiceContainer
[11:10:52] See https://doc.wikimedia.org/mediawiki-core/master/php/html/classMediaWiki\Services\ServiceContainer.html
[11:11:02] https://doc.wikimedia.org/mediawiki-core/master/php/classServiceContainer.html
[11:11:04] Woops
[11:11:08] thanks
[11:11:31] https://doc.wikimedia.org/mediawiki-core/master/php/html/classMediaWiki\Services\ServiceContainer.html
[11:11:33] Does not exisr
[11:11:35] exist
[11:11:38] https://doc.wikimedia.org/mediawiki-core/master/php/classMediaWiki_1_1Services_1_1ServiceContainer.html
[11:11:51] those backslashes hurt
[11:11:57] thanks
[11:12:20] stupid PHP notation, who on earth would use backslashes for anything else that's not escaping??
[11:12:32] LOL
[11:12:53] stupid PHP
[11:12:53] hi vulpix, i tried putting in a delay and it still stops halfway; after 40 articles it stopped
[11:13:28] jseng001: do you get any error message?
[11:13:39] no error message
[11:13:44] maybe there's one particular query that's erroring
[11:13:45] it just hangs there
[11:14:16] maybe the delay is not sufficient
[11:14:30] i used this
[11:14:32] except Exception: pass
[11:15:23] is there any way to check for any error? nothing showed when i put this: except Exception,e: print(e)
[11:15:34] http://stackoverflow.com/questions/21553327/why-is-except-pass-a-bad-programming-practice
[11:16:46] delete the entire try-catch. If the script bombs out you wanna see the full error message and stack trace, not a catch-all-and-continue-breaking-anyway
[11:18:34] oh ok i will delete that, thanks
[11:20:10] i have removed the try-catch
[11:20:38] there is no error message after, it just waits there, im guessing the script timed out?
[11:21:41] if the script times out, it will throw an error after the timeout period
[11:22:05] there should be a default timeout I guess
[11:22:43] it just stalls there, not collecting any more article content after 7 articles
[11:22:54] im not sure what to do now
[11:25:03] I don't see any rate limiting setting in MediaWiki, maybe the rate limiting on the api is being done on the server directly, by just hanging connections
[11:25:42] is it a new function?
[11:26:05] it was working perfectly fine from may to july this year, it only stopped recently
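On the try/except point above: a bare "except Exception: pass" silently swallows whatever went wrong. If any error handling is kept at all, a less harmful pattern is to report which title failed and then re-raise, so the full traceback still appears. This is only a fragment meant to slot into the for loop of the earlier sketch; fetch_extract() is the hypothetical helper defined there.

    # Fragment for the body of the for loop in the earlier sketch.
    try:
        text = fetch_extract(title)  # hypothetical per-title query
    except Exception:
        # At least say which query broke, then re-raise so the full
        # error message and stack trace are still printed.
        print("query failed for title: %r" % title)
        raise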
[11:28:23] is there any way i can get around this?
[11:34:42] jseng001: assuming it just hangs forever, try specifying a timeout (60 seconds), handle the timeout exception, and try again http://stackoverflow.com/questions/2712524/handling-urllib2s-timeout-python
[11:42:33] thanks vulpix
[11:43:31] from my code, i have no idea how to modify it such that it will continue getting the content after the timeout exception
[11:45:37] i have made some changes: https://dpaste.de/VbEk
[11:51:26] sorry, this is the correct code: https://dpaste.de/Cywt
[11:51:46] I guess the timeout exception will happen only on the opener.open call, so the try-catch should be surrounding only that line
[11:52:48] i wrote it and i get this error: There was an error: URLError(SSLError('_ssl.c:574: The handshake operation timed out',),)
[11:52:51] and then you can wrap it inside another loop that will continue until results are fetched correctly
[11:53:05] oh how do i do that?
[11:54:41] set a flag, "while (flag)", and the body of that should call the opener.open, and set the flag to false afterwards, so in case of success it doesn't enter the loop again
[11:58:02] in my code, since i am reading from a text file with a list of titles, how do i code it in such a way that it remembers where i stopped?
[12:06:08] You keep track of it yourself
[13:38:49] What's the interface page for the warning when editing an old revision?
[13:40:17] nycloud: you can find the name of any interface message by using ?uselang=qqx
[13:40:33] Oh thanks
[15:20:02] I have a report of a failure to parse compact language links on en.WT: https://www.mediawiki.org/wiki/Talk:Universal_Language_Selector/Compact_Language_Links
[15:21:05] Amgine: Can you file it in phab? :)
[15:21:56] Heh. If I had time… (just woke up to bouncing IRC notifications, but have to go out into the storm.)
[15:22:12] I'll do it quickly then
[15:22:49] Amgine: ah, dupe
[15:22:49] https://phabricator.wikimedia.org/T148117
[15:23:10] Excellent.
[18:58:55] Hello, guys! I am a new member here and unfortunately I only discovered this awesome program, Outreachy, today. I would love to apply for projects connected with translation and management, since most of my knowledge and skills are in those areas. Can you please give me a hint where I can find this kind of project? The ones I have reviewed are mostly technical/coding
[18:59:54] The translation guys hang out over at #mediawiki-i18n, but I'm not sure how that will link in to outreachy stuff unfortunately...
[19:00:25] * Reedy has a look in phab
[19:00:54] cherven: I'd suggest creating an account on translatewiki
[19:02:49] https://www.mediawiki.org/wiki/Outreachy/Round_13
[19:02:57] https://www.mediawiki.org/wiki/Outreach_programs/Life_of_a_successful_project#Coming_up_with_a_proposal
[19:03:37] cherven: Project ideas seem to be at https://phabricator.wikimedia.org/project/board/2207/
[19:34:58] oh, thank you, reedy!
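For completeness, here is a sketch of the retry approach Vulpix describes above (a timeout on the request, a while-loop flag around just the request, retry on timeout), together with one simple way to remember where the script stopped: record the index of the last successfully written title in a small progress file and skip past it on restart. As before, this assumes Python 3 and requests rather than urllib2, reuses the hypothetical fetch_extract() from the earlier sketch, and the file names are invented.

    import os
    import time
    import requests

    PROGRESS_FILE = "progress.txt"

    def load_progress():
        # Index of the last title we finished, or -1 if starting fresh.
        if os.path.exists(PROGRESS_FILE):
            with open(PROGRESS_FILE) as f:
                return int(f.read().strip() or -1)
        return -1

    def save_progress(index):
        with open(PROGRESS_FILE, "w") as f:
            f.write(str(index))

    with open("keywords.txt") as f:
        titles = [line.strip() for line in f if line.strip()]

    start = load_progress() + 1
    with open("extracts.txt", "a", encoding="utf8") as out:
        for i in range(start, len(titles)):
            done = False
            while not done:                         # the "while (flag)" loop
                try:
                    text = fetch_extract(titles[i]) # only the request sits in the try
                    done = True                     # success: don't loop again
                except requests.exceptions.Timeout:
                    print("timed out on %r, retrying..." % titles[i])
                    time.sleep(30)                  # back off before retrying
            out.write(text + "\n\n\n")
            save_progress(i)                        # remember where we stopped
            time.sleep(8)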
[21:00:37] Hello. Given the wiki text for some article, what would be the easiest way to identify all internal links to articles and get the name of each linked article?
[21:00:58] Ask MediaWiki to do it
[21:01:08] MediaWiki knows best for text on its own pages
[21:01:36] Where are you trying to do it from?
[21:02:01] I would like to write a program that does it.
[21:02:09] An external one? Use the API
[21:02:14] Specifically, the parse api
[21:02:23] https://en.wikipedia.org/w/api.php?action=help&modules=parse
[21:02:35] It can give you all the internal links from the page
[21:03:44] Reedy: I would like to use the wiki text from an XML file produced as a database dump.
[21:03:54] Sure
[21:04:01] Just put that back into the API of the wiki the dump came from
[21:04:30] Reedy: You mean so that I would request the information directly from the wiki instead of consulting the dump data?
[21:04:37] No
[21:04:57] The api parse can take arbitrary text parameters
[21:05:38] you can parse an existing article or revision, or just any arbitrary wikitext that doesn't exist on that wiki
[21:05:42] https://www.mediawiki.org/wiki/API:Parsing_wikitext
[21:06:11] So presumably I should download MediaWiki so that I can run this locally, since I intend to go through all English Wikipedia articles.
[21:06:17] No
[21:06:27] Because your local wiki won't be configured the same as enwiki
[21:06:42] Yes, but would it not be sufficient for the links?
[21:06:49] Not necessarily
[21:06:54] Is there any reason you're using it from a database dump?
[21:07:02] If you're wanting the latest version of a page...
[21:07:18] Using the api makes even more sense, as lists of links are already stored in db tables, making queries cheaper
[21:07:23] Because it seems like it would be extremely slow to individually request the information for every page from Wikipedia.
[21:07:49] I can edit Wikipedia at 20-30 edits a minute using a local tool
[21:07:57] That's including getting text, parsing, processing and putting it back again
[21:08:51] It would still take about 2800 hours to go through all of the articles at 30 articles per minute.
[21:09:27] You can ask to get links for several pages in each request
[21:09:37] still, it will be a lot of time
[21:10:07] assuming you ask for the most recent version of each page, of course
[21:10:31] You can do batched, parallel requests...
[21:10:40] What are you actually trying to do?
[21:10:43] https://en.wikipedia.org/wiki/Wikipedia:Database_download#Please_do_not_use_a_web_crawler
[21:10:44] Don't we have pagelink dumps?
[21:11:14] marrenarre: Requesting links on a page is not downloading articles
[21:11:19] I'm not sure why it matters what I'm trying to do.
[21:11:26] Ah right. Well, it seems very similar.
[21:11:27] https://dumps.wikimedia.org/enwiki/20161001/enwiki-20161001-pagelinks.sql.gz
[21:11:33] "Wiki page-to-page link records."
[21:11:49] Well, the reason for asking what you're trying to do is to offer a better way to do it
[21:11:53] Such as that dump above
[21:11:58] Which is going to give you all the relevant links
[21:12:09] Ah right. I thought you meant even more specifically.
[21:12:24] I suppose what I need is a list of all unique internal article links per article.
[21:12:38] That dump would do what you wanted
[21:12:47] !xy
[21:12:47] The XY problem is asking about your attempted *solution* rather than your *actual problem*. http://meta.stackoverflow.com/a/66378
[21:13:01] I don't think we'll store duplicate rows if a page links to the same place multiple times
[21:13:20] Nope, we won't
[21:13:20] CREATE UNIQUE INDEX /*i*/pl_from ON /*_*/pagelinks (pl_from,pl_namespace,pl_title);
[21:13:54] Vulpix: Good point. I didn't realise I was still doing so with that question.
[21:13:57] Reedy: Seems perfect.
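For anyone following the dump route: the pagelinks dump is gzipped SQL made of large multi-row INSERT statements. A rough sketch of pulling out (pl_from, pl_namespace, pl_title) triples follows; the exact column layout should be confirmed against the CREATE TABLE statement at the top of the file before trusting the regex, and note that pl_from is a page id, so mapping it back to a source title needs the page table dump as well.

    import gzip
    import itertools
    import re

    # Each value tuple begins with: (pl_from, pl_namespace, 'pl_title', ...
    # i.e. the columns named by the unique index quoted above.
    ROW = re.compile(r"\((\d+),(\d+),'((?:[^'\\]|\\.)*)'")

    def iter_pagelinks(path):
        with gzip.open(path, "rt", encoding="utf8", errors="replace") as f:
            for line in f:
                if not line.startswith("INSERT INTO"):
                    continue
                for pl_from, pl_ns, pl_title in ROW.findall(line):
                    # Crude unescaping of quoted titles; enough for a sketch.
                    yield int(pl_from), int(pl_ns), pl_title.replace("\\'", "'")

    # Print a few link records as a sanity check.
    for record in itertools.islice(iter_pagelinks("enwiki-20161001-pagelinks.sql.gz"), 5):
        print(record)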
[21:14:23] marrenarre: The other thing is, we get many people wanting to reinvent the wheel
[21:14:27] It's usually not needed :)
[21:15:10] That is a good point as well.
[21:15:15] Thank you, I will check this out!
[21:15:25] no problem :)
[21:15:40] As seen above... The potential solution offered changed dramatically ;)
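And for reference, the API route Reedy suggested first: action=parse accepts arbitrary wikitext through its text parameter and can return the internal links it finds. A small sketch, again assuming Python 3 and requests; the exact response fields are worth double-checking against the API:Parsing_wikitext page linked above.

    import requests

    API = "https://en.wikipedia.org/w/api.php"

    def links_in_wikitext(wikitext):
        # Parse arbitrary wikitext and return the internal link targets found.
        params = {
            "action": "parse",
            "text": wikitext,
            "contentmodel": "wikitext",
            "prop": "links",
            "format": "json",
        }
        data = requests.get(API, params=params, timeout=60).json()
        # In the default (formatversion 1) output, each link entry carries the
        # namespace under 'ns' and the target title under the '*' key.
        return [link["*"] for link in data["parse"]["links"]]

    print(links_in_wikitext("See [[Main Page]] and [[Wikipedia:About|about]]."))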