[11:41:27] Hi there. I'm using a crawler to crawl some Wikipedia entries. I looked through the robots.txt. I'm going to crawl the sites from 3 different servers. Right now I have it configured to ask for a new page every 2 seconds. Should I give it more time, or is 2 seconds ok? It's about 1500 different sites, so every site would be asked again after 3000 seconds. Also, is there a specific policy for what to set in the header, and if not, what exactly should I
[11:41:27] put in there?
[12:15:09] hey Snorri
[12:15:12] there is https://www.mediawiki.org/wiki/API:Etiquette#Request_limit
[12:15:17] and the entire page, really
[12:15:26] there are also dumps on dumps.wikimedia.org if you just want content
[12:17:16] Ahhh thank you! That's what I wanted :)
[12:17:41] Dumps do not help. I'm researching the caching and the headers, so dumps don't help.
[12:17:43] Snorri: the etiquette page or the dumps? :)
[12:17:46] ah
[12:17:46] ok
[12:17:49] fair enough :)
[15:33:06] _o/
[15:33:46] halfak guillom HOW IS ANYONE ALLOWED TO BUILD SOFTWARE WITHOUT READING https://dl.acm.org/citation.cfm?id=62266.62273
[15:33:58] also why didn't anyone show it to me until last week?!
[15:34:12] * yuvipanda waves too
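[Editor's note: a minimal sketch of the kind of polite crawling setup discussed above, assuming Python's standard library; the 2-second delay comes from the log, while the User-Agent string, contact address, and function names are illustrative placeholders, not anything prescribed by the MediaWiki etiquette page.]

    import time
    import urllib.request

    # Hypothetical bot identifier; the etiquette guidance is to identify the bot
    # and give operators a way to contact you.
    USER_AGENT = "SnorriResearchCrawler/0.1 (contact: example@example.org)"
    DELAY_SECONDS = 2  # one request every 2 seconds, as proposed in the log

    def fetch(url):
        # Send a descriptive User-Agent header with each request.
        req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
        with urllib.request.urlopen(req) as resp:
            return resp.read(), dict(resp.getheaders())

    def crawl(urls):
        for url in urls:
            body, headers = fetch(url)
            # ... inspect caching-related response headers here ...
            time.sleep(DELAY_SECONDS)  # serialize requests instead of firing them in parallel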