[17:38:39] J-Mo1: halfak leila today is my last day at the WMF, but wanted to say: I'm still on-call for PAWS and Quarry :) so do feel free to email or call me if either of those things have a problem
[17:38:45] I'll be off IRC for a while
[19:48:41] good bye, everyone!
[21:10:35] Is there a page detailing the limits researchers should obey when "querying" Wikipedia (with or without the API)? Such as, say, a maximum number of page requests per minute, etc.?
[21:44:18] pajz: not that I'm aware of, the API should throttle you if you send too many requests, and some libraries (e.g. Pywikibot) support that automatically… the API also restricts how many results you'll get per query
[21:44:48] often if you're doing something that pushes the limits it might be easier to accomplish with a dump file
[21:46:08] Nettrom, thanks. What if the researcher just accesses the pages without using the API? Is that also throttled?
[21:52:25] pajz: good question, I don't know of a lot of information on that… it seems that best practice is to set the HTTP headers properly so you can be contacted if necessary, similarly to what's suggested at https://www.mediawiki.org/wiki/API:Etiquette
[22:02:30] I see. Thanks!
[22:07:11] you're welcome, and good luck! :)
[22:16:05] pajz: have you seen https://www.mediawiki.org/wiki/API:Etiquette ?
[22:16:42] ah, sorry, I see it was mentioned already
[22:17:19] basically if you don't make parallel requests and obey maxlag you will always be fine
[22:17:46] if you don't make parallel requests but do not obey maxlag you will still be fine most of the time
[22:18:19] and if you do make parallel requests, it's not possible to give a generic answer
[22:19:47] some API functionalities are very cheap and you can batter them as hard as you like; some are expensive and can seriously endanger site stability if they get a large volume of requests, and there is no easy way to tell them apart
[22:21:42] for example, listing all pages is cheap, unless you use the "skip redirects" option (which in fact had to be removed last week because someone started to spider the pages and it caused serious problems)
[22:24:01] if you use the REST API to get pages there are guidelines at https://en.wikipedia.org/api/rest_v1/
[22:25:29] if you use plain HTTP requests the same way a browser would, I don't think there is any explicit guideline for that, but following the ones for the REST API is probably fine
[23:06:43] tgr, thanks :)
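
A minimal sketch of the "serial requests plus maxlag" pattern tgr describes at 22:17:19, using the Action API via the Python requests library. The User-Agent string, contact address, and retry policy here are illustrative assumptions, not official values; the maxlag=5 value and the Retry-After handling follow the behavior documented at API:Etiquette.

```python
import time
import requests

API_URL = "https://en.wikipedia.org/w/api.php"
# Identify yourself so operators can contact you if needed (per API:Etiquette);
# the name and address below are placeholders.
HEADERS = {"User-Agent": "MyResearchTool/0.1 (researcher@example.org)"}

def query(params, max_retries=5):
    """Issue one Action API request at a time, backing off on maxlag errors."""
    params = dict(params, format="json", maxlag=5)
    for _ in range(max_retries):
        resp = requests.get(API_URL, params=params, headers=HEADERS)
        data = resp.json()
        if data.get("error", {}).get("code") == "maxlag":
            # The servers are lagged: wait as instructed, then retry.
            time.sleep(int(resp.headers.get("Retry-After", 5)))
            continue
        return data
    raise RuntimeError("gave up after repeated maxlag responses")

# Example: fetch basic info for one page, one request at a time.
print(query({"action": "query", "titles": "Wikipedia", "prop": "info"}))
```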
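Nettrom's point at 21:44:18 that some libraries handle throttling automatically is easy to see with Pywikibot, which applies maxlag and request throttling itself. A minimal sketch, assuming a configured Pywikibot installation (i.e. a user-config.py exists):

```python
import pywikibot

site = pywikibot.Site("en", "wikipedia")  # English Wikipedia
page = pywikibot.Page(site, "Wikipedia")
# Pywikibot throttles requests and honors maxlag under the hood,
# so a simple read like this already follows the etiquette.
print(page.text[:200])  # first 200 characters of the wikitext
```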
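For the REST API route tgr mentions at 22:24:01, a minimal sketch of a polite serial fetcher against the documented page-summary endpoint at https://en.wikipedia.org/api/rest_v1/. The contact address and the one-request-per-second pacing are assumptions chosen to stay conservative, not official limits:

```python
import time
import requests

HEADERS = {"User-Agent": "MyResearchTool/0.1 (researcher@example.org)"}

def fetch_summary(title):
    """Fetch the plain-text summary of one page from the REST API."""
    url = f"https://en.wikipedia.org/api/rest_v1/page/summary/{title}"
    resp = requests.get(url, headers=HEADERS)
    resp.raise_for_status()
    return resp.json()

for title in ["Wikipedia", "MediaWiki"]:
    print(fetch_summary(title)["extract"][:100])
    time.sleep(1)  # serial requests with a pause, per the guidelines above
```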