[01:15:42] Coren: I'm trying to figure out how to edit the sidebar on wikitech… I feel like I should be editing something like MediaWiki:Sidebar/Group_projectadmin but in fact wikitech happily serves any possible MediaWiki:Sidebar/Group_ as the exact same content.
[01:16:00] isn't it just mediawiki:sidebar?
[01:16:26] andrewbogott: IIRC there is some black magic coming from the Openstack extension itself.
[01:16:26] https://en.wikipedia.org/wiki/MediaWiki:Sidebar
[01:16:38] andrewbogott: I think we'll have to lookit da sourcez.
[01:16:46] Coren: I am, and it doesn't help…
[01:16:57] I mean, there's a callback that determines which sidebars to display
[01:17:08] but not the content of those sidebars. that, I think, has to be on a wiki page someplace.
[01:17:21] You're sure it's just /which/ sidebars that the callback handles?
[01:17:33] legoktm: look for yourself, all the interesting stuff is just called 'GROUP-SIDEBAR'
[01:17:45] ah
[01:17:45] i see
[01:17:53] * legoktm has no clue :)
[01:18:02] Coren, pretty sure… $roles = $user->getRoles(); $groups = array_merge( $groups, $roles );
[01:18:05] can that be adding content?
[01:18:34] Not in a way that'd make sense no.
[01:18:58] I think what I need is someone who has used dynamic sidebar and GROUP-SIDEBAR before.
[01:19:05] For which, I am probably in the wrong channel.
[01:19:13] Probably. :-)
[01:19:52] doesn't really seem like a -dev question...
[01:26:44] Ah! there is something at https://wikitech.wikimedia.org/wiki/MediaWiki:Sidebar/Group:projectadmin
[01:26:53] weird that when I make up URLs it returns something rather than nothing
[01:33:44] Mediawiki space is odd that way.
[01:52:59] I was wondering if anyone who knows about hosting php scripts on Labs could give me some advice?
[02:34:17] Atethnekos: hi.
[02:34:53] Atethnekos: 'php' is an available command, and public_html supports php; if the former, use the grid engine to submit the script as a job.
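[Editor's note: the "submit the script as a job" option Coren mentions can be sketched as below. This assumes `jsub`, the Tool Labs grid submission wrapper; the job name and script path are purely hypothetical examples, not anything from this conversation.]

```python
def build_jsub_command(job_name, script_path):
    """Build the argv for queueing a PHP script on the grid.

    Assumes the Tool Labs `jsub` wrapper and its `-N` (job name)
    flag; on a Tool Labs login host you would pass this list to
    subprocess.run() to actually submit the job. The job name and
    script path here are hypothetical.
    """
    return ["jsub", "-N", job_name, "php", script_path]
```

The alternative Coren names, `public_html`, needs no submission at all: PHP files placed there are served by the web front end directly.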
[02:44:03] I mean, far more general advice
[02:44:27] There was a tool that was hosted offsite which is used by a few templates on en.wiki which is now done
[02:44:40] down rather
[02:44:54] I'm not a programmer.
[02:46:36] which tool?
[02:52:50] Sorry. It is described here https://en.wikipedia.org/wiki/Template_talk:Bibleverse#Bibleverse_template_broken
[02:53:12] The code seems to be all here: http://code.google.com/p/codetestmap/source/detail?r=240
[02:53:29] It used to be hosted at http://bibref.hebtools.com/
[02:54:02] This Bibleverse template and other related ones (bibleref) are used on around 10,000 pages I believe, but now they don't work.
[03:00:18] Based on my limited understanding it should be just a matter of moving the bibleversefinder.php and the two .txt files into an accessible directory. Then the address for the templates can be changed. But I don't know if this is something that Labs could do or not?
[03:01:12] Atethnekos: there is "Tool Labs" and "Labs", Tool Labs is a subset of it, and yes, should be able to do that
[03:03:13] I see the en.wiki page should be updated which conflates the two together.
[03:03:49] Atethnekos: https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Help
[03:04:49] since you say you're not a programmer, maybe you can find a "sponsor" to help you get your script up or so ?
[03:05:49] but if it's true it's just a .php and 2 text files and it's in use on projects it should be fine
[03:06:16] Well, I don't mind a challenge. Maybe I'll try applying for tools access and give it a go that way?
[03:06:37] Yeah, it should just be moving the three files over.
[03:07:02] Unless you advise I don't get involved since I don't really know what I'm doing.
[03:07:28] if you dont mind a challenge, go ahead :)
[03:07:41] and apply, make a wiki page describing your tool
[03:08:05] then get a php dev to look over the code
[03:09:01] Great, thanks for the encouragement. I'll start with getting an SSH key!
I've done that before when I hosted some webpages at my University.
[03:09:19] :) welcome
[03:29:22] Atethnekos: Welcome! Also beware: you'll find making tools to be painfully addictive. :-)
[03:55:27] Thanks, Coren! I'm just waiting for approval; hint hint wink wink https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Access_Request/Atethnekos
[04:01:23] Coren: tried to help him out. didnt do this for a while Failed to add Atethnekos to tools. This needs user Atethnekos to have the "loginviashell" right.
[04:01:50] mutante: You need to give him the 'shell' userright first.
[04:02:09] Sorry, I'm having a movie watching night with my SO. :-)
[04:02:45] Coren: thanks, run for movie!
[04:06:36] Atethnekos: https://wikitech.wikimedia.org/wiki/Special:Log/rights
[04:06:59] and Successfully added Atethnekos to tools.
[04:07:27] Thanks so much mutante!
[04:08:00] yw, i'll also head off for now though, hope it works well with the next steps
[04:08:10] otherwise...ping the channel. and cya
[04:09:48] thanks again.
[05:52:25] Works perfectly, thanks mutante and Coren.
[05:52:57] Atethnekos: I'm glad the documentation was clear enough.
[10:00:31] Coren: remember the dead weblink checking i didn't want to do in the grid. turns out to be not doable the way i wanted and i have to use the grid now. to have an acceptable speed we probably will have to increase the grid capacity
[10:00:48] /grid./grid?/
[12:06:07] aw, mass inserting into redis doesn't work for me :(
[12:06:15] fuck it
[12:33:49] Coren: can we install redis-cli?
[12:50:08] ok, works now, forget redis-cli
[13:32:11] in my tool's folder (alkamidbot/scripts/output) there is one file with odd permissions: -rw-r--r-- 1 alkamid wikidev 14957 Nov 21 18:42 visits.txt
[13:32:15] I can't "take" it as alkamidbot
[13:32:25] it says "visits.txt: You need to share a group with the file"
[13:33:32] I can't chown it as alkamid: chown: changing ownership of `/data/project/alkamidbot/scripts/output/visits.txt': Operation not permitted
[14:56:17] Coren: do we have a redis connection timeout set? if so, why??
[16:24:09] giftpflanze: Not that I know of.
[16:38:23] Coren: can we add more nodes to the grid?
[16:38:55] giftpflanze: Yeah, no reason we couldn't. Are you hitting job delays?
[16:39:25] i want to start an array job, only 26 are executed in parallel
[16:47:17] Coren: do i have to ping you when i answer you?
[16:49:09] Ideally. It's saturday and I'm doing other things in the house and not always at my computer. You'll hit into the per-user limit if you do this on the normal grid; what I can do instead is give you a dedicated node with the adequate resources for that task.
[16:49:51] Coren: that would be great
[16:51:18] giftpflanze: open a bz with details (number of expected parallel jobs, resource requirements). Will you need the actual grid engine parallel job support or are all the jobs independent?
[16:51:43] um
[16:53:09] Coren: what does actual grid engine parallel job support give me?
[16:55:17] If you don't know about it you don't need it. It's for running jobs that need message passing for synchronization (such as with http://www.open-mpi.org/). Unless you have very complex parallelization and coded it that way, you don't need it.
[16:55:39] ok, don't need it
[17:02:09] Coren: the more resources the better, is it ok then if i request a m1.xlarge instance for the node?
[17:02:45] giftpflanze: I'm probably not going to okay an xlarge for a single tool unless you make a damn good case that you need all that ram!
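[Editor's note: alkamid's "take" failure above follows from ordinary Unix ownership rules: a non-root user cannot `chown` a file at all, and the error message suggests the `take` helper refuses unless the requesting user shares a group with the file. A guess at that precondition, sketched in Python; the actual tool may check more than this:]

```python
import grp
import os
import pwd

def shares_group_with_file(username, path):
    """Return True if `username` is in the file's owning group.

    A sketch of the check implied by take's "You need to share a
    group with the file" error: the user qualifies either via their
    primary group or via supplementary membership in the file's group.
    """
    st = os.stat(path)
    file_group = grp.getgrgid(st.st_gid)
    primary_gid = pwd.getpwnam(username).pw_gid
    return primary_gid == st.st_gid or username in file_group.gr_mem
```

In the log above, visits.txt is owned by group `wikidev`, so the tool account `alkamidbot` fails this check even though the human account `alkamid` owns the file.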
:-)
[17:03:19] coren: when i was testing it in my own project it was all needed
[17:03:47] I'm sure it was sufficient, but was it necessary? :-)
[17:03:53] hm, well, if it isn't stored in memory but written to disk it isn't needed
[17:04:20] but i can make more web requests at the same time
[17:04:48] Sure, but you realize that an xlarge is 16G of ram? How many requests do you want to /do/?
[17:05:04] some thousand
[17:05:34] ... wait woah. What *are* you doing exactly?
[17:05:57] i check ~5m urls for brokenness
[17:06:54] Fair enough, but how often do you expect to be doing that check?
[17:07:05] every two weeks
[17:07:44] That's less than 300/min
[17:09:19] ok, then i guess everything as it is is enough
[17:09:39] make it 500/min if you want your run to have some elbow room, though.
[17:09:55] and then double that for safety. :-)
[17:10:01] heh :)
[17:10:13] timeout is 200 sec
[17:10:29] So realistically, you need to be able to handle ~1000 web requests per minute.
[17:11:08] Sure, but statistically timeouts are going to be rare compared to connection refused, 404s and such.
[17:11:18] right
[17:11:59] So I'd plan for 10s per connection as a reasonable median.
[17:12:54] So if you want to make sure you can do 1000/min on average, and you can expect 6 per minute per thread or so, then you only need 166 parallel connections.
[17:13:07] Make it 200 and you've added another safety margin.
[17:13:25] And 200 won't need that much resources. :-)
[17:16:00] * Coren ponders.
[17:16:19] Wait, why do you even need to parallelize this?
[17:16:34] That sounds perfectly doable with a single job.
[17:16:55] except that it runs months
[17:17:42] with our calculation 2 years …
[17:17:43] It shouldn't. We just calculated that 200 parallel web connections suffice to do 5 million links in ~ 5 days.
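[Editor's note: Coren's back-of-the-envelope figures check out. Using only the numbers stated in the conversation above:]

```python
# Values from the conversation: ~5 million URLs, rechecked every two weeks,
# with a median of 10 s per connection (so 6 checks per minute per thread).
urls = 5_000_000
period_min = 14 * 24 * 60            # two weeks, in minutes
per_thread_rate = 60 / 10            # 6 checks/min per thread

required_rate = urls / period_min    # ~248/min -- "less than 300/min"
target_rate = 1000                   # after the two safety margins

threads_needed = target_rate / per_thread_rate   # ~166 parallel connections

# Straight-line time for a 200-thread run; just under 3 days, so Coren's
# "~5 days" already includes slack for timeouts and retries.
days_at_200_threads = urls / (200 * per_thread_rate) / (60 * 24)
```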
[17:21:34] Coren: should i do that with threading in my own project or in a single job on the grid, for the second scenario i think the memory isn't enough
[17:22:17] also threading is so prone to errors
[17:22:29] giftpflanze: That's probably best. Make a pool of worker threads handling exactly one connection asynchronously.
[17:23:06] giftpflanze: It's not so bad if what the threads do is really, really self-contained. The only thing they should do is "pick a URL from the list, fetch the contents, store the result on a list"
[17:23:31] about a third of results get dropped in my tests
[17:23:44] if stored in lists memory usage gets huge
[17:24:04] giftpflanze: You don't need to have both lists populated completely.
[17:24:28] giftpflanze: Have a thread refill the "pending" list when it gets below a certain level.
[17:24:56] giftpflanze: What's your normal proportion of 'good' links that won't need any action?
[17:25:22] i have a thread pool with a maximum of threads, that gets filled by a loop from file
[17:25:41] let me look …
[17:27:21] ~19% of urls are broken
[17:27:52] but i need to know if a bad url turns good in a later run
[18:59:17] Coren: What is the per-user limit on the grid?
[18:59:26] anomie: 12 jobs, iirc
[19:00:09] Coren: And if it's exceeded, the extra jobs wait for one of the others to exit?
[19:00:37] They should.
[19:03:45] they don't for me
[19:03:51] I suppose they do. With the change I made to get rid of AnomieBOT's "updater" job, I have 9 continuous jobs and occasionally submit up to 8 task jobs, and all the task ones seem to get run.
[20:56:31] Hi. I'm experiencing different behaviour of my script on login and exec servers: http://pastebin.com/48VrD1Cj
[20:56:45] This is basically looking for the latest dump in the dumps folder
[20:57:32] on login, it finds it properly (20131115). On exec, it raises DumpNotFound
[20:59:34] oh, and config.path['dumps'] is /public/datasets/public/plwiktionary
[21:06:54] surely dumps are mounted on exec servers?
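[Editor's note: the design Coren recommends above — a bounded pool of self-contained workers fed from a file, with the pending list refilled in chunks instead of both lists being held fully in memory — might look roughly like this in Python. The `check_url` body is a stand-in; a real checker would issue an HTTP request with the 200 s timeout mentioned earlier and treat connection refused, 404s, etc. as broken:]

```python
import concurrent.futures
import itertools

def check_url(url):
    """Stand-in worker: fetch `url` and report whether it is alive.

    Deliberately self-contained, per the advice above. Here the
    'check' is a toy placeholder on the URL scheme; a real worker
    would do an HTTP request with a timeout.
    """
    return (url, url.startswith("http"))

def check_all(url_source, pool_size=200, chunk=1000):
    """Stream URLs through a bounded worker pool, yielding results.

    Only `chunk` URLs are pending at a time -- the "refill when it
    gets below a certain level" idea -- and results are yielded as
    they arrive instead of accumulating in one huge list, so memory
    stays flat regardless of how many URLs the input file holds.
    """
    url_source = iter(url_source)
    with concurrent.futures.ThreadPoolExecutor(max_workers=pool_size) as pool:
        while True:
            batch = list(itertools.islice(url_source, chunk))
            if not batch:
                break
            yield from pool.map(check_url, batch)
```

A caller would typically iterate the generator and write each `(url, ok)` pair straight to disk, which also covers the need to notice when a previously bad URL turns good on a later run.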
[21:26:29] Coren, could you check if the dumps folder is properly mounted on all exec servers?
[21:27:18] because my script (that checks if a dump filename exists) sometimes runs properly, and sometimes not
[21:27:55] most of the time it doesn't
[21:35:04] it seems to be ok on exec-08
[22:04:34] Coren: can you increase my job limit?
[22:05:53] they pile up in waiting state
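[Editor's note: the "find the latest dump" logic being debugged above might look roughly like the sketch below. The directory layout — one YYYYMMDD subdirectory per dump under the configured dumps path — and the `DumpNotFound` exception are assumptions reconstructed from the paste's description, not the actual script:]

```python
import os
import re

class DumpNotFound(Exception):
    """Raised when no dated dump directory is found."""

def latest_dump(dumps_path):
    """Return the newest YYYYMMDD dump directory name under dumps_path.

    On a host where the dumps share is missing or not mounted, the
    directory looks absent or empty, so this raises DumpNotFound --
    matching the symptom seen on (some) exec servers above, while a
    login server with the mount returns e.g. '20131115'.
    """
    try:
        dates = [d for d in os.listdir(dumps_path)
                 if re.fullmatch(r"\d{8}", d)]
    except FileNotFoundError:
        dates = []
    if not dates:
        raise DumpNotFound(dumps_path)
    return max(dates)   # YYYYMMDD strings sort chronologically
```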