[03:31:31] Coren: Why are tools still being created with ~/cgi-bin/?
[03:36:03] a930913: because some of us have habits and patterns we don't like to break
[03:36:57] a930913: Patches welcome for labs/toollabs :-).
[03:42:45] scfc_de: Is that the jenkins thing?
[03:48:13] a930913: Eh? http://git.wikimedia.org/blob/labs%2Ftoollabs.git/master/misctools%2Ftoolwatcher#L31
[04:04:30] scfc_de: There's no edit button.
[04:08:17] a930913: https://www.mediawiki.org/wiki/Gerrit
[08:53:32] hello. I'm starting a subprocess (calling jsub program01 from os.system()) and then terminating the parent program. Will this kill my program01 automatically?
[09:44:39] !log nlwikibots fixed archivering cron to output to /dev/null by default
[09:44:41] Logged the message, Master
[11:26:25] rohit-dua: No, but that seems rather unnecessary.
[11:26:42] jsub just schedules and exits anyway.
[11:30:25] Coren: I moved all the output to .profile; that indeed seems to work. IIRC that didn't work on the toolserver, hence .bashrc
[11:31:25] That's odd; didn't the ts login servers have bash as the default shell?
[11:31:45] Because that's bash behaviour, not specific to the OS.
[11:33:24] I'm not sure -- I think 'become' had slightly different behavior
[11:34:12] Oh! With become that does indeed make a difference; IIRC toolserver started a subshell whereas I start a login session.
[11:35:01] in any case, moving it to .profile solved the issue -- should I close the issue or do you want to adapt crontab to work this way?
[11:36:31] I think it's better if the behavior is kept standard even if it deviates a little from toolserver practice; I'd rather get newbies used to things behaving like the manpages say to expect.
[11:37:48] I doubt the manpages say 'expect .bashrc output to end up in your crontab' ;-)
[11:37:56] but I'll close the issue; at least it's there for future reference
[11:37:58] * Coren chuckles.
[11:38:12] No, but they do say that subshells invariably source .bashrc.
:-)
[11:38:15] Wikimedia Labs / tools: crontab -e adds bashrc output to crontab file - https://bugzilla.wikimedia.org/65980#c2 (Merlijn van Deen) NEW>RESO/INV Moving .bashrc to .profile solved the issue; using .bashrc was necessary on the toolserver due to a slightly different 'become' implementation.
[12:36:49] Coren: could you take a look? ~6-10 requests/sec https://tools.wmflabs.org/paste/view/48115f4c
[12:39:21] !log local-catbot Checked logs. All processes at https://commons.wikimedia.org/wiki/User:CategorizationBot#Process ran without problems
[12:39:23] Logged the message, Master
[12:48:17] hey. Which would be better for calling 'jsub program': subprocess.call (Python), or a dedicated grid library like http://pythonhosted.org//gridtk/program.html#module-gridtk.manager ?
[13:06:29] rohit-dua: grid jobs can't submit jobs
[13:22:05] Coren: may I use ~tools.giftbot/dwl.sh with jlocal or is it against the rules?
[13:23:07] gifti: There's nothing /wrong/ with it, but why all the -sync y? That'll make things much slower.
[13:24:00] each step depends on the ones before it
[13:25:34] i could rewrite it to 3 jsubs/qsubs but that would not be faster
[13:26:49] Betacommand: ok. I am just calling grid jobs via subprocess (Python)
[13:27:08] rohit-dua: which is on the grid.....
[13:27:32] Betacommand: no.
[13:28:01] Betacommand: just some .py inside my public_html
[13:28:10] Ah
[13:28:24] be careful with that
[13:28:33] Betacommand: why so?
[13:29:15] rohit-dua: you don't want someone to cause a problem by submitting 500 jobs using your tool
[13:30:13] Betacommand: yes. I don't want to overload it... can you suggest a remedy? Right now every confirmed request calls jsub....
[13:30:29] rohit-dua: build in a throttle
[13:31:37] or a queueing process
[13:32:12] Coren: is there a way to get the number of active jobs a tool has running?
[13:32:37] Betacommand: can I use the python multiprocessing module
[13:32:41] isn't that limited already?
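The jsub-via-subprocess approach discussed above can be sketched like this. Since jsub just schedules the job and exits, a plain subprocess call is enough; no grid library is needed. The job name and script below are made-up examples, not anything from the log.

```python
import subprocess

def build_jsub_argv(job_name, script, *args):
    """Build the argv list for a jsub submission.

    -once asks jsub not to start the job if one with the same
    name is already queued or running (a simple built-in throttle).
    """
    return ["jsub", "-once", "-N", job_name, script, *args]

def submit(job_name, script, *args):
    # jsub schedules the job on the grid and returns immediately,
    # so this call does not block for the duration of the job itself.
    return subprocess.call(build_jsub_argv(job_name, script, *args))
```

A web tool would then call something like `submit("upload-book", "program01.py", book_id)` per confirmed request (names hypothetical).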
[13:32:54] and there was an option for that as well
[13:33:05] Betacommand: qstat | wc -l, or something like that
[13:33:41] rohit-dua: multi-processing is a pain to work with
[13:34:20] Betacommand: the quick and dirty way: qstat -xml|grep '"running"'|wc -l
[13:34:39] * valhallasw hits Coren for grepping xml :-p
[13:35:02] rohit-dua: how long do these jsub tasks take?
[13:35:10] (not sure if there is an easy way to 'xgrep' or something like that)
[13:35:16] valhallasw: Hard to avoid; qstat sucks and hits you with a variable amount of header lines
[13:35:49] Betacommand: well, I will be downloading books from Google Books and then uploading them to the Internet Archive
[13:36:09] Betacommand: this is for each jsub
[13:36:24] rohit-dua: that's against Google's ToS
[13:37:30] Betacommand: that is my GSoC project. I'll be downloading the images, since PDF downloading requires a captcha, and only the books that are public domain..
[13:37:49] rohit-dua: it's still against their ToS
[13:38:10] Betacommand: because I'm using an automated system?
[13:38:14] yeah
[13:38:32] google really doesn't like automated systems that don't use their APIs
[13:38:56] Coren: I think dwl.sh violates "Any script or job invoked with jlocal should not be running more than a few seconds and use minimal resources; misuse of that feature may have severe impact on general reliability for all users and is not allowed."
[13:39:06] Betacommand: I read in their ToS that they disallow many requests from an automated system, so users have to confirm each request
[13:39:08] dwl.sh runs for days
[13:39:42] Betacommand: I'll email them still
[13:40:13] rohit-dua: how long does it take for a book to be downloaded and then re-uploaded?
[13:41:13] gifti: That's definitely a case of "Meh; the spirit is still kept." That you have a shell that's mostly sleeping waiting on jobs to finish isn't a resource hog; nor does it hinder performance of other jobs.
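Rather than grepping XML (which earns Coren a slap from valhallasw), the `qstat -xml` output can be parsed properly, which also sidesteps the variable header lines of plain `qstat`. A sketch, assuming the usual gridengine output shape where each job is a `<job_list state="...">` element:

```python
import subprocess
import xml.etree.ElementTree as ET

def count_jobs(xml_text, state="running"):
    """Count <job_list> entries in the given state from `qstat -xml` output."""
    root = ET.fromstring(xml_text)
    return sum(1 for job in root.iter("job_list") if job.get("state") == state)

def running_jobs():
    # Shells out to gridengine's qstat; only works on a grid submit host.
    return count_jobs(subprocess.check_output(["qstat", "-xml"]))
```

A tool could refuse new submissions when `running_jobs()` exceeds some cap, giving the throttle Betacommand recommends.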
Strictly speaking, you _could_ do this differently (it's possible to specify dependencies with gridengine) but it's almost certainly not worth the trouble.
[13:41:17] Betacommand: that depends on the book. Some books may have 100 pages, so I download 100 images, then convert them to a PDF and then upload it to IA
[13:41:41] ok :)
[13:41:52] gifti: I think I asked this yesterday already, but why don't you just make one script that does everything and submit that, instead of having one script that submits 7 jobs with -sync y?
[13:42:35] valhallasw: the point is that I have to invoke qsub with an array job, which can't be submitted from within a job
[13:43:02] Betacommand: do you think I should use a captcha.. to avoid unnecessary usage..
[13:43:04] if there wasn't this weird restriction, I would do it
[13:43:32] rohit-dua: No, it's just a matter of queue management
[13:44:10] gifti: huh.
[13:44:11] rohit-dua: my thought would be to have one master worker that is always running and reads from a text file every X minutes and gets a list of books to process
[13:44:43] your web app then just dumps stuff to the file for the worker to process
[13:45:17] gifti: when you run the entire script on the grid, you don't need to use the grid to make an array job -- you can just let bash loop
[13:45:41] Betacommand: so will the master process move to the next request only when the previous request (downloading/uploading) is complete?
[13:45:47] gifti: and if qsub/jsub are still not available on the exec hosts, hit Coren ;-)
[13:46:11] valhallasw: I can't imagine how I would let them run in parallel
[13:46:51] rohit-dua: depends on how it's coded. I have one bot that can process a variable number of jobs; I can set a max number and it waits until one of them completes and then adds another
[13:46:54] i did that on toolserver, but it seems messy
[13:47:20] gifti: ohhhh. I thought all jobs were sequential, but there are a few sequential jobs, then an array job, then a few more sequential jobs?
[13:47:29] yes
[13:47:48] Betacommand: ok, that I feel would be better too.. but how did you set the threshold for the number of jobs?
[13:48:46] rohit-dua: both python threading and multiprocessing have the ability to keep track of the number of workers
[13:50:03] throw in a time.sleep(60) and it should be easy
[13:50:30] Betacommand: ok. I meant: what did you set the max number to?
[13:51:11] rohit-dua: it depends on the size and what the workers are doing. On mine I set it to 10
[13:51:46] rohit-dua: while testing on my local machine I had as many as 50
[13:52:37] just take a look at how much workload each worker creates and be reasonable about it
[13:54:14] Betacommand: ok.. then I'll finalize the limit after the whole script (downloading/uploading) is complete..
[13:58:57] AAAAAH
[13:58:58] Coren?
[13:59:02] My crontab is gone.
[13:59:12] 'read error in tmp file', and then it wrote the empty crontab
[13:59:18] !
[13:59:20] nlwikibots
[13:59:53] I should have it in text format somewhere, so it shouldn't be a huge issue... in this case at least
[14:00:08] yeah, / on tools-login is full
[14:00:48] Eeew.
[14:01:03] People have been a little cray-cray in /tmp
[14:02:07] !log nlwikibots crontab gone due to full /tmp; re-creating based on 'cronfile' which is a bit old but has most of the relevant bits
[14:02:08] Logged the message, Master
[14:02:32] Ah; someone edited a 4G file with... joe, which then crashed hard and left its tmp file
[14:03:28] Eeesh. It's not immediately obvious how that's preventable in the general case.
[14:03:28] Coren: hmm. As for the crontab - maybe not save an empty crontab by default?
[14:03:55] I remember that I did that before :\
[14:03:56] or show a diff and ask the user whether that was the expected change?
[14:04:12] valhallasw: That sounds like a sane sanity check, especially since crontab -r remains available.
[14:04:15] I should go checking after such incidents, but why?
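The fixed-worker-cap approach Betacommand describes (10 workers on the grid, up to 50 while testing locally) can be sketched with a thread pool; threads are a reasonable fit here because the download/upload work is network-bound. `process_book` is a stand-in for the real work, not code from the log.

```python
from concurrent.futures import ThreadPoolExecutor

def process_book(book_id):
    # Stand-in for the real download-from-Google / upload-to-IA work.
    return (book_id, "done")

def run_all(book_ids, worker=process_book, max_workers=10):
    """Run jobs with at most `max_workers` in flight at any moment.

    The executor queues the rest and starts a new job each time one
    finishes, which is exactly the "waits until one of them completes
    and then adds another" behaviour described above.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, book_ids))
```

Picking the cap is then a one-line change, to be tuned after measuring what one worker actually costs.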
[14:04:19] or, finally, force people to feed crontab from a file
[14:04:39] that last one actually sounds like the best option to me, even though it's non-standard unix
[14:05:53] valhallasw: Arguably, but endusers expect crontab to... well, work like crontab. I expect the stock crontab doesn't behave any differently for that matter; it also uses /tmp
[14:06:24] I think 'use crontab with a file' is best practice, certainly, but axing the -e functionality seems wrong.
[14:06:41] Sure.
[14:07:18] maybe we can add that it stores the crontab in the tool's directory?
[14:07:38] that wouldn't help if it overwrites it
[14:08:03] the problem is that crontab can't know the difference between 'creating the temp file failed' and 'user cleared the file'
[14:08:48] Coren: oh, actually, one could just make crontab -e edit ~/.crontab and 'crontab ~/.crontab' that
[14:10:08] !log nlwikibots crontab restored; now let's move dplinks from TS to TL
[14:10:10] Logged the message, Master
[14:10:51] (Forgive me if this is a stupid question) One of my Python scripts requires the Beautiful Soup 4.1.2+ library, but the version available on labs seems to be older. Is it possible to install a newer version? What's the process?
[14:11:42] ali_: virtualenv should work
[14:11:56] ali_: that basically gets you your own 'python environment' where you can install your own packages
[14:12:52] Got it, thanks valhallasw
[14:18:23] is there any way to get informed through the grid itself that a job completed successfully, or do I have to check manually (qstat) every time?
[14:20:46] rohit-dua: if you run a job with -sync y, qsub will wait until the job is done before returning
[14:21:44] valhallasw: It will then run only one job at a time.. right? What if I want to run, say, 10 jobs at a time...
[14:23:02] I think there's a way to do dependencies with sge, but gifti has more experience with array-like jobs
[14:30:47] !log nlwikibots dplinks ported. Farewell, Toolserver, it was nice knowing you.
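The 'crontab from a file' practice plus the empty-crontab sanity check discussed above might look like this sketch: keep the crontab in a file under the tool's directory (as Coren suggests with ~/.crontab), and refuse to install an empty one, which was exactly the failure mode of the full-/tmp incident. The path and function names are illustrative, not an existing tool.

```python
import subprocess

def has_entries(crontab_text):
    """True if the crontab has at least one non-comment, non-blank line."""
    return any(
        line.strip() and not line.lstrip().startswith("#")
        for line in crontab_text.splitlines()
    )

def install_crontab(path):
    """Install the crontab kept in `path` via `crontab <file>`,
    refusing to silently wipe the live crontab with an empty file."""
    with open(path) as f:
        text = f.read()
    if not has_entries(text):
        raise ValueError("refusing to install an empty crontab from %s" % path)
    subprocess.check_call(["crontab", path])
```

This does not fix `crontab -e` itself (which still edits via /tmp), but with the master copy in a real file the worst case becomes a re-run of `install_crontab`, not a lost crontab.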
[14:30:49] Logged the message, Master
[15:53:25] I'm trying to connect to redis via redis.StrictRedis(host='tools-redis', port=6379, db=0), which gives the error "'module' object has no attribute 'StrictRedis'". What can be the problem?
[15:54:19] rohit-dua: bad coding?
[15:55:35] Betacommand: is the syntax wrong? I checked it against the redis-py documentation.. do we have to pass something for the db parameter?
[15:55:53] rohit-dua: that's a syntax error
[15:56:15] did you import it correctly?
[15:56:55] Betacommand: yes... still not working ;(
[15:58:43] rohit-dua: is it already installed on labs?
[15:59:36] yes. import redis doesn't give any error.... and redis.ConnectionPool(host='tools-db', port=6379, db=0) is working...
[16:00:13] I'll use redis.ConnectionPool instead..
[16:00:27] rohit-dua: do you know what version of the library is installed?
[16:03:55] Betacommand: idk. maybe it's the version...
[16:14:13] !log local-catbot Set a .my.cnf with a default server
[16:14:16] Logged the message, Master
[16:16:12] !log local-heritage Replaced the .my.cnf with the right credentials
[16:16:13] Logged the message, Master
[16:18:29] !log local-heritage Created the s51138__heritage_p database
[16:18:31] Logged the message, Master
[16:34:47] (PS1) Multichill: Moving from Toolserver svn to Toollabs git * .gitreview * Removed some path stuff * Commons has a different server on labs This should get the tool flying [labs/tools/heritage] - https://gerrit.wikimedia.org/r/136649
[16:35:20] (CR) Multichill: [C: 2 V: 2] "Bla, self merge" [labs/tools/heritage] - https://gerrit.wikimedia.org/r/136649 (owner: Multichill)
[16:35:58] !log local-heritage Cleaned out some code in https://gerrit.wikimedia.org/r/136649 and merged it
[16:35:59] Logged the message, Master
[16:41:06] !log local-heritage Fixed ~/.database.inc , still have to do the i18n part
[16:41:07] Logged the message, Master
[16:46:40] !log local-heritage Moved erfgoedbot, public_html & pywikipedia to ~/old/.
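The missing-StrictRedis error above is an AttributeError on the module, so the connection parameters are not the problem; the usual culprits are a very old redis-py (StrictRedis appeared in the 2.x line) or a stray local file named redis.py shadowing the installed package. A hedged diagnostic sketch that works without redis installed:

```python
import importlib

def diagnose(module_name="redis"):
    """Explain why `<module>.StrictRedis` might be unavailable."""
    try:
        mod = importlib.import_module(module_name)
    except ImportError:
        return "not installed"
    if not hasattr(mod, "StrictRedis"):
        # Check __file__: a path inside your tool's own directory means
        # a local redis.py is shadowing the real library.
        return "no StrictRedis in %s (version %s)" % (
            getattr(mod, "__file__", "?"),
            getattr(mod, "__version__", "unknown"),
        )
    return "ok"
```

Running `diagnose()` on tools-login would show whether to look at the library version or at a shadowing file.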
to make room
[16:46:43] Logged the message, Master
[17:01:48] !log local-heritage Pulled pywikibot (compat) and heritage. Symlinked it and set up the bot
[17:01:50] Logged the message, Master
[17:14:09] !log local-heritage Updated ~/bin/create_all_monuments_tables.sh and created 105 tables. Fired up update_database.py to fill the database
[17:14:11] Logged the message, Master
[17:55:37] wee
[18:15:05] hi. Why is such an old version of redis installed on Tool Labs: 2.6.13, when the latest is 2.8.9?
[18:18:48] rohit-dua: because that's the version provided by ubuntu
[18:19:55] valhallasw: oh.. but 2.6.13 isn't even available for download.. it starts from 2.6.14!!
[18:27:11] rohit-dua: er, no?
[18:27:12] https://code.google.com/p/redis/downloads/list?can=1
[18:27:21] that goes back to at least 1.1.*
[18:27:47] http://redis.io/download only shows 2.6.17 because that's the newest 2.6 release
[18:28:11] what's the problem with 2.6.13 exactly?
[18:29:08] valhallasw: thank you... well, some functions aren't implemented, like p.get_message()
[18:29:49] ...what?
[18:31:36] For publish/subscribe: https://github.com/andymccurdy/redis-py#Publish--Subscribe
[18:32:19] sorry, https://github.com/andymccurdy/redis-py#publish--subscribe
[18:32:55] erm, that's available on tools' redis
[18:33:54] so: what are you trying to do, and why didn't it work?
[18:42:09] valhallasw: I'm building a redis-based queue for my tool
[18:43:05] rohit-dua: and the current redis has everything you need for that
[18:43:39] valhallasw: yes, I got what I needed.
[18:43:45] :)
[19:41:40] (PS1) Multichill: Some tweaks to get stuff working * Fix database name (redundant space at the end) * Fix the CH database in English (still pointing to the wrong db) * Remove the legacy Wiegels database [labs/tools/heritage] - https://gerrit.wikimedia.org/r/136683
[19:42:28] (CR) Multichill: [C: 2 V: 2] "Bla, want to have this up and running tonight."
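The redis-based queue rohit-dua describes needs nothing newer than 2.6-era commands: LPUSH on the producer side and BRPOP on the worker side already give a FIFO shared across processes, without pub/sub's get_message(). A sketch written against any redis-py-style client (the client object is passed in, so a real `redis.StrictRedis(host='tools-redis')` or a fake one both work):

```python
def enqueue(client, queue, item):
    # Producer side: the web app pushes one item per confirmed request.
    client.lpush(queue, item)

def drain(client, queue, handle, timeout=1):
    """Worker loop: LPUSH at the head + BRPOP from the tail gives
    first-in, first-out order. BRPOP blocks for up to `timeout`
    seconds; a None result means the queue stayed empty."""
    while True:
        popped = client.brpop(queue, timeout=timeout)
        if popped is None:
            return
        _key, item = popped
        handle(item)
```

Unlike pub/sub, items pushed while the worker is down are kept in the list and processed when it comes back, which is what a job queue needs.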
[labs/tools/heritage] - https://gerrit.wikimedia.org/r/136683 (owner: Multichill)
[19:44:12] puppet makes instances slow, it seems…
[19:46:16] Are all the bots on Toolserver/Labs CC--3.0?
[19:47:35] no, but they have to be under a free license
[19:47:53] Thanks.
[20:00:19] gifti: since when do bots have to be freely licensed?
[20:00:41] I'm not sure about labs, but certainly not on the toolserver
[20:00:44] Earwig: anything on tool labs has to be freely licensed
[20:00:56] but indeed, not on the WMDE toolserver
[20:01:11] right
[20:01:30] not that I agree with that policy, but...
[20:01:40] T13|sleeps: so, Tool Labs: OSS licensed, Toolserver: not necessarily
[20:01:58] also, it's not necessarily well-defined, as not everyone places a LICENSE file
[20:02:12] so even if something is on TL, that doesn't mean you can actually copy it
[20:02:23] just that the original author is violating the rules
[20:02:34] (which is not going to help you if the original author has been run over by a bus)
[20:02:41] at least the TS has a default license thing you're recommended to set
[20:02:57] yeah, but apparently that was too hairy for the legal staff
[20:03:07] as we are not allowed to copy tools from those users
[20:03:11] huh
[20:03:25] So, can I freely copy tools and bots from TS to TL without issues?
[20:03:36] That's my main question.
[20:03:39] T13|sleeps: no
[20:03:41] no
[20:03:48] we just explained that
[20:03:50] T13|sleeps: only if there is a LICENSE file that states you can
[20:03:56] Ones that have been abandoned.
[20:04:05] Hrmm. Okay. Thanks.
[20:04:06] T13|sleeps: and *maybe* if the user has set a default OSS license, but that's not entirely clear
[20:04:12] which tool are we talking about?
[20:04:27] you can check https://toolserver.org/~nemobis/tsusers/licenses.txt
[20:04:46] ah, I was just getting getent out to do that :-)
[20:05:03] !log local-heritage Some tweaks in https://gerrit.wikimedia.org/r/136683 database is filled.
API is working (admintree and statistics still missing)
[20:05:04] Logged the message, Master
[20:05:14] None in particular yet. Just thinking ahead of trying to save some of the tools that haven't been moved yet.
[20:06:12] T13|sleeps: best course of action: mail those people, and get them to explicitly allow you to move the tools
[20:06:37] T13|sleeps: if the tool is alive, then the author is active enough to extend their account every 6 months
[20:06:45] and thus active enough to send you a reply -- hopefully.
[20:07:07] Maybe... lol, no guarantees though.
[20:17:33] And keep in mind that Toolserver is scheduled to be shut down at the end of this month, IIRC :-).
[20:18:25] Earwig: yeah, sorry, I didn't read "toolserver"
[20:18:47] !log local-heritage Set up cron to run the update_monuments job every night. Some parts of it will still fail.
[20:18:48] Logged the message, Master
[20:20:25] valhallasw: I only ever state my license directly in the files, so no license file ;)
[21:39:43] hello, by mistake I removed my account name on wikitech from local-jbbot. How do I add it back?
[21:48:21] scfc_de: can you fix that? ↑
[22:10:30] javedbaker: (I hope) I added you again to javedbaker and jbbot. ("hope" because I don't read Arabic (?) :-).)
[23:36:47] Wikimedia Labs / tools: Tool Labs: Provide filtered view of user_properties table containing short list of properties, linked to userID - https://bugzilla.wikimedia.org/64115#c1 (JuneHyeon Bae (devunt)) p: Unprio>High s: normal>minor Since toolserver shutdown, all tools should be migrated to...
[23:39:16] Wikimedia Labs / tools: Tool Labs: Provide filtered view of user_properties table containing short list of properties, linked to userID - https://bugzilla.wikimedia.org/64115 (Nemo) s: minor>normal