[00:04:59] Change on mediawiki a page Wikimedia Labs/Tool Labs/Needed Toolserver features was modified, changed by Fastily link https://www.mediawiki.org/w/index.php?diff=660792 edit summary: [+317] /* Bots project */ + [06:47:44] !log account-creation-assistance Altered security group 'default' config, changed rule 'from 80 to 80 proto tcp ip 10.4.0.54/32' to 'from 80 to 80 proto tcp ip *10.4.0.0/21*' to fix issues with accessing web services on accounts-application [06:47:48] Logged the message, Master [09:02:13] Coren ping [09:02:48] is there a way to manually drop a queue for certain host for specific time? [09:06:52] addwork ping [09:57:38] !log wikidata-dev wikidata-testclient: Pulled merged changesets in puppet from gerrit [09:57:42] Logged the message, Master [10:14:49] !ganglia [10:14:58] @search ganglia [10:14:58] Results (Found 2): load, load-all, [10:15:02] !load-all [10:15:02] http://ganglia.wikimedia.org/2.2.0/?c=Virtualization%20cluster%20pmtpa&m=load_one&r=hour&s=by%20name&hc=4&mc=2 [10:15:05] eh [10:15:10] where is labs ganglia [10:18:48] lol addwork [10:18:57] 750 jobs within 10 minutes? [10:19:02] come on :D [10:19:33] fickin queues [10:19:43] I need to find out how to disable them [10:21:19] and more are coming... [10:21:34] great [10:24:09] petan: i think its his cron? [10:27:20] yes [10:27:25] Beetstra hi [10:27:42] Beetstra your bot is eating lot of ram on bnr1 :o I don't really care but right now there is about 12gb of memory used [10:27:53] as it require swap to be used, it slow down the box [10:28:06] so, just FYI your bot is probably very slow now [10:30:41] hmm .. [10:30:46] let me look [10:31:24] (I actually thought it was going pretty fast) [10:33:10] hmm ... [10:33:30] coibot is eating memory .. ? [10:35:30] I don't know... [10:35:42] there is one process eating about 3g and dozen of small processes [10:36:00] I saw that coibot had about 7.7G ... [10:36:09] I mean, the main bot, 'coibot.pl' [10:36:29] hey, -bnr3 ..!
[10:36:36] I have restarted the bot [10:37:12] aha [10:38:04] no wonder it put it to bnr3 given the load on bnr1 [10:38:10] see qtop [10:39:15] I'll slow down the linkwatcher a bit [10:39:28] @notify Merlissimo [10:39:28] This user is now online in #wikimedia-tech so I will let you know when they show some activity (talk etc) [10:39:47] Beetstra, you shouldn't need to - there is no problem with load, rather with memory [10:40:57] well .. that memory-load is due me running a large number of LinkParsers on the linkwatcher, which each eat a piece of memory [10:41:11] And the more linkparsers, the faster the bot [10:41:11] aha [10:41:11] but maybe this is overdone :-) [10:52:17] linkwatcher is a bit slower now (munching the backlog at 1 file every 5 minutes in stead of munching 1 file every 3 minutes) [10:53:20] ok as long as that is ok to you :P [10:54:12] !log bots petrb: removed exec from gs [10:54:18] Logged the message, Master [10:54:25] It will just take longer [10:54:40] !log bots petrb: found log files for master server in /var/spool/gridengine/qmaster yaylog removed exec from gs [10:54:41] Due to my good friends Legobot and Addbot I have a backlog of several thousands of files :-p [10:54:42] Logged the message, Master [10:54:58] actually its just addbot now [10:55:10] legobot stopped editing ~10 hours ago due to a bug i havent fixed yet [10:55:14] well [10:55:16] heh [10:55:23] interwiki removal edits [10:55:30] it still does a bunch more stuff :P [10:56:13] But as long as it is munching backlog files .. [10:56:26] ah [10:56:27] If it starts writing backlog files again, then it means that it can't keep up with real time [10:56:48] Don't worry, legoktm .. I'll still blame you ;-) [10:57:30] heh [10:57:44] ill start have to categorizing my scripts as "beetstra-compatible" and not :P [10:58:39] Well, legoktm .. in a way you should blame the spammers. 
If they would not be there, we would not need to monitor who is adding links [10:59:12] SGE is really dumb thing [10:59:34] for some reason it put your bot to worst possible candidate instance :P [10:59:55] your bot was eating shittons of ram so that new instance was put to instance with lowest free memory [11:00:01] heh [11:00:16] I expect you will get to same no memory troubles soon [11:00:36] I don't know, maybe it was due to the load on the box [11:00:44] yes but bnr2 would be best [11:00:45] I have never seen memory-hole-problems with coibot [11:01:23] I'll try to keep an eye on it [11:03:34] Beetstra ok right now usage is 7.3g and ram is 7.8g [11:03:35] :P [11:03:44] that box will run out of memory within few minutes [11:03:50] see qtop [11:04:02] not blaming ur bot, this is rather mistake of SGE [11:04:13] that box had low free ram before you started your bot [11:04:21] it should have never been submitted there [11:04:43] !log wikidata-dev wikidata-dev-9 Set all logrotate config (except the puppet-managed ones) to daily and rotate 2. Submitted bug 46259 to limit size of dispatchChanges.log (on repo). [11:04:47] Logged the message, Master [11:05:51] meh .. what does coibot have today [11:06:24] Nooooh .. that is not coibot fault .. coibot is only using 174 (plus all the modules) [11:06:40] addbot is using 210m per module, and glusterfs is using 721 [11:06:51] meh [11:07:06] Beetstra I told you it's not a problem of COIBOT [11:07:23] that box was already overloaded with other stuff [11:07:34] 7.4g :o [11:07:38] WOW .. addbot has 991 processes running [11:07:42] indeed [11:07:58] and I was only running 105 linkparsers before .. [11:07:59] wow [11:08:59] it insists to put coibot on bnr3 [11:10:16] petan .. to run coibot -> '/home/beetstra/coibot.sh' .. in case you manage to change tell it to run somewhere else and I am not around to restart - feel free [11:10:32] petan: Did you set up the virtual_free config and it is still failing? [11:11:00] scfc_de where? 
I was googling a lot and couldn't find how to set up max memory usage [11:11:18] I added h_vmem to load_thresholds [11:11:25] petan: Didn't I post a link yesterday/this night? [11:11:29] !logs [11:11:29] logs http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-labs [11:11:30] and it just was producing some errors [11:11:41] uh yesterday night I wasn't here [11:12:26] I need a config that will do: if memory is over 6gb - don't submit jobs to this box [11:13:05] Then you have an impostor :-). http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-labs/20130317.txt: "[17:54:21] petan|wk: http://jeetworks.org/node/93" [11:14:51] scfc_de: error - unknown attribute virtual_free [11:14:55] doesn't work :/ [11:15:20] oh nvm [11:20:13] ok scfc_de now the configuration worked, but nothing really happened, I don't know what is it supposed to change though [11:20:17] there is no dropped queue [11:20:25] but maybe it changes behaviour of submitting? [11:20:40] I changed mem_free instead of virtual_free [11:20:48] because I don't want swap to be used [11:22:46] yay I found it [11:22:54] what we need is mem_used in thresholds [11:23:34] Beetstra I fixed teh memory problem :o [11:23:46] if you restarted your bot now, it would for sure wouldn' [11:23:50] t go to bnr3 [11:23:57] but it 's fine now [11:24:00] keep it there [11:24:11] running addwork's task's will end and memory will free [12:40:04] petan: Pong. [12:40:11] Cheers, petan [13:00:56] Coren I figured out how to do that, but I will need to write it somewhere because it was really hard to find it :D [13:01:04] I needed to set up maximal memory per node [13:01:24] it's load_thresholds mem_used=SIZE [13:01:31] No, that' [13:01:53] that will stop submitting jobs to node when certain memory is reached [13:01:57] that'll just set /preference/ for what node to use, it won't prevent a process from eating it all. 
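[Editor's note] The per-host threshold petan arrives at above ("load_thresholds mem_used=SIZE") would be set on the queue configuration roughly as below. This is a hedged sketch: the queue name `main.q` and the 6g figure are illustrative, not taken from the channel.

```shell
# Stop scheduling new jobs onto a host once its used memory crosses a
# threshold, as petan describes. Queue name and size are assumptions.
qconf -mattr queue load_thresholds mem_used=6g main.q

# Hosts over the threshold then show their queue in alarm state:
qstat -f -explain a
```

As Coren points out just below, this only changes scheduling *preference*; it does not stop an already-running job from eating all the memory.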
[13:02:03] of course [13:02:07] that's what I needed [13:02:13] I wanted to change the preference [13:02:15] You need to set h_vmem as consumable. [13:02:24] yes I know but that's not what I want to change [13:02:30] that's already set to 4gb [13:02:36] o_O [13:02:44] Beetstra's bot is eating more than 4gb [13:02:53] if you have that less than 4gb his bot won't work on tool labs [13:02:56] which one? [13:03:02] that one you restarted today [13:03:06] he was about 4gb [13:03:08] coibot? [13:03:09] * it [13:03:11] yes [13:03:12] probably [13:03:14] one of them [13:03:23] that one eating 46% of ram [13:03:30] You can't do that; you'll never have a stable grid. You simply need to set h_vmem on the nodes, and make it a consumable resource. [13:03:35] (I didn't restart anything) [13:03:41] Coren that's what I did in past [13:03:56] Let coibot request the memory it actually needs. [13:03:56] Coren but what if you have multiple small jobs which together eat a lot of ram huh? [13:04:08] petan: Then they need to set their h_vmem accordingly. [13:04:22] Coren imagine you have 1000 jobs on one exec node, each eating 100mb of ram [13:04:24] what would you do [13:04:34] from 1000 users [13:04:44] 1000 different, separate tasks with separate limits [13:05:02] you need to set up some global limit / per instance [13:05:03] If they have their h_vmem set properly, then nothing breaks. If you don't have enough resources, they'll remain queued. [13:05:13] Coren what is "proper" [13:05:18] what is your value for h_vmem? [13:05:34] how much [13:05:40] ... what? I don't have a value for h_vmem; the *jobs* do. [13:05:47] of course I mean in config [13:05:52] If they need 1G, they set it to 1G [13:06:05] Coren but these jobs don't know that? [13:06:15] howcome job can know how much memory it will need [13:06:20] that's something what changes on demand [13:06:22] petan: ... what? [13:06:22] coibot is only using 300 mb over 9 processes [13:06:48] petan: You have to set a limit to your job. 
If you really don't know how much, then you need to set the limit high and check how much you really used. [13:07:02] petan: and reduce accordingly. [13:07:09] that may not be possible for all bots [13:07:39] look at Beetstra's bot, today one of his processes was eating about 4gb of ram and it never did before [13:07:47] linkwatcher something like 2G over ~100 processes [13:07:49] it's just totally random [13:08:06] based on size of task the bot is processing [13:08:15] which is different every time it run.. [13:08:41] What I think happened with COIBot today, is that because the load was high, it could not cope with something and started storing things in memory (filling an array or something). [13:08:50] Coren of course, if you had jobs which do know how much ram will they need then it would be pretty easy to manage them [13:08:54] even without swap [13:09:09] petan: If the bot has been written in a way the ram it uses is unbounded, then it has a bug. You have two choices: limit the jobs and keep the grid stable, or don't limit the job and allow any one of them to bring a whole node down. [13:09:24] petan: That's why you put _hard_ limits [13:09:26] Coren but that's like 90% of all bots which we are hosting now [13:09:40] and I believe that it's same with other bots we do not host yet [13:10:22] Coren there is third option too - don't limit the job and configure the node so that it can't be easily brought down :P [13:10:41] petan: That option does not exist. [13:10:47] if you set a per node limits as well, the SGE will just stop submitting jobs to that "ill" node [13:10:54] of course it exist that option we are using now... [13:11:08] petan: No, that's the option that you are attempting to use, and failing. 
[13:11:19] well I have not failed yet in that [13:11:22] it works [13:11:49] if I configured SGE as you suggest result would be SGE killing all bots we have in loop [13:11:50] petan: If *any* job is brought down because of the behaviour of another, you have failed. [13:12:03] Coren but that's what never happened to me yet [13:12:13] no job yet crashed [13:12:15] petan: No, it would result in bots going over their limit being unable to allocate more memory. [13:12:24] petan: On Toolserver, Bots *must* specify how much memory they need, and they do, and it works. [13:12:30] being unable to allocate more memory = crash [13:12:31] scfc_de: That. [13:12:53] petan: If a bot crashes because it is unable to allocate more memory, then that bot is buggy and needs to be fixed. [13:13:01] scfc_de when I was running bot on toolserver I didn't, but it wasn't using SGE anyway [13:13:27] petan: Sorry, you just didn't *notice* the hard memory limit. [13:13:29] Coren you should then announce that somehow so that all people have time to fix their bots then.... [13:13:34] Coren: if I configure a job with a limit, and my bot reaches that limit, will the job be killed? [13:13:42] Darkdadaah yes [13:13:53] petan: Now you have to because rogue bots were bringing the house down :-). [13:14:02] Darkdadaah: Further mallocs will fail. If you have error checking in your code, then you'll be able to handle it the way you want it. [13:14:04] while Beetstra would limit the amount of ram used by the bot, you can do little if you allow an user to run hundred of job in parallel [13:14:06] scfc_de well, I moved the bot to labs ages ago [13:14:10] !log wikidata-dev wikidata-dev-9: All Wikidata log files are now in /var/log/wikidata-*.log files. I'm trying to have them managed by logrotated. [13:14:13] Logged the message, Master [13:14:28] Ah ok, thanks. [13:14:35] phe: Why? Every job has its own memory limit; SGE knows how to add. 
:- [13:14:38] phe: :-) [13:14:55] Coren anyway I don't see what is wrong on setting per node limits as well, it will hurt nothing and prevent unexpected troubles [13:14:59] petan: https://wiki.toolserver.org/view/Rules 9.5 [13:15:31] Coren your current explanation of how stuff should be configured seems to me like it would work in "ideal world" only, but not everything is ideal in fact [13:15:54] like there is no reason why SGE should be allowed to submit jobs to node that is OOM [13:16:00] phe, only if SGE settings is sane :) [13:16:09] -phe +Coren [13:16:22] petan: You are lacking a critical requirement: "a bot that is running must not be brought down by another bot's behaviour". It's simple. It's clear. [13:16:23] I know it should never happen for that box to be OOM if the world would be idal [13:16:25] * ideal [13:16:37] Coren of course - that's what I follow [13:16:45] it also never happened to me that it would happen [13:16:45] petan: There is also no reason why a job should be allowed to bring a node down. Ever. [13:16:59] petan: Really? You never had to kill a job? [13:17:06] Coren not on this grid [13:17:40] * Beetstra points at legoktm ;-) [13:17:42] that box was already overloaded with other stuff [13:18:15] ill start have to categorizing my scripts as "beetstra-compatible" and not :P [13:18:23] Those are things that should never happen. [13:19:17] * legoktm was joking :P [13:20:13] phe: Indeed. [13:20:32] phe: And I've been trying to explain to petan what needs to be done so that assertion is true. :-) [13:20:42] !log bots petrb: everywhere iptables -A OUTPUT -p tcp --dport 25 -j REJECT [13:20:44] Logged the message, Master [13:21:26] what about dedicating an instance for addshore (and legotkm ? 
I've not followed the story) so they can only kill themselves :) [13:21:33] * Beetstra starts studying BSD::resource [13:22:02] phe: i think this is only a temporary problem, our jobs should be done by the end of the month [13:22:29] Right now, on tools-, if your process runs and is bounded in its memory use it will keep running until the VM is brought down. If it grows unbounded, it will not affect other jobs. If other jobs go berkerk, it will not affect yours. [13:23:08] In an environment where any community dev can run things they write, nothing else will do. [13:24:07] but you'll have trouble with sane SGE settings, bot will need to overestimate the amount of resource they need and instance will tend to be underloaded [13:25:54] phe: Possibly. They are VMs, though, so "underloaded" is not a concern. Better underloaded than unstable. [13:25:54] phe: And bots who requests too many resources get penalized when they try to run; so the maintainers are encouraged to reduce it as much as they safely can. [13:26:14] phe: Something like an IRC bot with modest memory requirements will run easily, at high priority, and be rock stable. [13:26:23] !log bots root: created /home/addshore/.forward [13:26:25] Logged the message, Master [13:26:34] right, and you can add more instance to fix that, the way sge works, you'll have probably less potential trouble with a greater number of small instance than a small number of big instance [13:26:46] phe: The memory hogs have a choice of request more and be penalized accordingly or split the task up in smaller, predictable chunks. [13:26:59] phe: Exactly. [13:27:25] Coren but legoktm was joking about his bot producing job for Beetstra's bot on wiki [13:27:38] (beetstra's bot need to check every edit of legoktm's bot) [13:27:45] he wasn't talking about system load [13:27:59] petan: I know, by point was about Beestra's growing to accomodate the extra load. [13:28:02] my* [13:28:10] petan: It's an externality. 
[13:28:21] petan: You cannot let externalities bring unrelated jobs down. [13:28:25] but if beetstra accomodate extra load - that node will just stop being used for any other jobs than beetstra's [13:28:34] so everything will be ok [13:28:46] that is actually what happens now [13:28:54] petan: Really? You have a mechanism in place where other jobs that were *already* running on that node will be migrated as resources diminish? [13:29:37] well, that's what I deemed impossible in past when I was talking with you regarding OOM, but I didn't know you want to know how much memory jobs will need before they are actually started [13:29:55] that of course will work (your proposed solution) but it will waste lot of system resources too [13:30:13] because some bots will ask for more than they will actually use [13:30:24] which will result in lot of unused memory [13:30:26] petan: Yes. That's exactly what you want. [13:31:17] * Beetstra jokes: "I'll make the bot ask for 4G per process .. even if I use only 25M ;-) [13:31:58] Beetstra: And that'll work. You jobs are going to be rock solid, but you're not going to like the result since your jobs will run at a big penalty, and won't have many slots to run in. [13:32:04] Coren that's a question... in case there will be HUGE number of small bots, there will be huge amount of memory that will not be used [13:32:16] I know, Coren [13:32:29] which isn't ideal either [13:33:00] petan: I think if memory is that important, maybe we should instead fix Beetstra's bot not use 4 GB :-). [13:33:07] petan: It's a minor problem at best. The bots keep running, which is what counts. If users go overboard, you teach them the joys of pacct. :-) [13:34:00] petan, it's a choice between underloaded instance, and "OOM can occur at any time" [13:34:06] petan: Requirement 1: jobs stay up. If a job is hindered by the infrastructure, then the infrastructure is wrong. 
[13:34:38] Coren well, but in this very case - beetstra's bot would crash on your configuration, while on mine it survived :P [13:34:40] petan: Corollary: if a job is hindered by an other job, the infrastructure is wrong. [13:34:51] Coren because it would eat more memory it asked for [13:35:04] which of course you can consider his problem.... [13:35:14] * than it asked for [13:35:30] petan, SGE in the toolserver already ask to provide an upper bound for memory usage [13:35:36] grmpf, BSD::resource is not installed [13:35:39] it doesn't seem like a problem [13:35:40] Platonides yes I was told [13:35:46] Platonides but I wasn't using SGE at all [13:35:48] on toolserver [13:35:59] petan: If his bot is written in such a way that it is unable to limit its memory use, it will crash *anyways* even if it were alone on a node as it consumes the resources. The difference is only "what else will it bring down" [13:36:07] well, I admit it's a bit annoying [13:36:17] Platonides, sge virtual_free is ignored on the TS for all linux box :) [13:36:24] petan: Which is why there was a rule put in place that requires SGE use. [13:36:53] phe, is it? [13:36:57] Beetstra: What's your bot written in? [13:37:02] Coren fair, but I am still afraid lot of resources will be wasted [13:37:08] Platonides, it's ignored on linux box 'cause java process require to much memory and never start if virtual_free is used [13:37:09] we'll see [13:37:10] coren: perl [13:37:20] petan: Then you are just worrying about the wrong things. [13:37:33] btw Coren I didn't need to provide information about memory when I was starting job on your cluster [13:37:51] Beetstra: How are you growing unbounded, titanic hashes? [13:37:51] phe, don't use java :P [13:37:52] Coren: things that are paid from donations :P [13:38:07] petan: That's because I have a sane default at 256M that works for most things. [13:38:20] petan: A TB of RAM costs, what? :-) [13:38:24] Coren: ? 
[13:38:29] Beetstra: Is the source available somewhere? [13:38:29] petan: Bttz. Virtual things that run on things that are paid from donations. :-) [13:38:38] in my directories .. [13:38:55] petan: Could you add me (scfc) to the Bots project? [13:39:17] Beetstra: No, I mean, if your perl program can grow to very many GB of size, either it has big-ass hashes, or you never undef stuff when you're done with it. :-) [13:39:44] or I am push-ing stuff in an array that gets never emptied [13:40:05] Beetstra: That too, but then that's very predictable as a behaviour. [13:40:10] Like 1400 edits per minute through legoktm-bots ... [13:40:14] Beetstra: You can plan accordingly. :-) [13:40:29] well, it was .. it was running at 600 edits per minute without problem .. and then there were lego and addshore ;-) [13:40:35] Beetstra: What's that array? A queue of "stuff to do"? [13:40:41] yep [13:40:57] and .. the bot is smart enough that when the queue gets too big, it saves the queue as a backlog and empties the queue [13:40:59] !log bots inserted scfc as member [13:41:03] Logged the message, Master [13:41:03] well, linkwatcher, that is [13:41:03] Redis? [13:41:07] petan: Thanks. [13:41:11] yw [13:41:14] COIBot never had that problem to worry about [13:41:36] Beetstra: That seems like a good way of doing things that should keep memory usage to fairly stable levels. 
[13:41:44] Beetstra, then I don't know why it is having problems [13:41:45] It does [13:41:54] Beetstra: as a tangent, if you do have some time and the data is still available, i would be interested to know how fast my bot was editing globally :P [13:42:13] Moreover, the sub-processes in the bot respawn themselves after so much work [13:42:32] Platonides because only one of his bots were doing that :P [13:42:39] Platonides: COIBot had problems with the massively increased editing speed - from 600 to >2000 edits per minute [13:42:42] Platonides the second one wasn't saving to disk [13:42:43] Beetstra: That's probably even overly paranoid, but ironclad. [13:43:26] Coren: the main bot even kills sub-processes when it detects one is hanging .. [13:43:48] Beetstra: Yeay! Robust programming. :-) [13:43:55] Beetstra, you may want to have an ignore list, populated with other bots [13:44:13] Platonides .. that is what I now have [13:44:37] Coren there may be some cases when this behavior of SGE would be counter productive, such as in case of wm-bot but that is not going to use SGE soon anyway so I don't care [13:44:41] Coren: but no memory limits .. so not completely robust [13:44:53] when hard drive breaks, wm-bot start using operating memory as alternative storage [13:45:01] and empties it once hard drive is remounted and working [13:45:11] in that case if SGE killed it, many people would become evil [13:45:17] especially sumanah [13:45:18] :P [13:46:37] and virtual hard drives breaking is happening quite often on gluster - and TBH, can happen to any kind of storage [13:46:47] petan: You're making a number of unwarranted presumptions, the first of which is that a box where the hard disk failed would suddently grow infinite amounts of ram. 
[13:47:09] petan: It would run out of ram eventually *anyways* [13:47:10] Coren of course not - but when usage slowly grow up - I can smell troubles [13:47:21] check what is going on - remount disk and {{fixed}} [13:47:40] in case of SGE-terminator my task would probably die before I noticed [13:47:52] petan: ... why? [13:47:54] Coren I am talking about growing like 5MB per day [13:48:04] that wouldn't kill the box OOM withing few minutes [13:48:05] petan: 5M/day == minuscule. [13:48:21] of course if I didn't fix it, the box would die OOM in few years [13:48:29] petan: Just start the job with a healthy padding. Splurge! Give it an extra 100M! [13:48:39] that's possible? [13:48:49] petan: ... of course! [13:49:05] is that even possible for SGE to "grant extra memory in case process is getting out of it + send email to operator to fix it" [13:49:13] it'll better to fix glusterfs than to work-around at application level :) [13:49:29] phe: yes but it can happen with any storage in future [13:49:48] petan: Well, not that way. What you want to do is give it an extra 100M to begin with, and set a soft cap 100M below. Then you can handle the soft cap any way you want. [13:49:49] bots which are supposed to literally never crash - needs to count with breakages of any kind [13:50:20] petan: But wouldn't it be better for the *bot* to give a warning the minute it looses its disk? [13:50:30] petan: After all, /it/ knows. 
[13:50:47] Coren yes but what I mean is if this process could become automatic somehow, so that in case some process would be getting OOM, it would get extra "grace memory" and email would be dispatched to operator [13:50:56] so it can be fixed without need to kill anything [13:50:59] petan, as a fallback if it loses the disk where it is saving its data, make it email that 2GB file to the admin with instruction on where should it be placed ;) [13:51:19] Platonides yes that's 3rd alternative actually :P [13:51:28] petan: Like I said, it works the other way around. You can never go above your hard limit, but you're perfectly allowed to set a soft limit some margin below that and act accordingly. [13:51:52] Platonides 4th is connect to irc and post the RAW data to irc channel so that someone can grab them and fix it [13:52:09] I.e.; give 1G as your hard limit, but set a soft limit at 750M and handle it whichever way you choose. [13:52:11] 5th is sending e-mail to barrack obama [13:52:19] you would have problems with irc flood limits [13:52:26] Platonides 1 message per second of course [13:53:06] Coren is it possible to hook some script to soft limit [13:53:21] like execute: warning.sh when soft limit [13:53:46] petan: Not globally; what happens when you hit your soft limit is that the job gets sent a SIGUSR1 [13:53:54] ah [13:53:57] petan: The job can then handle it the way it wants. [13:54:01] well but wrapper could handle that [13:54:14] some jobs can't even read signals [13:54:21] (these written in cross-platform languages) [13:54:26] not all platforms have signals [13:54:32] petan: Sure, you can handle the signal and for the "actual" job. [13:54:35] but wrapper script can do that [13:54:35] python and signal are a pain for example [13:54:56] s/for/fork/ [13:55:08] petan: The point is, there is a mechanism to warn. 
[13:55:11] phe, but you can do it [13:56:06] Platonides, if you want reliabilty you need to handle EINTR in many point of the application, most people will not get the point [13:56:34] * Beetstra needs to figure out how to use setrlimit .. tomorrow [13:57:44] oh .. maybe I did :-) [13:58:51] Beetstra: I can't login to bots-liwa, and on -gs, /home/beetstra yields "Permission denied". Is the source available somewhere /else/? :-) [13:59:19] then .. no, don't think so [14:00:02] scfc_de .. you want to dig through >10000 lines of codes for linkwatcher ;-) [14:01:13] Beetstra: Well, no :-). But it should be possible to see if for example instead of the back log you could just submit an SGE job and let the grid handle it. [14:01:46] The bot handles its own backlog when it has nothing better to do .. [14:02:00] like now: Loaded a backlog edit file from backlog/10539.edits. [14:02:36] Beetstra: Yes, you made that clear. [14:02:46] Beetstra, if you use subprocess, here a way to setrlimit : https://svn.toolserver.org/svnroot/phe/trunk/ocr/ocr.py [14:03:01] ohoh .. python [14:03:25] Beetstra, man ulimit for doing it from bash [14:03:27] your bot is not python ? [14:03:37] nope, perl [14:03:52] oops ;( [14:03:54] but in perl it is: [14:04:01] use BSD::Resource; [14:04:02] setrlimit('RLIMIT_RSS', 30000000,50000000) or die $!; [14:04:04] I think [14:04:34] * petan would put all people who don't use c++ in jail [14:04:52] starting with myself :> [14:04:59] petan, a BSD one? :) [14:05:16] But the point of using a grid is that you don't have to handle backlogs, rlimits & Co. yourself, but just specify what you need and let "the cloud" work it out. [14:05:18] Platonides lol [14:05:33] .. my wife does not use c++, my parents do not use c++, my sister does not use c++ .. I do not use c++ .. we will all be nicely together in jail [14:05:36] Platonides guantanamo one [14:05:41] you were asking for that pun line [14:05:47] .. is going to be fun .. 
putting my wife and my mother together in jail .. [14:07:03] mdale doesn't use c++! [14:08:10] * PissedPanda doesn't use C++ [14:08:28] friggin' KSA firewall .. [14:08:28] * petan puts Panda to Pissed Prison [14:08:42] OK, will work further on this tomorrow [14:08:53] * Beetstra sees you all tomorrow [14:09:07] I am still wondering how addshore is scheduling his bots [14:09:30] btw Coren looking on current usage, I think you will need to make like 1000 instances to run just addshore's bot :P [14:10:36] petan: Another bot to fix :-). [14:10:54] scfc_de I think it's by design [14:11:01] his bot is just resource expensive [14:11:07] does creepy amount of edits [14:16:26] petan: What does the bot do? [14:16:35] I have no idea [14:16:56] I just know he was telling us it's going to eat tons of resources once he start it :> [14:16:59] it remove interwiki links on many many wiki at a very high rate [14:17:09] aha [14:17:12] now we know :) [14:17:39] I was suspecting it being just running lot of while(true){} statements in parallel [14:18:17] phe: As part of moving the interwiki links to Wikidata? Then that should be finished in ... some time? [14:18:28] petan: That's actually a job that's massively parallelizable. I'll try to help him to make it so. [14:18:34] scfc_de yes like in 20 years :D [14:18:47] Coren but he already did it [14:18:56] Coren he's running thousand of small processes [14:19:15] petan: Oh, cool. That's /trival/ to run on a grid then. [14:19:25] scfc_de, yeps, but I dunno if it was a good idea to populate wikidata and to remove iw links at the same time [14:19:26] yes it is, but you will need lot of instances anyway [14:19:51] petan: That's the whole /point/ of having a grid, petan. Adding exec nodes is easy. 
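[Editor's note] The bash counterpart of the BSD::Resource/setrlimit approach discussed above ("man ulimit for doing it from bash") is a subshell wrapper; the 1 GiB figure is illustrative only.

```shell
# Cap a job's address space from bash instead of calling setrlimit()
# inside the bot. ulimit -v takes KiB, so 1048576 KiB = 1 GiB.
(
    ulimit -v 1048576        # limit this subshell and everything it execs
    ulimit -v                # confirm the limit children will inherit
    # exec perl linkwatcher.pl   # the wrapped bot would start here
)
```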
:-) [14:19:53] Coren 1 process reserve approx 300MB of ram * 1000 = 300 000 MB ram needs to be reserved [14:20:08] yes but it's wasting of resources [14:20:22] it already works just fine on this small grid we have but it's /insecure/ ofc [14:20:22] petan: Well, if they run all at the same time. [14:20:28] they do [14:20:53] it could have use python thread, no fork at all nor separate process [14:21:07] phe it's not written in python :D [14:21:15] pff ;) [14:21:17] Wait, you're making no sense. If they use 300M of ram, then you'd have to have 300G of ram to accomodate them. You don't have 300G of ram. [14:21:18] not everything actually is written in that :P [14:21:47] Coren no I didn't say he use 300MB but he should probably reserve 300 MB [14:21:51] phe: Threads are rarely the good solution. For something that's actually loosely parallelizable, you want jobs [14:21:53] because some of his processes are hitting that [14:21:55] some are not [14:22:22] his processes use from 10mb - 300mb of ram randomly, mostly less [14:22:51] Why do they need that much amount of memory anyway? Shouldn't it be nearly static? [14:23:24] petan: Then he's splitting his jobs on odd criteria and it should be easy to spread the load more evenly. [14:24:07] petan: (If we want to be smart about it an maximise usage, surely he can predict from the input data whether it's a "small" or "large" job and allocate accordingly) [14:24:38] @notify Addshore [14:24:38] I will notify you, when I see Addshore around here [14:24:41] petan: If he does so, then his work will finish all the faster for it. [14:28:56] Ah, okay, he reads all page titles from the database instead of iterating over it. That would explain the differences. [14:29:01] petan: IMO, the primary problem is that you are thinking of the labs as "free VPS hosting". It's not. It's a resource for running community-developed tools the projects rely upon in a stable environment. 
This /does/ mean that those tools need to be written with stability in mind, and within some strictures. [14:30:37] petan: The end result, however, is a Good Thing. My job is (inter alia) to be there to give the maintainers the help they need for things like keeping their tools' resource usage stable, and helping them code for things like soft limits, checkpointing, etc. [14:36:29] "scfc@bots-gs:/data/project/beetstra/linkwatcher$": Yeah, found it! "less linkwatcher.pl": "linkwatcher.pl: Permission denied". *Argl* :-). [14:40:35] Ryan_Lane: master, what happened to my wiki s? Not Found [14:40:37] The requested URL /wiki/Main_Page was not found on this server. [14:40:51] http://openid-wiki.instance-proxy.wmflabs.org/wiki/Main_Page [14:41:44] Nikerabbit: ? [14:41:55] Krenair: ? [14:43:49] hashar: ? [14:44:42] Hmmm. Addbot seems to be started every two minutes for enwp. If I understand the workflow correctly, it shouldn't harm to reduce that to once per hour or something like that. [14:47:44] scfc_de probably not, but one of goals as I understood them was: not change the tasks, but the cluster so it fits to need of bot [14:48:00] so addshore shouldn't need to change anything [14:48:33] petan: That's a worthwhile objective, but only so far as it doesn't compromise the other requirements. [14:48:43] which? [14:48:55] please don't ping random people [14:49:04] number of instances? :o [14:49:12] petan: The goal is not "no change" but "as little change as possible". If it turns out that as little as possible /is/ no change, then yeay! :-) [14:49:12] is there a requirement for that? [14:50:57] petan: Look at http://www.mediawiki.org/wiki/Wikimedia_Labs/Tool_Labs/Design [14:51:03] ok [14:51:22] Wikinaut: Is your instance running? [14:51:33] yes. 
i think php is broken [14:51:35] apache2: Syntax error on line 210 of /etc/apache2/apache2.conf: Syntax error on line 1 of /etc/apache2/mods-enabled/php5.load: Cannot load /usr/lib/apache2/modules/libphp5.so into server: /usr/lib/apache2/modules/libphp5.so: cannot open shared object file: No such file or directory [14:51:43] php5 [14:52:05] Wikinaut: Is modapache-php5 installed? [14:52:19] moment [14:52:39] Wikinaut: Sorry, I mean libapache2-mod-php5 [14:52:46] (I know) [14:52:51] one moment [14:53:01] Coren yes I agree with most of that. But I am curious how many changes will need to be done once people start using it [14:53:26] was missing ! [14:53:26] + I am curious about utilization of these exec nodes [14:53:28] why? [14:53:29] petan: That's a one-off problem. New tools are clearly not going to be a concern, since they'll know offhand. [14:53:32] puppet ?? [14:53:40] Wikinaut: Do you use pupper? [14:53:47] *puppet [14:53:54] * Reloading web server config apache2 N: Ignoring file '20auto-upgrades.ucf-dist' in directory '/etc/apt/apt.conf.d/' as it has an invalid filename extension [14:54:02] petan: As to existing tools, I expect it will vary a lot from tool to tool. [14:54:05] ^see message after apt-get [14:54:12] petan: UTRS, for instance, I expect can run without change. [14:54:14] <^demon> Wikinaut: That wouldn't break apache. [14:54:26] Coren well, it will be quite harder for developers to make their bots in a fashion "I need to know how many resources I will eat as much precisely as possible" it will basically kill all dynamic memory allocation / queues etc [14:54:39] ok. works again http://openid-wiki.instance-proxy.wmflabs.org/wiki/Main_Page [14:54:53] ty (thank you) [14:54:57] Wikinaut: No problem [14:55:10] petan: I think it has been mentioned today at least three times that on Toolserver this is a requirement, it is done, and it works.
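Wikinaut's Apache error above has a common diagnosis: the module file named in apache2.conf is gone, usually because the package shipping it was removed. A minimal check, assuming the stock Ubuntu/Debian paths straight from the error message:

```shell
# The path comes straight from the "Cannot load ... libphp5.so" error.
mod=/usr/lib/apache2/modules/libphp5.so
if [ -e "$mod" ]; then
    echo "php5 module present: $mod"
else
    echo "php5 module missing; try: apt-get install libapache2-mod-php5"
fi
```

After reinstalling the package, `a2enmod php5` followed by an Apache reload matches the "works again" outcome in the log.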
[14:55:17] Coren: because you will either need to use a large scale (ask for more than you need) and then you will waste resources, or you will need to calculate precisely [14:55:20] petan: No it doesn't. It means you need to /bound/ your memory; that's something you need to do anyways -- relying on the OS being able to give you infinite amounts of ram on deman is horribly bad for stability anyways. [14:55:24] Is the problem to be signalled to someone ? Ryan perhaps ? [14:55:39] Nikerabbit: I did NOT ping randomly! [14:55:53] I pinged only persons I know that they are of help [14:55:56] usually ;-) [14:56:13] Wikinaut: This is no general labs problem so Ryan is not the right person [14:56:30] ok [14:56:34] There seems to be a instance problem [14:57:05] not sure about that. the instance is working since January or so [14:57:09] but let's see [14:57:19] Wikinaut: maybe a update? [14:57:42] not really. [14:58:06] there was no pending message since about one week [14:58:11] Coren, yes I agree, but it seems to me like creation of virtual environment in another virtual environment :P if you need to host reasonable amount of bots per node, then you need to be strict it allocation of memory, so these bots will need to count with significantly lower amount of memory than developers are usually expecting [14:58:23] There is a auto update in labs [14:58:37] Coren which IMHO is good [14:58:44] Coren I like well optimized jobs [14:58:51] petan: Yes. This is the price to pay for reliability. :-) [14:58:54] Coren but question is if other devs will share my happiness [14:59:27] petan: They'll be happy when their tool is rock-solid. :-) [15:00:44] Coren the default amount of memory is which property in queue config? [15:00:55] Coren, petan: Why not use some instances for tools that should be stable and some for playing? 
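Coren's "bound your memory" advice above can be demonstrated with a plain `ulimit`; this is a minimal sketch ("bot.pl" is a made-up name, and 307200 KB is roughly 300 MB):

```shell
# Run the bot in a subshell with a hard cap on virtual memory, so a leak
# kills that one process instead of dragging the node into swap.
(
    ulimit -v 307200      # hard cap on virtual memory, in kilobytes
    ulimit -v             # child processes inherit this limit
    # exec perl bot.pl    # the bot would run under the cap
)
```

On the grid side, the same bound is what the queue's `h_vmem` attribute enforces (visible via `qconf -sq <queue>`), which — as far as standard SGE goes — is the queue-config property petan is asking about.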
[15:00:56] also how do you set the maximum value for that which tool operator can use [15:01:12] Jan_Luca that's what we want to do [15:01:33] Coren suggested we could convert current bots project to staging area which I am fine with [15:01:34] petan: Sorry, I did not read the whole dialog [15:01:50] Jan_Luca that's dialog from another day, no need to be sorry :P [15:02:32] Coren did you make some sysadmin documentation yet, or it's still just in your head :P\ [15:03:47] petan: I'm writing it now. [15:03:51] ok [15:04:16] Wikinaut: I'm glad you think that way but I'm just a normal (l)user of labs ;) [15:04:31] petan: (Not even figuratively, actually, I have the edit window open at this very moment) :-) [15:04:38] Nikerabbit: "try harder" ;-) [15:04:54] Coren: I hope you system is stable :-P [15:05:08] *your [15:07:17] Jan_Luca: If it isn't, you get to yell at me. They actually pay me to make it stable. :-) [15:07:59] Coren: I mean the system with the edit window :-) [15:08:13] heh [15:08:17] auto-safe FTW [15:08:22] * save [15:08:40] petan: That's boring [15:08:55] Jan_Luca: Oh! [15:09:11] 11:09:00 up 75 days, 14:26, 3 users, load average: 0.15, 0.13, 0.14 [15:09:29] Yeah, I think it'll live to day 76. :-) [15:09:52] Coren: Y2K13 problem? [15:09:59] :-) [15:10:30] Coren: You are a energy waster :-P [15:16:38] Coren did you see qtop? :P [15:16:42] on bots [15:16:44] <3 it [15:16:53] I have. It's cute. -) [15:17:01] but it suck on small terminal [15:17:13] I just implemented N of tasks per server [15:17:15] * node [15:17:59] Coren you know Damianz, he reboots his pc just when he move house [15:18:17] petan: That seems reasonable. [15:18:21] :-) [15:18:29] not really, I would use some UPS [15:18:54] actually since I switched desktop to laptop it's easier [15:19:02] petan: That depends how far you have to move, really. 
;-) [15:19:08] heh :P [15:20:46] @notify addwork [15:20:46] This user is now online in #huggle so I will let you know when they show some activity (talk etc) [15:21:42] petan: One question: What is your job? [15:21:51] Jan_Luca sysadmin :P [15:21:56] but not for wmf... [15:22:03] I work for german telefonica [15:22:16] actually I am DBA / sysadmin [15:22:28] petan: Are you german? [15:22:30] nope [15:23:38] petan: Oh, I asked because I'm germn [15:23:42] *german [15:23:43] I know :) [15:25:26] btw Coren that linux uptime is not telling truth, you can hibernate and it won't reset :P [15:25:50] so maybe you aren't waster :P [15:26:03] petan: With hibernate I would have a uptime like Coren, too [15:26:10] you see [15:26:26] I just figured my laptop has huge uptime as well [15:26:28] Without around 10 hours... [15:26:51] petan: Nope. But I'm not wasting anyways; all that dissipated head reduces the energy I need to heat my home. :-) [15:27:10] yes that's my thinking too [15:27:29] problem is during summer when you need to use extra cooling [15:27:52] Coren: do you have a heater? I only use my PC :-P [15:28:02] andrewbogott: You're awake already? [15:28:39] Silke_WMDE: I'm in CDT now [15:28:51] Coren Jan_Luca In Finland I met someone who heated his house with bitcoin mining. o_O [15:28:52] Um… unsure whether the 'daylight' part of that brings be closer to your TZ or not. Maybe. [15:29:22] Silke_WMDE: Heh. Sounds reasonable. :-) [15:29:33] Silke_WMDE: How big was the house? [15:29:41] andrewbogott: I think we have a short period before Germany will switch to summer time, then we'll be apart one hour more again. [15:29:52] Yeah, that sounds right [15:30:10] Jan_Luca: The upper floor of an old wooden school building on the countryside [15:30:45] andrewbogott: I'll invite you for a review. It's really tiny! [15:30:57] :) [15:30:58] Silke_WMDE: With one PC only? 
[15:31:11] Jan_Luca: No, some of them [15:31:46] * andrewbogott hopes that none of his "Utilities included" tenants are still bitcoin mining [15:46:05] Coren: Is you documentation on Wikitech or on MW.org? [15:46:18] mw.org [15:51:58] Coren: Is there some documentation we / I could help? [15:53:03] Well, I'm in the middle of writing an outline now; I expect that the best possible thing at this time might be to collect a list of questions that are likely to become FAQs so that I can make sure I cover them. [15:53:26] The problem with the lead designer writing docs is that much of what I did is "obvious to me" so I won't think of it. :-) [15:53:27] ok [16:03:36] Change on 12mediawiki a page Wikimedia Labs/Tool Labs/Help was modified, changed by MPelletier (WMF) link https://www.mediawiki.org/w/index.php?diff=661008 edit summary: [-1290] Start doc [16:05:25] Coren: Here some questions that come into my mind for your FAQ. [16:05:38] [[User:Jan/Tools]] [16:06:26] Oh, no linky: https://www.mediawiki.org/wiki/Wikimedia_Labs/Tool_Labs/Help [16:06:47] sorry wrong copy-paste: https://wikitech.wikimedia.org/wiki/User:Jan/Tools [16:08:17] Krenair, have you used nova-precise2 recently? [16:08:33] Coren: When there more questions I add them to my list [16:09:24] andrewbogott, not really. why? [16:09:48] Krenair: Can you log in via ssh? I can't :( [16:10:52] I can ping it... No ssh though. Permission denied (publickey) [16:11:06] andrewbogott: Don't worry, be happy :-) [16:11:35] Krenair: Same for me. I'm going to reboot it... [16:12:42] Jan_Luca: That's very helpful, thanks! [16:13:02] Coren: no problem [16:15:36] Krenair: Well, now I can ssh but the web interface is broken. 
I guess that's… better [16:15:55] Change on 12mediawiki a page Wikimedia Labs/Tool Labs/Help was modified, changed by MPelletier (WMF) link https://www.mediawiki.org/w/index.php?diff=661009 edit summary: [+350] /* Tool account */ more [16:17:06] !log openstack rebooted nova-precise2 because it was refusing ssh connections [16:17:10] Logged the message, dummy [16:17:49] andrewbogott, yeah I can't connect to MySQL [16:18:49] krenair@nova-precise2:~$ sudo service mysql status [16:18:49] mysql stop/waiting [16:18:49] krenair@nova-precise2:~$ sudo service mysql start [16:18:49] start: Job failed to start [16:19:58] hm, and puppet is disabled... [16:19:59] andrewbogott: How urgent is the changeset you sent me for review? The thing is: It's working on a WIkidata client but not on a review - but I'm about to leave for the rest of week. Can this wait? I'm afraid to leave my colleagues with a broken test system on Wednesday. [16:20:12] s/review/repo/ [16:21:16] Silke_WMDE: The first couple of patches are trivial and unlikely to break anything. The last couple are big and I don't really expect them to merge this week. [16:21:48] I'm looking at https://gerrit.wikimedia.org/r/#/c/51797/ [16:28:29] Silke_WMDE: Those variables that I cut out… did they do something that I missed? I figured they were just there because of copy/paste [16:28:46] andrewbogott: Silke is offline [16:28:53] oops [16:42:01] Change on 12mediawiki a page Wikimedia Labs/Tool Labs/Help was modified, changed by MPelletier (WMF) link https://www.mediawiki.org/w/index.php?diff=661013 edit summary: [+525] /* Grid engine */ outline [16:43:13] Referred here from #mediawiki. Etherpad Lite seems to be down. Is this known? [16:47:34] andrewbogott: weren't you working on passwordless sudo? [16:48:03] paravoid: A while ago, yes. What's up? [16:48:45] what happened with that? :) [16:49:33] I only did it for labs… it was working (but configurable per project) last I checked. [16:49:35] What are you seeing? 
[16:50:25] ah, configurable per project [16:50:27] missed that bit [16:51:10] should be on by default for new projects [16:57:19] Change on 12mediawiki a page Wikimedia Labs was modified, changed by Guillom link https://www.mediawiki.org/w/index.php?diff=661022 edit summary: [+74] switch to standard (for now) open tasks format [16:57:43] Change on 12mediawiki a page Wikimedia Labs/Tool Labs/Help was modified, changed by MPelletier (WMF) link https://www.mediawiki.org/w/index.php?diff=661023 edit summary: [+2169] /* Web services */ logging [16:59:48] Change on 12mediawiki a page Wikimedia Labs/Tool Labs/Help was modified, changed by MPelletier (WMF) link https://www.mediawiki.org/w/index.php?diff=661027 edit summary: [-25] /* Logs */ cr [17:07:05] Coren: A question: How does tools-apache use PHP? mod_php5, php-fpm, mod_suphp, fcgi, suexec? [17:07:30] Jan_Luca: mod_suphp, including for general CGIs [17:07:58] Jan_Luca: (using its shbang self-invocation) [17:08:46] Coren: I thought mod_suphp is not the solution with the best security... [17:10:30] Jan_Luca: From my code review, it has excellent insulation; and it's used in production environment with good success. It's a bit more flexible than the others, at the cost of allowing you to shoot your /own/ foot if you are careless. [17:11:45] Coren: But the last version of mod_suphp is from 2009-03-14 [17:13:13] Jan_Luca: That was one day *after* a Friday, 13th :-). [17:13:46] Jan_Luca: True, but there has been no known security vulnerabilities found in it since. :-) [17:14:11] Jan_Luca: It's still in the LTS supported arsenal as of Raring, so it's a known good quantity. [17:14:34] Coren: OK, I only wanted to point to this before we have a security problem [17:14:48] Jan_Luca: I do keep an eye on advisories in case it pops up. [17:15:04] I like PHP with suexec more [17:16:36] Jan_Luca: Both cases are defensible; but since they are functionally pretty much the same (especially from the user's pov) it's not a major issue. 
[17:17:30] Coren: I only like not tools that are too old, I love fresh and young programs ;-) [17:18:12] Jan_Luca: I like "old and proven". My own philosophy is "if it's been used in production for X years and it hasn't been exploited, we're good" :-) [17:18:44] Coren: Another problem: Could you enable chown for non-root? [17:19:37] Because every time to use "cp" to change the owner of my scripts from my user to the tool user is boring :-( [17:20:14] Jan_Luca: There are serious security concerns with giving chown to non-root, especially if we deploy quotas. You do have a good argument for a limited "give-to-tool" chown tool, though. [17:20:23] * Coren ponders. [17:20:41] maybe some wrapper with setuid? [17:20:55] I can probably do that. If you (a) own the file, (b) are part of a tool's group and (c) are giving the file to the tool. [17:21:16] Coren: That would be nice [17:21:22] * Coren considers. [17:21:29] Couldn't the setuid not be worked around by "sudo cp && rm"? Feels better than setuid root. [17:21:31] Yeah, that's probably the best. [17:21:46] But I agree that it's inconvenient. [17:22:07] Hm. Lemme try something. [17:22:37] scfc_de: I can use "become " and then cp /my/file file and then rm /my/file [17:23:02] but I ask for a faster solution [17:23:20] valeriej: helps if you ping a specific person... e.g. marktraceur! [17:23:33] (and if you repeat yourself) [17:23:34] 18 16:43:13 < valeriej> Referred here from #mediawiki. Etherpad Lite seems to be down. Is this known? [17:24:09] Jan_Luca: I would not make a "give file", but a "take file" [17:24:45] Coren: no problem [17:24:51] Jan_Luca: That's what I was proposing: "giveaway " as an alias for that. [17:25:04] I didn't know who to ping, but now I do! Thanks, jeremyb_. [17:25:13] Coren: But how do you determine whether the "receiving" user is allowed to take the file? [17:25:46] Jan_Luca: (Could be even named "chown".) [17:25:59] scfc_de: By having write permission to the directory containing it. 
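The "give-to-tool" helper being designed here can be sketched with the copy-and-remove approach scfc_de suggests, avoiding setuid entirely. Everything below is hypothetical — no such command exists on the grid — and it assumes you can sudo to the tool account and that the file's directory is group-writable:

```shell
# Hypothetical "giveaway" helper: re-create the file as the tool account,
# then remove the original. The tool account needs read access to the file
# and write access to its directory; no setuid binary involved.
giveaway() {
    local file="$1" tool="$2"
    sudo -u "$tool" cp -p -- "$file" "$file.tmp" &&
    rm -f -- "$file" &&
    sudo -u "$tool" mv -- "$file.tmp" "$file"
}
```

Usage would be along the lines of `giveaway myscript.pl mytool`, leaving myscript.pl owned by the tool account.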
[17:26:07] scfc_de: I.e.: the same as a copy-rm would [17:26:30] valeriej: well normally you'd look at https://wikitech.wikimedia.org/wiki/Nova_Resource:Etherpad , https://wikitech.wikimedia.org/w/index.php?title=Nova_Resource:Etherpad&action=history , etc. [17:26:39] valeriej: those don't see so useful atm [17:26:56] Coren: Hmmm. Yeah, it should make no difference. [17:26:59] in particular the docs are non-existant. and they should say who to ping [17:27:13] Coren: when using copy-rm you have to have write permissions to the file, too [17:27:46] (I tried to use rm as tool-user but I get no permissions) [17:27:56] jeremyb_: Ah, I see. I checked here: https://www.mediawiki.org/wiki/Etherpad_Lite [17:28:06] Thanks, again. [17:28:17] valeriej: which was created by marktraceur :) [17:28:26] Coren: But that would need setuid, wouldn't it? [17:28:47] scfc_de: Nope. [17:29:05] jeremyb_: Yeah, I just thought to check that. :) [17:30:32] Coren: Hmmm. Okay, you're right again :-). "sudo cp" would also require read permission on the "given" file. I'll better shut up :-). [17:41:59] Coren: Your help (https://www.mediawiki.org/wiki/Wikimedia_Labs/Tool_Labs/Help) is very nice until now [17:42:42] Jan_Luca: Well, it's very skeletal still. :-) [18:11:53] is http://etherpad.wmflabs.org/ down? [18:18:19] zeljkof: I asked that earlier "18 16:43:13 < valeriej> Referred here from #mediawiki. Etherpad Lite seems to be down. Is this known?" I was pointed to marktraceur. [18:21:00] valeriej: thanks, did you get a reply? [18:21:07] zeljkof: yes. tring to debug it [18:21:10] *trying [18:21:16] Ryan_Lane: great :) [18:21:20] it seemed to have dropped off the network [18:21:25] Just now, heh. Thanks Ryan_Lane. 
[18:21:41] it's one of the few down [18:22:09] so, I'd imagine something happened on it [18:33:09] I'm just going to reboot etherpad-lite [18:33:50] it hasn't tried to get an ip address for 13 days [18:36:12] it stopped kernel logging on the 5th [18:57:39] Ryan_Lane is etherpad project restricted? [18:57:43] or accessible [18:57:49] restricted in which way? [18:57:59] like people can't be members of it etc [18:58:24] only tools has this restriction currently [18:58:36] from projectadmin pov [18:58:44] ah ok can I be member then :D [18:58:53] whoever manages etherpad should add you [18:58:58] I wanted to see how it's configured cuz I can't install ether on my box [18:59:07] * Ryan_Lane nods [18:59:12] @labs-project-users etherpad [18:59:12] Following users are in this project (showing all 10 members): Ryan Lane, Wikinaut, Johnduhart, Abartov, Dzahn, MarkTraceur, Novaadmin, SuchetaG, Yuvipanda, Demon, [18:59:27] YuviPanda can u add me [18:59:34] marktraceur: ^^ [18:59:45] I think he's the usual maintainer [18:59:50] aha [18:59:53] +1 to marktraceur [19:00:09] and he's onlien, I think. He was waving and particling in another channel a little bit ago [19:00:30] I need to add a request queue for projects [19:49:15] Change on 12mediawiki a page Wikimedia Labs/Tool Labs/Help was modified, changed by MPelletier (WMF) link https://www.mediawiki.org/w/index.php?diff=661074 edit summary: [-1] /* Logs */ ce [20:04:26] andrewbogott, hi, how'd you fix nova-precise2 in the end? 
[20:05:01] Krenair: I asked Ryan and he immediately noticed that / was full, just like the last two or three times that nova-precise2 stopped working :) [20:05:09] And I think he fixed the cron that had been filling up the disk [20:08:11] Sorry, was at lunch [20:17:34] * Damianz pats andrewbogott on the nose [20:19:16] bleh [20:20:07] -.- [20:20:18] why is bots.wmflabs.org sshable externally [20:20:20] no wonder it gets owned a lot [20:22:58] !log bots bots-bsql01 / full, mysql dead with too many connections. Clearing root, restarting mysql [20:23:23] Change on 12mediawiki a page Wikimedia Labs/Tool Labs/Help was modified, changed by MPelletier (WMF) link https://www.mediawiki.org/w/index.php?diff=661087 edit summary: [+1920] /* Grid engine */ jsub [20:23:24] Logged the message, Master [20:28:02] seriously, who in their right mind puts mysql binlogs in /var/log [20:28:06] I mean fuck me [20:28:18] * Damianz slaps debian [20:29:18] /puppet/files/logrotate ? [20:29:39] /sbin/nukefromorbit [20:31:10] Coren, regarding Ryan_Lane's email about service users and groups: Will there always be an exact 1:1 between users and groups, or is it useful to add additional users to a given local group? [20:31:26] [bz] (ASSIGNED - created by: Antoine "hashar" Musso, priority: Immediate - normal) [Bug 45084] autoupdate the databases! - https://bugzilla.wikimedia.org/show_bug.cgi?id=45084 [20:33:15] andrewbogott: Hm. I would expect that service user will be 1:1 with service groups in my use case, though /global/ users will be added to the group/ [20:33:41] andrewbogott: But I wouldn't bar it unless it's really needed; other use cases might find use for, say, role groups. [20:33:50] Ok, wo we'll need a gui to add real users to the service group [20:33:54] *so [20:34:14] andrewbogott: Yes -- and the intent is that member of a service groups can do so, not just project admins. [20:34:23] ok [20:34:48] So y'think the GUI for this should look roughly the same as the existing gui for roles? 
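The nova-precise2 and bots-bsql01 incidents above follow the same first-response pattern, sketched here with stock tools (/var/log is exactly where the binlogs turned out to be):

```shell
# Confirm the root filesystem is the full one, then find what is eating it.
df -h /                                               # "/ full" shows up here
du -xsh /var/log/* 2>/dev/null | sort -rh | head -5   # the space hog, by size
```

The `-x` keeps `du` on one filesystem, so large /data or /public mounts don't drown out the root partition's own usage.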
[20:35:10] andrewbogott: I'd expect it'd be the most consistent option. [20:35:19] it should look pretty with unicorns and pink flowers [20:35:22] it looks like shit currently [20:35:36] andrewbogott: pink unicorns and pretty flowers, you mean. :-) [20:35:55] It already has pink unicorns, as best I can tell [20:36:08] someone stole the pink unicorn :( [20:36:09] (given that pink unicorns are invisible, as I understand it) [20:36:10] legal made it blue [20:37:40] 9.9G root partition, 7.6G is mariadb binlogs.... fml [20:37:50] * Damianz wishes a slow and slightly painful death on petan [20:38:28] Legal made it blue? [20:39:19] I don't think legal was involved… just, someone pointed out that the particular graphic we were using was the same from (slightly) religiously-charged wikipedia page... [20:39:19] So, changed. [20:39:41] And anyway that particular banner is long gone, as far as I can tell. [20:42:26] andrewbogott: I *liked* the reference to the IPU. [20:44:58] Coren: Re https://www.mediawiki.org/w/index.php?diff=661023, do newer versions of Apache support this error logging scheme you're looking for, or is this "hope"? :-) [20:47:02] scfc_de: 2.4 does [20:48:18] what's needed in 2.4 that doesn't exist in earlier versions? [20:48:29] Ryan_Lane: ErrorLogFormat [20:48:35] ah [20:48:49] can't you just create a custom log format for this? [20:48:57] then using ErrorLog ? [20:49:01] Ryan_Lane: Not for error logs, that's the point. [20:49:05] lame [20:49:06] :) [20:49:17] Ryan_Lane: Yeah, but they fixed it in 2.4. :-) [20:49:26] we'd actually like 2.4 for other reasons [20:49:45] so, if you backport it, you'll be helping in a number of ways :) [20:50:07] Should be fairly easy to backport; apache has a sane and well known build process. [20:50:24] I was just a little leery of having the /only/ 2.4 install WMF-wide. 
[20:50:24] :-) [20:50:39] well, we're considering using it for https [20:50:42] *snort* [20:51:00] we're using nginx now [20:51:10] there's a lot of strong motivation to use apache 2.4 [20:51:33] for one, it can share ssl cache between all nodes [20:51:45] so, we woudln't need to use lvs sh load balancing mode [20:51:58] also, it would be a more efficient cache [20:52:45] mod_spdy is written by google, right? [20:53:01] I guess that's for apache 2.2 [20:53:57] Hm. allright then, since it's clearly worthwhile I'll build a 2.4 deb. [20:53:58] bleh. it isn't supported for 2.4. :D [20:54:10] Ah, no mod_spdy then [20:54:28] the biggest reason is a shared cache [20:54:29] But that means making debs of all the modules we use too. [20:55:12] I doubt that's many [20:55:53] modules? compile it staticly :D [20:57:58] Damianz: troll ;) [20:58:17] you know it [20:59:39] Huh. Someone already has a 2.4 in a ppa. [21:00:28] the apache maintainers built 2.4 [21:00:31] it's in debian experimental [21:01:22] Debian has always been a bit... overly conservative. 2.2 is walking with a cane by now. [21:02:03] We all know if you want the latest to use fedora ;) [21:02:04] apache 2.4 broke the ABI and API [21:02:38] Damianz: I don't like the hatoids. They tend to change things for the sake of changing things. [21:02:47] we have decided to postpone the transition to apache2 2.4. The main blocker is that mod_perl needs a major new upstream release which very likely won't be ready in time for Wheezy and we don't want to release Wheezy without mod_perl. [21:02:52] The transition will probably happen shortly after the release of Wheezy. We are sorry for any inconvenience this may have caused. [21:02:56] https://lists.debian.org/debian-apache/2012/05/msg00036.html [21:03:24] * Damianz tips his fedora red-hat hat at Coren [21:03:40] That's actually a fairly good reason. 
Also one that doesn't apply to us (I think) [21:04:19] Oh, it looks like more fun with different mpm models in apache 2.4 and getting modules to be correctly thread safe... [21:05:49] I know, I'm just saying, don't assume people just like being slow :) [21:06:12] debian does like being slow at heart though [21:10:55] It looks like they subtilly broke backwards compatibility in the module API, so some custom modules won't build on 2.4 without adaptation [21:28:20] hi Damianz [21:28:47] * Damianz pokes petan [21:28:47] what's up [21:28:55] oh that binlog needs own partition of course [21:28:59] I have that in my todo [21:29:00] your server is full [21:29:03] really? [21:29:04] it broke [21:29:08] mine? [21:29:12] bork? [21:29:14] wut [21:29:20] bsql01 is totally yours atm :P [21:29:24] lol [21:29:33] I made it 246M by killing the logs, binlogs need shifting soon [21:29:43] ok so it's down or not [21:30:04] it's up right now [21:30:12] ur telling me it broke [21:30:30] it did earlier [21:30:40] the root parition filled up and mysql got grumpy [21:30:57] ok so did you login to mysql and flushed it? [21:31:02] did you log it [21:31:03] !sal [21:31:03] https://labsconsole.wikimedia.org/wiki/Server_Admin_Log see it and you will know all you need [21:31:05] labstore1.pmtpa.wmnet:/keys 71T 9.5T 62T 14% /public/keys < I'll eat my hat if keys really use 9.5T of disk space [21:31:14] hehe [21:31:19] we have a lot of them [21:31:32] Uptime: 1 hour 6 min 55 sec [21:31:33] wtf [21:31:35] it was really down [21:31:40] No I killed it, cleaned up the logs, restarted mysql and it was happy again. [21:31:41] * petan blame whoever did it [21:31:44] omg [21:31:45] why [21:31:49] you could just mysql to it [21:31:53] and execute flush [21:31:57] nope [21:32:02] mhm ok [21:32:08] too many connections which wouldn't clear because it couldn't write to it's log. 
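For reference, the 2.4-only feature driving the backport discussion above is the `ErrorLogFormat` directive. A hypothetical snippet built from documented 2.4 format specifiers — the format string and the log path are examples, not anything deployed:

```
# httpd 2.4+ only; 2.2 offers no per-virtual-host error-log formatting
ErrorLogFormat "[%{u}t] [%-m:%l] [pid %P] [client %a] %M"
ErrorLog "/data/project/mytool/logs/error.log"
```

Here `%{u}t` is a microsecond timestamp, `%-m:%l` the module and log level, and `%M` the actual message — which is what makes per-tool error logs parseable the way the help page wants.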
[21:32:27] bah [21:32:32] I only noticed because my bots bitched a lot [21:33:17] !log bots purged binary logs [21:33:22] Logged the message, Master [21:33:22] fixed [21:33:33] anyway MariaDB [(none)]> purge binary logs before '2038-01-19 03:14:07'; [21:33:34] Query OK, 0 rows affected (0.73 sec) [21:33:43] well... by varying levels of fixed [21:33:51] * Damianz waits another week for it to fill up again [21:33:51] I will make a partition for that [21:33:52] in future [21:33:53] also 2038? [21:34:02] that is MAX date for mysql [21:34:06] so it mean [21:34:17] all logs [21:34:17] * Damianz thinks like today would have done :P [21:34:26] probably but I have stored this date [21:34:36] so it work in future :) [21:34:42] I believe our server will work for a long time [21:35:21] did anyone else bitched it's full? [21:35:23] Until someone does a db import. [21:35:29] btw this problem can be fixed [21:35:40] either by disabling logs [21:35:46] or making a large partition for them [21:35:54] as a DBA I would recommend keeping them :> [21:35:55] mhm [21:36:05] I more want to know wtf debian puts them in /var/log [21:36:09] hehe [21:36:20] because debian suppose syadmins are idiots [21:36:31] As an engineer I think you should use a real database lik postgre or oracle :P [21:36:36] LOL [21:36:42] oracle would use archlogs [21:36:44] that is same [21:36:50] + redologs [21:37:04] :P [21:37:08] well archlogs aren't quite the same... but yes we'd need a ~day of them. [21:37:20] archlogs are same as binary logs in mysql [21:37:25] Or w/e our time to flush to the store is for replay... not that we'd probably ever recovar it in labs. 
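petan's plan above — keep the binlogs, but stop them from filling the root partition — amounts to a one-off purge plus a couple of lines of my.cnf. The path and retention values below are examples only, not the bnr/bsql configuration:

```
# one-off cleanup, run in the mysql client (the log's version used a
# far-future date to mean "all of them"):
#   PURGE BINARY LOGS BEFORE NOW();
# then keep them bounded in the [mysqld] section of my.cnf:
[mysqld]
log_bin          = /srv/mysql-binlog/mysql-bin   # dedicated partition, not /var/log
expire_logs_days = 7      # auto-purge instead of manual flushes
max_binlog_size  = 100M
```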
[21:37:52] well, I believe having slightly bigger storage would solve the problem [21:37:57] or nagios some alert [21:37:59] :P [21:38:17] There is a nagios alert for that - someone moved it to a channel no one is in [21:38:24] or we could just disable them and hope that no one will ever need to do recover [21:38:32] hehe [21:38:37] who knows who was it [21:38:51] We have daily backups - if they want more than that good luck [21:38:58] ok [21:39:01] Or did have... before stuff moved... anyway [21:39:05] BTW [21:39:10] daily backups had a security holes [21:39:14] I fixed them [21:39:25] you were backing up databases and gave a+r to others [21:39:35] so everyone could read whole dump no matter of db [21:39:52] That's not really a security hole, labs is pretty open [21:40:04] yes I would like to have them open :) [21:40:12] some people in here have different opinions [21:40:29] Change on 12mediawiki a page Wikimedia Labs/Tool Labs/Help was modified, changed by MPelletier (WMF) link https://www.mediawiki.org/w/index.php?diff=661195 edit summary: [+540] /* Simple utilities */ more [21:40:43] If you use bots you're agreeing to others seeing your stuff so meh [21:41:31] Change on 12mediawiki a page Wikimedia Labs/Tool Labs/Help was modified, changed by MPelletier (WMF) link https://www.mediawiki.org/w/index.php?diff=661197 edit summary: [-1] /* Tool account */ ce [21:41:49] addwork pioung [21:44:21] How are you supposed to stop others from seeing your login details, Damianz? [21:44:47] You don't? [21:45:04] All passwords should be salted per the labs docs and users are allowing other labs users to view that data [21:45:15] So it's not really a problem [21:45:56] In the same sense everyone can see your bots passwords, or at the least their cookie jar in bots. [21:46:37] But if you want to run a bot that connects to a bot account on-wiki, you need to give the MediaWiki API a password. 
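The backup permission hole petan mentions — dumps created world-readable — is typically closed at creation time with a restrictive umask at the top of the backup script. A minimal demonstration, with dump.sql standing in for a real dump file:

```shell
umask 027        # new files become rw-r-----: group may read, world may not
touch dump.sql
ls -l dump.sql
```

For dumps that already exist with a+r, a `chmod -R o-rwx` on the backup directory removes the world-readable bit after the fact.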
[21:47:11] Change on 12mediawiki a page Wikimedia Labs/Tool Labs/Help was modified, changed by MPelletier (WMF) link https://www.mediawiki.org/w/index.php?diff=661198 edit summary: [+0] /* Grid engine */ ce [21:47:20] Yeah? [21:47:29] Krenair: oAuth support will eventually be in mediawiki [21:47:31] and live on the sites [21:47:37] it's slated for this quarter [21:47:39] And I can go to any bot, su to your users, look at your config file for the db password and view it? [21:47:53] You're putting security in the wrong place here [21:48:05] Ryan_Lane, can you see why opendj won't run on nova-precise2? [21:48:11] sure [21:49:05] it looks like its running to me [21:49:11] oAuth for mediawiki will be highly awesome though - for many things [21:49:22] hm [21:49:31] it also looks like it's full of errors in its access log [21:49:48] I wonder if its database is corrupted [21:50:35] Ryan_Lane: When I restart it it throws an error that I don't understand... [21:50:45] yeah [21:50:48] also 'service --status-all' shows it as not running but maybe that's meaningless [21:51:05] that's specific to upstart [21:59:07] [bz] (NEW - created by: Antoine "hashar" Musso, priority: Normal - enhancement) [Bug 36648] replicate HTTPS architecture - https://bugzilla.wikimedia.org/show_bug.cgi?id=36648 [22:02:19] Change on 12mediawiki a page Wikimedia Labs/Tool Labs/Help was modified, changed by MPelletier (WMF) link https://www.mediawiki.org/w/index.php?diff=661201 edit summary: [+1892] /* Grid engine */ mo' details [22:29:00] Are my docs understandable at all, btw? [22:30:33] Coren will check them tommorow :P [22:30:38] night in EU [22:30:58] Coren btw !tooldocs is [22:31:03] or !toolsdocs [22:31:24] me lazy [22:31:34] !tooldocs is http://www.mediawiki.org/wiki/Wikimedia_Labs/Tool_Labs/Help [22:31:35] Key was added [22:31:47] !tooldocs [22:31:47] http://www.mediawiki.org/wiki/Wikimedia_Labs/Tool_Labs/Help [22:32:41] ty [22:32:45] night [22:32:52] Ryan_Lane: So, thoughts? Works for you? 
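On the unanswered jsub question above ("-mem: virtual_free or h_vmem?"): a wrapper like jsub would typically translate its `-mem` flag into an SGE `-l` resource request. Which complex it maps to is exactly what is being asked; this hypothetical sketch requests both, a common belt-and-braces choice ("myjob.sh" is a made-up name):

```shell
# Hypothetical translation of a jsub-style "-mem 500m" into qsub resources.
mem="500m"
cmd="qsub -l h_vmem=${mem},virtual_free=${mem} -b y myjob.sh"
echo "$cmd"
```

Conventionally, `virtual_free` is a scheduling hint (pick a node with that much memory free) while `h_vmem` is a hard per-job limit — the Toolserver semantics TS people would expect.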
[22:39:20] Change on 12mediawiki a page Wikimedia Labs/Tool Labs/Help was modified, changed by MPelletier (WMF) link https://www.mediawiki.org/w/index.php?diff=661218 edit summary: [+6] /* Job names */ ce [22:43:09] andrewbogott: oh. sorry [22:43:12] was distracted [22:44:36] Change on 12mediawiki a page Wikimedia Labs/Tool Labs/Help was modified, changed by MPelletier (WMF) link https://www.mediawiki.org/w/index.php?diff=661221 edit summary: [+23] /* Simple utilities */ ce [22:49:56] what means the -mem option to jsub virtual_free or h_vmem ? and can't you use the same name as SGE, so TS people will understand it ? [23:01:57] andrewbogott: fixed it [23:02:08] andrewbogott: it was already running it, and the init script wasn't killing it [23:02:17] I killed it manually and did a restart [23:02:50] andrewbogott: did you want to work with coren on the project user stuff? [23:03:05] or was there something else you were trying to finish up? [23:12:07] Ryan_Lane: ' it was already running it' <- can you de-pronoun that for me? [23:12:25] And, yeah, I was looking at making a gui for service groups. [23:12:28] the process wasn't dead :) [23:12:36] it was running [23:12:42] * andrewbogott tries to log in [23:13:43] Woo I can log in! [23:13:52] I'm about to step away but will work on this later and/or in the morning [23:19:16] heh [23:19:17] cool [23:19:37] maybe I'll get started on the filesystem stuff, then [23:20:13] of course I'll work on it as well, for things you guys need