[00:00:06] you mean have 1/2 disks from each controller in each set
[00:00:12] Right.
[00:00:43] I guess assuming we'll just have two raidsets, it's not actually complex
[00:00:47] well, 3
[00:00:55] Right, the OS.
[00:00:58] /, and /srv
[00:01:09] where /srv is two sets combined with LVM
[00:01:43] Sounds like the "obvious" approach.
[00:02:00] yeah. that may be a sane approach
[00:06:11] Hm. So here's an idea.
[00:07:02] 2 for the os raid 1, two arrays 8+2 for the data, and two disks for snapshot.
[00:07:47] I did no spares on the old config, btw
[00:07:54] since we were using two raid 6
[00:08:01] didn't see the point
[00:09:08] Yeah, spares on raid 6 aren't as useful.
[00:09:23] But a snapshot drive or two -> easy fun replication. :-)
[00:09:27] hm
[00:09:35] the second controller doesn't show any disks :(
[00:09:51] it's an H800
[00:10:03] are the damn shelves not connected?
[00:10:29] well, let me check the others
[00:10:47] Can't help you there. I don't see the machine room from here.
[00:10:52] :)
[00:22:01] I also see none on labstore1003
[00:22:27] I remember having this problem before and solving it via foreign config
[00:22:31] but I can't seem to use tat
[00:22:33] *that
[00:23:45] [08:35:51 AM] legoktm: It doesn't, but perhaps you ran out of memory? <-- I suppose that's possible. What's the max set to by default? And can I raise it?
[00:24:10] I wonder if the system bios is causing a problem
[00:27:15] legoktm: 256M, which is a bit tight for python; you can just invoke with -mem 512M or 1G, say. Be generous; using qstat to see how much it really takes is the best way.
[00:28:42] Ryan_Lane: Can't find anything on Google on that topic.
[00:29:38] I have a good feeling the shelves aren't attached
[00:29:58] Coren: Ok I'll try that in a bit
[00:36:47] * Ryan_Lane screams
[00:36:52] I hate hardware so much
[00:37:08] it keeps saying it's going into the config utility, then it bypasses it
[00:38:54] Ryan_Lane: Yeay silicon!
[00:39:01] so full of hate
[00:39:02] (Actually, yeay crappy bioses)
[00:44:08] well, eqiad isn't going to be a great test environment without all of the drives
[00:44:18] I guess I'll put a ticket in
[00:44:34] it's really pissing me off that it won't let me back into the LSI config
[05:35:47] heh
[05:35:49] this time i got a
[05:35:51] MemoryError
[10:30:15] legoktm what's up??
[10:30:31] oh hey
[10:30:38] i was testing a script on the tools project
[10:30:45] ah...
[10:30:46] and i couldn't figure out why i kept running out of memory
[10:30:55] ok, did you? :D
[10:30:55] so i just gave up and am running it on my laptop now
[10:31:12] well i allocated 1G to it
[10:31:16] well you can always put it back to bots :D instead of laptop
[10:31:30] yet when i run it on my laptop it only takes up like 400M
[10:31:33] heh
[10:31:36] weird
[10:31:38] i plan to
[10:31:54] this is like .1% of the scale of what i want to do
[10:31:56] check versions of libraries vs tools
[10:32:04] maybe you have a newer version on your laptop
[10:32:22] well it's just python, and they're both 2.7.3
[10:32:28] hmm
[10:32:41] there are always problems when u restrict memory :>
[10:32:51] programs are always asking for far more than what they use
[10:33:10] so I believe your app on tools was killed just for requesting too much, it wouldn't use it anyway
[10:33:19] no i don't think so
[10:33:24] it runs for about 5 seconds
[10:33:36] yes but during these 5 seconds it sends requests for memory
[10:33:39] then it dies with an out of memory
[10:33:50] what do you mean?
[10:34:05] it asks the kernel for memory, that doesn't mean it will use it all
[10:34:29] kernel pls gimme 1.2g of ram, kernel: you got it! or kernel: that's too much...
fuck off
[10:34:45] oh
[10:34:56] that doesn't mean the app will actually use 1.2g
[10:35:05] kernel give it 1.2g of virtual memory
[10:35:17] in htop you can see virtual allocated memory vs real used memory
[10:36:00] for example on my server, mysql has like 900MB of virtual memory but using only 80MB of real
[10:36:11] I don't know why it asked for so much...
[10:37:51] or now on my laptop - my IDE has 1990MB of virtual vs 270MB of real
[10:38:24] you need to somehow tell your app to never ask for more than 1gb of ram
[10:38:25] oh right
[10:38:28] virtual vs real
[10:38:48] python probably does the memory allocation itself just as c#
[10:38:51] which suck
[10:39:06] in c for example you can easily control your memory
[10:40:23] IMHO the tools project should restrict real memory, not virtual...
[10:40:54] virtual is good for sysadmin but not for a programmer
[10:47:34] http://stackoverflow.com/questions/561245/virtual-memory-usage-from-java-under-linux-too-much-memory-used
[10:48:01] VIRT is the virtual memory space: the sum of everything in the virtual memory map (see below). It is largely meaningless, except when it isn't (see below).
[10:48:10] restricting memory to virt is nonsense
[10:48:14] no idea why Coren did it
[10:48:46] * legoktm has no clue
[10:49:34] on bots only restrictions are for RES
[10:49:41] so it should work there with no problem
[11:16:21] python(81662,0x106806000) malloc: *** error for object 0x7f89cc847c00: pointer being freed was not allocated
[11:16:21] *** set a breakpoint in malloc_error_break to debug
[11:16:21] Abort trap: 6
[11:16:25] yayyyy
[11:29:23] legoktm where?
[11:29:27] tools or bots?
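(Editor's note: the VIRT versus RES distinction discussed around [10:35:17] can be observed directly. The following is a minimal sketch, assuming a Linux host where /proc/self/status is available; the helper names read_status_kb and reserve_untouched are made up for this illustration and are not part of any tool mentioned in the log. It maps anonymous memory without writing to it, so the virtual size grows while the resident size should stay roughly flat.)

```python
import mmap


def read_status_kb(field):
    """Return a field such as VmSize or VmRSS from /proc/self/status, in kB.

    Assumption: Linux only; other systems do not expose this file.
    """
    with open("/proc/self/status") as status:
        for line in status:
            if line.startswith(field + ":"):
                return int(line.split()[1])
    raise KeyError(field)


def reserve_untouched(size_bytes):
    """Map anonymous memory but never write to it.

    Returns (mapping, growth of VIRT in kB, growth of RES in kB).
    """
    virt_before = read_status_kb("VmSize")
    res_before = read_status_kb("VmRSS")
    # A private anonymous mapping is roughly what a large malloc() turns into.
    mapping = mmap.mmap(-1, size_bytes,
                        flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    virt_growth = read_status_kb("VmSize") - virt_before
    res_growth = read_status_kb("VmRSS") - res_before
    return mapping, virt_growth, res_growth


if __name__ == "__main__":
    mapping, virt_kb, res_kb = reserve_untouched(256 * 1024 * 1024)
    print("VIRT grew by %d kB, RES grew by %d kB" % (virt_kb, res_kb))
    mapping.close()
```

(A limit on virtual memory counts the whole mapping as soon as it is requested; a limit on resident memory only counts pages once they are actually touched.)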
[11:29:33] on my laptop :P
[11:29:37] heh
[11:29:38] :D
[11:29:53] this is a bug in python I think
[11:30:09] no, it's in one of the libraries i'm using
[11:30:19] well i think
[11:30:28] or that
[11:30:38] i'm pretty sure it's mwparserfromhell
[12:19:10] petan: If you limit real then you're over allocating resources
[12:37:21] Damianz over allocating VIRTUAL resources
[12:37:41] Yeah but at some point it will want real ram
[12:37:44] seriously, why should I care about them?
[12:37:50] why?
[12:38:36] it will only want real ram when RES memory goes up
[12:38:41] resident
[12:38:45] or shared
[12:38:51] resident + shared is real ram
[12:39:00] virtual is virtual ram (opposite of real)
[12:39:35] your application may have 300GB of virtual memory allocated on a box with 256mb of ram
[12:39:44] it will still run ok...
[12:40:01] But that program can use up to 300GB and at some point it will want real ram... or you'll be swapped to hell
[12:40:04] Either way it's not ideal
[12:40:20] but you can just kill it when it gets over X res memory
[12:40:28] if you limit it to 200mb of RES memory
[12:40:38] it will get killed when it eats 200mb of REAL memory
[12:40:53] limiting it to virtual memory sounds like nonsense to me
[12:40:54] You could - but then why let it allocate more virtual ram than it can really use
[12:41:10] because all these frameworks are crap
[12:41:41] python, java whatever...
if you really want to limit virtual memory you are forcing bot operators to write their apps in c or assembler
[12:41:58] because that's the level of programming where you can control memory
[12:42:07] in java or python you can only dream about it
[12:42:09] Not really
[12:42:25] well, maybe there are some startup parameters I don't know
[12:42:32] for python
[12:43:39] I think there's a module to change the RLIMIT via the a api
[12:46:15] petan, it's very difficult to write reliable applications if they can be killed at any random point
[12:46:29] yes
[12:46:39] that's why I don't like the idea of strict limits like this
[12:46:48] that's the point of limiting virt
[12:47:08] ok but it's also hard to write an application that uses low virt memory
[12:47:12] Not really killed, just prevented from allocating more memory
[12:47:45] Though we're not really working with low memory systems so it shouldn't be a huge deal....
[12:47:57] if you limit rss but not virt, you'll get killed when touching the memory, writing something like if (a[0] == 'c') can kill your program
[12:48:26] that's true, but on the other hand it should never happen
[12:48:39] because you are not supposed to allocate so much ram
[12:48:51] why ? on a shared box it can occur
[12:49:02] if your application is typically using 50mb of ram and you request a 100mb limit you will hardly meet it
[12:49:15] even if your virtual memory could be 200mb
[12:50:06] it would happen only if there was some memory leak or whatever
[12:50:23] which wouldn't in java or python
[12:50:41] that's not the point, if you don't limit virt, the kernel can fail to allocate a page in physical memory and swap, so one wrong application can kill any other correctly written program
[12:50:42] you can hardly know why the java interpreter is allocating so much virt memory
[12:50:55] Java handles memory allocation differently to anything else though
[12:51:15] phe: why would that happen?
the box can't be OOM
[12:51:18] given the design
[12:51:32] Swap is evil
[12:51:44] GE is submitting jobs only on boxes with enough free ram
[12:52:05] so your box can't start on a box with not enough ram
[12:52:21] That means in theory all boxes can be running 1 job and nothing will get run :P
[12:52:21] or it shouldn't
[12:52:46] Damianz yes
[12:52:50] Damianz that's what I think :P
[12:53:22] Bah it's sunday today
[12:53:30] I really want to see how coren handles that addshore's bot
[12:53:38] he will need to create like 200 instances :P
[12:54:08] if I changed limits to what is now on tools, all 3 boxes would probably run 1 instance of his bot
[12:54:10] that's a misdesigned tool
[12:54:38] well, unfortunately I guess almost all bots we have are misdesigned at some point
[12:54:45] they are all written in python or php
[12:54:51] so they will all have the same VIRT problem
[12:54:55] SGE doesn't really fit some design patterns though... like I'd have to re-write cb's core to support autodiscovery as I need the other parts to know where everything else is running.
[12:55:11] yes, same for wm-bot
[12:55:22] I can't really imagine wm-bot running on SGE
[12:55:39] petan, yeps, those misdesigned bots will not try to run a few hundred instances of themselves like addbot :)
[12:55:47] *but those
[12:55:59] heh who knows
[12:56:06] hundred?
[12:56:19] he is running like 3000 instances some time :D
[12:56:53] Lots of bots follow that pattern - even cb does, yes it probably should use a threadpool or such but a child process forked off is good enough, most of the time.
[12:56:59] but my point is - without limiting VIRT it can run on 3 boxes... while the same task on tools would require 3000 boxes
[12:57:36] sum of all VIRT in that case might be some 1TB of ram while real memory usage is low
[12:57:48] prolly the best option for such very special bots like addbot will be a separate instance, a big one, and let the bot owner manage it as he wants
[12:58:08] or run it on bots like grid?
:P
[12:58:12] and admin will have to look elsewhere when the instance crashes :D
[12:58:27] no warranty that your bot will not get randomly killed, but it never happened so far
[12:59:01] Most stuff like php - if it's going to get killed then it will happen on start a lot of the time.
[12:59:13] probably
[12:59:30] Php is funny though - if you set the internal memory limit to lower than a certain value it segfaults on start with no error at all
[12:59:39] heh
[14:37:38] petan: Remember that part of my job is to give coding support for tool maintainers who are less experienced; this will include help with fixing bots that need to make their resource usage more bounded.
[14:38:45] o.O so you wanna help me reduce the amount of memory I need? :)
[14:39:15] legoktm: Sure; and the pattern learned can help other maintainers too.
[14:41:49] ok well here's what it looks like right now http://dpaste.de/UrS3O/raw/
[14:42:49] i think the main problem is that it fetches the pages (and their content) in the first main thread and just holds it in memory until it gets processed by the next thread(s) which fills up rather quickly
[14:47:37] I take it that the processing is very much slower than the fetching.
[14:49:48] yeah
[14:50:15] because it can fetch ~250 pages every 5 seconds, and it takes about 5 seconds to process every page
[14:55:04] Well, I see an obvious tweak that might suffice with little change in performance: have the fetch implement high and low water marks on the queue. I'm no python pro, but I'm pretty sure it has condvars available in its threading model?
[14:55:49] Errr,
[14:55:54] high and low water marks?
[14:56:25] http://docs.python.org/2/library/threading.html#condition-objects
[14:56:32] Okay, here's the general pattern:
[14:56:52] Your main thread adds the pages to the queue to be processed, right?
[14:56:53] oh locks
[14:56:57] yes
[14:57:22] All right, when it hits a high water mark (say, 3x the number of processing threads you have), have it sleep on a condition variable.
[14:57:35] In the thread that consumes from that queue...
[14:58:07] if it consumes a page such that it now is below the low water mark (say, 1x the processing threads) wake up sleepers on that condition variable.
[14:58:33] This way, the main thread keeps the queue full enough that the processors never stall, but the memory it consumes is bounded.
[14:58:39] Oh I get it
[14:59:14] thanks :)
[15:00:19] No worries; it seems like the least intrusive way to do it in your model. I could see other ways, but this is probably the simplest and it'll do the trick.
[15:05:28] So the trick is, in your main thread, before you add a page to the queue you acquire(), check the queue length, if the queue is too full you wait(), add the page, and release()
[15:06:00] In the thread that consumes the pages, you acquire(), get a page, if the queue is short enough you notify(), then release()
[15:06:34] Where "too full" and "short enough" is something you choose that makes sense. :-)
[15:08:45] the low water mark you want to try to be the lowest possible that still never lets the processors stall; one item per processing thread should do the trick.
[15:09:51] the high water mark bounds your memory; you want it high enough that the main thread doesn't need to be woken up too often to add just a few pages, I'd try with 3x the low water mark as a reasonable place.
[15:14:13] Ok I'm gonna go afk and when I come back in a few hours I'll try it :)
[16:07:36] Oh hey, it's saturday not sunday... damn bank holiday weekends. Yay for 2 days extra
[17:34:00] 2 days
[17:34:03] we have only 1 day
[17:34:15] Damianz you poor socialists from UK :D :D
[17:34:41] whole day more
[17:34:50] that's almost communism in UK
[17:34:52] Well we technically have friday and monday...
but I'm heading down to our other office on Monday so I only really get friday
[17:35:38] if I wasn't told I would probably have gone to the office anyway
[17:35:48] Wouldn't be the first time
[17:36:33] that feeling when you are sitting there alone, saying "something is wrong"
[17:38:06] I can never tell... our team work all over the show lol
[18:06:56] so petan amsterdam? :D
[19:04:39] addshore I will be there :D
[19:04:47] addshore you? :P
[19:04:53] :)
[19:05:16] btw my english is scary I speak far worse than I type :D
[19:05:29] * Damianz wonders how nucular petan is now
[19:05:38] what s that
[19:05:40] nucular
[19:06:08] You're nearer to the last nucular meltdown than the rest of us
[19:07:48] mh
[19:10:57] I should probably book the friday off work and register for ams thinking about it... it's a bank holiday on the monday for us =D
[19:13:46] yay if Damianz goes there we will really get drunk
[19:14:23] drunk? I'm like a 6'5" fat dude, that requires lots of alcohol
[19:15:38] well so am I :D
[21:56:23] xky7%0nM!
[22:04:58] Coren: is that your autogenerated password?
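(Editor's note: the high and low water mark pattern described between [14:57:22] and [15:09:51] could be sketched roughly as follows. This is an illustration under stated assumptions, not the script from the dpaste link: the names WaterMarkQueue, run_demo and peak are hypothetical, integers stand in for fetched pages, and a second condition variable is added so that consumers can sleep on an empty queue. The waits sit in while loops rather than a single if, which is the usual guard against spurious wakeups.)

```python
import threading
from collections import deque


class WaterMarkQueue:
    """Hand-off queue whose producer sleeps at `high` and is woken at `low`."""

    def __init__(self, low, high):
        if not 0 <= low < high:
            raise ValueError("need 0 <= low < high")
        self.low, self.high = low, high
        self.items = deque()
        lock = threading.Lock()
        self.not_empty = threading.Condition(lock)  # consumers sleep here
        self.refill = threading.Condition(lock)     # the producer sleeps here
        self.peak = 0  # largest length ever reached, kept for inspection only

    def put(self, item):
        with self.refill:                           # acquire() ... release()
            while len(self.items) >= self.high:     # "too full": wait()
                self.refill.wait()
            self.items.append(item)
            self.peak = max(self.peak, len(self.items))
            self.not_empty.notify()                 # one consumer may proceed

    def get(self):
        with self.not_empty:
            while not self.items:
                self.not_empty.wait()
            item = self.items.popleft()
            if len(self.items) <= self.low:         # "short enough": notify()
                self.refill.notify()
            return item


def run_demo(n_pages=50, n_workers=4):
    """Push n_pages fake pages through n_workers threads; return (pages, peak)."""
    queue = WaterMarkQueue(low=n_workers, high=3 * n_workers)
    processed = []
    processed_lock = threading.Lock()

    def worker():
        while True:
            page = queue.get()
            if page is None:                        # sentinel: no more pages
                return
            with processed_lock:
                processed.append(page)

    workers = [threading.Thread(target=worker) for _ in range(n_workers)]
    for thread in workers:
        thread.start()
    for page in range(n_pages):                     # the "main thread" fetcher
        queue.put(page)
    for _ in workers:
        queue.put(None)
    for thread in workers:
        thread.join()
    return sorted(processed), queue.peak


if __name__ == "__main__":
    pages, peak = run_demo()
    print("processed %d pages, queue never held more than %d" % (len(pages), peak))
```

(The queue never holds more than the high water mark, so memory stays bounded no matter how far the fetcher outpaces the processing threads.)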