[00:01:27] *Finally!*
[00:09:54] Coren: Welcome back!
[00:19:50] Krinkle: Want to know something funny?
[00:19:56] Shoot
[00:20:47] Krinkle: nodejs would have been installed completely before I started fighting with my bouncer if I had noticed that it waited for a Y from me on another xterm. :-)
[00:20:59] lol
[00:21:21] {{done}}
[00:21:21] Are you sure you have an amazing 10kb space on your harddrive?
[00:21:33] I always find it funny when apt-get asks
[00:22:54] Sorry for the delay.
[00:24:52] Change on mediawiki a page Wikimedia Labs/Tool Labs/Notepad was modified, changed by MPelletier (WMF) link https://www.mediawiki.org/w/index.php?diff=664247 edit summary: [+7] +nodejs
[00:30:51] Coren: no worries
[00:31:01] Coren: btw, I find myself trying to figure out how I used jstart last time
[00:31:28] One of the things I liked about using qcronsub from crontab (instead of something like "xxstart" from the command line directly) is that it is saved
[00:31:49] I like jstart better in every other way, but there's that.
[00:32:03] So, strictly speaking, is using jstart; it gets stored in the cluster config instead of cron
[00:32:31] Oh, you mean the actual jstart command line.
[00:32:38] not using cron is nice, nothing to say there.
[00:33:16] Probably, a good practice would be to start bots through a script that invokes jstart with the desired arguments, that also gives you a good record and can be stowed in source control to boot.
[00:33:16] but I mean, once I execute it if it stops or fails for some reason I need to figure out again the path to it, which name and how much memory etc, and whether I need -cwd etc.
[00:33:33] For now I'm going to put it in an init.sh file in ~ so that it's recorded somewhere
[00:33:53] Ha, just what I thought
[00:33:55] Alrighty
[00:33:57] Yes, GMTA clearly. "0(
[00:34:02] :-)
[00:34:30] Coren: So what does jstart do if there's a job with the name already?
[00:35:16] jstart implies -once; if it already exists, it will log a single-line notice in the .err that it was already running, and exit quietly.
[00:35:55] Great, so I won't kill myself if I run init.sh when it's already in the queue
[00:36:01] Great
[00:36:01] If you want to know live, you might want to add -stderr to the jstart command line, that'll give you errors and notices starting the job rather than send them to the file.
[00:36:25] (stderr of the job itself is not affected and always goes to the file)
[00:38:00] Coren: OK
[00:38:19] (It would, arguably, be a nice idea to send errors to stderr when it's a tty, but I don't like programs that behave differently on a pty than over a dumb pipe)
[00:41:45] Coren: Hm.. looks like jstart doesn't stop it from being queued twice
[00:41:54] Right now I got my bot running and under "Pending jobs" in qstat
[00:42:00] o_O
[00:42:40] local-ecmabot@tools-login:~$ jstop ecmabot-wm
[00:42:40] local-ecmabot has registered the job 144 for deletion
[00:42:40] local-ecmabot@tools-login:~$ jstop ecmabot-wm
[00:42:42] local-ecmabot has deleted job 145
[00:43:09] Huh.
[00:43:30] * Coren checks something.
[00:44:54] Ooooh.
[00:45:31] Hm. I probably should have tested this first.
[00:45:47] (Something about the new node is broken on the continuous queue)
[00:49:36] Coren: Looks like memory is an issue with node as well (at least it needs more than the default). Can you see how much it is using? I started it outside jstart for now.
[00:49:37] 8013 32117 0.0 0.5 735112 10804 pts/2 Sl+ 00:44 0:00 node apps/oftn-bot/wm-ecmabot.js
[00:50:50] on tools-login
[00:51:27] That 735112 is your size right there. Surprisingly expensive.
[00:51:50] Coren: oh?
[00:52:23] Actually, it
[00:53:03] PID 32117
[00:53:15] that includes some shared libraries that might bias things a bit. If you want to be sure, start it with, say, -mem 2G and qstat it; that guarantees real usage info
[00:53:16] VSZ 735112
[00:56:23] Coren: Started it with -mem 1G.
Still pending after 3 minutes.
[00:56:30] Maybe there isn't that much available?
[00:57:03] eh, nevermind. That'd be crazy
[00:57:10] No, your job is in error state.
[00:57:13] Note the E
[00:57:23] job 150
[00:57:32] qstat -explain E -j 150
[00:57:39] I didn't get any error in stderr
[00:57:51] That's a queue error
[00:58:06] Oh, there's your problem.
[00:58:14] qstat -explain E -j 150 tells me it is going to /home/krinkle/ecmabot-wm.err
[00:58:19] :)
[00:58:31] You're trying to start a job with a simple sudo and not a -i (or become)
[00:58:40] indeed
[00:58:59] You're not "really" local-ecmabot if you don't -i. :-)
[00:59:14] I know, I just forgot.
[00:59:20] I'll never use it again in labs
[01:00:00] Hm.. still erroring
[01:00:01] Coren: qstat -explain E -j 150
[01:00:03] Coren: qstat -explain E -j 152
[01:00:22] Ah, that one is my fault. :-)
[01:01:15] Missing config on the new compute node; I can't wait for the local users to be in LDAP
[01:04:15] Coren: shouldn't this error be reported in stderr and/or the .err file?
[01:04:58] That's an SGE limitation, annoyingly enough. Or a feature, depending on how you look at it: the job will "unwedge" when the error condition is cleared.
[01:05:14] I.e. It's not canceled, it just can't run until fixed.
[01:05:17] Ah, okay. so it will try again (in theory)
[01:05:46] So ahm. what do I do now?
[01:06:05] Give me a few minutes to actually clear the error condition. :-)
[01:06:12] alrighty
[01:15:59] Krinkle: can't fix fast enough; removed the new exec node from the queue for now. Please resubmit?
[01:16:52] job 153
[01:17:07] and running
[01:17:17] Yep. 722M
[01:17:23] Ouch. Quite a glutton.
[01:17:55] cpu=00:00:00, mem=0.04173 GBs, io=0.00007, vmem=722.082M, maxvmem=722.082M
[01:18:33] eh, it died
[01:19:34] I gave it a few commands to respond to and then it died.
[01:19:46] First 3 or so went fine
[01:19:48] Any idea?
[01:19:48] Does it fork and call another jnode?
[01:19:58] You might have busted 1G if you did
[01:19:59] I don't know
[01:20:15] But where can I see what happened?
[01:20:18] It just disappears?
[01:20:21] No .err
[01:20:23] no qstat
[01:21:55] Hm. I need to find *some* way to get some sort of message for busting memory out
[01:22:14] Or perhaps preserve qstat
[01:22:23] Maxvmem: 1.335G before it died
[01:22:31] Where did you get that from?
[01:22:38] qacct
[01:22:56] But it needs to be run on the master, which isn't accessible
[01:23:13] I'll put a web interface in for that.
[01:23:18] k
[01:23:42] That nodejs is a *monster*
[01:23:55] Apparently
[01:24:04] That, or you're using incredible amounts of data with it. :-)
[01:24:30] Coren: btw, is decimal supposed to work in -mem? Or should I use a smaller unit instead?
[01:24:34] e.g. 1.5G or 1500M
[01:24:55] Hm. I think only integers
[01:25:03] OK
[01:25:04] Yeah, it said only integers
[01:25:14] but I wasn't sure whether that excludes any decimals
[01:25:32] Hm. I notice that node has some memory limiting options you might want to look into
[01:26:08] --max_executable_size and --use_idle_notification appear promising.
[01:27:38] BBIAB
[01:33:47] Coren: Researching it online I get mostly answers in the shape of "virtual memory doesn't matter" "that's normal" "a lot of programs allocates gigabytes of VIRT memory"
[01:34:19] and some other comments along the lines of "limiting VIRT is silly, containers should limit RSS instead"
[01:34:51] Coren: I just experimented with --max_executable_size and it does limit it, but the vmem is the same. It only changes mem
[01:39:41] bbl
[01:43:47] Containing RSS is boneheaded; it means you overcommit memory and cannot predict when some random process will beat you up for your lunch money.
[01:44:19] "a lot of programs allocates gigabytes of VIRT memory". Kids these days. It's bad coding.
[01:47:43] * Coren grumbles.
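Editor's note on the memory numbers above: a minimal Python sketch, not part of the original conversation, that reads the VSZ and RSS columns out of the `ps` row pasted at 00:49:37 and turns a fractional gigabyte figure into a whole-number value for `-mem`, since the log says only integers are accepted. The column order, the helper names `parse_ps_row` and `mem_flag`, and the choice of 1024 MiB per GiB are assumptions made for illustration; the log does not say how the grid interprets `M` and `G`.

```python
import math

# The row pasted in the channel at 00:49:37.
PS_ROW = ("8013 32117 0.0 0.5 735112 10804 pts/2 Sl+ 00:44 0:00 "
          "node apps/oftn-bot/wm-ecmabot.js")


def parse_ps_row(row):
    """Pull PID, VSZ and RSS out of one `ps u`-style row.

    Assumed column order (not stated in the log):
    UID PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
    """
    fields = row.split(None, 10)  # keep the command, spaces included, as one field
    return {
        "pid": int(fields[1]),
        "vsz_kib": int(fields[4]),  # virtual size, in KiB
        "rss_kib": int(fields[5]),  # resident set size, in KiB
        "command": fields[10],
    }


def mem_flag(gib):
    """Express a possibly fractional GiB amount as a whole number of megabytes.

    Hypothetical helper: rounds up, and assumes 1G == 1024M.
    """
    return "%dM" % math.ceil(gib * 1024)


info = parse_ps_row(PS_ROW)
vsz_mib = round(info["vsz_kib"] / 1024, 1)
rss_mib = round(info["rss_kib"] / 1024, 1)
print("VSZ %.1f MiB, RSS %.1f MiB" % (vsz_mib, rss_mib))
print("1.5G as an integer value:", mem_flag(1.5))
```

The gap between the two figures is the point of the argument at 01:33:47 onward: the grid limit applies to virtual size, which for this process is far larger than its resident set.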
[03:19:01] [bz] (NEW - created by: silke.meyer, priority: Unprioritized - normal) [Bug 45483] Make instance creation failures more verbose - https://bugzilla.wikimedia.org/show_bug.cgi?id=45483
[05:26:51] !log openstack updating mediawiki on nova-precise2 to master
[05:26:52] Logged the message, Master
[06:06:06] woah
[06:06:10] I need to login more often
[06:06:16] Coren: I love the ASCII art :P
[09:46:17] Is Coren here?
[10:46:32] I think he's sleeping
[12:30:15] !help
[12:30:15] !documentation for labs !wm-bot for bot
[12:42:57] Good morning, labs!
[12:43:08] Darkdadaah: I'm here now, if you still are.
[12:43:37] Good morning Coren :)
[12:43:57] Be gentle. I'm on my first coffee. :-)
[12:44:19] Okay ^^
[12:45:43] Coren: On tools-login, I'm trying to run some commands like "rename", but perl complains about locale settings.
[12:46:33] It's not behaving oddly for me. What is your current value for $LANG?
[12:46:54] "echo $LANG"
[12:47:41] en_US.UTF-8
[12:47:58] ... odd. Can you pastie some sample commands and outputs?
[12:48:13] Just a sec.
[12:50:11] Coren: http://pastebin.com/qKRwGUXM
[12:51:10] Ah, some of your _other_ variables are set to fr_FR. That locale might not be installed by default on the labs image. Lemme check.
[12:51:36] Ah, indeed.
[12:59:00] * Coren generates a couple of plausible locales in advance.
[13:01:03] fr_FR, fr_CA, de_DE, de_CH, en_UK, it_IL, ja_JP
[13:01:47] it_IT even
[13:01:54] Darkdadaah: Try it now?
[13:02:39] Let me check.
[13:03:14] Ok it works now :)
[13:05:28] Thanks Coren :)
[13:05:54] NP
[13:33:35] Coren: On the tools grid, my job is in error state but the error message is not clear: 03/27/2013 13:25:43 [8004:4860]: execvp(/var/spool/gridengine/execd/tools-exec-02/job_scripts/169, "
[13:34:45] ... less than clear indeed. Lemme try to see what's up.
[13:36:34] Coren: I think I set the wrong shebang.
[13:36:55] "failed: No such file or directory"
[13:37:03] Yeah, almost certainly. What did you try?
[13:37:47] I wrote /usr/bin/dash instead of /bin/dash
[13:38:21] That was it.
[13:39:11] I think you are the first person /ever/ I've met that used dash sans typo. :-)
[13:48:58] Hm? Coren: I have the same issue with "LOAD DATA LOCAL" as before, when I'm using it with qsub: The used command is not allowed with this MariaDB version
[13:49:57] Ah, I didn't expect you'd be trying to use that from the exec nodes.
[13:50:11] Easily fixed, if a little ugly.
[13:51:01] Is it bad to use it there?
[13:52:44] It seems a little brittle to me; that's really more of an interactive tool for loading data manually. I'm not clear whether that'll be reliable over that much indirection.
[13:52:55] But there's no reason you can't try it.
[13:55:42] The database is big and read-only, so it makes sense to load it all in one go. I'm not aware of more efficient ways to do that.
[14:03:54] I don't worry about efficient, I'm just a little concerned about reliable; that data path is not as well tested. In theory, it should work.
[14:05:18] You should have proper mariadb support for local data on the compute nodes now.
[14:07:32] Coren: It works now, thanks again.
[14:07:51] No worries.
[14:29:07] [bz] (NEW - created by: Jan Luca, priority: Unprioritized - normal) [Bug 46460] Allow tools to create databases - https://bugzilla.wikimedia.org/show_bug.cgi?id=46460
[14:31:56] Coren: You are fast ;-)
[14:32:30] it's easy to be fast when I was in the middle of email anyways. :-)
[14:33:17] Coren: I hope you could use these procedures. I had some time this week so I thought I could help you :-)
[14:33:50] Jan_Luca: It's appreciated, and will probably make it easy to put this in place much faster.
[16:23:42] !ping
[16:23:42] pong
[16:45:55] Ryan_Lane: ugh, why isn't openstack using shapado? (and stefano's offline)
[16:46:11] what's shapado?
[16:46:22] ask.debian.net is an instance
[16:46:41] it doesn't look like it's open source?
[16:46:55] which?
[16:46:59] are you crazy?
[16:47:02] ah
[16:47:02] https://github.com/ricodigo/shapado
[16:47:11] you think Debian people would use non-free software?
[16:47:13] :P
[16:47:14] yes :)
[16:47:17] of course
[16:47:27] their main website doesn't mention it at all
[16:47:35] it's on github, though
[16:47:45] jeremyb_: this is ruby?
[16:47:56] there's a great reason right there
[16:48:04] they are using something python based
[16:48:05] i've no idea
[16:48:08] All Shapado.com content and data are available under the Creative Commons Attribution 3.0 license
[16:48:12] Powered by Shapado 4.1.0 under the GNU Affero General Public License
[16:48:16] that's what ask.debian.net's footer says
[16:48:30] stackoverflow-like app written in ruby, mongomapper and mongodb
[16:48:33] all of Ryan's loves
[16:48:36] all into one app
[16:48:44] I like stackoverflow :)
[16:48:49] ewwwwwwww
[16:48:51] hahaha
[16:48:51] mongo?
[16:49:12] what's mongomapper?
[16:49:23] they are using askbot: https://askbot.com/
[16:49:26] ORM i guess
[16:49:26] mongo orm
[16:49:30] it's open source and python based
[16:49:35] a mongo orm?
[16:49:36] hahaha
[16:50:05] Ryan_Lane: well i looked for askbot source for at least 30 secs and came up dry with them too. good to hear it's open source
[16:50:26] ahh, pip
[16:50:34] https://github.com/ASKBOT/askbot-devel
[16:51:35] would be nice to get something about the server side code license in the footer like shapado does
[16:52:05] I don't see this in the footer...
[16:52:15] this is all I see: © Ricodigo 2012
[16:53:16] ah
[16:53:41] I had to go past the marketing screen to the actual Q&A part
[16:54:01] how do you find that?
[16:54:14] jeremyb_: I meant for shapado
[16:54:23] how do you find that? :)
[16:54:30] shapado.com
[16:54:39] so, I wonder how long it'll be before openstack starts getting trolled by the stackoverflow people
[16:54:54] i guess on pricing it has the footer
[16:55:04] "why are you forking the question and answers community?"
[16:55:18] "It's stupid to create your own Q&A"
[16:55:50] "Stackexchange has gone through this before and it's not nearly as easy as you think it is to maintain Q&A software"
[16:56:34] I like stackoverflow, but their developers/owners are kind of douchebags
[16:57:14] (this was the exact experience when webplatform used askbot)
[16:58:58] actually, also sounds a bit like wikitravel vs. wikivoyage, "why are you forking the travel community", "it's not as easy as you think it is to maintain a travel wiki" :p
[17:00:05] heh
[17:00:57] hah
[17:01:58] Any fork is, by definition, a wound. It may well end up having been a /necessary/ wound (to wit, egcs) but a wound nonetheless.
[17:04:51] Coren: stackoverflow is not open source
[17:04:55] it's like github
[17:05:18] I'm talking about the concept of a community fork.
[17:05:28] ahhh, ok
[17:07:11] > user contributions licensed under cc-wiki with attribution required
[17:07:19] there are links to the right places but...
[17:07:26] what does "cc-wiki" mean?
[17:07:33] (this is from stackoverflow itself)
[17:12:20] * jeremyb_ is now discussing adding footer links with reed (in #openstack)
[17:55:21] Ryan_Lane: should dc people ask you or Andrew before taking down labstore1?
[17:55:33] they shouldn't take it down
[17:55:36] the disks are hot-swap
[17:55:49] ah, cool
[20:02:36] Ryan_Lane: thanks for the ldap samples! I am running into trouble with the service user, though… opendj says 'does not include a structural objectclass.' http://dpaste.org/Cz9k6/
[20:02:45] What am I missing?
[20:10:07] hm
[20:23:05] andrewbogott: maybe person is necessary
[20:23:28] ok… then I'll need an sn. sn==uid?
[20:23:55] yeah
[20:24:49] Ryan_Lane: Doesn't have to be Person, but it needs to be structural.
[20:24:50] and cn, apparently...
[20:25:08] hm. you may actually need inetorgperson
[20:25:12] which is really annoying
[20:25:31] person + sn and cn seems to work
[20:26:16] ah, yeah.
person works
[20:26:26] may as well just use the same schema as normal users
[20:59:31] andrewbogott: I'm doing a quick test on nova-precise2. it may break temporarily
[20:59:36] ok
[21:48:56] Ryan_Lane, did you and Coren reach a conclusion about what service group gids should be?
[21:51:23] yeah
[21:51:39] primary gid shall be the same as the group's id
[21:53:23] I only yield to you on that so that you'll look elsewhere when I write perl. :-)
[21:53:24] when you say 'the group' you mean the service group?
[21:53:39] In which case, are you saying that the service group's id should be the service group's id?
[21:53:55] Oh, oh, damn. Just thought of an important detail: the service groups should also be in the /project/ group!
[21:54:30] andrewbogott: the service group's gid should be the next available in the range
[21:54:33] same with the user's id
[21:54:44] ideally the id of the group and the id of the user will be the same
[21:54:59] the primary gid of the user definitely should be the same as the id of the group, though
[21:55:42] You recently said "We need to specify a range for this. We want to make sure the uid range doesn't overlap with regular users." What range would you like me to use by default?
[21:59:23] something currently unused :)
[21:59:24] let's see
[22:00:19] We use 500 as the base for users currently
[22:00:33] well, we use that as the primary id
[22:00:48] we use the 50000 range for project groups
[22:00:59] seems the uid range for users is currently < 3000
[22:01:52] how about 40000-49999 for service users and groups
[22:02:07] 1000-19999 for global users/groups
[22:02:09] err
[22:02:25] 1000-19999 for global uids
[22:02:36] 50000-59999 for project gids
[22:02:57] we'll need a config option for this, and the function that searches for uids and gids will need to use it
[22:03:52] yep
[22:07:00] something like this, maybe?
http://pastebin.com/jMcT9FYD
[22:08:45] hm
[22:08:57] maybe ranges of some variety may be better than min/max
[22:09:31] and integers rather than strings
[22:11:25] ok. I put nova-precise2 back to normal
[22:20:23] 'ranges of some variety' you mean a set of (min,max) pairs rather than a single pair? Or you mean the php range() function?
[23:19:39] andrewbogott: a set
[23:20:18] wow, that'll be ugly :)
[23:20:21] basically, we'd need the function to start a search at a number, and if the next available number is higher than the max, it'll skip to the next range
[23:20:48] I think it's likely possible to ignore this for now
[23:20:51] sure, makes sense.
[23:20:59] it's going to be a really long time till we hit 10,000 users
[23:21:02] Yeah, right now I have a simple implementation and a TODO comment
[23:35:39] Ryan_Lane: Is the project membership memcache wfMemcKey( 'openstackmanager', "project-$project", $this->userDN ); invalidated someplace?
[23:35:55] I'm looking for invalidation examples and not finding many, which leads me to think I'm looking for the wrong thing
[23:36:15] it should be when the user is removed from the group
[23:36:31] if it isn't invalidated anywhere, it's a bug
[23:36:36] Or added I would think
[23:36:45] indeed
[23:36:58] do a search for the same key
[23:37:21] that seems like a weird key to me....
[23:37:41] I can't find any use of memcache in OpenStackNovaProject. Which has me thinking those keys live forever.
[23:37:47] Of course, maybe they aren't actually used anyplace
[23:39:15] Invalidating is just $wgMemc->delete( $key ); ?
[23:40:28] yep
[23:40:46] I'm definitely invalidating cache in the extension :)
[23:49:08] !log wikistats - add new 'he' and 'uk' wikivoyages
[23:49:10] Logged the message, Master
[23:49:17] Reedy: thanks:)
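Editor's note on the id-range search described at 23:20:21: a minimal Python sketch of one way "a set of (min, max) pairs" could be searched, starting at a number and skipping to the next range once the current one is exhausted. It is not the OpenStackManager implementation (that extension is PHP, and the log only mentions "a simple implementation and a TODO comment"). The function name `next_free_id`, the `used` set, and the second range are hypothetical; only 40000-49999 for service users and groups comes from the log.

```python
def next_free_id(ranges, used, start=None):
    """Return the lowest id >= start that is inside a range and not in `used`.

    ranges: iterable of (min, max) integer pairs, both ends inclusive
    used:   set of ids already taken (stand-in for an LDAP lookup)
    Returns None when every range is exhausted.
    """
    for lo, hi in sorted(ranges):
        candidate = lo if start is None else max(lo, start)
        # If candidate is already past hi, this loop body never runs and
        # the search falls through to the next range.
        while candidate <= hi:
            if candidate not in used:
                return candidate
            candidate += 1
    return None


# 40000-49999 is the service range proposed in the log;
# the second pair is a made-up overflow range for illustration.
SERVICE_RANGES = [(40000, 49999), (60000, 69999)]

taken = {40000, 40001}
print(next_free_id(SERVICE_RANGES, taken))           # first free id in the first range
print(next_free_id(SERVICE_RANGES, taken, 50000))    # start is past the first range
```

Using integer pairs instead of strings follows the remark at 22:09:31, and keeping the ranges in a plain list is one way they could be exposed as a single config option.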