[00:00:01] * Damianz pats addsleep and un-kicks him [00:00:05] :) [00:00:11] night all :) [00:00:53] sorry for shocking you all ;p [00:03:33] Ruddy hell beetstra's bots eat a lot of bandwidth [00:03:42] how can you tell? :O [00:04:25] http://ganglia.wmflabs.org/latest/graph_all_periods.php?title=Bots+network+in&vl=&x=&n=&hreg%5B%5D=bots-.%2B&mreg%5B%5D=bytes_in&gtype=stack&glegend=show&aggregate=1 < bots-liwa host the linkwatcher etc bots [00:04:54] and do all of bots instances run on virt9? [00:05:17] I think they're spread out a bit - most that traffic is to gluster [00:05:26] yep :/ [00:05:34] silly gluster [00:06:02] weird it does like 20mb of reading from disk... rather than writing and reading though [00:06:42] RX bytes:4185627807280 (4.1 TB) TX bytes:136730882171 (136.7 GB) < nice, considering it's only been up 5 days [00:07:25] haha Damianz yep! [00:08:17] anyway :P really sleeping now :p Night xD [00:09:20] * Damianz tucks addsleep in [00:09:38] do you have bandages for my bruises? ;p [00:09:55] bandages wouldn't help for bruises [00:10:12] i know :O But then I wouldnt have to look at them :P [00:10:45] * Damianz eyes the duct tape and grins [00:15:34] Damianz: Is there a way I can make sure a package is installed across all bots instances so when I use SGE it works? [00:16:14] not really right now - working in getting puppet classes merged in which we'll use for all nr instances [00:17:06] Is it just nr*, bnr* that the job will run on? [00:17:55] Oh just looks like bnr* [00:18:04] Just bnr I think [00:19:32] Could I get you to install python-requests, python-virtualenv, python-twisted on bnr2 then? [00:20:26] !log bots damian: apt-get install python-requests python-virtualenv python-twisted [00:20:28] Logged the message, Master [00:20:47] * Damianz notes to fix that script so it includes the instance name [00:32:20] I wonder, is it possible to specify a different executable depending on which server it runs on? [00:32:47] Actually, that might not be needed. 
[00:34:12] Ok, so is the syntax for SGE the same as the toolserver? [00:53:19] Hello, [00:53:40] I noticed the new public_html folders [00:55:22] What is the webpage that the things in public_html would be viewed at? [00:55:52] bots.wmflabs.org/~user [00:58:16] yays [01:21:55] Damianz: Could you install python-requests on bnr1 too? Apparently it doesnt have it [01:22:45] !log bots damian: apt-get install python-requests [01:22:56] ty :) [01:23:06] but we lost labs-morebots in the netsplit... [01:25:58] aw shit, its an old version of requests. [01:26:26] ugh.... [01:27:27] Damianz: sorry about all of this. Would it be possible to uninstall python-requests on bnr1,2 and run $ sudo python setup.py install in /data/project/legoktm/requests/ ? [01:27:54] the API changed with v1 onwards, and it seems ubuntu has v.0.8.2 [01:28:53] !log bots damian: apt-get remove python-requests -y [01:28:56] Logged the message, Master [01:29:07] !log bots damian: pip install requests [01:29:08] Logged the message, Master [01:29:17] ah that works too [01:37:36] it works! [01:37:40] * legoktm hugs Damianz  [01:40:45] * Damianz notes legoktm has really long arms [01:50:41] :> [01:50:52] i queued like 10 jobs [01:50:57] all waiting :P [01:51:16] * Damianz tickles Ryan_Lane [02:02:20] ? [02:02:41] petan: you know coren has been setting up open grid in the tools project, right? [02:02:42] :) [02:03:29] tickles are awesome, no questioning the tickles [02:03:42] heh [02:04:03] petan, Damianz: you guys should coordinate efforts with Coren, since you're all working on similar things [02:05:14] I've never figured out why we have tools and bots, since bots have web stuff and web stuff has bots... we should just have some development project for bots/tools and some production project for running bots/tools *shrug* [02:24:43] do i only get 2 running jobs at a time? or is there another limit somewhere? [02:25:00] im not sure why only 2 are running right now... 
[02:26:41] Dunno how it's setup, probably 1 per host or something funny [02:27:44] 22 0.75000 am_rmv legoktm r 03/11/2013 01:36:24 main.q@bots-bnr2.pmtpa.wmflabs 1 [02:27:45] 24 0.29457 an_rmv legoktm r 03/11/2013 02:19:54 long@bots-bnr2.pmtpa.wmflabs 1 [02:27:57] and i have 25-36 all "qw" [02:37:29] http://bots.wmflabs.org/robots.txt [02:37:35] why? [02:38:25] because we never decided if allowing bots was a good idea [02:39:41] ok [02:39:52] i'll just override it [03:08:49] it's funny than we go back to the 'github vs gerrit' argument every month [03:12:49] Damianz: why not "hg vs git", just to discuss something different XD [03:13:48] There's not that much difference betwean hg and git really... [03:13:56] or "blue vs red" [03:15:36] but i must be honest, gerrit had some good layout improvments, but i still miss some code explorer feature [03:16:07] I never use to like gerrit but its workflow is pretty awesome when you get use to it [03:16:29] Damianz: coren is setting up a "tools" project that's going to be bots/tools combined [03:16:40] So I am. [03:17:06] Alchimista: use gitblit? [03:17:21] oh. did they turn that off temporarily? [03:17:25] Did tools never become webtools then? [03:17:31] Have you ever tried compiling gitblit? rofl [03:17:42] * Damianz notes to build the latest gerrit with gitblit tomorrow for his install [03:17:43] Damianz: it's a plugin for gerrit, now [03:17:56] Yeah - that plugin won't complile against stable [03:17:58] D: [03:18:01] heh [03:18:13] Damianz: Not sure I understand your question. [03:18:16] Ryan_Lane: i'm not a professional developer, it's just for fun, and in my contact to gerrit, not so permanent one, some things aren't quite easy [03:18:43] or at least percectible [03:20:17] Coren: 'tools' was going to be 'web tools' or something, that plant was doing though I think it never happended... 
so do we have bots and tools (which will end up being tools) or bots, webtools and tools (which will end up being tools) [03:20:20] well, gitblit will be the replacement for gitweb [03:20:31] Alchimista: and it'll be a code explorer [03:20:37] gitweb kind of is, too, but it sucks [03:21:07] yah, i've been watching the discussion, gitlib seemed a good replacement [03:21:21] Damianz: All three exist atm; tools is the ultimate destination, bots and webtools are left alone to avoid disrupting work. But eventually, there's going to be just the one (tools) once everyone moved over. [03:22:10] gitblit is awesome, just a pain to install [03:23:39] I'd like to see more integration between gitblit and gerrit [03:24:46] Coren: q about tools, which server should i submit jobs to, and do i need to ask you to install packages? or can i do it myself? [03:25:59] legoktm: atm, most of the management is manual, but I'm pretty fast about installing packages. But you don't submit jobs /to/ a server, that's what the queue system is there for. It picks. :-) You can submit jobs from -login or -webserver-* (in case you have something started by a web interface, say) [03:26:03] isnt a gitlib demo somewhere? [03:27:07] well, it was on our gerrit server [03:27:15] * Damianz wonders if he should work more on tools than bots or neither and work on the interesting stuff (underlying/supporting infrastructure) [03:27:17] Coren: Ok can I get python-twisted, python-virtualenv, python-pip, python-dev installed, then run $ pip install cython oursql requests [03:27:22] :) [03:28:26] legoktm: is oursql much more fast than sqlalch? [03:29:12] Alchimista: i've never used sqlalch, i just always use oursql for any mysql/python things. 
[03:31:11] so you're in my situation, i've never used oursql, i usualy use sqlAlchemy, but saw somewere that our was faster :S [03:31:45] :P [03:32:07] oursql handles unicode much better than MySQLdb, so thats why I originally switched [03:34:59] i've started using alchemy because it can manage mysql,sqlite mongo... , and specially in webtools, *prevents* sqlinjection [03:35:26] Change on mediawiki a page Wikimedia Labs/Tool Labs/Notepad was modified, changed by MPelletier (WMF) link https://www.mediawiki.org/w/index.php?diff=657724 edit summary: [+149] +more [03:36:36] legoktm: pip gets build fails during cython build (some missing deps). I'll look into the dependencies tomorrow and add them (and/or make a puppet Package provider for pip) [03:36:47] The others are already there. [03:36:55] Hmmmm [03:36:58] Did oursql install properly? [03:37:04] If so, then I don't need cython. [03:37:27] I thought cython was a dependency for ousql, but not always apparently. [03:37:47] legoktm: No, since it needs cython. lemme use the ubuntu-provided cython instead. [03:37:59] Ok that should be fine [03:44:19] Change on mediawiki a page Wikimedia Labs/Tool Labs/Notepad was modified, changed by MPelletier (WMF) link https://www.mediawiki.org/w/index.php?diff=657726 edit summary: [+41] fixy (use ubuntu's cython) [03:44:27] legoktm: u can haz oursql [03:44:37] I can haz sleepz? [03:44:39] :D [03:44:47] goodnight Coren [03:44:50] I'll try and not blow it up [03:45:04] er wait [03:45:08] No, actually, try to blow it up. We know it worked if you fail. :-) [03:45:10] Yes? [03:45:13] Did you install python-requests from apt? [03:45:15] Or pip? [03:45:18] apt [03:45:19] Because the one in apt is outdated [03:45:26] Its v0.8.2 [03:45:28] pip has v1 [03:45:51] Aaaah. [03:45:54] sleep? what is this nonsense [03:46:20] Hm. Then I *really* need to write/find a puppet package provider for pip. [03:46:40] So it'll have to wait until tomorrow if -request won't do. 
:-) [03:46:50] I mean 0.8.2 [03:47:00] See? See? Need sleep! [03:47:02] * Coren waves [03:47:08] v1 wasn backwards compatible so ill just wait [03:47:13] wasnt* [03:47:14] pip's in core iirc [03:47:36] Coren|Sleep: For tomorrow: http://projects.puppetlabs.com/issues/6527 [04:40:22] @notify petan [04:40:22] I doubt that anyone could have such a nick 'petan ' [04:40:28] @notify petan [04:40:28] This user is now online in #huggle so I will let you know when they show some activity (talk etc) [04:55:03] [bz] (RESOLVED - created by: Antoine "hashar" Musso, priority: High - normal) [Bug 45908] review lsearch-global.conf for beta context - https://bugzilla.wikimedia.org/show_bug.cgi?id=45908 [05:15:38] [bz] (NEW - created by: Antoine "hashar" Musso, priority: Normal - normal) [Bug 45868] let search indexer access the squid public ip - https://bugzilla.wikimedia.org/show_bug.cgi?id=45868 [05:39:50] [bz] (NEW - created by: Antoine "hashar" Musso, priority: Normal - major) [Bug 41132] live hack in beta mediawiki-config (tracking) - https://bugzilla.wikimedia.org/show_bug.cgi?id=41132 [05:40:02] [bz] (NEW - created by: Antoine "hashar" Musso, priority: Normal - normal) [Bug 38995] [OPS] udp2log prevents udp2log-mw from starting - https://bugzilla.wikimedia.org/show_bug.cgi?id=38995 [05:41:03] [bz] (NEW - created by: T. Gries, priority: Unprioritized - normal) [Bug 45214] Suggestion: when installing instances, starting puppet runs etc.: ping the developer by mail about the status - https://bugzilla.wikimedia.org/show_bug.cgi?id=45214 [05:44:36] !log deployment-prep Running MediaWiki update.php on all databases [05:44:38] Logged the message, Master [07:04:49] Ryan_Lane: hi [07:05:43] I had a kind of Berlin Hackathon, and fixed many showstoppers in Extension:OpenID, have you seen ? 
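Editor's note on the python-requests exchange above: the apt package was reported as 0.8.2 while the bot code expected the 1.x API, so the same job could fail on one host and work on another. A minimal sketch of a startup guard follows. The names `parse_version` and `is_too_old` are made up for illustration, and the comparison only handles plain dotted numeric versions such as "0.8.2" or "v1".

```python
def parse_version(text):
    """Turn a dotted version string such as '0.8.2' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().lstrip("v").split("."))


def is_too_old(installed, minimum="1.0"):
    """Return True when `installed` is older than `minimum`."""
    have = parse_version(installed)
    need = parse_version(minimum)
    # Pad the shorter tuple with zeros so (1,) and (1, 0, 0) compare equal.
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have < need


if __name__ == "__main__":
    # 0.8.2 is the version reported for the Ubuntu package in the log.
    print(is_too_old("0.8.2"))
    print(is_too_old("1.0.0"))
```

In a real script the installed value would most likely come from `requests.__version__`; it is passed in as a string here so the sketch runs without the library.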
[08:23:31] addsleep MAN [08:23:42] next time write an email before you reboot production boxes [08:23:47] ^^^ [08:23:50] hi petan [08:23:53] hi [08:24:05] i had a question about SGE [08:24:06] wtf [08:24:09] hi [08:24:19] bsql01 rebooted and yet it needs one more reboot [08:24:23] o.o [08:24:24] addsleep I wanted to install new kernel before [08:24:26] well [08:24:28] I already got it [08:24:33] just was scheduled for installation [08:24:40] for some reason only 2 jobs of mine run at a time [08:24:50] ohwut [08:24:53] legoktm but I don't see any job in queue of you [08:24:54] now 4 are running [08:25:09] what you see when you type qstat [08:25:12] http://dpaste.de/KU42g/raw/ [08:25:34] any chance we can get more resources added? [08:26:00] ganglia shows lots of unused cpu/memory [08:26:05] on both bnr1 and 2 [08:26:50] wow [08:26:53] let me check... [08:27:35] yes I see now [08:27:47] well, I just installed exec on 1 [08:27:54] maybe it's related to this, but it's weird [08:28:20] legoktm these jobs have different name all [08:28:26] why do you think they are all the same process? [08:28:39] err, they aren't? [08:28:55] an_rmw != bat_smg_rm [08:28:59] yeah [08:29:01] they're all different [08:29:08] but right now only 4 are running [08:29:09] so what's wrong [08:29:17] ah [08:29:25] yes that's true I need to find out why [08:29:32] I submitted some jobs as well and they didn't started [08:29:35] the other 4 are still queued, and there are free resources... [08:29:38] maybe there is some limit [08:29:46] earlier it was stuck at 2 [08:30:48] also not sure why some are running in the "main.q" and 2 others are in "long" [08:31:24] lol [08:31:31] there is limit in each queue [08:31:33] let me remove it [08:31:53] thanks [08:31:57] * legoktm goes to queue more jobs [08:32:30] just type qstat -j [08:32:35] you will see what is reason why it' [08:32:39] s [08:32:40] stuck [08:32:43] :o [08:32:49] ah [08:32:51] didnt know about that [08:34:33] Krenair: are you listening ? 
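Editor's note on the stuck jobs above: the cause turned out to be a per-queue limit, and `qstat -j` is the command recommended in the log for seeing why a job is waiting. As a rough sketch only, plain `qstat` rows like the two pasted earlier can be tallied by state and by queue. The function names are invented for illustration, the column positions are assumed from those two rows, and the third sample row (a pending job) is made up, since the log only says jobs 25-36 were "qw".

```python
from collections import Counter


def count_states(qstat_text):
    """Count jobs per state column ('r', 'qw', ...) in plain qstat output."""
    counts = Counter()
    for line in qstat_text.splitlines():
        fields = line.split()
        # Job rows start with a numeric job id; this skips headers and rulers.
        if len(fields) < 5 or not fields[0].isdigit():
            continue
        counts[fields[4]] += 1
    return counts


def running_per_queue(qstat_text):
    """Count running jobs per queue name, dropping the '@host' suffix."""
    counts = Counter()
    for line in qstat_text.splitlines():
        fields = line.split()
        if len(fields) >= 8 and fields[0].isdigit() and fields[4] == "r":
            counts[fields[7].split("@")[0]] += 1
    return counts


# First two rows are copied from the log; the third is a hypothetical pending job.
SAMPLE = """\
22 0.75000 am_rmv legoktm r 03/11/2013 01:36:24 main.q@bots-bnr2.pmtpa.wmflabs 1
24 0.29457 an_rmv legoktm r 03/11/2013 02:19:54 long@bots-bnr2.pmtpa.wmflabs 1
25 0.00000 example legoktm qw 03/11/2013 02:20:00 1
"""

if __name__ == "__main__":
    print(dict(count_states(SAMPLE)))
    print(dict(running_per_queue(SAMPLE)))
```

A tally like this only shows that jobs are bunching up in one queue; the reason still has to come from the scheduler itself.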
[08:34:41] lol [08:39:21] legoktm it runs now [08:39:34] woo :D [08:39:37] all my jobs are running [08:40:59] Coren|Sleep hi [08:41:05] Coren|Sleep is there any nice documentation for SGE? [08:52:01] addsleep never ever reboot sql server without consulting with others [08:54:13] Damianz did you restart your db import? [08:54:20] petan: It's done [08:54:21] or u gave up [08:54:23] ah fine [08:54:41] so cb is now running or still some imports? [08:54:50] legoktm what about your sql import [08:54:54] is it working? [08:56:20] well it got killed after the servers got rebooted [08:56:21] ill restart it in a bit [08:57:35] okay... [09:23:12] !log bots petrb: addshore to motd so that people know who to blame :) [09:23:14] Logged the message, Master [10:01:05] petan: running [10:01:21] yay [10:01:33] Damianz btw you have some experiences with sge? [10:01:40] I am trying to find a manual for queue parameters [10:02:00] Not really [10:04:29] petan: did you try https://wiki.toolserver.org/view/Job_scheduling ? [10:04:39] Darkdadaah I like sge more [10:05:04] or is that sge as well? [10:05:07] petan: that is SGE [10:05:10] oh lol [10:05:11] Well, yes. [10:05:29] but that's for end users [10:05:37] I need some manual for sge admins who creates queues [10:05:53] there is ton of parameters for each queue but no manual descibing them [10:06:10] Oh, then I can't help much :P [10:06:20] right now I am guessing instead of knowing what I do :P [10:06:54] Maybe ask DaB on the toolserver channel [10:07:30] Hm he's not there yet... [10:07:59] Merlissimo knows quite a bit about queues [10:08:08] what command are you using? 
[10:08:12] qconf [10:08:19] qconf -mq [10:08:36] it open vi editor with text file containing many lines - each line is one parameter [10:08:42] with no description of what they mean [10:08:48] I guess from name [10:09:13] for example I would like to find out how to allow job in long queue to run infinitely long [10:09:27] I belive it kills the job after some time [10:09:52] did you see qconf(1) and queue_conf(5) ? [10:09:54] yes [10:10:06] it contains parameters of qconf but not of that text file which gets opened [10:10:45] yay [10:10:47] man queue_conf :D [10:10:49] here we go [10:11:56] :) [10:22:39] legoktm I hope your job return 0 [10:22:41] as exit code [10:22:48] otherwise it will be restarted in long queue [10:22:48] they should [10:22:50] ok [10:23:05] long queue should be probably used for permanent jobs only [10:23:11] but we could change that behaviour [10:23:17] mine are right now set with a runtime of 7 days [10:23:19] we can create more queues for multiple purposes [10:23:25] ok [10:23:47] most will finish after 2-3 hours, some will run for a few days [10:23:52] i just gave it 7 days to be safe [10:25:32] ok [10:25:52] error: commlib error: access denied (client IP resolved to host name "bots-gs". This is not identical to clients host name "bots-gs.pmtpa.wmflabs") [10:25:52] Unable to run job: unable to send message to qmaster using port 6444 on host "bots-gs.pmtpa.wmflabs": got send error. [10:25:52] Exiting. [10:26:20] petan: ^ [10:27:00] legoktm@bots-gs:/data/project/legoktm/wd_removals$ qstat [10:27:00] error: commlib error: access denied (client IP resolved to host name "bots-gs". 
This is not identical to clients host name "bots-gs.pmtpa.wmflabs") [10:27:00] error: unable to send message to qmaster using port 6444 on host "bots-gs.pmtpa.wmflabs": got send error [10:29:15] yes, I see [10:29:17] fixed [10:29:29] I cleaned some mess in config and forgot to update hosts [10:29:44] works now, thanks [10:54:13] $ git pull --rebase [10:54:14] First, rewinding head to replay your work on top of it... [10:54:15] Applying: shit [10:54:16] XD [10:54:30] hehe [11:26:31] legoktm: can I point you to http://www.wikidata.org/wiki/User_talk:Addshore#(Can you help me?) (I am guessing you probably know the answer [11:26:38] looking [11:27:56] done [11:28:09] cheers :) [11:34:50] !logs [11:34:50] logs http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-labs [11:35:52] something somewhere has changed in the system that brings down my bot onbots- liwa ... while I was not there .. [11:36:01] communication with bots-sql2? [11:36:58] errr [11:37:08] that no longer exists i think. [11:37:12] everything got migrated into bots-bsql01 [11:37:25] kay! [11:37:27] fun .. [11:37:40] everything got migrated .. including my dbś ?? [11:37:50] and why do my bots not complain about not being able to connect to .. oh .. [11:37:53] i think so…? [11:38:12] I am on bots_sql2 .. the instance exists .. but is defunct .. [11:38:15] or something [11:38:22] * Beetstra investigates bots-bsql01 [11:39:15] * Beetstra curses [11:40:49] the 'i think so...?' .. maybe not - I don't seem to have an account on bsql01 [11:40:49] ? [11:40:54] oh [11:40:56] addsleep: ^ [11:40:59] give Beetstra an account [11:41:08] I mean .. in mysql .. [11:41:14] yeah [11:41:20] addsleep had to manually create mine too [11:41:31] oh .. aargh [11:41:52] and then I will probably also have to manually move the db from sql2 .. 
which is going to take ages [11:42:10] i thought petan had imported them [11:42:37] sorry i literally have to go now, how big is the db, shouldnt take that long, i will be back in a few hours, there should be a cron which will run and create your account, ttyl [11:42:59] OK [11:44:02] By then, I wilill then probably see you tomorrow [11:47:38] In case you see it .. I need the databases 'coibot', 'linkwatcher', and 'monitorbot' (including the accounts that can access them [11:50:40] * legoktm checks [11:51:44] and the linkwatcher db is huge, really huge [12:03:24] Beetstra nooo [12:03:32] Beetstra I don't think anyone migrated your bot [12:03:36] bots-sql2 exist for sure [12:03:49] Yeah, but where is then my problem .. :-( [12:04:06] what the problem is? [12:04:16] Suddenly the boxes can't keep up anymore [12:04:19] it worked fine just yesterday [12:04:23] nah [12:04:24] no [12:04:36] ok, Beetstra first of all, where your bots run? [12:04:36] instance name [12:04:40] bots-liwa and bots-nr1 [12:04:43] ok [12:04:58] can you move it from bots-nr1 either to gse or some bots-bnrX instances? [12:05:05] I recommend gse [12:05:18] it would be also cool to move your db's to bots-bsql01 [12:05:24] because it's faster than -sql2 [12:05:30] however you can keep it on -sql2 for now [12:05:33] ... or tools- [12:05:33] hi Coren [12:05:38] Coren read mails pls [12:05:40] :-) [12:05:40] the bot on bots-liwa constantly saves backlog files, as if it is too busy with something else [12:05:46] Hi Coren [12:05:47] Coren btw you have sql server on tools? [12:05:59] Ayup [12:06:12] Coren ok then proceed to reading mails :P [12:06:32] Beetstra okay, so the problem now is that you can't connect to sql2? [12:06:42] no, it seems to be connected [12:06:53] ok, so what is problem :P [12:06:56] Actually, I don't think that that is the problem .. except maybe that it stores to slow .. [12:07:10] what I say .. the bot was running fine until two days ago .. 
now it can't keep up at all [12:07:12] aha, the performance might be problem, that's why I suggest moving to bsql01 [12:07:19] uh [12:07:31] the sql server is not loaded at all [12:07:34] either the performance of sql2, or of the boxes itself [12:07:39] Yeah, but the instances seem to be [12:07:42] let me check the instance [12:07:46] But that was never a problem [12:08:05] yes liwa is kind of loaded [12:08:12] glusterfs is fucked [12:08:17] I think you should reboot it [12:08:21] that will fix it for sure [12:08:30] hmm .. could try .. [12:08:41] * Beetstra should figure out how to reboot it .. [12:08:52] first: shut down all your processes [12:08:55] then sudo reboot [12:09:14] but, I recommend you to move your bot to sge anyway :P unless you require a dedicated instance only for you [12:09:20] petan: Ima gonna reply on -labs, but you're missing the point. :-) (1) Most of tools- will be self-serve (i.e.: doesn't need sysadmin intervention), and (2) the objective is to also have volunteer roots eventually. [12:09:47] Coren well, so far you have none. There is always need for interventions [12:09:52] We earlier made the choice to have liwa on an own instance .. [12:10:05] I am not allowed to sudo on bots-liwa [12:10:05] machines are crashing, software needs to be updated or reinstalled [12:10:05] gluster is breaking [12:10:11] Coren ^ and tons of more issues [12:10:19] I was fixing stuff whole weekend [12:10:32] petan: In all fairness, gluster hasn't broken since the upgrade. [12:10:47] Coren well, it did broke like 2 times just yesterday [12:10:55] unless the upgrade was few minutes ago, I doubt [12:10:59] petan: On bots-? [12:11:01] yes [12:11:06] brain splits [12:11:19] Beetstra really? [12:11:24] Beetstra ok I will reboot it then [12:11:30] Beetstra let me know when I can do that [12:11:30] yep [12:11:37] And I thought I had root on that box ... [12:11:40] oh yes I know why you can't now [12:11:47] you can now .. [12:11:54] petan: Ah. 
No matter anyways: Gluster is gonna go away. [12:11:55] it's because of new sql server, I can give you root back but I will remove /mnt/secure [12:12:24] Coren whatever is going to replace it - you will still need some sysadmins who are available 24*7 every day for assistance to users [12:12:41] wmf-based project will always fail to meet that criteria [12:13:04] take a look at wikipedia - that is managed by volunteers, you will always find some wikipedia admin online any day and hour [12:13:50] !log bots petrb: rebooting -liwa per request [12:13:52] Logged the message, Master [12:14:02] Wikipedia admins are not real "admins" I believe. [12:14:05] petan: Yes, and note how many sysadmins with root on the cluster they... oh, wait. :-) What makes you think there aren't going to be 24/7 support on the tools- project? :-) [12:14:21] Coren because you aren't going to be awake 24/7 [12:14:38] petan: That's what nagios is for. It wakes sysadmins. :-) [12:14:44] s/nagios/icinga/ [12:14:49] Coren not really [12:15:00] nagios won't wake you up when glusterfs break or when some user need a help [12:15:35] nagios wake you when some instance is down or such extra critical event happen [12:15:45] but there are many nagios isn't watching [12:15:52] petan: Let's be clear: gluster is crap that's on its way out. If the netapp that'll take over doesn't get icinga to yell if it breaks (unlikely), then it's a bug that needs to be fixed. [12:16:23] petan: If nagios* isn't watching something important, then it hasn't been configured right and needs fixing. [12:16:25] !log bots petrb: giving root to beetstra on -liwa [12:16:27] Logged the message, Master [12:17:10] Beetstra you should have sudo [12:17:32] eh .. 
no [12:17:37] still not allowed to run sudo [12:17:39] Beetstra that is weird [12:17:52] Beetstra that is bug in nova, let me fill a bug [12:18:25] Coren: ok - that's still monitoring of some critical services over nagios - but what about generic problems like HELP my bot doesn't run I don't know why etc [12:18:29] petan: Look, the objective is to make tools- a "frist class citizen" of the WMF infrastructure. I'm not going to be satisfied with anything less than rock solid reliability and 24/7 coverage. [12:18:37] Coren: for that you should have admins available as well [12:19:02] petan, I am trying to run the bot again .. lets see if it now can keep up [12:19:03] petan: Well, to date, I've never been available less than 12h/day; but there will be more than me once we're in full swing. [12:19:54] Coren ok, anyway, until that is going to be community maintained project, I will be doubtfull about it just as I am over toolserver - my experience with getting support there was horrible [12:20:14] for example yesterday you weren't here :P [12:20:18] petan: It now has a backlog of 2.8 million lines [12:20:33] keep in mind that labs are being used by volunteers mostly over weekends [12:20:47] because that's when we have time [12:21:02] petan: That's a good datapoint, and I a good argument for me to shuffle my primary schedule around. [12:21:22] it shouldn't be about shuffling schedules, but about making the project more open... [12:21:28] petan: But let's be honest here; you'll never reach the same amount of reliability with the number of roots you have on bots- [12:21:42] Coren number of roots is currently like 4? [12:22:00] except for labs global admins [12:22:16] petan: Yes, but that's the point. You have no tool isolation, and no unified design. [12:22:50] why do you think that? [12:23:03] what kind of isolation you mean [12:23:08] petan: (Unless you go one-instance-per-tool which is unscalable, unmaintainable, and a huge waste). 
[12:23:24] of course we don't do that [12:23:28] with exceptions [12:23:32] Right. [12:23:33] some tools require own box [12:23:44] but that's only for special bots... [12:23:54] Although I would be interested even then to understand why you think they require them. [12:24:06] look I don't say your project is designed bad - actually I think it's far better than current bots project [12:24:21] my problem with that is that it's not community maintained just as bots [12:24:34] working on your project will be far harder to end users IMHO [12:24:39] I know that's your problem, and I understand what you're saying. [12:24:45] because of all these restrictions etc [12:25:40] Coren they require own box for example because their maintainer wants to be root? for whatever reason - customizing the box for their need, installing libraries they want etc [12:25:50] What /I/ am saying is that a project with multiple volunteer roots cannot be as reliable or maintainable in the long term. Tool writers need their tool to run. [12:26:21] Coren I think it can be just as reliable when it is maintained by community volunteers just as it could be if it was maintained by paid employees only [12:26:22] Hm. I said that wrong. [12:27:00] Tool writers need to be able to /know/ that their tools will keep running. [12:27:14] Coren .. having all bots on one instance may also give problems if one bot goes rogue (like what happened when COIBot and a bot from lego were sharing an instance - lego's bot managed to munch all resources from the other bots until the other bots crashed) .. [12:27:29] :/ [12:27:36] sorry, lego [12:27:46] Beetstra: That's because the environment wasn't done right, not because of legoktm! :-) [12:27:56] One of my bots did the same in the far past [12:28:09] Beetstra: If your setup allows one broken bot to break /other/ bots, that's a bug in the infrastructure not the bot. 
[12:28:17] You can indeed solve that [12:28:37] [bz] (NEW - created by: Peter Bena, priority: Unprioritized - normal) [Bug 45985] https://wikitech.wikimedia.org/wiki/Special:NovaSudoer doesn't work in case of beetstra - https://bugzilla.wikimedia.org/show_bug.cgi?id=45985 [12:28:59] Beetstra: That's what I mean by "tool writers need to know..." [12:28:59] thanks petan [12:29:19] If your bot is running fine, it shouldn't break *ever* absent a hardware failure. [12:29:28] Beetstra I don't believe your bot would break now if you were using sge.. [12:30:08] we could try .. it still does not seem to be happy on bots-liwa for some reason .. [12:30:09] Coren why should they not be able to know that their tools will keep running on community maintained project? [12:30:20] Coren what makes paid employees > volunteers? [12:31:19] Coren: On the tools project, is my mysql account setup and everything? [12:31:44] legoktm: It should have been automagically; the credentials are in ~/.my.cnf [12:31:59] legoktm: You tool's home, not yours. [12:32:14] ah ok [12:32:17] legoktm: Just 'mysql' should drop you in there. [12:32:40] right now I'm logged into tools-master…is that the right place? [12:32:44] petan: Coandidly [12:32:49] ? [12:32:57] legoktm: No, you should use -login for 99.999% of things. [12:33:04] ah ok [12:33:12] legoktm: -master has almost nothing on it, just the grid master. :-) [12:33:23] ok [12:33:39] petan: You want the honest answer, candidly? [12:33:59] of course [12:34:52] legoktm@tools-login:/data/project/legobot$ mysql [12:34:53] ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2) [12:34:59] do i need to specify a different host? [12:35:38] petan: There are things which are better done when it's someone job and reponsibility to do them, and they have obligations. Infrastructure is one of them -- not because of skill, but because of accountability. [12:35:50] legoktm: Errr, no. Did you sudo to your tool? 
[12:35:58] Erm... [12:36:04] How do I do that? [12:36:25] Are we really supposed to sudo to a tool? [12:36:46] legoktm: Ah. The easiest way: sudo -iu local-legobot [12:37:05] Coren you may think that, but given that there used to be volunteers with shell access even on wikimedia production and it never caused any troubles, I doubt about it [12:37:14] Ah -u [12:37:28] Darkdadaah: Yeah, your tool should run as its user so that it has its own set of credentials. [12:37:41] It does make sense. [12:37:58] Darkdadaah: It's a little moot when you're the only maintainer (except for cleanliness) but it makes things 500% easier once you have more than one maintainer. :-) [12:38:35] addsleep ping [12:38:43] petan: Which is why the objective is to eventually have volunteer roots on tools- as well. [12:38:51] pong cant really talk for an hour [12:38:58] addsleep mhm ok [12:39:03] addsleep pong me when you can [12:39:11] So: sudo mysql -iu local-anagrimes ? (<- this doesn't work) [12:39:34] Damianz: Oh, don't mysql. Just become local-anagrimes [12:39:42] Damianz: Easier. [12:39:45] Oh. Ah. [12:39:50] Coren: TAB TAB TAB [12:39:52] Coren that "eventually" will make me skeptic (or "septic" if you wish) until I see that happen [12:40:10] (héhé) [12:40:39] It works. [12:40:49] petan: Well, if you want to distrust me, there's nothing I can do. You can either take me at my word or not, I can't jedi mind trick you into believing me. [12:41:52] jedi mind weld* [12:42:03] legoktm: Nope. I'z not Obama. :-) [12:42:11] :P [12:43:28] Coren: Clearly you need to work on your jedi mind control [12:43:55] * Beetstra looks at lego [12:44:07] how many edits to wikidata do you do a minute? [12:44:13] Ummmm :P [12:44:30] * Beetstra suddenly notices something ... [12:44:35] I have a bunch of different scripts running which all edit a wiki + wikidata [12:44:48] And some are running on the toolserver, and some are on labs. 
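Editor's note on the MySQL exchange above: the log says the tool's credentials live in `~/.my.cnf` in the tool's home, which is why plain `mysql` only worked after becoming the tool user. A minimal sketch of reading such a file from Python follows. It assumes the usual MySQL option-file layout with a `[client]` section holding `user`, `password` and possibly `host`; the log does not show the file's contents, so the sample text and the function name `read_client_credentials` are invented.

```python
import configparser


def read_client_credentials(option_file_text):
    """Extract user, password and host from the [client] section of a my.cnf."""
    # interpolation=None keeps '%' characters in passwords intact;
    # allow_no_value=True tolerates bare flags that option files may contain.
    parser = configparser.ConfigParser(allow_no_value=True, interpolation=None)
    parser.read_string(option_file_text)
    client = parser["client"]

    def unquote(value):
        # Option files may wrap values in single or double quotes.
        return value.strip("'\"") if value is not None else None

    return {
        "user": unquote(client.get("user", fallback=None)),
        "password": unquote(client.get("password", fallback=None)),
        "host": unquote(client.get("host", fallback=None)),
    }


# Hypothetical file contents, for illustration only.
SAMPLE_MY_CNF = """\
[client]
user = example_user
password = "s3cret%x"
host = example-db
"""

if __name__ == "__main__":
    creds = read_client_credentials(SAMPLE_MY_CNF)
    print(creds["user"], creds["host"])
```

A script run as the tool user could read the real file with `open(os.path.expanduser("~/.my.cnf")).read()` and hand the resulting values to whichever MySQL driver it uses.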
[12:44:51] Damianz: Meh, I'm more into the "look at my results" category of mind control. :-) [12:44:54] -> "... Total: 48326 edits (2024 PM) ... " .... [12:45:00] 2024 edits per minute to parse .. [12:45:02] WHAT [12:45:08] that was always about 500 [12:45:14] no wonder LiWa can't keep up .. [12:45:19] * Beetstra cries a little [12:45:21] * Damianz notes to properly inspect Coren's package and play with it as required to determine it's usefulness [12:45:21] So now you're blaming my bot for editing too fast? :PPPP [12:45:42] No .. I don't think that liwa should parse wikidata .. [12:45:50] grmbl [12:48:28] Most of the bots on wikidata run without a throttle since we're fighting humans who remove links without checking, breaking the interwiki chain, and making a lot more human cleanup [12:48:58] :-D [12:49:19] well .. it looks like those bots are adding about 1500 edits per minute to eat to my bot at the moment [12:49:20] Coren: I'm assuming that the SGE parameters are the same as the toolserver, but do I need to request access to the mysql server? [12:49:30] i strongly support legoktm statments about humans XD [12:50:04] OK I think I have everything I need to test a migration of my tools. Will do that when I have the time... [12:50:17] Interestingly wikidata has a really bad spambot problem, but I think thats because we see the hits through the AbuseFilter, instead of the invisible SpamBlacklist that doesnt work on Wikidata. [12:50:53] yep .. from ~600 edits per minute to ~2000 edits per minute .. [12:51:03] is labs stable enough to start migrating the ts tools? [12:51:25] (yah, i know labs will never be stable, you know what i mean) [12:52:02] I don't know, that's why it's a test. [12:52:39] Strictly speaking I will do a clone, not a migration. [12:55:15] Oh man .. it is even my own doing .. I told the bot to follow wikidata .. [12:55:42] legoktm: 15 epm.. 
it's a really fast bot, considering that it has to deal with wp and wikidata :O [12:56:07] legoktm: pm so as to not spam? :-) [12:56:10] o.O do you want me to slow it down? [12:56:12] Coren: sure [12:57:17] let's see how it behaves, at this time there isn't too many active editors, lets see if the epm decreases when all people comes back to the wiki [12:57:37] once, i saw a bot making 24 epm [12:59:13] Heh [12:59:22] I hit 110 epm on Wikidata last week. [13:00:27] yah, but with multiple instances, or one single script? [13:01:21] One script :) [13:01:30] It was just adding properties to pages [13:02:14] :O [13:10:12] Beetstra maybe try restarting it using sge? you will see it runs faster... [13:10:58] it may be that it needs to run 3-4 times faster ... [13:11:04] well, that will [13:11:17] what instance is sge? [13:11:21] bots-gs [13:11:28] then qsub [13:11:40] but you should first optimize your bot to run on these boxes [13:11:45] sge is using bnr instances [13:11:50] @labs-resolve bnr [13:11:50] I don't know this instance - aren't you are looking for: I-0000056f (bots-bnr1), I-00000629 (bots-bnr2), [13:11:55] these 2 [13:12:03] they are identical [13:12:11] if your bot run on one, it will run on other one as well [13:12:15] how would I optimize the bot to run there? [13:12:31] simplest way is to try starting it on one so that you see if it works or not [13:12:44] do I have /mnt/share there [13:12:52] yes, but that is local storage [13:12:57] you can't use local storage for sge [13:13:09] you need to use /data/project - which is unrealiable gluster or /mnt/secure [13:13:15] which is quite secure and reliable [13:13:19] nobody can see into that [13:13:22] not even root [13:13:41] I just need lots of space for logs (in case I don't clear them in time) [13:13:49] aha, /mnt/secure is small [13:13:54] in that case, try /data/project? 
[13:14:02] for example I have bot in /data/project/petrb [13:14:11] bot data needs to be accessible on all nodes [13:14:51] or - you can use /mnt/secure for private data - such as configuration and passwords [13:14:55] and /data/project for logs [13:15:18] I don't have write access in /data/project [13:15:45] well, there are many workarounds - simplest is to ssh to all-root instance such as bots-2 and set up your folder using sudo [13:15:48] or you can just request it [13:16:00] if you want I will create /data/project/beetrstra for you if it doesn't exist already [13:16:05] afaik that should be automatic [13:16:21] it does not exist, can you create it? [13:16:23] mhm I will create a folder there for you [13:16:46] done [13:17:04] you shouldn't put private data in there [13:17:09] like secret stuff [13:17:18] OK [13:17:48] we might enforce stricter security policies in future - but I would really LIKE TO merge bots with tools in future, but that all depends on Coren [13:17:48] !log tools added python-requests (1.0, from pip) [13:17:50] Logged the message, Master [13:18:45] petan: I'll be more than happy to give you the resources you need on tools- for that wherever there is a divergence. I'd rather tools be able to run on either with no change if possible. [13:19:49] Coren are you serious with configuration of mysql server? [13:19:53] IMHO it suck [13:19:59] I just had a quick look [13:20:16] petan: No, atm, it's provisional "just need something up" [13:20:21] aha [13:20:27] 100 connections max... mhm [13:20:40] right now in bots project we have like ~500 connections on mysql [13:20:44] petan: Well, I have all of... 4 users? But yeah, that's a bit on the small side. [13:21:15] petan: Thing is, I want a non-virtualized box for it which we might get soon. 
[13:21:24] anyway - the mysql configuration is not very well, I don't even like how you just moved mysql base directory to /mnt/mysql instead of remounting it like to /mysql [13:21:38] so that it's more clear which fs is dedicated to sql just from typing df [13:21:43] have a look on bots-bsql01 [13:21:46] how it's setup [13:22:04] Coren yes I know, Ryan promised us one like year ago [13:22:07] since then no progress :/ [13:22:27] anyway - I am again septic regarding mysql server where only wmf people will have admin access to [13:22:30] :P [13:22:51] I just like to be able to htop / iotop rather than reading ganglia... [13:23:05] petan: Please remember that the labs project did not have the necessary resources (mostly time) yet. [13:23:13] mhm [13:23:17] petan: Which everyone then realized was a problem, hence... me! :-) [13:23:20] petan, LiWa3 is running on bnr1 without problem [13:23:30] Beetstra okay [13:23:40] Beetstra in that case if it's on network storage [13:23:45] like /data/project or /mnt/secure [13:23:51] you can just go to bots-gs [13:24:01] and use qsub [13:24:06] to submit it to be executed [13:24:37] OK, let me try that [13:24:39] for example qsub -q long /data/project/beetrstra/start.sh [13:24:44] that would submit it to LONG queue [13:24:58] which contains long-time running jobs which get restarted when they crash [13:26:26] Coren well, anyway when I joined labs, which was like in early stage - I was nearly first non-wmf volunteer to have account there, what actually was most interesting for me was that labs made it possible for community volunteers to participate on operation, which is one of main advantages of labs [13:26:35] your restricted project kind of kills that idea... [13:27:19] oh .. eh .. [13:27:37] * Beetstra typed 'qsub start' ...  [13:27:56] ok what it did? 
:D [13:28:02] you can do qstat [13:28:04] to see if it's running [13:28:25] start should be executable shell script in `pwd` [13:28:26] it said "Your job 264 ("start") has been submitted" [13:28:30] in case you really started that [13:28:38] lol [13:28:44] start is a unix command [13:28:48] for starting services [13:28:58] Beetstra how you start your bots? [13:29:08] do you have a shell script for that or something [13:29:29] I usually did "nohup perl linkwatcher.pl &" .. now it is the script 'linkwatcher' [13:29:32] Beetstra you can also use parameter -o to redirect output to a file [13:29:34] optimizing that [13:29:36] okay [13:30:24] hmm [13:30:28] Beetstra create a shell script containing: [13:30:29] #!/bin/dash [13:30:30] cd "working directory" [13:30:31] qstat does not say anything anymore [13:30:31] perl linkwatcher.pl [13:30:32] echo "bot crashed!" [13:30:33] exit 1 [13:30:51] Beetstra yes because "start" probably resulted in something like "missing job name" [13:30:59] you need to schedule some script [13:30:59] Beetstra: If petan used unix semantics, you can just qsub linkwatcher.pl [13:31:13] or that, given it will use proper exit codes [13:31:34] but from what I have read from sge docs, you should always submit scripts for some reason [13:31:36] petan: /did/ you set your queues with unix semantics? [13:31:45] I think that's default [13:31:56] petan: Default is POSIX forcing csh. [13:32:09] petan: (Yeah, that's dumb -- Sun are odd sometimes) [13:32:17] Coren if you send me your queue config, I will happily use it for compatibilty purposes [13:32:31] * petan likes csh :P [13:32:36] shell_start_mode unix_behavior [13:32:37] maybe because I like c too [13:33:01] petan: I have two queues, a restartable one with checkpoint and a non-restarting for one-offs. [13:33:02] Coren that's what I should have? [13:33:09] Coren I have posix_compliant there [13:33:16] Coren so we have [13:33:30] petan: It's the easiest one for users, because it looks at the '#!' 
script header [13:33:36] Coren long queue with restarting and main queue with some default options [13:33:39] petan: So you can start perl or python [13:33:43] aha [13:34:38] so Beetstra do you need any help setting it up? [13:34:59] imho the script is most simple, in case your bot needs to be started from certain WD [13:35:05] maybe .. but not much time left today [13:35:10] petan: Did you add a checkpointer for your long queue so bots can be moved? [13:35:28] what is checkpointer [13:35:33] the bot ran on bots-bnr1 .. but now on bots-gs I get problems because perl modules are not running .. [13:35:39] not installed, that is [13:35:59] Beetstra okay, let me check, how do you start it? can you copy paste qsub command? [13:36:21] cd /data/project/beetstra/linkwatcher [13:36:21] perl linkwatcher.pl [13:36:31] nope, don't start it directly on gs [13:36:39] ah [13:36:48] wait a moment I will create a script for you [13:36:53] petan: http://arc.liv.ac.uk/SGE/howto/checkpointing.html [13:37:03] I have linkwatcher.sh in my dir [13:37:33] /home/beetstra/linkwatcher.sh [13:38:31] petan: For most things, that's not necessary, but bot writers can install a handler for SIGUSR1 to dump their state in prevision of being moved/restarted. [13:38:43] petan: So that long startups can be avoided. [13:39:59] Beetstra try: qsub -u long /data/project/beetstra/linkwatcher/start.sh -o /data/project/beetstra/linkwatcher/output.log -e /data/project/beetstra/linkwatcher/errors.log [13:40:25] Coren aha [13:40:41] Beetstra that will start your bot and will save logs to disk [13:41:17] * Coren cries at his email. [13:41:18] ah .. OK [13:41:19] thanks! [13:41:22] yw [13:41:24] :) [13:42:33] eh .. 
from where do I run that [13:42:40] bots-gs, or bots bnr1 [13:43:11] it has some troubles, let me try it [13:43:17] I think I made a syntax error in that script [13:43:29] No, it has a perl-module problem [13:43:40] -> Can't locate POE.pm in @INC (@INC contains: /etc/perl /usr/local/lib/perl/5.14.2 /usr/local/share/perl/5.14.2 /usr/lib/perl5 /usr/share/perl5 /usr/lib/perl/5.14 /usr/share/perl/5.14 /usr/local/lib/site_perl .) at linkwatcher.pl line 4. [13:43:40] BEGIN failed--compilation aborted at linkwatcher.pl line 4 [13:48:37] aha [13:48:39] okay [13:48:46] Change on mediawiki a page Wikimedia Labs/Tool Labs was modified, changed by Legoktm link https://www.mediawiki.org/w/index.php?diff=658024 edit summary: [+704] getting started [13:49:05] Coren: feel free to move that if its not in the right place^ [13:49:26] Beetstra let me check that actually I think your bot is working when I start it directly on bots-bnr2 [13:49:36] for some reason it doesn't just when using qsub [13:50:45] legoktm: Oh hey, thanks! My week's schedule is "documentation"; so you've just given me a good headstart from maintainer-view. :-) [13:51:33] heh :) [13:52:14] I need to do the basic howtos this week, and so forth. [13:52:35] Short IRC log excerpts just won't cut it. :-P [13:52:54] Coren: there is a database for each tool, created automatically; is it possible to create other databases? [13:53:21] ^yeah, shared databases would be nice [13:54:03] Darkdadaah: Not automatically atm, because we don't have the wikitech interface for it. I can create databases on request, though. [13:54:09] Beetstra you are right [13:54:19] I think there needs to be some variables set [13:54:22] no problem! [13:54:23] Ok. [13:54:26] we can do that in script [13:54:36] Darkdadaah: But, tbh, I would suggest looking at a shared database as a 'tool' that just doesn't have code so that it can have its set of maintainers as well. [13:55:23] Damianz: So you'd have a 'foodb' tool or whatever, with maintainers.
[13:55:43] Can a tool access another tools database then? [13:56:11] Darkdadaah: Yes; the grant on the database to the tool user is WITH GRANT OPTION, so that you can grant to other tools/users [13:56:41] Alright :) [14:02:36] * Damianz looks at Coren [14:02:50] * Coren blushes. [14:03:28] I didn't do it! [14:03:50] (Unless it's a good things, then it's me) [14:05:42] yay Beetstra it works! :D [14:05:57] all I needed was to set up PERL5 path [14:06:02] system variable [14:06:31] Beetstra you can type qstat now to see what is going on [14:06:49] the output of your bot is stored in /data/project/beetstra/syslog but you can change it in script [14:07:08] that is what you would normally see on terminal or nohup [14:08:05] I hope it will run faster, you should probably reschedule it to long queue, but that's up to you [14:08:19] now when your bot finish it won't be auto restarted [14:10:16] im back [14:10:18] addwork hey [14:10:40] addwork first of all: never ever reboot servers without consulting with others [14:10:53] especially bsql01 [14:11:16] addshore then I wanted to remind you to switch to sge :d [14:11:25] bsql01 was very broken >.< [14:11:32] I know [14:11:41] but you should still consult it with others before just rebooting it [14:11:51] ask on irc, or write a mail so that people know what is going on [14:11:59] * addshore will :) [14:12:00] you can't just go and reboot a box without asking [14:12:16] if I did it in my work I would be looking for a new job [14:12:29] haha :P [14:12:39] * addshore thinks we need to work on the db structure a bit more, i.e. a bit more redundancy [14:12:45] yep [14:12:52] however can you try moving your bot [14:12:53] to sge :D [14:12:59] so that we get lower load on bnr1 :P [14:13:00] also which servers does sge deply to? bnr1 and 2? 
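The fix petan describes at 14:05 (setting a Perl path variable so the job can find POE.pm) can be folded into the job script itself, since a batch job does not necessarily inherit the login environment. A minimal sketch, assuming the variable in question is PERL5LIB and using hypothetical directory names; the log does not show the actual script.

```shell
#!/bin/sh
# Sketch of a job wrapper: set the Perl module path, change to the bot's
# working directory, then run the bot. All paths passed in are hypothetical.

# run_with_perl5lib LIBDIR WORKDIR COMMAND [ARGS...]
run_with_perl5lib() {
    libdir=$1
    workdir=$2
    shift 2
    (
        # Prepend LIBDIR to any existing PERL5LIB instead of overwriting it.
        PERL5LIB="$libdir${PERL5LIB:+:$PERL5LIB}"
        export PERL5LIB
        cd "$workdir" || exit 1
        "$@"
    )
}

# Example (hypothetical library directory, not taken from the log):
# run_with_perl5lib /data/project/beetstra/perl5/lib /data/project/beetstra/linkwatcher perl linkwatcher.pl
```

Running the command in a subshell keeps the changed directory and variable from leaking into whatever calls the function.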
[14:13:03] right now it's 8:1 [14:13:08] yes [14:13:13] !labs-resolve nr [14:13:16] bnr only [14:13:21] @labs-resolve nr [14:13:21] I don't know this instance - aren't you are looking for: I-0000049e (bots-nr1), I-0000056f (bots-bnr1), I-00000629 (bots-bnr2), [14:13:26] is everything puppetd and they are identical? :D [14:13:26] nr1 not [14:13:45] they are identical, but it's not ALL puppeted because i don't have rights to merge stuff in puppet [14:13:54] usually it takes like 2 weeks for ops to merge stuff [14:13:58] kk :) [14:14:00] so using puppet doesn't really work here yet [14:14:05] right then ill go and change my cron :) [14:14:14] I need changes to be more flexible than - in 2 weeks [14:14:22] p.s. we should really have a 'submit' instance like on toolserver [14:14:28] WE HAVE [14:14:30] :D [14:14:32] addshore bots-gs [14:14:35] qsub [14:14:35] :D [14:14:41] that's what I talk about all time [14:15:33] right give me 1 second :) [14:15:38] :P [14:15:51] sure [14:16:46] so literally stick squb infront of everything ? ;p [14:16:54] * qsub [14:17:05] thats what I said! ;p [14:17:25] addshore@bots-gs:~$ qsub php /data/project/addbot/wikidata/g.php --lang="en" [14:17:25] Unable to read script file because of error: error opening php: No such file or directory [14:17:44] i want use to fix this! 
:D [14:17:56] addshore: you dont need "php" in front [14:17:58] they had / stillhave the same annoying issue on toolserver [14:18:32] addshore if it was me I would make a script :P [14:18:36] for some reason I like them more [14:18:44] because that -o option doesn't work much [14:18:51] for some reason it fail to store output [14:18:53] Someone (with some time available) should add some doc about bots-gs on https://wikitech.wikimedia.org/wiki/Nova_Resource:Bots [14:19:05] Darkdadaah I'd love to [14:19:13] but even if it doesn't look like that I am in work :D [14:19:30] I have several PC's infront of me working on all of them simultaneously [14:19:49] * petan needs like 2 more screens [14:19:55] but that damn table is so small [14:20:25] I have one screen but several workspaces. [14:20:30] * addshore cant remmeber how to check on the status of jobs :/ [14:20:36] addshore qstat [14:20:44] addshore, qstat [14:21:02] but IMHO it's best to create a script for that and in that you can insert redirect of output to some file - that works, while qsub -o doesn't work to me [14:21:04] dunno why [14:21:14] dont need output :) [14:21:18] your script will likely crash and you will not know why :P [14:21:29] I second that. [14:21:40] I think your script just did [14:21:47] crash? :P [14:21:47] because I saw it in queue and it's gone [14:21:58] I don't know if it properly ended or crashed :P [14:22:00] no output [14:22:07] it didnt end properly ;p [14:22:10] heh [14:22:25] but the code didnt crash [14:22:33] addshore get output :P [14:22:43] blargh [14:23:05] Probably a wrong path or something :/ [14:23:19] make a shell script, put in there php >> /data/project/addshore/output 2>&1 [14:23:35] addshore ^ [14:23:50] that's exactly what -o -j oe should do [14:23:54] but it doesn't :/ [14:23:57] maybe some bug in sge [14:24:48] Beetstra is your bot working now? 
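petan's workaround at 14:23 is to do the output redirection inside the submitted script rather than rely on qsub -o. A sketch of that wrapper follows; it appends stdout and stderr to one file and records the exit status, so a job that vanishes from qstat still leaves a trace. One possible reason -o appeared not to work, which the log does not confirm: qsub reads its options before the script name, and anything placed after the script is handed to the job as arguments.

```shell
#!/bin/sh
# run_logged LOGFILE COMMAND [ARGS...]
# Runs COMMAND, appending stdout and stderr to LOGFILE, then appends a
# line with the exit status and returns that status to the caller.
run_logged() {
    logfile=$1
    shift
    "$@" >> "$logfile" 2>&1
    status=$?
    echo "exit status: $status" >> "$logfile"
    return "$status"
}

# Example, using the log path petan suggests and addshore's job:
# run_logged /data/project/addshore/output php /data/project/addbot/wikidata/g.php --lang=en
```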
[14:24:51] I see it in queue but no idea [14:25:02] yay [14:25:21] grid is getting stabilized :D load of bnr1 faded to 3 and load of bnr2 increased to 1.4 [14:26:10] petan: thats cause i turned my cron off [14:26:10] xD [14:26:15] hehe [14:26:24] bnr1 should have gone down but nothing should have gone up ;p [14:26:45] lol [14:26:53] haha this is fun [14:27:46] hmmm [14:27:47] Login error: [14:27:56] qsub cant access my password ;p [14:27:56] that's all? :D [14:28:02] where you have it? [14:28:10] secret place ;p [14:28:14] is it local place? [14:28:19] or /data/project [14:28:26] addshore use /mnt/secure for secret stuff [14:28:31] :D [14:28:32] that is very hard to access :P [14:28:36] and it's global [14:29:04] * addshore has no folder ;_; [14:29:12] really? o.o [14:29:14] oh lol [14:29:23] yes that folder is being created by mysql account maker [14:29:30] ahh :P [14:29:32] which failed for you because your account was created by hand [14:29:38] let me fix [14:29:48] * addshore goes to find his secrets [14:30:19] addshore done [14:30:21] you have it :D [14:32:31] okay lets try again :) [14:34:16] Im guessing the process qsub starts is not owned by addshore? [14:38:16] addshore it is [14:38:28] addshore unless you did sudo su or something before :P [14:38:45] addshore btw I inserted you to motd so that people can blame you more [14:39:11] I saw ;p [14:39:33] still get login error, who is the process qsub starts owned by? a qsubbby user or addshore? [14:39:51] addshore [14:39:57] it's your process [14:40:06] maybe you are missing some variable? 
[14:40:09] it doesn't load .profile [14:40:10] hmm [14:40:16] * addshore is mildly confused [14:40:32] * addshore thinks this may go deeper that qsub ;p [14:40:58] addshore maybe try to run a cat or something on that file so that you can see in logs if it has access to it [14:41:39] it does, as it is require, otherwise it would dir before it could output a login error [14:41:42] *die [14:41:58] okay so what is error now :D [14:42:30] login error still O_o [14:42:36] o_O [14:42:42] which almost makes me think my account is broken ;p [14:42:46] lol [14:42:48] nope [14:42:52] ok, step 2 [14:43:00] ssh to bnr2 and try to run it by hand :D [14:43:20] btw what kind of login error is it [14:43:24] error logging in to wiki? [14:43:29] or to sql? [14:43:33] or somewhere else? [14:43:47] into wiki [14:44:01] and petan its not bnr2 as otherwise again it would give me a different error ;p [14:44:01] maybe it really can't login [14:44:03] oh wait... [14:44:07] what [14:44:09] does bnr2 have curl? [14:44:14] it should have [14:44:18] * addshore checks [14:44:18] let me check [14:44:27] unless you installed it by hand on bnr1 :P [14:44:34] I think you did ;p [14:44:41] I have script that install across all instances [14:44:45] ahh [14:44:50] and keep track of what it installs [14:45:00] was it that? [14:45:21] runs fine if I run it on bnr2 [14:45:33] okay in that case it's something with variables [14:45:43] you need more debugging in your bot [14:45:48] just: login error is vague [14:46:05] ... [14:46:12] that is the error the wiki api returns.. [14:46:19] oh lol [14:46:25] php should return an error if there is no password file... 
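petan's suggestion at 14:40 (have the job touch the secret file so the logs show whether it is accessible) also answers addshore's question about which user the job runs as. A small sketch of such a preflight step, meant to be called at the top of a job script; the function name and the example path are hypothetical.

```shell
#!/bin/sh
# job_preflight FILE...
# Prints the user and host the job runs as, then reports whether each
# FILE is readable. Returns 1 if any file is not readable.
job_preflight() {
    echo "running as $(id -un) on $(uname -n)"
    rc=0
    for f in "$@"; do
        if [ -r "$f" ]; then
            echo "readable: $f"
        else
            echo "NOT readable: $f"
            rc=1
        fi
    done
    return "$rc"
}

# Example (placeholder path):
# job_preflight /mnt/secure/USERNAME/secret-file || exit 1
```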
[14:46:31] so maybe then check what arguments you passed to it [14:46:44] like echo whole these POST data to some private log [14:47:11] i have that if I turn debugging on :P [14:47:15] but require '/data/project/addbot/config/wiki.php'; [14:47:22] which in turn has [14:47:22] require '/mnt/secure/addshore/.password.addbot'; [14:47:30] but if it runs by hand on 2 I am pretty sure it's something with profile [14:47:39] which in turn sets $config['password'] [14:48:02] which is then used in [14:48:02] $wiki->login($config['user'],$config['password']); [14:48:12] mhm [14:48:23] my wiki bot is running fine :D [14:48:27] but it's written in c# [14:48:30] xD [14:48:39] * addshore tries something [14:48:43] :o [14:48:44] petan, it runs [14:48:48] Beetstra cool [14:48:54] not sure if it is fully up to speed, will check that [14:49:03] Beetstra you may want to reschedule it into long queue / or somehow cron some check / restart [14:49:08] ok [14:49:19] ffs all of ssh sessions to bots keep dieing... [14:49:36] Beetstra I submitted it using just qsub, so once it die, it will stay like that [14:49:44] addshore hm? [14:49:50] addshore not to me [14:49:54] >.< [14:49:56] probably your connection suck [14:49:58] OK, the bot is used to run for long, long times .. [14:50:00] use screen on bastion [14:50:15] Beetstra ok - if it's crash resistant [14:50:15] screen on bastion... not just ssh from it...? [14:50:24] look, Beetstra if you want I can restart it if I see it's down [14:50:36] addshore I use screen on bastion [14:50:51] :o [14:50:52] Let me play a bit, and then I will write a manual for that .. [14:50:52] try it [14:50:53] :P [14:50:59] Beetstra sure! [14:51:04] i will next time it dies [14:51:15] thanks for offering to re-start it when it is down .. I may be out soon for some time (becoming father ..) [14:51:29] yay [14:51:33] interesting login error again ... [14:51:36] grats :) [14:51:40] Though the bot has in the past run for 50 days without problems .. 
so I may be back by then :-) [14:51:42] thanks! [14:52:00] okay :) [14:52:08] addshore I will be back, need to patch my bouncer [14:52:16] let's hope this version will work [14:52:22] I'lll also move COIBot, XLinkBot and UnBlockBot into this system, tomorrow probably [14:52:27] ok [14:52:29] cool [14:52:32] see you all tomorrow! Time to go home! [14:52:36] Beetstra if you needed help making a script let me know [14:53:10] OK, thanks for the support! [14:53:14] yw [14:58:44] * addshore is fixed [14:59:46] damn [14:59:57] I forgot one thing in that code [15:00:03] will need to restart once more... [15:08:52] right [15:08:56] it seems I have to run [15:09:45] echo "php /data/project/addbot/wikidata/g.php --lang='en'" | qsub [15:11:03] @notify petan [15:11:03] This user is now online in #huggle so I will let you know when they show some activity (talk etc) [15:11:19] petan.. [15:11:33] The only way I can work out to do this is >>> echo "php /data/project/addbot/wikidata/g.php --lang='en'" | qsub [15:11:54] i fixed the login error :) [15:11:54] yes? [15:11:55] o.O [15:11:56] wtf lol [15:12:04] LOL [15:12:11] ok [15:12:16] but to submit a php job without using another bash submission script I need to echo it in this silly way [15:12:32] well, I typically use script [15:12:37] that fixed lot of problems [15:12:48] gimmie ;p [15:12:55] what? [15:12:58] ok [15:13:29] #!/bin/dash [15:13:30] $* >> /data/project/addshore/logs [15:13:31] :D [15:13:38] that's it [15:13:39] xD [15:16:38] so then if I call that submit. I can do >>>>> qsub -s testname /data/project/addbot/submit php /data/project/addbot/wikidata/g.php --lang="en" [15:17:58] oh there is no -s [15:17:58] xD [15:18:32] and it still doesnt work >.< [15:19:26] maybe [15:19:27] qsub -v var="php /data/project/addbot/wikidata/g.php --lang='en'" /data/project/addbot/submit [15:22:07] petan: ? O_o [15:22:10] addshore just use script [15:22:23] that is using the script... 
[15:22:25] aha [15:22:28] how is it called [15:22:42] if it was called "addshore-loader" [15:22:54] qsub "addshore-loader php /data..." [15:23:16] it should be problably full path [15:23:18] everywhere [15:23:34] qsub "/path/addshore-loader php /path/..." [15:23:42] addshore ^ [15:23:48] it is [15:23:48] qsub /data/project/addbot/submit php /data/project/addbot/wikidata/g.php --lang="en" [15:24:13] no [15:24:25] qsub "da ... --lang=en" [15:24:33] 1 parameter [15:24:38] qsub "/data/project/addbot/submit php /data/project/addbot/wikidata/g.php --lang='en'" [15:24:44] yes [15:24:52] why you have en in quotes? [15:25:01] you could do "blah=\"blabla\"" [15:25:06] if you wanted " [15:25:07] :P [15:25:16] qsub "/data/project/addbot/submit php /data/project/addbot/wikidata/g.php --lang='en'" [15:25:16] Unable to read script file because of error: error opening /data/project/addbot/submit php /data/project/addbot/wikidata/g.php --lang='en': No such file or directory [15:25:26] great mail, Coren [15:25:29] um [15:25:42] addshore interesting [15:26:08] maybe just create a script with no paramaters that has hardcoded php name? [15:26:20] O_o [15:26:20] that's simplest or google how to pass arguments to job in qsub [15:26:33] but then I must make like 250 scripts [15:26:34] let me google it [15:26:45] iv been googling it ever since we started talking about this [15:26:46] xD [15:26:54] Platonides: Err, which? I'm like about 20 emails written this morning already. :-) [15:27:01] Platonides: Oh, you mean the -labs one. 
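The "No such file or directory" error at 15:25 follows from how the shell splits arguments: quoting the whole command line hands qsub a single argument, which it then tries to open as a script file whose name contains spaces. The sketch below only illustrates the splitting, with a hypothetical helper count_args standing in for qsub. It also shows why a pass-through wrapper is safer with "$@" than with the unquoted $* used in the submit script at 15:13.

```shell
#!/bin/sh
# count_args is a hypothetical stand-in for any command; it prints how
# many arguments it received.
count_args() {
    echo "$#"
}

# One quoted string is one argument, so the receiver sees a single
# "file name" containing spaces.
count_args "/data/project/addbot/submit php /data/project/addbot/wikidata/g.php --lang=en"   # prints 1

# Unquoted, the shell splits on whitespace into four arguments.
count_args /data/project/addbot/submit php /data/project/addbot/wikidata/g.php --lang=en     # prints 4

# pass_star re-splits its arguments on whitespace; pass_at preserves them.
pass_star() { count_args $*; }
pass_at() { count_args "$@"; }

pass_star "two words" x   # prints 3
pass_at "two words" x     # prints 2
```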
[15:27:10] http://stackoverflow.com/questions/3504081/parameter-for-shell-scripts-that-is-started-with-qsub [15:27:31] LOL [15:27:35] addshore use echo XD [15:27:54] but thats just ugly [15:27:54] xD [15:28:01] hehe [15:28:05] thats what I had ages ago [15:28:05] xD [15:28:17] use it as workaround and fix it "some day" [15:30:37] right then, time to rewrite my cron [15:36:43] petan: all croned up [15:36:56] in 5 mins everything will be running as normal [15:37:35] is there a live updating version of qstat? [15:39:29] and how big can job-id get? xD [15:41:23] addshore no idea [15:41:33] take a look at qstat now :P [15:41:53] its a bit slow handing the jobs to instances :/ [15:42:54] and many seem to be getting dumped in the log queue O_o [15:46:40] addshore nope [15:46:47] was just threshold_limit [15:46:50] I increased that [15:46:52] :D [15:46:55] good :) [15:46:59] now it's all there [15:47:04] i did just notice the rest of the got accepted :P [15:47:21] !log tools enabled X forwarding for qmon. Also, installed qmon. [15:47:23] Logged the message, Master [16:03:47] hmm petan it still seems to handle submited jobs rather slowly [16:04:03] hmh [16:04:06] let me check [16:04:30] no wonder - load 6+ [16:05:04] addshore ur sysadmin - fix it [16:05:13] :P [16:05:15] HAHA I wasnt checking the load on gs [16:05:16] xD [16:05:32] * addshore empties his cron [16:06:03] nope [16:06:09] that is load on nodes [16:06:26] qhost [16:06:56] if you are really going to use such a "massive" task we might need to create one more node, but isn't it just possible for you to optimize the bot? :P [16:07:02] .. [16:07:06] :D [16:07:11] it can run at this speed on bnr1 alone... [16:07:18] ah [16:07:19] mhm [16:07:21] weird [16:07:26] then keep it as you had [16:07:32] let's see what will happen [16:07:43] if it will stuck in queue there is something wrong what needs a fix [16:07:47] * addshore goes to find his old cron [16:07:53] huh? 
[16:07:54] what for [16:08:07] wait you mean ekep it in the queue or put it back on bnr1? ;p [16:08:15] keep it in queue :D [16:08:30] * addshore reinstates his new cron [16:08:39] atm there is 0 jobs waiting [16:09:30] some spawn every min [16:09:32] seom every 5 [16:09:36] the rest every 20 [16:09:47] and then other stuff stuck everywhere [16:17:41] HAH, petan I think the extra load is just generally caused by more people using labs, the bnr2 load is mainly lego and beetsra [16:18:05] infact I cant see a single one of my processes in htop there other than htop itself [16:18:06] yay [16:18:43] maybe it does need another instance? O_o [16:22:42] !log bots petrb: disabling MTA on whole bots project to resolve spam [16:22:43] Logged the message, Master [16:24:56] legoktm: around? [16:37:01] addshore load 32? [16:37:02] wtf [16:37:04] :D [16:37:14] addshore try qstat -j [16:37:20] you will see why your jobs are held [16:37:42] addshore I can create bnr3 but do we want it? [16:38:25] petan: im gonna fiddle with the sge settings [16:38:30] just gotta catch up on my talk pages [16:38:44] addshore that's not problem of SGE that there is load over 30 [16:38:48] that's problem of your bot :D [16:39:02] only solution I can think of is creation of more nodes or optimization of your bot [16:52:47] addshore if you are going to change sge log it [16:53:02] I will [16:53:07] still working on my talk pages atm ;p [16:53:28] ok [16:54:08] * jeremyb_ glares at the walls of text (on the list) [16:57:22] Change on mediawiki a page Wikimedia Labs/Migration Of Toolserver Tools was modified, changed by Silke WMDE link https://www.mediawiki.org/w/index.php?diff=658103 edit summary: [+13] /* Wikimedia Foundation */ added Marc to WMF staff [16:59:33] Change on mediawiki a page Wikimedia Labs/Migration Of Toolserver Tools was modified, changed by MPelletier (WMF) link https://www.mediawiki.org/w/index.php?diff=658104 edit summary: [-6] /* Wikimedia Foundation */ -4 atm [17:06:15] Change
on mediawiki a page Wikimedia Labs/Migration Of Toolserver Tools was modified, changed by Jeremyb link https://www.mediawiki.org/w/index.php?diff=658107 edit summary: [-3] /* Wikimedia Foundation */ make TZs consistent [17:06:23] Coren: ^ [17:07:19] jeremyb_: Oh, I didn't try to do the timezone switch; I just added my own without noticing where the others were. :-) [17:07:49] yeah [17:07:53] just FYI :) [17:07:58] hrm, no ryan [17:11:45] Change on mediawiki a page Wikimedia Labs/Migration Of Toolserver Tools was modified, changed by Jeremyb link https://www.mediawiki.org/w/index.php?diff=658108 edit summary: [-35] /* Labs */ use internal link [17:11:54] ^^ thanks Coren for adding you TZ :) [17:12:18] Silke_WMDE_: No worries. I'm in the process of replying to that email with questions now. [17:12:25] cool! [17:12:43] Change on mediawiki a page Wikimedia Labs/Migration Of Toolserver Tools was modified, changed by Jeremyb link https://www.mediawiki.org/w/index.php?diff=658109 edit summary: [-27] /* Technical discussion on MediaWiki.org */ use internal link [17:13:13] hrmmmm, who wrote this page? [17:13:26] there's more external links than i originally noticed [17:13:33] (unnecessarily) [17:13:40] I did [17:13:49] soory ermmm [17:14:36] * Silke_WMDE_ is still learning what is considering "external" and what isn't really [17:15:28] well they're formatted differently in the page when you're just reading it :) [17:16:16] there's this cool list somewhere [17:16:38] hidden in a wiki [17:19:42] of what? [17:20:35] found it: the interwiki links / abbreviation but there is none for wikitech [17:20:50] does it have one? like wt:blah [17:20:52] ? [17:21:19] where are you looking?
[17:21:27] http://en.wikipedia.org/wiki/Help:Interwiki_linking [17:21:28] 11 17:11:45 <+wm-bot> Change on mediawiki a page Wikimedia Labs/Migration Of Toolserver Tools was modified, changed by Jeremyb link https://www.mediawiki.org/w/index.php?diff=658108 edit summary: [-35] /* Labs */ use internal link [17:22:39] see [[special:interwiki]] and /w/api.php has an interwikimap [17:22:48] and special:sitematrix [17:22:52] and i'm editing the page now [17:23:06] ok [17:23:22] thanks [17:26:34] Change on 12mediawiki a page Wikimedia Labs/Migration Of Toolserver Tools was modified, changed by Jeremyb link https://www.mediawiki.org/w/index.php?diff=658112 edit summary: [-403] convert to interwiki/internal links [17:28:26] That reminds me, I iz need a bz component. [17:28:48] ok, wtf is going on with the bots instance ? [17:28:52] bots-bnr2 [17:28:57] because it's been emailing root [17:29:02] like hundreds every minute [17:29:06] and i am about to delete the instance [17:29:30] Change on 12mediawiki a page Wikimedia Labs/Migration Of Toolserver Tools was modified, changed by Jeremyb link https://www.mediawiki.org/w/index.php?diff=658116 edit summary: [+36] fix what i broke :) [17:29:54] Ryan_Lane: LeslieCarr: ok, wtf is going on with the bots instance ? [17:29:55] [5:28pm] LeslieCarr: bots-bnr2 [17:29:55] [5:28pm] LeslieCarr: because it's been emailing root [17:29:55] [5:29pm] LeslieCarr: like hundreds every minute [17:29:55] [5:29pm] LeslieCarr: and i am about to delete the instance [17:30:01] Silke_WMDE_: take a look [17:30:24] LeslieCarr: it's petan's, ask him :) [17:30:34] petan: [17:30:49] maybe i just turn them off [17:30:54] i guess we could just stop crond or whatever [17:31:00] it's not cron [17:31:02] (this time!) 
[17:31:13] 11 16:22:42 <+labs-logs-bottie> !log bots petrb: disabling MTA on whole bots project to resolve spam [17:31:28] idk how your experience can agree with that !log [17:31:30] GE 6.2u5: Job 965 failed [17:31:36] oh, it's SGE [17:31:38] actually it has just stopped [17:31:40] as of 4 minutes ago [17:32:40] Ryan_Lane: prod interwiki map needs a new def for "wikitech" [17:43:43] jeremyb_: Great, got it. Thanks! [17:46:22] ok, this started up again [17:46:23] you know, I dont believe it for a second when the load for an instance says 85 but mem is at about 25% and cpu at about 10%... [17:47:06] addshore: Why not? Load is just "number of proceeses ready to run" [17:47:38] addshore: have you checked disk i/p ? [17:47:41] i/o even [17:47:50] Ryan_Lane: can i suspend an instance instead of just rebooting it ? [17:48:08] LeslieCarr: yes [17:48:15] how do i do that ? [17:48:20] what started again? email? [17:48:22] yep [17:48:30] the same instances? [17:48:31] bots-bnr1 and bots-bnr2 [17:48:36] filling the crap out of gmail [17:48:42] ugh [17:48:42] as always most of the processes are just sleeping, I think the load formula needs tweaking [17:49:03] LeslieCarr: on virt0.... [17:49:06] one sec [17:49:54] ok, i have to go to the doctor's office anyways [17:49:58] Coren: are you just using defaults for tools or have you done much tweaking in the sge confs? [17:50:09] when i get back, i may just turn off the vlan for labs ;) [17:50:13] bbiab [17:50:33] addshore: There is some tweaking of sge, more to come as actual use patterns emerge. [17:50:51] LeslieCarr: or you could just use iptables [17:50:53] addshore: more likely than not, there will be some specialized queues as well. [17:51:00] to block the individual instances [17:51:19] or someone could create exim relays so that this doesn't affect production [17:51:23] addshore: right now, I'm being conservative in my settings; placing reliability above performance. 
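The exchange at [17:46:23]-[17:47:41] about a load of 85 alongside roughly 10% CPU can be illustrated with a small Python sketch. It is only an illustration and was not posted in the channel: the sample string is made up, and `parse_loadavg` and `load_per_core` are hypothetical helper names. It assumes the Linux `/proc/loadavg` layout (three load averages, then runnable/total scheduling entities, then the most recent PID). On Linux the load figure also counts tasks in uninterruptible sleep, typically waiting on disk or network filesystem I/O, which is one way load can be high while CPU use stays low.

```python
def parse_loadavg(text):
    """Parse the contents of Linux's /proc/loadavg into a small dict."""
    fields = text.split()
    # First three fields: 1-, 5- and 15-minute load averages.
    load1, load5, load15 = (float(x) for x in fields[:3])
    # Fourth field: currently runnable entities / total entities.
    runnable, total = (int(x) for x in fields[3].split("/"))
    return {
        "load1": load1,
        "load5": load5,
        "load15": load15,
        "runnable": runnable,
        "total": total,
    }


def load_per_core(load, cores):
    """Normalise a load average by core count (hypothetical helper)."""
    if cores < 1:
        raise ValueError("cores must be >= 1")
    return load / cores


# Made-up sample resembling the situation described in the channel:
# a very high load average with only a few entities actually runnable.
sample = "85.12 80.40 61.07 3/412 20133"
info = parse_loadavg(sample)
print(info["load1"], info["runnable"], load_per_core(info["load1"], 4))
```

On a real instance the text would come from reading `/proc/loadavg`, and the core count from `os.cpu_count()`. A large gap between the load average and the runnable count is a hint to look at I/O wait rather than CPU.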
[17:52:08] LeslieCarr: as root: OS_TENANT_NAME=bots nova suspend bots-bnr1 [17:52:13] OS_TENANT_NAME=bots nova suspend bots-bnr2 [17:52:32] iptables is really a better solution, though, as then you can see why they are emailing [17:54:11] I've already run the command for you [18:01:04] * addshore notices bnr1 and bnr2 fall off the map [18:02:36] as well they should [18:02:52] D: [18:03:08] jeremyb_: Did you just commit servercide? [18:03:42] FastLizard4: I think that was an act of righteous retribution rather. :-) [18:03:45] FastLizard4: err, huh? [18:03:54] i'm not familiar with that [18:03:59] Coren: Lol [18:04:27] jeremyb_: ([a-z]+)cide where $1 is something that is killed. :-) [18:05:07] it was suspended not killed [18:05:15] jeremyb_: not as well they should. I wouldn't have disabled them if I would have known it was just open grid spam [18:05:29] Ryan_Lane: still getting mail spam? [18:05:42] Coren: We should upgrade jeremyb_'s sense of humor :P [18:05:46] it's not spam [18:05:59] it's a misconfigured service [18:06:25] which? :D [18:09:34] pertb disabled the mta for all bots instances an hour and a half ago [18:10:14] maybe he didn't do it properly? [18:10:39] or it's finding some other way to get out or puppet reeenabled [18:10:40] * addshore cant check if he cant access said instances, but I will check on the others [18:10:47] jeremyb_: that's also likely [18:15:05] If Oracle finally renames Sun Grid Engine to Oracle Grid Engine [18:15:11] Can we start calling it OGRE? :P [18:16:58] we're using open grid, not sun grid engine ;) [18:17:48] Ryan_Lane: Oh, so you're already using OGRE :D [18:17:49] :P [18:18:01] Or OGR, I guess [18:18:05] If they dropped the "Engine" :P [18:19:25] heh [18:40:11] !logs [18:40:11] logs http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-labs [18:40:30] Why doesn't wm-bot understand "!logs" in a query? [18:41:32] It looks like it did? [18:46:52] No, it doesn't in "/query wm-bot": "Hi, I am robot, this command was not understood. 
Please bear in mind that every message you send to me will be logged for debuging purposes. See documentation at http://meta.wikimedia.org/wiki/WM-Bot for explanation of commands" (BTW, "debugging") [18:50:01] Ah. [18:50:14] I just got "in a query" [18:52:10] Must. Define. Language. Unambiguously. :-) [18:53:32] LeslieCarr it's been already reported and stopped ages ago [18:53:46] LeslieCarr did you do something with that instance? [18:54:34] scfc_de: because it has ton of databases, not only for this channel, how should it know which one you want to use [18:54:58] petan: Ah! That makes sense. [18:55:36] Would be nice if it could offer some options then ("!logs #wikimedia-labs" or something like that). [18:55:44] petan: GE 6.2u5: Job 1083 failed we got those until an hour ago.. lots of them [18:55:53] with different job numbers [18:55:55] I know that apergos messaged me [18:55:58] ok [18:55:59] in private message [18:56:10] it was already sorted out and ipfiltered by time leslie came here [18:56:17] gotcha [18:57:24] I pmed Leslie but got no response, didn't realise you were also here, also petan the instances are still running [18:57:36] so nobody did touch them? [18:57:43] did you reboot some of them recently? [18:57:58] because the iptables filter will be reset when you reboot them [18:58:20] also Ryan_Lane why does it even send some emails out? you said emails are disabled in labs [18:58:33] and why does it send these emails to some of your production boxes [18:58:40] it should send it to local root [18:58:41] technically I said we didn't have a labs relay [18:59:01] root is an alias in our configuration [18:59:08] ok can you change it for labs? [18:59:09] it sends to us. not to the local system [18:59:20] just disable the mail sending for now [18:59:21] there is no point in sending labs stuff to production [18:59:24] Ryan_Lane how? [18:59:31] I ipfiltered the port [18:59:37] and you say it's still coming? 
[18:59:42] does open grid not have a config option for this? [18:59:58] I don't see email coming in now [18:59:59] I don't even have access to target mailbox - I don't even know what name it has - how could i know if it's fixed or not [19:00:06] we can tell you ;) [19:00:18] fine, apergos told me like hour ago it's fixed [19:00:34] again, does grid engine not have a config option for this? [19:00:39] no idea [19:00:50] I didn't change any config related to emails [19:00:59] I wasn't saying you did [19:01:00] btw did someone touch these two instances or are they still running? [19:01:10] probably it has some config - who knows [19:01:12] petan: they are still running [19:01:13] I suspended them when someone told me they were spamming [19:01:22] aha so they are not running [19:01:29] I unsuspended them [19:01:32] my processes are still running and editing wiki [19:01:38] mhm [19:01:40] after I was told it was legitimate mail [19:01:59] ok I go check them then [19:02:06] thanks [19:02:19] Ryan_Lane if iptables are not able to prevent emails from being delivered - what can?? [19:02:29] doesn't make sense to me [19:02:39] I'd imagine that worked [19:03:00] maybe there was a lot of mail and it just took that long to be delivered [19:05:43] Ryan_Lane do you know how is it aliased [19:05:52] so that I could unalias it and make them send the emails to localhost [19:05:53] via exim ;) [19:06:02] WTF [19:06:05] doing that will be… difficult [19:06:08] I removed exim from these instances [19:06:18] just to stop all these spamming [19:06:21] puppet is going to put that back, obviously ;) [19:06:26] lol [19:06:55] there is no exim running atm [19:06:57] pester paravoid and/or mark for an exim relay for labs :) [19:07:13] I even ipfiltered port 25 [19:07:23] it shouldn't be technically possible for these instances to spam [19:07:28] are you sure that the spam is recent? 
[19:08:16] it was from this morning [19:08:25] REJECT tcp -- anywhere anywhere tcp dpt:smtp reject-with icmp-port-unreachable [19:09:13] ok is there any mail that is newer than 1 hour? [19:09:50] what will happen when I send mail to myself? [19:09:52] instead of root [19:09:56] I found the config option [19:09:59] it'll be dropped [19:12:04] !log bots petrb: changed email in /etc/gridengine/configuration [19:12:06] Logged the message, Master [19:12:13] great [19:12:32] it's incredible how hard it is to prevent a machine from sending out emails... [19:12:50] it's impossible, really [19:12:55] I always had troubles to get some box to send out some, this time it's the other way [19:13:17] petan: why not set the email to null? [19:13:24] how [19:13:30] in the config? [19:13:34] is it impossible? [19:13:37] I don't know if it's possible [19:13:40] I don't want to break it [19:13:51] the documentation for SGE actually pretty suck [19:14:13] on other hand I would like to receive these emails in future [19:14:17] you're using SGE rather than open grid? [19:14:22] not in personal box but in local mail [19:14:29] well, actually it is open grid [19:14:35] I just call it SGE :P [19:36:23] Coren still need a category? [19:36:35] in bz [19:36:47] That, and a good beer. I expect you can only help with the former. :-) [19:37:06] I can help with both but no idea where to deliver the beer [19:37:34] how do you want to call it? and what the description should be? [19:37:37] petan: We'll arrange something if you're in Amsterdam for the hackaton. [19:37:47] * Coren ponders. [19:37:48] yes I am [19:38:04] petan: why would you need the xen version? [19:38:12] of a kernel? 
[19:38:13] Ryan_Lane because grub requires it [19:38:19] o.O [19:38:31] I compiled standard kernel and grub told me: ignoring non-xen version blah blah [19:38:41] a xen kernel is specifically for xen [19:38:49] I have no idea why it required it [19:38:53] either as dom0 or domU [19:38:55] I though we are on kvm [19:38:58] we are [19:39:03] it uses standard kernels [19:39:05] then it doesn't make sense [19:39:17] Just "tools" is probably too obscure. [19:39:18] look - I still have these packages in my home [19:39:37] just ssh to bots-2 or that, go into my home, dpkg -i linux-image*3.8* [19:39:42] you will see that message [19:40:03] * Ryan_Lane shrugs [19:40:03] Coren I don't really care :P [19:40:05] no clue why [19:40:22] but if you use the xen kernel it's possible the instance won't reboot [19:40:25] Coren maybe tool-labs? or how you officialy call it? [19:41:07] Default Assignee and description pls [19:41:33] petan: Meh. "tools" will do, and matches the wikitech project. As for description, "The Tool Labs project" seems sufficient, perhaps with an URI to the right spot? [19:41:48] petan: default asignee should be me 'marc@uberbox.org' [19:42:08] it's usually wikibugs-l@lists.wikimedia.org but I can set it to you [19:42:20] http://www.mediawiki.org/wiki/Wikimedia_Labs/Tool_Labs [19:42:39] For now it's probably best if I am default asignee. [19:44:58] Coren done [19:45:04] let me know if you want to fix it :P [19:45:06] Danke. [19:52:36] addshore did you just disable your bot? o.o [19:52:46] nop [19:53:06] wait, i removed the crontab from gs as I was creating a bit queue with the instances disabled [19:53:12] *big [19:53:47] aha [19:54:08] btw without your bot load is under 1 [19:54:14] with your bot load is like... 30 [19:54:15] :P [19:54:30] * addshore checks [19:54:31] I don't know what it does but I think you want to optimize it [19:54:33] :D [19:54:36] really [19:54:40] but thats makes no sense at all [19:54:46] huh? 
[19:56:00] as running the same cron on a single instance has a load of like 1 or 2 ... [19:56:58] mhm maybe you didn't submit so many at once? [19:57:52] I did >.< its an identical cron file, I just used a regex to alter it to use og [19:58:29] I think OG just uses a stupid calculation of load [19:58:36] maybe [19:58:49] that isnt practical for such tasks :/ [19:59:09] so are u going to start it? [20:02:48] again ? :P [20:02:58] I was considering looking at the algorithim it uses for load first [20:03:21] ok [20:03:48] but as you said above the documentation is ..... not the best xD [20:07:25] petan: adding my cron back [20:07:43] gl [20:07:50] load is at 0.05 and 0.43 currently [20:11:41] 70 processes just submitted at the same time and the load has only gone up to 0.13 and 0.90 [20:24:05] petan: it is silly how the cpu peaks to 38% and drops but the load that og uses wont allow more procs as its at about 10.. [20:24:48] that's how I configured it [20:24:56] it can be changed to higher load [20:25:36] *goes to find the config* [20:25:50] it's not a config [20:25:57] you need to use qconf -mq [20:29:09] addshore these things must be logged [20:29:19] I know you said before xD [20:29:21] btw [20:29:27] ur going to be in amsterdam? [20:30:24] mhm? [20:34:03] I currently need to be in the uk on the 25th :/ this might move to the 28th though [20:36:20] Could someone add me to the Openstack project please? 
[20:36:20] Notice: Undefined property: CentralAuthPlugin::$boundAs in /var/www/wiki/mediawiki/extensions/OpenStackManager/nova/OpenStackNovaLdapConnection.php on line 20 Fatal error: Call to undefined method CentralAuthPlugin::connect() in /var/www/wiki/mediawiki/extensions/OpenStackManager/nova/OpenStackNovaLdapConnection.php on line 21 [20:36:28] ^ Starting to piss me off locally ;) [20:39:01] Reedy: yep [20:39:51] Reedy: done [20:39:56] Great, thanks [20:40:00] we have stuff pre-configured on nova-precise1 [20:40:03] err [20:40:05] nova-precise2 [20:40:21] https://wikitech.wikimedia.org/w/index.php?title=Special:NovaProject&action=configureproject&projectname=openstack [20:40:30] "You can not complete the action requested as your user account is not in the project requested." [20:40:32] Useful :D [20:40:52] Reedy, you need to log off/back in. [20:40:57] I did [20:41:01] hm [20:41:07] Ah, then it's Ryan's fail. :-) [20:41:11] you're reedy in wikitech, right? [20:41:35] configuring the project isn't actually what you want to do ;) [20:41:55] True [20:42:00] butbutbutbutbutbutbut there's a link to click [20:42:08] heh [20:42:08] indeed [20:42:25] must be a caching issue [20:42:30] I can view add member, so I'm guessing the permissions are right [20:42:40] yeah [20:42:52] I'm also getting that error [20:42:54] I'd imagine we're not invalidating a cache somewhere [20:42:56] really? [20:43:00] I'm definitely part of the openstack project [20:43:04] I wonder if it's being limited to the wrong group [20:43:08] * Ryan_Lane checks [20:43:36] Doesn't work for any of the projects I'm in it seems [20:44:20] The link to it shows up on https://wikitech.wikimedia.org/wiki/Special:NovaProject [20:44:50] ah [20:44:51] I see why [20:45:01] with a lowercase 'c' :P [20:45:13] because it's using manageproject right [20:45:23] which is limited to bureaucrat [20:45:30] we need to use a different right for that [20:50:31] Shouldn't wikitech-testing.wmflabs.org resolve to something? 
[20:50:46] petan: I just found a administrator_mail variable that that is set to root on OG [20:50:48] it does for me [20:50:55] oh [20:51:01] it probably redirects improperly [20:51:07] addshore you can change it [20:51:07] wait. no [20:51:08] it works [20:51:16] but, it looks like memcache may have died [20:51:28] OpenDNS reports NXDOMAIN [20:51:33] addshore: obama@whitehouse.gov [20:51:37] to what? O_o it is just set to 'root' currently, [20:51:38] xD [20:52:27] Visiting http://208.80.153.198 points it at http://wikitech-testing.wmflabs.org [20:53:02] Reedy: oh [20:53:07] it's wikitech-test.wmflabs.org [20:53:43] AH [20:53:56] Ryan_Lane, it works but only temporarily [20:54:07] Krenair: what does? [20:54:14] Next time puppet runs it'll overwrite my config changes again [20:54:24] oh. rigt [20:54:26] *right [20:54:28] the redirect [20:54:44] Also as you just found out, my change is half broken. will fix in a sec [20:55:09] lol [20:55:15] hm. wikitech-static is down [20:56:07] okay wikitech-test.wmflabs.org should be working now [20:56:13] still has an invalid https cert, but meh [20:56:38] bleh. stupid non-standard memcache port we use [20:59:21] oh. I think I see why configure really doesn't work [20:59:29] bad variable name [21:02:42] could someone force reviewer-bot to shut down? it's malfunctioning and sending lots of spam. see #wikimedia-tech [21:04:44] Ryan_Lane: BTW, I'm working on the tools:: class for all my roles; do you prefer one big changeset with all the big picture or in party-sized chunks? [21:05:04] party sized chunks would be best [21:05:10] I really need to get lunch... [21:05:20] Go get food. [21:05:22] Ditto [21:05:28] Then I realised it was meeting time [21:05:28] :/ [21:07:41] Reedy: configure in the projects page should work [21:07:45] *now [21:08:04] Yay [21:09:06] Ryan_Lane: can I take some minutes of your valuable time ? 
[21:13:04] Password: Reset password [21:13:04] Public SSH keys: Manage your public SSH keys [21:13:04] Instance shell account name: testreedy [21:13:04] Two-factor authentication: Manage two-factor authentication [21:13:06] Remember my login on this browser (for a maximum of 180 days) [21:13:13] Why does that order vary between wikis? :/ [21:13:36] on wikitech that's in an order that makes sense.. [21:13:50] On my dev wiki and wikitech-test it's what I'd considering wrong [21:14:13] :| [21:15:07] Coren: With Bugzilla in place, where do you want to keep TODO items like the "AllowOverride None" bit? On Bugzilla as enhancement, somewhere on wiki, elsewhere? [21:16:45] * Coren thinks. [21:17:04] scfc_de: Probably best to bz as enhancement. Higher probability of good tracking. [21:17:29] <^demon> Wikinaut: Did you read the upstream patch I listed? You don't show up in the table if you can't leave a score != 0. [21:17:34] <^demon> On drafts you can only leave 0. [21:17:37] <^demon> So no table for you. [21:19:56] https://bugzilla.wikimedia.org/show_bug.cgi?id=46002 - Ordering in Special:Preferences groups are awkward and non consistent [21:21:55] Reedy, it's not based on extension loading order is it? [21:22:04] Presumably [21:22:11] But why would the core (password change) come later? [21:22:16] Config based on globals? [21:23:16] Ryan_Lane: Where do people log issues with mediawiki-strapping? The tabs in preferences look silly [21:24:30] [bz] (NEW - created by: Tim Landscheidt, priority: Unprioritized - normal) [Bug 46003] Relax restrictions on .htaccess - https://bugzilla.wikimedia.org/show_bug.cgi?id=46003 [21:48:28] ohaiyo [21:48:38] Coren: ping [21:53:18] wizardist: Pong! [21:57:34] https://wikitech-test.wmflabs.org/wiki/Special:Preferences#mw-prefsection-openstack [21:57:35] lol [21:57:53] lol [21:58:24] bit better still [21:58:28] Copy pasting a load of code... [21:58:32] * Ryan_Lane nods [21:58:44] But it's going to be removed from Special:NovaKey... 
soooo [21:59:16] Have we got a nice way to commit code from nova-precise2 [21:59:31] noting it's a https checkout, files owned by root [21:59:34] I commit locally and push there [21:59:37] oh [21:59:38] right [21:59:40] I found I couldn't without the commiter ending up being root@nova-precise2 [21:59:52] we could change the ownership [22:00:04] It's a pain having to use sudo for everything [22:00:05] root:project-openstack [22:00:14] with sgid bit set for the directory [22:00:26] and chmod -R g+w [22:00:43] arrrgh, scary default editor! [22:00:45] I'll do that really quick [22:00:52] the default isn't vim? [22:01:14] * Krenair prefers nano to vim [22:01:17] ^ [22:01:39] * Ryan_Lane shudders [22:01:40] :) [22:01:44] I should learn to use other stuff [22:02:21] I learnt to use vi when I was like 9 or 10, but I prefer nano by far :P [22:04:11] 3pm [22:04:16] I better go get some food [22:04:32] how do I add SSH keys with this preferences tab, Reedy? [22:04:38] You don't, yet [22:04:41] I never said it was finished [22:04:51] You can still do so on Special:NovaKey [22:04:51] ok :) [22:04:51] heh. I can't live without my key combos [22:05:19] Adding keys would probably be better left on Special:NovaKey [22:05:27] Same way as OpenID uses https://wikitech-test.wmflabs.org/wiki/Special:OpenIDConvert [22:06:02] Rather than making a big mess in Special:Preferences [22:06:11] Though, not sure how well trying to delete keys and stuff may work [22:06:35] Back in 10 or something [22:08:00] ugh. SSH auth forwarding doesn't apply with sudo it seems. 
[22:13:16] I'm changing ownership of stuff on nova-precise2 [22:14:42] Krenair, Reedy: root should no longer be needed [22:16:31] thanks Ryan_Lane [22:16:34] yw [22:30:50] petan: I give up fiddling with OG >.< [22:54:00] [Mon Mar 11 22:52:25 2013] [error] [client 216.38.130.163] PHP Fatal error: Using $this when not in object context in /srv/org/wikimedia/controller/wikis/w/extensions/OpenStackManager/special/SpecialNova.php on line 163, referer: https://wikitech-test.wmflabs.org/wiki/Special:Preferences [22:54:20] function createActionLink( $msg, $params, $title = null ) { [22:54:20] if ( !$title ) { [22:54:20] $title = $this->getTitle(); [22:54:21] :| [22:55:20] Reedy: how's that possible? [22:55:32] I've no idea [22:55:33] are you calling it staticly? [22:55:42] line 163 is the getTitle line [22:56:13] the error seems to imply its being called staticly [22:57:40] array_push( $actions, SpecialNova::createActionLink( 'openstackmanager-delete', array( 'action' => 'delete', 'hash' => $hash ) ) ); [22:57:47] SpecialNova::createActionLink [22:57:48] Yeah [22:58:00] So it lets me call it statically, then barfs later? :/ [22:58:57] any reason it needs to be static? [22:59:22] oh [22:59:28] Calling it from the user preferences hook (indirectly) which is static [22:59:30] if you pass it a title, it can be called staticly [22:59:54] though it's not defined as static, so that's a little hacky :) [23:00:45] it can be defined as static if all calls to it provide "$this" as title [23:00:51] or we can have a wrapper function [23:00:54] lol [23:01:16] I like the wrapper function option ;) [23:02:16] honestly, it's really just a wrapper to Linker:link [23:02:32] may be sane to call that explictly [23:08:06] -_- [23:08:13] labs-morebots: howdy [23:08:13] I am a logbot running on i-0000015e. [23:08:14] Messages are logged to wikitech.wikimedia.org/wiki/Server_Admin_Log. [23:08:14] To log a message, type !log . 
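The "wrapper function" option discussed at [22:59:30]-[23:01:16] can be sketched as a rough Python analogue. This is not the OpenStackManager PHP code: the class body, the method name `build_action_link` and the link format are all hypothetical, chosen only to show the shape of the idea. The static core takes an explicit title, so a static hook can call it safely, and the instance method is a thin wrapper that fills in its own title.

```python
class SpecialNova:
    """Hypothetical stand-in for the PHP special page class."""

    def __init__(self, title):
        self._title = title

    def get_title(self):
        return self._title

    @staticmethod
    def build_action_link(msg, params, title):
        """Static core: requires an explicit title, never touches an instance."""
        query = "&".join(f"{key}={value}" for key, value in params.items())
        # Made-up link format, purely for illustration.
        return f"{title}?{query} [{msg}]"

    def create_action_link(self, msg, params, title=None):
        """Instance wrapper: falls back to this page's own title."""
        if title is None:
            title = self.get_title()
        return SpecialNova.build_action_link(msg, params, title)


# A static context (such as a preferences hook) passes the title itself.
static_link = SpecialNova.build_action_link(
    "delete", {"action": "delete", "hash": "abc"}, "Special:NovaKey"
)

# Code that already holds an instance can keep omitting the title.
page = SpecialNova("Special:NovaKey")
instance_link = page.create_action_link("delete", {"action": "delete", "hash": "abc"})
print(static_link == instance_link)
```

Keeping the title a required argument on the static side means the failure mode from the log (reaching for the instance when there is none) cannot occur there; the remark at [23:02:16] about calling the underlying link helper directly would remove the wrapper altogether.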
[23:08:18] !log testing test [23:08:20] Logged the message, Master [23:20:29] Can anyone help me use labs with filezilla? [23:28:31] Can anyone help? I would really like to get my scripts transferred. [23:37:40] FREEGWICK|DEMI: we don't really have docs for filezilla [23:37:45] and I don't have windows [23:37:50] we do have docs for winscp, though [23:37:51] https://wikitech.wikimedia.org/wiki/User:Wikinaut/Help:Access_to_instances_with_PuTTY_and_WinSCP [23:38:59] Ryan_Lane: I am trying to transfer files, not access instances. [23:39:07] that's what winscp does [23:39:46] you're on windows, right? [23:39:59] Correct [23:40:17] winscp is likely your best bet [23:40:33] https://wikitech-test.wmflabs.org/wiki/Special:Preferences#mw-prefsection-openstack [23:40:33] Yay [23:41:15] sweet [23:41:44] heh. it redirects back to NovaKey [23:42:21] Ryan_Lane: Thanks! [23:42:26] FREEGWICK|DEMI: yw [23:42:42] Reedy: this is like a million times better already [23:43:20] Need to fix the returnto on the delete link [23:43:23] Add a link to add keys..