[02:40:01] ok
[02:40:04] who's messing w/ wm-bot?
[02:40:12] * Jasper_Deng glares at petan
[02:40:18] (not b/c I suspect you but b/c you'd know who)
[02:40:38] Hm>
[02:40:42] *?
[05:18:44] was there a special trick to submit jobs from within jobs?
[05:20:43] afterok ?
[05:21:04] run jsub command from the first jobscript ?
[05:21:25] http://nf.nci.org.au/facilities/faq/self_submit.php
[05:21:28] gifti,
[05:21:54] mind they use qsub but this labs thing uses jsub instead
[05:24:45] *gasp*
[05:28:27] i don't get it
[05:31:28] my problem is that q/jsub is not found when inside a job
[05:40:55] gifti: execution hosts don't have those scripts
[05:41:39] gifti: I would restructure the approach to not require jobs spawning jobs
[05:41:49] ok …
[05:43:17] it maybe allowed if admins want to make execution hosts submission hosts, but I guess they don't want.
[08:43:03] something is weird with tools-redis, i cannot connect anymore
[08:47:37] yeah, grrrit-wm went down
[08:47:53] !log tools [01:28:28] * grrrit-wm has quit (Remote host closed the connection)
[08:47:56] Logged the message, Master
[08:48:42] !log tools Your job 438884 ("lolrrit-wm") has been submitted
[08:48:44] Logged the message, Master
[08:59:12] !log tools grrrit-wm: 2014-04-20T08:28:15.889Z - error: Caught error in redisClient.brpop: Redis connection to tools-redis:6379 failed - connect ECONNREFUSED
[08:59:14] Logged the message, Master
[08:59:32] not gonna try anymore
[11:12:14] wheeee redis is broken
[11:12:36] or is it just my crappy code
[11:12:38] hmmm.
[11:12:48] nope.
[11:12:49] redis.exceptions.ConnectionError: Error 111 connecting tools-redis:6379. Connection refused.
[11:16:55] petan ^
[11:49:55] who runs labs-redis [tools.clue@anonymous.user] ?
[11:50:32] it keeps flooding rc channels by permanent (dis)connecting
[11:50:53] about every 5 secs
[11:53:36] Danny_B: see above redis is having issues
[11:54:48] Danny_B: see also https://bugzilla.wikimedia.org/show_bug.cgi?id=64150
[11:54:56] still it shouldn't flood channels :-)
[11:54:59] still it shouldn't flood channels :-)
[11:55:09] it should be killed
[11:55:14] until it's fixed
[11:55:15] I think it's cluebot
[11:55:42] or cluestuff
[11:55:44] not sure
[12:00:10] Coren, petan, ^
[12:00:31] also a930913 Damianz legoktm ^
[12:01:31] valhallasw: ops is MIA, I sent an email and have complained several times about enwiki_p being broken
[12:02:34] last edit was ~51 hours ago
[12:21:25] Erm, that's mine. I saw something was broke but quickly got distracted with less important things. The IRC bouncer is borking because of redis you say?
[12:22:17] Hi
[12:22:32] Any estimate for when the replication lag will be reduced?
[12:22:42] Catscan is still giving me old data
[12:23:10] Qcoder00: Apparently we're doomed.
[12:23:19] ?
[12:23:28] Qcoder00: DOOMED!
[12:23:35] ?
[12:23:43] a930913: well, redis is down, so that might be why it's reconnecting continuously
[12:24:02] Qcoder00: I've been trying to poke people but getting no response
[12:24:05] a930913: Now, I'm sure there's a reasonable explanation, Fraser! XD
[12:24:12] Qcoder00: Sorry, I just went from really awesome to fixing brokened.
[12:24:35] Betacommand: It is Easter Weekend
[12:24:37] :(
[12:24:42] Qcoder00: I know
[12:25:50] Betacommand: well, 2 days replag is not /that/ bad :p
[12:26:21] Betacommand: back when /I/ was young, replag was weeks!
[12:26:36] Kids today don't know they are born...
[12:26:38] valhallasw: ahah, When I was young replag was in the years
[12:26:54] (not sure what the actual numbers are, but enwiki replag was huge on the toolserver, and even smaller wikis had large replags)
[12:27:05] Betacommand: yes but when replication means copying stone tablets...
XD
[12:27:38] Qcoder00: been there done that, In fact I'll be working with some stone later today
[12:29:49] valhallasw: that's why I didn't start using the DB servers for data until ~2009
[12:30:42] Danny_B: You still here?
[12:32:16] Hi
[12:33:50] I have faced an issue at "Pages created" tool. When I try to get the outcome from this link https://tools.wmflabs.org/sigma/created.py?name=AntanO&server=enwiki&ns=,,&redirects=none it shows only 43 pages, but I have created more than 45 pages. It happened after my user rights changed from none to rollback and reviewer.
[12:34:27] anton___: were those new pages in the last few days?
[12:34:57] https://en.wikipedia.org/wiki/Manmunai_Bridge
[12:35:09] it was last page i created today
[12:35:28] Yes, Betacommand
[12:35:32] the database is about 2 days behind right now
[12:35:42] (Read: We're doomed.)
[12:35:48] see http://tools.wmflabs.org/betacommand-dev/cgi-bin/replag
[12:35:49] ok
[12:35:58] 2 days, 4:21:51
[12:36:18] anything newer than that won't show up
[12:36:50] ok. So, how to see new ones?
[12:36:54] Wait
[12:37:01] ok
[12:37:13] until the dbs are fixed nothing can be done
[12:37:33] ok, thanks for the info
[12:37:35] Danny_B: Ok, I think I fixed the IRC problem and the problem I made in fixing the IRC problem. If you can confirm. Thanks.
[12:38:05] anton___: Find an operator and bribe them to work over Easter ;)
[12:38:37] Gotta try that :)
[12:38:40] a930913: not going to happen, they are off gird
[12:38:44] *grid
[12:39:26] Betacommand: Hence the find part.
[12:39:49] a930913: If they are off the grid you're not finding them
[13:04:00] !ping
[13:04:00] !pong
[13:04:58] hi Coren
[13:05:07] Why is it only en that is lagged?
[13:05:13] The others don't look too bad
[14:08:03] !log tools tools-redis: /var is full
[14:08:06] Logged the message, Master
[14:21:31] a930913: yup, it is not reconnecting anymore
[14:27:37] !log tools tools-redis: Set role::labs::lvm::mnt and $lvm_mount_point=/var/lib, moved the data around and rebooted
[14:27:39] Logged the message, Master
[14:27:48] Could someone check if Redis is working again, please?
[14:28:09] scfc_de: seems to work
[14:28:16] k
[16:29:05] Danny_B valhallasw: Not cluebot - its redis stuff comes off a separate feed internally from the main cluebot bot, believe it's cluestuff.
[16:30:14] * Damianz notes he's not so much around today
[17:02:58] Hi. I think it is time to kill my toolserver account and only use the one on labs. What's the easiest way to download a copy of everything on toolserver to my pc just to be sure if I need something in the future?
[17:07:44] MGA73: goodbye_package and burn_bridges
[17:07:50] or whatever the scripts are called
[17:08:31] Thank you valhallasw
[17:08:35] MGA73:
[17:08:36] The script "byebye" creates a tarball to download all you have on the
[17:08:36] toolserver except for the user store. Get your user-store data
[17:08:36] separately - we don't have enough space to pack it for you.
[17:08:40] The script "burnbridges" will expire your account and *delete* all your
[17:08:40] data except for your redirects to Tool Labs. They will stay intact after
[17:08:41] the shutdown.
[17:11:34] valhallasw: Do you happen to know how to copy the files from Toolserver to PC?
[17:11:40] MGA73: winscp
[17:11:53] MGA73: or move the tarball to your public_html
[17:33:49] All gone now valhallasw :-( Will miss toolserver... Like your first car...
;-) Thank you
[20:15:32] Wikimedia Labs deployment-prep (beta): Use scap to deploy on apaches - https://bugzilla.wikimedia.org/63746#c9 (Gerrit Notification Bot) Change 123674 merged by Ori.livneh: Configure scap master and clients in beta https://gerrit.wikimedia.org/r/123674
[20:38:07] @replag
[20:38:08] Replication lag is approximately 2.12:24:16.8801790
[20:45:57] how do I mount local storage to eqiad instances? I enabled the nfs role and expected that local storage was mounted to /mnt but that did not work...
[20:47:56] I see the correct role is role::labs::lvm::mnt
[20:49:34] physikerwelt_: Exactly. Does it work for you now?
[20:49:50] yes sorry I messed that up with the nfs role
[20:50:31] scfc_de: Do you mind if I add that to the https://wikitech.wikimedia.org/wiki/Labs-vagrant page?
[20:52:01] labs vagrant requires that the /home/vagrant points to the home directory of the vagrant user
[20:52:48] my solution was to create a symlink from /home/vagrant to /mnt/vagrant-user
[20:53:41] physikerwelt_: I don't have anything to do with vagrant, so can't tell. Just add and if it's wrong someone'll fix it?
[20:54:20] ok thy
[20:54:22] thx
[21:10:46] the only problem I have is that there is no monitoring at http://ganglia.wmflabs.org/latest/?r=20min&cs=&ce=&m=load_one&c=math&h=math-preview&tab=m&vn=&hide-hf=false&mc=2&z=medium&metric_group=ALLGROUPS running puppetd -tv over and again leads to notice: /Stage[main]/Base::Puppet/Notify[instanceproject: math]/message: defined 'message' as 'instanceproject: math' notice: /Stage[main]/Ganglia_new::Monitor::Service/Service[gangl
[21:14:03] physikerwelt_: I've had the problem with a number of hosts in the Tools project that a gmond process was running, but it wasn't apparently the "right" one. So I would look for a gmond process on the host, "kill -HUP" it, rerun Puppet and see if that helps.
[21:14:34] I did a reboot...
that helped
[21:14:50] I'll try your fix on another node
[21:15:36] Yeah, reboot should work as well.
[21:18:17] oups.. on other nodes I get a different error sudo puppetd -tv Exiting; no certificate found and waitforcert is disabled
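
Note on the reconnect flooding reported around 11:50: a client was retrying tools-redis:6379 about every 5 secs while the server refused connections. A capped exponential backoff is one common way to avoid that kind of flood. Below is a minimal Python sketch of the idea. The helper names (backoff_delays, tcp_probe, wait_for_service) and the 5 s base / 300 s cap are assumptions made for illustration; none of this is taken from the bot's actual code.

```python
import socket
import time
from typing import Callable, Iterator, Optional


def backoff_delays(base: float = 5.0, factor: float = 2.0,
                   cap: float = 300.0) -> Iterator[float]:
    """Yield retry delays: base, base*factor, ... never above `cap` seconds."""
    delay = base
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


def tcp_probe(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a plain TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers "connection refused", timeouts, DNS failures
        return False


def wait_for_service(probe: Callable[[], bool], max_attempts: int = 8,
                     sleep: Callable[[float], None] = time.sleep) -> Optional[int]:
    """Call probe() until it succeeds, sleeping longer after each failure.

    Returns the 1-based attempt number that succeeded, or None if every
    attempt failed.
    """
    delays = backoff_delays()
    for attempt in range(1, max_attempts + 1):
        if probe():
            return attempt
        if attempt < max_attempts:
            sleep(next(delays))
    return None
```

A caller would wrap the probe, e.g. wait_for_service(lambda: tcp_probe("tools-redis", 6379)), and only reconnect to IRC or Redis once it returns an attempt number. With these defaults the waits grow 5, 10, 20, ... seconds up to the 300 s cap instead of staying at a fixed 5 s.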