[01:43:07] Coren: Ping
[01:44:21] or any Labs dev
[01:56:38] Coren: Ping
[01:56:40] or any labs dev
[03:09:55] Any labs devs around?
[05:12:57] Warning: There is 1 user waiting for shell: Wangxuan8331800 (waiting 0 minutes)
[07:54:28] Warning: There are 2 users waiting for shell, displaying last 2: Wangxuan8331800 (waiting 161 minutes) Muthumani2 (waiting 8 minutes)
[09:09:03] !rq Muthumani2
[09:09:03] https://wikitech.wikimedia.org/wiki/Shell_Request/Muthumani2?action=edit https://wikitech.wikimedia.org/wiki/User_talk:Muthumani2?action=edit&section=new&preload=Template:ShellGranted https://wikitech.wikimedia.org/wiki/Special:UserRights/Muthumani2
[09:34:08] /msg ChanServ DROP ##otr 634bf508:eea2518c
[09:34:27] o.O
[09:34:33] Hi Petan
[09:34:37] heya
[09:34:51] Any chance of migrating Catscan to Labs
[09:34:57] I use it a LOT
[09:36:21] why not
[09:36:24] do that? :P
[09:38:09] I lack the technical comptecne
[09:38:55] Qcoder00: who runs catscan?
[09:39:22] Magnus Manske
[09:39:43] Whose also involed in the Migration process already
[09:40:35] oh he should get to it soon I suppose
[09:40:39] IIRC he loves playing with new tech?
[09:43:03] !log tools petrb: syncing packages on exec nodes to avoid troubles with missing libs on some etc
[09:43:08] Logged the message, Master
[10:38:38] valhallasw: !
[10:42:17] addshore: see -bag
[11:17:32] addshore: 'sup
[11:21:13] :>
[11:21:19] free much today? :P
[11:21:55] no, not really :-)
[11:22:04] :<
[11:22:06] oh well :P
[11:22:11] but I'm available for short questions :-)
[11:30:05] valhallasw: was thinking of getting the dump scanner working :P
[11:37:54] Coren: http://tools.wmflabs.org/admin/packages.html I synced all packages on exec nodes
[11:38:01] except for grub and kernel
[11:38:17] valhallasw: ? :<
[11:43:30] Ill put the json here for you too see ;p
[11:43:30] {"dump":"enwiki","nsinclude":["0"],"titlecontains":"","titlenotcontains":"\/","textcontains":"\\{\\{orphan(s|article)?\\}\\}","textnotcontains":"","textregex":"true"}
[11:55:58] addshore: do you think you could get the filtering to work? not necessarily including reading the json, but just reading every step in the dump?
[12:01:19] not sure what you mean :p
[12:07:50] BTW is there a tools reqwuest feature?
[12:13:01] Qcoder00 access request feature :P
[12:16:25] OK
[12:22:41] legoktm ping
[12:34:39] @seenrx yuvip
[12:34:40] petan: Last time I saw yuvipand1 they were quitting the network with reason: no reason was given at 10/31/2012 7:04:47 PM (214.17:29:51.9045920 ago) (multiple results were found: yuvipanda}brb, yuvipsnds, yuvipamda, yuvipanda__, yuvipanda___)
[12:34:52] @seenrx yuvipanda
[12:34:52] petan: Last time I saw yuvipanda}brb they were changing the nickname to , but is no longer in channel #wikimedia-office at 12/5/2012 6:31:33 PM (179.18:03:18.7644100 ago) (multiple results were found: yuvipanda__, yuvipanda___)
[12:35:02] o.O
[12:35:25] @seenrx panda
[12:35:25] petan: Last time I saw yuvipanda}brb they were changing the nickname to , but is no longer in channel #wikimedia-office at 12/5/2012 6:31:33 PM (179.18:03:51.5630360 ago) (multiple results were found: Yuvipandan, pandalch, yuvipanda__, yuvipanda___)
[12:35:35] wtf
[12:35:57] @seen yuvipanda
[12:35:57] petan: Last time I saw yuvipanda they were changing the nickname to zz_YuviPanda and zz_YuviPanda is still in the channel #pywikipediabot at 6/3/2013 11:00:42 AM (01:35:15.2618870 ago)
[12:36:04] here we go
[12:36:09] @seenrx YuviPanda
[12:36:10] petan: Last time I saw YuviPanda they were changing the nickname to zz_YuviPanda and zz_YuviPanda is still in the channel #pywikipediabot at 6/3/2013 11:00:42 AM (01:35:27.5819830 ago) (multiple results were found: YuviPanda_, YuviPanda|Prep, YuviPanda|Storm, YuviPanda|sleeee, YuviPanda|sleep and 1 more results)
[12:36:14] aha
[12:36:21] case sensitive
[13:06:45] addshore: the function that determines whether a page fits the criteria
[13:07:06] basically something that says "if page.namespace in " etc
[13:07:35] hi manybubbles
[13:07:56] hi!
[13:08:52] manybubbles: welcome to the Foundation!
[13:11:57] sumanah: thanks! I'm happy to be here!
[13:12:19] I have so much to read today!
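The "function that determines whether a page fits the criteria" that valhallasw describes can be sketched from the JSON pasted earlier. This is only a sketch: the field semantics are inferred from the field names in that JSON, and the `page_matches` function is hypothetical, not the dump scanner's actual code.

```python
import re

# Field names taken from the pasted JSON; "dump" is omitted because it
# selects which dump file to read, not which pages match.
CRITERIA = {
    "nsinclude": ["0"],
    "titlecontains": "",
    "titlenotcontains": "/",
    "textcontains": r"\{\{orphan(s|article)?\}\}",
    "textnotcontains": "",
    "textregex": "true",
}

def page_matches(namespace, title, text, criteria=CRITERIA):
    """Decide whether a single dump page fits the filtering criteria."""
    if criteria["nsinclude"] and str(namespace) not in criteria["nsinclude"]:
        return False
    if criteria["titlecontains"] and criteria["titlecontains"] not in title:
        return False
    if criteria["titlenotcontains"] and criteria["titlenotcontains"] in title:
        return False
    if criteria["textcontains"]:
        if criteria["textregex"] == "true":
            # textcontains is interpreted as a regular expression
            if not re.search(criteria["textcontains"], text):
                return False
        elif criteria["textcontains"] not in text:
            return False
    if criteria["textnotcontains"] and criteria["textnotcontains"] in text:
        return False
    return True
```

With the pasted criteria this keeps mainspace pages whose title has no "/" and whose text matches the {{orphan}}-family template regex.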
[13:12:41] and so much stuff to save for when my laptop comes
[13:12:51] :) There is a long but exciting ramp-up period.
[13:18:18] yeah - I started with two wiki pages and by the time I'd gone through them I had 17 other pages waiting for me to read. not to mention the handbook.
[13:19:19] yep!
[13:20:54] So when does the lab "Turn ON THE MAIN REACTOR!"
[13:23:56] Qcoder00 we already did :P
[13:24:21] Oh right
[13:32:33] Qcoder00: list of tools currently running on labs
[13:32:34] http://tools.wmflabs.org/
[13:34:07] btw Qcoder00 isn't this what you are looking for http://tools.wmflabs.org/catscan2/catscan2.php
[13:34:18] Thanks
[13:35:32] whats the interwiki for labds ?
[13:35:37] ?
[13:36:06] as in w: is wikipedia s: WQikisource etc...
[13:36:30] none I know of
[13:36:46] Hmm
[13:47:08] I'd like to have wikipulse running on tool labs: https://github.com/edsu/wikipulse... Is this possible?
[13:48:25] lbenedix depends
[13:48:32] how complex you think it is? :o
[13:48:58] it looks to me like some kind of server application, which probably should live on separate labs project
[13:51:20] !log bots petrb: killing gluster daemon on bots-bnr1 it eats 6gb of ram
[13:51:22] Logged the message, Master
[14:04:02] Qcoder00: Now witness the firepower of this fully armed and operational Tool Labs!
[14:08:28] Coren: You've not built an 'Ultimate Killer Invincible Doomeday Bot' have you?
[14:08:30] XD
[14:09:08] No, but we do have a door over our thermal exhaust ports. :-)
[14:10:19] XD
[14:10:38] "Tool LAbs - Max-i-mum Po-wer!!"
[14:10:40] XD
[14:10:52] Coren: can we do cross db joins yet?
[14:11:37] !log tools petrb: removing /etc/logrotate.d/glusterlogs on all servers to fix logrotate daemon
[14:11:39] Logged the message, Master
[14:11:56] Coren, i had the catgraph test wikis running over the weekend, and everything went smooth so far
[14:11:59] http://tools-dev.wmflabs.org:8090/list-graphs
[14:12:11] no enwiki yet - too large
[14:12:13] Betacommand: Yes, albeit not between wikis and commons or wikidata yet (well, unless they are on the commons shard)
[14:12:23] JohannesK_WMDE: Yeay!
[14:12:24] so i guess it is time to start testing with a separate vm
[14:12:41] Betacommand: Cross wiki ?
[14:12:42] Coren: then tools not at max :P
[14:12:58] As in things like checking for images on Commons stil on enwiki and so on?
[14:13:15] Coren do we need popularity contest?
[14:13:27] petan: Why in blazes for?
[14:13:30] it is borked anyway and produces annoying error logs
[14:13:56] I wanted to remove it :>
[14:13:57] petan: I'd say "No."
[14:14:01] petan: wikipulse is written in nodejs and needs redis
[14:14:20] does it mean "no I like it don't remove it pls" or "no I don't care you can remove it"
[14:14:47] lbenedix I still don't know how it works
[14:14:56] lbenedix it seems to me that it is a server that listen on some port
[14:15:05] lbenedix that doesn't sound like a tool nor bot :P
[14:15:14] hence it should be in own project
[14:15:23] petan: It means I have no use for it.
[14:15:36] its watching the irc-channels and returns the number of edits per time
[14:15:38] !log tools petrb: removing popularity contest
[14:15:39] Logged the message, Master
[14:16:12] Coren so i guess it is time to start testing with a separate vm
[14:16:39] can you create one Coren? i can do it if you tell me how ;)
[14:16:52] JohannesK_WMDE: It may be. Do you think we can have a chat about it in ~45 mins when I have time to dedicate to you?
[14:17:07] ok
[14:18:16] right now it runs on heroku... here is the number of edits/min for wikidata: http://wikipulse.herokuapp.com/stats/wikidata-wikipedia/60000.json
[14:19:46] Coren if I delete a mail from my inbox, is it deleted from your inbox as well? I mean, is root@tools.wmflabs.org shared inbox?
[14:19:51] or it's a list?
[14:20:11] root is an alias that explode to all members of local-admin
[14:20:20] ok
[14:20:27] So you can delete yours without concerns.
[14:20:53] Coren, how's the researcher flag coming along?
[14:21:50] Cyberpower678: No any faster than it was less than half a non-work day ago when you last asked. :-P It wont be for at least a week or two, at the very least.
[14:22:32] Coren, I didn't ask yesterday.
[14:22:51] I'll ask next week then. :p
[14:25:30] Cyberpower678: Ah, no sorry you're right -- that was TParis.
[14:26:09] Are we there?
[14:26:14] no.
[14:26:21] How about now?
[14:26:25] Coren if someone needed to raise the limit for jobs, how can I do that?
[14:27:06] petan: You should be, but I'd need a pretty darn good justification to go over 16.
[14:27:37] I said "how can I do that" not "can I do that" :P
[14:31:51] petan: You're a gridengine manager, so you can edit most config.
[14:32:24] ok, but in which specific config is this set? and how to change per user? I couldn't find it :(
[14:32:32] also is there some good documentation page for grid?
[14:42:21] petan: Not really, though Oracle tends to have fair documentation on SGE 99% of which is applicable here.
[14:50:50] is php-curl installed on tool labs?
[14:53:32] php5-curl
[14:53:38] that is
[14:58:22] http://commons.wikimedia.org/wiki/Commons:Village_pump#File_moves_and_inactive_Commons_Delinker - Erm Has delinker migrasted?
[15:23:16] become Coren ?
[15:23:22] urgh
[15:23:26] Coren, ?
[15:23:29] you there?
[15:23:40] Yes?
[15:23:57] Coren, what's up with wikidatawiki_p?
[15:24:05] It seems to be incomplete.
[15:24:31] It is, there are odd things with its schema that need manual handling. on my TODO for today.
[15:24:46] YES!
[15:25:21] The edit counter seems to be handling it well but keeps complaining of missing tables.
[15:36:30] MaxSem: Hi
[15:37:36] May I ask what the status of maps-ceph1 is?
[15:38:15] are you managing that through puppet, or just installing things straight out, as it is for test purposes only?
[15:40:51] Coren, found another broken database. Reported by a user.
[15:40:58] rowiki_p
[15:41:10] is incomplete as well and has empty tables.
[15:41:27] MariaDB [rowiki_p]> SELECT * FROM user;
[15:41:27] Empty set (0.03 sec)
[15:45:02] The actual replication for this one seems to have failed.
[15:45:43] Cyberpower678: Will report to Asher.
[15:45:59] Coren, thank you.
[15:50:54] hi apmon
[15:51:06] it's not managed by puppet
[15:51:39] Do you mind if I just edit the ceph config then to get it up and running?
[15:51:48] no objections from me
[15:52:16] MaxSem: are you seriously setting up ceph?
[15:52:32] I seriously attempted to:)
[15:53:07] I'd like to try and see if I can get a proof of principle running for next weeks SotM-US, in case scaling of renderd comes up in discussions.
[15:55:10] paravoid: It was pretty easy to get something / ceph running on my laptop, so it shouldn't be too difficult to set it up on labs either?
[16:02:25] apmon: It shouldn't be.
[16:02:31] legoktm: Around?
[16:03:26] Warning: There is 1 user waiting for shell: Alvaro Lopez Tijeras (waiting 0 minutes)
[16:54:56] Coren, petan: hi
[16:55:25] legoktm: Heya
[16:55:31] legoktm have a look at /shared/memtest
[16:55:41] legoktm there is a python script that works with new memcached
[16:55:45] yuvipanda made it
[16:55:53] ?
[16:56:00] that memtest
[16:56:03] it works with it
[16:56:04] :D
[16:56:07] nice :D
[16:56:11] legoktm script does not
[16:56:19] so maybe you can inspire by it
[16:56:29] legoktm: Do you think you could spend a moment to revise your memory allocations? You seem to have been... overly generous. Like ufaa at 196M/2G
[16:57:11] And the pywikibot stuff at 129M/2G and 192M/2G (peak 319M)
[16:57:20] Coren: Ah sorry. Yeah sure.
[16:57:42] I was testing it but never reduced it once I finished testing
[16:57:53] legoktm: It's not critical, but it would be the Right Thing to do. :-)
[16:58:33] is there an easy way to see the peak memory a job is using/used?
[17:02:31] legoktm: It's shown separately if it's different from the current high on the status page.
[17:02:40] legoktm: http://tools.wmflabs.org/?status
[17:03:46] ah thanks. lemme try and reduce it
[17:03:55] legoktm: It also shows as maxvmem on the usage line of a 'qstat -j '
[17:03:59] ok
[17:04:09] lol VMEM: 760M/762M
[17:04:18] this is what I call optimization XD
[17:04:56] petan: Actually, a well behaved program can run indefinitely at its limit if it has proper error handling for memory allocation.
[17:05:32] in that case my bot is well behaving
[17:05:46] Java is a good example, it's rather the glutton but once it allocated its arena it won't bust beyond it.
[17:14:33] New patchset: coren; "Tool Labs: Grant select on %_p on local DB as well" [labs/toollabs] (master) - https://gerrit.wikimedia.org/r/66563
[17:15:15] Change merged: coren; [labs/toollabs] (master) - https://gerrit.wikimedia.org/r/66563
[17:15:38] Oh yeah Coren, is there an eta for the archive table being available? and is there a bug tracking it?
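Coren's point that "a well behaved program can run indefinitely at its limit if it has proper error handling for memory allocation" can be illustrated with a small sketch. The 1 GiB cap below is an invented figure standing in for a gridengine-style per-job memory limit, not the actual Tool Labs configuration.

```python
import resource

def allocate_safely(n_bytes):
    """Attempt an allocation; return None instead of crashing on MemoryError."""
    try:
        return bytearray(n_bytes)
    except MemoryError:
        # A well-behaved job would log this, shed caches, retry smaller, etc.
        return None

# Illustrative only: cap this process's address space at ~1 GiB, similar in
# spirit to a per-job vmem limit on the grid (the figure is made up here).
_soft, _hard = resource.getrlimit(resource.RLIMIT_AS)
_limit = 1 << 30
if _hard != resource.RLIM_INFINITY:
    _limit = min(_limit, _hard)
resource.setrlimit(resource.RLIMIT_AS, (_limit, _hard))
```

Under the cap, a small allocation still succeeds while an absurdly large one fails cleanly instead of killing the job.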
[17:16:35] legoktm: I don't think there's a bug tracking it, and it's a couple weeks before I have a definitive answer.
[17:17:40] Hm ok. I'll go file one then :)
[17:18:50] Out of curiosity, is it a legal issue or a technical one thats holding it back?
[17:19:29] legoktm: I think legal.
[17:21:32] legoktm: Legal.
[17:21:37] :/
[17:21:43] thanks though
[17:22:18] I hope they say yes. Otherwise you can kiss the deleted edits count goodbye.
[17:23:01] legoktm: I can tell you offhand that, if it is going to be okayed at all, it will be on a per-case basis and likely require approval with a process similar to that of getting the researcher right.
[17:23:23] CP678|iPad: I still don't get why you think that. Edit count - not deleted edits = deleted edits.
[17:23:40] RIP Toolserver :(
[17:23:58] TS went down?
[17:24:23] Coren, deleted edits = access to archive
[17:24:44] edit count = revision
[17:24:55] Oh no lol, I was just making a bad joke
[17:28:04] Filed https://bugzilla.wikimedia.org/show_bug.cgi?id=49088
[17:28:12] bbl lunch
[17:28:13] legoktm, I saw/
[17:28:18] You CC'd me.
[17:28:25] Cyberpower678: Nope. select user_name, user_editcount-(select count(rev_id) from revision_userindex where rev_user_text='CorenSearchBot') as deleted_edits from user where user_name='CorenSearchBot';
[17:28:55] user_name: CorenSearchBot
[17:28:56] deleted_edits: 42701
[17:29:02] See?
[17:29:34] And, honestly, if you're doing an edit counter you already have the counts of undeleted edits handy. :-)
[17:30:00] Hey guys. I'm looking to get a public IP address for a project I'm managing. Can anyone help me out?
[17:30:17] halfak_: sure. what project and what do you need it for?
[17:30:48] "snuggle". I'm working on building an ecosystem of tools like this: snuggle.instance-proxy.wmflabs.org. I'm just starting to gain users.
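Coren's workaround above is plain arithmetic: subtract the surviving revisions (counted from `revision_userindex`) from the user table's cached total. A minimal sketch of that arithmetic, with the caveat that `user_editcount` is a cached counter that can drift out of sync with the revision table, which is why the same query later in the log yields a negative number for another account:

```python
def estimated_deleted_edits(cached_editcount, live_revisions):
    """Estimate deleted edits as the cached user_editcount minus the
    surviving rows counted from revision_userindex.

    Because user_editcount is cached rather than recomputed, a stale
    counter makes the estimate wrong -- it can even go negative.
    """
    return cached_editcount - live_revisions
```

For an in-sync counter of 120 total edits with 100 surviving revisions, the estimate is 20 deleted edits; a counter lower than the live count produces a nonsense negative value.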
[17:31:20] So, it will be a web based wiki tool.
[17:31:38] CP678|iPad: Like I said, if it is possible at all, it will require researcher-like access. It's not going to be by default, and it will require a case-by-case okay.
[17:32:09] halfak_: were you looking to have a better dns name?
[17:32:14] Ok. Who would be the ones to okay it?
[17:32:30] Yes. That too.
[17:32:43] I was hoping to set up a few dns names eventually.
[17:32:50] oj
[17:32:52] err
[17:32:53] ok
[17:33:06] snuggle as in STiki and Huggle?
[17:33:30] :) Similar kind of tool, yes. However, this one is to find good newcomers, not revert the bad ones.
[17:34:05] halfak_, so you're promoting vandalism?
[17:34:13] no
[17:34:15] what?
[17:34:24] not revert the bad ones?
[17:34:37] :\ The tool is for finding good newcomers.
[17:34:46] Huggle and STiki are for reverting bad newcomers.
[17:34:56] halfak_: upped your floating IP quota to 1
[17:35:23] we have good newcomers? I thought only well established editors were ok? ;)
[17:35:28] CP678|iPad: I don't know; that's the kind of thing Legal will have to decide. I would expect it'll end up similar to http://en.wikipedia.org/wiki/Wikipedia_talk:FAQ/Research
[17:35:45] Excellent. Do I just allocate an IP now?
[17:35:56] halfak_: I like the idea of a tool that finds good newcomers, rather than just reverts people :)
[17:35:59] halfak_: yep
[17:36:24] Awesome. Thanks for your help! I'm back to hacking.
[17:36:34] halfak_: hm
[17:36:45] it looks like this takes the wiki user's username and password?
[17:37:01] Yes. This is something I struggled with. Is there a better way?
[17:37:16] not yet, but soon, yes
[17:37:28] that said, you're not allowed to do this via the labs ToS
[17:37:29] As soon as it's ready, I'll be on board.
[17:37:34] :\
[17:37:41] That's a pretty big deal breaker.
[17:37:49] you need to wait till oAuth or OpenID are available
[17:37:56] halfak_: The alternative, OTOH, will be okay
[17:37:57] those are allowed via the ToS
[17:38:14] So Huggle couldn't run on Labs?
[17:38:34] not unless it uses OAuth, or OpenID for authentication
[17:38:53] OAuth is targetted for this quarter
[17:38:57] OpenID should as well
[17:39:01] *should be
[17:39:08] Wow. That's not OK. I guess I'll need to move off of labs then.
[17:39:08] halfak_: Huggle is a bad comparison, though, because it works on the /user's/ equipment
[17:39:09] *sigh*
[17:39:15] halfak_: this isn't just a labs thing
[17:39:46] halfak_: if we see applications asking for credentials for users, they'll likely be asked to stop
[17:39:51] Coren, Huggle has a server backend. Most everything for Snuggle exists in the user's browser.
[17:39:54] anywhere on the internet
[17:40:09] What do you propose that I do, Ryan?
[17:40:31] does this app do anything on a user's behalf?
[17:40:40] like edit, or give barnstars, etc?
[17:40:44] Only what the user commands it to do.
[17:40:47] * Ryan_Lane nods
[17:40:55] you'll need to add OAuth support
[17:41:06] But there's no OAuth available.
[17:41:11] halfak_: Nevertheless, the idea is that the credentials live on a server that is not under the user's control. That's a very big no-no.
[17:41:17] hm. no csteipp in the channel
[17:41:49] Coren, the credentials are not stored on the server.
[17:41:49] halfak_: I'm asking someone to join the channel
[17:42:03] Ryan_Lane: what's up?
[17:42:16] csteipp: halfak_ is a great guinea pig user of OAuth: http://snuggle.instance-proxy.wmflabs.org/
[17:42:25] halfak_: Wait, the edit doesn't come from the tool, but from the *end user's* client?
[17:42:59] Oh the edit comes from the tool. The user shares cookies with the server to make the edit.
[17:43:09] The server can see the cookies which act like an auth token.
[17:43:35] halfak_: Which domain's cookies are shared?
[17:44:01] Coren, not a viable otpion
[17:44:03] halfak_: snuggle uses the username/password combo to log into wikipedia, then stores the cookie, correct?
[17:44:06] Sorry about that.
[17:44:14] Yes Ryan, that's right.
[17:44:37] Cyberpower678: What isn't? For the edit counter, you mean?
[17:44:39] the issue here is that the username/password combo are passing through snuggle
[17:44:39] yes.
[17:44:51] because snuggle could use it to steal project credentials
[17:44:57] Ah, yes. This is coming back to me :)
[17:45:02] Yes. Technically, it could.
[17:45:08] Cyberpower678: Could you elaborate?
[17:45:13] csteipp: this is the original use-case I was mentioning months ago :)
[17:45:13] Yep, so yeah, this is perfect use for oauth
[17:45:41] scfc_de, I'm talking about retrieving deleted edits.
[17:45:47] Is Oauth ready? I was hoping to release version 1.0 today.
[17:45:55] halfak_: for this to continue operating right now we're going to need to get legal onboard with this
[17:45:58] Cyberpower678: I'm afraid that, short of a researcher-like okay, archive is not going to be generally available. TS should never have had it in the first place, and any tool that relies on it will have to either get blessing or find some other MO
[17:46:16] halfak_: No, not yet. We're targeting to have at least an initial version out by the end of the month... but it's not up yet
[17:46:19] halfak_: at mininum, anyone with root level access to the snuggle project will need to sign an NDA
[17:46:34] OK. That's cool because it is me and I have.
[17:46:45] I'll not allow any others into the project until this is resolved.
[17:46:54] Coren, which is why I will be patiently waiting to hear from legal. :-)
[17:46:55] ok. let me get legal onboard otherwise, so they know what's up
[17:47:15] we can likely make an exception to the policy for this until OAuth is ready
[17:47:17] What's the project?
[17:47:18] Cyberpower678: Why doesn't Coren's query work for you?
[17:47:20] OK. Thanks for your help Ryan.
[17:47:44] scfc_de: There's a bug in it. :-)
[17:47:52] Incoming little flood.
[17:47:56] scfc_de,
[17:47:58] MariaDB [enwiki_p]> select user_name, user_editcount-(select count(rev_id) from revision_userindex where rev_user_text='MBisanz') as deleted_edits from user where user_name='MBisanz';
[17:47:59] +-----------+---------------+
[17:47:59] | user_name | deleted_edits |
[17:47:59] +-----------+---------------+
[17:47:59] | MBisanz   | -16364        |
[17:48:01] +-----------+---------------+
[17:48:24] scfc_de, that's why
[17:49:21] * Coren tries to figure out how that is possible.
[17:50:31] Cyberpower678: Thanks for explaining. I think it should be possible to provide a view/snapshot of archive that only contains the aggregated information that is needed for editcounters.
[17:51:13] Coren: http://www.mediawiki.org/wiki/Manual:User_table#user_editcount
[17:51:51] scfc_de: Ah, I was aware, but not that it could deviate /that/ much.
[17:52:45] So someone just needs to run initEditCount.php on the cluster. Can't take long :-).
[17:53:42] Actually, no, initEditCount.php only counts revision, not archive.
[17:54:34] Ryan_Lane: Do you know if new files are added atomically to /public/datasets/public? Otherwise, who's the one to ask?
[17:55:07] As long as I get an accurat AND realistic number.
[17:56:34] scfc_de: I doubt they are added atomically
[17:56:41] CP678|iPad: Actually, that should really be pushed to the API
[17:56:59] scfc_de: afaik they are copied directly into place
[17:57:14] maybe it's using rsync and doing a move to the permanent spot?
[17:57:22] That would mean hundreds of API queries.
[17:57:27] ariel would be the person to ask
[17:57:36] apergos on irc
[17:57:44] Ryan_Lane: There's some rsync logs there, and rsync does atomic I believe. I look for Ariel, thanks.
[17:57:51] Cyberpower678: FYI: The bug with rowiki is that rowiki is on S7 and thus does not exist yet. :-)
[17:58:04] Ok.
[17:59:13] CP678|iPad: How do you figure? It'd just be a query/auprop
[18:00:51] Moving everything to the API for hundreds of user, means hundreds of queries.
[18:01:32] CP678|iPad: The API has generators and often you can specify more than one argument.
[18:02:33] Crap. The siren in my area is going off.
[18:06:29] Really?!?
[18:06:44] CP678|iPad: are you safe?
[18:06:55] A thunderstorm is going through my area and they decide to do the siren tests NOW?
[18:07:00] WTF?
[18:07:42] stay safe, CP678|iPad
[18:07:45] maybe it isn't a test
[18:08:00] The siren is off now.
[18:09:08] I plan to stay safe though.
[18:18:56] csteipp: How can I make sure that I get on OAuth at the earliest opportunity? I'm interesting in being a tester if you think that would be helpful.
[18:19:46] halfak_: Definitely!
[18:21:42] csteipp: Where can I learn more about progress and ways to help out?
[18:25:38] halfak_: https://www.mediawiki.org/wiki/Auth_systems/OAuth
[18:25:42] I'm keeping that pretty up to da
[18:25:45] date
[18:26:22] *watched*
[18:40:25] Sigh, read-only filesystem
[18:40:34] Ryan_Lane: IIRC this means gluster needs a kick?
[18:44:01] marktraceur: yep. which project?
[18:44:05] orgchart
[18:44:13] maybe after I get these OSM changes in I'll work on moving stuff to nfs
[18:44:25] marktraceur: home or /data/project?
[18:44:32] /data/project
[18:44:36] ok. one sec
[18:45:05] Ryan_Lane: Speaking of, are we going to start moving things to NFS server gradually, or will you be planning an all-at-once outage?
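The "rsync does atomic" remark refers to the usual publish pattern: write the incoming file under a temporary name in the target directory, then rename it into place; `rename()` within one filesystem is atomic on POSIX, so readers never observe a half-copied file. A minimal sketch of that pattern (paths and the helper name are illustrative, not the datasets pipeline's actual code):

```python
import os
import tempfile

def publish_atomically(path, data):
    """Write data to a temp file in the target directory, then rename it
    into place. Readers see either the old file or the complete new one,
    never a partial write, because rename() is atomic within a filesystem."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".incoming-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes hit disk first
        os.replace(tmp, path)     # the atomic step
    except BaseException:
        os.unlink(tmp)            # clean up the temp file on failure
        raise
```

The temp file must live on the same filesystem as the destination; a rename across mount points falls back to copy-and-delete, which is not atomic.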
[18:45:19] all-at-once for the non-critical projects
[18:46:02] Ryan_Lane: FYI, deployment-prep has done the puppet class dance with no issues, so it's known to work right.
[18:46:09] yep
[18:46:21] the issue with the puppet class is puppetmaster::self
[18:46:26] Ryan_Lane: That said, I'd be more comfortable with having the spare server's ram replaced before we moved too many other things.
[18:46:40] that *still* isn't done!?
[18:46:43] -_-
[18:47:15] https://rt.wikimedia.org/Ticket/Display.html?id=4939
[18:47:16] Nope.
[18:47:30] Perhaps we should escalate this?
[18:48:31] yep
[18:53:29] Ryan_Lane: Any luck?
[18:58:37] marktraceur: sorry. got distracted. one sec
[18:59:38] Mmkay.
[18:59:38] marktraceur: ok. it may be fixed. which instance?
[18:59:48] Er, there should be only one
[18:59:50] orgchart
[18:59:56] heh. ok
[19:00:40] Seems good
[19:00:46] cool
[19:22:51] Is there any way to use an external program to edit a tool file?
[19:25:33] a930913: what do you mean?
[19:25:57] Betacommand: Well you need to "become" a tool.
[19:26:13] So even if notepad++ or equivalent worked with keys...
[19:31:20] a930913: WinSCP
[19:31:41] edit the files on your local machine and copy them to labs
[19:32:33] Betacommand: How does that get to /data/project/*?
[19:32:53] Betacommand a930913: WinSCP
[19:33:14] Windows Secure Copy
[19:35:10] * addshore uses WinSCP also
[19:35:17] * Coren notes that, if one really insisted, one could use gvim or something -- I do allow X forwarding. :-)
[19:35:38] sshfs might also work.
[19:35:55] X forwarding? :D
[19:35:57] hehe
[19:40:17] It doesn't like my key.
[19:42:00] Coren: WinSSHFS didn't lke my key either.
[19:42:38] I might sshfs from my server, and then sshfs via that...
[19:42:39] a930913: You probably need help from someone who has more windows expertise than I there.
[19:43:00] WinSSHFS ?
[19:45:01] I'm working on setting up HTTPS through a public IP address with apache. I've configured the instance for apache2 and the certs. Where do I find the relevant ".crt" file.
[19:45:12] ^^ Anyone with SSL on labs experience
[19:51:40] halfak_: well, there's no "real" cert/key to be used by default
[19:52:19] It's a wild-card for *.wmflabs.org, right?
[19:52:29] yeah, and it's a self-signed
[19:52:37] but it needs to be installed via puppet
[19:52:54] there's a puppet class that can be used
[19:52:57] Does that mean that browsers get a warning?
[19:53:02] yep
[19:53:37] Oh. Hmm... So, I configured the two "certificates" options. Is there anything else I need to do to have the certs available?
[20:07:24] Betacommand: So I got WinSCP working, but how can I edit tools?
[20:07:54] a930913: You should have write access to /data/project/thetool
[20:09:36] Coren: Thought as much, but as to how, is beyond my devop capabilities.
[20:10:13] I don't know how winscp works, but I expect you can type a path in somewhere. :-)
[20:10:25] Or at the very least follow .. out of your /home :-)
[20:10:32] Ryan_Lane: Do you have time now to continue talking about ssl certs?
[20:10:40] Coren: As in the group permissions stuff goes over my head.
[20:11:14] halfak_: yeah
[20:11:35] halfak_: so, the wildcard will throw browser warnings, it's not meant to be used for real things
[20:11:58] Gotcha. What would you recommend if I don't want to cause warnings?
[20:12:19] well, a real certificate would need to be purchased
[20:12:28] Gotcha. Is that something I should look into myself?
[20:12:40] you can't buy one for wmflabs.org
[20:13:12] you could register your own domain and buy a cert for that. or we'll have to figure out internally how to handle this
[20:13:30] Sure. I'm looking for your recommendation on how I should proceed.
[20:13:44] Is this an unsolved problem? (A tool that needs ssl and has users)
[20:14:24] unless you want to use tools.wmflabs.org/snuggle
[20:14:30] that one project has a real cert
[20:14:44] Oh cool. How did they do it?
[20:14:49] but you'd also need to move your stuff into the tools project
[20:15:06] the tools project is run by foundation staff and volunteers who have signed NDAs and we bought a cert
[20:15:06] Coren: How do I give myself permissions to write to my tool?
[20:15:40] a better solution for this would be for us to have a generic ssl proxy for labs
[20:15:43] a930913: You should have that by default. What is your tool's name?
[20:15:43] Gotcha. Well, there's actually already been a cert purchased for snuggle.wikimedia.org.
[20:15:52] * Ryan_Lane nods
[20:15:59] we can likely buy one for this project too
[20:16:20] Coren: a930913 -> cluestuff
[20:16:24] we need a transparent proxy that can allow self-service backend creation
[20:16:58] "give me a snuggle.wmflabs.org host that points to snuggle.pmtpa.wmflabs:80"
[20:17:20] and it would make snuggle.wmflabs.org:443 and snuggle.wmflabs.org:80 -> snuggle.pmtpa.wmflabs:80
[20:17:50] a930913: You have write permission there. I've heard rumors that winscp is trying to be too smart by default but that there is an option to "defer permission checks" that tells it to believe the server's rather than its own idea of what you are allowed to write to -- that might be your problem.
[20:18:08] What would it take to set up such a service?
[20:18:28] time and engineering work
[20:20:22] Ok... So what should I do in the meantime?
[20:21:23] probably not offer ssl until we can figure it out
[20:21:30] Bummer.
[20:21:42] probably would have been a good idea to bring these things up before launch ;)
[20:21:48] Say, I was told (forget by who) that instance proxy could handle ssl. is that wrong?
[20:21:54] can it?
[20:21:55] Oh I did.
[20:22:05] I asked about instance proxy and was told it would just work.
[20:22:55] I can grep the channel logs, but that seems unproductive.
[20:23:11] hm. wikitech is having issues?
[20:24:03] weird. it's back. I wonder what happened there
[20:27:25] Oh... I just re-read and it appears that Coren was talking about tools having https support.
[20:28:05] What is the snuggle project?
[20:28:09] Ryan_Lane, is it possible for me to have a labs instance (with self-hosted puppet) that doesn't include simple-mail-sender?
[20:28:23] http://en.wikipedia.org/wiki/WP:Snuggle
[20:29:01] andrewbogott: yep. you can override the classes that are being set in ldap via site.pp
[20:29:17] halfak_: I think it's probably pretty easy for us to add this to the instance proxy
[20:29:21] we just need to get an ssl cert for it
[20:29:48] Ryan_Lane: I removed it from class standard { in /etc/puppet/manifests/site.pp but it seems to still be installed.
[20:29:49] but you'll need to use the instance-proxy url and not your custom hostname
[20:29:55] Why not just get a signed wildcard for *.wmflabs.org?
[20:30:02] andrewbogott: I mean by adding a node
[20:30:18] halfak_: we could do that as well, but that's somewhat terrifying
[20:30:24] Instance proxy is cool with me in the short term.
[20:31:00] Ryan_Lane: much less terrifying than no ssl.
[20:31:11] But you're right.
[20:31:27] I'd be happy to fund a cert for snuggle.wmflabs.org if it would help.
[20:33:50] Is it not possible to have a certificate capable of being used to sign other certs for subdomains of wmflabs.org?
[20:34:06] Ryan_Lane, does site.pp inherently override the ldap node definition if present?
[20:34:17] Krenair: hm. maybe?
[20:34:21] andrewbogott: yep
[20:41:25] * Coren hunts down whoever is getting the -login load to the high 100s with trout in hand.
[20:41:58] Cyberpower678: That'd be you.
[20:43:13] Huh?
[20:43:22] Coren: ah, that explains why logging in takes ages :-) [20:44:00] Cyberpower678: you weren't using nano, were you? [20:44:09] Wow: load average: 254.96, 195.77, 108.93 :-). [20:44:23] I thought that's been fixed. Everything gets sent to jsub. [20:44:41] Cyberpower678: Apparently, all at once, and without little success. [20:44:47] With little success. [20:45:09] pm me what you're saying? [20:45:14] *seeing [20:45:21] Coren: It appears I'm unable to ssh to tools-login.wmflabs.org (I see the MOTD, but prompt does not appear) [20:45:23] any known issue? [20:45:50] Krinkle: load average: 284.42, 221.12, 126.37 [20:46:00] beta labs doesn't answer right now as well? [20:46:02] That doesn [20:46:06] That doesn't mean anything to me [20:46:12] units? of what? [20:46:28] Krinkle: On it [20:46:33] OK [20:46:48] Krinkle: units of full CPU usage [20:46:51] Krinkle: Processes waiting to be executed. [20:46:52] (but not quite) [20:46:59] scfc_de try tools-dev [20:47:26] Cyberpower678: wtf? [20:47:29] Actually, the problem seems to be NFS related. It looks like I've lost contact with it -- but it's up and running and without load. [20:47:33] Coren, what's running under my name? [20:47:34] Smells like network woes. [20:47:42] Krinkle: basically, 1 means a single CPU is fully loaded, so everything up to the number of CPUs is OK, everything higher than that means processes have to wait for each other [20:48:04] Krinkle: https://en.wikipedia.org/wiki/Load_average#Unix-style_load_calculation [20:48:35] Betacommand, I'm honestly clueless. My cron sends everything to jsub. [20:49:05] note the "However, Linux also includes processes in uninterruptible sleep states (usually waiting for disk activity), which can lead to markedly different results if many processes remain blocked in I/O due to a busy or stalled I/O system. " [20:49:27] Logging in on tools-dev is stuck after "Last login: ". [20:49:43] My ^C to a python script is hanging….
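The rule of thumb Krinkle is given above (load is fine up to the CPU count, queueing beyond it) can be checked directly on any Linux host. A minimal sketch, assuming a Linux box with /proc/loadavg and coreutils' nproc:

```shell
# Minimal sketch (Linux-only): the first field of /proc/loadavg is the
# 1-minute load average. A value above the CPU count means runnable
# processes are queueing -- and, as quoted above, on Linux the figure also
# counts tasks stuck in uninterruptible (e.g. NFS/disk) sleep.
read -r one_min five_min fifteen_min _ < /proc/loadavg
cpus=$(nproc)
# Load averages are floats, so compare with awk rather than shell arithmetic.
state=$(awk -v l="$one_min" -v c="$cpus" \
    'BEGIN { if (l > c) print "overloaded"; else print "ok" }')
echo "1-min load $one_min on $cpus CPUs: $state"
```

With load averages of 284 on a login host with a handful of cores, as in the log above, this would print "overloaded" by a wide margin.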
[20:50:59] Coren, will you please tell me what's running under my name? [20:51:18] qstat ? [20:51:35] Cyberpower678: Hang tight, Cyberpower. A couple dozen cron jobs, but you're not the only one so it seems the problem is deeper than this. [20:51:53] Coren, wait what? [20:51:56] How? [20:52:10] Cyberpower678: 22:31 < valhallasw> note the "However, Linux also includes processes in uninterruptible sleep states (usually waiting for disk activity), which can lead to markedly different results if many processes remain blocked in I/O due to a busy or stalled I/O system. " [20:52:11] My crontab uses jsub now per your instruction. [20:52:24] load is now 383 and climbing. [20:53:06] Coren -dev doesn't work either [20:53:07] crontabs with "* * * * * jsub -once"? [20:53:15] I think the problem is nowhere on -login [20:53:33] Like I said, something seems amiss with the NFS server. [20:53:37] indeed [20:53:53] hashar: around? any ideas about https://bugzilla.wikimedia.org/show_bug.cgi?id=49078 ? [20:54:02] how many projects use that nfs server? [20:54:07] is it possible to restart the services? [20:54:10] scfc_de, yes. [20:54:17] It's _working_ [20:54:24] how can you know that :P [20:54:34] do you see any traffic there? [20:54:45] chrismcmahon: no idea :-D [20:55:02] I can ssh to both servers but my /home isn't mounted [20:55:04] chrismcmahon: it is surely in my bugs backlog, I haven't processed it today though [20:56:00] hashar: that one and https://bugzilla.wikimedia.org/show_bug.cgi?id=48203 are hanging up me and the Language team right now [20:57:18] chrismcmahon: yeah that one is most probably because MediaWiki does not send the purging cache requests back to squid [20:57:26] chrismcmahon: or something similar to it [20:57:58] petan: I'm logged in with root, and everything is actually (oddly enough) fine. [20:58:10] petan: Give me a bit to figure out what's going on, please.
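The crontab pattern quoted above ("* * * * * jsub -once") is what turns a stalled filesystem into a runaway load: every minute cron adds another submission that blocks on NFS. The alternative the channel settles on later is a single continuous job. A hedged sketch, using the jsub flags as quoted in this log (job name and script are illustrative; check the current Tool Labs documentation for exact flags):

```shell
# Anti-pattern quoted above: cron resubmits every minute, and each stalled
# submission adds one more waiting process when NFS hangs.
#   * * * * * jsub -once php mybot.php
#
# Preferred: submit once as a continuous job; the grid restarts it if it
# dies. ("mybot" and persist.php are illustrative names.)
#   jsub -continuous -N mybot -mem 500M php persist.php
```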
[20:58:10] of course root doesn't have home on nfs [20:58:15] chrismcmahon: I can't really handle all the requests by myself :] [20:58:17] Cyberpower678: so every minute you run jsub, and because NFS is down that creates another process that increases load by one, ad nauseam [20:58:31] Cyberpower678: or something like that. [20:59:00] Cyberpower678: in any case, probably not your fault, although it's better to only submit jobs every hour, or every five minutes if really necessary [20:59:20] valhallasw: Yep. We should note that in the documentation, and suggest that for continuous jobs, hourly is enough. [20:59:42] for continuous jobs, one should use -continuous ;-) [20:59:47] valhallasw, huh? [20:59:53] or whatever the flag is. [21:00:03] I just followed Coren's instruction and he was happy. [21:00:13] Coren I just logged to -db which is using gluster so I confirm it's NFS for sure :o [21:00:51] What's NFS exactly? [21:01:10] Cyberpower678: network file system - instead of a local disk, your home directory is on a network drive. [21:01:23] oh. [21:01:38] valhallasw: Yes ... So what are the use cases to run a job all the time, but only one instance, without -continuous? Hmmm. [21:02:03] Cyberpower678: Not only *your* home directory, all of ours :-). [21:02:08] scfc_de: what about something you want to run not more often than every five minutes? [21:03:05] Aha. [21:03:11] valhallasw: Okay, but then we should be safe to discourage "* * * * *"? [21:03:33] scfc_de: I think most * * * * * cases would be better solved with the continuous option [21:03:54] Something you want to run more often than every five minutes should be continuous and sleep internally. You don't want the overhead of a new job. [21:04:06] right. [21:04:30] Coren: So: "Aha."? [21:08:16] halfak_: hm.
It seems that https for the instance proxy works [21:08:41] halfak_: I bet you need to have https configured on your instance [21:10:07] I take that back, I don't see it listening on 443 at all [21:11:26] Yeah. 443 just times out. [21:11:33] yep [21:11:38] I'll look into making that work [21:12:11] Thanks for making time. :) [21:14:13] Ryan_Lane, so now I have a node definition that /only/ defines exim::config, and when I run puppet it conflicts with the exim::config defined in exim::simple-mail-sender. What am I missing? [21:14:27] hm [21:14:42] andrewbogott: dunno. i haven't messed with the exim config much.... [21:15:07] andrewbogott: how are you doing the node config? [21:15:10] This isn't an exim question, though, it's a puppet question. I'm not including exim::simple-mail-sender in my node, and yet puppet is including it. [21:15:26] In site.pp: [21:15:27] node rt-testing-dev3 { [21:15:27] class { "exim::config": queuerunner => "queueonly" } [21:15:28] } [21:15:30] ah [21:15:35] you need to do a fully qualified name [21:15:45] and it needs to be the i-xxx name [21:15:46] I think [21:16:04] It definitely picks up changes when I alter that definition... [21:16:07] But I will rename, hang on [21:16:27] I guess, obviously it is picking it up, since it causes a conflict [21:16:40] I still don't get it. I was told that jsubbing everything would no longer place a load on -login [21:16:50] So I jsubbed everything. [21:16:58] So why is it still causing problems? [21:17:05] Cyberpower678: Not your fault. [21:17:19] Cyberpower678: You're a symptom, not the cause. [21:17:37] Coren, well that's better, I think. [21:17:41] what is cause? is there any plan how to fix it? [21:17:58] petan: On it now, debugging. [21:18:05] ok [21:18:22] Cyberpower678: But then again, your symptom wouldn't be so egregious if you were a bit easier on cron. :-) [21:18:40] Coren, I'm no linux expert.
[21:18:54] * Cyberpower678 is linux-0 [21:19:39] Coren, At least it quickly showed you that something broke. :p [21:22:06] Ryan_Lane, yeah, with a fqdn the behavior is the same, and the i-xxx name doesn't work at all [21:22:11] hm [21:22:27] oh [21:22:27] valhallasw: https://wikitech.wikimedia.org/w/index.php?diff=72785&oldid=72740 - okay? [21:22:34] Coren, my terminal froze up. [21:22:41] andrewbogott: did you restart the puppetmaster daemon? [21:22:44] on the instance [21:23:02] Doing that now [21:23:12] nope, no change [21:23:13] Something is _hammering_ on the NFS server. [21:23:19] From -login [21:24:06] Aha! Dozens of simultaneous git pulls! [21:24:44] Coren, pulling what? [21:24:57] Can't tell offhand, not on the command line. [21:25:11] Coren, my command line is frozen now. [21:25:15] I think I'm stuck having to reboot -login [21:25:23] Cyberpower678: Yeah, you're waiting on NFS [21:25:52] Which is frozen. ;/ [21:25:54] :/ [21:26:44] Webtools are also broken. [21:26:46] !log tools Rebooting -login; it's thrashing. Will keep an eye on it. [21:26:57] Cyberpower678: That is all symptom of the same cause. [21:27:12] AHhh. TS all over again. [21:27:31] On labs. [21:28:51] Coren: Does the shutdown really take so long? My terminal received the "going down for reboot" message, but the connection is still there, I believe. [21:30:53] scfc_de: looks good to me [21:32:04] Reboots take forever. [21:32:14] valhallasw: Or is "jstart" more appropriate? Hmmm. [21:32:59] Yeah, it's the command listed to use :-). I'll change it. [21:34:43] hm when the server got rebooted, my open ssh connections just hung [21:34:59] Broadcast message from root@tools-login [21:35:00] (unknown) at 21:33 ... [21:35:00] The system is going down for halt NOW! [21:35:00] Power button pressed [21:35:03] ? [21:35:51] 21:33? Mine said 21:26. [21:36:37] Coren why reboot? you could just restart automount? [21:36:48] also all of the boxes have this problem...
not just login [21:36:58] petan: Symptom, not cause. [21:37:50] Something on -login managed to wedge a lock, it appears. [21:38:03] It'll take some time to sort itself out. [21:38:58] Coren: what's up with the nfs server? [21:39:40] Ryan_Lane: Still in debugging. Will emerge with postmortem in a few. [21:39:45] ok [21:41:47] Things should be unwedging now. [21:42:00] Coren: Is it okay to use -login now? [21:42:21] scfc_de: Should be. [21:43:02] Exponential backoff on NFS timeouts probably means some things will be stuck for a couple of minutes on boxen I haven't rebooted though. [21:45:34] That has been an amusing exercise in testing failure modes of the NFS server. [21:46:44] all failure modes of NFS suck [21:47:02] Bah, now I've lost everything I was working on :'( [21:47:23] a930913: Sorry about this, but -login was pretty much dead. [21:47:41] Coren: I'm not blaming anyone. [21:47:55] a930913: Doesn't mean I'm not very much annoyed at it. [21:48:00] Mainly blaming myself I guess for relying too much on RAM :p [21:48:46] On the positive side, the grid behaved exactly as expected/hoped. While disk I/O was frozen, things simply picked up where they left off once it came back. [21:51:34] Ryan_Lane: I'm a little dumbfounded. While the symptom was easy enough to find, I never satisfactorily found a clue to the true underlying cause. [21:51:57] for NFS going away? [21:52:02] I need food so the headache stops, then I'll investigate. [21:52:06] * Ryan_Lane nods [21:52:39] Bah, what's the mysql flag for using the replica.my.cnf? [21:53:07] Ryan_Lane: Yeah, apparently *something* managed to wedge the mount point in a way that locked every other process out. Things started piling up from there. [21:54:11] fun [21:54:37] On the plus side, hard NFS mounts meant that when things returned everything was back to "just fine" [21:55:26] a930913, so this was your fault?
[21:55:33] Aw, my scrollback queries in the db are gone :( [21:55:59] Cyberpower678: I knew I shouldn't have reversed the polarity... [21:56:14] a930913, what? [21:56:43] Ryan_Lane, how do I delete a task? [21:56:51] delete a task? [21:56:55] where? [21:57:02] from qstat. [21:57:03] Cyberpower678: I reversed the polarity to the flux capacitors. [21:57:07] qdel ## [21:57:13] a930913: hueheheheh [21:57:17] a930913: "--defaults-file=/home/.../replica.my.cnf". [21:57:37] a930913, how far forward did you get lurched in time? [21:57:51] scfc_de: That's the one. I keep missing the "s" in defaults :( [21:58:00] Cyberpower678: I didn't check. What time is it? [21:59:05] a930913: It sucks much more that "mysql --defaults-file ~/replica.my.cnf" (space instead of equal) gives "mysql: unknown option '--defaults-file'". [21:59:44] I just copied it to .my.cnf to avoid that mistake again. [22:00:34] a930913: :-) [22:01:25] * a930913 rolls his SQL dice. [22:01:33] * a930913 wants an even number. [22:01:45] Oh good, it is even. [22:02:27] Which meant that given the last was odd, there aren't two competing tasks flooding the db. [22:10:16] why does labs hate me? [22:11:17] Coren|Food what is tools-exec-cg [22:12:24] Coren|Food plz use !log when you change stuff :P [22:16:13] Betacommand: in which way does it hate you? [22:16:22] Coren|Food: Once you're fed: I have a couple of "DBI connect('database=itwiki_p;host=s2.labsdb;mysql_read_default_file=/home/scfc/.my.cnf','',...) failed: Can't connect to MySQL server on 's2.labsdb' (110) at /home/scfc/bin/replagstats line 22" mails from cron which seem to indicate that MySQL connections were also down (or MySQL has a strange way of expressing some other error). [22:16:25] Ryan_Lane: role::puppet::self is definitely installing things defined in site.pp /and/ things from ldap. So I'm stuck, can't get exim off this instance :( [22:16:30] Ryan_Lane: after crash all my bots died [22:16:36] andrewbogott: hm.
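The --defaults-file trap a930913 hits above is worth spelling out: the mysql client parses --defaults-file before its normal option handling, it must be the first option on the command line, and it only accepts the "--option=value" form, which is why the space-separated variant is rejected as an unknown option. A sketch (not run here; the home-directory path mirrors the replica.my.cnf convention discussed above, and the database name is illustrative):

```shell
# Works: --defaults-file comes first and uses the "=" form.
mysql --defaults-file="$HOME/replica.my.cnf" -h enwiki.labsdb enwiki_p

# Fails with "mysql: unknown option '--defaults-file'", as quoted above:
#   mysql --defaults-file "$HOME/replica.my.cnf"
```

Copying the credentials file to ~/.my.cnf, as done in the log, sidesteps the flag entirely because mysql reads that path by default.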
[22:16:49] Betacommand: ah [22:16:51] Betacommand: Zombies? [22:17:17] a930913: I wish, I can't seem to resurrect them atm [22:17:54] Betacommand: What are you trying to do, and what error do you get? [22:18:13] scfc_de: all of beta's jobs vanished from the grid :/ [22:18:17] scfc_de: jsub -continuous -cwd -mem 500M -N stalkboten php persist.php [22:18:20] Betacommand: WinSCP - "--------------------------- [22:18:20] Error [22:18:20] --------------------------- [22:18:20] Copying file 'C:\Users\ARIELS~1\AppData\Local\Temp\scp17211\data\project\cluestuff\public_html\wave\wave.js' failed. [22:18:23] --------------------------- [22:18:26] scp: /data/project/cluestuff/public_html/wave/wave.js: set times: Operation not permitted [22:18:29] --------------------------- [22:18:32] OK Abort Help [22:18:34] --------------------------- [22:18:37] ... [22:18:37] a930913: set your file permissions [22:18:39] a930913: pastebin! [22:18:41] ewww a930913 pastebin [22:18:45] and {{trout}} [22:18:50] I copied one line. [22:18:59] no you didn't :P [22:19:02] WinSCP must have hijacked. [22:19:36] I'm submitting jobs and they just vanish [22:19:42] qstat shows nothing [22:19:52] Betacommand: what job id does it give when you submit? [22:20:11] andrewbogott: mind a review when you get a chance? https://gerrit.wikimedia.org/r/#/c/66303 [22:20:13] Betacommand: Coren|Food told me that the permissions were fine for me to write. [22:20:25] andrewbogott: I'll see what we can do about exim [22:20:28] 214909 [22:20:33] ideally we wouldn't need to stick that into ldap [22:20:46] a930913: they aren't [22:21:02] a930913: Did you set the "delayed permissions check" Coren talked about? [22:21:16] scfc_de: I couldn't find it. [22:21:41] Betacommand: odd, it shows no information for the job at all [22:21:59] addshore: Like I said labs hates me [22:22:02] does anything appear in your error or output files?
[22:22:04] :D [22:22:08] nothing [22:22:18] output is just the first few parts [22:22:21] petan: Could you "qacct -j 214909" on tools-master, please? [22:22:39] sure [22:23:30] /tmp/scfc [22:23:34] is that [22:24:08] No [22:24:13] failed 100 : assumedly after job ? [22:24:22] should be persist.php [22:25:00] addshore: why is it failing? running it without grid works [22:25:51] Betacommand: Which bits do I need to set to get the right permissions? [22:26:07] Betacommand: /data/project/betacommand-dev/stalkboten.out is the output? [22:26:14] scfc_de: yes [22:26:21] hmm [22:26:31] a930913: The permissions on your directory are fine, don't know why WinSCP doesn't see that. [22:26:54] a930913: try restarting your connection, your client may not have worked out you are in the group [22:27:15] or if you have the option, select to ignore permission errors (that works on WinSCP) [22:27:27] Betacommand: And what's the command you use to start the bot? [22:27:36] addshore: Nothing has changed since connection, and it didn't work before the crash either. [22:27:37] scfc_de: one sec [22:28:53] Exit code 137 = 128 + 9, could mean SIGKILL. Hmmm. [22:29:27] scfc_de: jsub -continuous -cwd -mem 500M -N stalkboten php persist.php [22:29:45] I've copy/pasted that before without issue [22:29:47] Fatal error signal "n" [22:30:06] addshore: ?? [22:30:30] well, it appears that that particular task got killed? O_o [22:30:46] Betacommand: what happens if you try to submit a new job? [22:31:15] addshore: I've tried starting several of these stalk bots, each fails [22:31:55] Betacommand: Is persist.php readable somewhere? [22:32:18] scfc_de: that shouldn't be the problem, the code is unchanged in 6 years [22:32:44] No, I mean is it PHP only, does it start other programs, etc. [22:33:12] php only [22:33:36] it calls a few other php files that are in the same directory [22:33:49] I would guess it ran out of memory.
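scfc_de's reading of exit code 137 follows the usual POSIX shell convention: when a process dies from a signal, its exit status is reported as 128 plus the signal number, so 137 decodes to signal 9, SIGKILL, which is what the kernel's OOM killer (and qdel) sends. A minimal sketch of the decoding:

```shell
# Decode an exit status: values above 128 conventionally mean "killed by
# signal (status - 128)". 9 is SIGKILL, sent by the OOM killer among others.
status=137
if [ "$status" -gt 128 ]; then
    signal=$((status - 128))
    echo "killed by signal $signal"
else
    echo "exited with code $status"
fi
```

This matches the qacct evidence later in the log: maxvmem just over the requested 500 MB, followed by a kill.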
[22:34:03] scfc_de: that's not the issue [22:34:11] hmm, it should have a different error then [22:34:18] its memory is consistent [22:34:33] addshore: The OOM killer kills with -9, I believe. [22:34:34] Betacommand: just as a test what happens if you set it to 1G? [22:34:36] ask Coren|Food about it, he commented on that [22:35:20] Betacommand: The qacct log shows that maxvmem was 504.590M, and you asked for 500 MB. [22:35:33] that's odd [22:35:47] I found I had to run my php scripts with 700M [22:35:54] might be able to reduce it a bit [22:35:59] someone must have screwed with php settings [22:36:26] scfc_de: the odd thing is this has been on labs for at least a month with these exact same settings [22:37:05] Betacommand: Well, perhaps the data it processes has changed? [22:37:17] scfc_de: it's the same RC feed [22:37:19] it's running? :) [22:37:32] addshore: I just killed it to adjust the settings [22:38:37] Cyberpower678: 214751 is eqw just so you know ;p [22:38:57] Betacommand: I had a similar case some weeks ago where some Python script was killed *once* due to memory, and before and after it remained in the requested range. Don't know why. [22:39:30] heh, Betacommand running with 600M now? [22:39:45] addshore: trying [22:40:02] usage 1: cpu=00:00:00, mem=0.00000 GBs, io=0.00021, vmem=13.410M, maxvmem=13.410M [22:40:03] my IRC bots crashed an hour ago and haven't restarted properly… was there some maintenance done? [22:40:25] rschen7754: server crashed [22:40:41] Betacommand: ok, thx [22:41:00] rschen7754: Do you use jstart or something else? [22:41:04] scfc_de: jstart [22:41:17] rschen7754: What's the job name? [22:41:36] scfc_de: name is python2 on hat_collector, and python2 on usrd-tools [22:41:36] * addshore goes to get food while scfc_de helps everyone ;p [22:42:29] my connection isn't stable enough to ssh in and manually restart [22:43:31] rschen7754: Did you restart hat-collector? That seems to be running (job 203776).
I don't see anything from usrd-tools, though. [22:43:46] scfc_de: yeah but it hasn't rejoined the channel [22:44:45] addshore, how can I view a specific task? [22:44:53] qstat -j ### [22:45:00] rschen7754: /data/project/hat-collector/python2.out and .err are empty, so I don't know what's wrong there. [22:45:25] scfc_de: ok, I'll just have to manually restart the whole thing :( [22:45:25] rschen7754: try qmod -r ### [22:45:38] addshore: what does that do? [22:45:47] should restart / reschedule [22:45:51] addshore: ok [22:45:59] I'll have to do that on a more stable connection [22:47:03] :< [22:47:09] Cyberpower678: Could it be that you edit /data/project/cyberbot/crontab, but don't install it with "crontab /data/project/cyberbot/crontab"? chu has a different error file there. [22:47:37] scfc_de, oops. p [22:47:39] :p [22:47:45] {{trout}} [22:48:10] hehe, my crontab loads my crontab ;p [22:50:29] addshore, how? "* * * * * crontab $HOME/ */10 * * * * crontab /location/of/cron :) [22:51:04] then I git push to github [22:51:09] cron git pulls [22:51:19] and everything is up to date with master all the time :D [22:51:27] addshore, cool. [22:58:25] petan, I typed sql metawiki and it froze. [22:59:42] hehe, same, also I typed 'sql enwiki' and it stops too [23:00:14] Ah! So that's the MySQL error I pasted above. [23:00:32] ERROR 2003 (HY000): Can't connect to MySQL server on 'enwiki.labsdb' (110) [23:01:40] Maybe the DB servers need to be reset as well. [23:02:08] edit counter works well. [23:02:33] scfc_de: should do O_o As far as I know they are separate from everything [23:03:37] Cyberpower678: You have access to enwiki_p *now*? [23:03:55] addshore: same here [23:04:26] Coren|Food: petan sql issues [23:06:34] scfc_de, not on the command line.
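Two crontab habits come up above: editing a crontab file under /data/project does nothing by itself until it is installed with crontab(1), and addshore's variant is to let cron itself pull from git and reinstall. A sketch of both (the tool name, repo path, and 10-minute interval are illustrative, not from the log):

```shell
# Editing a crontab *file* changes nothing until you install it:
#   crontab /data/project/mytool/crontab
#
# addshore's self-updating pattern: cron pulls the repo, then reinstalls the
# crontab from the checkout, so changes arrive via "git push" rather than
# editing on the server.
#   */10 * * * * cd /data/project/mytool/repo && git pull -q && crontab crontab
```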
[23:07:36] https://icinga.wikimedia.org/icinga/ [23:07:39] oops [23:07:47] https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?search_string=labsdb [23:08:45] I'm guessing the duration actually means everything's fine :P [23:08:46] oh wells [23:10:10] mysqld processes This service problem has been acknowledged CRITICAL 2013-06-03 23:08:10 80d 3h 9m 53s 3/3 PROCS CRITICAL: 2 processes with command name 'mysqld' [23:12:11] * Coren|Food checkes [23:12:16] checks, evem [23:12:17] :> [23:12:18] even* [23:13:12] Fix't. My fault, something not yet in puppet that hadn't survived the reboot. [23:13:22] :D [23:13:22] (Wouldn't have affected the nodes, only -login) [23:13:40] fixed indeed!  cheers!