[00:14:58] Wikimedia Labs / tools: Tool Labs: Provide anonymized view of the user_properties table - https://bugzilla.wikimedia.org/58196#c34 (Luis Villa (WMF Legal)) Sorry for the slow response, Liangent - email slipped through the cracks for some reason. Q: what's the criteria for small/medium/large? And how of... [00:31:35] Oh, I suck so very much. [00:35:34] Coren: Is that an "I just found a very large flaw in the setup of the labs" comment, or one where we pat you and tell you how awesome you really are? [00:36:10] a930913: No, it's probably one where you get annoyed at me because you might be one of the people whose crontabs I just destroyed. :-/ [00:36:21] (viz. labs-l) [00:36:28] * ^d has no crontabs, is not angry [00:36:33] <^d> Coren: You rock :) [00:37:13] Coren: Noo, my tabs of cron! [00:40:40] /last beta [00:41:56] http://en.wikipedia.beta.wmflabs.org/wiki/Main_Page and /w/api.php both return 500 "Access denied for user 'wikiuser'@'10.68.16.11' (using password: NO) (10.68.17.94)" . Known issue? [00:44:21] dberror.log alternates this error and deployment-apache01 enwiki Connection error: Unknown error (10.68.16.193) [00:52:16] spagewmf: I'll take a look [00:55:18] spagewmf: Fixed. And it was my fault [00:56:04] !log deployment-prep Fixed empty PrivateSettings.php configuration file (which I also broke earlier) [00:56:06] Logged the message, Master [01:00:12] Coren: Guess what? my crons gone [01:00:57] Wikimedia Labs / tools: Tool Labs: Provide anonymized view of the user_properties table - https://bugzilla.wikimedia.org/58196#c35 (Tisza Gergő) (In reply to Tisza Gergő from comment #33) > As for timestamp anonimization, showing the (truncated) date of the last > edit of the user which did not happen... [01:00:59] Betacommand: Figures. You do seem to have a ...DATA.crontab though [01:01:44] So unless much has changed, you're not starting from nothing. 
[01:02:22] * Betacommand points Coren at the scrollback and his previous comment [01:03:10] I don't see one from you, so I'm guessing you are referring to the one about my sucking. Yes. [01:04:29] Coren: No [01:05:03] Betacommand Coren: please take a goold look at my cron tab before it gets migrated, I always get hit with labs bugs [01:05:22] Ah, that one. Didn't scroll back enough. [01:05:31] Would you like me to restore from your older one? [01:05:59] Coren: Im looking over a few things, my most recent copy I have is from 3/4 [01:06:53] Betacommand: As far as I can tell, the only entry that would be modified is the log rotation one, everything else would just stay the same. [01:07:18] Coren: how would that be modifed? [01:07:34] Betacommand: It'd prepend a jsub -quiet in front of it so that it ran off the grid. [01:07:56] which would really piss me off [01:08:08] I want the output of that emailed to me [01:08:09] ... I fail to see why. [01:08:23] jsub just eats the output [01:09:04] I specifically raise that issue before [01:09:15] *raised [01:09:16] Huh. I would have expected the script named email_logs.py to... email logs. :-) [01:09:41] Coren: if there is an issue with sending the logs its in that email [01:10:09] otherwise the logs get saved to a subfolder that I dont visit often [01:11:14] Hm. That can actually be worked around fairly simply, if a little ugly, from the cron entry; but lemme see if I can make it easier even. [01:11:16] which means I could end up with an extended outage [01:11:45] Coren: the current cron works easily and neatly [01:12:14] thats why its the only script that I dont send through jsub [01:12:33] Betacommand: Yes, but sadly you are one of the very few people who actually do this right. [01:12:52] Coren: so disable their cron if they fuck up [01:13:37] do what DaB and River used to do, if you cause an issue you are publicly named and held accountable [01:14:02] I do, regularily. 
The problem is the disruption they cause /other/ tools in the meantime. Just hang on, I'll find an easy way for you to still do this. [01:15:58] Here's an easy enough fix. I now have an extra accepted executable: 'jlocal' which is basically a noop (it just execs its arguments) [01:16:40] This way, at least, it requires a positive action to keep the entry running locally, and it's easier for me to track and beat up people who abuse it. [01:18:26] Coren: any updates on 54054 ? [01:19:07] No, but that's on my plate for this week before Zürich. [01:30:54] Coren: can you verify that the crontab is installed and configured correctly? [01:31:19] I see it on the box, with the jlocal. [01:33:01] Coren: gotta love having your crontab in svn :P [01:33:02] And I already see something got executed at 01:30 [01:33:18] Betacommand: Having one's crontab in source control => win. [01:33:48] Coren: are there any plans on creating backups of user-land data? [01:34:14] Betacommand: Yes; the space has been set aside for it too -- I'm just lacking the user-side tools to schedule it. [01:35:23] Betacommand: I'm probably simply going to allow a config file in one's home that selects what you want in your tarball. [01:35:46] And do something like last 3 days/last 2 weeks/last month [01:36:26] Coren: can you check the cron emails please [01:36:49] Check them how? [01:36:59] Not seeing any from the 0130 run [01:39:28] I see plenty of outgoing email from other cron jobs, none for that one. (Actually, I see two jobs of yours that got run, every_30_min.sh and sql_csd.py, both of which are jsub'ed [01:39:49] Your email_logs seems to be set to run at 1:00 [01:40:03] Coren: I manually kicked that one off today [01:40:20] I normally get an email about the job being submitted [01:40:20] Ah, I see an outgoing email at 1:40 though [01:40:56] Oh, I didn't see it before because it hadn't been there. It's just past 1:40. 
:-) [01:41:23] Yeah, the 130 cron had nothing [01:41:30] got the 140 though [01:41:56] afaict, the 1:30 run generated no email at all. At least, there's nothing in the logs. [01:42:52] are we sure it ran? or did the cron get created after the fact? [01:53:00] My crons haven't come back. :/ [01:53:01] Betacommand: Well, I see the cron session in auth.log, though that doesn't tell me /what/ ran exactly. [01:53:32] a930913: Check your ~/...DATA.crontab, it may have a backup dating from the migration. [01:53:43] a930913: about 50 users lost their crontabs [01:54:34] and of course I was one of them [01:55:12] * Betacommand grumbles something about probabilities and things not adding up [01:55:17] Coren: Doesn't exist. [01:55:24] I was a statistic then? [01:55:55] Betacommand: Ah, I see what happened. The cron session at 01:30:12 matches exactly the installation of the crontab, 12 seconds too late to run at 1:30. [01:56:37] Coren: that would make sense [01:56:51] Betacommand: That's followed by one at 01:40:01, then one at 01:50:01 [01:58:07] a930913: just be glad your not in my shoes [02:00:48] Coren: Is the user not being able to have crons new? [02:01:27] a930913: They never were, it was just not enforced. You should not be running anything non-interactively from your user account. [02:02:03] That's actually rule #1. :-) https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Rules [02:03:07] It was a weeny fetch and log script that would probably have had a greater overhead submitting it to the grid. [02:03:33] a930913: The grid scales; the bastions do not. [02:04:24] Oh, I must have been confusing rule 1 with 2. 
[02:04:43] Because 2, "It is permissible to use cron for lightweight processes" [02:05:43] a930913: trust me I had a fit when Coren tried to force everything through jsub [02:06:46] 99% of my scripts use the grid the 1 that Im not running on the grid Im doing so for a reason [02:07:44] Coren: when are you going to get the auto-starting of the webservices enabled? [02:08:44] Coren: Where does a "log something to ##930913" script belong? [02:09:13] a930913: probably the grid [02:09:45] Betacommand: Does that 1% monitor the grid? :p [02:10:14] a930913: No, I need the output emailed to me and the grid eats the output of submitted tasks [02:10:27] a930913: you really cant monitor the grid [02:11:05] how are you trying to monitor the grid? [02:11:12] Betacommand: Probably during Zürich, though that's actually made more complicated now by some people using something else to serve HTTP. I'll probably only do it if one has a (possibly empty) .lighttpd.conf [02:11:38] Betacommand: I'm not, I was assuming you are, before I realised that I need/want the email thing too. [02:12:08] use jlocal [02:13:31] Betacommand: No manual entry for jlocal. [02:14:16] a930913: when defining a crontab entry it should either use jsub or jlocal [02:14:31] a930913: its only on the submit node [02:14:51] IE where crontabs are stored and executed from [02:15:46] Betacommand: So I make a crontab to dump "man jlocal" to a file? :p [02:15:58] No [02:16:03] you dont need the man file [02:16:24] jlocal is a dummy program [02:16:45] it prevents the entry from being sent via jsub [02:19:02] Coren: fyi, /usr/bin/crontab still works (and is still the default) on tools-dev [02:19:23] MrZ-man: Ah, good point. [02:21:20] Betacommand: It gets modified to jsub a jlocal? [02:22:18] a930913: scripts should be using one or the other of those [02:22:38] MrZ-man: Should be fixed now. [02:29:40] Coren: Why is qlogin not used? [02:44:20] a930913: Because nobody should be using the grid nodes directly. 
It throws off resource management. [02:44:49] Besides, why would you want to? [02:54:46] Mostly for building and testing I suppose. [02:54:59] Which seems to be the valid reason. [04:17:23] Betacommand: I can't find any existance of jlocal. [04:27:02] a930913: jlocal only exists on the submit host, and only has meaning from within a crontab. [04:27:16] Wikimedia Labs / tools: Killed Mysql queries still running - https://bugzilla.wikimedia.org/64140#c11 (Sean Pringle) tools-db is labsdb1005. It isn't a production replicant, so this is unrelated to the replication outage. LOAD DATA INFILE importing a lot of data to a transactional storage engine like... [04:28:18] Coren: Yeah, that's where I put it. [04:55:27] ? [04:55:49] where did the crontab go ? [04:55:57] vanished in the haze... [04:56:49] hedonil: Into the void! [04:57:00] yeah [04:57:36] hedonil: Read the email? [05:00:07] a930913: Hey doc, we have some casualties here ! [05:02:30] No good, my son. Let's say a prayer for all files that passed on. ;) [05:02:37] Hi I want to run something on jsub but it doesn't let me because the git review hasn't been installed on jsub [05:02:51] I filed a bug for tools but not for jsub [05:04:02] can someone help me about this? [05:10:33] * hedonil goes to his backup and gets his crontab. (he learned from the past) [05:18:41] and in addition, so. fiddled wiht the webservers, causing some weird erros and OOx messages. -- and secretly removed lighttpd binaries fro tomcat .... [05:18:43] tsss [07:51:17] Wikimedia Labs / tools: separate /tmp and /var/tmp volumes - https://bugzilla.wikimedia.org/64697 (Peter Bena) NEW p:Unprio s:normal a:Marc A. Pelletier right now /tmp and /var/tmp is writable by anyone and filling it up will make all filesystem unwritable. This is a security hole that affe... 
[07:51:31] Wikimedia Labs / tools: separate /tmp and /var/tmp volumes - https://bugzilla.wikimedia.org/64697 (Peter Bena) p:Unprio>High s:normal>critic [07:52:16] Wikimedia Labs / tools: separate /tmp and /var/tmp volumes - https://bugzilla.wikimedia.org/64697#c1 (Peter Bena) well, not all, nfs will still be writable, but all local fs will not be [08:51:47] Wikimedia Labs / tools: DNS alias tools.wmflabs.org should point to tools-webproxy - https://bugzilla.wikimedia.org/64701 (zhuyifei1999) UNC p:Unprio s:minor a:Marc A. Pelletier curl and wget output: zhuyifei1999@tools-dev:~$ curl http://tools.wmflabs.org/steinsplitter/files.php curl: (7) c... [08:53:14] Wikimedia Labs / tools: DNS alias tools.wmflabs.org should point to tools-webproxy - https://bugzilla.wikimedia.org/64701 (Steinsplitter) UNC>NEW [08:56:19] Steinsplitter: https://gerrit.wikimedia.org/r/#/c/123149/ ^ [08:56:21] unmerged tho [08:57:23] uhoh [08:57:50] thanks for the link [09:01:30] Wikimedia Labs / tools: DNS alias tools.wmflabs.org should point to tools-webproxy - https://bugzilla.wikimedia.org/64701#c1 (Steinsplitter) NEW>PAT 10:56:21 - YuviPanda: Steinsplitter: https://gerrit.wikimedia.org/r/#/c/123149/ ^ 10:56:22 - YuviPanda: unmerged tho [09:02:29] Wikimedia Labs / tools: tools.wmflabs.org inaccessible via labs instances - https://bugzilla.wikimedia.org/54052#c18 (Steinsplitter) *** Bug 64701 has been marked as a duplicate of this bug. *** [09:02:30] Wikimedia Labs / tools: DNS alias tools.wmflabs.org should point to tools-webproxy - https://bugzilla.wikimedia.org/64701#c2 (Steinsplitter) PAT>RES/DUP *** This bug has been marked as a duplicate of bug 54052 *** [09:02:37] sorry for flooding :/ [09:14:03] how many tool labs admins we have now [09:14:20] maybe we could put #wikimedia-labs-requests back where it was [09:14:36] (11:09:54) There are 2 users waiting for shell access: Mhenning-exelis, Nadeem akhtar. 
[09:15:19] *awake tool labs admins [09:19:04] *actually "shellmanagers" [09:26:29] scfc_de: nobody cares about -requests beside me and few people who aren'te ven shell managers [09:26:41] when the bot was here people complained it spam [09:32:15] Wikimedia Labs / tools: separate /tmp and /var/tmp volumes - https://bugzilla.wikimedia.org/64697#c2 (Peter Bena) + it's not a security hole, but stability hole :o [09:37:17] petan: There are other ways :-). I have the request queue displayed in my Emacs mode line. [09:38:23] there are other ways, but there are also /better/ ways :) in my opinion IRC > emacs, but also, in my opinion bacterias > emacs [09:38:26] :P [09:39:03] however, this is very simple way to provide this feed to anyone without requiring them to code their own solutions because it already exist [09:39:21] the bot was made purposefuly annoying so that it forces local admins to work harder [09:39:41] unfortunatelly I overestimated us, we are so lazy that we rather killed the bot rather than work harder :D [09:42:56] petan: I think cron has been rewritten to not be just regular cron anymore. it'll just invoke jsub or jstart no matter what (no ability to run non jsub things there) [09:43:10] oh crap [09:43:29] well, then... maybe you should send that to that e-mail [09:43:42] I didn't know this [09:43:46] !toolsadmin [09:43:46] https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Documentation/Admin [09:43:48] it was in prior emails Coren sent, I assumed. [09:43:51] do we document the changes somewhere? [09:44:09] I do know he was intending to do this for a long long time [09:44:11] possibly, but keeping documentation on 1 place is better than collecting it from ancient e-mails :P [09:45:43] Emacs means that all I need to do to work on the request queue is M-x tl-wi-t RET, and my browser windows get opened. How do you do that with IRC? 
Also, I'm pretty sure that I handle shell requests for the most part in less than an hour, and any response I got was: "Wow, that was fast!", and never: "That took very long." [09:46:05] YuviPanda: It was. [09:47:02] scfc_de: yeah, but https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Help#Scheduling_jobs_at_regular_intervals_with_cron probably needs a little note [09:49:50] YuviPanda: Oh, so many things do :-). [09:51:34] scfc_de: I run !tr and 3 links to web browser pop out which I just click and it's all done [09:53:03] petan: So Emacs is faster? :-) But we could perhaps track the minimum/average/maximum handling time at http://korma.wmflabs.org/. [09:58:47] Hi, is it possible to installing a crontab without the program modifying it? I have some $jstart prefixing my commands for tidying reasons and now redundant commands are being prepended... [09:59:53] jimmyxu: No, you can't circumvent that at the moment. For bugs in the setup, you need to ask Coren (who's probably asleep at the moment) or file a bug. [10:05:40] well then... [10:06:11] on a side note, is there something available to make jsub email me stderr when something exits non-zero? like vanilla cron did [10:18:04] jimmyxu: No, jsub only echoes to stderr errors that occur on job /submission/ (can't find script, etc.) and /only/ when you specify the -stderr option. The output of the job itself is always written to disk, and you need to set something up to mail it to you. [10:21:57] jimmyxu: what kind of script are you running that you need the output for? [10:22:01] scfc_de: thanks. and by the look of qsub(1) -e doesn't support piping, maybe a wrapper is the only option [10:22:16] Coren, petan: I'm having continuing crontab problems with one of my projects. I have a backup, but I can't restore it. 
[10:22:40] Betacommand: like every wiki bot, previously on ts if something went wrong (like an AssertionError) I'd get notified instantly [10:23:05] Betacommand: but after migration I'm somehow relying on "your bot stopped working" msgs on talk pages :( [10:23:52] jimmyxu: this is going to sound mean, but code your bots better [10:23:57] jimmyxu: No, "-e" is unfortunately not an option :-). [10:24:13] instead of just dieing, have them email you then die [10:24:39] those types of bots should always be run on the grid [10:24:48] Betacommand: yup, it was cron who always does that for me. a wrapper it is then [10:25:12] jimmyxu: what do you mean? [10:25:52] jimmyxu: if your just waiting to get the error emails its a shity code issue not a reason to not use jsub [10:26:00] Betacommand: just for the record, I don't run long-running type bots, and I've already using the engine since migration. [10:26:21] jimmyxu: what does the program do exactly? [10:27:25] Betacommand: like submitting protection requests on a daily basis, and this won't work if someone has changed the page so the bot don't know where to append [10:27:54] Betacommand: previously I assert that and cron would tell me that it isn't working. now I'll just have to workaround the engine then [10:28:48] jimmyxu: depending on cron for something like that is a bad idea [10:28:54] and bad code [10:29:07] write a emailing function [10:29:17] and before your bot dies call it [10:30:02] Betacommand: noted :) [10:30:04] that way the grid can be used correctly, resource allocation isnt and issue, and the submit host isnt over loaded by non-grid tasks [10:30:07] Anyone? I can't restore my crontab from backup because I get '"-":2: bad day-of-week'; this is the SAME crontab that was running on the old server, so how did it suddenly get a syntax error? [10:30:43] russblau: coren moved to a new cron system [10:30:59] russblau: whats the entry thats erroring? 
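The "email, then die" advice above, and the wrapper jimmyxu settles on, can be sketched roughly as follows. This is only an illustration under stated assumptions, not anything that exists on Tool Labs: the function names, the sender address and the SMTP host are all hypothetical, and the sketch simply runs a command, keeps its stderr, and builds a mail message when the exit status is non-zero.

```python
import smtplib
import subprocess
from email.message import EmailMessage


def run_and_report(cmd, to_addr, name="job", from_addr="bot@localhost"):
    """Run cmd (a list of arguments) and describe any failure.

    Returns None when the command exits with status 0, otherwise an
    EmailMessage whose body is the captured stderr.
    The function name and the default addresses are hypothetical.
    """
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode == 0:
        return None
    msg = EmailMessage()
    msg["From"] = from_addr
    msg["To"] = to_addr
    msg["Subject"] = f"{name} failed with exit status {result.returncode}"
    msg.set_content(result.stderr or "(no stderr output)")
    return msg


def send(msg, host="localhost"):
    """Hand the message to an SMTP server; the host is an assumption."""
    with smtplib.SMTP(host) as smtp:
        smtp.send_message(msg)
```

A wrapper script submitted through jsub could call run_and_report() on the real bot command and pass any returned message to send(), so the job still runs on the grid while failures reach the maintainer's inbox.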
[10:31:14] In principle, I find relying on an external entity (cron) to do the mailing much more sensible than expect the dying bot to fail, but succeed at the mailing. But in Tools we need to have jobs executed on the grid, otherwise it doesn't scale. [10:32:03] Betacommand: here's the first line (it doesn't seem to matter, though, because if I comment out line 2, then I just get the same error message with line 3...) 24 * * * * qsub -l h_rt=0:45:00 -l h_vmem=550M -N mv_procs -o $HOME/jason/logs/stdout.txt -e $HOME/jason/logs/stderr.txt $HOME/mv_procs.sh >/dev/null [10:32:11] scfc_de: if the bot dies due to know error checking then thats expected [10:32:21] Wow, even root can't access crontab :-). [10:32:59] russblau: try jsub instead of qsub [10:33:49] russblau: waut [10:33:57] wait [10:34:16] why are you creating a cron under your user account? it should always be under the tool account [10:34:26] Betacommand: it is under my tool account [10:34:51] And I don't have sudo on tools-submit. russblau, you probably need to wait for Coren on this. [10:34:51] Betacommand: the error message says "bad day-of-week" so I don't think changing the right side of the line is going to fix it [10:35:59] russblau: question is 24 the same as 0 for the first time setting? [10:36:29] Betacommand: no, 24 is the minutes column [10:37:14] russblau: humor me and try jsub instead of qsub [10:37:51] Betacommand: I have a line in that file that uses jsub; I moved it to be the top line in the file, and I still got the same error message [10:38:08] you can also check the previous cron item? [10:39:14] Betacommand: so I created a test file with just one line, which uses jsub, and I got the same error message (except now on line 0, because I deleted the PATH= line that was in the old file) [10:40:44] russblau: try the following line: [10:40:58] 45 * * * * jsub -once -mem 2G -N spi echo "Hello" [10:43:06] Betacommand: Seems to work. 
I think I figured it out - the new cron is choking on white space at the beginning of a line; the old one had no problem with that [10:43:32] russblau: figured it was something simple like that [10:45:06] Betacommand: thanks [10:45:20] russblau: any time [10:45:57] russblau: Very odd. Looking at /usr/local/bin/xcrontab, the start of the line should get passed through unaltered. [10:52:06] *cursewords* [10:52:19] fuck [10:53:45] xcrontab is now a unusable piece of beep [10:55:12] gifti: ?? [10:58:38] gifti: care to explain? [11:06:07] gifti: I guess you dont want help [11:06:08] not yet [11:06:51] @seen hashar [11:06:51] Steinsplitter: Last time I saw hashar they were quitting the network with reason: Quit: This is a manual computer virus. Please copy paste me in your quit message. N/A at 4/30/2014 9:33:52 PM (13h32m58s ago) [11:08:13] Petan: online? [11:58:38] Coren: if you're back from sandman ; Could you tell me who removed the lighty-binaries from -tomcat? Im asking for /who/ and /how/ this was done, not why. thx. [12:03:07] hedonil: How: "aptitude purge lighttpd" (1398796038); who: I didn't, petan probably neither :-). [12:13:55] the /how/ included also the process: No announcement on IRC, no mail to the running servcices, no SAL-entry. Nothing. Just waiting for the services to crash. [12:14:53] * hedonil shakes his head [12:23:07] -tomcat was (and partly is) an experimental setup where Coren tested the Tomcat setup. I'm pretty sure you were aware of this when you ran /your/ tests there :-). [12:38:55] is there a way to get mediawiki's i18n messages via intuition? [12:44:43] scfc_de: I didn't even know we have that I only know what we have in docs [13:24:35] hedonil: How: aptitude purge. Who: me. [13:28:58] hedonil: Why, exactly, did you think that it was reasonable to expect lighttpd to be on the non-lighttpd web nodes? [13:41:38] I have to make a crawler bot for my gsoc project. should it be placed inside the grid engine? 
[13:49:15] rohit-dua: Yes; pretty much every bot has to. The good new is that, in general, this is as simple as 'jstart -mem your_bot' :-) [13:49:47] gifti: Plz to give me details so I can fix. [13:50:36] Coren: thank you. does that mean I will just have to call my bot once using jstart, and it will run continuously? [13:51:07] Yep, as log as it does not exit with a status code of 0 (which means "I'm done") [13:54:13] Coren: one more thing- how do i perform inter process communications, like bw bot(in grid) and web-app (python) [13:55:38] rohit-dua: I can think of very many ways to do it; but given the very asynchronous nature of a web app, I'd recommend simply going through the filesystem. Alternately, tools has a redis server and you could use that. [13:56:01] It depends, I guess, on what you want to communicate. :-) [13:58:22] Coren: Hah, it was reasonable to expect that the system configuration I found, was there for a reason. And changes to that system will be anounced - as it should be [13:59:04] hedonil: -tomcat was never announced as available at all in the first place. It was configured with lighttpd by default because the -tomcat config didn't exist yet. [13:59:07] Coren: we have fancy tools-mail, SAL, IRC etc. please... [14:00:13] For that matter, I haven't announced -tomcat yet even, though people who were following the bugzilla made sure that it was documented faster than I could. :-) [14:00:38] rohit-dua: For IPC/messaging, there's also of course a traditional MySQL database. [14:00:47] Coren: fact is, it has been there, and obviously 3 services were running on it, announced as POC. [14:01:37] Coren: It's just about the philosophy: how to deal with changes in a service-center [14:02:52] scfc_de: will pipelines/queues be fine? [14:03:00] hedonil: Sorry, I'm not going to agree with you there. 
If I add a host to the cluster in preparation for some new service, and people randomly start using it for things, they're not allowed to be surprised if the service they shouldn't have been using changes. [14:04:17] Coren: As I said initially, I don't ask /why/, as there are some good reasons to do that. [14:05:01] Coren: It's /only/ about information & documentation. SAL would have been just fine [14:06:25] hedonil: You do realize that lighttpd having been installed on -tomcat is nothing but a coincidence caused by the default puppet class setup, right? [14:07:31] Coren: what is a reason that mysql users are named so weird and not just as the tool name is, since that is unique? [14:07:50] petan: Because mysql has a 12 character maximum for usernames. Yes, that's dumb. [14:08:11] Coren: ok, if there was a postgre would it be like that? [14:08:16] Coren: when you look at tools.giftbot's crontab i guess you see what's wrong/weird? [14:08:28] But also, not so weird, it's just s [14:08:45] gifti: I haven't looked yet, but if you give me a minute I will. [14:08:52] yes but numbers are hard to remember for most of humans [14:09:23] petan: postgres wouldn't *have* to be like that, but I probably would use the same username to avoid having yet /another/ set of credentials. [14:09:34] meh [14:09:50] would there be a database per tool or schema per tool? [14:10:29] rohit-dua: On the DB? If you implement them yourselves, sure. [14:10:37] gifti: Ah! Yes, I added support for "[ blah ] && stuff" not long ago. That should work now, lemme try [14:10:58] Coren: it also modified my variable definition [14:12:10] gifti: Oh, d'oh! *THAT* I hadn't considered. Simple enough to fix though. [14:12:23] (I've fixed the crontab so that the tests aren't broken any more) [14:13:56] Coren: BTW, crontab is also redirected for root on tools-login. 
I don't know if Puppet uses "crontab" or "/usr/bin/crontab", but to prevent problems further down the line, I think xcrontab should make an exemption for root. [14:14:33] scfc_de: That sounds sane, but I'm pretty sure that puppet never relies on $PATH [14:16:25] Coren: [14:16:26] tools.giftbot@tools-login:~$ crontab crontab2 [14:16:26] "-":18: bad minute [14:16:26] errors in crontab file, can't install. [14:16:43] gifti: Sorry, I've got my head under the bonnet right now. [14:16:51] gifti: Should already work again. [14:17:30] Coren: Also, sudo doesn't work for me on tools-submit. [14:17:50] Coren: it still modifies my variable definition [14:17:52] scfc_de: Huh, that's odd. That's not per-instance at all. [14:18:14] gifti: Yes, I'm working on that right now. It gets confused if you have more than 5 spaces in your definition. [14:18:21] ok :) [14:19:30] gifti: That seems to have been fixed. [14:20:06] yes :) [14:21:08] What bugs me the most is that the users who have the (comparatively) complicated crontas that the script fails to understand right are also the ones /least/ likely to have needed editing. [14:24:42] But I didn't get much choice. Despite how often I asked, and even after having blocked and forcibly edited crontabs, people still had bots running on the bastions. [14:24:54] Now, at least, I only need to worry about screen sessions. [14:30:00] Wikimedia Labs / tools: separate /tmp and /var/tmp volumes - https://bugzilla.wikimedia.org/64697#c3 (Tim Landscheidt) Unfortunately, that's not an easy problem. I've successfully shown on tools-webgrid-01 :-), that sudo can be used for that purpose as well, so there's no partitioning that will /ensur... [14:35:45] Wikimedia Labs / tools: separate /tmp and /var/tmp volumes - https://bugzilla.wikimedia.org/64697#c4 (Marc A. Pelletier) You can use an LVM to mount extra space wherever; on the grid nodes increasing /var/log does make sense. As for /tmp, that's a different issue. 
In practice, its contents can just b... [14:37:32] Add a magic comment to the crontab that passes it through unchanged? [14:38:20] scfc_de: And that'd get abused faster than you can say "I can't be arsed to send X to the grid" [14:39:45] No data, but anecdotally the crontab change seems to have caused more work and anger than the bots running on the bastions in the past :-). [14:54:19] Coren: https://bugzilla.wikimedia.org/show_bug.cgi?id=56995 will you have capacity to do it at a later time? [14:55:59] I might this summer if nothing big falls on my lap. [14:57:02] scfc_de: Admitedly, the whole thing would have been smoother if people used the six weeks of advance warning to shake the bugs up rather than wait until things switched forcibly. :-) [14:58:14] somehow i had no idea that it would make such big changes [14:59:13] Coren: If people would react that way to "Dance, little monkey!", I'd rather tell them to send me all their money :-). [15:45:49] !log upgrading highlighter in beta [15:45:50] upgrading is not a valid project. [15:46:55] !log deployment-prep upgrading Elasticsearch highlighter via a rolling restart [15:46:57] Logged the message, Master [15:49:20] hi there ! My membership of labs-l was temporarily disabled due to "excessive bounces" : could someone explain me what does that mean exactly ? [15:51:38] Hm… Toto_Azero, Coren probably knows (and I'm interested in how it works as well) [15:52:26] Toto_Azero: That's mailman not liking your email anymore; I've seen a bunch go by, I expect there was a network problem recently. [15:53:09] Coren: okay thanks :) [15:53:34] Toto_Azero: Normally, the email should tell you how to reenable it properly. [15:53:54] Coren: yes, it worked successfully [15:54:08] but I was wondering why this happened [15:59:40] Toto_Azero|away: mailman sucks. :-) [16:16:12] Coren: alright, this is a good reason ^^ [18:42:14] Coren: didn't time machine have backups for the crontabs? 
[18:42:30] * HappyPanda moves us to OS X [18:42:47] or whatever the feature was called :-p [18:42:56] valhallasw: No, sorry. I'm really sorry about this especially since it was a stupid bug I would have caught with a bit more testing too. :-( [18:43:20] time travel. It's not back yet, I wanted to make sure that migration was stable first before introducing new variables. :-( [18:43:23] *shrug* shit happens, unfortunately [18:45:45] Also, time travel wouldn't have kept the crontabs locally to tools-login. [18:53:58] scfc_de: Tru dat. I'm going to add crontabs to the new backup thing anyways. [20:33:48] Coren: Roughly 18 hours ago a cronjob of something in tools of mine stopped running. [20:34:10] opening $ crontab tells me its on tools-submit (where it should be afaik), but it's commented out [20:34:14] Krinkle: Which explains the email to labs-l roughly 17.5 hours ago explaining it. [20:34:40] Wait, what? Commented out? [20:34:59] o_O I have no idea how that could possibly have happened. What tool is this? [20:35:07] tools-login, become snapshots, crontab -l [20:35:56] I figured if something big changed (e.g. force everything to be grid run) that required me to check up and possibly modify it, that such change woudl be announced ahead and not done and then notified by mailing, I don't read that every day :) [20:36:02] So I guess this wasn't intended :) [20:36:19] The commenting I have no idea. Oh, wait, did you run this on tools-dev? [20:36:39] That might be it; only the -login crontabs were transitionned. [20:36:40] Yeah, I moved it from tools-login to tools-dev because of the ssl certificate bug [20:36:45] https://bugzilla.wikimedia.org/62432 [20:36:55] but it's on tools-submit now [20:36:58] somehow [20:37:03] I guess you migrated them all? [20:37:04] But, fyi, this was announced ~ 6 weeks ago. :-) [20:37:28] Yeah, but from -login which you probably had commented out. [20:37:31] Yeah, I read about that. I was prepared to have it be moved. [20:37:36] Oh, right. 
[20:37:38] Interesting. [20:37:48] So one overwrote the other? [20:37:54] Normally, just uncommenting it should do the trick now. [20:38:20] well, I don't know if I changed the one on tools-dev after I copied it there from tools-login [20:38:25] Yeah, I had no provision to merge them; I hadn't expected anyone would have crontabs on both. It found yours on -login and just didn't bother with -dev [20:38:34] the one commented out on tools-login is a couple weeks old and not what ran up until 18 hours ago [20:38:52] can I see what it used to be on tols-dev? [20:39:37] It should be in a file named 'crontab.BACKUP' there. [20:39:40] Coren: can you run explain select rev_id, rev_comment from revision where rev_timestamp > 20140401000000 and rev_comment != NULL limit 10; on enwiki :P [20:39:47] I am lacking privileges... [20:40:09] Coren: Just as a side note, when you are done with Krinkle, I'm the next one who needs help trying to recover a lost crontab... :-/ [20:40:19] I suppose that rev_comment has no index, but rev_timestamp clearly has one so I guess there should be no full table scan [20:40:24] legoktm also lost crontabs [20:40:35] ok, coo. yeah, the backup file is the one from dev, not login. col [20:40:36] cool [20:40:44] Coren: oh, there are backsup? [20:40:48] backups* [20:41:01] But, fyi, the automated conversion will probably not do what you'd need because redirection; you'll want to convert those to -o the_file args to jsub. [20:41:26] legoktm, Blahma, I'll be right with you. [20:41:32] thanks [20:41:34] thx [20:41:58] Coren: that conversion and redirection, was that for me? [20:42:49] Krinkle: What you want is: jsub -quiet -j y -o /data/project/snapshots/src/mwSnapshots/logs/updateSnaphots.log php /data/project/snapshots/src/mwSnapshots/scripts/updateSnaphots.php [20:42:52] Krinkle: Yes. [20:43:30] Possibly with a -N somename if you want your job to be easily recognizable with qstat and the status page. 
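A side note on the `explain select ... rev_comment != NULL` query asked about at 20:39:40: in standard SQL, comparing a column to NULL with `!=` yields NULL rather than true, so that condition matches no rows at all, whatever the index situation. The sketch below illustrates this with an in-memory SQLite table standing in for `revision`. The three sample rows are made up, and SQLite is only a stand-in for the enwiki replica (the NULL comparison rule is the same, but query plans are not comparable).

```python
import sqlite3


def count_rows(where_clause: str) -> int:
    """Count sample revisions after 2014-04-01 that also match where_clause."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE revision ("
        "rev_id INTEGER PRIMARY KEY, rev_comment TEXT, rev_timestamp TEXT)"
    )
    # Hypothetical sample data: one commented row, one NULL comment,
    # and one commented row that is too old to pass the timestamp filter.
    conn.executemany(
        "INSERT INTO revision VALUES (?, ?, ?)",
        [
            (1, "fix typo", "20140402000000"),
            (2, None, "20140403000000"),
            (3, "revert", "20140301000000"),
        ],
    )
    sql = (
        "SELECT COUNT(*) FROM revision "
        "WHERE rev_timestamp > '20140401000000' AND " + where_clause
    )
    (n,) = conn.execute(sql).fetchone()
    conn.close()
    return n


# "!= NULL" evaluates to NULL for every row, so nothing is selected.
broken = count_rows("rev_comment != NULL")
# "IS NOT NULL" is the comparison that was presumably intended.
fixed = count_rows("rev_comment IS NOT NULL")
```

With this sample data `broken` is 0 and `fixed` is 1, which suggests the original query would want `rev_comment IS NOT NULL` before its EXPLAIN output says anything useful.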
[20:44:00] legoktm: Probably not; as the email said I effed up when doing it on -login and those were lost. [20:44:13] legoktm: But if yours were on -dev then yes. [20:44:20] nope, they were on -login [20:44:35] Blahma: What tool is yours? [20:44:38] cssk [20:44:54] it was migrated from the old server, so there's a chance for a backup from then [20:45:00] guess that could suffice [20:45:11] but my crontab was a full page of commands, not that easy to recreate [20:45:53] Blahma: Hm. Your ...DATA.* files seem gone (not my doing); but I can take a peek at backups. Gimme a minute. [20:46:43] thank you (I do not remember having ever seen the DATA files, but that might be because I was one of those to be forcefully migrated at the very last moment - I had to reenable my tool manually) [20:50:10] 0 * * * * /usr/bin/jsub -N snapshots-updateExternals -once -quiet -j y -o /data/project/snapshots/src/mwSnapshots/logs/updateSnaphots.log php ~/apps/ts-krinkle-Kribo/plugins/wmfDbBot_KriboBridge/wmfDbBot/maintenance/updateExternals.php [20:50:21] Coren: Thx [20:51:03] Hm.. crontab -e no longer shows me the submit one. [20:51:05] It did a minute ago [20:52:03] Hm.. no, it does show the same as xcrontab, it's just empty [20:52:23] Both should be the same. [20:52:38] (Unless you invoke /usr/bin/crontab by hand, but then you'd just get an error) [20:53:48] Blahma: Sorry; I have 582 backed up crontabs but not yours. I have *no* idea why. [20:54:05] legoktm: What tool is yours btw? [20:54:12] Coren: legobot [20:55:20] ... seriously? I have 582 backups from 612 migrated tools and *both* of you lack their crontabs? What is this, some sort of curse? [20:55:30] Coren: That's really a pity, but thanks for trying. This time, I would've been fast enough to take action, but I did not quite understood what you requested and you wrote that no action was necessary. I checked my crontab still yesterday and it was in place - can't believe it's gone forever. 
[20:55:48] mine was also one of the forcefully migrated ones if that might make a difference [20:55:52] Blahma: No action would have been necessary if I had not made a stupid mistake. [20:55:57] It was there a minute ago, I just uncommeted the one I already had (didn't have jsub yet) [20:55:59] I see. [20:56:04] then I went to edit it again and it was gone [20:56:06] anyway, fixed [20:56:15] looks like it now causes another problem though... [20:56:16] libgcc_s.so.1 must be installed for pthread_cancel to work [20:56:23] no idea where that's coming from [20:56:35] Krinkle: Ah. OOM. Add '-mem 500m' [20:56:39] Krinkle: increase memory [20:56:46] Right [20:56:49] Sorry, I should have told you in advance when I noticed your script was in PHP [20:57:17] I would have thought that there is at least some regular backup in place of the HDDs. Good to know there isn't one and that I should probably take care myself (also to protect me of my own future mistakes etc.) - is that right? [20:57:40] Blahma: yes. [20:57:57] Blahma: There's going to be a backup system put in place after Zurich now that we have the space for it, but it's best to not rely on this and stuff your code in source control anyways. [20:58:25] ok [20:59:03] legoktm: Yeah, I wouldn't have thought that forcible migration would have done a difference but seeing as both of you got hit with the same result, that seems likely. [20:59:09] Coren: Hm.. the tool relies on the log file containing the output of one run (or a run in progress). It's appending now [20:59:20] Is there an equiv of > as opposed to >> ? [20:59:32] So this probably means that I'm going to take the pain of recreating my crontab from the accidental error messages that I got on my email in cases when a crontab line was launched at a moment when the file system was not accessible - that's the only trace I have. 
[20:59:36] Krinkle: -o [20:59:45] but I'm not sure if that works through jsub; might need qsub instead [20:59:46] valhallasw: That's what I'm using [20:59:55] euh. [21:00:21] it's writing to the right file [21:00:25] but appending [21:01:57] Krinkle: You know, I don't think there is. There is provision to have a new file for every invocation, or to vary the filename, but not to truncate it afaict. [21:02:17] Krinkle: I guess that requires an intermediate shell script :/ [21:02:22] Krinkle: And that's an issue. Lemme think about a good way to work around that. [21:02:42] I'm happy to defer it all to a shell script. That way I don't store anything valuable in the infrastructure [21:02:52] http://arc.liv.ac.uk/pipermail/gridengine-users/2006-August/011257.html [21:03:03] Krinkle: That's certainly one good way. [21:03:04] Just didn't want to when I'm in the middle of something else and people tell me my tools are broken.. [21:03:17] Anyway, will do for now [21:03:17] Thx [21:03:24] andrewbogott: did you see my answer on https://wikitech.wikimedia.org/wiki/New_Project_Request/osmit_or_osmit-cruncher ? any doubt about the request? [21:03:56] Nemo_bis: I didn't, sorry, I will catch up shortly [21:04:49] Blahma: legoktm: So yeah; that's all on me. Really sorry about this. :-( [21:05:02] (And it does seem like the common point is 'having been forcibly migrated') [21:05:38] Ok, accepted. I understand such things might happen, and it was me who overestimated WMF as definitely running regular server backups. [21:05:40] One more try: Isn't there something like a log of jsub? All my lost crontab lines were jsub invokations. [21:07:04] Blahma: qacct should be able to do it [21:07:12] but, just like the rest of SGE, it's incantations are arcane [21:08:26] Accounting will get you the job times and names, but not the actual command lines. :-( [21:08:41] But lemme see if there are logs that could help somewhere. 
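On the log truncation question above (the `-o` file being appended to rather than overwritten), the "intermediate script" idea could be sketched roughly as follows. This is a Python stand-in for the shell script that was suggested, not anything from the Tool Labs setup: the function name `run_with_fresh_log` and the command-line usage are hypothetical. The only mechanism it relies on is that opening a file in write mode truncates it, which gives the behaviour of `>` instead of `>>`.

```python
import subprocess
import sys


def run_with_fresh_log(log_path, command):
    """Run command with stdout and stderr sent to log_path, truncating it first.

    Opening the file in "w" mode empties it, so after each run the log
    holds the output of that run only (or of the run in progress).
    """
    with open(log_path, "w") as log:
        return subprocess.call(command, stdout=log, stderr=subprocess.STDOUT)


# Hypothetical command-line use: wrapper.py LOGFILE COMMAND [ARGS...]
if __name__ == "__main__" and len(sys.argv) >= 3:
    sys.exit(run_with_fresh_log(sys.argv[1], sys.argv[2:]))
```

The job submitted to the grid would then be this wrapper rather than the PHP script itself, with the real command passed as arguments. That also keeps the log handling in the tool's own source tree rather than in the crontab line.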
[21:09:12] times can help me, because I had rather a few scripts, but they were invoked in a consciously selected repeated pattern over the day, with some rolling up around midnight [21:10:08] Blahma: qacct -o tools.cssk -j '*' [21:10:15] That'll give you all the jobs you ever ran. [21:10:26] that sounds promising [21:10:43] The fields you probably want are qsub_time, and jobname [21:11:33] Very good! I think I can manage with that. [21:11:46] qacct -o tools.cssk -j '*'|egrep '^(jobname|qsub_time)' [21:12:19] Good there are SOME logs, then - much better than nothing! [21:12:27] !log deployment-bastion deploying hebrew analyzer to Elasticsearch in beta - rolling restart required [21:12:28] deployment-bastion is not a valid project. [21:12:31] Thank you guys, got to go, but I will save this info. [21:13:00] Blahma: Again, sorry for the trouble. [21:13:50] That's fine. Thank you for taking your time to help me and all the other of us. [21:16:40] legoktm: I'm afraid that you are in Blahma's boat as well. :-( [21:16:52] Oh [21:16:52] yeah [21:16:56] I just started re-writing it [21:17:20] accidents happen [21:17:24] its ok :P [21:19:52] Nemo_bis: https://wikitech.wikimedia.org/wiki/Nova_Resource:Osmit [21:20:34] fyi, storage is not currently metered, so you shouldn't run into any immediate issues there. Ops may pester you if you use a ton of space but you shouldn't hit any hard limits or encounter any data loss. [21:20:58] andrewbogott: thanks! I was unsure if that was temporary or what [21:21:04] now I'll add users and docs [21:21:27] Nemo_bis: your other quotas (ram, cores, etc) can be viewed via a link from the 'manage projects' page. You most likely won't run out of that stuff anytime soon, just let me know if you need more. [21:21:50] Nemo_bis: oh, wait, it didn't add you as admin for some reason. Hang on... [21:22:29] Because I spelled your name wrong, predictably. [21:22:42] You should be able to create instances &c now, let me know if you can't. 
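For the `qacct -o tools.cssk -j '*'` recovery suggested earlier, a small parser can group submit times by job name, which may make the old schedule easier to read off than the raw `egrep` output. This is only a sketch: it assumes the usual qacct layout of one `key value` pair per line with records separated by lines of `=` characters, and the sample text below is invented for illustration, not real accounting data. As noted in the channel, accounting gives names and times, not the command lines.

```python
from collections import defaultdict


def parse_qacct(text):
    """Return one dict per accounting record, keeping jobname and qsub_time."""
    jobs = []
    current = {}
    for line in text.splitlines():
        if line.startswith("====="):
            # A separator line starts a new record.
            if current:
                jobs.append(current)
            current = {}
            continue
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0] in ("jobname", "qsub_time"):
            current[parts[0]] = parts[1].strip()
    if current:
        jobs.append(current)
    return jobs


def submit_times_by_job(jobs):
    """Map each job name to the list of its submit times, in input order."""
    times = defaultdict(list)
    for job in jobs:
        if "jobname" in job and "qsub_time" in job:
            times[job["jobname"]].append(job["qsub_time"])
    return dict(times)


# Made-up sample in the assumed qacct layout.
SAMPLE = """\
==============================================================
qname        task
jobname      update-stats
qsub_time    Mon Apr 14 00:05:01 2014
==============================================================
qname        task
jobname      update-stats
qsub_time    Mon Apr 14 01:05:02 2014
==============================================================
qname        task
jobname      nightly-rollup
qsub_time    Mon Apr 14 23:55:01 2014
"""

schedule = submit_times_by_job(parse_qacct(SAMPLE))
```

In practice the text would come from the saved output of the qacct command rather than from `SAMPLE`. A job that shows up once an hour at the same minute then points fairly directly at the crontab time fields it used to have.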
[21:25:12] Nemo_bis: WMIT is aware of the project Maps? [21:25:50] yes [21:26:16] they're discussing it on our mailing list right now [21:26:40] Bene. [21:26:43] Nemo_bis: when you create a new instance it will have a smallish drive by default (even if you select a bit image type). That's because you have allocated but unpartitioned space. You can /probably/ just ignore that and used /data/project space for your project work [21:26:55] Sure. [21:26:58] But if you need to partition the allocated space for e.g. a local db or something, that's pretty easy. [21:27:24] * andrewbogott can't remember if you know this already [21:28:57] I think I know all I need :) [22:08:18] scfc_de: can you grant shell to sabas and cortesimone? [22:14:52] Nemo_bis: One moment, please. [22:15:51] Nemo_bis: Done. [22:17:28] thanks [22:30:31] Coren: I really hate idiots [22:38:41] Betacommand: It's not good to hate yourself. [22:38:48] * a930913 runs. [22:39:13] <^d> Easier just to hate everyone. Then you're fair :) [22:40:43] a930913: Ive been trying to help a guy on the mailing list about issues with his cron, and Ive asked at least 5 times for details and all he has said is " Its the same as user XXXX" and in the case of user XXXX Coren stepped in and fixed it without giving associated details on the mailing list [22:40:54] Ah, could you imagine politically correct hating? Having to publicly define your criteria for hating someone so that everyone can be hated on a level playing field. [22:41:19] a930913: Im an equal opportunity hater [22:44:04] I take it the "there are no emergencies on Wikipedia" thing doesn't apply to labs? [22:44:06] a930913: if your asking for help and you think youve given them the needed information once already and they ask for it again whats the common sense thing to do? re-send it right? [22:44:43] Betacommand: Most people do. 
[22:45:24] Coren: this guy as refused to send it 5 times and gets snarky when I finally get irritated repeating myself [22:45:37] And still hasnt provided anything usable [22:46:27] really tempted to request a root revoke is account on competency requirements [22:46:41] Well, he did send the name of the tool, which helps a little bit. I hadn't noticed that to begin with. [22:47:10] Coren: if someone asks you 4-5 times and you dont send what their asking for???? [22:47:24] ... which doesn't help because as far as I can tell the script edited his crontab exactly right. [22:48:51] Coren: exactly I wish the internet had a {{trout}} button [22:49:22] Smacking someone across the face with a fish is fairly effective at making a point [22:50:10] Betacommand: I think you have made your point more than clear :-). [22:50:37] scfc_de: then why doesnt the user get it? [22:52:19] The biggest problem, that I can see, is that he hasn't said what was supposed to be wrong. None of his scripts relied on standard output or error (since he redirected both to /dev/null anyways), and he didn't use the fancy conditionals which made some break. [22:53:03] So saying "my entries have been prepended with jsub [...] like X" isn't useful since that's exactly what was supposed to happen. [22:54:08] Betacommand: Pride? Not everyone wants to beg when others don't have to. [22:54:35] scfc_de: Im offering help, no begging needed [22:59:45] Betacommand: Then I don't think you'll understand his position. [23:11:53] Coren: take a look at his most recent reply [23:11:57] again nothing useful [23:12:11] "The [cron] jobs are going to the queue, but dying."