[00:00:03] hasteur: you can also look at g13_driver.out and g13_driver.err in $HOME for more info
[00:02:44] Ok, I set it for 1 min after the hour and still no execution, nor are there log files
[00:04:54] hasteur: I may be going out on a limb here, but would you add me to the tool account and I'll take a look? jlocal was created because I made Coren do it :P
[00:08:05] Ok, what's your user?
[00:08:11] hasteur: I closed the wrong window, so if you responded I missed it
[00:08:26] the "Ok" was the only response.
[00:08:40] Had to futz with the "manage the service group membership"
[00:09:01] Betacommand:
[00:10:46] local-betacommand-dev?
[00:10:53] No
[00:10:59] that's my tool
[00:11:07] betacommand
[00:11:13] Ah... I figured it out. The membership tool leaves much to be desired in UI design
[00:12:12] Ok, I added you to the group, but I'm not seeing it reflected when I query your groups
[00:15:01] hasteur: one sec
[00:15:20] hasteur: it just worked
[00:15:25] little lag
[00:26:41] hasteur: sorry for the lag, was fighting with vim
[00:26:56] should know in about 2 minutes if my idea works
[00:31:10] Betacommand: Hrm... I don't think it worked
[00:31:22] hasteur: it didn't, I'm investigating now
[00:32:35] I'm almost wondering if, because the cron parser doesn't see jsub at the very beginning of the line, it takes it into its own head to jsub it on its own.
[00:33:21] No, that line should work
[00:43:25] hasteur: facepalm. I'll know if I figured it out in 2 minutes
[00:46:00] Betacommand: I'm confused by that last statement
[00:46:35] hasteur: /facepalm
[00:46:58] hasteur: I think I figured out the issue. In 2 minutes I'll know
[00:52:13] hasteur: can you set the drives.sh to https://dpaste.de/WcG4
[00:52:23] *drivers3.sh
[00:54:11] g13_nudge_driver3.sh has been modified per the paste
[00:58:28] hasteur: check your email
[00:59:50] No emails, no sys mail
[01:03:14] Ok, just tweaked it again, will know at 01:05
[01:03:54] Oh hello there... looks like it fired without giving any messages
[01:04:07] As of 01:03 UTC
[01:04:37] Looks like it spun up right at 01:00
[01:05:11] hasteur: that was me testing manually
[01:05:25] making sure that the .sh script worked
[01:06:16] betacommand: You do realize that once it's parsed those jobs for the day, it won't have any more to do on those jobs for the day.
[01:06:44] hasteur: I'm just trying to see if the script is invoked correctly
[01:07:29] and I think it is being invoked correctly now
[01:07:43] a930913: You don't need jobs submitting jobs for a fork bomb. You could achieve that with "while true; do jsub ...; done" as well.
[01:07:56] Test by swapping some other "component jobs" in?
[01:08:15] hasteur: yeah, and tweak the cron time
[01:09:35] Betacommand: Ok, we'll find out at 10 past the hour
[01:10:39] YuviPanda|zzz: /var on tools-mail is full.
[01:11:05] hasteur: looks like it kicked off correctly
[01:11:24] Thank you sir.
[01:11:58] Not sure if the bot is working or not, but at least the first cron is :)
[01:12:14] !log tools tools-mail: /var is full
[01:12:18] Logged the message, Master
[01:13:20] Coren:
[01:13:23] Betacommand: I swapped some other jobs into that bash script and it works. I'll change back to my main driver and schedule it for regular running tomorrow. Thank you immensely for helping me with this conundrum.
[01:13:51] hasteur: it shouldn't have taken that long, I was being a complete idiot
[01:15:23] scfc_de: looks like outgoing mail is stuck somewhere
[01:16:47] Betacommand: YuviPanda|zzz: /var on tools-mail is full.
[01:19:40] hasteur: thanks
[01:21:39] Betacommand: You're very welcome
[01:26:24] andrewbogott_afk, Coren: On tools-mail, with "sudo puppetd -tv" I get "notice: Skipping run of Puppet configuration client; administratively disabled; use 'puppet Puppet configuration client --enable' to re-enable.". On "sudo puppet Puppet configuration client --enable" I get "Error: Unknown Puppet subcommand 'Puppet'". How do I enable Puppet again?
[01:26:48] hasteur: now for the flood of emails
[01:27:19] Betacommand: The outbound mail server is constipated?
[01:27:33] hasteur: yep
[01:27:52] hasteur: *The* mail server was; but you only have five mails waiting. The bigger mess is Yahoo!, which clogs our logs.
[01:28:19] hasteur: i.e. the emails that would have helped us debug :)
[01:28:52] scfc_de: contact the Yahoo members and suggest that they use Google :P
[01:29:05] *smirk*
[01:29:48] hasteur: feel free to remove me from the tool
[01:30:13] well, I'm hungry, so I'll see you guys later after I eat
[01:30:25] Eat well!
[01:36:02] !log tools tools-mail: Freezing all messages to Yahoo!: "421 4.7.1 [TS03] All messages from 208.80.155.162 will be permanently deferred; Retrying will NOT succeed. See http://postmaster.yahoo.com/421-ts03.html"
[01:36:05] Logged the message, Master
[01:36:43] scfc_de, ??
[01:37:01] GDI.... Go home Yahoo, you're drunk with faux power.
[01:46:49] !log tools hazard-bot: Disabled minutely cron job github-updater
[01:46:52] Logged the message, Master
[01:48:48] Removing /var/lib/puppet/state/puppetdlock fixed the Puppet thing.
[02:36:34] !log tools tools-mail: Removed all jsub notifications from hazard-bot from queue.
[02:36:36] Logged the message, Master
[02:44:58] !log tools tools-mail: Enabled role::labs::lvm::biglogs, moved data around & rebooted.
[02:45:01] Logged the message, Master
[03:49:36] Coren?
[03:49:41] Coren: https://en.wikipedia.org/w/index.php?title=Special:Log/spamblacklist&limit=500&type=spamblacklist&user=
[03:50:07] And the one-and-a-half hours before that..
[03:53:10] Coren, I blocked CorenSearchBot
[10:01:45] !log integration rebase operations/puppet on puppetmaster. A bunch of contint-related changes were merged yesterday and this morning.
[10:01:47] Logged the message, Master
[11:06:10] YuviPanda|zzz / legoktm: did either of you fix wikibugs / what was the issue?
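The full /var partition on tools-mail appears to be what stalled outgoing mail here. A free-space check in the spirit of the Icinga partition check scfc_de describes later fits in a few lines of Python. This is a minimal sketch, not the actual Icinga plugin; the function names and the 10% warning threshold are assumptions chosen for illustration:

```python
import shutil


def percent_free(path: str) -> float:
    """Percentage of free space on the filesystem that contains `path`."""
    usage = shutil.disk_usage(path)  # named tuple (total, used, free), in bytes
    return 100.0 * usage.free / usage.total


def check_free_space(path: str, warn_below: float = 10.0) -> str:
    """Return "OK: ..." or "WARNING: ..." depending on a free-space threshold.

    The 10% default is an arbitrary, hypothetical threshold.
    """
    pct = percent_free(path)
    status = "WARNING" if pct < warn_below else "OK"
    return f"{status}: {path} has {pct:.1f}% free"
```

Run as `check_free_space("/var")` on the mail host, something like this would start warning before the partition is completely full; routing the result to someone who actually gets alerted is the part the log later says was missing.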
[11:19:33] valhallasw: tools-mail was full, thus blocking mail to wikibugs (I think; I just looked at the mail queue). Is wikibugs still receiving mail or has it been unsubscribed due to that?
[12:08:43] scfc_de: not sure. Let me check.
[12:09:25] last mail seems to be Date: Wed, 21 May 2014 14:28:17 +0000
[12:09:49] so probably broken :-p. Let me re-subscribe, then.
[12:11:13] yet mail delivery is set to 'enabled'. Hrm.
[12:17:32] !log local-wikibugs mail delivery broken; direct mails complain about open("~/mailout.log", "a") in to_redis.py; commented out those lines
[12:17:34] Logged the message, Master
[12:18:07] !log local-wikibugs gmail-to-wikibugs delivery is now functional; hopefully wikibugs-l@lists.wm.o delivery too...
[12:18:09] Logged the message, Master
[12:19:13] YuviPanda|zzz: https://github.com/keithw/mosh/issues/120 :)
[12:27:51] !log local-wikibugs wikibugs-l@l.wm.o delivery functional again and wikibugs is correctly reporting to IRC
[12:27:52] Logged the message, Master
[12:29:22] scfc_de: oh goodie, another reason Yahoo sucks :-p
[12:33:25] valhallasw: IIRC I read in a backscroll some time ago that Cyberpower678 once manually classified those submit messages as spam. And as there's no "normal" mail from tools-mail, to the filter it probably looks as if tools-mail only sends out those :-).
[12:38:06] hi all
[12:38:14] is it possible to rename an instance?
[12:43:58] scfc_de: *facepalm*
[13:03:31] scfc_de, can you send that Yahoo block email again? I inadvertently deleted it.
[13:08:06] Cyberpower678: http://permalink.gmane.org/gmane.org.wikimedia.labs/2551
[13:09:09] scfc_de, my tool is creating mail? :O
[13:09:25] but all cronjobs use -quiet now.
[13:10:15] Also, I'm not running a cronjob every minute.
[13:13:09] * Cyberpower678 recalls setting up a block on his account for all mail coming from tools, the night he was being spammed by it.
[13:13:16] scfc_de, ^
[13:15:43] Unfortunately, I don't know how to undo it.
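One likely reason for the to_redis.py complaint is a classic Python pitfall: open() does not expand "~", so open("~/mailout.log", "a") looks for a directory literally named "~" under the current working directory. Assuming that was the cause (the log only says the mails "complain"), here is a minimal sketch of a fix instead of commenting the lines out; append_log is a hypothetical helper, not the actual wikibugs code:

```python
import os


def append_log(message: str, path: str = "~/mailout.log") -> str:
    """Append one line to a log file, expanding "~" to the user's home first.

    Without os.path.expanduser(), open() would treat "~" as an ordinary
    directory name relative to the current working directory.
    """
    full_path = os.path.expanduser(path)
    with open(full_path, "a", encoding="utf-8") as fh:
        fh.write(message.rstrip("\n") + "\n")
    return full_path
```

The helper returns the expanded path, which makes it easy to log or check where the file actually ended up.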
[13:18:43] Cyberpower678: No, that wasn't /caused/ by you. But you are affected by it in that one (1) mail from xtools is stuck ("Your job 1035416 ("clearArticleinfo") has been submitted").
[13:19:09] Urgh.
[13:19:12] Oh that.
[13:24:53] scfc_de: 'puppetd --enable'. The error message just plain lies. :-)
[13:32:36] a) Yes :-). b) I'm not sure if that would have helped; I believe the problem was a stale lock file. But I'm sure I'll see that message again :-).
[13:35:09] Best headline ever: "Galaxy Nexus: Android Ice Cream Sandwich guinea pig"
[13:40:20] Wikimedia Labs / tools: Harden mail server against incoming spam - https://bugzilla.wikimedia.org/65629 (Tim Landscheidt) NEW p:Unprio s:enhanc a:Marc A. Pelletier Currently, the mail queue has a handful of outgoing bounces that relate to mails to user@tools.wmflabs.org (an existing mail ad...
[14:56:41] I want to test an nginx config change from the public internet ( https://www.ssllabs.com/ ). That one: https://gerrit.wikimedia.org/r/#/c/132393/
[15:05:23] I suppose that requires a public IP, so can anyone assign me one?
[15:05:56] It does; it can normally be assigned from wikitech by any project admin unless you have no quota. Have you tried?
[15:06:00] Any suggestions for a project? Otherwise: https://wikitech.wikimedia.org/wiki/Nova_Resource:Puppet-cleanup
[15:06:09] hi jzerebecki
[15:06:13] hey mutante
[15:06:18] Coren: I think he needs the quota raised.. I can do it?
[15:06:34] yes please
[15:06:37] jzerebecki: ah, it depends on the project
[15:06:42] the quota for IPs
[15:07:07] jzerebecki: there's no "nginx" project, is there?
[15:07:29] mutante: You want to raise the quota? Be my guest.
[15:07:48] I was about to but you're welcome to it. :-P
[15:07:49] Coren: yes, and I think I'm tempted to just make a new project
[15:07:57] vs. putting all kinds of things into generic "puppet test"
[15:08:09] does that sound good to you?
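scfc_de's stale-lock diagnosis can be checked before anything is deleted. A minimal sketch, assuming the lock file is either empty or holds the PID of the process that owns it; that is an assumption about the puppetdlock format of this Puppet version, not something the log confirms, and lock_state is a hypothetical helper:

```python
import os


def lock_state(lock_path: str) -> str:
    """Classify a PID-style lock file (assumed format: empty, or a single PID)."""
    try:
        with open(lock_path, encoding="utf-8") as fh:
            content = fh.read().strip()
    except FileNotFoundError:
        return "unlocked"
    if not content:
        return "disabled"  # assumption: an empty lock means "disabled on purpose"
    try:
        pid = int(content)
    except ValueError:
        return "unknown"
    if pid <= 0:
        return "unknown"  # os.kill(0, ...) would target a process group instead
    try:
        os.kill(pid, 0)  # signal 0: existence check only, nothing is delivered
    except ProcessLookupError:
        return "stale"  # the owning process is gone
    except PermissionError:
        return "running"  # process exists but belongs to another user
    return "running"
```

Under that assumption, `lock_state("/var/lib/puppet/state/puppetdlock")` returning "stale" would support removing the file, while "disabled" would suggest someone disabled the agent deliberately.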
[15:08:19] just giving him an nginx project
[15:08:50] mutante: I should say that depends greatly on whether you expect to bring instances down and up often for different tests; the generic project sounds sane in that case. If you see a more continual use for the nginx testing aspect, then it makes sense to spin it off.
[15:09:12] oh there is an nginx project
[15:09:31] Either way, a 'project' is a very lightweight entity and there is no overhead to creating more of them except for names.
[15:09:37] my test is not permanent in any sense
[15:10:21] mutante: In other words: I have no opinion. :-) Do what makes sense to you organizationally. :-)
[15:10:42] Coren: hehe, ok :)
[15:10:56] https://wikitech.wikimedia.org/wiki/Nova_Resource:Nginx "A project for building and testing our modifications to nginx."
[15:13:13] mutante: so please add me as admin there and increase the public IP limit if there is none free
[15:15:41] !log nginx added jzerebecki as member and admin
[15:15:41] Logged the message, Master
[15:15:41] !log nginx raised floating IP quota to 1
[15:15:42] Logged the message, Master
[15:15:55] jzerebecki: now go ahead and make yourself a new instance, I'd say
[15:16:05] then you should be able to assign the IP to that
[15:17:12] mutante: thx. allocating an IP address worked.
[15:18:06] cool!
[15:22:33] Coren: ..and I mostly updated the docs https://wikitech.wikimedia.org/w/index.php?title=Help%3ANova-manage&diff=113934&oldid=72983
[15:22:43] the syntax for this changed since the last nova version etc
[15:23:18] nova-manage -> nova quota-update ..
[15:23:24] Ah, yes, it did. In fact, they keep changing it every release. :-)
[15:23:38] @replag
[15:23:38] Replication lag is approximately 00:00:00.5216710
[15:23:41] I think they want people to use the API and not the command-line utilities.
[15:24:06] command-line FTW. kill these people with fire
[15:24:15] hehe
[15:27:36] hello, periodically I get "Permission denied (publickey)." on one of my instances.. has anyone encountered similar access problems with eqiad instances?
[15:30:21] mukil1: The problem is intermittent?
[15:30:29] mukil1: I think that almost everyone who ever used labs did encounter it
[15:30:32] :P
[15:31:47] @Coren: yes, the problem is intermittent.
[15:32:44] mukil1: It might be worthwhile to examine the logs (especially /var/log/auth.log) when that happens; the information there will clarify /what/ fails at least and that'll help figure out how to prevent it.
[15:33:19] for example: just this morning I had no problems at all logging in and now, all of a sudden, I can't access the instance anymore (using the ProxyCommand option)
[15:33:41] Coren: yes, as soon as I get in again, I will do so and investigate the auth.log
[15:33:50] mukil1: That happened just now? What's the name of the instance? I could go see.
[15:34:23] thanks a lot, since I just wanted to update the deployment (and demo that in 90 mins or so)..
[15:34:39] wikidata-topicmaps.eqiad.wmflabs
[15:38:33] mukil1: I think your instance is in need of a reboot; even my root keys won't let me in.
[15:42:34] mukil1: One possible cause: was that instance migrated from Tampa?
[15:42:43] hmm.. do we ever actually delete users?
[15:43:00] from ldap?
[15:43:02] got a request to delete a gerrit user
[15:43:04] mutante: Generally not, though service group "users" can be deleted.
[15:43:06] yea
[15:43:21] well, it's about what users show up in gerrit
[15:43:25] I think some people were renamed in ldap in the past
[15:43:32] not sure if removed though
[15:43:37] mutante: But I'm pretty sure it's like projects and there are issues with attribution if we delete accounts.
[15:43:58] that's pretty much what I said first
[15:44:03] then I got "It's okay cos there is no work associated with the bogus awight account."
[15:44:52] valhallasw: sorry, couldn't. crashed, but figured it was the mail server.
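Following Coren's suggestion to look at /var/log/auth.log, a small filter makes the relevant sshd lines easier to spot. A minimal sketch: the phrases below are common OpenSSH messages, but which of them actually get logged depends on sshd's LogLevel (failed public-key attempts in particular may only show up at a more verbose level), and sshd_events is a hypothetical helper:

```python
import re
from typing import Iterable, List

# Common OpenSSH auth.log phrases. Whether each one is logged depends on
# sshd's LogLevel; failed public-key attempts may need LogLevel VERBOSE.
AUTH_EVENTS = re.compile(
    r"Accepted publickey|Failed publickey|Invalid user|"
    r"Authentication refused|Connection closed by"
)


def sshd_events(lines: Iterable[str], user: str) -> List[str]:
    """Return sshd lines that mention `user` and look authentication-related.

    The user match is a plain substring test, so "mukil" also matches
    "mukil1"; good enough for eyeballing a log.
    """
    hits = []
    for line in lines:
        if "sshd[" in line and user in line and AUTH_EVENTS.search(line):
            hits.append(line.rstrip("\n"))
    return hits
```

Usage would be along the lines of `with open("/var/log/auth.log") as fh: print("\n".join(sshd_events(fh, "mukil")))`, where "mukil" stands in for the real shell account name and reading the file usually needs root.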
[15:45:01] mutante: I don't think it's been /tried/ before, and honestly I'd be wary because I know gerrit is finicky and that even just /renaming/ a user caused headaches. Is there a particularly compelling reason to delete the account that'd justify the trouble?
[15:45:21] "My work email awight@wikimedia.org appears under two accounts in gerrit, adamw and awight, which is confusing to people trying to add me as a reviewer, and is a pain for me to deal with because only one account is active and has privileges.
[15:45:38] Otherwise, I'd answer "just stop using it, it's harmless"
[15:45:40] " I was one of the people whose production accounts got cleaned up recently, I assume this is related fallout."
[15:45:45] I'm not sure if the second part is true
[15:45:55] hmm. yea
[15:47:20] mutante: It might be easier to strip the email from the "bad" account than to delete it
[15:47:50] I'm pretty sure that won't break anything.
[15:48:01] Coren: you are saying exactly what I said :)
[15:48:11] "afaik deleting a gerrit user entirely will be problematic without messing up gerrit/git history. but you/somebody should be able to change the email address at least"
[15:48:12] GMTA
[15:48:30] "Changing the email will not help, my coworkers will type "awight" and it
[15:48:33] will find the bad account by username. Are you suggesting the email is
[15:48:36] "don't pick me@Wmf"? Because that is hilarious and terrible."
[15:49:02] :) oh well
[15:51:50] Yeah, that'd work. :-) I'm not comfortable enough with our slightly baroque gerrit auth infrastructure to even reasonably estimate how likely we'd be to break things by removing the account, but I know abogott has just finished a fight with it to /rename/ an account, so he might be able to say how feasible that is.
[15:53:01] (PS1) Alexandros Kosiaris: osm: add ganglia postgres credentials [labs/private] - https://gerrit.wikimedia.org/r/134839
[15:56:04] (wikidata-topicmaps is "rebooting")
[15:57:00] Coren: I think it should be possible to remove a gerrit account… it might be messy and/or broken, but since the goal is not to have it work in the end anyway...
[15:57:32] (wikidata-topicmaps is ACTIVE again)
[15:57:36] You'd want to delete the user from ldap by hand first. And then muck around in the gerrit db to remove the entries.
[15:58:50] mukil: Can you log in?
[15:59:24] no, actually ssh did not come up
[15:59:42] mukil: I'm on, so I'll be able to take a look around.
[16:13:26] Coren: thank you.. so, ssh on wikidata-topicmaps is up now but I still get a "Permission denied.."
[16:17:31] mukil: Lemme look at what's up.
[16:19:15] mukil: I'm seeing your public keys as visible, and sshd complains that you just don't have a matching key; it's not permissions at least.
[16:20:18] Coren: Ok, thanks for the info, it really sounds like a client-side issue on my side..
[17:15:18] Wikimedia Labs / Infrastructure: Add trebuchet user to wikidev group - https://bugzilla.wikimedia.org/62843#c1 (Daniel Zahn) Bryan, i did that. please let me know if all is fine and it worked this way. modify-ldap-group --addmembers=trebuchet wikidev root@silver:~# ldaplist -l group wikidev | grep tr...
[17:20:15] (PS1) JanZerebecki: Add labs ssl key for unified.wikimedia.org [labs/private] - https://gerrit.wikimedia.org/r/134852
[17:35:20] scfc_de: re: the disk being full on the mail host. Do we have some sort of monitoring for those things at all?
[17:44:18] Wikimedia Labs / Infrastructure: Add trebuchet user to wikidev group - https://bugzilla.wikimedia.org/62843#c2 (Bryan Davis) NEW>RES/FIX deployment-bastion:~ bd808$ id trebuchet uid=604(trebuchet) gid=604(trebuchet) groups=500(wikidev),604(trebuchet) tin:~ bd808$ id trebuchet uid=995(trebuchet)...
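When sshd reports that no key matches, comparing fingerprints on both sides narrows things down quickly. A minimal sketch that computes OpenSSH-style fingerprints from public-key lines: the colon-separated MD5 form that ssh-keygen printed by default before OpenSSH 6.8, and the newer "SHA256:" form. It assumes plain key lines without leading authorized_keys options, and the helper names are hypothetical:

```python
import base64
import hashlib


def _key_blob(pubkey_line: str) -> bytes:
    """Decode the base64 blob of an OpenSSH key line: '<type> <blob> [comment]'.

    Assumes no leading authorized_keys options (e.g. command="...").
    """
    parts = pubkey_line.split()
    if len(parts) < 2:
        raise ValueError("expected '<type> <base64-blob> [comment]'")
    return base64.b64decode(parts[1])


def fingerprint_md5(pubkey_line: str) -> str:
    """Legacy colon-separated hex MD5 fingerprint of the key blob."""
    digest = hashlib.md5(_key_blob(pubkey_line)).hexdigest()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def fingerprint_sha256(pubkey_line: str) -> str:
    """'SHA256:' fingerprint: base64 of the SHA-256 digest, padding stripped."""
    digest = hashlib.sha256(_key_blob(pubkey_line)).digest()
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")


def matching_keys(client_key: str, authorized_lines: list) -> list:
    """Return the authorized entries whose fingerprint equals the client key's."""
    target = fingerprint_sha256(client_key)
    return [line for line in authorized_lines
            if line.strip() and not line.startswith("#")
            and fingerprint_sha256(line) == target]
```

The result can be cross-checked against `ssh-keygen -l -f ~/.ssh/id_rsa.pub` on the client; an empty list from matching_keys would be consistent with Coren's "you just don't have a matching key".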
[17:50:56] Hi, I want to host a server-side redirect of my client-side app for Wikimedia login. Please tell me where; if WMF Labs, then how do I apply for one?
[17:57:14] Ok, now I have my test instance set up, correct security group, public IP and DNS associated... but https://https-test.wmflabs.org/ gives connection refused.
[17:57:34] instance name jzerebecki-forwardsecrecy, project nginx
[17:57:42] any ideas?
[17:57:47] mutante: ^
[17:57:47] jzerebecki: You'll have to open port 80 to the outside from within the security groups
[17:57:55] It's closed by default.
[17:58:20] Coren: 80 and 443 are open
[17:58:51] And can you connect to nginx on those ports from the instance proper?
[17:59:27] oops
[17:59:30] thx
[18:13:39] Coren: Hm.. I'm trying to stop/start a webservice but it seems to take forever.
[18:13:46] Checking the job status directly I get the same result
[18:13:51] qdel -j 1057632
[18:14:08] qdel 1057632# *
[18:14:52] Krinkle: lighttpd normally won't quit until its last connection is closed; there may be a client holding it open?
[18:15:18] The only request I made to it was one by myself, which resulted in a 500 Internal Server Error because I had a PHP syntax error
[18:15:35] however that one returned immediately
[18:15:35] is there a non-graceful stop?
[18:15:48] maybe it's holding it in the backend due to some bug
[18:16:11] great
[18:16:49] Krinkle: Not with qdel, but you can go on the node and forcibly kill it any way you want (including -9). But lemme take a peek at the job first to see if there is something more serious at issue.
[18:17:03] It just stopped finally
[18:17:09] sorry :)
[18:17:21] jobid 1057632 for your inspection (if possible)
[18:17:37] Krinkle: No worries. If it was over a few tens of seconds' delay it /was/ unusual.
[18:17:52] Coren: it took a total of 4.5 minutes
[18:18:03] no, 9 minutes
[18:18:24] YuviPanda: Yes and no. The minimalistic Icinga that was set up in early pmtpa days had an uncustomized check for x % free on all partitions IIRC, and looking at http://icinga.wmflabs.org/cgi-bin/icinga/status.cgi?search_string=tools-mail, that seems still up. But it was never set up to alert anyone specific, and Icinga wasn't very stable either, and the #wikimedia-labs-nagios (or -icinga?) channel is unbearable if you just want to focus on one project.
[18:18:40] Coren: https://gist.github.com/Krinkle/0b677d777ce0d2c36f1c
[18:19:05] (and in a separate tab I tried qdel, but that just returned saying it was already "in deletion")
[18:20:36] Krinkle: Yeah, webservice stop just figured out what to qdel then does just that.
[18:20:40] figures*
[18:22:55] apsdehal: Welcome to Tools! However, I don't know anything about your specific use case and how to achieve it most easily.
[18:25:49] scfc_de: Thanks, I just need a server and ssh access to it, maybe a domain, the rest is up to me
[18:26:50] apsdehal: Just take a look at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Help and see if it fits.
[18:31:41] scfc_de: It seems this will work, thanks a lot
[19:11:35] (PS1) Merlijn van Deen: Move to new pywikibot channel [labs/tools/pywikibugs] - https://gerrit.wikimedia.org/r/134866
[19:11:57] (CR) Merlijn van Deen: [C: 2 V: 2] Move to new pywikibot channel [labs/tools/pywikibugs] - https://gerrit.wikimedia.org/r/134866 (owner: Merlijn van Deen)
[19:14:14] !log local-wikibugs changed git repo to have gerrit as master
[19:14:16] Logged the message, Master
[19:14:56] wikibuuugs
[19:15:25] * valhallasw prods wikibugs
[19:15:44] what happpened to ittt
[19:16:33] I just missed the rejoin :-p should be OK now
[19:16:43] YuviPanda, you administer grrrit-wm, right?
[19:16:50] could you change #pywikipediabot to #pywikibot?
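Coren's advice amounts to a common pattern: ask the server to stop, give it a bounded amount of time, then force it with signal 9. A minimal sketch using Python's subprocess module, assuming the process is a child of the script doing the stopping (on the exec node you would signal an existing PID instead). This is not what webservice or qdel do internally, and which signal a given server treats as a graceful stop varies, hence the hypothetical first_signal parameter:

```python
import signal
import subprocess


def stop_process(proc: subprocess.Popen, grace_seconds: float = 30.0,
                 first_signal: int = signal.SIGTERM) -> str:
    """Politely stop a child process, escalating to SIGKILL after a grace period.

    Returns "exited" if the process ended within the grace period (or was
    already gone) and "killed" if SIGKILL was needed.
    """
    if proc.poll() is not None:
        return "exited"  # nothing left to stop
    proc.send_signal(first_signal)
    try:
        proc.wait(timeout=grace_seconds)
        return "exited"
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX: cannot be caught or ignored
        proc.wait()
        return "killed"
```

The bounded wait is the design choice that matters: a server stuck on a lingering connection, like the 9-minute stop above, gets at most grace_seconds before the hard kill.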
[19:16:54] valhallasw: yeah, sure
[19:17:57] (PS1) Yuvipanda: Move to new pywikibot channel [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/134867
[19:17:59] valhallasw: ^
[19:19:35] (CR) Merlijn van Deen: [C: 1] Move to new pywikibot channel [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/134867 (owner: Yuvipanda)
[19:19:50] (CR) Yuvipanda: [C: 2] Move to new pywikibot channel [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/134867 (owner: Yuvipanda)
[19:19:53] (Merged) jenkins-bot: Move to new pywikibot channel [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/134867 (owner: Yuvipanda)
[19:19:58] valhallasw: imma deploy
[19:20:49] valhallasw: deployed!
[19:23:45] \o/
[19:36:18] Wikimedia Labs / tools: webservice does not start - https://bugzilla.wikimedia.org/58931#c7 (seth) ASS>RES/FIX it runs! ;-)
[20:53:31] is there any workaround for the lack of the text table in databases?
[20:56:40] jackmcbarn: call the API
[21:38:20] !log deployment-prep Deployed scap 096cb3f
[21:38:22] Logged the message, Master
[22:59:06] !log deployment-prep Added matanya as a project member
[22:59:08] Logged the message, Master
[23:00:12] matanya: ^
[23:00:20] not that you weren't pinged with the !log, but, ya know
[23:00:48] !log deployment-prep Added 20after4 as a project admin
[23:00:50] Logged the message, Master
[23:01:02] twent: damn
[23:03:48] mwalker: FYI in beta someone has to manually pull in new changes from operations/puppet.git. See https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/How_code_is_updated#Puppet_and_Salt
[23:04:19] I do it a couple times a day and "have it on my list" to automate with cron
[23:05:05] ah!
[23:05:13] can you do that now or soonish?
[23:08:05] hmm, I wonder if that could be a post-merge hook
[23:08:10] that actively tells labs there is new stuff
[23:08:54] btw.. this is like back when we had a prod and a test branch, just now we have repos
[23:09:04] and 2 masters
[23:23:06] mwalker: {{done}}
[23:23:51] mutante: What I'll probably end up doing is making it a Jenkins job that is triggered by Zuul
[23:23:59] that would be optimal
[23:24:05] less work for everyone once it's set up
[23:24:10] but might be a lot of work for you now
[23:24:12] sounds good
[23:24:46] I have a shell script that does it, so the hard parts are just making that host a Jenkins slave and writing a little jjb
[23:25:19] I might use it as a teaching exercise for Mukunda
[23:27:24] perfect, or it would have been a hashar thing
[23:27:36] and we want to load-balance :)
[23:27:45] gotta run.. no fooood
[23:28:08] now to determine why my service isn't picking up the new variable name
[23:29:05] it's a puppet thing; because it's not in the config file
[23:29:09] might be related to the ssh thing
[23:35:51] We're sitting here in Vent and playing wm-bota
[23:36:28] Has it always been called that?
[23:37:27] bd808, can you check my work for the pdf thingy? for some reason it still thinks the redis host should be the one in production
[23:37:57] mwalker: I was just starting to look. :)
[23:40:00] mwalker: Try again? It turns out that when I said I'd updated the puppet master I was lying
[23:40:13] My shell script is messed up apparently
[23:40:31] * mwalker kicks puppet
[23:41:03] bd808, looks like that did it
[23:41:48] thanks much
[23:41:58] * mwalker now attempts to figure out why the service itself is still unhappy
[23:42:12] much easier problem than puppet :)
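Back at 20:53, jackmcbarn asked how to work around the missing text table, and the answer was to call the API. A minimal sketch of that route, assuming the standard action=query / prop=revisions / rvprop=content parameters and the legacy JSON response shape in which the wikitext sits under the "*" key; the helper names are hypothetical, and the HTTP request itself is left out:

```python
from urllib.parse import urlencode


def page_text_query(api_url: str, title: str) -> str:
    """Build a MediaWiki API URL asking for the latest wikitext of one page."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "titles": title,
        "format": "json",
    }
    return api_url + "?" + urlencode(params)


def extract_text(response: dict):
    """Return the wikitext from a legacy-format JSON response, or None.

    Assumes the older response shape: pages keyed by page ID, with the
    content under the "*" key of the first revision.
    """
    pages = response.get("query", {}).get("pages", {})
    for page in pages.values():
        revisions = page.get("revisions")
        if revisions:
            return revisions[0].get("*")
    return None
```

Fetching the URL is a plain HTTP GET; Wikimedia asks API clients to send a descriptive User-Agent, so set one rather than relying on a library default.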