[02:52:09] I'm back to having permission problems on a labs instance. `sudo su - vagrant; touch file` (permission denied). But /mnt/mediawiki/extensions is drwxr-xr-x 9 vagrant www-data.
[02:54:06] as user vagrant I can create files in /mnt/vagrant/mediawiki but not /mnt/vagrant/mediawiki/extensions and its subdirectories. I don't see why
[02:55:06] Is it extra ACLs or extended attributes? This happened on ee-flow-extra.eqiad.wmflabs and then it went away.
[02:58:59] ... and now it's working
[03:11:22] (PS1) coren: Tool Labs: tweaks to the landing page [labs/toollabs] - https://gerrit.wikimedia.org/r/123509
[03:11:59] (CR) coren: [C: 2 V: 2] Tool Labs: tweaks to the landing page [labs/toollabs] - https://gerrit.wikimedia.org/r/123509 (owner: coren)
[03:25:25] (PS1) coren: More minor tweaks to website [labs/toollabs] - https://gerrit.wikimedia.org/r/123510
[03:25:46] (CR) coren: [C: 2 V: 2] "It's all good." [labs/toollabs] - https://gerrit.wikimedia.org/r/123510 (owner: coren)
[03:32:39] (PS1) coren: Forgot two HTTP_ [labs/toollabs] - https://gerrit.wikimedia.org/r/123511
[03:33:00] (CR) coren: [C: 2 V: 2] Forgot two HTTP_ [labs/toollabs] - https://gerrit.wikimedia.org/r/123511 (owner: coren)
[03:41:44] (PS1) coren: Use /admin/libs rather than /libs [labs/toollabs] - https://gerrit.wikimedia.org/r/123513
[03:42:04] (CR) coren: [C: 2 V: 2] Use /admin/libs rather than /libs [labs/toollabs] - https://gerrit.wikimedia.org/r/123513 (owner: coren)
[03:43:43] (PS1) coren: ... also /img for the same reason [labs/toollabs] - https://gerrit.wikimedia.org/r/123514
[03:44:16] (CR) coren: [C: 2 V: 2] ... also /img for the same reason [labs/toollabs] - https://gerrit.wikimedia.org/r/123514 (owner: coren)
[07:32:52] hello
[08:08:21] hi hashar
[08:08:43] can you please share with me your plans for the puppet-lint jenkins jobs ?
[08:14:43] matanya: there is no plan
[08:14:55] just thought it might be a good idea to show the puppet-lint errors automatically :-]
[08:15:51] that is a good plan
[08:18:28] matanya: do you happen to be ruby proficient?
[08:18:46] not quite
[08:18:57] it would be awesome for puppet-lint to support reading a puppetlintrc file or something like that to ignore checks on a per repo basis :D
[08:19:12] I guess I should file a bug report
[08:19:20] it is missing check style output support as well
[08:20:08] hashar: you can add a rack test for this task
[08:20:26] rack -> rake ?
[08:20:29] yes
[08:20:37] add a require statement in a rake file
[08:20:37] yeah that is what we do on operations/puppet
[08:20:44] PuppetLint.configuration.send("disable_")
[08:21:01] e.g. PuppetLint.configuration.send("disable_80chars")
[08:21:02] see rake lint in operations/puppet
[08:22:46] oh, see. so why doesn't this fulfil your needs?
[08:23:09] sounds easier to edit a flat text file over a rake file
[08:25:23] Oh, LOLs, my guide on wikitech is higher on google results than the official docs :)
[08:25:33] link?
[08:25:46] note that google search results are customized for you
[08:26:09] I get https://www.etsy.com/search?q=muppet+style+puppet
[08:26:10] :D
[08:26:10] they aren't, i have no google account
[08:26:21] cookie / user-agent!!!!
[08:26:27] hehe
[08:27:09] hashar: i don't save cookies
[08:27:25] except for wiki(m|p)dia
[08:27:32] so they are left with your user agent and ip :-]
[08:27:48] + all the google analytics data they gathered with that couple
[08:27:50] which changes every browsing
[08:27:53] but I am being paranoid hehe
[08:28:16] well
[08:28:18] https://www.google.co.il/search?q=puppet+code+style
[08:28:26] this is the search term ^
[08:28:31] @home and still haven't taken a shower so gotta clean up a bit then I will update https://wikitech.wikimedia.org/wiki/Ganglia
[08:28:48] nice
[08:28:51] i get https://wikitech.wikimedia.org/wiki/Puppet_coding as the first result
[08:29:20] have fun with that. poke me if you need help
[08:29:49] with safari private navigation I get the official guides first "Style Guide" and "Best Practices"
[08:29:54] ie https://encrypted.google.com/url?sa=t&rct=j&q=puppet%20code%20style&source=web&cd=2&cad=rja&uact=8&ved=0CC0QFjAB&url=http%3A%2F%2Fdocs.puppetlabs.com%2Fguides%2Fbest_practices.html&ei=4xs9U9T-MaOI0AXbrICQBA&usg=AFQjCNGLGs-kLJ3955989aRElbS7IYYDQA&bvm=bv.63934634,d.d2k
[08:29:57] baah
[08:30:02] http://docs.puppetlabs.com/guides/best_practices.html
[08:30:12] then puppet coding, which is still awesome
[08:31:04] bbl
[08:48:35] !ping
[08:48:35] !pong
[08:58:01] Coren: I see you've fixed my inept socket code (my first socket code as such) and it now works as it should! Thanks and sorry for the painfully bad code :)
[08:58:27] I realized it this morning and also remembered that when I last tested it I had to ctrl-D before I would get a response.
[10:21:37] !log integration rebased puppet repository on puppetmaster
[10:21:40] Logged the message, Master
[10:22:33] !log integration attempting to reinstall hhvm on Jenkins slaves (cherry pick of {{gerrit|123573}} )
[10:22:35] Logged the message, Master
[11:59:31] YuviPanda: I'm still tweaking the actual lua code, have you worked on it since yesterday at all?
[13:27:08] anyone online who might be able to add a user on the beta cluster to the GWToolset group
[13:27:27] user name is Theobald Tiger
[13:51:15] hashar: can i import wiki pages to the beta cluster ?
[13:51:32] matanya: Special:Import ?
[13:51:41] it doesn't have wikipedia
[13:51:46] only sister projects
[13:51:58] need mediawiki-config change ?
[13:52:00] what do you mean ?
[13:52:04] http://en.wikipedia.beta.wmflabs.org/wiki/Main_Page
[13:52:10] there is the beta cluster wikipedia
[13:52:27] http://he.wikipedia.beta.wmflabs.org/wiki/%D7%A2%D7%9E%D7%95%D7%93_%D7%A8%D7%90%D7%A9%D7%99
[13:52:44] i want to test some RTL issues here ^
[13:52:52] but not many pages :)
[13:54:03] hehe
[13:54:16] I am not sure how to grant import rights
[13:54:47] i have there hashar
[13:56:03] it is about the import tool doesn't have wikipedia as a source, and i think it needs a mediawiki-config change to allow that
[13:57:00] ah yeah
[13:57:27] I am not sure how it is configured though
[13:57:45] not in the same tree as production ?
[13:57:57] under operations/mediawiki-config
[13:58:19] hashar, has just https to gerrit been blocked? And since when?
[13:58:50] valhallasw: toolslabs server have been blacklisted overnight
[13:58:54] I posted to labs-l about it
[13:59:01] some bot was hammering Gerrit and killing it :-/
[13:59:09] Yeah, I saw the e-mail. Strangely enough, Gerrit Reviewer Bot is not erroring out.
[13:59:36] and it's using the gerrit REST api, which should be blocked if I understand you correctly
[13:59:37] it might use gerrit stream-events command over ssh
[13:59:40] ah
[13:59:46] it uses REST to get info on patchsets
[13:59:47] maybe the iptables got removed :-]
[14:00:47] yeah, or python-gerrit-rest is buggy and just returns nothing if the connection fails
[14:01:15] valhallasw: did you just run it?
[14:01:23] gerrit seems to be accessible from tools-exec-03 (telnet port 443)
[14:01:29] matanya: it runs every five minutes
[14:01:32] valhallasw: What User-Agent do your tools use? AFAIUI, the bad one's known.
[14:01:42] yeah, that's a good one to check. Not sure.
[14:02:45] the standard python-requests user agent, I'm guessing. I should fix that.
[14:03:24] yeah we should 403 whenever we receive a default user agent :]
[14:03:27] Even so, as your bot has been running for months (?), unlikely that it went silently amok yesterday.
[14:03:55] hashar: I see. 'python-requests/x.y.z' does not count as default? ;-)
[14:04:16] better than using urllib !
[14:05:09] well, this is retarded. There are no docs on how to change the user-agent...
[14:08:12] valhallasw: http://stackoverflow.com/questions/10606133/how-to-send-user-agent-in-requests-library-in-python not useful?
[14:08:56] yeah, that works. It's bizarre there's not just a global setting, though.
[14:09:06] anyway, should be 'Gerrit-Reviewer-Bot GerritREST python-requests/2.2.1' now
[14:14:48] great :-]
[14:21:06] hi
[14:24:30] http://tools.wmflabs.org/vtwo/ , this address has not been working recently. I have a shell account (for a bot)...
[14:25:27] Mehrdad: You need to "become vtwo", and then "webservice start".
[14:30:07] thanks. please explain in more detail...
[14:40:18] Mehrdad: After the migration to the new data center, you now have to explicitly start a webserver if your tool has a web part. You can find information on the details at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Help#Web_services
[14:46:56] scfc_de: thank you very much. The problem was solved. "Starting webservice... started."
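Regarding the User-Agent exchange above (14:01 to 14:09): a minimal sketch of how a tool might send a descriptive User-Agent with python-requests. The helper names `build_user_agent` and `make_session` are hypothetical and are not taken from Gerrit Reviewer Bot; only the resulting string comes from the log. A `requests.Session` is about as close to a "global setting" as the library offers, since headers set on the session are sent with every request made through it.

```python
def build_user_agent(tool, library, http_client):
    """Join the parts into one descriptive User-Agent string (hypothetical helper)."""
    return " ".join([tool, library, http_client])


def make_session(user_agent):
    """Return a requests.Session that sends user_agent on every request.

    requests is a third-party package; it is imported here so that the
    string helper above stays usable without it.
    """
    import requests

    session = requests.Session()
    # Headers set on the session are merged into every request it makes.
    session.headers.update({"User-Agent": user_agent})
    return session


# The string quoted in the log at 14:09:06.
USER_AGENT = build_user_agent(
    "Gerrit-Reviewer-Bot", "GerritREST", "python-requests/2.2.1"
)
```

A caller would then use `make_session(USER_AGENT).get(url)` in place of the module-level `requests.get(url)`.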
[14:57:35] (Abandoned) Tim Landscheidt: Use absolute paths especially in error pages [labs/toollabs] - https://gerrit.wikimedia.org/r/103641 (owner: Tim Landscheidt)
[15:00:20] (PS1) coren: Make /admin/index.php handle every URI [labs/toollabs] - https://gerrit.wikimedia.org/r/123642
[15:03:07] (CR) coren: [C: 2 V: 2] Make /admin/index.php handle every URI [labs/toollabs] - https://gerrit.wikimedia.org/r/123642 (owner: coren)
[15:11:57] Bah, sorry for disruption to the landing page; trying to track down the bug now.
[15:20:06] (PS1) coren: We only care about a few mime types [labs/toollabs] - https://gerrit.wikimedia.org/r/123652
[15:20:28] (CR) coren: [C: 2 V: 2] We only care about a few mime types [labs/toollabs] - https://gerrit.wikimedia.org/r/123652 (owner: coren)
[15:33:53] !log deployment-prep Restarted logstash on deployment-logstash1; Stuck in a bad state due to jvm oom logged at 2014-04-03T12:03:43Z
[15:33:55] Logged the message, Master
[16:36:27] Sorry about the disruption to the web, people. I managed to restart every single webservice by accident which had the poor grid struggle to start everything.
[16:42:03] (PS1) coren: Final tweaks to the /admin/index.php [labs/toollabs] - https://gerrit.wikimedia.org/r/123666
[16:42:29] (CR) coren: [C: 2 V: 2] Final tweaks to the /admin/index.php [labs/toollabs] - https://gerrit.wikimedia.org/r/123666 (owner: coren)
[16:49:06] YuviPanda: FYI: I think I'm done.
[16:50:26] YuviPanda: All that's left is (should be) to create the 'real' proxy instance, turn SSL on, and switch away from the current one. Alternately, we could just switch the class on the existing tools-webproxy
[16:51:00] Either way, it's a brief outage since webservices will have to be restarted (unless you know of a way to export then reimport the redis db?)
[17:22:27] Coren, is there a glitch in the queue master of grid?
[17:23:07] Coren, because I think I'm looking at one right now.
[17:24:08] petan, scfc_de ^^
[17:25:59] Hmm... No one vital to this is here.
[17:28:05] Cyberpower678: Perhaps you should describe actual issues, not vague thing like "a glitch in the queue master"
[17:28:30] Coren, 2 identical tasks are running on the grid under cyberbot
[17:28:35] Have a look.
[17:28:42] 2 spambot instances.
[17:28:52] ... yes?
[17:28:59] Coren, that shouldn't
[17:29:01] happen
[17:29:10] Why is that?
[17:29:33] The grid would usually return an error that the job already exists.
[17:29:43] This time it didn't
[17:30:48] Coren, I depend on the grid doing that.
[17:31:40] Interesting. I take it you used -once?
[17:31:48] Yes.
[17:32:05] Hm.
[17:32:20] !log deployment-prep Fixed certname in /etc/puppet/puppet.conf manually on deployment-bastion so puppet would run again.
[17:32:23] Logged the message, Master
[17:33:22] I'm not clear how that could have happened; I suppose that jstart's attempt to find out if there already was a job by that name failed for some reason -- I should probably add code to be paranoid and refuse to start at all if it can't determine whether it is with certainty.
[17:33:23] Cyberpower678: Don't you use "jsub -quiet -q cyberbot -mem 2g -cwd -continuous"? -continuous includes -once, but just for clarity.
[17:33:51] scfc_de, all the continuous tasks use -continuous
[17:34:07] Yeah, -continuous implies -once anyways.
[17:34:09] scfc_de, all the cron powered tasks use -once
[17:34:31] But the second job was started at Wednesday 0:00Z, so I would have expected more duplicates.
[17:34:48] Hence me calling it a glitch.
[17:34:54] Yeah, it's obviously a transient failure.
[17:35:06] I'll need to bulletproof that check. Plz to open a bz?
[17:35:06] Cyberpower678: The crontab for tools.cyberbot uses "-continuous".
[17:35:42] I meant that midnight UTC is a time where probably many jsubs are started with similar configurations, so similar problems would have occurred.
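On the "be paranoid and refuse to start" idea at 17:33:22: a minimal sketch of a three-state check in which a failed lookup is never mistaken for "no such job". This is not jstart's actual code; `JobState`, `job_state`, `should_submit_once` and the `list_jobs` callable (standing in for whatever queries the grid) are all hypothetical names.

```python
from enum import Enum
from typing import Callable, Iterable


class JobState(Enum):
    RUNNING = "running"
    ABSENT = "absent"
    UNKNOWN = "unknown"  # the lookup itself failed


def job_state(name: str, list_jobs: Callable[[], Iterable[str]]) -> JobState:
    """Classify a job name; any lookup failure yields UNKNOWN, never ABSENT."""
    try:
        names = set(list_jobs())
    except Exception:
        # A transient grid error must not look like "no job by that name".
        return JobState.UNKNOWN
    return JobState.RUNNING if name in names else JobState.ABSENT


def should_submit_once(name: str, list_jobs: Callable[[], Iterable[str]]) -> bool:
    """Submit only when the lookup succeeded and found no job of that name."""
    return job_state(name, list_jobs) is JobState.ABSENT
```

With this shape, a transient lookup failure causes a skipped submission rather than a duplicate job; for cron-driven jobs the next scheduled run retries anyway.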
[17:35:45] scfc_de: That's immaterial, -continuous includes -once anyways.
[17:36:14] scfc_de, did you read the entire crontab?
[17:36:22] It's somewhat big.
[17:37:27] scfc_de, the crontab entries listed on the bottom are continuous tasks that are supposed to remain active but get killed for whatever reason, and its job removed.
[17:37:47] Especially with transient failures I want to be very certain what the actual trigger is. So, if Cyberpower678 says "the cron powered tasks use -once", and the crontab then uses "-continuous", there's a discrepancy.
[17:38:57] They're listed to restart those tasks in the event of such failures.
[17:39:21] scfc_de, are you reading my messages?
[17:40:04] Take a guess.
[17:40:09] Yes.
[17:40:51] So in short the non-continuous tasks use -once. The rest use -continuous
[17:41:54] * Cyberpower678 needs to declutter the crontab
[17:42:45] Coren, I hope your issue is easier to fix than my issue is. :P
[17:42:55] scfc_de: were you working on setting up a wikitech-style image in labs? Or was that someone else?
[17:44:27] Coren: tools.cluebot's cbng_redis_relay seems to be another instance (though Monday 0:00Z, and the crontab is *§#!3).
[17:44:41] andrewbogott: No? What do you mean by wikitech-style image?
[17:44:59] scfc_de: ok then :)
[17:45:14] In pmtpa we had labs instances that were set up like wikitech, e.g. nova-precise2
[17:45:28] nova-precise2 died during migration. Someone was trying to build a new one, can't remember who.
[17:46:10] andrewbogott: That's only a secret dream of me to test OpenStackManager and LDAP patches :-). But I think ... Damianz? Let's look in the logs.
[17:47:13] Coren, the work you were doing with Yuvi about proxying… is that an expansion of the proxy that's running on dynamicproxy-gateway.eqiad.wmflabs or unrelated?
[17:47:36] (Asking because I did a fair bit of dev/puppetizing for that, want to make sure we aren't accidentally forking)
[17:55:44] andrewbogott: It's a variant; it uses mostly different nginx config but is otherwise the same.
[17:56:16] Coren: ok… puppetized someplace?
[17:56:45] andrewbogott: Yeah, same class but different parameter.
[17:56:59] ok… I'll stop worrying then :)
[17:57:32] Coren, my next task that you care about is… the service-groups thing, right? Or are there other things you're waiting on from me?
[17:57:53] andrewbogott: No, the service group thing is basically it afaik.
[17:57:59] ok
[17:58:11] And even then, it's mostly cosmetic so if you have higher priority stuff I'll live. :-)
[17:58:34] So, if I recall correctly… all of the old-school groups are now copied into the new format, right? So I can rip out any code that supports local-*
[17:58:42] Right.
[17:59:16] ok. I will try to look at that 'soon'
[17:59:25] * andrewbogott is distracted by impending travel
[18:00:23] Speaking of ripping out.
[18:03:54] Coren, did you find the transient failure?
[18:04:32] Cyberpower678: I haven't looked, nor am I likely to for some time.
[18:04:50] Coren, ok.
[18:05:00] I'll let you know if it happens again.
[18:46:01] Coren: you fixed it!
[18:46:04] Coren: yay!
[18:46:20] Coren: I figured that was the problem today morning and then went to fix it and found it was already fixed :)
[18:47:30] YuviPanda: I think it's fully working now. Well, except for the bit about no ssl.
[18:47:37] Coren: no ssl?
[18:47:44] Coren: oh, right. that just needs a cert, yeah.
[18:48:02] No point in putting one until it's made the "official" proxy.
[18:48:07] Coren: yeah, agreed.
[18:48:20] The other point is: is the redis db synced to disk?
[18:48:46] Coren: yeah, by default. why?
[18:48:54] Because otherwise if it ever goes down all routes are lost and can't return until all webservices are restarted -- which would be bad.
[18:49:19] Coren: it's written to disk, yeah. *worst* case failure mode is that it loses the currently in-progress transaction.
[18:49:24] Also, if you make the "real" proxy, we'd want to copy the redis db over for that reason. So all is good.
[18:49:53] Coren: but it's problematic when it goes down anyway since the proxylistener would also die and with it all sockets and then I don't know how portgrabber will handle it and no cleanup either.
[18:50:15] so in terms of crashyness I'm more worried about proxylistener than anything else
[18:50:38] YuviPanda: Well, the worst-case scenario is that the keys linger in the redis DB until they are replaced.
[18:50:51] Coren: hmm, that sounds not too bad.
[18:51:02] provided they are replaced properly and I see no reason why they won't be.
[18:51:20] YuviPanda: It's actually a pretty good failure mode because it means that if proxylistener dies no webservice loses contact.
[18:51:29] yeah!
[18:51:37] The only "real" issue is a webservice that is later stopped will have a key lingering until it is restarted again.
[18:51:39] Coren: will portgrabber crash if the socsket dies?
[18:51:48] *socket
[18:51:49] YuviPanda: No. It doesn't care.
[18:51:55] Coren: ah, right. then cool :)
[18:52:05] Coren: so what's left before making this a 'real' proxy?
[18:52:17] Coren: create tools-proxy and just apply the puppet class?
[18:52:21] Coren: and how large do you want to make it?
[18:52:34] it's running redis and nginx, so I'd like to give it at least 2 cores if not more.
[18:52:37] YuviPanda: Moar testing? Otherwise, the simplest thing to do would be to apply the class to tools-webproxy.
[18:53:05] Coren: if we apply it to that won't it conflict with apache for port 80?
[18:53:13] Coren: yeah for more testing, especially loadtesting.
[18:53:33] Coren: is there a way for us to setup DNS so maybe 10% of tools.wmflabs.org resolves to this and like 90% goes to the other?
[18:53:44] YuviPanda: It means turning off the apache first (and removing the class), but then the box can easily be switched between one and the other.
[18:53:50] right
[18:54:18] YuviPanda: I /could/ setup a RR in DNS, but I really don't want to. That makes debugging nearly impossible.
[18:54:26] Coren: right.
[18:54:43] Coren: how about just having the current URL on labs-l and just asking people to test? Or is that mostly useless feedback?
[18:54:54] Coren: also wouldn't all webservices need to be restarted at least once?
[18:54:56] I think I'm just going to throw endusers at it for a while, then switch.
[18:55:03] heh
[18:55:18] YuviPanda: I already did so. Both proxies are talking to all webservices now.
[18:55:24] Coren: woah! sweet
[18:55:36] Coren: I got a patch in earlier today that logged proxylistener to /var/log/proxylistener
[18:56:01] * Coren nods.
[18:56:23] I don't know if you noticed, but I've split the .conf according to proxy type too.
[18:56:32] Coren: yeah, I saw the fixing up patches. YAY :)
[18:57:14] Coren: so when do you want to throw people at it?
[18:57:33] Meh, now's as good a time as any. Wanna do the email to labs-l?
[18:57:44] Coren: cooool! :)
[18:57:47] Coren: I'll write it up now.
[18:57:52] Coren: but cert?
[18:58:04] Yeah, no SSL for the testing.
[18:58:06] Coren: it needs an update in the role to point it to the cert.
[18:58:11] Not like it's all that important.
[18:58:27] Coren: oh, are we going to throw people at tools-proxy-testing or are we just going to switch tools.?
[18:58:31] Hi
[18:58:44] I've been trying to use Tool Labs and I have a question; can anyone help me
[18:58:50] Well, at tools-proxy-testing. Otherwise, it's not a test. :-)
[18:58:55] Coren: Are you and andrewbogott travelling together?
[18:58:55] Bluma: What's your issue?
[18:58:59] Bluma: hello! Just ask the question, you usually have people helping :)
[18:59:03] Coren: heh, alright ;)
[18:59:08] I've been running queries on the database and they're really, really slow
[18:59:15] Coren: I'd still like ssl for it. Think we can get star.wmflabs on to it?
[18:59:23] a930913: going to the same place but not on the same flights as far as I know.
[18:59:26] queries that should return >10 rows are timing out
[18:59:34] sorry, <10 rows
[18:59:37] a930913: We're not flying from the same place(s), but we're both going to the same places. Why?
[18:59:52] YuviPanda: Star is dead and revoked.
[19:00:06] Coren: oh? what does Special:Novaproxy use then?
[19:00:20] Bluma: Need moar context. What query? Done how?
[19:00:37] YuviPanda: Oh, a new one I expect. :-) But I'd rather not spread that one around.
[19:00:44] Coren: yeah, alright. makes sense
[19:00:52] Coren: anyway, am writing email now :)
[19:01:03] Coren: andrewbogott: I've heard of policies that means sysadmins can't travel together in case of a catastrophe. :p
[19:01:32] a930913: The WMF has no rule against it, though Travel does avoid stuffing /all/ of us in the same plane. :-)
[19:01:33] say, SELECT * FROM revision WHERE revision.rev_user_text = someuser;
[19:01:41] a930913: we try to avoid it, but right now the WMF Ops pretty much all live in different cities so that risk is small
[19:01:54] where someuser has only done 5 revisions
[19:02:15] So the prime time to strike is when you're all collected? :p
[19:02:28] Bluma: https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Help#Tables_for_revision_or_logging_queries_involving_user_names_and_IDs
[19:02:29] a930913: yep!
[19:02:42] * YuviPanda gets a blue glove out for a930913
[19:03:01] ah - great!
[19:03:03] thanks!
[19:03:08] will try and see
[19:03:19] also, is there any way to run queries with something like nohup?
[19:03:25] my connection keeps timing out
[19:04:10] Bluma: No, but you may wish to look into using mosh if you have network issues.
[19:04:22] http://mosh.mit.edu/
[19:04:26] ok
[19:04:29] thank you very much
[19:04:34] I'll take a look
[19:04:37] have a great day
[19:06:14] Coren: do you have a timeline of switching tools.wmflabs.org to it?
[19:06:24] so I can mention it in the email
[19:06:34] YuviPanda: If all goes well, as soon as we return from Athens I'd say. The 16th?
[19:06:45] Coren: yeah, that sounds good to me
[19:06:45] chrismcmahon: we broke Flow on beta labs http://en.wikipedia.beta.wmflabs.org/wiki/Talk:Flow_QA :( How do we make DB changes there, do you trust me with mysql access or do we make a request to a beta labs DBA?
[19:06:48] 16th of April
[19:07:10] hi spagewmf
[19:08:23] spagewmf: I don't know the mechanism, but the db on beta labs should be getting updated automatically -- might be every hour, or every 3 hours, I don't remember
[19:11:32] chrismcmahon: sounds right; beta labs still worked after the bad DB script was merged and after we merged a fix to it. I guess the first version was later run and broke it :-/
[19:11:37] Coren: sent
[19:12:28] bsitu: chrismcmahon in #labs-qa said "I don't know the mechanism, but the db on beta labs should be getting updated automatically -- might be every hour, or every 3 hours, I don't remember"
[19:12:39] d'oh :)
[19:12:53] spagewmf: it's been a while since hashar set that up
[19:37:45] YuviPanda: Coren Ahh a new proxy appeared ;) great work, lads. First thing to mention: ol' friend trailing slash is missing again. move on ;)
[19:38:48] D'oh! That worked when I tested it earlier. I musta broken it.