[14:21:01] hello !
[14:21:16] if anyone is interested in my current progress or what I'm working on, please add this page to your watchlist
[14:21:19] http://www.mediawiki.org/wiki/User:Spetrea
[14:21:43] if anyone is interested in my current progress or what I'm working on, please add this page to your watchlist
[14:21:46] http://www.mediawiki.org/wiki/User:Spetrea
[14:21:54] anytime you find yourself asking the question "I wonder what Stefan is doing ?" you can just check that page
[16:17:14] Drdee around?
[16:21:01] i think he's on an airplane
[16:21:06] K
[16:39:40] It wouldn't hurt if I knew some R
[16:39:47] but I don't :|
[16:41:44] average_1rifter: don't have context for your desire to know R, but I will say that Python Pandas has been a pretty nice experience and has basically the same functionality, plus being in python
[16:48:27] erosen: I need to do various charts. I'm not tied down to any language or framework
[16:48:29] I know some Python too
[16:48:47] but right now I'm a bit tied down to js & perl to do them
[16:48:59] i see
[16:49:06] i bet there is a decent js lib somewhere
[16:49:13] but don't know of one myself
[16:49:29] erosen: can Panda render in HTML ?
[16:49:31] or SVG ?
[16:49:46] or Canvas or using some 3rd party js lib for charting ?
[16:49:54] it is part of a suite of tools known as pydata I belive
[16:49:55] e
[16:50:03] it uses matplotlib to plot
[16:50:09] and i'm pretty sure you can render to almost anything
[16:50:42] I've also heard of a d3 interface
[16:50:55] that d3 interface would make my eyes glow
[16:51:44] this seems pretty legit
[16:51:44] https://github.com/mikedewar/d3py
[16:51:56] and uses the PyData tools pandas and numpy
[17:01:29] ottomata - do we have a doc somewhere to configure services that want to talk to our LDAP?
[17:01:43] I wanted to get this Redmine instance I put up to do that
[17:01:48] oh yeah sorry
[17:01:49] barely
[17:01:52] where is it hosted?
[17:01:57] labs
[17:02:11] (i'm working on ldap stuff right now too :p )
[17:02:12] um
[17:02:22] i had to ask Ryan Lane for most of the details
[17:02:25] but let's see...
[17:02:27] in windows world i used to just use a basic account that had read access
[17:03:17] here are some helpful details: https://github.com/wikimedia-incubator/operations-puppet/blob/analytics/modules/kraken/manifests/hue.pp
[17:04:23] ah cool, at first look that's all I need
[17:04:29] I'll bug you if there's other stuff
[17:04:31] thanks!
[17:04:34] yup!
[17:13:03] OK, ottomata, two questions:
[17:13:03] Is the port 636 on ldaps://virt0.wikimedia.org?
[17:13:03] Any idea what the first name / last name / email attributes are? uid is the username attribute for example
[17:14:37] brb 50m
[17:16:24] hmmmmmmmmmm
[17:16:29] i think you can perform a search against those
[17:16:35] i think you can do it anonymously
[17:16:37] errggggh
[17:16:37] using
[17:16:39] ldapsearch
[17:17:05] checking...
[17:17:16] oh maybe you can't do it anonymously
[17:17:17] i thought you could
[17:17:18] hmmm
[17:18:25] i'm not sure
[17:18:33] ok there is a password you need, but I don't think I can give it to you
[17:18:38] you'll have to ask Ryan Lane
[17:18:49] there is a test ldap instance though
[17:18:51] uhhhhh
[17:19:03] don't remember where it is though
[17:19:42] ottomata,milimetric: whatch'y'all working on?
[17:20:29] ldap configuration for an instance of Redmine I set up
[17:21:04] I wanted to see if it was viable to modify an existing Trello-like open source project. Redmine + this plugin called Redminebacklogs looks to be the closest thing
[17:21:15] nice
[17:21:22] good luck
[17:22:23] ok, ottomata, thanks, I'll talk to Ryan when he's around. Know if he's particularly busy this week?
[17:22:53] dunno :/
[17:26:16] grrrrrr, why doesn't oliver hang out in the right channels? :)
[17:33:54] hey milimetric
[17:33:55] hang out?
[17:34:00] hi
[17:34:06] yep, I invited your personal email
[17:34:16] k, one sec
[17:59:29] bwerp bwerp
[17:59:30] https://plus.google.com/hangouts/_/2da993a9acec7936399e9d78d13bf7ec0c0afdbc
[18:03:08] erosen - standupy?
[18:13:57] yo.
[18:13:59] scrum happening?
[18:14:01] i'm looking for diederik here
[18:14:03] but i guess he's not in yet
[18:14:14] i've got a grant making meeting
[18:14:21] so i'll be skipping
[18:14:33] ottomata, milimetric ?
[18:14:39] howdy
[18:14:47] yeah, you missed scrum
[18:14:48] shall we wait on scrum?
[18:14:51] oh, ok
[18:15:05] i was late anyway
[18:15:10] but there's no diederik
[18:15:11] so!
[18:15:16] diederik's in the airport
[18:15:21] ja
[18:15:22] so otto and I scrummed
[18:15:29] and erosen's gone missing, probably hunting bears
[18:15:30] coolio
[18:15:34] but like not killing them
[18:15:36] he said he has a gp meeting
[18:15:40] oh
[18:16:26] so I'm migrating Dario's dashboard and ottomata's working on LDAP-ing our stuff
[18:16:31] sweet
[18:16:36] whatcha doin?
[18:16:44] we have the security meeting tomorrow
[18:16:56] so i'm going to spend some time today doing a dry run with diederik
[18:17:01] to see if there are any loose ends.
[18:17:01] this is me live-editing notes that I need to make DarTar's graphs btw: http://etherpad.wmflabs.org/pad/p/EEDashboards
[18:17:11] and this is the repo it'll end up in: https://github.com/wikimedia/limn-editor-engagement
[18:17:20] it'd be great if people looked over https://www.mediawiki.org/wiki/Analytics/Kraken/Security_Review_Meeting
[18:17:22] cool
[18:17:31] ok, yeah, I agree that's important
[18:17:35] I'll take a look when I'm done
[18:17:57] aiight. i'm going to grab breakfast right quick
[18:18:01] bus was crazy late
[18:18:03] brb
[18:26:03] mk
[18:26:32] lunchtime!
[20:05:39] anyone have other thoughts about things i should look into and add to the security review page?
[20:06:15] I can't think of anything, but that doesn't mean much :P
[20:09:20] !log restarting namenode to test out ldap settings
[20:09:24] (dschoon, will look in a bit)
[20:09:29] dschoon - looking at it now
[20:09:31] kk
[20:09:48] maybe listing out any precautions we can take against DDOS?
[20:11:29] any access we provide to mysql slaves?
[20:11:41] how's wikihadoop do that?
[20:11:49] it imports dumps
[20:12:00] research slaves aren't in the cluster
[20:12:05] (separate questions/answers)
[20:12:49] in the data retention, are we considering cookies for anything still?
[20:13:36] ok, I gotta run to the bank, that's the only stuff that stood out to me, but I'm thinking I don't understand the audience of this meeting. Perhaps drafting a short description of the audience would be useful
[20:14:55] dschoon, (not looking at document), we should talk about authentication, particularly the fact that anyone that has root powers on any node that can talk to analytics1010 can pretend to be hdfs superuser
[20:15:16] hm.
[20:15:31] also, dschoon, we might have a research slave in our cluster eventually, if we need a dedicated one for sqoop
[20:15:40] ottomata: is that different from having root on any other system?
[20:15:41] so as not to interfere with analyst queries
[20:15:45] sure
[20:15:47] yes
[20:15:53] issue described well here:
[20:15:56] http://blog.cloudera.com/blog/2012/03/authorization-and-authentication-in-hadoop/
[20:16:04] ok, will read
[20:16:10] read the bit starting at Authorization
[20:16:22] oh
[20:16:22] sorry
[20:16:22] no
[20:16:25] authentication
[20:18:08] i will read it all :)
[20:23:17] oh. i didn't realize unix users/groups were identical with hdfs users/groups
[20:23:21] ottomata ^^
[20:23:32] is this true in mysql as well?
[20:23:52] like, even if i have a password in mysql, if i'm logged in as that user, do i automatically authenticate?
[20:24:25] no
[20:24:37] mysql is separate
[20:24:40] hdfs just decided to do that
[20:24:57] its kinda nice, because then you can rely on whatever system you already have in place for user permissions
[20:25:04] and it maps better to hdfs usage than mysql usage
[20:25:08] because hdfs is a file system
[20:25:20] and actually
[20:25:24] they aren't identical in hdfs
[20:25:38] its just the default
[20:26:00] the unix user that is currently running the hdfs command will be the user that is running in hdfs
[20:26:05] but that's it
[20:26:07] and its only based on textual username
[20:26:09] so
[20:26:26] if an 'hdfs' user is running an hdfs command
[20:26:35] hadoop will say "oh look, it is the hdfs user"
[20:26:42] and give whatever permissions are relevant for that
[20:27:09] if all commands are only run on the namenode, this isn't a problem
[20:27:23] because assumedly the hdfs user on the namenode is the correct superuser
[20:27:25] yeah, feh.
[20:27:30] but, on some remote node, who knows what it is
[20:27:42] okay.
[20:27:44] hadoop recommends kerberos to solve this problem
[20:27:52] surely there is a normal process for this?
[20:28:06] and also, users not having root helps :P
[20:28:24] ja, but this would be on any node that could talk to an10
[20:28:28] (namenode)
[20:28:35] ah.
[20:28:37] true.
[20:28:39] so maybe someone has sudo powers on some node in our system we don't know about
[20:28:43] they could create an hdfs user
[20:29:00] set up hadoop config files to point at an10 namenode
[20:29:00] yes yes.
[20:29:00] and then talk to hadoop as hdfs
[20:29:00] aye
[20:29:00] i see.
[20:29:10] the problem is that ssh doesn't mediate hadoop transactions
[20:29:24] for most applications, you need shell access on the *target* node
[20:29:33] here you do not, due to the nature of hdfs.
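
[editor's note] The impersonation issue described in the messages above can be sketched as a toy model. This is not Hadoop code: the classes ToyNameNode and ToyClient, the host names, and the path are all hypothetical, invented for illustration. The only thing taken from the discussion is the behaviour itself: without Kerberos, the namenode trusts the textual username the client reports, so any machine that can reach it and has a local account named "hdfs" is treated as the superuser.

```python
"""Toy model of username-only (non-Kerberos) HDFS authentication.

Everything here is illustrative; none of these names exist in Hadoop.
"""
from dataclasses import dataclass, field

SUPERUSER = "hdfs"  # superuser name as mentioned in the chat


@dataclass
class ToyNameNode:
    # Maps a path to the username that owns it (hypothetical data).
    owners: dict = field(default_factory=dict)

    def can_write(self, claimed_user: str, path: str) -> bool:
        # The decision uses only the name the client claims to be.
        # There is no password, key, or source-host check in this model.
        if claimed_user == SUPERUSER:
            return True
        return self.owners.get(path) == claimed_user


@dataclass
class ToyClient:
    host: str             # where the command is run (ignored by the namenode)
    local_unix_user: str  # local account running the command

    def write(self, namenode: ToyNameNode, path: str) -> bool:
        # The client simply reports its local username.
        return namenode.can_write(self.local_unix_user, path)


namenode = ToyNameNode(owners={"/user/alice/data": "alice"})
alice = ToyClient(host="namenode-host", local_unix_user="alice")
bob = ToyClient(host="namenode-host", local_unix_user="bob")
# Someone with root on an unrelated machine creates a local "hdfs" account.
rogue = ToyClient(host="some-other-node", local_unix_user="hdfs")

print(alice.write(namenode, "/user/alice/data"))  # owner is allowed
print(bob.write(namenode, "/user/alice/data"))    # non-owner is refused
print(rogue.write(namenode, "/user/alice/data"))  # remote "hdfs" is allowed
```

In this model the host field never enters the decision, which is the point made in the chat: the check happens on a name, not on where the request comes from. Kerberos or network-level restrictions are the kinds of fixes the participants go on to discuss.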
[20:42:11] right, exactly
[20:49:40] dschoon, in services access and needs
[20:49:46] well
[20:49:47] there are two pieces to the ldap thing
[20:49:51] sure
[20:49:54] go ahead
[20:49:59] hue uses ldap to authenticate you when you sign in
[20:50:04] but file access is done by hadoop, not hue
[20:50:04] (though you might want to just add them yourself :))
[20:50:18] i can rearrange
[20:50:27] i was going to suggest that perhaps we ACL away the ability for non-cluster nodes to connect to us?
[20:50:34] firewall off those ports
[20:50:56] yeah, i suggested that too, we can even do that ourselves via iptables
[20:51:02] i can't think of any reason that's a good idea otherwise
[20:51:03] yes.
[20:51:04] when I was talking to faidon
[20:51:05] agreed.
[20:51:07] we talked about that a bit
[20:51:18] there might be occasions we want to connect to hadoop from stat1
[20:51:20] iptables is probably smarter
[20:51:21] right
[20:51:22] but we'll deal with that when needed
[20:51:31] at least it makes it easier for us to modify
[20:51:34] yeha
[21:05:55] erosen: q about your impressive array of datasources
[21:06:01] sup
[21:06:52] do your definitions of "editor" and "edit" conform to the canonical definitions in https://www.mediawiki.org/wiki/Analytics/Metric_definitions#Actors ?
[21:07:04] Do you filter bots etc?
[21:07:14] yeah
[21:07:25] I don't do anything super fancy with the bots
[21:07:40] just the bots user_group
[21:07:48] and an outdated list of global bots from ErikZ
[21:07:53] *nod*
[21:08:02] that problem in particular is hard.
[21:08:06] ya
[21:08:25] also: you forgot to link to the dashboards themselves :)
[21:08:32] there is a table on s1 in the "prod" db which is slated as a canonical up to date copy
[21:08:47] whoops
[21:08:50] i'll reply now
[21:09:51] wait
[21:10:00] it's okay
[21:10:01] dschoon: you mean on the message to you?
[21:10:03] no need to do anything.
[21:10:07] gotcah
[21:10:07] yeah.
[21:10:11] i assume it's gp
[21:10:29] yeha
[21:10:32] they are all the same
[21:10:49] gp-dev, gp and global-dev all use the same data repo
[21:11:10] the gp-dev one just has a quick sed command run on the datasources to account for the differences in directory structure
[21:13:48] word.
[21:13:51] yeah
[21:17:02] quick limn q
[21:17:08] dschoon ^
[21:17:19] i'm sitting behind you, to
[21:17:20] *y
[22:48:35] drdee: hi !
[22:49:05] hi
[22:49:13] http://stat1.wikimedia.org/spetrea/new_pageview_mobile_reports/r26-only-december-added-diagnostics-mimetype-density-barchart/pageviews.html
[22:49:21] relocating to office be online in 15 minutes
[22:49:25] the density of mimetypes 1-14dec 14-31dec
[22:49:39] nice!
[22:49:43] these are only the processed mimetypes
[22:49:49] so the ones that passed all our filters
[22:50:28] so there is massive increase in requests with mime type '-'
[22:50:39] can you make a sample of a couple 1000 of such requests?
[22:50:41] drdee, ohello
[22:50:46] in hotel
[22:50:50] coolio.
[22:50:51] leaving now
[22:50:52] in office.
[22:51:04] drdee: yes
[23:04:27] i'm out for the eve
[23:04:30] laters all!
[23:05:45] dieds and i are gonna get food. he hasn't had lunch
[23:05:55] (i had a smoothie)
[23:05:57] so brb!
[23:11:24] milimetric: limn q
[23:11:45] do you know if limn will assign colors if they are left as default
[23:11:46] hey erosen - I'm here
[23:12:11] limn currently should assign default colors if nothing is specified
[23:12:17] great
[23:12:38] but there are more "if else" type statements than you'd imagine, so if you have any trouble let me know
[23:12:39] i'm rewriting the graph generation (python) code and I am going to have the default not assign colors
[23:12:42] and hope limn can handle that
[23:12:58] sounds good
[23:17:02] k, gonna go cook dinner now. Shoot me an email if you need me.
[23:18:36] word thanks