[03:52:56] http://www.wikitrust.net/ doesn't work on English Wikipedia? [06:14:53] purplepopple: says there "currently, the English, French, German, or Polish Wikipedias" [07:01:20] * purplepopple didn't get it to work in English :( [13:47:27] drdee: hi :) [13:47:49] HHiIiiii HAAAAAAAAAAAA bam bam bam bam bam another week of joy!! [13:47:59] mmoooooornniiningg average_drifter!!! [13:48:08] do you have some code reviews waiting for me? [13:48:22] milimetric I AM HERE !!!! [13:48:28] hey drdee :) [13:48:29] morning [13:48:36] ottomata I AM HERE AS WELL [13:48:36] http://en.wikipedia.org/wiki/Special:NewPagesFeed is soo cool [13:48:40] drdee: yes there is one https://gerrit.wikimedia.org/r/#/c/28506/3/ [13:48:50] hiiiiii [13:48:50] morning! [13:49:24] milimetric, cool indeed [13:49:50] i just saw the talk from last week and the curating ui is great [13:59:38] ottomata, i meant the regular 'stats' user who we use for the report card etc [13:59:47] ah ok [14:00:28] hm, so that is the aws user's public key? [14:01:09] ah no ok [14:01:14] you want stats to push to aws [14:01:14] i see [14:02:15] all the datasets are on dataset2, so 'stats' needs to become a user on dataset2 first [14:02:34] do i need to file an rt ticket for this [14:02:35] ? [14:04:05] yeah maybe, i don't do much on dataset2 [14:04:19] i mean i could, just haven't in the past [14:05:36] there is a 'backup' user that all these public files are owned by [14:06:58] so what would you recommend? 
purpose is to have a daily rsync between dataset2 and the aws instance (details will follow) [14:10:34] backup user could probably push to aws [14:10:40] since it owns the files anyway [14:14:31] ok, then maybe add that pem key to the backup user [14:14:44] probably need a descriptive commit message to explain why / who [14:24:59] yeahhhhhhhhhh, it isn't managed in puppet [14:25:06] nor does it have an ssh thing right now [14:25:13] this will probably need an rt ticket and input from another ops person [14:25:23] sorry, .ssh directory [14:25:33] since this hasn't been done on dataset2 before [14:25:37] external pushes of data [14:25:38] etc. [14:35:12] ok [14:48:51] drdee, hive is mostly puppetized, [14:48:56] but, do you know what this means? [14:48:59] # Configure the port the beeswax thrift server runs on [14:48:59] ## beeswax_server_port=8002 [14:49:10] is beeswax just a hue thing? [14:49:32] i don't think we are gonna run a thrift server [14:49:46] i would leave it for now as-is [14:55:32] hm, ok [14:55:45] hive is running now, but hue beeswax is still not happy [14:58:38] oh, just needed to restart hue [14:58:39] cool [15:21:39] hei drdee did you receive my mail? :) [15:22:01] yes i did, i'll respond a bit later [16:22:29] hey drdee, I wrote a UDF to parse a URI to check its language, whether it's mobile, and record its domain name but I could use some data to test it on [16:23:02] very very cool, where is the source code? [16:23:15] average_drifter, do you have a spare second? [16:24:03] drdee: https://github.com/louisdang/kraken/blob/master/src/org/wikimedia/analytics/kraken/pig/ParseWikiUrl.java [16:24:48] you can clone the following repo: gerrit.wikimedia.org/r/p/analytics/udp-filters.git [16:25:07] there is a file in there called example.log or example2.log and that contains some log lines that you can use to test [16:25:31] alright [16:25:45] I'll take a look. 
Thanks, drdee [16:33:08] ottomata, there is a 500 error with hue: http://analytics1001.wikimedia.org:8888/jobbrowser/ [16:36:26] job browser does not work with yarn [16:36:57] Note that JobBrowser only works with MRv1. [16:37:20] k [16:37:42] IIRC, hue had hive shell enabled as well but now i see Hbase shell but not hive shell [16:38:02] hm [16:38:23] hm [16:41:50] drdee: yes [16:42:00] busy with wikistats? [16:42:09] yes, but I have free cycles [16:46:25] drdee, fixed [16:46:34] thanks! [16:53:16] https://plus.google.com/hangouts/_/2e8127ccf7baae1df74153f25553c443bd351e90 [17:10:33] Reminder: Review! https://www.mediawiki.org/wiki/Analytics/Roadmap [17:15:50] https://www.mediawiki.org/wiki/Analytics/Kraken [17:15:53] https://www.mediawiki.org/wiki/Analytics/Kraken/Pixel_Service [17:16:04] https://www.mediawiki.org/wiki/Roadmap#Analytics [17:16:21] Comments encouraged as well as review. [17:17:42] dschoon: ottomata - have you heard about http://prose.io/about.html ? thought you might want to kno [17:17:43] know* [17:18:10] I hadn't, but that seems interesting! [17:18:42] Yeah, figured you'd want to know of it [17:18:49] oh coooool! i use octopress (which is jekyll) for my blog [17:18:53] cool [17:18:53] hooray for markdown [17:19:07] yeah, i'm a fan of jekyll. [17:19:21] tho i've usually used markdoc for my stuff [17:19:29] because i was interested in hacking the source, and it's python [17:19:35] markdoc.org [17:19:36] brb coffee [17:21:36] drdee: here's the example output: https://gist.github.com/3932574 [17:22:46] that's looking good, we need some example log lines for mobile sites [17:24:11] agreed. I only ran a test using JUnit with "en.m.wikipedia.org" and that works [17:24:29] I'd love an open source codereview recommendation engine that plugs into Gerrit -- "People who/Commits that changed this file also changed..." [17:24:46] has anyone here ever heard of such a thing? 
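The ParseWikiUrl UDF linked above is a Java/Pig function; its exact rules live in louisdang's repo. As a rough, hedged illustration of the hostname checks it performs (language code, mobile subdomain, bare domain), here is a Python sketch of the same idea — the splitting rules here are assumptions, not the UDF's actual logic:

```python
from urllib.parse import urlparse

def parse_wiki_url(url):
    """Sketch of the hostname checks a ParseWikiUrl-style UDF performs:
    extract the language code, detect the mobile subdomain, keep the domain."""
    host = urlparse(url).hostname or ""
    parts = host.split(".")
    # e.g. "en.m.wikipedia.org" -> ["en", "m", "wikipedia", "org"]
    language = parts[0] if len(parts) >= 3 else None
    is_mobile = "m" in parts[1:-2]  # an "m" between language and domain
    domain = ".".join(parts[-2:]) if len(parts) >= 2 else host
    return language, is_mobile, domain

print(parse_wiki_url("http://en.m.wikipedia.org/wiki/Foo"))
# -> ('en', True, 'wikipedia.org'), matching the JUnit case mentioned below
```

This handles the "en.m.wikipedia.org" case discussed in the chat; real request logs have many more hostname shapes, which is why example.log lines are needed for testing.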
[17:25:35] this would help code reviewers and give them more context [17:25:52] and help find omissions, where committers hadn't fixed all the relevant related files [17:26:52] also, louisdang, thanks for your contributions to Wikimedia tech. was wondering if there's anything I can do to help you out [17:27:03] I'm the engineering community manager for WMF [17:27:37] nice to meet you sumanah [17:27:59] you as well! [17:28:50] maybe you can help me find other interesting projects that I can work on? [17:29:30] sure! although I don't want to "steal" you from Analytics ;-) [17:30:56] louisdang: back in 30 min. [17:37:37] hey drdee, I was wondering if you have anything for tagging ip addresses as from an educational institution? [17:38:38] back [17:39:41] milimetric: you should push so i can play later :) [17:40:10] i think it's all up there. oh I'll push the json [17:40:23] i saw no updates when i pulled [17:40:25] nothing new though, just added a bar chart [17:41:10] you start on the options UI yet? [17:41:52] yeah, but on paper [17:41:58] word [17:42:11] i was thinking of doing a balsamiq mockup and sending it to the design guys Diederik mentioned [17:42:14] heather and pau [17:42:20] they're kinda slammed [17:42:27] i don't think that would get any results for weeks [17:43:02] on the other hand, i know a guy who considers limn a labor of love... [17:45:22] :) obviously your opinion comes first [17:45:50] psh [17:45:50] louisdang: you could have a list of addresses of all technical education institutions [17:45:52] I just wanted to get some reactions from others. Maybe evan [17:45:57] yeah, totally. 
[17:46:08] louisdang: and then do a distance between the geolocated lat/long and the tech institution [17:46:46] well, if you want to do some UI shopping, i could throw together a mockup in the gap before the firehose meeting today [17:47:18] average_drifter: there's a list of ip ranges for educational institutions, but I don't know if there are licensing issues in using that [17:47:35] oh didn't know about that list [17:47:36] totally, I'm just scratching my head right now [17:48:13] we can compare notes when you've got something [17:48:23] average_drifter: http://www.iblocklist.com/list.php?list=bt_edu&fileformat=dat&archiveformat=gz if you're interested [17:49:06] http://list.iblocklist.com/?list=bt_edu&fileformat=p2p&archiveformat=gz for the free version [17:50:14] I think it's mostly universities though [17:55:34] dschoon milimetric: what system do you use for automatically assigning colors? [17:55:46] d3 has a color scale [17:55:57] hmm [17:55:58] originally it was Prismatic from ColorBrewer [17:56:03] colorbrewer2.org [17:56:26] yeah, I think we can build whatever system we need. why do you ask? [17:56:29] so if I were trying to generate the json programmatically, would I need to figure out how to use a d3 scale from python [17:56:43] no [17:56:49] ohh. [17:56:54] for automatic coloring the json doesn't need to specify anything [17:57:00] yeah, just leave it null. [17:57:00] interesting cool [17:57:08] you can override it with a color [17:57:10] great!!! 
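average_drifter's suggestion above — compare a geolocated IP's lat/long against known institution locations — boils down to a great-circle distance check. A minimal sketch, where the campus coordinates and the 5 km threshold are made-up illustration values:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two lat/long points (haversine)."""
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Hypothetical check: is a geolocated IP within 5 km of a campus?
campus = (37.8719, -122.2585)   # UC Berkeley, purely for illustration
ip_loc = (37.8700, -122.2600)
print(haversine_km(*ip_loc, *campus) < 5.0)
```

In practice the ip-range lists linked below would be simpler and more accurate than distance heuristics, licensing permitting.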
[17:57:10] right [17:57:13] thanks [17:57:28] but the idea is that there's a field higher up called "palette" that picks out a colorbrewer palette [17:58:12] i started writing a self-contained package for js and python for this [17:58:21] nice [17:58:23] because i was also annoyed i had to copy-paste the values every time [17:58:31] doesn't look like i uploaded it [17:58:38] well no rush, i think leaving it null is best for now [18:01:00] dschoon: what does "Product Code wiki & email stakeholders [dsc]" mean in https://www.mediawiki.org/wiki/Analytics/Roadmap#Kraken ? is there going to be a wiki about "product code," is someone coding a wiki, or am I misreading in another way ....? [18:01:20] no no. [18:01:37] as in, each product that desires to send in analytics data will be given a unique product code [18:01:56] mostly for debugging, but also to make it easier for the stakeholders to find and manipulate their data [18:02:16] heh. the todo is "make a page explaining this" for a reason ;) [18:02:44] anyway, it's essential because if anything goes wrong, it's almost impossible to tell where junk data is coming from. [18:03:32] I am in favor of naming data sources so that we can shame them if they turn out to be polluters :) [18:03:40] exactly. [18:03:40] (where "shame" equals "fix") [18:03:54] a lot of it, though, is detecting errors [18:04:23] so it's best practice to have a pc, as well as send the deploy version with each packet [18:04:51] ok, mind if I turn that into a para and just stick it into https://www.mediawiki.org/wiki/Analytics/Kraken#Planning or the like? [18:06:02] go for it. though you've shamed me into just making the page now [18:06:10] so you can hold off if you'd like :) [18:06:30] I shall hold off then! 
[18:06:46] * sumanah avoids accidentally inserting errors through her misunderstanding [18:20:54] hi louisdang - sorry I was pulled away [18:21:08] hi [18:21:40] louisdang: so, have you ever checked out https://meta.wikimedia.org/wiki/Wikimedia_developer_hub ? drdee you might want to list a couple things there [18:32:53] dschoon, ottomata, about zookeeper [18:32:53] i suggest as an experimental setup to install zookeeper on stat1, stat1001 and an01. [18:32:55] this way we get experience with running zookeeper at two datacenters, [18:32:56] figure out the load they generate, puppetize them [18:33:01] obviously, this will not work for the kafka setup but it does give us a gentle learning curve [18:33:01] what d'yah think? [18:33:05] that won't work. [18:33:14] subnets [18:33:24] stat1 & stat1001 are on different subnets. [18:33:29] i dunno what ops has set up, but they can all communicate just fine [18:33:29] (from the analytics DC) [18:33:34] really? [18:33:36] hm. [18:33:43] maybe through internet though [18:33:47] since they both have public IPs [18:33:52] well, it would have to be. [18:34:04] i don't really see why we'd want to do that, anyway [18:34:12] ZK is not a public service at all [18:34:17] it's actually really sensitive [18:34:28] so [18:34:39] stat1 is in pmtpa, an03 is eqiad: [18:34:46] otto@stat1:~$ traceroute analytics1003.eqiad.wmnet [18:34:46] traceroute to analytics1003.eqiad.wmnet (10.64.21.103), 30 hops max, 60 byte packets [18:34:46] 1 ae0-101.cr1-sdtpa.wikimedia.org (208.80.152.144) 0.165 ms 0.209 ms 0.203 ms [18:34:46] 2 xe-5-2-1.cr2-eqiad.wikimedia.org (208.80.154.213) 26.399 ms 26.442 ms 26.436 ms [18:34:46] 3 analytics1003.eqiad.wmnet (10.64.21.103) 26.552 ms 26.537 ms 26.532 ms [18:34:55] huh [18:35:02] an03 does not have public IP [18:35:08] that is totally unexpected, but good to know [18:35:12] must be router magic. 
[18:35:22] most likely BGP magic [18:35:24] ja iunnnoooooo [18:35:30] but, i was just thinking about ZK too [18:35:31] since [18:35:31] (the Blackbox Gateway Protocol!) [18:35:36] RobH is going to take away our ciscos [18:35:42] wha? [18:35:44] ??? [18:35:45] sorry [18:35:50] haha [18:35:50] Dells [18:35:51] mistyped :p [18:35:53] pffeewwww [18:35:55] you all freaked out! [18:35:55] haha [18:35:59] :D [18:36:09] so, an23-27 will be around still [18:36:18] and i'm not using them at all [18:36:24] right. [18:36:32] those are the "utility" boxes. [18:36:36] which, topically, were earmarked for ZK [18:36:55] ok cool [18:37:00] didn't realize that, let's use those [18:37:15] should I look into puppetizing ZK on an23-25? [18:37:18] 3 ZKs? [18:37:33] as long as it's an odd number [18:37:33] I will need this to work on Kafka anyway, might as well make ZK official [18:37:51] could do all 5 of those [18:37:52] ? [18:37:57] 3 sounds right for now [18:37:57] ok [18:38:00] i think we'll probably end up running 3 ZKs and at least 3 KKs [18:38:06] KKs? [18:38:11] kafkas? [18:38:13] (KK is so much easier to type than Kafka) [18:38:14] yes :) [18:38:15] haha [18:38:15] ok [18:38:35] ja but they need space so not on the R310s [18:39:11] if we have the same drives as before on the c2100s [18:39:22] 3KKs would give us 48TB of log buffer [18:39:37] hm. good point. [18:39:50] we need more disks :( [18:41:57] well, maybe? uncompressed udp2log firehose was generating 3.5TB / day [18:42:01] that's 25TB / week [18:42:11] and I assume we don't want to buffer more than a week in Kafka [18:42:29] 2 KKs gives us 32 TB buffer [18:43:08] that seems reasonable. [18:43:13] and i'm pretty sure we won't have any problems with throughput [18:43:25] yeah, 2KKs were able to keep up fine [18:43:37] with the full firehose [18:43:40] we're not sure about consumption though [18:43:44] but we can find out [18:44:06] yeah. 
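The capacity discussion above is simple back-of-the-envelope arithmetic; a sketch of it, where the 16 TB-per-node figure is inferred from "3 KKs would give us 48TB":

```python
# Kafka log-buffer sizing from the numbers quoted in the discussion above.
firehose_tb_per_day = 3.5        # uncompressed udp2log firehose volume
tb_per_node = 48 / 3             # "3 KKs would give us 48TB" -> 16 TB each

weekly_buffer_tb = firehose_tb_per_day * 7
print(weekly_buffer_tb)          # 24.5, i.e. the ~25 TB/week quoted
print(2 * tb_per_node)           # 32.0 TB with 2 Kafka brokers: one week fits

# ZooKeeper ensembles should be odd-sized: a quorum of n nodes tolerates
# (n - 1) // 2 failures, so 4 nodes are no more fault-tolerant than 3.
for n in (3, 4, 5):
    print(n, "nodes tolerate", (n - 1) // 2, "failures")
```

This is why 2 brokers (32 TB) comfortably cover a week of firehose, and why the ensemble sizes discussed are 3 or 5, never 4.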
[18:44:08] that's definitely an empirical question [18:51:04] ottomata, quick question are we running the snappy byte counter right now? [18:51:19] last i recall, no bytes were appearing in the file [18:51:19] nope [18:51:32] think that wasn't working, never fixed it :/ [18:51:35] i can look at that [18:52:03] ottomata, dschoon quick sync up before meeting with asher and mark? [18:52:14] sure [18:52:24] https://plus.google.com/hangouts/_/26af97f7def9d9cfe7a11e17181f8148e719d439 [18:52:29] ok [18:57:00] https://plus.google.com/hangouts/_/a5c3abf6f7c28dddd22c08f9e80b4aa87c246d63 [19:20:07] gentle reminder, please update/add/close your asana tasks today [19:20:13] me too ? [19:20:21] what is the closing hour ? [19:20:44] I am talking with Erik right now [19:22:01] yes you too :) [19:26:26] milimetric, dschoon: any chance the auto coloring works a little different, or maybe is a recent commit? [19:26:35] basically all the lines are black [19:26:40] oh crap [19:26:53] i didn't test what I said with old limn [19:26:53] :) [19:27:13] but hm, it should. you got a link? [19:27:52] it is running locally now [19:27:54] but I can put it on global dev [19:27:55] one sec [19:30:30] milimetric: http://global-dev.wmflabs.org/graphs/digi_malaysia_color [19:30:42] ty, let's take a look [19:31:42] brb [19:33:47] b [19:35:47] dschoon and I are looking at it [19:35:48] louisdang: did you have a chance to look at that meta.wikimedia.org page? [19:35:55] cool [19:35:59] thanks [19:36:15] sumanah, yeah hubble seems interesting. 
[19:36:41] louisdang: "huggle" (I think you mean) is definitely open to new developers [19:37:23] louisdang: petan and mmovchin are sometimes in #mediawiki on Freenode but also available via email/talk pages [19:37:35] ok [19:38:31] oh, sorry, erosen :( [19:38:35] hehe [19:38:45] the palette support is shitty, i had forgotten [19:38:52] no worries [19:38:58] it auto-populates the color field when you click "new metric" [19:38:58] can you recommend an interim solution [19:39:02] yeah [19:39:02] yeah, one sec [19:39:04] I suspected [19:39:13] it also has a table of Project Colors [19:39:35] http://reportcard.wmflabs.org/src/data/project-colors.co [19:40:00] so if the color is null, and the label matches any of those names, it fills it with that value [19:40:13] otherwise the default is ... black [19:40:21] cool [19:40:30] https://github.com/wikimedia/limn/blob/master/src/data/metric-model.co#L87 [19:40:32] is that list ordered [19:40:35] one sec [19:40:44] according to color, that is [19:41:10] https://github.com/wikimedia/limn/blob/master/static/vendor/colorbrewer/colorbrewer.js [19:41:28] that's the set of colors that we pull from to auto-populate in the UI [19:41:33] cool [19:41:39] i'll just copy it for now [19:41:40] it's colorbrewer.Prismatic[11] [19:41:44] i have a python version [19:41:47] let me upload real fast [19:41:52] cool [19:41:54] 'preciate it [19:42:36] sorry, don't know why I thought you were looking at new limn when it's not even deployed [19:42:58] i actually was running a checked out copy [19:43:19] but i didn't want to pull unless i needed to [19:43:19] heh, the new stuff is on my branch - even worse [19:43:19] hehe [19:43:34] a branch of my fork rather (/milimetric/limn) [19:43:49] yeah... don't pull :) [19:45:41] https://github.com/dsc/colorbrewer-python [19:45:41] erosen ^^ [19:45:49] sweet [19:47:14] import colorbrewer [19:47:34] colorbrewer.Spectral[11] [19:47:37] is what we use. 
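The fallback dschoon describes above (explicit color wins, else a project-color table match, else black) can be sketched like this — the table entries and hex values here are placeholders; the real list lives in Limn's src/data/project-colors.co:

```python
# Placeholder project-color table; the real one is project-colors.co in Limn.
PROJECT_COLORS = {"enwiki": "#3366cc", "dewiki": "#dc3912"}
DEFAULT_COLOR = "black"

def resolve_color(metric_color, label):
    """Mirror the lookup described in metric-model.co: keep an explicit
    color, else match the label against the project table, else fall back
    to the default (which is why untagged lines all render black)."""
    if metric_color is not None:
        return metric_color
    return PROJECT_COLORS.get(label, DEFAULT_COLOR)

print(resolve_color(None, "enwiki"))       # table hit
print(resolve_color(None, "frwiki"))       # no match -> black
print(resolve_color("#ff0000", "enwiki"))  # explicit color always wins
```

This explains erosen's all-black graph: his labels matched nothing in the table, so every null color hit the default.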
[19:48:02] "11" is the number of color-points in the spectrum. if you know how many metrics you have, it's better to use that number [19:48:11] they'll be equidistant and therefore easier to differentiate [19:48:38] almost all the palettes in there are good. you can preview them at colorbrewer2.org [19:49:08] i read that in the tune of "1 is the loneliest number" [19:51:07] great [19:51:22] also, what is best practice for installing a python package in development [19:51:34] it looks like pip -e requires an egg [19:51:43] dschoon ^^ [19:51:58] since you're just cloning it [19:52:10] you clone it, then pip install -e PATH_TO_CLONE [19:52:17] if it's cwd, use . [19:53:05] yeah I was trying to avoid having to find a place to put it [19:53:15] but this will work fine [19:54:04] well, i haven't uploaded it to pypi [19:54:05] sorry. [19:54:18] it needs examples in the docs, more or less [19:54:25] no worries [19:54:44] i was just trying to think if there was a good way to associate it with limnpy [20:00:10] well, i can package it :) [20:00:19] you can just test with your checkout for now [20:00:29] add it to your requirements in setup.py [20:00:42] since you already installed it, your stuff will work fine [20:00:46] yeah [20:00:55] i'll get it on pypi later this week [20:01:00] sort of [20:01:14] it does look like something odd has happened though [20:01:22] i get an exception when I import [20:01:35] File "/home/erosen/src/colorbrewer-python/colorbrewer.py", line 358, in <module> [20:01:35] for k, v in globals.items(): [20:01:35] AttributeError: 'builtin_function_or_method' object has no attribute 'items' [20:01:55] i made it a function call like 'globals()' [20:04:15] which fixed it [20:05:17] oh [20:05:18] yeah [20:05:20] that's correct [20:05:25] heh [20:05:25] i didn't test much [20:08:54] brb a bit [20:42:03] dscoon [20:42:05] you there? 
[20:43:23] you misspelled: dschoon ^^ [20:44:06] but he's out to lunch I think [20:46:17] hey analytics lovers [20:46:26] hi milimetric [20:46:49] if anyone's interested in brainstorming the options that will be available in editing visualizations, I've got a document up and I welcome collaboration [20:47:09] I should've probably gone with etherpad but old habits: https://docs.google.com/document/d/1MEkGKxCiPZ2dp2gRY3OIe2BVDb3kVhl78JirT-oS1Kk/edit [20:47:09] sure [20:47:17] I'll set up a hangout, one sec [20:47:53] https://plus.google.com/hangouts/_/287e51bd2acca787ac160ac10f2dc171703ee16a?authuser=0&hl=en-US [20:48:07] drdee, erosen ^^ [20:48:13] dschoon, ottomata ^^ [20:48:26] average_drifter if you're up at this ungodly hour :) ^^ [20:48:58] I write code during the day, write code during the night, all day ! [20:54:10] ok, adding stuff, milimetric [21:01:04] milimetric: added some ideas, not sure whether they are in scope [21:01:06] or feasible [21:18:28] thanks sumanah, we incorporated your suggestions. Sorry we were on the hangout and I can't multitask :) [21:18:36] No prob! [21:19:05] yeah, so the metadata you speak of has been delegated to the legend editing UI and it'll probably show up somewhere above the metrics I think [21:20:31] ah [21:20:39] so I'd like to point out this book [21:20:47] actually there's this guy called Edward Tufte [21:21:34] yep, big name in stats :) [21:21:37] he wrote some stuff about visualization and uhm, charts .. I'm not sure if it's the best work about this but I've heard very good things about it from some people I worked with in the past [21:21:49] the book is called "The Visual Display of Quantitative Information" [21:23:43] hey dschoon [21:23:51] you there? [21:24:35] tufte is awesome, he actually wrote like 3 or 4 books [21:25:06] bwerr asana looks busted atm [21:25:18] drdee, zk puppetized, woohoo [21:25:22] was trying to see what needed to be done for hive etc. 
[21:25:24] sweeet sweet sweet [21:25:47] in hive-site.xml you need to specify the zookeeper hosts [21:26:06] and that's pretty much it IIRC [21:26:06] hmk [21:26:23] 2 properties [21:26:24] hive.support.concurrency [21:26:26] set to true [21:26:29] and [21:26:38] hive.zookeeper.quorum [21:26:46] that's a comma separated list of the zookeeper hosts [21:27:06] they are both <property> elements [21:29:37] cool book average_drifter, I unfortunately might have to do this too quickly to have time for it. But I will definitely read that and finish http://www.amazon.com/Grammar-Graphics-Statistics-Computing/dp/0387245448 by the time we roll out the next major version of Limn [21:40:40] back. [21:41:15] ottomata: what's up? [21:43:48] was talking with asher about FR stream stuff [21:43:58] hey dschoon we had an impromptu brainstorm on limn, I'll fill you in when you have a min. [21:44:06] q for you [21:44:21] is there a standard bit of information that will come along with all event.gif logs? [21:44:30] e.g. request IP, referrer, etc. [21:44:30] ? [21:45:04] milimetric, i've got it open [21:45:13] will there be any data other than product_code and the payload (as query params) transmitted? [21:45:13] yeah, many standard bits [21:45:48] ok, so asher told me that ori wants to not have anything extra in his stream, which is why asher is looking into configuring a separate stream for FR [21:46:11] gotcha. [21:46:16] well, we can work around that. [21:46:24] yeah, but [21:46:30] but, if we want to have all the same info fields that FR has, then we might want to convince Ori to use the same format [21:46:31] right? [21:46:33] it might mean the final production path for the endpoint won't be /event.gif [21:46:33] then all use the same stream? 
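The two hive-site.xml properties drdee lists earlier in the log would look roughly like this — a sketch only, and the analytics102x hostnames are placeholders standing in for whichever nodes end up running ZooKeeper:

```xml
<!-- Sketch of the two hive-site.xml <property> elements discussed above;
     hostnames are placeholders for the actual ZooKeeper quorum nodes. -->
<property>
  <name>hive.support.concurrency</name>
  <value>true</value>
</property>
<property>
  <name>hive.zookeeper.quorum</name>
  <value>analytics1023.eqiad.wmnet,analytics1024.eqiad.wmnet,analytics1025.eqiad.wmnet</value>
</property>
```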
[21:46:43] or, rather, that there will be several paths [21:47:06] (cool, ping me if you wanna chat) [21:47:27] the fr stuff i was talking about is banner serving from bits, unrelated to event.gif [21:48:02] but yeah, you guys and ori should get on the same page about what pixel service logs will actually contain [21:48:39] hm, binasher, is banner serving from bits going to be a different format than the usual web log format we use? [21:48:51] yes [21:49:15] it will have some extra fields specific to fundraising [21:50:06] ok [21:50:28] but aside from that it will pretty much be request logs? [21:52:07] ahhhhh, i gotta run, i might be back and online in a little bit [21:52:13] yup [21:52:21] ok drdee, I think the hive/zk thing is done [21:52:25] it looks good on an01 anyway [21:52:35] hive services refreshed [21:53:17] the extra fields will just be to indicate banner / language / project / country, since they'll all be loaded from a single url [21:53:32] thanks ottomata [23:02:57] drdee_: git review containing -g fix as last field [23:04:09] drdee_: https://gerrit.wikimedia.org/r/29469 [23:22:54] thanks