[00:00:44] :( [00:01:11] Is there an easy way to visualise changes to an article history over time other than manually screengrabbing and putting into a video on my own? [00:02:24] buh. [00:02:32] what do you mean by "visualise"? [00:02:59] i think it's a rather complicated problem since there isn't really any structure to text [00:03:18] unless you just mean "flash a color behind deletions and insertions as they appear"? [00:03:19] http://ozziesport.com/wp-content/uploads/2012/11/Tyan-Taylor.wmv [00:03:21] Like that. [00:03:33] That would be nice but wouldn't even need that. [00:03:55] I'll look, but this is probably a question for someone who better-knows tools that people have made for wp. [00:03:58] maybe drdee? [00:04:22] Not a big deal. I can manually compile it. [00:04:35] Presenting at a conference on Tuesday or Wednesday. [00:06:31] That's what I'd do [00:06:40] Wouldn't be too hard to do in a webbrowser with JS. [00:15:10] I'm manually getting them. A little monotonous but getting done. [00:17:26] you could probably just download a dump, but that'd definitely require all manner of annoying parsing [00:17:52] I just want one example page. [00:18:28] Presenting at a conference about sourcing issues for women's football in Africa articles when not in Africa and in dealing with Wikipedia standards [00:18:35] *nod* [00:18:55] Realised I was expected to do a powerpoint. [00:19:00] would kind of be a neat browser-extension [00:19:14] add a button on each history page and use the API [00:19:18] mmm :) [00:19:26] I've seen the visualization done. [00:19:34] I can't recall where I saw it though. [00:19:48] It is really nice though for showing people how articles can and do develop. [00:20:02] For my audience, I'd like to contrast edits with page views. ;) [00:20:11] but that's neither here nor there. [00:20:51] I put together a big report on the state of Australian women's sport material on Wikipedia, Commons and Wikinews. [00:21:15] Which will be presented through my university to parts of the Australian government later this year. [00:38:14] purplepopple: totally agree with dschoon [00:38:27] i am calling it a day, dschoon, see you tomorrow :) [00:38:38] see ya drdee [00:44:02] dschoon, today was a very productive day btw: ottomata puppetized storm, i fixed instrumentation for hive, otto and i fixed the hadoop job history server and robh racked the dell machines [00:44:15] nice [00:44:19] that's hot shit [00:44:26] i wrote a bunch of limn code [00:44:37] and wasted half the day trying to get internet. [00:44:41] (tragic) [00:46:14] :) [00:52:35] somehow after 20 years of ISP's that still seems to be problematic :( [00:52:41] now i am really leaving [00:52:51] :) nite [00:54:55] they seriously are [00:54:56] night man [12:41:50] good morning peoples [13:47:36] man, that's way too early! [13:47:45] milimetric1! [13:59:55] milimetric morning and ^^ [14:00:05] morning! [14:00:17] oh :) [14:00:24] um, yeah, I am super-obsessed [14:00:38] there's an amazing elegant solution here. I can friggin smell it [14:00:48] but I don't think I have enough time to figure it out before the demo [14:01:04] either way, I think we're scrapping this approach until after december [14:01:12] but like - SOO close :) [14:06:46] check this out: http://ganglia.wikimedia.org/latest/?c=Analytics%20cluster%20eqiad it shows raw PHP source code for me ??!??!?!??! [14:08:26] milimetric ,before the demo, you mean today? 
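
A minimal sketch of the "add a button on each history page and use the API" idea above: the MediaWiki API will hand back one article's revision metadata (timestamps, authors, sizes), which is enough raw material to animate how a page grew over time. The article title and output file below are only illustrative.

# fetch revision metadata for one article, oldest first, as JSON
curl -sG 'https://en.wikipedia.org/w/api.php' \
  --data-urlencode 'action=query' \
  --data-urlencode 'prop=revisions' \
  --data-urlencode 'titles=Tyan Taylor' \
  --data-urlencode 'rvprop=ids|timestamp|user|size' \
  --data-urlencode 'rvlimit=500' \
  --data-urlencode 'rvdir=newer' \
  --data-urlencode 'format=json' > revisions.json
# individual diffs between consecutive revisions can then be pulled with
# action=compare&fromrev=...&torev=...
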
[14:08:36] yep [14:09:13] hey drdee that's a very cool dashboard [14:09:20] what is it :) [14:09:46] is it real and pointing to our cluster? [14:09:50] it's WMF"s standard monitoring infrastructure for all our boxes [14:09:59] that view is on the current 7 Cisco's [14:10:10] so yes this is baby kraken [14:23:19] mooooooorning ottomata [14:23:22] morning! [14:24:37] quick question, it seems that analytics1001 is not currently used for MR jobs, any specific reason? [14:26:24] whatcha mean [14:26:26] ? [14:26:46] as in like datanode and nodemanager are not running there? [14:26:59] because its the namenode, and cdh4 recommends keeping namenode and datanodes separate [14:27:15] yes, look at http://analytics1001.wikimedia.org:8088/cluster/nodes [14:27:31] but then we are underutilizing a very beefy machine [14:28:35] we might wanna move all the stuff on an01 to another box, or not? [14:30:43] yeah probably so [14:30:48] when we set this up we just didn't have any other machines [14:31:27] i wonder if we could do failover testing when we do that! [14:32:42] aight [14:33:01] so, what's up with webstatscollector? [14:33:26] average_drifter and i wrote a new version that we could deploy [14:33:39] ideally side-by-side with the old version to make sure it works [14:33:48] ok [14:33:58] where does it run? i don't really know much about this guy [14:34:06] oh this is a udp2log thing right/ [14:34:07] ? [14:34:09] which guy? [14:34:14] i think on locke [14:34:15] webstatscollector [14:34:19] :D [14:34:20] locke [14:34:23] it runs [14:34:35] and it just consumes a udp2log stream [14:34:38] aye [14:34:43] without the PIPE command [14:34:52] so it's unsampled [14:35:04] webstatscollector consists of two components: [14:35:22] a) daemon that stores data in berekely db and writes db to a file every hour [14:35:40] b) filter that does the filtering, we deprecated filter and moved the functionality to udp-filter [14:35:41] # domas' stuff. [14:35:41] # (This looks like a bunch of C to filter for mobile pages [14:35:41] # and output things by language.) [14:35:41] pipe 1 /a/webstats/bin/filter | log2udp -h 127.0.0.1 -p 3815 [14:35:42] ? [14:35:50] yes that's it [14:36:07] so that's 'filter' and we are deprecating that part in the new version [14:36:12] ok [14:36:17] we will use udp-filter [14:36:26] but there is also a daemon running called collector [14:36:58] yeah [14:36:59] i see that [14:37:06] how is it launched? [14:37:10] let's wait for average_drifter to show up [14:37:13] ok [14:37:17] the daemon? [14:37:21] yeah [14:37:30] i think: './collector' [14:37:36] yeah but by who? [14:37:41] its running on locke [14:37:42] no init script though [14:37:49] but we provided some command line parameters to customize the port it's running on [14:37:52] and I don't see it in puppet [14:37:56] i have no clue [14:38:04] it could be that it's not yet puppetized [14:38:10] probably not [14:38:32] we will give you a shiny new debian package [14:38:42] how's that? [14:38:46] coool [14:40:52] brb need some juicy coffee, my coffee shop got broken in yesterday, the entire front door was kicked in [14:54:29] milimetric: node.js has been installed on the jenkins ci server [14:54:40] ah, cool [14:55:16] so we don't need travis ci then maybe [14:55:44] but I am not sure we'll be able to get to unit testing until after December. It'd be awesome if we could [14:58:12] true, it's not high urgency for sure but just wanted to let you know [15:04:27] ottomata, is it okay if i start fiddling with the FairScheduler for Hadoop today? 
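
For the an01/analytics1001 question above, the same information as the :8088 nodes page can be pulled from a shell with the stock Hadoop 2 / CDH4 CLIs, which makes it easy to confirm which hosts are actually serving as DataNodes and NodeManagers:

hdfs dfsadmin -report | grep '^Name:'   # DataNodes currently registered with the NameNode
yarn node -list                         # NodeManagers registered with the ResourceManager
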
[15:04:32] sure [15:04:59] i was thinking of two queues: one for core jobs and one for regular users [15:05:52] sounds good, i don't know that much about fair scheduler yet [15:05:59] only what it is for [15:06:08] that makes two of us :D [15:06:24] started looking at it the other day, but stopped because I wanted to have a use to test it with [15:06:30] i could easily just set the parameters you had mentioned [15:06:36] but it'd be nice if we could compare before and after [15:07:02] compare what? [15:07:10] right now it's First In First Out [15:07:20] for jobs that are submitted [15:07:48] with FairScheduler you can basically manage the priorities of different jobs [15:08:22] i'd like to see it work, so [15:08:24] so you need at least to submit multiple jobs at the same time [15:08:25] for example [15:08:27] right [15:08:35] so we submit a buncha jobs, (of two types) [15:08:41] ok [15:08:41] and see that the ones last in line have to wait for the first ones [15:08:44] then, we change [15:08:48] and see them execute in parallel [15:08:50] ok [15:08:58] first coffee then fiddling ;) [15:09:01] k [15:23:26] oh hi [15:58:12] hey average_drifter [15:58:26] can you create a debian package for webstatscollector [15:58:26] ? [16:15:30] ottomata, are you editing mapred-site.xml? [16:15:46] nope [16:15:52] is puppet running? [16:16:08] ah yes! [16:16:08] grrr [16:16:10] probably a cronjob [16:16:13] i made changes but nano says " File was modified since you opened it, continue saving ? " [16:16:41] ok, stopped puppet again and commented out a cronjob [16:16:48] that restarted puppet [16:16:49] ty [16:20:37] ottomata, can i restart hadoop? [16:20:44] sure [16:39:26] morning all [16:39:44] milimetric: so i didn't write much code last night, but i *did* read quite a bit of code [16:39:51] dude [16:40:00] I awoke with like awesome power [16:40:05] haha [16:40:07] sleep does that :) [16:40:08] my brain was dead [16:40:12] DEAD man [16:40:12] :) [16:40:21] I'm writing this thing the way knockout was meant to be used [16:40:36] anything for me to pull? [16:40:38] I just had to go google.com -> "knockout async" and refresh my mind a bit [16:41:06] um still working on it but if you pull my fork and look at /models/Graph.co you'll see that part at least working now [16:41:15] k. [16:41:17] I'm working on simplifying all the children [16:41:20] like, it still does nothing now [16:41:21] i wrote some useful things, i think. [16:41:26] but it's not dead [16:41:30] cool [16:41:45] I mean I still think we abandon this 'cause I failed but it's definitely useful [16:41:48] for later [16:42:02] well [16:43:38] i created a new branch, and did some refactoring. [16:43:54] i mostly got rid of `www` and made a `views` directory for the server views. [16:44:21] i also set it up so the templates could still be written in jade but work in knockout [16:44:49] i thought about using that template loader thing, but decided against it. [16:45:18] my father once told me that the most accurate metaphor for the heart and mind are an elephant and a donkey tied to the same chariot, you being in the chariot [16:45:30] our hearts are definitely strong :) [16:45:36] poor donkey [16:46:00] haha [16:46:21] seriously though, we have to make a cool headed decision later today [16:46:39] last 100 yards, lets sprint so we have all the data we need [16:46:47] did you pull my changes? [16:46:50] for the server stuff? 
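
A rough sketch of the two-queue idea (core jobs vs regular users) as a fair scheduler allocation file. Element and property names follow the upstream YARN fair scheduler documentation; whether the CDH4 build of the time uses exactly these names (and mapred-site.xml vs yarn-site.xml) is an assumption worth checking, and the weights are illustrative.

cat > fair-scheduler.xml <<'EOF'
<?xml version="1.0"?>
<allocations>
  <!-- production/core jobs get the larger share when the cluster is contended -->
  <queue name="core">
    <weight>3.0</weight>
  </queue>
  <!-- ad-hoc jobs from regular users -->
  <queue name="users">
    <weight>1.0</weight>
  </queue>
</allocations>
EOF
# and the ResourceManager is switched away from the default FIFO behaviour with:
#   yarn.resourcemanager.scheduler.class =
#     org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler
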
[16:47:04] i'm afraid to because i want to be isolated a little longer while I fight with this [16:47:09] then I will [16:47:10] mk [16:47:15] i just get conflicts when i pull. [16:47:24] oh that sucks [16:47:44] was it working with just your stuff? [16:47:55] sorta. [16:47:59] the server was working. [16:48:19] the response from /dashboards.json was different in master than you were expecting [16:48:29] so i made some changes to the stuff that handled responses [16:49:51] because yours just loads files [16:49:54] which isn't quite right [16:51:18] i'll fix it again, but please pull this time [17:12:13] packages ready [17:12:15] on build1 [17:12:19] /home/spetrea/wikistats [17:12:24] udp-filters_0.3.23_amd64.deb [17:12:28] webstatscollector_0.2.34_amd64.deb [17:12:44] ottomata: do you have some time so we can deploy these ? [17:15:21] yeah, let's do after standup [17:16:30] ok [17:24:39] battery dying, back on in a bit [17:30:59] average_drifter: just sent you your recommendation letter, let me know if all is good [17:52:48] https://plus.google.com/hangouts/_/2e8127ccf7baae1df74153f25553c443bd351e90 [18:01:07] https://plus.google.com/hangouts/_/2e8127ccf7baae1df74153f25553c443bd351e90 [18:01:12] ottomata ^^ [18:32:22] hey dudes, webstast collector time? [18:36:16] yooooo [18:39:18] average_drifter ^^ [18:41:02] hey hey ready [18:41:15] ok, so [18:41:22] this what i recommend: [18:41:23] let's do webstats collector first [18:41:29] i can add that deb to wikimedia apt [18:41:31] should I do that? [18:41:33] keep the current stuff running on locke [18:41:41] i think so [18:42:03] then setup udp-filter not to filter out bots (stefan knows how to do this) [18:42:47] yes, there's a switch [18:42:54] let's run it for 48 hours and compare the old and new files on monday [18:43:16] -B is the switch for bot detection [18:44:36] -B turns it on or off? [18:44:47] on [18:45:29] average_drifter, how do you set udp-filter in webstatscollector mode? [18:45:47] (there is a switch for that as well) [18:47:21] the switch is -t [18:47:38] so udp-filter has a switch called -t which does exactly what the filter in webstatscollector does [18:47:39] okay so just run udp-filter -t [18:48:26] for filter, we still need collector? [18:48:42] wait, is there a reason to use the .deb then? if collector is already installed and running? [18:48:48] we only have to change the udp2log filter then, right? [18:49:41] so webstatscollector provides a collector (and filter, but we moved the functionality for the filter in udp-filters) [18:50:04] the new collector is only required if we enable bots filtering [18:50:06] has collector changed at all in the .deb? [18:50:08] hm [18:50:11] ok [18:50:15] which we are not doing atm [18:50:21] so i guess the old collector is fine [18:50:37] so let's just do udp-filter for now, and do webstats collector if/when we have to [18:50:42] but then you have to run it on a box not being locke [18:51:03] because old collector uses a hardcoded port [18:52:30] but, filter is already running as a udp2log filter [18:52:34] in the udp2log config file [18:52:40] so if we don't do anything to collector at all [18:52:48] and just change the line in udp2log-locke config [18:52:55] it shoudl be the same, right? [18:53:01] pipe 1 /a/webstats/bin/filter | log2udp -h 127.0.0.1 -p 3815 [18:53:03] changes to [18:53:07] right but then we cannot compare the results [18:53:09] pipe 1 udp-filter -t [18:53:11] or add a line [18:53:13] or whatever [18:53:17] right? [18:53:47] is that in order to multiplex the output ? 
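
Concretely, the change under discussion is a one-line edit to the udp2log config: -t makes udp-filter emit the same per-page stream the collector expects, and -B would additionally turn on bot filtering (not wanted for this comparison). The installed path of udp-filter is an assumption here.

# current line on locke:
pipe 1 /a/webstats/bin/filter | log2udp -h 127.0.0.1 -p 3815
# candidate replacement, or a second parallel line for a side-by-side run:
pipe 1 /usr/bin/udp-filter -t | log2udp -h 127.0.0.1 -p 3815
# (for a true side-by-side test, -h would point at a collector running on another box)
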
[18:53:59] bwer? [18:54:04] syntax looks good but i would be more relaxed if we would spin this on a different box [18:54:09] we can do that [18:54:18] especially with udp2log multicast stream [18:54:23] we can try this on stat1 even [18:54:25] kool [18:54:33] wait, what does collector do? [18:54:35] i'm so confused [18:54:41] does it send to udp2log? [18:54:48] no [18:54:48] on locke? [18:54:57] collector listens on a port and receives data specifically crafter for it from udp-filter [18:55:01] udp2log -> filter -> collector [18:55:03] ohhhhhhhh [18:55:04] ok [18:55:04] *crafted [18:55:08] cool, oh [18:55:08] oh ok [18:55:15] and now udp-filter can send to collector? [18:55:23] uhmmmmm [18:55:34] use this | log2udp -h 127.0.0.1 -p 3815 [18:55:36] sorry [18:55:46] udp2log > filter > log2 udp > collector [18:56:00] ok [18:56:02] and we are changing to [18:56:03] and filter is now udp-filter [18:56:09] ok [18:56:17] so yeah, we can run this elsewhere, if we have collector installed [18:56:22] check [18:56:24] is collector on stat1 already? [18:56:28] don't' think so [18:56:38] i'm just going to copy the /a/webstats dir over there [18:56:39] afaik it's only installed on locke [18:56:39] and do it that way [18:57:13] sounds good [18:58:41] back in a hour or so [18:59:21] I'll be looking at the editor configuration for wikistats in the meantime [19:13:21] ung, average_drifter, i can't run webstats on stat1 [19:13:24] different versions of libdb [19:14:26] ottomata: I built it on build1 [19:14:41] ottomata: can you install libdb with the same version from build1 ? [19:14:50] no, bulid1 is lucid, stat1 is precise [19:14:54] oh [19:14:55] libdb is at 4.8 or somethign now [19:15:00] then should we build it on stat1 ? [19:15:14] or [19:15:17] maybe you can change dep [19:15:18] >= [19:15:18] ? [19:15:28] alright, give me a version and I'll change dep [19:15:28] :) [19:15:44] or I can check on stat1 [19:15:48] to see what the version is of libdb [19:15:54] oh hmmmmmm, but in this case the package name contains the version [19:16:06] libdb4.6 vs. libdb4.8 [19:16:12] or libdb5.1 [19:16:41] hm, [19:16:45] ungh [19:17:06] wait [19:17:10] do we really need to run this at the same time? [19:17:14] why don't we just take a sampled file [19:17:18] and run it through old filter [19:17:21] and then take the same file [19:17:26] and run it through udp-filter [19:17:29] and compare? [19:18:53] we can try that, but can I update the libdb also [19:18:55] so it fits [19:19:08] so on stat1 [19:19:08] p libdb4.8:i386 - Berkeley v4.8 Database Libraries [runtime] [19:19:23] oh that is avail on build1? [19:19:24] cool [19:19:26] that should work then [19:19:52] ah yeah [19:19:55] 4.8 is on build1 [19:19:58] link against that and it should be fine [19:20:03] 4.8 on stat1 [19:20:06] but it's not installed [19:20:20] it says "p" , so I think it should say "i" if it was installed (libdb 4.8 I mean) [19:20:34] i A libdb4.8 - Berkeley v4.8 Database Libraries [runtime] [19:20:38] p libdb4.8:i386 - Berkeley v4.8 Database Libraries [runtime] [19:20:42] yeah, its also on stat 1? 
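
The "take a sampled file and run it through both" idea above sidesteps the libdb and box questions entirely; roughly the following, with the sample path purely illustrative:

# run the same sampled input through the old filter and the new udp-filter, then compare
head -n 1000000 /a/squid/archive/sampled-1000.log > /tmp/sample.log
/a/webstats/bin/filter < /tmp/sample.log > /tmp/filter.out
udp-filter -t          < /tmp/sample.log > /tmp/udp-filter.out
diff /tmp/filter.out /tmp/udp-filter.out | head
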
[19:20:42] yeah it is [19:20:43] i libdb4.8 [19:20:43] p libdb4.8-dev - Berkeley v4.8 Database Libraries [development] [19:20:47] p libdb4.8-dev:i386 - Berkeley v4.8 Database Libraries [development] [19:20:54] p libdb4o-cil-dev - native OODBMS for CLI - development files [19:20:58] oh wait [19:21:00] i libdb4.8 [19:21:01] it is installed [19:21:03] so stat1 has libdb 4.8 [19:21:09] ja [19:21:11] so does build1 [19:21:17] so if you build on build1 and link against 4.8 [19:21:19] should be fine [19:21:19] right? [19:21:26] yes [19:21:31] k cool [19:22:33] also, FYI, we need a precise and a lucid version of udp-filter [19:22:35] .dev [19:22:37] .deb* [19:23:15] so you should build one on build2 as well, and name the package with the dist [19:23:18] for example [19:23:19] right now we have [19:23:33] udp-filter_0.2.6-1~lucid_amd64.deb [19:23:33] and [19:23:33] udp-filter_0.2.6-1~precise_amd64.deb [19:24:09] here are the instructions I have to follow to put a package in our apt repo: [19:24:10] http://wikitech.wikimedia.org/view/Reprepro#Importing_packages [19:24:15] (oh maybe you can't read those) [19:25:04] For distribution, use the distribution that the package has been compiled for, and under. Usually, any given compiled package should be for one distribution only, e.g. hardy-wikimedia OR lucid-wikimedia. This should match the field in the package's Changelog. [19:28:09] reading [19:30:07] so, for these [19:30:17] i left of the '-wikimedia' bit when I built them before [19:30:24] and just named them after their ubuntu dist name [19:30:30] so lucid and precise [19:30:42] the apt repository contains these files: [19:30:54] root@brewster:/srv/wikimedia/pool/main/u/udp-filter# ls -l [19:30:54] total 1448 [19:30:54] -rw-r--r-- 1 root root 21692 2012-09-06 19:18 udp-filter_0.2.6-1~lucid_amd64.deb [19:30:54] -rw-r--r-- 1 root root 1363 2012-09-06 19:18 udp-filter_0.2.6-1~lucid.dsc [19:30:54] -rw-r--r-- 1 root root 706034 2012-09-06 19:18 udp-filter_0.2.6-1~lucid.tar.gz [19:30:54] -rw-r--r-- 1 root root 21922 2012-09-06 19:18 udp-filter_0.2.6-1~precise_amd64.deb [19:30:54] -rw-r--r-- 1 root root 1414 2012-09-06 19:18 udp-filter_0.2.6-1~precise.dsc [19:30:55] -rw-r--r-- 1 root root 713991 2012-09-06 19:18 udp-filter_0.2.6-1~precise.tar.gz [20:13:05] be back in 1h [20:29:51] ottoman, drdee: have either of you constructed the appropriate pig LOAD call for importing squid logs? [20:30:01] ottomata ^ [20:30:46] i don't mind doing it myself, but I figured I would check, cause I recall hearing that you had andrew, but I didn't see it in the kraken/src/main/pig scripts [20:31:16] nvm [20:31:18] found it: log_fields = LOAD '$INPUT' USING PigStorage(' ') AS (hostname:chararray, udplog_sequence:chararray, timestamp:chararray, request_time:chararray, remote_addr:chararray, http_status:chararray, bytes_sent:chararray, request_method:chararray, uri:chararray, proxy_host:chararray, content_type:chararray, referer:chararray, x_forwarded_for:chararray, user_agent); [20:32:17] yes [20:32:23] ah gool [20:32:24] yeah [20:59:04] i take it we're skipping demo day today? [20:59:44] i mean, i'm okay with this [20:59:48] i have tons to do, so does dan [21:00:53] okay, officially over :) [21:01:29] demo? [21:01:37] oh [21:01:39] bye demo [21:01:43] yeah, unclear [21:01:51] seems like drdee has to be here to make it happen [21:03:41] no big. we can always do it whenever. [21:04:01] but ottomata is afk as well, and we just have tons to do in limn-land [21:04:05] so i'm fine with skipping otu [21:04:57] i'm here! 
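
Putting the packaging pieces above together, the per-distribution build and import would look roughly like this; the version, package file names and the precise-wikimedia distribution name are echoed from the surrounding discussion rather than verified, so treat them as illustrative:

# on build2 (precise): bump the changelog with a dist-suffixed version, then build
dch -v 0.3.23-1~precise "rebuild for precise"
debuild -us -uc -b
# repeat on build1 with -v 0.3.23-1~lucid, then import each .deb on brewster
# into the matching distribution:
reprepro -b /srv/wikimedia includedeb precise-wikimedia udp-filter_0.3.23-1~precise_amd64.deb
reprepro -b /srv/wikimedia includedeb lucid-wikimedia   udp-filter_0.3.23-1~lucid_amd64.deb
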
[21:05:38] no worries :) [21:05:39] aw dschoon, invisible SVG elements again! [21:05:45] I have it almost there! [21:05:46] :) [21:05:47] stop using templates! [21:06:15] here is my demo [21:06:16] http://cl.ly/image/0o3G0r1K1O1L [21:06:26] haha [21:06:29] <3 iterm2 [21:06:36] that's 12 dells [21:06:47] how many do we have total again? [21:06:48] an11 is installed, it worked so i'm doing the other 11 in parallel [21:06:51] 27 [21:06:51] 11 720s? [21:06:53] uh, 28 [21:06:53] hm [21:06:54] oh [21:06:56] 720s [21:06:56] 12 [21:07:01] huh. i thought we had 11 [21:07:10] * robla drops out of the hangout since y'all aren't there [21:07:11] naw, an11 -an22 inclusive [21:07:17] 10 ciscos, 11 720s, 5 420s? [21:07:21] (4xx) [21:07:37] 12 [21:07:39] 720s [21:07:40] hokay. [21:07:42] 10 ciscos [21:07:46] 27 then? [21:07:48] 5 420s [21:07:52] uhhh right [21:08:00] coolio. [21:08:28] and 288T in the 720s [21:08:38] 2T * 12 disks * 12 machines [21:08:44] yup [21:08:51] word. [21:08:57] so, the first 2 disks on each machine, are raid 1 for os [21:08:58] are their specs different? [21:09:01] i forget. [21:09:07] than? each other? don't think so [21:09:10] they ahve 48G mem [21:09:17] that's a pretty huge OS partition, isn't it? [21:09:28] i guess it doesn't matter [21:09:31] no [21:09:32] since we'd have io contention [21:09:34] os partition is 30G [21:09:38] the rest is unpartitioned [21:09:40] but right [21:09:50] we can use it for something, just probably not hdfs space [21:09:53] *blinks* the disks are 2T, and the first two are OS, but they're 30G? [21:10:02] yes [21:10:21] i was gonna make it 100G, but paravoid preferred that I make a more reusable partman recipe, so we compromised at 30G [21:10:29] we can add more partitions on those disks if we need to [21:10:45] sure [21:10:51] gah! black market gas man is not calling me back! [21:13:41] so dschoon, i'm not going to configure these guys today, but we should talk about machine allocation soon, now that we have all (well, minus 2) machines available [21:13:52] absolutely. [21:14:04] we got 3 zks on the 420s [21:14:06] i think that's good [21:14:12] not today, though. i am going insane with limn. [21:14:15] yes, that seems fine for now. [21:14:19] ahhh ok ok [21:14:21] do we have them in ganglia? [21:14:27] because we need to monitor load [21:14:30] no, only the ciscos right now [21:14:38] i'll work on getting everything in the cluster in ganglia once these are all up [21:14:42] we also really need to get a JMX monitoring solution set up [21:14:43] badly [21:15:03] i never saw a reply to https://www.mediawiki.org/wiki/Analytics/Kraken/JMX_Monitoring [21:15:19] i read your doc and looked at one of the solutions, was hoping for an awesome plug and play one, but they all seemed kinda in depth [21:15:49] as in, i'd need to understand the available jmx stats and configure the monitoring software to look at what I want [21:16:07] yeah, sadly. [21:16:16] would be cool if there was one that would take a list of JMX connections and, do like jconsole but aggregated [21:16:28] yep. [21:16:39] Zabbix *almost* does that [21:16:41] but not quite. [21:16:57] which means we tried to hack on it, and god is the source awful [21:17:03] i would be game to try Jolokia [21:17:14] but i can't make any guarantees [21:17:17] ah yeah, that's the one I read about [21:17:17] yeah [21:18:46] i've used zabbix a bit before, so I have some experience there [21:18:56] and it has jmx support, but still, i'd have to manually add them all in [21:21:10] yay! 
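
On the JMX thread above, the attraction of Jolokia is that once its JVM agent is attached to a daemon, every JMX bean becomes a plain HTTP/JSON read with no per-metric configuration. Agent jar path, host and port below are assumptions (8778 is Jolokia's usual default).

# attach the agent via the daemon's java options at startup:
#   -javaagent:/usr/share/java/jolokia-jvm-agent.jar=port=8778,host=0.0.0.0
# then any JMX attribute is one HTTP call away, e.g. heap usage of that JVM:
curl -s http://analytics1011.eqiad.wmnet:8778/jolokia/read/java.lang:type=Memory/HeapMemoryUsage
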
[21:21:11] http://cl.ly/image/3N0o3c3E212I [21:33:35] sorry guys for running behind schedule….did i miss demo friday? [21:33:53] you did :) [21:33:54] we didn't do it [21:33:58] here is my demo [21:34:02] http://cl.ly/image/3N0o3c3E212I [21:34:24] that was totally not 1 hour brb kind of thing [21:34:37] ottomata WOOOOOOOOOOOT [21:35:20] ottomata, did things work out with webstatscollector? [21:37:05] 2 probs: webstats collector was built against a libdb lib that wasn't avail on stat1, so stefan is rebuilding [21:37:06] also [21:37:14] udp-filter needs to be built for both lucid and precise [21:37:14] k [21:37:21] before I can put it in apt repo [21:37:35] For distribution, use the distribution that the package has been compiled for, and under. Usually, any given compiled package should be for one distribution only, e.g. hardy-wikimedia OR lucid-wikimedia. This should match the field in the package's Changelog. [21:37:36] which machine(s) are still running lucid? [21:37:40] locke :) [21:37:45] arrgghh [21:37:57] pretty much anything that was installed before this year [21:38:12] locke, emery, etc. [21:38:25] are you worried that upgrading them will cause issues? [21:39:48] root@build2:/home/spetrea# aptitude install reprepro [21:39:48] The following NEW packages will be installed: libarchive12{a} libgpgme11{a} libnettle4{a} libpth20{a} reprepro [21:39:51] The following packages will be REMOVED: libpgm-5.1-0{u} [21:40:01] is it ok if I let it remove libpgm ? [21:40:16] nothing's depending on it AFAICS [21:40:37] yeah it's just a VM and we can always put it back [21:41:20] i can't login to hue myself either :) [21:41:54] hmmm me neither! [21:42:54] hmm, yes i can [21:43:04] drdee, somethign is weird with redirect, [21:43:07] log in from here [21:43:07] http://analytics1001.eqiad.wmnet:8888/ [21:43:09] erosen: have either of you constructed the appropriate pig LOAD call for importing squid logs? ==> yes [21:43:14] (or .wikimedia.org) [21:43:17] yeah [21:43:21] thanks [21:45:40] ottomata, can't login to hue either [21:45:44] i am gonna restart it [21:46:21] wait, i jsut did [21:46:29] strip off your path porition of your url [21:46:31] im in it [21:46:34] sorry [21:46:36] maybe i'm breaking it? [21:46:53] me too [21:46:57] no you are not [21:47:00] ? [21:47:01] louisdang, try again as well [21:47:09] erosen: you are not breaking hue [21:47:12] oh yeah [21:47:16] i knew that [21:47:22] for some reason I thought you were telling me I wasn't in it [21:47:27] ohhhh [21:47:30] drdee, works now [21:47:35] cool [21:47:46] don't know what was the issue [21:48:28] erosen: what's the problem? [21:48:39] i don't have an issue [21:48:46] sorry for the poor communication [21:48:55] i was just trying to give evidence that it was working [21:49:06] sorry i was referring to pig load [21:51:05] oh I was just going to go through the process of creating a working load call [21:51:10] but I figured it had been done [21:51:15] but then I found what I was looking for [21:54:50] k [21:58:06] drdee: I've a got another grunt/pig question [21:58:12] shoot [21:58:15] what is the way to run a pig script through hue? [21:58:22] I've been messing around with grunt [21:58:33] what exactly do you mean? [21:58:33] but I can't remember how to run a file [21:58:43] do you want to fiddle with a pig console? [21:58:54] or do you want to launch a completed pig script? 
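
The LOAD statement quoted above is the top of exactly such a "completed pig script"; as a stand-in (this is not the actual count.pig from the repo), a minimal script that counts requests per HTTP status could look like:

cat > count.pig <<'PIG'
log_fields = LOAD '$INPUT' USING PigStorage(' ') AS (hostname:chararray,
    udplog_sequence:chararray, timestamp:chararray, request_time:chararray,
    remote_addr:chararray, http_status:chararray, bytes_sent:chararray,
    request_method:chararray, uri:chararray, proxy_host:chararray,
    content_type:chararray, referer:chararray, x_forwarded_for:chararray,
    user_agent);
by_status = GROUP log_fields BY http_status;
counts    = FOREACH by_status GENERATE group, COUNT(log_fields);
STORE counts INTO '$OUTPUT';
PIG
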
[21:58:55] like if I wanted to keep my commands in a script and then execute the whole script [21:58:59] the latter [21:59:02] k [21:59:05] 1 sec [21:59:13] as i understand it grunt is a pig console right? [21:59:24] das right [21:59:27] coo [21:59:46] go to oozie editor [21:59:59] click workflows [22:00:21] yeah [22:00:46] click on 'Pig' that is an example Pig job [22:00:57] so you would have to create your own oozie workflow [22:01:08] (Create button on the right of your screen) [22:01:25] i see [22:01:38] i think you can also call exec script.pig from grunt [22:01:53] yes you can [22:02:03] but its unclear how you make jars available to grunt [22:02:17] which jars? [22:02:29] i guess and udf-containing jar [22:02:40] right, that's something i should take care off [22:02:50] and make sure it's always found [22:03:04] but maybe you just define it using a path in hdfs [22:03:21] well if you create an oozie job [22:03:27] then you can specify particular jar files [22:03:32] and their locations [22:03:36] the oozie/workflow just gives you an hdfs file browser for choosing the script [22:03:51] yes [22:04:10] but you can also add 'params', 'files' and 'archives' [22:04:16] so you would need to specify archive [22:04:23] yeah [22:04:33] cool [22:04:43] well this helps [22:04:49] if you want you can share your screen and i can help [22:04:50] I'll keep playing around [22:04:54] kool [22:04:57] I'm good for now [22:04:57] thanks [22:05:01] perfect [22:05:12] ok guys, i gotta go pay too much money for gas [22:05:13] :/ [22:05:20] my dad told me I shoulda filled up before the hurricane [22:05:21] what's the rate? [22:05:23] and I did not listen [22:05:42] $14 / gal [22:05:49] ????!??!?!?!?!???!!??!?!? [22:06:00] either that or wait 4 hours and maybe get gas [22:06:15] * drdee 's eyes are rolling [22:06:29] that sucks monkeyballs [22:06:40] all right have a good weekend! [22:06:56] mmmk laters all! [22:07:00] laterz [23:18:41] louisdang and drdee: have either of you successfully run pig scripts through ooze? [23:18:44] oozie* [23:19:01] I haven't run pig scripts through oozie [23:19:16] trying to run count.pig, but to no avail [23:19:19] cool [23:19:34] as far as you know count.pig works correctly, though? [23:19:34] ok let me try [23:19:49] i have a workflow up there already called zero reports [23:19:56] though you're welcome to try it from scratch [23:19:57] I've ran it on my local machine on test data and drdee ran it on kraken as far as I know [23:20:04] cool [23:21:21] i have [23:21:47] i'll have a look as well [23:22:09] thanks [23:22:29] drdee: is the workflow you made still on ooze? 
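
And the two launch paths from the exchange above, once such a script exists; the input/output paths and the jar name are illustrative, and (as suggested) the input should be a folder rather than a single file:

# from a login shell against the cluster:
pig -param INPUT=/wmf/raw/digi-malaysia -param OUTPUT=/user/erosen/status_counts -f count.pig
# or from an already-open grunt session:
#   grunt> exec count.pig
# a UDF jar can be pulled in by the script itself, e.g. REGISTER kraken-udfs.jar;
# in the Oozie editor the same jar is listed under "files"/"archives" instead.
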
[23:23:13] nah, i don't have a working pig example yet [23:23:21] let me look at your stuff [23:23:22] autocorrect doesn't like oozie [23:23:31] it is running now [23:23:37] but it has been running for like an hour [23:23:45] and it is a single digi-malaysia file [23:25:27] mmmmm not sure if it's actually running [23:25:35] yeha [23:25:39] i suspect as much [23:26:14] http://analytics1001.eqiad.wmnet:8088/cluster/apps/ACCEPTED [23:26:25] so it's accepted but not yet running AFAICT [23:26:44] one thing i noticed, you should not specify a single file as input [23:26:53] just give the entire input folder [23:27:14] cool [23:27:20] I was looking for documentation about that [23:27:33] it will read everything by default [23:27:38] cool [23:27:55] btw, I couldn't view that link [23:28:04] not sure why there are 5 apps accepted but not running [23:28:13] in the link, replace eqiad.wmnet with wikimedia.org [23:28:19] did that [23:28:24] still didn't work [23:28:24] still doesn't work? [23:28:41] take it back [23:28:45] ok [23:28:46] i deleted the port too [23:28:54] works with the port [23:29:04] so you've got 3 jobs [23:29:36] yeah [23:29:39] not intentional [23:29:54] I think some of them are from grunt sessions which hung [23:30:13] or one of them, maybe [23:30:49] mmmmmmm i am digging through some logs to see what's going on [23:30:55] cool [23:30:58] no rush [23:31:10] how did you kill your jobs? [23:31:12] I'm probably going to take a break from pig stuff anyway [23:31:19] i didn't [23:31:27] you just killed pig? [23:31:29] i just opened up a new grunt window .... [23:31:34] hehe [23:31:38] :D [23:31:49] i wasn't sure it was a job hanging, I thought it was just grunt freezing [23:31:53] k [23:32:40] i should probably kill Job6326735994145031411.jar and PigLatin:DefaultJobName [23:32:46] what is the best interface for that? [23:33:06] don't know yet, i do it in the shell [23:33:19] i see [23:33:36] brb [23:33:50] one pain point I've noticed so far is that a lot of the documentation is written assuming you've just started up a grunt session from a shell [23:34:03] it's not clear how much that matters [23:34:06] yeah [23:34:19] i would for now recommend to use the pig shell and run your jobs in there [23:34:19] but when I was looking into referencing an external pig script it seemed like it might matter [23:34:31] then once you know it runs move it to oozie [23:34:37] yeah [23:34:37] the feedback loop is tighter [23:34:40] i agree [23:34:43] I had tried that first [23:35:02] but grunt couldn't find my scripts [23:35:05] which was weird [23:36:31] but grunt was probably expecting your files on the local filesystem and not on hdfs [23:36:38] (or the other way around :D ) [23:37:00] hmm [23:37:51] it seemed like it was operating on hdfs [23:38:24] my pig script that is showed you, the jars were all on the local filesystem [23:38:39] hmm [23:38:45] local meaning stat1001? [23:38:52] or an10001 [23:39:41] also fwiw, i was trying to reference a pig script, not a jar [23:39:52] though I suspect they work the same way [23:41:20] local is an1001
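
For the hung jobs at the end there, "doing it in the shell" is roughly the following; the job and application ids are illustrative and would come from -list or the :8088 UI:

mapred job -list                           # running and accepted MapReduce jobs
mapred job -kill job_1351892400000_0042    # kill one by its job id
# the YARN-side equivalent, using the application id shown at :8088:
#   yarn application -kill application_1351892400000_0042
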