[00:16:11] drdee: i'm talking to dan foy about VUMI stuff, and I keep forgetting whether we actually have a reliable unsampled log for all requests [00:16:13] do you know? [00:18:33] dschoon: do you know if we have an unsampled log? ^ [00:23:07] i guess it's unsampled [00:23:21] but you would have to check with ottomata to be sure [00:24:21] k, basically dan raised the question of how we track how much SMS traffic is coming from each partner [00:24:53] and I realized I don't think we can do that unless we have an unsampled log, or go down the udp-filter route, or add custom varnish behavior [00:24:59] drdee: thoughts? [00:50:14] i'll think about it, erosen. i glanced at those varnish configs when looking into the beta site stuff. [00:53:47] cool [01:49:46] dschoon: what is the chart type for the new datasources? [01:50:33] that field is obsolete [01:50:36] erosen ^^ [01:50:40] awesome [01:50:51] I'm updating the limnpy to use the new formats [01:51:21] so it is just 'chart' : {'type' : 'timeseries'}? [01:51:26] these are the fields: [01:51:26] https://github.com/wikimedia/limn/blob/master/src/data/datasource/datasource.co#L65 [01:51:52] cool [01:52:03] usually, "defaults" contains an exhaustive list of the attributes a configuration object supports [01:52:04] do you know where in the limn object in the browser I can find examples? 
[01:52:12] totes [01:52:35] so, the src directory hierarchy is replicated as an object hierarchy on limn [01:52:45] k [01:53:00] src/graph/node/vis/line-node.co --> limn.graph.node.vis.LineNode [01:53:11] (for the classes, that is) [01:53:25] the active model and view are at limn.model and limn.view [01:53:39] aha that's what I was forgetting [01:53:48] for a page with a single graph, limn.view is a GraphView [01:53:55] limn.model will be a Graph [01:54:01] you can also go [01:54:05] I see [01:54:20] how do i find a datasource object model from such page [01:55:34] limn.data.DataSource.lookup( DATASOURCE_ID, function(err, ds) ) [01:56:00] if you're already sure the source is loaded, you can just go: [01:56:15] limn.data.DataSource.get(ID) [01:56:33] that merely checks the cache. [01:56:54] cool [01:57:05] actually maybe i was making things too complicated [01:57:20] does something like this yield an up to date version? http://reportcard.wmflabs.org/data/datasources/rc/rc_comscore_region_uv.yaml [01:58:44] pretty sure datasources have not changed at all [01:59:12] excepting the new field "type" to go along with "format" [01:59:21] type defaults to "timeseries", so you should be fine [02:00:53] dschoon: can you clarify: """ excepting the new field "type" to go along with "format" """ [02:01:16] check out the comment here: https://github.com/wikimedia/limn/blob/master/src/data/datasource/datasource.co#L69 [02:01:21] and lmk if that helps? [02:01:56] the idea is that "format" is how the data is serialized/encoded/formatted. csv, tsv, xml, json, etc [02:02:05] that makes sense [02:02:08] type is about the contents [02:02:17] gotcha [02:02:29] so geojson has special meaning. 
it's polygon data for drawing map outlines [02:02:29] i just couldn't tell if there was some relationship between the two [02:02:58] because i think format was in the old version [02:03:12] they're usually independent, but the XXXjson types obviously imply format=json [02:03:14] so really just type is new [02:03:20] yep. [02:03:20] ya, fo sho [02:03:24] as i said, i think :) [02:03:29] hehe [02:03:30] and type defaults to "timeseries" [02:03:31] we'll find out [02:03:35] cool [02:03:36] so i think you don't have to do anything? [02:03:51] so hopefully datasources haven't changed at all [02:04:02] ...it also occurs to me i'm sitting in legal. [02:04:05] well i'm going to drop the "chart" key [02:04:08] and you might be on the other side of the room [02:04:10] nope [02:04:13] ah, ok :) [02:04:15] i'm in PA today [02:04:15] hehe [02:04:17] yeah, drop it [02:04:26] cool [02:07:27] aiight, i should head home [02:07:27] i'll be back online in a bit. [02:07:27] laterz [02:07:28] cool [02:07:29] in case you have questions [02:07:30] lates [05:05:34] erosen [05:05:37] oh too late [05:05:42] haha, i just read his question [05:05:52] I have numbers on unsampled reliability [05:06:03] want to graph it once I get data syncing from /wmf/public to stat1001 [14:27:59] mornin [15:10:33] gooooood morning guys [15:14:17] milimetric, ottomata [15:14:30] hey diederik :) [15:16:37] morning [15:18:41] mooooororoning [15:27:49] milimetric [15:27:52] shall we puppetize limn? [15:28:04] yes! [15:28:06] let's do it [15:28:30] hokay, so, uhhh, yeah, what's needed [15:28:39] install package... [15:28:53] set some configs [15:28:54] start a service? [15:29:08] ok, so here's the list of things: [15:29:14] 1. apache installed [15:29:22] 2. supervisor installed [15:29:41] 3. deb package that average_drifter built installed [15:30:00] i wonder if we can use upstart instead of supervisor, ops would be much happier about that [15:30:10] 4. conf file for apache -> input is the path where 3. 
installs to and the port that 2. will use [15:30:26] 5. conf file for supervisor -> input is the same as apache [15:30:41] oh, one more input for 4. -> the domain name that it's going to serve on [15:30:49] oh ok, lemme see what upstart says [15:31:01] should be easy [15:31:06] its just running a process [15:31:13] setting some env vars [15:31:18] yeah, looks the same from their description [15:31:32] where is stefan's .deb? [15:32:10] well, it can be built using his scripts and debianize submodule - he's working on it in a limn branch [15:32:22] let me pull latest. average_drifter if you're around we're about to use your deb [15:32:24] ah ok [15:32:37] do we have puppet modules for the other stuff? [15:32:39] apache I'm sure [15:32:40] well i just want to see where it puts stuff, and if we should make the upstart script part of it [15:32:42] but upstart? [15:32:44] apache yeah, supervisor no [15:32:50] upstart no, but it is just a couple of files [15:32:53] upstart comes with ubuntu [15:32:57] ok, cool [15:32:57] but, usually [15:33:02] packages that need to be run as services [15:33:15] include their own init (init.d or upstart) scripts [15:33:18] unless [15:33:24] we intend to run multiple limn instance on a single node [15:33:28] which probably makes sense, right? [15:33:33] well, the long term plan is to allow that [15:33:37] milimetric , ottomata can I have the apache vhost conf for limn to include it in the .deb please ? [15:33:44] but right now, we would need to install it to multiple directories [15:33:50] hmmmm [15:33:58] wait limn doesn't require apache, does it? [15:34:07] no, it's just how it's set up on labs now [15:34:10] milimetric: deb is not ready yet, working on it [15:34:14] that's just for static files, right? 
[15:34:36] i think it handles the routing to the localhost:LIMN_PORT [15:34:36] running server.co and going http://whatever:LIMN_PORT [15:34:38] right [15:34:40] ok [15:34:47] i don't think apache confs should be part of the .deb then [15:34:51] server.co is run by upstart [15:34:51] ottomata: cool [15:34:51] we can puppetize that bit [15:34:54] right [15:35:07] but yeah, if we want to run multiple instances [15:35:15] then maybe don't worry about including an upstart script [15:35:20] we'll puppetize it [15:35:30] so that puppet will include an init/upstart script per instance [15:35:38] yeah, ok [15:35:42] and when the deb supports that, it'll be easier [15:36:00] is it possible to run multiple instances from the same install? [15:36:01] like [15:36:12] server.co is basically the binary [15:36:17] I should be able to set env vars [15:36:38] or config vars [15:36:42] well [15:36:45] yea, BUT [15:36:46] and run server.co from the same path but in different processes [15:36:47] milimetric: did you talk to dschoon about the version thingie ? [15:36:47] right? [15:36:54] hi drdee [15:37:06] it uses the limn_install/var/data directory [15:37:16] so that's in common [15:37:17] what's limn_install ? [15:37:18] ooooh! [15:37:27] wherever the deb puts it average_drifter [15:37:35] but I just realized, it doesn't have to use that - we can configure it [15:37:39] milimetric: /srv/limn ? [15:37:42] yes [15:37:45] ok [15:38:05] milimetric: we need the version thing ready to include it in the package [15:38:06] so the limn code should be installed somewhere common, right? [15:38:11] ottomata: ok, so in theory, it should be able to use the same limn install to run multiple instances if I change the data directory to be configurable [15:38:12] so basically the version is governed by the tags [15:38:16] /usr/lib/limn or whatever [15:38:26] we were doing /srv/limn ottomata, is that no good? [15:38:34] as default data dir? [15:38:42] or for installing the code? 
[15:38:44] no as the place to install the code [15:38:46] hm [15:39:03] we can move it wherever we think it'd be more standard [15:39:20] yeah, i'm not sure in this case, limn is kinda a standalone service, but it's also a website [15:39:22] right? [15:39:35] i think /srv should contain instance specific stuff [15:39:50] common reusable code and executables shouldn't go there [15:40:13] where is node installed? [15:40:55] no idea actually [15:41:32] I have a big fat juicy 5.6M limn.deb [15:41:44] containing all the node_modules [15:41:46] and all the stuff [15:42:00] /usr/lib/nodejs/ [15:42:11] ok, so /usr/lib/limn then? [15:42:28] but i feel like limn is more like apache than node [15:42:39] yeah i think so [15:42:41] (currently, of course) [15:42:44] apache is in /usr/lib/apache2 [15:42:48] ok, perfect [15:43:05] also, maybe we can symlink /usr/bin/limn to server.co [15:43:07] ottomata: so the node_modules should go in /usr/lib/nodejs/ ? [15:43:12] yo ottomata: i know you are like super super super busy but if you could setup the rsync of /wmf/public to stat1001 that would be really really helpful [15:43:13] so you could spawn up a non-daemonized instance by doing [15:43:16] so average_drifter, we should change the limn install to /usr/lib and I have to make it so you can configure Limn to read data from any directory [15:43:25] limn --config-file path/to/limn.conf [15:43:27] or whatever [15:43:58] milimetric , ottomata , dschoon where would you prefer to have node_modules installed by the deb package ? 
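A rough sketch of the /usr/bin/limn wrapper idea being discussed: resolve per-instance settings from environment variables with defaults, then hand off to server.co. The variable names LIMN_PORT and LIMN_DATA, the default values, and the paths are all illustrative assumptions, not confirmed limn configuration.

```shell
#!/bin/sh
# Hypothetical wrapper sketch: per-instance settings come from the
# environment, with fallback defaults, before launching the server.
# LIMN_PORT, LIMN_DATA, and the defaults here are illustrative only.
LIMN_PORT="${LIMN_PORT:-8081}"
LIMN_DATA="${LIMN_DATA:-/srv/limn-data}"
NODE_ENV="${NODE_ENV:-production}"
export LIMN_PORT LIMN_DATA NODE_ENV

echo "limn: port=$LIMN_PORT data=$LIMN_DATA env=$NODE_ENV"
# a real wrapper would end with something like: exec /usr/lib/limn/server.co
```

Because the settings are plain environment variables, puppet could generate one small /etc/default-style file per instance and reuse the same wrapper.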
[15:44:00] well, how about "set environment variables" && server.co [15:44:13] hmm, yeah I guess that's fine [15:44:19] we can make a wrapper that does that too [15:44:24] which could be /usr/bin/limn [15:44:25] /usr/lib/limn/node_modules <- average_drifter [15:44:37] ok [15:44:50] well no, the deb has to have those because ops doesn't want npm doing its rogue crap on target machines [15:45:13] drdee, that's on my list, I just thought I'd talk with milimetric this morning about this, since that change will require review and won't happen until SF wakes up at the earliest [15:45:22] ok [15:45:41] reason i am asking is i am dying to show some of the stuff that's actually built [15:45:51] yeah [15:46:07] i was still kiiinda hoping that we could just productionize limn and not have to sync /wmf/public [15:46:20] if I could install hadoop client on stat1, then it would be easy too :p [15:46:24] could just do hadoop fs -get [15:46:44] buuuut [15:46:44] ok ok [15:46:45] i can do [15:46:47] yeah, drdee we can have this all working and in prod by the 28th [15:46:58] and then we can show it off as a team [15:47:07] it'll probably be good to have the public stuff synced anyway [15:47:17] so we don't have to deal with any lockdowns of analytics cluster blocking access [15:47:37] right [15:48:09] ok, ottomata, average_drifter, I'll work on those configuration improvements now [15:48:14] mk, danke [15:48:50] milimetric: ok, and if you can find out about the version thing, we'll talk about it soon, ping me when you want me to pull [15:48:52] drdee, do you want to use concat_sort on /wmf/public/mobile [15:48:53] ? [15:49:07] so that you only have a single file to deal with? [15:49:07] yes [15:49:32] yep, average_drifter, I'm updating that as well [15:49:56] drdee, here's an example of how to do that: [15:49:56] https://github.com/wikimedia/kraken/blob/master/oozie/webrequest_loss_by_hour/workflow.xml [15:50:36] oink [15:51:54] what happened to stat1 ? 
[15:52:20] http://stat1.wikimedia.org/spetrea/ <== doesn't load [15:53:39] hm [15:53:40] uhh [15:53:42] we turned off apache [15:53:43] let's see [15:53:44] we did? [15:53:47] on stat1? [15:53:54] yes me and paravoid yesterday night [15:54:05] oh, apache is running though [15:54:14] i think we stopped it [15:54:53] can we have it up again ? [15:55:00] not on stat1 [15:55:07] we really should be running all our web stuff on stat10001 [15:55:22] alright, uhm [15:55:31] can I move my stuff to stat1 ? [15:55:34] sorry [15:55:37] to stat1001 [15:55:40] or stat10001 [15:55:47] stat1001 [15:55:51] i mean, i guess so, the idea is that stat1001 is production web stuff [15:56:01] not for random 1 off hosting of things [15:56:01] no, i spoke with faidon about this [15:56:04] oh? [15:56:14] the issue is that stat1 contains private data [15:56:23] and we should minimize public access to that machine [15:56:57] i think stat1001 should be both random and production things [15:57:26] but average_drifter, you can just send me the raw data files or put them in my home folder [15:57:36] hm, ok, i was hoping that it wouldn't be random things, that way things that we want to be online are more stable (stats.wikimedia, limn report card eventually, metrics-api, etc.) [15:57:37] and right now you are working on debianization anyways :) [15:57:55] well, real random things should live in labs [15:58:00] sure [15:58:05] but not on stat1 either [15:58:10] that's fine with me [15:58:13] drdee: yes, well, a place to put the .debs somewhere where milimetric and ottomata can download them easily is good.. [15:58:17] but it's optional.. [15:58:22] right [15:58:45] i can scp from stat1 pretty easy [15:58:53] average_drifter - I think as long as anyone can build the deb easily, it doesn't matter if it takes a little long to go through the revlog [15:58:59] since that's only gonna be the first time anyway right? 
[15:59:18] milimetric: --update isn't implemented yet so it would be every time for now [15:59:22] uhm [15:59:27] but yeah we'll have that soon as well [16:00:05] exactly, I think in idealistic ways :) [16:14:35] average_drifter: I have a pageview by country question / proposal [16:14:51] do you have the code you use for doing this somewhere I can look at? [16:15:25] and would you be interested in starting a little repository of canonical (or de facto) methods for doing page view counting, so that we could compare different implementations on a standard data set? [16:15:54] erosen: sure [16:16:22] I gave Amit a version of the page views by country report and he found a few cases where the numbers don't match wikistats by significant margin [16:16:43] so I'm hoping we can pin down the problem, and start on a shared metrics repo [16:17:40] erosen: https://plus.google.com/hangouts/_/96856de55c688666f7bc3f769d67f799fa69298f [16:18:17] average_drifter: can't join quite yet--still on my train [16:18:22] erosen: ok [16:18:30] i can do that in 40 min, if that still works for you [16:18:50] erosen: average_drifter: let's make some flow diagrams we started for mobile page views already [16:18:58] drdee: yes [16:19:43] drdee: sure, but I'm primarily concerning with finding the differences at the moment (though i can imagine that the data streams could be different--but I think that is unlikely) [16:19:56] s/concerning/concerned/ [16:21:43] i can tell you all the differences [16:21:51] don't worry :) [16:21:58] swing by my desk this morning! 
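A minimal sketch of the kind of apples-to-apples comparison the shared metrics repo is meant to enable: tally requests per mobile wikiproject into a count table. The sample URLs are made up, and this is neither party's actual implementation, only the shape of the output.

```shell
# Sketch: extract the wikiproject from mobile URLs and tally them.
# Input lines are fabricated examples; a real run would read the
# sampled squid logs instead of this printf.
counts=$(printf '%s\n' \
    'http://en.m.wikipedia.org/wiki/Main_Page' \
    'https://en.m.wikipedia.org/wiki/Cat' \
    'http://ja.m.wikipedia.org/wiki/Neko' \
    'http://example.com/not-a-pageview' |
  sed -n 's|^https*://\([a-z]*\)\.m\.wikipedia\.org.*|\1|p' |
  sort | uniq -c | sort -rn)
echo "$counts"
```

Non-matching lines (like the example.com one) simply drop out of the tally, which is the same effect as the `*.m.wikipedia.org` filtering discussed later in the log.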
[16:22:01] hehe [16:22:07] seriously [16:22:28] i also pushed a pig script called Pageview to come up with a canonical page view count for kraken [16:22:30] drdee: not that I don't have utter faith in your knowledge of the stats goings on, but I'm talking about the difference between my counting algorithm and wikistats [16:22:37] it's not finished yet [16:22:42] very nice [16:22:50] right but i can tell you how wikistats works [16:22:55] and then you will be like [16:23:03] ohhhhhh yes of course there is a discrepancy [16:23:11] i see [16:23:13] see you after scrum? [16:23:18] relocating to office right now [16:23:29] sounds good [16:41:16] drdee: can I request permission to publish stuff on stat1001 ? [16:41:19] I don't have access to it atm [16:41:33] I'm referring to the new mobile pageviews reports [17:07:21] milimetric: i mostly finished the viewport refactor yesterday [17:07:36] just a few tweaks to make sure it works for all node types [17:21:10] awesome dschoon - that should be a very clean happy feeling [17:33:11] erosen: hey. thanks for the last link. I've now gathered everything here: https://github.com/geohacker/indicwiki/tree/master/data [17:33:47] erosen: https://www.mediawiki.org/wiki/User:Spetrea/What_is_a_pageview [17:33:55] nice [17:34:05] erosen: also, about the edits by geography - do you think we can find edits within India? maybe at the state level? [17:34:08] erosen: add that to your watch list, I'm adding more details to it [17:34:13] average_drifter: is that a graphviz graph? [17:34:28] erosen: yes, it is [17:34:31] average_drifter: will do, do you want to video chat after scrum with drdee? [17:34:38] erosen: yes [17:34:40] geohacker: definitely [17:35:14] geohacker i have a rather rich store of data on that matter. What level of granularity are you looking for? 
[17:35:31] geohacker: I can give you by city edits (for cities which contribute more than 10% of total edits) [17:35:44] or I can just give you country level edits for each language [17:36:04] erosen: city and country level would make sense right now. [17:36:14] I can think of a pretty nifty map mashup. [17:36:24] only if time allows me to code it up. [17:36:41] but otherwise we can look at both sets separately. [17:37:06] erosen: would be fantastic. [17:41:30] geohacker: I see you're having fun :) [17:41:53] geohacker: so here is the country level data http://gp-dev.wmflabs.org/graphs/hi_top10 [17:43:58] YuviPanda: indeed. did you see https://github.com/geohacker/indicwiki/tree/master/data [17:44:03] ah no :) [17:44:04] erosen: checking [17:45:43] erosen: ah yes. when I said country level I imagined these are edits within the country. [17:46:00] erosen: do you have city level data handy? [17:46:21] geohacker: sort of [17:46:39] erosen: awesome :) [17:48:35] erosen: ping me with a g+ link when you're ready [17:49:02] k [17:53:21] geohacker: I have a database with all of the city edits for all languages. can I just give you a csv version and you can filter out the languages / countries you care about? [17:53:37] geohacker: meanwhile I'll work on making some line charts of city-level data [17:53:43] but that will take much longer [17:54:42] erosen: that sounds perfect. [17:55:29] erosen: I'll filter all the indic projects and their activity in Indian cities to begin with. [17:55:38] cool [17:56:03] oh I'll geocode them and send it back to you if I'm successful. [18:02:53] milimetric: scrum? [18:10:38] erosen: just poke me whenever you have the link. thanks! [18:11:28] geohacker: sounds good -- in a meeting for a bit more [18:11:56] erosen: no worries. later. [18:36:20] average_drifter: how about now for the country report hangout? 
[18:36:34] https://plus.google.com/hangouts/_/2da993a9acec7936399e9d78d13bf7ec0c0afdbc [18:36:50] erosen: sure [19:32:17] erosen: https://github.com/embr/metrics <== this is the repo ? [19:32:23] yup [19:32:26] ok [19:32:27] just putting my code in it, now [19:32:30] ok [19:40:13] milimetric: can you please try following this Deb.md file and see if you can get a deb ? https://github.com/wikimedia/limn/blob/debianization/Deb.md [19:40:29] milimetric: you should be able to get a deb that way. if you encounter any problems please let me know so I can fix them [19:41:06] ok, awesome average_drifter. I'm working on the data directory thing and will try as soon as I finish [19:41:14] ok [19:43:39] erosen: I need to ask a question. So what I've been working on is a monthly mobile pageviews report per wiki project [19:43:46] kk [19:43:48] sup? [19:43:59] erosen: but you mentioned a monthly mobile pageviews report per country [19:44:02] right ? [19:44:08] yeah [19:44:18] so we'll be working on one per country [19:44:39] we can do either [19:44:51] erosen: is Amit ok with your monthly mobile pageviews report per country ? [19:44:57] not sure [19:44:59] I should talk to him about that [19:45:02] I'll do that today [19:45:07] ok [20:02:53] ottomata, drdee: I'm stuck, can't access kripke or reportcard in labs [20:03:12] i think labs is having issues, i hear ryan lane say anyway [20:03:22] ssh kripke tells me Permission denied [20:03:25] k [20:10:04] erosen: I'm stalking your embr/metrics repo :) [20:10:09] hehe [20:10:12] updating the dot file presently [20:10:17] ok [20:18:28] okay, average_drifter: at long last: https://github.com/embr/metrics/tree/master/pageviews/embr_py [20:18:57] also, I just added you as a contributor [20:19:01] erosen: thanks [20:19:06] ok average_drifter, ottomata - Limn can now serve multiple instances from the same install / clone [20:19:23] erosen: I'll add my stuff too and we can try to compare our results on a 4-day period ? 
[20:19:31] erosen: what do you think of that ? [20:19:32] sounds good [20:19:41] erosen: you mentioned you could help me run it on kraken ? [20:19:48] i made my script so that it takes in file names as command line args [20:19:54] indeed [20:20:02] woohooo, nice! [20:20:02] ottomata: for the upstart config, the only addition to what's in supervisor configs now is the variable "LIMN_DATA" which points to the directory that *this* instance's data is linked in [20:20:07] average_drifter: I've been meaning to clean that code up a bit, so gibe me a sec [20:20:12] perfect [20:20:15] erosen: ok [20:20:17] and that is a setable environment variable, right? [20:20:25] yeah, just like NODE_ENV [20:20:34] cool [20:20:36] and I used /srv/limn-data as the default [20:20:45] hmm, ok [20:20:51] so we can probably keep all the installs similar like /srv/reportcard-limn-data [20:20:58] so, what if we created a wrapper script for server.co [20:20:59] in bin/ [20:21:00] or would you rather some other place more Linuxey? [20:21:02] just called limn [20:21:13] well, i think /srv isn't actually that standard [20:21:18] maybe, hm [20:21:23] either [20:21:30] milimetric: can I merge debianization into master ? or should we keep it as a separate branch ? [20:21:38] /var/www/limn or /var/lib/limn [20:21:42] well, we can't put something in bin because it would need different values per instance [20:21:53] right, but it would have defaults and cli options [20:22:12] and the ability to read its configs from a .conf file, or an /etc/default file even [20:22:29] the debian could then just install limn into /usr/bin/limn [20:22:32] which would just launch server.co [20:22:34] hm, ok but since this stuff makes my brains hurt (all of them) can we make it a nice to have? 
[20:22:39] yup [20:22:44] phew [20:22:44] :) [20:22:53] adding to Asana though [20:23:01] i could probably work on that wrapper [20:23:05] probably should do it in bash [20:23:28] but cool [20:23:41] https://app.asana.com/0/701374192205/4080872673353 [20:24:01] ok, so average_drifter I'm working on debianization now [20:24:07] not in master, let's merge it into develop [20:24:17] cool [20:24:18] I will merge in master when I update all the documentation and everything [20:25:34] average_drifter - are you merging into develop or should I? [20:29:07] milimetric: I can merge it [20:29:09] I'll do that now [20:29:53] average_drifter: I'm tweaking the deb as I find stuff that's different on my system [20:30:15] milimetric: merged [20:30:36] milimetric: ok no problem, just the debian/rules right ? [20:30:39] ok, I'll tweak Deb.md in the develop branch then [20:30:50] well so far sudo aptitude install libjson-xs-perl was needed for git2deblogs [20:30:51] oh yeah, that one too [20:30:56] and the ln -s for git2deblogs wasn't done [20:30:56] yea [20:31:02] I'll add - no prob [20:31:07] ok [20:34:24] average_drifter - i don't have aptitude and I don't think it's standard [20:34:34] is it ok to change the instructions to apt-get or do we need aptitude? [20:35:39] milimetric: apt-get is fine too [20:37:43] average_drifter: does the tag have to start with a 0? I added the tag "v0.6.0" [20:37:53] it has to start with 0 yes [20:37:56] I mean [20:38:03] like a standard version [20:38:07] number.number.number [20:38:10] or number.number [20:38:18] ok, cool [20:47:24] erosen: I think I forgot to mention in the flowchart that I'm checking for http://(wikiproject1|wikiproject2|...).m.wikipedia.org.* [20:47:29] erosen: do you do that filtering too [20:47:30] ? [20:56:37] hm, drdee, brain bounce w me for a sec [20:56:38] you there? 
[20:56:40] (via chat) [20:56:44] yo bounce [20:57:02] k, i'm looking into making stats user be default user for oozie jobs [20:57:06] will work great [20:57:08] but [20:57:12] need to give it access to /wmf/raw [20:57:20] we were using ldap labs projects for that [20:57:22] ryan lane said not to [20:57:24] ldap will be good [20:57:27] but not labs projects [20:57:28] so [20:57:42] also i need to get the hadoop direct ldap thing working [20:57:48] rather than shell nss stuff [20:57:53] shell nss works but is messy and no fun [20:57:53] ok [20:58:02] so, in the meantime [20:58:08] i think we should change group ownership of /wmf/raw files [20:58:12] but i'm not sure to what [20:58:15] something not managed in ldap for now [20:58:20] but I don't want to manage groups manually [20:58:25] group hdfs? [20:58:39] naw too super, we could create a new group [20:58:45] ok [20:58:47] the stats user has user group 'stats' [20:58:51] we could do that [20:58:58] and add ourselves to stats group on namenode [20:59:16] i should/will ask ops to see what they think eh? 
[20:59:22] yes [20:59:25] :) [20:59:44] ottomata: limn has a file called /Deb.md on the wikimedia develop branch [20:59:51] following those instructions, you can get a .deb out of it [20:59:54] oo, k [21:00:07] we can do that tomorrow - I'm out for the day, Valentine's plans :) [21:01:28] average_drifter: sorry for the lapse, I was chatting with amit [21:01:48] erosen: no problem [21:02:10] stats:stats sounds good to me, too ottomata [21:02:12] so I do do that filtering you were talking about [21:02:20] i'll figure out a way to represent that as well [21:02:53] but i did just get an update from Amit, which suggests that he is okay with starting from scratch [21:03:03] he just wants to know that the numbers are pretty good [21:03:52] average_drifter: so i'm thinking that we should join forces with Diederik and write some nice pig scripts to do this correctly [21:04:35] erosen: I think that sounds great [21:04:38] erosen [21:04:43] you talking mobile numbers in kraken? [21:04:49] ya [21:04:55] i got reliability numbers for you [21:04:57] ottomata: already exists? [21:05:00] percent loss per hour [21:05:06] ? 
[21:05:12] yes, i'm working on standardizing there stuff for now [21:05:17] that* [21:05:19] but [21:05:31] it has been running on all data since feb 1 and outputting in my user dir [21:05:33] uhh [21:05:34] check out [21:06:01] /user/otto/webrequest_loss_by_hour.tsv [21:06:20] (I actually just stopped this job to work on computing these over again and saving in /wmf/public) [21:06:41] (or is this not at all what you are talking about) [21:06:59] i'm thinking we are talking about different things [21:07:08] we need mobile page views by country by project [21:07:20] ok, i'm talking about giving you numbers of percent log loss per hour, so you can be sure of how accurate your data is [21:07:28] gotcha [21:07:43] well we are also thinking of using the old log files [21:07:59] aye ok [21:08:05] no idea about those :p, i guess pretty good [21:08:13] unless nagios reports packet loss [21:10:14] drdee: can i chat with you in person about mobile country report stuff on kraken/ [21:12:37] erosen: he's in with kraig [21:12:42] just saw [21:27:17] erosen: so uhm [21:27:25] sum? [21:27:29] :D [21:27:30] sup* [21:27:46] 22:47 < average_drifter> erosen: I think I forgot to mention in the flowchart that I'm checking for http://(wikiproject1|wikiproject2|...).m.wikipedia.org.* [21:27:49] 22:47 < average_drifter> erosen: do you do that filtering too [21:27:52] 22:47 < average_drifter> ? [21:27:54] oh yeah [21:27:54] erosen: ^^ [21:28:08] average_drifter: i'm doing a check like that as well [21:28:45] average_drifter: actually I take that back. 
I wasn't doing the check, i was just parsing that part into a canonical string [21:28:53] i'll update the code and repo to only consider wikipedia lines [21:29:23] erosen: I'll tell you how I do it [21:29:25] erosen: https://github.com/wikimedia/fast-field-parser-xs/blob/master/PageViews-FieldParser/Parser.xs#L387 [21:29:33] erosen: I throw in a hash stuff like [21:29:48] erosen: http://en.m.wikipedia.org , https://en.m.wikipedia.org [21:30:00] erosen: http://ja.m.wikipedia.org , https://ja.m.wikipedia.org [21:30:01] average_drifter: just to be clear, all I am going to add is "lambda r : r['project'] == 'wikipedia'" [21:30:04] the parsing is already done [21:30:51] erosen: ok [21:31:09] but i think this would be a good thing to compare for example [21:31:43] erosen: I'm doing a run right now, after that's done, let's get 10 days (you pick a range, I'm fine with any range you pick), and then I run on that range and we can compare [21:32:12] sounds good [21:32:19] what are you thinking as for an output format? [21:32:29] just number of lines? [21:32:39] or number of requests per day? [21:32:47] erosen: so for example we can have a text table like this [21:32:50] or sliced by other features [21:34:02] erosen: so for example we can first limit ourselves to the range 1-10 december 2012 [21:34:10] erosen: and we can both output the following table [21:34:27] erosen: wikiproject, count [21:34:29] in that period [21:34:46] by wikiproject I mean "en", "ja", "de", "nl", etc [21:35:32] erosen: and after that we can compare. if I get a lower/higher count, we then discuss our definitions again [21:35:41] erosen: and we find the best definition [21:36:15] average_drifter: have you been able to send the table? [21:36:19] the best definition may be yours, or mine, or a mix thereof [21:36:29] erosen: to send the table ? 
[21:36:32] which table [21:37:08] average_drifter: my bad, i missed this: average_drifter> [21:37:09] erosen: wikiproject, count [21:37:19] yea [21:38:47] average_drifter: updated flow chart and criteria [21:38:48] https://github.com/embr/metrics/tree/master/pageviews/embr_py [21:40:05] average_drifter: how about the schema: date, wikiproject, count [21:40:19] do you have the ability to pull out the request date without too much trouble? [21:43:29] average_drifter: can we use this range instead? /a/squid/archive/sampled/sampled-1000.log-2012120{1..9}.gz [21:43:40] just to make it easier to use bash completion [21:44:06] or bash iteration/substitution [21:46:06] erosen: I updated also [21:46:14] erosen: let me link you up [21:46:52] erosen: https://raw.github.com/wikimedia/fast-field-parser-xs/master/img/pageview_definition.png [21:47:04] erosen: we can use that range yes [21:47:43] drdee: new report with action=opensearch disarded is almost done [21:48:01] *discarded [22:05:18] erosen: the sh compliant way to do that is $(seq 1 9), i think [22:05:29] not that it matters :) [22:05:37] /a/squid/archive/sampled/sampled-1000.log-2012120$(seq 1 9).gz [22:06:16] interesting [22:06:23] you mean it doesn't rely on bash? [22:06:56] yeah [22:07:11] I don't think {1..9} works in /bin/sh or zsh [22:07:17] or csh [22:07:26] ya [22:07:43] i know all this stuff because i don't use bash :) [22:07:51] so i'm constantly patching other people's shit to not break my shell [22:07:55] {seq -s, 1 9} ? [22:08:05] do you need commas? 
[22:08:13] oh, sorry [22:08:15] you're right [22:08:22] I just read what you wrote above [22:08:26] :) [22:08:34] and you can't use {} [22:08:38] it has to be $() [22:08:40] (subshell) [22:08:50] seq is an executable [22:09:44] i like that decision tree, average_drifter [22:09:49] it reminds me of http://www.asciiflow.com/ [22:09:57] which i can never find an excuse to use :) [22:09:58] but i love it [22:11:44] dschoon: <3 for Valentine's day [22:11:49] hehe [22:12:03] decision tree love [22:12:20] uhm well I tried magicdraw repeatedly, then dia, then umbrello, then some other stuff [22:12:28] I ended up sticking to graphviz ... [22:12:45] asciiflow is cool too [22:13:05] it's nice when you just want to paste the diagram into an email [22:13:12] less good for IRC :( [22:14:02] 24" display here, big ascii diagrams, no problem [22:15:10] dschoon: I actually had ascii charts for a presentation a year ago, and they told me "What is that ?! You need to convert that to Visio because that's what we use here" [22:15:20] heh [22:15:23] so sad [22:15:48] drdee: http://garage-coding.com/_wiki/new_pageview_mobile_reports/r29-api-requests-with-opensearch-discarded/pageviews.html [22:16:29] drdee: they're almost 200M lower, but the bump between november and december is still present [22:16:43] yes [22:16:49] but we are going in the right direction [22:17:28] ok [22:17:48] drdee: erosen made a diagram of the logic he's using, I made a diagram of the logic I'm using [22:17:50] can you give me another random sample of mime type '-' using the current filter logic [22:17:56] that's great!
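[editor's note] One caveat on the `$(seq 1 9)` one-liner above: unlike brace expansion, a command substitution pasted into the middle of a word does not replicate the surrounding prefix and suffix, so `sampled-1000.log-2012120$(seq 1 9).gz` would not expand to nine filenames. The portable sh equivalent of the `{1..9}` brace range is a loop; a sketch using the path quoted above:

```shell
#!/bin/sh
# Brace ranges like {1..9} are not POSIX, and $(seq 1 9) glued to a word
# word-splits in the middle rather than distributing the prefix/suffix.
# A plain for loop works in any POSIX shell.
for i in $(seq 1 9); do
    echo "/a/squid/archive/sampled/sampled-1000.log-2012120${i}.gz"
done
```

In practice the loop body would pass each filename to the parser instead of `echo`.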
[22:17:58] erosen <== https://raw.github.com/embr/metrics/master/pageviews/embr_py/Pageview_definition.png [22:18:05] <== https://raw.github.com/wikimedia/fast-field-parser-xs/master/img/pageview_definition.png [22:19:01] drdee: in ~10m I'll give you some new samples for mimetype [22:19:08] ok [22:19:11] ty [22:19:19] np [22:21:25] when all this is said and done, could https://www.mediawiki.org/wiki/Analytics/Metric_definitions#Page_views be updated? [22:22:18] HaeB: yes good point [22:55:12] preilly: do you mind transferring github.com/embr/metrics to the wikimedia account and giving the analytics team ownership? [22:56:02] erosen: https://github.com/wikimedia/metrics done! [22:56:07] yay [22:56:07] thanks [22:56:12] erosen: np [23:01:31] drdee, average_drifter: metrics repo is under wikimedia [23:01:40] average_drifter: does that give you the go-ahead to commit code? [23:06:13] i think so [23:06:28] just try and see if it works [23:06:34] agree [23:06:49] drdee: what do you mean by works? [23:07:06] drdee: dvanliere is now a member of the Owners Team on GitHub [23:07:09] whether it gives you the go-ahead [23:07:17] awesome preilly! ty [23:08:14] drdee: np [23:11:37] geohacker: i've got the city data for you when you're ready [23:12:40] drdee: i'm about to share the geocoded fraction of edits from each city for each language [23:13:19] i've looked over the data and it seems to fit our plan of not showing any city with less than 10% of that country's edits [23:16:28] ok [23:16:52] drdee: i just mention it in case we should take any "data release" precautions [23:17:08] check with philippe to be real sure [23:17:13] it is all just numeric data, so in general it would take some work to abuse it [23:17:19] k, not sure he is around [23:17:25] do you know his IRC handle? [23:26:15] dschoon: python packaging q, when you get a sec [23:26:22] sure.
[23:26:26] i need to head downstairs now [23:26:29] so i'll swing by [23:26:34] dschoon: do you know how to expose a method inside a file to the top-level package namespace [23:26:37] cool [23:29:52] erosen: exports.methodName = ... ? [23:31:03] oh but you're saying top-level package namespace so I guess not.. [23:31:14] but top-level package namespace sounds like a global to me.. I may be wrong.. [23:33:43] just needed to import the files into the __init__.py namespace [23:33:50] (and optionally put them in __all__) [23:34:02] that namespace *is* __init__.py for the package root [23:35:54] oh for python [23:36:39] erosen: ok so in wikimedia/metrics I will write an implementation of the mobile pageviews from scratch? [23:36:51] no [23:37:05] just document what you are doing in perl in a flow diagram [23:37:10] ok [23:38:26] drdee: i think a working code snippet would be important though [23:38:59] i think it's better to first document, then jointly write canonical code [23:42:33] erosen: I can fall back to the logic in your diagram, and then incrementally add my filters until I get the 500M bump [23:42:52] I think that way I'll be able to find out where the bump comes from [23:43:13] that seems like a good idea [23:51:47] milimetric: i'm finishing the changes to limnpy for the new data format and I am wondering if you have a preference between json and yaml for the datasources [23:51:55] dschoon: thoughts? ^^ [23:52:10] in meeting [23:52:27] k
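[editor's note] The `__init__.py` re-export pattern erosen settles on above can be sketched as follows. The package and function names (`metrics_demo`, `count_views`) are hypothetical, and the files are written to a temp directory purely so the example is self-contained and runnable:

```python
import sys
import tempfile
import textwrap
from pathlib import Path

# Build a throwaway package on disk to demonstrate the re-export pattern.
pkg_root = Path(tempfile.mkdtemp())
pkg = pkg_root / "metrics_demo"
pkg.mkdir()

# metrics_demo/pageviews.py -- a submodule defining the function.
(pkg / "pageviews.py").write_text(textwrap.dedent("""
    def count_views(records):
        return len(records)
"""))

# metrics_demo/__init__.py -- importing the name here lifts it into the
# package's top-level namespace; __all__ controls `from metrics_demo import *`.
(pkg / "__init__.py").write_text(textwrap.dedent("""
    from .pageviews import count_views
    __all__ = ['count_views']
"""))

sys.path.insert(0, str(pkg_root))
import metrics_demo

# Callers can now skip the submodule path (metrics_demo.pageviews.count_views).
total = metrics_demo.count_views([1, 2, 3])
```

In a real checkout the `from .pageviews import count_views` line would simply live in the package's committed `__init__.py`; the temp-directory scaffolding is only for the demo.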