[06:07:34] (PS1) QChris: Add data for 'Banglalink Bangladesh' and 'Umniah Jordan' [analytics/wp-zero/data] - https://gerrit.wikimedia.org/r/90087
[06:51:32] t
[06:51:42] Sorry wrong window ... :-)
[06:52:27] u
[06:52:30] :P
[09:50:42] @rss- mingle
[09:50:42] Item was removed from db
[10:09:49] (CR) QChris: "(1 comment)" [analytics/geowiki] - https://gerrit.wikimedia.org/r/85625 (owner: QChris)
[10:11:42] (PS2) QChris: Tread map-world_countries as yaml datasource [analytics/geowiki] - https://gerrit.wikimedia.org/r/85614
[10:12:08] (PS2) QChris: Allow to override the default expected date for datafiles [analytics/geowiki] - https://gerrit.wikimedia.org/r/85615
[10:12:26] (PS2) QChris: Expect editor fractions updates every day [analytics/geowiki] - https://gerrit.wikimedia.org/r/85616
[10:12:44] (PS2) QChris: Update monitoring expectations for 2013-09-18 [analytics/geowiki] - https://gerrit.wikimedia.org/r/85617
[10:13:10] (PS2) QChris: Add "look 30-days back" heuristic for small wikis without updates [analytics/geowiki] - https://gerrit.wikimedia.org/r/85618
[10:13:28] (PS2) QChris: Monitor active editor totals [analytics/geowiki] - https://gerrit.wikimedia.org/r/85619
[10:13:56] (PS2) QChris: Update monitoring expectations for 2013-09-21 [analytics/geowiki] - https://gerrit.wikimedia.org/r/85620
[10:14:16] (PS2) QChris: Use log mechanism to signal monitoring passed [analytics/geowiki] - https://gerrit.wikimedia.org/r/85621
[10:14:32] (PS2) QChris: Add quiet mode for monitoring [analytics/geowiki] - https://gerrit.wikimedia.org/r/85622
[10:14:48] (PS2) QChris: Switch default expected date for monitoring to yesterday [analytics/geowiki] - https://gerrit.wikimedia.org/r/85623
[11:36:31] (CR) Milimetric: [C: 2 V: 2] "(1 comment)" [analytics/wp-zero/data] - https://gerrit.wikimedia.org/r/90087 (owner: QChris)
[11:44:35] (CR) QChris: "(1 comment)" [analytics/wp-zero/data] - https://gerrit.wikimedia.org/r/90087 (owner: QChris)
[11:44:48] Thanks milimetric!
[11:44:53] np
[12:30:59] qchris, I'm leaving for the doctor soon
[12:31:06] are there any other reviews I should look at?
[12:31:13] No.
[12:31:17] I might be gone all day, not sure yet
[12:31:19] Enjoy your visit at the doctor :-/
[12:31:27] will do :)
[12:31:50] The one you did was the one that I needed.
[12:31:54] Thanks again!
[12:53:09] morning qchris
[12:53:09] (CR) Ottomata: [C: 2 V: 2] Clarify that format applies to data file when checking data sources [analytics/geowiki] - https://gerrit.wikimedia.org/r/85611 (owner: QChris)
[12:53:13] morrninng
[12:53:17] qchris so many review requests!
[12:53:20] can you try this feed
[12:53:29] http://www.devtacular.com/utilities/atomtorss/?url=https%3a%2f%2fmingle.corp.wikimedia.org%2fprojects%2fanalytics%2ffeeds
[12:53:50] (CR) Ottomata: [C: 2 V: 2] Swap check_datasource parameters [analytics/geowiki] - https://gerrit.wikimedia.org/r/85612 (owner: QChris)
[12:54:13] (CR) Ottomata: [C: 2 V: 2] When checking datasources, make clean which formats are supported [analytics/geowiki] - https://gerrit.wikimedia.org/r/85613 (owner: QChris)
[12:55:39] (CR) Ottomata: [C: 2 V: 2] Tread map-world_countries as yaml datasource [analytics/geowiki] - https://gerrit.wikimedia.org/r/85614 (owner: QChris)
[12:56:16] (CR) Ottomata: [C: 2 V: 2] Allow to override the default expected date for datafiles [analytics/geowiki] - https://gerrit.wikimedia.org/r/85615 (owner: QChris)
[12:56:34] (CR) Ottomata: [C: 2 V: 2] Expect editor fractions updates every day [analytics/geowiki] - https://gerrit.wikimedia.org/r/85616 (owner: QChris)
[12:56:53] (CR) Ottomata: [C: 2 V: 2] Update monitoring expectations for 2013-09-18 [analytics/geowiki] - https://gerrit.wikimedia.org/r/85617 (owner: QChris)
[12:58:08] (CR) Ottomata: [C: 2 V: 2] Add "look 30-days back" heuristic for small wikis without updates [analytics/geowiki] - https://gerrit.wikimedia.org/r/85618 (owner: QChris)
[12:58:42] (CR) Ottomata: [C: 2 V: 2] Monitor active editor totals [analytics/geowiki] - https://gerrit.wikimedia.org/r/85619 (owner: QChris)
[12:59:01] (CR) Ottomata: [C: 2] Update monitoring expectations for 2013-09-21 [analytics/geowiki] - https://gerrit.wikimedia.org/r/85620 (owner: QChris)
[12:59:06] (CR) Ottomata: [V: 1] Update monitoring expectations for 2013-09-21 [analytics/geowiki] - https://gerrit.wikimedia.org/r/85620 (owner: QChris)
[12:59:11] (CR) Ottomata: [V: 2] Update monitoring expectations for 2013-09-21 [analytics/geowiki] - https://gerrit.wikimedia.org/r/85620 (owner: QChris)
[12:59:28] (CR) Ottomata: [C: 2 V: 2] Use log mechanism to signal monitoring passed [analytics/geowiki] - https://gerrit.wikimedia.org/r/85621 (owner: QChris)
[12:59:43] (CR) Ottomata: [C: 2 V: 2] Add quiet mode for monitoring [analytics/geowiki] - https://gerrit.wikimedia.org/r/85622 (owner: QChris)
[12:59:57] (CR) Ottomata: [C: 2 V: 2] Switch default expected date for monitoring to yesterday [analytics/geowiki] - https://gerrit.wikimedia.org/r/85623 (owner: QChris)
[13:03:20] mil metric before you leave
[13:03:25] can you update the wall?
[13:04:51] average, ottomata, qchris: can you guys update the wall as well?
[13:04:52] ty!
[13:54:59] average around?
[13:58:40] qchris, ottomata, can you please update the mingle wall
[13:58:44] https://mingle.corp.wikimedia.org/projects/analytics/cards/grid?aggregate_property%5Bcolumn%5D=estimate&aggregate_type%5Bcolumn%5D=sum&color_by=project+name&filters%5B%5D=%5BType%5D%5Bis%5D%5BFeature%5D&filters%5B%5D=%5BType%5D%5Bis%5D%5BDefect%5D&filters%5B%5D=%5BType%5D%5Bis%5D%5BInfrastructure+Task%5D&filters%5B%5D=%5BRelease+Schedule+-+Release%5D%5Bis%5D%5B%28Current+Release%29%5D&filters%5B%5D=%5BRelease+Schedule+-+Sprint%5D%5Bis%5D%5B%2
[13:58:45] rent+Sprint%29%5D&group_by%5Blane%5D=development+status&group_by%5Brow%5D=class+of+service&lanes=Ready+for+Dev%2CCoding+and+Testing%2CSign-off%2CShipping%2CShowcasing%2CDone&style=grid&tab=WIP+-+Features
[14:00:28] can you give short url?
[14:00:30] i can't link to that
[14:01:15] http://bit.ly/13HvBJq
[14:02:24] drdee I will tell you this since you milimetric is not here
[14:02:31] this automatic hive partitioner thing I've been working on
[14:02:34] is built in in hive 0.11
[14:02:38] I jsut found out
[14:02:41] :)
[14:02:44] https://issues.apache.org/jira/browse/HIVE-3231
[14:02:50] we have 0.10 right now
[14:03:19] yarrghhhhh i just spent lots of time on this! :p
[14:03:21] aaahhhh
[14:03:42] it's cool
[14:03:45] drdee, I cannot remember, did we come up with a decision about the mobile data to udp2log
[14:03:46] ?
[14:04:12] i think we need one canonical stream that contains web request data from all cache servers
[14:04:28] so we need #1074
[14:04:29] drdee_ hopes that someone will have a look at https://mingle.corp.wikimedia.org/projects/analytics/cards/1074
[14:04:31] imho
[14:05:04] yeah, but didn't we have a big discussion about this and come to some kind of official conclusion?
[14:05:14] i remember we talked about it a bunch last week
[14:05:15] i think we had multiple discussions about this
[14:05:18] don't remember the outcome
[14:05:38] the outcome was that people don't realize that we need a canonical stream
[14:06:01] webstatscollector and the general 1:1000 sampled streams rely on that
[14:06:17] particularly webstatscollector needs a live stream
[14:06:29] yes it does but we don't put mobile in webstatscollector right now anyway, right?
[14:06:34] so that part doesn't matter
[14:06:35] we cannot fix that at the end of the month by ima
[14:06:37] NOOO
[14:06:39] we DOO
[14:06:46] i have said this a million times now
[14:06:48] haha
[14:06:51] i know i don't listen well.
[14:06:53] heheh
[14:06:58] we do, just not well?
[14:07:03] it works well
[14:07:09] it just does mobile *project* level counts
[14:07:14] oh right
[14:07:19] so how my page view requests for en.m.wikipedia.org
[14:07:19] because of the way mobile urls look?
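[Editor's note: the "automatic hive partitioner" discussed above is the kind of tool that Hive 0.11's automatic partition discovery (HIVE-3231) makes redundant; on Hive 0.10 the registration still has to be scripted. A minimal sketch of that job, with the table name and HDFS path layout as hypothetical examples, not the team's actual schema:]

```python
# Sketch of what an external Hive partitioner has to do on Hive 0.10:
# scan imported hourly directories and emit ALTER TABLE statements that
# register each one as a partition. Table name and path layout are
# hypothetical illustrations.

def partition_statements(table, dirs):
    """Build one ADD PARTITION statement per imported hourly directory.

    `dirs` maps an HDFS path to its (year, month, day, hour) tuple.
    """
    stmts = []
    for path, (y, m, d, h) in sorted(dirs.items()):
        stmts.append(
            "ALTER TABLE {t} ADD IF NOT EXISTS PARTITION "
            "(year={y}, month={m}, day={d}, hour={h}) "
            "LOCATION '{p}';".format(t=table, y=y, m=m, d=d, h=h, p=path)
        )
    return stmts

if __name__ == "__main__":
    dirs = {"/wmf/data/webrequest/2013/10/02/14": (2013, 10, 2, 14)}
    for stmt in partition_statements("webrequest", dirs):
        print(stmt)
```

[The generated statements would then be fed to the `hive` CLI on a schedule; `IF NOT EXISTS` keeps the job idempotent across reruns.]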
[14:07:29] aye
[14:07:32] s/my/many
[14:07:46] it just does not page view counts for individual mobile articles
[14:07:53] ok, so to move forward on that, we need to summarize the issues for ops i think
[14:08:06] that sounds good
[14:08:14] etherpadding an email draft...
[14:09:00] happy to help, let me know
[14:12:33] writing now http://etherpad.wikimedia.org/p/Kafka-udp2log
[14:12:58] drdee: you there?
[14:23:16] drdee_ mhmm ... so we are allowed to move cards by ourselves now? I am getting confused as of when we must not do this, and when we should do it :-/
[14:24:07] drdee_ My cards are now in the correct columns.
[14:24:10] i feel like i am a broken record player these days
[14:24:32] on showcase days i always ask everybody to update their cards themselves
[14:34:04] qchris: can you / do you want to demo something today?
[14:34:09] average…..here?
[14:34:24] ty qchris for updating the wall
[14:34:47] qchris: I do not think that my cards are demoable.
[14:34:53] drdee_: ^
[14:35:12] mmmmm
[14:35:28] yeah i was thinking that as well
[14:35:35] ottomata can you demo some hive stuff?
[14:35:40] drdee, how's this looking?
[14:35:43] accurate?
[14:35:43] http://etherpad.wikimedia.org/p/Kafka-udp2log
[14:35:48] for example querying of webstatscollector data?
[14:35:51] let me look
[14:36:11] drdee_,I have never queried webstats collector data
[14:36:12] hm
[14:36:17] k
[14:36:19] milimetric said he imported it
[14:36:23] but I don't htink we ahve partitions on it yet
[14:36:25] I could create them
[14:36:29] can you demo something?
[14:36:30] and then we could query it, but I dunno
[14:36:32] hm
[14:36:37] nothing complete
[14:36:46] because milimetric is not around either
[14:36:51] so what are we going to demo?
[14:39:14] ottomata, qchris, average: ^^
[14:39:31] drdee_: Do we need to demo anything?
[14:39:42] well it's called the showcase meeting
[14:39:53] the idea is that we showcase stuff :D
[14:39:58] :-D
[14:40:08] Point taken.
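[Editor's note: the exchange above turns on webstatscollector counting mobile traffic at the *project* level (en.m) while dropping the article title. A sketch of that distinction; the hostname-to-project mapping here is my assumption for illustration, not the production implementation:]

```python
from collections import Counter
from urllib.parse import urlparse

def project_of(url):
    """Derive a webstatscollector-style project code from a request URL.

    en.wikipedia.org -> 'en', en.m.wikipedia.org -> 'en.m'. The exact
    mapping used in production is an assumption here.
    """
    host = urlparse(url).netloc
    parts = host.split(".")
    # Drop the trailing 'wikipedia.org' (or similar) labels; whatever
    # remains is the project code, including a mobile '.m' marker.
    return ".".join(parts[:-2])

def project_counts(urls):
    """Count requests per project, ignoring the article title entirely."""
    return Counter(project_of(u) for u in urls)
```

[So two requests for different en.m articles both land in the single `en.m` bucket, which is why project totals work while per-article mobile counts do not.]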
[14:40:37] I think there is a hive table in dan's homedir
[14:40:58] it doesn't contain a lot of data but oliver and I dumped the schema
[14:44:05] since last night there should be a lot more data
[14:44:09] but it doesn't have partiitons yet
[14:44:12] we can put the partitions on it
[14:44:15] yes but who could demo that? and is demo the ability to query really interesting to our customers?
[14:44:18] but I thought we were only suppposed to demo complete things?
[14:44:31] yes i agree with ottomata
[14:44:40] fine with me
[14:45:00] btw -- on the kafka-upd2log topic
[14:45:18] why can't udp2log users start consuming kafka topics?
[14:45:19] I was told to stop working on that when something important came up
[14:45:23] drdee, can we get more info on the fundraising use of udp2log?
[14:45:27] from Katy?
[14:45:37] sure, let's talk to jeff green
[14:45:42] he runs the udp2log instances for them
[14:45:51] tnegrin: that is one of the solutions, but the data is in a different format
[14:46:06] sure -- there would be work involved
[14:46:11] yeah so
[14:46:21] which solution?
[14:46:31] C and/or D
[14:46:55] I'll make E be more explictily that, and make F be the hybrid
[14:48:15] drdee, let's leave B in, I want to describe all of the options, and why they will or will not work
[14:48:16] I didn't anticipate that I would make any errors
[14:48:33] tnegrin: we could make Kafka consumer processes for each of the consumers
[14:48:33] ok, but maybe rank-order them by feasibility
[14:48:36] no -- we need to have an option where upd2log goes away
[14:48:42] that's sort of a 3 way hybrid solution
[14:48:59] tnegrin: none of these make udp2log go away
[14:49:08] we have to use kafka for all webrequests to do that first
[14:49:19] this is just an intermediate solution
[14:49:21] isn't that the goal?
[14:51:06] but that's the goal, right?
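[Editor's note: the objection above ("the data is in a different format") is that a Kafka-fed pipeline delivers structured messages while existing udp2log consumers expect the flat tab-separated log line, so a per-consumer Kafka process would largely be a format shim. A sketch of that shim; the field names and their ordering are assumptions for illustration, not the production log format:]

```python
import json

# Order of fields in the tab-separated output line; this ordering is an
# assumption for illustration, not the production udp2log format.
FIELDS = ["hostname", "sequence", "dt", "ip", "uri", "user_agent"]

def json_to_udp2log(message):
    """Convert one JSON log message to a udp2log-style TSV line."""
    record = json.loads(message)
    return "\t".join(str(record.get(f, "-")) for f in FIELDS)

def consume(messages, emit):
    """Stand-in for a Kafka consumer loop: reformat and forward each message.

    `messages` stands in for a Kafka topic; `emit` stands in for the pipe
    or socket a legacy udp2log consumer reads from.
    """
    for msg in messages:
        emit(json_to_udp2log(msg))
```

[The point being debated is not whether such a shim is possible but whether running one per consumer is worth maintaining as an intermediate step.]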
[14:51:53] totally
[14:53:31] I feel like we should be specific about this in the document -- that way udp2log users would start thinking about moving off of udp2log completely
[14:54:32] that might be a bit too early
[14:54:50] why?
[14:54:51] How is that possible?
[14:55:00] and we are the main customers of udp2log anyways
[14:55:25] because we haven't' planned the real phasing out of udp2log
[14:55:45] and we need to talk with ops about them phasing out the squid servers
[14:55:54] and replace them with varnish
[14:56:26] drdee, ottomata : hangout?
[14:56:27] because, afaik, there is not a squid kafka producer
[14:56:40] i am working on the slidedeck
[14:56:42] :)
[14:56:46] 5 minutes
[14:56:49] promise
[14:58:00] G: Add udp-output to varnishkafka
[14:58:53] Snaps: that only kinda helps us
[14:59:01] as we'd have to run two instances of varnishkafka, right?
[14:59:39] no, the same instance could output both kafka and udp
[15:00:17] bwwaahHHH!hhhahh?
[15:00:19] that would do it
[15:00:57] that works
[15:02:08] Snaps, wanna join us?
[15:02:13] https://plus.google.com/hangouts/_/calendar/d2lraW1lZGlhLm9yZ19jYjM3bXU0OGNuaHRkN2hybmE4czI3b25hb0Bncm91cC5jYWxlbmRhci5nb29nbGUuY29t.c6j7qidqs491nhi7ovk9pi4h14?authuser=1
[15:02:22] please
[15:02:24] https://plus.google.com/hangouts/_/calendar/d2lraW1lZGlhLm9yZ19jYjM3bXU0OGNuaHRkN2hybmE4czI3b25hb0Bncm91cC5jYWxlbmRhci5nb29nbGUuY29t.c6j7qidqs491nhi7ovk9pi4h14?authuser=1
[15:02:27] ack, no authuser for you
[15:02:35] https://plus.google.com/hangouts/_/calendar/d2lraW1lZGlhLm9yZ19jYjM3bXU0OGNuaHRkN2hybmE4czI3b25hb0Bncm91cC5jYWxlbmRhci5nb29nbGUuY29t.c6j7qidqs491nhi7ovk9pi4h14?authuser=1
[15:02:36] ack!
[15:02:51] https://plus.google.com/hangouts/_/calendar/d2lraW1lZGlhLm9yZ19jYjM3bXU0OGNuaHRkN2hybmE4czI3b25hb0Bncm91cC5jYWxlbmRhci5nb29nbGUuY29t.c6j7qidqs491nhi7ovk9pi4h14
[15:02:53] there.
[15:06:30] ottomata: sorry, stuck with them little people at the moment
[15:06:37] no probs
[15:07:59] but I'll give you an estimate for G: 8 metric hours. It takes some refactoring for multiple formattings and outputs (which will be useful for other purposes as well)
[15:08:36] iiiiiIIIinteresting
[15:08:41] Snaps, I think that would make all of our lives really easy
[15:11:31] Snaps: if we do that, is it time to change the name?
[15:12:01] exactly what I was going to say
[15:13:56] varnishkraken :D
[15:29:09] tnegrin: Hi. I am having problems joining the hangout.
[15:29:19] tnegrin: Which hangout are you in?
[15:29:22] ok -- give me 2 mins
[15:45:56] getting unexpected results for threshold
[15:47:28] IOW it's still buggy, have to check where the problem is
[15:48:58] how buggy?
[15:49:09] can you do a small demo?
[15:49:25] I can do a demo of the bug yes
[15:49:36] it doesn't give the results I would've expected it to give back
[15:49:41] mmmmmm
[15:49:47] then we should not demo it
[15:50:03] true
[15:50:09] I am working on a fix
[15:50:31] I don't expect to have it in time for the demo (deployment would also be required)
[15:51:19] wait !
[15:51:26] I was reading the results wrong
[15:51:29] it works !
[15:51:38] drdee_: ok can demo
[15:51:40] hangout ?
[15:51:43] sure
[15:51:47] batcave?
[15:52:33] yes
[17:00:06] ottomata: I wanted to push my modifications for camus to origin, but noted that our camus fork lives on github.
[17:00:15] ottomata: Would it be ok if I brought it to gerrit?
[17:00:26] s/brought/bring/
[17:00:54] hm
[17:01:16] hm, what do you need to push to camus?
[17:01:33] i thought we were just going to add dependencies to kraken, eh?
[17:01:53] We have to release the jars. otherwise we base our things on kaftka-0.8-SNAPSHOT
[17:02:10] So we mant that to be kafka-0.8-SNAPSHOT-wmf-1
[17:02:18] s/mant/want/
[17:02:29] So we know which part got built against what.
[17:03:00] drdee: you joining us on the hangout?
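[Editor's note: Snaps' "8 metric hours" estimate above is for refactoring varnishkafka so one instance feeds several formatter/output pairs (JSON to Kafka, a tab-separated line over UDP) instead of one hard-wired output. The shape of that fan-out, sketched here with plain lists standing in for the real Kafka producer and UDP socket:]

```python
# Sketch of the "multiple formattings and outputs" refactoring discussed
# above: a single varnishkafka-like instance fans each request log record
# out to several (formatter, sink) pairs. The sinks are lists rather than
# real sockets/producers so the structure is the point, not the I/O.

import json

class LogTee:
    def __init__(self):
        self.outputs = []  # list of (formatter, sink) pairs

    def add_output(self, formatter, sink):
        self.outputs.append((formatter, sink))

    def emit(self, record):
        # Each sink receives the same record in its own format.
        for formatter, sink in self.outputs:
            sink.append(formatter(record))

tee = LogTee()
kafka_sink, udp_sink = [], []
tee.add_output(json.dumps, kafka_sink)  # Kafka side: JSON
tee.add_output(lambda r: "\t".join(map(str, r.values())), udp_sink)  # udp2log side: TSV
```

[This is why a single instance suffices: the log is read from varnish shared memory once and formatted per output, rather than running two full varnishkafka processes.]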
[17:03:01] My changes are just 4 tiny patch sets on top of camus-kafka-0.8
[17:04:47] And we want JsonStringMessageDecoder to live under kraken-etl...
[17:05:03] So maintaining our camus fork gets easier.
[17:07:04] oh so it builds with Kafka in nexus?
[17:07:32] hm, yeah i htink you can put it in gerrit then, as long as it is still possible to send pull requests to linked in from github
[17:07:32] I just uploaded the kafka jars to our nexus.
[17:07:38] cool
[17:07:45] which kafka jars?
[17:07:50] from camus or from our build?
[17:07:58] The one that come shipped with camus.
[17:08:01] aye ok
[17:08:16] i think that's fine, we might want to get it directly from Kafka if we are going to put it in a repository
[17:08:20] of course you have camus with kafka
[17:08:20] but it is probably the same thing
[17:08:35] http://kafka.apache.org/downloads.html
[17:08:36] it's kafka-0.8 beta1 IIRC.
[17:08:38] heheh
[17:08:38] yeah
[17:08:40] it is
[17:08:48] they are pretty close to stable
[17:08:50] at least, they keep saying that
[17:08:57] :-)
[17:09:14] Yes, we can bump kafka if we decide to.
[17:09:46] drdee, I wouldn't mention adhoc debugging as a consumer in that email
[17:09:51] we can do that with kafka
[17:10:29] why not?
[17:10:29] That's already fixed it just hasn't taken effect yet
[17:11:53] it isn't relevant
[17:11:56] ops will ask for clarification
[17:12:09] we can consume the stream from kafka more easily than we can from the firehose
[17:13:19] i think the point of that list is to give an overview of current consumers of the mobile stream
[17:13:37] debugging is a current usecase
[17:23:44] but it is the same before and after
[17:23:52] no thought is needed as to what to do for it
[17:26:24] (PS1) QChris: Update Hadoop to cdh4.2.0 [analytics/camus] (wmf-1) - https://gerrit.wikimedia.org/r/90160
[17:26:25] (PS1) QChris: Tie version of child artifacts to version of parent artifact [analytics/camus] (wmf-1) - https://gerrit.wikimedia.org/r/90161
[17:26:26] (PS1) QChris: Switch kafka groupId to org.apache.kafka [analytics/camus] (wmf-1) - https://gerrit.wikimedia.org/r/90162
[17:26:27] (PS1) QChris: Bump SNAPSHOT version to wmf-1 [analytics/camus] (wmf-1) - https://gerrit.wikimedia.org/r/90163
[17:27:33] (PS1) QChris: Add JsonStringMessageDecoder from camus' wikimedia branch [analytics/kraken] - https://gerrit.wikimedia.org/r/90164
[17:27:34] (PS1) QChris: Adapt JsonStringMessageDecoder to kraken conventions [analytics/kraken] - https://gerrit.wikimedia.org/r/90165
[17:27:35] (PS1) QChris: Cleanup unused imports [analytics/kraken] - https://gerrit.wikimedia.org/r/90166
[17:27:36] (PS1) QChris: Add basic test infrastructure [analytics/kraken] - https://gerrit.wikimedia.org/r/90167
[17:27:37] (PS1) QChris: Basic tests for JsonStringMessageDecoderTest [analytics/kraken] - https://gerrit.wikimedia.org/r/90168
[17:39:15] drdee: https://wikitech.wikimedia.org/wiki/Analytics/Kafka_Udp2log
[17:39:22] OHHHH mgoodness :D
[17:39:46] kool -- shouldn't' we express a preference?
[17:39:52] i do in the text
[17:40:00] G?
[17:40:07] yeah, actually, i'll add that more explicitly
[17:40:11] ty
[17:41:36] generally the most preferred option should not be option G
[17:41:44] perhaps you can make it option A?
[17:41:58] or prioritize them in some other way
[17:44:31] tnegrin:
[17:44:32] Analytics would prefer a solution that keeps the udp2log firehose intact. Solution G. seems the most ideal, since it makes Ops and Analytics both the most happy. Solution D. would be satisfactory, but might be difficult to maintain in the longer term. Solution A. is fine with Analytics, but is not ok for Ops.
[17:45:05] yeah that's fine - I'm just saying that people generally stop reading around option C
[17:45:23] (it's always better to prioritize lists, since people have short attention spans)
[17:45:26] oh you mean just put the bette roptions first?
[17:45:35] yes
[17:50:21] FYI guys, I have a sprint planning meeting that conflicts with the showcase today. I'm sorry to miss it. :(
[18:01:51] average showcaes
[18:02:02] https://plus.google.com/hangouts/_/calendar/d2lraW1lZGlhLm9yZ19jYjM3bXU0OGNuaHRkN2hybmE4czI3b25hb0Bncm91cC5jYWxlbmRhci5nb29nbGUuY29t.qji67gekhg91hg0qn0rcbd522s
[18:21:03] (CR) Ottomata: [C: 2 V: 2] Update Hadoop to cdh4.2.0 [analytics/camus] (wmf-1) - https://gerrit.wikimedia.org/r/90160 (owner: QChris)
[18:21:18] (CR) Ottomata: [C: 2 V: 2] Tie version of child artifacts to version of parent artifact [analytics/camus] (wmf-1) - https://gerrit.wikimedia.org/r/90161 (owner: QChris)
[18:21:47] (CR) Ottomata: [C: 2 V: 2] Switch kafka groupId to org.apache.kafka [analytics/camus] (wmf-1) - https://gerrit.wikimedia.org/r/90162 (owner: QChris)
[18:23:03] (CR) Ottomata: "Hm, do we really have to change the version in all of these locations when we want to do a release?" [analytics/camus] (wmf-1) - https://gerrit.wikimedia.org/r/90163 (owner: QChris)
[18:35:30] (CR) Ottomata: [C: 2 V: 2] Add JsonStringMessageDecoder from camus' wikimedia branch [analytics/kraken] - https://gerrit.wikimedia.org/r/90164 (owner: QChris)
[18:35:48] (CR) Ottomata: [C: 2 V: 2] Adapt JsonStringMessageDecoder to kraken conventions [analytics/kraken] - https://gerrit.wikimedia.org/r/90165 (owner: QChris)
[18:36:09] (CR) Ottomata: [C: 2 V: 2] Cleanup unused imports [analytics/kraken] - https://gerrit.wikimedia.org/r/90166 (owner: QChris)
[18:36:53] (CR) Milimetric: [C: 2 V: 2] Whitespace cleanup [analytics/geowiki] - https://gerrit.wikimedia.org/r/85624 (owner: QChris)
[18:37:07] (CR) Ottomata: [C: 2 V: 2] Add basic test infrastructure [analytics/kraken] - https://gerrit.wikimedia.org/r/90167 (owner: QChris)
[18:37:50] (CR) Milimetric: [C: 2 V: 2] Cleanup whitespaces [analytics/geowiki] - https://gerrit.wikimedia.org/r/85627 (owner: QChris)
[18:39:59] (CR) Ottomata: [C: 2 V: 2] "AWESOOOMMME" [analytics/kraken] - https://gerrit.wikimedia.org/r/90168 (owner: QChris)
[18:50:46] (CR) Milimetric: "(1 comment)" [analytics/geowiki] - https://gerrit.wikimedia.org/r/86183 (owner: QChris)
[18:51:04] where is the planning meeting?
[18:52:20] (CR) Ottomata: [C: 2 V: 2] Bump SNAPSHOT version to wmf-1 [analytics/camus] (wmf-1) - https://gerrit.wikimedia.org/r/90163 (owner: QChris)
[18:52:42] (CR) Milimetric: "(1 comment)" [analytics/geowiki] - https://gerrit.wikimedia.org/r/86184 (owner: QChris)
[19:08:21] Hi everyone. I couldn't find it on the FAQ (er let alone anything else, at this time) for Wiki Metrics - but, I haven't been able to upload a Catalan wiki cohort. Wiki Metrics keeps telling me it's part of something not supported. Does this mean other language wiki's aren't or…?
[19:08:51] this is language code : ca
[19:10:24] I guess I'll just file a bug.
[19:45:34] http://lists.wikimedia.org/pipermail/analytics/2013-October/001078.html
[19:45:39] we decided to do a 'quick-and-dirty' approach that we can deliver in
[19:45:44] a single sprint (== 2 weeks).
[19:50:12] DarTar: Do you get a weird error when you visit https://meta.wikimedia.org/wiki/Research:Metrics?
[19:50:46] crap, I do
[19:50:58] I don't get it for any other pages (so far as I can tell)
[19:51:12] LOL when I finally have a little bit of time to hack on meta docs.
[19:51:40] have you reported that?
[19:52:09] Nope. Just hoping to have someone else confirm first. Where is the Right Place(TM) to report?
[19:52:52] #wikimedia-tech (or #wikimedia-staff for a quick check, but two users already confirmed that)
[19:54:02] Received: from localhost ([::1]:8779 helo=sodium.wikimedia.org)
[19:54:02] by sodium.wikimedia.org with esmtp (Exim 4.71)
[19:54:02] (envelope-from )
[19:54:02] id 1VRQBs-00019v-R4; Wed, 02 Oct 2013 17:26:25 +0000
[19:54:02] Received: from mail-lb0-x22c.google.com ([2a00:1450:4010:c04::22c]:34571)
[19:54:03] by sodium.wikimedia.org with esmtp (Exim 4.71)
[19:54:05] (envelope-from ) id 1VRQBp-00019e-4I
[19:54:07] for analytics@lists.wikimedia.org; Wed, 02 Oct 2013 17:26:22 +0000
[19:54:09] Received: by mail-lb0-f172.google.com with SMTP id x18so1054986lbi.3
[19:54:11] for ;
[19:54:13] Wed, 02 Oct 2013 10:26:20 -0700 (PDT)
[19:54:32] drdee, drdee_:^
[19:54:53] DarTar: Thanks. I'll pursue there.
[19:55:05] see, fast response from Reedy ;)
[20:01:18] drdee, tnegrin: getting started with James in Browne
[20:01:26] arriving
[20:01:28] yes -- we're coming sorry
[20:28:39] (CR) Milimetric: "(1 comment)" [analytics/geowiki] - https://gerrit.wikimedia.org/r/86185 (owner: QChris)
[20:28:46] regarding queryable pageview data... the pagecounts dumps are pretty bad quality. lots of "false" hits and weird pages.
[20:29:21] yes because 404's are not filtered out
[20:29:21] :(
[20:30:10] yeah, but even with 404 filtered there's some weird stuff going on
[20:31:24] I use the data to run wikitrends on toolserver and I did a "most visited 2012" summary and there was some very strange pages that got unexplainable views counts
[20:32:12] (CR) Milimetric: [C: 2 V: 2] "(1 comment)" [analytics/geowiki] - https://gerrit.wikimedia.org/r/85625 (owner: QChris)
[20:32:59] can you share a gist?
[20:35:05] most viewed page on dutch wikipedia "hua shan" got 12M views 2012. all of them during 1-2 weeks on july/aug: http://stats.grok.se/nl/201208/Hua%20Shan
[20:35:21] second most visited page got 1M views... over the full year.
[20:35:42] clearly bad data and not organic views.
[20:35:54] (http://toolserver.org/~johang/2012.html)
[20:37:05] there are many examples like it. the data source it not really to be trusted for serious work, sadly.
[20:48:57] yeah johang, I definitely agree
[20:49:06] I've seen some really weird stuff in those dumps
[20:50:05] that's why using the dumps is only a first step because it's available right now and we can get something up and running. I have every intention of steering the project towards using our own, we believe more accurate, pageview definitions
[20:50:06] We spent three months debugging it because we only had one month to build it
[20:50:37] this includes adding mobile, cleaning up weird things like you're mentioning etc. So look out for when we say "and the data's now clean!" and keep us honest :)
[20:55:19] milimetric: cool. it's good pagecounts get attention :)
[20:59:57] I have aggregated pagecounts data in /mnt/user-store/johang/wikitrends4/buckets on toolserver. however I do lots of scrubbing and filtering to keep the storage down. I also do incremental updates to be able to refresh often.
[21:03:21] what language do you use?
[21:03:34] have you seen https://www.mediawiki.org/wiki/Analytics/Hypercube ?
[21:05:36] johang: ^^
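[Editor's note: anomalies like the Hua Shan case above (12M views inside 1-2 weeks, near zero the rest of the year) show up as a short burst dominating a page's total. One simple heuristic wikitrends-style scrubbing could use is comparing a page's peak daily count to its typical level; this sketch and its threshold are my illustration, not johang's actual filtering:]

```python
def is_suspicious(daily_counts, burst_ratio=50):
    """Flag a page whose peak daily count dwarfs its median day.

    Organic popularity tends to be spread out over time; a series like
    the Hua Shan one has a huge peak-to-median ratio. The default ratio
    of 50 is an arbitrary illustration, not a calibrated value.
    """
    counts = sorted(daily_counts)
    median = counts[len(counts) // 2]
    # Guard against pages whose median day is zero views.
    return max(daily_counts) > burst_ratio * max(median, 1)
```

[A filter like this would flag the Hua Shan spike while leaving a steadily popular page alone; real scrubbing would also drop 404s and malformed titles before counting at all.]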