[00:29:55] ottomata: that's very exciting news!
[01:11:13] drdee: http://www.youtube.com/watch?v=H20cKjz-bjw
[04:38:52] (PS1) Milimetric: implements columnar timeseries csv [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/86068
[04:39:01] (CR) Milimetric: [C: 2 V: 2] implements columnar timeseries csv [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/86068 (owner: Milimetric)
[13:03:25] yoyo
[13:03:28] qchris!!!!!!
[13:03:45] what can i do????
[13:04:01] you bash A* research like it is nothing :D
[13:06:04] i was thinking we might be able to totally rethink our approach with geowiki using l-diversity
[13:06:12] i don't think we should just 'apply' it
[13:08:29] drdee, qchris, I may not have time to read that paper, but I agree with the rethinking approach
[13:08:53] as we restructure how we collect and analyze data, we have to have *some* approach for anonymization through aggregation
[13:09:12] yes agree
[13:09:22] and k-anonymity is basically outdated
[13:10:51] I'm optimistic based on the fact that some of our data is anonymous (like the daily active editors total on enwiki). So we just have to isolate what makes that true and figure use it to classify things as "allowed to share with public" and "not allowed to share with public"
[13:11:20] *figure out how to
[13:23:47] qchris: w0 tagging problem has been solved, fix has been merged so we should see a 10% drop in traffic quite soon
[13:23:53] and that's expected :)
[13:23:56] no worries
[13:36:04] drdee: I am not bashing their research. On the contrary. I like their research.
[13:36:16] i know
[13:36:18] drdee: And I fully agree with their conclusions
[13:36:30] Oh ok :-)
[13:36:42] The wp0 fix is deployed already?
[13:36:56] well it has been merged
[13:36:59] Awesome!
[13:37:10] so deployment should happen / has happened
[13:37:15] depending on puppet
[13:37:19] : i was thinking we might be able to totally rethink our approach with geowiki using l-diversity
[13:37:52] batcave in 20 minutes?
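[Editor's note] The l-diversity idea discussed above can be sketched in its simplest form ("distinct" l-diversity: every group of rows sharing the same quasi-identifier values must contain at least l distinct sensitive values). This is a minimal illustration, not the geowiki implementation; the field names (`country`, `project`, `edits_bucket`) are made up for the example:

```python
from collections import defaultdict

def is_l_diverse(records, quasi_keys, sensitive_key, l):
    """Return True if every group of records sharing the same
    quasi-identifier values has at least l distinct sensitive values."""
    groups = defaultdict(set)
    for rec in records:
        qid = tuple(rec[k] for k in quasi_keys)
        groups[qid].add(rec[sensitive_key])
    return all(len(values) >= l for values in groups.values())

# Hypothetical aggregated rows; one (country, project) group
# with three distinct sensitive values.
rows = [
    {"country": "NL", "project": "enwiki", "edits_bucket": "1-4"},
    {"country": "NL", "project": "enwiki", "edits_bucket": "5-99"},
    {"country": "NL", "project": "enwiki", "edits_bucket": "100+"},
]
print(is_l_diverse(rows, ["country", "project"], "edits_bucket", 2))  # True
print(is_l_diverse(rows, ["country", "project"], "edits_bucket", 4))  # False
```

A check like this could classify aggregates as "allowed to share with public" only when they pass for a chosen l.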
[13:37:56] breakfast first!
[13:37:59] Ok. Cool.
[13:53:23] drdee, domas' dataset doesn't have commons pageviews?
[13:53:39] no
[14:13:42] drdee, what's Emw in the Public API google doc?
[14:14:09] a community member
[14:21:46] average in tha house?
[14:26:32] milimetric got a second for some mingle card questions?
[14:26:36] i am in the batcave
[14:26:44] coming
[14:27:50] drdee: I'm here yeah
[14:34:27] qchris check out giggle in action at https://mingle.corp.wikimedia.org/projects/analytics/cards/1166
[14:34:35] big props to milimetric!
[14:34:58] Awesome!
[15:56:00] ottomata: is this still relevant / or already done: https://mingle.corp.wikimedia.org/projects/analytics/cards/717 ?
[16:00:59] was done, no longer relevant
[16:03:30] we don't want icinga alerts?
[16:04:28] cya later guys. https://gerrit.wikimedia.org/r/#/q/project:operations/puppet+status:open+message:statistics.pp,n,z
[16:05:35] sweet mutante
[16:05:56] yay retabbing!
[16:05:57] thank you!
[16:33:19] DarTar: I can't access https://trello.com/b/k5N0ivoM/research-and-data
[16:37:10] Trello? I am looking forward to a 'tingle' :-)
[16:47:31] don't worry that will happen
[16:48:16] i wanted a bungle
[16:51:43] bungle it is
[16:51:57] ottomata: what about the icinga alerts as mentioned in #717
[16:52:00] are those enabled?
[16:52:31] no because we repaved the thing
[16:52:34] doesn't make sense anymore
[16:52:38] need to do that for 0.8 metrics
[16:54:14] aight so we need to update the card :)
[16:54:25] can you make the card kafka 0.8 compatible :D
[16:55:48] guys have a look at this: http://www.slideshare.net/asavory/adaptto-2013-slinging-multichannel-content-the-browsermap-way-device-detection
[16:55:57] it's a great intro to Devicemap
[16:59:14] average: scrum
[16:59:16] ottomata: scrum
[17:44:02] DarTar: I can't access https://trello.com/b/k5N0ivoM/research-and-data
[17:44:14] hmm weird
[17:44:21] are you using your wmf account?
[17:47:14] "Your email address is currently dvanliere@wikimedia.org."
[17:55:48] hm! evan just sent me this:
[17:55:49] https://github.com/spotify/luigi
[18:00:29] drdee_: Is story grooming in the batcave?
[18:00:38] no
[18:00:39] https://plus.google.com/hangouts/_/4c94605e6b89f0c413a17cf4d3e7e00a3f0ad979
[18:00:44] Ah sorry.
[18:00:46] Thanks.
[18:01:19] drdee: let me check
[18:01:44] average: grooming https://plus.google.com/hangouts/_/4c94605e6b89f0c413a17cf4d3e7e00a3f0ad979
[18:01:47] ottomata https://plus.google.com/hangouts/_/4c94605e6b89f0c413a17cf4d3e7e00a3f0ad979
[18:02:19] drdee: I invited you to the board, let me know if that works
[19:45:02] qchris: 714 lines!
[19:45:10] Mhmm?
[19:45:33] make_dashboards.py
[19:45:42] but looks like something we could improve
[19:46:00] :-D
[19:46:37] Patches welcome :-)
[19:50:36] bug reports also welcome :D
[19:55:01] qchris: ^^
[19:55:44] :-D
[19:56:22] If need arises, I'll file bugs.
[19:56:39] I have some patches ready that need some cleanup.
[19:56:53] I'll push them afterwards.
[20:44:24] (PS1) QChris: Add .gitreview [analytics/limn] - https://gerrit.wikimedia.org/r/86169
[20:44:25] (PS1) QChris: Add .gitreview [analytics/limn] (develop) - https://gerrit.wikimedia.org/r/86170
[20:59:06] hm, yeah, I'm definitely not OK with limn being in gerrit
[20:59:33] for a variety of reasons, and definitely not before we discuss it as a team
[20:59:56] now there's https://github.com/wikimedia/analytics-limn and https://github.com/wikimedia/limn
[21:00:02] this is bad qchris ^^
[21:00:16] milimetric: This is just what we have for many projects.
[21:00:25] two repos?
[21:00:33] milimetric: yes.
[21:00:47] I'm confused, wikimetrics doesn't have that for instance
[21:01:26] Let me find you some examples.
[21:01:45] I mean specifically that there's wikimedia/limn and wikimedia/analytics-limn
[21:02:08] and that limn has 51 stars and 13 forks while analytics-limn is forever trapped in a world that hates git-flow and all that is nice
[21:02:32] puppet-cdh4 and operations-puppet-cdh4
[21:02:46] kraken and analytics-kraken
[21:03:03] puppet-kafka and operations-puppet-kafka
[21:03:04] ...
[21:03:28] yeah, I really hate those as well
[21:03:32] Stars and forks are nice and shiny, but looking at
[21:03:35] https://github.com/wikimedia/limn/pulse/monthly
[21:03:45] :-(
[21:04:02] milimetric: Do not get me wrong. I am fine if we do not use it.
[21:04:19] milimetric: I just do not want "we have no gerrit repo" to block us.
[21:04:33] it doesn't, we can just use pull requests to do review
[21:04:44] and then it's really public and not just public to the few people that know how gerrit works
[21:04:55] I do not know how github works.
[21:05:22] guys batcave?
[21:05:29] github doesn't try to change git at all. So if you know git, you know github
[21:05:32] I prefer having my wmf-related activity tracked only by wmf and not by a third party company.
[21:05:43] sure, I agree with that
[21:05:54] Booting google machine.
[21:06:02] but I prefer that this not mean we sacrifice how we develop
[21:06:04] eg. git flow
[21:06:27] which I've used a lot in the past on limn
[21:06:43] gerrit supports git workflow + review :-)
[21:06:53] and which we use currently, so that i'm not even sure what migrating it to gerrit did (we work on a master/develop paradigm)
[21:07:25] gerrit doesn't support git basically, like branching properly and long-standing branches, as far as I know (but I'm bad at gerrit)
[21:07:41] I've asked everyone including chad, but they all seemed to agree
[21:07:42] The author of jgit develops gerrit. So if any program supports "git workflow", gerrit does.
[21:07:52] : ^^^^^^^^^^^^^^^^^^
[21:08:02] https://plus.google.com/hangouts/_/007e7c89d52705592fc54d3309c75bac48029a07
[21:38:45] drdee, is it interesting to you
[21:38:50] if I just sampled the udp2log stream
[21:38:56] filtered for a single host (cp1048)
[21:39:08] and looked for missing sequence numbers in about a 60 second period?
[21:39:12] and found 5% loss?
[21:39:16] yes
[21:39:20] this is the full stream, not on a udp2log instance
[21:39:31] that would scare me
[21:39:46] this is what you found?
[21:40:01] (PS1) Milimetric: finished csv output by timeseries [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/86179
[21:40:16] heh qchris, I feel so much cleaner now :)
[21:40:18] remember, these tests are just me playing, not anything systematic
[21:40:22] but yes, that is what I just found
[21:40:28] milimetric: :-)
[21:40:29] mmmmmmm
[21:40:36] that would suck :(
[21:41:03] qchris you can do this, which is neat:
[21:41:04] git config remote.review.push refs/heads/*:refs/for/*
[21:41:07] and then git push review
[21:41:27] :-)
[21:41:42] That looks scary if you're using more than a few branches.
[21:42:35] ok, drdee here is some good news though
[21:42:52] santa is in town?
[21:42:59] i'm pretty sure the reason there is so much data from the varnishkafka test hosts, is because they just have way more traffic
[21:44:22] why is that?
[21:46:12] well, more than mobile hosts
[21:46:30] ahhh too many numbers flying by me right now
[21:46:30] ok
[21:46:38] here is where i am coming from
[21:47:06] an03 is using almost 1.5T of disk space since I turned on production from cp1048 and cp3003 almost 24 hours ago
[21:47:14] sweet
[21:47:21] well, not really, that would be a bit of a worry
[21:47:21] (lord)
[21:47:28] if varnishes produce that much data
[21:47:31] how much does the whole cluster?
[21:47:34] how much does mobile?
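[Editor's note] The loss check described above (sample one host's stream for ~60 seconds, then compare how many sequence numbers arrived against the range they span) can be sketched as follows. This is a toy reconstruction of the idea, not the actual script ottomata ran:

```python
def sequence_loss_pct(seq_numbers):
    """Estimate percentage loss from captured sequence numbers:
    gaps between the lowest and highest number count as drops."""
    if not seq_numbers:
        return 0.0
    seqs = sorted(set(seq_numbers))
    expected = seqs[-1] - seqs[0] + 1  # full range we should have seen
    return 100.0 * (expected - len(seqs)) / expected

# Toy capture: 100 consecutive sequence numbers with 5 missing -> 5% loss.
received = [n for n in range(100) if n not in (7, 23, 42, 66, 91)]
print(sequence_loss_pct(received))  # 5.0
```

Note this simple version ignores out-of-order delivery and counter wrap-around, which a real check against the full stream would need to handle.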
[21:47:48] so, i captured varnishkafka json data from cp1048 for 60 secs
[21:47:51] well an easy solution is to drop all bits.wikimedia.org traffic
[21:47:58] we don't care about that
[21:48:02] and also cp1048 udp2log data for 60 secs
[21:48:34] they are roughly the same size, when you take into account the json field names
[21:48:43] so those correlate
[21:48:54] ahh i have to run in like 2 mins
[21:48:59] k
[21:50:04] ok yeah
[21:50:05] and
[21:50:40] one more question before you go: is the disk size measured on the broker or on HDFS?
[21:50:54] (the broker will have compressed messages, but is just an interim storage I assume)
[21:50:57] on broker
[21:51:00] so it is compressed
[21:51:09] if we have to drop bits.wikimedia.org traffic from varnishkafka then probably we cannot deprecate udp2log, we probably will want to have a sampled stream for bits traffic
[21:51:10] ah i don't have time to crunch these numbers sorry
[21:51:10] but
[21:51:14] here's why i am less worried
[21:51:28] in about the same amount of time
[21:51:36] (PS2) Milimetric: finished csv output by timeseries [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/86179
[21:51:42] all 8 mobile hosts produced about the same amount of (actually less) data than cp1048
[21:51:51] not even concerning varnishkafka
[21:52:08] just looking at udp2log scoops
[21:52:20] drdee: https://gerrit.wikimedia.org/r/#/c/84338/2
[21:52:33] (CR) Milimetric: [C: 2 V: 2] finished csv output by timeseries [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/86179 (owner: Milimetric)
[21:55:41] drdee, the wikimetrics repo has the same problem with getting the latest git version as user-metrics had
[21:55:56] if you or ottomata remember how you fixed that in apache, I'd greatly appreciate it
[21:56:01] 1 sec
[21:56:05] I think that might fix my filename problem as well
[21:57:08] ok, i am out!
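[Editor's note] The capacity question raised above ("how much does the whole cluster?") is a linear extrapolation from the observed ~1.5T of broker disk in ~24 hours from two hosts (cp1048 and cp3003). A back-of-envelope sketch; the total host count here is a made-up number for illustration, and this assumes (unrealistically) equal traffic per host:

```python
def cluster_daily_tb(observed_tb, observed_hosts, total_hosts):
    """Naive linear extrapolation of broker disk usage per day,
    assuming every host produces data at the same rate."""
    return observed_tb / observed_hosts * total_hosts

# ~1.5 TB/day from 2 hosts, as in the log above;
# 100 total hosts is hypothetical.
print(cluster_daily_tb(1.5, 2, 100))  # 75.0 (TB/day)
```

The log itself notes two reasons this overestimates: the broker figure is already compressed interim storage, and mobile hosts produce far less than cp1048.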
[21:57:12] sorry, i gotta run
[21:57:17] i will help tomorrow if i can milimetric
[21:57:18] laaaata
[21:57:21] k, no prob
[21:57:23] later ottomata
[21:57:40] drdee_: what's bits.wmf?
[21:58:29] Snaps_: it's just css / js / static resources
[21:58:37] ah, o
[21:58:38] k
[21:58:54] (PS1) QChris: Remove '100+' and 'all' cohorts from active editors graph [analytics/geowiki] - https://gerrit.wikimedia.org/r/86183
[21:58:55] (PS1) QChris: Add known issues of active editors total to graph description [analytics/geowiki] - https://gerrit.wikimedia.org/r/86184
[21:58:56] (PS1) QChris: Drop callout widget from active editors total [analytics/geowiki] - https://gerrit.wikimedia.org/r/86185
[22:24:01] tnegrin: yt?
[22:24:19] ya -- talking to dan, 5 mins
[22:24:24] k cool
[22:40:04] later everyone, have a good night