[09:39:36] Analytics-EventLogging, Analytics-Kanban: Troubleshoot EL performance problems on 2015-05-06 and backfill missing data - https://phabricator.wikimedia.org/T98588#1271960 (mforns) NEW a:mforns [09:58:06] Analytics-Tech-community-metrics, ECT-May-2015: Maniphest backend for Metrics Grimoire - https://phabricator.wikimedia.org/T96238#1271975 (sduenas) >>! In T96238#1269748, @chasemp wrote: >>>! In T96238#1269704, @sduenas wrote: >> We have developed the first version of Maniphest backend. This means we're n... [11:22:57] Analytics-Kanban, operations: Event Logging data is not showing up in Graphite anymore since last week - https://phabricator.wikimedia.org/T98380#1272092 (fgiunchedi) @milimetric no problem -- I'll give a bit more context :) we've switched statsd implementation from txstatsd to statsite for performance/eff... [11:30:54] goood morning [11:31:12] Morning Ironholds [11:31:22] hey joal :). How goes? [11:31:29] Not bad :0 [11:31:40] Man, I continuously miss my smileys :) [11:31:43] hahah [11:31:50] Will stop to type them ! [11:31:56] you ? [11:33:37] pretty good! I couldn't sleep so I built a data visualisation platform for search instead [11:33:45] still all prototypey and shit but https://upload.wikimedia.org/wikipedia/commons/0/06/Dashboard_example.png [11:35:25] Nice ! [11:35:49] What front end tools do you use ? [11:36:46] bootstrap + AngularJS ? [11:36:50] Ironholds: --^ [11:36:57] I really looks good :) [11:37:05] joal, thank you! [11:37:06] s/I/It [11:37:16] want the crazy thing? [11:37:21] tell me [11:37:21] that's not JavaScript [11:37:24] Cool [11:37:28] Pure html ? [11:37:43] nope [11:37:44] ;) [11:37:48] huhu [11:37:53] joal, what is the first rule of Oliver? [11:38:00] c++ :) [11:38:03] "If it can be written in R, Oliver will write it in R" [11:38:10] "If it cannot be written in R, Oliver will still write it in R" [11:38:12] https://github.com/Ironholds/search-visualisations tadaaaa [11:38:28] Nice :) [11:38:51] The other question I have is: where do you get the data from ;) [11:39:29] at the moment it's a manually gathered dataset; it'll be EventLogging aggregates when I'm done :) [11:39:40] ok [11:39:45] Based on sampled data then [11:39:52] makes sense :) [11:40:04] maybe? I'm not sure; there are so many events it's plausibly unsampled [11:40:08] (2m in <1 month) [11:40:17] hmm [11:40:24] but I will ask Deskana when he wakes up :) [11:40:25] If it's eventLogging, I think it's sampled [11:40:35] But I might be wrong :) [11:40:46] eventually it'll be the same as limn in collection terms (script grabs it from log.schema_name, sticks it in /public-datasets/, rsyncs across) [11:41:57] I really do hope that at some point we'll manage to get data out in an easier way ... [11:42:29] agreed [11:42:35] As milimetric said yesterday: being both infrastructure and having a product to build is sometimes overwhelming :) [11:42:36] like, you know what I'd love? [11:42:47] tell me [11:42:51] if we could rsync from labs instances to $directory, somehow [11:43:03] instead of $directory to datasets.wikimedia.org/$directory [11:43:10] and then a manual call to stream it over the internet, and... [11:43:18] because that's the hard part at the moment [11:43:32] (in my wildest dreams I would like eventlogging replicated to a closed platform on labs, but like that'll happen ;p) [11:43:37] my view would be to provide data api style [11:43:45] You request an url, get CSV back [11:43:51] Would love that [11:43:54] TSV!
But otherwise agreed [11:44:00] huhuhu :) [11:44:23] arf, anyway, not ready yet [11:44:40] Thanks for building cool visualization :) [11:44:52] I am a backend guy, but love when data is beautiful [11:47:01] ditto, actually [11:47:11] I normally get my kicks making things fast [11:47:43] but I got tasked with a visualisation platform and the alternatives are more painful to stand up [11:47:53] this shouldn't be understood as a victory of brilliance so much as a victory of sheer laziness [11:49:10] Even if I understand the thing, I think getting things done, one way or another, is always good [11:49:45] Maybe not perfect, or as good as it could have been, or even as good as we would have liked it to be, but good nonetheless :) [11:49:53] non-action I can't stand :) [11:50:07] Ironholds: will be back, got to lunch :) [11:50:54] take care! [11:56:22] (CR) Mforns: "This was a tricky change :] Congrats" (2 comments) [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/206346 (https://phabricator.wikimedia.org/T78339) (owner: Madhuvishy) [13:35:00] Analytics-Kanban, Patch-For-Review: Split up language reportcard queries, data files, and graphs - https://phabricator.wikimedia.org/T98532#1272268 (Milimetric) Open>Resolved Looks like the graphs are both updating again: http://language-reportcard.wmflabs.org/ The data between April 30 to May 8 is... [13:38:17] Analytics, Mobile-Web, Mobile-Web-Sprint-46-Taken:-The-Dan-Garry-Story: Instrument “tags” and anonymous gather pages to track engagement with browse. - https://phabricator.wikimedia.org/T94744#1171644 (phuedx) [13:38:27] Analytics, Mobile-Web, Mobile-Web-Sprint-46-Taken:-The-Dan-Garry-Story: Instrument “tags” and anonymous gather pages to track engagement with browse. - https://phabricator.wikimedia.org/T94744#1272280 (phuedx) Open>Resolved [14:04:32] Morning ottomata [14:05:07] little feedback on yesterday's deploy: I forgot an hour of refine when restarting jobs :( [14:05:50] backfilled it this morning, back to normal (still computing legacy tsvs though) [14:13:14] ottomata: I'd also like to discuss a bit on the code review for hadoop memory :) [14:13:26] when you have time, obviously [14:15:34] mooorning [14:15:41] i saw that! :) in the report. but ja [14:15:55] but then checked this morning and saw that you fixed it :) [14:16:00] Yup [14:16:06] Sorry about that :) [14:16:20] oh np, i do that all the time :) [14:16:35] umm, joal, ja, i don't plan on applying that hadoop stuff for a while [14:16:40] Yeah, but I prefer solving problems to causing them [14:16:40] since today is friday, and i'm not working next week really [14:16:43] until the end of the week [14:17:12] Yeah, no problem, it's more about how these things work together, to make sure we are on the same page [14:39:46] joal: ja let's talk after standup, ja? [14:48:20] sounds good ottomata [15:01:04] Analytics-Tech-community-metrics, ECT-May-2015: Maniphest backend for Metrics Grimoire - https://phabricator.wikimedia.org/T96238#1272418 (chasemp) >> >> can you tip us off when this process begins so we can be on the lookout? > > @chasemp, we already started. If you find any problem, please let me kno... [15:07:44] yikes, we are low on worker nodes right now [15:07:47] so much data! [15:08:09] wow [15:08:10] the balancer stopped running when we restarted cluster or something [15:08:13] Didn't notice ..
[15:08:15] i just started it manually [15:08:17] me neither [15:08:19] 5 are out [15:08:19] k thanks [15:08:33] i have it configured to launch once a week, but i think maybe we need to run it once a day or something [15:08:45] still though [15:08:48] cluster is kinda full [15:08:52] might need to prune more data somewhere [15:08:56] maybe keep less raw [15:12:12] yeah, we could [15:12:32] Maybe we can also ask people to clean up their folders a bit [15:12:44] 2 are fairly big [15:13:41] oh yes we should [15:13:42] how big? [15:13:47] and then it's raw and refined webrequests [15:15:53] balancer works well: already 1 node back in the game [15:39:06] Analytics, Ops-Access-Requests, operations: Access to stat1003 for jdouglas - https://phabricator.wikimedia.org/T98209#1272507 (coren) @tfinc, I need manager approval language, if you please. [16:01:22] mforns, oh, btw [16:01:28] Ironholds, yep [16:01:28] that was the best timed EventLogging outage ever [16:01:49] I'm building a dashboarding system for the search and discovery team and I needed something to demonstrate, hey, we can easily and conveniently list outages that impact data quality [16:01:49] Ironholds, what do you mean? [16:02:01] Ironholds, oh! I see [16:02:04] and just as I was trying to think of an example, there was an actual EL outage and accompanying report :D [16:02:29] (https://upload.wikimedia.org/wikipedia/commons/0/06/Dashboard_example.png is a screenshot of the prototype. Preeeeetty) [16:02:29] Ironholds, I'm glad, hehehe [16:03:50] cool [16:11:20] joal talk now? [16:11:27] actually, hm, gimme 2 mins [16:12:18] in batcave [16:15:19] joal you got 20 mins! then i have to go feed the parking meter [16:15:20] :) [16:40:22] hmm [16:40:32] Sorry ottomata, got past the timing :( [16:40:37] After parkmeter ? [17:01:22] joal: ja few mins [17:01:30] np [17:07:46] ok joal, to the batcave! [17:09:53] go ! [17:24:35] Analytics-Engineering: Hive error accessing block - https://phabricator.wikimedia.org/T98622#1272704 (Ironholds) NEW [17:24:40] ottomata, https://phabricator.wikimedia.org/T98622 a fun bug for yinz :D [17:32:03] joal: I'm around. Ping me when you are free :) [17:32:33] milimetric: I added WMF-Last-Access to https://wikitech.wikimedia.org/wiki/X-Analytics#Keys. Not sure what team and contact person should be [17:33:16] madhuvishy: you can put Analytics for team and me or you for contact [17:33:31] milimetric: cool [17:36:14] Ironholds: when did you get that? [17:40:46] Ironholds: try your query again, we had a few datanodes offline a couple of hours ago, i think it will work now though [17:52:10] hey madhuvishy [17:52:19] joal: Hi [17:52:58] wanna talk ? [17:54:14] (CR) Madhuvishy: Fixing encoding on json responses at the encoder level (1 comment) [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/203505 (https://phabricator.wikimedia.org/T93023) (owner: Nuria) [17:54:31] joal: sure [17:58:59] Analytics-Engineering, Wikimedia-Logstash: Zookeeper logging to Logstash - https://phabricator.wikimedia.org/T84908#1272834 (Gage) p:Triage>Low [17:59:22] Analytics-Engineering, Wikimedia-Logstash: Kafka logging to Logstash - https://phabricator.wikimedia.org/T84907#1272836 (Gage) p:Triage>Low [18:24:14] milimetric: yt? [18:24:21] hey [18:24:32] you wanted to play with spark streaming? for maybe el stuff?
[18:25:20] well, maybe, but right now I'm trying to finish other stuff [18:25:37] sure [18:25:44] not now, but i'll be out most of next week [18:25:48] spark streaming seems pretty straightforward, except how it was dying after a few minutes when we tried it [18:25:52] and i got a standalone streaming cluster [18:25:55] using 1.3 [18:25:59] in prod? [18:26:00] 1.3 is supposed to be better ? [18:26:14] ah, cool, how do we use it? [18:26:18] ja, 3 nodes, but 24 cores each and ~192 G of mem each [18:26:19] all for spark : ) [18:26:29] these are the old ciscos that no one wants to use in prod [18:26:32] cause they flake out sometimes [18:26:37] but they are beefy [18:26:37] right, i remember those things [18:26:45] from stat1002, do [18:26:50] spark-shell --master spark://analytics1003.eqiad.wmnet:7077 [18:27:05] and whatever else you need. [18:27:12] i noticed that the kafka streaming stuff isn't included by cdh [18:27:20] so, i dled the dep jars to my home dir on stat1002 [18:27:28] i'm doing [18:27:29] cd /home/otto [18:27:30] spark-shell --master spark://analytics1003.eqiad.wmnet:7077 --jars ./kafka_2.10-0.8.1.1.jar,./spark-streaming-kafka_2.10-1.3.0.jar,./metrics-core-2.2.0.jar [18:28:01] you want to use the direct stream approach, not the receiver approach [18:28:01] https://spark.apache.org/docs/1.3.0/streaming-kafka-integration.html [18:28:27] milimetric: i'm considering seeing if i can fire up an eventlogging consumer in prod somewhere and produce all-events (or maybe just client?) to kafka [18:28:38] is that worthwhile? [18:28:53] thinking [18:29:21] how would you produce them to kafka? Would it take up EL capacity? [18:30:47] i would consume from the zmq stream [18:30:50] but not on eventlog1001 [18:30:52] maybe hafnium [18:30:59] maybe stat1001 (or one of the ciscos?) [18:31:02] sorry [18:31:02] stat1002 [18:32:50] hm, consuming everything seems like it would use up a bunch of resources and we don't have immediate plans to work with the result [18:33:02] hm [18:33:07] If you want to try it out, I'd do a small schema [18:33:12] like Navigation Timing or something [18:33:13] hm [18:33:15] ok [18:33:21] can I consume just a small schema? [18:33:24] no, they are all in the same stream [18:33:49] milimetric: fwiw, we used to do this, maybe a year ago EL -> kafka [18:33:55] nobody ever used it so we stopped [18:34:03] yeah, i remember [18:34:09] but now the volume is higher [18:34:13] and it spikes sometimes like yesterday [18:34:19] uh whoa, load is low on eventlog1001 [18:34:30] i think the problem is mysql no? [18:34:33] not eventlog1001? [18:34:54] mysql's still a problem, we have another patch idea that should make it really really fast though [18:35:09] but yea, right now eventlog1001 is fine [18:35:17] why do the volume spikes cause holes? [18:35:31] right now, inserting is still kinda slow [18:35:42] but we should theoretically be able to go 10-100x faster [18:36:05] aye, ok, so, it sounds like consuming for kafka won't hurt then? [18:36:10] if mysql insertion is the problem [18:36:34] no, i don't think it'll hurt, more just that it won't help too much right now [18:36:46] maybe next week we'll be ready to start playing with it though [18:36:54] right but i won't be around next week :) [18:37:13] right. [18:37:19] that's ok, we'll just clean house and get ready [18:37:20] btw, i'd totally be fine with just consuming a schema, but i can't do that with 0mq, right?
[18:37:26] then when you get back we can all tackle this together [18:37:37] daw, i want you to play with spark streaming while i'm gone! :) [18:37:51] the 0mq stream will give you everything, yea, you could check the schema with grep or something [18:38:06] right, but at that point it wouldn't affect eventlog1001 at all [18:38:12] if i was running the el consumer for this elsewhere [18:38:25] my head's a bit explodey lately, sorry I'm not being any fun here :) [18:38:33] haha, ok [18:39:17] I think we should be able to do most of this without you around, though [18:39:43] the 0mq stream is accessible just to play around with, so I should be able to hook it up to spark? maybe... [18:40:05] you want to use the kafka consumer to play i think [18:40:19] ottomata, volume spikes cause holes because the buffer in the EL consumer gets so big in memory that at some point the system kills it and everything that was inside gets lost [18:40:28] in the consumer ok [18:40:41] but not in 0mq itself (don't know how 0mq works) [18:40:47] you are saying in the python consumer process, yes? [18:40:54] oh! [18:41:05] I thought it was the mysql batch buffers [18:41:14] ottomata, no, I suspect there is also zmq data loss, but it does not create gaps [18:41:42] well, milimetric, if you could insert to mysql faster, you wouldn't have such a large buffer in the python consumer [18:42:03] mforns: would you see a problem if I set up an eventlogging consumer (probably on hafnium?) to produce all-events to kafka? [18:42:34] milimetric, yes, exactly the buffer gets large because the consumer can't insert as fast as it receives events [18:42:53] ottomata, I think this would be totally independent. right? [18:43:09] yes, except i'm not sure how zmq works [18:43:10] yes [18:43:16] not sure if another consumer would cause zmq load problems [18:43:24] ottomata, oh... [18:43:35] i tend to think it won't [18:43:42] there really is very little traffic here [18:43:59] making zmq send a little seems like a drop in the bucket [18:44:22] i guess so, it's only at most 1000 per second [18:44:23] ottomata, I am with you. With this last issue, we learned that the log consumer is able to write 2x-3x more events than normal without problems... [18:44:41] milimetric, yes [18:46:31] mforns: you have sudo on hafnium, yes? [18:46:36] or some kind of privilege? [18:47:18] yes you do! [18:47:19] cool [18:47:27] ottomata, yes I have root [18:47:33] ok [18:47:41] oh, how did you know before me? :] [18:47:45] looked in puppet [18:47:46] :) [18:48:00] good, milimetric does too [18:48:02] ok [18:48:06] on hafnium right now [18:48:11] i am running in a screen as my user [18:48:12] eventlogging-consumer 'tcp://eventlog1001.eqiad.wmnet:8600?socket_id=otto-test' 'kafka://?brokers=analytics1012.eqiad.wmnet,analytics1018.eqiad.wmnet,analytics1021.eqiad.wmnet,analytics1022.eqiad.wmnet&topic=eventlogging-all' [18:49:44] if that causes you any trouble [18:49:47] just kill it [18:49:53] k, cool [18:50:07] ottomata, ok [18:50:17] note to self: otto is running an extra eventlogging-consumer on hafnium [18:50:23] ottomata: so where's the data end up [18:50:44] buffered in kafka, dropped, consumed into hdfs? [18:53:07] just in kafka [18:53:13] if you want it, you consume from kafka [18:53:18] using spark streaming :D [18:53:22] or kafkacat maybe?
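(For reference, a minimal sketch of what consuming that topic from the spark-shell session described above might look like, using the direct stream approach from the 1.3 docs linked earlier. The brokers and the eventlogging-all topic come from otto's commands; the broker port 9092 and the 10-second batch interval are assumptions, not from the log.)

    import kafka.serializer.StringDecoder
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    // 10s batch interval is an arbitrary choice for experimenting.
    val ssc = new StreamingContext(sc, Seconds(10))

    // Brokers from the eventlogging-consumer command above; port 9092 is the
    // Kafka default and is assumed here.
    val kafkaParams = Map[String, String](
      "metadata.broker.list" -> ("analytics1012.eqiad.wmnet:9092," +
        "analytics1018.eqiad.wmnet:9092,analytics1021.eqiad.wmnet:9092," +
        "analytics1022.eqiad.wmnet:9092"))

    // Direct stream (no receiver): each element is a (key, value) pair where
    // the value is the raw event JSON produced by the EL consumer.
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("eventlogging-all"))

    stream.map(_._2).print() // dump a few raw events per batch

    ssc.start()
    ssc.awaitTermination()

The (key, value) pairs matter further down in the log: the message keys carry schema information, so filtering can happen without parsing the JSON values at all.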
[18:53:31] example comin atcha in a gist shortly [18:58:51] (CR) Mforns: Fixing encoding on json responses at the encoder level (1 comment) [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/203505 (https://phabricator.wikimedia.org/T93023) (owner: Nuria) [19:01:13] mforns: also do you have input files that i can use to reproduce the bug? [19:02:32] madhuvishy, the way to repro the bug is to change the name of the wiki_users to something with utf-8 chars [19:02:53] madhuvishy, do you know how to create fake users with vagrant wikimetrics? [19:03:09] mforns - not really [19:03:16] madhuvishy, you can: localhost:5000/demo/create/fake-enwiki-users/10 [19:03:33] and this will create 10 wiki users for enwiki [19:03:39] mforns: ooh cool [19:05:04] madhuvishy, after that, you can change their names to e.g. التصني and try to create a report with Configure Output -> Individual Results checked [19:05:38] madhuvishy, you only need to change one of them really, and then when the report is done, look at the json data [19:06:06] madhuvishy, the utf-8 chars should be displayed as unicode points [19:06:14] mforns: alright [19:06:23] trying that now [19:07:29] mforns: can I ask you some EL schema questions? [19:07:29] ottomata, sure [19:07:30] so, the schemas on meta [19:07:31] like this one [19:07:35] https://meta.wikimedia.org/w/index.php?title=Schema:NavigationTiming&action=edit [19:07:40] in the event json [19:07:47] is actually the schema inside the event: field [19:07:49] is that correct? [19:08:35] ottomata, you mean that the schema that you passed represents the 'event' field in the log json? yes [19:08:48] yes [19:08:53] hm ok [19:09:10] ottomata, wrapping the schema there is the EventCapsule, which is information common to all schemas [19:09:12] is there a way to get the full jsonschema definition of an event? [19:09:39] ottomata, yes, EventCapsule is also a schema, you can view it in Schema:EventCapsule [19:09:55] https://meta.wikimedia.org/wiki/Schema:EventCapsule [19:10:42] ok, so i have to wrap it manually? [19:10:45] ottomata, I'm not sure, but I'd say that part of the EventCapsule is populated by the EL clients, and the other part is populated by varnishncsa? [19:10:48] meta is acting sort of like a schema API, right? [19:11:00] ottomata, yes [19:11:04] is there a way to query it to get a full schema [19:11:04] ? [19:11:07] with the wrapper? [19:11:20] ottomata, mmm I don't know [19:11:26] I guess not [19:11:43] ah hmmmm [19:11:47] eventlogging python i think does it [19:11:50] def get_schema(scid, encapsulate=False): [19:11:52] CAPSULE_SCID = ('EventCapsule', 10981547) [19:12:01] ottomata, aha makes sense [19:12:11] capsule['properties']['event'] = schema [19:12:12] hm [19:12:14] k [19:12:17] hm [19:12:31] xD [19:14:11] and, mforns, how do I know the schema_id [19:14:13] is it in meta? [19:14:29] ottomata, yes below the schema title [19:14:43] ottomata, I also tried to get it through the api, but it does not come... [19:14:47] ah revision [19:14:48] cool [19:15:05] (CR) Madhuvishy: [C: 1] "Tested the change and it works fine. 
(I can't +2 though, no rights :))" [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/203505 (https://phabricator.wikimedia.org/T93023) (owner: Nuria) [19:15:33] Coool [19:15:34] eventlogging.schema.get_schema(('NavigationTiming', 10785754), True) [19:15:39] ottomata, it would be cool if getting the current (last) revision from a schema would give you the revision id, but I could not find it [19:17:57] yeah that would be nice [19:18:49] cool [19:19:37] https://gist.github.com/ottomata/44c35f3c14571fd00fac [19:19:39] got it [19:19:44] madhuvishy, did you try to merge the code of https://gerrit.wikimedia.org/r/#/c/203505/3 ? [19:20:03] mforns: yes [19:20:15] mforns: ummm, i did in my local [19:20:45] and tested with and without [19:21:06] madhuvishy, shouldn't I change what we discussed in the comments before? [19:21:30] mforns: oh sure. i just thought it won't break it. [19:21:40] i can review again after no? [19:21:47] madhuvishy, sure :] [19:22:34] madhuvishy, the usual thing is however, if you still want some changes to be made, you give a -1 [19:22:52] madhuvishy, don't be afraid of -1ing folks :] [19:23:03] mforns: what is a very simple schema that has a decent number of events in it? [19:23:04] mforns: aah. okay. I am not afraid - i feel bad too :P [19:23:23] madhuvishy, I know, me too [19:23:31] ottomata, mmmm [19:23:32] mforns: but yeah if something's wrong - i might. but this was only style suggestions. anyway, cool :) [19:24:07] ottomata, what about: https://meta.wikimedia.org/wiki/Schema:MobileWebUIClickTracking [19:24:20] hmm, not bad i think [19:24:21] lemme see [19:24:45] ottomata, that's the one with most volume now, and it's quite simple [19:26:59] ok cool [19:32:01] (PS4) Mforns: Fixing encoding on json responses at the encoder level [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/203505 (https://phabricator.wikimedia.org/T93023) (owner: Nuria) [19:34:38] joal, yt? [19:40:31] (CR) Madhuvishy: [C: 1] "Tested this and it works fine." [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/203505 (https://phabricator.wikimedia.org/T93023) (owner: Nuria) [19:41:33] madhuvishy, since you cannot merge this change, I will self-merge it with your consent, ok? [19:41:44] mforns: yup, alright [19:41:48] cool [19:50:18] hey mforns, I'm here :0 [19:50:20] :) [19:52:02] ottomata, milimetric, I followed your discussion on EL / spark-streaming -> That's awesome :) [19:56:04] joal, I wanted to ask you if I can do something on EL+Kafka this weekend? [19:56:17] mforns: Weekend really ? [19:56:32] joal, I may be backfilling EL [19:56:40] You sure :) ? [19:56:44] https://github.com/blog/1995-github-jupyter-notebooks-3 btw, has references to ewulczyn’s work :) [19:56:48] halfak: ^ [19:56:49] xD [19:57:19] joal, ok, so I could not do anything today... [19:57:34] Yeah, me almost neither [19:57:36] joal, should we continue monday morning? [19:57:53] For sure :) [19:57:57] ok [19:58:08] And we also have spark streaming with Andrew's setup ;) [19:58:18] :] [19:58:23] haha, uhhh, non production, don't forget! [19:58:28] but, all-events are now in kafka [19:58:32] non production! [19:58:32] Next week I take over EL, so I'll do it anyway :) [19:58:33] :) [19:59:10] I know, but even non-production, some fun can be done :) [20:00:34] milimetric: do you have a minute for me ? [20:00:50] joal: sure! [20:00:54] batcave ? 
omw [20:02:25] (CR) Mforns: [C: 2 V: 2] Fixing encoding on json responses at the encoder level [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/203505 (https://phabricator.wikimedia.org/T93023) (owner: Nuria) [20:04:51] h [20:59:29] joal: still around? [20:59:34] yup [20:59:49] joal: it looks like this code is pretty outdated - https://gerrit.wikimedia.org/r/#/c/182350/3/oozie/mobile-apps/generate_daily_uniques/workflow.xml [20:59:50] oozifying :) [21:00:04] It is for sure [21:00:24] joal: ha ha. the shell action archiving is already in utils i see [21:01:29] I think this function is needed to prevent our customers (some product owners) from having to get to the cluster to find some small data [21:01:42] The idea is to send them by email if it's small enough [21:01:54] joal: ah right [21:02:12] joal: but email will be a separate module right? [21:02:19] correct :) [21:02:32] okay [21:04:08] joal: do you have a sample command of how you'd run the daily mobile-apps uniques job through oozie cli without destroying production data? [21:04:48] not easy [21:05:14] It would mean creating a test hive database, and using it as the base database [21:05:22] not that hard though :) [21:05:33] But I am not sure you need to do that :) [21:05:48] You can test emailing outside of any other job [21:08:59] sigh [21:09:00] https://gist.github.com/ottomata/b440376274637aa63867 [21:09:05] joal, milimetric ^ [21:09:09] NOT what I wanted [21:09:11] i wanted something cooler [21:09:21] i'm not sure why I can't filter on schema! [21:09:31] i keep getting TaskNotSerializeable and ArrayOutOfBounds crap [21:09:36] but, it is time for me to head out. [21:10:01] ottomata: Will try to get it working next week with milimetric ;) [21:10:07] Cool, it's ok, we're gonna rock this next week [21:10:08] thanks for the heads up ! [21:10:11] Thx otto [21:10:25] what we really need [21:10:33] is a way to use those jsonschemas in scala [21:10:40] that would be awesome [21:10:50] probably have to use the java impl. [21:10:53] but that's fine i guess [21:10:58] what would be amazing, is if we could figure this out [21:11:46] this i think [21:11:46] https://github.com/julianpeeters/case-class-generator [21:11:55] can't say I understand that at all [21:11:58] but, if we could do [21:12:04] jsonschema -> case class in scala [21:12:18] then we could use jackson to parse and validate the messages [21:12:21] Not sure that's what we want though [21:12:26] maybe not! [21:12:31] or just validate? not sure. [21:12:41] for parsing/validating --> jackson / json4s [21:12:44] if possible [21:12:46] would be realllly nice to have case classes in scala at run time based on jsonschema [21:12:54] validate first, then parse only if needed ! [21:12:54] i think jackson makes you have a case class [21:13:10] or, at least it is really hard to work with otherwise [21:13:11] validate == parsing [21:13:11] no? [21:13:21] how you gonna validate w/o parsing? [21:13:29] yes, but do we need to convert them into case class ? 
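(On the validate-versus-case-class question just above: a sketch of the no-case-class option. It assumes the capsule-wrapped jsonschema from meta is already in hand as a string — e.g. what eventlogging.schema.get_schema(('NavigationTiming', 10785754), True) returns — and it picks the fge json-schema-validator as the "java impl."; that library choice is an assumption, not something settled in the log. Jackson parses each message once into a generic JsonNode tree and the validator checks that tree against the schema, which matches the "validate == parsing" point: the bytes do get parsed, but nothing forces a case class.)

    import com.fasterxml.jackson.databind.{JsonNode, ObjectMapper}
    import com.github.fge.jsonschema.main.{JsonSchema, JsonSchemaFactory}

    val mapper = new ObjectMapper()

    // schemaJson: the capsule-wrapped schema fetched from meta, as a string.
    // How it gets fetched and cached is out of scope for this sketch.
    def compileSchema(schemaJson: String): JsonSchema =
      JsonSchemaFactory.byDefault().getJsonSchema(mapper.readTree(schemaJson))

    // Parse once to a generic tree (no case class) and validate against the
    // compiled schema; return the tree only if the event conforms.
    def validateEvent(schema: JsonSchema, rawEvent: String): Option[JsonNode] = {
      val node = mapper.readTree(rawEvent)
      if (schema.validate(node).isSuccess) Some(node) else None
    }

Typed extraction (case classes or otherwise) can then happen later, and only for events that pass.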
[21:13:43] i think jackson makes you [21:13:47] or, it is hard to use jackson w/o [21:13:53] would be nice if we could do that automatically [21:14:00] i wouldn't dare try to do it manually [21:14:01] Should be able to [21:14:03] too many schemas [21:14:14] according to this description [21:14:14] https://github.com/julianpeeters/case-class-generator [21:14:16] that will do it [21:14:19] but i do not understand :) [21:14:35] would be similar to https://github.com/julianpeeters/avro-scala-macro-annotations [21:14:40] since avro schemas are defined in JSON anyway [21:14:51] would be so awesome: [21:14:52] @AvroTypeProvider("data/input.avro") [21:14:52] @AvroRecord [21:14:52] case class MyRecord() [21:14:56] but instead [21:15:13] @JsonSchemaTypeProvider("schemas/X.json") [21:15:13] case class X() [21:15:19] dunno if that is possible [21:16:21] or maybe not, dunno [21:16:37] joal: fyi, the kafka messages keys are Schema_Revision [21:16:40] which is awesome! [21:16:50] Yeah, it is :) [21:16:56] Well done dude :) [21:17:02] joal: sorry was away for a minute. yeah i just want to test emails. i think i understand what you're saying. write just the simple email module and send test stuff through it. [21:17:09] so you can filter on keys without parsing any json [21:17:13] IF IT WORKED [21:17:16] :) [21:17:20] you can def print out keys, that is no problem, dunno why i can't filter [21:17:22] madhuvishy: exactly [21:17:25] i must be doing something dumb [21:19:14] ok, laters all! [21:19:19] have a good weekend! [21:20:21] Have a nice bike ride! [21:34:55] (PS10) Milimetric: Make the gulp build layout specific [analytics/dashiki] - https://gerrit.wikimedia.org/r/204951 (https://phabricator.wikimedia.org/T96337) [22:00:17] (PS11) Milimetric: Make the gulp build layout specific [analytics/dashiki] - https://gerrit.wikimedia.org/r/204951 (https://phabricator.wikimedia.org/T96337) [22:00:29] woo! done with that mess [22:00:36] dashiki is kinda spiffy now! [22:00:45] have a nice weekend everyone [23:15:32] Analytics: Update http://reportcard.wmflabs.org/ with March 2015 dump data - https://phabricator.wikimedia.org/T97379#1273763 (Tbayer) [23:33:04] Analytics-Kanban, VisualEditor, Editing Department 2014/15 Q4 blockers: Schema:Edit seems to incorrectly set users as anonymous {lion} - https://phabricator.wikimedia.org/T92596#1273830 (Jdforrester-WMF)
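(A closing note on the filter ottomata could not get working at 21:09/21:17. With the direct stream, filtering on the Kafka message key needs no JSON parsing at all, and a common cause of the "Task not serializable" error in spark-shell is a predicate that captures non-serializable shell state, so this sketch keeps everything the closure needs in plain local values. The exact key string is hypothetical, built from the "Schema_Revision" format described above.)

    // wantedKey is hypothetical: a Schema_Revision key as described at 21:16.
    val wantedKey = "NavigationTiming_10785754"

    // stream is the (key, value) DStream from the createDirectStream call in
    // the earlier sketch. Guard against null keys: messages produced without
    // a key come through as null.
    val navTiming = stream.filter { case (key, _) =>
      key != null && key == wantedKey
    }

    navTiming.count().print() // events per batch for that schema revision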