[00:00:14] (PS1) Milimetric: Fix gulpfile lint and fonts [analytics/dashiki] - https://gerrit.wikimedia.org/r/216013 [00:01:27] ottomata: hey that's pretty cool [00:01:29] kevinator: https://phabricator.wikimedia.org/T45250 [00:01:43] (CR) Milimetric: [C: 2 V: 2] Fix gulpfile lint and fonts [analytics/dashiki] - https://gerrit.wikimedia.org/r/216013 (owner: Milimetric) [00:02:40] milimetric: not sure it works yet, but i think it will.... [00:02:41] :) [00:03:09] kevinator: https://vital-signs.wmflabs.org/ [00:03:10] :) [00:03:11] yay [00:03:21] (the metrics can now be configured fully from the wiki) [00:03:36] so whenever the new pageviews are up, we need 0 code [00:04:12] ottomata: that's very cool. I'd still want to have two topics, one valid and one invalid, then split up from there into the per-schema topics [00:04:19] but that should be trivial if you make this work [00:05:33] aye [00:21:57] milimetric: i need to produce some test eventlogging data in labs [00:21:59] what do I do?! [00:23:16] Use the server/tests/test_load.py and adjust the hardcoded target endpoint [00:23:45] There's no docs on it but just check out the argparse code, pretty self explanatory [00:24:00] I can help more tomorrow but gotta finish dinner :) [00:24:09] ottomata: ^ [00:24:21] cooOOl [00:24:25] oh this is perfect! [00:24:29] thank you! [00:25:49] oh it uses event.gif? [00:25:54] not directly to eventloggin [00:25:57] hm, thats' ok i can make this happen [00:26:51] milimetric: very cool. I immediately started using it in discussions on what's going on with Mega pageviews [00:27:04] :) yay [00:27:13] It's central notice's fault of course [00:27:32] milimetric: er I mean pageviews on meta not mega [00:28:50] milimetric: https://vital-signs.wmflabs.org/#projects=enwiki,metawiki/metrics=DailyPageviews (webstatscollector) [00:29:21] Heh, yeah [00:29:46] Hey this works ok on mobile, but there seems to be some race condition with the metric now [00:37:59] kevinator: able to point me to the uniques app table or add me to the sheet? was skimming the scala madhuvishy had posted, but hard grokking on a phone [00:41:45] dr0ptp4kt: the tables are... [00:41:50] wmf.mobile_apps_uniques_daily [00:42:20] wmf.mobile_apps_uniques_monthly [00:44:16] kevinator: thanks! [00:45:23] dr0ptp4kt: you should get an account on Hue [00:45:36] it's a web gui for hive [00:46:01] I can show you when you're in the office [00:50:38] kevinator: yeah, been a while since i proxies to it. you around tomorrow? I'm usually just tunneling to the hive cli, but I suppose I could join the html ui crowd ;) [00:51:25] dr0ptp4kt: hue.wikimedia.org [00:51:29] log in using your shell username and your ldap pw [00:51:57] the UI is good for browsing tables, looking at jobs and running simple queries. It sometimes fails to run bigger jobs (which may need more heapspace) [00:53:08] HWHAT!? that's too easy. [00:53:38] alright, thanks gents. I'm gonna wrap up, but good to know I can use that! [00:54:43] peace kevinator and ottomatta [01:05:57] laterrrs [06:56:03] Analytics-Tech-community-metrics: Community Bonding evaluation for "Allowing contributors to update their own details in tech metrics" - https://phabricator.wikimedia.org/T98045#1340384 (NiharikaKohli) @dicortazar, kindly evaluate this task and close/comment as appropriate. [09:46:40] joal|night, yt? 
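(A note on the load tester mentioned above: since server/tests/test_load.py is argparse-based, its own help output is the authoritative list of options. The exact flags and the hardcoded target endpoint vary by checkout, so nothing beyond --help should be assumed here; this is just the starting point, run from the root of an eventlogging checkout on the labs instance.)

    # argparse gives every script -h/--help for free; read that rather than guessing flags
    python server/tests/test_load.py --help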
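(And for anyone else after those app uniques numbers without waiting for a Hue account: a query along these lines can be run from the Hive CLI on stat1002 or pasted into Hue. The table names are the ones kevinator gives above, but the column names below are guesses at the schema, so run DESCRIBE first and adjust.)

    -- Column names are assumptions; check the real schema before trusting this.
    -- DESCRIBE wmf.mobile_apps_uniques_daily;
    SELECT year, month, day, SUM(unique_count) AS daily_app_uniques
    FROM wmf.mobile_apps_uniques_daily
    WHERE year = 2015 AND month = 5
    GROUP BY year, month, day
    ORDER BY year, month, day;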
[10:28:07] (PS1) Mforns: Add app version to user agent map [analytics/refinery/source] - https://gerrit.wikimedia.org/r/216060 (https://phabricator.wikimedia.org/T99932) [10:42:10] (CR) Mforns: [C: -1] "Still WIP!" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/216060 (https://phabricator.wikimedia.org/T99932) (owner: Mforns) [10:53:02] (PS2) Mforns: Add app version to user agent map [analytics/refinery/source] - https://gerrit.wikimedia.org/r/216060 (https://phabricator.wikimedia.org/T99932) [10:53:43] (CR) Mforns: [C: -1] "Tests pass, but still need to test in the cluster." [analytics/refinery/source] - https://gerrit.wikimedia.org/r/216060 (https://phabricator.wikimedia.org/T99932) (owner: Mforns) [11:08:57] (PS3) Mforns: Add app version to user agent map [analytics/refinery/source] - https://gerrit.wikimedia.org/r/216060 (https://phabricator.wikimedia.org/T99932) [11:09:29] (CR) Mforns: [C: -1] "Tests pass, but still need to test in the cluster." [analytics/refinery/source] - https://gerrit.wikimedia.org/r/216060 (https://phabricator.wikimedia.org/T99932) (owner: Mforns) [11:13:24] (CR) Mforns: Add app version to user agent map (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/216060 (https://phabricator.wikimedia.org/T99932) (owner: Mforns) [11:33:23] joal|night: I did a dumb thing :) [11:33:36] is there any way to recover a spark shell session output? [12:35:44] joal|night: it's ok, I started the job again. I looked through the /tmp folder at the directories that spark sets up for me, but all of the old ones were empty [12:36:04] and I tried sending ~~# to the ssh shell to get it back, but that didn't work [12:36:25] yet another good lesson in always using screen, as I should have done [12:41:08] morning [12:44:37] mornin [13:09:04] btw, it's going to drive me crazy that we misspelled referrer in referer_class [13:09:10] joal|night: can we still change that ^ [13:12:37] milimetric, we spelt it "referrer"? Aw :/ [13:13:17] Ironholds: oh wait! no, I'm blind [13:13:24] we're good, spelt it referer [13:13:26] phew! [13:13:28] ..you know [13:13:39] anywhere outside of computing, this would be a bizarre conversation [13:13:49] :) [13:13:53] thank you, IETF spellcheck fails [13:48:40] is there any way of locally testing if an oozie job would work? :/ [14:03:49] Hi lads ! [14:03:56] sorry, late start today [14:04:24] milimetric: unfortunately, I don't think so about session output recovery :( [14:05:03] milimetric: http://en.wikipedia.org/wiki/HTTP_referer [14:06:41] It's ok, i started the job again. And yeah, i was wrong about the referer. English is the worst [14:06:47] :D [14:07:23] hmm, you were actually right about mispelling, but wrong about this particular case [14:07:35] Like the penguins you know, bird that don't fly :) [14:07:55] milimetric: A good w2ay not to loose data out of spark shell is to write them out [14:08:14] Let's batcave if you want, I can help a bit on that if you wish [14:08:44] It's ok, i saw a sample on how to do that and illogically chose to println instead. My fault [14:08:53] huhu [14:08:55] Also, i use screen [14:08:55] ok :) [14:08:59] Like, always [14:09:06] Except this time :) [14:09:12] Like, almost always ;) [14:09:35] dr0ptp4kt: Hellloooooo [14:09:40] dr0ptp4kt: you there ? [14:33:56] joal: he's on the west coast, so he won't be around for a while [14:34:07] ok makes sense [14:34:13] thx milimetric [14:58:13] (CR) Manybubbles: "I see nothing wrong with this but left a bunch of style and general questions. 
The theory seems sound to me though I don't know the analyt" (12 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/215964 (owner: OliverKeyes) [15:01:03] (CR) Manybubbles: "`mvn install` from the root of the project should run _all_ the tests." [analytics/refinery/source] - https://gerrit.wikimedia.org/r/215964 (owner: OliverKeyes) [15:18:07] mforns: Heya :) [15:18:11] joal, hi! [15:18:16] you've been looking after me [15:18:27] joal, yes, but found a workaround... [15:18:34] Ah, ok :) [15:18:36] cool [15:18:53] joal, but today after standup if you have 10 minutes I'd like to comment 2 things with you [15:19:08] Of course sounds great [15:19:22] joal, 1) executing refinery_source tests locally. 2) testing the execution in the cluster [15:19:45] ok [15:20:10] joal, I managed the unit tests by executing from my home folder in stat1002, but in my machine I have mvn problems (unauthorized) [15:20:30] hm, that's weird ! [15:20:53] You might have the same problem Ironholds have ... [15:20:54] oh [15:21:01] not sure [15:21:13] I have a meeting after standup in fact mforns [15:21:19] With kevin [15:21:29] And I know you are right after me :) [15:21:38] Maybe at the end of standup we'll manage :) [15:21:54] oh, no problem joal, what about 19:30h? too late for friday? [15:22:08] ok [15:22:09] nope, late start, late end :) [15:22:17] ok [15:37:36] (CR) OliverKeyes: Add Search-centric UDFs [WIP] (12 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/215964 (owner: OliverKeyes) [15:45:18] Ironholds: i'll get to reviewing that today too :) [15:45:51] joal: what's up? [15:45:57] heya ... [15:46:14] dr0ptp4kt: really bad luck, still in standup :( [15:46:14] ottomata, cool! [15:46:20] the biggest block is "WHY WON'T THESE TESTS RUN" [15:46:26] Will catchup with you befotre I finish my day :) [15:46:48] joal: cool. [15:46:49] (CR) OliverKeyes: "Yes, I'm fully aware of how to run unit tests, I'm just not aware of why they keep /breaking/ on empty strings ;)" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/215964 (owner: OliverKeyes) [15:48:07] (CR) OliverKeyes: Add Search-centric UDFs [WIP] (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/215964 (owner: OliverKeyes) [15:50:29] (CR) Manybubbles: Add Search-centric UDFs [WIP] (3 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/215964 (owner: OliverKeyes) [15:58:27] (PS2) OliverKeyes: Add Search-centric UDFs [WIP] [analytics/refinery/source] - https://gerrit.wikimedia.org/r/215964 [16:03:46] milimetric: playing quickly with Dashiki: looks awesome man :D [16:04:52] (CR) Madhuvishy: "lgtm, will +1 when you've made any changes! Added recommendation to insert phabricator bug number in commit message :)" (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/215628 (owner: Joal) [16:25:33] joal :) glad you like it, I like the architecture of it, I think we executed well on that project. And it has a nice future if we put a little more effort into it [16:33:40] (PS3) OliverKeyes: Add Search-centric UDFs [WIP] [analytics/refinery/source] - https://gerrit.wikimedia.org/r/215964 [16:34:35] mforns: https://edit-analysis.wmflabs.org/compare/ has your stack bars now [16:35:02] milimetric, cool! thanks for adding the config [16:35:12] milimetric, and deploying it! 
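(On the recurring can't-run-the-tests-locally theme in this hour: refinery/source is an ordinary multi-module Maven project, so the usual commands apply. The module names below are the ones that appear in the change paths in this log, refinery-core and refinery-hive; the "unauthorized" error mforns hits locally sounds like a Maven repository access problem on his machine rather than a problem with any of these commands, though that is a guess.)

    # From the root of analytics/refinery/source:
    mvn clean test               # compile everything and run every module's unit tests
    mvn install                  # as Manybubbles says in the review above, this runs all the tests too

    # Just one module's tests, building whatever it depends on first:
    mvn -pl refinery-core -am test
    mvn -pl refinery-hive -am test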
[16:44:43] (CR) Ottomata: Add Search-centric UDFs [WIP] (5 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/215964 (owner: OliverKeyes) [16:46:09] (CR) Ottomata: "You should add tests for the class in refinery core as well. Arguably those are more important that the UDF ones." [analytics/refinery/source] - https://gerrit.wikimedia.org/r/215964 (owner: OliverKeyes) [16:47:13] Ironholds: what's your test problem? [16:47:16] the tests run and fail for me [16:47:33] ottomata, I don't understand why they fail is the test problem :/ [16:47:48] do you see this? [16:47:48] java.lang.IllegalArgumentException: Cannot parse parameters. Did you use , as column separator? An API request that isn't any of the right actions,,false,w/api.php,maxlag=5&format=json&rawcontinue=&meta=userinfo&action=query&uiprop=blockinfo|groups|hasmsg|r [16:50:13] Caused by: java.lang.IllegalArgumentException: Number of parameters inside @Parameters annotation doesn't match the number of test method parameters. [16:50:51] ja, Ironholds, some of those lines only have 4 fields [16:50:55] your tests expect 5 [16:53:12] ottomata, y'mean, they have empty strings in some fields? [16:53:27] okay. How would I represent empty strings or, for that matter, NULLs, in CSVs? [16:53:35] or am I better off writing out full unit tests to handle those kinds of cases? [16:55:08] that i don't know [16:55:12] but, in eithe rcase [16:55:20] An API request that isn't any of the right actions,,false,w/api.php,maxlag=5&format=json&rawcontinue=&meta=userinfo&action=query&uiprop=blockinfo|groups|hasmsg|rights [16:55:22] only has 4 columns [16:55:28] actions [16:55:31] [16:55:34] oh [16:55:35] wait [16:55:41] no i see 5 [16:55:43] apologies [16:55:53] didn't see the one after api.php [16:57:29] something is wrong with arg count though, don't see it... [16:58:39] mforns: I will be 2 minutes late... I'm on my way down to Mushroom Kingdom [16:58:52] kevinator, np, I'll be there [17:03:20] ottomata, hmmn. [17:03:32] ottomata, I may just write them as standalone tests, if that works for you? [17:03:36] a la the referer checking or IP checking [17:04:01] (CR) Manybubbles: Add Search-centric UDFs [WIP] (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/215964 (owner: OliverKeyes) [17:07:54] (CR) OliverKeyes: Add Search-centric UDFs [WIP] (3 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/215964 (owner: OliverKeyes) [17:12:23] (CR) OliverKeyes: Add Search-centric UDFs [WIP] (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/215964 (owner: OliverKeyes) [17:21:17] dr0ptp4kt: Ready ! [17:21:22] (finaly) [17:27:29] joal: ok if i call you on google hangout? [17:27:34] sure ! [17:29:25] dr0ptp4kt: doesn't work ? [17:32:14] joal, do you have 10 minutes, now or later? [17:32:25] depends on dr0ptp4kt :) [17:32:28] I have some time [17:32:47] joal, oh so you're waiting for him for a meeting? [17:32:52] Let's batcave if you wish, I'll try to catch up with dr0ptp4kt later [17:32:58] He's supposed to cal me [17:33:02] mmm, you decide! [17:33:14] ok [17:33:16] Let's batcave, I'll run if needed [17:33:30] omw [17:35:04] joal: did your job finish? 
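(For the record, both exceptions pasted above look like JUnitParams complaining that a parameter row doesn't split into the same number of fields as the test method has arguments, which matches ottomata's count of 4 versus 5. Note too that if the parameters are supplied as annotation strings rather than from a CSV file, JUnitParams also treats | as a separator, which the uiprop=blockinfo|groups|hasmsg values in the failing row would trip over — worth checking. Below is a hypothetical sketch of the pattern, assuming the runner really is JUnitParams; the class, method and data are invented for illustration, and an empty field simply arrives as the empty string, so nulls still need a separate plain test.)

    import static org.junit.Assert.assertEquals;

    import junitparams.JUnitParamsRunner;
    import junitparams.Parameters;
    import org.junit.Test;
    import org.junit.runner.RunWith;

    // Illustrative only -- not the actual refinery test class.
    @RunWith(JUnitParamsRunner.class)
    public class TestIsSearchRequestSketch {

        @Test
        // Each string is one case, split on commas, so it must produce exactly as many
        // fields as the method has parameters (the "Did you use , as column separator?"
        // error means it didn't). The second case has an empty third field, which shows
        // up as "" in the query argument.
        @Parameters({
            "is a search request, w/index.php, search=foo, true",
            "empty query string, w/index.php, , false"
        })
        public void isSearchRequestDetectsSearches(String label, String path, String query, boolean expected) {
            assertEquals(label, expected, looksLikeSearch(path, query));
        }

        // Stand-in for the real refinery-core method so this sketch is self-contained.
        private boolean looksLikeSearch(String path, String query) {
            return path.endsWith("index.php") && query.contains("search=");
        }
    }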
:) [17:35:10] not yet :) [17:35:30] (CR) Manybubbles: [C: 1] Add Search-centric UDFs [WIP] (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/215964 (owner: OliverKeyes) [17:36:25] Ironholds: ithink that's fine [17:36:53] ottomata: mforns has the ssl issue Ironholds had yesterday I think :( [17:37:01] Have you managed to solve that? [17:38:32] ottomata, awesome :) [17:39:15] (CR) OliverKeyes: Add Search-centric UDFs [WIP] (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/215964 (owner: OliverKeyes) [17:39:20] joal: no, i tried to add keys via keytool [17:39:23] but i can't reproduce locally [17:39:28] hm [17:39:36] Ironholds: Still under issue ? [17:40:20] joal, actually it started working fine for me this morning [17:40:27] I have literally no idea why and am trying not to think about it [17:40:42] ??? [17:40:47] WEIRDO 1 [17:40:58] mforns: has similiar issue right now ... [17:41:10] I think Ironholds passed the curse to me [17:54:35] ottomata, almost done, just need to pull some raw data for a few more unit tests [17:54:41] although I'm now getting refinery-jobs errors [17:54:48] but given that I haven't touched any of those: not my issue :D [17:54:56] oh wait, those work too now. Whee! [17:56:31] (PS4) OliverKeyes: Add Search-centric UDFs [WIP] [analytics/refinery/source] - https://gerrit.wikimedia.org/r/215964 [17:56:33] boom [17:57:50] (CR) Manybubbles: [C: 1] Add Search-centric UDFs [WIP] [analytics/refinery/source] - https://gerrit.wikimedia.org/r/215964 (owner: OliverKeyes) [17:58:49] joal: shoot, had another meeting, you available now? [17:58:55] sure :) [17:59:06] was waiting for you doing other stuff ;) [17:59:14] dr0ptp4kt: --^ [17:59:59] joal: okay for me to call now? [18:00:11] dr0ptp4kt: yes, very much [18:05:56] joal: you looked over my el kafka code? [18:06:02] i see you +1 so i guess so [18:06:03] I did :) [18:06:07] and I +1 yours, so I'm going to merge those [18:06:10] not deploy, just merge [18:06:17] and then play with that on analytics1004 [18:06:28] sounds cool [18:07:04] ottomata, no +1ing mine! It still needs tests :D [18:08:09] ok! [18:11:31] joal: i think the oozie queue is sitll causing problems [18:11:47] kevin noticed that the mobile apps uniques stuff hadn't been run for a while [18:12:00] it seems because the oozie launcher was stuck on accepted for a loong time [18:12:02] and it is in the oozie queue [18:12:21] Yeah, wanted to talk with you about that [18:12:54] ja? [18:13:00] yup [18:13:47] joal: maybe I can't self merge? [18:13:48] https://gerrit.wikimedia.org/r/#/c/215982/ [18:13:49] would you merge that? [18:14:26] I do [18:14:43] done [18:14:49] danke [18:14:56] oh nope, you didn't merge [18:14:57] just +2ed [18:14:59] same as me [18:15:09] si the flake8 keeping it form merging? [18:15:19] OHH line to long [18:15:20] i thin it is [18:15:21] will patch [18:28:32] ottomata: batcave for a minute ? [18:29:39] sure [18:50:09] ottomata, okay, last unit tests added :) [19:37:08] joal: fyi,i just relaunched the mobile-apps-uniques daily job to use the production queue for oozie launcher [19:38:44] on the subject of oozie; ottomata, is there any way for me to test an oozie job locally? [19:39:59] I'm sort of mentally debating whether the whole infrastructure is needed (I just want a daily hive job) but.. [19:41:44] oof, not really that I know of. 
well, sure actually, you can use vagrant [19:41:51] it comes with oozie if you enable the analytics role [19:42:02] but, testing it in production is not hard [19:42:16] you just change your paths in your properties to use fake data and output dirs [19:45:05] gootcha [19:51:00] hey joal, milimetric, i'm looking at [19:51:01] http://etherpad.wikimedia.org/p/EL_python [19:51:09] do we really need valid client and server side? [19:51:12] can't we just have [19:51:16] client + server raw [19:51:23] -> processor -> valid + invalid [19:51:24] ? [19:56:49] ottomata: that's how it's done in the puppet code we submitted, one sec [19:57:15] ottomata: https://gerrit.wikimedia.org/r/#/c/210765/2/manifests/role/eventlogging.pp [19:57:39] forwarder forwards to the same topic, all one place [19:57:52] processor validates and publishes to two different topics, valid and invalid [19:58:53] does processor support that? [19:58:56] i don't see that [19:59:01] invalidOutput [19:59:20] yea [19:59:37] line 64 in that file [19:59:51] and the code changes I made to the processor, hang on, I'll find that too [20:00:26] ottomata: https://gerrit.wikimedia.org/r/#/c/210729/3/server/bin/eventlogging-processor [20:00:43] ah not merged, didn't see this one [20:01:21] cool ok, can we merge that? [20:01:26] hm, maybe some comments.. [20:01:57] meh whatever, actually i like [20:02:05] makes sense milimetric [20:02:30] hmm [20:02:32] no [20:02:38] milimetric: let's not prepend the event with ERROR [20:02:38] right? [20:02:42] HMMM [20:02:55] yeah, i didn't know, feel free to -1 it [20:02:56] we should just log the message, but still send the event as is to the stream, no? [20:02:59] can I just patch it? [20:05:52] ottomata: sure [20:06:34] joal: or anyone: I'm scared [20:06:41] I've got my output in a val output in spark shell [20:06:52] and I can scroll around for some reason (seems like a glitch) [20:06:59] and output.foreach(println) doesn't seem to do anything [20:07:07] I'd love to write it somewhere, but don't know how [20:08:05] how big is output? [20:08:11] it is an rdd? [20:09:24] yea [20:09:29] it's small [20:09:34] hm, what does [20:09:35] 21 projects, 100 articles for each [20:09:39] output.take(10) do? [20:09:41] ottomata, so who do I throw the patch at now? [20:10:01] ottomata: it outputs the first 10 lines, they look like what I want [20:10:13] Ironholds: looking [20:10:16] oh ok [20:10:23] so, you can save the stuff to a file? [20:10:38] ottomata: I tried to make a PrintWriter and output.map(pw.write) [20:10:44] but it said PrintWriter wasn't serializable [20:10:49] http://spark.apache.org/docs/1.3.0/api/java/org/apache/spark/rdd/RDD.html#saveAsTextFile(java.lang.String) [20:10:50] ? [20:10:58] oh that is java, sorry [20:10:58] and I'm scared because yesterday when it said something wasn't serializable Joseph had me restart the shell [20:11:00] one sec [20:11:05] which would lose me another 10 hours :) [20:11:21] same though [20:11:25] output.saveAsTextFile(path) [20:11:26] ? [20:11:38] (CR) Jdouglas: [C: 1] "I'm not familiar with the context of the tests, so I can't +2, but the tests are passing for me and the code looks good." [analytics/refinery/source] - https://gerrit.wikimedia.org/r/215964 (owner: OliverKeyes) [20:11:51] Ironholds: can we call the class singular [20:11:54] SearchRequest [20:11:54] ? [20:12:01] same for the method [20:13:19] ottomata: what's that path relative to? I did it, and it looked like it sort of worked... 
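(Circling back to the local-oozie-testing question from earlier in the hour: as ottomata says, the practical approach is to take the job's .properties file, point every output location and the workflow/coordinator path itself at directories under your own HDFS user dir, and submit that. A rough sketch follows; the property names differ from job to job, so copy them out of the job's real properties file rather than from here, and the values below are placeholders.)

    # my-test.properties -- start from the job's real file and override the interesting bits
    name_node         = hdfs://analytics-hadoop
    queue_name        = default
    oozie_directory   = ${name_node}/user/ironholds/oozie          # your own copy of the workflow/coordinator
    webrequest_table  = wmf.webrequest                              # read-only inputs can usually stay as-is
    output_directory  = ${name_node}/user/ironholds/oozie-test-out  # scratch output, not the production location
    # plus whatever oozie.wf.application.path / oozie.coordinator.application.path the job
    # already defines, repointed at ${oozie_directory}

    # then submit it (assumes OOZIE_URL is set, otherwise add -oozie <url>):
    oozie job -config my-test.properties -run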
[20:13:29] hdfs root [20:13:41] if you didn't specify root [20:13:47] your hdfs user dir probably [20:14:39] (CR) Ottomata: Add Search-centric UDFs (3 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/215964 (owner: OliverKeyes) [20:14:53] ottomata, sure [20:18:08] ottomata: heh, it wrote it into a directory with snappy compression [20:20:17] didn't we have some tool to read snappy out? something handy dandy that our favorite guy wrote? [20:22:47] got it, hdfs dfs -text filename [20:22:49] sweeet! [20:24:17] yup! [20:42:33] (PS6) OliverKeyes: Add Search-centric UDFs [analytics/refinery/source] - https://gerrit.wikimedia.org/r/215964 [20:42:35] ottomata, change made [20:43:09] ack, just saw your third note [20:43:14] * Ironholds goes through again, curses at self [20:43:39] actually, wait [20:43:50] ottomata, I don't get the note at https://gerrit.wikimedia.org/r/#/c/215964/5/refinery-hive/src/main/java/org/wikimedia/analytics/refinery/hive/IsSearchUDF.java - all the UDFs have [name]UDF as a classname [20:47:13] sorry [20:47:13] yes [20:47:18] IsSearchRequestUDF [20:52:30] (PS7) OliverKeyes: Add Search-centric UDFs [analytics/refinery/source] - https://gerrit.wikimedia.org/r/215964 [20:52:34] ottomata, oh, gotcha. Yeah, patched [20:59:48] (CR) Ottomata: [C: 2 V: 2] Add Search-centric UDFs [analytics/refinery/source] - https://gerrit.wikimedia.org/r/215964 (owner: OliverKeyes) [21:00:59] ottomata, yaaaay! [21:01:09] :) [21:01:12] (when will there be a deploy next? Sorry to be annoying, I need to get this out for Metrics) [21:02:46] um, not too soon i think, joal is handling that at the moment, and he's got a few thigns in motion [21:02:54] you should probably just use your own jar build for now [21:11:21] milimetric: still there? [21:11:33] ottomata: yeah [21:11:39] so... this thing with not sending the errors [21:11:42] on the invalid stream [21:11:44] I'm a little torn [21:12:26] the invalid events might still be parsable json, just not valid event schemas [21:12:26] maybe we could send a JSON object that had {raw_event: ..., validation_error: ...} [21:12:26] if you put the error in there [21:12:26] you can't parse it [21:12:26] hm [21:12:26] i ugesssss [21:12:26] i dunno [21:12:28] you've got the errors in the logs [21:12:31] because the error is important, and there are a few cases where we lost data 'cause it's hard to analyze it in the raw logs [21:12:48] yeah, but honestly who has time to go write log grepping tools? [21:12:54] it's 2015, I barely have time to brush my teeth [21:12:56] you'll be doing that anyway, wont' you? [21:13:01] cept kafka grepping [21:13:10] not kafka grepping, hopefully hive [21:13:21] nope [21:13:26] not a contanst schema [21:13:38] like, select count(*), error from invalid_events group by error [21:13:47] raw_event can be stored as text [21:13:51] in a lot of cases it doesn't parse [21:14:01] milimetric: i don't like it, but mainly i don't like it because I dont' even think we shoudl have invalid events [21:14:02] ottomata, okie :) [21:14:05] most commonly it's too long and gets truncated [21:14:23] i mean, i understand we might need it now [21:14:28] :) ottomata that's unreasonable, this is the invalid event stream we're talking about [21:14:43] one day we will not have invalid events to deal with! 
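(Tying the spark thread above together: foreach(println) on an RDD runs on the executors, so its output lands in executor stdout rather than in the driver shell, and a PrintWriter can't be used inside map because the closure has to be serialized out to the executors. take() or saveAsTextFile are the straightforward ways to get results back. A sketch of the session, with 'output' standing for whatever RDD was built and the path purely illustrative:)

    // inside spark-shell (Scala):
    scala> output.take(10).foreach(println)       // pull a small sample back to the driver and print it there
    scala> output.saveAsTextFile("top_articles")  // a relative path lands under your HDFS home dir, e.g. /user/milimetric/top_articles

    # back in a normal shell -- one part-* file per partition, snappy-compressed on this
    # cluster, which hdfs dfs -text decompresses on the fly:
    $ hdfs dfs -ls top_articles
    $ hdfs dfs -text top_articles/part-* | head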
[21:14:47] client's shouldn't be able to produce them [21:14:53] that day is lovely, and I can't wait [21:15:14] and when we talk about the valid event stream I promise not to make you store raw json into a field [21:15:17] tha'ts why i htink the log would be a better place for them, does python logging do throttling? [21:15:24] like: Last message repeated 100 times [21:15:25] not sure [21:15:25] or something? [21:15:31] no, doubt it [21:15:33] also, though [21:15:51] the python logs will be all but useless once we parallelize the processor [21:16:02] you'd have to zip together two logs, oh god [21:16:05] hm. [21:16:27] ya, hm. ok, so you want an invalid event capsule around the encoded event capsule which contains the encoded event value data [21:16:28] ? [21:16:55] it could be decoded, I'm not sure if there are ever decoding problems [21:17:05] Hm, ja [21:17:10] but i guess there could be, yeah, it should be able to handle anything, so {raw_event, error} [21:17:11] well, if it was truncated [21:17:27] yeah, it might not even decode properly if it's truncated [21:17:29] oh, but that's not encoded [21:17:29] yeah [21:17:30] hm. [21:17:34] just not parsable [21:17:40] i think we can probably decode [21:17:41] no, you're right [21:17:50] 'cause it could truncate on %(here)20 [21:17:53] hm [21:17:55] k [21:17:57] eyah [21:17:58] ok [21:18:16] { 'raw_message': xxx 'error': yyy } [21:18:25] right, that seems good for now [21:18:32] 'event' is already a field tha tmeans the schema event data inside of the message [21:18:41] kinda confusing already [21:18:41] :) [21:18:53] oook [21:19:24] yeah, it's not the best schema, but what's inside the message doesn't matter too much for this purpose [21:20:04] what we would do with this in some cases would be "find fixable errors", "find all events with that error", "fix the raw_message", "run it back through the processor" [21:20:57] these are usually things like event.action.abort_timing=null when it's supposed to be an integer. So it's fixable with string manipulation [21:21:02] brb [21:21:21] ah whatever raw event [21:32:10] :) [21:36:19] milimetric: need brain bounce [21:36:45] ottomata: to the batcave! [21:37:05] btw all: http://mrdovideo.com/la7/ [21:37:06] :) [21:37:14] oh game just over [21:50:21] polo! [21:50:27] have a good weekend all [21:50:30] * milimetric gone climbing [22:01:16] laterrrs, thanks ahve fun! [22:47:14] Hi analytics!! Quick question here... [22:47:36] does anyone know if ther happened to be a geoip cookie outage exactly on June 1 (UTC) ? [22:49:57] ottomata: ^ ? anyone else? :) [22:50:51] milimetric: ^ (only if ur not left for climbing quite yet ;p ) [22:51:35] I'm still here and was following your convo in #ops a bit, but no, i don't know anything [22:51:44] and i doubt someone would just know if that happened :) [22:53:12] ottomata: hmmm thanks... mmm would any analytics tools provide any ways of checking if it did? [22:53:30] Like maybe storing the GeoIP cookies that get sent back? [22:58:46] hmm, i don' think we store all cookies [22:58:49] just specific ones [22:59:06] only if they are manually put into the X-Analytilcs header [22:59:08] do we get them [22:59:39] Hmmm [22:59:55] So I guess GeoIP ones aren't gotten anywhere... [23:02:19] naw, i dont' think so [23:10:24] ottomata: hmm oh well! K thanks much in any case :) [23:11:12] ja, sorry can't help more :/ [23:13:41] ottomata: np! Either we'll figure it out... or... we won't! 
Either way hopefully it doesn't bite again :) [23:23:22] Analytics-Cluster, operations: stat1002 - dpkg reports broken packages - https://phabricator.wikimedia.org/T101582#1342821 (Dzahn) NEW [23:52:13] (PS1) Madhuvishy: [WIP] Daily and monthly uniques oozie jobs based on WMF-Last-Access cookie [analytics/refinery] - https://gerrit.wikimedia.org/r/216341 (https://phabricator.wikimedia.org/T92977)
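(A footnote on that last patch: the WMF-Last-Access approach reads the cookie back out of the refined webrequest data, on the assumption that the cookie's value is echoed into the X-Analytics header — which, per the discussion above, is the only way a cookie reaches the analytics pipeline — and surfaced in an x_analytics_map field. A hedged sanity-check sketch, not the job's actual query; the map key, field name and partition values are all assumptions to verify against the real table first.)

    -- Roughly: how much of one day's text traffic carries a WMF-Last-Access value at all?
    -- If x_analytics_map isn't in the table yet, parse the raw x_analytics string instead.
    SELECT
      (x_analytics_map['WMF-Last-Access'] IS NOT NULL) AS has_last_access,
      COUNT(*) AS requests
    FROM wmf.webrequest
    WHERE webrequest_source = 'text'
      AND year = 2015 AND month = 6 AND day = 5
    GROUP BY (x_analytics_map['WMF-Last-Access'] IS NOT NULL);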