[05:03:12] Analytics-Cluster, FINCH, Wikimedia-General-or-Unknown: Browser and platform stats for logged-in vs. anon users for security and product support decisions - https://phabricator.wikimedia.org/T58575#1595808 (Mattflaschen)
[06:57:56] Analytics-Tech-community-metrics, ECT-September-2015: Labeling some bots (active in Git/Gerrit) as bots - https://phabricator.wikimedia.org/T110545#1595957 (jgbarah) >>! In T110545#1592408, @Aklapper wrote: >>>! In T110545#1592030, @jgbarah wrote: >> I couldn't find @jdlrobson+frankie@gmail.com in our dat...
[06:59:32] Analytics-Tech-community-metrics, ECT-September-2015: Labeling some bots (active in Git/Gerrit) as bots - https://phabricator.wikimedia.org/T110545#1595959 (Qgil) Open>Resolved Agreed. :)
[06:59:33] Analytics-Tech-community-metrics, ECT-September-2015: Tech community KPIs for the WMF metrics meeting - https://phabricator.wikimedia.org/T107562#1595961 (Qgil)
[09:20:35] Analytics-Kanban, operations, Monitoring, Patch-For-Review: Overhaul reqstats - https://phabricator.wikimedia.org/T83580#1596240 (fgiunchedi) @ottomata let's sync up on this on hangout/irc and report conclusions here, seems like it'll speed things up!
[09:21:51] hi a-team!
[09:22:36] hi mforns
[09:22:42] hi joal :]
[09:29:33] (CR) Mforns: [C: 2 V: 2] "Self-merging after discussion in stand-up :]" [analytics/limn-language-data] - https://gerrit.wikimedia.org/r/234739 (https://phabricator.wikimedia.org/T107504) (owner: Mforns)
[11:23:20] Analytics-Visualization: http://reportcard.wmflabs.org/ is not updating automatically - https://phabricator.wikimedia.org/T58030#1596442 (Aklapper) >>! In T58030#617496, @Tnegrin wrote: > This was brought up at the Analytics Quarterly review and we are working with Ken/ops to address the root cause. @Tnegri...
[12:09:19] Analytics-Backlog: Backfill pageviews to exclude arbcom wikis - https://phabricator.wikimedia.org/T110701#1596573 (Aklapper) As this task has [[ https://www.mediawiki.org/wiki/Phabricator/Project_management#Setting_task_priorities | "Unbreak Now!" priority ]], who should be set as task assignee?
[12:17:06] Analytics-Backlog: Backfill pageviews to exclude arbcom wikis - https://phabricator.wikimedia.org/T110701#1596604 (JAllemandou) This has been done in pageview_hourly in that task: https://phabricator.wikimedia.org/T110614 webrequest has not been recomputed, but data will be deleted in two month. Is that ok o...
[13:55:03] (CR) Bgerstle: "Given that we only just added our preview request and haven't even distributed betas of the app yet, I think it's a bit premature for us t" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/234938 (owner: BearND)
[13:59:46] holaaaaaa
[14:00:07] Hi nuria
[14:00:30] nuria: something I think we might not hqve told you
[14:01:00] We have setup a highlight for 'a-team' on our ircs clients :)
[14:01:11] PONNGGG
[14:01:15] hey!
[14:01:20] xD
[14:01:28] A-team.. jaja
[14:02:15] Analytics-Backlog: Create white list for pageview data {hawk} [8 pts] - https://phabricator.wikimedia.org/T110061#1596843 (Nuria) a:Nuria
[14:02:33] joal, do you have time now to talk about this one? https://phabricator.wikimedia.org/T110061
[14:02:50] I do nuria
[14:03:00] k, batcave?
[14:03:09] omw
[14:08:31] Analytics, Labs, Labs-Infrastructure, Labs-Sprint-108, Patch-For-Review: Set up cron job on labstore to rsync data from stat* boxes into labs. - https://phabricator.wikimedia.org/T107576#1596854 (Ottomata) @ellery fyi, you can do this now: https://wikitech.wikimedia.org/wiki/Analytics/FAQ#How_d...
[14:14:52] yay, a nuria! :D
[14:16:28] holaaa Ironholds
[14:19:45] how is the...nurica? nuruela? :P
[14:28:00] milimetric, I'm going to deploy the dashiki stuff, do you want to be there?
[14:28:49] Analytics-General-or-Unknown, Database: Create a table in labs with replication lag data - https://phabricator.wikimedia.org/T71463#1596899 (Krenair)
[14:30:05] Analytics-EventLogging: Fix auto_offset_reset bug in pykafka and build new .deb package {stag} - https://phabricator.wikimedia.org/T111182#1596906 (Ottomata) NEW a:Ottomata
[14:30:12] Analytics-EventLogging: Fix auto_offset_reset bug in pykafka and build new .deb package {stag} - https://phabricator.wikimedia.org/T111182#1596914 (Ottomata) Pull request into Parsely: https://github.com/Parsely/pykafka/pull/245/files Building patched 2.0.0 and testing on analytics1010 and analytics1004
[14:30:35] Analytics-EventLogging, Analytics-Kanban: Fix auto_offset_reset bug in pykafka and build new .deb package {stag} - https://phabricator.wikimedia.org/T111182#1596915 (Ottomata)
[14:31:00] Analytics-Visualization: http://reportcard.wmflabs.org/ is not updating automatically - https://phabricator.wikimedia.org/T58030#1596918 (Milimetric) @Aklapper, Toby shouldn't be assigned this any more, he's not the director of Analytics. But no, no progress has been made in the last 22 months. Wikistats 2...
[14:31:10] Analytics-EventLogging, Analytics-Kanban, Patch-For-Review: Prep work for Eventlogging on Kafka {stag} - https://phabricator.wikimedia.org/T102831#1596924 (Ottomata)
[14:31:11] Analytics-EventLogging, Analytics-Kanban: Fix auto_offset_reset bug in pykafka and build new .deb package {stag} - https://phabricator.wikimedia.org/T111182#1596906 (Ottomata)
[14:31:27] Analytics-Visualization: http://reportcard.wmflabs.org/ is not updating automatically - https://phabricator.wikimedia.org/T58030#1596926 (Milimetric)
[14:31:29] Analytics-Kanban, Analytics-Wikistats: {lama} Wikistats 2.0 - https://phabricator.wikimedia.org/T107175#1596927 (Milimetric)
[14:33:48] Analytics-Kanban, Reading-Admin, Patch-For-Review, Wikipedia-Android-App: Update definition of page view and implementation for mobile apps {hawk} [8 pts] - https://phabricator.wikimedia.org/T109383#1596933 (Milimetric) a:Milimetric
[14:34:41] mforns: no, just let me know if anything goes wrong
[14:34:48] milimetric, cool
[14:45:42] milimetric, I think it went fine :]
[14:45:53] language-reportcard.wmflabs.org
[14:46:04] cool!
[14:46:34] milimetric, btw, can you create a repo for me? I'm looking for a place to put the metrics code
[14:46:42] I thought of limn-analytics-data
[14:47:00] oh, right, yes, one sec
[14:47:10] because it's meant to be executed by reportupdater when it supports executing scripts
[14:47:20] I don't have permits to create repos
[14:47:43] mforns: https://gerrit.wikimedia.org/r/#/admin/projects/analytics/limn-analytics-data
[14:47:58] milimetric, thank you!
[14:48:11] np, the dashboard looks good, you should write the language folks about it
[14:48:37] kartik, santosh, niklas, pau I guess to start
[14:53:22] joal: hi, looking at your code, wanna talk about it?
[14:53:34] I am with nuria, give me aminute
[14:53:42] milimetric: --^
[14:53:52] np, anytime
[14:54:20] (PS1) Mforns: Add script to calculate kanban productivity metrics [analytics/limn-analytics-data] - https://gerrit.wikimedia.org/r/235470 (https://phabricator.wikimedia.org/T108209)
[15:01:50] milimetric: ready when you want :)
[15:02:16] joal: I'm in the batcave
[15:02:22] joining !
[15:38:15] Analytics-Backlog: Generalize cube building for hadoop - https://phabricator.wikimedia.org/T111202#1597348 (JAllemandou) NEW
[15:40:13] are Java regexes case-sensitive or insensitive by default?
[15:40:36] Ironholds: cqse sensitive by default IIRC
[15:40:42] danke!
[15:40:56] and how do I declare a single character insensitive again? ;p
[15:41:32] I don't recall this one unfortunately :)
[15:42:25] Analytics-Kanban, Research-and-Data: Legal request, data point 1 [13 pts] - https://phabricator.wikimedia.org/T109626#1597369 (ggellerman)
[15:42:26] I'll find it; thanks!
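[Editor's note: Java regexes are case-sensitive by default, as joal said at 15:40. The inline flag Ironholds was trying to remember at 15:40:56 is `(?i)`, and `(?i:X)` scopes it to a single character or group. A minimal sketch in Python, whose `re` engine shares these constructs (the scoped `(?i:...)` form needs Python 3.6+); the Java equivalents are the same inline flags or `Pattern.CASE_INSENSITIVE`.]

```python
import re

# (?i) at the start makes the whole match case-insensitive
# (Java: Pattern.compile(p, Pattern.CASE_INSENSITIVE) or the same inline flag).
assert re.search(r"(?i)wiki", "WIKI")

# (?i:X) limits the flag to its group, so only the first letter is insensitive.
assert re.search(r"(?i:w)iki", "Wiki")
assert re.search(r"(?i:w)iki", "WIKI") is None  # the rest stays case-sensitive
```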
[15:48:17] (CR) Milimetric: "The X-Analytics header can be used, but it sounded like the Android team was too far along to implement that change. From our point of vi" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/234938 (owner: BearND)
[15:59:18] (CR) BearND: "@Milimetric I think we could use the X-Analytics header, too. I'm for it esp. if this make life simpler. Can we use it for request headers" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/234938 (owner: BearND)
[15:59:26] (CR) Joal: "Thanks for the change !" (2 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/234453 (https://phabricator.wikimedia.org/T109547) (owner: Madhuvishy)
[16:03:04] (CR) Milimetric: "Yeah, if you add it to the request header, it's preserved by Varnish and passed on. And whatever you pass into X-Analytics would be avail" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/234938 (owner: BearND)
[16:05:27] (CR) BearND: "Cool. How about just setting it to "preview"?" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/234938 (owner: BearND)
[17:02:16] joal: a better link to the hive docs, sorry: https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation,+Cube,+Grouping+and+Rollup
[17:02:43] and a link to the sql server version of it, which is much easier to understand: https://msdn.microsoft.com/en-us/library/bb522495.aspx
[17:15:08] a-team: http://korben.info/la-veritable-histoire-du-nouveau-logo-de-google.html
[17:15:20] If yopu can't read french, use whatever translator :)
[17:30:20] joal, xD
[17:36:31] joal, do you have 10 mins?
[17:37:07] I do
[17:37:15] cave ?
[17:37:19] :] can you explain how can I
[17:37:20] sure
[17:57:37] (CR) BearND: [C: -1] "Let's not merge this until we really know what we want." [analytics/refinery/source] - https://gerrit.wikimedia.org/r/234938 (owner: BearND)
[17:57:44] (CR) Nuria: "+1 to send an explicit header, it is a solution that keeps the "what is/what is not a pageview" completely decoupled from the way apps do " [analytics/refinery/source] - https://gerrit.wikimedia.org/r/234938 (owner: BearND)
[17:59:18] (CR) BearND: "@Nuria @Milimetric Let's continue this discussion on T110702" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/234938 (owner: BearND)
[18:02:54] (PS6) Madhuvishy: [WIP] Report RESTBase traffic metrics to Graphite [analytics/refinery/source] - https://gerrit.wikimedia.org/r/234453 (https://phabricator.wikimedia.org/T109547)
[18:09:48] (PS1) Madhuvishy: [WIP] Add oozie job to schedule restbase metrics generation job [analytics/refinery] - https://gerrit.wikimedia.org/r/235519 (https://phabricator.wikimedia.org/T110691)
[18:10:34] joal: I wrote the oozie code to schedule the hourly job, but the Spark job runs, and errors out at the end
[18:11:07] joal: oozie job -log 0135106-150605005438095-oozie-oozi-W has one such traces
[18:11:33] I tried running the same thing through spark-submit with the same options I'm passing through oozie, and it runs fine
[18:13:13] joal: if you see something obvious that's off let me know!
[18:24:20] Hey madhu
[18:24:30] will look into later tonight or tomorrow
[18:37:44] thanks joal
[18:39:54] (CR) Ottomata: "I think just oozie/restbase is fine for the at least the directory and maybe also the job and dataset names. 'metrics' isn't necessary." [analytics/refinery] - https://gerrit.wikimedia.org/r/235519 (https://phabricator.wikimedia.org/T110691) (owner: Madhuvishy)
[18:52:54] ottomata: yt?
[18:53:57] yup
[18:53:58] hiya
[18:56:19] ottomata: read TT https://phabricator.wikimedia.org/T88459 and
[18:56:29] ottomata: i do not get why don't we want avro json?
[18:56:39] avro's schemas are written in json
[18:57:04] and data can be serialized as json or binary
[18:57:07] yes, and with confluent (and whatever service) we would support producing avro data in json, like rest proxy does
[18:57:12] but, consumption is more difficult
[18:57:36] consuming from the rest proxy isn't ideal, and many clients would likely need to then use avro dependencies to consume from kafka
[18:57:47] but, either way, i think we want both
[18:58:02] its going to be hard to force avro on people
[18:58:08] ottomata: why?
[18:58:15] ottomata: it is just a json subset
[18:58:21] no
[18:58:24] its not
[18:58:29] if you consume from kafka, you get binary dat
[18:58:30] a
[18:58:31] ottomata: ahhh
[18:58:40] you need to have some avro lib in front of kafka to get you the json representation
[18:58:47] ottomata: but from the docs (and ahem...maybe i missreaded)
[18:58:48] that's what rest proxy does
[18:58:58] ottomata: seems like the binary representation is optional
[18:59:07] naw
[18:59:11] avro is a binary format
[18:59:12] ottomata: that you can serialize to plain (text-based json) or binary
[18:59:16] ?
[18:59:18] pretty sure not.
[18:59:19] i mean
[18:59:19] yes
[18:59:25] there is a json representation of avro
[18:59:28] hm.
[18:59:31] hm.
[18:59:49] "Avro has a JSON like data model, but can be represented as either JSON or in a compact binary form. It comes with a very sophisticated schema description language that describes data."
[18:59:52] that would be the json serialized version. hm. but, it can't be the same, because the avro types are much stricter than json
[19:00:02] ottomata: right, it is a subset
[19:00:06] ottomata: which is fine
[19:00:15] ottomata: ahem
[19:00:19] ottomata: seems fine
[19:00:35] hm, maybe what you say is possible, i hadn't considered that. clients produce JSON form and if they like THAT is what is produced to kafka.
[19:00:35] ottomata: actually this is how (many years ago) amazn catalog described its data
[19:00:35] hm.
[19:00:59] ottomata: a subset of json (to keep things simpler) that optionally (for perf)
[19:01:05] can be consumed in binary form
[19:01:14] ottomata: seems pretty good to me
[19:01:22] ottomata: but again
[19:01:30] ottomata: i tend to reda too fast ...
[19:01:32] *read
[19:02:29] let me recheck
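[Editor's note: a sketch of the distinction nuria and ottomata are circling above. An Avro schema is written in JSON, and a datum under that schema has two wire forms: a JSON encoding and the compact binary encoding that actually sits in Kafka. Minimal illustration with the `avro` Python package (which nuria mentions later, at 21:46); the schema and record are invented, and the exact spelling (`parse` vs `Parse`) varies by library version.]

```python
import io
import json
import avro.io
import avro.schema

# Hypothetical schema, purely for illustration.
schema = avro.schema.parse(json.dumps({
    "type": "record",
    "name": "Pageview",
    "fields": [
        {"name": "uri", "type": "string"},
        {"name": "status", "type": "int"},
    ],
}))

record = {"uri": "/wiki/Main_Page", "status": 200}

# JSON form: for a record of primitive fields this is just the JSON object.
json_form = json.dumps(record)

# Binary form: schema-dependent and compact; DatumWriter raises
# AvroTypeException if the record does not match the schema.
buf = io.BytesIO()
avro.io.DatumWriter(schema).write(record, avro.io.BinaryEncoder(buf))
binary_form = buf.getvalue()

print(len(json_form), len(binary_form))  # the binary form is the smaller one
```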
[19:03:38] madhuvishy: I have looked to the job logs - nothing clear
[19:03:47] It says java.lang.InterruptedException
[19:03:51] joal: yeah
[19:04:51] Just to confirm: not the oozie logs, the hadoop job logs :)
[19:04:58] madhuvishy: --^
[19:05:04] joal: hmmm
[19:05:13] Maybe try another run ?
[19:05:20] That sounds bizarre to me
[19:05:26] i've tried multiple times
[19:05:30] k
[19:06:49] madhuvishy: Have you checked if data is sent to graphite ?
[19:07:08] Cause the job seems to have finished, but to have an issue at clean time
[19:07:12] joal: hmmm let me check
[19:07:58] joal: no... i dont see anything on 1st august
[19:08:54] hm
[19:15:35] (CR) Joal: [C: 2] "Let's not merge that while the oozie aspect of it is not working :)" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/234453 (https://phabricator.wikimedia.org/T109547) (owner: Madhuvishy)
[19:16:03] (CR) Joal: "And the commit message still has WIP :)" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/234453 (https://phabricator.wikimedia.org/T109547) (owner: Madhuvishy)
[19:16:28] joal: yeah, din't want to remove WIP because I want to figure out this oozie thing too
[19:17:15] madhuvishy: can you add me to the CR for oozie please
[19:17:23] Will be be easier to follow
[19:17:24] joal: sure
[19:18:06] ottomata: will do some avro experiments
[19:18:15] ottomata: let me know if i am way offtrack here
[19:19:55] nuria: i think what you say makes sense, but i'm not sure how validation is done
[19:20:07] maybe check out confluent rest proxy and see how they do validation?
[19:20:14] do they do it with the json value that comes in?
[19:20:27] or do they attempt to use avro libs to convert that to avro binary?
[19:21:04] ottomata: i bet every record constains schema
[19:21:08] *contains
[19:21:11] no
[19:21:12] it can
[19:21:17] aham
[19:21:20] but it also can be passed a schema id
[19:21:27] the schema is looked up from the schema registry
[19:21:31] aham
[19:21:46] nuria
[19:21:46] http://docs.confluent.io/1.0/
[19:23:46] madhuvishy: the error seemes to be at container decommisioning
[19:23:50] It's weird
[19:24:15] joal: one thing i noticed was i was closing socket before out in the connection
[19:24:32] Analytics-Backlog: Provide the Wikimedia DE folks with Hive access/training {flea} [8 pts] - https://phabricator.wikimedia.org/T106042#1598742 (Nuria) We have this existing doc on Hive: https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hive/Queries That includes how to get access and some sample querie...
[19:24:38] not sure if that's a thing though, because it works with spark submit - although i'm gonna change it
[19:24:49] hm, I don't get it
[19:25:14] joal: hmm
[19:25:18] https://www.irccloud.com/pastebin/zIM5qX8E/
[19:25:31] Right, socket.close then out.close
[19:25:33] i was closing socket first before
[19:25:37] right
[19:25:43] i should close out first
[19:25:50] hm, should not be the issue
[19:26:02] ya thats what i thought
[19:26:16] I really think it comes from something else
[19:26:17] sudo -u hdfs hadoop job -logs job_1434651818028_154272
[19:26:58] only error is after the worker has been set to finished successfully ...
[19:27:03] Doesn't make any sense
[19:28:00] Something to try: write the result in a dummy file after having sent it to graphite.
[19:28:06] madhuvishy: --^
[19:28:56] Two reasons: check if the file exists after error, and possibly have a place for spark to store its temp files
[19:29:18] But it's a wild idea :(
[19:29:29] I'll spend more time on that tomorrow
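[Editor's note: the close-order exchange above (19:24-19:26) is about the buffered writer sitting on top of the socket: close the writer first so it flushes, then the socket. The real job is Scala/Spark; this is a minimal Python sketch of Graphite's plaintext protocol ("name value timestamp\n" to port 2003), with an invented host and metric name.]

```python
import socket
import time

def send_to_graphite(metrics, host="graphite.example.org", port=2003):
    """Send (name, value) pairs using Graphite's plaintext protocol."""
    now = int(time.time())
    sock = socket.create_connection((host, port))
    out = sock.makefile("w")  # buffered writer on top of the raw socket
    try:
        for name, value in metrics:
            out.write("%s %s %d\n" % (name, value, now))
    finally:
        out.close()   # flush and close the writer first ("out.close") ...
        sock.close()  # ... then the socket, the ordering agreed on above

send_to_graphite([("restbase.requests.sample_metric", 1234)])
```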
[19:29:52] milimetric: Tried the GROUPING SETS in hive --> works fine !
[19:30:22] Just long to describe the full set of sets to be worked ;)
[19:30:39] Even performance-wise, it's not that bad
[19:30:56] With that, lads, I go to diner !
[19:31:01] Hqve q good end of day :)
[19:31:35] * joal hqtes hqving broken his keyboqrd
[19:31:54] Bye a-team
[19:32:10] nite joal|night
[19:32:17] nighters!
[19:34:41] nuria: btw, this is handy for checking up on EL: http://grafana.wikimedia.org/#/dashboard/db/eventlogging
[19:34:58] the last graph has the thing I usually look at, and you'll notice a new insert attempted metric
[19:35:14] madhu added that recently, it's the number of records that the mysql consumer sees
[19:35:30] milimetric: i see, dat ais coming from graphite roight?
[19:35:33] *right?
[19:35:35] yes
[19:40:41] joal|night: thanks
[20:09:56] madhuvishy: didn't wanna self merge, so I added you to https://gerrit.wikimedia.org/r/#/c/234202/
[20:10:04] I'll deploy it if you merge it
[20:15:32] ottomata: I wanna puppetize dashiki dashboards in a way that the host wouldn't need to be a puppet master
[20:17:15] ok!
[20:19:48] so what should I use?
[20:20:19] how do I get an array of dashboard configs into somewhere where puppet can load them
[20:21:24] the array would have elements that would need a few properties each
[20:22:52] like { wikiConfig: "VitalSigns", layout: "metrics-by-project", piwikSiteId: 2 }
[20:24:52] if I made a wiki page like this, would that be the right way? https://wikitech.wikimedia.org/wiki/Hiera:Deployment-prep
[20:26:15] ottomata: ^
[20:26:32] milimetric: sure, will merge
[20:26:43] milimetric: i think so yes
[20:27:14] ok, I'll try to figure out how that's used by the beta cluster
[20:28:52] milimetric: we ahve it in analytics project too
[20:28:59] https://wikitech.wikimedia.org/wiki/Hiera:Analytics
[20:29:06] oooh
[20:29:21] milimetric: there are a few ways in which those get sent to puppet
[20:29:24] but the simplest is
[20:29:28] if you set something there like
[20:29:34] dashiki_config:
[20:29:35] ...
[20:29:38] then in puppet you can do
[20:29:57] $dashiki_config = hiera('dashiki_config', { 'default config here' }
[20:30:01] (if you want a defaul)
[20:30:41] right, but in this case, $dashiki_config would be an array, so I'd want it to instantiate a module for each element in the array
[20:32:32] oh hm.
[20:33:33] milimetric: its hard for me to tell from all the way up here how it would work best, but maybe your dashiki module could ahve a dashiki::instance define
[20:33:39] that woudl look up its config based on its name
[20:33:47] and, instead of an array
[20:34:06] you'd have a config hash for each instance
[20:34:10] something like
[20:34:17] oh, but you want people to be able to edit hiera?
[20:34:18] hmmm
[20:34:21] oh that should work too
[20:34:21] hm.
[20:34:22] ok
[20:34:57] define dashiki::instance(...) { $config = hiera("dashiki_conifg_${title}") }
[20:34:59] and in hiera:
[20:35:12] dashiki_config_VitalSigns: ...
[20:35:15] and
[20:35:34] dashiki_instances:
[20:35:34] - VitalSigns
[20:35:34] - ...
[20:35:42] and then back in puppet, when you declare the instances
[20:35:58] dashiki::instance { hiera('dashiki_instances'): }
[20:37:05] so that last line would be in a role that would get selected in the instance configuration?
[20:54:23] (CR) Milimetric: [C: -1] "this is great, clean and easy to understand. -1 because it should have a requirements.txt with the dependencies" (1 comment) [analytics/limn-analytics-data] - https://gerrit.wikimedia.org/r/235470 (https://phabricator.wikimedia.org/T108209) (owner: Mforns)
[20:54:57] thanks milimetric
[20:55:03] will look at that
[20:55:19] thank you, this is gonna be fun to see the numbers
[21:09:36] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [30.0]
[21:12:39] there's a 5 minute hole in eventlogging valid events metric
[21:12:59] but it seems recovered now
[21:14:24] ottomata: hola, yt?
[21:15:09] nuria: we should chat about uniques sometime. work on that is pretty much paused
[21:15:44] milimetric: yes think so
[21:15:49] nuria: hiya
[21:16:13] madhuvishy: at your convenience, i think that work is in the goals right?
[21:16:34] nuria: yup!
[21:19:30] madhuvishy: ok, let's talk tomorrow
[21:19:44] ottomata: been looking at rest kafka endpoint code
[21:20:01] ottomata: here: https://github.com/confluentinc/kafka-rest/tree/master/src/main/java/io/confluent/kafkarest
[21:20:03] nuria: cool
[21:20:52] ottomata: but i cannot find validation
[21:21:01] the avrodecoder is elsewhere at:
[21:21:09] nuria: https://github.com/confluentinc/schema-registry
[21:21:32] ottomata: looking
[21:22:38] nuria: i think the validation is done when attempting to convert the incoming json into the avro binary using the avro schema
[21:22:44] if that fails, it doesn't validate
[21:22:45] not sure though
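[Editor's note: for orientation while nuria and ottomata hunt through the Confluent code — the v1 REST proxy docs linked at 19:21:46 describe Avro produce requests shaped roughly like the sketch below: the JSON-encoded datum travels with either an embedded schema or a registered schema id, and records that don't match the schema are rejected. Host, topic, and schema here are invented.]

```python
import json
import requests

value_schema = json.dumps({
    "type": "record",
    "name": "Pageview",  # hypothetical schema for illustration
    "fields": [{"name": "uri", "type": "string"}],
})

resp = requests.post(
    "http://restproxy.example.org:8082/topics/test",   # invented host/topic
    headers={"Content-Type": "application/vnd.kafka.avro.v1+json"},
    data=json.dumps({
        "value_schema": value_schema,  # or "value_schema_id" for a registered schema
        "records": [{"value": {"uri": "/wiki/Main_Page"}}],
    }),
)
print(resp.status_code, resp.json())  # offsets on success, an error if validation fails
```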
[21:23:51] ottomata: from docs seems that format and transport are separated and that you can have avro non binary
[21:23:56] thus json
[21:24:23] ottomata: now, from code i cannot tell where the validation is done, decoding is here: https://github.com/confluentinc/schema-registry/blob/master/avro-serializer/src/main/java/io/confluent/kafka/serializers/KafkaAvroDecoder.java
[21:25:55] RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK: OK: Less than 25.00% above the threshold [20.0]
[21:26:26] ottomata: wait, it has to be here, lemme look: https://github.com/confluentinc/schema-registry/blob/master/avro-serializer/src/main/java/io/confluent/kafka/serializers/AbstractKafkaAvroDeserializer.java
[21:28:19] nuria: maybe https://github.com/confluentinc/schema-registry/blob/master/avro-serializer/src/main/java/io/confluent/kafka/serializers/AbstractKafkaAvroSerializer.java
[21:28:19] ?
[21:28:27] ottomata: ok, i *think* is doing binary: https://github.com/confluentinc/schema-registry/blob/master/avro-serializer/src/main/java/io/confluent/kafka/serializers/AbstractKafkaAvroDeserializer.java#L107
[21:29:48] yes, nuria, but this is the deserialization, right?
[21:29:52] so that is when reading from kafka
[21:30:08] ottomata: not when inputing messages into it?
[21:30:19] i think so, otherwise it would be serializing, right?
[21:30:34] mmmm
[21:30:39] Rest Proxy takes the JSON representation of Avro as input from produce requests
[21:31:20] ottomata: ok so http payload with avro json->serialization-> kafka storage then
[21:31:55] yeah
[21:32:17] (PS2) Mforns: Add script to calculate kanban productivity metrics [analytics/limn-analytics-data] - https://gerrit.wikimedia.org/r/235470 (https://phabricator.wikimedia.org/T108209)
[21:32:44] (CR) Milimetric: [C: 2 V: 2] Add script to calculate kanban productivity metrics [analytics/limn-analytics-data] - https://gerrit.wikimedia.org/r/235470 (https://phabricator.wikimedia.org/T108209) (owner: Mforns)
[21:33:01] thx!
[21:33:02] nuria: also
[21:33:04] https://github.com/confluentinc/schema-registry/blob/master/avro-serializer/src/main/java/io/confluent/kafka/formatter/AvroMessageReader.java#L185
[21:33:15] i think that code is only used from the command line interfaces
[21:33:21] but i betcha there is something else like that somehwere
[21:33:32] that is using avro libs to convert the json rep into avro objet
[21:33:49] ottomata: https://github.com/confluentinc/kafka-rest/blob/master/src/main/java/io/confluent/kafkarest/converters/AvroConverter.java
[21:33:50] hmm, nuria i could be wrong about the deserilaze vs serialize
[21:33:59] ah!
[21:34:00] :)
[21:34:07] this may be?
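[Editor's note: wherever the check lives in the Confluent classes above, conceptually it is Avro datum validation — the incoming JSON representation either converts to a schema-conforming datum or the produce request fails. A minimal sketch with the `avro` Python package; the schema is invented, and `avro.io.validate` is that library's datum-vs-schema check.]

```python
import json
import avro.io
import avro.schema

schema = avro.schema.parse(json.dumps({  # `Parse` in some library versions
    "type": "record",
    "name": "Event",                     # hypothetical schema
    "fields": [{"name": "action", "type": "string"}],
}))

# True only when the datum conforms to the schema -- effectively what a
# produce-side serializer has to establish before writing binary Avro.
print(avro.io.validate(schema, {"action": "save"}))  # True
print(avro.io.validate(schema, {"action": 42}))      # False: wrong type
print(avro.io.validate(schema, {"verb": "save"}))    # False: missing field
```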
[21:34:56] but ... if you can consume/produce through rest endpoint in avro-json directly...
[21:35:06] ottomata: if you can consume/produce through rest endpoint in avro-json directly...
[21:35:13] ottomata: isn't it all good?
[21:35:45] nuria: writing the avro json representation is not super simple
[21:35:46] nuria: no, because we think the consume part isn't very useufl
[21:35:54] and yeah, it is pretty annoying to get it right
[21:36:04] rest consumption isn't how most clients will use this
[21:36:20] that basically requires polling for new data
[21:36:31] issueing REST consume requset in some for loop or something
[21:36:41] ottomata, madhuvishy .. ok i see for consumption
[21:36:46] gabriel's big use case is a new job queue
[21:36:50] woudl suck to have to implement a job queue that way
[21:36:56] ottomata, madhuvishy but for format , avro seems ok, if strict
[21:36:59] the job queue should just consume a stream of events
[21:37:15] nuria: yes, i like avro in general, it has many features that make it better than JSON schema
[21:38:44] ottomata, madhuvishy: then, ideally we would like consumers to consume/produce in text format avro json if possible? is that right?
[21:38:50] but i think we want this system to support both. unless you can convince everyone that avro is cool
[21:38:54] nuria: +1. I think it would be hard to get wide adoption - especially because people have existing systems in json, like if we were to move Eventlogging to this new system - we'd have to convince people to port 100+ schemas
[21:39:27] madhuvishy: I think we can auto-convert the schemas to avro schemas
[21:39:31] they're very similar
[21:40:22] madhuvishy: if we port it the ability of registering schemas and validating them w/o our own code
[21:40:39] madhuvishy: i think will make up for the "recreation" of schemas
[21:40:57] madhuvishy: but not sure 100%
[21:41:39] milimetric: all clients woul ahve to change too
[21:41:43] the format of the data woudl be different
[21:41:45] nuria: yeah, I like avro and I'm up for going that direction - but getting everyone to use it would be a challenge
[21:42:02] madhuvishy, ottomata: i think is worth trying to convince people that avro is cool not worrying about EL for now
[21:42:13] nuria: I'm with you, good luck!
[21:42:23] ottomata: hahahahja
[21:42:26] :D
[21:42:28] we should have waited for you til our arch meeting :)
[21:42:33] ottomata: YES
[21:42:37] we folded
[21:42:39] ottomata: you are the good cop
[21:42:53] its ok, no decisions really made yet
[21:43:00] we just formed a preliminary plan of action
[21:43:02] ottomata: you convince people and i will be the bad cop
[21:43:04] gonna test some things out
[21:43:29] i'm actually kind of partial to the idea of adapting eventlogging to have a rest service
[21:43:45] or, at least trying it
[21:45:47] ottomata: what are the downsides to doing that
[21:46:06] ottomata: ok, i am not going to do anything else now, just looked and python avro and it is super easy to use
[21:46:31] ottomata: rest service to replace the varnish endpoint?
[21:47:03] nuria: no, it would allow for internal uses of the service
[21:47:12] allow people to produce validated events to kafka
[21:48:41] ottomata: to replace the php side then
[21:48:46] ottomata: is that so?
[21:49:01] php side?
[21:49:13] ah, yessss and other clients to
[21:49:25] but, yeah, to remove the server-side-forwarder for sure
[21:49:39] that is a big ol spof.
[21:50:06] but that is just the MW eventlogging use
[21:50:10] there will be other clients producing too
[22:03:42] hey milimetric what simple REST python lib would you use, if you were starting something new?
[22:04:13] flask?
[22:04:16] serving REST?
[22:04:29] right, um, flask sure
[22:04:36] there are lighter weight things but flask is too popular
[22:05:54] if it had to be crazy performant, I'd probably look at tornado and a few others
[22:07:08] hmmmm
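[Editor's note: a toy sketch of the kind of Flask service floated above — accept a JSON event over REST, validate it against a named schema, then hand it to a Kafka producer. Every name here (route, schema registry, validation rule) is invented glue for illustration, not EventLogging's actual design.]

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

SCHEMAS = {"Edit": {"required": ["action", "page"]}}  # invented registry

def validates(event, schema):
    # Stand-in for real Avro / JSON Schema validation.
    return isinstance(event, dict) and all(f in event for f in schema["required"])

@app.route("/event/<schema_name>", methods=["POST"])
def produce(schema_name):
    schema = SCHEMAS.get(schema_name)
    if schema is None:
        return jsonify(error="unknown schema"), 404
    event = request.get_json(silent=True)
    if not validates(event, schema):
        return jsonify(error="event does not validate"), 400
    # A real service would produce the validated event to Kafka here.
    return jsonify(status="accepted"), 201

if __name__ == "__main__":
    app.run(port=8085)  # arbitrary port
```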
[22:25:27] ottomata: do you know how i can log errors from a spark job
[22:28:14] madhuvishy: off the top of my head no. there's got to be some way though.
[22:28:25] hmm it says it supports log4j
[22:28:34] wondering where the logs would show up
[22:28:34] you could maybe make a dumb tcp socket and send data to it?
[22:28:35] :p
[22:28:50] ottomata: lol, that's what my job already does :P
[22:50:59] Analytics-Backlog: Create white list for pageview data {hawk} [8 pts] - https://phabricator.wikimedia.org/T110061#1599943 (Nuria) Describing a bit more in detail what this ticket is about: In the wmf database in hive there are two tables that store pageviews: webrequest and pageview_hourly. The webrequest t...
[22:56:10] madhuvishy: the tests in EL master are failing cause they are trying to run the tests on pykafka, i think i can fix it by changing the tox.ini config
[22:56:42] nuria: that'd be great! thanks
[22:56:58] i'm stuck with some stupid oozie problem :/
[22:57:27] madhuvishy: ya oozie problems are slooow to resolve cause it takes forever to re-run jobs sometimes
[22:57:32] let me know if i can help
[22:59:45] nuria: sure, it's something super vague - there is no proper error message anywhere
[23:00:25] madhuvishy: not even on the hadoop node?
[23:00:43] it says something like, class not found or something
[23:01:02] i've run it through spark-submit, and it all runs fine
[23:01:45] mmm...
[23:01:48] nuria: weirdly, oozie does launch the job
[23:01:57] it runs fine
[23:02:04] so if you schedule the job using oozie +spark it no works
[23:02:07] and then restarts and starts running again
[23:02:17] Yupp
[23:02:31] but if you run it yourself (as madhu) it finds the needed class
[23:02:36] is that so?
[23:03:37] nuria: yeah
[23:03:46] it doesn't look like a class not found thing at all
[23:03:52] can i see the error?
[23:04:03] sure one sec
[23:04:09] a-team bye! see you tomorrow :]
[23:04:32] nuria: it's different now though - it just keeps restarting
[23:04:53] mforns: ciaooo
[23:05:00] bye mforns!
[23:05:12] madhuvishy: does it eventually stop restarting and errors?
[23:05:24] bye
[23:05:31] nuria: hmmm, not sure, i killed it. let me leave it to run
[23:05:52] nuria: okay, this thing is one such - https://yarn.wikimedia.org/cluster/app/application_1434651818028_154813
[23:05:57] it's on the 3rd attempt
[23:06:52] nuria: https://yarn.wikimedia.org/proxy/application_1434651818028_154813/ you can see the spark stuff here
[23:08:28] madhuvishy: are you telling oozie to execute it every minute? that is why it keeps restarting maybe?
[23:08:47] nuria: Nope! It's scheduled every hour
[23:09:12] https://gerrit.wikimedia.org/r/#/c/235519/
[23:11:02] madhu: looking
[23:16:17] madhu: brb
[23:23:48] madhu: and it reruns the job for the same time over & over?
[23:26:32] nuria: yeah
[23:26:44] i think so
[23:26:49] testing again
[23:28:10] madhuvishy: for that to happen (if frequency is correct like it looks it is) oozie is not writing to itself that it did run the job for that hour so it must be erroing and exiting right away
[23:28:34] nuria: hmmm, ya possibly
[23:32:59] madhuvishy: did you try the verbose flag?
[23:33:09] in oozie?
[23:33:23] oh spark
[23:33:47] problem is i can't recover the original spark logs
[23:34:53] nuria: ^
[23:35:14] when i run through oozie atleast
[23:35:15] madhuvishy: no in oozie
[23:35:30] madhuvishy: oozie job -v
[23:35:34] hmmm
[23:36:01] madhuvishy: it might not help you much but just to rule it out
[23:36:56] nuria: with oozie job -run, i add -v?
[23:37:01] yes
[23:38:33] i am going for a short run will be back in 1 hour
[23:38:37] nuria: it says invalid flag
[23:38:41] let me know if the -v works
[23:39:00] no.. it's okay - i'm gonna head to the gym and home in a bit too
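[Editor's note, on madhuvishy's 22:25 question about logging errors from a Spark job: Spark itself logs through log4j, so anything written to a log4j logger from the driver ends up in the YARN container logs, retrievable afterwards with `yarn logs -applicationId <app id>`. The job in question is Scala, where `org.apache.log4j.LogManager.getLogger` does this directly; below is the equivalent PySpark sketch, reaching the JVM-side logger through the py4j gateway. The app and logger names are invented.]

```python
from pyspark import SparkContext

sc = SparkContext(appName="restbase-metrics-sketch")  # invented name

# JVM-side log4j logger; its output lands in the container's logs, which
# `yarn logs -applicationId <app id>` can pull back after the run.
log4j = sc._jvm.org.apache.log4j
logger = log4j.LogManager.getLogger("restbase-metrics-sketch")

logger.info("starting metrics computation")
try:
    result = sc.parallelize(range(10)).sum()
    logger.info("computed result: {}".format(result))
except Exception as e:
    logger.error("job failed: {}".format(e))
    raise
```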