[00:03:34] madhuvishy: fyi, if you think gobblin would be easier than camus for this cirrussearch avro problem, we can use it. [00:04:05] mforns: sure, i think you can deploy it now and add schema later, no? [00:04:14] ottomata, OK [00:04:21] removed the WIP flag [00:04:31] i think adding more data to the thing will just make it nicer to use [00:04:39] but not a blocker for getting it into logstash [00:13:34] ottomata, will you merge it then? :D [00:13:48] maybe not now.. maybe tomorrow [00:14:10] I'm leaving now [00:28:03] bye a-team! see you tomorrow! [00:58:26] Analytics-EventLogging, Analytics-Kanban: {stag} EventLogging on Kafka - https://phabricator.wikimedia.org/T102225#1691919 (madhuvishy) Yay!!! :D [01:21:25] bye all! [01:21:39] madhuvishy: nuria, let's chat tomorrow about this camus avro thing. i will work with you on it [01:26:21] Analytics, MediaWiki-API: api.log does not indicate errors and exceptions - https://phabricator.wikimedia.org/T113672#1691943 (Tgr) @Spage if you think error codes should be logged please update T102079 so there is a single canonical description of what needs to be logged. [01:33:25] Analytics-Backlog, Developer-Relations, MediaWiki-API, Research consulting, and 3 others: Metrics about the use of the Wikimedia web APIs - https://phabricator.wikimedia.org/T102079#1691952 (Tgr) I think this is now pretty close to actually happening. Could someone from {#DevRel} update the "Metric... [01:58:14] Analytics-Kanban: {flea} Self-serve Analysis - https://phabricator.wikimedia.org/T107955#1691979 (Krenair) [02:02:09] Analytics-Backlog, Analytics-EventLogging, MediaWiki-extensions-CentralNotice, Traffic, operations: Eventlogging should transparently split large event payloads - https://phabricator.wikimedia.org/T114078#1691980 (Tgr) >>! In T114078#1689560, @BBlack wrote: > Issues with uneccessary data within... 
[08:39:40] !log Deploying refinery on stat1002 [08:41:26] !log refinery deployed on stat1002 [08:41:44] !log oozie refine bundle paused [08:41:55] !log oozie load job killed [08:45:06] !log oozie load bundle restarted with new configuration [08:46:09] Analytics-Backlog, Developer-Relations, MediaWiki-API, Research consulting, and 3 others: Metrics about the use of the Wikimedia web APIs - https://phabricator.wikimedia.org/T102079#1692335 (Qgil) [08:51:45] Analytics-Backlog, Developer-Relations, MediaWiki-API, Research consulting, and 3 others: Metrics about the use of the Wikimedia web APIs - https://phabricator.wikimedia.org/T102079#1692344 (Qgil) I tried, check the description. I have added the reasoning on each item, in order to help others help... [08:55:26] Analytics-Tech-community-metrics, DevRel-October-2015: Correct affiliation for code review contributors of the past 30 days - https://phabricator.wikimedia.org/T112527#1692369 (Dicortazar) @Aklapper, according to the process previously defined, this is a task that can be easily done by you or @Qgil. The... [08:59:46] !log resuming oozie refine bundle [10:39:48] (PS1) Addshore: Add admin & crat count metrics [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/242845 [10:47:12] (PS2) Addshore: Add admin & crat count metrics [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/242845 [10:50:30] (CR) Addshore: [C: 2 V: 2] Add admin & crat count metrics [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/242845 (owner: Addshore) [11:17:09] (PS1) Addshore: Fix duplication in getclaims_property_use generation [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/242849 [11:17:30] (CR) Addshore: [C: 2 V: 2] Fix duplication in getclaims_property_use generation [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/242849 (owner: Addshore) [11:23:34] joal, hi! [12:36:50] o/ joal [12:37:13] looks like my last run of the JsonRevisionsSortedPerPage failed. 
[12:37:13] job_1442877556644_0009 [12:37:18] I'll go file a task [12:38:16] Analytics-Backlog: JsonRevisionsSortedPerPage failed on enwiki-20150901-pages-meta-history - https://phabricator.wikimedia.org/T114359#1693022 (Halfak) NEW [12:40:57] Analytics-Backlog: JsonRevisionsSortedPerPage failed on enwiki-20150901-pages-meta-history - https://phabricator.wikimedia.org/T114359#1693031 (Halfak) [12:41:11] Analytics-Backlog: JsonRevisionsSortedPerPage failed on enwiki-20150901-pages-meta-history - https://phabricator.wikimedia.org/T114359#1693022 (Halfak) Looks like we were running out of memory in the reducer. The job took nearly 48 hours to arrive at this failed state. ``` Job Name: org.wikimedia.wikihad... [12:41:39] * halfak gets on bike to head to the university [13:48:30] Analytics-Backlog: JsonRevisionsSortedPerPage failed on enwiki-20150901-pages-meta-history - https://phabricator.wikimedia.org/T114359#1693218 (Halfak) Here's the command I ran: ``` hadoop jar ~/jars/wikihadoop-0.2.jar \ org.wikimedia.wikihadoop.job.JsonRevisionsSortedPerPage \ -i /user/halfak/streaming/... [14:01:42] Analytics-Tech-community-metrics, DevRel-October-2015: Correct affiliation for code review contributors of the past 30 days - https://phabricator.wikimedia.org/T112527#1693283 (Aklapper) >>! In T112527#1692369, @Dicortazar wrote: > Do you want to give it a try? @Dicortazar: I'd like to. But I'm afraid I... [14:02:02] Analytics-Tech-community-metrics, DevRel-October-2015: Correct affiliation for code review contributors of the past 30 days - https://phabricator.wikimedia.org/T112527#1693284 (Aklapper) a:Dicortazar>Aklapper [14:16:04] Analytics-Tech-community-metrics, DevRel-October-2015, DevRel-September-2015: Automated generation of (Git) repositories for Korma - https://phabricator.wikimedia.org/T110678#1693338 (Aklapper) Thank you a lot! So as the //technical infrastructure// seems to be in place to support automated generation... 
[14:31:57] Analytics-Tech-community-metrics, Developer-Relations, DevRel-September-2015: Check whether it is true that we have lost 40% of (Git) code contributors in the past 12 months - https://phabricator.wikimedia.org/T103292#1693361 (Aklapper) **: It's complicated.** One reason could be WMF teams movin... [14:37:52] joal, yt? [14:46:48] holaaa [14:46:54] heeey! [14:47:03] good morning [14:55:54] Analytics-Backlog: Flag in x-analytics in varnish any request that comes with no cookies whatsoever - https://phabricator.wikimedia.org/T114370#1693384 (Aklapper) [15:01:27] joal: do you know how we get uri_path, uri_host, and uri_query all split up in the wmf_raw.webrequest table? [15:02:53] or anyone else, nuria ^? [15:03:07] yes milimetric [15:03:44] Hi a-team ! [15:03:50] hi :) [15:03:52] hi! [15:04:09] Mannnnnn, I'm sorry, apointme [15:04:15] ahem let me see, how does it get from varnish to kafka into the raw request?, right? [15:04:20] nt at lunch took way loner than expected [15:04:52] Also, I'll have to take care of Lino cause Melissa is not here --> I'll count this day as off for me :) [15:04:52] nuria: right [15:05:19] joal: good luck, we'll leave you alone to smile at your baby :) [15:05:27] :D [15:05:33] joal, oh, OK, np here [15:06:54] milimetric: let me see, i think we log that from varnish into varnish kafka as varnish alredy has that data [15:06:57] *already [15:08:43] milimetric: let's see if this makes sense: https://github.com/wikimedia/varnishkafka/blob/master/varnishkafka.conf.example [15:08:46] k, cool, thx, we're trying to "manually refine" in case you're curious: https://gist.github.com/milimetric/d18d60d48240107768c3 [15:08:52] no that makes sense nuria [15:09:06] I think varnish logs those (just like it does for EL) [15:09:26] milimetric: and varnishkafka reads the crazy varnishlog and publishes to teh topic [15:09:32] so it turns out you can fairly simply use sampled logs and call UDFs except for the pageview one which needs this split up (in 
the sampled logs it was just hostname / uri) [15:09:39] milimetric: so if you read the topic they are alredy split [15:09:41] nuria: makes sense, yea [15:09:42] *already [15:10:39] milimetric: I did that already, can send you the piece of code [15:11:04] joal: day off rememeber? [15:11:09] *remember [15:13:11] Sou:D [15:14:43] milimetric: to do that sampledlog data needs to be into a table right? [15:16:22] nuria, milimetric : https://gist.github.com/jobar/f694b0c2745861608dd4 [15:16:48] Then almost off, will be there at the beginning of standup telling Kevin [15:16:57] halfak: Sorry mate [15:17:24] Analytics-Backlog, Developer-Relations, MediaWiki-API, Research consulting, and 4 others: Metrics about the use of the Wikimedia web APIs - https://phabricator.wikimedia.org/T102079#1693478 (bd808) >>! In T102079#1692344, @Qgil wrote: > I tried, check the description. I have added the reasoning on... [15:19:39] Analytics-Backlog, Developer-Relations, MediaWiki-API, Research consulting, and 4 others: Metrics about the use of the Wikimedia web APIs - https://phabricator.wikimedia.org/T102079#1693490 (bd808) a:Qgil>bd808 @tnegrin has suggested that I act as product manager for this epic to help coordin... [15:20:34] Analytics-Backlog, Developer-Relations, MediaWiki-API, Reading-Infrastructure-Team, and 5 others: Metrics about the use of the Wikimedia web APIs - https://phabricator.wikimedia.org/T102079#1693497 (bd808) [15:41:46] joal: following up on T108925 - could we do sync-up here in about 2h? [15:42:17] HaeB, joal is taking the day off today [15:44:00] https://phabricator.wikimedia.org/T102225 [15:45:31] i see (scrolling up) .. thanks mforns [15:46:20] ...that timing is a bit unfortunate as we have to generate some quarterly review numbers today, but let's see [15:54:30] (PS1) Nuria: [WIP] Changes to camus for avro [analytics/camus] - https://gerrit.wikimedia.org/r/242907 [16:02:40] nuria: interview? 
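The split discussed above — varnish already logs the URL components, so uri_host, uri_path, and uri_query arrive pre-separated in the topic that varnishkafka produces — can be illustrated with a small sketch. Python is used purely for illustration (varnishkafka is C and emits these as separate log fields); the function name and the exact formatting of uri_query are assumptions, with field names mirroring the wmf_raw.webrequest columns:

```python
from urllib.parse import urlsplit

def split_request_url(url):
    """Split a full request URL into the three components that
    wmf_raw.webrequest exposes as uri_host, uri_path and uri_query.
    Hypothetical illustration only, not varnishkafka code."""
    parts = urlsplit(url)
    return {
        "uri_host": parts.netloc,
        "uri_path": parts.path,
        # The leading '?' is kept here for readability; whether the
        # table stores it is an assumption of this sketch.
        "uri_query": "?" + parts.query if parts.query else "",
    }

print(split_request_url("https://en.wikipedia.org/wiki/Main_Page?action=raw"))
```

This is why the "manual refine" of sampled logs needed extra work: the sampled logs carried only hostname and full URI, while the Kafka topic already carries the three pieces separately.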
[16:02:52] send jarret a note to start 10 mins later [16:02:54] sorry a-team I slept off during standup [16:02:58] is he there? [16:03:01] aah he's in the meeting [16:03:28] madhuvishy: need to postpone 10 mins, please ask him if he got my e-mail [16:04:26] nuria: apparently not [16:05:30] its okay I told him [16:05:55] madhuvishy: ok [16:10:46] Analytics-Wikistats: Feed Wikistats traffic reports with aggregated hive data - https://phabricator.wikimedia.org/T114379#1693683 (ezachte) NEW a:kevinator [16:10:57] madhuvishy: omw [16:13:11] !log test [16:13:25] hrm. [16:15:13] ottomata, https://logstash-beta.wmflabs.org/#/dashboard/elasticsearch/default [16:15:34] Analytics-Backlog: Notify potential users of the UDP sampled logs that we're preparing to purge them - https://phabricator.wikimedia.org/T114380#1693697 (Milimetric) NEW [16:16:28] Analytics-Backlog: Clean up and possibly refine UDP sampled logs (which go back to 2014) - https://phabricator.wikimedia.org/T114381#1693712 (Milimetric) [16:17:40] * valhallasw`cloud prods analytics-logbot [16:18:50] valhallasw`cloud: our logs are picked up by stashbot - not sure what analytics-logbot is [16:19:15] Analytics-Backlog, Documentation: Clean up and possibly refine UDP sampled logs (which go back to 2014) - https://phabricator.wikimedia.org/T114381#1693720 (Krenair) [16:19:34] madhuvishy: it's the bot that edits wikitech [16:19:54] but if that's unnecessary for analytics, I'll happily turn it off :-) [16:20:13] aah, edit wikitech how? 
[16:20:55] valhallasw`cloud: I don't mind, more the merrier :) [16:21:03] madhuvishy: it edits https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [16:21:18] oh cool [16:27:04] !log testing again [16:27:06] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log, Master [16:33:08] ottomata: I found the schema repo for avro they are always talking about - https://github.com/schema-repo/schema-repo [16:33:42] Analytics-Wikistats: Feed Wikistats traffic reports with aggregated hive data - https://phabricator.wikimedia.org/T114379#1693798 (ezachte) [16:34:14] madhuvishy: hm! interesting [16:34:58] madhuvishy: i'm worried about this whole thing! [16:35:11] in order for the binary messages to be consumed by camus [16:35:16] they have to have a schema id attached with them [16:35:27] oh? [16:35:38] at least with, the kafkavromessagedecoder [16:35:39] yeah [16:35:41] aah [16:35:42] um.. [16:35:44] that's why [16:35:53] yeah i was asking yesterday [16:36:02] how does it know which schema to validate against [16:36:10] https://github.com/wikimedia/camus/blob/master/camus-kafka-coders/src/main/java/com/linkedin/camus/etl/kafka/coders/KafkaAvroMessageDecoder.java#L95 [16:36:23] ottomata: hmmm [16:36:24] it expects the kafka message to be something like [16:36:37] 123450x0 [16:36:45] where 12345 is the schema id [16:36:51] mm hmmm [16:36:54] so it actually reads a binary int off the top of the message [16:37:07] (not sure if that is the exact format, but ja) [16:37:32] hmmm, wonder if we can change that [16:37:35] madhuvishy: been thinking about this, want to talk to nuria too, but i'm leaning towards going back to erik and asking him to still use avro schema, but to produce the json encoding [16:37:39] sure, you can implement your own decoder [16:37:46] but, you'd have to hardcode the schema i think [16:37:51] ottomata: okay we are at the interview [16:37:54] OH [16:37:54] lets talk after [16:37:56] pay attention! [16:37:56] yes. 
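The framing ottomata describes — a schema id read as a binary int off the top of each Kafka message before the Avro payload — is the crux of why plain binary Avro can't be fed to KafkaAvroMessageDecoder as-is. A toy sketch of that framing (the exact layout, one magic byte followed by a big-endian int32 id, is an assumption modeled on the common Avro-over-Kafka convention; ottomata himself notes he's not sure of the exact format):

```python
import struct

MAGIC_BYTE = 0x00  # assumed framing: 1 magic byte, then a 4-byte schema id

def frame_message(schema_id, avro_payload):
    """Prepend the schema-id header that a Camus-style decoder
    expects to read off the top of each message."""
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + avro_payload

def unframe_message(message):
    """Split a framed message back into (schema_id, payload),
    as the decoder must do before it can pick a schema."""
    magic, schema_id = struct.unpack(">bI", message[:5])
    if magic != MAGIC_BYTE:
        raise ValueError("not a schema-id framed message")
    return schema_id, message[5:]

framed = frame_message(12345, b"\x02\x04")
print(unframe_message(framed))  # (12345, b'\x02\x04')
```

A producer that doesn't prepend this header (as plain Avro binary serialization doesn't) yields messages the stock decoder will misread, which is exactly the worry raised here.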
[16:37:59] yes yes [16:39:15] :) [16:42:21] ottomata: on interview, can talk in 30 mins [16:43:48] aye [16:49:02] madhuvishy: can you answer questions? [16:49:08] yeah [16:51:14] nuria, milimetric: quick question about https://phabricator.wikimedia.org/T110702 - i'm a bit confused: do app link previews count as regular pageviews right now (until this is deployed), or not? [16:52:48] HaeB: on interview will answer in a bit [16:56:47] nuria: batcave? [16:56:57] madhuvishy: HaeB will be online in a bit again [16:57:11] okay [16:57:12] cool thanks! [17:10:38] HaeB: I think they do count as pageviews now. I'm running a small query to confirm [17:12:40] madhuvishy: and after the change, they won't be counted as pageviews, right? [17:13:03] (looking at the abandoned patch: https://gerrit.wikimedia.org/r/#/c/237274 "If x-analytics header includes tag preview the request should not be counted as pageview." ) [17:13:33] HaeB: think it's the other way round - indicated by https://wikitech.wikimedia.org/wiki/X-Analytics [17:13:57] sorry this is confusing [17:14:05] the other way round? [17:14:25] oh [17:15:10] yeah, if it's tagged preview, it's not a pageview [17:15:12] I ran, select is_pageview, x_analytics_map["preview"] from webrequest where year=2015 and month=9 and day=30 and x_analytics_map["preview"] is not NULL limit 10; [17:15:15] to check [17:15:18] but no results [17:15:51] Analytics-Backlog, Analytics-EventLogging: Send raw server side events to Kafka using a PHP Kafka Client - https://phabricator.wikimedia.org/T106257#1693967 (kevinator) [17:16:06] ok, but does that mean they are counted as pageviews now? 
[17:16:48] Analytics-EventLogging, Analytics-Kanban: {stag} EventLogging on Kafka - https://phabricator.wikimedia.org/T102225#1693975 (kevinator) [17:16:49] Analytics-Backlog, Analytics-EventLogging: Send raw server side events to Kafka using a PHP Kafka Client - https://phabricator.wikimedia.org/T106257#1463159 (kevinator) [17:16:59] HaeB: No, they shouldn't be [17:17:15] if preview = true, is_pageview should be false [17:17:18] Analytics-Backlog, Analytics-EventLogging: Send raw server side events to Kafka using a PHP Kafka Client - https://phabricator.wikimedia.org/T106257#1463159 (kevinator) removing this as blocking the Stag project. [17:19:21] madhuvishy: i'm still confused - what does "preview" refer to now if not to the X-Analytics header ? [17:19:32] Analytics-Backlog, Research consulting, Research-and-Data: Workshop to teach analysts, etc about Quarry, Hive, Wikimetrics and EL {flea} - https://phabricator.wikimedia.org/T105544#1446198 (kevinator) we could also make this a WMDS talk. [17:19:53] HaeB: the x-analytics header has multiple things in it, preview is one of the fields [17:20:18] preview indicates if a request this is a preview request [17:20:33] i mean how can it be that preview = true if that information is not being sent yet in the X-analytics header [17:21:42] HaeB: aah [17:22:02] well, if it was sent, we wouldn't count it as a pageview, if preview is true [17:22:03] if i tap a link in the android app right now and get a link preview, does that generate a pageview that's counted in (say) pageview_hourly? [17:22:48] Analytics-Backlog: Backfill data for the API from the historic pageview dumps - https://phabricator.wikimedia.org/T108596#1694014 (kevinator) Open>declined a:kevinator we are not going to do this. Once the API is out if there is a strong use case, we can re-open this task. 
[17:22:48] I don't know if the apps changed to add that information when they send requests [17:23:11] Analytics-Backlog: Need a Dashiki namespace so we can protect configs {crow} - https://phabricator.wikimedia.org/T112268#1694017 (kevinator) p:Triage>Normal [17:23:57] Analytics-Backlog: Update passport-mediawiki module URLs and documentation - https://phabricator.wikimedia.org/T113234#1694020 (kevinator) p:High>Normal [17:24:44] HaeB: back [17:24:59] HaeB: to your question [17:25:14] Analytics-Backlog: Re-baselining checkpoints periodically - https://phabricator.wikimedia.org/T112009#1694027 (Milimetric) I'm sorry we don't have the context of the meeting mentioned in the description, can we get more detail? [17:25:27] HaeB: see https://wikitech.wikimedia.org/wiki/X-Analytics [17:25:46] HaeB: if X-Analytics header has preview key [17:25:55] HaeB: request is not counted as a pageview [17:26:15] nuria: but do you know if the apps are sending the preview key yet? [17:26:20] HaeB: as of us deploying these changes [17:27:07] madhuvishy: no, i will need to look at the apps code, i imagine that until they release a new version they will not include those changes [17:27:26] madhuvishy: i have not seen a CR to that fact so i do not think they are sending that value yet [17:27:51] Analytics-Backlog: English Wikipedia stats for 5 millionth article - https://phabricator.wikimedia.org/T113683#1694037 (kevinator) @EdErhart-WMF we thin you have everything you need so we are closing this task. Reopen or log new task if you need anything else. [17:27:59] Analytics-Backlog: English Wikipedia stats for 5 millionth article - https://phabricator.wikimedia.org/T113683#1694039 (kevinator) Open>Resolved a:kevinator [17:28:04] nuria: right, so we don't know if they are sending it - if they were it would work. 
[17:28:23] i checked by running select is_pageview, x_analytics_map["preview"] from webrequest where year=2015 and month=9 and day=30 and x_analytics_map["preview"] is not NULL limit 10; [17:28:27] for a couple days [17:28:30] and no results [17:28:35] so i'm guessing not [17:28:50] madhuvishy: we can know by looking at x-analytics field [17:28:59] madhuvishy: sorry [17:28:59] nuria: ok, so https://phabricator.wikimedia.org/T110702 is still marked as open, but the change (in the PV definition) that it describes is already live? [17:29:03] madhuvishy: correct [17:29:09] (sorry for being dense ;) [17:29:26] HaeB: yes [17:29:52] got it [17:29:59] HaeB: there are two sides to this change : analytics and apps [17:30:29] HaeB: https://phabricator.wikimedia.org/T109383 [17:30:36] HaeB: analytics is completed, apps is wip thus it makes sense ticket stays open, but either way [17:30:56] Analytics-Kanban: Report on zh wikipedia for Zhou - https://phabricator.wikimedia.org/T114190#1694057 (Milimetric) [17:32:29] Analytics-Tech-community-metrics, Possible-Tech-Projects: Improving MediaWikiAnalysis - https://phabricator.wikimedia.org/T89135#1694071 (Anmolkalia) Hi @jgbarah, I went through the MediaWiki API documentation and understood the code in mediawiki_analysis.py. I was able to understand most of what I read.... [17:35:32] Hi HaeB, I want to apologize for not being available today [17:35:33] :S [17:36:42] Analytics-Backlog: Re-baselining checkpoints periodically - https://phabricator.wikimedia.org/T112009#1694081 (ggellerman) @DarTar @JAufrecht: could you please add some context or details for @Milimetric? Thanks! [17:36:43] joal: i saw you have important duties ;) so no worries [17:37:14] HaeB: :) Will you have tomorrow ? [17:37:30] About same time as now ? [17:39:09] ottomata: madhuvishy want to talk about the avro stuff? [17:40:21] nuria: yes! [17:40:29] joal: yes i could do after 11 PDT (18 UTC) tomorrow [17:40:40] madhuvishy: and ottomata ?? holaaa ottomata ! 
[17:41:00] HaeB: I'll be there - I send you an invite to be sure I don't forget :) [17:41:01] hiYaaa [17:42:18] nuria, madhuvishy: ok, i just ran the following (= madhu's query from above without is_pageview), with no results: [17:42:24] select x_analytics_map["preview"] from wmf.webrequest where year=2015 and month=9 and day=30 and x_analytics_map["preview"] is not NULL limit 10; [17:42:41] Yup [17:42:44] just to confirm that the app is not sending that yet [17:42:45] HaeB: right, that is what madhu got too [17:42:58] HaeB: I think she just run it. [17:43:11] yeah i also ran for 29th [17:43:15] i understand madhu was looking for preview requests that were classified as pageviews [17:43:35] [10:15] I ran, select is_pageview, x_analytics_map["preview"] from webrequest where year=2015 and month=9 and day=30 and x_analytics_map["preview"] is not NULL limit 10; [10:15] to check [10:15] but no results [17:43:51] no is_pageview would have been false [17:44:02] nuria: ottomata batcave? [17:44:10] yes [17:44:29] HaeB: she just selected teh is_pageview column, see: https://wikitech.wikimedia.org/wiki/Analytics/Data/Webrequest [17:44:44] Analytics-Backlog: Re-baselining checkpoints periodically - https://phabricator.wikimedia.org/T112009#1694124 (DarTar) The idea @JAufrecht brought up (which I really liked) was to have a SoS for data, i.e. periodic meetings to discuss if baselines are still valid or if any change implemented by individual tea... [17:44:45] HaeB: and you can see what values each column has [17:44:51] *the [17:44:52] yes, i know [17:45:16] i was looking if there are *any* rows with x_analytics_map["preview"] is not NULL [17:45:17] HaeB: ah , ok, did you find the info you needed then? [17:45:18] nuria: in batcave with madhuvishy [17:45:32] isn't that a way to do that? 
[17:45:51] HaeB: both your query and madhuvishy 's do the same thing, so yes, correct [17:45:56] ottomata: omw [17:46:19] oh ok [17:47:15] (i thought the first was testing whether the preview -> is_pageview false logic works correctly, but of course it's not in the "where" clause) [17:47:39] anyway, so i'll assume that previews are still counted as pageviews right now because the app does not sent the preview header yet [17:48:13] HaeB: correct [18:08:09] Analytics-Backlog: Re-baselining checkpoints periodically - https://phabricator.wikimedia.org/T112009#1694210 (Milimetric) I think the idea sounds great, but it doesn't seem too infrastructure-related. Wouldn't research be more interested in this kind of re-baselining? [18:09:42] Analytics-Wikistats: Feed Wikistats traffic reports with aggregated hive data - https://phabricator.wikimedia.org/T114379#1694213 (ezachte) Now as PDF with clickable links: Monthly Pageviews Report ver 0.2, Oct 1, 2015 {F2650964} Traffic Breakdown Reports ver 0.3, Oct 1, 2015 {F2650966} [18:09:50] Analytics-Backlog: English Wikipedia stats for 5 millionth article - https://phabricator.wikimedia.org/T113683#1694214 (EdErhart-WMF) @kevinator, @ezachte, those numbers should work! Thank you very much, everyone! [18:10:23] Analytics-Kanban: {flea} Self-serve Analysis - https://phabricator.wikimedia.org/T107955#1694217 (RobH) [18:25:37] madhuvishy: nuria, fyi, 1pm meeting is fine, my lunch date cancelled on me [18:27:49] ottomata: k [18:27:59] ebernhardson: yt? 
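The logic settled in the exchange above — a request tagged with the preview key in X-Analytics is never counted as a pageview, but the apps aren't sending the tag yet, so previews still count — amounts to something like this sketch. The semicolon-separated key=value header format follows the wikitech X-Analytics page; the helper names and the collapsed "rest of the pageview definition" boolean are assumptions of this sketch, not the real refinery UDF:

```python
def parse_x_analytics(header):
    """Parse an X-Analytics header ("k1=v1;k2=v2") into a dict,
    mirroring the x_analytics_map column queried above."""
    result = {}
    for pair in header.split(";"):
        if "=" in pair:
            key, _, value = pair.partition("=")
            result[key.strip()] = value.strip()
    return result

def counts_as_pageview(x_analytics_header, otherwise_pageview=True):
    """A request tagged with the 'preview' key is never a pageview;
    otherwise defer to the rest of the pageview definition
    (collapsed into a boolean here for the sketch)."""
    tags = parse_x_analytics(x_analytics_header)
    if "preview" in tags:
        return False
    return otherwise_pageview

print(counts_as_pageview("ns=0;preview=1"))  # False: tagged as preview
print(counts_as_pageview("ns=0"))            # True: falls through
```

This also explains the empty query results: with no rows where x_analytics_map["preview"] is non-NULL, the exclusion branch simply never fires yet.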
[18:28:55] ottomata: take a look at : https://github.com/linkedin/camus/blob/master/camus-kafka-coders/src/main/java/com/linkedin/camus/etl/kafka/coders/LatestSchemaKafkaAvroMessageDecoder.java [18:29:02] ebernhardson: is at discovery team offsite this week [18:29:08] looking [18:29:20] ottomata: looks at schema by topic [18:29:24] ottomata: which makes sense [18:29:40] yes, but still [18:29:53] kafka.message.Message.MagicOffset(), message.getPayload().length - kafka.message.Message.MagicOffset() [18:30:41] also, nuria that class does not implement decode() [18:30:48] that is in parent class KafkaAvroMessageDecoder [18:31:17] ahem [18:31:22] which is what OH [18:31:24] doh [18:31:25] uhhh [18:31:27] yes it does :p [18:31:31] sorry [18:31:37] ahhh [18:31:39] ok [18:32:18] nuria: hmmm [18:32:28] i think you can do like they do, extend KafkaAvroMessageDecoder [18:32:38] but make it look up schema from properties config? [18:32:44] maybe something like [18:32:44] ottomata: i think we can use that minus [18:32:49] ottomata: magic offsets [18:32:58] ottomata: right [18:33:06] schema.class = org.mediawiki.search.CirrusRequestWhatever [18:33:15] and then, get an instance of the class from that [18:33:20] and pull the schema from that [18:33:56] ottomata: yes, but the thing is that this: [18:34:06] decoderFactory.jsonDecoder(schema [18:34:19] which is the biggest deal is done [18:35:24] that was the class i was using for madhuvishy & I tests so we'll continue with our hardcoded schema and once we have that going we will move schema lookups out. [18:35:58] ok, nuria why does it use jsonDecoder? not following this [18:36:08] ya me too [18:37:42] ottomata: because you can consume the avro in json (i think they mean text) format [18:37:53] ottomata, madhuvishy makes sense? [18:38:06] hmmm [18:38:22] yes, but is that what this code expects? [18:38:24] what decoder does that use? 
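The idea floated above — skip the message-embedded schema id entirely and instead name the generated Avro class in the Camus properties (schema.class = org.mediawiki.search.CirrusRequestWhatever), then pull the schema off that class — could look roughly like this Python stand-in for the Java decoder. The property name, the SCHEMA attribute, and the class path are all hypothetical:

```python
import importlib

class ConfigSchemaDecoder:
    """Sketch of a decoder that resolves its Avro schema from a
    'schema.class' config property instead of a schema id embedded
    in each message, per the discussion above."""

    def __init__(self, props):
        module_name, _, class_name = props["schema.class"].rpartition(".")
        cls = getattr(importlib.import_module(module_name), class_name)
        # Generated Avro classes are assumed to expose their schema as
        # a SCHEMA attribute (analogous to Java's getClassSchema()).
        self.schema = cls.SCHEMA

    def decode(self, payload):
        # A real decoder would hand self.schema to a json or binary
        # DatumReader here; elided in this sketch.
        raise NotImplementedError
```

The design trade-off matches the thread: convention (one schema per topic, named in config) avoids requiring producers to frame every message, at the cost of losing per-message schema evolution.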
[18:38:45] OH [18:38:47] so [18:38:48] weird [18:38:51] KafkaAvroMessageDecoder [18:38:53] uses binaryDecoder [18:38:57] but [18:38:59] LatestSchemaKafkaAvroMessageDecoder [18:39:02] uses jsonDecoder [18:39:08] Yeah [18:39:23] LatestSchema blah overrides the decode method [18:39:47] nuria: i think if you are implementing this, you'll want to use binary...or maybe make 2 classes! one that uses jsonDecoder and one that uses binaryDecoder [18:39:59] each of which infers schema out of config [18:40:01] rather than out of message [18:40:17] that way we could more easily use camus to import Json or Binary avro [18:40:20] hehe :) [18:40:25] yeah [18:41:03] ottomata: right , i think what is confusing is the terminology [18:41:13] yeah poorly named class [18:41:16] both json and binary avro ARE json [18:41:22] btw, nuria, madhuvishy, if it is easier to use gobblin for this data, we can [18:41:23] binary is just teh transport [18:41:28] since it is new [18:41:31] ottomata: mm hmmm [18:41:48] i can try that - are they feeding any data now? [18:41:50] binary requires a different setup of messages [18:41:58] search [18:42:08] madhuvishy: would goblin make things easier in any way? [18:42:12] no, because the topic is not even setup yet [18:42:20] madhuvishy: cause if it doesn't i rather have less moving parts [18:42:30] nuria: not very sure [18:42:34] madhuvishy: no [18:42:56] madhuvishy: then if there is no clear value of using goblin let's not add one more piece to the cluster [18:42:58] nuria: https://github.com/linkedin/gobblin/wiki/Kafka-HDFS-Ingestion [18:43:04] nuria: not true [18:43:04] binary is not json [18:43:12] Analytics-Wikistats: Feed Wikistats traffic reports with aggregated hive data - https://phabricator.wikimedia.org/T114379#1694343 (Tnegrin) Hi Erik -- I can't seem to bring up the image in the attached task. It says the image isn't viewable. -Toby [18:44:08] ottomata: ahhh, no? 
ok, i take it back [18:44:11] nuria: https://github.com/linkedin/camus/blob/master/camus-kafka-coders/src/main/java/com/linkedin/camus/etl/kafka/coders/KafkaAvroMessageDecoder.java#L120 [18:44:19] return new CamusAvroWrapper(reader.read(null, [18:44:19] decoderFactory.binaryDecoder ... [18:45:26] Also, nuria, note that the LatsetSchema one... does not set timestamp properly [18:45:53] it returns a generic CamusWrapper, constructed using this constructor [18:45:54] public CamusWrapper(R record) { [18:45:54] this(record, System.currentTimeMillis()); [18:45:54] } [18:46:07] ottomata: the latest schema one [18:46:14] refers to decoderFactory [18:46:21] but wth is it really? [18:46:27] its not even instantiated [18:46:29] OH [18:46:46] do i not know java or is it magic [18:47:18] madhuvishy: it is avialable on super class [18:47:25] madhuvishy: ummm, maybe init() is something that is called by the framework [18:47:30] nuria: hmmm [18:47:32] yeah, and init() instantiates it [18:47:43] which, nuria, means to me that you shoudl not extend KafkaAvroMessageDecoder [18:48:50] Oh, nuria check out JSONToAvroMessageDecoder [18:48:52] ottomata: why no? [18:49:04] ottomata: yeah, that looks for a schema id field [18:49:12] ottomata: the init is called by the messagedecoderfactory [18:49:13] yes [18:49:34] ottomata: we can not do that [18:49:45] but we could copy that one and make it get schema class out of config properties [18:49:46] and look for it by name from config [18:49:51] instead of ID [18:50:10] yup, to consume json avro [18:50:36] right well, both, aye, either way, it will be a sublcass of MessageDecoder using techniques from all of these [18:51:43] ottomata: he he so we need two classes to do the right thing [18:52:08] madhuvishy: wait, no, not really [18:52:31] madhuvishy: i think... 
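On the "binary is not json" correction above: Avro's binary encoding is a compact byte format, not text. Its integer encoding — zig-zag mapping followed by a little-endian base-128 varint, straight from the Avro specification — makes the difference concrete, since the JSON encoding of the same value is just its textual digits:

```python
def encode_long(n):
    """Encode an integer per Avro's binary encoding: zig-zag map
    signed -> unsigned (0, -1, 1, -2, ... -> 0, 1, 2, 3, ...),
    then emit it as a little-endian base-128 varint."""
    n = (n << 1) ^ (n >> 63)  # zig-zag
    out = bytearray()
    while (n & ~0x7F) != 0:
        out.append((n & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        n >>= 7
    out.append(n & 0x7F)
    return bytes(out)

print(encode_long(1))   # b'\x02'
print(encode_long(-1))  # b'\x01'
print(encode_long(64))  # b'\x80\x01'
```

So the two Camus decoders really do handle different wire formats: jsonDecoder expects the textual Avro JSON encoding, binaryDecoder expects bytes like the above — hence the plan to support both, one class per encoding.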
[18:52:53] madhuvishy: we just need 1 decoder if our goal is to read JSON and output avro [18:52:56] nuria: one to decode json, and other binary, both looking up schema by class [18:53:06] nuria: true [18:53:13] but i thought we were gonna support both [18:53:16] up to you [18:53:21] nuria: at the momemnt, with what erik is going ot produce [18:53:26] and in the search case, they'd be sending binary [18:53:26] your goal is to read binary avro [18:53:28] parse the timestamp [18:53:32] and then output avro binary [18:53:37] Analytics-Wikistats: Feed Wikistats traffic reports with aggregated hive data - https://phabricator.wikimedia.org/T114379#1694373 (ezachte) Toby, the pdf version? Did you try "Download file"? For earlier png version you should clickable thumbnails. [18:53:43] yup [18:54:25] it could be one class that can handle both, and we just instantiate binary and json decoder based on a config too [18:54:41] madhuvishy, ottomata: nah, better convention than configuration, always [18:54:50] madhuvishy: but let's go 1 by 1 [18:54:55] nuria: okay [18:54:59] madhuvishy: let's work on json to avro case 1st [18:55:07] i can start with the json one so we can do out test [18:55:11] nuria: yup [18:55:37] nuria: ok, but keep in mind that search is producing binary :) [18:55:43] madhuvishy: and once we have that we can move into the binary consumer [18:55:45] aye k [18:55:48] alright [18:55:51] ja probably easier to test [18:55:56] but still, we can ASK them to produce json [18:56:00] i think it will not be hard [18:56:07] for them, i think he already built that support in [18:56:10] ottomata: yup, but shouldn't be hard to suport binary [18:58:08] madhuvishy: ok, let's keep on working on our example from yesterday [18:58:27] nuria: wanna batcave? [18:58:59] madhuvishy: omw [19:13:38] just noticed some inconsistencies in the wmf.mobile_apps_uniques_daily table on hive (see below) - is phabricator the best place to discuss this? 
[19:13:40] hive (default)> SELECT * FROM wmf.mobile_apps_uniques_daily WHERE platform = 'Android' AND year = 2015 AND month = 8 AND day = 3; [19:13:50] mobile_apps_uniques_daily.year mobile_apps_uniques_daily.month mobile_apps_uniques_daily.day mobile_apps_uniques_daily.platform mobile_apps_uniques_daily.unique_count [19:13:56] 2015 8 3 Android 1025697 [19:14:03] 2015 8 3 Android 1000882 [19:14:19] 2015 8 3 Android 1000865 [19:14:32] Time taken: 0.132 seconds, Fetched: 3 row(s) [19:26:52] Analytics, Mobile-Apps: Investigate and fix inconsistent data in mobile_apps_uniques_daily - https://phabricator.wikimedia.org/T114406#1694528 (Tbayer) NEW [19:31:47] oh [19:32:21] nuria: milimetric: hello, I did some change to the CI configuration of some of your repositories. Jenkins now invokes 'tox' [19:32:29] which runs any env you might have defined in tox.ini [19:32:47] I am looking at having analytics/limn-mobile-data to run the test suite now :-} [19:32:49] hashar: thank you, was it for wikimetrics? or eventlogging? [19:32:57] hashar: limn ... ahhh [19:33:15] hashar: ok, that makes more sense as i think the other ones you handled a while back [19:34:40] all of them I guess [19:36:59] I will probably send a few patchsets [19:40:35] (PS1) Hashar: Fix tox to be able to run tests [analytics/limn-mobile-data] - https://gerrit.wikimedia.org/r/242966 [19:43:12] (CR) Hashar: "CI currently only run the 'flake8' environment. With https://gerrit.wikimedia.org/r/242977 it will instead run 'tox', and thus run the 'py" [analytics/limn-mobile-data] - https://gerrit.wikimedia.org/r/242966 (owner: Hashar) [19:43:23] nuria: stuff like above :-} [19:43:45] hashar: nice! 
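The inconsistency pasted above — three rows for the same (year, month, day, platform) partition, each with a different unique_count — is the kind of thing a quick duplicate-key check surfaces. A toy sketch over the pasted example (the function and data layout are illustrative, not the table's actual schema handling):

```python
from collections import defaultdict

def find_duplicate_keys(rows):
    """Group rows by their partition key and report any key that
    appears more than once; each (year, month, day, platform)
    should map to exactly one unique_count."""
    groups = defaultdict(list)
    for year, month, day, platform, unique_count in rows:
        groups[(year, month, day, platform)].append(unique_count)
    return {key: counts for key, counts in groups.items() if len(counts) > 1}

# The three conflicting rows from the query pasted above.
rows = [
    (2015, 8, 3, "Android", 1025697),
    (2015, 8, 3, "Android", 1000882),
    (2015, 8, 3, "Android", 1000865),
]
print(find_duplicate_keys(rows))
```

A likely culprit for this pattern (repeated backfills appending rather than overwriting a partition) is worth checking on the Hive side, though the log itself doesn't confirm the cause — hence the Phabricator task filed just below.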
thank you [19:48:20] (PS2) Nuria: [WIP] Changes to camus for avro testing in 1002 [analytics/camus] - https://gerrit.wikimedia.org/r/242907 [20:03:12] (PS1) Hashar: Fix up tox setup and setup.py parse_requirements() [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/243012 [20:03:34] nuria: analytics/wikimetrics I talked about it with qchris a while back. I think I even filled a bug. The problem is that it needs a redis backend :D [20:03:52] maybe the test suite / celery could start one [20:47:30] ottomata: for later, if we are gonna write avro compatible json schemas, why do we need json schema? [20:47:43] Analytics-Kanban: {kudu} Wikimetrics for IPL - https://phabricator.wikimedia.org/T114423#1694971 (kevinator) NEW [21:09:35] Analytics, operations: removed user handrade from access - https://phabricator.wikimedia.org/T114427#1695116 (RobH) NEW [21:11:52] Analytics, operations: removed user handrade from access - https://phabricator.wikimedia.org/T114427#1695134 (RobH) a:Ottomata I'm going with the assumption that I should refer all this analytics to @ottomata for his review or recommendation. Andrew: Please review the above. I'm not sure if you guys... [21:16:05] Analytics, operations: removed user handrade from access - https://phabricator.wikimedia.org/T114427#1695162 (Ottomata) Uhhhh, I would say that I don't have much info on who accesses these systems. Many people ask for access, managers grant permission, and then opsen give access as part of triage duty.... 
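[Editor's note: Jenkins invoking plain `tox` means CI runs every environment the repository's tox.ini declares, not just flake8. A minimal sketch of what such a file might look like follows; these are hypothetical contents for illustration — the actual limn-mobile-data and wikimetrics changes are in the Gerrit patches linked above.]

```ini
; hypothetical tox.ini sketch: with bare `tox`, every env in envlist runs in CI
[tox]
envlist = flake8, py27

[testenv:flake8]
deps = flake8
commands = flake8

[testenv:py27]
deps = -rrequirements.txt
commands = nosetests
```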
[21:18:42] Analytics, operations: removed user handrade from access - https://phabricator.wikimedia.org/T114427#1695191 (RobH) [21:36:32] Analytics-Backlog, Mobile-Apps: Investigate and fix inconsistent data in mobile_apps_uniques_daily - https://phabricator.wikimedia.org/T114406#1695263 (kevinator) [21:36:55] Analytics-Backlog, Mobile-Apps: Investigate and fix inconsistent data in mobile_apps_uniques_daily - https://phabricator.wikimedia.org/T114406#1695266 (kevinator) p:Triage>High [21:37:03] ottomata, yt? [21:41:12] ottomata, I wanted to ask you a couple things about the quarterly review presentation [21:41:19] if you have 10 mins, please ping me :] [21:44:08] mforns: yes [21:44:08] here [21:44:11] batcave? [21:44:21] yes! [21:44:43] omw [21:49:28] Analytics-Tech-community-metrics, Possible-Tech-Projects: Improving MediaWikiAnalysis - https://phabricator.wikimedia.org/T89135#1695323 (jgbarah) [21:56:49] Analytics-Tech-community-metrics, Possible-Tech-Projects: Port MediaWikiAnalysis to SQLAlchemy - https://phabricator.wikimedia.org/T114437#1695368 (jgbarah) NEW [21:58:31] Analytics-Tech-community-metrics, Possible-Tech-Projects: Improving MediaWikiAnalysis - https://phabricator.wikimedia.org/T89135#1695384 (jgbarah) [22:00:32] Analytics-Tech-community-metrics, Possible-Tech-Projects: Improve performance - https://phabricator.wikimedia.org/T114439#1695392 (jgbarah) NEW [22:01:05] Analytics-Tech-community-metrics, Possible-Tech-Projects: Improving MediaWikiAnalysis - https://phabricator.wikimedia.org/T89135#1695401 (jgbarah) [22:02:39] Analytics-Tech-community-metrics, Possible-Tech-Projects: Implement some missing information from the MediaWiki API - https://phabricator.wikimedia.org/T114440#1695405 (jgbarah) NEW [22:02:51] nuria: I think I know what happened with our example [22:02:57] Analytics-Tech-community-metrics, Possible-Tech-Projects: Improving MediaWikiAnalysis - https://phabricator.wikimedia.org/T89135#1027974 (jgbarah) [22:03:02] (I know you're busy 
so read later) [22:03:27] Analytics-Tech-community-metrics, Possible-Tech-Projects: Improving MediaWikiAnalysis - https://phabricator.wikimedia.org/T89135#1695424 (jgbarah) >>! In T89135#1689285, @Qgil wrote: > Thank you @jgbarah! Please update the description and propose some microtasks. When this is done, we will promote this p... [22:03:30] nuria: So the dummySchemaRegistry gets built with camus-example and not camus-wmf - we din't replace the example jar [22:03:39] hence the Schema not found [22:04:18] ottomata, btw, I can't find the charts of the EL load test you did yesterday [22:04:26] were they on a Phab task? [22:04:58] ja the parent stag ticekt [22:05:00] ticket [22:05:09] oh! ok, thx [22:05:12] https://phabricator.wikimedia.org/T102225 [22:05:26] :] [22:06:49] Analytics-Tech-community-metrics, Possible-Tech-Projects: Improving MediaWikiAnalysis - https://phabricator.wikimedia.org/T89135#1695434 (jgbarah) >>! In T89135#1694071, @Anmolkalia wrote: > Also, as for organizing the data in a database, I noticed that the code creates a relational database containing t... [22:16:08] ottomata, another question: now with EL through Kafka, the events flow until Kafka [22:16:19] but do they get written in HDFS? [22:16:21] Analytics, Discovery, EventBus, MediaWiki-General-or-Unknown, and 6 others: EventBus MVP - https://phabricator.wikimedia.org/T114443#1695470 (Ottomata) NEW [22:16:43] oh! they do, of course [22:16:55] or else, no purging would be needed [22:17:00] never mind [22:17:06] mforns: :) [22:17:16] madhuvishy, :] hehe [22:17:36] the data is in/mnt/hdfs/wmf/data/raw/eventlogging [22:17:38] Analytics, Discovery, EventBus, MediaWiki-General-or-Unknown, and 6 others: EventBus MVP - https://phabricator.wikimedia.org/T114443#1695483 (Ottomata) Does the description look ok? Feel free to edit and discuss here. 
[22:17:51] madhuvishy, aha, cool thanks [22:18:27] Analytics, Discovery, EventBus, MediaWiki-General-or-Unknown, and 6 others: EventBus MVP - https://phabricator.wikimedia.org/T114443#1695486 (Ottomata) Nuria pointed out to me that (as engineers love to do) we are focusing a lot here on technical architecture, but haven't thought a lot about what... [22:18:34] mforns: yes [22:18:57] i added minimal documentation about how to use it there [22:18:57] https://wikitech.wikimedia.org/wiki/Analytics/EventLogging#Hive [22:19:06] thx! [22:19:34] cool! [22:20:47] ottomata, do you want me to link this docs in the spreadsheet? [22:23:05] sure [22:23:14] it'd be cool to have some spark in there, ellery said he'd fill it out [22:23:17] when is the review? [22:23:28] ottomata, next thursday [22:23:54] k [22:23:59] will link it [22:27:36] Analytics-Backlog: Enable use of Python 3 in Spark {hawk} [8 pts] - https://phabricator.wikimedia.org/T113419#1695514 (Ottomata) So, I talked with Ellery today and learned a little more about this. Python 3 does not currently work with the Spark version we are running in production. This environment variabl... [22:31:51] ottomata, one more thing, do we want to mention Kafka upgrade outage? [22:32:12] I think we should no? [22:34:00] yes [22:34:08] ok [22:34:35] we can mention that it was caused by an upstream bug in a version that we upgraded to so we could support eventlogging on kafka [22:34:46] ottomata, OK [22:34:50] Analytics-Backlog: Enable use of Python 3 in Spark {hawk} [8 pts] - https://phabricator.wikimedia.org/T113419#1695562 (ellery) @Ottomata It looks like the environment variable is also used in Python 2: https://docs.python.org/2/using/cmdline.html#envvar-PYTHONHASHSEED Based on the description, I don't unders... 
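[Editor's note: the PYTHONHASHSEED discussion above concerns Python's per-process hash randomization. PySpark buckets keys by `hash()`, so the driver and every executor must agree on the seed or keyed operations split the same key across partitions. A small stdlib demonstration, assuming only that `sys.executable` points at a working interpreter:]

```python
import os
import subprocess
import sys

# Python 3 randomizes str hashes per interpreter process unless
# PYTHONHASHSEED is pinned in the environment.

def hash_in_subprocess(value, seed):
    """Run hash(value) in a fresh interpreter with the given PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    out = subprocess.check_output(
        [sys.executable, "-c", "print(hash({!r}))".format(value)], env=env
    )
    return int(out)

# With the seed pinned, two separate processes agree on the hash...
pinned_agrees = hash_in_subprocess("enwiki", "0") == hash_in_subprocess("enwiki", "0")
# ...with randomized seeds they almost surely would not, which is exactly
# what breaks keyed RDD operations across Spark executors.
```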
[22:39:42] Analytics-Backlog: Install snzip on stat1002 and stat1003 {hawk} - https://phabricator.wikimedia.org/T112770#1695575 (Halfak) [22:44:23] madhuvishy: back [22:44:30] madhuvishy: replacing both jars [22:44:31] nuria: hey :) [22:44:39] yeah it gets further [22:44:51] madhuvishy: let me look at log [22:45:26] nuria: you can look at the logs for the job i last ran here - /home/madhuvishy/avro-kafka/log_camus_avro_test.txt [22:47:21] madhuvishy: i see, batcave? [22:47:27] nuria: yup [22:59:19] Analytics-Kanban: Bug: puppet not running on wikimetrics1 instance, Vital Signs stale {musk} [5 pts] - https://phabricator.wikimedia.org/T105047#1695678 (mforns) [22:59:20] Analytics-Cluster, Analytics-Kanban, Patch-For-Review: Python Aggregator: Solve inconsistencies in data ranges when using --all-projects flag {musk} [5 pts] - https://phabricator.wikimedia.org/T106554#1695677 (mforns) [22:59:22] Analytics-Kanban, Patch-For-Review: Link to new projectcounts data and serve via wikimetrics {Musk} [5 pts] - https://phabricator.wikimedia.org/T104003#1695679 (mforns) [22:59:23] Analytics-Kanban, Analytics-Visualization: {Epic} Community reads pageviews per project in Vital Signs {crow} - https://phabricator.wikimedia.org/T95336#1695676 (mforns) [22:59:25] Analytics-Cluster, Analytics-Kanban, Patch-For-Review: Create current-definition/projectcounts {musk} [13 pts] - https://phabricator.wikimedia.org/T101118#1695680 (mforns) [22:59:27] Analytics-Kanban, Analytics-Visualization, Patch-For-Review: Integrate Dygraphs into Vital Signs {musk} [13 pts] - https://phabricator.wikimedia.org/T96339#1695681 (mforns) [23:13:37] Hi! Is this up-to-date? https://wikitech.wikimedia.org/wiki/Analytics/EventLogging/Data_representations [23:14:09] Specifically to check validated events, is it here analytics-store.eqiad.wmnet? [23:15:23] ottomata: milimetric: ^ ? [23:16:11] (I'm trying to ssh in and am getting rejected, could be a ssh config error, or maybe I don't have permission...) 
[23:16:41] AndyRussG: stat1003.eqiad.wmnet [23:16:50] Analytics, Discovery, EventBus, MediaWiki-General-or-Unknown, and 6 others: EventBus MVP - https://phabricator.wikimedia.org/T114443#1695706 (Ottomata) [23:17:14] AndyRussG: And from there, mysql --defaults-extra-file=/etc/mysql/conf.d/research-client.cnf --host analytics-store.eqiad.wmnet [23:17:32] madhuvishy: OK thanks! yeah getting Permission denied, lemme see what's up... [23:18:25] Analytics, Discovery, EventBus, MediaWiki-General-or-Unknown, and 6 others: EventBus MVP - https://phabricator.wikimedia.org/T114443#1695708 (GWicke) @ottomata, I would perhaps leave out the section "The MVP might also include". Much of it isn't so minimally viable, IMHO. Re use cases, the follow... [23:18:25] madhuvishy: I think our events don't work in MySQL 'cause of a nested schema, but I'd like to check that they are getting through as JSON and look at some errors I saw mentioned [23:20:45] AndyRussG: Hmmm, you can simply consume from kafka to check [23:21:15] AndyRussG: I think errors will go Eventlogging_EventError topic [23:21:18] madhuvishy: K hmm I think I was supposed to have been granted access there, have never tried it yet :) [23:21:40] AndyRussG: hmmm [23:23:20] AndyRussG: this should work from stat1002/other prod machines (which ones is grey area to me) [23:23:21] kafkacat -o end -t eventlogging_EventError -b kafka1012.eqiad.wmnet:9092 [23:23:39] you can watch the errors flowing in, if you send them [23:23:53] if you want all, use -o beginning [23:25:34] madhuvishy: amazing!! That's working great :) Thanks so much, really appreciate it!!! 
[23:27:15] AndyRussG: np :) [23:28:38] Analytics, Discovery, EventBus, MediaWiki-General-or-Unknown, and 6 others: EventBus MVP - https://phabricator.wikimedia.org/T114443#1695732 (GWicke) [23:29:00] Analytics, Discovery, EventBus, MediaWiki-General-or-Unknown, and 6 others: EventBus MVP - https://phabricator.wikimedia.org/T114443#1695470 (GWicke) I have now integrated some of those changes into the description. [23:29:21] Analytics, MediaWiki-Authentication-and-authorization, Reading-Infrastructure-Team, MW-1.26-release, Patch-For-Review: Create dashboard to track key authentication metrics before, during and after AuthManager rollout - https://phabricator.wikimedia.org/T91701#1695740 (bd808) [23:30:57] Analytics, MediaWiki-Authentication-and-authorization, Reading-Infrastructure-Team, MW-1.26-release, Patch-For-Review: Create dashboard to track key authentication metrics before, during and after AuthManager rollout - https://phabricator.wikimedia.org/T91701#1093688 (bd808) >>! In T91701#165205... [23:36:42] Analytics-Backlog: Re-baselining checkpoints periodically - https://phabricator.wikimedia.org/T112009#1695754 (DarTar) Nope, Research doesn't own the metric definitions (which belong to the respective audience teams per [[ https://office.wikimedia.org/wiki/Research/Who_owns_what | Lila ]]) so this is really s... [23:41:53] (PS3) Nuria: [WIP] Changes to camus for avro testing in 1002 [analytics/camus] - https://gerrit.wikimedia.org/r/242907 [23:49:58] AndyRussG: those will soon be in logstash too [23:54:48] Analytics, MediaWiki-Authentication-and-authorization, Reading-Infrastructure-Team, MW-1.26-release, Patch-For-Review: Create dashboard to track key authentication metrics before, during and after AuthManager rollout - https://phabricator.wikimedia.org/T91701#1695829 (Tgr) That would be cool alt...