[00:08:06] Analytics, Project-Creators: Dedicated and/or automated Wikimedia pageviews API project/tag in Phabricator Maniphest - https://phabricator.wikimedia.org/T119151#1819445 (MZMcBride) NEW
[00:19:10] Analytics-Kanban, Datasets-General-or-Unknown: Wikimedia "top" pageviews API has problematic double-encoded JSON - https://phabricator.wikimedia.org/T118931#1819480 (Milimetric) @JAllemandou: I fixed the RESTBase part here: https://github.com/wikimedia/restbase/pull/422 so you'll have to make sure the str...
[00:19:24] Analytics-Kanban, Datasets-General-or-Unknown: Wikimedia "top" pageviews API has problematic double-encoded JSON - https://phabricator.wikimedia.org/T118931#1819481 (Milimetric) a:Milimetric>JAllemandou
[00:21:27] Analytics, Project-Creators: Dedicated and/or automated Wikimedia pageviews API project/tag in Phabricator Maniphest - https://phabricator.wikimedia.org/T119151#1819490 (madhuvishy) I agree that this would help find all the pageview api tasks in one place, but it would be problematic for the Analytics tea...
[00:22:20] Analytics-Backlog, Project-Creators: Dedicated and/or automated Wikimedia pageviews API project/tag in Phabricator Maniphest - https://phabricator.wikimedia.org/T119151#1819497 (madhuvishy)
[00:37:40] Analytics-Backlog, Project-Creators: Dedicated and/or automated Wikimedia pageviews API project/tag in Phabricator Maniphest - https://phabricator.wikimedia.org/T119151#1819524 (madhuvishy) @ggellerman just mentioned to me that there are Phabricator tags, that behave differently from projects and it won't...
[00:53:42] (PS1) Nuria: [WIP] Split schema handler in chain of responsability [analytics/refinery/source] - https://gerrit.wikimedia.org/r/254347
[00:55:27] (PS1) BryanDavis: Create and cleanup temp dirs in TestCamusPartitionChecker.scala [analytics/refinery/source] - https://gerrit.wikimedia.org/r/254350 (https://phabricator.wikimedia.org/T119101)
[00:55:32] (PS2) Nuria: [WIP] Split schema handler in chain of responsability [analytics/refinery/source] - https://gerrit.wikimedia.org/r/254347
[00:56:28] (CR) BryanDavis: "Feel free to mock my severe lack of scala knowledge" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/254350 (https://phabricator.wikimedia.org/T119101) (owner: BryanDavis)
[01:21:13] Analytics-Backlog, Project-Creators: Dedicated and/or automated Wikimedia pageviews API project/tag in Phabricator Maniphest - https://phabricator.wikimedia.org/T119151#1819667 (Krenair) The admins can make a Herald rule that ensures a project like #Analytics-backlog is added to everything inside this new...
[01:35:15] Analytics-Backlog, operations, HTTPS: EventLogging sees too few distinct client IPs - https://phabricator.wikimedia.org/T119144#1819682 (Tbayer) Below is a closer look on how these numbers developed over time, for two of the three tables examined in the task. Seems something happened in June, around the...
[01:52:55] Analytics-Backlog, Analytics-Cluster, Improving access, Research-and-Data: Hashed IP addresses in refined webrequest logs - https://phabricator.wikimedia.org/T118595#1819697 (Tbayer) Separately from privacy concerns about such deanonymizing techniques, a heads-up that the hashed IPs in EL might not...
[05:11:50] Analytics-Backlog, Analytics-Cluster, Improving access, Research-and-Data: Hashed IP addresses in refined webrequest logs - https://phabricator.wikimedia.org/T118595#1819879 (leila)
[06:04:34] Analytics-Backlog, operations, HTTPS: EventLogging sees too few distinct client IPs - https://phabricator.wikimedia.org/T119144#1820035 (ori) Thanks for the detailed investigation, @Tbayer. Querying the WikimediaBlogVisit data was a very clever idea -- I wish I had thought of it :) I think we should ju...
[07:38:36] Analytics-Backlog, Analytics-EventLogging, Privacy: Opt-out from logging some of the default EventLogging fields - https://phabricator.wikimedia.org/T108757#1820107 (Tbayer) Actually it looks like many or almost all schemas have already involuntarily opted out of logging valid clientIPs since more than...
[07:56:41] Analytics-Backlog, operations, HTTPS: EventLogging sees too few distinct client IPs - https://phabricator.wikimedia.org/T119144#1820111 (Tbayer) >>! In T119144#1820035, @ori wrote: > I think we should just drop this field and the associated code. I cannot recall a single case of it being used for its i...
[10:47:04] Analytics-Backlog, Analytics-General-or-Unknown, WMDE-Analytics-Engineering, Wikidata, Story: [Story] Statistics for Special:EntityData usage - https://phabricator.wikimedia.org/T64874#1820289 (Addshore) @Nuria yes I do! I should be able to do this but of course if there is any chance of your te...
[11:27:27] (PS11) DCausse: Add support for custom timestamp and schema rev id in avro message decoders [analytics/refinery/source] - https://gerrit.wikimedia.org/r/251267 (https://phabricator.wikimedia.org/T117873)
[11:35:39] (PS12) DCausse: Add support for custom timestamp and schema rev id in avro message decoders [analytics/refinery/source] - https://gerrit.wikimedia.org/r/251267 (https://phabricator.wikimedia.org/T117873)
[11:41:12] (CR) DCausse: "I've renamed 0.avsc to 101446746400.avsc. The pattern is:" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/251267 (https://phabricator.wikimedia.org/T117873) (owner: DCausse)
[11:42:03] (PS3) DCausse: [WIP] Split schema handler in chain of responsability [analytics/refinery/source] - https://gerrit.wikimedia.org/r/254347 (owner: Nuria)
[13:45:07] (PS4) DCausse: [WIP] Split schema handler in chain of responsability [analytics/refinery/source] - https://gerrit.wikimedia.org/r/254347 (owner: Nuria)
[13:47:22] (CR) DCausse: "Nice!" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/254347 (owner: Nuria)
[14:08:29] (CR) DCausse: [C: 1] "Tested (camus->oozie->select in hive) and it works with our production data and this camus.properties file : I540712d" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/254347 (owner: Nuria)
[14:10:26] hey dcausse, from your comments on CR, looks like you're to have data accessible from hive :)
[14:11:01] joal: hi
[14:11:57] well it works so far :)
[14:12:04] That's great :)
[14:12:11] avro is huge pain
[14:12:25] hm, I'm not really surprised :)
[14:12:51] Every time I have had to use a serialization framework, it's been complicated (and each time for different reasons :)
[14:12:59] :)
[14:13:27] also I realized that camus has been built for very specific usecases, most of the decoders have to be rewritten :/
[14:13:38] Right
[14:13:58] I guess it would have been better if we had invested time in gobblin before :/
[14:14:07] probably
[14:21:40] (PS2) DCausse: Add 2 payloads map fields to CirrusSearchRequestSet avro schema [analytics/refinery/source] - https://gerrit.wikimedia.org/r/252958 (https://phabricator.wikimedia.org/T118570)
[15:02:36] joal: hey
[15:02:54] wondering what you think about the whole encoding titles issue
[15:03:14] so do the current jobs fail ever because of control characters?
[15:03:17] Hey ! I have read your modification milimetric
[15:03:24] milimetric: nope
[15:03:30] no failure
[15:03:33] but strange stuff
[15:03:35] batcave ?
[15:03:36] ok, so then maybe we just don't encode at all?
[15:04:05] actually, give me 5 mins
[15:05:03] back milimetric :)
[15:05:21] joal: I can batcave in 10 min.
[15:05:24] i'll ping
[15:05:31] pwerfect
[15:11:26] ok, joal
[15:35:15] Analytics-Tech-community-metrics, DevRel-November-2015: Explain / sort out / fix SCR repository number mismatch on korma - https://phabricator.wikimedia.org/T116484#1820615 (Lcanasdiaz) The panel currently says: **1,078** repositories .. the list of repos shows trends for **1019** repositories ``` mysql...
[16:09:52] Analytics-Backlog: Test task - https://phabricator.wikimedia.org/T119201#1820670 (mforns) NEW
[16:15:28] Analytics-Backlog: Template for Pageview API task - https://phabricator.wikimedia.org/T119203#1820697 (mforns) NEW
[16:29:17] dcausse: hello, let me read backlog for updates!
[16:30:56] nuria: hi!
[16:31:16] nuria: I've tested with your patch on prod data and it works :)
[16:31:56] kafka -> camus -> hive -> queries are ok
[16:33:37] dcausse:did you tested
[16:33:44] dcausse: with search data
[16:33:54] dcausse: i was having teh hardest time testing camus yesterday
[16:34:00] yes I ran camus on the topic in prod
[16:34:04] dcausse: but it was a completely different set of code
[16:34:13] dcausse: ok, great
[16:34:43] I have logs in ~dcausse/avro_kafka and hive db dcausse.CirrusSearchRequestSet5
[16:34:51] dcausse: then we can adapt it and add it to your patch, we would need test and such. i am happy to work together on it
[16:35:11] nuria: you want only one patch?
[16:35:44] dcausse: i think so cause it is "one functioning unit"
[16:35:45] your patch is pretty isolated I think but I can merge both patch if it's better
[16:35:48] dcausse: right?
[16:35:51] ok
[16:38:42] Analytics-Tech-community-metrics, DevRel-November-2015: Explain / sort out / fix SCR repository number mismatch on korma - https://phabricator.wikimedia.org/T116484#1820786 (Lcanasdiaz) Now I get the problem, the queries above do not include the bot and subject filtering. So, we're missing 59 repositories...
[16:38:56] dcausse: if you do merge code we can split test writting that is easy to parallelize
[16:39:57] by merge I mean sqash bot patch together
[16:40:06] dcausse: right
[16:42:19] (PS13) DCausse: Add support for custom timestamp and schema rev id in avro message decoders [analytics/refinery/source] - https://gerrit.wikimedia.org/r/251267 (https://phabricator.wikimedia.org/T117873)
[16:42:27] nuria: done^
[16:42:50] dcausse: ok, will get it and fix tests and such
[16:43:30] tests should pass but maybe we need to write a new one
[16:44:16] Analytics-Backlog, Analytics-General-or-Unknown, WMDE-Analytics-Engineering, Wikidata, Story: [Story] Statistics for Special:EntityData usage - https://phabricator.wikimedia.org/T64874#1820813 (Nuria) @addshore: It is on our backlog but we have several things before it so we cannot give an ETA....
[16:45:02] (CR) DCausse: [C: -1] "squashed into Ie575f47" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/254347 (owner: Nuria)
[16:45:57] (PS14) DCausse: Add support for custom timestamp and schema rev id in avro message decoders [analytics/refinery/source] - https://gerrit.wikimedia.org/r/251267 (https://phabricator.wikimedia.org/T117873)
[16:46:10] sorry PS13 was a partial commit :/
[16:50:26] bd808: want to talk about API needs?
[16:50:49] nuria: sure! irc chat or hangout?
[16:51:09] bd808: hangout better, can you sendme your doc again and give me 5 mins to read it?
[16:51:32] https://www.mediawiki.org/wiki/User:BDavis_%28WMF%29/Projects/Action_API_request_analytics
[16:51:41] bd808: we have standup in 10 but we can do a hangout right after, will send invite
[16:51:48] perfect
[16:56:36] Analytics-Backlog, operations, HTTPS: EventLogging sees too few distinct client IPs - https://phabricator.wikimedia.org/T119144#1819140 (Nuria) Adding @ottomata as we were talking about related ip chnages recently. I am in favour of removing the field entirely even with the awesome way we have to rotat...
[17:01:06] Analytics-Kanban: Build a public form that can hit the new API {kudu} [8 pts] - https://phabricator.wikimedia.org/T117289#1820885 (madhuvishy) a:madhuvishy
[17:03:28] Analytics-Kanban: Research avro schema evolution, do we need a write and reader schema? [3] - https://phabricator.wikimedia.org/T119092#1820888 (Nuria)
[17:04:41] Analytics-Kanban, Patch-For-Review: Investigate / Correct timestamp and data not being properly read from Camus for CirrusSearchRequestSet logs. - https://phabricator.wikimedia.org/T117873#1820891 (Nuria)
[17:31:52] Analytics-Kanban: Encapsulating the retrieval of schemas from local depot from KafkaSchemaRegistry [3] - https://phabricator.wikimedia.org/T119211#1820955 (Nuria) NEW
[17:36:21] (Abandoned) Nuria: [WIP] Split schema handler in chain of responsability [analytics/refinery/source] - https://gerrit.wikimedia.org/r/254347 (owner: Nuria)
[17:50:18] (CR) DCausse: Add support for custom timestamp and schema rev id in avro message decoders (4 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/251267 (https://phabricator.wikimedia.org/T117873) (owner: DCausse)
[17:54:04] (CR) DCausse: Add support for custom timestamp and schema rev id in avro message decoders (3 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/251267 (https://phabricator.wikimedia.org/T117873) (owner: DCausse)
[17:55:03] (PS15) DCausse: Add support for custom timestamp and schema rev id in avro message decoders [analytics/refinery/source] - https://gerrit.wikimedia.org/r/251267 (https://phabricator.wikimedia.org/T117873)
[17:57:29] (PS16) Nuria: Add support for custom timestamp and schema rev id in avro message decoders [analytics/refinery/source] - https://gerrit.wikimedia.org/r/251267 (https://phabricator.wikimedia.org/T117873) (owner: DCausse)
[17:57:57] (CR) Nuria: "Patch #16 only adds better javadoc. Working on tests next." [analytics/refinery/source] - https://gerrit.wikimedia.org/r/251267 (https://phabricator.wikimedia.org/T117873) (owner: DCausse)
[18:33:32] a-team I'm gone for the weekend :)
[18:33:39] have a good one :)
[18:33:39] joal: talked to bd808 about his nexts steps with API queries, there is some data available in the cluster than he can use to build his user_agent reports so he is going to start looking into oozie and tables he might need to create to hold his data. he'll ping us as needed be
[18:33:45] ciao joal
[18:33:45] bye joal, have a nice weekend!
[18:34:11] sounds good nuria, we can provide help and CR as needed :)
[18:35:25] have a nice weekend joal!
[18:41:44] madhuvishy: you wanna chat about the JS problem?
[18:43:12] milimetric: yeah, so I defined availableTimezones in site.js, and in the reportCreate js I did availableTimezones: ko.observableArray(site.availableTimezones);
[18:43:38] k
[18:43:47] like
[18:43:51] https://www.irccloud.com/pastebin/BUcT4x7l/
[18:44:00] but this throws a JS error
[18:44:26] what's the error?
[18:44:36] https://www.irccloud.com/pastebin/18W02WzE/
[18:45:10] let's debug together in the cave?
[18:45:45] sure
[19:05:29] Analytics-Backlog, operations, HTTPS: EventLogging sees too few distinct client IPs - https://phabricator.wikimedia.org/T119144#1821300 (leila) @Nuria, which field are you referring to? clientIP in EL tables? If so, let's chat about it before removing that field since part of the research we are doing r...
[19:07:34] Analytics-Backlog, operations, HTTPS: EventLogging sees too few distinct client IPs - https://phabricator.wikimedia.org/T119144#1821330 (Nuria) @leila: Understood but see comments about this being broken on several tables since 20150616.
[19:36:14] Analytics-Backlog, operations, HTTPS: EventLogging sees too few distinct client IPs - https://phabricator.wikimedia.org/T119144#1821451 (ori) >>! In T119144#1821300, @leila wrote: > @Nuria, which field are you referring to? clientIP in EL tables? If so, let's chat about it before removing that field sin...
[19:38:41] (PS1) EBernhardson: Implemnt ArraySum UDF [analytics/refinery/source] - https://gerrit.wikimedia.org/r/254452
[19:39:10] (PS2) EBernhardson: Implemnt ArraySum UDF [analytics/refinery/source] - https://gerrit.wikimedia.org/r/254452
[19:39:42] (PS3) EBernhardson: Implemnt ArraySum UDF [analytics/refinery/source] - https://gerrit.wikimedia.org/r/254452
[19:44:15] Analytics-EventLogging, Patch-For-Review: EventLogging: Add helper for logging link clicks - https://phabricator.wikimedia.org/T54287#1821483 (Krinkle)
[19:44:17] Analytics-EventLogging: Provide a robust way of logging events without blocking until network request completes; use sendBeacon - https://phabricator.wikimedia.org/T44815#1821484 (Krinkle)
[19:45:27] Analytics-EventLogging: Do analysis for SendBeaconReliability experiment [5 pts] - https://phabricator.wikimedia.org/T78110#1821495 (Krinkle)
[19:55:17] (PS1) Bearloga: Functions for categorizing queries. (Work In Progress) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/254461 (https://phabricator.wikimedia.org/T118218)
[19:59:27] very interesting a-team : https://dl.dropboxusercontent.com/u/7516705/QConSF-2015-Stream%20Processing%20in%20Uber.pdf
[20:07:16] (PS4) EBernhardson: Implement ArraySum UDF [analytics/refinery/source] - https://gerrit.wikimedia.org/r/254452
[20:08:26] (PS5) EBernhardson: Implement ArraySum UDF [analytics/refinery/source] - https://gerrit.wikimedia.org/r/254452
[20:48:41] hello there
[20:48:46] anyone familiar with analytics/refinery/source repo ?
[20:49:04] I would like to add a Jenkins job that simply runs 'maven clean package'
[20:55:56] something building under https://integration.wikimedia.org/ci/job/maven/1/ :D
[21:05:26] (CR) Hashar: "recheck" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/254461 (https://phabricator.wikimedia.org/T118218) (owner: Bearloga)
[21:07:31] (CR) Hashar: "recheck" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/254452 (owner: EBernhardson)
[21:07:36] (CR) Hashar: "recheck" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/252958 (https://phabricator.wikimedia.org/T118570) (owner: DCausse)
[21:07:43] (CR) Hashar: "recheck" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/251267 (https://phabricator.wikimedia.org/T117873) (owner: DCausse)
[21:07:47] (CR) Hashar: "recheck" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/254350 (https://phabricator.wikimedia.org/T119101) (owner: BryanDavis)
[21:07:57] (CR) Hashar: "recheck" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/247601 (https://phabricator.wikimedia.org/T115919) (owner: OliverKeyes)
[21:08:28] (CR) Hashar: "recheck" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/254066 (owner: Nuria)
[21:08:35] (CR) Hashar: "recheck" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/253793 (owner: BryanDavis)
[21:08:44] (CR) Hashar: "recheck" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/232448 (https://phabricator.wikimedia.org/T108174) (owner: Joal)
[21:11:01] (CR) jenkins-bot: [V: -1] Add refinery-cassandra module [analytics/refinery/source] - https://gerrit.wikimedia.org/r/232448 (https://phabricator.wikimedia.org/T108174) (owner: Joal)
[21:15:25] (CR) jenkins-bot: [V: -1] Add support for custom timestamp and schema rev id in avro message decoders [analytics/refinery/source] - https://gerrit.wikimedia.org/r/251267 (https://phabricator.wikimedia.org/T117873) (owner: DCausse)
[21:16:12] well
[21:16:26] you now have maven run by Jenkins on analytics/refinery/source :-D
[21:34:10] (I poked by email people that voted CR+2 on that repo for the last six months) --- good week-end!
[22:27:31] Analytics-Kanban, Analytics-Wikistats, Patch-For-Review: Feed Wikistats traffic reports with aggregated hive data {lama} [21 pts] - https://phabricator.wikimedia.org/T114379#1821953 (ezachte)
[22:47:07] Analytics-Kanban, Analytics-Wikistats, Patch-For-Review: Feed Wikistats traffic reports with aggregated hive data {lama} [21 pts] - https://phabricator.wikimedia.org/T114379#1822013 (ezachte) I migrated daily/monthly aggregates from WC 1 to WC 3. This concludes migration effort for Monthly Page Views s...
[23:01:01] (CR) Mforns: Add sum aggregate by user report (1 comment) [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/254068 (https://phabricator.wikimedia.org/T117287) (owner: Mforns)
[23:02:43] (PS4) Mforns: Add sum aggregate by user report [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/254068 (https://phabricator.wikimedia.org/T117287)