[13:49:33] (PS1) Mforns: Reduce size of public reports folder [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/208637 (https://phabricator.wikimedia.org/T95756)
[14:25:28] (CR) Mforns: [C: -1] "WIP" [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/208637 (https://phabricator.wikimedia.org/T95756) (owner: Mforns)
[15:16:19] Analytics-Cluster, Patch-For-Review: Upgrade Analytics Cluster to CDH 5.4.0 - https://phabricator.wikimedia.org/T97453#1257092 (Ottomata) I am starting this now. Here is the general plan: ``` # CDH 5.3.1 to CDH 5.4.0 upgrade ## Shutdown and backup http://www.cloudera.com/content/cloudera/en/documen...
[15:49:59] Analytics-EventLogging, operations, Patch-For-Review: Reclaim vanadium, move to spares - https://phabricator.wikimedia.org/T95566#1257241 (Cmjohnson) Open>Resolved Disk has been wiped and server added to spares list.
[15:50:10] madhuvishy: heya
[15:50:24] Sorry, had to ask andrew :)
[15:50:55] Wish to get to batcave and talk a bit ?
[16:24:20] Analytics-Tech-community-metrics: https://www.openhub.net/p/mediawiki stats out of date - https://phabricator.wikimedia.org/T96819#1257349 (Kghbln) They seem to have pretty serious issues over there at OpenHub. According to Peter's blog article of April 16 things should get fluffy soon. I personally requeste...
[16:44:58] Analytics-Kanban: Kafka with current python code (EL re-architecture spike) - https://phabricator.wikimedia.org/T98022#1257468 (ggellerman) NEW
[16:48:12] Analytics-Kanban: Kafka + Spark Streaming (EL re-architecture spike) - https://phabricator.wikimedia.org/T98023#1257490 (ggellerman) NEW
[16:48:46] Analytics-Kanban: Kafka + Spark Streaming (EL re-architecture spike) - https://phabricator.wikimedia.org/T98023#1257505 (ggellerman)
[17:00:30] Analytics-Kanban: Kafka + Spark Streaming (EL re-architecture spike) - https://phabricator.wikimedia.org/T98023#1257577 (ggellerman) a:Milimetric
[17:03:16] Analytics-Kanban: Kafka with current python code (EL re-architecture spike) - https://phabricator.wikimedia.org/T98022#1257581 (ggellerman) a:JAllemandou
[17:03:37] Analytics-Kanban: Kafka + Spark Streaming (EL re-architecture spike) - https://phabricator.wikimedia.org/T98023#1257584 (ggellerman)
[17:07:35] Analytics-Cluster, Analytics-Kanban: Productionize reporting of app session metrics {hawk} - https://phabricator.wikimedia.org/T97876#1257601 (kevinator)
[17:10:58] Analytics-Cluster, Analytics-Kanban: Productionize reporting of app session metrics [8pts] {hawk} - https://phabricator.wikimedia.org/T97876#1257609 (kevinator)
[17:15:52] Analytics, Mobile-Web: Instrument “tags” and anonymous gather pages to track engagement with browse. - https://phabricator.wikimedia.org/T94744#1257626 (JKatzWMF) @phuedx--this didn't make it into this sprint because it was not yet scoped..and then slipped through the cracks! I met with Leila about this....
[17:17:29] Analytics, Mobile-Web: Instrument “tags” and anonymous gather pages to track engagement with browse. - https://phabricator.wikimedia.org/T94744#1257629 (JKatzWMF) @phuedx and @klans_WMF do you think we could push this into current sprint as was originally intended? It feels like we are going to be able t...
[17:25:24] Analytics-EventLogging, Analytics-Kanban: Analytics Eng monitors consumed EL events in Graphite (valid & write rate) {oryx} - https://phabricator.wikimedia.org/T97295#1257670 (kevinator) Reporter on halfnium doesn't have access to how many events were inserted. 4 hour spike needed to try and report an "ev...
[17:27:03] Analytics-Cluster, Analytics-Kanban: analytics/refinery/source unbuildable since commit that switched to CDH 5.3.1 packages - https://phabricator.wikimedia.org/T97278#1257694 (kevinator) p:Triage>Normal
[17:27:34] Analytics, Mobile-Web: Instrument “tags” and anonymous gather pages to track engagement with browse. - https://phabricator.wikimedia.org/T94744#1257699 (JKatzWMF) @phuedx for "acceptable" above, I think we would want to have the following actions: land_on_page (if this is hard, I coooould do it from webl...
[17:28:20] Analytics-Cluster, Analytics-Kanban: analytics/refinery/source unbuildable since commit that switched to CDH 5.3.1 packages - https://phabricator.wikimedia.org/T97278#1257702 (kevinator) a:JAllemandou
[17:30:44] Analytics, MediaWiki-Authentication-and-authorization, Reading-Infrastructure-Team, Patch-For-Review: Create dashboard to track key authentication metrics before, during and after AuthManager rollout - https://phabricator.wikimedia.org/T91701#1257723 (kevinator)
[17:39:08] Analytics-Kanban, Quarry: it would be useful to run the same Quarry query conveniently in several database - https://phabricator.wikimedia.org/T95582#1257804 (kevinator) a:Milimetric
[17:40:35] Analytics-Cluster, Analytics-Kanban: Upgrade unique mobile jobs to harvest unique id from x-analytics field - https://phabricator.wikimedia.org/T95606#1257816 (kevinator)
[17:45:28] Analytics-Cluster, Analytics-Kanban, Easy: Add better detection of wikipediaApp to user agent UDF {hawk} - https://phabricator.wikimedia.org/T96376#1257842 (kevinator) p:Triage>Low
[17:47:26] Analytics-Cluster, Analytics-Kanban: Add automata value in agent_type field of the refined table {hawk} - https://phabricator.wikimedia.org/T95693#1257847 (kevinator) p:Triage>Low
[17:53:28] Analytics-Cluster, Analytics-Kanban: Add automata value in agent_type field of the refined table {hawk} - https://phabricator.wikimedia.org/T95693#1257889 (kevinator) Open>declined a:kevinator This is a difficult task: identifying automata. And, it's low priority. declining for now. We may revi...
[17:57:07] Analytics-Cluster, Analytics-Kanban: Report pageviews to the annual report - https://phabricator.wikimedia.org/T95573#1257897 (kevinator) a:Milimetric
[18:05:28] ottomata: Heya !
[18:05:30] you there ?
[18:08:20] kevinator: btw, since we added piwik to wikimetrics, it's been getting ~ 7 visits per day
[18:09:18] btw also, I might have bad internet for the rest of the day 'cause I think I've overstayed my welcome at the McDonald's
[18:20:26] joal: hiya yes
[18:20:27] sorry
[18:20:29] np
[18:20:33] trying to fix hopefully one last thing with cluster upgrade
[18:20:36] snappy native lib is weird now
[18:20:39] (PS2) Mforns: Reduce size of public reports folder [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/208637 (https://phabricator.wikimedia.org/T95756)
[18:20:42] arf :(
[18:20:50] Maybe why I get issues :)
[18:21:00] what happens ?
[18:21:00] (CR) Mforns: [C: -1] "Still WIP" [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/208637 (https://phabricator.wikimedia.org/T95756) (owner: Mforns)
[18:21:46] java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z
[18:21:53] you shouldn't be using the cluster yet anyway :p
[18:21:54] jk
[18:21:54] its fine
[18:22:05] Soooorry ;)
[18:22:20] naw its fine now, just this one thing
[18:22:26] which is why i haven't emailed yet
[18:22:42] And also probably why I can't read data ;)
[18:25:04] Analytics-Tech-community-metrics, Engineering-Community, ECT-April-2015, Google-Summer-of-Code-2015, Outreachy-Round-10: Community Bonding evaluation for "Allowing contributors to update their own details in tech metrics" - https://phabricator.wikimedia.org/T98045#1258103 (Dicortazar)
[18:31:10] ottomata: got really weird error here with spark
[18:31:20] hive seems to be working fine though :)
[18:31:39] hm, i'm having trouble launching some jobs from oozie, that's where i'm seeing the error
[18:31:44] but, camus is running fine too
[18:31:59] which means it is kinda ok
[18:32:03] camus has no snappy, maybe that's why
[18:32:06] because it is writing the snappy compressed data
[18:32:20] camus outputs snappy ?
[18:32:26] camus outputs snappy ?
[18:32:27] ja it does snappy
[18:32:29] ottomata: --^
[18:32:33] hmm
[18:33:00] weird
[18:33:10] hive seems happy as well
[18:33:16] But spark is not :(
[18:33:26] hm ,maybe the error i'm seeing is oozie
[18:33:26] hm
[18:34:49] i'm only really seeing them in the oozie launcher map tasks
[18:34:49] hm
[18:35:11] and jobs launch correctly ?
[18:35:16] That's weirdo as well
[18:35:40] Going for dinner
[18:35:43] Will be back
[18:36:21] ottomata: yt?
[18:37:11] (CR) Catrope: [V: 2] Hide partial week data from charts [analytics/limn-flow-data] - https://gerrit.wikimedia.org/r/207534 (https://phabricator.wikimedia.org/T95249) (owner: Mooeypoo)
[18:37:29] kevinator: yes
[18:37:39] joal: how did you get hive to work for you? i'm seeing something else weird now too
[18:37:55] can we talk quickly in batcave about budget?
[18:38:18] hm, can it wait? cluster is still not working
[18:38:46] yes, I'll send an email with link to the page
[18:39:02] I'm about to commute to office very shortly.
[18:39:20] k
[19:00:05] joal: lemme know when you are back, would love a brain bounce on this
[19:13:17] ottomata: here !
[19:13:57] k going to batcave
[19:14:02] yup
[19:31:47] Analytics-Kanban: Stand up piwik in a permanent and privacy-sensitive way - https://phabricator.wikimedia.org/T98058#1258353 (Milimetric) NEW a:kevinator
[19:42:23] good nite all, see you tomorrow :]
[19:52:02] Analytics-Kanban: Stand up piwik in a permanent and privacy-sensitive way - https://phabricator.wikimedia.org/T98058#1258490 (Milimetric) a:kevinator>yuvipanda
[19:53:07] Analytics-Kanban: Stand up piwik in a permanent and privacy-sensitive way - https://phabricator.wikimedia.org/T98058#1258496 (yuvipanda) Note that any non-labs usage should be in prod, but I think we should make this puppetized properly so we can move it to prod if necessary.
[20:15:07] Analytics-Kanban: Stand up piwik in a permanent and privacy-sensitive way - https://phabricator.wikimedia.org/T98058#1258591 (yuvipanda) https://piwik.org/docs/privacy/ is nice :) So, privacy steps: # Auto archiving / purging of any data over 3 months # IP info never reaches piwik (we'll filter it out at t...
[21:00:31] (Abandoned) Madhuvishy: [WIP] Uniques report based on Last Access cookie [analytics/refinery] - https://gerrit.wikimedia.org/r/208813 (https://phabricator.wikimedia.org/T92977) (owner: Madhuvishy)
[21:06:32] Analytics-Kanban: Stand up piwik in a permanent and privacy-sensitive way - https://phabricator.wikimedia.org/T98058#1258820 (Milimetric) I agree, but I think if we do all that we can give anonymous users view-only rights. It seems PII wouldn't be an issue. And I think anon access would greatly increase the...
[21:07:29] Analytics-Kanban: Stand up piwik in a permanent and privacy-sensitive way - https://phabricator.wikimedia.org/T98058#1258823 (yuvipanda) Hmm, we can figure anon access later, I guess. Shell access should be super restricted, though :)
[21:24:58] (PS1) Ottomata: Add hive-hcatalog jar to oozie hive queries that use the raw webrequest table [analytics/refinery] - https://gerrit.wikimedia.org/r/208815
[21:25:16] (CR) Ottomata: [C: 2 V: 2] Add hive-hcatalog jar to oozie hive queries that use the raw webrequest table [analytics/refinery] - https://gerrit.wikimedia.org/r/208815 (owner: Ottomata)
[21:37:40] kevinator: are you going to fill out the master project list spreadsheet for the projects you're listed as owner?
[21:37:45] or did you want me to estimate?
[21:38:16] im in a meeting. give me 20 mins
[21:56:10] kevinator: just shoot me an email or text, I'm about to lose this internet
[21:56:19] nite everyone
[21:56:22] can I call you?
[21:56:26] kevinator: sure
[21:57:44] (PS1) Madhuvishy: Fix delete cohort member functionality [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/208824
[22:01:18] (Abandoned) Madhuvishy: Fix delete cohort member functionality [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/208824 (owner: Madhuvishy)
[22:01:37] (PS4) Madhuvishy: WIP: Fix validate again functionality on cohort display page [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/206346 (https://phabricator.wikimedia.org/T78339)
[22:29:23] milimetric: around?
[22:33:01] Hey madhuvishy what's up
[22:33:37] So I think I fixed the delete cohort member failing part - and started to look at the tests
[22:33:57] milimetric: I am not sure I fully understand why some of the tests fail
[22:34:14] milimetric: already existing ones that is
[22:36:47] madhuvishy: sadly my internet is not good enough for the batcave. It's possible some tests are bad or outdated after your change, but there are also some tricky tests in there, let's look at an example
[22:37:19] milimetric: that's okay..
[22:37:24] https://www.irccloud.com/pastebin/Z0a7MqiV
[22:38:32] Analytics-Cluster, Analytics-Kanban: Add automata value in agent_type field of the refined table {hawk} - https://phabricator.wikimedia.org/T95693#1259156 (DarTar) Captured in our incoming request backlog: https://trello.com/c/7Qx6d1Ua/272-identifying-automata
[22:39:06] milimetric: I have to go to this office massage thing in 5 minutes - will you be around for a bit?
[22:39:50] madhuvishy: on and off, still super busy here helping my brother move. I'll look at this test and see if i can help
[22:40:15] milimetric: okay no problem. we can talk whenever you're free tomorrow/after. cya
[22:44:08] Analytics-Cluster, Analytics-Kanban: Report pageviews to the annual report - https://phabricator.wikimedia.org/T95573#1259178 (kevinator) @milimetric @jallemandou : is it possible to add static domains (like this one or transparency report) to the existing pageview reporting per project?
[23:04:18] madhuvishy: looks like get_membership only returns one record because raw_id_or_name is None in all the records pulled from wiki_user
[23:04:23] so that sounds like the test setup is bad
[23:04:51] milimetric: aah - yes i thought that might be - I couldn't figure out where it was getting set up though
[23:04:57] I'll see if I can paste you a fix for that in a bit, but also, consider code like this that might be easier to think about
[23:05:10] I was a little confused at first looking at get_membership
[23:05:13] https://www.irccloud.com/pastebin/u27MvcqV
[23:06:01] milimetric: ummm where is this code?
[23:06:19] milimetric: oh get_membership
[23:06:44] madhuvishy: one sec i'm trying to remember where the setup code is
[23:08:16] milimetric: okay
[23:09:28] madhuvishy: ok, found the silly thing
[23:09:32] https://www.irccloud.com/pastebin/bDVGI2gn
[23:09:37] in fixtures on 231
[23:09:51] that change should be all you need for this kind of problem
[23:10:05] almost all setup code routes through this big helper function
[23:10:20] it's got a kitchen sink of parameters and this function does the actual insertion
[23:10:30] the other helpers just call into here
[23:10:59] aah
[23:11:57] milimetric: okay. let me run the tests again after this change. thanks!
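
Editor's note on the [23:04:18] diagnosis: a minimal sketch, assuming get_membership collapses wiki_user rows that share the same raw_id_or_name value. The real wikimetrics code and the fixture change are only in the pastebins linked in the conversation and are not reproduced in this log, so group_by_raw_name, the row dictionaries, and the username field below are all hypothetical stand-ins. It only illustrates why a fixture that leaves the key as None could yield a single record.

```python
def group_by_raw_name(rows):
    """Group rows by raw_id_or_name, keeping first-seen order (hypothetical helper)."""
    groups = {}
    for row in rows:
        groups.setdefault(row["raw_id_or_name"], []).append(row)
    return list(groups.values())


names = ("Editor A", "Editor B", "Editor C")

# Hypothetical fixture rows where the setup code never sets the key:
# every row shares the key None, so they all fall into one group.
broken_rows = [{"username": n, "raw_id_or_name": None} for n in names]

# Hypothetical fixture rows with the key populated per user:
# each row gets its own group.
fixed_rows = [{"username": n, "raw_id_or_name": n} for n in names]

print(len(group_by_raw_name(broken_rows)))  # 1
print(len(group_by_raw_name(fixed_rows)))   # 3
```

If the grouping assumption holds, populating raw_id_or_name in the shared setup helper would be enough for each test user to come back as its own record, which matches the "that change should be all you need" remark.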
[23:12:06] np, should be cleaner and the rest should be similar
[23:12:31] i'm gonna sign off now (I'm on my family plan's hotspot and these guys are gonna yell at me :))
[23:13:05] milimetric: okay :)
[23:19:04] (PS1) Mooeypoo: Remove extra parenthesis from query [analytics/limn-flow-data] - https://gerrit.wikimedia.org/r/208841
[23:30:34] (CR) Catrope: [C: 2 V: 2] Remove extra parenthesis from query [analytics/limn-flow-data] - https://gerrit.wikimedia.org/r/208841 (owner: Mooeypoo)
[23:43:07] Analytics-Kanban, Analytics-Wikimetrics: confirm vagrant setup works for wikimetrics - https://phabricator.wikimedia.org/T95690#1259416 (madhuvishy) I tested this when I joined and this is fine now. There was a problem due to file ownership clashes with the host NFS (ticket here - https://phabricator.wiki...
[23:43:40] Analytics-Kanban, Analytics-Wikimetrics: confirm vagrant setup works for wikimetrics - https://phabricator.wikimedia.org/T95690#1259417 (madhuvishy)