[00:39:22] Analytics-EventLogging, operations, Patch-For-Review: Create a package for python-pykafka for ubuntu precise and debian sid - https://phabricator.wikimedia.org/T109567#1683710 (Ottomata) FYI, python-pykafka doesn't work for precise because The following packages have unmet dependencies: python-pykafk... [02:22:21] Analytics-Tech-community-metrics, Possible-Tech-Projects, Epic: Allow contributors to update their own details in tech metrics directly - https://phabricator.wikimedia.org/T60585#1683833 (NiharikaKohli) @Dicortazar, that link is not public. :) [05:50:42] Analytics-Backlog, Analytics-EventLogging, MediaWiki-extensions-CentralNotice, Traffic, operations: Eventlogging should transparently split large event payloads - https://phabricator.wikimedia.org/T114078#1683932 (awight) NEW a:Ottomata [06:22:59] Analytics-Tech-community-metrics, Possible-Tech-Projects: Improving MediaWikiAnalysis - https://phabricator.wikimedia.org/T89135#1683959 (Anmolkalia) Hi, I would like to know more about this project. I will go through the links in the description to gain a rough background of what needs to be done. I th... [09:05:41] Analytics-Tech-community-metrics, DevRel-October-2015, DevRel-September-2015: Automated generation of (Git) repositories for Korma - https://phabricator.wikimedia.org/T110678#1684169 (Dicortazar) As similarly done for gerrit, I've added the current list of Git repositories analyzed so far in Korma [1].... 
[09:14:39] (PS2) Addshore: adds aggregate data URI sources [wikidata/analytics/dashboard] - https://gerrit.wikimedia.org/r/241697 (owner: Christopher Johnson (WMDE)) [09:14:47] (CR) Addshore: [C: 1] adds aggregate data URI sources [wikidata/analytics/dashboard] - https://gerrit.wikimedia.org/r/241697 (owner: Christopher Johnson (WMDE)) [09:40:49] (CR) Addshore: Add social stats tracking script (3 comments) [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/240710 (owner: Addshore) [09:46:34] (PS1) Addshore: Move scripts to a src dir [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/242095 [09:47:01] Analytics-Backlog, Database: Delete obsolete schemas {tick} - https://phabricator.wikimedia.org/T108857#1684289 (jcrespo) [09:47:21] Analytics-Backlog, Database: Delete obsolete schemas {tick} - https://phabricator.wikimedia.org/T108857#1684290 (jcrespo) p:Triage>Normal [09:47:36] Analytics-Tech-community-metrics, Possible-Tech-Projects: Improving MediaWikiAnalysis - https://phabricator.wikimedia.org/T89135#1684292 (Qgil) @acs, @dicortazar, @jgbarah, do you still want to propose this project? Are you able to mentor it? [09:50:56] (PS1) Addshore: Classify wikidata_social.php [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/242097 [09:56:12] (PS1) Addshore: Move create SQL from comments to own files [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/242100 [10:04:36] Analytics-Tech-community-metrics, Possible-Tech-Projects, Epic: Allow contributors to update their own details in tech metrics directly - https://phabricator.wikimedia.org/T60585#1684331 (Aklapper) >>! In T60585#1683422, @Dicortazar wrote: > I'm having some discussion at https://phabricator.wikimedia.... [10:12:44] Analytics-Tech-community-metrics, Possible-Tech-Projects: Improving MediaWikiAnalysis - https://phabricator.wikimedia.org/T89135#1684366 (Aklapper) >>! In T89135#1683959, @Anmolkalia wrote: > Hi, I would like to know more about this project. 
Hi @Anmolkalia, thanks for your interest in contributing! If y... [10:22:46] Analytics-Tech-community-metrics, Possible-Tech-Projects: Improving MediaWikiAnalysis - https://phabricator.wikimedia.org/T89135#1684438 (Anmolkalia) Hi @Aklapper. That sounds very encouraging :) I recently made a small contribution to the android Wikipedia App, so I do have some idea of how to start wit... [10:49:16] (PS2) Joal: [WIP] Add camus helper functions and job [analytics/refinery/source] - https://gerrit.wikimedia.org/r/240868 (https://phabricator.wikimedia.org/T113251) [10:51:53] !log cluster back to normql state. Some errors are still not explained, need to be carefull. [10:57:25] Analytics-Backlog, Database: Delete obsolete schemas {tick} - https://phabricator.wikimedia.org/T108857#1684502 (jcrespo) @mforns These are the tables I have found that match the conditions on the spreedsheet/phabricator comment: **db1046:** ``` MariaDB EVENTLOGGING m4 localhost log > use log; Database c... [11:16:15] (PS4) Joal: Add oozie email sending subworkflow wrapper [analytics/refinery] - https://gerrit.wikimedia.org/r/240094 (https://phabricator.wikimedia.org/T113253) [11:16:34] Analytics-Backlog: Set up bucketization of editCount fields {tick} - https://phabricator.wikimedia.org/T108856#1684532 (jcrespo) It would really help me if, in addition to the Google Spreadsheet (probably can be done from it), we had a list of table names and the name of the column to count in pure text forma... 
[12:52:22] Analytics-Backlog: Set up bucketization of editCount fields {tick} - https://phabricator.wikimedia.org/T108856#1684696 (jcrespo) p:Triage>Normal [12:53:24] Analytics-Backlog, Database: Set up bucketization of editCount fields {tick} - https://phabricator.wikimedia.org/T108856#1532299 (jcrespo) [12:54:07] Analytics-Tech-community-metrics: Handling multiple affiliations in tech community metrics - https://phabricator.wikimedia.org/T95238#1684705 (Aklapper) Examples I found after looking at user affilations in korma, to fix once this task has been resolved: SG/shahyar, werdna, Erik Moeller aren't with WMF anymo... [12:57:47] Analytics-Tech-community-metrics, DevRel-October-2015: Correct affiliation for code review contributors of the past 30 days - https://phabricator.wikimedia.org/T112527#1684716 (Aklapper) a:Aklapper>Dicortazar Went through http://korma.wmflabs.org/browser/scr-contributors.html by screenscraping and s... [13:11:35] (PS5) Joal: Add email sending on error in webrequest-load [analytics/refinery] - https://gerrit.wikimedia.org/r/240095 (https://phabricator.wikimedia.org/T113253) [13:15:15] Analytics-Cluster, operations, Patch-For-Review: php5-curl for stat1002 - https://phabricator.wikimedia.org/T113602#1684771 (Addshore) Open>Resolved [13:36:35] (PS4) Joal: Add pageview quality check to pageview_hourly [analytics/refinery] - https://gerrit.wikimedia.org/r/240099 (https://phabricator.wikimedia.org/T109739) [13:56:32] Analytics-Cluster, Analytics-Kanban: camus offset fails/continues Load job {hawk} [13 pts] - https://phabricator.wikimedia.org/T113252#1684899 (JAllemandou) a:JAllemandou [14:01:37] (PS5) Joal: Add pageview quality check to pageview_hourly [analytics/refinery] - https://gerrit.wikimedia.org/r/240099 (https://phabricator.wikimedia.org/T109739) [14:03:38] Analytics-Kanban, Patch-For-Review: Add a 'Guard' job for pageviews {hawk} [13 pts] - https://phabricator.wikimedia.org/T109739#1684918 (JAllemandou) [14:03:39] Analytics-Kanban: 
Create white list for pageview data {hawk} [8 pts] - https://phabricator.wikimedia.org/T110061#1684919 (JAllemandou) [15:03:55] Analytics-Backlog, Analytics-EventLogging, MediaWiki-extensions-CentralNotice, Traffic, operations: Eventlogging should transparently split large event payloads - https://phabricator.wikimedia.org/T114078#1685110 (Nuria) Please note that varnish logging limits have been increased and that the l... [15:05:23] hey halfak: anything you need me for with Altiscale? [15:06:05] joal: will test your code today, i did not get to that yesterday as i got derailed by learning about camus and such [15:06:07] Nope. :) [15:06:14] nuria: cave ? [15:06:21] halfak: ok cool [15:06:26] joal, BTW, will be starting up a new run with the JSON extractor and updated diff strategy today. [15:06:30] joal: ok, in coffee shop [15:06:43] halfak: research task looks good [15:06:49] nuria: I tested everything this afternoon, so no need for you to do it [15:06:50] halfak: just took another look [15:07:00] \o/ [15:07:36] halfak: so json extraction + sort, then diff extraction only ? [15:07:44] Yeah. [15:07:52] great :) [15:08:04] I'm working on moving everything to the mapper and reducing the old complication now. [15:08:14] joal: sorry i did not get to that yesterday but as andrew was on pst time i thought i would use time to learn about camus and take a look at how to do avro [15:08:22] joal, any pro-tips on making sure that my mappers write directly to output files? [15:08:36] If I just don't set a reducer at all, that will be the case, right? [15:08:39] halfak: when (maybe one day ?) I have time, I'll try the diff in spark or hadoop, using mapper only strategy, see if java can be better than python ;) [15:08:53] nuria: no bother :) [15:09:09] I'm sure it can, joal, but "better" is not a simple matter of working in this context :) [15:09:36] halfak: Right, correct :) [15:09:42] joal: did you have to make changes in your patchsets? 
I tested e-mails in oozie a while back and they worked fine so i expect not [15:09:59] nuria: I had to make a few changes, mostly typos [15:10:05] joal, so, about those mappers. [15:10:25] If I don't set a reducer, will they just write directly to output file or will hadoop assume a single reducer? [15:10:49] So nuria, both tasks on email sending can be merged, the one on whitelist is blocked by actually having the whitelist :) [15:11:19] halfak: good question [15:11:25] halfak: never had to do that [15:11:30] joal: ok, i just saw your corrections, merging now [15:11:35] I always used dummy mappers, not reducers [15:11:39] Let me have a look [15:11:43] kk thanks :) [15:11:47] awesome, thanks [15:11:59] (CR) Nuria: [C: 2 V: 2] "@joal has tested workflow on cluster" [analytics/refinery] - https://gerrit.wikimedia.org/r/240094 (https://phabricator.wikimedia.org/T113253) (owner: Joal) [15:12:04] nuria: Don't merge the whitelist, though :) [15:13:13] (CR) Nuria: [C: 2 V: 2] Add email sending on error in webrequest-load [analytics/refinery] - https://gerrit.wikimedia.org/r/240095 (https://phabricator.wikimedia.org/T113253) (owner: Joal) [15:14:00] halfak: http://stackoverflow.com/questions/9394409/how-to-write-map-only-hadoop-jobs [15:14:18] Looking at that, I assume you can have a map only phase [15:14:38] Now the thing to make sure is that the compression is correctly set at map stage [15:14:40] OK. I'll give that a try. [15:14:45] Yeah [15:14:51] Cause usually compression is snappy in [15:15:04] between map and reduce, even if gz or bz after reduce [15:15:07] halfak: --^ [15:15:23] * halfak curses at snappy [15:15:39] If it writes snappy, I'm gonna snap. ;) [15:15:45] :D [15:16:17] This might address our memory issues too. [15:16:33] because of no reducer ? [15:16:35] might [15:16:52] Yeah. At least for the fact that we don't have a second set of processes. 
[15:17:05] halfak: mapred.map.output.compression.codec [15:17:08] If the mappers cut down their memory usage to that of my streamers, then we could probably boost parallelism 5-6 times [15:17:26] halfak: that could work faster then :) [15:18:12] halfak: But be careful, java, heap size, garbage collection, blah, blah [15:18:19] :D [15:18:31] thx for merging nuria ! [15:18:39] I'll deploy that tomorrow [15:19:47] joal, +1 [15:34:07] Analytics-Kanban, Patch-For-Review: Support moving and adding new columns in reportupdater {lion} [5 pts] - https://phabricator.wikimedia.org/T113600#1685276 (kevinator) Open>Resolved [15:35:04] Analytics-Kanban: ---- DISCUSSED BELOW ---- - https://phabricator.wikimedia.org/T114124#1685280 (Milimetric) NEW [15:39:06] Analytics-Kanban, Wikimedia-Logstash, Patch-For-Review: Make Logstash consume from Kafka:eventlogging_EventError - https://phabricator.wikimedia.org/T113627#1685300 (bd808) [15:51:10] joal, how about making sure that the input files don't get split up. Any idea how I can ensure that? [15:51:31] halfak: give me a minute, in standup [15:52:15] kk [15:52:22] mforns: let's chat about this patch after standup [15:52:29] sure! [15:52:32] ottomata, ^ [16:09:08] Analytics-EventLogging, Multimedia, UploadWizard: Half the time, 100% of UploadWizardExceptionFlowEvent events are dropped - https://phabricator.wikimedia.org/T113366#1685426 (Jdforrester-WMF) [16:09:32] Analytics-EventLogging, Multimedia, UploadWizard: Half the time, 100% of UploadWizardExceptionFlowEvent events are dropped - https://phabricator.wikimedia.org/T113366#1685433 (Jdforrester-WMF) p:Triage>Normal [16:18:38] halfak: http://stackoverflow.com/questions/9678180/change-file-split-size-in-hadoop [16:19:07] halfak: query is the opposite of what you want, but should work if set to a high value ! [16:24:25] Hey madhuvishy [16:24:39] just read your slides: looking good ! [16:24:42] hey joal :) [16:24:45] thanks! 
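Editor's sketch of the map-only setup discussed between 15:08 and 16:19: zero reducers, explicit output compression, and a large minimum split size. It only builds a command line and runs nothing. The property names are the Hadoop 2 spellings as far as I know (older clusters use `mapred.*` names such as the one joal quotes), and the jar path, mapper command and HDFS paths in the usage line are placeholders, not values from the log.

```python
def build_map_only_streaming_args(streaming_jar, mapper, input_path, output_path,
                                  codec="org.apache.hadoop.io.compress.BZip2Codec",
                                  min_split_bytes=None):
    """Return an argv list for a map-only Hadoop streaming job (nothing is executed)."""
    generic = [
        # Zero reducers: each mapper writes its output file directly.
        "-D", "mapreduce.job.reduces=0",
        # With no reduce phase the job output settings apply to the mappers,
        # so the final codec is set here rather than the map-output codec.
        "-D", "mapreduce.output.fileoutputformat.compress=true",
        "-D", "mapreduce.output.fileoutputformat.compress.codec=" + codec,
    ]
    if min_split_bytes is not None:
        # A minimum split size larger than any input file discourages splitting.
        generic += ["-D", "mapreduce.input.fileinputformat.split.minsize=%d" % min_split_bytes]
    # Generic -D options have to come before the streaming-specific options.
    streaming = ["-input", input_path, "-output", output_path, "-mapper", mapper]
    return ["hadoop", "jar", streaming_jar] + generic + streaming


# Placeholder values, for illustration only.
args = build_map_only_streaming_args(
    "hadoop-streaming.jar", "python mapper.py", "/in", "/out", min_split_bytes=10**12)
print(" ".join(args))
```

Whether this also lowers memory use, as hoped at 15:16, depends on the cluster and is not something the sketch can show.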
[16:25:36] just a comment: agent_type = 'user' would probably be a good addition :) [16:25:39] madhuvishy: --^ [16:25:54] joal: true [16:25:58] i'll mention that [16:26:05] For the rest it's great :) [16:26:18] thanks much [16:26:35] You are welcome ! [16:26:56] milimetric: can I offer help for the discussions on restbase? [16:27:36] halfak: have you seen my previous link / comment ? [16:27:52] joal: you can follow along in -services, but I think we're good maybe [16:31:39] milimetric: you good, man ! [16:49:26] madhuvishy: agree with joal slides are good, one thing to have in mind is that if this is going to be in a video feed the bigger the font the better [16:49:40] that way slides are readable with low video quality [16:53:14] ottomata: have a minute for me ? [16:54:06] joal: sure [16:54:13] actually, gimme 7 mins [16:54:18] np :) [16:54:29] ping me when ready ottomata [16:58:47] ottomata: actually need to feed my son, will let you know when b [16:58:48] ack [17:07:43] nuria: thanks, okay will keep that in mind [17:07:47] k i'm back [17:07:48] sorry [17:07:53] can chat joal whenev [17:17:38] Analytics-Tech-community-metrics, DevRel-October-2015, DevRel-September-2015: Present most basic community metrics from T94578 on one page - https://phabricator.wikimedia.org/T100978#1685766 (Aklapper) >>! In T100978#1567070, @Aklapper wrote: > I'd like to give this a shot myself in September if time a... 
[17:26:41] (PS1) Addshore: Split metrics up [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/242199 [17:31:31] (PS2) Addshore: Split metrics up [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/242199 [17:42:02] Analytics-Cluster, Analytics-Kanban: {musk} Pageviews in Vital Signs - https://phabricator.wikimedia.org/T101120#1685917 (kevinator) [17:42:03] Analytics-Cluster, Analytics-Kanban, Patch-For-Review: Python Aggregator: Solve inconsistencies in data ranges when using --all-projects flag {musk} [5 pts] - https://phabricator.wikimedia.org/T106554#1685915 (kevinator) Open>Resolved [17:43:58] Analytics-Cluster, Analytics-Kanban: Investigate using camus offset files to start hive load job {hawk} [5 pts] - https://phabricator.wikimedia.org/T113251#1685927 (kevinator) Open>Resolved [17:43:59] Analytics-Backlog, Analytics-Cluster: {epic} Implement better Webrequest load monitoring {hawk} - https://phabricator.wikimedia.org/T109192#1685928 (kevinator) [17:45:14] Analytics-Backlog: English Wikipedia stats for 5 millionth article - https://phabricator.wikimedia.org/T113683#1685936 (Neil_P._Quinn_WMF) [17:53:05] ottomata,joal: question if you are there [17:53:17] once i launch the camus job [17:53:27] what is the best way to follow its progress? [17:53:42] it will log to the log i pass it? [17:53:58] or is there another place to look at ? [17:57:56] Analytics-Kanban, Privacy: Identify possible user identity reconstruction using location and user_agent_map pageview aggregated fields to try to link to IPs in webrequest {slug} - https://phabricator.wikimedia.org/T108843#1686089 (kevinator) Open>Resolved Documentation on findings from this work are... 
[18:02:11] nuria: best place I think is to follow the log [18:02:18] joal:k [18:02:46] It doesn't give a lot of details after config, only map-red percent progress, but at least it is trackable [18:02:52] nuria: --^ [18:03:03] ottomata: I am back :) [18:03:14] joal: ok let's see how does that work [18:03:17] I have a question about the job that will read camus files [18:04:59] joal: for .. ahem... me? [18:05:16] no nuria sorry, for ottomata :) [18:05:29] joal: good, cause ahem... i do not have many answers [18:05:34] But if you want nuria please give me your opinion :) [18:05:38] huhuhu [18:06:48] joal: am in madhu's hive class [18:06:50] can chat in irc [18:07:06] I wonder about the behavior of the check: I think that if a topic doesn't progress from one job to another, or is missing, it should fail with an exception [18:07:27] But camus remembers previously used topics, even if they are not used anymore [18:08:02] For instance, we decommissioned the bits topic (nothing new in it), and it's not part of the topics camus should import [18:08:18] however since there is an historical value for offset, it keeps it [18:09:01] And in my definition of correctness, that would raise an error: this topic is not present in current run, only in previous offsets [18:09:17] nuria, ottomata : are you following so far? [18:09:39] joal: yes [18:10:02] but i do not get how camus remembers the topic [18:10:18] it does so because we pass it in the properties file that starts the job right? [18:10:42] nope, it does so because at some point in time, there was a value for it [18:10:48] so it now keeps it [18:11:09] reading [18:11:13] webrequest_bits is not part of the camus config file anymore [18:11:39] joal, hm. [18:11:45] hm. [18:11:48] joal: ?? [18:11:56] but still present in the "offset-previous" file for any recent run --> camus keeps the old offset value in case it would have to restart it I guess [18:12:01] possible to read the camus config file and only count those? hmmm, no. 
because it can be a regex. [18:12:23] ottomata: my idea would be to configure the checker with a regexp of topics to consider [18:12:27] yeah [18:12:28] joal: ahhhh it keeps logs for every run [18:12:35] joal: i think that would make sense [18:12:40] or even a whitelist or blacklist [18:12:44] just like camus properties work [18:12:48] nuria: not logs, actually the core map-red files used by camus [18:12:55] i mean, i guess, you could make it read a camus.properties file [18:13:00] because that setting is already in there [18:13:09] but, maybe that would be more annoying code [18:13:12] ottomata: yeah, it's a good way to make it [18:13:24] whitelist, blacklist are already part of the properties file [18:13:24] yea, hm. that would be good [18:13:28] Not that difficult [18:13:28] all the settings you need are there [18:13:31] including locations of the offset files [18:13:33] right [18:13:34] so that is probably good [18:13:43] yes, it is the best idea [18:13:53] And reuse the parameters that could exist [18:14:06] Ok I'll go in that direction :) [18:14:14] Thanks for the brain-bounce :) [18:15:08] nuria: a bit more about camus inner stuff: camus uses "offset sequence files" as input and output of mappers [18:15:15] (no reducers in camus) [18:15:33] joal:k [18:15:40] the actual interesting files (the imported ones), are side effects of the mappers [18:37:07] Analytics-Kanban: {flea} Self-serve Analysis - https://phabricator.wikimedia.org/T107955#1686295 (RobH) [18:37:33] Analytics-Tech-community-metrics, MediaWiki-Extension-Requests, Possible-Tech-Projects: A new events/meet-ups extension - https://phabricator.wikimedia.org/T99809#1686297 (egalvezwmf) I think education might be a specific case. This idea is more broad. It would include WLM, Editathons, Ed program, GL... [18:48:18] Hi nuria: are you around 3pm today? 
I'd like to get some 1:1 time [18:49:34] Analytics-Kanban: Investigate sample cube pageview_count vs unsampled log pageview count [5 pts] - https://phabricator.wikimedia.org/T108925#1686358 (Tbayer) >>! In T108925#1660700, @JAllemandou wrote: > @JKatzWMF, @Tbayer : Please let me know if this explains enough :) > Thanks Joseph! I spent some time lo... [18:59:43] Analytics-Kanban: Investigate sample cube pageview_count vs unsampled log pageview count [5 pts] - https://phabricator.wikimedia.org/T108925#1686425 (Tbayer) PPS: For the record, I noticed that these total numbers on Hive have been very slightly changing retroactively in any case, e.g. decreasing by less than... [18:59:55] FYI, I am deploying eventlogging with etcd to beta, it is down atm. [19:07:35] kevinator: Better tomorrow, if possible as for me tuesdays are kind of short, or otherwise i can be back after 5 (which is totally fine) [19:22:58] (PS3) Christopher Johnson (WMDE): adds aggregate data URI sources sets up prelim charts for latest stats adds developer tab and data table for getClaims usage adds edits/day, new users/day, new pages/day for past 5 days [wikidata/analytics/dashboard] - https://gerrit.wikimedia.org/r/241697 [19:23:59] ottomata: any chance we could resolve this mvn error in 1002 so i can build camus jar? 
[19:24:00] [ERROR] Plugin org.apache.maven.plugins:maven-compiler-plugin:3.1 or one of its dependencies could not be resolved: Failed to read artifact descriptor for org.apache.maven.plugins:maven-compiler-plugin:jar:3.1: Failure to find org.apache.maven.plugins:maven-compiler-plugin:pom:3.1 in https://archiva.wikimedia.org/repository/mirrored/ was cached in the local [19:24:00] repository, resolution will not be reattempted until the update interval of system-wide-wmf-mirrored-default has elapsed or updates are forced -> [Help 1] [19:27:22] Analytics-Kanban: Make sunburst and stacked-bars resize with window {crow} [3 pts] - https://phabricator.wikimedia.org/T114162#1686582 (Milimetric) NEW a:Milimetric [19:29:11] (CR) Christopher Johnson (WMDE): [C: 2 V: 2] adds aggregate data URI sources sets up prelim charts for latest stats adds developer tab and data table for getClaims usage adds edits/day, [wikidata/analytics/dashboard] - https://gerrit.wikimedia.org/r/241697 (owner: Christopher Johnson (WMDE)) [19:45:45] Analytics-EventLogging, Editing-Department, Improving access, Reading Web Planning, discovery-system: Add event_pageId and event_pageTitle to quicksurvey or all schema - https://phabricator.wikimedia.org/T114164#1686646 (leila) NEW [20:03:04] ottomata: yt? [20:03:07] Analytics-Kanban: Introduction to Hive class {flea} - https://phabricator.wikimedia.org/T113545#1686801 (madhuvishy) https://docs.google.com/a/wikimedia.org/presentation/d/1jR-0wS8pNV4UFw55gAfH_-sKA623jyFSjkdHkH1wUzM/edit?usp=sharing [20:03:13] ottomata: i was wondering ... [20:03:15] nuria: sorta! [20:03:16] :) [20:03:17] ask! [20:03:30] Analytics-Tech-community-metrics, MediaWiki-Extension-Requests, Possible-Tech-Projects: A new events/meet-ups extension - https://phabricator.wikimedia.org/T99809#1686804 (Abit) Yes, I am trying to rustle up some developers to augment the Wikimetrics API such that it can be used to calculate [[ https... 
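Editor's sketch of the offset check joal describes between 18:07 and 18:14: fail when a considered topic is missing from the current run or has not progressed, and use whitelist/blacklist patterns so that decommissioned topics such as webrequest_bits can be excluded. This is not the code from the refinery patch. The function names, the exception class and the per-topic offset dictionaries (real Camus offsets are per partition and live in sequence files) are hypothetical simplifications, and `webrequest_text` is only an illustrative topic name.

```python
import re


class OffsetCheckError(Exception):
    """Raised when a considered topic is missing or did not progress."""


def topics_to_check(topics, whitelist=(), blacklist=()):
    """Keep topics matching the whitelist (empty means all) and not the blacklist."""
    selected = []
    for topic in sorted(topics):
        if whitelist and not any(re.fullmatch(p, topic) for p in whitelist):
            continue
        if any(re.fullmatch(p, topic) for p in blacklist):
            continue
        selected.append(topic)
    return selected


def check_offsets(previous, current, whitelist=(), blacklist=()):
    """Compare two runs; previous/current map topic name -> offset (simplified)."""
    considered = topics_to_check(set(previous) | set(current), whitelist, blacklist)
    problems = []
    for topic in considered:
        if topic not in current:
            problems.append("%s: missing from current run" % topic)
        elif topic in previous and current[topic] <= previous[topic]:
            problems.append("%s: offset did not progress" % topic)
    if problems:
        raise OffsetCheckError("; ".join(problems))
    return considered


# Illustrative offsets only: the bits topic survives in the previous offsets.
previous = {"webrequest_text": 100, "webrequest_bits": 50}
current = {"webrequest_text": 150}
print(check_offsets(previous, current, blacklist=["webrequest_bits"]))
```

Without the blacklist the same call raises, which is the false alarm the 18:08 messages are about. Reading the patterns from the existing camus properties file, as suggested at 18:12, would replace the hard-coded lists.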
[20:03:37] ottomata: best way to add a "third party" jar to execution of hadoop job [20:03:43] ottomata: i read this one: [20:03:52] https://hadoopi.wordpress.com/2014/06/05/hadoop-add-third-party-libraries-to-mapreduce-job/ [20:04:01] ottomata: but -libjars does not work [20:05:59] ottomata: if you do not know of a way i will just repackage (for tests) our camus jar [20:06:31] nuria: off the top of my head i don't, but i think I have gotten it to work before>.>..>> [20:06:43] nuria: libjars has worked for me [20:06:58] joal: with camus job? [20:07:05] no [20:07:07] not specifically [20:08:25] nuria: does HADOOP_CLASSPATH work? [20:08:27] Analytics-Cluster, Analytics-Kanban: Setup pipeline for search logs to travel through kafka and camus into hadoop {hawk} [21 pts] - https://phabricator.wikimedia.org/T113521#1686828 (madhuvishy) a:Ottomata>Nuria [20:08:38] joal: right, it will work if the class you are trying to execute "MyClass must use GenericOptionsParser class." [20:08:48] gotta run to lunch, sorry.... [20:08:53] ottomata: np [20:09:01] nuria: I don't get it :( [20:09:12] joal: sorry [20:09:19] joal: letme explain better [20:09:37] joal: this works [20:09:40] hadoop jar /path/to/my.jar com.wordpress.hadoopi.MyClass -libjars ${LIBJARS} value [20:10:01] if MyClass uses GenericOptionsParser [20:10:07] mmm let me see if it is the order [20:10:15] I think it might :) [20:13:07] joal: mmmm..this no work, does it look good to you? [20:13:23] nuria: I moved the search task to in progress and assigned it to you. i'm gonna get lunch now, but may be we can chat about it in a bit and i can work on some parts? 
[20:13:28] https://www.irccloud.com/pastebin/o4sgGlZX/ [20:13:46] madhuvishy: sounds good, will be here until 2pm [20:14:04] nuria: aah let's chat now then [20:14:34] nuria: you can try 'hadoop jar --libjars ${LIBJARS} /path/to/your.jar your.Class param' [20:14:34] madhuvishy, sure [20:14:58] I can't remember if libjars is accepted before jar path or not [20:15:02] nuria: --^ [20:15:04] madhuvishy: let me try joals [20:15:08] okay [20:15:13] joal's suggestion [20:18:25] joal: nah, it no work [20:18:32] mwarf :( [20:18:37] https://www.irccloud.com/pastebin/lNSz87Dm/ [20:18:53] madhuvishy: this is where i am at [20:18:55] nuria: two -- [20:19:06] joal: ah wait [20:19:25] if it does not work before the path, maybe after, but always with two -- [20:22:41] nuria: are you trying the example avro import? [20:22:59] joal it no work, but let me catch madhuvishy up [20:23:32] madhuvishy: I am trying to execute the camus job using an avro decoder with a dummy schema class [20:23:44] madhuvishy: as you said yes, it is the sample [20:24:01] madhuvishy: and it is packaged (as our schemas would be) in a different jar [20:24:29] madhuvishy: jar is in /home/nuria/avro-kafka [20:25:03] madhuvishy: but i cannot manage to pass this third party jar to hadoop so it loads it as part of the camus job so the decoder class [20:25:13] madhuvishy: and schemas are available at runtime [20:25:31] madhuvishy: found this: https://hadoopi.wordpress.com/2014/06/05/hadoop-add-third-party-libraries-to-mapreduce-job/ [20:26:08] madhuvishy: cause i was thinking that way the schemas are packaged on an outside jar (not with camus itself) and that is the jar we deploy when schemas change [20:26:18] madhuvishy: does this make sense? [20:26:31] madhuvishy: was trying to execute job like: [20:26:54] https://www.irccloud.com/pastebin/fsmF66t1/ [20:31:52] nuria: I double checked, the two dashes are for hdfs command, my bad :( So you were right, only one dash, and after the class name. 
Only last try might be to use 'yarn jar' instead of hadoop jar, but I don't see why it would change anything. [20:33:03] joal: mmm.. then it must be the "GenericOptionsParser" thing [20:33:15] joal: let me do 1 more try [20:36:49] joal: no luck [20:36:56] Analytics-Kanban: Investigate sample cube pageview_count vs unsampled log pageview count [5 pts] - https://phabricator.wikimedia.org/T108925#1686948 (JAllemandou) Hey Tilman :) Thanks for your comments. First, you are right, we have backfilled data removing arbcom wikis and outreach.wikimedia, this explains... [20:36:57] rhmmm [20:37:23] Analytics-Backlog, Analytics-EventLogging, MediaWiki-extensions-CentralNotice, Traffic, operations: Eventlogging should transparently split large event payloads - https://phabricator.wikimedia.org/T114078#1686950 (Tgr) >>! In T114078#1685110, @Nuria wrote: > BTW, cc @trg as he had a ticket on t... [20:37:31] nuria: is it a libjars issue or a class not found one ? [20:38:04] joal: still libjars [20:38:18] That's really weird ! [20:38:30] Can you show me the command you run ? [20:39:24] joal: sure [20:39:28] https://www.irccloud.com/pastebin/ZGDtbWsV/ [20:40:35] nuria: have you tried with the -P after the two default config ? [20:41:05] joal: "-P" after libjars? 
[20:41:09] yes [20:41:29] joal: i think so cause i have tried 20 combinations but let me try again [20:41:59] cause -D and -libjars are generic hadoop parameters, while -P is specific to the class you are loading I think [20:42:04] :)) [20:43:56] nuria: I'm pretty sure of the thing now [20:44:20] nuria: https://github.com/linkedin/camus/blob/master/camus-etl-kafka/src/main/java/com/linkedin/camus/etl/kafka/CamusJob.java#L655 [20:44:27] -P is a camus option [20:44:37] nuria: http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/CommandsManual.html [20:44:45] -D and -libjars are generic ones [20:44:53] So the order matters :) [20:46:14] joal: aham [20:46:16] joal: [20:46:21] this one does not work [20:46:26] https://www.irccloud.com/pastebin/fabVppaF/ [20:46:36] (back) [20:47:17] Gives a "Exception in thread "main" org.apache.commons.cli.UnrecognizedOptionException: Unrecognized option: --libjars" [20:47:41] joal: and launches job and all but gets a classnotfound exception as it cannot find the schemas [20:48:01] Ah, better at least :) [20:48:37] joal: ahemmm.. better? [20:50:59] I tried to launch: worked fine, except that I don't have the right to write in your folder :) [20:51:04] I'll try again nuria [20:54:29] nuria: following the thread - i'll keep trying if it doesn't work for you until you leave. [20:55:46] milimetric: q for you about python dependencies in setup.py [20:55:48] yt? 
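Editor's sketch of the argument order joal arrives at between 20:31 and 20:44: the generic Hadoop options (`-libjars`, `-D`, single dash) go after the main class and before the job's own options such as Camus's `-P`. The helper only assembles an argv list. Its name and the jar and properties file names in the usage lines are hypothetical. It also assumes the main class hands its arguments to `GenericOptionsParser` (for example through `ToolRunner`), which the log leaves unresolved for the Camus jar in use.

```python
def build_hadoop_jar_command(jar, main_class, libjars=(), defines=None, app_args=()):
    """Assemble 'hadoop jar' argv with generic options ahead of application options."""
    cmd = ["hadoop", "jar", jar, main_class]
    if libjars:
        # -libjars takes one comma-separated list of extra jars.
        cmd += ["-libjars", ",".join(libjars)]
    for key, value in sorted((defines or {}).items()):
        cmd += ["-D", "%s=%s" % (key, value)]
    # Application-specific options (e.g. Camus's -P) come last.
    cmd += list(app_args)
    return cmd


# Placeholder jar and properties names, for illustration only.
cmd = build_hadoop_jar_command(
    "camus.jar",
    "com.linkedin.camus.etl.kafka.CamusJob",
    libjars=["schemas.jar"],
    app_args=["-P", "camus.properties"],
)
print(" ".join(cmd))
```

If the main class parses its arguments itself, the generic options reach it unprocessed, which is one possible reading of the "Unrecognized option" error quoted at 20:47.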
[20:55:52] yea [20:55:54] sup [20:56:08] nuria: package com.linkedin.camus.example is not present in our jar :) [20:56:13] so, i don't think i had this problem elsewhere, but that is probably because this is the first time i'm trying this in prod [20:56:19] i'm doing the setup.py install step on eventlog1001 [20:56:21] and getting [20:56:30] Installed /usr/local/lib/python2.7/dist-packages/eventlogging-0.9_20150929-py2.7.egg [20:56:30] Processing dependencies for eventlogging==0.9-20150929 [20:56:30] Searching for python-etcd>=0.4.0 [20:56:30] Reading https://pypi.python.org/simple/python-etcd/ [20:56:32] and then it hangs, obvi [20:56:37] because it can't talk to pypi [20:56:41] right [20:56:46] but, i have installed the python-etcd deb package [20:56:51] nuria: So the problem really comes from the class being absent [20:56:52] which works for other deps [20:56:57] exit [20:56:58] what's pip freeze say? [20:57:00] oops :) [20:57:05] or is there no pip freeze there :) [20:57:08] one sec, logging on [20:57:09] joal: hmmm, makes sense - so we should build a jar with it [20:57:12] no pip [20:57:36] i'm wondering if somehow our python-etcd deb package messes up the version so that setup.py doesn't get it [20:57:37] hm, I wonder if one of the jars in ottomata's home folder might contain it [20:57:41] Don't know [20:57:47] madhuvishy, nuria --^ [20:57:54] joal: thanks [20:57:57] If not, we must build a new one [20:57:58] will find out [20:58:07] it's late for you! [20:58:10] Yup, enough for me tonight :) [20:58:16] good night :) [20:58:23] Have a good one lads, see you tomorrow ! [20:58:26] ottomata: looking through dist-packages now [20:58:33] ja, where is that? [20:58:44] there are so many dirs it takes me many minutes to find [20:59:01] ah found [20:59:05] /usr/lib/python2.7/dist-packages/ yes? [20:59:10] oh [20:59:11] python_etcd-0.3.3.egg-info/ [20:59:13] :? 
[20:59:22] but [20:59:31] aptitude show python-etcd [20:59:34] Version: 0.4.0~git20150609+ac25bd7ba2-1~trusty0 [20:59:35] (i keep trying to say yes but you're typing too fast) [20:59:35] hm [20:59:38] haha [20:59:46] but yes, weird [21:00:08] yarrrr [21:00:09] yeah [21:00:10] package has [21:00:12] ./usr/lib/python2.7/dist-packages/python_etcd-0.3.3.egg-info/ [21:00:13] hmmm [21:00:19] ok milimetric, makes sense why this doesn't work [21:00:23] asking filippo [21:00:25] yep [21:00:38] ottomata: easy fix would be to just require etcd 3.3, no? [21:00:45] maybe! [21:00:49] I'm assuming if that's what we have packaged, it worked with code similar to yours [21:00:51] all my testing was with 0.4 i guess [21:01:14] kevinator: I'm on fifth, but some thing is going on in collab space [21:02:43] milimetric: this seems to be an oversight on the python-etcd folks part. [21:02:52] madhu, yes it's manager training for the financial tools WMF uses [21:03:19] madhuvishy: can we postpone our meeting? I'd like to review this meeting [21:03:24] HMMM, no. its not. i think that maybe this package was made before the 0.4.0 tag existed. [21:03:25] sighHHGhghgh [21:04:24] sucks, yeah, will it take long to test with 0.3.3? [21:05:10] kevinator: sure. [21:06:30] brb [21:06:31] Analytics-Tech-community-metrics, Possible-Tech-Projects: Improving MediaWikiAnalysis - https://phabricator.wikimedia.org/T89135#1687127 (jgbarah) >>! In T89135#1684292, @Qgil wrote: > @acs, @dicortazar, @jgbarah, do you still want to propose this project? Are you able to mentor it? Yes, I think so. I'm... [21:12:41] Analytics-EventLogging, Editing-Department, Improving access, Reading Web Planning, discovery-system: Add event_pageId and event_pageTitle to quicksurvey or all schema - https://phabricator.wikimedia.org/T114164#1687134 (Jdlrobson) Editing and discovery team - are there any reasons why all your... 
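Editor's sketch of why the install at 20:56 went looking on PyPI: the requirement is `python-etcd>=0.4.0`, but the metadata directory shipped in the deb is `python_etcd-0.3.3.egg-info/`, so the installed version reads as 0.3.3 whatever version string the deb itself carries. The helpers below are hypothetical and handle only purely numeric dotted versions. The real resolution goes through setuptools, which has a much fuller version parser.

```python
import re

# Matches names like 'python_etcd-0.3.3.egg-info/' or 'pkg-1.2-py2.7.egg-info'.
_EGG_INFO = re.compile(
    r"^(?P<name>.+?)-(?P<version>\d+(?:\.\d+)*)(?:-py\d[\d.]*)?\.egg-info/?$")


def egg_info_version(dirname):
    """Return (name, version tuple) parsed from an .egg-info directory name."""
    match = _EGG_INFO.match(dirname)
    if match is None:
        raise ValueError("not a recognised egg-info name: %r" % dirname)
    version = tuple(int(part) for part in match.group("version").split("."))
    return match.group("name"), version


def satisfies_minimum(dirname, minimum):
    """True if the version in the egg-info name is at least `minimum` (e.g. '0.4.0')."""
    _, version = egg_info_version(dirname)
    return version >= tuple(int(part) for part in minimum.split("."))


# Directory name as pasted at 20:59:11 and 21:00:12.
print(satisfies_minimum("python_etcd-0.3.3.egg-info/", "0.4.0"))
```

The check fails for the packaged metadata, which matches the behaviour in the log. Whether the fix is rebuilding the package or relaxing the requirement is left open there.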
[21:13:30] joal:i know class is not present [21:13:38] Analytics-EventLogging, Editing-Department, Improving access, Reading Web Planning, discovery-system: Add event_pageId and event_pageTitle to quicksurvey or all schema - https://phabricator.wikimedia.org/T114164#1687137 (Jdlrobson) [21:13:51] joal: i am passing it on a jar that i locally build, makes sense? [21:14:35] ah, sorry joal , see you tomorrow, we'll revisit this cc ottomata [21:16:08] Analytics-Tech-community-metrics, Possible-Tech-Projects: Improving MediaWikiAnalysis - https://phabricator.wikimedia.org/T89135#1687158 (jgbarah) >>! In T89135#1684438, @Anmolkalia wrote: > Hi @Aklapper. That sounds very encouraging :) I recently made a small contribution to the android Wikipedia App, s... [21:56:05] good night a-team, see you tomorrow! [21:56:13] night mforns! [21:56:27] :] [22:05:41] kevinator: i'm in one of the pink couches [22:08:17] Analytics-Kanban: Investigate sample cube pageview_count vs unsampled log pageview count [5 pts] - https://phabricator.wikimedia.org/T108925#1687354 (Tbayer) Thanks Joseph - if you could also answer the other questions (in the [[https://phabricator.wikimedia.org/T108925#1686425 | comment]] before the PPS abou... [22:17:17] ottomata, madhuvishy : still there? [22:31:02] yes in meeting [22:40:20] Analytics-Backlog, Analytics-EventLogging, MediaWiki-extensions-CentralNotice, Traffic, operations: Eventlogging should transparently split large event payloads - https://phabricator.wikimedia.org/T114078#1687538 (awight) For more context, we're sending data very occasionally, one message on fa... 
[22:42:09] nuria: me too, 1-1 with Kevin [22:42:13] but almost done [22:42:33] Analytics-Kanban: Report on zh wikipedia for Zhou - https://phabricator.wikimedia.org/T114190#1687561 (kevinator) NEW [22:42:43] Analytics-Kanban: Report on zh wikipedia for Zhou - https://phabricator.wikimedia.org/T114190#1687569 (kevinator) a:madhuvishy [22:49:53] Analytics, Analytics-Cluster, Fundraising Tech Backlog, Fundraising-Backlog, operations: Verify kafkatee use for fundraising logs on erbium - https://phabricator.wikimedia.org/T97676#1687605 (awight) Everything Jgreen is saying is true ;) there's no effect on existing banner impression counts... [22:57:10] love and fresh flowers for everyone! [22:57:12] good night :) [22:58:05] milimetric: ciaooo [22:58:08] goodnight! [22:58:47] milimetric: did you get the api deployed ;) you are chirpy :D [22:59:00] good night! [23:07:15] nuria: hiIIiI [23:07:34] ottomata: ay man, need to go in 2 mins but ALMOST got the hadoop [23:07:43] job to accept the -libjars argument [23:07:52] i had to modify our camus jar [23:07:53] COOL [23:07:56] oh? [23:08:23] ottomata: and now job doesn't complain but still doesn't see the third party lib so I am missing something else [23:08:50] like this job does not complain about -libjars option [23:08:53] https://www.irccloud.com/pastebin/HcpmTReH/ [23:09:23] but still i do not think it is loading stuff right [23:09:59] ottomata: let's talk tomorrow [23:10:02] nuria: maybe there is a way to output classpath in the job? [23:10:08] not sure, then you could see if you have the jar in your path [23:10:14] does export HADOOP_CLASSPATH work? [23:10:32] na, that probably has to happen on the worker nodes too?