[00:27:23] Analytics-EventLogging, MediaWiki-General-or-Unknown, Performance: Add event tracking queue to MediaWiki core for loose coupling with EventLogging or other interested consumers - https://phabricator.wikimedia.org/T95356#1188141 (bd808)
[00:49:14] Analytics-Engineering, Engineering-Community, ECT-April-2015: Tech Talk: May 2015: Kanban - https://phabricator.wikimedia.org/T95202#1188180 (ksmith)
[00:52:29] Analytics-Engineering, Engineering-Community, ECT-April-2015: Tech Talk: May 2015: Kanban - https://phabricator.wikimedia.org/T95202#1188191 (ksmith) @Rfarrand: I have only barely started writing the talk, so early May is pushing it. June would be better for me, but if mid-May is the ideal window, I can...
[00:52:51] Analytics-Engineering, Engineering-Community, Team-Practices, ECT-April-2015: Tech Talk: May 2015: Kanban - https://phabricator.wikimedia.org/T95202#1188195 (ksmith)
[02:09:20] Analytics-Wikimetrics: Wikimetrics backup has no monitoring - https://phabricator.wikimedia.org/T71397#1188310 (Nuria) @Dzahn: we know when things are not working cause the cronjob notifies us of the error
[10:19:07] Analytics-Tech-community-metrics, ECT-April-2015: Tech metrics should talk about "Affiliation" instead of organizations or companies - https://phabricator.wikimedia.org/T62091#1189020 (Aklapper) And quoting acs from https://github.com/Bitergia/mediawiki-dashboard/pull/56 : > We are using in the next versi...
[11:18:09] Analytics, Scrum-of-Scrums, Wikipedia-Android-App, Wikipedia-iOS-App, and 4 others: Avoid cache fragmenting URLs for Share a Fact shares - https://phabricator.wikimedia.org/T90606#1189182 (mark)
[11:26:16] Analytics-Tech-community-metrics, ECT-April-2015: Tech metrics should talk about "Affiliation" instead of organizations or companies - https://phabricator.wikimedia.org/T62091#1189228 (Qgil) Although I wrote "instead of organizations or companies" in the subject, I didn't provide any specific argument aga...
[13:36:49] (PS1) Alexey: Modify access rules [analytics/camus] (refs/meta/config) - https://gerrit.wikimedia.org/r/202728
[13:38:38] (PS1) Alexey: Modify access rules [analytics/camus] (refs/meta/config) - https://gerrit.wikimedia.org/r/202729
[13:46:25] (CR) Ottomata: "Hm, what's this about?" [analytics/camus] (refs/meta/config) - https://gerrit.wikimedia.org/r/202728 (owner: Alexey)
[14:02:15] ottomata: standupppppp ! \o/
[14:04:07] !
[14:04:14] oopsi
[14:25:59] Wops
[14:49:51] halfak: I had a bug in the InputFormat, corrected it and reran the thing
[14:50:37] Full enwiki?
[14:50:42] joal, ^
[14:50:51] halfak: Yes
[14:52:38] halfak: I think I'll have parsed every rev in about 6/7 hours
[14:52:52] Nothing done though except converting to json
[14:53:31] halfak: It will also depend on cluster load
[14:53:47] joal, awesome. I already have a test job ready for when that is done :)
[14:53:55] huhu
[14:54:03] What man ?
[14:56:06] milimetric: I messed up on madhu's dates
[14:56:22] I just got a request to extract tags from Wikipedia. It seems like that might be a good test :)
[14:56:32] tnegrin: oh?
[14:56:43] she's at a conference when you are coming out
[14:56:51] Indeed, we could go for that :)
[14:56:58] tnegrin: sounds like I should reschedule
[14:56:59] I was wondering if you could push your trip out a week
[14:57:04] halfak: --^
[14:57:06] tnegrin: that's fine
[14:57:13] I'll email Doreen
[14:57:40] joal, :)
[14:57:41] yes — that would be best. thanks for being flexible — I totally forgot but I'm in a conference positive mood
[14:58:44] milimetric, tnegrin: thanks!
[14:58:54] madhuvishy: no problem at all
[14:59:20] madhuvishy: so I'm trying to decide between Friday - Thursday or Monday - Friday the next week
[14:59:51] milimetric: we can also do Thursday - Wednesday
[15:00:01] I'm back on Wednesday noon
[15:00:33] cool, thx, I'll check with my brother and friends that I'm staying with and figure it out.
[15:00:59] Aah thank you so much! Sorry for the trouble.
[15:01:19] Let me know when you've made plans :)
[15:02:36] Ironholds: Another quick question on your udfs
[15:03:01] joal, sure!
[15:03:10] the IsCrawler one, should we consider what it matches as spiders, or as automatas ?
[15:04:02] My guess is that those are automatas, spiders being the ones tagged as spiders by ua-parser, but I prefer to double check
[15:04:20] Ironholds: --^
[15:05:29] joal, so many incoming requests I want to hadoop on. I have another one for detecting reverts by mobile anonymous editors. :)
[15:05:42] huhu
[15:06:08] We should get you set up on the Altiscale cluster so that we can work together to offload some of my work on public data there.
[15:06:24] I'll talk to tnegrin about that during our 1:1 today.
[15:07:28] joal, isCrawler is spiders
[15:07:32] that's why it's called isCrawler
[15:07:34] it's for web crawlers
[15:07:48] google and some other sites have Wikimedia-specific crawlers that don't appear in the generalised ua-parser definition
[15:08:01] ok, good to have checked
[15:10:07] Ironholds: Then about automatas, that's where I need to find the wordpress stuff, correct ?
[15:10:14] yup
[15:10:22] Ironholds: ok got it
[15:10:30] Ironholds: thx
[15:10:49] np!
[15:48:34] Ironholds: can you tell me more about that wordpress thing ?
[15:48:52] I looked briefly and google didn't help (for once ...)
[15:50:28] joal, we get spammers stealing and livemirroring our content for legitimate-looking SEO spam bullshit
[15:50:31] emphasis "livemirror"
[15:50:49] the result is a ton of requests from WordPress user agents that are automated spam live-mirror updates
[15:51:11] I understand the process
[15:51:16] grep through a sampled log file for WordPress if you want an example UA; I'm afraid I don't have one off the top of my head :(
[15:51:31] no prob, I'll double check :)
[15:51:33] thx
[16:08:56] Analytics-EventLogging, Ops-Access-Requests, operations, Patch-For-Review: Grant user 'tomasz' access to dbstore1002 for Event Logging data - https://phabricator.wikimedia.org/T95036#1189944 (RobH) a:mark
[16:17:56] ottomata: question if you are there
[16:17:56] yes
[16:17:59] when a spark submit job fails you get:
[16:18:14] Analytics, Scrum-of-Scrums, Wikipedia-Android-App, Wikipedia-iOS-App, and 4 others: Avoid cache fragmenting URLs for Share a Fact shares - https://phabricator.wikimedia.org/T90606#1190049 (dr0ptp4kt) Open>Resolved
[16:18:21] https://www.irccloud.com/pastebin/2nusiuXt
[16:18:42] ottomata: i go and get those logs from the cluster (application_1424966181866_75876)
[16:18:51] ottomata: is there more info anywhere?
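For failed YARN applications like the one above, the aggregated container logs can usually be pulled from the command line once the application finishes, assuming log aggregation is enabled on the cluster. A minimal example, reusing the application id quoted in the chat:

    yarn logs -applicationId application_1424966181866_75876 | less

This prints the stdout/stderr of every container, which is often where Spark's real stack trace ends up rather than in the driver output.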
[16:24:14] ottomata: i also see:
[16:24:15] https://www.irccloud.com/pastebin/DPxDq3ht
[16:24:15] but do not know really what to do with that
[16:42:37] ottomata, joal : as far as i can see the job (for 1 day) needs more than 1G memory for executor/driver
[16:42:37] to run
[16:42:37] ottomata, joal: i have it now running for 1 day with 3g , will read a bit about spark persistence to disk
[17:00:11] Analytics-EventLogging, Analytics-Kanban: Cron collects Visual Editor deployments [8 pts] {lion} - https://phabricator.wikimedia.org/T89253#1190460 (Milimetric) Open>Resolved I made a script to do this, cloned a copy of mediawiki-core on stat1003, and scheduled the script to run nightly: # update t...
[17:01:40] Analytics-EventLogging, Analytics-Kanban, VisualEditor: Wikitext events need to be sampled {lion} - https://phabricator.wikimedia.org/T93201#1190468 (Milimetric) Open>Resolved
[17:02:40] Analytics-Kanban, Analytics-Visualization, Patch-For-Review: Improve UX for VE/Wikitext comparison dashboard {lion} - https://phabricator.wikimedia.org/T94424#1190475 (Milimetric) Open>Resolved
[17:02:42] Analytics-EventLogging, Analytics-Kanban, Analytics-Visualization: Fully instrument editing experiences {epic} {lion} - https://phabricator.wikimedia.org/T89924#1190476 (Milimetric)
[17:03:47] Analytics-Kanban: Environment for Visual Editor visualizations in labs that can report usage metrics [13 pts] {lion} - https://phabricator.wikimedia.org/T93954#1190480 (Milimetric) a:Nuria
[17:04:38] Analytics-Kanban: Environment for Visual Editor visualizations in labs that can report usage metrics [13 pts] {lion} - https://phabricator.wikimedia.org/T93954#1151034 (Milimetric) This is done, the dashboard is at: https://edit-analysis.wmflabs.org/compare/ We are working on reporting usage metrics separate...
[17:05:05] Analytics-Kanban: Environment for Visual Editor visualizations in labs that can report usage metrics [13 pts] {lion} - https://phabricator.wikimedia.org/T93954#1190485 (Milimetric) Open>Resolved
[17:06:59] nuria: sorry
[17:07:06] ottomata: np
[17:07:38] nuria: eating lunch now, will be with you after SoS
[17:07:43] ottomata: k
[17:10:10] nuria: sorry as well, will be back and discuss after dinner
[17:15:10] joal: k
[17:26:52] milimetric, is the rate of wikitext editing events still too high for you guys?
[17:27:29] Krenair: it's quite high, but analysis and everything seem to be going ok
[17:27:35] you're sampling 1/4 right?
[17:27:41] Not yet: https://gerrit.wikimedia.org/r/#/c/199132/
[17:27:50] oh ok, that would be good
[17:27:52] it's still not been reviewed
[17:27:54] ok
[17:27:59] because I think at this rate the table is going to crash eventually
[17:28:05] and we won't be able to compute new days
[17:28:21] btw, Krenair, this isn't "officially" released yet by Kevin, but: https://edit-analysis.wmflabs.org/compare/
[17:28:23] Analytics-EventLogging, Analytics-Kanban, VisualEditor, Patch-For-Review: Wikitext events need to be sampled {lion} - https://phabricator.wikimedia.org/T93201#1190578 (Krenair) Resolved>Open
[17:28:28] that's the final dashboard we're making for you guys
[17:28:33] and it's being updated daily
[17:28:46] starting with April 1st which is when we feel like data was at least as clean as it's going to get for a while
[17:28:56] Analytics-Volunteering, Engineering-Community, Phabricator, Project-Creators, and 2 others: Analytics-Volunteering and Wikidata's Need-Volunteer tags; "New contributors" vs "volunteers" terms - https://phabricator.wikimedia.org/T88266#1190591 (kevinator) @Aklapper Yes, I have discussed with my team a...
[17:30:26] milimetric, less than 50% ready?
[17:30:51] in Scrum of Scrums now
[17:30:58] ok
[17:31:12] but you can check out the analysis here... (one sec, pasting)
[17:33:01] sometimes we get another ready after abort :|
[17:34:58] sometimes people end up saving without ready
[17:35:10] I don't understand how this happens
[17:35:22] Krenair: https://github.com/wikimedia/analytics-limn-edit-data/blob/master/edit/sessions.sql
[17:35:44] nuria: I'm back !
[17:35:44] we reviewed this code a few times, but it's possible we made a mistake too
[17:35:59] batcave ?
[17:37:21] joal: question 1st, do you know anywhere else to look at logs besides the logs in application_1424966181866_75876
[17:37:27] (for example)
[17:37:53] There are logs per worker, but I don't think we have
[17:38:19] And from what I have seen in logs so far, it seems we don't have spark generated logs
[17:38:49] joal: i normally just get logs from /var/blah in cluster
[17:38:57] yup
[17:39:01] Same for me
[17:39:03] joal: ya, that is what i think
[17:39:19] I think logging has not been configured for spark yet
[17:39:27] joal: i was looking at memory of every loaded block and *i think* every hour is actually small:
[17:39:39] 15/04/08 17:34:33 INFO BlockManagerInfo: Added broadcast_12_piece0 in memory on localhost:59724 (size: 22.4 KB, free: 530.0 MB)
[17:39:39] it is :)
[17:39:57] i just cached it locally to have it report
[17:40:01] ah, broadcast is different I think
[17:40:20] Let's batcave, there are a few hints I can give you
[17:40:51] k
[17:44:47] Analytics-Wikimetrics, Community-Wikimetrics: Can't edit list of users - https://phabricator.wikimedia.org/T67208#1190647 (Fhocutt)
[17:57:17] ok, you guys doing the spark stuff?
[17:57:21] nuria: lemme know if you need help
[18:04:06] ottomata: joal just resolved a couple doubts that i had , will be gone for couple hours but back on spark after that
[18:05:52] ottomata: if you could manage to get us access to analytics1001 UIs without tunnelling, that would be AWESOME !
[18:07:16] ok cool
[18:07:25] joal, yarn.wikimedia.org
[18:07:26] ?
[18:07:33] more !
[18:07:33] what else do you need?
[18:07:39] analytics1001 ;)
[18:07:43] haha, it isn't that easy
[18:07:50] I guess so
[18:07:52] do you need more than resource manager UI?
[18:07:56] like, namenode UI?
[18:07:58] history?
[18:07:59] But spark UI is so helpful
[18:08:00] what else?
[18:08:03] you get it!
[18:08:10] click on ApplicationMaster
[18:08:12] while job is running
[18:08:16] it will give you a crappy analytics1001 url
[18:08:28] oh I do, I just say that having to tunnel is kinda not fun :)
[18:08:32] edit the url and replace analytics1001.eqiad.wmnet:8088 with yarn.wikimedia.org
[18:08:38] no tunnel needed, just url editing :/
[18:08:41] I know I know :)
[18:08:52] really ?
[18:08:57] OOOOOOH !
[18:08:58] ja
[18:09:05] \o/
[18:09:06] its just that the UI doesn't know that it is being proxied
[18:09:09] it is dumb
[18:09:13] huhuhu
[18:09:14] i can't tell it to redirect based on HTTP_HOST
[18:09:20] it just redirects to itself
[18:09:30] nuria will love to know that !
[18:09:32] or has absolute URIs or something
[18:09:38] that's the most annoying part
[18:09:48] if we really wanted to fix this...we'd make a Hadoop JIRA and do it ourselves.
[18:09:53] it is a bug for sure
[18:10:02] mm
[18:10:30] or do some HTML mangling at the proxy level :)
[18:10:32] that is possible
[18:10:48] to s/analytics1001.eqiad.wmnet:8088/yarn.wikimedia.org/
[18:11:24] If there is a web-proxy like nginx in front of those guys, sounds possible :)
[18:11:59] Well, now I can tell you spark handles big data alright :)
[18:12:07] there is, but right now it is using the misc proxy
[18:12:14] mmfff
[18:12:20] which isn't very flexible, because it is used for many backends that need simple proxy
[18:12:27] right
[18:12:33] it would be better to fix it in source
[18:12:41] stupid absolute urls
[18:12:41] correct
[18:13:23] ApplicationMaster
[18:13:23] tsk tsk tsk
[18:14:56] replacing with yarn in url doesn't work for spark ui :(
[18:15:10] oh no, my bad
[18:15:16] works very well sorry
[18:15:55] Seems that the XMLToJSON file format for wikidumps works now :)
[18:15:58] https://yarn.wikimedia.org/proxy/application_1424966181866_75668/stages/stage?id=0&attempt=0
[18:16:21] I shouldn't show you that ;)
[18:23:44] COooL
[18:23:47] huh?
[18:24:02] I steal some cluster resource there ;)
[18:24:05] hehe
[18:24:22] s'ok, fairscheduler seems to be doing things better these days
[18:24:31] That's a subliminal message to tnegrin ;)
[18:24:39] yeah, definitely
[18:24:52] I'd still like to have more queues
[18:25:00] One for critical (wit
[18:25:05] with preemption
[18:25:21] one for prod, high priority, no preemption
[18:25:28] joal: feel free to submit patch: https://github.com/wikimedia/operations-puppet/blob/production/templates/hadoop/fair-scheduler.xml.erb
[18:25:29] one for day-to-day stuff
[18:25:29] :)
[18:25:38] and we can discuss :)
[18:25:43] Sounds good
[18:26:40] Ironholds: just curious
[18:26:41] what's ip_norm ?
[18:26:58] ottomata, where?
[18:27:28] SELECT geocode(ip_norm(i...webrequest_source(Stage-1)
[18:27:59] https://yarn.wikimedia.org/proxy/application_1424966181866_75757/mapreduce/job/job_1424966181866_75757
[18:33:26] ottomata, it's the local name for the "take the IP and XFF and work out what we want" UDF
[18:33:36] ohhhh
[18:33:37] right, cool..
[18:33:39] temp function name
[18:33:58] hive 0.13 has the ability to store udf functions
[18:34:05] as names, so they don't have to be registered every time
[18:34:09] we should set that up..:)
[18:34:30] Analytics-Cluster, operations, ops-eqiad: analytics1020 hardware failure - https://phabricator.wikimedia.org/T95263#1191043 (Ottomata) p:Triage>High
[18:34:36] Analytics-Cluster, operations, ops-eqiad: analytics1020 hardware failure - https://phabricator.wikimedia.org/T95263#1184828 (Ottomata) p:High>Triage
[18:34:52] neat!
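The "HTML mangling at the proxy level" idea discussed above could look roughly like this in nginx, assuming the ngx_http_sub_module is compiled in; this is an illustrative sketch, not the actual misc-web proxy configuration:

    location / {
        proxy_pass http://analytics1001.eqiad.wmnet:8088;
        # disable upstream compression so sub_filter can see the response body
        proxy_set_header Accept-Encoding "";
        # rewrite the absolute URLs the YARN UI emits into its HTML
        sub_filter 'analytics1001.eqiad.wmnet:8088' 'yarn.wikimedia.org';
        sub_filter_once off;
        # Location: headers on redirects are handled separately by proxy_redirect
    }

As the chat notes, this only papers over the problem; the cleaner fix is making the YARN/Spark UIs honor the proxied Host header upstream.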
[18:35:02] Analytics, Analytics-Kanban: udp2log: Announce new stream so people can compare streams - https://phabricator.wikimedia.org/T86205#1191045 (Ottomata) p:Triage>High
[18:35:10] Analytics: Vowpal Wabbit on stat1002 - https://phabricator.wikimedia.org/T93537#1191046 (Ottomata) p:Triage>Low
[18:35:49] Analytics-Cluster, operations, ops-eqiad: analytics1020 hardware failure - https://phabricator.wikimedia.org/T95263#1184828 (Ottomata) p:Triage>High
[18:36:44] Analytics-Cluster: Register refinery-hive UDF functions as stored names, so we don't have to CREATE TEMPORARY FUNCTION every time. - https://phabricator.wikimedia.org/T95455#1191050 (Ottomata) NEW
[18:37:00] Nice one ottomata :)
[18:37:31] Analytics-Cluster, operations, ops-eqiad: analytics1020 hardware failure - https://phabricator.wikimedia.org/T95263#1191058 (Cmjohnson) I did an initial look and we had this in the past with a couple of the R720's and a main board had to be swapped. I need to do some more testing before I contact Dell...
[18:59:34] Analytics-Wikimetrics, Community-Wikimetrics: "Create Report" button does not appear when uploading a new cohort - https://phabricator.wikimedia.org/T95456#1191170 (kevinator)
[19:01:26] Analytics-Engineering, MediaWiki-API, MediaWiki-API-Team, Wikipedia-Android-App, and 2 others: Add page_id and namespace to X-Analytics header in App / api requests - https://phabricator.wikimedia.org/T92875#1191182 (Legoktm)
[19:07:06] Analytics-Kanban, Analytics-Wikimetrics, Community-Wikimetrics: "Create Report" button does not appear when uploading a new cohort - https://phabricator.wikimedia.org/T95456#1191216 (Milimetric)
[19:55:57] Analytics-Cluster: Register refinery-hive UDF functions as stored names, so we don't have to CREATE TEMPORARY FUNCTION every time. - https://phabricator.wikimedia.org/T95455#1191500 (JAllemandou) CREATE FUNCTION exists from hive 0.13 We still need to decide how we handle loading the jar though.
[19:59:36] ottomata: is the chart from kafka showing the cluster being overwhelmed ?
[20:00:06] did I do something to the cluster?
[20:00:25] Ironholds: I don't think
[20:14:44] ?
[20:14:44] kafka?
[20:14:57] nuria: i haven't read these, just stumbled across them just now:
[20:15:01] http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-1/
[20:15:01] http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/
[20:15:03] maybe helpful
[20:16:42] ottomata: ok, will take a look
[20:16:55] joal: kafka?
[20:17:25] yup
[20:17:27] http://ganglia.wikimedia.org/latest/graph_all_periods.php?hreg[]=analytics1012.eqiad.wmnet|analytics1018.eqiad.wmnet|analytics1021.eqiad.wmnet|analytics1022.eqiad.wmnet&mreg[]=kafka.server.BrokerTopicMetrics.%2B-BytesOutPerSec.OneMinuteRate&z=large&gtype=stack&title=kafka.server.BrokerTopicMetrics.%2B-BytesOutPerSec.OneMinuteRate&aggregate=1&r=hour
[20:17:46] first chart: only one big bump between 19.00 and 20.00
[20:18:06] it is probably camus being launched
[20:18:14] i actually was just looking at some camus output a few mins ago
[20:18:20] comparing it to what i'm seeing in labs
[20:18:22] while i was doing that
[20:18:32] i noticed that i think it is taking longer than 10 minutes for a single camus run these days
[20:18:47] which, is fine.
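The permanent-function support mentioned in T95455 (Hive 0.13+) lets the jar be attached to the function definition itself, which would answer the jar-loading question from the ticket. A sketch of the syntax; the function name, class, and HDFS path here are illustrative, not taken from refinery:

    CREATE FUNCTION is_crawler
      AS 'org.wikimedia.analytics.refinery.hive.IsCrawlerUDF'
      USING JAR 'hdfs:///wmf/refinery/artifacts/refinery-hive.jar';

Once created, the function is stored in the metastore and Hive fetches the jar for each session, so no CREATE TEMPORARY FUNCTION boilerplate is needed per query.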
[20:18:55] but, it means we should keep an eye on it
[20:19:11] if you look at the longer periods, you can see lots of bumps
[20:19:21] camus is our only real kafka consumer right now
[20:19:30] so it is pretty much entirely responsible for any 'bytes out'
[20:20:01] ok :)
[20:20:21] thx for the explanation
[20:23:30] ottomata: part-2 you sent before, last section on data format
[20:23:37] Please read that ;)
[20:28:46] (PS1) Joal: Add access_method, client_type and is_zero fields to refined webrequest table. [analytics/refinery] - https://gerrit.wikimedia.org/r/202914
[20:29:18] And so finishes my day :)
[20:29:25] Good night team
[20:30:31] ottomata: can i gain access to analytics1001 to be able to see the spark jobs ui ?
[20:34:23] nuria: just change any url you see to yarn.wikimedia.org
[20:34:24] and it will work
[20:34:33] if you click on ApplicationMaster link
[20:34:41] and it takes you to analytics1001.eqiad.wmnet:8088
[20:34:52] change that part of the url to yarn.wikimedia.org
[20:36:25] hahahaha
[20:36:34] joal|night: that Data Formats section is hilarious
[20:36:41] ottomata: it is!
[20:36:43] Avro is so annoying though!
[20:36:46] so far anyway :/
[20:36:48] ottomata: ay ay
[20:37:08] ottomata: i would substitute json in that paragraph for xml though
[20:37:17] don't tell christian
[20:37:34] haha
[20:37:46] i don't think anyone disputes that
[20:37:52] but json is pretty easy to use
[20:38:17] ottomata: so this is my shell job: https://yarn.wikimedia.org/cluster/app/application_1424966181866_76161
[20:38:26] nuria: i wonder if we would get a speedup for this job if we told spark to use Kryo for inter job communication
[20:38:37] your job hasn't started yet nuria
[20:39:02] you need to make an action happen
[20:39:08] ottomata: i thought it used kryo as a default
[20:39:16] not as default, i guess
[20:39:17] i thought so too
[20:39:30] ottomata: i thought i just read that on spark's docs
[20:39:31] because of some older issues?
[20:39:44] "The Kryo serializer, org.apache.spark.serializer.KryoSerializer, is the preferred option. It is unfortunately not the default, because of some instabilities in Kryo during earlier versions of Spark and a desire not to break compatibility, but the Kryo serializer should always be used"
[20:42:45] ottomata: ok, will change for next run, easy to do
[20:42:51] ottomata: i see what you mean with url: https://yarn.wikimedia.org/proxy/application_1424966181866_76161/
[20:48:15] ja annoying
[20:48:28] i was telling joal, to fix we should probably just file a JIRA upstream and get it fixed
[20:48:35] nobody cares cept us cause everybody else has VPNs :/
[20:48:48] i mean, fix it ourselves*
[20:53:28] Analytics-Kanban, Analytics-Wikimetrics, Community-Wikimetrics: "Create Report" button does not appear when uploading a new cohort - https://phabricator.wikimedia.org/T95456#1191707 (egalvezwmf) Sometimes the "create cohort" button never appears.... I have a few cohorts I needed to re-upload because o...
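Switching the job to Kryo, as discussed above, is a one-line configuration change. A minimal sketch in Scala for Spark 1.x, where the app name is purely illustrative:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("AppSessionMetrics")
      // Kryo instead of the default Java serializer, per the Cloudera post
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    val sc = new SparkContext(conf)

The same setting can be applied without touching code by passing --conf spark.serializer=org.apache.spark.serializer.KryoSerializer to spark-submit or spark-shell.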
[20:55:25] (CR) Ottomata: [C: 2] Add access_method, client_type and is_zero fields to refined webrequest table. (2 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/202914 (owner: Joal)
[20:55:38] ottomata: if I understand a little of how this works (spark), I *think* changing serialization should not affect the session job much as there is no moving of data back and forth across the network
[20:56:09] ottomata: maybe if datasets are persisted to disk (due to not fitting in RAM) there is some impact
[20:56:27] Analytics-Wikimetrics, Community-Wikimetrics: [BUG] Viewing ukwiki cohorts error - https://phabricator.wikimedia.org/T95320#1191714 (egalvezwmf) This problem is persisting in other projects, not just ukwiki
[20:56:47] hm, nuria, i guess it depends on how many different phases you have. i dunno really either
[20:56:56] nuria: i wouldn't worry about it much for now, if it is difficult
[20:57:04] ottomata: nah, it is trivial
[20:57:14] its worth a try, if you can run the thing for a single day
[20:57:20] do it with and without kryo and compare
[20:57:46] ottomata: ya , that is what we will do next, this is the run w/o kryo: https://yarn.wikimedia.org/proxy/application_1424966181866_76161/executors/
[21:01:31] nuria: am reading this
[21:01:32] http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-1/
[21:01:39] ya me too
[21:01:53] it describes when things are shuffled, which would mean they would be serialized
[21:03:42] right, that is why i was saying in our case i do not see that happening
[21:03:44] right?
[21:04:07] we read data , execute a query, get more data and on every record apply some booleans
[21:04:31] mmm... after there is some combineByKey...fishy
[21:05:12] ottomata: so that combineByKey might need to reshuffle
[21:05:34] ottomata: but that is executed on a very small dataset
[21:07:16] ottomata: also me no compredou why all these are at '0'
[21:07:17] https://yarn.wikimedia.org/proxy/application_1424966181866_76161/executors/
[21:07:27] nuria, in the comments of that article:
[21:07:28] "Hi Nitin, I'm glad you enjoyed it. I tend to think of combineByKey as an internal API that was exposed accidentally. I find aggregateByKey easier to use than combineByKey because it takes a zero value instead of a zero function, the latter of which forces the user to handle cloning on their own."
[21:07:44] nuria, has your job started?
[21:07:54] https://yarn.wikimedia.org/proxy/application_1424966181866_76161/jobs/#active 0
[21:07:54] https://yarn.wikimedia.org/proxy/application_1424966181866_76161/jobs/#completed 0
[21:07:54] https://yarn.wikimedia.org/proxy/application_1424966181866_76161/jobs/#failed 0
[21:08:17] ottomata: but it says here: '33' minutes https://yarn.wikimedia.org/proxy/application_1424966181866_76161/jobs/
[21:08:41] yessss....
[21:08:45] ottomata: ok, since submission must be?
[21:08:50] this is spark shell or spark submit?
[21:08:58] ottomata: spark-submit
[21:09:11] hm
[21:09:29] you sure?
[21:09:30] Spark shell
[21:09:32] https://yarn.wikimedia.org/cluster/scheduler
[21:09:41] nuria
[21:09:41] Spark shell
[21:09:48] yessss
[21:10:23] what is the command you used to launch?
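To make the combineByKey/aggregateByKey distinction quoted above concrete, here is an illustrative Scala fragment (not refinery code) that counts values per key: aggregateByKey takes a plain zero value, where combineByKey would instead require a function that builds the initial accumulator from the first value:

    import org.apache.spark.rdd.RDD

    // count events per key; 0L is the zero *value* aggregateByKey expects
    def countPerKey(pairs: RDD[(String, String)]): RDD[(String, Long)] =
      pairs.aggregateByKey(0L)(
        (acc, _) => acc + 1L, // seqOp: fold values within a partition
        _ + _                 // combOp: merge partial counts across partitions
      )

Both operations shuffle the partial aggregates across the network, which is where the serializer choice above would come into play.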
[21:11:21] spark-shell --jars /home/otto/algebird-core_2.10-0.9.0.jar,/home/nuria/workplace/refinery/source/refinery-job/target/refinery-job-0.0.10-SNAPSHOT.jar,/home/nuria/workplace/refinery/source/refinery-core/target/refinery-core-0.0.10-SNAPSHOT.jar --driver-memory 1G --executor-memory 4g --master yarn --num-executors 4
[21:11:43] nuria that is spark shell
[21:11:48] "spark-shell "
[21:11:48] :p
[21:11:55] ottomata: ay ay wait
[21:12:30] you want spark-submit if you are trying to launch marcel's job
[21:12:45] you are right! ay ay
[21:12:49] but see: spark-submit --class=org.wikimedia.analytics.refinery.job.AppSessionMetrics --master yarn --deploy-mode cluster --num-executors=4 --executor-cores=2 hdfs://analytics-hadoop/tmp/nuria/jars/refinery-job-0.0.10-SNAPSHOT.jar --executor-memory 4g
[21:13:52] ottomata: do we cancel this one on the shell? how?
[21:16:08] ottomata: the spark-submit job fails every time
[21:20:24] ottomata: looks like it is taking longer to fail now: https://yarn.wikimedia.org/cluster/app/application_1424966181866_76199
[21:21:03] nuria: your shell job is gone now, ja?
[21:21:16] your submit job hasn't started yet
[21:21:32] yarn is waiting for resources to free up before scheduling it
[21:21:39] that's why its status is UNASSIGNED
[21:23:32] ottomata: no, look, it failed
[21:23:35] https://www.irccloud.com/pastebin/uOXIXvDA
[21:24:49] nuria, the app job takes input and output arguments, ja?
[21:25:19] ottomata: this vs? no, it does not
[21:25:26] oh?
[21:25:27] https://gerrit.wikimedia.org/r/#/c/199935/5/refinery-job/src/main/scala/org/wikimedia/analytics/refinery/job/AppSessionMetrics.scala
[21:25:30] you using an older one?
[21:25:34] see usage with spark-submit
[21:26:21] ottomata: yes, w/o any input output
[21:26:25] https://www.irccloud.com/pastebin/2ug5uOG3
[21:26:56] ok
[21:27:25] nuria, try
[21:27:29] --master yarn --deploy-mode client
[21:27:34] and see if you get any useful output
[21:27:44] ottomata: k
[21:29:51] ottomata: where does it load classes from in that mode?
[21:30:17] ?
[21:30:21] oh client?
[21:30:31] it just means that it will keep your CLI connected to the driver
[21:30:33] ...i think
[21:30:35] something like that
[21:30:40] otherwise it works the same
[21:30:59] Analytics-Kanban, Analytics-Volunteering, Analytics-Wikimetrics, Community-Wikimetrics, Easy: "Create Report" button does not appear when uploading a new cohort - https://phabricator.wikimedia.org/T95456#1191890 (Nuria)
[21:31:23] ottomata: mmm, there must be something else... will look
[21:33:11] ottomata: as the classloader cannot find the jar at /tmp/nuria/jars/refinery-job-0.0.10-SNAPSHOT.jar in hdfs, will check what is going on
[21:33:31] hm
[21:33:42] the jar maybe needs to be the last argument?
[21:34:36] ottomata: nah, let's see
[21:38:21] ottomata: i bet this jar needs to be on some classpath to run in this mode
[21:39:16] hm.
[21:39:23] no, by giving it to spark-submit, it should upload it
[21:39:29] are you missing dependencies?
[21:44:18] ottomata: no, command works fine in "cluster" mode
[21:44:47] ottomata: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L387
[21:44:49] ottomata: brb
[21:49:34] nuria: I don't see AppSessionsMetrics in your jar
[21:49:42] oh sorry
[21:49:44] bad grep.
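One separate problem with the failing command above: spark-submit treats everything after the application jar as arguments to the application itself, so the trailing --executor-memory 4g never reaches Spark. Options belong before the jar. A corrected sketch, reusing the same class and paths quoted in the chat:

    spark-submit \
      --class org.wikimedia.analytics.refinery.job.AppSessionMetrics \
      --master yarn --deploy-mode cluster \
      --num-executors 4 --executor-cores 2 --executor-memory 4g \
      hdfs://analytics-hadoop/tmp/nuria/jars/refinery-job-0.0.10-SNAPSHOT.jar

As the exchange just below works out, the --class value must also match the class as compiled: if it is not declared inside a package, the package prefix has to be dropped.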
[21:50:01] oh hm
[21:50:02] yeah
[21:50:03] nuria
[21:50:06] i think yours is not in a package
[21:50:24] --class=AppSessionMetrics
[21:50:28] you need that ^
[21:52:00] nuria, mforns_brb, question on wikimetrics javascript when you have a minute
[21:53:53] I'm trying to fetch the list of wikiusers in cohortList.js
[21:54:29] there's a method, loadWikiusers, that appears to never be used in the codebase but that looks like it should do what I want
[21:55:28] but I'm not sure how to call it on the particular cohort that is displayed on the page so I can populate the observableArray correctly.
[21:56:26] any suggestions?
[22:04:25] am out laters all!
[22:05:23] Analytics, Ops-Access-Requests, operations: Grant Sati access to geowiki - https://phabricator.wikimedia.org/T95494#1192093 (kevinator) NEW a:Ottomata
[22:09:24] Analytics, Ops-Access-Requests, operations: Grant Sati access to geowiki - https://phabricator.wikimedia.org/T95494#1192127 (kevinator) @ottomata can you give us an ETA if this isn't something you can complete quickly?
[22:25:28] fhocutt: take a look at knockout, it's our binding library; you might want to get familiar with it before looking at the js code (if you haven't)
[22:26:53] I have looked at it, and I've gone through a couple of the tutorials
[22:27:28] I've also looked at cohortMembership.js, which does successfully populate wikiusers
[22:28:29] it gets the cohort id from the window pathname, but I don't see how to do that here
[22:32:37] fhocutt: are we talking about this screen?
[22:32:38] https://metrics.wmflabs.org/cohorts/#2686
[22:33:05] nuria, yes
[22:33:14] it already has a button for "view members"
[22:33:41] are you changing that UI? do you have a mock of what you are trying to do?
[22:34:00] nuria, https://phabricator.wikimedia.org/T76914
[22:34:20] specifically: https://phabricator.wikimedia.org/T76914#961133
[22:34:36] to do that, I need to be able to get the number of unique projects for that cohort
[22:34:53] and to do that, I need to be able to access a list of wikiusers
[22:36:03] fhocutt: not really, right?
[22:36:22] how's that?
[22:36:30] fhocutt: you need a function at the controller level that accesses the cohort service and tells you how many users are valid
[22:36:35] and how many are not
[22:36:42] and that's fine
[22:36:55] it's the "in X projects" bit
[22:37:02] fhocutt: and that if validation is in progress returns a %
[22:37:03] unless I should be adding that in at the controller level as well
[22:37:53] but that wasn't the approach that I saw in cohortMembership.js
[22:39:06] fhocutt: let me look at that screen, i think you probably need to build the code that returns you the stats and later call it from js
[22:39:59] the function there is _getSummaryCounters, L132
[22:41:42] fhocutt: but that loops through all users, right? which you do not need on your UI
[22:41:53] you just need the numbers
[22:42:06] right, yes
[22:42:35] so if I'm building something to return that number, I can take that out of the cohortMembership UI too
[22:44:25] fhocutt: at a second step yes sure, i will first build what you need for the feature on the phab ticket you sent me
[22:45:30] thanks, nuria
[22:46:02] fhocutt: yw
[23:49:22] Analytics-Cluster, Analytics-Kanban: generate monthly Pageview cubes - https://phabricator.wikimedia.org/T95505#1192602 (kevinator) NEW
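For the knockout question above, the usual pattern is to read the cohort id out of the URL (the cohorts page encodes it in the fragment, e.g. #2686), fetch the members, and hand the result to the observableArray. A hypothetical sketch; the endpoint and property names below are assumptions for illustration, not the actual wikimetrics API:

    // hypothetical endpoint and property names, for illustration only
    var cohortId = window.location.hash.substring(1); // "#2686" -> "2686"
    $.get('/cohorts/detail/' + cohortId, function (data) {
        // calling a ko.observableArray with a new array replaces its
        // contents, so any bound views update automatically
        viewModel.wikiusers(data.wikiusers);
    });

The per-project summary counts discussed in T76914 would then be computed server-side (as nuria suggests), with the JS only consuming the numbers rather than looping over every user.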