[00:07:28] ah, dcljr's article on article counts is out https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2015-05-20/In_focus
[00:54:42] Analytics, WMF-Product-Strategy: Provide metrics for WMF quarterly report on January-March 2015 - https://phabricator.wikimedia.org/T97344#1300630 (Tbayer)
[09:19:49] Analytics-Tech-community-metrics, ECT-May-2015: Ensure that most basic Community Metrics are in place and how they are presented - https://phabricator.wikimedia.org/T94578#1300931 (Qgil) I was looking at the good summary of metrics at https://commons.wikimedia.org/w/index.php?title=File:Wikimedia_Foundati...
[11:55:17] Analytics-Tech-community-metrics: https://www.openhub.net/p/mediawiki stats out of date - https://phabricator.wikimedia.org/T96819#1301173 (Kghbln) stalled>Resolved Seems they have got it under control again. Projects are now updating again.
[12:48:46] (PS1) Joal: Add pageview aggregation and parquet merger. [analytics/refinery/source] - https://gerrit.wikimedia.org/r/212541
[13:18:48] (PS2) Joal: Add pageview aggregation and parquet merger. [analytics/refinery/source] - https://gerrit.wikimedia.org/r/212541
[13:19:20] MOoroninnng
[13:20:38] HeyyaaaaaaAAAAAAaa :0
[13:27:35] (PS1) Joal: Add pageview aggregation oozie job and hive table. [analytics/refinery] - https://gerrit.wikimedia.org/r/212542
[13:27:39] so joal, worker nodes are all rebooted with hyperthreading and have the new memory settings applied
[13:27:50] Coooooool ottomata !
[13:28:22] Any difference noticeable yet ?
[13:29:17] haven't checked yet
[13:29:35] 1036 networking is not working, and 1028 is still down, so we are down 2 nodes atm
[13:29:45] And by the way, I receive refinement emails :)
[13:29:49] Thaks for that !
[13:30:05] k
[13:30:13] cool!
[13:30:14] I have seen jgage email about 1036
[13:30:18] That really weird
[13:30:38] Seems that we have a failure in bits refinement for yesterday
[13:31:22] I am going to relaunch it
[13:32:29] you sure it is failure?
[13:32:41] oh i see it
[13:32:43] well, coordinators says so
[13:32:47] ok ya
[13:32:54] Let me try that :)
[13:32:56] k
[13:33:01] -rerun :)
[13:33:03] so nice.
[13:34:21] woaw, hue give a 500 error when trying to access the failed run action
[13:35:21] And by the way, I submitted two code reviews for spark aggregation and oozification :)
[13:35:33] Would love to get some comments :)
[13:35:34] k cool
[13:35:37] will get to them shortly :)
[13:35:50] joal, is it the 500 page from wikimedia server?
[13:35:53] or from hue itself?
[13:36:14] https://hue.wikimedia.org/oozie/list_oozie_workflow/0023511-150504211649924-oozie-oozi-W/
[13:39:38] blah
[13:39:42] "killed by ApplicationMaster"
[13:39:46] ok hue, weird
[13:40:18] Wow, now, wikimedia foundation error page
[13:41:14] BTW ottomata, have you seen bwest answer ?
[13:47:36] no
[13:48:06] phew, joal, no idea what was up with hue there
[13:48:08] it is working now
[13:48:12] it was being weird for me too
[13:48:15] weirdo heh ?
[13:48:33] between the 1036 and hue, we are at a bizzzzzzzzare time :)
[13:49:55] what was bwest response?
[13:50:03] Add leila
[13:50:04] i don't think ihave it
[13:50:08] ?
[13:50:09] analytics-internal
[13:50:35] maybe he didnt't reply-all?
[13:50:57] Correct !
[13:51:07] gmail wrong mapping :)
[13:51:31] basically, I think he wnats us to deal with Leila
[13:51:36] I am gonna do that :)
[13:53:30] ok
[13:57:11] ottomata: Sorry I messed up with the oozie relaunch ...
[13:57:16] Re-relaunching
[14:02:01] joal: qq, did you sanitize these test data files?
[14:02:07] ip addresses, uas, etc.?
[14:02:21] ottomata: nope, not at all !
[14:02:39] Completely forgot about that !!!
[14:02:41] :)
[14:02:44] How bad :(
[14:02:59] Only 1000 lines though, but still, you're right !
[14:03:10] yeah, not a huge deal, but we shouldn't merge it that way
[14:03:12] but still, hm.
[14:03:29] Aggreed !
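(Editor's note: the exchange above is about test fixture files that were committed with real IP addresses and user agents; the plan stated later in the log is to "keep the same schema as the original one, and replace sensitive values per dummy ones". A minimal sketch of that idea, in Python for illustration only; the column positions and dummy values are assumptions, not the refinery webrequest schema.)

```python
# Hypothetical column layout for a tab-separated test fixture;
# the real webrequest schema has many more fields.
IP_COL, UA_COL = 0, 1

def sanitize_line(line, sep="\t"):
    """Replace sensitive fields with fixed dummy values, keeping the schema intact."""
    fields = line.rstrip("\n").split(sep)
    fields[IP_COL] = "127.0.0.1"             # dummy client IP
    fields[UA_COL] = "test-user-agent/1.0"   # dummy user agent
    return sep.join(fields)

def sanitize_file(lines):
    """Sanitize every line of a fixture file."""
    return [sanitize_line(line) for line in lines]
```

The point is that tests keep exercising the same schema and row count while the committed data carries no real user information.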
[14:03:40] i'm not so sure about committing parquet files for the request data....unless there is something documenting exactly what is in them
[14:03:43] makes tests really hard to read it hink
[14:03:44] no?
[14:03:48] I'll keep the same schema as the original one, and replace sensitive values per dummy ones
[14:03:52] k
[14:04:41] Here the parquet data is really to prove that, for a given schema, the aggregation part of it work
[14:04:51] hm, aye
[14:04:54] It's more integration testing than unittest
[14:04:56] the aggregation? or the merger?
[14:05:05] aggregation
[14:05:07] hm
[14:05:23] The merger makes use of hdfs fs, so it's barely testable as is
[14:05:58] The merger could probably go into the tools module, thinking of that
[14:06:13] oh cool! udf.register! :)
[14:06:33] yeah except, not sure, since it launches a spark job. thought of that too but was goign to finish reading commit first :)
[14:06:34] :)
[14:06:47] k
[14:08:09] bits refined hour corrected :)
[14:08:57] danke
[14:14:03] joal:
[14:14:08] why not use HiveContext to get the data
[14:14:14] rather than going through hdfs paths?
[14:14:36] Tried, but got out of memory errors every time :(
[14:14:41] oh, really?
[14:14:43] huh.
[14:14:47] I'll try again :)
[15:15:01] well, i mean,i haven't tried that much, but it seems like if it worked it would be much cleaner
[14:15:12] then you wouldn't have to do deal with turning the paths into dataframes
[14:15:12] Yeah, I know the feeling
[14:15:16] you could just select from the table
[14:23:58] joal: not sure I understand the numPartitions argument
[14:24:04] i get spark partitions
[14:24:31] but, why is it an argument, and why do you repartition the pastData?
[14:51:06] (CR) Ottomata: "This is in some inline comments, but I'll mention it here too:" (5 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/212542 (owner: Joal)
[14:51:22] (CR) Ottomata: Add pageview aggregation and parquet merger. (9 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/212541 (owner: Joal)
[14:54:58] joal: I tried running the my code using spark-submit yesterday and it kept coming up with this error :( full stacktrace is here:
[14:55:01] https://www.irccloud.com/pastebin/WWBDrtVE
[14:55:38] joal: I see that SparkSubmit uses this class in places - but I have no idea why it's not found
[15:00:39] Hi madhuvishy
[15:00:50] hi joal
[15:01:00] Have you updated the spark version ?
[15:01:41] At the end of the main pom file: 1.3.0-cdh5.4.0
[15:01:42] joal: no..
[15:01:55] I think it might be the thing :)
[15:02:25] joal: gaah. should i change it to the latest?
[15:03:42] I think if you get latest master version and rebase on master, it should be set for you
[15:03:52] Does it makes sense ?
[15:07:17] ottomata: Thanks a lot for the comments :)
[15:10:41] joal: I already have the latest master. my version is 1.3.0
[15:11:20] madhuvishy: ok super :)
[15:11:39] joal: it still fails though
[15:11:58] hm --> even if you have last master, have you rebased ?
[15:12:15] Because the version number of the generated jar tells me you probably haven't )
[15:12:33] madhuvishy: --^
[15:15:31] joal: hmmm, okay doing that.
[15:16:34] joal: what should the version be
[15:16:39] madhuvishy: Good luck, rebasing over relatively newer versions can be tough ...
[15:16:46] let me check
[15:17:34] 1.3.0-cdh5.4.0
[15:18:28] joal: hmmm, that's what i had.
[15:18:50] in master, yes, but in the branch you are working on as well ?
[15:21:05] joal: ja can discuss realtime today if you like
[15:21:19] cool, let's do that
[15:28:03] joal: yeah. But something changed with the rebase. I get a different error when I build. Will look at it after standup
[15:28:31] madhuvishy: as I said, rebase after some changes sometimes needs merging --> tough time
[15:28:38] I'll help if needed :)
[15:28:42] madhuvishy: --^
[15:29:06] joal: I'll definitely need that :)
[15:29:32] kevinator: I'm walking to office. Should be a couple minutes late to standup.
[15:47:44] Analytics-Kanban, Analytics-Visualization: Build Multi-tennant Dashiki (host different layouts) - https://phabricator.wikimedia.org/T88372#1301671 (Milimetric)
[15:47:45] Analytics-Kanban, Analytics-Visualization, Patch-For-Review: Create modular build system for Dashiki [21 pts] - https://phabricator.wikimedia.org/T96337#1301670 (Milimetric) Open>Resolved
[15:59:47] ottomata: got an idea about the hive context issue in sparksql: https://spark.apache.org/docs/1.3.0/sql-programming-guide.html#compatibility-with-apache-hive
[16:03:00] kevinator: Can't join the call on pageview ...
[16:03:09] Anything wrong on your side ?
[16:03:47] me too
[16:04:00] Analytics-Cluster, Analytics-Kanban: UDFs to parse the url to get article and dialect [13 pts] {wren} - https://phabricator.wikimedia.org/T99918#1301685 (JAllemandou) NEW a:JAllemandou
[16:05:33] milimetric: while I am at it, I also create the task about oozification :)
[16:05:39] cool, thx
[16:06:00] joal: what's the size of the raw pageview table, and what do you anticipate the size of the intermediate aggregation will be?
[16:06:03] 50 TB and 10TB?
[16:06:44] I expect the intermediate agg to be about 500M / hour
[16:07:28] joal: I asked ggellerman___ to send out a new meeting invite with a new hangout link. it's not working for several people
[16:07:31] therefore 4T per year
[16:07:36] roughly
[16:07:41] joal: woah! That's pretty small!
[16:07:44] k, thx kevinator
[16:08:00] Well, needs confirmation, but that's what I got from testing
[16:08:02] if we aggregate to the day level from that and reduce some dimensions, I think we can serve that data raw from pretty much anywhere?
[16:08:07] yes, of course
[16:08:20] and the raw pageview table is more like 50TB, right?
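(Editor's note: the "500M / hour, therefore 4T per year" estimate above is just arithmetic; a quick sketch to make the units explicit. Decimal units are assumed here, since the chat uses them loosely.)

```python
MB = 10**6  # decimal megabyte, as the chat's "500M" is read here

hourly_bytes = 500 * MB            # joal's estimate: ~500 MB per hour of aggregate
yearly_bytes = hourly_bytes * 24 * 365

yearly_tb = yearly_bytes / 10**12  # ~4.38 TB, matching "roughly 4T per year"
```

Note the later caveat in the log: the 50 TB raw table figure is for compressed data, so the compression ratio of the aggregate would need to be confirmed before trusting this projection.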
[16:08:24] hmmm -> That data is compressed
[16:08:36] I see, ok
[16:08:42] so not so fast cowboy dan
[16:08:56] We net to double check ;)
[16:09:06] But don't worry, you'll know it :)
[16:10:18] ottomata: Have you seen that link ?
[16:10:49] kevinator: still not working for me :(
[16:11:07] joal: me neither. Jon did get into the new meeting tho
[16:11:41] joal, I keep retrying every minute but no luck
[16:11:50] link?
[16:11:56] upppppp- ^
[16:12:00] joal: when I do "hdfs dfs -du /some/path", how do I interpret those numbers, I get something like this:
[16:12:00] 951947711652 2855843134956 /wmf/data/wmf/mediacounts
[16:12:00] 1271488021416 3814464064248 /wmf/data/wmf/pagecounts-all-sites
[16:12:00] 86582703759357 259748111278071 /wmf/data/wmf/webrequest
[16:12:21] kevinator: finally managed to get in there !
[16:13:08] milimetric: -h option ;)
[16:13:26] oh no, that's not what I mean, there are two numbers and I was wondering what that meant
[16:13:33] one is compressed one is not?
[16:13:49] or one is replicated one is not?
[16:13:53] one is usefull data, one is real (taking replication into account)
[16:14:07] gotcha, thx
[16:14:14] np
[16:14:17] in meeting now :)
[16:14:22] thx
[16:18:08] joal: you think its because we are on a different hive version?
[16:27:50] Don't you think ottomata ?
[16:39:36] could be
[16:39:40] you got OOMs?
[16:39:45] Cause I have retested, fail :(
[16:39:58] in meeting, will show you just after
[16:40:12] k
[16:53:37] joal: running to post office, back shortly...
[17:02:41] * milimetric lunching
[17:10:55] (PS14) Madhuvishy: Add Apps session metrics job [analytics/refinery/source] - https://gerrit.wikimedia.org/r/199935 (https://phabricator.wikimedia.org/T86535) (owner: Mforns)
[17:10:57] (PS1) Madhuvishy: [WIP] Productionize app session metrics - Parse args using scopt [analytics/refinery/source] - https://gerrit.wikimedia.org/r/212573 (https://phabricator.wikimedia.org/T97876)
[17:12:12] joal: around?
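(Editor's note: as joal explains above, the two numbers in `hdfs dfs -du` output are the logical size of the data and the size actually consumed on disk once HDFS replication is counted. A small sketch of reading that output; with the default replication factor of 3 the second column is three times the first, as the figures in the log show.)

```python
def parse_du(output):
    """Parse `hdfs dfs -du` lines: <logical-size> <size-with-replication> <path>."""
    rows = []
    for line in output.strip().splitlines():
        logical, replicated, path = line.split()
        rows.append((int(logical), int(replicated), path))
    return rows

def replication_factor(logical, replicated):
    """Effective replication factor: raw disk usage divided by logical size."""
    return replicated / logical
```

Running this against the mediacounts line from the log gives a factor of exactly 3, the HDFS default.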
[17:12:17] in meeting
[17:12:35] joal: okay let me know if you're free sometime
[17:12:38] k
[17:33:20] madhuvishy: here :)
[17:39:11] hey joal - so I added the scopt part here - https://gerrit.wikimedia.org/r/212573
[17:39:38] and after i rebased I get this error
[17:40:40] https://www.irccloud.com/pastebin/nWhiOw8e
[17:41:09] Yes
[17:41:22] I seeeee
[17:41:40] The version bump involves quite a few changes in sparksql
[17:41:46] hmmm
[17:41:54] joal: I dont really understand the error. yeah i guessed something like that
[17:42:27] So, basically, in the version originally written, the filter function was the one of a RDD
[17:43:12] --> expects a function Taking the value of the RDD as input, and output a boolean
[17:43:30] Now with sparkSQL, you are not loading RDDs anymore, but DataFrame
[17:43:37] joal: okay..
[17:44:00] and dataframe has a filter function that is specific (SQL - like syntax)
[17:45:01] Also for you to know, I have not managed to run hivecontext queries yet :(
[17:45:09] Always an error of memory
[17:45:27] joal: aah, that is when you try running it via oozie?
[17:45:43] no, no, via spark-shell
[17:46:11] basically, I can load parquetFiles, but can't read them through hivecontext.sql
[17:46:17] :(
[17:46:45] I assume it's a versioning issue (spark 1.3.0 is supposed to wotk with hive 0.13.1, and we have hive 1.1)
[17:47:18] joal: got a paste of error?
[17:47:25] joal: ohh. i thought the existing code was working - it was but now it doesn't?
[17:47:45] So, the exisitng code wa working before upgrade :(
[17:47:56] joal: aah
[17:48:00] Now that we have upgraded spark version, it doesn't
[17:48:14] joal: and this sql stuff is broken too
[17:48:16] hmm
[17:48:25] Or at least, I strongly support the idea that it doesn't (I haven't tested)
[17:48:42] yup, you have it
[17:49:44] Analytics-Cluster, Analytics-Kanban: Ooziefy and parquetize pageview intermediate aggregation using refined table fields [13 pts] {wren} - https://phabricator.wikimedia.org/T99931#1302083 (JAllemandou) NEW
[17:50:16] ottomata: you around ?
[17:50:28] yes
[17:50:39] any news on the hivecontext stuff ?
[17:52:32] joal: okay i think i can temporarily fix it by calling the rdd function on the dataframe. and in theory it should work. can then work on refactoring it to use a dataframe instead of rdd ( after i understand what those do a bit more)
[17:52:35] ?
[17:52:40] no do you have a paste of the error you are seeing?
[17:52:46] i haven't been trying, but, now I shall!
[17:52:57] ottomata: :)
[17:53:07] madhuvishy: I think it the right approach :)
[17:53:17] Will allow you to test some
[17:53:25] But I think it will fail because of hive context
[17:53:28] we'll see :)
[17:53:30] Analytics-Cluster, Analytics-Kanban: Assess how to extract Mobile App info in webrequest - https://phabricator.wikimedia.org/T99932#1302098 (kevinator) NEW a:JAllemandou
[17:54:26] ottomata: https://gist.github.com/jobar/6f3b003974dec4ed543f
[17:54:41] joal: yeah will try now. i should still be able to test scopt.
[17:55:19] Analytics-Cluster, Analytics-Kanban: Assess how to extract Mobile App info in webrequest - https://phabricator.wikimedia.org/T99932#1302110 (kevinator)
[17:56:24] joal is that on spark shell, or did you have to look at running job log?
[17:58:56] (CR) Ottomata: Add Apps session metrics job (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/199935 (https://phabricator.wikimedia.org/T86535) (owner: Mforns)
[18:00:30] (CR) Ottomata: "Will ParquetMerger be run as an automated (oozie?) job? If not, and it is just for operational usage, then I agree we should put it into " [analytics/refinery/source] - https://gerrit.wikimedia.org/r/212541 (owner: Joal)
[18:00:45] joal: I'm trying to run a HiveContext query
[18:00:50] it seems to hang on
[18:00:50] 15/05/21 17:59:51 INFO ParseDriver: Parse Completed
[18:00:52] is that right?
[18:01:33] oh, then it gives me OOM
[18:01:33] hm
[18:05:25] DarTar, is Ellery available today?
[18:06:04] he’s in Germany and probably offline by now
[18:06:46] gotcha
[18:07:29] I'm confused as to why readership has nobody available who can run a trivial hive query
[18:07:52] ottomata: same behavior as for me :(
[18:12:56] ottomata: how can I get this jar into our repo? http://mvnrepository.com/artifact/com.github.scopt/scopt_2.11/3.3.0
[18:15:35] madhuvishy: it is easiest if we do new deps all at once
[18:15:49] i temp enable a proxy connector to maven central
[18:16:02] then, if your maven wants deps from our archiva
[18:16:03] ottomata: cool, no hurry - whenever you are doing that - let me know
[18:16:06] archiva will get theem automatically
[18:16:10] ottomata: aah
[18:16:20] alright.
[18:16:21] so, if you know deps that you need, go ahead and add them all to your poms
[18:16:23] and i'll do that
[18:16:33] then you can build, and we archive will get them
[18:16:49] ottomata: okay, i'll poke you about this at the end of the task then
[18:17:00] we just don't leave the connector on by default, because we don't want archiva just grabbing any jars from the internet whenever anybody who uses it tries to compile something
[18:17:02] sho thang
[18:21:19] joal, i increased --driver-memory
[18:21:21] now get a different error
[18:21:55] https://gist.github.com/gists
[18:26:59] job hasn't died though...
[18:27:32] both warnings i guess
[18:30:18] joal: ottomata So good news - scopt option parsing works. but job failed with the Out of memory error
[18:31:08] ja bad news indeed :(
[18:33:37] hmm
[18:37:30] still no OOM but the proess is definitely stuck waiting for something that isn't happening
[18:38:41] you didn't past the gist though :)
[18:39:15] joal: ?
[18:39:15] https://gist.github.com/ottomata/65037ce8b080f4f9da08
[18:39:21] oh haha
[18:39:23] whoops
[18:39:24] yeah that ^
[18:39:25] :)
[18:39:28] Thx
[18:39:32] i think that might be irrelevant
[18:39:53] those are weird warnings, maybe the datanode timed out because of whtever is hanging on the spark side
[18:39:53] dunno
[18:40:07] Might
[18:40:29] trying to get some info
[18:40:58] Maybe could come from a datanode removed
[18:41:03] 2 out now
[18:45:01] yeah but 1015 is fine
[18:45:07] k
[18:45:10] weird
[18:45:28] Can't find anything about cdh 5.4, spark and hive :(
[18:49:20] HIveContext is fine on small text data
[18:49:29] testing on mobile_apps_uniques_monthly
[18:52:10] fine with small parquet data... hmmm
[18:52:25] weirdo that :(
[18:57:35] Analytics-Wikimetrics, MediaWiki-Vagrant: Vagrant Setup alembic config errors - https://phabricator.wikimedia.org/T99631#1302285 (Memeht) So, I tried reinstalling twice last night, and both times I get an error message on running the wikimetrics vagrant setup. From looking at the stdout error code, a bun...
[18:58:06] ottomata: https://spark.apache.org/docs/1.3.0/sql-programming-guide.html#unsupported-hive-functionality
[18:58:12] Bucketting ?
[18:58:38] could be, would have thought that wouldnt' matter unless you were trying to use tablesample or something
[18:58:44] sine bucketing just makes separate files
[18:58:47] will test
[18:58:53] yeah, would have though so as well
[19:01:55] joal: small bucketd parqeut table works :/
[19:02:55] ok ... Then it's really a size issue
[19:03:04] hmmm
[19:07:06] hm, maybe not
[19:07:14] ?
[19:07:15] i just copied an hour of misc data, the same hour i was OOMing on
[19:07:21] into a external table in my db
[19:07:23] with the same bucketing
[19:07:30] no partitioning htough...
[19:07:32] but ja that worked fine.
[19:07:32] hm
[19:07:35] will try with parittionign...
[19:12:42] nope, works totally fine joal
[19:12:44] check it if you want
[19:12:49] :(
[19:12:55] otto.webrequest_spark_parquet_test_with_data
[19:13:05] I mean, I trust you
[19:13:09] same partition in wmf.webrequest fails
[19:13:22] How is that even possible ?
[19:13:37] (PS2) Madhuvishy: [WIP] Productionize app session metrics - Parse args using scopt [analytics/refinery/source] - https://gerrit.wikimedia.org/r/212573 (https://phabricator.wikimedia.org/T97876)
[19:14:39] Let's try with even more memory for the driver ?
[19:16:11] i am running in local mode atm.
[19:16:31] And BTW ottomata, since you changed the HA for RM, can't access job info anymore :(
[19:16:44] oooooh ottomata, that might be why :)
[19:16:50] ?
[19:16:58] local mode
[19:17:03] nono
[19:17:05] Remember the parquet issue we got ?
[19:17:09] ?
[19:17:16] k lemme try in yarn
[19:17:33] but i can't get the wmf.webrequest data in local mode either
[19:17:35] for that partition
[19:17:40] just hangs
[19:17:46] hm
[19:17:47] k
[19:17:57] Then it's a metastore issue, no ?
[19:19:29] joal, what local mode parquet issue?
[19:19:55] opening parquet file in local / yarn mode, before ensuring version homogeneity
[19:21:39] ja joal my test table works in yarn mode
[19:21:45] k
[19:21:55] Then it's a metastore issue :(
[19:22:04] like, what, too many partitions?
[19:22:08] dunno
[19:22:15] Will test something
[19:22:19] lemme creata buncha partitions w no data on this table
[19:22:19] :)
[19:23:03] I am doing it the other way around :)
[19:30:55] ottomata, what java thing are you running that's using 8 cores?
[19:30:59] (not a complaint, just curious)
[19:31:02] (stat1002)
[19:31:27] must be spark local that is just haning
[19:31:36] will kill it, it is busted :)
[19:32:18] did that get rid of it Ironholds?
[19:32:43] ottomata, still claims to be going
[19:32:48] PID 4324
[19:32:51] check top!
[19:33:09] it's kinda incredible :D
[19:34:10] huhm
[19:34:24] i think that was the java process, i killed the bash launcher
[19:34:29] i guess it was left hanging, hm.
[19:35:48] ottomata, gone now!
[19:41:12] joal: even with > 7000 partitions
[19:41:15] my test table is fine
[19:41:22] k
[19:42:26] still trying to get resources ..
[19:42:39] will kill mine
[19:58:01] ottomata: tried to create a new table based on same data
[19:58:07] --> works !
[19:58:32] seems it's a metastore issue
[19:59:46] haha that's what i'm doing too
[19:59:53] works
[19:59:56] new table with all the same partitions and locations as wmf.webrequest
[19:59:57] you just did that?
[20:00:14] did that but just created a single partition
[20:01:26] aye i'm creating a bunch of partitoins on my new table now, gonna see what happens
[20:01:31] if this works, then yeah, who knows what is going on
[20:01:35] k
[20:01:49] possibly the manual adaptation we did a while ago ?
[20:01:50] we could try to recreate refined table.
[20:01:55] with the sql queries?
[20:01:56] maybe?
[20:01:58] mysql*
[20:02:05] can't think of anything else
[20:02:10] yeah
[20:21:00] ok running query!
[20:21:04] table otto.webrequest_20150521
[20:21:10] ta dam .... suspens
[20:21:13] looking like it is going to hang...
[20:21:16] 15/05/21 20:20:45 INFO ParseDriver: Parse Completed
[20:21:16] ...
[20:21:36] hmm
[20:21:41] too many partitions ?
[20:22:06] yup OOM
[20:22:10] guess so
[20:22:18] rahhh
[20:22:21] so bad !
[20:22:38] Maybe we could post to mailing list ?
[20:22:44] but, it worked with that number of partitions with NO data
[20:22:59] i created the partitions but pointed them all at the same empty directory
[20:23:13] weird
[20:23:23] so, some combo of partitions and real data?
[20:23:24] hmmm
[20:25:30] trying something else
[20:25:34] k
[20:25:44] read data from wmf_raw.webrequest
[20:25:52] --> same number of partitions
[20:25:59] but not parquet
[20:26:43] that works fine?
[20:26:52] well, half as many partitoins
[20:26:57] only 1 month of data
[20:26:59] Don't know yet, need to restart shewll to get jar
[20:27:04] aye k
[20:27:09] but ja still a lot
[20:27:16] hmm, we delete partitions when removing data ?
[20:27:19] yes
[20:27:24] clean ;)
[20:27:31] the only thing that doesn't get deleted is the directory hierarchy
[20:27:35] the files are deleted
[20:27:38] but not hte dirs :/
[20:27:49] how come ?
[20:27:55] was harder to script
[20:28:08] k
[20:28:09] would have to know when a day was empty, month was empty, ec.
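(Editor's note: the exchange above mentions that when old data is dropped, the files are deleted but the empty year/month/day directory hierarchy is left behind, because knowing "when a day was empty, month was empty" was harder to script. A bottom-up walk makes that straightforward; this is a local-filesystem sketch only, since the real cleanup would go through HDFS commands rather than `os` calls.)

```python
import os

def prune_empty_dirs(root):
    """Remove empty directories under root, bottom-up, keeping root itself.

    Visiting children before parents means a month directory whose last
    day was just removed is itself seen as empty and removed in the same pass.
    """
    removed = []
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        if dirpath != root and not os.listdir(dirpath):
            os.rmdir(dirpath)
            removed.append(dirpath)
    return removed
```

A directory that still holds any file (or a non-empty subdirectory) is left alone, so only the fully drained branches of the date hierarchy disappear.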
[20:28:16] makes sense
[20:32:52] still in pain to get
[20:32:58] resources
[20:32:59] Ahh
[20:33:09] queying
[20:33:24] seems to un ok
[20:33:42] so, I guess it comes from parquet metadata ?
[20:34:21] hm.
[20:34:41] weirdo, hu ?
[20:34:43] either that or some threshold of partitions, since raw data has fewer
[20:34:52] maybe, yeah
[20:35:16] mwarf....
[20:35:22] Anyway that's not cool
[20:35:27] Going to sleep :(
[20:35:28] hm, gotta narrow this down
[20:35:29] yeah
[20:35:32] i'm qutting soon too
[20:35:37] i will try one more thing:
[20:35:45] parquet table with 1 monht of partitions
[20:35:54] ok
[20:36:00] I'll wait
[20:36:12] Tomorrow, working french hours ;)
[20:36:32] So we won't have a lot of overlap
[20:43:42] haha
[20:43:43] ok
[20:43:50] i can email you don't wory
[20:43:56] :)
[20:43:56] it takes a while to add all these parititons
[20:44:03] Yeah, can imagine
[20:44:07] milimetric, yt?
[20:44:12] yes
[20:44:17] hey!
[20:44:19] struggling :)
[20:44:29] what's up
[20:45:10] milimetric, I have a problem with some paths in requiring the js for a visualizer
[20:45:45] ok, let's look together
[20:45:52] going bat
[20:45:54] milimetric, dashiki requests /dist/compare-VisualEditorAndWikitext/components/visualizers/stacked-bars.js
[20:46:06] milimetric, instead of /dist/compare-VisualEditorAndWikitext/stacked-bars.js
[20:46:12] milimetric, oh! ok
[20:46:13] :]
[20:47:59] quick question from Jon: is there a way to get a count of users who are logged-in/out using webrequest logs?
[20:48:21] milimetric ^^
[20:48:28] kevinator: at what time?
[20:48:40] we would probalby have records of logins and logouts
[20:48:45] as events
[20:49:02] yeah, I was just thinking eventlogging may have that
[20:49:13] kevinator: mediawiki has that information
[20:49:18] Jon doesn't seem to be online so I will speak for him:
[20:49:19] I think it's something like mw.user.isAnon
[20:49:41] so it would be easy to add to an eventlogging schema
[20:49:43] I think he wants to know how many pageviews are from logged in users
[20:49:46] and the Edit schema, for example, has that
[20:50:07] we don't have that in the pageview logs as far as I know, as we don't track any user information there
[20:50:22] milimetric: that's what I thought
[20:52:53] I just gave Jon a quick answer and invited him to ask more questions on IRC
[20:54:23] Gone I am :)
[20:54:29] See you lads !
[20:56:47] laters joal
[20:59:05] ciao
[21:10:49] Analytics-Cluster, Analytics-Kanban: Analytics Team has an lager cluster - https://phabricator.wikimedia.org/T99952#1302561 (kevinator) NEW
[21:18:51] i really do not remmeber if i registered for wikimania
[21:19:52] ottomata, did you get a confirmation email from eventbrite?
[21:20:07] no! then I did not, ok good.
[21:20:11] hehe
[21:22:21] Analytics-Cluster, Analytics-Kanban: Analytics Team has a lager cluster {mule} - https://phabricator.wikimedia.org/T99952#1302589 (kevinator)
[22:03:40] nice end of day everyone! see you
[22:23:14] Analytics-Wikimetrics, Community-Wikimetrics: Ability to change bytes to characters - https://phabricator.wikimedia.org/T99962#1302780 (egalvezwmf) NEW
[23:21:10] Analytics-Cluster, Analytics-Kanban: Create new normalized uri_host field in refined webrequest table. - https://phabricator.wikimedia.org/T96044#1302854 (kevinator)
[23:28:42] Analytics-Cluster, Analytics-Kanban: Create new normalized uri_host field in refined webrequest table. - https://phabricator.wikimedia.org/T96044#1302879 (madhuvishy) I filed this for a similar usecase - https://phabricator.wikimedia.org/T98257. That would be to convert the uri_hosts to one of these - htt...
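(Editor's note: the log ends with T96044, a task about adding a normalized uri_host field to the refined webrequest table. A hypothetical sketch of what such a normalization could look like; the actual rules live in the task, and the lowercasing, port stripping, and the `m`/`zero`/`wap` subdomain collapsing shown here are illustrative assumptions only.)

```python
def normalize_uri_host(uri_host):
    """Hypothetical uri_host normalization: lowercase, drop any port and
    trailing dot, and collapse mobile/zero subdomain variants, e.g.
    en.m.wikipedia.org -> en.wikipedia.org. Not the canonical T96044 rules."""
    host = uri_host.strip().lower().rstrip(".")
    host = host.split(":")[0]  # drop any port suffix
    parts = [p for p in host.split(".") if p not in ("m", "zero", "wap")]
    return ".".join(parts)
```

With a rule like this, pageview counts for a project can be grouped on one canonical host instead of being split across its desktop, mobile, and zero variants.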