[03:25:16] (CR) Milimetric: "Some javascript gotchas pointed out in the bindings. Setting the id attribute is the only real problem, the rest could wait. The perform" (23 comments) [analytics/dashiki] - https://gerrit.wikimedia.org/r/214036 (https://phabricator.wikimedia.org/T91123) (owner: Mforns) [03:29:35] Analytics-Cluster, Fundraising Sprint Kraftwerk, Fundraising Sprint Lou Reed, Fundraising Tech Backlog, operations: Verify kafkatee use for fundraising logs on erbium - https://phabricator.wikimedia.org/T97676#1328015 (Ottomata) In those cases, there are more requests in kafkatee than in udp2log... [05:24:37] Analytics, Engineering-Community, ECT-June-2015, ECT-May-2015: Analytics Team Offsite - Before Wikimania - https://phabricator.wikimedia.org/T90602#1328076 (Rfarrand) [05:49:37] Quarry: Add list of query executions to the query page side-bar - https://phabricator.wikimedia.org/T100982#1328147 (Abarcomb) This request arose out of a discussion at a workshop. In my drawing, I saw modifications of the same query as branches, while completely new queries formed new top-level nodes. Of c... [08:44:11] Analytics, Tool-Labs-tools-Other: Work on Metrics tools wm-metrics and MediaCollectionDB, refactoring and code quality. - https://phabricator.wikimedia.org/T100710#1328387 (Qgil) [08:44:22] Analytics, Tool-Labs-tools-Other: Work on Metrics tools wm-metrics and MediaCollectionDB, refactoring and code quality. - https://phabricator.wikimedia.org/T100710#1328388 (Qgil) a:JeanFred [11:27:15] hi pginer. [11:27:49] hi lzia [11:28:24] pginer, do you know if we have a page that very briefly explains how users should use ContentTranslation, something like a super simple tutorial? [11:29:00] I want to make one for the recommendation test, since users will go to ContentTranslation directly without much background. I thought I'd double-check to make sure it doesn't already exist [11:29:49] I think amir created something [11:30:00] Let me paste what I can find here [11:30:07] thanks pginer. [11:31:38] Amir created this document: https://www.mediawiki.org/wiki/Content_translation/Documentation/User_guide [11:31:58] We also have a screencast: https://youtu.be/nHTDeKW3hV0 [11:32:46] That includes enabling the beta feature. There is a shorter version focusing just on the translation editor: https://youtu.be/Ed2Ke_RLqOo [11:32:56] perfect. thank you pginer. [11:33:11] ok, no problem [11:47:00] (CR) Mforns: [C: 2 V: 2] "LGTM" [analytics/dashiki] - https://gerrit.wikimedia.org/r/212454 (owner: Milimetric) [12:45:58] (PS5) Joal: Add get pageview_info udf and underlying functions [analytics/refinery/source] - https://gerrit.wikimedia.org/r/214349 [12:46:27] (CR) Mforns: [C: 2 V: 2] "LGTM" [analytics/dashiki] - https://gerrit.wikimedia.org/r/212467 (owner: Milimetric) [12:46:52] (CR) Joal: Add get pageview_info udf and underlying functions (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/214349 (owner: Joal) [12:51:05] halfak: I have no update either, didn't have time to push the metadata analysis further [12:51:12] Shall we cancel ? [13:00:36] morning all [13:03:24] Hi milimetric [13:03:33] hi joal [13:04:41] hey joal, if I wanted to run that top articles job for 15 days instead of 30, do you think that's feasible with the cluster situation right now? [13:05:22] it is feasible for sure :) [13:05:27] I'm asking, I think, both against the raw data, and against the hourly aggregate once that finishes [13:05:48] I mean, will you guys let me run it and will it not bring down the cluster?
[13:06:39] milimetric: let's batcave and talk about that [13:09:28] o/ milimetric & joal [13:09:39] I'm very briefly inbetween things and wanted to say "hi" :) [13:09:59] Hi "inbetween-man" :) [13:10:40] I hope to have time to make some progress on metadata analysis by the end of week [13:10:46] Depending on how pageviews work [13:10:50] We'll see [13:11:04] I've lost milimetric :( [13:11:25] * joal is hanging around like a dreadful soul [13:12:48] joal: :( sorry [13:12:57] I was spacing out and my pings aren't working [13:12:59] huhu :) [13:13:30] ok, restored pings. It's 2015 and we still have to "did you try reloading the page?" [13:13:41] joal: coming to the batcave [14:15:03] (CR) Mforns: [C: 2 V: 2] "LGTM! There are 2 console.logs, but I've seen you removed them in the next change." [analytics/dashiki] - https://gerrit.wikimedia.org/r/212800 (owner: Milimetric) [14:21:52] thanks mforns_ I checked the last commit in that string again and there are no loose console.logs anywhere [14:22:04] btw, the ag command line code search tool is Awesome [14:22:09] "silver searcher" [14:22:17] i'm not exactly sure how I lived without it [14:22:23] milimetric, yes, no problem [14:22:36] ? [14:23:16] milimetric, looking to ag [14:25:09] MORNING [14:25:11] hellloooo [14:26:32] hi! [14:30:36] Hullllo ottomata [14:30:43] hyeaaa [14:30:45] Have a minute for meeeee real quick ? [14:30:48] sure! [14:30:52] for you i have many minutes [14:30:55] batcave :) [14:44:08] joal, i think you missed my question about using a lib to extract the query params rather than parsing them ourselves [14:44:18] hm, nope :) [14:44:25] I added a comment about that :) [14:44:39] ja but I recommented because i think we are misunderstanding each other [14:44:46] i'm asking about actually extracting the commands [14:44:47] sorry [14:44:49] the params [14:44:52] as in [14:44:58] map = getParams(uri_query) [14:45:06] map['page_title'] [14:45:10] I need to decode the parameters in a specific way, and extracting includes decoding [14:45:15] oh [14:45:18] hm [14:45:22] makes sense ? [14:45:27] I know, it's ugly [14:45:28] because you need to decode special you can't extract? [14:45:32] using lib? [14:45:34] because lib does decoding? [14:45:40] That's the thing [14:45:50] Or at least the lib I was using [14:46:05] apache.httpclient [14:46:12] something like that [14:46:48] hm, ok [14:46:53] :( [14:46:57] hm, well you still have some unused imports [14:47:04] org.apache.http.NameValuePair; [14:47:05] A LOT of data cleansing in there [14:47:11] URLEncodedUtils [14:47:16] Ah shit, forgot those [14:47:20] Will patch [14:50:09] Anything else before I resubmit ? [14:50:14] ottomata: --^ [14:51:31] hm, just so i understand the dialect thing [14:51:48] if the uri_path looks like [14:52:03] /xx-xxx/yyyy [14:52:12] yup [14:52:13] return xx-xxx [14:52:16] if it looks like [14:52:18] /xx/yyyy [14:52:22] return default value? [14:52:27] yessir [14:52:32] iiinteresting [14:52:36] so will [14:52:41] examples I have are mostly for zh [14:52:42] /zh/Wikipedia... [14:52:53] will that have zh.wikipedia.org as uri host? [14:52:55] as well as [14:53:02] /zh-hk/... [14:53:12] likely those will both be in zh.wikipedia project? [14:53:13] zh is really strange from a host perspective [14:53:38] there are dialects in hosts for zh, as well as dialects in folders [14:53:41] hey milimetric, yt? [14:53:50] weird, ok!
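
For readers following along: the folder-based dialect rule joal confirms above (/xx-xxx/yyyy yields the dialect, /xx/yyyy falls back to a default) can be sketched as a tiny function. This is a minimal illustration, not the actual refinery UDF; the regex and the default value here are assumptions.

    object DialectSketch {
      // First path segment counts as a dialect only when it contains a hyphen,
      // e.g. "/zh-hk/..." -> "zh-hk"; a plain "/zh/..." falls back to a default.
      private val FolderDialect = "/([a-z]+-[a-z]+)/.*".r

      def getDialect(uriPath: String, default: String = "default"): String =
        uriPath match {
          case FolderDialect(dialect) => dialect
          case _                      => default
        }

      def main(args: Array[String]): Unit = {
        println(getDialect("/zh-hk/Wikipedia"))  // zh-hk
        println(getDialect("/zh/Wikipedia"))     // default
      }
    }
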
[14:53:51] heh [14:54:08] hey ottomata [14:54:44] So decision was taken with Ironholds to use only folder as dialect info, and leave host info as project [14:55:27] ok cool [14:55:33] milimetric: just a sanity check for me [14:55:37] is page_title the best name? [14:55:47] that's what I told joal, but i want to confirm tht [14:55:48] that [14:55:53] vs. [14:56:03] article, page, page_name, whatever [14:56:03] lemme think about it for a sec too [14:56:10] what's the field in the db? [14:56:14] just page? or page_title [14:56:21] i had remembered page_title, but i want to be sure [14:56:47] page_title is cool 'cause it's the same in the mediawiki db [14:56:56] good, that's kinda what i wanted [14:57:07] we are going to use that more, and i think it would be good to standardize it where we can [14:57:11] I don't always love the legacy names, but i think in this case it makes sense [14:57:14] yeah [14:57:16] its a fine name [14:57:27] cool, ok [14:57:30] joal, ja, lgtm [14:57:36] send your patch and I will +2 [14:57:42] cool [14:57:56] Let's wait for Ironholds review as well before merging :) [14:58:15] (PS6) Joal: Add get pageview_info udf and underlying functions [analytics/refinery/source] - https://gerrit.wikimedia.org/r/214349 [15:00:35] joal: madhuvishy, i think the increase in executors solved exactly this: https://spark.apache.org/docs/1.3.0/tuning.html#memory-usage-of-reduce-tasks [15:02:07] (CR) Ottomata: [C: 2 V: 2] Add get pageview_info udf and underlying functions [analytics/refinery/source] - https://gerrit.wikimedia.org/r/214349 (owner: Joal) [15:02:24] Thx ottomata [15:03:28] joal: could you check out my comments here when you get a chance: [15:03:29] https://gerrit.wikimedia.org/r/#/c/212573/4/refinery-job/src/main/scala/org/wikimedia/analytics/refinery/job/AppSessionMetrics.scala [15:03:36] mostly about -h for scopt and hdfsUriRoot [15:04:02] i could be wrong, but it seems like we should be able to get a FileSystem object from defaults in .xml files, dunno [15:05:23] ottomata: will double check [15:06:45] joal, you can do [15:06:46] https://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/Path.html [15:06:54] p = new Path("hdfs://pathtofile") [15:07:05] fs = p.getFileSystem(conf) [15:07:13] ah i will paste that into review for madhu [15:07:20] Makes sense [15:07:50] I used the hdfsRoot thing to mimic what we do with Oozie [15:08:34] ja but with oozie we usually just pass input and output to workflows or actions [15:08:49] the input is built from parts (hdfs uri, etc.) but the job takes just a single path [15:09:33] cool [15:09:39] I don't really mind :) [15:09:53] Simplet path expression = Happy [15:10:09] s/t/r [15:10:25] (CR) Ottomata: [WIP] Productionize app session metrics - Parse args using scopt - Move away from HiveContext to reading Parquet files directly - Change rep (2 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/212573 (https://phabricator.wikimedia.org/T97876) (owner: Madhuvishy) [15:12:44] Analytics-Tech-community-metrics, ECT-June-2015: Gerrit changes reviewed per month (on scr.html) - https://phabricator.wikimedia.org/T97716#1329770 (Aklapper) (Adding #ECT-June-2015 because this blocks T94578 which is a hard goal for this month) [15:12:46] Analytics-Tech-community-metrics, ECT-June-2015: Active changeset *authors* and changeset *reviewers* per month - https://phabricator.wikimedia.org/T97717#1329772 (Aklapper) (Adding #ECT-June-2015 because this blocks T94578 which is a hard goal for this month) [15:17:53] (CR) Mforns: "LGTM!
Just a comment on a comment." (1 comment) [analytics/dashiki] - https://gerrit.wikimedia.org/r/212821 (owner: Milimetric) [15:21:53] ottomata: level of parallelism here refers to number of partitions, not number of cores [15:22:06] (PS4) Milimetric: Refactor Wikimetrics layout to use TimeseriesData [analytics/dashiki] - https://gerrit.wikimedia.org/r/212821 [15:22:17] nawww [15:22:19] how? [15:22:28] (CR) Milimetric: Refactor Wikimetrics layout to use TimeseriesData (1 comment) [analytics/dashiki] - https://gerrit.wikimedia.org/r/212821 (owner: Milimetric) [15:22:30] i mean, both would matter, but without more executors you reach a limit [15:22:39] (PS3) Milimetric: Refactor Compare layout to use TimeseriesData [analytics/dashiki] - https://gerrit.wikimedia.org/r/213967 [15:22:44] (PS4) Milimetric: Use Dygraphs in Vital Signs [analytics/dashiki] - https://gerrit.wikimedia.org/r/214270 (https://phabricator.wikimedia.org/T96339) [15:22:53] well, you could have 100 partitions with 4 workers, or the opposite :) [15:23:04] it is talking about increasing the number of parallel tasks [15:24:02] it says: "the working set of one of your tasks" [15:24:11] task === executor [15:24:17] working set === partition [15:24:35] Spark can efficiently support tasks as short as 200 ms, because it reuses one executor JVM across many tasks and it has a low task launching cost, so you can safely increase the level of parallelism to more than the number of cores in your clusters. [15:25:01] that would be a lot of executors though, > # cores in cluster [15:25:17] I guess by default spark uses the number of executors as number of partitions to reduce on if not specified [15:25:30] Which would make sense [15:25:49] And, task != executor ! [15:25:58] right that is true [15:26:04] but, more executors == more parallel tasks [15:26:05] task == unit of work for an executor [15:26:29] (CR) Mforns: [C: 2 V: 2] "LGTM" [analytics/dashiki] - https://gerrit.wikimedia.org/r/212821 (owner: Milimetric) [15:26:34] "because it reuses one executor JVM across many tasks " [15:26:45] 1 executor JVM --- Many tasks [15:27:57] partitioning into small bits is better than into big ones --> facilitates the parallelization in case of non-homogeneous tasks, and prevents OOM issues [15:28:01] ottomata: --^ [15:28:26] Only concern --> writing results into small files [15:28:32] aye [15:28:46] well, can coalescing help there? [15:28:48] So, partition into small bits for execution, and then repartition into big enough bits for writing [15:29:01] For sure :) [15:29:33] aye [15:29:35] I thought there was one, but it seems to have been removed [15:29:40] that could be something to optimize [15:29:46] repartition(parallelism) [15:29:46] re-reading [15:29:57] there are a lot of knobs here, hard to know [15:30:06] so, i ran over 10 days with 40 executors and watched how busy they were [15:30:11] line 260 [15:30:16] most of the time, they each had 1 task, so that is good [15:30:38] They can't get more --> 1 task per executor per moment [15:31:41] right, but [15:31:42] joal / ottomata: standup :) [15:31:45] I'd like to know where, in the execution path, the OOM happened [15:31:51] oops sorry [15:31:53] just wanted to make sure there weren't empty executors [15:31:54] whoops [15:32:10] ottomata: makes sense [15:42:17] Analytics-Cluster, hardware-requests, operations: Hadoop worker node procurement - 2015 - https://phabricator.wikimedia.org/T100442#1329868 (Ottomata) Ok cool, noted for the future danke. How goes?
:) [15:42:58] Analytics-Cluster, hardware-requests, operations: Hadoop worker node procurement - 2015 - https://phabricator.wikimedia.org/T100442#1329872 (Ottomata) Oh, also, same number of cores please :) [15:44:39] Analytics-Cluster, Analytics-Kanban: Ooziefy and parquetize pageview intermediate aggregation using refined table fields [13 pts] {wren} - https://phabricator.wikimedia.org/T99931#1329876 (ggellerman) a:JAllemandou [15:52:44] Analytics-Tech-community-metrics, ECT-June-2015: Active changeset *authors* and changeset *reviewers* per month - https://phabricator.wikimedia.org/T97717#1329919 (Aklapper) p:Normal>High a:Dicortazar [15:52:46] Analytics-Tech-community-metrics, ECT-June-2015: Gerrit changes reviewed per month (on scr.html) - https://phabricator.wikimedia.org/T97716#1329922 (Aklapper) p:Normal>High a:Dicortazar [16:03:12] halfak: you there ? [16:06:16] Analytics-Cluster: Create current-definition/projectcounts [13 pts] {musk} - https://phabricator.wikimedia.org/T101118#1330039 (kevinator) NEW [16:07:29] Analytics-Cluster, Analytics-Kanban: Add Pageview aggregation to Python [13 pts] {musk} - https://phabricator.wikimedia.org/T95339#1330065 (kevinator) [16:08:33] Analytics-Cluster, Analytics-Kanban: Create current-definition/projectcounts [13 pts] {musk} - https://phabricator.wikimedia.org/T101118#1330076 (kevinator) [16:19:03] Analytics-Cluster, Analytics-Kanban: Add Pageview aggregation to Python [13 pts] {musk} - https://phabricator.wikimedia.org/T95339#1330118 (kevinator) [16:19:05] Analytics-Cluster, Analytics-Kanban: Create current-definition/projectcounts [13 pts] {musk} - https://phabricator.wikimedia.org/T101118#1330117 (kevinator) [16:23:04] milimetric, madhuvishy, joal, can anyone of you please review this patch: https://gerrit.wikimedia.org/r/#/c/215200/ ? It fixes a quite critical bug that we introduced in the last utf-8 changes and that affects a lot of wikimetrics users. [16:23:25] i'll do it mforns [16:23:28] When you merge it, I will deploy it asap! [16:23:42] thx milimetric [16:23:45] thanks milimetric [16:23:51] I can go for it, but will take longer ! [16:24:45] (PS6) Madhuvishy: [WIP] Productionize app session metrics - Parse args using scopt - Move away from HiveContext to reading Parquet files directly - Change reports to run for last n days instead of daily or monthly (not sure if this is gonna work yet) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/212573 (https://phabricator.wikimedia.org/T97876) [16:24:55] ottomata: pushed latest code. [16:24:58] mforns: you've got a flake8 error, besides that it looks good to me [16:25:08] i'm gonna run the migration and test locally [16:25:14] milimetric, oh gosh [16:25:47] ottomata: and the command I ran was: spark-submit --master yarn --driver-memory 1500M --num-executors=60 --executor-cores=1 --executor-memory=2g --class org.wikimedia.analytics.refinery.job.AppSessionMetrics --verbose /home/madhuvishy/workplace/refinery-source/source/refinery-job/target/refinery-job-0.0.12-SNAPSHOT.jar -o /user/madhuvishy/tmp/ -y 2015 -m 5 [16:25:47] -d 26 -n 15 [16:26:22] Analytics-Tech-community-metrics, ECT-June-2015: Present most basic community metrics from T94578 on one page - https://phabricator.wikimedia.org/T100978#1330159 (Aklapper) p:High>Low > I think this task is a nice to have but not a blocker. That makes things more relaxing, thanks. Lowering priorit... 
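
The review comments referenced earlier (a help flag for scopt, and deriving the FileSystem from the output path instead of passing a separate hdfsUriRoot) would combine roughly as below. A sketch only, assuming scopt 3.x: the flags mirror the -o/-y/-m/-d/-n options in madhuvishy's spark-submit line, but the real AppSessionMetrics code may differ.

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}
    import scopt.OptionParser

    // Hypothetical params mirroring the flags in the spark-submit command above.
    case class Params(outputDir: String = "", year: Int = 0, month: Int = 0,
                      day: Int = 0, numDays: Int = 1)

    object AppSessionArgsSketch {
      val parser = new OptionParser[Params]("AppSessionMetrics") {
        help("help") text "print this usage text"  // adds a --help option
        opt[String]('o', "output-dir") required() action { (x, p) =>
          p.copy(outputDir = x) } text "full output path, e.g. hdfs://... or /user/..."
        opt[Int]('y', "year") required() action { (x, p) => p.copy(year = x) }
        opt[Int]('m', "month") required() action { (x, p) => p.copy(month = x) }
        opt[Int]('d', "day") required() action { (x, p) => p.copy(day = x) }
        opt[Int]('n', "num-days") action { (x, p) => p.copy(numDays = x) }
      }

      def main(args: Array[String]): Unit =
        parser.parse(args, Params()).foreach { params =>
          // Per the review: no separate hdfsUriRoot -- the FileSystem comes
          // from the path itself (falling back to defaults in the .xml configs).
          val path = new Path(params.outputDir)
          val fs: FileSystem = path.getFileSystem(new Configuration())
          println(s"writing under $path via $fs")
        }
    }
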
[16:26:33] (PS2) Mforns: Fix cohort description utf8 bug [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/215200 (https://phabricator.wikimedia.org/T100781) [16:26:58] milimetric, pushed a new patch fixing flake8 [16:28:19] mforns: I have a call now, so not able to pull and test it, but looked at the code and it looks good to me [16:28:37] madhuvishy, thanks! don't worry [16:29:07] I will deploy it in staging before production [16:29:34] (CR) Milimetric: [C: 2] Fix cohort description utf8 bug [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/215200 (https://phabricator.wikimedia.org/T100781) (owner: Mforns) [16:29:47] milimetric, thank you :] [16:29:48] mforns: nice, tests fail before migration, work after [16:29:50] I love alembic [16:30:01] cool [16:30:19] ottomata, in in 1 [16:30:21] ...okay, 2 [16:31:20] madhuvishy: thanks, will try after this meeting.. [16:31:47] Ironholds: helloooo [16:32:13] https://plus.google.com/hangouts/_/wikimedia.org/domain-info [16:34:35] Ironholds: ? [16:34:41] ottomata, see above ;p [16:34:52] oh you'd be here [16:34:52] sorry [16:34:56] thought you were counting me down [16:44:06] Analytics-Tech-community-metrics, ECT-June-2015: Ensure that most basic Community Metrics are in place and how they are presented - https://phabricator.wikimedia.org/T94578#1330263 (Aklapper) [16:44:07] Analytics-Tech-community-metrics, ECT-June-2015: Present most basic community metrics from T94578 on one page - https://phabricator.wikimedia.org/T100978#1330264 (Aklapper) [16:44:18] Analytics-Tech-community-metrics, ECT-June-2015: Ensure that most basic Community Metrics are in place and how they are presented - https://phabricator.wikimedia.org/T94578#1167361 (Aklapper) [16:44:19] Analytics-Tech-community-metrics, ECT-June-2015: Present most basic community metrics from T94578 on one page - https://phabricator.wikimedia.org/T100978#1325337 (Aklapper) [17:01:39] Analytics-Kanban: Add cache headers to the datasets.wikimedia.org/limn-public-data/metrics folder - https://phabricator.wikimedia.org/T101125#1330382 (Milimetric) NEW a:Milimetric [17:02:07] Analytics-Kanban: Add cache headers to the datasets.wikimedia.org/limn-public-data/metrics folder {lion} - https://phabricator.wikimedia.org/T101125#1330391 (kevinator) [17:11:43] (CR) Ottomata: [C: 2] [WIP] Productionize app session metrics - Parse args using scopt - Move away from HiveContext to reading Parquet files directly - Change rep (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/212573 (https://phabricator.wikimedia.org/T97876) (owner: Madhuvishy) [17:11:50] (CR) Ottomata: [WIP] Productionize app session metrics - Parse args using scopt - Move away from HiveContext to reading Parquet files directly - Change rep [analytics/refinery/source] - https://gerrit.wikimedia.org/r/212573 (https://phabricator.wikimedia.org/T97876) (owner: Madhuvishy) [17:16:05] Ironholds: plop [17:16:09] Forgot to ask you [17:16:23] ? 
[17:16:39] Do you mind having quick look at that code review: https://gerrit.wikimedia.org/r/#/c/214349/5 [17:22:51] Analytics-Cluster, Analytics-Kanban: Ooziefy and parquetize pageview intermediate aggregation using refined table fields [13 pts] {wren} - https://phabricator.wikimedia.org/T99931#1330548 (ggellerman) [17:22:53] Analytics-Cluster, Analytics-Kanban: Compute pageviews aggregates daily and monthly from April {wren} - https://phabricator.wikimedia.org/T96067#1330547 (ggellerman) [17:33:33] joal, got meetings and research spikes all day :( [17:33:45] mwarf :( [17:34:14] Ok nonetheless, no merge needed before some other modification from a colleague [17:34:27] Can you get a quick look tomorrow ? [17:34:33] Ironholds: --^ [17:44:09] (CR) Madhuvishy: [WIP] Productionize app session metrics - Parse args using scopt - Move away from HiveContext to reading Parquet files directly - Change rep (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/212573 (https://phabricator.wikimedia.org/T97876) (owner: Madhuvishy) [17:47:05] joal, totally! [17:47:19] madhuvishy: Can you tell me where in the execution process you got that OOM issue ? [17:47:25] Thx Ironholds :) [17:47:55] joal: Hmmm I tried looking at yarn logs but it doesn't tell me much. I can run it again and see. [17:50:22] Analytics-Kanban, Analytics-Wikimetrics, Community-Wikimetrics, Easy, Need-volunteer: "Create Report" button does not appear when uploading a new cohort - https://phabricator.wikimedia.org/T95456#1330687 (Abit) Hullo @madhuvishy, I just started another batch of Wikimetrics reports and got the sa... [17:56:32] Analytics-Tech-community-metrics: Weekly report for "Allow contributors to update their own details in tech metrics directly" - https://phabricator.wikimedia.org/T101134#1330724 (Sarvesh.onlyme) NEW a:Dicortazar [17:57:29] Analytics-Cluster, operations: Build Kafka 0.8.1.1 package for Jessie and upgrade Brokers to Jessie. - https://phabricator.wikimedia.org/T98161#1330742 (faidon) Yeah, what @Ottomata said, it's just a handful of packages and we have this running on Ubuntu as well, so how hard can it be... On the above: -... [17:58:54] Analytics-Tech-community-metrics: Weekly report for "Allow contributors to update their own details in tech metrics directly" - https://phabricator.wikimedia.org/T101134#1330752 (Sarvesh.onlyme) [17:58:56] Analytics-Tech-community-metrics, ECT-June-2015, Epic, Google-Summer-of-Code-2015: Allow contributors to update their own details in tech metrics directly - https://phabricator.wikimedia.org/T60585#1330751 (Sarvesh.onlyme) [18:00:06] Analytics-Tech-community-metrics: Weekly report for "Allow contributors to update their own details in tech metrics directly" - https://phabricator.wikimedia.org/T101134#1330724 (Sarvesh.onlyme) [18:01:21] Analytics-Tech-community-metrics: Weekly report for "Allow contributors to update their own details in tech metrics directly" - https://phabricator.wikimedia.org/T101134#1330774 (Sarvesh.onlyme) [18:02:14] Analytics-Tech-community-metrics: Weekly report for "Allow contributors to update their own details in tech metrics directly" - https://phabricator.wikimedia.org/T101134#1330724 (Sarvesh.onlyme) [18:05:43] milimetric, do you have 3 minutes? 
[18:05:57] hey mforns, I'm in a meeting now [18:05:59] but what's up [18:06:02] they're just doing intros [18:06:15] milimetric, I had to change the migration to work in staging [18:06:41] milimetric, I wanted to ask you a couple of things to make sure this won't mess up production [18:07:24] milimetric, but I can wait until you finish the meeting, no rush! [18:08:35] mforns: sure, ask away [18:08:38] i'll answer as i can [18:08:45] ok [18:10:02] so, when I run the migrations on staging, the values of the description column that contained special characters were migrated like this "abc???def", so there was an encoding problem in migrating them [18:10:35] so I looked at the current collation of the column, and it was latin_swedish_ci [18:11:20] I added a couple lines of code to the migration to convert the collation of the column first and then change to binary [18:11:30] and this worked in the end [18:12:06] but looking at the collation in production, the description column is a utf8_general_ci already [18:12:30] so, I wonder if the lines that I added to the migration will be necessary/harmful in production? [18:12:53] as the dbs of staging vs production differ in that aspect [18:32:28] EOD time ! [18:32:36] Have a good end of day guys :) [18:32:43] See you tomorrow [18:34:58] night joal|night ! [18:37:10] ottomata: https://spark.apache.org/docs/latest/tuning.html I was reading this yesterday [18:37:38] and wondering if my kryo serializer needs more memory [18:38:09] it was 24. I bumped it up to 1024 for fun, to see what'd happen. [18:44:48] mforns: you have two choices [18:45:00] 1. restore the production database from the backup zip on /data/project/... [18:45:04] * mforns is hearing [18:45:19] 2. try it in a controlled test on staging [18:45:31] (downgrade, change the collation, add some weird names, upgrade) [18:45:46] milimetric, I already did number 2. [18:45:54] Personally, I'd be happy with 2. because this isn't critical data and we have it backed up anyway [18:46:14] but if you want to do 1. I can help you [18:46:35] madhuvishy: did it make a difference? [18:46:44] milimetric, I'd say that the original migration will work in production [18:46:58] mforns: that sounds very likely to me [18:47:12] ottomata: nah. i'm getting these errors for 15 days - http://pastebin.com/ZuehW4uC [18:47:16] why the hell that was latin_swedish_ci ... lol, I'm sure I'll never find out [18:47:18] milimetric, cool, I'll go for the deployment then, thanks! [18:47:34] np, let me know if you need any help [18:50:36] ottomata: this looks kiiinda similar - http://apache-spark-user-list.1001560.n3.nabble.com/setting-heap-space-td16245.html [18:52:04] ya i see that too madhuvishy [18:52:44] madhuvishy: this may help [18:52:45] - Always specify the level of parallelism while doing a groupBy, reduceBy, join, sortBy etc. [18:52:48] ottomata: hmmm. so it failed.
these are the 4 errors i got - http://pastebin.com/AzNpd4ps (joal|night) [18:53:03] ottomata: yeah, i saw that too [18:56:11] ottomata: although i dont see any of that in the code [18:56:26] there is a reduce [18:56:38] but ya it doesn't seem to be OOMing there [18:56:39] hm [18:56:45] oh [18:56:46] yes it is [18:56:54] Job 0 failed: reduce at AppSessionMetrics.scala:54, too [18:57:43] (PS7) Madhuvishy: [WIP] Productionize app session metrics - Parse args using scopt - Move away from HiveContext to reading Parquet files directly - Change reports to run for last n days instead of daily or monthly (not sure if this is gonna work yet) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/212573 (https://phabricator.wikimedia.org/T97876) [18:57:44] aah [18:58:52] not sure though madhuvishy, not exactly sure how to get more partitions during reduce, how many keys do you think there are here? [19:00:34] ottomata: i'm not sure either. also may be the combineByKey is problematic? [19:02:02] hm, maybe, why lower partitions to 100? [19:02:04] just curious [19:02:23] milimetric, it went well, will send an email to the list [19:02:46] ottomata: i actually dont know, it says so in the code. [19:02:58] ha ok [19:03:22] i dint touch those parts - so i dont have a lot of context too [19:03:37] mforns: do you know [19:03:38] ? [19:03:53] // lowering in 1 order of magnitude the number of partitions for this job [19:03:54] // logs list 1500 partitions for the original dataset [19:03:54] val userSessions = userSessionsAll.coalesce(100) [19:04:45] ottomata, reading [19:05:04] madhuvishy: i will try to look into this a lot more soon too: [19:05:05] https://spark.apache.org/docs/latest/job-scheduling.html#dynamic-resource-allocation [19:05:14] looks like it requires some nodemanager daemon changes [19:05:18] won't get to that today though [19:05:56] ottomata, madhuvishy, I remember Nuria reducing the number of partitions to improve performance maybe? [19:06:25] ja, not sure at the moment, reducing parallelism might reduce performance because more mem needed [19:06:31] buuut, that is not currently where we are getting OOMs [19:07:25] hmmm [19:07:50] weird, madhuvishy, i'm running with 10 days now and also OOMing [19:07:53] but it worked yesterday... [19:08:29] ottomata: hmmm, wondering if getting the timestamps this way is not helping [19:09:05] they say java objects are costly - so may be java.sql.Timestamp to long conversion is causing it [19:10:18] vs. the thing we were doing before with the udf? [19:10:26] that is a change, eh [19:10:26] ? [19:10:28] hm [19:11:08] madhuvishy: maybe also [19:11:08] http://spark.apache.org/docs/1.0.0/programming-guide.html#which-storage-level-to-choose [19:12:50] we should try MEMORY_ONLY_SER? [19:13:17] ottomata: ^ [19:14:00] ja i would try it [19:14:05] not sure where to set that though [19:14:09] .persist()? [19:14:13] ya i'm looking too [19:14:43] (CR) Ottomata: [WIP] Productionize app session metrics - Parse args using scopt - Move away from HiveContext to reading Parquet files directly - Change rep (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/212573 (https://phabricator.wikimedia.org/T97876) (owner: Madhuvishy) [19:17:41] ottomata: should we persist userSessions?
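
The knobs under discussion here (many small partitions for shuffle-heavy stages, explicit parallelism on wide operations, serialized caching, and coalescing only at write time) look roughly like this in Spark's RDD API. The numbers and paths are illustrative, not recommendations, and the real job reads parquet rather than a text file.

    import org.apache.spark.SparkContext
    import org.apache.spark.storage.StorageLevel

    object SparkTuningSketch {
      def run(sc: SparkContext): Unit = {
        // Many small partitions keep each reduce task's working set small
        // (the OOM pattern in the tuning guide linked above).
        val raw = sc.textFile("hdfs:///tmp/example", 1500)

        // Wide operations accept an explicit partition count, per the
        // "always specify the level of parallelism" advice.
        val counts = raw.map(line => (line, 1L)).reduceByKey(_ + _, 1500)

        // Serialized caching trades CPU for memory (MEMORY_ONLY_SER).
        counts.persist(StorageLevel.MEMORY_ONLY_SER)

        // Coalesce only at write time so output files aren't tiny --
        // the trade-off behind userSessionsAll.coalesce(100) above.
        counts.coalesce(100).saveAsTextFile("hdfs:///tmp/example-out")
      }
    }
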
[19:18:06] well, at the moment we see OOMs at the reduce of nums.map(QTree(_)) [19:18:15] but, i barely even know what that is :) [19:18:28] ottomata: he he me too [19:19:33] ottomata: well i guess it's just summing up a list of nums in some special way [19:19:40] ja [19:20:07] ottomata: https://spark.apache.org/docs/latest/tuning.html#tuning-data-structures [19:20:38] may be making all those QTree objects is not good [19:21:23] ja i mean, who knows, try persist there i guess? [19:21:25] can't hurt to try :) [19:21:51] madhuvishy: btw, i'm going back a few patches to the UDF one and trying it [19:21:55] on 10 days with the same settings [19:22:06] ottomata: yeah cool. [19:28:12] ottomata: let me know if that succeeds. i'm trying the persist [19:28:16] k [19:36:45] madhuvishy: fabian suggests reducing depth of qtree [19:36:48] to 6 maybe [19:37:10] this job is running much longer with your previous patch, btw, i think you might be right about the primitive type [19:38:05] madhuvishy: can you map the ts returned by the sql to a long again? [19:38:08] in scala? [19:44:13] ottomata: yeah [19:44:43] ottomata: oh, i dont know how to do that without doing gettime [19:46:45] hm, welp, gotta be some way to get it out of the Timestamp object, right? [19:47:47] ottomata: without doing - r.getAs[java.sql.Timestamp](1).getTime ? [19:50:18] that should be ok maybe? if not we can do a udf, although i'm not sure why it would matter if we did it in sql or in scala [19:50:53] ottomata: this is what i'm already doing [19:51:02] oh... [19:51:21] oh! [19:51:22] you are. [19:51:23] hm. [19:51:35] getTime returns a Long? [19:51:40] ottomata: yeah [19:51:43] hm [19:52:01] btw, ja, madhuvishy, the code with the udf from dt finished! [19:52:05] gonna try lots of days [19:52:23] ottomata: yeah looks like java.sql.Timestamp casting is not helping [19:56:08] aye, hm. i mean, can we cast the ts in sql then? [19:58:22] madhuvishy: the job you are currently running is OOMing, yes? [19:58:39] ottomata: yeah, it's still running though - and very different errors [19:58:59] which change was that? [19:59:01] persist? [19:59:46] ottomata: yup [20:00:27] ottomata: http://pastebin.com/Zk9582vc [20:01:24] I'm persisting userSessions. [20:04:57] ja different error for sure. madhuvishy, lets back away from persist for a moment [20:05:08] will you try reducing the qtree size? [20:05:12] ottomata: yeah alright. [20:05:34] ottomata: also wondering if i can read timestamp as string and do toLong on that [20:05:53] ja, its confusing because it is a hive or parquet timestamp [20:07:32] ja it is so strange that this job seems to be ok when we convert the dt in a udf [20:07:33] so strange [20:07:37] mforns: thanks very much for following up on both those threads [20:07:48] ottomata: yeah! [20:08:12] milimetric, np, I should do that more often [20:08:18] ottomata: this thing's still running. dint say failed 4 times bubye [20:08:51] with persist? [20:08:58] ottomata: yup [20:09:02] huh, weird [20:09:19] https://yarn.wikimedia.org/proxy/application_1430945266892_47749/stages/stage?id=0&attempt=0 [20:09:21] so many failed stages! [20:09:25] /tasks [20:10:02] ottomata: yeah [20:10:35] ottomata: wondering if i should kill it or see what happens [20:10:52] hard to say, it doesn't look good though :) [20:11:07] it doesn't really seem to be proceeding atm either though [20:11:17] ottomata: yeah, i dont think it'll succeed.
interesting that it dint exit yet [20:11:18] this is during the coalesce too [20:11:23] that would be another thing to try to get rid of [20:14:08] hm. [20:14:17] my 30 day one isn't moving i think... [20:14:18] hm [20:14:40] ottomata: okay, so i'm getting rid of coalesce, reducing the qtree size to 6, and seeing if timestamps can be read as strings and converted to long [20:14:47] oo, ok, big change! [20:14:48] heheh [20:17:53] ottomata: nah, it doesn't let you read timestamps as strings [20:18:01] i should go back to udf i guess [20:18:04] hm [20:18:22] i dont know how to get sql to cast it [20:18:39] hm [20:19:03] madhuvishy: i'll see if i can find something with that [20:19:25] ottomata: okay, until then reverting to udf [20:20:08] k [20:20:17] i mean, you could udf the ts instead of the dt, right? [20:20:24] then you at least don't need to do dateformat conversion [20:20:32] ha, but how the crap is that different! [20:20:33] ?! [20:20:47] i guess, because it runs as part of the sql...somehow it is lower level and doesn't end up in an RDD anywhere [20:20:48] dunno. [20:23:31] ottomata: yeah that's what i thought. i'll udf the ts [20:41:38] mforns: you still around? [20:41:44] milimetric, yep [20:43:06] milimetric, what's up :] [20:44:02] mforns: https://meta.wikimedia.org/wiki/User_talk:Milimetric_(WMF)/Work_log/2015-06-02 [20:44:09] I am trying to understand this load testing [20:44:11] it's making no sense [20:44:20] I added simple code in the processor and forwarder, looks like this: [20:44:33] aha [20:45:06] https://www.irccloud.com/pastebin/Lwe3iy91/ [20:45:17] (very similar in the forwarder) [20:45:28] and I logged what I got on my talk page above [20:45:37] forwarder: around 10k events [20:45:49] processor: 3975 [20:45:50] aha [20:45:57] in the database: 4000 [20:46:19] and the total generated was over 15k [20:46:29] this was over 32 seconds [20:46:37] so the forwarder on beta chokes at around 300 per second [20:46:50] milimetric, I have a guess of what is happening between the forwarder and the processor [20:46:52] and the processor at around 124 per second [20:47:21] mmm [20:48:23] yes, it makes no sense hehehe, I can't see it [20:48:53] we know that the processor supports more than 120/s, at least in prod [20:50:10] yeah, so that's ok if this scales up with a beefier machine, that makes sense from experience [20:50:16] but this shows us kind of the ratio [20:50:30] basically, it seems the processor is the actual bottleneck, not the consumer [20:50:32] milimetric, aha, yes if we consider that beta has only one core... [20:50:47] not just one core, but probably a very weak core compared [20:51:04] milimetric, aha, yes, and it makes sense, because the validation is the heaviest block of code [20:52:02] milimetric, what schemas did you use? [20:53:26] milimetric, and by the way, it is possible to insert more events to the DB than processed, because of the consumer buffers. [20:53:49] mforns: just the Edit schema [20:54:14] mforns: yeah, the buffer thing makes sense. There were probably already 25 events in the buffer, makes sense [20:54:15] the edit schema is quite large, if you use a shorter schema, maybe validation takes less time? [20:54:24] ah, good idea! [20:54:27] ok, I'll try and see what changes [20:54:31] do you have a favorite short one? [20:54:32] cool!
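
The reduce that keeps OOMing merges one QTree per data point. Assuming the QTree here is Twitter Algebird's (which is what refinery-source uses for percentile estimates), the depth suggestion amounts to building shallower sketches before merging. A rough sketch; the level and numbers are illustrative:

    import com.twitter.algebird.{QTree, QTreeSemigroup}

    object QTreeSketch {
      def main(args: Array[String]): Unit = {
        val level = 6  // shallower tree => smaller merged structures, coarser bounds
        val semigroup = new QTreeSemigroup[Long](level)

        val nums: Seq[Long] = 1L to 1000L
        val merged = nums.map(QTree(_)).reduce(semigroup.plus)

        // quantileBounds gives (lower, upper) estimates for a quantile
        println(merged.quantileBounds(0.5))
      }
    }
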
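The timestamp thread above weighs two options, which ottomata settles a bit later by suggesting CAST(ts as int) directly in the spark sql. A sketch against the Spark 1.3-era DataFrame API, with placeholder paths and column names:

    import org.apache.spark.SparkContext
    import org.apache.spark.sql.SQLContext

    object TimestampSketch {
      def run(sc: SparkContext): Unit = {
        val sqlContext = new SQLContext(sc)
        sqlContext.parquetFile("/wmf/data/example").registerTempTable("requests")

        // Option 1: pull a java.sql.Timestamp per row and convert in Scala --
        // the per-row object allocation madhuvishy suspected of bloating the RDD:
        val millis = sqlContext.sql("SELECT ts FROM requests")
          .map(r => r.getAs[java.sql.Timestamp](0).getTime)

        // Option 2, ottomata's suggestion: cast inside the SQL so rows carry a
        // primitive from the start (casting a Hive timestamp to INT yields
        // epoch seconds):
        val seconds = sqlContext.sql("SELECT CAST(ts AS INT) FROM requests")
          .map(r => r.getInt(0))

        println(millis.count() == seconds.count())
      }
    }
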
[20:54:37] mmm [20:54:47] there is this timing schema [20:54:55] navigation timing [20:55:30] milimetric, no sorry, mistaken, this is a large one [20:56:15] milimetric, this one is short: MobileWebUIClickTracking [21:00:31] I'm also doing optional values never btw, so that doesn't affect the batching [21:03:23] milimetric, yes, the random generator isn't good enough for using this feature, it ends up always generating lots of partitions in the inserts [21:04:25] yes, i noticed that early on [21:06:48] mforns: https://meta.wikimedia.org/wiki/User_talk:Milimetric_(WMF)/Work_log/2015-06-02 [21:06:52] it forwarded a higher percentage [21:07:00] and processed a higher percentage too [21:07:18] but there are clearly bottlenecks [21:07:47] ottomata: I'd love to run this load test on an eventlogging server that's much closer in capacity to eventlog1001. How hard would that be to arrange? [21:08:51] milimetric, your results make total sense to me, note that this time, the consumer couldn't cope with all the events passed by the processor [21:09:01] milimetric, or maybe they are still in the buffer??? [21:09:34] yeah, lemme check [21:10:18] no, still 4000 in the db [21:10:31] that number is creepy, why is it always the same [21:10:59] hehehe yes [21:11:17] milimetric, well the buffer is 400 in size [21:11:25] so 10 * buffer [21:11:43] well, total integer divided by buffer [21:11:54] but then it should flush the rest within 300 seconds, no? [21:12:22] milimetric, if another event comes for the same schema, yes [21:12:27] but if no event comes, no [21:12:29] oh, right! [21:12:31] I forgot [21:12:34] I'll send one :) [21:12:37] ok [21:12:54] yep, it inserted 222 more :) [21:13:00] ok [21:13:51] milimetric, but still, there were 4466 processed, and 4000 + 221 = 4221 consumed [21:14:20] yes [21:14:26] so still weird [21:14:28] milimetric: eh? [21:14:34] milimetric, which makes me think that the processor and the consumer have a similar limit [21:14:52] ottomata: I'm running a load test against the beta cluster's event logging setup [21:14:57] but that machine that runs EL has only 1 core [21:15:02] so it's not a great test [21:15:33] and it's dying at around 300 events / second on the forwarder and 100 events / second on the processor. And we know it can do a lot better than that in prod [21:15:52] so I was wondering if we can get another box that matches eventlog1001 in performance more closely so we can try and push it to its limits [21:16:16] madhuvishy: [21:16:16] CAST(ts as int) [21:16:25] you can do that in your spark sql [21:16:31] no udf needed [21:16:42] ottomata: I'm running for 30 days with my changes now [21:16:45] it's actually running [21:16:52] cool! [21:16:53] no heapspace errors :D [21:17:07] milimetric, maybe what I said is wrong, because we're testing with one single schema, so in reality the consumer will be slower, now it needs 1 insert every 400 events, but in reality it will need couple moew [21:17:10] *more [21:17:54] mforns: yeah, but either way, it's clear we need a beefier server to test [21:18:00] milimetric, sure [21:18:09] otherwise it's not very useful other than to get a general idea of where the bottleneck is [21:18:45] I think it's still the processor, because the consumer might do a lot of other inserts but we can still increase the batch size a lot, and I think that's where we can squeeze a lot of performance [21:19:00] it should have no problem inserting 10,000 events at a time.
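
EventLogging's consumer is Python, but the buffering behavior mforns describes (per-schema buffers of around 400 events, with the flush check only running when a new event arrives) can be modeled in a few lines. Scala is used here only to keep one language across these sketches; all names and sizes come from the discussion above, not the real code.

    import scala.collection.mutable

    // Toy model of per-schema batching: events accumulate per schema and are
    // only written out when the buffer fills or ages out -- and the check only
    // happens on arrival, which is why sending one extra Edit event above is
    // what flushed the 222 stragglers.
    class BatchingConsumerModel(batchSize: Int = 400, maxAgeMs: Long = 300000L) {
      private case class Buf(events: mutable.Buffer[String] = mutable.Buffer.empty,
                             var oldestMs: Long = 0L)
      private val buffers = mutable.Map.empty[String, Buf]
      var inserted = 0  // stand-in for rows written by multi-row INSERTs

      def consume(schema: String, event: String, nowMs: Long): Unit = {
        val buf = buffers.getOrElseUpdate(schema, Buf())
        if (buf.events.isEmpty) buf.oldestMs = nowMs
        buf.events += event
        if (buf.events.size >= batchSize || nowMs - buf.oldestMs > maxAgeMs) {
          inserted += buf.events.size
          buf.events.clear()
        }
      }
    }
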
[21:20:07] I see [21:21:22] milimetric, I think general ideas are very useful, though :] [21:22:17] Analytics-EventLogging, Analytics-Kanban: Load Test Event Logging [8 pts] {oryx} - https://phabricator.wikimedia.org/T100667#1317914 (Milimetric) Pausing this task as we need a separate box to continue testing. I ran tests on beta labs and it seems clear that the virtual instance is our bottleneck. Two... [21:23:04] indeed :) just not useful enough to get this task done, so I paused it. So, anyone need any help before I go diving through the tasked column? [21:23:13] kevinator: any input on my next task? [21:26:48] I guess I'll do the current-definition project counts [21:29:14] milimetric, paused: ok [21:43:19] madhuvishy: is it actually running? [21:43:24] i had one that looked like it was but never did anything [21:43:25] for 30 days [21:43:39] ottomata: yup. it's running [21:43:47] application_1430945266892_47860 [21:44:00] oh whoa yeah it is [21:44:01] cool [21:45:38] ottomata: I'm running it with the persist [21:45:51] when this finishes will try without [21:48:53] k [21:49:18] ottomata: but yay [21:58:32] yay indeed, fingers crossed [22:00:37] (CR) Mforns: "LGTM, just two small comments, I'm OK with ignoring them if you like." (2 comments) [analytics/dashiki] - https://gerrit.wikimedia.org/r/213967 (owner: Milimetric) [22:01:32] mforns: -1 me for that, goodness. those are definitely mistakes [22:02:11] milimetric, O.o [22:05:05] Analytics-EventLogging, Analytics-Kanban: Event Logging doesn't seem to handle unicode strings {oryx} [8 pts] - https://phabricator.wikimedia.org/T99572#1331999 (kevinator) p:Triage>Normal [22:06:09] Analytics-EventLogging, Analytics-Kanban: Code to write a new Camus consumer and store the data in two Hive tables [21 pts] {oryx} - https://phabricator.wikimedia.org/T98784#1332000 (kevinator) p:Triage>Normal [22:06:54] Analytics-Cluster, Analytics-Kanban: Assess how to extract Mobile App info in webrequest [8 pts] {hawk} - https://phabricator.wikimedia.org/T99932#1332002 (kevinator) [22:07:39] (PS4) Milimetric: Refactor Compare layout to use TimeseriesData [analytics/dashiki] - https://gerrit.wikimedia.org/r/213967 [22:08:22] (CR) Milimetric: Refactor Compare layout to use TimeseriesData (2 comments) [analytics/dashiki] - https://gerrit.wikimedia.org/r/213967 (owner: Milimetric) [22:09:50] Analytics-Cluster, Analytics-Kanban: Assess how to extract Mobile App info in webrequest [8 pts] {hawk} - https://phabricator.wikimedia.org/T99932#1332011 (kevinator) [22:18:43] ottomata: nope, it failed. [22:20:45] ottomata: http://pastebin.com/hLHn6t2D [22:22:25] different place though! [22:22:46] hm, madhuvishy, i think that might be the persist thing [22:22:58] ottomata: yeah [22:23:06] let me get rid of persist [22:23:11] i guess try without, yeah [22:30:13] (CR) Mforns: Refactor Compare layout to use TimeseriesData (1 comment) [analytics/dashiki] - https://gerrit.wikimedia.org/r/213967 (owner: Milimetric) [22:30:50] (CR) Mforns: [C: 2 V: 2] Refactor Compare layout to use TimeseriesData [analytics/dashiki] - https://gerrit.wikimedia.org/r/213967 (owner: Milimetric) [22:35:05] (PS5) Mforns: Use Dygraphs in Vital Signs [analytics/dashiki] - https://gerrit.wikimedia.org/r/214270 (https://phabricator.wikimedia.org/T96339) (owner: Milimetric) [22:36:00] good night everyone! [22:36:06] see ya [23:04:07] Analytics-EventLogging: Send graphite metrics for Schema.
as well - https://phabricator.wikimedia.org/T95780#1332301 (Tgr) A nicely generic approach would be to define a new top-level property for EL schemas, say, a "statsKey" string (or "statsKeys" array) which would be appended to `eventloggi... [23:51:36] Analytics, MediaWiki-extensions-ExtensionDistributor: Set up graphs and dumps for ExtensionDistributor download statistics - https://phabricator.wikimedia.org/T101194#1332414 (Legoktm) NEW