[08:13:30] Analytics-Tech-community-metrics, ECT-June-2015: Active changeset *authors* and changeset *reviewers* per month - https://phabricator.wikimedia.org/T97717#1399394 (Dicortazar) Open>Resolved
[08:13:32] Analytics-Tech-community-metrics, ECT-June-2015: Ensure that most basic Community Metrics are in place and how they are presented - https://phabricator.wikimedia.org/T94578#1399395 (Dicortazar)
[08:18:55] Analytics-Tech-community-metrics, ECT-June-2015: Fine tune "Code Review overview" metrics page in Korma - https://phabricator.wikimedia.org/T97118#1399396 (Qgil) I think the best solution here is to bring the generic graphs from http://korma.wmflabs.org/browser/gerrit_review_queue.html, which have receive...
[08:21:12] Analytics-Tech-community-metrics, ECT-June-2015: Fine tune "Code Review overview" metrics page in Korma - https://phabricator.wikimedia.org/T97118#1399398 (Qgil)
[08:21:58] Analytics-Tech-community-metrics, ECT-June-2015: Fine tune "Code Review overview" metrics page in Korma - https://phabricator.wikimedia.org/T97118#1399401 (Qgil) a:Qgil>None
[08:25:00] Analytics-Tech-community-metrics, ECT-June-2015: Key performance indicator: analyze who contributes code - https://phabricator.wikimedia.org/T55485#1399403 (Qgil) Open>Resolved I'm sorry it took me so long to check http://korma.wmflabs.org/browser/who_contributes_code.html In fact, it looks good to...
[08:27:04] Analytics-Tech-community-metrics, ECT-June-2015: Active changeset *authors* and changeset *reviewers* per month - https://phabricator.wikimedia.org/T97717#1399405 (Dicortazar) Names were updated and trends added to the loop. This should be updated in the next data retrieval.
[08:27:37] Analytics-Tech-community-metrics, ECT-June-2015: Gerrit changes reviewed per month (on scr.html) - https://phabricator.wikimedia.org/T97716#1399406 (Dicortazar) Names were updated and trends added to the loop. This should be updated in the next data retrieval.
[08:27:46] Analytics-Tech-community-metrics, ECT-June-2015: Ensure that most basic Community Metrics are in place and how they are presented - https://phabricator.wikimedia.org/T94578#1399408 (Dicortazar)
[08:27:48] Analytics-Tech-community-metrics, ECT-June-2015: Gerrit changes reviewed per month (on scr.html) - https://phabricator.wikimedia.org/T97716#1399407 (Dicortazar) Open>Resolved
[08:28:53] Analytics-Tech-community-metrics, ECT-June-2015, Patch-For-Review: "Volume of open changesets" graph should show reviews pending every month - https://phabricator.wikimedia.org/T72278#1399409 (Dicortazar) Let's close this ticket :). Thanks @Acs and @Qgil!.
[08:29:02] Analytics-Tech-community-metrics, ECT-June-2015, Patch-For-Review: "Volume of open changesets" graph should show reviews pending every month - https://phabricator.wikimedia.org/T72278#1399410 (Dicortazar) Open>Resolved
[08:29:58] Analytics-Tech-community-metrics, ECT-June-2015: Fine tune "Code Review overview" metrics page in Korma - https://phabricator.wikimedia.org/T97118#1399416 (Qgil) p:High>Normal
[08:30:25] Analytics-Tech-community-metrics: Weekly report for "Allow contributors to update their own details in tech metrics directly" - https://phabricator.wikimedia.org/T101134#1399418 (Qgil) p:Triage>Normal
[08:30:59] Analytics-Tech-community-metrics, ECT-July-2015: Ranking of repositories in Korma's code review page should update more often - https://phabricator.wikimedia.org/T102112#1399419 (Qgil) p:Triage>Normal
[08:34:41] Analytics-Tech-community-metrics, ECT-July-2015: Mysterious repository breakdown(s)/sorting order - https://phabricator.wikimedia.org/T103474#1399430 (Qgil) In fact, maybe we should simply substitute that list for the list of repos in http://korma.wmflabs.org/browser/gerrit_review_queue.html, which has re...
[08:34:58] Analytics-Tech-community-metrics, ECT-July-2015: Mysterious repository breakdown(s)/sorting order - https://phabricator.wikimedia.org/T103474#1399434 (Qgil) p:Triage>Normal
[08:39:50] Analytics-Tech-community-metrics, ECT-June-2015: Maniphest backend for Metrics Grimoire - https://phabricator.wikimedia.org/T96238#1399448 (Dicortazar) @Qgil, in order to have all of the information together, we're finally retrieving all of the tickets from Phabricator. This means that at some point, Bugz...
[08:40:16] Analytics-Tech-community-metrics: Legend for "review time for reviewers" - https://phabricator.wikimedia.org/T103469#1399450 (Qgil) p:Triage>Normal I think "Review time" and "Review time for reviews" refers to what we call "Time from submission" and "Time from last patchset" in "Age of open changesets...
[08:41:50] Analytics-Tech-community-metrics, Engineering-Community: Check whether it is true that we have lost 40% of code contributors in the past 12 months - https://phabricator.wikimedia.org/T103292#1399462 (Qgil) Very good and very simple point. How does Metrics Grimoire scan Git/Gerrit repositories? Does it hav...
[08:42:01] Analytics-Tech-community-metrics, Engineering-Community, ECT-July-2015: Check whether it is true that we have lost 40% of code contributors in the past 12 months - https://phabricator.wikimedia.org/T103292#1399463 (Qgil)
[08:45:12] Analytics-Tech-community-metrics: Change "Date of submission of open changesets" to Date of upload - https://phabricator.wikimedia.org/T72650#1399467 (Qgil) Open>Resolved a:Qgil We have this metric in http://korma.wmflabs.org/browser/gerrit_review_queue.html since a long time ago ("Distribution of o...
[08:46:30] Analytics-Tech-community-metrics, ECT-July-2015: Remove deprecated repositories from korma.wmflabs.org code review metrics - https://phabricator.wikimedia.org/T101777#1399470 (Qgil)
[08:48:10] Analytics-Tech-community-metrics: Consolidating time ranges across tech community metrics - https://phabricator.wikimedia.org/T86630#1399476 (Qgil) Open>Resolved a:Qgil I don't know whether this is resolved or declined or both, :) but the fact is that we don't have a problem "Consolidating time ra...
[08:49:05] Analytics-Tech-community-metrics: Code review time must be on merged patches, not closed ones - https://phabricator.wikimedia.org/T68265#1399479 (Qgil) p:Low>Normal
[08:49:47] Analytics-Tech-community-metrics: Key performance indicator: code contributors new / gone - https://phabricator.wikimedia.org/T63563#1399480 (Qgil) p:Low>Normal
[09:26:07] Analytics-Tech-community-metrics, ECT-July-2015: Ranking of repositories in Korma's code review page should update more often - https://phabricator.wikimedia.org/T102112#1399532 (Aklapper) p:Normal>High
[09:32:27] Analytics-Tech-community-metrics, ECT-July-2015, ECT-June-2015: Tech metrics should talk about "Affiliation" instead of organizations or companies - https://phabricator.wikimedia.org/T62091#1399537 (Aklapper)
[09:33:26] Analytics-Tech-community-metrics, ECT-August-2015, ECT-June-2015: "Median time to review for Gerrit Changesets, per month": External vs. WMF/WMDE/etc patch authors - https://phabricator.wikimedia.org/T100189#1399538 (Aklapper)
[09:39:21] Analytics-Tech-community-metrics, Easy: Illegible overlapping tables on narrow screens - https://phabricator.wikimedia.org/T97115#1399547 (Aklapper) p:Normal>Low
[09:44:34] Analytics-Tech-community-metrics, Engineering-Community, MediaWiki-Extension-Requests: A new events/meet-ups extension - https://phabricator.wikimedia.org/T99809#1399562 (Aklapper) p:Low>Lowest
[09:56:57] Analytics-Tech-community-metrics, JavaScript: Failed to load resource: the server responded with a status of 404 (Not Found) - https://phabricator.wikimedia.org/T65061#1399613 (Aklapper) p:Normal>Lowest This creates no problems for retrieving/displaying the intended data; hence setting priority to L...
[09:57:01] Analytics-Tech-community-metrics: MediaWiki.org stats should also consider discussion - https://phabricator.wikimedia.org/T62074#1399615 (Aklapper) p:Normal>Low
[10:04:25] Analytics-Tech-community-metrics: Tech metrics missing IRC channels - https://phabricator.wikimedia.org/T56230#1399636 (Aklapper) p:Normal>Low
[10:05:47] Analytics-Tech-community-metrics: Contributor pages without data should include an explanation - https://phabricator.wikimedia.org/T58111#1399644 (Aklapper) Note that the 404 errors part is T65061; this task is about displaying an empty page.
[12:03:48] Analytics-Tech-community-metrics, ECT-June-2015: Ensure that most basic Tech Community Metrics are in place - https://phabricator.wikimedia.org/T94578#1400054 (Qgil)
[12:06:14] Analytics-Tech-community-metrics, ECT-June-2015: Most basic Tech Community metrics are published and up to date - https://phabricator.wikimedia.org/T94578#1400077 (Qgil)
[14:16:49] (PS2) Joal: Add webstatcollector projectview transformation [analytics/refinery] - https://gerrit.wikimedia.org/r/220426 (https://phabricator.wikimedia.org/T101118)
[14:19:51] milimetric: https://phabricator.wikimedia.org/T103798
[14:20:18] :(
[14:20:23] yeah... makes sense
[14:23:04] ottomata: wanna brain bounce on what to do about this?
[14:23:36] milimetric: sure
[14:23:38] one sec
[14:38:07] mforns: you around?
[14:56:13] joal|night: should I review the projectview transformation oozie change?
[14:56:49] Hi ottomata : You can go for it, I made a change for correctness earlier today
[14:57:01] k
[14:58:34] hm joal|night, i wonder, maybe the archive webstatscollector part should be just another action of the projectview hourly job?
[14:58:35] not sure.
[14:58:39] whatcha think?
[14:58:45] do we want to have another coordinator just for this?
[14:59:13] ottomata: Clearly feasible :)
[14:59:44] it's only that at some point we will probably remove this job and keep only the projectview transformation one
[14:59:54] But it's the same really :)
[14:59:58] I don't mind
[15:00:34] yeah not sure, i was going to put some thought into the naming of this job, and the directory hierarchy, bla blabla, but if it was just another action i wouldn't have to think about it :p
[15:00:40] do you have a preference?
[15:00:44] which do you think is better?
[15:02:03] hm, i'm looking, the year plus one hour thing is kinda specific to this job
[15:02:20] joal|night: i think i don't have a big preference, especially since you've already done this.
[15:03:23] ottomata: same for me, if you prefer, I modify, if not, it'll stay like that
[15:05:05] haha
[15:05:35] ottomata: I also tried to launch madhu's job manually --< never got enough resources to launch :(
[15:05:43] even in the production queue
[15:06:07] joal|night: ! i have about a 0.500001 preference for doing it in an action of the current job, but that is a very small preference, which might be overridden by the fact that you have already coded it this way :P
[15:06:19] hmMMMM joal|night that is not good.
[15:06:40] trying to get more logs using client deploy mode
[15:06:55] ottomata: --^
[15:07:06] By the way
[15:07:39] job just launched seems to work fine for now
[15:08:00] ottomata: You know I don't mind changing the oozie stuff, so if you prefer, I'll recode :)
[15:08:37] * joal feels like the penelope of oozie jobs ;)
[15:09:17] ottomata: what analytics host can I use for load testing?
[15:09:19] hahahaha
[15:10:25] (load testing EL)
[15:10:41] you were saying an1004 before, just checking that's still good
[15:11:25] ja that is fine.
[15:12:22] (CR) Ottomata: [C: 1] "I LEAVE IT UP TO YOU IF YOU WANT TO MAKE THIS AN ACTION OF PROJECTVIEW HOURLY! YOUR CALL! :)" [analytics/refinery] - https://gerrit.wikimedia.org/r/220426 (https://phabricator.wikimedia.org/T101118) (owner: Joal)
[15:12:28] ottomata: hm, doesn't have pip or access to git clone
[15:12:45] ah hm. i gotcha...
[15:12:55] no don't use git clone, just rsync the codebase over
[15:12:57] that's what I do
[15:13:03] yeah, but still won't have pip
[15:13:05] but i got the deps for ya...
[15:13:12] ok, cool
[15:13:56] all the deps have deb packages so its cool
[15:13:57] there you go.
[15:18:54] Analytics-Kanban, Analytics-Wikimetrics, Patch-For-Review: Wikimetrics crashes when cohort description has special characters {dove} [5pts] - https://phabricator.wikimedia.org/T100781#1400684 (kevinator) Open>Resolved verified it works in production :-)
[15:19:45] Analytics-Kanban, Analytics-Wikimetrics, Patch-For-Review: Wikimetrics crashes when cohort description has special characters {dove} [5pts] - https://phabricator.wikimedia.org/T100781#1400693 (mforns) o/
[15:19:48] milimetric: if you get a sec, take a look at my changes, ori hasn't responded, so I would like to merge them in beta
[15:19:59] k
[15:20:06] https://gerrit.wikimedia.org/r/#/c/220624/
[15:20:09] https://gerrit.wikimedia.org/r/#/c/220614/
[15:22:01] joal: madhu's job just launched and works, is that what you said?
[15:22:16] nope
[15:22:25] Seems still waiting for resources :(
[15:22:36] AppSessionMetrics?
[15:22:39] yup
[15:22:43] it says RUNNING now, so it got the resources...
[15:23:03] It got some, but job doesn't launch :(
[15:23:11] logs don't say anything
[15:23:20] same deal right? no spark stuff happens?
[15:23:28] yarn thinks it is running but spark is not doing anything?
[15:23:30] exactly
[15:23:32] i can't load app master
[15:23:33] hmmmm
[15:24:21] joal: is that in client or cluster mode?
[15:24:39] client mode this time with verbose to get more logs
[15:25:48] ottomata: executors are registered, but no job gets launched
[15:25:49] is it printing anything or just hanging?
[15:25:54] hanging
[15:26:03] joal, hm. i forget, in client mode, is the driver localhost? or is it the app master?
[15:26:12] driver is localhost
[15:26:24] Got an error !
[15:26:27] k, so, yarn says your app master is on analytics1011 though
[15:26:39] and has a spark process there running with -Xmx512m
[15:26:46] maybe need more --driver-memory?
[15:26:52] hm, will try that
[15:27:17] yeah, it has
[15:27:19] ottomata: https://gist.github.com/jobar/f3c26aef3a7bb4f6b38d
[15:27:19] -Dspark.executor.id= in the process list
[15:27:41] First time I have that error
[15:27:43] ah hm, i have seen that before joal, as far as I can tell that is just a warning for some datanode timeout
[15:27:44] but not sure
[15:27:48] ok
[15:27:52] Analytics, Analytics-Kanban, Reading-Web: Debug blank datafiles generated by generate.py [8 pts] {lamb] - https://phabricator.wikimedia.org/T103387#1400742 (mforns) This seems to me as related to Sean Pringle's comments on slow queries (as Dan Andreescu had foreseen :]). These 3 queries hit a view (def...
[15:28:02] Analytics-Tech-community-metrics, ECT-June-2015: Maniphest backend for Metrics Grimoire - https://phabricator.wikimedia.org/T96238#1400743 (Aklapper) Open>Resolved The backend is in place! ♥! Thanks a lot! So I'm closing this task (as discussed in our meeting today). >>! In T96238#1399448, @Dicorta...
[15:28:07] Analytics-Tech-community-metrics, Phabricator, ECT-August-2015: Metrics for Maniphest - https://phabricator.wikimedia.org/T28#1400745 (Aklapper)
[15:32:08] ottomata: joal standup!
[15:32:21] oops !
[15:32:25] me too!
[15:37:29] Analytics-Kanban, Research-and-Data: Validate Uniques using Last Access cookie {bear} - https://phabricator.wikimedia.org/T101465#1400770 (kevinator)
[15:39:00] Analytics-Kanban, Research-and-Data: Validate Uniques using Last Access cookie {bear} - https://phabricator.wikimedia.org/T101465#1400775 (kevinator) a:madhuvishy
[15:44:31] (PS3) Joal: Add webstatcollector projectview transformation [analytics/refinery] - https://gerrit.wikimedia.org/r/220426 (https://phabricator.wikimedia.org/T101118)
[15:48:23] Analytics-Kanban, Reading-Web: Cron on stat1003 for mobile data is causing an avalanche of queries on dbstore1002 - https://phabricator.wikimedia.org/T103798#1400809 (kevinator)
[16:27:09] madhuvishy: are you around? :-)
[16:29:39] Analytics-Tech-community-metrics, Engineering-Community, ECT-July-2015: Check whether it is true that we have lost 40% of code contributors in the past 12 months - https://phabricator.wikimedia.org/T103292#1401123 (mmodell) I don't think that including the phabricator repositories should bring down our...
[17:05:01] kevinator, madhuvishy, are you coming to the retro?
[17:06:30] the retro dance party?
[17:17:31] YuviSheep: not that exciting. Analytics retrospective
[18:22:37] woot, milimetric, kevinator, backfilled those server side EL events, woo!
[18:40:44] madhuvishy, yt? so kevin approved the email, is that ok to you too? if yes, I'll start sending it to the schema owners
[18:40:57] mforns: yeah! lets do that
[18:41:16] mforns: are we creating phab tickets for long lists?
[18:41:17] madhuvishy, BTW, I found Amir on IRC, and he confirmed his schemas
[18:41:24] oh cool!
[18:42:14] madhuvishy, yea, could be! for the mobile team I will use the same task I think. Do you have the link of the spreadsheet I created for them?
[18:42:28] mforns: no.. i dont think so
[18:42:47] mforns: +1, we can use the same for them
[18:43:23] madhuvishy, it's in their task if you want to recover it: https://docs.google.com/spreadsheets/d/1_WCJsw-re7g86IfBV0ZlnGNp3j6JSNvTxHejrOV-LSM/edit#gid=0
[18:43:55] mforns: great thanks
[18:44:25] madhuvishy, cool! I'll make a pause now, and when back will start with the emails
[18:44:37] mforns: great. i will start in a bit too.
[18:45:07] :] see ya
[18:57:32] Analytics-Tech-community-metrics, ECT-June-2015: Most basic Tech Community metrics are published and up to date - https://phabricator.wikimedia.org/T94578#1401769 (Aklapper)
[19:15:29] Analytics-Tech-community-metrics, ECT-June-2015: Most basic Tech Community metrics are published and up to date - https://phabricator.wikimedia.org/T94578#1401827 (Aklapper)
[19:15:39] Analytics-Tech-community-metrics, ECT-June-2015: Most basic Tech Community metrics are published and up to date - https://phabricator.wikimedia.org/T94578#1167361 (Aklapper)
[19:16:46] milimetric: where can we see the raw logs? We'd like to start looking at whether translations are coming in. :-)
[19:17:51] and one other thing, milimetric: for some reason, Ellery and I are not receiving the emails that go to recommender-feedback@wikimedia.org. Bob has received them.
[19:18:41] milimetric: scratch that last one. sorry. there is a delay it seems.
[19:19:33] Analytics-Tech-community-metrics, ECT-June-2015: Most basic Tech Community metrics are published and up to date - https://phabricator.wikimedia.org/T94578#1401842 (Aklapper)
[19:21:51] Analytics-Tech-community-metrics, ECT-June-2015: Most basic Tech Community metrics are published and up to date - https://phabricator.wikimedia.org/T94578#1401856 (Aklapper)
[19:42:15] I'm off lads,
[19:42:22] Tomoooooooorow !
[19:48:14] laters!
[19:48:45] ottomata: just saw you backfilled - great, thx
[19:54:36] (CR) Ottomata: Add webstatcollector projectview transformation (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/220426 (https://phabricator.wikimedia.org/T101118) (owner: Joal)
[20:06:24] (PS1) Madhuvishy: Add driver memory as a configurable property to Spark job [analytics/refinery] - https://gerrit.wikimedia.org/r/220952 (https://phabricator.wikimedia.org/T97876)
[20:09:21] (PS2) Milimetric: 2004Q1 added to Global North share of views/edits [analytics/wikistats] - https://gerrit.wikimedia.org/r/139865 (owner: Erik Zachte)
[20:09:23] (PS1) Milimetric: addition to wikistats portal: page views by category hierarchy [analytics/wikistats] - https://gerrit.wikimedia.org/r/220956
[20:09:25] (PS1) Milimetric: New section in stats portal: report on udp msg loss [analytics/wikistats] - https://gerrit.wikimedia.org/r/220957
[20:09:27] (PS1) Milimetric: New script: find Migrations patterns [analytics/wikistats] - https://gerrit.wikimedia.org/r/220958
[20:09:29] (PS1) Milimetric: commit old changes [analytics/wikistats] - https://gerrit.wikimedia.org/r/220959
[20:10:06] (CR) Milimetric: [C: 2 V: 2] 2004Q1 added to Global North share of views/edits [analytics/wikistats] - https://gerrit.wikimedia.org/r/139865 (owner: Erik Zachte)
[20:10:15] (CR) Milimetric: [C: 2 V: 2] addition to wikistats portal: page views by category hierarchy [analytics/wikistats] - https://gerrit.wikimedia.org/r/220956 (owner: Milimetric)
[20:14:32] ottomata: around?
[20:14:38] Analytics-EventLogging: Replace EventLogging with Confluent Platform - https://phabricator.wikimedia.org/T102082#1402097 (Ottomata) Perhaps a useful endpoint? https://github.com/linkedin/pinot
[20:14:42] madhuvishy: yes
[20:14:45] milimetric: check it: https://github.com/linkedin/pinot
[20:15:53] ottomata: I added driver memory as a property in my oozie job - but it launches the job, and then the job fails. I have no idea why! application_1434651818028_13324 is one such
[20:16:05] this was with just one day
[20:16:44] ottomata: yarn status and logs give some error stacktrace - but i can't make sense of it
[20:17:06] ottomata: I don't understand why they re-implemented Druid, but ok :)
[20:17:11] hahah
[20:17:12] :)
[20:17:22] also "Query cannot span across multiple tables" is something Druid is just getting now, so they're a bit ahead
[20:17:27] maybe it works really well with the avro setup?
[20:17:32] dunno
[20:17:44] madhuvishy: is that your oozie launcher app id or the spark app id?
[20:17:46] mm, data stores shouldn't care too much about the Avro stuff, that's more ingestion side
[20:17:51] ottomata: spark
[20:18:18] madhuvishy: i'm poking around, did you run this via a coordinator?
[20:18:24] ottomata: yes
[20:18:29] which one?
[20:18:31] coord id
[20:19:11] finding
[20:19:20] its here - /home/madhuvishy/workplace/refinery/oozie/mobile_apps/session_metrics. one sec, getting id
[20:19:38] btw, did you know you have this coord running as you?
[20:19:39] https://hue.wikimedia.org/oozie/list_oozie_coordinator/0010253-150605005438095-oozie-oozi-C/
[20:20:02] ottomata: 0028173-150605005438095-oozie-oozi-C
[20:20:37] ottomata: hmm no
[20:20:43] but i think all those jobs failed
[20:20:56] ottomata: killing that one now
[20:21:49] ottomata: oh no
[20:21:59] that was supposed to be running!
[20:22:20] thats okay. i'll start it again. didn't notice it was last access. gah so confusing
[20:23:02] ha, it is supposed to be running as you?
[20:23:06] is that supposed to be a prod job?
[20:23:20] ottomata: yeah. it's not in prod yet, because i'm still validating numbers
[20:23:36] aye ok
[20:23:40] just checking thats cool
[20:23:53] ottomata: yeah, i will restart that one later.
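Editor's note: the exchange above mixes two kinds of identifiers, a YARN application id (`application_1434651818028_13324`) and Oozie job ids (`0028173-150605005438095-oozie-oozi-C`). A minimal Python sketch for telling them apart follows. The function name `classify_id` and the regular expressions are illustrative assumptions based on the shapes of the ids quoted in this log and on the usual Oozie suffix convention (`-W` workflow, `-C` coordinator, `-B` bundle); they are not taken from any tool used in the channel. Note that the id format alone cannot answer the question asked at 20:17:44, because an Oozie launcher and the Spark job it starts are both plain YARN applications.

```python
import re

# Shapes inferred from the ids quoted in the log; treat them as assumptions.
YARN_APP_RE = re.compile(r"^application_(\d+)_(\d+)$")
OOZIE_JOB_RE = re.compile(r"^(\d+)-(\d+)-(\S+)-([WCB])$")
OOZIE_KINDS = {"W": "workflow", "C": "coordinator", "B": "bundle"}


def classify_id(job_id):
    """Return a short label describing what kind of id this looks like."""
    if YARN_APP_RE.match(job_id):
        return "yarn application"
    match = OOZIE_JOB_RE.match(job_id)
    if match:
        # The trailing letter encodes the Oozie job type.
        return "oozie " + OOZIE_KINDS[match.group(4)]
    return "unknown"


if __name__ == "__main__":
    for job_id in (
        "application_1434651818028_13324",
        "0028173-150605005438095-oozie-oozi-C",
    ):
        print(job_id, "->", classify_id(job_id))
```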
[20:24:16] not sure whats going on with this session metrics one though
[20:29:56] (PS1) Milimetric: Add uncomitted code [analytics/wikistats] - https://gerrit.wikimedia.org/r/220966
[20:30:21] (CR) Milimetric: [C: 2 V: 2] Add uncomitted code [analytics/wikistats] - https://gerrit.wikimedia.org/r/220966 (owner: Milimetric)
[20:30:44] Analytics-Cluster, Labs, wikitech.wikimedia.org: Include role::analytics::hadoop roles in default list of labs puppet groups - https://phabricator.wikimedia.org/T70391#1402211 (Ottomata)
[20:31:05] (CR) Milimetric: "By the way, this is number 6 in a string of commits that I was just cleaning up for Erik Z. My name should not be on here except it would" [analytics/wikistats] - https://gerrit.wikimedia.org/r/220966 (owner: Milimetric)
[20:31:14] pfff
[20:31:17] (CR) Milimetric: "By the way, this is number 5 in a string of commits that I was just cleaning up for Erik Z. My name should not be on here except it would" [analytics/wikistats] - https://gerrit.wikimedia.org/r/220959 (owner: Milimetric)
[20:31:19] madhuvishy: not finding out much either.
[20:31:30] your app master containers are dying
[20:31:30] (CR) Milimetric: "By the way, this is number 4 in a string of commits that I was just cleaning up for Erik Z. My name should not be on here except it would" [analytics/wikistats] - https://gerrit.wikimedia.org/r/220958 (owner: Milimetric)
[20:31:37] ottomata: Hmmm
[20:31:38] madhuvishy: these got farther than this before, right?
[20:31:43] (CR) Milimetric: "By the way, this is number 3 in a string of commits that I was just cleaning up for Erik Z. My name should not be on here except it would" [analytics/wikistats] - https://gerrit.wikimedia.org/r/220957 (owner: Milimetric)
[20:31:55] they die about 3 seconds after they start
[20:31:59] (CR) Milimetric: "By the way, this is number 2 in a string of commits that I was just cleaning up for Erik Z. My name should not be on here except it would" [analytics/wikistats] - https://gerrit.wikimedia.org/r/220956 (owner: Milimetric)
[20:32:09] (CR) Milimetric: "By the way, this is number 1 in a string of commits that I was just cleaning up for Erik Z. My name should not be on here except it would" [analytics/wikistats] - https://gerrit.wikimedia.org/r/139865 (owner: Erik Zachte)
[20:32:20] ottomata: yes! they were fine
[20:32:29] until i added driver memory
[20:32:43] i think
[20:33:22] hm
[20:38:19] madhuvishy: i'm looking at your workflow.xml
[20:38:34] ottomata: okay.. anything off?
[20:38:38] where are you setting driver_memory?
[20:38:41] i see it as a parameter
[20:38:44] but i don't see it passed to the job
[20:38:47] are you looking in server?
[20:38:59] ?
[20:39:04] on stat1002?
[20:39:08] hmmm i just removed it for sanity check.
[20:39:12] oh
[20:39:12] ok
[20:39:28] ottomata: https://gerrit.wikimedia.org/r/#/c/220952/
[20:40:48] pffffffff, and you can run it with the same settings outside of oozie?
[20:41:39] ottomata: hmmm it fails without driver memory too
[20:42:46] HMM
[20:42:48] in the same way?
[20:43:42] ottomata: yes
[20:44:10] and cluster's so full :(
[20:45:32] ottomata: but yeah, it runs with same settings outside of oozie
[20:46:42] HM
[20:46:48] something is not right then.
[20:46:49] hm.
[20:46:56] there isn't much info about why the app masters die
[20:47:20] madhuvishy: you have successfully launched a spark job with oozie, yes?
[20:47:27] ottomata: yeah
[20:48:04] our 30 day jobs yesterday did launch. they didn't get enough memory so never started executing. it didn't fail this way
[20:48:20] and this oozie stuff was tested etc
[20:48:46] yeah
[20:48:54] but, without the driver_memory setting
[20:48:56] everything is the same, right?
[20:49:13] ottomata: yeah it fails the same way
[20:49:26] no, i mean. yesterday
[20:49:29] when the jobs launched
[20:49:33] the code was exactly the same
[20:49:46] and the oozie stuff, except for your driver_memory change, was exactly the same, right?
[20:50:18] ottomata: yup
[20:50:24] its on master.
[20:50:37] i launched it from the deployed version in /srv
[20:50:38] and the job will run as is without oozie.
[20:50:43] will it run with less data?
[20:50:44] like 1 day?
[20:50:52] in oozie?
[20:51:00] ottomata: i hope so
[20:51:05] i can try that
[20:51:11] worth a try, we should figure out what the failure case is.
[20:51:16] in summary:
[20:51:32] spark-submit cluster 30 days: works
[20:51:38] with --driver-memory
[20:51:39] right>
[20:51:40] ?
[20:52:17] madhuvishy: do we know that 30 days works with 2G?
[20:52:25] joseph said he tried with 4G, right?
[20:52:36] ottomata: hmmm, checking that too. one minute
[20:52:44] ok, so i have to run soon
[20:52:56] but, i would proceed by figuring out exactly which cases work and which don't
[20:53:01] so we can isolate the problem
[20:53:05] is the problem not enough memory?
[20:53:11] or is the problem something weird with oozie
[20:53:29] once we know 100% for sure the settings we want to give to oozie work directly via spark-submit
[20:53:40] if we can't get oozie to run those then, something is weird with oozie + spark
[20:53:42] ottomata: alright
[20:53:45] then we will have to figure that out
[20:53:47] i'll check
[20:53:52] ottomata: yeah okay
[20:53:55] but, oozie doesn't do much other than pass along your stuff and call spark-submit
[20:54:15] so, i somehow doubt it will be oozie's fault. there's got to be some combo of settings we don't have right.
[20:54:26] if you can find a case where it works directly with spark-submit, but not oozie
[20:54:30] with exactly the same settings
[20:54:51] ottomata: okay i'll do some digging
[20:54:54] then lets try reducing the period_days and trying in both oozie and directly with spark-submit
[20:55:21] okay
[21:50:00] Analytics-EventLogging, Analytics-Kanban: Reach out to half the schema owners [8 pts] {tick} - https://phabricator.wikimedia.org/T102517#1402551 (mforns) I created a spreadsheet for the schema owners that have lots of schemas to answer: https://docs.google.com/spreadsheets/d/1Co5pegaWQl_byw9VaFpOR80VAvryO...
[22:01:00] Analytics-Kanban, Research-and-Data: Analysis on traffic through the HTTPS transition - https://phabricator.wikimedia.org/T102431#1402620 (ellery) I have updated the graphs in https://github.com/ewulczyn/wmf/blob/master/https_transition/https_transition.ipynb. Iran shows a severe persistent drop in pagevi...
[22:52:16] milimetric: thanks for all your help. Can you give us a copy of the latest version of the folder you have on stat2 that's readable by us?
[22:53:18] lzia: sure, i'll scp it there
[22:56:58] lzia: I removed all the redundant json files but kept the split.py script. It's on stat1002:/home/milimetric/recs you should have read and execute I think
[23:02:29] milimetric: have you ever seen the cluster with no jobs scheduled?
[23:02:55] milimetric: https://yarn.wikimedia.org/cluster/scheduler
[23:03:06] madhuvishy: I don't think I've ever looked at the running jobs :)
[23:03:22] hmmm its empty. i've never seen it empty
[23:03:26] my m.o. is more like - stay as far away from the metal as possible :)
[23:03:43] hm, weird
[23:03:56] ha ha but this is worrisome. i emailed andrew
[23:04:35] Analytics-Tech-community-metrics, Engineering-Community, ECT-July-2015: Check whether it is true that we have lost 40% of code contributors in the past 12 months - https://phabricator.wikimedia.org/T103292#1402922 (Qgil) "Participants" in http://korma.wmflabs.org/browser/scr.html (Gerrit users with any...
[23:05:49] unless its normal and nothing was indeed running
[23:07:08] thanks milimetric. :-)
[23:20:16] Analytics-Tech-community-metrics, Engineering-Community, ECT-July-2015: Check whether it is true that we have lost 40% of code contributors in the past 12 months - https://phabricator.wikimedia.org/T103292#1403029 (Qgil)
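Editor's note: the isolation strategy proposed between 20:51 and 20:54 (run the same settings through spark-submit and through oozie, vary the period and the driver memory, and see which combinations fail only under oozie) can be sketched as a small bookkeeping script. This is a hedged illustration, not something anyone in the channel ran: the names `build_matrix` and `oozie_only_failures` are hypothetical, and the value lists simply reuse the figures mentioned in the conversation (1 day, 30 days, 2G, 4G). The script launches nothing; results are filled in by hand after each attempt.

```python
from itertools import product

# Values mentioned in the conversation; the structure itself is an assumption.
LAUNCHERS = ["spark-submit", "oozie"]
PERIOD_DAYS = [1, 30]
DRIVER_MEMORY = ["2G", "4G"]


def build_matrix():
    """Return one record per (launcher, period_days, driver_memory) combination."""
    return [
        {"launcher": launcher, "period_days": days,
         "driver_memory": memory, "result": None}
        for launcher, days, memory in product(LAUNCHERS, PERIOD_DAYS, DRIVER_MEMORY)
    ]


def oozie_only_failures(matrix):
    """List the settings that work via spark-submit but fail via oozie."""
    by_settings = {}
    for run in matrix:
        key = (run["period_days"], run["driver_memory"])
        by_settings.setdefault(key, {})[run["launcher"]] = run["result"]
    return sorted(
        key for key, results in by_settings.items()
        if results.get("spark-submit") == "ok" and results.get("oozie") == "failed"
    )


if __name__ == "__main__":
    for run in build_matrix():
        print(run["launcher"], run["period_days"], run["driver_memory"])
```

If `oozie_only_failures` comes back empty once every row is filled in, the evidence points at the settings (for example memory) rather than at oozie, which matches the suspicion voiced at 20:54:15.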