[00:24:12] Ironholds you're talking about this one: [00:24:13] "mysqlimport -uroot -p --local --ignore-lines=1 warehouse my_table.tsv" [00:24:19] ? [04:24:30] Analytics-Cluster, Analytics-Kanban: {story} Community downloads new pageview dumps - https://phabricator.wikimedia.org/T95257#1184593 (kevinator) NEW [07:30:06] Analytics-Cluster, operations, ops-eqiad: analytics1020 hardware failure - https://phabricator.wikimedia.org/T95263#1184828 (faidon) NEW [14:11:54] (PS15) Mforns: Implement A/B comparison [analytics/dashiki] - https://gerrit.wikimedia.org/r/198169 (owner: Milimetric) [14:18:09] Analytics-Cluster, Ops-Access-Requests, operations, Patch-For-Review: Requesting access to analytics-users (stat1002) for Jkatz - https://phabricator.wikimedia.org/T94939#1185701 (Andrew) Open>Resolved [14:30:34] (PS16) Milimetric: Implement A/B comparison [analytics/dashiki] - https://gerrit.wikimedia.org/r/198169 [14:55:04] Analytics-Kanban: Make EL On Call calendar for Q4 - https://phabricator.wikimedia.org/T95298#1185978 (ggellerman) NEW a:ggellerman [15:22:36] nuria: that change looks good to me, want me to merge? [15:22:59] ottomata: sure, just tried it out and it works so please do merge [15:25:22] done [15:30:01] ottomata: ops meeting ? [15:30:10] oop ya [15:30:11] sorry [15:42:29] Analytics-Kanban, Analytics-Wikimetrics, Community-Wikimetrics: Story: user wants to be able to re-run a failed report more easily [13 pts] - https://phabricator.wikimedia.org/T88610#1186146 (kevinator) p:Triage>Normal [15:43:56] Analytics-EventLogging, Analytics-Kanban: Cron collects Visual Editor deployments [8 pts] {lion} - https://phabricator.wikimedia.org/T89253#1186155 (mforns) [15:43:58] Analytics-EventLogging, Analytics-Kanban, Patch-For-Review: Reliable scheduler computes Visual Editor metrics [21 pts] {lion} - https://phabricator.wikimedia.org/T89251#1186154 (mforns) Open>Resolved [15:45:18] nuria, I think you could claim this task, since you are working on it: https://phabricator.wikimedia.org/T86535 :] [15:45:44] Analytics-Cluster, Analytics-Kanban, Patch-For-Review: Mobile PMs has reports on session-related metrics from Wikipedia Apps - https://phabricator.wikimedia.org/T86535#1186164 (Nuria) a:mforns>Nuria [15:45:49] mforns: k [15:45:58] :] [15:46:27] Analytics-Kanban, Analytics-Wikimetrics, Community-Wikimetrics: Give the option of using the same parameters for all reports for a given cohort [21 pts] - https://phabricator.wikimedia.org/T74117#1186169 (kevinator) p:Triage>Normal [15:46:59] Analytics-Kanban: Make EL On Call calendar for Q4 - https://phabricator.wikimedia.org/T95298#1186176 (kevinator) p:Triage>High [15:49:27] milimetric, yep! [15:49:32] (sorry, went offline. And then to sleep.) [15:51:11] Ironholds: np, ok, so you said that command just printed the help message for mysqlimport? [15:51:20] yerp [15:51:21] and you're on the dan-pentaho box when you're doing this? [15:51:35] and which tsv file are you importing there? [15:51:51] /home/ironholds/monthly_pageview_0_5.tsv [15:53:57] (doing stuff) [15:53:58] :) [16:13:51] Analytics-Tech-community-metrics: Ensure that most basic Community Metrics are in place and how they are presented - https://phabricator.wikimedia.org/T94578#1186267 (Aklapper) [16:29:05] milimetric, Ironholds, can I help with pentaho? [16:29:40] mforns, can you look at the new cube and explain why it's struggling to cast a double, to a double? :/
xD ok [16:30:05] gimme a sec [16:30:31] thanks :) [16:46:55] Ironholds, it seems to me that the cube specifies some columns as String that in the warehouse are stored as blobs [16:47:30] ahhh [16:47:37] and the appropriate spec for a blob is? [16:48:03] I would do the opposite, create the table with type string for those columns [16:48:16] yeah, makes sense. Thanks :) [16:48:22] because I guess saiku needs to compare strings to slice the cube [16:48:33] and I don't know if it can compare blobs [16:49:00] Ironholds, do you have the data in dan-pentaho? [16:49:09] is it easy to reimport? [16:49:26] yeah, I'll just ALTER TABLE [16:50:08] thanks! [16:50:35] Ironholds, ok, np, let me know if you have more problems [17:02:05] joal: phew, just did it! max_bytes helped. [17:02:15] can consume avro and decode via rest proxy in that topic. [17:03:11] ottomata: cool :) [17:03:21] ottomata: changed something for spark ? [17:04:18] no, haven't gotten to that yet [17:04:30] that was via the rest proxy, just verifying that the data in that topic is valid avro for this schema [17:32:16] Analytics-Wikimetrics, Community-Wikimetrics: Viewing cohorts error - https://phabricator.wikimedia.org/T95320#1186706 (egalvezwmf) NEW a:Fhocutt [17:34:51] joal|away, ottomata: the wildcard matching on parquet files for spark is (i suspect) in the next version: https://github.com/apache/spark/pull/3407/commits [17:34:55] Analytics-Kanban, Analytics-Wikimetrics, Community-Wikimetrics: Story: user wants to be able to re-run a failed report more easily [13 pts] - https://phabricator.wikimedia.org/T88610#1186732 (mforns) a:mforns [17:35:49] so nuria, it works with directories, but only if the files themselves are in that directory? [17:36:05] ottomata: from what i can see on our spark shell on 1002, yes [17:36:09] lemme try again [17:36:30] Any luck with HiveContext? [17:37:16] ottomata: haven't tried yet, due to dependency issues, but onto that next [17:37:34] ottomata: if you specify directories like: sqlContext.parquetFile("/wmf/data/wmf/webrequest/webrequest_source=mobile/year=2015/month=3/day=20/hour=01") [17:37:42] doesn't work either on shell [17:38:04] really? i'm pretty sure i got that to work before... [17:38:04] hm [17:38:11] maybe not. [17:38:12] ottomata: no wait [17:38:15] this works: sqlContext.parquetFile("/wmf/data/wmf/webrequest/webrequest_source=mobile/year=2015/month=3/day=20/hour=0").registerTempTable("webrequest") [17:39:00] ottomata: self-inflicted, i forgot hour partitions have 1 digit [17:41:34] ha, hehh [17:41:36] cool, ok [17:41:40] but leaving off hour=0 [17:41:44] ending in day=20 [17:41:45] does not work? [17:57:56] ottomata: no, that doesn't work for sure [17:58:15] ottomata: i think it will after they merge the commit i sent you [17:58:42] ottomata: trying hive context now [18:00:28] rats, ok. maybe we will need to deploy 1.3.0 [18:10:50] Analytics-Cluster, operations, ops-eqiad: analytics1020 hardware failure - https://phabricator.wikimedia.org/T95263#1186954 (Ottomata) a:Cmjohnson [18:11:27] Analytics-Cluster, operations, ops-eqiad: analytics1020 hardware failure - https://phabricator.wikimedia.org/T95263#1184828 (Ottomata) @Cmjohnson could you take a look at this? Danke! [18:33:01] ottomata: what cloudera version do we have? [18:33:42] 5.3.1 [18:46:04] nuria: yt ? [18:46:11] joal|away: yessir [18:46:19] Got a hack for you [18:46:27] really?
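The upshot of the experiments above, written out as a self-contained spark-shell session — a minimal sketch assuming Spark 1.2 on CDH 5.3.1 (the path and table name are the ones nuria pastes; creating the SQLContext by hand is needed because the 1.2 shell only predefines sc):

    // spark-shell on Spark 1.2 (CDH 5.3.1): the shell only predefines sc,
    // so build the SQLContext manually.
    import org.apache.spark.sql.SQLContext
    val sqlContext = new SQLContext(sc)

    // Hour partitions are written single-digit (hour=0, not hour=01), and
    // before Spark 1.3 parquetFile() only accepts a directory that directly
    // contains the parquet files -- stopping at .../day=20 fails until
    // https://github.com/apache/spark/pull/3407 is merged.
    val oneHour = sqlContext.parquetFile(
      "/wmf/data/wmf/webrequest/webrequest_source=mobile/year=2015/month=3/day=20/hour=0")
    oneHour.registerTempTable("webrequest")
    sqlContext.sql("SELECT COUNT(*) FROM webrequest").collect().foreach(println)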
[18:46:42] yup [18:47:20] https://gist.github.com/anonymous/7464d8134ed6d81e8d5c [18:47:38] Currently tested on spark-shell [18:47:47] joal|away: ahhh yaah [18:47:52] :) [18:48:00] hacky, but works [18:48:25] For known folders as we have, should work I guess [18:48:58] yes, right, but it is O(30*24) for a monthly report [18:50:02] joal|away: and you are right it works, now, do we want to go that way or do we want to go to spark 1.3 (which i guess is doing a similar thing) that allows for wildcards [18:50:27] I have a preference for spark 1.3 [18:50:52] joal|away: or rather we want to use hivecontext (which does not work with the version of cloudera we have, looks like) [18:50:57] cc ottomata [18:51:13] nuria: But I'd prefer to have it packaged via cloudera [18:51:42] The daily count using spark (exact example in gist) --> [18:51:50] took 58.178585 s [18:51:50] [392210435] [18:51:59] rapid enough :) [18:52:10] joal|away: ya, it's super fast, i was testing that the other day [18:52:25] joal|away: at least compared to hive queries [18:52:30] yup [18:53:49] nuria: So back to the initial problem: [18:53:51] ah cool [18:53:53] that should work great [18:53:56] good idea joal [18:54:17] I think we should go with the hack while waiting for spark 1.3 to be packaged [18:54:35] Except if ottomata feels in the ;odd to pack it up for us ;) [18:54:46] s/;odd/mood [18:55:01] joal|away, ottomata note that 1.3 will give us wildcards but no hivecontext eh? [18:55:17] at least i think so [18:55:27] I wouldn't use hive_context at all ... [18:57:00] joal|away: ok, if we do not like hive context and we want to go the parquet route it's either packaging up 1.3 or the workaround [18:57:45] i like joal's method [18:57:47] i think it is totally fine [18:57:49] and not really that hacky [18:57:51] ok, sounds good [18:57:59] the iterating on the hour numbers is annoying [18:58:03] but i think the union rdd stuff is good [18:58:05] that is kinda what it's for, eh? [18:58:23] it would be much nicer if we could provide simple input and output params to the spark job [18:58:33] but, i guess we wait :) [18:58:46] put some good comments in there about why we are doing that [18:58:49] and what we will change to when we can [19:00:09] ottomata: It's actually Tsai's method http://apache-spark-user-list.1001560.n3.nabble.com/read-all-parquet-files-in-a-directory-in-spark-sql-td16298.html [19:00:13] ottomata, joal|away: ok, will do [19:00:20] Thx nuria :) [19:00:39] Have you tried the full stuff on a month already ? [19:01:27] ottomata: got my big spark job failing every time because of hdfs closed :( [19:01:46] Will relaunch before going to bed, and we'll discuss that tomorrow [19:02:00] oh no, hm, ok [19:03:27] My dear analytics people, I wish you a good night :) [19:04:35] nuria, yt? [19:05:04] good night joal|night! [19:05:06] mforns: yes [19:05:20] nuria, one question about: https://phabricator.wikimedia.org/T88610 [19:05:33] joal|night: ciao, will try stuff on a month maybe later today, i want to iron out a bunch of things [19:06:12] nuria, there's one comment from you on error message in a LoggerTable, can you explain that to me? [19:09:28] mforns: lemme see [19:09:42] aha [19:10:05] mforns: ah ya, the idea is that we should not mix logging & reports [19:10:12] ok [19:10:21] if we want to keep logs around of what happened they should be on their own tables [19:11:05] nuria, ok and what do you mean with: "Subsequent updates update error message and date but do not keep history."?
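To close out the spark thread before the Wikimetrics discussion continues: the gist above is anonymous and not reproduced here, but the "Tsai method" it follows (the nabble thread joal links at 19:00) amounts to reading each hour partition separately and folding the resulting SchemaRDDs together with unionAll. A minimal sketch under the same Spark 1.2 / CDH 5.3.1 assumptions; the hourPath helper and the hard-coded day are illustrative, not taken from the gist:

    import org.apache.spark.sql.{SQLContext, SchemaRDD}
    val sqlContext = new SQLContext(sc)

    // Illustrative helper (not from the gist): the path of one hourly
    // partition. Hour partitions are single-digit, as noted in the chat.
    def hourPath(day: Int, hour: Int): String =
      s"/wmf/data/wmf/webrequest/webrequest_source=mobile/year=2015/month=3/day=$day/hour=$hour"

    // Read every hour of one day and union the SchemaRDDs into one; this is
    // O(24) reads per day, hence nuria's O(30*24) for a monthly report.
    val oneDay: SchemaRDD =
      (0 to 23).map(h => sqlContext.parquetFile(hourPath(20, h))).reduce(_ unionAll _)

    oneDay.registerTempTable("webrequest")
    // joal reports a daily count like this taking ~58s, returning [392210435].
    println(sqlContext.sql("SELECT COUNT(*) FROM webrequest").collect().mkString)

Once Spark 1.3 is packaged for the cluster, the loop collapses into a single directory or wildcard read, which is why the consensus above is to comment the workaround well and swap it out later.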
[19:11:41] mforns: that subsequent retries for a report do not create new log entries, if there is one they override it [19:12:23] nuria, I see [19:13:13] nuria, Also, we decided that clicking RE-RUN would create another (new) report with the same config, right? So the report id would be different... [19:13:50] mforns: mmmm... why? [19:14:04] mforns: unless it deletes traces of the old report [19:14:24] mforns: cause otherwise user-wise it's really confusing [19:14:34] nuria, oh yes, I thought we concluded that it was better to leave the failed reports as failures [19:15:13] mforns: but do we care about those failures? they should not show up on UI and they will just clutter the table [19:16:03] nuria, yes, I agree with you. I just remember Kevin thinking that it might be good to have them for stats. [19:17:53] nuria, OK I got it, thanks for the explanation! ;] [19:18:24] mforns: then we should store "counters" not reports on the report table [19:19:03] nuria, aha, OK [19:21:07] mforns: so in order to keep things simple I would: 1) use the same report for retry (like we do with recurrent ones) [19:21:15] 2) have a stats table for failures [19:21:24] mforns: and i wouldn't create new reports [19:21:35] nuria, we can have a count column in the error table [19:22:05] mforns: ya that works too [19:22:09] instead of having another stats table, unless we think there are going to be other stats coming [19:27:54] mforns: no, i think reportErrors is fine [19:28:00] as long as we are explicit [19:28:11] we are just using it for reports [19:28:13] nuria, ok [19:28:27] but the next question will be "what about cohort validation... see?" [19:28:40] so maybe tasksErrors is a better one [19:28:56] nuria, makes sense [19:29:05] as long as we do not fill the db with unnecessary records to do counting later ... [19:29:33] nuria, ok [19:36:44] mforns: did you talk about this with milimetric ? [19:36:57] nuria, no [19:37:21] mforns: k, he can comment in the preliminary CR [19:37:45] nuria, ok, I remember he agreed on that the time we spoke about this [19:37:55] mforns: ya, i think so [19:38:15] mforns: take a look at how the retries happen for the nightly reports [19:38:31] nuria, oh, didn't know that, will do [19:38:37] mforns: as it is the most similar code we have [19:59:47] ottomata: yt? [20:01:58] nuria: ja [20:02:14] ottomata: me no comprendou why ... [20:02:25] ottomata: when i do mvn compile in /source [20:02:39] the refinery-job jar doesn't get built [20:02:55] ottomata: i mean classes get compiled and such [20:03:06] ah it's mvn package ? [20:03:55] ottomata: if so, excuse my retardation [20:04:31] :) [20:04:35] yes [20:04:38] mvn package builds jars [20:05:04] ottomata: really, this thing of switching languages all the time it's no good for me [20:05:15] ottomata: i am unlearning, i swear [20:05:49] ottomata: yesterday i did not even remember sql -> sandnesss [20:05:54] *sadness [20:15:21] Does Erik Zachte ever come to this channel? [20:17:12] Krenair: not so much, no [20:17:37] Krenair: do you need anything?
[20:18:53] was trying to complete a big table of people with certain access and didn't know his irc nickname [20:19:27] ezachte [20:20:47] Analytics-Kanban, Analytics-Visualization: {Epic} Community reads pageviews per project in Vital Signs {crow} - https://phabricator.wikimedia.org/T95336#1187350 (kevinator) NEW [20:21:25] Analytics-Cluster, Analytics-Kanban: Generate new pageview counts in JSON files [8pts] {crow} - https://phabricator.wikimedia.org/T95337#1187361 (kevinator) NEW [20:22:26] Analytics-Kanban, Analytics-Visualization: Update Vital Signs metric configuration [3 pts] {crow} - https://phabricator.wikimedia.org/T95338#1187367 (kevinator) NEW [20:23:05] Analytics-Cluster, Analytics-Kanban: Add Pageview aggregation to Python [13 pts] {crow} - https://phabricator.wikimedia.org/T95339#1187374 (kevinator) NEW [20:23:25] Analytics-Kanban, Analytics-Visualization: Update Vital Signs UX for aggregations [13 pts] {crow} - https://phabricator.wikimedia.org/T95340#1187380 (kevinator) NEW [20:23:46] Analytics-Kanban, Analytics-Visualization: Add digraphs to Vital Signs [21 pts] {crow} - https://phabricator.wikimedia.org/T95341#1187386 (kevinator) NEW [20:27:15] Analytics-Kanban, Analytics-Visualization: {Epic} Community reads pageviews per project in Vital Signs {crow} - https://phabricator.wikimedia.org/T95336#1187393 (kevinator) [20:28:16] Analytics-Kanban, Analytics-Visualization: {Epic} Community reads pageviews per project in Vital Signs {crow} - https://phabricator.wikimedia.org/T95336#1187350 (kevinator) [20:28:37] thanks ottomata [21:11:12] (PS1) Yurik: Removed DailyData graphs [analytics/zero-sms] - https://gerrit.wikimedia.org/r/202593 [21:11:41] (CR) Yurik: [C: 2 V: 2] Removed DailyData graphs [analytics/zero-sms] - https://gerrit.wikimedia.org/r/202593 (owner: Yurik) [21:26:32] Analytics, MediaWiki-API-Team, MediaWiki-Authentication-and-authorization: Create dashboard to track key authentication metrics before, during and after AuthManager rollout - https://phabricator.wikimedia.org/T91701#1187625 (bd808) After a longish discussion on irc @ori suggested that the most direct... [21:59:10] Analytics-EventLogging, MediaWiki-General-or-Unknown, Performance: Add event tracking queue to MediaWiki core for loose coupling with EventLogging or other interested consumers - https://phabricator.wikimedia.org/T95356#1187727 (bd808) NEW [22:18:33] Analytics-Wikimetrics: Story: WikimetricsUser runs report against all wikis - https://phabricator.wikimedia.org/T70477#1187789 (Milimetric) [22:21:05] Analytics-Wikimetrics: Wikimetrics backup has no monitoring - https://phabricator.wikimedia.org/T71397#1187791 (Milimetric) The backup routinely fails recently. I thought the fact that wikimetrics runs in labs disqualifies it from being monitored by Icinga. But if it doesn't, we should fix the underlying p... [22:27:37] Analytics-Engineering, Labs: LabsDB problems negatively affect analytics tools like Wikimetrics, Vital Signs, Quarry, etc. - https://phabricator.wikimedia.org/T76075#1187826 (Milimetric) @yuvipanda, queries against labsdb are faster, and we saw some back-filling going on, but it's still not fast enough to... [22:32:19] Analytics-Kanban, VisualEditor: Schema:Edit seems to incorrectly set users as anonymous {lion} - https://phabricator.wikimedia.org/T92596#1187827 (Milimetric) [22:39:20] Analytics-Engineering, Labs: LabsDB problems negatively affect analytics tools like Wikimetrics, Vital Signs, Quarry, etc. 
- https://phabricator.wikimedia.org/T76075#1187844 (coren) @milimetric: that's actually downright scary. What kind of changes are you noticing (i.e. additions, changes, deletions)? I... [22:44:38] Analytics-Engineering, Labs: LabsDB problems negatively affect analytics tools like Wikimetrics, Vital Signs, Quarry, etc. - https://phabricator.wikimedia.org/T76075#1187859 (Milimetric) @coren: this problem was observed while pulling data out of analytics-store actually, so it's happening in mediawiki som... [23:01:01] Analytics-Volunteering, Engineering-Community, Phabricator, Project-Creators, and 2 others: Analytics-Volunteering and Wikidata's Need-Volunteer tags; "New contributors" vs "volunteers" terms - https://phabricator.wikimedia.org/T88266#1187899 (kevinator) @aklapper I'll have an answer by 8am Wed, Mar... [23:01:28] (PS17) Milimetric: Implement A/B comparison [analytics/dashiki] - https://gerrit.wikimedia.org/r/198169 [23:01:57] (PS18) Milimetric: Implement A/B comparison [analytics/dashiki] - https://gerrit.wikimedia.org/r/198169 (https://phabricator.wikimedia.org/T94424) [23:02:39] Analytics-Kanban, Analytics-Visualization, Patch-For-Review: Improve UX for VE/Wikitext comparison dashboard {lion} - https://phabricator.wikimedia.org/T94424#1187907 (Milimetric) [23:12:19] (PS19) Milimetric: Implement A/B comparison [analytics/dashiki] - https://gerrit.wikimedia.org/r/198169 (https://phabricator.wikimedia.org/T94424) [23:27:25] (PS20) Milimetric: Implement A/B comparison [analytics/dashiki] - https://gerrit.wikimedia.org/r/198169 (https://phabricator.wikimedia.org/T94424) [23:27:29] Analytics-Kanban, Analytics-Visualization, Patch-For-Review: Improve UX for VE/Wikitext comparison dashboard {lion} - https://phabricator.wikimedia.org/T94424#1187970 (Milimetric) [23:30:12] kevinator: all that incessant pinging above is me doing the style cleanup - I'm done with it now, take a look and feel free to mark as resolved (I didn't do one of the items because it's harder than a quick fix) [23:32:46] Analytics-Wikimetrics, Community-Wikimetrics: Viewing cohorts error - https://phabricator.wikimedia.org/T95320#1187980 (Capt_Swing) a:Fhocutt>None [23:32:47] milimetric: I just opened dev tools to clear my cache and noticed that there were a few 404 errors when loading the funnel comparison [23:33:12] Quarry: Quarry sorts by the first column by default - https://phabricator.wikimedia.org/T95369#1187983 (He7d3r) NEW [23:34:14] milimetric: e.g. https://datasets.wikimedia.org/limn-public-data/metrics/deployments/visualeditor/all.tsv [23:34:37] kevinator: yes, that makes sense, that one's not available per-wiki [23:35:03] I'll have to fix that, but I think that's the only one that should be 404-ing [23:38:16] Analytics-Wikimetrics: Wikimetrics backup has no monitoring - https://phabricator.wikimedia.org/T71397#1187998 (Dzahn) How do you manually identify a failure currently? does this mean checking a logfile for patterns? checking the timestamp of files? [23:39:31] Analytics-Wikimetrics, Community-Wikimetrics: Uploading cohort or running a large report fails - https://phabricator.wikimedia.org/T87596#1188008 (Capt_Swing) Open>stalled p:Unbreak!>High [23:42:10] Analytics-Wikimetrics, Community-Wikimetrics: Uploading cohort or running a large report fails - https://phabricator.wikimedia.org/T87596#1188023 (Capt_Swing) I changed this to "stalled" for now because it's probably an intermittent bug related to the periodic outages Wikimetrics has been experiencing (p...
[23:42:33] milimetric: is there a way to make the hover tooltip in Dygraphs also use a sans-serif font? [23:42:58] i'll try [23:43:36] oh, kevinator, I had made it Courier New on purpose [23:43:41] because otherwise it doesn't align [23:43:59] and there's no easy way to control the style inside each label to make it more table-like [23:44:03] your call [23:45:30] Analytics-Wikimetrics, Community-Wikimetrics: Description of metrics includes link to on-wiki metrics documentation - https://phabricator.wikimedia.org/T93659#1188034 (Capt_Swing) p:Triage>High [23:45:59] Analytics-Wikimetrics, Community-Wikimetrics: Description of metrics includes link to on-wiki metrics documentation - https://phabricator.wikimedia.org/T93659#1142792 (Capt_Swing) p:High>Normal [23:46:18] hmm, milimetric: yeah, I think it would be pretty ugly if it doesn't align [23:46:43] Is Courier New fixed width? [23:46:49] at least for digits [23:47:16] Analytics-Wikimetrics, Community-Wikimetrics: Make timezone selector comprehensive and consistent - https://phabricator.wikimedia.org/T88604#1188044 (Capt_Swing) p:Triage>Low [23:48:44] milimetric: maybe we should ask the real designer: take a screenshot of both possibilities - Courier New & right-aligned, or sans-serif and left-aligned. [23:56:07] kevinator: :) I just fixed it, I'll deploy when I figure out the rest too [23:56:23] (PS21) Milimetric: Implement A/B comparison [analytics/dashiki] - https://gerrit.wikimedia.org/r/198169 (https://phabricator.wikimedia.org/T94424) [23:57:07] milimetric: thanks! [23:57:10] Analytics-Wikimetrics, Community-Wikimetrics: Viewing cohorts error - https://phabricator.wikimedia.org/T95320#1188089 (Capt_Swing) @egalvezwmf can you send me the cohort files that you uploaded, and cc @Fhocutt? We will try to reproduce the error. Thanks! [23:57:36] Analytics-Wikimetrics, Community-Wikimetrics: [BUG] Viewing ukwiki cohorts error - https://phabricator.wikimedia.org/T95320#1188091 (Capt_Swing) p:Triage>High [23:57:37] kevinator: I just deployed the font stuff - actually the deployments stuff needs more work, gotta make a cron, etc. [23:57:44] I think I have to run it on my local box :( [23:57:50] maybe we'll talk tomorrow [23:57:52] nite [23:58:31] goodnite