[00:58:58] Analytics-Engineering, MediaWiki-API, Wikipedia-Android-App, Wikipedia-iOS-App: Add page_id and namespace to X-Analytics header in App / api requests - https://phabricator.wikimedia.org/T92875#1259587 (Mattflaschen) No one's responded to https://phabricator.wikimedia.org/T92875#1233108 . In summa...
[03:51:01] Analytics-Tech-community-metrics, Engineering-Community, WMF-Product-Strategy: Review Engineering Community analytics needs - https://phabricator.wikimedia.org/T92807#1259722 (Qgil)
[03:51:03] Analytics-Tech-community-metrics, ECT-April-2015, ECT-May-2015: Ensure that most basic Community Metrics are in place and how they are presented - https://phabricator.wikimedia.org/T94578#1259723 (Qgil)
[04:01:43] Analytics, Mobile-Web: Instrument “tags” and anonymous gather pages to track engagement with browse. - https://phabricator.wikimedia.org/T94744#1259738 (KLans_WMF) @phuedx any sense of initial estimate for this? @jKatzWMF @phuedx I don't see any problem pulling this in to the sprint if we know how to move...
[09:23:14] Analytics, Mobile-Web: Instrument “tags” and anonymous gather pages to track engagement with browse. - https://phabricator.wikimedia.org/T94744#1259925 (phuedx) Tracking session time, dwell time, and PVs is already being tackled by #Analytics IIRC – not localised to the Browse experiment, obviously. I'd l...
[10:49:27] (PS3) Mforns: Reduce size of public reports folder [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/208637 (https://phabricator.wikimedia.org/T95756)
[11:15:40] (CR) Mforns: [C: -1] "Still WIP" [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/208637 (https://phabricator.wikimedia.org/T95756) (owner: Mforns)
[11:30:07] Analytics, Mobile-Web: Instrument “tags” and anonymous gather pages to track engagement with browse. - https://phabricator.wikimedia.org/T94744#1260186 (phuedx) Here's my initial design of the schema: https://meta.wikimedia.org/wiki/Schema:MobileWebBrowse
[11:30:56] Analytics, Mobile-Web: Instrument “tags” and anonymous gather pages to track engagement with browse. - https://phabricator.wikimedia.org/T94744#1260187 (phuedx) My initial estimate would be 3.
[13:09:24] o/ joal & milimetric
[13:12:50] * halfak suspects a fire of some sort
[13:29:46] hello ello lo lo o?
[13:30:42] halfak: Yeahhhh !
[13:30:51] finally made it :)
[13:31:10] Please excuse me for being so late :(
[13:31:14] halfak: --^
[13:31:15] No worries.
[13:31:21] * halfak is hanging out in the call.
[13:31:29] ouch ... joining
[14:15:50] joal: i will be babysitting jobs for a little while this morning
[14:15:55] in the meantime, i should try and fix impala and spark
[14:16:00] i want to do spark first
[14:16:02] what's up?
[14:16:07] yeah, ottomata, I am doing the same as well
[14:16:32] talking with Aaron, and checking everything works on the cluster
[14:16:36] i've moved a few of the refine jobs into the essential queue
[14:16:40] things are still lagging
[14:16:44] webrequest - uploads are really late
[14:16:50] yeah
[14:17:45] joal: what was your spark problem?
[14:17:53] just reading a parquet file via the hive context thing?
[14:18:41] yup
[14:19:21] k will try
[14:28:14] hm, joal, can you actually run spark-shell in yarn? i haven't done that much before, it's hanging right now for me
[14:28:19] might just be because the cluster is busy
[14:40:10] ottomata: seems to be catching up, but slowly
[14:42:18] and camus is in bad shape from what I see
[14:42:38] well, that may be because i have been putting refine jobs in the essential queue in the last hour
[14:42:50] pretty sure it is
[14:43:00] I hope they don't starve camus
[14:43:24] Kafka says: not enough bytes out !
[14:43:26] they might for a little bit, but not forever
[14:43:28] oh?
[14:44:45] it's been an hour, so it might be ok on catching up
[14:49:04] ok, joal, i can read a parquet file using sqlContext.parquetFile
[14:49:14] hmm
[14:49:17] weird
[14:49:20] but, i'm not sure about HiveContext .sql select from wmf.webrequest
[14:49:23] Didn't work for me yesterday
[14:49:33] val parquetFile = sqlContext.parquetFile("/wmf/data/wmf/webrequest/webrequest_source=misc/year=2015/month=5/day=1/hour=0/000063_0")
[14:49:37] val h = sqlContext.sql("select uri_host from parquetFile limit 10")
[14:49:39] is fine
[14:49:48] oh with parquetFile.registerTempTable("parquetFile") in between those
[14:50:11] Will retry what I had yesterday
[14:52:17] you were using Hive Context I assume?
[14:52:21] nope
[14:52:24] oh!
[14:53:45] interesting then, ok yeah i guess try again
[14:53:45] But read a directory
[14:53:45] read is fine
[14:53:45] I get the field list
[14:53:46] val df = sqlContext.parquetFile("hdfs:///wmf/data/wmf/webrequest/webrequest_source=text/year=2015/month=5/day=1/hour=1")
[14:53:46] df.count()
[14:53:46] breaks
[14:54:05] oh cool! did you see that spark 1.3 can infer partitions from path?
[14:54:13] https://spark.apache.org/docs/1.3.0/sql-programming-guide.html#partition-discovery
[14:54:18] yeah, I noticed :)
[14:54:32] it gives me webrequest_source: string, year: int, month: int, day: int, hour: int
[14:54:36] as last fields
[14:54:38] :)
[14:54:44] nice!
[14:54:47] indeed
[14:54:54] But broken ;)
[14:54:57] cool, we don't really need HiveContext then at all I guess. i think that was the main reason we were using it
[14:55:02] joal, how does it break?
[14:55:04] what does it do?
[14:55:43] java.lang.IllegalStateException: unread block data
[14:56:34] tried on another hour, same issue
[14:56:43] weird
[14:56:44] that's weird
[14:56:50] can you get me full exception?
[14:56:56] sure
[14:57:37] https://gist.github.com/jobar/735a3534aee0836c539e
[14:58:38] From what I have read, it occurs when there are different versions of spark / java / scala in the classpath
[14:58:51] different versions -> different serialization methods
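As an aside, here is the read pattern being debugged above, gathered into one runnable spark-shell session. This is a minimal sketch assuming Spark 1.3 as shipped with CDH 5.4 on this cluster; the paths and the uri_host column are taken from the conversation itself.

  // In spark-shell, where sqlContext is provided by the shell.
  // Reading a single Parquet file only touches metadata at this point.
  val parquetFile = sqlContext.parquetFile(
    "/wmf/data/wmf/webrequest/webrequest_source=misc/year=2015/month=5/day=1/hour=0/000063_0")

  // Expose the DataFrame to SQL under a table name, then query it.
  parquetFile.registerTempTable("parquetFile")
  val h = sqlContext.sql("select uri_host from parquetFile limit 10")
  h.collect().foreach(println)  // collect() is the action that actually reads data

  // Spark 1.3 partition discovery: pointing at a directory turns the path
  // components into columns (webrequest_source, year, month, day, hour).
  val df = sqlContext.parquetFile(
    "hdfs:///wmf/data/wmf/webrequest/webrequest_source=text/year=2015/month=5/day=1/hour=1")
  df.printSchema()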
[15:02:02] ottomata: camus is still flat :(
[15:03:56] yeah, it isn't accepted to run
[15:04:00] i see it
[15:04:06] not going to move any more jobs into the essential queue for now
[15:05:01] yeah
[15:29:37] joal: sorry we couldn't talk about the uniques report and the x-wmf-uuid patch yesterday. today after standup?
[15:29:50] Sure madhuvishy
[15:29:53] No problem
[15:29:58] :)
[15:30:45] madhuvishy: Staaaaand up :)
[15:35:58] Analytics-Cluster, Analytics-Kanban: Estimate how many machines to add to cluster - https://phabricator.wikimedia.org/T97060#1260772 (kevinator) Open>Resolved Budget cap-ex has been submitted.
[16:15:03] Analytics-Cluster, operations: Build Kafka 0.8.1.1 package for Jessie and upgrade Brokers to Jessie. - https://phabricator.wikimedia.org/T98161#1260868 (Ottomata) NEW a:Ottomata
[16:15:24] Analytics-Cluster, operations, Interdatacenter-IPsec: Secure inter-datacenter web request log (Kafka) traffic - https://phabricator.wikimedia.org/T92602#1260877 (Ottomata)
[16:15:26] Analytics-Cluster, operations: Build Kafka 0.8.1.1 package for Jessie and upgrade Brokers to Jessie. - https://phabricator.wikimedia.org/T98161#1260876 (Ottomata)
[16:15:42] Analytics-Cluster, operations: Build Kafka 0.8.1.1 package for Jessie and upgrade Brokers to Jessie. - https://phabricator.wikimedia.org/T98161#1260868 (Ottomata)
[16:15:44] Analytics-Cluster, operations, Interdatacenter-IPsec: Secure inter-datacenter web request log (Kafka) traffic - https://phabricator.wikimedia.org/T92602#1115779 (Ottomata)
[16:34:47] kevinator: I added some comments to this task on backlog yesterday. Think we can close it - https://phabricator.wikimedia.org/T95690
[16:36:18] thanks madhuvishy. Go ahead and mark the task as "resolved" and I will move it to the done column.
[16:36:42] BTW that's the process: the product manager verifies and moves the tasks to done.
[16:37:49] Analytics-Kanban, Analytics-Wikimetrics: confirm vagrant setup works for wikimetrics - https://phabricator.wikimedia.org/T95690#1260950 (madhuvishy) Open>Resolved a:madhuvishy
[16:38:04] kevinator: okay cool. done :)
[16:42:53] joal: I'm around :) Ping when you are free.
[16:42:53] Analytics-Cluster, Analytics-Kanban: Email engineering re: x-analytics deployed to all wikis - https://phabricator.wikimedia.org/T89749#1260955 (kevinator) Open>declined we don't need to email the list. The analytics team will inform engineers of this feature when they engage us on a new feature.
[16:44:52] hey madhuvishy
[16:45:04] joal: Hi
[16:45:13] Got a batcave for a moment ?
[16:45:29] go to batcave sorry
[16:47:08] madhuvishy: --^ ??
[16:47:17] joal: ah yes
[16:59:13] hey madhuvishy I'm running late.... I'll be in the hangout in 15 minutes
[17:02:30] kevinator: Aaah our one on one. I forgot. I'll do it over hangouts too. I was talking to Joseph and hence will leave a little late.
[17:15:37] kevinator: batcave? or some other hangout? It's not on my calendar.
[17:15:47] let's do the bat-cave
[17:15:50] (PS4) Mforns: Reduce size of public reports folder [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/208637 (https://phabricator.wikimedia.org/T95756)
[17:15:58] kevinator: okay joining in.
[17:22:08] Analytics-Cluster, operations, Patch-For-Review: Verify kafkatee use for fundraising logs on erbium - https://phabricator.wikimedia.org/T97676#1261065 (coren)
[17:52:15] mforns: around?
[17:52:24] madhuvishy, hey
[17:53:28] mforns: so i have trouble wrapping my head around the wikimetrics tests. I've been working on fixing validate again on cohorts - and after my changes - have a bunch of failing tests.
[17:53:41] madhuvishy, aha
[17:54:15] mforns: which makes sense, except i'm not sure how to fix them. Dan told me to make a change yesterday but it didn't really fix it.
[17:54:22] madhuvishy, do you want to try troubleshooting this in the batcave?
[17:54:28] mforns: sure
[17:54:35] ok
[17:55:37] ottomata: Heya !
[17:55:45] joal: hiaay
[17:56:00] I found why you don't get the error with Spark :)
[17:56:02] mforns: are you there? wondering if i'm in the wrong one.
[17:56:07] Laaaaaaaaaaaazy loading ;)
[17:56:08] oh?
[17:56:10] madhuvishy, I'm having problems entering the batcave...
[17:56:16] single file?
[17:56:22] naw, i did .take()
[17:56:30] mforns: aah.
[17:56:34] Ah !
[17:56:36] now
[17:56:41] .take(10)
[17:57:12] hmm
[17:57:18] gave you values really ?
[17:57:46] Same problem for me with take or show
[17:57:53] :(
[17:58:01] back in 5
[17:58:37] i only did it on one file though
[17:58:43] is milimetric around?
[17:58:55] The graph ext is alive, and I will write the announcement about the new capability soon
[18:08:09] ottomata: did it on one file, so it's not that
[18:08:19] you got the error on one file?
[18:08:32] yup
[18:08:43] val df = sqlContext.parquetFile("hdfs:///wmf/data/wmf/webrequest/webrequest_source=text/year=2015/month=5/day=1/hour=2/000001_0")
[18:08:47] df.take(2)
[18:09:02] file format read is ok
[18:09:16] but when I try to access the data itself, gone
[18:09:24] ?
[18:09:31] .take(2) fails?
[18:09:35] yup
[18:09:43] hmmm, i don't think that's how it works? you can't just take(2) on the parquetFile, can you?
[18:09:44] hm, maybe you can
[18:09:47] I did
[18:09:53] but I can read the fields from parquet after creating the df
[18:10:07] df.registerTempTable("parquetFile")
[18:10:17] Yes, did that as well
[18:10:21] val rows = sqlContext.sql("select uri_host...")
[18:10:22] yeah?
[18:10:25] yup
[18:10:25] rows.take(10)
[18:10:28] that failed for you?
[18:10:33] i was running in client mode, btw, not yarn
[18:10:33] yes, same error
[18:10:34] at that time
[18:10:39] sorry, local mode
[18:10:56] :)
[18:11:00] Might be the reason
[18:11:25] And what that sounds like to me is that there might be nodes on our cluster that don't have the same version as others
[18:11:50] yeah, i saw those posts too
[18:11:52] i checked though
[18:11:59] I trust you :)
[18:12:01] i checked versions of spark-core and hadoop across the cluster
[18:12:13] all the same, from cdh 5.4
[18:12:19] hmm
[18:12:22] java ?
[18:12:26] or scala ?
[18:12:52] upload refinery slowly catches up ...
[18:12:58] But it's painful
[18:13:29] And camus does big batches :)
[18:13:43] oh i just checked spark and hadoop versions
[18:13:54] will check java and scala...although scala shouldn't matter, as it is included in the spark jars
[18:13:57] yeah
[18:13:59] i know
[18:14:09] maybe we should pause all of the other jobs
[18:14:19] tsvs, mediacounts, pagecounts?
[18:14:20] I dunno ..
[18:15:30] looks like same version of java on all datanodes
[18:15:58] I mean, I would have expected that, with puppet :)
[18:16:11] well, yeah sorta, puppet doesn't usually manage versions though
[18:16:15] it can, but we don't have it do that
[18:16:25] ah, ok didn't know
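To spell out the lazy-loading point above: in Spark 1.3, opening a Parquet file is a metadata-only operation on the driver, while take() and count() are actions that serialize tasks and ship them to executors to be deserialized and run. That is consistent with the behavior seen here: a version mismatch on the executor classpath would only surface at the action, and not at all in local mode, where no remote JVM deserializes anything. A minimal sketch, reusing the path from the discussion:

  // Transformation only: Parquet footers are read for the schema; no job runs yet.
  val df = sqlContext.parquetFile(
    "hdfs:///wmf/data/wmf/webrequest/webrequest_source=text/year=2015/month=5/day=1/hour=2/000001_0")
  df.printSchema()  // succeeds even if the executors are misconfigured

  // Action: tasks are shipped to executors and deserialized there. With
  // mismatched spark/java versions on an executor classpath, this is where
  // java.lang.IllegalStateException: unread block data would appear.
  df.take(2)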
[18:17:47] dinner time, will be back
[18:17:58] k
[18:26:52] Analytics-EventLogging: Multiple user_ids per username in account creation events from ServerSideAccountCreation log - https://phabricator.wikimedia.org/T68101#1261863 (MarcoAurelio)
[18:28:05] Analytics-EventLogging: Multiple user_ids per username in account creation events from ServerSideAccountCreation log - https://phabricator.wikimedia.org/T68101#685364 (MarcoAurelio)
[19:02:38] Analytics-Cluster, Analytics-Kanban: Compute pageviews aggregates daily and monthly from April {wren} - https://phabricator.wikimedia.org/T96067#1262015 (MeganHernandez_WMF) Really wonderful that this is happening. I'm very interested in regularly reviewing pageview numbers by country and device. I'm won...
[19:28:28] (CR) Mforns: WIP: Fix validate again functionality on cohort display page (1 comment) [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/206346 (https://phabricator.wikimedia.org/T78339) (owner: Madhuvishy)
[19:29:27] halfak: Man ... You are the one to suffer from me today ...
[19:29:43] halfak: I really apologise for the second miss of the day :S
[19:29:48] Analytics-EventLogging, Beta-Cluster, VisualEditor: Beta cluster is sending VisualEditor events to production bits.wikimedia.org/statsv - https://phabricator.wikimedia.org/T98196#1262125 (Krinkle) NEW
[19:34:22] joal, no worries. I figured you were sleeping/having an evening, so I didn't ping :)
[19:34:41] How's your open tickets with altiscale?
[19:34:46] *How're
[19:34:49] yeah, actually I was feeding the small monster, and did not notice the time
[19:35:06] :)
[19:35:10] So, thank you for having let me enjoy that moment :)
[19:35:21] * halfak is happy to not interrupt
[19:35:23] :)
[19:35:57] well, spent some time with rajana reviewing the various things we exchanged via email
[19:36:06] She's to come back to me again soon
[19:36:39] bout the big job failing, I now have a different error trying to keep everything on disk
[19:36:57] But I have trouble accessing the logs (thus the first ticket about proxy stuff)
[19:37:03] halfak --^
[19:37:22] Oh yeah. I remember looking at that with you.
[19:37:29] Are they getting back to you quickly enough?
[19:37:59] they usually do, but being in different time zones doesn't speed up the process
[19:39:07] halfak: I'll ask them if they can give me some more details as to why the job fails
[19:39:53] OK. If necessary, we can raise concerns about the level of support with Dean during that weekly sync. Let me know if that is necessary.
[19:39:59] joal, ^
[19:40:30] Sounds a good idea, depending on this week's outcome
[19:42:08] halfak: I'll let you know how things go
[19:42:28] kk :)
[19:42:38] thanks about that :)
[19:43:00] ottomata: I hope US night will help the cluster ...
[19:43:17] For the moment, upload-refine jobs are still 17 hours behind
[19:43:50] what support is Limn getting, going forward?
[19:44:33] joal: hoping that too
[19:44:50] been busy trying to make HA resourcemanager work...
[19:44:51] We'll take action tomorrow if there are issues, ok ?
[19:45:11] yes
[19:45:14] Can I help
[19:45:19] about HA ?
[19:45:26] i think i'm going to pause all of the non-webrequest refine related jobs
[19:45:36] oh, not sure
[19:45:49] i'm trying it in labs, something is funky for sure, if you want to brain bounce w me in 5 or 10 minutes, ja sure
[19:46:05] let's batcave :)
[19:46:30] o/ ottomata did you get to talk to Ellery?
[19:47:29] yes, i emailed him and he responded
[19:47:47] kk
[19:49:48] joal: i have paused the legacy_tsvs job
[19:49:55] ok
[19:50:01] also, i just noticed that (even though I thought I had stopped it) there was a lingering (unworking) old oozie refine bundle running
[19:50:14] the jobs failed because they still used the old .hql file without the add jar
[19:50:22] I'm in batcave if you wish
[19:50:31] ooooohhh !
[19:50:35] but, that could slow things down some, since the oozie launcher jobs have to fit in the queue, and then could slow others from launching
[19:50:42] small amount of resources, but still !
[19:50:44] but, the jobs themselves never got around to executing
[19:50:45] yeah
[19:50:55] right
[19:51:21] Analytics-EventLogging, Beta-Cluster, VisualEditor: Beta cluster is sending VisualEditor events to production bits.wikimedia.org/statsv - https://phabricator.wikimedia.org/T98196#1262230 (greg) @ori: just pinging you because of the bits change you pushed yesterday re beta cluster, related?
[19:57:23] joal: https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Ports
[20:02:45] Analytics-General-or-Unknown: Package libcidr + libanon + libdclass for Ubuntu Trusty - https://phabricator.wikimedia.org/T70997#1262256 (hashar) Open>declined a:hashar For CI purposes, the packages are ignored, so I have no need anymore for this task.
[20:02:59] joal:
[20:02:59] http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cdh_hag_rm_ha_config.html
[20:03:01] http://blog.cloudera.com/blog/2014/05/how-apache-hadoop-yarn-ha-works/
[20:03:09] http://hadoop.apache.org/docs/r2.6.0/hadoop-yarn/hadoop-yarn-site/ResourceManagerHA.html
[20:30:04] bmansurov:
[20:30:19] ottomata: hey, can we get access for earldouglas to stat1003?
[20:30:31] ottomata: unix username is jdouglas
[20:37:06] bmansurov: sure, phab ticket! :)
[20:37:21] ottomata: ok thanks
[20:37:35] please specify what you want the access for too
[20:37:41] as there are different types of access
[20:37:49] ok
[20:39:03] Analytics: Access to stat1003 for jdouglas - https://phabricator.wikimedia.org/T98209#1262406 (bmansurov)
[20:40:10] Analytics, Ops-Access-Requests, operations: Access to stat1003 for jdouglas - https://phabricator.wikimedia.org/T98209#1262408 (Krenair)
[20:42:01] madhuvishy, yt? I understand your change better now
[20:42:09] mforns: yeah
[20:42:17] madhuvishy, I think it makes complete sense
[20:42:40] mforns: :) Sorry I didn't do a very good job explaining it
[20:42:50] madhuvishy, and I think I know one thing that will make some tests pass
[20:43:05] mforns: aah. I also saw your comment on gerrit
[20:43:09] madhuvishy, not at all, I need some time to understand
[20:43:15] aha
[20:43:33] madhuvishy, about the test we were discussing in the batcave
[20:44:08] madhuvishy, I agree you should change 'username' to 'raw_id_or_name'
[20:44:25] mforns: aah. alright
[20:44:32] madhuvishy, but this is not enough, you should also change fixtures.py
[20:44:57] madhuvishy, fixtures.py line 229
[20:45:28] mforns: yeah, i see it.
[20:45:53] mforns: I added 'raw_id_or_name' : editor.user_id or editor.user_name, to line 233 yesterday
[20:45:54] madhuvishy, fixtures.py has some data structures that help in testing. If you follow the inheritance chain from the test class, you'll see that some initializations are done in fixtures.py
[20:46:06] mforns: right.
[20:46:30] madhuvishy, oh, those are the changes you were talking about before, that you had not pushed yet?
[20:46:37] mforns: yes
[20:46:40] ok
[20:46:52] mforns: did you have anything else in mind?
[20:47:14] madhuvishy, so I'd say if you change the username to raw_id_or_name in the test it should pass, no?
[20:47:30] mforns: yeah I guess. Let me try it.
[20:47:36] ok
[20:49:31] https://www.irccloud.com/pastebin/4gzNKIze
[20:49:40] self.mwSession.add(MediawikiUser(user_name=username))
[20:49:47] mforns: what is that line doing
[20:50:28] madhuvishy, it is adding a MediawikiUser to the database with user_name = username
[20:50:49] madhuvishy, not wiki_user, but mediawiki user
[20:51:01] ottomata: do you know who maintains the listservers?
[20:51:09] madhuvishy, in the wiki database, not wikimetrics
[20:51:18] mforns: aah. so i wouldn't have to change any of that
[20:51:31] madhuvishy, I don't think so
[20:51:49] kevinator: no
[20:53:06] ottomata: ok thanks.
[20:54:15] mforns: the assertion changed from 0 != 3 to AssertionError: 1 != 3
[20:54:23] madhuvishy, aha
[20:59:12] mforns: the first assertion is passing.. the second one on line 252 is the one that fails
[20:59:24] mforns: do you know what it's trying to check?
[20:59:33] madhuvishy, how did you make the first one pass, I'm trying here too
[21:00:34] mforns: I changed lines 231 and 232
[21:00:43] https://www.irccloud.com/pastebin/L2S9I309
[21:01:00] mmm
[21:01:08] madhuvishy, this doesn't work for me
[21:01:41] mforns: your test fails where?
[21:02:09] madhuvishy, I think it passes now
[21:02:24] madhuvishy, I changed the order of the fields in the fixtures.py modification
[21:02:37] madhuvishy, 'raw_id_or_name' : editor.user_name or editor.user_id,
[21:03:03] mforns: ooh. i wonder if this will fail if we test for user ids then
[21:03:56] madhuvishy, yes it fails
[21:04:19] mforns: ha
[21:07:52] kevinator: hey I just got home
[21:08:01] give me a bit to eat, then you wanna chat?
[21:08:26] madhuvishy, I also think that there is something wrong in validate_cohort.py:92
[21:08:51] madhuvishy, raw_id_or_name is not a field of cohort_upload, no?
[21:09:34] mforns: No..
[21:09:48] madhuvishy, I think it should be as it was before
[21:10:15] madhuvishy, I think 'username'
[21:10:29] mforns: wait.. so I changed username in the form to be raw_id_or_name
[21:10:38] madhuvishy, but just the right part
[21:10:54] record[]?
[21:11:36] madhuvishy, yes
[21:11:37] mforns: I renamed it I think. https://gerrit.wikimedia.org/r/#/c/206346/4/wikimetrics/forms/cohort_upload.py
[21:13:00] mforns: I thought technically they do paste either names or ids there - so calling it usernames was misleading.
[21:15:22] madhuvishy, ok, then you should also change api/centralauth.py
[21:15:35] madhuvishy, which is still using 'username'
[21:15:55] mforns: aah, i didn't know that
[21:16:24] madhuvishy, some tests may be testing centralauth and their tests will fail
[21:16:32] mforns: this is becoming one big patch!
[21:16:58] madhuvishy, yes
[21:17:13] madhuvishy, :]
[21:22:03] mforns: I don't really understand what centralauth.py is doing. I feel uncomfortable replacing all of it with raw_id_or_name
[21:23:52] madhuvishy, it should work, the previous 'username' was just a key of the input
[21:24:15] madhuvishy, now the input comes with 'raw_id_or_name' instead
[21:24:26] mforns: right. okay
[21:24:27] madhuvishy, so it's just changing that
[21:25:54] madhuvishy, apart from that, I don't recall anything more that needed change... maybe you can try to put some prints in the tests that fail and see what is happening...
[21:27:00] kevinator: ok, I'm free now
[21:27:02] madhuvishy, If tomorrow (or thursday, because tomorrow=docs) we are in the same spot, we can try milimetric :]
[21:27:11] hey guys, i'm around
[21:27:13] what's up?
[21:27:28] hi milimetric, how was the travel?
[21:27:32] mforns: yup okay :)
[21:27:37] exhausting
[21:27:52] hi milimetric!
[21:28:00] milimetric, xD
[21:28:02] hi madhuvishy
[21:28:06] milimetric, you should rest then :]
[21:29:06] milimetric: +1 to mforns
[21:29:42] thx :) maybe that's wise
[21:30:56] milimetric, madhuvishy, I'll also be signing off shortly
[21:31:20] mforns: that's okay :) thanks for all the help today
[21:31:57] madhuvishy, yw, sorry I couldn't help more
[21:33:24] milimetric: i'm in the batcave
[21:45:02] hi all, I'm trying to collect information for https://phabricator.wikimedia.org/T94271
[21:45:23] sorting MediaWiki objects based on pageviews
[21:47:30] how hard is it to aggregate pageviews to a certain URL prefix in hadoop, get that data into the production DB and update periodically?
[22:07:46] Analytics, Mobile-Web: Instrument “tags” and anonymous gather pages to track engagement with browse. - https://phabricator.wikimedia.org/T94744#1171644 (JKatzWMF) @phuedx I like that idea and the schema looks good. Anyway, let's do this! Put T95300 into needs analysis, as I think that it can happen...
[22:07:48] milimetric: I'm not sure about some of these tests on wikimetrics. Marcel helped a bit today and I fixed a bunch of things - but some things I don't get. We should pair on it remotely sometime. I don't feel good making the changes that I'm making.
[22:12:35] joal: if you're around - you mentioned that you ran the query with the x-wmf-uuid change - how do you pass year/month etc?
[22:36:37] * yuvipanda writes a stern letter to madhuvishy about using sudo on the cluster
[22:38:23] yuvipanda: lol
[22:38:40] matanya: it claimed access denied. so tried for fun :D
[22:38:42] madhuvishy: https://xkcd.com/838/
[22:38:43] uhh
[22:38:47] yuvipanda: ^
[22:38:51] sorry matanya.
[22:39:02] yuvipanda: I know the one
[22:39:14] milimetric: now that madhuvishy has gotten a stern warning I presume she'll become ops in 2 years...
[22:39:17] yuvipanda: i have no password
[22:39:28] madhuvishy: I know. you don't have sudo rights on that machine.
[22:39:33] or any machine, for that matter...
[22:40:05] yuvipanda: it was worth trying
[22:40:07] :D
[22:45:45] ottomata: Help? If I want to run a hive query like - https://phabricator.wikimedia.org/diffusion/ANRE/browse/master/oozie/mobile_apps/uniques/daily/generate_uniques_daily.hql, can I run it like "Usage" claims? Hadoop throws me out with an access control exception
[22:57:35] permission errors?
[22:57:36] hm
[22:58:23] madhuvishy: what's it say?
[22:58:39] ottomata: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Got exception: org.apache.hadoop.security.AccessControlException Permission denied: user=madhuvishy, access=WRITE, inode="/":hdfs:hadoop:drwxr-xr-x
[22:59:27] ottomata: I first tried to set temporary_directory as /tmp/.. and it threw a slightly different error
[22:59:42] ottomata: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Got exception: org.apache.hadoop.security.AccessControlException Permission denied: user=madhuvishy, access=WRITE, inode="/tmp/mobile_apps":nuria:hdfs:drwxr-xr-x
[23:00:09] ottomata: this is what i tried to run - hive -f generate_uniques_daily.hql -d source_table=wmf.webrequest -d archive_table=wmf.mobile_apps_uniques_daily -d temporary_directory=/tmp/mobile_apps/2015-5-4-madhutest -d year=2015 -d month=5 -d day=4
[23:00:42] ah, looks like tmp dir is owned weird, try setting temporary_directory to
[23:00:53] temporary_directory=/tmp/mobile_apps-madhu/2015-5-4-madhutest
[23:00:54] maybe
[23:00:55] or maybe just
[23:01:04] temporary_directory=/tmp/mobile_apps-madhu-2015-5-4-madhutest
[23:01:09] or whatever
[23:01:11] something that doesn't exist yet
[23:01:17] yeah so then i did: hive -f generate_uniques_daily.hql -d source_table=wmf.webrequest -d archive_table=wmf.mobile_apps_uniques_daily -d temporary_directory=/home/madhuvishy/tmp/mobile_apps/2015-5-4-madhutest -d year=2015 -d month=5 -d day=4
[23:01:44] and
[23:01:46] ?
[23:01:55] oh, it probably wants to write into archive_table
[23:01:57] the error i first pasted came up
[23:02:12] maybe not
[23:02:28] oh
[23:02:31] ottomata: the tmp directories - are they local or in hdfs?
[23:02:31] madhuvishy: not /home
[23:02:33] /user
[23:02:38] in hdfs
[23:02:46] ottomata: right. that makes sense
[23:04:07] ottomata: do the tmp results tables get created in this directory?
[23:04:16] also now it runs
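The final working command is not quoted in the log, but per the advice above it is presumably the same invocation with temporary_directory pointed at a not-yet-existing HDFS path the user can write to (under /user/<name> or a fresh /tmp path, not the local /home); the exact path below is illustrative only:

  hive -f generate_uniques_daily.hql \
    -d source_table=wmf.webrequest \
    -d archive_table=wmf.mobile_apps_uniques_daily \
    -d temporary_directory=/user/madhuvishy/tmp/mobile_apps/2015-5-4-madhutest \
    -d year=2015 -d month=5 -d day=4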
[23:36:32] Analytics-Engineering, Analytics-Kanban: Normalize the domain names while querying for uniques based on last-access cookie - https://phabricator.wikimedia.org/T98257#1263141 (madhuvishy) NEW a:madhuvishy
[23:51:55] Analytics-Engineering, Analytics-Kanban: Normalize the domain names while querying for uniques based on last-access cookie - https://phabricator.wikimedia.org/T98257#1263261 (madhuvishy)
[23:57:45] kevinator: You asked about pageviews for March or April? I can't really find March data.