[00:00:26] gwicke: seems to me that in order to use refined kafka data you need a platform we do not have now. raw topic with webrequest data is likely not usable for your purposes as it is now. [00:01:06] understood that it's not possible right now [00:01:42] but, if you said that you are likely looking into streaming before the end of Q4, then it might be worth waiting for that [00:02:03] gwicke: no, sorry, that is not what i said [00:02:13] 06Analytics-Kanban, 06Discovery, 06Discovery-Analysis (Current work), 03Interactive-Sprint: Add Maps tile usage counts as a Data Cube in Pivot - https://phabricator.wikimedia.org/T151832#2878987 (10Yurik) @mpopov, I'm not sure you need `AND INSTR(referer, '/geohack/') = 0 ` -- as it already matches the pre... [00:02:20] gwicke: I said: "once we know where we stand with public event streams RCFeed deprecation next quarter| kubernetes+ budget for hardware we will decide, it is going to take at least one more quarter to define the project " [00:02:31] nuria, do you think we can enable that query today? I would hate to lose 2 weeks of data :( [00:02:39] right, I was kind of asking if that might be before the end of Q4 [00:02:40] yurik: enable? [00:02:45] it's a conditional [00:02:46] nuria, https://phabricator.wikimedia.org/T151832 [00:02:54] if you said X, then it might make sense to do Y [00:03:16] gwicke: no, most likely it would not be. [00:03:35] yurik: sorry, not super sure what you are asking? [00:03:47] nuria: okay [00:05:20] yurik: can you elaborate a little bit maybe? [00:08:11] 06Analytics-Kanban, 06Discovery, 06Discovery-Analysis (Current work), 03Interactive-Sprint: Add Maps tile usage counts as a Data Cube in Pivot - https://phabricator.wikimedia.org/T151832#2878998 (10mpopov) >>! In T151832#2878987, @Yurik wrote: > @mpopov, I'm not sure you need `AND INSTR(referer, '/geohack/... [00:13:25] nuria, i mean - the hive query seems to be ready to be added to pivot - what are our next steps to actually add it? I am hoping not to lose historical data if possible [00:14:51] yurik: if the data is in hive it will not get lost, the first thing to do would be to create a schema that describes the data we are going to insert into druid, data is inserted from hive tables, not query results. [00:15:17] yurik: Example: this is the schema for pageview data: https://github.com/wikimedia/analytics-refinery/blob/master/oozie/pageview/druid/daily/load_pageview_daily.json.template [00:15:48] nuria, i meant the data gets deleted every day, and since we will have a 2 week freeze, two more weeks in november will be lost [00:16:00] unless we won't delete anything for the next few weeks? [00:16:35] yurik: wait, data retention applies everywhere, we delete data in druid too [00:17:07] yurik: unless it is from an aggregated dataset (monthly article pageviews , i think) [00:17:16] oh, including aggregated data? [00:17:27] yurik: if it is very granular yes, [00:17:29] right, the query result should be aggregated enough to be fully anon [00:17:35] gotcha [00:17:51] i am hoping that this won't be too granular, so it can be retained [00:18:06] yurik: this is a POC so i wouldn't have that expectation [00:18:17] poc? [00:18:26] yurik: it is just a means to try whether pivot is a good fit [00:18:41] ah, proof of concept [00:19:10] bearloga: do you have a table in which the results of your query are stored?
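Nuria's point above is that Druid is loaded from Hive tables rather than query results, so the aggregated tile counts have to be materialized into their own table to survive the retention window. A minimal sketch of what that snapshot could look like (the table name mikhail.kartotherian_hourly appears a little later in the conversation; the columns, filters and the wmf.webrequest source query are illustrative assumptions, not the actual .hql from T151832):

```bash
# Hypothetical sketch: materialize the aggregated tile counts into their own
# Hive table so the aggregate survives once raw webrequest partitions are purged.
hive -e "
CREATE TABLE IF NOT EXISTS mikhail.kartotherian_hourly
STORED AS PARQUET
AS SELECT
    year, month, day, hour,            -- time dimensions
    uri_host,                          -- e.g. maps.wikimedia.org (illustrative)
    http_status,
    COUNT(*) AS tile_requests          -- the aggregated measure
FROM wmf.webrequest
WHERE webrequest_source = 'maps'
  AND year = 2016 AND month = 12
GROUP BY year, month, day, hour, uri_host, http_status;
"
```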
[00:20:14] bearloga: we can use that as an intermediate table to reformat data (maybe) into another one where dimensions match closely what we expect in druid [00:21:01] cc yurik [00:21:47] nuria, i guess we could run it already and dump it into another table... bearloga probably is better versed in it, or i could do it (i used to work on it a lot) [00:21:49] 10Analytics-EventLogging, 06Analytics-Kanban: Add user_agent_map field to EventCapsule - https://phabricator.wikimedia.org/T153207#2879052 (10Nuria) a:03Nuria [00:21:53] but don't remember much by now :) [00:22:27] nuria yurik: ah, okay. I'll make & fill mikhail.tiles_hourly or something and then we can work with that [00:22:42] thanks! [00:22:49] data shall be preserved! :D [00:23:01] bearloga: sounds good, data will not be updated or anything, again this is just a poc and a static snapshot is fine [00:23:15] bearloga, mikhail.kartotherian_hourly ? [00:23:33] okie dokie [00:23:50] bearloga, if you are ok with it, i hope we can also automate some of that data to be sent to grafana [00:24:11] but first things first - let's keep the aggregate for a bit so we don't lose it [00:28:23] (03PS1) 10Nuria: Removing dependency on files coming from labs domain [analytics/analytics.wikimedia.org] - 10https://gerrit.wikimedia.org/r/327675 [00:29:18] (03CR) 10Nuria: [V: 032 C: 032] "Self-merging deploy" [analytics/analytics.wikimedia.org] - 10https://gerrit.wikimedia.org/r/327675 (owner: 10Nuria) [00:41:19] (03Abandoned) 10Nuria: Removing dependency on files coming from labs domain [analytics/analytics.wikimedia.org] - 10https://gerrit.wikimedia.org/r/327675 (owner: 10Nuria) [00:45:37] (03PS1) 10Nuria: Removing dependency on files coming from labs domain [analytics/analytics.wikimedia.org] - 10https://gerrit.wikimedia.org/r/327679 [00:46:04] (03CR) 10Nuria: [V: 032 C: 032] "Self-merging merge." [analytics/analytics.wikimedia.org] - 10https://gerrit.wikimedia.org/r/327679 (owner: 10Nuria) [00:49:57] (03PS1) 10Nuria: Correcting piwiki configuration from last commit [analytics/analytics.wikimedia.org] - 10https://gerrit.wikimedia.org/r/327680 [00:50:20] (03CR) 10Nuria: [V: 032 C: 032] Correcting piwiki configuration from last commit [analytics/analytics.wikimedia.org] - 10https://gerrit.wikimedia.org/r/327680 (owner: 10Nuria) [01:40:29] 06Analytics-Kanban, 06Discovery, 06Discovery-Analysis (Current work), 03Interactive-Sprint: Add Maps tile usage counts as a Data Cube in Pivot - https://phabricator.wikimedia.org/T151832#2879219 (10mpopov) OK, gonna get those counts into **mikhail.kartotherian_hourly** via the .hql below. Will update when... [09:36:42] the clickhouse nightmare is not finished [09:36:55] I tried to unpack the deb to test it on druid hosts [09:36:55] but [09:37:29] ldd clickhouse-server is not what I expect [09:37:45] and I think that it is due to building the pkg on Debian [09:38:47] they extensively use the -fPIC compiler flag (relocatable code), which is usually needed for .so libraries. AFAIU Debian does not build its packaged static libs (.a) with -fPIC, so I get compilation errors [09:39:52] there is an option to use shared libs when possible (the only one that eventually allows me to build) [09:40:06] but as expected it wants the shared libs at load time :D [11:09:09] (03CR) 10DCausse: [] "Thanks nuria!"
(034 comments) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/326168 (https://phabricator.wikimedia.org/T148811) (owner: 10EBernhardson) [11:15:36] random question (out of curiosity), do you have plans to migrate to java8 everywhere? [11:15:44] and hi (sorry :) [11:23:04] dcausse: hi :) Any specific place that you'd like to have java8 in? [11:23:38] elukey: i.e. allow java8 on refinery-source [11:24:07] so we can import recent libs that switched to java8 [11:24:10] ahh okok, so for this question I'd wait joal :) [11:24:19] ok :) [11:24:27] no big deal, just curiosity :) [11:30:51] 06Analytics-Kanban, 13Patch-For-Review, 15User-Elukey: Puppetize clickhouse - https://phabricator.wikimedia.org/T150343#2880486 (10elukey) "Victory" should never be said out loud too soon: ``` ./usr/bin/clickhouse-server: /lib/x86_64-linux-gnu/libm.so.6: version `GLIBC_2.23' not found (required by ./usr/bin... [12:43:37] * elukey lunch [12:50:07] 10Analytics-Dashiki: Dashiki 404 - https://phabricator.wikimedia.org/T104545#2880695 (10Nemo_bis) I still get a 404 at https://metrics-staging.wmflabs.org/static/public/dash/ ; a redirect would be in order. [13:30:48] 06Analytics-Kanban, 06Operations, 10Ops-Access-Requests, 15User-Elukey: Requesting access to Analytics production shell for Francisco Dans - https://phabricator.wikimedia.org/T153303#2880870 (10elukey) [13:38:40] fdans: o/ [13:38:58] FYI https://gerrit.wikimedia.org/r/#/c/327730 will be reviewed on Monday by the ops team (evening EU time) [13:39:22] We use puppet to configure everything in our infrastructure [13:39:44] I added your user + ssh key, and some basic groups [13:40:07] that will grant you ssh access to some machines in prod like stat*, druid*, the Hadoop cluster, etc.. [13:40:18] if you have any questions, please let me know :) [13:49:37] Hi a-team ! [13:49:42] Hi dcausse [13:49:56] hey joal ! [13:50:14] dcausse: we have no 'plan' to move to java 8, but why not think about it [13:50:27] cool :) [13:50:34] dcausse: Biggest concern I can think of is that hadoop is java7 [13:50:40] oh ok [13:50:51] we're still exploring solutions atm [13:51:31] still unsure if java is the way to go, competing languages are python and java for us I think [13:52:03] dcausse: for hadoop, if you use python, it'll have to go through jython, or use hadoop-streaming [13:52:33] elukey: I have not understood everything about the lib linking you talked about, seems complicated ! [13:52:50] milimetric: ready when you want for dynamic partitioning [13:54:40] joal: I've heard that some of you are experimenting with word2vec for article reco, do you know which implementation is being used? [13:54:58] dcausse: indeed, Ellery is doing that [13:55:03] dcausse: I think [13:55:27] dcausse: I think they use TensorFlow [13:56:16] hello team :] [13:57:21] Hey mforns :) [13:57:36] mforns: I have infos for you (not working systems, but info ;) [13:57:51] joal, I like info :] [13:58:06] mforns: batcave? [13:58:12] omw! [14:05:47] joal: if you want I can explain the mess, whenever you have time [14:07:26] elukey sorry, just back from lunch [14:07:39] thank you so much for that! [14:09:26] joal: heya, to ze cave? [14:09:38] ah he's in ze cave [14:09:41] fdans: yw :) [14:09:47] milimetric: yes, we are with mforns :) [14:09:49] the sooner you have perms the better for us! [14:10:52] milimetric do you have a few minutes at some point today to talk me through dashiki? :) [14:11:19] fdans: yes!
now is the best time, https://plus.google.com/hangouts/_/wikimedia.org/a-batcave-2 [14:11:52] goin in! [15:17:09] milimetric: I have some time now :) [15:27:59] joal: omw batcave [15:28:17] milimetric: joining ! [15:42:52] 06Analytics-Kanban, 10Analytics-Wikistats: Navigation model concept development - https://phabricator.wikimedia.org/T152438#2881228 (10ashgrigas) Navigation model that will go into forming site map and wireframes has been completed and can be viewed here: https://www.dropbox.com/s/c5nsrqyedjtwune/Wikistats%20N... [15:43:28] 06Analytics-Kanban, 10Analytics-Wikistats: Visual Language for http://stats.wikimedia.org replacement - https://phabricator.wikimedia.org/T152033#2881230 (10ashgrigas) [15:43:30] 06Analytics-Kanban, 10Analytics-Wikistats: Navigation model concept development - https://phabricator.wikimedia.org/T152438#2881229 (10ashgrigas) 05Open>03Resolved [15:46:30] 06Analytics-Kanban, 10Analytics-Wikistats: Develop site map and overview and detail page wireframes - https://phabricator.wikimedia.org/T153466#2881248 (10ashgrigas) [15:58:33] 06Analytics-Kanban, 06Operations, 10Ops-Access-Requests, 13Patch-For-Review, 15User-Elukey: Requesting access to Analytics production shell for Francisco Dans - https://phabricator.wikimedia.org/T153303#2875954 (10Cmjohnson) FYI: This will need Ops approval since it grants SUDO and will be discussed du... [16:00:33] elukey: hola, standup [16:03:20] 06Analytics-Kanban, 10Analytics-Wikistats: Visual Language for http://stats.wikimedia.org replacement - https://phabricator.wikimedia.org/T152033#2881330 (10Nuria) [16:03:23] 06Analytics-Kanban, 10Analytics-Wikistats: Navigation model concept development - https://phabricator.wikimedia.org/T152438#2881329 (10Nuria) 05Resolved>03Open [16:48:22] 10Analytics, 06Operations: sync bohrium and apt.wikimedia.org piwik versions - https://phabricator.wikimedia.org/T149993#2771280 (10Dzahn) And exactly this almost happened. i saw "piwik" as "waiting for upgrade" in servermon and only in the last minute stopped myself when "The following packages will be DOWNG... [16:53:53] joal: semi-success [16:53:58] Arf [16:53:59] all wikis, all dates result: [16:54:11] the ones that had results for the previous run were not overwritten [16:54:22] no error, just no update [16:54:41] and the wikis that had no results for the previous time period (the small wikis that would have had 0 for those dates) [16:54:47] like abwiki [16:54:51] those have full results now [16:54:59] milimetric: hm, makes sense: nothing to overwrite, therefore not deleted [16:55:30] milimetric: This probably means we'd want to delete the folders before re-inserting [16:55:34] yep [16:55:43] no prob for this, one-off, we can remember to wipe [16:56:03] I think it should be ok yeah :) [16:56:08] but success in the sense that it kept all results in one file, even daily for all time [16:56:18] milimetric: Happy :) [16:56:23] yep, me too :) [16:56:27] milimetric: The reason for that is ORDER BY [16:56:29] nice, come Monday we'll have dashboards [16:56:43] joal: oh, i should run these on your latest data! [16:56:46] milimetric: let's not forget to have that on all queries [16:56:47] where's it at? [16:56:57] yep, i'll make sure all queries are consistent [16:57:08] milimetric: cool [16:57:45] ok, I just have to move your latest to /wmf/data/wmf/mediawiki/history [16:57:48] gotta remember that [16:58:02] milimetric: correct :) [16:58:06] You found it ;) [16:58:37] milimetric: I'll also push that onto druid under mediawiki-history source, ok ?
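A rough sketch, in HDFS terms, of the two follow-ups discussed above: wiping the previous output folders before re-inserting (since partitions that already had results are not overwritten), and moving the latest denormalized history to /wmf/data/wmf/mediawiki/history with the right ownership. The metrics path is illustrative; the exact source directory and the chown come up just below in the conversation.

```bash
# Wipe the previous output folders before re-inserting, since existing per-wiki
# results are not overwritten by the job (metrics path is an illustrative guess):
hdfs dfs -rm -r -skipTrash /wmf/data/wmf/mediawiki/metrics

# Move the latest denormalized history to the agreed location and fix ownership
# (needs HDFS superuser rights, hence sudo -u hdfs):
sudo -u hdfs hdfs dfs -mv /user/joal/wmf/data/wmf/mediawiki/history /wmf/data/wmf/mediawiki/history
sudo -u hdfs hdfs dfs -chown -R hdfs:hadoop /wmf/data/wmf/mediawiki/history
```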
[16:58:51] yes, good with me [16:58:54] yay [16:59:06] oh, sorry, yours is /user/joal/wmf/data/wmf/mediawiki/history of course [16:59:23] but same path - easy to remember [16:59:48] I actually thought that's what you meant (didn't properly read the root slash) [17:01:05] joal: are you doing the druid loading now? It's faster to move your thing, and then you can do it from the /wmf instead of yours? [17:02:56] milimetric: not doing it now [17:03:00] milimetric: you can move :) [17:03:16] milimetric: can you also please check the folder/files ownership ;) [17:05:40] k, moved and chowned -R hdfs:hadoop [17:13:19] Many thanks milimetric [17:13:26] milimetric: have you updated hive schema ? [17:13:40] oh, no, there's new fields?! [17:13:41] milimetric: there was the redirect that had been added I think [17:13:49] I think that was there before, checking [17:13:53] thanks [17:14:24] you're right, not there, will do so before I run the queries [17:14:40] milimetric: let's triple-check schemas before you actually change them [17:15:04] milimetric: 55 fields written in parquet [17:16:18] I usually get the _common_metadata, transform it in vim, and update the create statement with it [17:16:28] k :) [17:16:43] it's really quick, but i gotta run and grab lunch now, will do it, no worries [17:16:44] then you'll get correct parquet stuff indeed [17:16:49] thanks mate [17:29:30] (03PS5) 10Nuria: Lucene Stemmer UDF [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/326168 (https://phabricator.wikimedia.org/T148811) (owner: 10EBernhardson) [17:36:25] 10Analytics, 10Analytics-Cluster, 06Operations, 06Research-and-Data, and 2 others: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843#2881747 (10RobH) Well, if the proposed replacements to stat100[23] will potentially have GPU added at the time of order, can we demonstrate that we kn... [17:44:00] I'll be offline the next half hour to go to the airport and will be working from there for a couple hours until I board :) [17:49:05] * elukey goes afk! [17:49:06] byeeee [17:52:35] Bye elukey ! [17:58:27] (03PS6) 10Nuria: Lucene Stemmer UDF [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/326168 (https://phabricator.wikimedia.org/T148811) (owner: 10EBernhardson) [17:59:58] (03CR) 10Nuria: [] Lucene Stemmer UDF (033 comments) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/326168 (https://phabricator.wikimedia.org/T148811) (owner: 10EBernhardson) [18:00:47] 06Analytics-Kanban: Check if we can deprecate legacy TSVs production (same time as pagecounts?)
- https://phabricator.wikimedia.org/T130729#2881811 (10Nuria) 05Open>03Resolved [18:01:00] 06Analytics-Kanban: Dashiki tests broken on master - https://phabricator.wikimedia.org/T152631#2881813 (10Nuria) 05Open>03Resolved [18:01:16] 06Analytics-Kanban: Kill limn1 - https://phabricator.wikimedia.org/T146308#2881822 (10Nuria) [18:01:18] 06Analytics-Kanban, 13Patch-For-Review: Migrate the simplest limn dashboards to dashiki tabular {frog} - https://phabricator.wikimedia.org/T126358#2881821 (10Nuria) 05Open>03Resolved [18:01:30] 06Analytics-Kanban, 06Operations, 13Patch-For-Review, 07Puppet: Refactor eventlogging.pp role into multiple files (and maybe get rid of inheritance) - https://phabricator.wikimedia.org/T152621#2881824 (10Nuria) 05Open>03Resolved [18:02:00] 10Analytics-EventLogging, 06Analytics-Kanban, 10EventBus, 13Patch-For-Review: Ensure no dropped messages in eventlogging producers when stopping broker - https://phabricator.wikimedia.org/T142430#2881829 (10Nuria) 05Open>03Resolved [18:02:11] 06Analytics-Kanban: (subtask) Dan's standard metrics - https://phabricator.wikimedia.org/T150025#2881830 (10Nuria) 05Open>03Resolved [18:02:13] 06Analytics-Kanban: Run Standard metrics on denormalized history and compare with wikistats - https://phabricator.wikimedia.org/T150023#2881831 (10Nuria) [18:02:25] 06Analytics-Kanban: Run Standard metrics on denormalized history and compare with wikistats - https://phabricator.wikimedia.org/T150023#2771909 (10Nuria) [18:02:27] 06Analytics-Kanban: (subtask) Marcel's standard metrics - https://phabricator.wikimedia.org/T150024#2881832 (10Nuria) 05Open>03Resolved [19:10:32] 10Analytics, 06Analytics-Kanban: Various EventLogging schemas losing events since around September 8/9 - https://phabricator.wikimedia.org/T146840#2882034 (10Milimetric) a:03Milimetric [19:11:14] 10Analytics, 06Analytics-Kanban: Investigate duplicate EventLogging rows - https://phabricator.wikimedia.org/T142667#2882052 (10Milimetric) a:03Milimetric [19:11:24] 10Analytics, 10Analytics-EventLogging, 06Analytics-Kanban: Various EventLogging schemas losing events since around September 8/9 - https://phabricator.wikimedia.org/T146840#2672617 (10Milimetric) [20:11:08] 10Analytics, 10Analytics-EventLogging, 06Analytics-Kanban: Various EventLogging schemas losing events since around September 8/9 - https://phabricator.wikimedia.org/T146840#2672617 (10Nuria) We no longer have ops logs nor data logs for this item. To reiterate over our e-mail conversation: Please tag tickets... 
[20:11:10] 10Analytics, 10Analytics-EventLogging, 06Analytics-Kanban: Various EventLogging schemas losing events since around September 8/9 - https://phabricator.wikimedia.org/T146840#2882302 (10Nuria) 05Open>03Resolved [20:12:00] milimetric: FYI that i have closed one of the EL tickets, we had actually looked at that problem earlier [20:12:18] milimetric: wait, i will put it in done but not closed [20:12:25] yeah, it looked familiar, but this disconnect between phab workboards is awful [20:12:30] 10Analytics, 10Analytics-EventLogging, 06Analytics-Kanban: Various EventLogging schemas losing events since around September 8/9 - https://phabricator.wikimedia.org/T146840#2672617 (10Nuria) 05Resolved>03Open [20:12:47] milimetric: phabricator doesn't substitute for human interaction though [20:13:06] phabricator *is* human interaction [20:13:25] at least, one way of interacting [20:14:44] I think this situation is very bad - an open bug that's thought to be ignored [20:15:00] I think there are problems all around that on all sides [20:15:12] and we can fix them, so it's not a big deal [20:15:14] but we should fix them [20:15:40] mforns: still here? [20:15:53] 10Analytics: event_user_is_anonymous is never true - https://phabricator.wikimedia.org/T153492#2882307 (10Milimetric) [20:15:53] mforns, yes! [20:15:57] what's up? [20:15:58] Heya mforns :) [20:16:10] mforns: What version of node should I use for testing KafkaSSE? [20:16:47] joal, don't know, it's not specified in the README... [20:17:20] joal, but it should be one that supports ES6 [20:17:54] joal, I had to upgrade my node to be able to npm install [20:18:23] joal, currently at v7.2.0 [20:18:35] wow, growing fast [20:18:36] ok [20:18:41] yea [20:19:33] Thanks mforns [20:19:40] np :] [20:29:11] 10Analytics-Cluster, 06Operations: stat1004 - sync snakebite version with repo - https://phabricator.wikimedia.org/T153493#2882338 (10Dzahn) [20:46:48] joal: I think node 6 might be too new no? [20:47:05] nuria: I have no idea ... mforns suggested 7.2 [20:47:14] joal: 4 is likely to work ( but please ignore if things are working) [20:47:22] joal: and it did not give you any issues? [20:47:31] nuria, 6.9.2 is LTS [20:47:33] nuria: currently testing [20:47:50] mforns: the one we use is 4, we just evaluated moving to 6 [20:47:55] aha [20:48:07] mforns: but again, if it works ... [20:49:09] yes, but you're right, 7.2.0 is latest, still not stable, sooo [20:51:32] nuria: found node:4 in .travis.yml - Will test with that [20:51:56] joal: right, that seems more likely to work w/o issues [20:52:22] milimetric, nuria, I finished the docs: https://wikitech.wikimedia.org/wiki/Analytics/Dashboards [20:52:47] please, can you review to see if there's something unclear or missing? [20:53:16] 06Analytics-Kanban, 06Discovery, 06Discovery-Analysis (Current work), 03Interactive-Sprint: Add Maps tile usage counts as a Data Cube in Pivot - https://phabricator.wikimedia.org/T151832#2882398 (10mpopov) It's almost done with November's requests. It should be done with December by the end of the day. He... [20:53:40] 06Analytics-Kanban, 06Discovery, 06Discovery-Analysis (Current work), 03Interactive-Sprint: Add Maps tile usage counts as a Data Cube in Pivot - https://phabricator.wikimedia.org/T151832#2882399 (10Nuria) Are the columns on this table the dimensions we wish to see in pivot? Seems that you could save some...
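On the KafkaSSE node-version question above, a small sketch of pinning the interpreter to what .travis.yml declares (node 4) before running the tests; it assumes nvm is installed locally and that the repo exposes the standard npm test script:

```bash
# Match the node major version declared in .travis.yml (node:4) instead of a
# bleeding-edge local install, then run the usual npm workflow.
nvm install 4
nvm use 4
node --version   # should print v4.x
npm install
npm test
```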
[20:53:42] mforns: looking [20:55:05] 06Analytics-Kanban, 10GitHub-Mirrors, 07Documentation, 07Easy, 13Patch-For-Review: Mark documentation about limn as deprecated - https://phabricator.wikimedia.org/T148058#2713665 (10mforns) Finished rewriting the documentation on Wikitech: https://wikitech.wikimedia.org/wiki/Analytics/Dashboards Comments... [20:56:23] nuria, thanks! can I merge the limn README change? [20:56:52] mforns: nice, my suggestions. I would add a quickstart guide with "i already have tsv files, how do I generate a visualization?" , "access data from hive using reportupdater" (so it is clear that you are not limited just to mysql replicas) [20:57:03] mforns: i thought i did merge it... [20:57:15] nuria, ah ok will merge [20:57:25] mforns: ah sorry, merged now [20:57:35] thanks! :] [20:57:53] mforns: super thanks for updating docs [20:58:07] nuria, yes, I can add something like: if you already have tsv files, jump to next section [20:58:27] regarding hive [20:59:13] bearloga: i would remove "device" from our first load of data, i think given the amount of different devices we might run into issues [21:00:01] bearloga: forgot what the limit is for 1 dimension for druid but that field is large (in terms of variety of data) [21:00:05] nuria, regarding hive, there's this section called supported data sources that includes hive cluster, do you think it should be more visible? [21:00:13] nuria: that makes sense! noted. I remember hearing the limit is 10k [21:00:29] mforns: for a quickstart yes, cause you need to read a lot to get there [21:00:54] nuria, ah ok ok I get it [21:01:06] bearloga: ya, let's remove it [21:07:54] bearloga: and let's work with just one month of data no? How many records do we have? [21:09:29] nuria, bearloga: there is no strict limit, just that high-cardinality dimensions are harder to compute on, which slows down everything [21:09:39] (03PS5) 10Milimetric: [WIP] Port standard metrics to reconstructed history [analytics/refinery] - 10https://gerrit.wikimedia.org/r/322103 [21:10:04] joal: right, we want to be able to iterate on this fast, cc bearloga [21:10:14] :) [21:10:55] hey a-team, logging off! have a nice weekend :] [21:11:01] Bye mforns :) [21:11:42] 06Analytics-Kanban: Create 1-off tsv files that dashiki would source with standard metrics from datalake - https://phabricator.wikimedia.org/T152034#2882434 (10Milimetric) I found a small bug in event_user_is_anonymous, which is that it's always false. So I couldn't run the metrics that look at that flag, but I... [21:11:51] nite yall [21:11:59] * milimetric prays for clean data :) [21:13:54] milimetric: I'm very sorry for that bug :( [21:23:48] milimetric: found it -- Question for you [21:24:09] milimetric: Should we compute isBotByName for anonymous users? [21:29:44] milimetric: Shall I recompute during weekend ? [21:32:01] (03PS1) 10Nuria: i[WIP] POC of loading tile data into pivot [analytics/refinery] - 10https://gerrit.wikimedia.org/r/327845 (https://phabricator.wikimedia.org/T151832) [21:32:46] joal: for a 1st loading try do you recommend loading from hadoop or just a file? cc bearloga , see: https://gerrit.wikimedia.org/r/#/c/327845/1/oozie/maps/druid/load_map_tiles.template.json [21:33:16] nuria: loading druid only happens with hadoop (for us) [21:33:27] nuria: so, hadoop it must :) [21:33:53] joal: ah, didn't we have some python loader before? [21:34:00] joal: but ok, hadoop it is [21:34:12] nuria: Ah !
[21:34:17] nuria: didn't get it [21:34:35] nuria: You can launch the load manually, yes (it'll fire some hadoop job is what I mean) [21:34:47] nuria: but you can launch load manually [21:35:15] joal: ok, will read a bit about this [21:36:44] nuria: given you have the ready-to-go file for python, you can use the python script used in oozie (but by hand) [21:37:11] nuria: also, just a quick check: data to be imported must be in json format [21:40:49] joal: ah no, thus far it is in a hive table, so i will just modify the pageview oozie workflow, i guess [21:41:30] nuria: might actually be simpler (steps are there: JSON conversion, then export into druid) [21:41:54] joal: ay sorry, which one do you think is simpler? [21:42:31] nuria: doing by hand might be a bit simpler, but since you're willing to automate anyway, going for oozie straight makes sense [21:43:16] joal: actually i would first like to see whether data in pivot is useful before automating, thus far this is a poc [21:44:10] nuria: ok, then go for manually replicated steps from pageview workflow: convert data in hive (new table, json formatted -- notice the compression format, it's important) [21:44:49] joal: this right? https://github.com/wikimedia/analytics-refinery/blob/master/oozie/pageview/druid/daily/generate_daily_druid_pageviews.hql#L44 [21:45:38] And then manually launch a druid indexation (instead of patterns in file, use real values, open ssh tunnel to Overlord, send json) [21:45:51] Yes nuria, that output format for json in hive [21:46:31] joal: will give it a try next week [21:46:32] nuria: Things at the beginning are important: ADD JAR, SET compressions etc [21:46:40] k [21:46:51] joal: ok, will ping you again I'm sure [21:46:52] nuria: As said earlier on, I can provide assistance ;) [21:47:50] 10Analytics, 10EventBus, 06Operations, 06Services (done), 15User-mobrovac: EventBus HTTP proxy service's syslog entries should be readable - https://phabricator.wikimedia.org/T153028#2882574 (10mobrovac) 05Open>03Resolved The imminent problem has been resolved, let's track log files consolidation in... [21:54:56] (03PS9) 10Joal: Add mediawiki history spark jobs to refinery-job [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/325312 (https://phabricator.wikimedia.org/T141548) [21:57:21] (03PS10) 10Joal: Add mediawiki history spark jobs to refinery-job [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/325312 (https://phabricator.wikimedia.org/T141548) [22:08:46] 06Analytics-Kanban: Create 1-off tsv files that dashiki would source with standard metrics from datalake - https://phabricator.wikimedia.org/T152034#2836112 (10JAllemandou) @milimetric: Bug found and corrected in scala code, new dataset computation launched... Multi jinx ! [22:19:00] (03PS1) 10EBernhardson: UDF for extracting primary full text search request [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/327855 (https://phabricator.wikimedia.org/T149047) [22:19:54] (03PS2) 10EBernhardson: UDF for extracting primary full text search request [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/327855 (https://phabricator.wikimedia.org/T149047) [22:53:05] (03PS3) 10EBernhardson: UDF for extracting primary full text search request [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/327855 (https://phabricator.wikimedia.org/T149047)
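Putting together the manual steps nuria and joal sketch above (JSON-formatted, compressed Hive output, then an indexation request sent to the Druid Overlord over an ssh tunnel), here is a hedged end-to-end sketch. The jar path, database/table names, HDFS location, overlord host and the contents of load_map_tiles.json are assumptions modeled on the pageview workflow, not the actual refinery code.

```bash
# 1. Convert the snapshot Hive table to compressed JSON text files, which is the
#    format the Druid hadoop indexer expects. The ADD JAR / SET lines are the
#    "things at the beginning" mentioned above; names and paths are illustrative.
hive -e "
ADD JAR /usr/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core.jar;
SET hive.exec.compress.output=true;
SET mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec;

DROP TABLE IF EXISTS mikhail.tmp_druid_tiles;
CREATE EXTERNAL TABLE mikhail.tmp_druid_tiles (
    dt            string,
    uri_host      string,
    http_status   string,
    tile_requests bigint
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS TEXTFILE
LOCATION '/tmp/druid/tiles';

-- Druid needs an ISO-8601 timestamp column, built here from the hourly buckets.
INSERT OVERWRITE TABLE mikhail.tmp_druid_tiles
SELECT
    CONCAT(CAST(year AS STRING), '-',
           LPAD(CAST(month AS STRING), 2, '0'), '-',
           LPAD(CAST(day AS STRING), 2, '0'), 'T',
           LPAD(CAST(hour AS STRING), 2, '0'), ':00:00Z') AS dt,
    uri_host,
    http_status,
    tile_requests
FROM mikhail.kartotherian_hourly;
"

# 2. Open an ssh tunnel to the Druid Overlord (host is a placeholder) and submit
#    the hadoop indexation spec, e.g. a filled-in copy of the load_map_tiles template.
ssh -N -L 8090:localhost:8090 "$DRUID_OVERLORD_HOST" &
curl -X POST -H 'Content-Type: application/json' \
     -d @load_map_tiles.json \
     http://localhost:8090/druid/indexer/v1/task
```

The POST to the Overlord returns a task id whose status can then be polled to see whether the indexation succeeded.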