[05:27:43] (CR) EBernhardson: [] "turns out there is still a problem here, although i don't know if its worth worrying about. When trying to use this UDF from a pyspark Hiv" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/326168 (https://phabricator.wikimedia.org/T148811) (owner: EBernhardson)
[06:18:06] Quarry: Remove dependency on labs_debrepo - https://phabricator.wikimedia.org/T153615#2885716 (scfc)
[07:12:55] (PS8) EBernhardson: Lucene Stemmer UDF [analytics/refinery/source] - https://gerrit.wikimedia.org/r/326168 (https://phabricator.wikimedia.org/T148811)
[09:21:13] Hi elukey, just saw the mess in ops chan - Let me know if there's anything I can do to help
[09:22:01] helloooo
[09:22:10] yesterday or today?
[09:22:12] :D
[09:22:28] Aouch, yesterday as well ?
[09:22:39] man .... bad weekend :(
[09:22:42] What's up?
[09:24:02] atm only a puppet merge failure, a lot of noise but nothing weird.. yesterday https://phabricator.wikimedia.org/T153588
[09:24:54] that is really bad :(
[09:25:03] elukey: hm hm ... It'll be interesting to spend time together with Eric and others at Dev Summit
[09:40:45] (CR) DCausse: [C: 1] "lgtm," (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/326168 (https://phabricator.wikimedia.org/T148811) (owner: EBernhardson)
[10:51:49] (PS1) Joal: [WIP] Add oozie job loading MW history in druid [analytics/refinery] - https://gerrit.wikimedia.org/r/328154 (https://phabricator.wikimedia.org/T141473)
[11:05:11] Analytics-EventLogging, MediaWiki-Vagrant: How to use Wikipedia EventLogging schemas in Vagrant setup? - https://phabricator.wikimedia.org/T153641#2886331 (mschwarzer)
[11:10:44] Analytics: Check if we can merge maps partition into misc partition at varnishkafka level - https://phabricator.wikimedia.org/T130733#2886344 (JAllemandou) @Ottomata: Let's kill that one and rethink the whole partitioning one of those days, right?
[11:18:21] Morning a-team!
[11:20:18] o/
[11:20:26] Hi fdans :)
[11:44:25] joal: I can't accept your excuses for the email
[11:44:31] :D
[11:44:35] we'll need to settle this with beers in SFO
[11:45:01] elukey: I just realzed I started a new test without changing the email ... Maaaaaan
[11:45:09] elukey: I'll have to pay beers
[11:45:18] elukey: or maybe it'll actually work?
[11:46:07] :D
[11:49:51] Analytics-Kanban: Create 1-off tsv files that dashiki would source with standard metrics from datalake - https://phabricator.wikimedia.org/T152034#2886578 (JAllemandou) @Milimetric : Just had a quick look at the queries, I think we can confirm the LIMIT theory :) Something to notice: Since those queries are...
[13:50:45] Analytics-Tech-community-metrics, Developer-Relations (Jan-Mar-2017), Documentation: Create basic Kibana (dashboard) documentation for admins - https://phabricator.wikimedia.org/T145929#2886935 (Aklapper)
[13:52:18] mforns are you able to meet today at 1pm et? if not i can move meeting to another time
[13:55:36] Analytics-Kanban, Analytics-Wikistats: Navigation model concept development - https://phabricator.wikimedia.org/T152438#2886938 (ashgrigas) Uploaded file here: https://phabricator.wikimedia.org/F5102110
[14:00:57] joal: got a few minutes to give me an intro about denormalized?
[14:20:33] hey fdans
[14:20:37] I have yes :)
[14:20:57] batcave? :)
[14:21:01] OMW !
[14:36:01] joal, kudos for the docker kafka patch :]
[14:36:14] You're welcome :)
[14:36:25] Took me some time, but I'm happy having found the solution :)
[14:36:28] mforns: --^
[14:37:18] awesome work
[14:48:29] joal: doh! simple limit!! not complicated limit :)
[14:48:33] I didn't think of that - rerunning now
[14:48:50] cool milimetric :)
[14:51:54] (PS7) Milimetric: Port standard metrics to reconstructed history [analytics/refinery] - https://gerrit.wikimedia.org/r/322103
[14:59:06] milimetric, I finished the dashboard docs, can you have a look at them when you have a sec? :] https://wikitech.wikimedia.org/wiki/Analytics/Dashboards
[14:59:20] mforns: nice, I'll take a look, of course
[14:59:22] fdans: we'll discuss your access request today hopefully
[14:59:26] thanks
[15:09:48] mforns ^^ can you meet at 1pm et today?
[15:10:25] elukey awesome thank you! :)
[15:10:58] hi alllll :)
[15:11:14] hellooo :]
[15:11:38] ashgrigas, sure, it's in my calendar :]
[15:11:59] ok cool just wasnt sure since you didnt reply at least as i see on my cal :)
[15:12:46] ashgrigas, ok ok, I'll be there
[15:12:50] hey joal, I think there are two strict modes?
[15:12:59] set hive.exec.dynamic.partition.mode=nonstrict doesn't affect some other strict mode
[15:13:23] do you want to turn off the other strict mode or increase the limits?
[15:27:06] milimetric: hm
[15:27:39] milimetric: get limits higher seems better then (1biliion for instance)
[15:27:42] But that's weird
[15:28:31] milimetric: I have question for you, do you have a minute?
[15:30:09] Analytics, Analytics-Cluster, Operations, Research-and-Data, and 2 others: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843#2887110 (Ottomata) I think only @ellery can speak to this question. @ellery? :)
[15:32:13] (CR) DCausse: [] UDF for extracting primary full text search request (3 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/327855 (https://phabricator.wikimedia.org/T149047) (owner: EBernhardson)
[15:42:22] hi there! how to setup the analytics-eventlogging system for development purpose? is the vagrant eventlogging role enough? or you need additional tools, e.g. https://github.com/wikimedia/eventlogging , to simulate the productive setup?
[15:44:40] Analytics: Check if we can merge maps partition into misc partition at varnishkafka level - https://phabricator.wikimedia.org/T130733#2887125 (Ottomata) Yeah, I think so, but that would be a huge task. If we want to merge in the shorter term, we can consider doing this.
[15:46:02] Analytics: Check if we can merge maps partition into misc partition at varnishkafka level - https://phabricator.wikimedia.org/T130733#2887127 (JAllemandou) Right ... I wonder.
[15:52:25] Analytics-Kanban, Reading-Web-Backlog: mobile-safari has very few internally-referred pageviews - https://phabricator.wikimedia.org/T148780#2887128 (mforns) I'm looking into this. I think the difference between Chrome Mobile and Mobile Safari in the third chart is legit. If we look at long term trends, M...
[15:54:21] ottomata: quick question on arcanist: seems to need php installed- is that true?
[15:55:26] ottomata: found it myself
[15:55:31] ottomata: vermind
[15:58:08] joal: i think so?
[15:58:11] but ok!
[15:58:19] needed indeed ottomata
[15:58:40] ottomata: I'm so sad ... I managed NOT to install ANY php on that machine of me so far ;)
[15:59:49] haha daw, sorry!
[15:59:52] :D
[16:01:13] ping ottomata standduppp
[16:02:26] OO
[16:03:16] Analytics-Kanban, MediaWiki-Vagrant: Cannot enable 'analytics' role on Labs instance - https://phabricator.wikimedia.org/T151861#2887162 (Ottomata) Oh! I didn't realize you wanted Hive and Oozie. Those instructions are def old. The Hive Metastore and database need to be set up as well. So does the Oo...
[16:13:21] Analytics-Kanban, Continuous-Integration-Config, EventBus, Release-Engineering-Team, Wikimedia-Stream: Improve tests for KafkaSSE - https://phabricator.wikimedia.org/T150436#2887184 (mforns) a:mforns>JAllemandou
[16:15:43] ottomata !!! Managed to have that : https://phabricator.wikimedia.org/D521
[16:16:07] NICE
[16:25:20] (PS33) Nuria: Script sqooping mediawiki tables into hdfs [analytics/refinery] - https://gerrit.wikimedia.org/r/306292 (https://phabricator.wikimedia.org/T141476) (owner: Milimetric)
[16:27:12] (CR) Joal: [V: 2 C: 2] "This patch has been +1ed by ottomata a while ago, looks good to me." [analytics/refinery] - https://gerrit.wikimedia.org/r/306292 (https://phabricator.wikimedia.org/T141476) (owner: Milimetric)
[16:34:54] ottomata: Having re-scanned the KafkaSSE docker patch, I can see some improvement :)
[16:35:18] ottomata: not so much on tech side, more on doc
[16:35:28] k
[16:36:47] Analytics-Kanban, MediaWiki-Vagrant: Cannot enable 'analytics' role on Labs instance - https://phabricator.wikimedia.org/T151861#2887224 (Nuria) a:Ottomata
[16:37:08] Analytics, EventBus, Wikimedia-Stream, Patch-For-Review: Implement server side filtering (if we should) - https://phabricator.wikimedia.org/T152731#2887225 (Nuria)
[16:39:38] Analytics-Kanban, MediaWiki-Vagrant: Enable 'analytics' role on Labs instance - https://phabricator.wikimedia.org/T151861#2887234 (Nuria)
[16:39:47] Analytics-Kanban, MediaWiki-Vagrant: Enable 'analytics_cluster' role on Labs instance - https://phabricator.wikimedia.org/T151861#2830159 (Nuria)
[16:40:39] Analytics-EventLogging, Analytics-Kanban: Various EventLogging schemas losing events since around September 8/9 - https://phabricator.wikimedia.org/T146840#2887240 (Milimetric)
[16:41:08] Analytics-Kanban: Various EventLogging schemas losing events since around September 8/9 - https://phabricator.wikimedia.org/T146840#2887242 (Nuria) Open>Resolved
[16:41:20] Analytics-Kanban: Various EventLogging schemas losing events since around September 8/9 - https://phabricator.wikimedia.org/T146840#2672617 (Nuria) a:Milimetric>Nuria
[16:41:57] Analytics-Kanban: Investigate duplicate EventLogging rows - https://phabricator.wikimedia.org/T142667#2887260 (Nuria) a:Milimetric>Nuria
[16:47:29] Analytics-Kanban, EventBus, Operations, Traffic, and 2 others: Productionize and deploy Public EventStreams - https://phabricator.wikimedia.org/T143925#2887270 (BBlack)
[16:49:24] Analytics: Investigate US traffic by state normalized by population - https://phabricator.wikimedia.org/T114469#1696431 (Nuria) Let's take time on Q3 to reserach whether this is still the case. Might be a bug on us redaing the header that stores the location of the external request
[16:54:10] Analytics, Analytics-Cluster, Epic: Epic: qchris transition - https://phabricator.wikimedia.org/T86135#2887284 (Nuria)
[16:54:13] Analytics: Hand off of Christian's MaxMind geolocation databases repository - https://phabricator.wikimedia.org/T89453#1036629 (Nuria) Open>Resolved
[16:54:50] Analytics: Hand off of Christian's MaxMind geolocation databases repository - https://phabricator.wikimedia.org/T89453#1036629 (Nuria) Resolved>Open
[16:54:52] Analytics, Analytics-Cluster, Epic: Epic: qchris transition - https://phabricator.wikimedia.org/T86135#962018 (Nuria)
[16:55:12] Analytics: Dashboard Directory- Hay collaboration - https://phabricator.wikimedia.org/T95172#2887292 (Nuria) Open>Resolved
[16:56:20] Analytics, Analytics-Wikistats: Design new UI for Wikistats 2.0 - https://phabricator.wikimedia.org/T140000#2887297 (Milimetric) This is interesting context: T95172, if anyone's watching that task this is the place to engage.
[17:03:00] Analytics: Put data needed for edits metrics through Event Bus into HDFS - https://phabricator.wikimedia.org/T131782#2178434 (Nuria)
[17:09:42] Analytics-Kanban: Add Analytics-Wikistats 2.0 phab project tag - https://phabricator.wikimedia.org/T146043#2887331 (Nuria) a:Milimetric>Nuria
[17:24:45] Analytics, RESTBase, Services, User-mobrovac: configure RESTBase pageview proxy to Analytics' cluster on wiki-specific domains - https://phabricator.wikimedia.org/T119094#2887410 (Nuria) @GWicke: What is the value proposition of this work? Consistancy? Doesn't seems like it would value from a per...
[17:30:22] ottomata: anything to add in the Analytics section?
[17:30:31] milimetric: Have the queries run better?
[17:31:12] AndyRussG|away: Heya, still 2 pysaprk hanging in cluster ;)
[17:33:02] joal: it's finishing the last one now!! :)
[17:33:28] Ah !
[17:46:23] Analytics-Kanban, Operations, Ops-Access-Requests, Patch-For-Review, User-Elukey: Requesting access to Analytics production shell for Francisco Dans - https://phabricator.wikimedia.org/T153303#2875954 (RobH) Please note the groups requested: researchers statistics-privatedata-users...
[17:59:05] Analytics, Pageviews-API, Reading-analysis: Skewed pageviews for Azerbaijani and Bulgarian Wikipedias, September, October and November 2016 - https://phabricator.wikimedia.org/T153699#2887617 (Amire80)
[18:02:18] milimetric can you add me again
[18:02:18] fdans: I am merging your credentials
[18:02:20] to batcave
[18:02:42] ashgrigas, I'll do it
[18:02:42] fdans: do you have your ssh config ready for prod access by any chance?
[18:03:29] Analytics: Get rid of Pageview-API -> Analytics auto-tagging - https://phabricator.wikimedia.org/T146042#2648650 (greg) That is done by this herald rule: https://phabricator.wikimedia.org/H126 It was requested in {T124086} I can disable that rule if you want.
[18:03:29] fdans, do you want to hangout to talk about Wikistats2.0?
[18:03:33] elukey: I created a second key for it, do you eed the public key?
[18:03:53] mforns I have a 1on1 with nuria now, sorry!
[18:03:57] fdans: it was the one that you put in the task right?
[18:04:08] elukey that's right :)
[18:04:14] ok :D
[18:04:30] but your ssh config would need to be updated
[18:04:37] like you did for the labs key
[18:04:41] fdans, np, if we're still in the cave after your meeting, you can come!
[18:05:07] ah snap I saw only now that you are in a meeting
[18:05:55] fdans: whenever you have finished, if you have time try to ssh to stat1002
[18:07:14] Analytics, Pageviews-API, Reading-analysis: Skewed pageviews for Azerbaijani and Bulgarian Wikipedias, September, October and November 2016 - https://phabricator.wikimedia.org/T153699#2887652 (Amire80)
[18:07:44] (your username is now pushed on stat100[234], the rest will be done by puppet during the next half an hour more or less)
[18:27:32] I am going afk, let's chat tomorrow morning if you are around!
[18:27:41] * elukey afk!
[18:28:53] latesr!
[18:29:13] fdans: if you need help wtih access stuff in the next few hours, i'll be around
[18:45:05] joal: ah hey sorry... It's just from one ipython session I left open in a tmux... Does it use up resources even when it's not doing anything? It's not a problem to shut it down more, just a very slight convenience for me to leave it open
[18:45:34] AndyRussG: When you leave it open it takes slots of ressources on the cluster
[18:45:51] AndyRussG: It's not a lot, but it's not nothing (compared to a process not doing anything)
[18:46:37] AndyRussG: Best would be for you to configure your shell to run in dynamic allocation mode: you'd get faster result when needed, and release resources
[18:49:24] AndyRussG: Do you want us to spent 5 minutes looking at that?
[18:52:05] ottomata elukey sorry guys, was batcavin' I'm going to try and ssh to stat1002 :)
[18:52:09] joal: hey yeah that'd be great! about 5 minutes from now OK? thx!!!
[18:52:50] Analytics-Wikistats, Editing-Analysis: Update active editor metrics to use consensus definition - https://phabricator.wikimedia.org/T153702#2887751 (Neil_P._Quinn_WMF)
[18:58:31] Analytics-Wikistats, Editing-Analysis: Update active editor metrics to use consensus definition - https://phabricator.wikimedia.org/T153702#2887781 (Nuria) Let's please update definition here too: https://meta.wikimedia.org/wiki/Research:Standard_metrics cc @Milimetric , @JAllemandou , @mforns
[18:58:31] joal: K anytime now... BTW somehow my tmux session is gone, can't reattach, but the pyspark shell processes are still there :(
[18:58:58] AndyRussG: I'm gonna kill it within hadoop :)
[18:59:12] Can you paste the command you use for launching your notebooks?
[18:59:15] AndyRussG: --^
[18:59:44] yep
[18:59:45] joal: deploying cluster in a bit, is that ok?
[19:00:22] Good for me nuria, will be gone for diner soon
[19:00:34] nuria: recall the jobs to restart?
[19:00:50] joaL; yes, pageview hourly and webrequest
[19:01:09] export IPYTHON_OPTS="notebook --pylab inline --port 8123 --ip='*' --no-browser"
[19:01:20] pyspark --master yarn --deploy-mode client --num-executors 2 --executor-memory 2g --executor-cores 2
[19:01:29] as per https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Spark#Spark_and_Ipython
[19:01:31] joal: ^
[19:01:36] joal: it's done, checking results now and transforming, making dashboards
[19:01:49] milimetric: Hurray !
[19:01:56] Analytics-Kanban, Discovery, Discovery-Analysis (Current work), Interactive-Sprint, Patch-For-Review: Add Maps tile usage counts as a Data Cube in Pivot - https://phabricator.wikimedia.org/T151832#2887804 (Nuria) Need to work on ops staff thsi week , moving this to paused.
[19:02:18] AndyRussG: Ok, let's try differently (and update the docs :)
[19:03:41] K u bet!
[19:03:44] thx much :)
[19:04:20] joal: summary of check: 738 wikis have daily metrics, a few wikis don't have newly_registered users (and therefore no new editors or surviving), and only 360 wikis have any bot stats
[19:04:35] sounds good enough to me to sync to thorium and display in a dashboard
[19:04:44] milimetric: That's awesome :)
[19:04:56] AndyRussG: Arf, wifey calling for diner !
[19:05:06] AndyRussG: Will update the docs after diner and ping you :)
[19:05:27] joal: no worries, I'm around for quite a while
[19:05:29] AndyRussG: Command for starting pyspark will change, that's all
[19:05:34] K
[19:05:36] later !
[19:05:40] cyas
[19:05:44] awesome milimetric :]
[19:13:07] milimetric: are there any jobs we need to stop when deployed the tsv fhen shui?
[19:13:07] * feng
[19:13:08] nuria: nope, the bundle that was running them has been killed and shouldn't be restarted ever
[19:13:09] I'm not sure what happens to it when we deploy and it's not even there
[19:13:09] I'm guessing you can't start it anymore
[19:13:10] milimetric: but job would need to be deleted no?
[19:13:10] milimetric: otherwise it will try to recurr
[19:13:10] nuria: the job was deleted from refinery
[19:13:10] and killed in hue
[19:13:10] so it shouldn't recur
[19:13:10] I didn't kill just a coordinator or just a workflow, I killed the whole bundle
[19:13:10] milimetric: right, but oozie still has it schedule to run , correct?
[19:13:10] no
[19:13:11] nuria: take a look at https://hue.wikimedia.org/oozie/list_oozie_bundles/
[19:13:16] the legacy_tsv one is killed there
[19:13:21] and not shown as scheduled
[19:13:24] milimetric: wait .. how not to? oozie doesn't check whether bundle has been killed once started and told to execute every n time, does it?
[19:13:40] after you kill something, you have to explicitly schedule it again with oozie
[19:13:49] milimetric: ahahah, you killed it before
[19:14:35] ottomata: again, cannot deploy as there is no space on 1027...ains
[19:14:56] ottomata all good, I got access to production \o/
[19:15:24] !
[19:16:34] nuria: i'm going to make a special /srv partition on analytics1027...
[19:16:40] ottomata: k
[19:19:58] Analytics-Kanban, Reading-Web-Backlog: mobile-safari has very few internally-referred pageviews - https://phabricator.wikimedia.org/T148780#2887873 (mforns) This issue regarding webrequest processing happened at the same exact time. Doesn't seem related with the Mobile Safari thing though, but listing it...
[19:26:13] ottomata: let me know when i can try to deploy again
[19:29:42] Analytics, Analytics-Cluster, Operations, Research-and-Data, and 2 others: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843#2887911 (DarTar) Note that @ellery is currently OoO and will respond when he's back. He's the primary driver for this request although other team me...
[19:30:14] a bit, i'm moving stuff from the old srv to the new mount
[19:31:39] (CR) Nuria: [C: 2] Lucene Stemmer UDF (3 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/326168 (https://phabricator.wikimedia.org/T148811) (owner: EBernhardson)
[19:32:16] (CR) Nuria: [] "Sorry, we need to correct 1 thing before the +2, will ping on irc." [analytics/refinery/source] - https://gerrit.wikimedia.org/r/326168 (https://phabricator.wikimedia.org/T148811) (owner: EBernhardson)
[19:35:24] nuria: go ahead
[19:35:34] ottomata: k, retrying
[19:36:52] ottomata: ahemm.. worked but too fast maybe? let me make sure code is there
[19:39:06] (CR) EBernhardson: [] Lucene Stemmer UDF (2 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/326168 (https://phabricator.wikimedia.org/T148811) (owner: EBernhardson)
[19:42:20] ottomata: on nuria@analytics1027:/srv/deployment/analytics/refinery$
[19:42:29] ottomata: doesn't look like code ws deployed
[19:43:40] ottomata: things are there on 1002
[19:43:49] ottomata: so i think eployment did not worked
[19:44:01] nuria: can I try deploy?
[19:44:09] ottomata: please
[19:44:42] ebernhardson: boy this thing with initialize and configure and spark and hive is sure a pain, eh?
[19:45:12] ebernhardson: i remember have gotten that kryo error but only on spark correct?
[19:45:19] (CR) EBernhardson: [] Lucene Stemmer UDF (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/326168 (https://phabricator.wikimedia.org/T148811) (owner: EBernhardson)
[19:45:37] nuria: looks like it worked for me
[19:45:38] oooh, mforns, are the "Editor Engagement" metrics that were recently added to Dashiki:CategorizedMetrics the ones that you and Helen were working on?
[19:45:48] milimetric, yes
[19:45:55] :) quick brainbounce in cave?
[19:45:59] one sec - bathroom
[19:45:59] sure
[19:46:07] no no no no, ok :]
[19:46:15] ebernhardson: I take my comment back what you did of not trying to serialize lucene is right
[19:46:32] ebernhardson: we need to fix teh multiple initialization in configure and initialize
[19:46:49] ottomata: k lemme see
[19:47:08] mforns: ok, in cave
[19:47:27] ottomata: indeed, scap must have a per user cache
[19:47:40] nuria: i think can drop the configure() method and that will be handled?
[19:48:00] ebernhardson: and do things work for you in spark?
[19:48:07] nuria: i'm going to compile and find out :)
[19:48:23] ebernhardson: i saw that method in *some* of our UDFs
[19:48:42] ebernhardson: which already looks a bit fishy , let me see if there are any docs
[19:49:06] ebernhardson: no
[19:49:14] ebernhardson: ahem.. no docs
[19:49:32] AndyRussG: Updated the docs: with the new command I pasted you'll have dynamic allocation up to 32 workers
[19:49:39] configure docs say "Additionally setup GenericUDF with MapredContext before initializing. This is only called in runtime of MapRedTask."
[19:49:42] joal: let me know when you have a sec
[19:49:54] here nuria, how may I help?
[19:49:55] so i suppose if spark doesn't have a MapRedTask, because it's something different, that makes sense
[19:50:25] joal: working with mr ebernhardson we run into this issue that udf that works on hive it no works on spark
[19:50:39] joal: unless we add an additional configure method
[19:50:48] cc ebernhardson
[19:50:59] joal: basically it creates some objects from org.apache.lucene that hive serialization fails to handle properly
[19:51:22] joal: i initialy tried to move the initializing of that into the configure(MapredContext ctx) method, but it seems that is never called from spark
[19:51:42] since the configure() is run on the final nodes, whereas initialize() is run on the source client
[19:52:01] ebernhardson: unfortunately spark and hive have different behaviors even if using same language and answering same functions
[19:52:22] ebernhardson: I'm not surprised some initi method defined in hive are not in spark
[19:52:35] nuria, do you remember when HTTPS was activated for all wikis?
[19:52:49] mforns: when? no, it will be on sal log though
[19:52:57] ok, will look
[19:52:58] mforns: and on meta
[19:53:05] joal: ok, does it seem reasonable to you then to implement KryoSerializable, and basically not write out the lucene stuff to serialization, and re-initialize it on read?
[19:53:17] but ebernhardson, some internet reading tells me it still should wor :(
[19:53:19] nuria, looked for it in wikitech, lo luck
[19:53:24] joal: hmm :S
[19:53:34] mforns: https://meta.wikimedia.org/wiki/Research:HTTPS_Transition_and_Article_Censorship
[19:53:47] ebernhardson: I'd rather go for initialisation check at usage and transient objects
[19:54:53] joal: hmm, well i can certainly check if the class property is null and initialize. Is there a guarantee that evaluate() will never be called in parallel?
[19:54:58] ebernhardson: scala deals with that using 'lazy val' --> only initialised at usage, therefore in executors, but in java I think going for explicit init check is needed
[19:55:32] mforns: june last year: https://blog.wikimedia.org/2015/06/12/securing-wikimedia-sites-with-https/
[19:55:50] ebernhardson: this is a formal garanty in hive (parallelisation is handled through mapreduce), and in spark as long as you don't use multiple cores per executior
[19:56:00] nuria, yes, but this looks like it was the beginning of a long process
[19:56:02] joal: but that leaves us managing threads rather than using initialize method right?
[19:56:14] mforns: it completed in weeks, not months though
[19:56:24] nuria, I see
[19:56:26] nuria: no thread thing, just a trnasient object that you check and initialise if null
[19:56:59] ebernhardson: do you have a jar with your udf defined?
[19:57:18] joal: wait, but we would do that as part of "evaluate" method? that seems that would incur in the object being initialized multiple times
[19:57:33] joal: thus the instance of initialize method correct?
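The pattern joal suggests here (a field that is skipped during serialization and rebuilt on first use, which is what Scala's `lazy val` provides for free) can be sketched in plain Java. This is a minimal sketch under stated assumptions: it uses JDK serialization as a stand-in for the Kryo serialization Hive and Spark use, assumes the serializer skips `transient` fields, and `HeavyAnalyzer`, `StemmerLikeUdf` and `TransientDemo` are hypothetical stand-ins for the Lucene analyzer and `StemmerUDF`, not refinery code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Locale;

// Hypothetical stand-in for a Lucene analyzer: costly to build and NOT Serializable.
class HeavyAnalyzer {
    String stem(String text) {
        // Toy "stemming": lowercase each word and strip a trailing "ing".
        StringBuilder out = new StringBuilder();
        for (String word : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (out.length() > 0) out.append(' ');
            out.append(word.endsWith("ing") ? word.substring(0, word.length() - 3) : word);
        }
        return out.toString();
    }
}

// Hypothetical UDF-like class that gets serialized and shipped to workers.
class StemmerLikeUdf implements Serializable {
    private static final long serialVersionUID = 1L;

    // transient: skipped by serialization, so it is null again after deserialization.
    // Not final, because it must be assignable after construction.
    private transient HeavyAnalyzer analyzer;

    // Explicit null check: the Java counterpart of Scala's `lazy val`.
    // Not synchronized: per the log, Hive (and Spark with one core per executor)
    // does not call evaluate() in parallel on one instance.
    private HeavyAnalyzer analyzer() {
        if (analyzer == null) {
            analyzer = new HeavyAnalyzer();
        }
        return analyzer;
    }

    String evaluate(String text) {
        return analyzer().stem(text);
    }

    boolean isInitialized() {
        return analyzer != null;
    }
}

class TransientDemo {
    // Round-trips an object through JDK serialization, like shipping it to a worker.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (T) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        StemmerLikeUdf udf = new StemmerLikeUdf();
        System.out.println(udf.evaluate("this is a stemming test")); // builds the analyzer
        // Without `transient`, this would throw NotSerializableException (analyzer is set).
        StemmerLikeUdf shipped = roundTrip(udf);
        System.out.println(shipped.isInitialized()); // false: the field was not shipped
        System.out.println(shipped.evaluate("this is a stemming test")); // rebuilt lazily
    }
}
```

The cost of the null check is one comparison per row; the analyzer itself is built once per deserialized instance, which matches joal's "once per mapper or reducer".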
[19:57:42] nuria: once per mapper or reducer, which ios what you'd expect
[19:57:44] joal: at least that is teh paradigme we have used before
[19:57:54] *the
[19:58:27] joal: stat1002.eqiad.wmnet:/home/ebernhardson/refinery-hive-0.0.39-SNAPSHOT.jar, that's the one that has a few extra hacks (setup in initialize(), configure() and kryo-desserialization) that duplicates work but doesn't fail
[19:58:36] doesn't appear to fail anyways
[19:58:53] hm
[19:59:07] joal: evaluate is called once per row though, rightttt? not once per mapper
[19:59:12] do you have one with initialize only?
[19:59:17] nuria: i think it ust didn't work that one time? dunno :)
[19:59:25] joal: i can compile one fairly quickly, sec
[19:59:29] thanks
[20:00:27] nuria: initialisation in UDFs is called only once, but is only supposed to work on objectInspector
[20:00:58] joal: right, we changed udf so that would be the case, object inspector all arround
[20:01:01] nuria: configure is called per map or reduce, therefore it's in there you should initialise the object
[20:01:05] ebernhardson: --^
[20:01:31] ebernhardson: so, finally: no init in init, but in configure, and no need for kryo (normally)
[20:01:33] joal: right, and that worked in hive, but then i loaded the jar into spark and it NPE'd due to being uninitialized
[20:01:37] joal: ah i see, i missunderstood, i though you mean "evaluate', right
[20:01:57] nuria: for spark, we might have to go through evaluate check
[20:02:12] this will probably be used through hive, but i was trying in spark as well to cover my bases
[20:02:18] joal: boy ... if so we should document that throughly
[20:02:19] if spark doesn't handle the configure correctly, we'll have to do it on our own
[20:02:25] joal: i will do that
[20:02:35] joal: the docs i mean
[20:02:45] let's test first :)
[20:02:48] ebernhardson: i can update code with results of your tests
[20:02:59] joal: compiled into stat1002.eqiad.wmnet:/home/ebernhardson/refinery-hive-init-only-0.0.39-SNAPSHOT.jar
[20:03:10] joal: but most udfs use the initialize method, i wonder if we have not used those in spark before
[20:04:29] nuria: we have, but almost none of them uses objects - one of the only I can think of is GeocodedData, and it uses configure to set up its internal object
[20:04:46] joal: ahhhhhh yes, indeed
[20:04:53] ebernhardson: triple checking: your jar has initialisation in configure?
[20:05:16] joal: doh, no this is in initialize()
[20:05:27] ebernhardson: sorry, would you mind?
[20:05:31] sure, sec
[20:07:54] joal: replaced, same file
[20:08:12] ebernhardson: great - one last thing: an example of usage?
[20:08:56] joal: create temporary function stemmer as 'org.wikimedia.analytics.refinery.hive.StemmerUDF'; select stemmer('this is a stemming test', 'en');
[20:09:41] hm ... ebernhardson --> This version NPE on hive:(
[20:10:23] (PS1) Milimetric: Updating Standard Metrics Beta [analytics/analytics.wikimedia.org] - https://gerrit.wikimedia.org/r/328216
[20:10:36] (CR) Milimetric: [V: 2 C: 2] Updating Standard Metrics Beta [analytics/analytics.wikimedia.org] - https://gerrit.wikimedia.org/r/328216 (owner: Milimetric)
[20:11:14] !log deployed analytics/refinery to cluster (2nd try)
[20:11:15] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[20:11:21] joal: hmm, looking
[20:14:30] ebernhardson: did your stemmer exist before ?
[20:14:53] cause when I launch hive I have: "Registering function english_stemmer org.wikimedia.analytics.refinery.hive.Stemmer"
[20:15:04] even without doing anything
[20:15:22] joal: hmm, that was an initial test, when it only supported a single language. must have forgot to add temporary to my add function call at some point
[20:18:24] oddly it doesn't seem to want to let me drop function, it claims to work but doesn't
[20:20:13] ebernhardson: because it doesn't find the initial class (renamed with UDF at the end)
[20:21:22] compiled up my PS7, which worked last night, sec
[20:21:56] NPE :S how odd
[20:22:05] ottomata: ahem.. in order NOT to screw up restarts this time (for a change)
[20:22:21] ottomata: this is the last sucessful pageview hourly job: https://hue.wikimedia.org/oozie/list_oozie_workflow/0040150-161121120201437-oozie-oozi-W/
[20:23:23] ottomata: sorry coordinator: https://hue.wikimedia.org/oozie/list_oozie_coordinator/0014015-161020124223818-oozie-oozi-C/
[20:23:52] ottomata: I kill and restart at 18:00? correct?
[20:24:53] looking
[20:25:11] nuria: are you just restarting pageview jobs?
[20:26:25] ACK!
[20:26:32] i forgot to reenable puppet and camus on an27
[20:26:40] joal: i can't see why this is NPE :S it's because this.defaultAnalyzer is null, but from the code (and i ran a mvn clean to make sure nothing funny was happening) that can't possibly be null if configure() was run: https://gerrit.wikimedia.org/r/#/c/326168/7/refinery-hive/src/main/java/org/wikimedia/analytics/refinery/hive/StemmerUDF.java
[20:26:43] i betcha we are going to get SLA oozie email alerts
[20:27:34] ebernhardson: please try with getting data from a table: select stemmer(uri_host, 'en') from wmf.webrequest where webrequest_source = 'misc' and year = 2016 and month = 12 and day = 18 and hour = 18 limit 100;
[20:27:47] ebernhardson: when input is constant, hive behaves bizzare
[20:29:55] Analytics, RESTBase, Services, User-mobrovac: configure RESTBase pageview proxy to Analytics' cluster on wiki-specific domains - https://phabricator.wikimedia.org/T119094#2888146 (GWicke) >>! In T119094#2887410, @Nuria wrote: > @GWicke: What is the value proposition of this work? Consistancy? Doe...
[20:29:59] joal: ahh, ok i think i know what you are getting at. I ran a query that forces it to start a map-red instead of running on the client
[20:30:06] now waiting
[20:30:16] you got my point ebernhardson :)
[20:30:18] thanks :)
[20:30:53] yea no npe running as a map-red, so it wasn't a spark specific problem it was probably just spark running it in the client
[20:31:09] ebernhardson: what that means for us is we need to intialise both in init and configure, checking for null at init not to do it twice
[20:31:11] milimetric mforns fdans updated the spreadsheet with a summary stats tab https://docs.google.com/spreadsheets/d/1M5m6z7YWSqxB3rqQstxPVBK-J5s70p1XMrTN7gYetC8/edit?usp=sharing
[20:31:34] please add any i missed by weds if possible and we can review with nuria
[20:31:45] thanks ashgrigas, will do
[20:31:50] thx ashgrigas !
[20:31:52] thank you ashgrigas !!
[20:32:06] joal: well, the thing is if i set it up in initialize, then serialization fails. So i end up needing to implement kryo read() and write() methods to not serialize the lucene stuff. sounds ok?
[20:32:28] ebernhardson: hm, in spark only, right?
[20:32:58] ebernhardson: we don't want to try to serialize this thing
[20:33:26] joal: hive was the one that turned up the serialization problem
[20:33:39] seems it initializes on the client, then serializes the udf to send to the mapper/reducer
[20:33:41] ebernhardson: let's make the object transient and not final (for transience), and make sure it gets initialised with and without mapreduce
[20:33:45] makes sense?
[20:33:46] ok that seems reasonable
[20:33:58] cool :)
[20:40:27] joal: i do not understand what you mean by transient...
[20:40:51] nuria: transient means not serialized when the parent is serializable
[20:41:31] joal: ah sorry, i thought it was something completely different .. !!!
[20:41:34] nuria: are you looking to just restart pageview oozie jobs?
[20:41:46] ottomata: i was going to restart those yest
[20:41:48] *yes
[20:43:17] ok, but not webrequest etc. ones?
[20:43:22] then yeah, i think that is the right time
[20:44:43] ottomata: yes, webrequest too
[20:47:07] oh
[20:47:11] they may have different times
[20:47:46] all 4 webrequest coords need to be started from 19:00
[20:47:47] atm
[20:47:53] ottomata: right, those i have not touched
[20:48:19] joal: Houston, there is a problem: "Error: E0701 : E0701: XML schema error, The prefix "sla" for element "sla:info" is not bound"
[20:48:36] nuria: checking
[20:48:59] my bas nuria
[20:49:02] my bad nuria
[20:49:15] nuria: how do we handle? live correct? patch ?
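The fix the chat converged on (a `transient`, non-final field that is rebuilt wherever the UDF actually runs) is sketched below. It uses plain `java.io` serialization as a stand-in for the Kryo serialization Hive applies when shipping the UDF from the client to the mappers/reducers, assuming the serializer skips `transient` fields, which matches what the chat later reports as working. `TransientSketch` and its methods are hypothetical names, and the lambda field stands in for the non-serializable Lucene objects.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Locale;
import java.util.function.UnaryOperator;

/** Hypothetical stand-in for a serializable UDF holding non-serializable state. */
public class TransientSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    // Plain configuration: serialized and shipped with the object.
    private final String language;

    // Stand-in for the Lucene analyzer. A plain lambda is not Serializable, so
    // without `transient` writeObject() would throw NotSerializableException.
    // It is transient and NOT final, so it can be rebuilt after deserialization.
    private transient UnaryOperator<String> analyzer;

    public TransientSketch(String language) {
        this.language = language;
    }

    // Lazily (re)builds the analyzer; it is null again after deserialization.
    private UnaryOperator<String> analyzer() {
        if (analyzer == null) {
            analyzer = s -> s.toLowerCase(Locale.ROOT); // placeholder, not stemming
        }
        return analyzer;
    }

    public String evaluate(String input) {
        return analyzer().apply(input);
    }

    public boolean hasAnalyzer() {
        return analyzer != null;
    }

    public String getLanguage() {
        return language;
    }

    // Serializes then deserializes, mimicking the client-to-task hop.
    public static <T extends Serializable> T roundTrip(T obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                @SuppressWarnings("unchecked")
                T copy = (T) in.readObject();
                return copy;
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        TransientSketch udf = new TransientSketch("en");
        udf.evaluate("Warm");                   // analyzer built on the "client"
        TransientSketch copy = roundTrip(udf);  // shipped to a "task"
        System.out.println(copy.hasAnalyzer()); // false: transient field skipped
        System.out.println(copy.getLanguage()); // en: normal field restored
        System.out.println(copy.evaluate("UP")); // up: analyzer rebuilt lazily
    }
}
```

The field cannot be `final` because a final field could not be reassigned once deserialization leaves it null; that is why joal asks for "transient and not final".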
[20:49:16] joal: np, let's just work on fixing it
[20:49:25] joal: ya, submit a patch
[20:49:39] my bad TOO cause I CR_ed
[20:51:32] joal: are we missing this: xmlns:sla="uri:oozie:sla:0.1"
[20:51:43] correct, 0.2 instead of 1
[20:52:11] joal: ah, k, easy to fix
[20:52:22] Joal: want me to submit and you can CR?
[20:52:34] nuria: please, I'm fighting with git :)
[20:52:41] joal: k, np
[20:56:13] (PS1) Nuria: Adding namespace for sla tags [analytics/refinery] - https://gerrit.wikimedia.org/r/328227 (https://phabricator.wikimedia.org/T152109)
[20:56:48] (CR) Joal: [V: 2 C: 2] "Thanks nuria ! Merged" [analytics/refinery] - https://gerrit.wikimedia.org/r/328227 (https://phabricator.wikimedia.org/T152109) (owner: Nuria)
[20:56:49] joal: i love how RETRO this error is: https://gerrit.wikimedia.org/r/#/c/328227/1/oozie/pageview/hourly/coordinator.xml
[20:57:00] joal: back to the nineties
[20:57:10] nuria: don't even mention that ;)
[20:57:29] nuria: merged
[20:57:48] joal: k, deploying again
[20:58:45] (PS9) EBernhardson: Lucene Stemmer UDF [analytics/refinery/source] - https://gerrit.wikimedia.org/r/326168 (https://phabricator.wikimedia.org/T148811)
[21:09:32] ebernhardson: any chance?
[21:11:30] joal: restarted pageview jobs with success (i think) https://hue.wikimedia.org/oozie/list_oozie_coordinator/0040252-161121120201437-oozie-oozi-C/
[21:11:38] joal: yea it appears to work in hive and spark now
[21:11:45] ebernhardson: SUCCESS !
[21:11:54] today i learned what transient does in java ;)
[21:11:57] nuria: SUCCESS AS WELL !
[21:12:26] ebernhardson: but the null check for reinitialization is missing though, right? i can submit that in a bit
[21:12:36] ebernhardson: as I said, in scala it's a bit easier since you can define your object to be lazy loaded :)
[21:12:56] nuria: oh, right, configure() should do a null check to make sure it doesn't duplicate the work
[21:12:58] anyway, going to bed now :)
[21:13:01] joal: thanks!
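The E0701 error is what a namespace-aware XML parser reports when `sla:info` uses the `sla` prefix but no `xmlns:sla` declaration binds it. A cheap local "dry run" for that class of mistake can be sketched with the JDK's built-in DOM parser. The XML here is a trimmed, hypothetical coordinator fragment, not the refinery file, and `SlaNamespaceCheck` is an invented name; the check only catches well-formedness and unbound-prefix errors and does not validate against Oozie's XML schemas.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SlaNamespaceCheck {
    // Hypothetical, trimmed coordinator body that uses the sla: prefix.
    static final String BODY =
            "<action><workflow><app-path>hdfs:///hypothetical/path</app-path></workflow>"
            + "<sla:info><sla:nominal-time>${coord:nominalTime()}</sla:nominal-time></sla:info>"
            + "</action>";

    // Missing xmlns:sla, as in the failing deploy.
    static final String BROKEN =
            "<coordinator-app xmlns=\"uri:oozie:coordinator:0.4\">" + BODY + "</coordinator-app>";

    // Prefix bound to the 0.2 SLA namespace, as in the fix.
    static final String FIXED =
            "<coordinator-app xmlns=\"uri:oozie:coordinator:0.4\" xmlns:sla=\"uri:oozie:sla:0.2\">"
            + BODY + "</coordinator-app>";

    /** Returns true if the XML is well-formed and every prefix is bound. */
    public static boolean parsesCleanly(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // unbound prefixes become fatal errors
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors and ignores warnings,
            // so nothing extra is printed to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            System.out.println("Rejected: " + e.getMessage());
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parsesCleanly(BROKEN)); // false (after the "Rejected" line)
        System.out.println(parsesCleanly(FIXED));  // true
    }
}
```

A parser accepts any namespace URI, so this would not have caught a wrong version number such as `0.1`; it only guards against the prefix being left undeclared.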
[21:13:06] no prob ebernhardson :)
[21:13:11] pleasure to help
[21:13:18] ebernhardson: right
[21:13:29] sorry for the duplicate deploy nuria, beer next january :)
[21:14:14] joal: np, let's make a note to always dry-run xml and that's it, but really, no big deal
[21:14:59] nuria: ahem, I **sometimes** test my code
[21:15:29] joal: great to hear.....
[21:15:50] joal: am i restarting all webrequest partitions?
[21:15:56] text, upload...
[21:15:58] nuria: bundle
[21:16:45] joal: but wait (we can talk about this tomorrow) marcel's changes only apply to webrequest statistics
[21:17:03] joal: https://gerrit.wikimedia.org/r/#/c/321676/
[21:17:08] nuria: this is applied on the bundle managed job - so we restart the bundle :)
[21:17:34] joal: k, this one right? https://hue.wikimedia.org/oozie/list_oozie_bundle/0011555-161121120201437-oozie-oozi-B
[21:18:06] nuria: correct ! be careful with times ;)
[21:18:30] joal: will restart at 20:00
[21:19:05] nope nuria, you should restart at 19
[21:19:28] joal: k, thank you
[21:19:49] nuria: same as last time - small partitions are waiting on hour 20, but big ones have not finished 19 - so 19 is to rerun
[21:19:59] np nuria :)
[21:20:03] joal: let me add that to docs
[21:21:02] nuria: https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Oozie/Administration#When_.properties_file_has_changed
[21:21:07] first of the list
[21:21:39] gone for real :)
[21:21:47] joal: ya but i think we need a more pointed example, let me add one ciaooo
[21:27:09] ebernhardson: should i submit the change and we do a last round of testing?
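joal's rule at [21:19:49] amounts to restarting the bundle from the earliest hour that any of its coordinators still has pending. A minimal sketch of that rule follows. The class name, the date, and the pending hours are illustrative values, not read from the cluster, and the source names assume the four webrequest sources at the time were text, upload, misc and maps.

```java
import java.time.LocalDateTime;
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

public class BundleRestartTime {
    /**
     * Earliest hour some coordinator in the bundle still has to process.
     * Restarting from here may rerun an hour the fast coordinators already
     * finished, but never skips an hour for the slow ones.
     */
    public static LocalDateTime restartFrom(Map<String, LocalDateTime> nextPendingHour) {
        if (nextPendingHour.isEmpty()) {
            throw new IllegalArgumentException("bundle has no coordinators");
        }
        return Collections.min(nextPendingHour.values());
    }

    public static void main(String[] args) {
        // Illustrative values only: small sources are waiting on hour 20,
        // big ones have not finished hour 19 yet.
        Map<String, LocalDateTime> pending = new TreeMap<>();
        pending.put("misc", LocalDateTime.of(2016, 12, 19, 20, 0));
        pending.put("maps", LocalDateTime.of(2016, 12, 19, 20, 0));
        pending.put("text", LocalDateTime.of(2016, 12, 19, 19, 0));
        pending.put("upload", LocalDateTime.of(2016, 12, 19, 19, 0));
        System.out.println(restartFrom(pending)); // 2016-12-19T19:00
    }
}
```

Taking the minimum is the safe direction: restarting at 20:00, as first proposed, would have skipped hour 19 for the big sources, while restarting at 19:00 only reprocesses an hour the small sources had already completed.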
[21:27:37] nuria: sounds good
[21:27:41] !log deployed analytics refinery, restarted webrequest load and pageview_hourly jobs
[21:27:44] ebernhardson: k
[21:28:06] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[21:40:42] (PS10) Nuria: Lucene Stemmer UDF [analytics/refinery/source] - https://gerrit.wikimedia.org/r/326168 (https://phabricator.wikimedia.org/T148811) (owner: EBernhardson)
[21:44:06] (CR) Nuria: [] Lucene Stemmer UDF (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/326168 (https://phabricator.wikimedia.org/T148811) (owner: EBernhardson)
[21:44:41] (PS11) Nuria: Lucene Stemmer UDF [analytics/refinery/source] - https://gerrit.wikimedia.org/r/326168 (https://phabricator.wikimedia.org/T148811) (owner: EBernhardson)
[21:45:37] ebernhardson: pushed, thanks for your patience
[21:49:45] nuria: thanks for the help! trying now
[21:53:18] nuria: seems to work everywhere
[22:20:02] Analytics-Kanban: Create 1-off tsv files that dashiki would source with standard metrics from datalake - https://phabricator.wikimedia.org/T152034#2836112 (Milimetric) https://analytics.wikimedia.org/dashboards/standard-metrics/
[22:59:00] (CR) EBernhardson: [C: 1] "tested in spark and hive, looks to work as expected" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/326168 (https://phabricator.wikimedia.org/T148811) (owner: EBernhardson)