[00:17:55] (CR) Ottomata: "So easy, I like :)" (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/215628 (owner: Joal)
[12:06:30] joal: morning!
[12:07:19] I enjoyed playing with Scala but I should probably ask for the right way to do this :)
[13:15:50] Hi milimetric
[13:16:46] There is no "best way :)"
[13:16:57] I'll tell you mine, and let you judge
[13:17:30] halfak: Hi
[13:17:42] halfak: Dumped 5T out of my folder
[13:21:14] o/ joal
[13:21:22] Woot Thanks. I'm not sure how much I'm using.
[13:21:26] I'll need to check.
[13:21:47] Hm, I think about 13T if I'm not mistaken :)
[13:21:57] Woops.
[13:22:06] huhhu :)
[13:22:10] There must be some things in there for me to clean up :)
[13:22:15] But you handle raw data, so it's fair ;)
[13:29:24] halfak, so, I'm presenting a thing today I think you'll really like
[13:29:26] * Ironholds schemes
[13:29:47] it's a way to add some humanity to the algorithmic approach to recommender systems that Bob's working on
[13:29:55] (that's not its primary goal, but it's a potential extension)
[13:30:19] Ironholds: I love the way you think about algorithms governing our lives: with some humanity :)
[13:30:55] joal, every interview I have done this week, the candidate has asked "what sort of person do you think would excel in this role?"
[13:31:12] and the two things I have consistently said are "the ability to adapt" and "the ability to remember there are humans behind the numbers"
[13:31:25] :)
[13:31:36] * joal is the numbers
[13:44:58] Ironholds, oooo. Is this coming to the Research Group meeting?
[13:47:20] Hey milimetric
[13:47:25] Still in IRC trouble ?
[13:47:45] halfak, it is!
[13:48:04] and I'm using the design language you came up with as part of session reconstruction to describe bits of it!
[13:48:50] Woot!
[15:01:51] joal: wanna hang out in the batcave after a break?
[15:01:58] I wanna debrief and also go over scala stuff
[15:02:03] dr0ptp4kt: sure :)
[15:02:14] oops milimetric : sure !
[15:04:45] OMW !
[15:05:44] milimetric: turn the sound on ;)
[15:38:41] hello joal and milimetric. joal, i know you meant milimetric. but good tidings to you both.
[15:39:00] dr0ptp4kt: stand up time now, I'll get back to you when I have a minute :)
[15:42:45] good tidings, sir, great! tidings
[15:44:52] milimetric: and you already have access to this box, woo!
[15:44:55] analytics1004.eqiad.wmnet
[15:45:47] halfak, update; got delayed on my slides by kicking off a small revolution
[15:46:30] I am Okay with this
[15:48:17] ottomata: so would i just set up EL manually there?
[15:48:25] or should I do some puppet thing?
[15:48:28] yes, in your home dir if possible
[15:48:32] oh, ok
[15:48:37] do you need mysql?
[15:48:40] server?
[15:48:45] mmm... don't think so
[15:48:47] ok good
[15:48:59] then ya, i think you can just check out eventlogging in your home dir and manually start services and do your thang
[15:49:13] well, at least it wouldn't be a very good test 'cause the performance of that insertion depends on the actual server, so we'd need to load test the actual prod server
[15:49:21] cool, will do
[15:49:24] ?
[15:49:30] wait, without puppet there won't be any "services"
[15:49:40] no, you can start via the cli
[15:49:44] so you mean just start the stuff in bin/
[15:50:38] yes
[16:12:18] joal: so I got a "Task not serializable" error
[16:13:02] this seems like one of those things that'll be much easier if you tell me what I did wrong :)
[16:13:26] I tried:
[16:13:26] val tops = counts.rdd.map(
[16:13:26] r => (r.getString(0), (r.getString(1), r.getLong(2)))
[16:13:26] ).aggregateByKey(Vector.empty[(String, Long)])(topMap, topRed)
[16:14:56] joal: workplace/refinery-source/ has the jars
[16:15:02] brb, lunch
[16:21:29] (PS1) KartikMistry: Add languages to deploy on 20150604 [analytics/limn-language-data] - https://gerrit.wikimedia.org/r/215930
[16:25:10] (CR) KartikMistry: [C: 2] Add languages to deploy on 20150604 [analytics/limn-language-data] - https://gerrit.wikimedia.org/r/215930 (owner: KartikMistry)
[16:25:16] (Merged) jenkins-bot: Add languages to deploy on 20150604 [analytics/limn-language-data] - https://gerrit.wikimedia.org/r/215930 (owner: KartikMistry)
[16:27:35] milimetric: Have you put your code in a class ?
[16:32:47] joal, no, just defined those methods and used them in the chain of transformations like you did
[16:32:56] is it supposed to be in a class?
[16:37:01] milimetric: nope
[16:37:05] That's weird :(
[16:37:37] Maybe because of the definition of the previous class
[16:37:38] maybe after the staff meeting we can hang out and screenshare
[16:37:43] Sure :)
[16:37:50] the TopsByKey thing?
[16:37:54] I think maybe restarting the shell could help
[16:38:03] ok, cool
[16:38:14] or clean context, but I don't know how to do that
[16:46:01] joal: your pageview job is spark, right?
[16:50:35] ottomata: correct
[16:50:42] for the moment at least
[16:51:09] ottomata: why ?
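A minimal sketch of what `topMap` and `topRed` might look like, since the log names them but never shows their bodies. Everything here is an assumption made for illustration: the `TopsSketch` object, the cutoff `N`, and the local `topsByKey` helper, which only mimics `aggregateByKey` on plain Scala collections so the example runs without a cluster. Keeping the two functions in a standalone `object` is one common way to stop a Spark closure from dragging in a non-serializable enclosing instance, a frequent cause of "Task not serializable"; whether that was the cause here is not confirmed by the log.

```scala
// Hypothetical helpers: the log mentions topMap and topRed but not their code.
object TopsSketch {
  type Top = Vector[(String, Long)]

  val N = 3 // assumed cutoff, chosen only for this illustration
  val empty: Top = Vector.empty

  // seqOp: fold one (page, count) record into a partial top-N list
  def topMap(acc: Top, rec: (String, Long)): Top =
    (acc :+ rec).sortBy(-_._2).take(N)

  // combOp: merge two partial top-N lists into one
  def topRed(a: Top, b: Top): Top =
    (a ++ b).sortBy(-_._2).take(N)

  // Local stand-in for rdd.aggregateByKey(empty)(topMap, topRed):
  // each key's values are split in two to imitate two partitions,
  // folded separately with topMap, then merged with topRed.
  def topsByKey(rows: Seq[(String, (String, Long))]): Map[String, Top] =
    rows.groupBy(_._1).map { case (key, group) =>
      val (left, right) = group.map(_._2).splitAt(group.size / 2)
      key -> topRed(left.foldLeft(empty)(topMap), right.foldLeft(empty)(topMap))
    }
}
```

With rows such as `Seq(("en", ("A", 5L)), ("en", ("B", 9L)), ("fr", ("C", 1L)))`, `TopsSketch.topsByKey(rows)("en")` yields the `en` pages ordered by descending count. On a real RDD the call from the log would stay the same, passing `TopsSketch.topMap` and `TopsSketch.topRed`.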
[16:51:19] And actually, I have two jobs running now
[16:51:47] ottomata: I will change the second one to use the data from the first
[16:55:39] ottomata, madhuvishy : something that really speeds up the task processing is data locality (as expected)
[16:56:11] Having at least as many workers as nodes makes sense for very big jobs (monthly ones)
[17:14:09] ottomata, you around?
[17:19:21] meeting!
[17:19:24] but yes after
[17:24:01] ottomata, okie, cool :). Gotta bug.
[17:29:23] Ironholds: waassuuup
[17:29:24] ?
[17:29:36] dr0ptp4kt: Heya
[17:29:40] Got some time now
[17:29:43] ottomata, okay! So
[17:29:55] building my next UDFs
[17:29:57] mvn test
[17:29:58] [ERROR] Plugin org.apache.maven.plugins:maven-enforcer-plugin:1.0 or one of its dependencies could not be resolved: Failed to read artifact descriptor for org.apache.maven.plugins:maven-enforcer-plugin:jar:1.0: Could not transfer artifact org.apache.maven.plugins:maven-enforcer-plugin:pom:1.0 from/to wmf-mirrored (https://archiva.wikimedia.org/repository/mirrored/): peer not authenticated -> [Help 1]
[17:30:07] * Ironholds jazz hands
[17:32:10] ottomata: Big difference between april and may webrequests !
[17:32:58] joal: oh?
[17:33:19] 18th, hour 19 --> +37%
[17:33:35] Really small misc number
[17:33:42] in may
[17:34:16] mobile and upload + 20%, text + 270% !!!!!
[17:34:34] Ironholds: delete your ~/.m2/https://archiva.wikimedia.org/#
[17:34:36] ACK
[17:34:46] ~/.m2/repository/org/apache/maven/plugins
[17:34:48] rm that
[17:34:49] and try again
[17:35:40] bits?
[17:35:55] 4%
[17:35:59] ottomata, same problem :(
[17:36:20] 4% increase in bits and 270% increase in text?!
[17:36:26] everything increase?
[17:36:27] hm
[17:36:33] really? hm.
[17:36:34] nope, bits: decrease 96%
[17:36:37] AH
[17:36:45] that is more interesting!
[17:36:48] But still, doesn't cover for the change !
[17:36:50] yeah
[17:38:00] Ironholds: delete it again and try now
[17:38:02] oh
[17:38:06] peer not authenticated...
[17:38:13] try again i'm watching logs
[17:39:58] ottomata, okay, trying again
[17:40:06] ottomata, done
[17:40:11] same fail. Anything server-side?
[17:40:42] hmmm, you didn't hit archiva...looks like your maven is unhappy about ssl or something
[17:40:43] hmmm
[17:42:18] joal: sorry I missed your previous messages. how many nodes do we have
[17:42:19] 20G webrequest_source=bits/year=2015/month=4/day=20/hour=19
[17:42:19] 13G webrequest_source=text/year=2015/month=4/day=20/hour=19
[17:42:19] 448M webrequest_source=bits/year=2015/month=5/day=20/hour=19
[17:42:19] 41G webrequest_source=text/year=2015/month=5/day=20/hour=19
[17:42:28] madhuvishy: np
[17:42:36] I think we have roughly 20 nodes
[17:42:39] so bits+text in april: 33G
[17:42:46] bits + text in may, 41.5 G
[17:42:54] yup
[17:42:57] ottomata: --^
[17:43:12] joal: right. i was trying with 40 workers and 2g memory for each.
[17:43:16] So launching for a month with 32 workers or more is cool
[17:43:24] yop
[17:43:34] ottomata: Trying with another sample
[17:43:36] so 25% increase total?
[17:43:41] 37%
[17:43:52] (41.5-33)/ 33
[17:43:52] ?
[17:44:04] no, counting rows :)
[17:44:06] oh
[17:44:08] aye
[17:44:40] Now getting 25th, hour 6
[17:45:00] hm, joal maybe lots of redirects, if bits is now gone?
[17:45:05] probably lots of cached urls out there
[17:45:57] roughly the same results with day 25th, hour 6
[17:46:10] +32% increase in row count global
[17:46:18] Guys, need to go
[17:46:36] k, thanks joal, am talking to brandon and or i about it
[17:46:51] Ironholds: do you have a ~/.m2/settings.xml file
[17:46:54] and, if so, what's in it?
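A small sketch of the size arithmetic from the exchange above, using only the partition sizes pasted in the log (20G and 13G for April, 448M and 41G for May). The `GrowthSketch` object and `pctChange` helper are names invented for this illustration. Note that these are on-disk sizes, while the 37%, 270% and 96% figures quoted in the conversation come from row counts, so the two sets of numbers are not expected to agree.

```scala
// Hypothetical helper for comparing the April and May partition sizes.
object GrowthSketch {
  // Relative change from `before` to `after`, in percent
  def pctChange(before: Double, after: Double): Double =
    (after - before) / before * 100.0

  // Sizes in GB as pasted in the log; 448M is taken as 448 / 1024 GB
  val aprilBits = 20.0
  val aprilText = 13.0
  val mayBits = 448.0 / 1024.0
  val mayText = 41.0

  // Combined bits + text growth, using the rounded 33G and 41.5G from the chat
  val combined = pctChange(33.0, 41.5) // about +25.8%
  val textOnly = pctChange(aprilText, mayText) // about +215%
  val bitsOnly = pctChange(aprilBits, mayBits) // about -97.8%
}
```

The combined figure matches the "so 25% increase total?" estimate in the chat; the larger row-count increase suggests the new rows are smaller on average, which the log does not confirm either way.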
[17:47:45] See you all tomorrow
[17:47:47] joal: did your 30 day job finish - if not do you have the application id - i'll watch it
[17:48:20] madhuvishy: 30 days finished for datafiltering, currently halfway over list generation
[17:48:40] I'll keep it running, and will come back after dinner for a short check :)
[17:48:49] Thanks ottomata
[17:48:54] joal: okay :) don't worry about it. good night! :)
[17:49:02] dr0ptp4kt: Sorry, we missed each other today
[17:49:09] Maybe tomorrow :)
[17:49:10] Bye all
[17:54:31] ottomata, I will check
[17:54:53] ottomata, nope!
[17:55:30] Ironholds: watching more logs, try again please
[18:00:16] did you just do it Ironholds?
[18:00:51] ottomata, nope?
[18:01:06] doing it...nnnnow
[18:01:13] done.
[18:03:03] uhh, weird
[18:03:25] Ironholds: just so I am sure this request is yours
[18:03:27] what is your public IP right now?
[18:03:36] https://www.google.com/webhp?sourceid=chrome-instant&ion=1&espv=2&es_th=1&ie=UTF-8#q=what%20is%20my%20ip
[18:04:07] ottomata, 174.62.175.82 according to our servers
[18:04:10] I just used geoiplookup
[18:04:44] joal|night: what's up? no worries, asynchronous communication and all that :)
[18:06:40] cool ja
[18:06:51] Ironholds: , hm, yeah the only request I see from you is
[18:06:54] ==> /var/log/nginx/access.log <==
[18:06:55] 174.62.175.82 - - [04/Jun/2015:18:01:09 +0000] "-" 400 0 "-" "-"
[18:07:03] huh
[18:07:14] so it's..not getting LOGGED, or?
[18:07:24] or your maven is being really weird
[18:07:24] what settings /should/ my system have? Let's look at this the other way around
[18:07:36] your settings.xml shouldn't have anything in it
[18:07:38] I mean, I'm on 3.0.5 and I obliterated /.m2 in its entirety
[18:07:40] it should all be in pom.xml
[18:07:46] is the pom file okay?
[18:07:50] wait. which pom?
[18:07:54] project pom
[18:07:57] in refinery-source
[18:07:59] and it works for me fine
[18:08:06] yeah, I'm just pulling source. hrn.
[18:08:44] yep, https://archiva.wikimedia.org/repository/mirrored/ as the repo
[18:09:35] what maven command are you running?
[18:12:14] Ironholds: ^
[18:12:31] mvn test
[18:18:05] ottomata, ^
[18:19:18] aye
[18:19:20] um
[18:21:29] Ironholds:
[18:21:30] run this
[18:21:30] https://gist.github.com/ottomata/64ddda1b251982c69b4b
[18:21:48] oh sorry, wait
[18:21:49] not yet
[18:22:23] ok, try that.
[18:22:25] https://gist.github.com/ottomata/64ddda1b251982c69b4b
[18:22:26] now
[18:22:55] ottomata, okie!
[18:22:57] Analytics-Kanban, Need-volunteer: Top Articles ad-hoc Report for Wikipedia Zero [5 pts] - https://phabricator.wikimedia.org/T99083#1338335 (Milimetric) Submitted a spark shell job, this should be done in a few hours. I'm parking the code here since it was an ad-hoc job: ``` // Gets top 100 pageviews for...
[18:23:12] lemme know what happens
[18:23:19] ottomata, done, I think?
[18:24:27] ok, try the maven thing again
[18:24:28] then
[18:25:27] Analytics-Kanban, Analytics-Wikimetrics, Community-Wikimetrics: Give the option of using the same parameters for all reports for a given cohort {dove} [21 pts] - https://phabricator.wikimedia.org/T74117#1338361 (Milimetric) I trust your mockup, J, and we can iterate.
[18:26:15] ottomata, lesse
[18:26:41] ottomata, okay, didn't work, but looking into the gist paste; "verify error:num=20:unable to get local issuer certificate"
[18:26:47] and then two more cascading errors from that
[18:27:36] ?
[18:28:17] Quarry: Quarry's indentation function is not completely functional - https://phabricator.wikimedia.org/T101424#1338373 (Huji) NEW
[18:29:07] ottomata, what what?
[18:32:10] Ironholds: when did you get that error?
[18:32:20] from maven or from the keytool thing
[18:32:20] ?
[18:32:21] ottomata, grabbing the certificate
[18:32:28] the keytool, yep
[18:32:30] oh, the first command?
[18:32:37] yep!
[18:32:47] what's this show you?
[18:32:47] HOST=archiva.wikimedia.org; openssl s_client -connect $HOST:443
[18:33:41] Quarry: Quarry's indentation function is not completely functional - https://phabricator.wikimedia.org/T101424#1338395 (Huji)
[18:34:06] oof
[18:34:10] Ironholds: ok, let's try this first
[18:34:13] mvn test -Dmaven.wagon.http.ssl.insecure=true -Dmaven.wagon.http.ssl.allowall=true
[18:34:17] just do that
[18:34:53] that...appears to workish?
[18:34:57] that looks better
[18:34:59] or at least it's downloading things
[18:35:00] Ung
[18:35:05] you know, i think joal figured this out before
[18:35:05] some ssl fuckery?
[18:35:07] yea
[18:35:10] + maven
[18:35:14] * Ironholds hits maven
[18:35:17] see, this is my problem with Java
[18:35:19] i was looking for some docs about this, but i can't find them
[18:35:24] emailing us about it
[18:35:26] as a language? I actually quite like it. It's no C++ but it's pretty good.
[18:35:32] but dear GODS the ECOSYSTEM
[18:35:44] * Ironholds stares meanly at maven
[18:37:02] ottomata, thank you! you just saved my bacon
[18:37:18] Ironholds: , well, that is not the proper way to do it, hopefully joseph will have a better memory than me
[18:39:58] ottomata, as long as I never forget this one thing it'll be fine :D
[18:59:08] ottomata1, can I ask you something about https://phabricator.wikimedia.org/T99932 ?
[19:02:46] milimetric, yt?
[19:03:41] hey mforns
[19:03:54] what's up
[19:03:57] milimetric, hey, can I ask you something about https://phabricator.wikimedia.org/T99932 ?
[19:04:01] sure
[19:04:04] app user agent parsing
[19:05:05] so my question is, what exact info do we need to store in the user agent map, just the version number after the slash or the whole string after the slash until the end?
[19:05:52] I was wondering that when I first saw it too
[19:06:20] I guess to be consistent with the other stuff stored in the UA map, we should parse out everything we can
[19:06:37] one sec, lemme check the notes on that field
[19:06:47] milimetric, it seems as per http://etherpad.wikimedia.org/p/PV_Analytics_Infra that there are already some fields filled out in uamap for app requests
[19:08:16] mforns: where in the etherpad does it say that? I don't remember being there when we went over it
[19:08:38] milimetric, line 44
[19:08:50] it says os_family is there
[19:09:27] milimetric, and line 42 suggests that the only info missing is browser and version
[19:09:50] er... none of this seems very clear to me at all
[19:10:04] like, any of it could be interpreted two ways
[19:10:36] This is not adequate documentation. We need to know two lists:
[19:10:38] what's happening now
[19:10:44] what do we want to happen after this patch
[19:11:36] milimetric, agree
[19:11:36] were you at this meeting where the etherpad was written?
[19:11:37] who was?
[19:11:37] milimetric, I wasn't, and don't know who was
[19:12:03] milimetric, probably joal and ottomata?
[19:12:09] and 2 more people
[19:12:44] but milimetric, I can follow up with them, np
[19:12:53] mforns: yeah, see if joseph can clear it up tomorrow
[19:12:58] sure
[19:13:12] are you blocked then?
[19:13:24] I was just going to look at gerrit, kicked off my scala job
[19:13:33] mforns: sorry, was talking in office, about to go to lunch
[19:13:45] milimetric, kind of, but I want to make some queries to hive to know more
[19:13:55] ottomata1, np!
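A rough sketch of the two options raised in the user agent discussion above: keeping only the version token right after the slash, or keeping the whole remainder of the string. The user agent shape assumed here (`WikipediaApp/<version>` followed by optional details) is an assumption for illustration and is not spelled out in the log; `AppVersionSketch` and `parse` are hypothetical names, not part of refinery-source.

```scala
// Hypothetical parser; the UA shape below is assumed, not taken from the log.
object AppVersionSketch {
  // Assumed shape: "WikipediaApp/<version> <optional trailing details>"
  private val AppUa = """^WikipediaApp/(\S+)(?:\s+(.*))?$""".r

  // Returns (version, remainder) when the UA matches the assumed shape.
  // The remainder is empty when nothing follows the version token.
  def parse(ua: String): Option[(String, String)] = ua match {
    case AppUa(version, rest) => Some((version, Option(rest).getOrElse("")))
    case _ => None
  }
}
```

Returning both pieces leaves the decision open: a caller could store only the first element as an app version field, or keep the remainder as well, once the intended behaviour has been confirmed with whoever wrote the task.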
[19:13:57] ok, cool
[19:25:06] (PS9) Milimetric: Add stacked bars component to compare layout [analytics/dashiki] - https://gerrit.wikimedia.org/r/214036 (https://phabricator.wikimedia.org/T91123) (owner: Mforns)
[19:25:17] (CR) Milimetric: [C: 2 V: 2] Add stacked bars component to compare layout [analytics/dashiki] - https://gerrit.wikimedia.org/r/214036 (https://phabricator.wikimedia.org/T91123) (owner: Mforns)
[19:26:58] Analytics-Kanban, Analytics-Visualization: Set up vital-signs.wmflabs.org {musk} [8 pts] - https://phabricator.wikimedia.org/T95338#1338567 (Milimetric)
[19:33:50] Quarry: Quarry should show the results in the way they were ordered - https://phabricator.wikimedia.org/T101396#1338583 (Umherirrender)
[19:33:52] Quarry: Quarry does not respect ORDER BY sort order in result set - https://phabricator.wikimedia.org/T87829#1338584 (Umherirrender)
[19:55:13] milimetric, after some queries, I think all 6 fields of the user agent map are already being populated correctly for mobile app
[19:55:48] ooohk...
[19:55:49] milimetric, this makes me think what they want is an extra field with the app version number, which is what comes right after the slash in the ua
[19:56:10] :) I would've just checked with them
[19:56:14] I don't like guessing what people want
[19:56:26] I like being a total pain until they explain themselves crystal clear :)
[19:56:36] milimetric, yes you're right
[19:56:53] milimetric, that is the right way to do it
[19:58:32] can you have empty strings in csvs you pipe into JUnit tests? :/
[20:03:15] insufficient arguments my ass!
[20:09:44] (PS1) OliverKeyes: Add Search-centric UDFs [WIP] [analytics/refinery/source] - https://gerrit.wikimedia.org/r/215964
[20:19:15] madhuvishy, yt?
[20:37:06] mforns: yeah
[20:37:13] madhuvishy, hi!
[20:37:30] can you briefly explain how you test refinery code?
[20:37:39] mforns: refinery source?
[20:37:45] madhuvishy, yes
[20:38:01] okay.. i compile it by doing mvn clean package
[20:38:18] aha
[20:38:23] what are you testing?
[20:38:34] the mobile app user agent parsing stuff
[20:38:47] I'm not testing yet, but will shortly
[20:38:52] my strategy for spark jobs is to rsync to my home folder on the cluster after it compiles
[20:38:56] and run spark-submit
[20:39:04] madhuvishy, aha
[20:40:08] madhuvishy, I guess java code will be different
[20:40:19] mforns: yeah
[20:40:40] madhuvishy, and do you have an idea how to run the unit tests?
[20:40:58] mforns: mvn clean package usually runs all the tests
[20:41:07] madhuvishy, oh cool!
[20:41:18] if the tests fail - build will fail too
[20:41:28] madhuvishy, ok, will try this :]
[20:41:51] thank you!
[20:42:05] mforns: alright. let me know if you need anything. no problem!
[20:42:32] madhuvishy, sure
[21:43:26] (CR) Madhuvishy: "It would be useful to put Bug: T.. in the commit message above the Change-Id :) It links the related task here, as well as adds a comment " (3 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/212542 (owner: Joal)
[21:55:25] Analytics, operations: analytics1013 crashed, investigate... - https://phabricator.wikimedia.org/T97380#1339105 (Gage) Resolved>Open This machine crashed again. All the errors are on socket 0, so we should probably replace that DIMM. Furthermore I'd like to know if that socket is the one closest to...
[21:55:34] Analytics-Cluster, Fundraising Sprint Kraftwerk, Fundraising Sprint Lou Reed, Fundraising Tech Backlog, operations: Verify kafkatee use for fundraising logs on erbium - https://phabricator.wikimedia.org/T97676#1339107 (AndyRussG) > In those cases, there are more requests in kafkatee than in udp2...
[21:58:08] have a nice end of the day folks, see you tomorrow!
[21:59:12] 'laterrs!
[21:59:55] milimetric: you still doing eventlogging stuff?
[23:31:12] Analytics-Kanban, Research-and-Data: Validate Uniques using Last Access cookie {bear} - https://phabricator.wikimedia.org/T101465#1339575 (ggellerman) NEW
[23:31:56] Analytics-Kanban, Research-and-Data: Validate Uniques using Last Access cookie {bear} - https://phabricator.wikimedia.org/T101465#1339587 (kevinator)
[23:31:57] Analytics-Cluster, Analytics-Kanban, Epic: {epic} WMF has UC report per project per month & day {bear} - https://phabricator.wikimedia.org/T88647#1339586 (kevinator)
[23:43:28] ottomata: no EL right now, had to finish up the vital signs stuff
[23:43:36] I want to get back to the load test maybe tomorrow
[23:46:46] (PS1) Milimetric: Fix wikimetrics layout [analytics/dashiki] - https://gerrit.wikimedia.org/r/216008 (https://phabricator.wikimedia.org/T95338)
[23:47:23] (CR) Milimetric: [C: 2 V: 2] "self-merging, trying to set up vital-signs.wmflabs.org so I can be done with it :)" [analytics/dashiki] - https://gerrit.wikimedia.org/r/216008 (https://phabricator.wikimedia.org/T95338) (owner: Milimetric)
[23:47:37] milimetric: i'm trying to make this work now :)
[23:47:37] https://gerrit.wikimedia.org/r/#/c/215982/2/server/eventlogging/handlers.py
[23:50:02] (PS1) Madhuvishy: [WIP] Add oozie job to schedule mobile app session metrics spark job. See also - https://gerrit.wikimedia.org/r/#/c/212573/ [analytics/refinery] - https://gerrit.wikimedia.org/r/216009 (https://phabricator.wikimedia.org/T97876)