[05:56:52] New review: Tim Starling; "(5 comments)" [analytics/log2udp2] (master) - https://gerrit.wikimedia.org/r/58449
[13:40:12] morning :)
[13:41:30] morrrning!
[13:41:49] milimetric, i never got anywhere with that stupid oozie error
[13:41:57] sucks
[13:41:59] i'll work more on that today (after I fix the accounts on the analytics machiens)
[13:42:04] yeah, i banged on that for hours
[13:42:09] sometimes it ran though!
[13:42:15] oh man
[13:42:18] how weird!
[13:42:18] actually
[13:42:22] ok, i'll switch over to user metrics stuff for now
[13:42:23] i could get it to run consistently
[13:42:25] but.
[13:42:32] it wouldn't pick up the proper intputcs
[13:42:33] inputs
[13:42:40] ah that's too werid
[13:42:46] only one file instead of the 8 it needed
[13:42:53] yeah, later let's bounce about it
[13:42:56] i'll brb, my cat is being hungry :)
[13:42:57] maybe two brains will get it
[13:43:00] k
[13:43:01] ya
[14:04:49] gotta change locations, be back in a bit.
[15:02:51] New review: coren; "(6 comments)" [analytics/log2udp2] (master) - https://gerrit.wikimedia.org/r/58449
[15:40:45] milimetric: can you try logging into an02 as milimetric?
[15:40:54] k
[15:41:29] permission denied but one sec, lemme check my ssh config
[15:42:20] yea, there's nothing about usernames in there, just the proxy command to go through an01
[15:42:46] so ssh analytics1002.eqiad.wmnet doesn't work and neither does ssh milimetric@analytics1002.eqiad.wmnet
[15:43:00] try ssh milimetric@ again real quick
[15:43:03] i'm watching logs
[15:43:18] yep, doing now
[15:43:36] hm
[15:43:53] what about milimetric@stat1001.wikimedia.org?
[15:44:15] worked
[15:44:19] hmm ok
[15:44:32] ottomata: can I have your oppinion on this ? https://github.com/wsdookadr/udp-filter-profiling/blob/master/README.md
[15:45:31] hm
[15:45:42] milimetric: do milimetric@an02 again, but with -v
[15:45:43] ssh -v ..
[15:45:48] and paste me output
[15:46:21] milimetric: do you happen to have multiple ssh keys in ~/.ssh/ ?
[15:46:28] cool!
[15:46:42] gr
[15:46:45] got throttled
[15:46:51] i'll email you
[15:47:13] no average, only one key
[15:51:10] ok
[15:57:46] we got it
[16:08:17] mooooorning!
[16:08:23] average, around?
[16:08:35] drdee: yes
[16:08:48] about udp-filter repo
[16:08:52] yes
[16:08:58] how much have the two branches diverged
[16:09:21] well, a lot since we tried to pull the filter from webstatscollector in..
[16:09:49] I'll do a diff between the branches, moment
[16:09:52] k
[16:17:29] milimetric; morning
[16:17:29] ottomata; morning
[16:18:05] some new variables, some new/different tests, truncatio of urls to a fixed size to avoid problems, some indentation, some logic originating from webstatscollector filter(and consequently some new commandline switches)
[16:18:43] + the fact that when we added all this logic, I remember that there was the problem of it being slower than initially
[16:19:05] so we were getting more packet loss because of that
[16:19:09] that's a short summary
[16:19:32] morning drdeee
[16:21:30] drdee: ^^
[16:22:49] let's talk with xyzram about this, but i think we need to merge back the field-param-delim branch without the 'filter' code back into master
[16:23:29] yes
[16:30:06] hihi
[16:30:20] drdee: i have a thing with robla right now, but it might be short.
[16:30:26] i'll poke you when done
[16:30:33] aight
[16:43:13] aiight drdee
[16:43:22] are you in the office?
[16:50:11] aiight. welp. grabbing a food then.
[16:50:15] i am
[16:50:22] (sorry, missed that message, drdee)
[16:50:47] mind if i grab a food right quick before scrum? unless you have something pressing you want to talk about
[16:51:37] go ahead!
[16:51:39] I'm getting set up for scrum
[16:51:48] ty
[16:51:54] average: have the mobile reports been published?
[16:52:36] drdee: they are published since last time we rsynced them manually
[16:52:47] Erik said he's going to merge my script into his own
[16:52:54] right, did that happen?
[16:52:57] this patchset is not yet merged..
https://gerrit.wikimedia.org/r/#/c/59864/
[16:53:03] so.. :|
[16:53:05] :(
[16:53:15] but he did give feedback, right?
[16:53:31] why do we need 3 blank.sh files?
[16:53:59] because this is the only way to force git to version empty directories
[16:54:04] to add a file in them
[16:54:12] otherwise it doesn't want to
[16:54:37] ok, but why do you need empty folders in git?
[16:55:11] because we talked about having a script that is able to run wikistats without in-depth knowledge of it
[16:55:17] and so I made a script
[16:55:22] and it requires 3 directories
[16:55:35] milimetric: i think it is working
[16:55:36] to store csvs, and reports
[16:55:56] i didn't change much, i did change the way the OUTPUT variable was being set, but I don't think this was the problem
[16:56:02] but its working so I'm not going to tough it
[16:56:10] i've got it submitted and running as the stats user
[16:56:10] its chugging through
[16:56:16] huh
[16:56:18] so weird
[16:56:32] what was the OUTPUT variable change?
[16:58:07] i tried to set it manually based on nominal time rather than reyling on the output dataset value
[16:58:08] hdfs://analytics1010.eqiad.wmnet:8020/wmf/public/webrequest/wikipedia-mobile-platform-daily/data/${coord:formatTime(coord:dateOffset(coord:nominalTime(), -1, 'DAY'), 'yyyy/MM/dd')}
[16:58:18] vs
[16:58:19] ${coord:dataOut('OUTPUT')}
[16:58:42] also, i noticed that the INPUT was only messed up for the first workflow
[16:58:48] the following ones are fine'
[16:58:54] so the data for 04-15 is incorrect
[16:58:58] but from then on it should be good
[16:59:18] oh ok, i agree that stuff shouldn't change anything
[16:59:32] but maybe there's something fishy about coord:dataOut
[16:59:33] ?
[17:11:58] yeah maybe, but it works elsewhere
[17:11:59] dunno.
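An aside on the blank.sh question earlier: git tracks files, not directories, so committing a placeholder file is indeed the standard way to keep an "empty" directory in the repo. The more common convention is an empty file named `.gitkeep`. A minimal sketch — `csvs/` and `reports/` are named in the discussion, while `logs/` is a hypothetical stand-in for the third directory:

```shell
# git cannot version an empty directory; commit a placeholder file instead.
# .gitkeep is a naming convention only, not a file git itself treats specially.
mkdir -p csvs reports logs
touch csvs/.gitkeep reports/.gitkeep logs/.gitkeep
# in the real repo one would then run:
#   git add csvs/.gitkeep reports/.gitkeep logs/.gitkeep
ls -a csvs
```

Any filename works for this purpose (an empty blank.sh does the same job); `.gitkeep` just signals the intent more clearly to reviewers.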
[17:24:13] drdee: https://mingle.corp.wikimedia.org/projects/analytics/cards/614
[17:32:04] milimetric: i'm available if you're working on oozie crap
[17:32:09] i know how frustrating it can be
[17:32:20] well, something seems to have been fixed
[17:32:25] if you want a quick overview of what i do with my jobs, i'm down
[17:32:45] what was the problem you were having?
[17:32:51] this is the main config change that eliminated the problem:
[17:32:52] (12:58:08 PM) ottomata: hdfs://analytics1010.eqiad.wmnet:8020/wmf/public/webrequest/wikipedia-mobile-platform-daily/data/${coord:formatTime(coord:dateOffset(coord:nominalTime(), -1, 'DAY'), 'yyyy/MM/dd')}
[17:32:52] (12:58:18 PM) ottomata: vs
[17:32:52] (12:58:19 PM) ottomata: ${coord:dataOut('OUTPUT')}
[17:33:04] --> gchat, to avoid spam
[17:33:22] the second configuration led to a variable "OUTPUT" could not be resolved error
[17:33:34] well, otto and I were working on it
[17:33:40] and if you have any insight it might be useful for all
[17:34:01] but gchat works if you prefer
[17:34:53] yeah, don't know why that fixed it, and i'm not entirely sure that was the problem either
[17:35:02] i kinda think that
[17:35:04] maybe
[17:35:16] OUTPUT as a path did not exist on the first workflow run
[17:35:25] so when we tried to set it
[17:35:28] oozie got all pissy
[17:35:48] but it is ok with the timeformat stuff, because it doesn't check for a string's path existance
[17:35:53] but might when you are referencing a dataset
[17:35:55] really not sure though
[17:36:23] oozie gives such unhelpful error messages
[17:36:27] i'll take a look
[17:36:44] i sort of have a checklist of stuff now
[17:45:25] qq -- can anyone in here reach the cluster?
[17:45:36] i get "connection closed" from everybody
[17:45:48] actually, from an01
[17:45:48] hm
[17:51:47] New patchset: Erik Zachte; "added comments" [analytics/wikistats] (master) - https://gerrit.wikimedia.org/r/60295
[17:57:49] milimetric: can you ssh to an01?
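For context, the config change ottomata pasted amounts to the difference between these two ways of defining the workflow's output path in an Oozie coordinator. This is a sketch, not the actual production coordinator.xml (which isn't in the log): the property placement, dataset name, and `output-events` declaration are assumptions.

```xml
<!-- Sketch only; names and structure are illustrative. -->

<!-- Variant that worked: build the path directly from the nominal time. -->
<property>
    <name>OUTPUT</name>
    <value>hdfs://analytics1010.eqiad.wmnet:8020/wmf/public/webrequest/wikipedia-mobile-platform-daily/data/${coord:formatTime(coord:dateOffset(coord:nominalTime(), -1, 'DAY'), 'yyyy/MM/dd')}</value>
</property>

<!-- Variant that failed with: variable "OUTPUT" could not be resolved.
     coord:dataOut('OUTPUT') can only resolve if a matching data-out event
     is declared in the coordinator, roughly like this: -->
<output-events>
    <data-out name="OUTPUT" dataset="wikipedia-mobile-platform-daily">
        <instance>${coord:current(0)}</instance>
    </data-out>
</output-events>
<property>
    <name>OUTPUT</name>
    <value>${coord:dataOut('OUTPUT')}</value>
</property>
```

A mismatch between the name passed to `coord:dataOut(...)` and the `data-out` element's `name` attribute is one common cause of that "could not be resolved" error; it fits the symptom described here but was never confirmed in the discussion.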
[17:58:16] it appears not
[17:58:20] hmmmm
[17:58:21] it's just hanging
[17:58:34] it look slike it worked for you...
[17:58:35] really?
[17:58:37] but is it because i'm using the proxy command through an01
[17:58:40] Apr 22 17:58:02 analytics1001 sshd[3777]: pam_unix(sshd:session): session opened for user milimetric by (uid=0)
[17:58:52] yeah, maybe it's in some infinite loop?
[17:58:54] you are proxying to an01 through an01?
[17:58:54] haha
[17:58:55] yeah
[17:59:01] an??
[17:59:03] :)
[17:59:19] heh. gotta have ProxyCommand None for that host :)
[17:59:30] k, doin that :)
[18:00:46] k, that works
[18:00:49] thanks ottomata
[18:01:00] hm, ok
[18:01:11] weird, dschoon its just you then , hm
[18:02:38] hrm.
[18:02:43] lovely.
[18:02:54] i'll let you know if i figure it out.
[18:05:26] Change merged: Erik Zachte; [analytics/wikistats] (master) - https://gerrit.wikimedia.org/r/60295
[18:17:00] New review: Erik Zachte; "(4 comments)" [analytics/wikistats] (master); V: 1 C: -1; - https://gerrit.wikimedia.org/r/59864
[18:17:18] erosen: ping
[18:19:46] geohacker: hey
[18:20:02] sorry, was in the wmf analyst office hours
[18:20:15] (in #wikimedia-office in case you)
[18:20:19] 're interested
[18:20:26] ah
[18:21:46] erosen: oh nice. I'm in.
[18:22:13] word
[18:22:13] was wondering whether we've gotten any close with the data :)
[18:22:27] so, the san, but expected answer is no
[18:22:55] i didn't figure out why the counts were matching up, and I haven't rewritten the code to log the country edits directly
[18:23:11] hmm okay.
[18:23:53] do you think we can get this going end of this week, may be?
[18:24:07] or if it's not worth your time.
[18:24:49] geohacker: I'm a bit swamped with overdue tasks at the moment, so I'm not super optimistic
[18:25:15] i think this is definitely a long term priority, and I am certain it will happen within a few weeks, but I can't commit to a short term time line
[18:25:28] ah okay cool.
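The "ProxyCommand None for that host" fix translates to something like the following in `~/.ssh/config`. A sketch under assumptions: the wildcard pattern and `-W` proxying are illustrative, not copied from anyone's actual config.

```text
# Illustrative ~/.ssh/config sketch. ssh_config is first-match-wins, so the
# bastion's own stanza must come before the wildcard that proxies through it.
Host analytics1001.eqiad.wmnet
    ProxyCommand none

Host *.eqiad.wmnet
    User milimetric
    ProxyCommand ssh -W %h:%p analytics1001.eqiad.wmnet
```

Without the first stanza, a connection to analytics1001 matches the wildcard and ssh tries to proxy an01 through an01, which is exactly the hang observed above.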
[18:26:17] erosen: :) sorry for harrowing you down on this.
[18:26:35] no worries, I'm glad someone is interested in this data and I feel bad I haven't been more responsive
[18:26:59] erosen: also, do you think the article level info is handy?
[18:27:19] I'm just trying to figure out my work slots as well :)
[18:27:40] so that the maps can wait a bit while I work on the articles.
[19:08:42] question
[19:09:02] ottomata: to what extent do we need to take notes of what we're doing on stat1 in the view of moving to a different machine in the future ?
[19:09:18] I mean, there are things which are not packaged, like wikistats
[19:46:38] heya drdee
[19:46:42] you der?
[19:47:58] or milimetric??
[19:51:02] drdee: just reaached a consensus with Erik
[19:51:09] drdee: he has a cronjob and so will I
[19:51:28] drdee: his runs pageviews_monthly.sh , mine will provide his job with the needed csv in the required location
[19:51:44] at the relevant time (one per month)
[19:51:46] why not one script?
[19:53:09] * ottomata is testing me
[19:53:15] drdee: because his is meant to be run daily, whereas mine can only be run monthly
[19:55:49] why is that?
[19:57:23] because I didn't take into account this when I started developing the new mobile pageviews metric almost 5 months ago
[19:57:33] heeeyayyayay drdee
[19:57:46] re E3Analysis vs user_metrics repo in gerrit
[19:57:51] can we move user_metrics to its own repo?
[19:58:19] oh hm, i guess it is in its own repo...hmm
[19:58:21] not sure what you mean?
[19:58:27] i'm not either
[19:58:31] i was under the impression
[19:58:33] I mean it was a feature that I wasn't aware was needed
[19:58:43] that E3Analysis was a larger repo, and user_metrics subdir was really metrics api
[19:58:56] no, that's not the case
[19:59:09] one sec, lemme demystify this
[19:59:23] mk
[20:01:59] demystify us :)
[20:02:05] oh ok
[20:02:14] i thought you were saying there's a user_metrics repo
[20:02:21] well, i thikn i thought there was
[20:02:27] or, that there shoudl be
[20:02:31] right, so no - the user_metrics subfolder should basically be called src or something like that
[20:02:34] or that the one on github was user_Metrics
[20:02:48] the one on github was E3Analysis
[20:02:59] hey guys! good to see you again.
[20:03:01] that was owned by rfaulkner
[20:03:04] hi Kraig!
[20:03:06] welcome back :)
[20:03:17] thanks milimetric
[20:03:22] so then when it got forked to wikimedia, it was called user_metrics
[20:03:32] but when it got mirrored to gerrit, it got called E3Analysis
[20:03:49] so they're all the same thing, and you have to try to ignore that there's a user_metrics subfolder
[20:03:55] hm ok
[20:04:16] what we need to do is rename gerrit...analytics/E3Analysis to gerrit...analytics/user_metrics
[20:04:41] and mirror it to the wikimedia/user_metrics on github if people need that
[20:04:44] but for now I don't think we do
[20:07:10] qq
[20:07:15] why is it called user_metrics?
[20:07:20] are all the metrics about users?
[20:07:30] the name is not the best
[20:07:40] because technically it's about edtirosr
[20:07:43] i mean editors
[20:08:44] mmm, I think calling it anything else would be up to Dario
[20:08:55] is it always about editors?
[20:08:55] but it seems to me to be about anyone, not just editors
[20:09:10] no, it's just the metrics so far have concentrated on that
[20:09:35] but yeah, since it gets its data from databases, it'll be about editors for the foreseeable future
[20:09:54] although it could easily execute Hive to get pageview data
[20:12:22] yeah this seems like a generic api frontend to whatever data, right
[20:12:23] ?
[20:12:39] well not sure about
[20:12:39] that
[20:12:42] i like metrics api as a name more than anythign else
[20:15:22] just as a note, i've been increasingly relying on hive to do data validation
[20:15:26] it's pretty fantastic
[20:20:24] check: https://mingle.corp.wikimedia.org/projects/analytics/cards/506
[20:20:31] it includes hive puppetization
[20:20:49] scheduled for tomorrow's grooming session
[20:21:38] cool!
[20:22:47] drdee: i created https://mingle.corp.wikimedia.org/projects/analytics/cards/598 right before i became enplagued
[20:27:52] drdee, i think those non-hadoop cdh4 service depend on the cdh4 puppetization first
[20:27:54] just fyi
[20:28:02] well yes of course
[20:28:10] but this is about *grooming* not *scheduling*
[20:28:38] ok cool
[20:29:07] average, around: wanna talk about udp-filter repo
[20:30:00] milimetric: can i drop https://mingle.corp.wikimedia.org/projects/analytics/cards/306
[20:30:22] yes
[20:51:20] howdy!
[20:51:25] howdy
[20:51:37] ok, so you've got an instance set up?
[20:51:52] that you want limn on?
[20:51:55] well someone set up an instance...
[20:52:15] what's it called and under what project is it?
[20:52:18] according to Ryan Lane it was set up under the analytics project (which I'm a member of)
[20:52:22] i belief it was dzahn who did that
[20:52:31] but I don't see it listed under that project myself
[20:52:47] maybe it's something only the project admins can see
[20:53:02] any idea what it's called?
[20:53:09] i'm looking under that project in labs console now
[20:53:15] nope, but the site is http://ee-dashboard.wmflabs.org/
[20:53:33] if I try to ssh to that from bastion it doesn't work though: Could not resolve hostname ee-dashboard.pmtpa.wmflabs: Name or service not known
[20:53:34] hm, that'll be a bit trickier to find
[20:53:41] i see the "krike" instance was just made
[20:53:47] yeah
[20:54:00] probably because it needs to be ssh krike.pmtpa.wmflabs
[20:54:08] (if that's indeed the same instance)
[20:54:11] one sec, i'll check
[20:54:19] dartar also mentioned the kripke instance a few days ago, so maybe that was it
[20:55:42] hm, no, that's our old instance, but we can put your stuff on there
[20:55:43] krike is the replacement for kripke
[20:56:02] dschoon you could not come up with another name?
[20:56:09] haha :)
[20:56:09] ok
[20:56:13] kirke :)
[20:56:17] so this is a needle in a haystack silly thing
[20:56:40] as everyone here knows, i totally subscribe to the narcissism of small difference
[20:57:32] "Could not resolve hostname krike.pmtpa.wmflabs: Name or service not known"
[20:57:37] ok... someone set it up on kripke
[20:57:39] ok, so it's there
[20:58:02] (kirke is the proper greek attic spelling of http://en.wikipedia.org/wiki/Circe)
[20:58:04] or, maybe i'm confusing myself, maybe i set that one up but had wanted to migrate it
[20:58:16] either way, kaldari, are you ok using the instance on kripke for now?
[20:58:34] ok, I look a look at /var/www/ on kripke, but didn't see anything
[20:58:54] no, it's not there, limn is installed in a bit older fashioned way there
[20:59:07] wanna do a quick google hangout to talk about it or keep it here
[20:59:22] sure google hang out is fine
[20:59:50] https://plus.google.com/hangouts/_/2da993a9acec7936399e9d78d13bf7ec0c0afdbc
[21:00:40] one sec...
[21:09:07] rkaldari@wikimedia.org
[21:13:38] guys, i'm outtie, i gotta drive back to NYC, tty tomorrroooowww
[21:14:20] LATERZ!
[21:19:44] mean pageviews per session of the mobile site: 2.0333919022283062
[21:20:02] (for 4/21)
[21:20:32] NICE!
[21:20:37] what's the max/
[21:20:38] ?
[21:21:10] average: erik zachte gave feedback on your gerrit patch set: https://gerrit.wikimedia.org/r/#/c/59864/
[21:22:00] xyzram are you around? i am in the office on the 3rd floor
[21:22:19] I'm in LA
[21:23:12] do you live there?
[21:23:13] On conf call with weekly platform meeting.
[21:23:32] Yes, but planning to move to bay area in a few months.
[21:23:37] ok
[21:23:42] ping when you have tim
[21:23:43] e
[21:24:03] Sure.
[21:31:49] millimetric: what were the names of those github repos?
[21:47:05] drdee: where in mingle can i find out which of my stories that have yet to be completed ?
[22:00:35] hive experiment says:
[22:02:34] 2013-04-21 total_visits: 51,624,103 unique_visitors: 37,736,120 total_pageviews: 104,972,033 avg_session_pvs: 2.0333919022283062, max_session_pvs: 141,882
[22:02:50] ^^ tfinc_
[22:03:03] just a quick experiment with the session data
[22:03:12] 4/21 is 15.3GB
[22:26:34] so gang
[22:26:38] storing analysis files
[22:27:17] where are we gonna do that
[22:27:24] now that stat1 is rapidly being decomissioned?
[22:27:33] the replacement for stat1?
[22:29:20] what kind of analysis files?
[22:38:52] any kind drdee
[22:39:07] the idea is, any limn customers would want a place where they can put their datafiles
[22:39:14] so stat1001 was one such place right?
[22:39:18] but is that *the* place?
[22:40:26] and if so, we should create some standard docs and use Ori's rsync to get stuff there maybe (not sure)
[22:40:45] whatever we do, we should have as a goal: answer questions like "where do I put my data and how?" very easily
[22:41:00] Anyone know where the files that actually generate the dashboards at ee-dashboards live? The limn-editor-engagement-data github repo seems to have nothing but json files.
[22:42:57] Are there some PHP files somewhere?
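The avg/max session figures quoted above came out of a Hive experiment over the session data; the aggregation itself is simple enough to sketch over a toy (session_id, pageviews) file. Everything below is made-up sample data, not the real 4/21 numbers.

```shell
# Toy stand-in for the per-session pageview counts the Hive query produced.
cat > sessions.txt <<'EOF'
s1 3
s2 1
s3 2
s4 2
EOF
# mean and max pageviews per session, as in avg_session_pvs / max_session_pvs
stats=$(awk '{sum += $2; n++; if ($2 > max) max = $2}
             END {printf "mean=%.2f max=%d", sum / n, max}' sessions.txt)
echo "$stats"
# prints: mean=2.00 max=3
```

The real experiment presumably did the equivalent with a GROUP BY over session identifiers in HiveQL; the query text itself is not in the log.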
[22:52:14] so the visualization is actually all generated client side
[22:52:25] so limn is middleware
[22:52:29] yeah, in js via limn
[22:52:35] as I understand
[22:52:38] yes
[22:52:52] but something has to contain the JS :)
[22:53:00] oh!
[22:53:00] the limn repo
[22:53:00] one sec
[22:53:07] https://github.com/wikimedia/limn
[22:53:31] and your instance is running off the develop branch: https://github.com/wikimedia/limn/tree/develop
[22:53:42] which is usually fairly stable, but sometimes not, so we don't deploy then :)
[22:54:18] and that's all coco code, which compiles to js (dynamically through connect compiler middleware)
[22:54:39] if you want to see the generated js, you can look in the /srv/ee-dashboard.wmflabs.org/limn/var/js directory
[22:54:43] on kripke
[22:54:50] ok
[22:55:24] Something tells me I'm not going to get all this working by Wednesday :P
[22:56:38] but hopefully I won't have to hack the actual pages
[22:56:48] and can just edit the json
[22:56:55] well, i'll guarantee that you'll get it all working
[22:56:57] but maybe what we have to talk about now is what's missing
[22:57:11] so let's make an etherpad with exactly what you need to do
[22:57:22] http://etherpad.wmflabs.org/pad/p/qhlgEj4zpk
[22:59:49] right now, I don't know what I'm missing
[22:59:55] I probably will soon :)
[23:01:05] well, i mean, nobody told you "do this"? and handed you some type of list or task?
[23:01:33] heh
[23:01:38] drdee: are you around ?
[23:01:39] kaldari, you're so cute
[23:01:45] "are there PHP files somewhere?"
[23:01:46] psh.
[23:01:51] that would be the EASY WAY
[23:01:56] xyzram in 30 minutes
[23:02:03] ok
[23:02:04] INSTEAD WE HAVE 16,000 LINES OF JAVASCRIPT
[23:02:25] well, last week benny was going to build some dashboards for Echo in preparation for the en.wiki deployment on Thursday, but he's currently having a baby
[23:02:28] it's up to 16k?
[23:02:30] huh
[23:02:34] that he wasn't expecting yet
[23:02:38] oh wow
[23:02:46] so it got dumped in my lap at the last second
[23:02:54] ok, I understand now :)
[23:03:09] thus, why I have no clue :)
[23:03:26] ok, DarTar, any clue on what exactly was dumped on kaldari over here?
[23:03:28] :)
[23:03:39] I talked with dartar a bit
[23:04:06] he suggested we just set up a new labs instance with static files to make things easy, which would probably be fine for now
[23:04:19] since I only have a few hours to work on this
[23:04:54] ok, but make what easy? If you're making new graphs, do you have the definitions of the graphs and datafiles to visualize?
[23:06:05] http://art.less.ly/2013/heart-dino-2.png
[23:06:28] ok folks, I'm talking to kaldari - let's freeze this conversation for now
[23:06:33] k
[23:07:55] bb8b5b6ed3d64718292673a1b771d67ca7ab9e17
[23:07:57] oops :)
[23:11:43] DarTar, kaldari, i'll keep updating the etherpad if you ping me on any new questions up there: http://etherpad.wmflabs.org/pad/p/qhlgEj4zpk
[23:13:12] ok, I think me a dartar have a plan to hack something quick in the meantime
[23:13:35] yeah, we'll migrate all my dashboards away from the TS to an instance E2 has access ti
[23:13:52] and figure out the Liminification later
[23:14:01] >limnification
[23:14:08] since we only have a few hours to put all this together
[23:14:32] hm, i doubt anything will be faster than the new graph creation page DarTar
[23:14:40] that will probably literally take a few minutes
[23:20:07] kaldari: i can come upstairs if you'd like some help
[23:20:28] gimme a little while to try some stuff
[23:22:05] dschoon: you might like https://github.com/nelhage/reptyr, it let's you reattach a process to a different terminal (like a screen session)
[23:22:17] huh
[23:22:55] if you had a process running in an ssh terminal, but want it to live on in a screen shell, you don't have to kill it and restart
[23:23:05] just ran across and it seemed useful
[23:27:46] yeah, def
[23:38:30] tfinc: custom view for your eyes only: https://mingle.corp.wikimedia.org/projects/analytics/cards/list?columns=Development+Status&filters%5B%5D=%5BType%5D%5Bis%5D%5BFeature%5D&filters%5B%5D=%5BCustomer%5D%5Bis%5D%5BProduct+-+Mobile+Web+%26+Apps+%28Tomasz%29%5D&page=1&style=list&tab=All
[23:43:34] drdee: still busy ?
[23:43:53] yes
[23:43:56] i mean no
[23:43:59] we can talk
[23:44:18] did you have some issues to discuss ?
[23:45:35] yes, the repo :)
[23:45:50] hangout?
[23:46:06] sure
[23:46:07] https://plus.google.com/hangouts/_/f570bb9e5495e6128b5d2a4bd4799b700ec3b977
[23:48:10] ping average
[23:48:15] are you around?