[00:09:47] milimetric: hey, back on-line, on train
[12:46:19] yo milimetric, already around?
[12:46:27] howdy, yep
[12:46:32] please help me :)
[12:46:39] no prob
[12:46:39] hangout?
[12:46:39] yup
[13:25:35] morningg!
[15:39:01] mooorning ottomata
[15:39:04] morning!
[15:41:45] can you have a quick look at https://mingle.corp.wikimedia.org/projects/analytics/cards/756
[15:45:35] HMmmmMMm
[15:45:41] i dunno, that seems weird
[15:45:48] to have INFO but not WARNING
[15:46:01] there might be other warnings that could be relevant
[15:46:19] usually that will end up in ERROR
[15:46:40] i have never seen a WARNING that's not a deprecation and was actually meaningful
[15:46:58] but for now i have got something more urgent
[15:48:59] can you help me write a simple batch script to append the filename of each zero log file to each log line?
[15:50:18] on stat1?
[15:52:34] easy way for files
[15:53:12] grep ".*" ./*
[15:55:51] but the filename needs to be appended to each log line in each file
[15:57:25] appended?
[15:57:27] not prepended?
[15:57:31] is prepended with a colon ok?
[15:57:36] with multiple files
[15:57:41] grep will automatically prepend filename to the match
[15:58:03] but it needs to be added to each file and saved as a new file
[15:58:38] k
[16:01:08] for f in *; do sed -e "s/\$/ $f/g" $f > ${f}.appended; done
[16:09:31] does that work drdee?
[16:09:43] not sure, haven't tried yet in meeting
[16:09:59] k, you can change * if you want to list different files, but you should probably also make ${f}.appended smarter if you do
[16:10:01] maybe
[16:10:12] > $(basename $f).appended
[16:10:14] or something
[16:13:07] this is blocking me on #244, can you try it?
[16:13:17] i am in meetings the coming 2 hours
[16:19:41] you want ALL files done?
[16:19:44] tell me what to do
[16:19:55] drdee ^
[16:20:53] all zero log files that contain tab in the filename
[16:22:09] ottomata: ^^
[16:22:20] tab, not the newer tsvs?
[16:23:10] yes and those ones as well, we should rename those files to have consistent naming
[16:23:39] meh? it's indicative of when the server was upgraded and better puppetized :)
[16:24:20] really ALL of them drdee?
[16:24:30] back to february?
[16:24:46] yes all zero files back to february 1st 2013
[16:29:01] ok drdee, that's running in a screen on stat1
[16:29:03] results saved to
[16:29:11] /a/squid/archive/zero/filename_appended
[16:29:48] thank you!
[16:54:25] ottomata did you separate the new field using a tab?
[16:54:44] nope :)
[16:54:51] want me to start over :p
[16:54:51] ?
[16:54:53] yes
[16:56:23] k
[16:57:45] hokay started over, going again, good catch
[17:02:58] average: are you around?
[17:26:03] ok erosen
[17:26:05] whazzup?
[17:26:07] what we do?
[17:26:10] so
[17:26:20] a first step would be where the code should live
[17:26:25] right now it lives at....
[17:26:46] https://gerrit.wikimedia.org/r/#/admin/projects/analytics/global-dev/dashboard
[17:26:52] under /mobile
[17:27:16] ottomata: this is the file that does the magic
[17:27:48] ottomata: and it depends on this other library for mcc-mnc stuff: https://github.com/embr/mcc-mnc
[17:28:07] that's fine for it to live, i'm just going to one-off do whatever it is you do
[17:28:16] great
[17:28:18] where does mccmnc need to be cloned to?
[17:28:29] just installed on the same machine
[17:28:39] so that it can be used by run.py
[17:28:42] oh easy_install
[17:28:43] ?
[17:28:46] python setup.py?
[17:28:50] I have to sudo install?
[17:29:04] you can do a pip install -iuser -e .
[17:29:07] hm
[17:29:14] i mean pip install --user -e .
[17:29:29] if this is an issue, we can definitely get around it
[17:29:29] k
[17:29:47] i was just tired of the mcc-mnc info living in several projects
[17:29:52] so I wanted to create its own
[17:29:57] repo
[17:30:25] ok, that worked
[17:30:30] ottomata: what machine are you on?
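[Editor's note] The filename-appending job discussed above (the sed loop, later restarted to use a tab separator) can be sketched in Python. This is a minimal illustration, not the script that actually ran: the function name is invented, and the glob pattern, tab separator, and `.appended` suffix follow the chat.

```python
# Sketch of the "append the filename to each log line" batch job.
# Equivalent in spirit to:
#   for f in *; do sed -e "s/\$/\t$f/g" $f > ${f}.appended; done
# but with the filename tab-separated, as drdee requested.
import glob
import os

def append_filename(pattern):
    """For each file matching `pattern`, write <name>.appended where
    every line ends with a tab plus the file's basename."""
    outputs = []
    for path in sorted(glob.glob(pattern)):
        name = os.path.basename(path)
        out_path = path + ".appended"
        with open(path) as src, open(out_path, "w") as dst:
            for line in src:
                dst.write("%s\t%s\n" % (line.rstrip("\n"), name))
        outputs.append(out_path)
    return outputs
```

A pattern like `/a/squid/archive/zero/*tab*` would restrict it to the zero log files "that contain tab in the filename", as asked for in the chat.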
[17:30:32] an02
[17:30:34] ~/global-dev
[17:30:39] k
[17:31:24] ottomata: run.py has a pretty extensive argparse situation
[17:31:38] it currently has a small catch which is that it expects a directory containing count files
[17:31:48] ImportError: No module named apiclient.discovery
[17:31:50] but if you give it the pig results it takes forever
[17:31:54] aah
[17:31:55] sorry
[17:31:59] it also requires gcat
[17:32:11] https://github.com/embr/gcat
[17:32:21] which is my google docs scraper
[17:32:27] this part could be a bit annoying
[17:33:00] the config files you need for this live in stat1:~/.gcat
[17:33:19] so I think there is another install: pip install --user -e . (inside gcat)
[17:33:29] and copy over the .gcat dir from stat1 to an02
[17:36:56] python run.py --carrier_counts=carrier --country_counts=countr --daily
[17:37:54] ottomata: this was my most recent command: ./run.py -l=DEBUG --carrier_counts=carrier_fixed/ --carrier_date_fmt='%Y-%m-%d_%H' --country_counts=country_local/ --daily
[17:38:11] the --carrier_date_fmt was just because I was using a weird data file
[17:50:11] drdee: hi
[17:50:24] ottomata , erosen , rounce123 , milimetric hi
[17:50:36] hey
[17:50:40] average_drifter: we're doing backlog grooming
[17:50:41] I couldn't make the scrum
[17:50:49] https://plus.google.com/hangouts/_/2da993a9acec7936399e9d78d13bf7ec0c0afdbc
[17:51:04] average_drifter: let me know if you need an invite
[17:51:09] can't at the moment join
[17:51:11] k
[17:51:21] I'm currently working on separating into 3 packages for libdclass
[17:51:41] in particular setting libdclassjni_ladir in the Makefile.am
[17:51:47] :)
[17:58:48] ottomata: is your script still running?
[18:00:30] yes
[18:00:40] 1636/2077 files
[18:04:30] thx
[18:21:53] erosen
[18:21:54] from BeautifulSoup import BeautifulSoup
[18:21:54] ImportError: No module named BeautifulSoup
[18:22:00] arrggg
[18:22:04] let me check
[18:22:12] File "/home/otto/global-dev/mcc-mnc/mccmnc/__init__.py", line 3, in
[18:23:41] erosen, yt?
[18:23:54] Maryana: hey
[18:24:11] Maryana: sup?
[18:24:27] i'm having trouble uploading a new cohort to UMAPI. dario told me to come here and bother you about it :)
[18:25:04] it's about 4K commons users, in the correct format (username, projectname)
[18:25:06] erosen: context, Maryana was trying to upload a 4k commons cohort
[18:25:07] not sure what the deal is
[18:25:18] gotcha, are you in the office?
[18:25:19] yep
[18:25:25] should i come up to 6?
[18:25:26] I checked the usertags(_meta) table, there's no trace of it
[18:25:36] Maryana: either way, I can come to 3 too
[18:25:56] cool, thanks erosen
[18:26:09] no worries - i'll be up in a sec (if you're not busy)
[18:26:15] k
[18:26:27] hi Maryana, wanna shoot us over the csv?
[18:26:35] i can run it locally and see what's blowing up
[18:26:39] ottomata: pushed mcc-mnc fix
[18:26:48] k
[18:26:52] do I need to repip?
[18:26:54] try reinstalling with git pull && pip install --user -e .
[18:26:56] yeah
[18:27:01] milimetric, sure
[18:27:16] * milimetric is Dan Andreescu btw
[18:27:35] erosen
[18:27:36] File "/home/otto/global-dev/mcc-mnc/mccmnc/__init__.py", line 4, in
[18:27:36] import requests
[18:27:36] ImportError: No module named requests
[18:27:57] ottomata: sorry about this
[18:28:04] first user other than me
[18:28:48] i used it :D
[18:29:10] milimetric, i know you!
[18:29:17] k :)
[18:29:27] :)
[18:29:29] ottomata: one more time
[18:29:50] ottomata: script still running?
[18:30:24] 1829/2077
[18:33:23] k erosen i think it's going
[18:35:17] ottomata: great
[18:35:39] ottomata: I think it will also require limnpy, is that something that is installed on an02?
[18:36:48] [INFO ][oauth2clie][MainProcess ][_do_refresh_req:0680] Refreshing access_token
[18:36:48] Traceback (most recent call last):
[18:36:48] File "dashboard/mobile/run.py", line 71, in
[18:36:48] …
[18:36:48] response = conn.getresponse()
[18:36:49] File "/usr/lib/python2.7/httplib.py", line 1018, in getresponse
[18:36:49] raise ResponseNotReady()
[18:36:50] httplib.ResponseNotReady
[18:36:57] limnpy? naw but I could
[18:37:39] did
[18:37:42] erosen ^^^
[18:37:45] hey
[18:37:55] hmm
[18:38:20] that is it trying to grab the google spreadsheet with logic for which partners to show which versions
[18:38:29] [INFO ][oauth2clie][MainProcess ][access_token_ex:0586] access_token is expired. Now: 2013-06-11 18:38:09.871448, token_expiry: 2013-06-11 03:00:45
[18:38:29] [INFO ][gcat ][MainProcess ][get_credentials:0301] refreshing token
[18:38:29] [INFO ][oauth2clie][MainProcess ][_do_refresh_req:0680] Refreshing access_token
[18:38:29] sometimes the google oauth servers just time out
[18:38:32] can you try again?
[18:38:43] is that working?
[18:38:50] i think http has to go through brewster proxy
[18:39:00] hmmm
[18:40:22] there is a workaround for this
[18:40:29] as we should really just be parsing a wiki table
[18:40:36] which I think drdee may have already done
[18:40:55] drdee: ^^
[18:40:57] why are you guys doing that again?
[18:41:06] my code is in java btw
[18:41:09] k
[18:41:12] that was my suspicion
[18:41:33] drdee: basically some carriers don't want to see the zero version counts if they don't support it
[18:41:47] drdee: it's essentially a minor UI detail which amit asked for
[18:42:06] (contingent on the X-CS headers being set correctly)
[18:42:08] k, but isn't it better not to output it then in pig script?
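[Editor's note] The workaround discussed above for the transient OAuth/httplib failures is simply "try again". A small retry wrapper automates that pattern; this is a hedged sketch, not code from run.py or gcat, and the attempt count, backoff, and exception list are assumptions.

```python
# Sketch of a retry wrapper for flaky remote calls (e.g. the Google
# spreadsheet fetch that intermittently raised httplib.ResponseNotReady).
import time

def retry(fn, attempts=3, delay=1.0, retry_on=(Exception,)):
    """Call fn(); on a listed exception, wait and retry, re-raising
    the last exception once all attempts are exhausted."""
    last_exc = None
    for i in range(attempts):
        try:
            return fn()
        except retry_on as exc:
            last_exc = exc
            time.sleep(delay * (i + 1))  # simple linear backoff
    raise last_exc
```

In this spirit, the spreadsheet fetch could be wrapped as `retry(fetch_sheet, retry_on=(IOError,))`, where `fetch_sheet` stands in for whatever gcat call actually does the download.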
[18:42:29] these small tweaks are hard to keep track of
[18:42:37] drdee: I think it is fine to do in pig, but it also needs to be done in the dashboards
[18:42:43] ok, erosen, i need to get this kafka thing pushed before I leave for the day
[18:42:48] i'm leaving in hopefully less than 30 mins
[18:42:49] ottomata: k
[18:42:50] so i'm going to focus on that
[18:42:51] no worries
[18:43:04] erosen, what's the process for getting access to the same logs as yurik for wikipedia zero?
[18:43:39] dr0ptp4kt: i'm not sure
[18:43:44] i think drdee would be the one to talk to
[18:44:33] rt ticket with request for access to an* is step 1
[18:45:09] drdee, will i literally say "Requesting access to an*"?
[18:45:49] drdee, er, *should* (not *will*) i say that? i know you can't predict all futures, just some of them.
[18:46:25] i would use this as subject
[18:46:36] Request access for dr0ptp4kt to analytics machines (an*)
[18:46:54] drdee, thx
[18:47:04] please CC me and ottomata on the ticket as well
[18:47:13] and of course CT
[18:49:16] drdee, done. thx
[18:49:21] ty!
[19:09:19] ok Maryana, so there were some unicode handling issues that we fixed
[19:09:37] the cohort is sitting on my laptop, ready to upload
[19:09:55] there were a handful of users that look like they exist but they didn't validate
[19:10:17] I think it's because they were written in with underscores in their names but they don't actually have underscores
[19:10:23] (probably spaces)
[19:10:58] I could 1. finish the upload and then you can play with the cohort or 2. wait until we figure out what's up with these other users (I can send you a list)
[19:37:31] ok, i'm out all, laters!
[19:42:35] sorry milimetric, was afk to get lunch
[19:42:52] just go ahead and drop those funky users on the cutting-room floor
[19:43:31] :) curiosity got the better of me so I'm gonna try to make them work
[19:43:42] hehe, ok
[19:43:51] but if it takes more than 10 more minutes, heads will roll :)
[19:43:56] mingle card?
[19:44:01] sounds reasonable
[19:44:28] milimetric: back from lunch
[19:44:37] I'm looking into the queue stuff locally
[19:45:03] https://mingle.corp.wikimedia.org/projects/analytics/cards/757
[19:46:44] drdee: that's not quite accurate
[19:46:49] though there is some work we could do
[19:47:03] sorry, that was my understanding of reading the IRC chat
[19:47:07] feel free to update it
[19:47:15] k
[19:47:16] will do
[19:47:48] drdee: basically if the username actually contains spaces, but the user uploads the username as it appears in the url, then it can be a problem
[19:48:42] got it, thx
[20:00:53] oh man! I had success on every single invalid user Maryana & erosen
[20:01:00] I'm re-running it now
[20:01:07] \o/
[20:01:51] so it was two things: commas in the names & underscores that replaced spaces. We won't be handling the underscores thing because there's no way to know in an automated way, but I'll put a little note in the example file
[20:05:18] milimetric, erosen: hangout?
[20:05:22] got some data to share
[20:05:50] i think evan dropped out
[20:05:55] think so too
[20:07:24] New patchset: Milimetric; "more validation and handling of special characters" [analytics/user-metrics] (master) - https://gerrit.wikimedia.org/r/68041
[20:08:16] drdee - you hear me?
[20:08:21] I feel like maybe I'm muted :)
[20:08:28] (on your side)
[20:08:35] yes
[20:08:39] unmute your speakers...?
[21:05:45] milimetric, is that cohort up or still loading?
[21:06:16] still loading Maryana, I had another problem with it :(
[21:06:26] I'll get it done as soon as possible though
[21:06:38] no worries - thanks for helping w/this!
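[Editor's note] The two cohort-upload problems identified above were commas inside usernames and underscores standing in for spaces (which, as the chat notes, can't always be distinguished from genuine underscores automatically). A minimal sketch of parsing such a CSV — illustrative only, not the actual user-metrics code — would lean on proper CSV quoting for the comma case and an optional underscore-to-space pass for URL-style names:

```python
# Sketch: parse a "username,projectname" cohort CSV.
# csv.reader handles quoted fields, so commas inside usernames survive.
import csv
import io

def normalize_username(raw):
    """In MediaWiki titles, underscores are interchangeable with spaces,
    so URL-style names like 'Some_User' map to 'Some User'. This is a
    heuristic: a name that genuinely contains underscores would be
    mangled, which is exactly the caveat raised in the chat."""
    return raw.strip().replace("_", " ")

def parse_cohort_csv(text):
    rows = csv.reader(io.StringIO(text))
    return [(normalize_username(user), project.strip())
            for user, project in rows]
```

Quoting a comma-bearing username in the CSV (e.g. `"Doe, Jane",commonswiki`) keeps it as a single field, which is the standard CSV answer to the comma problem.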
[21:06:39] it wasn't the cohort this time, some bug on the front end
[21:06:41] no prob
[21:07:19] even if it's buggy, it's still better than the good old days when we had to create cohorts by hand
[21:07:21] in the snow
[21:07:24] uphill both ways
[21:07:26] ;)
[21:13:14] milimetric: let me know when you are done with cohort stuff
[21:13:32] I pushed some changes to wikimetrics evan
[21:13:45] it runs your queue tests now
[21:13:57] I think it's because of bad assertions though
[21:14:05] i added the proper data on the enwiki mwSession
[21:14:12] but something must be off
[21:14:19] k, cool
[21:16:51] heh... ok, found the problem
[21:16:56] this is a username:
[21:16:56] Swim Team "Prishtina"
[21:16:58] lol
[21:20:42] haha
[21:23:36] http://www.illyrianswim.org/drinsstatpg.html
[21:25:44] hey, I have no beef with the Albanians :)
[21:25:58] Just saying their username made our code better
[21:26:21] erosen: this is kinda crazy 'cause it has to validate the users twice
[21:26:25] but i think it's gonna work this time
[21:26:36] yeah the validate is annoying
[21:31:14] i'm a little beat erosen but with that *hopefully* last run of the cohort, wanna talk wikimetrics?
[21:31:18] milimetric: :)
[21:31:22] i'm jumping in the hangout
[21:31:27] yeah, hanging out
[21:36:02] milimetric: http://stackoverflow.com/questions/8506914/detect-whether-celery-is-available-running
[21:59:14] Maryana: I'm sorry, I have one last problem with this thing. Is it ok if I finish up tonight? When do you need the cohort by?
[21:59:51] I'm pretty sure uphill-both-ways cohort creation is a little faster than generic crazy character handling :)
[22:01:28] milimetric: it's not super urgent - tomorrow is fine :)
[22:02:28] cool, thanks :)
[22:02:35] i'll ping/email when it's finally uploaded