[00:03:19] drdee: you about?
[00:04:27] drdee: (for when you are) could you send me info about the sampled data to use for the tablet counts in #61?
[00:05:00] hm. i will send an email.
[00:44:26] ori-l thanks a bunch for the rsync puppet, I used it to build: https://gerrit.wikimedia.org/r/54811
[00:45:11] if that gets accepted, it'll rsync yuvi's stuff so you could probably abandon the one you set up
[00:46:12] I decided to make the rsync go from stat1 to stat1001 because that way it won't have to sweat whether or not the SQL finished running
[00:49:35] milimetric: were you referring to deploying any particular type of artifact?
[00:49:59] no, it doesn't have to be specific
[00:50:08] we didn't understand the policies altogether
[00:53:55] oops, forgot to ping kraigparkinson ^^
[00:55:55] ok
[00:55:58] thanks. :)
[00:57:21] milimetric, what's left to do for #68?
[00:59:43] i am around btw
[01:00:41] 68 is now waiting on two more things
[01:00:48] 1. ops approval of my puppet change
[01:01:15] 2. me to point the graphs to the result of the scripts I just puppetized
[01:01:57] kraigparkinson: so I'm hoping ottomata can polish up any mistakes I made with puppet and we can have that merged quickly tomorrow morning
[01:02:13] and as far as #2, it'll take no more than 20 minutes as soon as 1 is done
[01:02:26] so there's some risk that this won't get done, but it's sort of out of our hands
[01:02:57] OK.
[01:03:03] Thanks for the update. :)
[01:03:26] can you update me on that tomorrow morning around 9 pacific?
[01:03:33] :)
[01:04:26] drdee: updates to https://mingle.corp.wikimedia.org/projects/analytics/cards/92 are in
[01:04:41] drdee: i'm going to be leaving in about 30min. let me know if you need anything from me
[01:04:42] yes, thanks! reading right now
[01:10:37] drdee: please mail if any questions come up. im going to wind down
[01:10:48] tfinc: will do
[01:30:31] kraigparkinson: will do
[01:30:55] thanks, milimetric
[09:28:26] [travis-ci] master/b40f613 (#100 by dsc): The build has errored. http://travis-ci.org/wikimedia/kraken/builds/5651302
[10:15:09] [travis-ci] master/f314567 (#101 by dsc): The build has errored. http://travis-ci.org/wikimedia/kraken/builds/5652129
[11:12:09] [travis-ci] master/1991a3e (#102 by dsc): The build has errored. http://travis-ci.org/wikimedia/kraken/builds/5653141
[12:16:22] morning
[12:18:47] * YuviPanda waves at milimetric
[12:18:58] hey YuviPanda
[12:19:06] you are in EDT?
[12:19:16] we're waiting on the gerrit patchset to get reviewed
[12:19:22] yes, I'm on EDT (EST)
[12:19:43] ah, nice
[12:19:48] * YuviPanda looks at analytics/limn-mobile-data
[12:20:05] milimetric: is /a/ world readable?
[12:20:13] i don't think so
[12:20:18] hmm
[12:20:18] okay
[12:20:22] i think that's actually where all of the private data is stored
[12:20:38] based on my poor reading of statistics.pp
[12:20:51] looks like at least some raw logs are being stored there
[12:20:57] oh! you reminded me!
[12:21:01] i never changed site.pp
[12:25:06] milimetric: ah, okay
[12:25:14] milimetric: also your commits - some are as Dan, some as milimetric
[12:25:29] i think i committed from stat1
[12:25:40] and forgot to git --config global user.name
[12:25:48] or whatever
[12:26:08] it's ok, I won't do that again
[12:26:12] as vim is unusable over ssh
[12:42:24] milimetric: you can scp ~/.gitconfig dan@stat1:/home/dan/
[12:49:23] thx average_drifter, I just set the username and email
[13:01:09] morning
[13:08:16] morning drdee
[13:08:22] yoyo
[13:21:48] got some big problems with git on stat1
[13:21:53] dunno what's going on
[13:22:00] it just stalls
[13:22:04] I hit git status, it stalls
[13:24:01] I would report it on https://rt.wikimedia.org/ , but I don't have an account
[13:24:39] yeah, stat1 is being crazy slow
[13:24:46] vim is acting up on it too, right average_drifter?
[13:25:10] milimetric: haven't tried vim, I edit locally and I push to stat1 and I run there
[13:25:20] but I can't use git on stat1
[13:25:21] :|
[13:25:27] milimetric: would you report this to ops please ?
[13:25:51] yeah
[13:25:51] I guess I'll just resort to scp -r if I can't push..
[13:25:56] wait with reporting
[13:26:03] let's first try to diagnose the problem ourselves
[13:27:39] average_drifter: which repo can mark try?
[13:28:23] milimetric: /home/spetrea/wikistats/pageviews_reports
[13:29:06] now it works, I don't know why, it's weird
[13:29:19] I would still like if mark could have a look at it
[13:29:52] git checkout now takes loads of time
[13:31:22] milimetric: vim acting up, confirmed
[13:32:58] isn't it just that the load is high and that another process is doing a lot of IO stuff?
[13:34:42] there's like 2 processes
[13:34:44] one of ezachte
[13:34:49] k average_drifter, it's just 100% disk util
[13:34:54] mark confirms
[13:34:58] and one of halfak
[13:35:06] but vim is definitely most horribly affected by that - must be the history tracking
[13:35:21] i'd suggest editing in nano and expecting IO bound stuff to take forever
[13:35:26] hm.......................
[13:35:28] drdee!
[13:35:36] maybe that's why Tillman got the email at 7:00pm
[13:35:42] :D
[13:35:48] though that's literally 19 hours after the job would've started
[13:36:22] or wait no... it's 5 hours earlier than it should've started
[13:36:26] ok, I have to go now, I'll be back for standup hopefuly
[13:37:35] milimetric: could you talk to mark please and ask if he can solve this. I will have to roll out some new reports today
[13:37:55] there's no solving it, it's just disks spiked to literally 100%
[13:38:03] he suggested removing that symlink you have also
[13:38:54] brb chickens & shower
[13:38:57] spetrea@stat1:~/wikistats$ df -h
[13:38:58] Filesystem Size Used Avail Use% Mounted on
[13:38:58] /dev/mapper/stat1-root 14G 5.6G 7.7G 43% /
[13:38:58] udev 16G 4.0K 16G 1% /dev
[13:38:58] tmpfs 6.3G 356K 6.3G 1% /run
[13:39:00] none 5.0M 0 5.0M 0% /run/lock
[13:39:03] none 16G 172K 16G 1% /run/shm
[13:39:05] /dev/mapper/stat1-a 6.4T 2.7T 3.4T 45% /a
[13:39:08] 208.80.152.185:/data 48T 42T 5.8T 88% /mnt/data
[13:39:10] /dev/mapper/stat1-tmp 50G 5.9G 42G 13% /tmp
[13:39:13] /dev/mapper/stat1-home 1008G 413G 545G 44% /home
[13:39:17] last row, doesn't seem to me like 100%
[13:39:23] more like 44%
[13:40:18] ok I'm relly going now
[13:40:19] bbl
[13:49:31] grrrrrr
[13:49:53] ottomata changed it too - he made the job run at 02:00 UTC
[13:49:56] drdee ^^
[14:52:26] mornin
[14:53:29] morning
[14:53:57] so, new zero job in the format that erosen expects is running
[14:54:05] it replaces the old job.
[14:54:09] nice
[14:54:14] nice, you're about
[14:54:20] output lives here:
[14:54:26] let me know when it is ready and I'll give it a run through
[14:55:10] http://localhost:8888/filebrowser/wmf/public/webrequest/zero_carrier_country/2013/
[14:55:12] and so on
[14:55:19] when you drill to a job result, you get:
[14:55:36] er, add a /view in there
[14:55:37] heh
[14:55:38] http://localhost:8888/filebrowser/view/wmf/public/webrequest/zero_carrier_country/2013/03/16/20.00.00/
[14:55:46] the first link should have been http://localhost:8888/filebrowser/view/wmf/public/webrequest/zero_carrier_country/2013
[14:55:56] anyway -- it has a directory for carrier and for country
[14:56:04] nice
[14:56:07] format is the same for both, but with - for carrier in the country files
[14:56:14] * erosen ktunneling...
[14:56:33] the rollups are currently disabled (as ottomata suggested) until the job catches up
[14:56:42] then i'll have it generate a combined CSV for each run
[14:56:51] or even a bigger rollup
[14:57:28] http://stats.wikimedia.org/kraken-public/webrequest/zero_carrier_country/2013/03/16/00.00.00/
[14:57:31] there's a public link
[14:57:40] dschoon: cool, the rollups aren't a necessity for me either, as my code does this
[14:57:44] kk
[14:57:49] then we'll ignore it :)
[14:58:00] COOOL
[14:58:13] it'd be fuckin sweet if we had an updated dashboard before scrum
[14:58:21] dschoon, talk to me about the webrequest loss job again real quick, i haven't looked at it in a while and you had the other day
[14:58:21] so we could perform the almighty Card Moving
[14:58:30] oh, right
[14:58:35] it needs:
[14:58:35] i think i'd like that working again, i need to look into this packet loss issue more, and it would be nice to know if it was happening in kraken too
[14:58:55] 1. fix coordinator.properties to not have Ye Accursed Typo
[14:59:00] oh yes yes
[14:59:07] (analytics1010.wikimedia.org)
[14:59:12] dschoon: as for the card moving, shall we wait until i generate the dashboard and we decide that it looks okay?
[14:59:14] 2. kill old coord
[14:59:18] re submit?
[14:59:22] 3. resubmit to oozie
[14:59:23] ja
[14:59:24] ok cool
[14:59:26] on it
[14:59:29] it's really too bad you can't tell it to reread
[15:00:02] atm, i'm really pissy to find out that the ternary op in pig can only return *values
[15:00:10] so this is illegal:
[15:00:20] device_info = FOREACH device_info GENERATE day_hour, country, (is_wireless ? 'handheld' : (is_tablet ? 'tablet' : 'desktop')) as device_class:chararray;
[15:00:30] only the inner ternary is correct :P
[15:00:48] aye
[15:01:04] dschoon, you like this?
[15:01:05] ${YEAR}/${MONTH}/${DAY}/${HOUR}.${MINUTE}.00
[15:01:09] yeah.
[15:01:10] vs $YEAR-MONTH/...
[15:01:10] ?
[15:01:13] regular.
[15:01:16] ok
[15:01:18] will do the same
[15:01:19] exceptions suck :)
[15:01:19] i think i do too
[15:01:39] my device job is emitting its intermediate results (when it works) to /wmf/data/mobile/device_class
[15:01:52] /wmf/data?
[15:02:02] i figured it was time to start being a little more regimented about making data visible and re-usable
[15:02:09] yeah.
[15:02:25] for datasets you generate that can be re-used, but aren't necessarily public.
[15:02:30] materialized views, annotations, etc
[15:02:32] ahh, hmm
[15:02:56] I do wish I had imported the raw logs in time basd directories too :(
[15:03:09] something to fix when we recombobulate everything one day
[15:03:19] yeah :/
[15:03:38] i think for the second iteration of the device job, i'm going to do by-minute rollups.
[15:03:49] it reduces the data by like 1/10000th
[15:04:02] hm, but you don't want the data too small, right?
[15:04:02] but honestly nobody ever needs <1m resolution
[15:04:06] yeah
[15:04:18] also it makes the aggregate data larger
[15:04:59] the overhead of starting the jobs might be more than actually running the jobs
[15:08:13] [travis-ci] master/8edbaec (#103 by Andrew Otto): The build has errored. http://travis-ci.org/wikimedia/kraken/builds/5659065
[15:08:41] i think it's long-past time for FairScheduler
[15:08:47] also, why the hell does that build keep failing?
[15:08:51] it builds locally.
[15:12:39] ottomata, did you get a chance to look at https://gerrit.wikimedia.org/r/54811?
[15:12:52] we need it for today, along with another small change
[15:13:05] (can't submit the other small change until I get this patchset done
[15:24:12] [travis-ci] master/2bcda38 (#104 by Andrew Otto): The build has errored. http://travis-ci.org/wikimedia/kraken/builds/5659590
[15:26:19] AHHH i started, so many things!
[15:26:31] hah, yes, i have lots of comments, I will review extensively
[15:26:33] here's my list:
[15:27:00] 1. get webrequest loss back up (hopefully on 30 more mins)
[15:27:01] 2. review your thing
[15:27:01] 3. python .deb for hashar (only 30 mins or less)
[15:27:47] \O/ :-]
[15:28:24] cool, thanks ottomata, your RT weeks are packed :)
[15:29:02] this isn't even RT!
[15:29:06] i ahven't even looked at RT this week yet!
[15:29:09] this is alll people just asking me!
[15:29:10] heheh
[15:32:13] dschoon, should we kill the webrequest_mobile_device* coordinators?
[15:32:23] the workflows are all dying
[15:32:31] i'll take a look in a sec
[15:33:14] ok ok, hashar, i'll do yours first, since paravoid already approved it, that should only take a few mins
[15:33:30] niiiice
[15:34:13] [travis-ci] master/36b6f84 (#105 by Andrew Otto): The build has errored. http://travis-ci.org/wikimedia/kraken/builds/5659984
[15:47:04] dschoon: that build keeps failing because travis-ci can't find the cdh packages
[15:47:12] huh
[15:47:24] i'll take a look later today
[15:47:38] see https://github.com/travis-ci/travis-ci/issues/948
[15:47:55] travis-ci should setup a proxy repo
[15:48:09] because their local settings is quite restricted
[15:50:29] hmmm
[15:50:33] oops wrong chat
[15:51:32] hmmmmm indeed
[15:54:43] gm all
[15:55:12] good morning kraig
[15:55:17] kraigparkinson: update
[15:55:28] ottomata is working on reviewing the changes I submitted last night
[15:56:29] morning
[15:56:44] milimetric, cool. What's the likelihood of that being ready for showcase?
[15:57:36] well, i thought it was pretty ready and that the review would be quick, but ottomata said he had loads of comments, so now I'm skeptical. I would estimate 35% likelihood
[15:58:29] k, would be good to debrief at some point on ottomata's feedback and how we can learn from it.
[15:58:32] dschoon, way to getting #244 ready for showcase
[15:58:41] way to go, that is. :)
[15:59:03] word
[15:59:09] almost done with a fix for 61
[15:59:18] does that mean #60 is on its way for this morning? :)
[15:59:27] er, right, thanks. :)
[16:00:07] average_drifter, how about #60?
[16:23:21] omw to the office
[16:25:13] drdee, have a sec to chat? I want to see if we can get a clear policy on the Customer property in Mingle.
[16:28:08] kraigparkinson: in a meeting with erikz
[16:28:48] k
[16:31:31] kraigparkinson: problems with stat1 blocked me to roll out reports for #60
[16:32:37] milimetric: how can i reach mark?
[16:33:08] in the operations channel, you were talking to him there before
[16:33:22] average_doc: ^
[16:33:24] whats his nickname?
[16:33:28] mark
[16:33:29] :)
[16:33:36] ok
[16:42:04] average_doc, hehheh, its really unclear what your problem is over in operations
[16:42:43] right now it is 'git and vim weren't working, erik's job was using only 1 cpu'
[16:43:26] well, ill leave it at that and hope that when ill get home it will be usable
[16:43:44] so i can wrap up #60
[16:43:52] ottomata, the problems are very annoying and just started happening
[16:44:08] but i agree that it's not very clear what's going on
[16:44:17] or whether it's anything that ops could help with
[16:44:42] but if you'd like to see it for yourself, ssh to stat1, open a file in vim, and try to save it
[16:45:00] however, before that, have you had a chance to look at that patchset?
[16:45:22] milimetric: is this a vim on stat1 prob?
[16:45:36] it's been reaallllly slow for me too
[16:45:42] yes erosen, but i'm pretty sure that's just a symptom
[16:45:47] mark said disk usage spikes to 100%
[16:46:02] this doesn't sound right to me, it sounds like there's some errant job
[16:46:11] hrm
[16:46:48] i was thinking it was an issue with my screen session
[16:46:59] well, ping me if you find anything out
[16:47:21] k
[16:47:43] btw erosen: workaround is to use nano
[16:48:05] so that makes me think it's vim's .swp file history tracking stuff that's causing an issue
[16:48:32] check halfak's job on stat1
[16:49:05] milimetric: https://gerrit.wikimedia.org/r/#/c/54811/
[16:49:06] reviewed
[16:49:54] vim seems totally fine for me on stat1
[16:50:14] woohoo thanks otto
[16:50:21] yeah, that uses 100% of a cpu
[16:50:27] ezachte is gzipping something right now too
[16:50:54] yikes, halfak is running amysql query that is using 50% of memory
[16:50:58] 14G
[16:52:41] ezachte's stuff should keep running
[16:52:50] i belief halfak's stuff is snuggle
[16:52:56] yeah
[16:53:03] and if it's misbehaving than we should just stop it for now
[16:53:07] and warn halfak
[16:53:25] kraigparkinson: quickly pre-scrum chat in 3 minutse?
[16:53:30] drdee: and if it's misbehaving than we should just stop it for now
[16:53:31] [12:53pm] drdee: and warn halfak
[16:53:33] ottomata ^^
[16:53:34] sure
[16:53:36] hmm, how do I get a hold of halfak
[16:53:38] i just tried chatting at him
[16:53:45] i will pm his email addresss
[16:53:50] k, CC me
[16:54:11] am in the scrum hangout./
[16:54:30] i cant get to scrum today
[16:54:44] im 50m away
[16:54:55] minutes
[16:54:58] drdee ^^
[16:56:13] and ottomata, that .my.cnf.research file is safe in /a/?
[16:56:19] it's not world readable or anything, right?
[16:56:42] you need to make the file perms right
[16:56:48] i put that in a comment too
[16:59:47] thanks, got it
[16:59:55] i don't know how to do the rsync thing...
[17:09:34] erosen!
[17:09:43] dschoon!?
[17:09:53] sup?
[17:10:07] qq -- You think those dashboards could make themselves before 11?
[17:10:15] eeeh
[17:10:17] perhaps
[17:10:21] we are hoping to demo them at the showcase :(
[17:10:21] I'd give 70%
[17:10:24] :/
[17:10:26] i'll start now
[17:10:28] hehe
[17:10:33] anything i can do to make that happen?
[17:11:02] dschoon: the only part that I am a little unclear about , is how to best grab the files
[17:11:10] if you want to point it at a different limn (to leave the current ones there) that's cool
[17:11:17] I'll just rsync to stat1 for now I guess
[17:11:17] they're on stat1001
[17:11:25] http or local works
[17:11:27] here:
[17:11:43] yaeh, but because they aren't going directly to limn, i need them somewhere i can run my python code (stat1)
[17:12:06] http://stats.wikimedia.org/kraken-public/webrequest/zero_carrier_country/2013/03/
[17:12:07] ^^ erosen
[17:12:10] yep.
[17:12:37] is it possible to do a recursive copy with wget and an apache file browser?
[17:13:08] /a/srv/stats.wikimedia.org/htdocs/kraken-public/
[17:13:11] on stat1001
[17:13:12] also works
[17:13:20] world readable
[17:13:34] that's what I'll do, thanks
[17:13:47] everything under zero is only ~15M
[17:13:50] yes but I wouldn't put files in there manually
[17:13:54] right
[17:13:54] that dir is rsynced out of kraken
[17:13:56] out of hdfs
[17:14:00] he just needs to get to them
[17:14:03] oh ok
[17:14:04] to postprocess into a new location
[17:14:09] yeah, just copying from
[17:14:17] ok ok, sorry didn't read convo :)
[17:14:45] that cool, erosen?
[17:14:54] def lmk if there's anything else
[17:16:12] dschoon: actually, the path for recursive copy isn't working / I don't understand how it is supposed to work
[17:17:49] hm?
[17:17:52] you upstairs?
[17:18:02] naw, still at pydata
[17:18:07] grah
[17:18:16] okay, let's switch to gchat for pastecore
[17:18:28] dschoon: actually i got it
[17:18:33] k
[17:18:43] wait, nvm
[17:18:44] hehe
[17:18:53] dschoon gchat it is
[17:24:55] ottomata: ready for review: https://gerrit.wikimedia.org/r/54811
[17:26:18] okey dokey lets try it
[17:26:19] merging
[17:28:50] ah
[17:28:51] err: Failed to apply catalog: Parameter path failed: File paths must be fully qualified, not '$rsync_from/mobile/datafiles' at /var/lib/git/operations/puppet/manifests/misc/statistics.pp:760
[17:28:53] single quote prob i think
[17:29:31] yeah, milimetric, if you've got variables in your strings you need to use double quotes
[17:29:35] sorry I didn't catch that
[17:29:52] oh i didn't know
[17:29:58] thought it was like JS :(
[17:30:09] i'll fix and re-submit along with the other patch?
[17:30:18] the blog one? naw bette rto be separate
[17:30:23] ok, cool
[17:30:25] try using a topic branch for it!
[17:30:29] submitting the fix now
[17:30:32] k
[17:30:40] uh... would rather not, my feet are a bit on coals right now :)
[17:33:26] ok, pushed ottomata, and made it run every hour per YuviPanda's request
[17:33:43] wheee!
[17:33:46] now... we wait an hour?
[17:35:44] drdee: is the analyst scrum officially dead?
[17:36:04] DarTar: NOOOOO
[17:36:16] i'm at pydata, so I won't make it
[17:36:20] sorry for hte lack of warning
[17:36:28] ok, I'm hanging out :)
[17:36:47] erosen: np
[17:40:07] okey dokey milimetric, !
[17:40:07] # Puppet Name: rsync_mobile_apps_stats
[17:40:07] 0 * * * * python /a/limn-mobile-data/generate.py /a/limn-mobile-data/mobile/ && /usr/bin/rsync -rt /a/limn-public-data/* stat1001.wikimedia.org::www/limn-public-data/
[17:40:09] looks good
[17:40:24] so, one more thing, not for now
[17:40:30] if/when we add more scripts like this
[17:40:32] that gneerate limn public data
[17:40:45] dartar, I was thinking we could combine analyst scrum into weekly analytics showcase discussion, how about that?
[17:40:51] we might want to abstract this a bit more then
[17:40:54] but for now this is good
[17:40:55] yay!
[17:41:06] cool, thanks ottomata, I agree
[17:41:11] dartar, I think it would be good to surface that stuff to a broader if that's cool.
[17:42:09] dartar, whaddya think? :D
[17:42:44] kraigparkinson: I think it makes sense if we get enough participation
[17:43:22] dartar, if you're all at the showcase, then you will. :)
[17:43:49] and I expect you to be there. ;)
[17:44:03] Unless you don't believe in freedom.
[17:45:03] I'll be there but I am not sure how many other analysts will (evan/rf away, jonathan+aaron not attending, erik z?)
[17:46:34] ahhhhh 15 mins til showtime, i gotta run to a cafe, be bask asap
[17:46:54] erik z, yes. evan/rf maybe not today, but would how for most other weeks, jonathan invited and hope he comes. I'll add aaron. :)
[17:47:55] aaron invited. and we had a great chat over dinner last night. :)
[17:48:01] we're BFFs now.
[17:48:41] ottomata: https://gerrit.wikimedia.org/r/54878
[17:48:46] that's the next change
[17:48:59] kraigparkinson: 68 is ready for showcase
[17:49:03] http://mobile-reportcard-dev.wmflabs.org/
[17:49:15] rock out, milimetric! :)
[17:49:53] kraigparkinson: 154 is now ready for showcase too, thanks ottomata!
[17:50:18] sweeeet.
[17:50:29] brb with yummy chicken
[17:50:52] btw - I feel an extreme conflict of interest by both eating chicken and taking care of baby chicks
[17:51:06] just don't let the baby chicks see you eating chicken.
[17:51:27] also - while we're on the subject, I have no idea why "chicks" is considered a derogatory term for women. Baby chicks are like by far the cutest little animals and also the smartest babies I've ever dealt with
[17:51:31] "it's not what you think! it's not what you think!"
[17:51:42] tastes like human.
[17:51:43] :)
[17:52:06] milimetric: sorry, been in the security meeting. so does the dashboard auto-update now?
[17:55:22] YuviPanda: yes it does, every hour
[17:55:24] on the hour
[17:55:32] okay, so 35 minutes more
[17:55:41] milimetric: did you deploy to mobile-reportcard with the updated urls?
[17:55:45] no should be 5 more minutes
[17:55:50] only mobile-reportcard-dev
[17:56:08] but I pushed the change to the gerrit repo in the stages.py of limn-deploy
[17:56:16] so you can do fab mobile deploy.only_data if you'd like
[17:56:34] i just leave that up to you so it can be as stable (or unstable) as you like
[17:59:34] milimetric: ah, okay
[18:00:05] YuviPanda, milimetric: nice job
[18:00:08] kraigparkinson, dschoon, drdee, ottomata, I added a hangout as one didn't exist: https://plus.google.com/hangouts/_/30310b318ca8d356e8f15b27ce3143a3a3e2887b
[18:00:12] it's all milimetric's magic
[18:00:30] * DarTar moving to Chambers brb
[18:00:31] uh... I put the credit solely on ottomata's shoulders
[18:00:42] without his puppet skills we'd have nothing
[18:02:08] kraigparkinson, drdee, can you get into the hangout for the meeting? https://plus.google.com/hangouts/_/30310b318ca8d356e8f15b27ce3143a3a3e2887b ?
[18:03:13] * milimetric feels ignored
[18:03:14] :)
[18:03:47] milimetric: everyone jumped into meetings simultaneously
[18:03:59] yes but nobody's chatting anything here...
[18:05:21] because they're all in meetings!
[18:05:27] i am in one :(
[18:05:48] milimetric we are in https://plus.google.com/hangouts/_/5b70172d0f7418695ff6d98f3cb53bbb7097e020
[18:15:22] milimetric: deployed!
[18:15:24] thank you!
[18:15:45] no prob, looks good
[18:15:50] and should've been updated at 14:00
[18:15:53] (EST)
[18:16:06] drdee: i dont understand why https://mingle.corp.wikimedia.org/projects/analytics/cards/92 data output has country
[18:16:30] i specifically took that out in all other parts of the story to keep it simple
[18:17:34] milimetric: is there a way to see when it was last updated?
[18:17:55] nothing fancy YuviPanda, you could look at the cron logs
[18:18:06] heh, okay!
[18:19:02] tfinc: sorry, should have been remove
[18:19:03] d
[18:20:37] analytics folks, i need some input on the cert request for stat1 (if anyone here is involved, i see andrew got cc'd)
[18:20:38] https://rt.wikimedia.org/Ticket/Display.html?id=4473
[18:20:46] this is for stat1/stat1001/etc.
[18:25:28] milimetric: can we change the headings on http://mobile-reportcard.wmflabs.org/ to be Commons Mobile Uploads instead of Mobile Wikipedia Contributions
[18:26:10] :) the data is wholly owned by YuviPanda and you guys, the repository is in Gerrit
[18:26:18] so you can change it any way you'd like tfinc
[18:26:38] YuviPanda: --^
[18:26:40] tfinc: sure can do.
[18:26:43] thanks
[18:26:50] tfinc: will do later tonight
[18:26:53] jwild pointed out that the heading was misleading
[18:26:55] but I would like to mention that dashboard and graph editing is a hot topic that jwild has been an advocate for
[18:27:22] we're hoping to get it in a sprint as soon as we achieve our Q1 goals
[18:27:31] :D
[18:27:53] milimetric: also would be nice is documentation for different nodetypes you can put in graphs :)
[18:27:59] RobH: i belief we should only request ssl certificate for stat1001; stat1 should soon no longer be running apache2
[18:28:16] sure thing YuviPanda, will do that
[18:29:12] well, RobH
[18:29:19] drdee
[18:29:23] when we were talking about ssl for stat1
[18:29:31] thats because halfak is hosting his snuggle dev from there
[18:29:34] and wanted https for it
[18:29:45] that is not apache
[18:30:21] ottomata, see https://mingle.corp.wikimedia.org/projects/analytics/cards/385; we should not be running any web service on stat1
[18:31:27] drdee: yes, but isnt the url you guys serve not stat1001.wikimedia.org
[18:31:47] but something like metrics.wikimedia.org ?
[18:32:02] ie: shouldnt you guys want the cert for metrics.w.o not stat1001.w.o?
[18:32:07] dschoon, see my comment in hangout chat
[18:33:47] RobH, sorry, we are all in a meeting right now
[18:34:54] thats cool, this can wait
[18:35:07] just update ticket when you guys want movement ;]
[18:36:59] drdee: mute yourself please
[18:43:17] ottomata: sumanah is reporting problems accessing metrics.wikimedia.org from outside our IP range, indicating a possible DNS issue
[18:43:44] can you double check?
[18:45:41] ottomata: culprit found: HTTPSEverywhere
[18:46:28] accessible via http, but we should resume the conversation on https - drdee ^^
[18:58:05] drdee: i made an update to the data table on https://mingle.corp.wikimedia.org/projects/analytics/cards/92
[18:58:22] ah, yeah, i just commented on snuggle's SSL RT ticket about that
[18:58:24] DarTar
[18:58:38] if there isn't already, created a ticket with a request for a metrics.wm.o cert and assign it to RobH
[18:58:41] CC me maybe
[18:58:47] will do
[18:58:51] i can do the HTTPS setup, but if you want a cert he's gonna do it
[19:30:53] drdee, is this the same as your 4.2.0 large job bug?
[19:30:53] 2013-03-20 19:26:19,371 FATAL [Low Memory Detector] org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[Low Memory Detector,9,system] threw an Error. Shutting down now...
[19:30:53] java.lang.InternalError: Error in invoking listener
[19:31:08] java.lang.RuntimeException: InternalCachedBag.spill() should not be called
[19:31:16] no, but that has been a recurring bug
[19:31:22] requires fine-tuning jvm
[19:31:25] grr
[19:31:38] so I didn't up the JVM mem the other day, just turned off jvm reuse
[19:31:47] should I turn up JVM mem?
[19:32:56] or do I just need more mappers?
[19:40:54] or more mappers might also do the trick
[19:44:31] hmk
[19:45:15] wait how do I get more mappers? is that possible? hadoop does that automatically, right?
[19:46:38] I'm trying to install redis on toro.wmflabs.org (just a testing server for MW).
[19:46:43] Dan said you guys had looked into it.
[19:47:04] I got it showing up in https://wikitech.wikimedia.org/wiki/Special:NovaPuppetGroup, but when I install it doesn't work since the default directory is /a/redis.
[19:47:11] Apparently I need to subclass it with a real directory.
[19:47:26] Before I mess with that or ask Ryan_Lane again, do you have any tips?
[19:49:52] if you are just trying a quick and dirty approach
[19:50:00] you could symlink /a/redis to wherever it needs to be, right?
[19:52:15] dschoon, you there?
[19:52:22] yep
[19:52:46] loss job is failling due to low memory detector errors
[19:52:51] any idea what I can do about that?
[19:53:48] hm, just read this
[19:53:48] https://issues.apache.org/jira/browse/PIG-3101
[19:56:57] ottomata, yeah, I guess. I'm trying to do it in a somewhat non-hacky way.
[19:57:17] ok, cool. /a/redis a thing from puppet thing?
[20:02:06] ottomata, yeah, it's the default dir in the redis class in operations/puppet.
[20:02:43] interesting.
[20:02:52] i'll take a look in after lunch, ok ottomata?
[20:03:21] i figured out quite a few optimizations while working on the zero job
[20:04:43] ok
[20:04:45] cool
[20:04:46] thanks
[20:05:19] ah yeah
[20:05:22] ok superm401
[20:05:33] how are you using the redis module?
[20:05:49] are you just checking the box next to your the redis class in the labsconsole interface?
[20:06:11] ottomata, yeah, currently.
[20:06:14] ok, yeah
[20:06:21] so the class is parameterized, so you won't be able to do that
[20:06:25] It is not self-hosted currently.
[20:06:27] are you doing self hosted puppetmaster?
[20:06:28] ah
[20:06:30] yeah, i think you'll have to
[20:06:36] Ahg.
[20:06:47] It's kind of annoying since you have to pull manually and you can't go back.
[20:07:00] yeah
[20:07:06] well
[20:07:06] or
[20:07:07] I see there's a Variables section at https://wikitech.wikimedia.org/wiki/Special:NovaPuppetGroup
[20:07:10] But that doesn't help?
[20:07:11] you could push a new class or role
[20:07:16] reading
[20:07:46] naw, i think that is just a global variable, not a class parameter
[20:08:21] i think...
[20:08:53] actually i'm not sure
[20:08:55] you shoudl ask Ryan_Lane about that
[20:09:11] if adding variables to the class groups works for parameterized classes
[20:09:21] but, i was going to say
[20:09:27] hmm, naw, nm
[20:09:31] what I was going to suggest isn't good
[20:09:32] yeah
[20:09:35] ask Ryan_Lane about that
[20:09:37] and if it doesn't work
[20:09:39] you'll have to do self hosted
[20:09:54] ottomata, btw, you were offline while I was praising thy name
[20:09:56] Okay, I'll ask him.
[20:10:01] I appreciate your help though.
[20:10:03] thank you so much for the help with puppet
[20:10:19] today was productive because of that
[20:10:53] ottomata is generally pretty awesome
[20:15:04] ottomata: you still using asana for personal task tracking?
[20:15:16] i was last week
[20:15:17] haven't this week
[20:15:26] daw, thanks guys!
[20:15:30] ottomata: can do you know anything about the ticket for henrique stat1 access: https://rt.wikimedia.org/Ticket/Display.html?id=4726
[20:15:36] i like getting things done!
[20:15:39] productivity ftw!
[20:17:41] ottomata: ^^
[20:17:57] just asking RobH in ops, one sec
[20:17:58] ottomata: alternatively, I should I just ping the RT ticket, or is that discouraged?
[20:18:02] thanks
[20:18:19] stealing and doing...
[20:18:29] stealing and doing?
[20:18:41] and no, no problem with pinging RT tickets after the 3 days have passed
[20:18:55] hehe, yeah, RT has a 'steal' button, for taking the ticket from someone else and assiging it to yourself
[20:19:10] oh its already assigned to me :p
[20:22:46] erosen, do you think he'd rather have 'henrique' or 'handrade' or somethign else as his username?
[20:23:05] 'handrade' is his preferred name
[20:23:12] he's sitting next to me
[20:23:15] btw
[20:23:20] cool
[20:24:03] hashar: hi, I have a song for you http://garage-coding.com/song.mp3
[20:26:02] funny song
[20:26:22] what is the license ? :-D
[20:26:52] hashar: it's like WTFPL
[20:33:20] average_drifter: some garage trip hop ;-D
[20:33:47] erosen:
[20:33:47] notice: /Stage[main]/Accounts::Handrade/Unixaccount[Henrique Andrade]/User[handrade]/ensure: created
[20:33:48] notice: /Stage[main]/Accounts::Handrade/Ssh_authorized_key[henrique@NBK-DTIC-ST05]/ensure: created
[20:33:56] yo
[20:33:59] yay
[20:34:02] will check now
[20:35:08] ottomata: small catch, apparently henrique doesnt' have his key on his new computer yet
[20:35:12] !log restarting hadoop, upping mapreduce.task.io.sort.mb from 100 to 200
[20:35:19] oh
[20:35:23] is it possible to swtich keys on labs and have that change propogrte?
[20:35:30] does he have another key?
[20:35:38] you'd have to give me the key
[20:35:38] or is it hard coded
[20:35:39] its hardcoded
[20:35:42] ic
[20:35:44] does he want both keys useable?
[20:36:05] he doesn't need both
[20:36:09] he just needs one
[20:36:18] can I have him send it to you right now?
[20:36:21] ja
[20:36:42] ottomata, did we leave container reuse on?
[20:36:54] no its off, 1 == off, right?
[20:37:00] ....
[20:37:12]
[20:37:12] oh, heh
[20:37:13] mapreduce.job.reuse.jvm.num.tasks
[20:37:13] 1
[20:37:13]
[20:37:14] yes 1 is off
[20:37:15] right
[20:37:17] num tasks.
[20:37:20] ah num tasks
[20:37:20] yeah
[20:37:21] heheh
[20:37:24] i was like, "1 sounds a lot like true to me"
[20:37:26] okay.
[20:37:28] haha, yeah
[20:37:32] ottomata: key sent
[20:37:38] hello ottomata
[20:37:41] hiya!
[20:37:42] just sent the key
[20:37:48] i think my job failed because something ate all the memory on the cluster
[20:37:50] thanks!
[20:37:51] :)
[20:37:57] and we don't have resource-aware scheduling turned on :/
[20:45:33] hm
[20:45:39] zat why mine failed too?
[20:45:44] i just restarted hadoop and am about to try mine again
[20:48:55] hmm, same error for my job, even with sort.io turned up
[20:49:07] 2013-03-20 20:47:06,166 INFO [Low Memory Detector] org.apache.pig.impl.util.SpillableMemoryManager: first memory handler call - Collection threshold init = 139853824(136576K) used = 112921040(110274K) committed = 139853824(136576K) max = 139853824(136576K)
[20:55:15] yeah
[20:55:28] i think we the whole thing
[20:55:31] this isn't the 4.2.0 large job issue?
[20:55:43] do you seriously think these jobs are large?
[20:55:49] mine is like, 2G
[20:56:01] the output set is like 50M
[20:57:00] no
[20:57:03] its only an hour
[20:57:09] i'm not sure how large 'large' is for that bug
[20:57:51] yeah, mine died from low memory again.
[20:58:00] how much ram does each of these boxes have?
[20:59:29] 28G
[20:59:31] 48G
[20:59:31] *
[20:59:50] memory is pretty utilized
[20:59:52] 1G free
[21:00:10] who there are a lot of datanode procs
[21:01:20] lots of threads
[21:01:22] hmm, is that a real threadd?
[21:01:26] or a process?
[21:01:29] i guess its a thread
[21:01:29] hmm
[21:01:36] do they have distinct mem usage? hmm
[21:01:53] ottomata: any progress on the henrique key?
[21:01:57] oh
[21:01:58] done!
[21:02:05] sorry forgot to ttell you
[21:02:22] dschoon, 70 datanode threads
[21:02:23] on an20
[21:02:30] brb meeting!
[21:08:47] ottomata: superm401 and I were gonna try out his redis thing on puppet1 since that's already self-hosted
[21:08:50] any objections?
[21:09:06] that's fine
[21:09:07] then we can use that as our "test random puppet stuff" box
[21:09:08] k
[21:09:13] yeah, that's what I meant it for
[21:09:26] not for hosting things like limn or actually using redis though
[21:09:30] just for testing random puppet stuff
[21:09:30] Cool, thank you.
[21:09:34] cool, and it's ok if we git checkout without worrying about what's checked out right now?
[21:09:39] Understood. When this actually works it will be used on toro.
[21:09:41] right of course
[21:09:43] ottomata, i just got ssh access. thanks a lot!
[21:14:14] yeah
[21:14:15] doesn't matter
[21:14:19] yup!
[21:16:00] hmmmm
[21:16:00]
[21:16:01] yarn.nodemanager.container-manager.thread-count
[21:16:01] 20
[21:16:01] yarn-default.xml
[21:16:01]
[21:16:51] drdee I took card 92, and I'll need some help understanding some pieces of it
[21:16:55] namely "The count is the number of pageviews using org.wikimedia.analytics.kraken.pig.PageViewEvalFunc, but it should run in the webstatscollector mode plus deduplicating mobile api requests."
[21:17:14] but we can talk tomorrow, I'm gonna finish some puppet testing and call it a day
[21:37:31] that doesn't seem awful
[21:37:38] the threadcount, ottomata
[21:37:42] a huge amount of time is IO
[21:38:01] yeah, but there are 92 nodemanager threads
[21:38:02] not 20
[21:38:03] so iunno
[21:38:08] unless i'm counting wrong
[21:38:17] but yeah, they are threads, and i was trying to figure out what was using up all the mem
[21:38:22] the threads should be shared mem, no?
[21:42:15] hmm, i guess a lot of the mem usage is in buffers/cache
[21:42:19] so thats fine
[21:42:20] hm
[21:42:21] i dunno
[22:01:59] kraigparkinson: running 3 minutes late
[22:10:13] drdee: lets talk about https://mingle.corp.wikimedia.org/projects/analytics/cards/381
[22:10:29] laters yalls
[22:10:31] in meeting, let's schedule something
[22:10:38] k
[22:10:38] drdee: can I show you something ?
[22:10:53] drdee: http://stat1.wikimedia.org/spetrea/new_pageview_mobile_reports/r38-kraken-logic/pageviews.html
[22:11:24] have to add some code to process tab-logs differently
[22:11:26] drat! the otto!
[22:11:26] and then I'll run again
[22:11:44] i was gonna say that having cluster-wide JVM heap numbers would be great
[22:11:55] but it seems ganglia is only recording the namenode
[22:59:26] average_drifter: you about?
[22:59:34] quick question about your dClass JNI wrapper
[23:00:10] when you instantiate the wrapper and then initialize it:
[23:00:11] dClass = new DclassWrapper();
[23:00:14] dClass.initUA();
[23:00:39] I'm right here
[23:01:00] yeah
[23:01:36] i that an object that can be reused?
[23:01:46] you mean thread-safe ?
[23:01:46] (by re-initializing it with initUA()?)
[23:02:10] no, mostly as a static field.
[23:02:13] if you do initUA again you'll get leaks
[23:02:36] because initUA allocates some memory
[23:02:45] and the pointer to that memory is stored in the class
[23:02:51] and it is de-allocated in the d-tor
[23:02:54] without calling destroyUA(), right?
[23:02:59] yes
[23:04:06] what about calling dClass.classifyUA(ua) more than once?
[23:05:19] ^^ average_drifter
[23:06:08] classifyUA as many times as you want
[23:06:12] no problem
[23:06:15] https://github.com/wikimedia/dClass/blob/package/jni/dclass-wrapper.c#L103
[23:06:21] classifyUA doesn't do any allocations
[23:06:23] any chance of it cross-poluting results with previous calls?
[23:06:27] or freeing
[23:06:31] ahh, thanks
[23:06:41] i looked for the source but it was... unobvious to me
[23:06:50] dschoon: are you sharing a dclass object between threads ?
[23:06:52] i'm def not a C hacker. not since college
[23:06:59] no, i'm contemplating using it statically
[23:07:31] ahh. it makes a new hashmap every time.
[23:07:43] but that's on-heap for the JVM
[23:08:36] the only thing I advise not doing, would be sharing between threads
[23:08:42] apart from that anything goes IMHO
[23:09:13] yeah... i'm not sure how many threads pig runs in a single VM
[23:09:47] when a new wrapper is created, how much work does it do?
[23:10:47] hm. almost nothing?
[23:10:52] dschoon: it loads a 2MB file in memory
[23:10:58] ahhhh
[23:10:59] okay!
[23:11:03] see! that is good to know!
[23:11:05] :)
[23:11:09] dschoon: this file https://github.com/wikimedia/dClass/blob/package/dtrees/openddr.dtree
[23:11:10] because i'm doing that a few billion times!
[23:11:31] can you reuse the object instead of doing it many times?
[23:12:03] so yeah. we'll see.
[23:12:05] that's the plan.
[23:12:17] but right now we're having serious problems with jobs running out of memory
[23:12:33] and strangely, they started RIGHT AROUND when i launched two hourly jobs that involve device classification.
[23:12:34] heh
[23:13:31] can you tell me a bit more about how Apache Pig decides to create new dclass objects ?
[23:13:42] if so we could figure out what's happening
[23:13:45] pig is a dataflow language like R
[23:14:13] you write mapreduce jobs in it, and it builds the low-level mappers and reducers, jars them up, etc
[23:14:32] you can write custom functions, User Defined Functions, for it
[23:14:48] and then you can use them in the high-level language, so you don't have to switch back to java
[23:15:05] so we wrote a UDF that wraps your dClass JNI wrapper
[23:15:19] that wrapper instantiates a new DclassWrapper on every record
[23:15:25] that's not good
[23:15:29] and there's one record for every line of input
[23:15:38] which means a few billion
[23:15:42] and how often does it get freed ? that's up to the JVM right ?
[23:16:08] yes; garbage collection is one of the times the destructor will be called.
[23:16:12] I remember talking to drdee about this
[23:16:23] we saw this problem in the early stages of development of the wrapper
[23:16:26] but i'm certain the framework controls gc closely
[23:16:29] yes.
[23:16:31] so!
[23:16:34] and we discussed that the garbage collector would get us a bit into trouble
[23:16:46] i'm going to try using one static reference to the wrapper
[23:16:58] and re-calling classify for each URL
[23:17:11] ok, please keep me in the loop about this after you do it
[23:17:15] totes.
[23:17:20] thank you
[23:30:55] hm.
[23:31:16] so, the destructor will *never* be called for my static reference.
[23:32:30] yes, but at least you're sure you will have just one object
[23:32:34] is that accurate ?
[23:32:48] since it's an SO, it runs in my process-space, right? so when the JVM dies, the memory should be freed even if i don't call destroyUA()
[23:32:51] yes.
[23:33:26] put another way: the malloc for the dtree should be tagged to my pid, right?
[23:39:54] when the jvm dies, the dtree is freed from memory
[23:39:54] https://github.com/wikimedia/kraken/blob/master/kraken-dclass/src/main/java/org/wikimedia/analytics/dclassjni/DclassWrapper.java
[23:40:07] problem with Java => there are no d-tors
[23:40:12] now I remembered
[23:40:35] we can expose the destroyUA and initUA to Pig
[23:40:50] dschoon: do you think exposing those two directly to the Pig would help?
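
A minimal sketch of the "one static reference" approach discussed above (23:16): a Pig UDF that keeps a single shared DclassWrapper per JVM and calls classifyUA() once per record, instead of constructing a wrapper (and loading the ~2MB openddr.dtree) for every line of input. It assumes the wrapper API named in this conversation (initUA(), classifyUA(), destroyUA()), and it assumes classifyUA() returns the per-call hashmap of device attributes mentioned at 23:07. The class name DeviceClassUdf and the attribute keys (is_tablet, is_wireless_device) are illustrative only, not taken from the kraken code.

    import java.io.IOException;
    import java.util.Map;

    import org.apache.pig.EvalFunc;
    import org.apache.pig.data.Tuple;

    import org.wikimedia.analytics.dclassjni.DclassWrapper;

    // Illustrative UDF: classify a user agent string into a coarse device class
    // while loading the dClass dtree only once per JVM.
    public class DeviceClassUdf extends EvalFunc<String> {

        // One wrapper per JVM. initUA() loads the dtree into native memory, so
        // it must run exactly once and never be re-run on the same instance
        // without destroyUA() first (re-running it leaks the old dtree).
        private static DclassWrapper dClass;

        private static synchronized DclassWrapper wrapper() {
            if (dClass == null) {
                dClass = new DclassWrapper();
                dClass.initUA();
            }
            return dClass;
        }

        @Override
        public String exec(Tuple input) throws IOException {
            if (input == null || input.size() == 0 || input.get(0) == null) {
                return null;
            }
            String ua = (String) input.get(0);

            // classifyUA() does no native allocation, so calling it once per
            // record against the shared wrapper is fine; the wrapper is not
            // advertised as thread-safe, hence the lock around the call.
            // Assumed return type: the map of device attributes ("hashmap")
            // mentioned in the conversation above.
            Map<String, String> attrs;
            synchronized (DeviceClassUdf.class) {
                attrs = wrapper().classifyUA(ua);
            }
            if (attrs == null) {
                return "desktop";
            }
            if ("true".equals(attrs.get("is_tablet"))) {
                return "tablet";
            }
            if ("true".equals(attrs.get("is_wireless_device"))) {
                return "handheld";
            }
            return "desktop";
        }
    }

Because the reference is static, destroyUA() is never called; but as noted at 23:32, the native dtree allocation belongs to the JVM's process and is reclaimed when the process exits, so the trade-off is one dtree load per JVM rather than one per record. Per the warning at 23:02, initUA() must not be re-run without destroyUA() first, and per the advice at 23:08 the shared instance should not be used concurrently from multiple threads without synchronization.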