[13:22:30] New patchset: Diederik; "Bug fix for 'Group parameter not honored', Mingle #603" [analytics/user-metrics] (master) - https://gerrit.wikimedia.org/r/62974
[13:24:34] New patchset: Diederik; "Bug fix for 'Group parameter not honored', Mingle #603" [analytics/user-metrics] (master) - https://gerrit.wikimedia.org/r/62974
[13:30:41] hi everyone, morning
[13:31:10] MOOOORNING
[13:31:17] i see the blocker from Dario and your fix
[13:31:19] lemme look
[13:31:27] yes please have a look
[13:31:31] the fix is trivial
[13:31:40] probably too easy :)
[13:32:32] yeah, i'm wondering how to test
[13:32:55] well, that's the hard part but how about merging and running the same analysis as dario did?
[13:33:03] that's horrible
[13:33:03] k, that works for me
[13:33:07] but it works
[13:33:08] nah, whatever
[13:33:12] it's 100% broken, can't get worse
[13:33:34] and our dev environment needs to be tweaked a bit to handle these things.
[13:33:48] yeah speaking about that
[13:33:56] the bug is actually in the mw install
[13:34:02] not in our sql script
[13:34:43] New review: Milimetric; "merging this (will merge more changes if necessary) because it's 100% broken and the easiest way to ..." [analytics/user-metrics] (master) - https://gerrit.wikimedia.org/r/62974
[13:34:56] New review: Milimetric; "merging this (will merge more changes if necessary) because it's 100% broken and the easiest way to ..." [analytics/user-metrics] (master); V: 2 C: 2; - https://gerrit.wikimedia.org/r/62974
[13:34:56] Change merged: Milimetric; [analytics/user-metrics] (master) - https://gerrit.wikimedia.org/r/62974
[13:51:05] milimetric, you know how to deploy the new code?
[13:51:49] ohhhhhh haha, the result is being cached
[13:52:13] milimetric ^^
[13:53:24] mmmmmmm how can you also retrigger the result without creating a new cohort?
[13:58:53] i guess wipe out the cache
[13:58:57] that's kind of the only way
[13:59:08] a "rerun" button would be great :)
[13:59:42] oh and you said how to deploy - you just wait 30 minutes i think
[13:59:44] puppet does it
[13:59:45] let's make a copy of the cache object
[14:00:18] we should not wait for puppet
[14:00:29] but we don't have sudo to git pull
[14:00:46] or to run puppet manually
[14:01:52] you can run git pull
[14:01:56] no sudo required
[14:02:00] code has been updated
[14:02:14] so now you have to run
[14:02:31] i tried it before... didn't let me
[14:02:54] touch /a/e3/E3Analysis/user_metrics/api/api.wsgi
[14:02:58] i just did it
[14:03:02] mmmmm
[14:03:11] are you part of the wikidev group?
[14:03:21] oh, did you run git pull and it said "already up-to-date"?
[14:03:36] or it actually fetched && merged?
[14:04:08] oh, did you run git pull and it said "already up-to-date"?
[14:04:08] yes
[14:04:13] check
[14:04:14] https://metrics.wikimedia.org/cohorts/e3_ob45/threshold?project=enwiki&t=24&n=1
[14:04:16] it's running
[14:04:24] i renamed the cache object
[14:04:37] right, no, I mean running git pull doesn't require sudo, and neither does fetch, but git merge does
[14:04:43] 'cause you're overwriting files you don't have rights to
[14:05:06] okay it failed
[14:05:30] you mean has the same "no users" bug right?
[14:05:49] i haven't deleted the cache file yet, still trying to find it
[14:07:05] oh api_data.pkl, right
[14:07:48] oh cool, it just let me move it
[14:09:42] yeah, this should be without the datafile: https://metrics.wikimedia.org/cohorts/e3_ob45/threshold?project=enwiki&t=24&n=1&group=reg
[14:09:54] oh how the heck did that group get in there, one sec
[14:10:06] because of my 'fix'
[14:10:14] but that didn't solve the problem
[14:10:24] so what is 'group' supposed to do / be
[14:10:40] no idea, we gotta read the code more carefully
[14:10:45] i suggest we revert the change
[14:10:56] i am pretty sure that is broken regardless
[14:11:31] because it was a nested dictionary
[14:11:43] and there is no code anywhere that assumes a nested dictionary
[14:11:45] i checked that
[14:11:48] let's go to the batcave
[14:11:51] is easier
[14:12:06] i'm gonna jump in the shower quick, and i'll brt
[14:12:09] 6 min.
[14:13:28] ok
[14:21:50] i'm there
[14:24:11] k
[14:26:55] you are frozen
[14:27:13] milimetric ^^
[14:27:17] you have muted me
[15:33:51] oh i disconnected
[17:23:45] drdee: Need to step away for about 30m for some errands; we can discuss when I'm back.
[17:24:11] The jar file is on analytics1010 at /tmp/kraken-pig-0.0.2-SNAPSHOT.jar
[17:25:30] ok
[17:34:37] New patchset: Milimetric; "fixed 'cohort users never getting returned'" [analytics/user-metrics] (master) - https://gerrit.wikimedia.org/r/62991
[17:37:56] Change merged: Erosen; [analytics/user-metrics] (master) - https://gerrit.wikimedia.org/r/62991
[17:48:55] drdee: I'm back
[18:15:39] New patchset: Milimetric; "instructions for developing on mediawiki vagrant" [analytics/user-metrics] (master) - https://gerrit.wikimedia.org/r/63000
[18:16:04] New review: Milimetric; "self merging - just doc change" [analytics/user-metrics] (master); V: 2 C: 2; - https://gerrit.wikimedia.org/r/63000
[18:16:04] Change merged: Milimetric; [analytics/user-metrics] (master) - https://gerrit.wikimedia.org/r/63000
[18:20:47] drdee: i might have stopped puppet from deploying the fix
[18:20:49] because i did git pull
[18:20:54] which fetched, but couldn't merge
[18:21:09] and I think puppet might think that it's at the latest since fetch doesn't get anything new
[18:21:32] if you do git pull does it work?
[18:21:34] brb
[18:27:47] same problem
[18:45:52] xyzram: it's probably useful to read http://pig.apache.org/docs/r0.10.0/basic.html before we start deploying the script
[18:50:42] ok
[19:02:33] drdee, erosen: https://plus.google.com/hangouts/_/1fdd663a84259feb76a1ef4c54511688dbb92cd1
[20:04:57] sorry about that - my internet went down and my phone worked for a while but now it crashes hard every time i connect through it
[20:05:06] drdee, rounce123 ^
[20:06:04] No Problem we are finished you can review the spreadsheet for the last estimate
[20:06:32] cool, will think of a number before i scroll down :)
[20:07:11] k, i wrote 2
[20:07:45] the thinking is that it will be kind of a manual hacky "authorization role" type thing so it'll take a bit more testing
[20:09:32] milimetric: is the patch for the group parameter merged in prod yet?
[20:09:49] no because we don't have anyone who has rights to do "git pull" in prod
[20:10:05] aargh, right
[20:10:17] i fixed that
[20:10:26] it should be deployed
[20:10:39] I keep getting the same "No users to pass to process method." message
[20:10:48] like 2 mins ago
[20:10:49] but we also need to relaunch mod_wsgi
[20:10:51] 1 sec
[20:10:55] k
[20:11:11] try again
[20:11:15] ok
[20:12:09] sigh, request started, but disappeared from the queue
[20:12:40] this is the URL if you want to replicate it https://metrics.wikimedia.org/cohorts/e3_ob45/threshold?project=enwiki&t=24&n=1&refresh
[20:13:26] looking good AFAICT
[20:13:27] it worked!
[20:13:42] yup
[20:13:43] so we still have these known issues of jobs disappearing from the queue
[20:13:48] occasionally
[20:13:58] but that's because we reload the app
[20:14:04] and the jobs in queue are not permanent
[20:14:06] hmm no
[20:14:19] I mean after the restart and after touching mod_wsgi
[20:14:24] ohhh sorry
[20:14:25] job launched
[20:14:26] misunderstood
[20:14:28] job disappears
[20:14:30] k
[20:14:44] this is my hunch
[20:14:52] I think ryan started troubleshooting this
[20:15:04] job gets grabbed from queue for launch
[20:15:09] so queue is empty
[20:15:10] so it's worth checking with him as he may have a solution
[20:15:42] nvm
[20:15:52] rfaulckr ^^
[20:15:53] yup
[20:16:29] k, I think I can get my data if this issue is only occasional
[20:16:34] thanks folks
[20:16:38] np dario
[20:17:08] noted DarTar. Will have a look tonight if the issue isn't resolved by then.
[20:17:10] i think regardless of v1, v2, bugs like this are gonna get addressed with first priority
[20:17:24] thx thx
[23:26:29] anyone having issues writing to stat1?
[23:26:55] df shows root is only at 45%
[23:27:47] nvm, appears to be specific to my screen session
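
Note on the manual redeploy discussed in this log: after the code is updated, the cached result (api_data.pkl) is moved aside and api.wsgi is touched so that mod_wsgi picks up the new code. A minimal sketch of those two steps follows. The function name and the backup naming are hypothetical, the directory holding api_data.pkl is not given in the log, and the reload-on-touch behaviour assumes mod_wsgi is running in daemon mode.

```python
import shutil
import tempfile
import time
from pathlib import Path
from typing import Optional


def invalidate_and_reload(cache_file: Path, wsgi_script: Path) -> Optional[Path]:
    """Move the cached results aside, then touch the WSGI script.

    Hypothetical helper, not part of user-metrics. Returns the backup
    path, or None if there was no cache file to move.
    """
    backup = None
    if cache_file.exists():
        # Keep a copy rather than deleting, so the old results can be restored.
        backup = cache_file.with_name(f"{cache_file.name}.{int(time.time())}.bak")
        shutil.move(str(cache_file), str(backup))
    # Bumping the script's mtime is what prompts a mod_wsgi daemon process
    # to reload the application on the next request (assumed setup).
    wsgi_script.touch()
    return backup


if __name__ == "__main__":
    # Demo in a scratch directory; on the real host the script would be
    # /a/e3/E3Analysis/user_metrics/api/api.wsgi (path taken from the log).
    scratch = Path(tempfile.mkdtemp())
    cache = scratch / "api_data.pkl"
    cache.write_bytes(b"stale results")
    print(invalidate_and_reload(cache, scratch / "api.wsgi"))
```

As noted at 20:13:58, reloading the app this way drops any jobs still sitting in the in-memory queue, so it is best done when nothing is running.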