[10:18:36] (CR) Qgil: "What is the status of this patch? Still aiming to be reviewed and eventually merged or does it have a problem?" [analytics/wikistats] - https://gerrit.wikimedia.org/r/102298 (owner: QChris)
[12:18:33] awww, no qchris
[13:56:13] good morning
[13:56:32] looks like i've been uninvited from the stand-up again, could someone invite me or paste the hangout link?
[13:56:42] please :)
[14:13:51] * hashar sits down
[14:14:33] (CR) Milimetric: [C: 2] Remove end of time span in new app-edits-start report [analytics/limn-mobile-data] - https://gerrit.wikimedia.org/r/156204 (owner: BearND)
[14:16:27] * qchris sits next to hashar, and hands him a drink. Cheers!
[14:16:46] *cheers*
[14:19:43] (CR) Milimetric: +app edit reports for previews, saveAttempts, successful edits (3 comments) [analytics/limn-mobile-data] - https://gerrit.wikimedia.org/r/156205 (owner: BearND)
[14:26:18] jgage: sorry I missed your message :(
[14:26:42] no problemo, otto just re-invited me
[14:26:51] jgage: just ping me next time, I'm usually fumbling with standup sheets and all these things so I'm not watching IRC
[14:27:08] k :)
[15:29:35] bwuh. Internet too unreliable for meeting :(
[15:53:45] (PS2) BearND: Add more app edit reports [analytics/limn-mobile-data] - https://gerrit.wikimedia.org/r/156205
[16:14:45] qchris_away: ping when back :)
[16:57:31] qchris: \o/
[16:57:57] Thanks for the Hangout! :-D
[17:00:21] milimetric, can I have a google hangout? ;p
[17:00:39] Ironholds: sorry, I default to batcave: https://plus.google.com/hangouts/_/calendar/d2lraW1lZGlhLm9yZ19jYjM3bXU0OGNuaHRkN2hybmE4czI3b25hb0Bncm91cC5jYWxlbmRhci5nb29nbGUuY29t.c6j7qidqs491nhi7ovk9pi4h14
[17:00:43] ta!
[17:00:45] http://goo.gl/1pm5JI for short
[17:00:53] DarTar: http://goo.gl/1pm5JI
[17:01:11] milimetric, not allowed to join
[17:01:21] oops, same here
[17:01:44] Ironholds / DarTar: I re-added you
[17:02:26] yay
[17:02:42] I'm just stalking
[17:19:07] stalking is useful
[17:19:31] Ironholds: I'm looking through puppet to see where the rsync happens
[17:19:38] yuvipanda, cool :)
[17:20:04] Ironholds: cool, do write the hive query thing :) Apps team still has no idea about pageviews at all
[17:22:30] milimetric: Ironholds indeed, there's no rsync from stat1002, but it is trivial to add
[17:22:55] yuvipanda: make sure you don't get in trouble with that - it may be just as purposeful as there being no hive on stat1003
[17:23:11] yuvipanda, yay!
[17:23:12] technically the data accessible on hadoop is still ops data, with completely raw access
[17:23:21] milimetric: indeed, but it's a specific directory you'll have to explicitly put files into, so should be ok.
[17:23:33] milimetric, I don't think so, I just think we didn't use stat2 for anything but EZ's stuff until I got you guys to stick a mysql and hive client on it
[17:23:33] right, I tend to agree - just saying
[17:24:18] Ironholds: I do run services on stat1002 :-)
[17:25:43] qchris, okay, fair. I didn't know that :)
[17:25:53] when ori set up the rsyncs like, last year, I remember s2 being kind of a dead zone
[17:26:06] but analytics + research weren't tightly integrated, so.
[17:27:36] (PS1) Yuvipanda: stats: Setup rsync for stat1002 as well [operations/puppet] - https://gerrit.wikimedia.org/r/156324
[17:27:40] Ironholds: milimetric ^ should do it
[17:27:47] now to get ottomata to merge it when he has the time
[17:28:15] cool!
[17:30:19] urgh
[17:30:23] segfault from too many permutations :/
[17:30:26] iiinteresting...
[17:30:52] ottomata, ?
[17:31:34] ottomata: I don't have a file {} to make sure /srv/public-datasets exists on stat1002 tho
[17:31:39] ottomata: but I don't see one for stat1003 either
[17:56:11] Analytics / General/Unknown: Constantly increasing number of fundraising processes on gadolinium - https://bugzilla.wikimedia.org/70053 (christian) NEW p:Unprio s:normal a:None Created attachment 16281 --> https://bugzilla.wikimedia.org/attachment.cgi?id=16281&action=edit gadolinium_numbe...
[18:18:54] Analytics / General/Unknown: Constantly increasing number of fundraising processes on gadolinium - https://bugzilla.wikimedia.org/70053#c1 (Toby Negrin) Adding Katie Horn Hi Christian -- it turns out jgreen is on vacation until the 2d. Do you think Gadolinium will survive until then? Katie says it w...
[18:31:09] Analytics / General/Unknown: Constantly increasing number of fundraising processes on gadolinium - https://bugzilla.wikimedia.org/70053#c2 (Andrew Otto) Ok, if Jeff is on vacation, I'll look into it...
[18:34:58] Ironholds: uh, any chance you'll do the Hive query today?
[18:46:06] Hallo.
[18:46:30] The "new article" count at http://stats.wikimedia.org/EN/TablesWikipediaCA.htm -
[18:46:53] more precisely, in the first table, Articles - New per day
[18:47:31] Does it count articles that were created and deleted?
[18:47:45] Or only those that were created and remained?
[18:51:21] If I run a query similar to the one at https://www.mediawiki.org/wiki/Content_translation/analytics/queries , I get larger numbers.
[18:51:35] Ironholds: do you happen to know? ^
[18:52:21] aharoni, I don't, I'm afraid :(. ErikZ maintains that stuff.
[18:59:41] Analytics / General/Unknown: Constantly increasing number of fundraising processes on gadolinium - https://bugzilla.wikimedia.org/70053#c3 (Andrew Otto) I don't know what would cause this to start happening a couple of weeks ago. But, from what I can tell, these processes should not be running on gado...
[19:01:59] (CR) AndyRussG: [C: 1] "Thanks, looks fine! :))" [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/126927 (owner: Awight)
[19:02:12] (PS1) Milimetric: Add generic timeseries visualizer [analytics/dashiki] - https://gerrit.wikimedia.org/r/156346
[19:16:04] (CR) Milimetric: [C: 2] Add more app edit reports [analytics/limn-mobile-data] - https://gerrit.wikimedia.org/r/156205 (owner: BearND)
[19:30:08] milimetric: I love your message to Rachel. :D
[19:30:27] had to be said :)
[19:30:35] I laughed real hard.
[19:30:51] and it's tnegrin's fault really
[19:30:52] hahaha, good, means it's true
[19:31:03] we should just go ahead and schedule that meet-up, or it will not happen
[19:31:07] tnegrin confessed to being bad at this
[19:31:21] we need more than confession. ;-)
[19:59:32] qchris: yt?
[20:00:08] ottomata: yes.
[20:00:12] What's up
[20:00:58] i'm comparing some collector output :)
[20:01:08] uhhh, what are the project count numbers?
[20:01:09] En - 24 4735779
[20:01:09] ?
[20:01:27] i assume 4735779 is views
[20:01:28] what's 24?
[20:01:34] Nope.
[20:01:38] 24 is views.
[20:01:43] 4735779 is bytes.
[20:01:46] Don't be scared.
[20:01:53] The project is case sensitive. Yes :-(
[20:02:04] So don't go for En, but look for en.
[20:03:12] hmm, ha, filter or collector should probably downcase them... ohhh well
[20:03:26] Yes, it should :-) But it doesn't.
[20:03:38] One of the idiosyncrasies of webstatscollector :-D
[20:03:52] http://dumps.wikimedia.org/other/pagecounts-raw/
[20:04:00] ^ has some explanations of the format.
[20:04:01] hmm, off by 1M views for 20140826-190000
[20:04:20] (/home/otto/gadolinium-webstats/projectcounts-20140826-190000)
[20:04:23] for en
[20:04:48] kafkatee's files having 1M more or 1M less pageviews?
[20:05:04] 1M fewer views in kafkatee's file
[20:05:28] diff /home/otto/gadolinium-webstats/projectcounts-20140826-190000 /run/shm/qchris/webstatscollector/dumps/projectcounts-20140826-190000
[20:05:33] Not good :-)
[20:05:43] nope
[20:06:35] tbh. I have no clue right now.
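Editor's note: the comparison discussed above can be sketched roughly as follows. This is a minimal sketch, not the tooling used in the channel. It assumes only the line layout described in the chat (project, a literal "-", views, bytes); the function names `parse_projectcounts` and `view_mismatches` are hypothetical, and folding project names to lower case is an assumption based on the remark that "En" and "en" are counted separately.

```python
from collections import defaultdict


def parse_projectcounts(lines, fold_case=True):
    """Sum (views, bytes) per project from lines like 'en - 24 4735779'.

    fold_case=True merges 'En' into 'en'; this is an assumption, since
    webstatscollector itself keeps the two apart.
    """
    totals = defaultdict(lambda: [0, 0])
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        project, _dash, views, size = parts
        key = project.lower() if fold_case else project
        totals[key][0] += int(views)
        totals[key][1] += int(size)
    return {project: tuple(counts) for project, counts in totals.items()}


def view_mismatches(left, right):
    """Return {project: left_views - right_views} for projects that differ."""
    diffs = {}
    for project in sorted(set(left) | set(right)):
        delta = left.get(project, (0, 0))[0] - right.get(project, (0, 0))[0]
        if delta:
            diffs[project] = delta
    return diffs
```

To compare two hourly files, open each one (for example the two paths passed to `diff` above), pass the file objects to `parse_projectcounts`, and hand both results to `view_mismatches`; a positive value means the first file has more views for that project.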
[20:07:18] ja, me neither, i mean,
[20:07:20] no drops
[20:07:35] so that means that filter or collector are not counting things properly
[20:07:39] collector should have the same logic
[20:07:40] We have errors like ".temp.db: unable to flush page: 0" for the first 20 pages.
[20:07:45] oh
[20:07:53] ?
[20:07:54] But that should be the same for the udp2log pipeline.
[20:08:41] where/what is that error?
[20:09:11] See /home/qchris/collector-restarter.sh.log on analytics1003
[20:09:37] Those errors get reported when the hourly file gets written.
[20:10:01] But since that's nothing we changed, we should have the same errors on the udp2log pipeline.
[20:10:19] However, it's the only thing that looks suspicious to me for now.
[20:19:18] qchris: does collector log the starting message every hour? or was it actually getting restarted
[20:20:32] I kill it every 7 minutes after the full hour.
[20:20:43] That explains a lot :-)
[20:20:58] I thought I had turned the killer off.
[20:21:56] Now it's off.
[20:22:17] So if you want to check files, wait for the next full file.
[20:22:29] So the one /after/ the 230000 file.
[20:22:34] haha, ok good, glad we found an explanation!
[20:22:34] :)
[20:23:58] Oh wait. The 220000 file should be good already.
[20:27:52] ah because it wasn't killed?
[20:28:25] Because I looked at the wrong clock :-)
[20:28:45] The current run started at 20:07, so the 210000 file does not cover a full hour.
[20:28:58] But the 220000 should cover a full hour.
[20:29:49] ok
[20:30:20] But if there are again mismatches, I would not worry too much.
[20:30:41] The collector is currently running more to see whether or not it really can take the load reliably.
[20:36:12] well, yes, and we want the mismatches to be very small!
[20:36:18] in order to use it at all
[20:38:37] That need not necessarily be the case.
[20:39:14] (PS1) Milimetric: Add wikimetrics data adapter [analytics/dashiki] - https://gerrit.wikimedia.org/r/156453
[20:39:16] We're having the same udp drop issues that we had on analytics1003 also on gadolinium.
[20:39:38] So it might be that numbers do differ quite a bit.
[20:39:50] But I haven't done the math on that really.
[20:46:08] DarTar, ping
[20:48:52] ottomata: Should hive still be working on stat1002?
[20:49:10] Starting hive fails for me with
[20:49:19] log4j:ERROR setFile(null,true) call failed.
[20:49:21] [...]
[20:49:30] No default-logstash-fields.properties resource present, using defaults Hadoop 2.3.0-cdh5.0.2 Subversion git://github.sf.cloudera.com/CDH/cdh.git -r 8e266e052e423af592871e2dfe09d54c03f6a0e8 Compiled by jenkins on 2014-06-09T16:20Z Compiled with protoc 2.5.0 From source with checksum 75596fe27f833e512f27fbdaaa7b0ab This command was run using /usr/lib/hadoop/hadoop-common-2.3.0-cdh5.0.2.jar
[20:49:35] stat1002, yes, but inteerresting
[20:49:48] wha
[20:51:55] did you people break my toys again?
[20:52:07] haha, i think jgage and I may have...
[20:53:02] dammit guys
[20:53:18] you know full well that every bit of research we have is dependent on hive right now.
[20:53:35] Erik is going to show up at Toby's desk and make him cry. That's just something that's going to happen. And it'll be your fault.
[20:55:08] ha
[21:18:28] qchris_away: Ironholds, should be ok now, jgage is looking into what caused that
[21:18:32] but we reverted it on stat1002 for now
[21:18:39] cool! ta :)
[22:09:06] ahoy
[22:09:14] tobie: in?
[22:09:28] anyone with authority?
[22:09:34] there exists a mail list
[22:09:34] Analytics Team - Continuous Improvement analytics-team-ci@wikimedia.org
[22:09:38] with only one member
[22:09:45] Kraig Parkinson kparkinson@wikimedia.org
[22:09:48] and he's no longer here
[22:09:53] can I clean up (remove) this group?
[22:12:01] Ironholds: awake?
[22:12:55] cajoel, yes, and yes, obliterate it
[22:14:16] you're going to have to do your improvement in a stepwise fashion from now on...
[22:14:21] pew pew --- gone
[22:14:56] huh?
[22:14:59] oh, gotcha
[22:47:44] (PS1) Yuvipanda: Added CSV and TSV output format generation [analytics/quarry/web] - https://gerrit.wikimedia.org/r/156469
[22:47:54] (CR) jenkins-bot: [V: -1] Added CSV and TSV output format generation [analytics/quarry/web] - https://gerrit.wikimedia.org/r/156469 (owner: Yuvipanda)
[22:49:37] (PS2) Yuvipanda: Added CSV and TSV output format generation [analytics/quarry/web] - https://gerrit.wikimedia.org/r/156469
[22:49:41] (CR) jenkins-bot: [V: -1] Added CSV and TSV output format generation [analytics/quarry/web] - https://gerrit.wikimedia.org/r/156469 (owner: Yuvipanda)
[22:50:38] (PS3) Yuvipanda: Added CSV and TSV output format generation [analytics/quarry/web] - https://gerrit.wikimedia.org/r/156469
[22:51:08] (CR) Yuvipanda: [C: 2] Added CSV and TSV output format generation [analytics/quarry/web] - https://gerrit.wikimedia.org/r/156469 (owner: Yuvipanda)
[22:51:13] (Merged) jenkins-bot: Added CSV and TSV output format generation [analytics/quarry/web] - https://gerrit.wikimedia.org/r/156469 (owner: Yuvipanda)
[23:30:30] (PS1) Yuvipanda: Add download buttons for CSV, TSV and JSON downloads [analytics/quarry/web] - https://gerrit.wikimedia.org/r/156476
[23:30:34] (CR) jenkins-bot: [V: -1] Add download buttons for CSV, TSV and JSON downloads [analytics/quarry/web] - https://gerrit.wikimedia.org/r/156476 (owner: Yuvipanda)
[23:32:38] (PS2) Yuvipanda: Add download buttons for CSV, TSV and JSON downloads [analytics/quarry/web] - https://gerrit.wikimedia.org/r/156476
[23:32:57] (CR) Yuvipanda: [C: 2] Add download buttons for CSV, TSV and JSON downloads [analytics/quarry/web] - https://gerrit.wikimedia.org/r/156476 (owner: Yuvipanda)
[23:33:14] (Merged) jenkins-bot: Add download buttons for CSV, TSV and JSON downloads [analytics/quarry/web] - https://gerrit.wikimedia.org/r/156476 (owner: Yuvipanda)
[23:38:38] (PS1) Yuvipanda: Don't put revid in downloaded file name [analytics/quarry/web] - https://gerrit.wikimedia.org/r/156478
[23:54:51] (CR) Yuvipanda: [C: 2] Don't put revid in downloaded file name [analytics/quarry/web] - https://gerrit.wikimedia.org/r/156478 (owner: Yuvipanda)
[23:54:57] (Merged) jenkins-bot: Don't put revid in downloaded file name [analytics/quarry/web] - https://gerrit.wikimedia.org/r/156478 (owner: Yuvipanda)