[16:01:47] good morning halfak lzia guillom Ironholds! :D [16:01:54] hello hello [16:01:54] morning bearloga_ ! [16:02:00] can you answer my dwell-time question now? :P [16:02:09] good morning bearloga_ :D [16:02:12] wasup? [16:02:22] Ironholds: hiii [16:02:25] question for you Ironholds [16:02:26] :D [16:02:29] lzia, sure! [16:02:51] Ironholds: sure! [16:02:54] your favorite topic: did you compare sampled log pageview counts with actual counts based on the logs? [16:02:54] bearloga, danke! [16:03:11] lzia, when? [16:03:23] I'm trying to understand how far off sampled log trends can be from the actual log counts [16:03:33] at any point Ironholds [16:03:44] lzia, oh, yes. I've got a writeup, hangon [16:03:51] oooo, nice, Ironholds! [16:04:09] lzia, http://ironholds.org/blog/content/images/2015/08/sample_testing.png [16:04:23] I would do a full report on this but that's Not Mah Job :D [16:04:33] but basically you're looking at teeny tiny amounts of variation [16:04:40] like, most of the time it's hard to tell those are different lines [16:05:13] it gets more problematic if you compare to the historical sampled data in the Pentaho cubes because then you run into temporal issues - one implementation getting patched before or after the other to deal with $weird_edgecase [16:06:10] Ironholds: this is helpful [16:06:18] awesome [16:06:56] one question: sampled log on this plot is sampled log count times 1000? [16:07:58] I hope so otherwise our data is truly screwed! [16:08:02] (yeah, it is) [16:08:27] okay, great. [16:08:31] thanks, Ironholds. [16:10:20] no problem! [16:36:47] Is the watchlist table available on the toollabs db slaves? [16:37:24] or is it called something else? [16:37:26] I do hope not [16:37:38] if it is it'd be, I guess, t_watchlist or watchlist_t but I really do hope that data isn't there. [16:39:27] I can't find any of them, so they arn't publicly available? [16:40:31] Jeph, apparently not [16:44:37] On a diff note, select page_title,page_id, YEAR(rev_timestamp),MONTH(rev_timestamp), count(*) from page join revision_userindex on page.page_id = revision_userindex.rev_page where page_namespace=0 and page_is_redirect=0 and rev_user !=0 and lower(CONVERT(rev_user_text USING latin1)) not like '%bot%' group by YEAR(rev_timestamp), MONTH(rev_timestamp), rev_page having count(*) > 4; [16:44:53] is a query I've never been able to run for 'en' [16:45:06] it always timesout [16:46:02] I set both : interactive_timeout=288000 & wait_timeout=288000 [16:46:35] any help will be much appreciated :-) [16:49:51] Yeah. No indexes helping you there. [16:50:06] It's better to find bots with the user_groups table [16:59:13] Regretfully, I don't have time to hack on this right now. [16:59:37] query optimization is one of my fav. things to do in this channel, but I've got some paper deadlines coming up on top of my regular work :/ [17:02:34] so I should get rid of the string check and check the user table instead? [17:03:09] I'll try that, any other quick observation for me? [17:03:24] without taking too much of your time :-) [17:45:11] Jeph, sorry. I'll try to have a look later. [17:53:33] Sure, thanks :-) [20:07:28] wah, look at this madhuvishy [20:07:35] getting puppet patches merged left and right :P [20:07:39] he he [20:07:46] this was analytics