[01:41:09] I'm getting an error when I try to connect to the EventLogging database?
[01:41:20] MySQL said: Access denied for user 'research'@'208.80.154.149' (using password: YES)
[01:41:30] Anyone know what's up with this?
[01:42:15] Password changed recently.
[01:42:31] I see.
[01:42:35] How do I get the new one?
[01:43:32] I'll help.
[02:06:09] Next question...
[02:06:25] There are a few tables in the ELDB that only hold one or two records that are safe to be deleted.
[02:07:03] I'm unsure if I can do that myself.
[02:07:15] Is there some list I should email?
[02:08:58] I can list every table that I know for sure is safe to delete.
[02:23:01] I just emailed analytics@lists.wikimedia.org
[02:29:56] Perfect. Thank you
[02:29:58] Deskana, ^
[14:34:44] hey halfak :)
[14:34:57] Hey dude.
[14:35:27] I wrote a little bit about activity theory last night.
[14:35:34] yay!
[14:35:39] https://docs.google.com/document/d/12vY6jmhaSM-z1eae4yg_CbdO1ZCZCeykAyCPqsPWTy4/edit
[14:35:39] where does it live?
[14:35:42] snap ;p
[14:35:55] Googling "activity theory" will piss you off.
[14:36:17] man, everything we've googled has pissed at least one of us off.
[14:36:21] I've seen some shit, writing this paper.
[14:36:29] :) It's one of those frameworks for thought that embraces the nuance so strongly that it is difficult to talk about.
[14:36:30] I've seen Teevan do poorly thought-out research
[14:36:41] I've seen people hit "publish" on a paper with "Sessionization" in the title.
[14:36:48] All these memories and more will be lost, like tears in rain.
[14:37:06] heh
[15:08:28] yay!
[15:08:50] tnegrin, re the arithmetic mean: I just found an industry work that indicates using the arithmetic mean to calculate segmentation for pageview times produces the wrong result.
[15:09:00] literature review, god bless.
[15:09:27] why? because they aren’t normally distributed?
[15:09:41] I’m still banking on the pictures
[15:10:04] the exact phrase used is "very skewed"
[15:10:10] yep, I'll make pictures
[15:10:21] use bright colors :)
[15:10:24] they actually recommend using Huber's M-estimator
[15:10:33] but I feel like explaining that is an additional 10 levels of complexity
[15:11:05] at least 10, possibly more
[15:11:30] have you considered the exponential mean?
[15:14:31] nope, simply because I don't know enough about it.
[15:15:14] good - cuz I made it up
[15:15:19] hah
[15:15:29] I assume it's just "the mean applied to an exponential distribution", which *blink*
[15:15:51] I’m not sure what the point of that was except that without grounding the subject, it _all_ sounds made up
[15:16:30] totally. I still think the harmonic mean is 12.
[15:17:11] because https://en.wikipedia.org/wiki/Guitar_harmonics
[15:33:18] * halfak can't believe that we are still struggling with this one
[15:33:59] When error is exponentially distributed, use a statistic that's designed for making sense of exponential distributions.
[15:34:19] Why does no one bat an eye when we use a CHI^2 test rather than the more common t.test?
[16:35:20] halfak, because they don't know how t-tests work either and so categorise the entire thing as "voodoo"
[16:35:28] but they do know how the arithmetic mean works and so have an opinion
[16:48:17] Bike shed?
[17:04:51] halfak, huh?
[17:05:08] You're familiar with the term, right?
[17:06:38] totally!
[17:06:51] I just couldn't map it to the existing conversation.
[17:16:27] arithmetic mean is (assumed to be) easy to understand so everyone has an opinion.
[17:16:45] Statistical tests, on the other hand, are voodoo.
[17:33:39] agreed!
[17:34:07] I think it's all equally hard. But I think people are more likely to "know" the arithmetic mean and so have opinions.
[17:59:01] I'd like them to describe what arithmetic mean actually "is".
[18:00:13] The point that minimizes the squared error of the distance from it to observations.
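The thread above contrasts the arithmetic mean with Huber's M-estimator for "very skewed" pageview times. Below is a minimal Python sketch, not the method from the industry work mentioned: it computes Huber's M-estimate of location by iteratively reweighted averaging, with the scale held fixed at the normalized median absolute deviation and the conventional tuning constant k = 1.345 (both assumptions). The durations in `DATA` are made up for illustration, and `squared_error` is a hypothetical helper for checking the claim that the arithmetic mean minimizes squared error.

```python
import statistics
from typing import List, Sequence

# Hypothetical, deliberately skewed "time on page" values in seconds (made up).
DATA: List[float] = [2, 3, 3, 4, 5, 6, 8, 10, 400]


def squared_error(xs: Sequence[float], c: float) -> float:
    """Sum of squared distances from c to each observation."""
    return sum((x - c) ** 2 for x in xs)


def huber_location(
    xs: Sequence[float], k: float = 1.345, tol: float = 1e-9, max_iter: int = 100
) -> float:
    """Huber M-estimate of location via iteratively reweighted averaging.

    Scale is fixed at the median absolute deviation (MAD), rescaled by 1.4826
    so it matches the standard deviation for normal data.
    """
    mu = statistics.median(xs)
    scale = 1.4826 * statistics.median([abs(x - mu) for x in xs])
    if scale == 0:
        return mu  # MAD is zero (at least half the points equal the median)
    cutoff = k * scale
    for _ in range(max_iter):
        # Points within `cutoff` of mu get full weight; farther points are
        # down-weighted in proportion to their distance from mu.
        weights = [1.0 if abs(x - mu) <= cutoff else cutoff / abs(x - mu) for x in xs]
        new_mu = sum(w * x for w, x in zip(weights, xs)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


if __name__ == "__main__":
    print("mean:  ", statistics.mean(DATA))
    print("median:", statistics.median(DATA))
    print("huber: ", round(huber_location(DATA), 2))
```

With these made-up values the mean is 49 seconds, pulled up by the single 400-second visit, while the median is 5 and the Huber estimate lands just above the median (roughly 5.6). Nudging `c` away from 49 in either direction increases `squared_error(DATA, c)`, which is the "minimizes the squared error" property stated in the log.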
[18:00:33] * halfak isn't 100% sure of that
[18:00:49] Oh wait. yes I am
[18:07:10] haha
[18:49:36] halfak, yt?
[18:49:46] (sort of) :P
[18:49:51] :P
[18:50:29] halfak: do you know who is 'madman' who could be running 'cvresearch-bots'?
[18:50:36] in that case: I'm looking for the actual text users leave on each other's talk pages. I thought this should be in revision and/or text tables
[18:51:20] do you have a pointer to the table where I should be looking into, halfak?
[18:51:36] lzia: so those are in 'External Storage', I don't know if they're replicated to analytics-store.
[18:51:50] YuviPanda, I actually want it on labs
[18:52:00] lzia: aaah, so that I know. it's on in the database.
[18:52:02] lzia: you need to hit the API
[18:52:18] *not
[18:52:21] *not in the database
[18:52:28] ah!
[18:52:40] do you have a sample query?
[18:52:41] lzia, you want to process XML dumps
[18:53:01] yeah, or ^ if you're doing more than a small handful
[18:53:32] it's more than a small handful, YuviPanda. looking at all talk page edits/messages in 2013 in enwiki
[18:53:48] Then I'd go for XML dumps.
[18:53:52] halfak, have a pointer handy?
[18:53:53] I have python that will make it easy.
[18:53:56] https://pythonhosted.org/mediawiki-utilities/core/xml_dump.html#mw-xml-dump
[18:54:14] From my docs, "Streaming XML parsing is gross."
[18:54:23] So I made a utility to make it not gross.
[18:54:24] :)
[18:54:31] ah! cool! thanks!
[18:55:03] See also David's talk page parser.
[18:55:16] https://github.com/sdivad/WikiTalkParser
[18:56:11] thanks, halfak. David's uses API.
[18:56:39] * halfak looks at API
[18:56:41] ewww.
[18:56:42] Hmm.
[18:56:49] That should have been factored out.
[18:56:52] hmm, can be adapted to use halfak's library in some form, I guess?
[18:56:55] Looks like you could do that yourself though.
[18:57:09] yeah, I'll start from halfak's library. thanks.
[18:57:20] and for the dumps themselves, where should I be looking?
[18:57:30] They are on an NFS mount on stat3
[18:57:46] are they in a place I can point people to? like on labs?
[18:57:58] See /mnt/data/xmldatadumps/public/enwiki/
[18:58:04] Oh.. Not sure about labs.
[18:58:24] You can download from the website too.
[18:58:31] halfak: it's on labs
[18:58:37] let me find path
[18:58:48] http://dumps.wikimedia.org/enwiki/
[18:59:10] halfak: lzia on labs, /public/dumps
[18:59:12] thanks, guys!
[18:59:16] thanks!
[19:00:59] * halfak --> paper land
[19:42:30] DarTar, it took an entire 3 hours of the SF office being awake, before we got the first "can someone write down the password" email ;p
[20:02:20] Ironholds: no, I got the first request yesterday from Maryana :)
[20:02:30] haha
[20:02:51] are we meeting somewhere to talk cubes?
[20:03:55] Ironholds: tnegrin arriving
[20:04:01] kk