[03:07:57] halfak_ is still away? aw
[15:00:37] o/ Ironholds
[15:01:03] I'm officially ops now
[15:01:06] I also have a beard
[15:01:07] hi halfak_
[15:01:15] lol
[15:01:17] Hi YuviPanda
[15:02:09] :)
[15:02:46] * halfak_ runs off to grab breakfast/shower.
[15:28:42] * halfak is back
[15:29:50] yay!
[15:29:53] I have news for you!
[15:30:06] So, I took a look at the datasets, and applied all sorts of interesting experimental filters.
[15:30:38] It looks like some search actions were making their way into the pageview logs, so I can solve for that + filter out spiders, which /hopefully/ will make a difference.
[15:31:03] What datasets didn't show the expected pattern?
[15:31:21] * halfak pulls up notes.
[15:32:28] (I'll grab a sample over the smallest one, see)
[15:32:51] https://docs.google.com/document/d/1YQhtDH-rTkF-7ju2oEj4--IR3y1x-Ma_Yn_dj_gum0U/edit
[15:33:01] Edits looks weird.
[15:33:25] Searches looks plausible, but there are weird spikes at < 5 sec.
[15:33:49] Desktop and Mobile pageview look weird.
[15:34:00] Search suggestions look as expected.
[15:35:55] okay, I'll start with mobile then, and dig into search
[15:36:02] edits I can rerun trivially and exclude bots, which may help
[15:42:36] Hey DarTar, can you ban Aileen from Wiki-Research-L?
[15:44:06] or just moderate her until the 11th.
[15:44:09] This is getting kinda silly.
[15:48:01] say, halfak, can I ask a stats question?
[15:48:20] I'll do my best,
[15:48:21] Suppose you had a dataset of intertime values and you wanted to generate an appropriate average. Would you use a geometric mean, a straight mean...?
[15:48:30] I know that you <3 geometric means ;p
[15:49:36] hey Ironholds, sorry busy getting the girls ready, can you remind me when I get in the office or drop me a line
[15:49:57] DarTar, totally!
[15:50:18] Ironholds, geometric means are highly desirable for measuring log-normally distributed data.
[15:50:35] However, it's not a guarantee that inter-times will be distributed log-normally.
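A minimal sketch of the averaging question raised above, assuming intertimes are positive durations in seconds. The sample values and the `geometric_mean` helper are hypothetical and only illustrate how a heavy right tail pulls a straight mean far above the geometric mean and the median; they are not taken from the datasets discussed in the channel.

```python
import math
import statistics


def geometric_mean(values):
    """Geometric mean: exp of the arithmetic mean of the natural logs."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs a non-empty list of positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical intertimes in seconds: several short gaps plus one day-long gap.
intertimes = [5, 12, 30, 45, 60, 90, 86400]

arithmetic = statistics.mean(intertimes)   # dominated by the single long gap
geometric = geometric_mean(intertimes)     # averages on the log scale
median = statistics.median(intertimes)     # quantile-based alternative

print(f"arithmetic={arithmetic:.1f} geometric={geometric:.1f} median={median}")
```

Whether the geometric mean is the right summary still depends on the data actually being close to log-normal, which is the caveat raised in the channel.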
[15:51:30] If one were to observe the log-normal nature of within-session activity, then I would expect them to use either a median/quantile approach or a geometric mean.
[15:51:38] * Ironholds beardstrokes
[15:52:00] are they that inconsistent? wompwomp.
[15:52:23] I guess I could automatically attempt to fit to a log-normal model and check the consistency, but that seems arbitrary to set.
[15:52:27] you've a beard now, Ironholds?
[15:53:24] no, although I should
[15:55:12] halfak, okay, I've regenerated the edit dataset, if you want to test that?
[15:55:39] I'm running one of my own right now.
[15:55:55] If you would like to independently generate a sample of intertimes, that would be helpful.
[16:01:28] halfak, an edit set of a pageview set?
[16:01:30] *or a
[16:01:45] Both or either.
[16:02:04] alrighty, let's look at edits...
[16:09:37] halfak, edit intertimes at /home/ironholds/sess/halfak_intertimes.RData
[16:09:47] it excludes bots this time (albeit only by checking user_groups; nothing fancy)
[16:16:50] Great. Thanks Ironholds
[16:17:31] halfak, re in-session activity, the problem is that's circular: I need to determine an appropriate average intertime so that I know how to treat sessions.
[16:18:37] Sort of circular. I think the observation of the mixture of normals makes this less arbitrary.
[16:19:22] If you are looking for the average within-session intertime, then you'll need to split activities into sessions.
[16:19:36] mmn, point
[16:20:20] I mean, I can say with certainty that a straight mean is a bad way of doing it for intertimes overall, because that value is ludicrously high thanks to breaks /between/ sessions.
[16:22:43] morning leila
[16:22:53] morning Ironholds. how's it going?
[16:24:19] same old, same old. You?
[16:25:18] Ironholds, I just checked my data pull for edits against your last one and I am seeing the same issue.
[16:25:26] huh!
[16:25:28] One problem: I'm ignoring the archive table.
[16:25:32] aha
[16:25:37] In my last analysis I didn't.
[16:25:42] but that really shouldn't matter.
[16:25:44] Ironholds, doing okay. Just got to the office. lining up the work for the day
[16:25:56] cool
[16:26:17] halfak, well, let's pull it in and see!
[16:28:33] Oh! And I ignored anonymous editors.
[16:29:03] this time or previously?
[16:29:10] This time.
[16:29:13] aha
[16:33:01] Ironholds, did you include archive in your query?
[16:33:23] nope!
[16:33:28] Gotcha.
[16:42:19] halfak, what do app pageviews look like, out of interest?
[16:43:25] Ironholds, wasn't in the dataset.
[16:43:42] they weren't? Huh. I thought I'd included those.
[16:44:01] nope, you're right
[16:44:07] generated them, then forgot to save them. Will do that.
[16:53:51] Ironholds, so, my old script didn't sample based on editor first. I'm wondering if that is having an effect.
[16:54:05] Otherwise, I can't find a difference.
[16:54:17] hmn. what happens if you run, line-for-line, the old script over new data?
[16:55:24] The old script is the data gathering one.
[16:55:33] So that wouldn't really make sense.
[16:57:09] I don't know what to think here. :\ Even if user sampling is having an effect, we shouldn't see it where we do.
[16:59:05] hrm
[16:59:18] oh: by "new data" I meant "the tables as they exist now", sorry.
[16:59:26] Gotcha. Trying that.
[16:59:45] Which will take a good long time and let me get back to regular work :)
[17:00:00] halfak, Ironholds, I started adding [Q2] to the beginning of the title of all cards in trello that I should work on as part of Q2 goals.
[17:00:24] I don't see DarTar getting the chance to add the new board, and I need a way to track cards.
[17:26:18] leila, cool!
[17:47:17] halfak, app PVs now in ./sess/Output as app_events.tsv
[17:47:27] the query to regenerate mobile has completed and I'm just generating the hashes
[17:47:31] Thanks dude. Will look at them tonight.
[17:51:51] * Ironholds headscratches
[17:51:57] this dataset should not be generating the results it is.
[17:57:40] halfak, say, do you want to pull dartar into our weekly meeting if he's available to talk datasets, or do you have other topics you'd like to go through?
[17:58:13] I'm fine with pulling DarTar in.
[17:58:19] Assuming he has time.
[17:58:38] * Ironholds nods
[17:58:43] leila, if he's around can you physically prod him?
[18:00:51] Ironholds, his calendar shows busy
[18:01:15] I can poke him but should I? he may be focusing on something
[18:03:55] point! nm
[18:40:42] hey leila: what’s the URL for Rachel’s consultation on Meta?
[18:41:57] I know the one on office DarTar: https://office.wikimedia.org/wiki/Community_Engagement_%28Product%29/User_Surveys
[18:42:19] oh I thought there was a public-facing one
[18:43:01] leila: if you find one, can you add it here? (L7) http://etherpad.wikimedia.org/p/RD201409
[18:43:34] sure. eod.
[18:43:37] by eod
[18:45:13] thx
[19:03:15] leila, Ironholds: are we meeting? checking if I have a room attached to the invite
[19:03:30] We have the Collab Standup place
[19:03:41] I'm available if we want to meet.
[19:03:42] that sucks for Oliver
[19:03:52] why, there's screen there
[19:04:07] oh right
[19:04:25] let’s see if there’s a room first in case we need to scribble something
[19:04:33] like the quiet room?
[19:04:58] DarTar, yep
[19:06:53] Ironholds: in the hangout
[19:07:01] DarTar, cool; sorry, still in meeting :(
[19:07:11] ok cool
[19:08:41] DarTar, my bad.
[19:08:53] I was holding Ironholds up.
[19:48:39] halfak, do you know if there are user and user_property tables that are global, in the sense that they have data from all projects?
[19:48:54] I see the two tables in enwiki.
[19:48:59] No.
[19:49:08] There are not global user properties.
[19:49:17] I see. thanks!
[19:49:24] I do have a global user table that I last updated a couple months ago.
[19:49:41] see analytics-store:staging.local_user_info
[19:50:00] great!
thanks!
[19:51:04] Ironholds: sorry, battery
[19:51:15] so:
[19:51:35] something as simple as a PNG map would do the job
[19:51:51] DarTar, yeah, that's doable.
[19:51:56] and let’s discuss separately the static site idea
[19:51:59] yep
[19:52:08] I’ll loop you in the thread with Comm + James H
[19:52:14] the PNG map is.../fairly/ easy, I guess? But it's not a zero-time commitment
[19:52:33] I'd say, if I've got the session stuff up and running by Wednesday morning, I'll work on it before I head off to paper-writing land.
[19:52:58] sounds good, let me check the timeline with Comm (if they need it sooner we may just end up scrapping this)
[20:08:10] yay, bug fixed! \o/
[21:23:43] DarTar, we has session data
[21:23:50] w00t
[21:23:59] (pending people looking at Trello anyhoo)
[21:24:10] halfak, we has spider-sanitised mobile contributions
[21:24:14] *mobile pageviews
[21:24:19] Woot
[21:24:52] so now I'm going to work on the Ebola problem for the rest of the afternoon!
[21:26:06] Don't die.
[21:26:20] I think not-dying is not the ebola problem
[21:26:22] it's the ebola solution!
[21:26:37] leila, on a similar note, if you've got any interesting R you want poked and prodded, I have the rest of the afternoon freeeeeeee
[21:29:26] you are one free bird Ironholds. enjoy it. ;p
[21:29:31] okay
[21:30:14] DarTar, no outside facing documentation for Decision about Decision. Rachel says that she wants to work on one.
[21:33:43] leila: got it
[21:33:50] * halfak copies a full database dump to HDFS for testing.
[21:34:23] erik just told me about the status of that proposal with respect to the broader strategy consultation
[21:35:05] leila: sad that the bigger consultation is not going to benefit from the design lessons from the Product consultation
[22:22:39] DarTar, so, "secondary screen". What...does that mean?
[22:32:14] Ironholds: context?
[22:32:53] DarTar, I got asked to do a secondary screen of someone
[23:29:31] you connect them to your laptop via HDMI
[23:29:36] look for the port behind the neck
[23:31:03] Ironholds, can you run select distinct(up_value) from enwiki.user_properties where up_property like 'date' limit 20; and see if the result makes sense?
[23:31:03] I thought up_value in this case should be one of the following: mdy, dmy, ymd, ISO_8601, or default.
[23:31:03] the output looks very different to me.
[23:31:35] sure!
[23:33:01] leila, what do you use to connect to SQL?
[23:33:09] ...for some reason my muscle memory is telling me the wrong thing
[23:35:58] huh; got it
[23:36:29] leila, investigating!
[23:37:21] leila, okay, so, I see oddities too
[23:37:24] I imagine they are legacy lines
[23:37:33] so, 0 is (for skins, at least) how MediaWiki says "default"
[23:37:46] 2 was...damn. Monobook? Possibly?
[23:37:55] Chad (^d on IRC) will know more.
[23:41:47] leila, re the generally weird fragments; whatever people put in there, stays, even if it's not valid :/
[23:46:09] Ironholds: 2 is cologneblue
[23:46:23] aha
[23:46:25] ta, legoktm
[23:46:57] https://github.com/wikimedia/mediawiki/blob/master/includes/skins/Skin.php#L125
[23:47:27] if the user goes to Special:Preferences and hits "save" without changing anything, those values will get updated
[23:48:08] people could even have stuff like "nostalgia" or "myskin" in their prefs if they haven't updated them since the skins were removed
[23:56:11] Ironholds, thanks for checking. sorry, I was pulled into another conversation.
[23:56:27] np!
[23:56:29] but things like ČSN basic td, hh:mm d. month y. , fi normal, persian, etc. are too weird.
[23:56:36] how can they get there?
[23:56:46] people typing random junk? let's test
[23:56:56] hmn. nope.
[23:56:58] * Ironholds headscratches
[23:57:02] I wonder if it's some kind of field overflow?
[23:57:12] they can't type, at least nowadays
[23:57:17] I remember a bug a while back where user_values would get transmuted into user_properties. Something like that?
[23:57:26] or maybe historically they could type
[23:57:46] ha! do you think it's appropriate to send an email to internal about it?
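One way to size up the odd `up_value` entries before emailing anyone is to tally the stored values against the set that was expected. The sketch below assumes the expected date-format values are exactly the ones listed in the channel (mdy, dmy, ymd, ISO_8601, default); that list is not checked against MediaWiki itself. The `split_expected` helper and the sample rows are hypothetical stand-ins for the output of the query above.

```python
from collections import Counter

# Expected values as listed in the chat; an assumption, not verified against MediaWiki.
EXPECTED_DATE_FORMATS = {"mdy", "dmy", "ymd", "ISO_8601", "default"}


def split_expected(values, expected=EXPECTED_DATE_FORMATS):
    """Count each distinct value and separate expected ones from the rest."""
    counts = Counter(values)
    known = {v: n for v, n in counts.items() if v in expected}
    unknown = {v: n for v, n in counts.items() if v not in expected}
    return known, unknown


# Hypothetical rows standing in for up_value results (not real query output).
sample = ["mdy", "dmy", "default", "0", "2", "persian", "fi normal", "mdy"]

known, unknown = split_expected(sample)
print("expected:", known)
print("unexpected:", unknown)
```

Counts for the unexpected values would show whether they are a handful of legacy rows or something widespread enough to suggest a bug.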