[09:32:25] nuria_: Thank you :).
[13:22:20] Analytics-Tech-community-metrics: Usable links for specific users or repositories - https://phabricator.wikimedia.org/T164934#3261031 (Nemo_bis) Invalid>Open The share button produces an equally unusable URL:
Analytics-Tech-community-metrics: Usable links for specific users or repositories - https://phabricator.wikimedia.org/T164934#3261035 (Aklapper) Open>Invalid The first icon (two arrows) under "Share a link" is called "Generate Short URL".
[16:18:40] neilpquinn: sorry, should've realized earlier, the labs snapshot doesn't have a lot of wikis because wmcs and dbas haven't loaded them yet. So you can use the production snapshot with older data or we can take a new production snapshot, let us know, cc nuria
[17:30:10] milimetric, neilpquinn: i think an older snapshot should do as we are talking about stats that are not likely to change in the near term
[22:18:28] milimetric, nuria_: well, in this case, I was looking to create lists of users who frequently take actions that suggest interactions with new users (e.g. writing on a user's talk page for the first time, reverting new users) so we could invite them to take part in our study. So having recent data is quite important. However, I can probably get 70% of the same thing using MariaDB, so no urgent action needed from y'all :)
[22:22:38] neilpquinn: sorry, i thought we were talking about autoconfirmed users
[22:23:33] oh, no worries. enwiki data is all there so the autoconfirmed question isn't affected :)
[22:25:06] But as a general note, one of the barriers to using the Data Lake is that simple queries to get the shape of the data (e.g. how is `revision_parent_id` coded for revisions that create a page? do page creation events have the revision fields populated, do they have event comments? what's the most recent data in the snapshot? which wikis have data?) seem to take longer than they do on MariaDB.
[22:25:28] So good documentation is even more useful here because the feedback cycle is longer :)
[22:30:43] neilpquinn: that is good feedback, the snapshots are going to be renamed by date, so that one should be taken care of. we just updated docs but no docs are perfect so we will just update as people ask questions
[22:31:01] neilpquinn: the "take longer" part i do not think i got
[22:31:10] neilpquinn: can you elaborate?
[22:32:45] nuria_: makes sense, I'll certainly do my part to update the documentation too.
[22:33:01] neilpquinn: thank you
[22:33:46] nuria_: by takes longer, I mean my impression from working with the data is that, say, a limit 1 query just to get me an example of a specific type of record might take a minute whereas it would take a couple seconds on MariaDB.
[22:34:24] neilpquinn: ah i see, but once you start requesting real data it is a lot faster, right?
[22:34:25] That's probably a necessary part of the architecture that allows it to do complicated queries much faster, but it does make the feedback cycle longer when you're writing a query :)
[22:34:42] neilpquinn: right, it is, as there is a fixed cost.
[22:36:34] yeah, exactly. There's probably not a way around that, so it's just a general piece of feedback to help you understand the user's point of view :)
[22:38:00] neilpquinn: ya, that one i knew cause i run into that a lot when i analyze pageviews which i do all the time
[22:38:19] neilpquinn: i totally get it
[22:39:48] nuria_: ah ha, cool, I was thinking that you knew that data structure so well that you might not encounter that problem much :)
[22:40:05] neilpquinn: i wouldn't if i had any memory left
[22:40:20] ...sadness
[22:44:00] nuria_: yeah :( the human condition...