[15:31:04] Hi halfak: Do you know when you will be running the entity quality over time versus entity value analysis? Thanks!
[15:31:34] So my plan was to generate monthly item quality predictions for all of Wikidata -- not to really perform the analysis.
[15:34:37] Okay, that would be useful to me whenever you get a chance to generate that
[15:42:16] * halfak looks into starting the job
[15:43:57] Thanks in advance!
[15:53:10] hall1467, running
[15:54:29] Thanks :)
[16:42:30] hey halfak. question: for looking at the talk page revisions, should we look at https://meta.wikimedia.org/wiki/Data_dumps/Dump_format ?
[17:47:45] 10Quarry, 10Cloud-Services: Consider moving Quarry to be an installation of Redash - https://phabricator.wikimedia.org/T169452#3421858 (10Milimetric) @Halfak is it a deal-breaker if we couldn't migrate the history of Quarry to Redash? I'm wondering if you care as much about the history as the features themsel...
[18:13:14] Hey leila!
[18:13:47] So yes, I'd process those dump files to get at talk page revisions if you are looking to process a lot of content
[18:14:13] the mwxml python library is designed to make that work easy on a stat1002/3 class machine
[18:14:26] There might be some good options for doing some spark stuff too
[18:14:34] I know that joal was working on that, but he's AFK now
[18:16:06] great. thanks, halfak. :)
[19:16:32] halfak I started a write-up of my New Page Review deletion rate analysis. Thanks for prompting me: https://meta.wikimedia.org/wiki/Research:New_page_reviewer_impact_analysis/Incorrect_deletion_decisions
[19:17:15] Awesome :)
[19:26:25] * Nettrom decided to join the ACTRIAL bandwagon
[19:31:48] I rewrote my projects manifest last night (a list of everything I work on, paid or otherwise) and I have too much and am trying to offload things. Other than that I am enthusiastic about this kind of proactive collaboration with the community and think there should be more of it
[20:59:29] halfak: should I boldly edit [[:meta:Research:Autoconfirmed trial experiment]]?
[20:59:43] yes please :)
[21:04:25] {{done}}
[21:35:25] halfak: If I want to get the number of revisions for all Wikidata entity pages, would you recommend that I do so by processing the stub-meta-history dump files for wikidatawiki?
[21:35:48] Would process with mwxml
[21:35:56] yeah. That's what we discussed last time you tried (and failed) to query the db table.
[21:36:16] I think it'll be really fast and with a little pre-processing you can identify bot editors and stuff too.
[21:39:29] Okay, I wasn't sure if you recommended processing dump files in order to both get the current page revision id and to get the number of revisions for a page
[21:39:48] Couldn't think of any other way, so wanted to check with you :)
[22:09:19] :)
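
A minimal sketch of the mwxml approach discussed above (counting revisions per page and capturing the current revision id from a wikidatawiki stub-meta-history dump). This is an illustration, not the exact script discussed: the dump path is a placeholder, and the namespace-0 filter assumes Wikidata item pages live in the main namespace.

    # Sketch: per-page revision counts and latest revision id from a
    # stub-meta-history dump, using the mwxml library.
    import bz2
    import mwxml

    DUMP_PATH = "wikidatawiki-latest-stub-meta-history.xml.bz2"  # placeholder path

    dump = mwxml.Dump.from_file(bz2.open(DUMP_PATH))

    for page in dump:
        if page.namespace != 0:  # assumes item (Q...) pages are in the main namespace
            continue
        revision_count = 0
        latest_rev_id = None
        for revision in page:  # assuming revisions appear oldest-first in the dump,
            revision_count += 1  # the last id seen is the current revision id
            latest_rev_id = revision.id
        print("{}\t{}\t{}".format(page.title, latest_rev_id, revision_count))

With a bit of extra bookkeeping (e.g. inspecting revision.user against a bot list), the same loop could also do the bot-editor pre-processing mentioned above.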