[01:59:13] ok done for the day, see y'all tomorrow
[10:16:11] 10ORES, 10Scoring-platform-team, 10Growth-Team, 10MediaWiki-Recent-changes, and 2 others: SpecialRecentChanges::doMainQuery needs tunning - https://phabricator.wikimedia.org/T244569 (10matej_suchanek)
[14:32:59] Hello halfak_
[14:51:02] Hey haksoat!
[14:51:19] I made a PR for the image issue, but then I remembered infobox is yet to be worked on
[14:51:38] From my checks, the images in an infobox do not have a specific structure
[14:52:03] some have image_file, map, image, etc
[14:52:34] haksoat, we could probably capture the most common template parameter names.
[14:52:53] image, file, photo, map, image_file, etc.
[14:53:17] Okay. Great.
[14:53:40] When you get the chance, could you help take a look at the PR in its current state?
[15:02:42] Sure! Got a link handy?
[15:02:47] haksoat, ^
[15:03:29] Yeah
[15:03:31] https://github.com/wikimedia/articlequality/pull/102
[16:12:31] Feedback seen halfak
[16:12:47] Solid. Sorry I forgot to ping. I'm in a meeting block :|
[16:12:48] The gallery_images you talked about will be for both the tag and template gallery types right? As we can have <gallery> and {{gallery...
[16:12:53] Okay
[16:51:05] haksoat, I was only thinking about the tag-based.
[16:51:26] You might call it "tag_images" then.
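(A minimal sketch of the approach discussed above, assuming mwparserfromhell is used for the parsing: the IMAGE_PARAMS list, the "infobox" name check, and both function names are illustrative guesses, not the code in the articlequality PR.)

```python
# Sketch only: capture image-bearing infobox parameters and <gallery> tag
# contents with mwparserfromhell. IMAGE_PARAMS and the "infobox" prefix check
# are assumptions based on the chat above, not the actual feature code.
import mwparserfromhell

# Common image-carrying parameter names mentioned in the discussion.
IMAGE_PARAMS = {"image", "file", "photo", "map", "image_file"}


def infobox_images(wikitext):
    """Return values of image-like parameters found in infobox templates."""
    code = mwparserfromhell.parse(wikitext)
    values = []
    for template in code.filter_templates():
        # Treat any template whose name starts with "infobox" as an infobox.
        if str(template.name).strip().lower().startswith("infobox"):
            for param in template.params:
                if str(param.name).strip().lower() in IMAGE_PARAMS:
                    values.append(str(param.value).strip())
    return values


def tag_images(wikitext):
    """Return the raw contents of <gallery> tags (the tag-based galleries)."""
    code = mwparserfromhell.parse(wikitext)
    galleries = []
    for tag in code.filter_tags():
        if str(tag.tag).strip().lower() == "gallery" and tag.contents is not None:
            galleries.append(str(tag.contents))
    return galleries
```

The tag_images helper only covers <gallery> tags, matching the tag-based scope halfak settled on; {{gallery}}-style templates would need separate handling.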
[17:32:03] wikimedia/ores#1408 (master - 83bc66e : Andy Craze): The build has errored. https://travis-ci.org/wikimedia/ores/builds/655455077
[17:32:43] Arg! I didn't escape a character >:(
[17:35:41] ahhh :(
[17:35:53] is this on the travis side?
[17:37:58] Yup. Fixed and restarted the build.
[17:50:43] hmm weird... it looks like the auto-deploy still fails
[17:50:51] "Could not restore untracked files from stash entry"
[17:54:23] wat
[17:55:28] Oh! I think I mixed up the pypi creds.
[17:58:20] if that doesn't work, I'm also seeing "ORES needs Python 3 to run properly. Your version is 2.7.12" in the logs
[17:58:39] accraze, I just PM'd a quick question
[17:59:28] might need to add something like what we have in revscoring: https://github.com/wikimedia/revscoring/blob/master/.travis.yml#L5
[18:04:05] wikimedia/wikilabels#536 (install_docs - 9f25b7c : halfak): The build was fixed. https://travis-ci.org/wikimedia/wikilabels/builds/655467171
[19:12:39] Woo! Meeting block complete.
[19:12:41] Lunch!
[20:14:56] back
[20:22:00] wikimedia/ores#1408 (master - 83bc66e : Andy Craze): The build has errored. https://travis-ci.org/wikimedia/ores/builds/655455077
[20:23:20] I bet the problem is that I need to escape periods.
[20:25:36] ^ yep just remembered i had to do this
[21:18:28] halfak: I'm thinking of using our previously published WikiProjects dataset of 93k articles for the new work. To work on article states as they evolve, I will have to store the version of the article at each point in history. Do you think the API should be fine to get this much data or should I use the dumps?
[21:22:40] wikimedia/ores#1408 (master - 83bc66e : Andy Craze): The build has errored. https://travis-ci.org/wikimedia/ores/builds/655455077
[21:35:07] 10Jade, 10Scoring-platform-team (Current), 10MW-1.35-notes (1.35.0-wmf.22; 2020-03-03), 10Patch-For-Review: Address Jade UI issues. - https://phabricator.wikimedia.org/T245311 (10ACraze)
[21:53:21] codezee! Hey! So we've moved beyond that strategy recently.
[21:53:59] oh, better ways available now?
[21:54:02] We've switched to a manual taxonomy described here: https://github.com/halfak/wikitax/blob/master/taxonomies/wikiproject/halfak_20191202/taxonomy.yaml
[21:54:27] The taxonomy has been adjusted and improved in a lot of ways. This has produced better fitness and more useful topic categories.
[21:55:09] I'd be happy to share some labeled data with you. But you can also generate it with our updated makefile.
[21:55:11] great! can you point to documentation which i can use to extract articles corresponding to these topics again? possible ~100k
[21:55:23] *possibly
[21:55:28] https://github.com/wikimedia/drafttopic/blob/master/Makefile
[21:56:15] You probably want to start with "datasets/enwiki.labeled_article_items.json.bz2"
[21:56:35] Alternatively, you could just use the model we have in production to label some articles :)
[21:58:06] halfak: the dataset itself is a representative sample of articles of varying qualities, hence it seemed like a useful starting point
[21:58:20] do you know roughly how many articles are part of the new dataset?
[21:59:45] The new dataset is ~5 million.
[21:59:51] But you should be able to sub-sample that.
[22:00:07] It's basically every page on enwiki with a sitelink in wikidata.
[22:00:35] that seems easy if it's just the article titles and some metadata, i'll take about 100k articles for analysis out of that
[22:01:40] halfak: in terms of getting the article history (including content at each step) for these 100k articles, do you think the API is an okay choice?
[22:02:06] i'm planning to store them on myself locally, then run further analysis with regards to categories like clarification, verification, etc
[22:02:07] Yeah, 100k is reasonable for a few queries.
[22:02:20] *on mysql, not myself :P
[22:02:37] Yourself using mysql :D
[22:03:32] haha, true! ;) okay then, I'll use that 5M article dataset, subsample it, and fetch the entire history of these articles locally, lot of incoming data this week ;)
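(A minimal sketch of the subsample-and-fetch plan discussed above, assuming the labeled dataset is newline-delimited JSON with a "title" field; that field name, the sample size, the User-Agent string, and the helper names are assumptions for illustration, not part of the drafttopic pipeline.)

```python
# Sketch only: subsample the labeled-article dataset and pull full revision
# history (with content) for each title from the MediaWiki API.
import bz2
import json
import random

import requests

API_URL = "https://en.wikipedia.org/w/api.php"
HEADERS = {"User-Agent": "article-history-research (example sketch)"}


def sample_titles(path, k=100_000, seed=0):
    """Read the bz2 JSON-lines dataset and return a random sample of titles."""
    with bz2.open(path, mode="rt") as f:
        items = [json.loads(line) for line in f]
    random.seed(seed)
    sample = random.sample(items, min(k, len(items)))
    return [item["title"] for item in sample]  # "title" field is an assumption


def revision_history(title, session):
    """Yield (revid, timestamp, text) for every revision of a page, oldest first."""
    params = {
        "action": "query",
        "format": "json",
        "formatversion": "2",
        "prop": "revisions",
        "titles": title,
        "rvprop": "ids|timestamp|content",
        "rvslots": "main",
        "rvlimit": "max",
        "rvdir": "newer",
    }
    while True:
        data = session.get(API_URL, params=params, headers=HEADERS).json()
        for page in data["query"]["pages"]:
            for rev in page.get("revisions", []):
                text = rev["slots"]["main"].get("content", "")
                yield rev["revid"], rev["timestamp"], text
        if "continue" not in data:
            break
        params.update(data["continue"])  # follow API continuation


if __name__ == "__main__":
    session = requests.Session()
    # Small sample here; bump k once the schema and rate limits are confirmed.
    for title in sample_titles("datasets/enwiki.labeled_article_items.json.bz2", k=5):
        for revid, ts, text in revision_history(title, session):
            print(title, revid, ts, len(text))
```

If pulling full histories for 100k articles through the API turns out to be too slow, the dumps mentioned in the question are the usual fallback.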