[14:02:25] o/
[15:25:29] Revision-Scoring-As-A-Service, rsaas-articlequality: [Discuss] Hosting the monthly article quality dataset on labsDB - https://phabricator.wikimedia.org/T146718#2677286 (Halfak) @jcrespo, we have 6 columns * page_id (UNSIGNED INT) -- The page identifier * page_title (VARBINARY(255)) -- The title of the...
[15:26:27] Revision-Scoring-As-A-Service, rsaas-articlequality: [Discuss] Hosting the monthly article quality dataset on labsDB - https://phabricator.wikimedia.org/T146718#2677289 (Halfak) By the way, I've been referencing http://dev.mysql.com/doc/refman/5.7/en/storage-requirements.html
[15:28:02] Revision-Scoring-As-A-Service, rsaas-articlequality: [Discuss] Hosting the monthly article quality dataset on labsDB - https://phabricator.wikimedia.org/T146718#2677308 (Halfak) Now that I think about it, I'd really like to drop the `title` field since it's not a stable identifier for a page.
[17:22:16] Revision-Scoring-As-A-Service, rsaas-articlequality: [Discuss] Hosting the monthly article quality dataset on labsDB - https://phabricator.wikimedia.org/T146718#2677596 (jcrespo) @Halfak We can talk further, but a single 500 million-row mysql table should either get its own hardware or be split on small...
[17:27:57] Revision-Scoring-As-A-Service, rsaas-articlequality: [Discuss] Hosting the monthly article quality dataset on labsDB - https://phabricator.wikimedia.org/T146718#2677615 (Halfak) Does splitting 500m rows into 5 x 100m row tables help somehow? Indeed, I expect that schema changes and table maintenance wo...
[17:41:40] Revision-Scoring-As-A-Service, rsaas-articlequality: [Discuss] Hosting the monthly article quality dataset on labsDB - https://phabricator.wikimedia.org/T146718#2677658 (jcrespo) >>! In T146718#2677615, @Halfak wrote: > Does splitting 500m rows into 5 x 100m row tables help somehow? > Indeed, I expect t...
[17:44:22] Revision-Scoring-As-A-Service, rsaas-articlequality: [Discuss] Hosting the monthly article quality dataset on labsDB - https://phabricator.wikimedia.org/T146718#2677663 (jcrespo) > What do you mean by "consider compression"? Both InnoDB has a compressed row format; or alternatively, tokudb has high com...
[19:41:00] Revision-Scoring-As-A-Service-Backlog, Wikilabels, Wikimedia-Incident: [Spec] CI tests for Wikilabels - https://phabricator.wikimedia.org/T137625#2373593 (greg) This follow-up task from an incident report has not been updated recently. If it is no longer valid, please add a comment explaining why. If...
[19:43:47] Revision-Scoring-As-A-Service-Backlog, Wikilabels, Wikimedia-Incident: [Spec] CI tests for Wikilabels - https://phabricator.wikimedia.org/T137625#2678085 (Halfak) That's right. We haven't had the resources for this yet @greg. We're currently working with the #collaboration-team-triage to bring engi...
[21:20:43] Revision-Scoring-As-A-Service, rsaas-articlequality: [Discuss] Hosting the monthly article quality dataset on labsDB - https://phabricator.wikimedia.org/T146718#2678397 (Halfak) I don't believe we have anything available to Quarry that isn't a MySQL replica, right? @yuvipanda, what do you think about...
[22:12:17] Revision-Scoring-As-A-Service, rsaas-articlequality: [Discuss] Hosting the monthly article quality dataset on labsDB - https://phabricator.wikimedia.org/T146718#2678518 (Halfak) > Another question is, what is the relationship between these tables and ores_classification and ores_model from production?...
[22:22:16] Revision-Scoring-As-A-Service, rsaas-articlequality: [Discuss] Hosting the monthly article quality dataset on labsDB - https://phabricator.wikimedia.org/T146718#2669300 (Platonides) Note that if you drop page_title, in order to have page_id mean something, you would have to join it (and even if includin...
[22:32:35] Revision-Scoring-As-A-Service, rsaas-articlequality: [Discuss] Hosting the monthly article quality dataset on labsDB - https://phabricator.wikimedia.org/T146718#2678582 (Halfak) This is only articles, so page_namespace == 0. You're right that joining would be necessary to get back to a title, but one...