[01:21:05] I am getting an error with articlequality. It worked when I used it some weeks ago; now I upgraded it and downloaded the new ptwiki model, and when I run it I get this error: http://dpaste.com/3X74D82
[01:44:37] after I ran 'pip install --upgrade revscoring' it works. I had only used that command to upgrade articlequality; I thought it would upgrade all the dependencies
[09:33:56] Jade, CirrusSearch, Discovery-Search, MassMessage, and 8 others: Audit and remove extension `use Revision;` declarations where possible - https://phabricator.wikimedia.org/T257011 (DannyS712) Open→Resolved
[10:01:27] Scoring-platform-team (Current), LDAP-Access-Requests, Operations, SRE-Access-Requests, Patch-For-Review: Production shell access for Chris Albon - https://phabricator.wikimedia.org/T256412 (jcrespo) p:Triage→High
[10:22:16] ORES, Scoring-platform-team (Current), Operations, Patch-For-Review: ORES uwsgi consumes a large amount of memory and CPU when shutting down (as part of a restart) - https://phabricator.wikimedia.org/T242705 (akosiaris) https://grafana.wikimedia.org/d/000000607/cluster-overview?panelId=87&edit&fu...
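The fix found at 01:44:37 matches pip's default behaviour. Since pip 10, `pip install --upgrade articlequality` uses `--upgrade-strategy only-if-needed`. Under that strategy a dependency such as revscoring is upgraded only if the installed version no longer satisfies articlequality's declared requirement. `pip install --upgrade --upgrade-strategy eager articlequality` upgrades the dependencies as well.

To confirm which versions are actually installed, here is a minimal sketch using the standard library's `importlib.metadata` (Python 3.8+). The helper name `installed_versions` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Check the packages discussed above before loading a newly downloaded model.
    for name, ver in installed_versions(["articlequality", "revscoring"]).items():
        print(f"{name}: {ver or 'not installed'}")
```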
[15:13:35] morning all
[15:16:43] Morning chrisalbon
[15:16:43] Afternoon here though
[15:16:43] :)
[15:16:49] ha fair
[15:55:56] halAFK: I ran articlequality on all ptwiki articles in the dump again and discovered the page that returned the error "C tokenizer exited with BAD_ROUTE": it is [[pt:Lista de finais para cadeirantes do US Open]], see http://dpaste.com/3W44BEA
[15:55:57] [1] https://pt.wikipedia.org/wiki/Lista_de_finais_para_cadeirantes_do_US_Open
[16:15:08] and while the dump is being processed, something in articlequality seems to be accumulating memory. I had to make the code watch its memory usage and restart the process when usage gets high, to avoid being killed for running out of memory
[16:15:34] the memory graph: https://grafana-labs.wikimedia.org/d/toolforge-k8s-namespace-resources/kubernetes-namespace-resources?orgId=1&var-namespace=tool-ptwikis&from=1593999360000&to=1594048920000&panelId=2&fullscreen
[16:18:45] the code processes other things in the dump besides the ORES quality. When I removed only the ORES quality step, it ran in 30 min without memory problems, so the memory accumulation seems to be related to articlequality
[16:29:16] Jade, Scoring-platform-team (Current), Documentation: Flesh out mw:Jade/Edit_quality - https://phabricator.wikimedia.org/T256811 (ACraze) https://www.mediawiki.org/wiki/Jade/Edit_quality
[17:54:03] accraze o/
[17:54:37] I have an update for you. We can jump on a video call if you have a minute.
[17:54:49] cool sounds good, call when ready
[18:34:04] accraze are you still available? I've got it to work with 2 solutions. I'd like your opinion on which one is best.
[18:40:50] yeah give me a call kevinbazira
[18:41:00] cool ...
[18:49:40] Thank you for your time accraze. Have a good day 👋
[18:50:03] no prob kevinbazira! the wikitext parsing looks great!
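The 15:55:56 and 16:15:08 reports describe two separate problems. First, one page aborts the run with "C tokenizer exited with BAD_ROUTE". That is the error text raised by mwparserfromhell's C tokenizer, which revscoring uses to parse wikitext. Second, memory grows until the process is killed. Both can be contained in the loop that drives the scoring.

Below is a minimal sketch of the workaround described at 16:15:08, under these assumptions:
- `score_fn` is a hypothetical stand-in for whatever function the tool uses to score one page's wikitext with the ptwiki articlequality model.
- Pages arrive as (title, wikitext) pairs.
- Memory is read from `/proc/self/statm`, so the measurement only works on Linux.

The names `score_pages`, `BatchResult` and `current_rss_bytes` are illustrative and are not part of articlequality or revscoring.

```python
import os
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Optional, Tuple


def current_rss_bytes() -> Optional[int]:
    """Current resident set size of this process, read from /proc (Linux only).

    Returns None where /proc/self/statm is unavailable (e.g. macOS, Windows).
    """
    try:
        with open("/proc/self/statm") as statm:
            resident_pages = int(statm.read().split()[1])  # 2nd field: resident pages
    except (OSError, IndexError, ValueError):
        return None
    return resident_pages * os.sysconf("SC_PAGE_SIZE")


@dataclass
class BatchResult:
    scores: Dict[str, object] = field(default_factory=dict)
    failures: Dict[str, str] = field(default_factory=dict)  # title -> error text
    last_title: Optional[str] = None  # resume point for the next run
    stopped_for_memory: bool = False


def score_pages(
    pages: Iterable[Tuple[str, str]],
    score_fn: Callable[[str], object],
    rss_limit_bytes: int,
    rss_fn: Callable[[], Optional[int]] = current_rss_bytes,
) -> BatchResult:
    """Score (title, wikitext) pairs one at a time.

    A page whose scoring raises is recorded in `failures` instead of aborting
    the whole run. Once RSS exceeds `rss_limit_bytes`, the loop stops so that
    a supervisor can restart the process and resume after `last_title`.
    """
    result = BatchResult()
    for title, wikitext in pages:
        try:
            result.scores[title] = score_fn(wikitext)
        except Exception as exc:  # broad on purpose: one bad page must not kill the run
            result.failures[title] = f"{type(exc).__name__}: {exc}"
        result.last_title = title
        rss = rss_fn()
        if rss is not None and rss > rss_limit_bytes:
            result.stopped_for_memory = True
            break
    return result
```

This contains the symptoms but does not locate the leak. A commonly used alternative to polling RSS is to do the scoring inside a `multiprocessing.Pool` created with `maxtasksperchild`. The pool then periodically replaces its worker processes, and each replaced worker's memory is returned to the OS.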
[19:16:40] Scoring-platform-team (Current), LDAP-Access-Requests, Operations, SRE-Access-Requests, Patch-For-Review: Production shell access for Chris Albon - https://phabricator.wikimedia.org/T256412 (Nuria) Approved on my end
[19:18:05] Scoring-platform-team (Current), LDAP-Access-Requests, Operations, SRE-Access-Requests, Patch-For-Review: Production shell access for Chris Albon - https://phabricator.wikimedia.org/T256412 (Nuria) @calbon once this goes through please try: ssh to stat1007.eqiad.wmnet @ssingh chris is also g...
[19:37:38] ORES, Scoring-platform-team, Growth-Scaling, Growth-Team, drafttopic-modeling: Add articletopic model to testwiki - https://phabricator.wikimedia.org/T257248 (Tgr)
[19:37:56] ORES, Scoring-platform-team, Growth-Scaling, Growth-Team, drafttopic-modeling: Add articletopic model to testwiki - https://phabricator.wikimedia.org/T257248 (Tgr)
[20:17:42] Scoring-platform-team (Current), LDAP-Access-Requests, Operations, SRE-Access-Requests, Patch-For-Review: Production shell access for Chris Albon - https://phabricator.wikimedia.org/T256412 (ssingh)
[20:20:11] Scoring-platform-team (Current), LDAP-Access-Requests, Operations, SRE-Access-Requests, Patch-For-Review: Production shell access for Chris Albon - https://phabricator.wikimedia.org/T256412 (ssingh) >>! In T256412#6283027, @Nuria wrote: > @calbon once this goes through please try: ssh to stat...
[23:24:13] Scoring-platform-team, revscoring, Chinese-Sites, artificial-intelligence: Tokenization of "word" things for CJK - https://phabricator.wikimedia.org/T111179 (jeena) @Halfak Looking at the result posted by @Pavol86 `['ホッケー', 'に', 'は', 'デンジャラスプレー', 'の', '反則', 'が', 'ある', 'ので', '、', '膝', 'より', '上'...
[23:31:31] Scoring-platform-team, revscoring, Chinese-Sites, artificial-intelligence: Tokenization of "word" things for CJK - https://phabricator.wikimedia.org/T111179 (VulpesVulpes825) @jeena, @Pavol86 & @Halfak Mecab does have multiple dictionaries to use. I would suggest redoing the testing using JUMAN+...
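Dictionary-based Japanese tokenizers such as MeCab and JUMAN++ look up candidate words in a dictionary and pick the lowest-cost path through the resulting lattice. The tokens they produce therefore depend heavily on which dictionary is loaded, which is why the 23:31:31 comment suggests redoing the tests.

The sketch below swaps in a much simpler technique, greedy longest-match segmentation, purely to show how the dictionary drives the output. It is not MeCab's algorithm. The function name is hypothetical, and the tiny dictionary is a hypothetical toy built by hand from the tokens quoted in jeena's comment on T111179.

```python
from typing import Iterable, List


def longest_match_segment(text: str, dictionary: Iterable[str]) -> List[str]:
    """Greedy longest-match segmentation against a word list.

    At each position, take the longest dictionary entry that starts there;
    if none does, emit the single character as its own token.
    """
    words = set(dictionary)
    max_len = max((len(w) for w in words), default=1)
    tokens: List[str] = []
    i = 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in words:
                tokens.append(candidate)
                i += length
                break
        else:  # no dictionary entry starts at position i
            tokens.append(text[i])
            i += 1
    return tokens


# Hypothetical toy dictionary, built from the tokens quoted in T111179.
EXAMPLE_DICTIONARY = {
    "ホッケー", "に", "は", "デンジャラスプレー", "の", "反則",
    "が", "ある", "ので", "、", "膝", "より", "上",
}

if __name__ == "__main__":
    sentence = "ホッケーにはデンジャラスプレーの反則があるので、膝より上"
    print(longest_match_segment(sentence, EXAMPLE_DICTIONARY))
```

Removing "デンジャラスプレー" from the dictionary makes that span fall back to one token per character. This is the kind of dictionary-dependent difference a comparison between MeCab dictionaries, or against JUMAN++, would surface.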