[14:14:29] Scoring-platform-team (Current), Technical-blog-posts: Story idea for blog: Building algorithmic systems while keeping community in the loop - https://phabricator.wikimedia.org/T251426 (Halfak) Thanks @srodlund! We will ping again when we have a draft that we think is ready for your review. First or se...
[14:53:49] (PS3) Vidhi-Mody: Upgraded WebdriverIO from v4 to v5 [extensions/ORES] - https://gerrit.wikimedia.org/r/591791 (https://phabricator.wikimedia.org/T250900)
[15:12:34] hello halfak
[15:12:46] is the helder PR ready for me to pull down from?
[15:37:42] chtnnh, https://github.com/wikimedia/articlequality/pull/125
[15:43:29] will merging that into master and pulling locally make those observations available to me?
[15:43:35] halfak ^
[15:45:02] chtnnh, Good Q. I think merging is probably the easiest path. This looks straightforward. Since I merged the code change and this is just the resulting model, I think it's relatively safe to merge. Do you want to look at what I did in the Makefile and ask any questions you have?
[15:46:19] i did check out the Makefile, seems straightforward
[15:47:22] hey halfak, did the new code work for you?
[15:47:29] so we can merge your PR and then i can resolve the conflicts it will cause when i pull it down, if any. then i rebuild the model with the added features. sound good, halfak?
[15:47:50] Helder_, yes! It looks like we got a performance improvement too.
[15:48:07] I stopped the script yesterday and tried again, but it has been 8 hours since it said "1 mappers still running"
[15:48:31] so I was wondering if my changes had some error which might be causing this
[15:48:46] Helder_, yeah. Definitely a weird bug causing this. I think that it's OK. It seems to be machine- and run-specific when this happens.
[15:48:50] I don't think it was you.
[15:48:57] https://github.com/wikimedia/articlequality/pull/125 check out the model PR.
[15:49:02] Any notes you have are appreciated.
[15:49:06] * Helder_ opens that
[15:50:17] halfak, what was the result of
[15:50:19] cat ptwiki.labelings.20200301.json | json2tsv wp10 | sort | uniq -c
[15:50:21] when the script finished
[15:50:23] ?
[15:51:23] 145585 1
[15:51:23] 32694 2
[15:51:23] 6088 3
[15:51:23] 2229 4
[15:51:23] 1553 5
[15:51:24] 1484 6
[15:54:14] For the latest run, which didn't finish, I get 7 fewer items than you:
[15:54:15] 145584 1
[15:54:15] 32690 2
[15:54:15] 6086 3
[15:54:15] 2229 4
[15:54:15] 1553 5
[15:54:17] 1484 6
[15:54:39] which is also pretty similar to what I got yesterday
[15:55:22] halfak i see you have already merged the commit? i will pull it down now?
[15:55:58] Helder_, I think it just fails to detect the *done* state. It's a shame that this is so difficult to get right.
[15:56:21] chtnnh, I didn't merge the new model yet. You'll want my makefile changes.
[15:56:26] no you havent, my bad. i really need my glasses rn
[15:56:36] they broke yesterday :/
[15:58:54] halfak, unrelated question: do you know how it is decided which pages/revisions go into each of the many ptwiki-20200301-pages-meta-*.bz2 files?
[15:59:03] what criteria are used for the splitting?
[16:02:00] Oh no chtnnh! Will you be able to get new glasses soon?
[16:02:17] Helder_, it's page_id ranges.
[16:02:29] The filename gives p####-p####
[16:04:11] oh, I see
[16:04:18] makes sense now
[16:06:11] hopefully before monday
[16:06:20] dont know for sure halfak
[16:07:00] just for my knowledge tho, what exactly are we waiting on the PR for?
[16:07:25] I need someone to review it so I can merge it.
[16:07:47] Either chtnnh or Helder_ can look at what I did and tell me if it looks good or there is some mistake I'm not seeing. :)
[16:10:42] kevinbazira, just saw your question re. the content of edit comments. Let me know if you want to have a chat.
[16:10:43] halfak, I don't see any problems
[16:10:54] Gist is, we need to extend our edit comments and figure out how to render them.
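The label tallies compared above come from `json2tsv wp10 | sort | uniq -c`. Below is a minimal Python sketch of the same count, plus a per-label diff between two runs. It assumes each line of the labelings file is a JSON object with a `wp10` field (inferred from the command, not checked against the actual file), and `count_labels`/`diff_counts` are hypothetical helper names.

```python
import json
from collections import Counter
from typing import Iterable


def count_labels(lines: Iterable[str], field: str = "wp10") -> Counter:
    """Tally each value of `field`, one JSON object per line.

    Assumes (hypothetically) that every non-blank line is a JSON object
    carrying `field`; a line without it is counted under None.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        counts[json.loads(line).get(field)] += 1
    return counts


def diff_counts(reference: Counter, other: Counter) -> dict:
    """Return reference minus other for every label whose counts differ."""
    labels = sorted(set(reference) | set(other), key=str)
    return {label: reference[label] - other[label]
            for label in labels
            if reference[label] != other[label]}


if __name__ == "__main__":
    # Tallies posted in the chat: the finished run vs. the unfinished one.
    finished = Counter({1: 145585, 2: 32694, 3: 6088, 4: 2229, 5: 1553, 6: 1484})
    unfinished = Counter({1: 145584, 2: 32690, 3: 6086, 4: 2229, 5: 1553, 6: 1484})
    gaps = diff_counts(finished, unfinished)
    print(gaps, sum(gaps.values()))  # → {1: 1, 2: 4, 3: 2} 7
```

Applied to the two tallies posted at 15:51 and 15:54, the diff shows gaps of 1, 4 and 2 items for labels 1, 2 and 3, which add up to the 7 missing items. The whole file can be counted by passing an open file handle to `count_labels`.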
[16:11:04] Helder_, just say that on the PR and I'll merge :)
[16:11:17] done :)
[16:11:18] That'll pave the way for chtnnh to work with the words_to_watch features.
[16:11:20] \o/
[16:11:42] Merged.
[16:11:44] halfak o/ yes please, we can jump on a short video call.
[16:11:48] chtnnh, feel free to pull.
[16:11:53] Call when ready kevinbazira
[16:12:03] cool ...
[16:25:53] pulling now
[16:31:16] building the model now halfak
[16:44:33] this is taking forever xD
[16:44:35] wikimedia/articlequality#358 (he7d3r-old_templates - 4441ea2 : Aaron Halfaker): The build has errored. https://travis-ci.org/wikimedia/articlequality/builds/681552467
[16:46:12] o.O?
[17:03:59] Helder_, it's because I deleted the branch. No worries. :)
[17:04:09] *deleted after I merged
[17:04:12] hmm... ok
[17:04:41] halfak, does anyone still have the revscoring file models/enwiki.damaging.linear_svc.model so that this change could be tested?
[17:04:42] https://github.com/wikimedia/revscoring/pull/486/files
[17:05:30] Oh! We should instead put a file in there that would actually work. I think there are some good ways to do that.
[17:05:56] We have a scorer model in ORES that doesn't have any interesting dependencies that we could pickle.
[17:06:28] https://github.com/wikimedia/ores/blob/master/ores/scoring/models/rev_id_scorer.py
[17:07:27] model built
[17:07:34] https://github.com/wikimedia/articlequality/pull/121
[17:08:28] halfak ^
[17:08:33] so quickly? It took me hours...
[17:08:50] where were you trying to build it?
[17:09:14] on my own machine
[17:09:34] (but I was also extracting the labels from dumps)
[17:09:43] that may be it
[17:10:05] im running this on a testing server, isnt that right halfak?
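The idea of shipping a pickled, dependency-free scorer as a revscoring test fixture can be sketched as follows. `ToyRevIdScorer` is a hypothetical stand-in, not the actual class in rev_id_scorer.py, whose interface may differ. The point is only that a model built from the standard library alone round-trips through `pickle` wherever the test suite runs.

```python
import pickle
from dataclasses import dataclass


@dataclass
class ToyRevIdScorer:
    """Hypothetical stand-in for a dependency-free scorer.

    It "predicts" purely from the revision ID, so pickling it needs
    nothing beyond the standard library. The real ORES scorer may differ.
    """
    version: str = "0.0.1"

    def score(self, rev_id: int) -> dict:
        # Deterministic pseudo-probability taken from the last digit of rev_id.
        probability = (rev_id % 10) / 10
        return {"prediction": probability >= 0.5,
                "probability": {"true": probability, "false": 1 - probability}}


def roundtrip(model: object) -> object:
    """Serialize and deserialize a model, as loading a fixture file would."""
    return pickle.loads(pickle.dumps(model))


if __name__ == "__main__":
    original = ToyRevIdScorer()
    restored = roundtrip(original)
    assert restored == original  # dataclass equality compares fields
    print(restored.score(123456789)["prediction"])
```

One caveat: `pickle` records the class by its import path, so a fixture like this only loads where the defining module is importable. That is why a model with no interesting dependencies is the convenient choice.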
[17:16:39] I was going to try using my personal account on Toolforge, leave it running there (using tmux or screen), and come back afterwards to see the results, but after reading https://wikitech.wikimedia.org/wiki/Help:Toolforge/Rules I realised I was not supposed to do that, so I tried locally instead
[17:46:52] good on you helder
[18:43:26] posting our async update notes --
[18:43:41] kevinbazira-
[18:43:44] Y:
[18:43:46] Looked into localizing the second part of the edit comments on the history page.
[18:43:48] - The FormatAutocomments MW hook provides a way to customize only the comment prefix, which I did yesterday.
[18:43:50] - As I was working on customizing the rest of the comment, I came across the PageHistoryLineEnding MW hook, which provides comments in their HTML format.
[18:43:52] I am still avoiding parsing HTML unless I can't find any other solution.
[18:43:54] T:
[18:43:56] Worked on localizing the second part of the edit comments on the history page.
[18:43:58] - I ended up parsing comments in their HTML format as provided by the PageHistoryLineEnding MW hook.
[18:44:00] - I'll probably demo this in one of the sync meetings, but the basic workflow was: parse the DOM, traverse it, pick the comment node, and replace the old parts of the comment with new localized ones.
[18:44:19] halfak-
[18:44:26] Y: Lots of progress and notes on the blog post for values WRT ORES/ML/AI/etc. https://phabricator.wikimedia.org/T251426 I ended up doing some work on the ORES paper -- mostly reading papers related to platforms and innovation. Otherwise, I was AFK in the afternoon for some personal reasons.
[18:44:28] T: I dug into some more papers re. ORES and wrote notes for a new case study for the paper regarding the participatory model-building process we are following for ptwiki models right now. I've also been supporting chtnnh and helder on building/improving models. It looks like we're due for some solid improvements there. I hope to take some time this afternoon for the RC filters design work.
[18:44:33] and me-
[18:44:49] Y: Finally got the WIP patchset for re-enabling db hooks passing on Jenkins, although the link table name is still 'jade_diff_label'. There is still some work to be done to fix the LinkTable helper classes to be able to handle different table names for our ad-hoc approach.
[18:44:51] T: Will continue working on unraveling our table names inside all the link table helper classes and tests to support the ad-hoc approach. Also need to figure out how to merge the db patchset without breaking beta (most likely will just leave the code for old tables and then manually delete later).
[18:50:29] \o/ thanks accraze
[20:10:03] AFK for a bit
[20:22:48] Scoring-platform-team, Wikilabels, articlequality-modeling, artificial-intelligence: Build article quality model for Ukrainian Wikipedia - https://phabricator.wikimedia.org/T251571 (Ata)
[20:52:20] Scoring-platform-team, Wikilabels, articlequality-modeling, artificial-intelligence: Build article quality model for Ukrainian Wikipedia - https://phabricator.wikimedia.org/T251571 (Halfak) Thanks @Ata for filing this. I have a few follow-up questions What do the infobox template names look li...
[22:10:30] Scoring-platform-team, Wikilabels, articlequality-modeling, artificial-intelligence: Build article quality model for Ukrainian Wikipedia - https://phabricator.wikimedia.org/T251571 (Ata) > What do the infobox template names look like? … In English Wikipedia, all Infobox templates start with "Info...
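kevinbazira's four-step workflow in the 18:44 async notes (parse the DOM, traverse it, pick the comment node, update the old parts) can be illustrated with a short Python sketch. The real change lives in a MediaWiki extension and operates on the HTML passed in by the PageHistoryLineEnding hook. Here the markup is assumed to be well-formed, and the comment is assumed to be the element whose `class` contains `comment`. `localize_comment` and the translation mapping are hypothetical.

```python
import xml.etree.ElementTree as ET


def _swap(text, translations):
    # Replace each known English fragment with its localized counterpart.
    if not text:
        return text
    for old, new in translations.items():
        text = text.replace(old, new)
    return text


def localize_comment(line_html: str, translations: dict) -> str:
    """Localize the edit-comment node of one history line (hypothetical helper).

    Assumptions: the HTML is well-formed, and the comment is the element
    whose class list contains "comment".
    """
    # 1. Parse the DOM; wrap it so a fragment with sibling elements still parses.
    root = ET.fromstring(f"<div>{line_html}</div>")
    # 2. Traverse it.
    for node in root.iter():
        # 3. Pick the comment node.
        if "comment" not in (node.get("class") or "").split():
            continue
        # 4. Update old parts of the comment, including nested elements,
        #    but leave the text that follows the comment node untouched.
        for sub in node.iter():
            sub.text = _swap(sub.text, translations)
            if sub is not node:
                sub.tail = _swap(sub.tail, translations)
    # Serialize and drop the temporary wrapper element.
    serialized = ET.tostring(root, encoding="unicode")
    return serialized.removeprefix("<div>").removesuffix("</div>")


if __name__ == "__main__":
    line = '<a href="/x">Reverted edits</a> <span class="comment">(Reverted edits)</span>'
    print(localize_comment(line, {"Reverted edits": "Edições revertidas"}))
```

Only the comment node changes. The link text outside it, which happens to contain the same English phrase, is left alone, and that selectivity is why picking the node comes before replacing text. Note that `str.removeprefix`/`removesuffix` require Python 3.9 or later.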