[13:20:55] halfak: i'm using mwxml.map as `for revision in mwxml.map(process_dump, [dump_file], 25):` but it seems to be slow. It has been running for 2 days now. The only checks I'm doing are look for 'pov' and 'neutral' in revision comments and then insert those edits in a local mysql
[13:21:40] I tried with both without specifying threads and with 25 threads, but it doesn't seem to speed up
[13:54:02] 10Jade, 10Scoring-platform-team (Current): The wrong label shows up while performing an undo - https://phabricator.wikimedia.org/T256812 (10Halfak) I added a label to [438261](https://en.wikipedia.beta.wmflabs.org/w/index.php?title=Foobar&action=edit&undoafter=418935&undo=438261) in the Undo interface and the...
[13:54:42] codezee, it could be that the mapper is failing to shut down. I have seen that behavior before.
[13:54:47] Is it still using CPU?
[13:59:23] when i do htop, i only see 1 cpu at 100% and others at 0%
[14:00:33] the mapper was still running, after 1 day it could only insert 679k revisions, which seems slow so i'm guessing either my mysql inserts are slow or something else
[14:04:22] Oh! That could be. If it is going at 100% though it's definitely the bottleneck. It could be that (1) there's some degenerative loop somewhere in the code or maybe (2) it's processing a really big dump file in the last mapper.
[14:04:42] If (2), it could be that it is the dump file that contains the Administrator's noticeboard.
[14:04:48] That page has A LOT of edits.
[14:06:45] oh! i should've remembered to include only ns0
[14:09:31] If you ever look at the file sizes of the dump files, you can always pick out the one with the Admin's noticeboard because it's way bigger than the others.
[14:14:00] halfak: what do you mean by pick out? the dump is one single big file with all the revisions in enwiki. Do i exclude the admin noticeboard manually like "if page.title == `noticeboard..`"?
[14:14:28] You could do something like that. NS 0 will work as well.
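[Editor's note] The setup codezee describes — an `mwxml.map` callback that filters revision comments for "pov"/"neutral" and (per halfak's suggestion) keeps only ns0 pages — might look roughly like the sketch below. The keywords and the ns0 filter come from the chat; the helper names (`comment_matches`, the yielded tuple shape) are hypothetical.

```python
KEYWORDS = ("pov", "neutral")

def comment_matches(comment):
    """True if a revision comment mentions any keyword (case-insensitive)."""
    if not comment:
        return False
    text = str(comment).lower()
    return any(kw in text for kw in KEYWORDS)

def process_dump(dump, path):
    # mwxml hands each mapper a Dump object plus the file path.
    # Skipping non-article pages (namespace != 0) up front avoids grinding
    # through enormous project pages like the Administrators' noticeboard.
    for page in dump:
        if page.namespace != 0:
            continue
        for revision in page:
            if comment_matches(revision.comment):
                yield revision.id, page.title, str(revision.comment)

# Usage (requires the mwxml package):
# import mwxml
# for rev_id, title, comment in mwxml.map(process_dump, [dump_file], threads=25):
#     ...  # insert into MySQL; batching inserts tends to help throughput a lot
```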
[14:14:33] Because this shows up in NS 4
[14:14:50] Looking at that one big file is more of curiosity than anything.
[14:15:12] It just gives you a sense for how big that one page is :|
[14:15:24] haha, ok :P
[14:17:30] i killed the script and running again. If i'm seeing 100% on 1 cpu and 0% on rest 31 i guess its not doing multiprocessing right
[14:17:31] ?
[14:19:15] It is but there's only one process remaining. It's just trying to finish off that last dump file.
[14:19:44] hmm, let me try doing some process killing and running again
[14:20:25] Regretfully, we don't have any progress bar for output. That might be a nice feature to add.
[14:20:56] yes \o/
[14:24:14] https://github.com/wikimedia/articlequality/pull/133 halfak can you have a look please?
[14:29:15] Nice work chtnnh. Merged! That means this can go out in the next deployment. I'm hoping to work with chrisalbon to have him prepare a deployment soon.
[14:29:26] With any luck, we might be able to get it on Beta today and Prod on Monday.
[14:29:40] It'd be nice to blow smoke through the deploy pipeline before I high-tail it out of here :)
[14:29:45] amazing!
[14:29:58] 2 down, 2 to go
[14:32:06] chtnnh, once we get this deployed, we'll want to consult with the ukwiki folks about the quality of the model. I'm guessing there are going to be issues because we have a small amount of data to train/test and the fitness doesn't look to be amazing.
[14:32:22] I wonder if we could get someone from ukwiki to join us in IRC
[14:33:03] yeah, thats part of our process with every wiki right? lets try to get someone from the ukwiki on irc as you suggest and we can collaborate on improving the model further
[14:54:11] +1 chtnnh :)
[14:54:14] brb
[14:55:29] sure!
[15:31:14] woops. I'm back. Forgot to say.
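[Editor's note] halfak mentions that mwxml's mapper has no progress output. A minimal stopgap, short of a real progress bar, is to wrap the result iterator and log a running count. This is a sketch only — `with_progress` and `report_every` are hypothetical names, not part of mwxml.

```python
import logging
import sys

logging.basicConfig(level=logging.INFO, stream=sys.stderr)
logger = logging.getLogger("dump-progress")

def with_progress(results, report_every=100_000):
    """Yield items from `results`, logging a running count every `report_every` items."""
    count = 0
    for item in results:
        count += 1
        if count % report_every == 0:
            logger.info("processed %d revisions so far", count)
        yield item

# Usage:
# for revision in with_progress(mwxml.map(process_dump, [dump_file])):
#     ...
```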
[15:41:49] 10Scoring-platform-team, 10Bad-Words-Detection-System, 10editquality-modeling, 10User-Ladsgroup, and 2 others: migrate bad words detection to editquality repo - https://phabricator.wikimedia.org/T131861 (10jeropbrenda) Hello, I'm getting familiar with ORES and its models so I decided to pick this tasks to...
[15:57:40] Hello halfak, I'm considering just handling the error from yesterday and continuing with other labels instead of having it break the whole thing.
[15:57:58] Wasn't able to see why that particular one didn't have a page key value
[16:03:01] halfak: will mwxml.map maximize #cpu's if I don't specify the number of threads?
[16:36:54] 10Jade, 10Scoring-platform-team (Current): The wrong label shows up while performing an undo - https://phabricator.wikimedia.org/T256812 (10kevinbazira) Thank you for the explanation @halfak. I have managed to reproduce this issue and fixed the bug. The wrong label was showing up while performing an undo beca...
[17:11:14] codezee, sorry for the delay. It does.
[17:11:24] It uses whatever multiprocessing.cpu_count() returns.
[17:11:43] haksoat, did you get an output for the one observation that was breaking?
[17:19:12] sorry was afk
[17:19:36] halfak, what can we do to get someone from ukwiki to move along the testing of the model?
[17:19:56] We'll need to get the model deployed first
[17:20:31] chtnnh, I've been talking to chrisalbon about doing that deploy with him so he gets the experience. We're going to try to get it on beta on Monday and that will open the door for a deployment on Tuesday or Wednesday.
[17:20:57] Once we have the model deployed, we'll need some way for people to use it. I have the ArticleQuality.js userscript. I think we'll want to set that up for ukwiki.
[17:21:19] Make a page that looks like this on ukwiki: https://en.wikipedia.org/wiki/User:EpochFail/ArticleQuality.js
[17:21:26] Replace EpochFail with your username
[17:21:38] And update the "weights" and "names" to match ukwiki.
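[Editor's note] Per halfak, when no thread count is passed, mwxml.map falls back to one worker per CPU via `multiprocessing.cpu_count()`. You can check what that default would be on your machine directly:

```python
import multiprocessing

# mwxml.map's default worker count is whatever cpu_count() reports,
# so omitting the threads argument already uses every core for mapping.
n_cpus = multiprocessing.cpu_count()
print(n_cpus)

# Equivalent calls (mwxml package required):
# mwxml.map(process_dump, paths)            # cpu_count() workers
# mwxml.map(process_dump, paths, n_cpus)    # explicit, same effect
```

Note that, as in the discussion above, a single huge dump file still serializes on one worker at the end — more threads can't split one file.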
[17:22:50] on it boss
[17:23:09] Let me know when you're done and I'll review what you have. :)
[17:23:19] --> Lunch
[17:24:57] okay, thanks!
[17:27:04] Yes I did halfak|Lunch
[17:27:10] https://gist.github.com/HAKSOAT/7867b6b342b8ec84fedb0f388ba77cc7
[17:38:32] https://en.wikipedia.org/wiki/User:Chtnnh/ArticleQuality.js halfak|Lunch
[18:35:04] haksoat, that json blob is super weird. It's not what we'd expect to see in the revdocs file. It's what we would expect to see in the labeled_items file!
[18:35:21] Is there any random print() in the transform_content utility?
[18:41:00] The issue came from the words2plaintext utility
[18:41:08] So I suppose it's normal
[18:42:41] does the link look good halfak ^
[18:47:57] haksoat, actually I think the problem is in the output of the transform_content utility, haksoat
[18:48:15] chtnnh, looks like you have the English Wikipedia "Stub", "Start", etc. still in there.
[18:48:38] Hmmm
[18:50:08] There's no random print in the utility though. The closest thing to a print is the logger.
[18:50:22] sorry slipped that, is it good now
[19:02:22] I wonder how we might have gotten that random line from a different file in there!
[19:02:35] haksoat, it might be worthwhile to regenerate the revdocs.
[19:03:13] halfak: i'm doing some cleanup of our cloud VPS instances. there's an old wikilabels instance that was supporting research-wikilabels.wmflabs.org (not currently up). just wanted to check to make sure that this wasn't something you needed. i assume https://labels.wmflabs.org/ is the main instance and running off of one of your team's servers
[19:04:04] Nope. i think that was something baha used for testing.
[19:04:21] halfak: sounds good, thanks! i'll check with him then
[19:04:37] haksoat, on the other hand, I do think handling problems with the input data and reporting issues in the form of "logger.warn" and moving on is good practice.
[19:04:53] So I'd merge that change too if you'd rather not regenerate.
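[Editor's note] The "logger.warn and move on" practice halfak endorses might look like the sketch below for a JSON-lines revdocs file: parse each line, warn and skip anything malformed or missing the `page_name` key (the field from the chat), and keep going instead of crashing. The function name is hypothetical.

```python
import json
import logging

logger = logging.getLogger("revdocs")

def iter_valid_docs(lines):
    """Yield parsed JSON docs that have a 'page_name' key; warn and skip the rest."""
    for lineno, line in enumerate(lines, start=1):
        try:
            doc = json.loads(line)
        except json.JSONDecodeError:
            logger.warning("line %d: not valid JSON, skipping", lineno)
            continue
        if "page_name" not in doc:
            logger.warning("line %d: missing 'page_name', skipping", lineno)
            continue
        yield doc

# Usage:
# with open("revdocs.json") as f:
#     for doc in iter_valid_docs(f):
#         ...
```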
[19:05:19] haksoat, one other thing though is I'd like to check if any of the lines after 3076 (or whatever) are good and include "page_name".
[19:05:35] I wonder if we somehow accidentally got many bad lines in that file and this is just the first one we noticed.
[19:06:32] I'm regenerating now
[19:06:54] Also added a logger so I can see how many issues come up after that
[19:07:51] The revdocs just finished generating.... Currently being converted into plaintext
[19:13:25] halfak, does https://en.wikipedia.org/wiki/User:Chtnnh/ArticleQuality.js look good/
[19:15:56] OK haksoat sounds good :)
[19:22:46] 10Scoring-platform-team, 10editquality-modeling, 10artificial-intelligence: Why is jawiki's goodfaith model so bad? - https://phabricator.wikimedia.org/T230953 (10Halfak) @jeena took a list at a bunch of example edits that scored as likely to be badfaith and confirmed that most of them look good. I think th...
[19:25:13] chtnnh, looks like you forgot to rebase that last PR.
[19:28:36] on it
[19:28:56] chtnnh, for ArticleQuality.js, I don'
[19:29:05] t think the "class" template exists in ukwiki.
[19:29:16] So let's just change "{{class|IV|image=yes}}" to "IV"
[19:29:18] 10[1] 04https://meta.wikimedia.org/wiki/Template:class
[19:29:23] And follow that pattern for the classes.
[19:29:35] Also, you will want to create this page in ukwiki :)
[19:30:01] hahaha looks like i messed up on this one
[19:30:03] The meta links that used to reference "User:EpochFail" should still reference that.
[19:30:07] No worries! It's confusing.
[19:30:27] also on the rebase, can i just squash and merge?
[19:30:46] I don't think that will work.
[19:31:11] thats what we used to do earlier if i remember correctly?
[19:38:22] can you check the merge please halfak
[19:46:22] halfak, https://uk.wikipedia.org/wiki/%D0%9A%D0%BE%D1%80%D0%B8%D1%81%D1%82%D1%83%D0%B2%D0%B0%D1%87:Chtnnh/ArticleQuality.js
[19:48:14] chtnnh, I see that the PR conflicts with master still.
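[Editor's note] halfak's diagnostic question — are any lines after ~3076 also missing `page_name`, or was that one line a fluke? — can be answered with a quick scan before deciding whether to regenerate. A sketch (the function name is hypothetical); this reports *which* lines are bad rather than skipping them:

```python
import json

def bad_line_numbers(lines, key="page_name"):
    """Return 1-based line numbers whose JSON doc lacks `key` (or fails to parse)."""
    bad = []
    for lineno, line in enumerate(lines, start=1):
        try:
            doc = json.loads(line)
        except json.JSONDecodeError:
            bad.append(lineno)
            continue
        if key not in doc:
            bad.append(lineno)
    return bad

# Usage:
# with open("revdocs.json") as f:
#     print(bad_line_numbers(f))  # [] means the one bad line really was a fluke
```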
[19:48:49] Some good news
[19:48:53] Weird one too
[19:49:05] chtnnh, the last line in https://uk.wikipedia.org/wiki/%D0%9A%D0%BE%D1%80%D0%B8%D1%81%D1%82%D1%83%D0%B2%D0%B0%D1%87:Chtnnh/ArticleQuality.js should still reference User:EpochFail/ArticleQuality.css
[19:49:12] The word vectors have been generated for ukwiki
[19:49:12] oh, haksoat?
[19:49:16] And there was no error
[19:49:17] \o/
[19:49:21] :)
[19:49:25] Ha. Must have been a weird fluke.
[19:49:30] Seeing that one line in there was WEIRD.
[19:50:16] Hehe
[19:52:35] fixed the articlequality.js thing
[19:53:25] OK chtnnh. This is looking pretty good.
[19:57:51] can you temme whats wrong with the PR?
[19:57:57] halfak ^
[20:01:05] Needs a rebase. Should have exactly one commit in it.
[20:27:14] can you try your rebase and merge, ive been trying to rebase but its not working
[20:27:21] its fine if you get credit for the commit
[21:11:19] chtnnh, roger that, I'll take a look.
[21:21:02] ha. this is messy.
[21:21:07] * halfak is going to cheat :)
[21:24:46] chtnnh, did I miss anything? https://github.com/wikimedia/articlequality/pull/140
[21:39:33] looks good to me
[21:39:42] xD great cheat senpai
[21:40:08] halfak ^
[21:41:50] Merged!
[21:42:02] I'm heading out pretty soon. Have a good night!
[21:42:07] And a good weekend.
[21:52:50] you too senpai!
[21:53:00] glad we got some work done today
[21:53:11] i will see you monday :')