[07:03:16] wiki-ai/wb-vandalism#123 (client_move - 9ec6a24 : amir): The build passed. https://travis-ci.org/wiki-ai/wb-vandalism/builds/97812025
[15:18:46] o/
[15:40:13] o/ Amir1
[15:40:27] Sorry for the trouble with your university yesterday.
[15:40:36] Who do I contact to try to make it better?
[15:47:31] hey halfak
[15:47:35] I was afk
[15:47:36] sorry
[15:47:42] No worries :)
[15:48:31] I talked a little bit about it today, it seems it's a way more complicated situation
[15:48:40] if I want to stay overtime
[15:48:50] Gotcha. Let's move the meeting.
[15:48:55] At least for you and me.
[15:49:02] we may stuck in bureaucratic process
[15:49:07] that's the hardest part
[15:49:15] Anything that I can do to calm them down for you?
[15:49:38] there is no need to do anything yet
[15:49:44] OK
[15:50:04] I was told if it happens again, they would report it to the university
[15:50:21] so I probably can't be at the meetings
[15:50:39] but in the other hand I don't want people to change the meeting time
[15:50:48] *on
[15:51:05] because I don't want them to get into trouble because of me
[15:51:14] I'll try to find a way
[15:51:23] If I couldn't, I'll let you know
[15:51:52] OK. Let's not risk your status at the university. I think that moving the meeting time might end up being more reasonable overall.
[15:52:03] Maybe we can move it to today.
[15:52:17] That would be good for our ability to hack on things that come up during the meeting.
[15:53:20] ok
[15:53:37] Thursday and Friday are weekend here, that's causing the problem
[15:53:54] Oh!
[15:54:16] So it's a work day today?
[15:54:23] and university is open until 10 PM (exactly) and it's when our meeting finishes
[15:54:25] yes
[15:54:29] I had classes today
[15:54:39] I've three more classes tomorrow
[15:55:00] So, I think that Aurthur will have the hardest time with this.
[15:55:13] Since 9AM is the earliest anyone gets up in PT
[15:55:27] And our meeting starts at 9:30AM PT
[15:55:44] Does the university close at 10PM night?
[15:55:50] yes
[15:55:58] our meeting finishes at 9:50
[15:56:09] and we usually go much overtime
[15:56:19] Yeah. We need to get better at that.
[15:56:19] and that makes them angry :D
[15:56:35] just half on hour sooner make my life much easier
[15:56:35] I'm hoping out mailing list will help to minimize the long, synchronous discussions we need to have.
[15:57:03] So, if we meet on Saturdays, we can probably have ToAruShiroiNeko join sooner.
[15:57:23] So we could have a half hour buffer after.
[15:57:40] Asking aetilley to get on a call at 9AM PT wouldn't be too crazy, I think
[15:57:50] How about we propose that when everyone shows up.
[15:58:08] +1
[15:58:52] If I could join the meeting at home, that would be great but it seems google hangout doesn't work in "ordinary" internets
[15:58:55] :(
[16:24:53] Amir1, will work something out. Just let us know what you need. We can skype if necessary.
[16:25:10] okay
[16:25:12] sure :)
[16:25:14] So, I want to give you a heads up that there's a monster code review coming for revscoring
[16:25:29] I'm just finishing up the tests for a major refactoring of the way that our features are organized.
[16:25:49] It will help us expand feature types in the future. And it should improve our test coverage too.
[16:25:55] oh boy
[16:26:07] Let me know when you're done
[16:26:08] Here's the in-progress https://github.com/wiki-ai/revscoring/pull/231
[16:26:19] btw there is a PR for wb-vandalism
[16:26:22] Just to get a sense for the 3k lines of code that changed.
[16:26:24] OOooh
[16:26:25] * halfak looks
[16:26:47] but we need to re-train, re-tune, and re-depoly everything
[16:27:23] It'll improve our work slightly. I added this feature based on mistakes report in the Wikidata
[16:28:01] Amir1, did you run the feature against a few comments to make sure it works as expected?
[16:28:21] https://www.wikidata.org/wiki/Wikidata:ORES/Report_mistakes#First and second improvements
[16:28:31] let me do it
[16:28:33] good point
[16:28:53] I forgot it, since it was a rather straight forward patch
[16:29:12] Amir1, since you're going to run a test, maybe you could throw a it in a "tests/" dir in "feature_lists/"
[16:29:26] Somehow I thought I had already made on of those
[16:29:31] sure
[16:29:39] Oh! no I have it in wikiclass
[16:29:39] https://github.com/wiki-ai/wikiclass/tree/master/wikiclass/feature_lists
[16:29:48] See https://github.com/wiki-ai/wikiclass/blob/master/wikiclass/feature_lists/tests/test_enwiki.py
[16:30:20] hmm
[16:30:28] we do have wikidata-related tests
[16:30:37] but they are not in the right place I think
[16:30:46] I remember writing them
[16:34:16] OMG PICKLE DOESN"T WORK ON STATICMETHODS!!!!
[16:34:17] WHY
[16:34:39] instance methods work fine
[16:34:47] * halfak hates pickle
[16:35:21] Eventually they are going to fix this crap and then we're going to be stuck with a bunch of goofy work-arounds that don't make any sense.
[16:39:09] So... I need to score all of the anonymous edits in the last week, but we just killed our caching server. Boo.
[16:39:18] We need an ORES client
[16:39:39] >>> print(list(extractor.extract(274775730, [is_client_move])))
[16:39:40] [True]
[16:39:43] so it works fine
[16:39:53] but let me add some tests
[16:40:00] +1
[16:40:08] please do not use extractor in test files.
[16:40:11] use solve()
[16:40:27] Makes tests more explicit and robust to network issues.
[16:43:13] sure
[16:43:35] It seems we don't have any tests related to wikidata.py at all
[16:43:40] anywhere
[16:43:42] :|
[16:43:57] Yeah. s'ok. They should be quick to write, right?
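The advice above (test with solve() and a cache rather than with the extractor) can be illustrated with a small stand-in. This is only a sketch: the `Datasource`, `Feature`, `solve`, and `comment_matches` below are simplified, hypothetical re-implementations written for this example, not revscoring's real classes or signatures, and the comment pattern and sample comments are assumptions. The point is that the test injects the revision comment directly, so no API request is made.

```python
import re


class Datasource:
    """Hypothetical stand-in: a named value that normally comes from the API."""

    def __init__(self, name):
        self.name = name


class Feature:
    """Hypothetical stand-in: a value computed from its dependencies."""

    def __init__(self, name, process, depends_on):
        self.name = name
        self.process = process
        self.depends_on = depends_on


def solve(dependent, cache=None):
    """Return the cached value, or compute it from (recursively solved) dependencies."""
    cache = {} if cache is None else cache
    if dependent in cache:
        return cache[dependent]
    if isinstance(dependent, Datasource):
        # Nothing injected and no network in this sketch: fail loudly.
        raise RuntimeError("no cached value for datasource %r" % dependent.name)
    args = [solve(dep, cache) for dep in dependent.depends_on]
    value = dependent.process(*args)
    cache[dependent] = value
    return value


comment = Datasource("revision.comment")


def comment_matches(name, pattern):
    """Build a boolean feature that is True when the comment matches `pattern`."""
    regex = re.compile(pattern)
    return Feature(name, lambda text: bool(regex.search(text)), [comment])


# The pattern is an assumption made for this sketch.
is_client_move = comment_matches("is_client_move", r"clientsitelink-update")


def test_is_client_move():
    # Sample comments are made up; they are injected through the cache.
    moved = "/* clientsitelink-update:0|enwiki|enwiki:Foo|enwiki:Bar */"
    labeled = "/* wbsetlabel-add:1|en */ Foo"
    assert solve(is_client_move, cache={comment: moved}) is True
    assert solve(is_client_move, cache={comment: labeled}) is False
```

Because every input is supplied in the cache, a test like this keeps passing when the network or the wiki is unavailable, which is the robustness argument made in the conversation.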
[16:45:20] some of them are
[16:45:24] but some of them don't
[16:45:39] *aren't
[16:45:46] my english is going backwards
[16:45:49] I think we should only be testing the Meta-features
[16:46:01] The functions that produce a feature
[16:46:35] okay
[16:46:41] let me examine
[16:46:53] I don't think we need to test the property_changed() ones.
[16:47:13] Or has_property_value
[16:47:30] * halfak thinks is_blp is beautiful;
[16:47:41] I think just the comment_matches
[16:47:43] the problem is all of them are like this
[16:47:47] * Amir1 agrees
[16:47:56] okay
[16:48:26] is_blp = has_birthday.and_(not_(dead)) "has birthday and not dead"
[16:48:35] but my biggest obstacle is doing it without using API, since keep all of those edits would be hard
[16:49:00] we can either keep several json files
[16:49:11] or we can make dummy edits for ourselves
[16:49:17] eq_(solve(wikidata.is_item_creation, cache={comment: "...."}), True)
[16:49:23] without using api
[16:49:26] yeah
[16:49:27] exactly
[16:49:31] No dummy edit. Just put "comment" into the cache :)
[16:50:19] Did you see the JSON files I put in here for testing? https://github.com/wiki-ai/wb-vandalism/tree/master/wb_vandalism/features/tests
[16:50:28] That worked pretty well for the diff features.
[16:51:18] yeah
[16:51:25] you are one of the json files :D
[16:51:46] but we can't test everything with them
[16:52:18] e.g. testing item_creation
[16:55:43] Oh! yes we can.
[16:55:54] Because the parent revision will be missing. :)
[16:56:04] And the current revision will have parent_id == 0
[16:56:09] Oh
[16:56:10] Yeah
[16:56:12] Woops
[16:56:21] We'd need to edit the files
[16:56:31] To make it look like the parent_id is zero
[16:56:46] Maybe we can load the doc and modify it in the test
[16:56:59] Or just add a new JSON file :)
[17:03:32] it's just one of them, we need this for is_client_delete, is_client_move etc.
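The "load the doc and modify it in the test" idea can be sketched as follows. Everything here is an assumption for illustration: the fixture is an inline made-up document rather than one of the real JSON files in the repository, the field names are guesses, and `is_item_creation` is a hypothetical plain function, not the wb-vandalism feature of the same name.

```python
import copy
import json

# Made-up fixture standing in for a stored JSON revision document.
# The field names are assumptions for this sketch.
FIXTURE = json.loads(
    '{"rev_id": 101, "parent_id": 100, "comment": "/* wbsetlabel-add:1|en */ Foo"}'
)


def as_first_revision(rev_doc):
    """Return a modified copy that looks like the first revision of a page."""
    doc = copy.deepcopy(rev_doc)  # leave the shared fixture untouched
    doc["parent_id"] = 0
    return doc


def is_item_creation(rev_doc):
    """Hypothetical check: a revision without a parent created the item."""
    return rev_doc["parent_id"] == 0


def test_is_item_creation():
    assert is_item_creation(FIXTURE) is False
    assert is_item_creation(as_first_revision(FIXTURE)) is True
    # The original fixture must still be usable by the other tests.
    assert FIXTURE["parent_id"] == 100
```

Copying before editing means one stored document can serve several tests; adding a separate JSON file, the other option raised above, trades that reuse for a fixture that needs no in-test modification.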
[17:05:13] and it will work based on comment
[17:05:33] o/ aetilley
[17:05:53] hi halfak
[17:06:21] I'll be back in ten min.
[17:06:31] oh, o/ aetilley
[17:07:49] ok
[17:07:52] hi Amir1
[17:29:13] Amir1: http://pastebin.com/VzWkkgMF
[17:32:54] Perhaps I should convert the booleans to ints?
[17:32:57] Amir1: ^
[17:37:09] ellery: greetings
[17:42:12] I'm back :)
[17:43:46] Amir1: hiya. I sent you a paste.
[17:44:33] sure
[17:44:36] I'll check
[17:45:36] that shouldn't happen
[17:45:52] can you send me what you're giving to the ANN?
[17:46:41] It's a tuple (A, B) where B is a boolean and A is a list of floats/booleans
[17:47:05] Sorry, each datapoint is a tuple like that.
[17:48:07] you should convert any bool to int
[17:48:11] ok
[17:48:13] 1 = True, 0 = False
[17:48:17] (unlike SVM)
[17:48:22] I should support that too
[17:48:31] (added to my todo list)
[17:48:35] ok, let be fix my read-file
[17:50:47] sklearn handles bools
[17:50:50] BTW
[17:54:11] I should add that
[18:08:05] Amir1: I converted bools to ints and got the same error
[18:08:19] I'm going to try converting them to floats.
[18:08:23] Can you send me some parts of it
[18:08:33] it won't be a problem
[18:08:42] I give int to kian all the time
[18:20:47] http://pastebin.com/t2r7Uwib
[18:21:00] Let me also paste you my readfile.
[18:23:00] readfile here: http://pastebin.com/PUX8aic6
[18:23:02] Amir1:
[18:23:11] thanks
[18:23:14] I check it asap
[18:23:22] k
[18:23:26] halfak: how I can run nose tests?
[18:23:36] nosetests
[18:23:48] which tests?
[18:23:58] I do this for pywikibot stuff, I'm not sure if it works in wb-vandalism too
[18:24:06] in wb-vandalism
[18:24:12] oh, i can check travis.yaml
[18:24:30] Yeah.
'nosetests' works great
[18:24:38] I think travis has '-v' for verbose
[18:29:19] yeah, I checked
[18:32:42] halfak: https://github.com/wiki-ai/wb-vandalism/pull/25/files
[18:33:37] {{merged}}
[18:33:37] [1] https://meta.wikimedia.org/wiki/Template:merged
[18:33:41] Awesome work
[18:33:44] lol AsimovBot
[18:33:48] Thank you
[18:34:04] here comes the hard part,
[18:34:14] Should be straightforward.
[18:34:17] we need to re-extract, re-train, re-deploy
[18:34:19] :D
[18:34:24] Want me to kick it off?
[18:34:31] Are you working on ores-compute.labs yet?
[18:36:17] I've never worked with it
[18:36:24] Can you help me do it?
[18:36:36] Sure!
[18:37:11] Set up for accessing Labs: https://wikitech.wikimedia.org/wiki/SSH_access
[18:38:09] Then "ssh ores-compute.wmflabs"
[18:39:08] I'm not in ores project
[18:39:10] afaik
[18:39:22] I connected to revscoring instances before
[18:39:28] so I know the protocol
[18:39:34] This is a revscoring instance
[18:39:40] I don't know how to re-extract and stuff
[18:39:47] The makefile does
[18:39:50] I'll show you :)
[18:40:14] You just need to delete the old feature file and then say "make me a new model" and make will handle the rest :D
[18:41:33] connected :)
[18:43:10] what's next halfak ?
[18:43:15] OK. Set up wb-vandalism in your home dir
[18:43:29] I'll help you transfer the files you need pre-feature extraction next
[18:43:59] using git clone?
[18:45:11] Yup
[18:48:53] halfak: done
[18:49:15] should I install it too?
[18:49:29] You don't need to, but you will need the requirements.
[18:49:41] SO install it or pip install -r requirements.txt
[18:50:04] * halfak extracts scores for all enwiki anons in the last week
[18:55:18] Then copy these two files to your own datasets/ dir
[18:55:51] /home/halfak/projects/wb-vandalism/datasets/wikidata.sampled_revisions.20k_balanced_2015.tsv
[18:56:23] okay
[18:56:24] /home/halfak/projects/wb-vandalism/datasets/wikidata.rev_reverted.20k_balanced_2015.tsv
[18:56:25] It's installing
[18:56:49] Amir1, quick Q, did you already have a virtualenv set up on ores-compute?
[18:57:12] no
[18:57:19] Should I make one?
[18:57:22] Yes
[18:57:39] One sec. I have a recommendation.
[18:58:03] https://gist.github.com/halfak/9f4830895496af9e9731
[18:58:21] Note the "--system-site-packages" in virtualenv creation
[18:58:38] That will make it so you don't need to compile scipy and numpy
[18:58:45] However, you'll still need to compile sklearn
[19:01:42] Amir1, compiling?
[19:02:27] it seems so
[19:02:35] even though I did what you said
[19:02:46] It's just downloading now
[19:03:27] yes, it's compiling again
[19:04:06] kk
[19:04:24] Have you used Screen before?
[19:04:33] The next steps will take a while.
[19:05:26] not
[19:05:29] No
[19:06:03] If you like pain: https://www.gnu.org/software/screen/manual/screen.html
[19:06:18] If you don't like pain: https://www.rackaid.com/blog/linux-screen-tutorial-and-how-to/
[19:06:37] TL;DR: Screen lets you set up a terminal *on the server itself*
[19:06:46] That means you can leave it running when you log out.
[19:06:54] And then log back in to "re-attach" to it.
[19:07:01] It works just like a terminal window.
[19:07:09] You can start up many and switch between them.
[19:07:18] I use them for any long-running process on a server.
[19:07:36] (except daemons of course)
[19:08:20] E.g. I just started up a script to query ORES for 180k scores. That's going to take about 10 hours so I stuck it in a screen and I'll check on it tomorrow.
[19:09:30] ohhhhh
[19:09:41] I remember now
[19:09:50] I used to use this in toolserver
[19:10:00] but we're not allowed to use it in tools now
[19:10:24] I'm expert in screen
[19:10:30] I just forgot :D
[19:10:50] yay!
[19:12:47] So, once it installs and you copy those files, run 'make -n models/wikidata.reverted.rf.model' and pastebin that for me
[19:13:05] That'll show you what Make is going to do without actually doing it.
[19:13:05] should I screen that?
[19:13:11] It will help us know that we did things right,
[19:13:13] Not yet
[19:13:17] kk
[19:13:20] Screen will come once we're ready to kick it off.
[19:15:03] It's still compiling
[19:15:12] in the mean time I wanted to show you something
[19:15:13] tools.wmflabs.org/dexbot/tools/good_faith.php?limit=2
[19:15:17] https://tools.wmflabs.org/dexbot/tools/good_faith.php?limit=2
[19:15:50] these are damaging edits (> %90) that are good faith (> 50%) too
[19:16:31] Amir1, cool! What do you think of the prediction. Does it make sense?
[19:16:33] I'm finishing this off, and then people can check this list once a while and go fix edits+ make a proper note in newbies' talk pages
[19:16:41] yeah
[19:16:44] Cool. :)
[19:16:46] I checked that
[19:16:52] That'
[19:17:00] That is awesome.
[19:17:04] Like super awesome.
[19:17:09] When can I get that for enwiki?
[19:17:23] I want to show it to people in the Teahouse and ask them about it.
[19:17:29] o/ Helder
[19:17:39] oi! :)
[19:17:50] we have two ways to do that: 1- let's wait until ORES extension is there (It would make a very long time, I think)
[19:18:02] or I partially run my cluebot NG for en.wp
[19:18:12] with disabling every unnened part
[19:18:13] Hmm... Let me think more about that.
[19:18:25] Helder: o/
[19:18:27] This would be something good to put in front of violetto
[19:18:50] She'd likely have some ideas for what we could put in front of users and what might belong in the ORES extension.
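The selection behind the good_faith list described above (damaging above 90%, good faith above 50%) can be sketched as a simple filter. The revision IDs, the scores, and the dictionary layout are made up for illustration and are not real ORES output; only the two thresholds come from the conversation.

```python
# Made-up probabilities keyed by made-up revision IDs; the layout is an
# assumption for this sketch, not the real ORES response format.
SCORES = {
    1001: {"damaging": 0.95, "goodfaith": 0.72},  # damaging but well-meant
    1002: {"damaging": 0.97, "goodfaith": 0.10},  # likely vandalism
    1003: {"damaging": 0.20, "goodfaith": 0.98},  # fine edit
    1004: {"damaging": 0.91, "goodfaith": 0.50},  # good faith not above 50%
}


def damaging_but_good_faith(scores, damaging_min=0.9, goodfaith_min=0.5):
    """Return revision IDs predicted damaging yet made in good faith."""
    return sorted(
        rev_id
        for rev_id, probs in scores.items()
        if probs["damaging"] > damaging_min and probs["goodfaith"] > goodfaith_min
    )


print(damaging_but_good_faith(SCORES))
```

Both comparisons are strict, matching the ">" in the description, so revision 1004 with a good-faith probability of exactly 0.50 is left out.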
[19:19:38] once we store these data in tables, getting queries would be unimaginably easy
[19:19:52] Yeah. Looking forward to having some daily dumps for quarry
[19:20:16] we can even have a specially page re. that
[19:20:25] *special
[19:20:56] I'll already wrote one special page for mediawiki (Special:UnconnectedPages is mine)
[19:21:09] actually I rewrote that but anyway
[19:21:31] I can write this again for ORES extension too
[19:21:31] Amir1, I've got to get on the road soon and I need to prep. I'm going to run away for 30 minutes, but I'll be back to make sure the new feature extraction went well. Is that timing going to work for you?
[19:21:45] yeah
[19:21:46] :)
[19:22:02] * halfak runs away
[19:27:02] halfak: http://pastebin.com/pAgNBUDw
[19:27:09] for when you're back
[19:35:47] * YuviPanda waves very vaguel
[19:35:48] y
[19:44:18] Amir1, can we have that tool for other languages easily?
[19:44:53] It's not easy yet
[19:44:56] but It should be
[19:45:02] depends on several factors
[19:45:23] the most important one is ORES extension, it should be deployed ASAP
[19:45:50] * Helder reads previous conversation he missed
[19:46:13] afk for a while, be back soon
[19:47:14] ok
[19:56:15] o/ Amir1
[19:56:30] * YuviPanda is slightly more healthy today!
[19:56:49] \o/
[19:57:01] ^ re. YuviHealthiness
[19:57:06] :D
[19:57:24] halfak: what's your schedule looking like now? back on MOnday?
[19:57:26] or?
[19:57:43] Amir1. Looks good. Run that in a screen without the "-n"
[19:58:05] YuviPanda, I've got to get on the road in 5 minutes and do some holiday things tonight.
[19:58:11] I'll be online for an hour tomorrow and back on Monday
[19:58:28] The tomorrow hour is 7-8AM PT
[19:58:29] halfak: ok!
[19:58:35] So that I can sync up with ToAruShiroiNeko_
[19:58:37] I'll poke around and fix staging later today
[19:58:45] and send out emails
[19:58:53] kk :)
[19:59:12] * halfak logs in with phone so he can advise Amir1 from the road.
[20:01:01] * YuviPanda adds matplotlib, pandas and friends to PAWS
[20:01:05] o/
[20:01:18] Off I go
[20:01:22] have a good one folks!
[20:12:50] Amir1, ping me here if you need a hand