[14:53:06] hey halfak I want to work on wb vandalism
[14:53:24] I want to write features
[14:53:49] but first of all, do you think I should copy features from revscoring too? like user related features
[15:17:09] Amir1, don't copy any features.
[15:17:22] You can just import the features from revscoring.
[15:17:30] brb need to restart
[15:20:17] halfak: okay :)
[15:20:17] I already finished lots of them
[15:21:33] Great!
[15:21:47] Do you know what I mean when I say "import from revscoring"?
[15:22:02] of course
[15:22:11] :)
[15:22:17] kk
[15:22:17] :)
[15:22:42] halfak: one thing, I want to add features like is_human or is_blp. What do you think?
[15:22:57] (for revision)
[15:23:53] I wonder if we can have a meta feature. Can you remind me how that works? There's "is instance of" and "human" that we need to know, right?
[15:24:37] if 'Q5' in item.claims.get('P31', [])
[15:24:56] P31 = instance of
[15:24:56] Q5 = human
[15:25:07] something like: has_property_value("P31", "Q5")
[15:25:25] And that would evaluate to a boolean feature
[15:25:35] like modifiers.log(10)
[15:25:46] What do you think?
[15:26:10] sounds really good
[15:26:34] where should I put it?
[15:27:42] Right next to the other features.
[15:27:48] I'll get you a thing.
[15:27:51] One sec.
[15:28:17] https://github.com/wiki-ai/revscoring/blob/master/revscoring/features/feature.py#L99
[15:28:22] okay, sure
[15:28:31] This is a good example of a feature that implements the "Modifier" class.
[15:30:04] What does the Q in a Q stand for?
[15:30:42] it's first letter of Denny's wife :)
[15:30:50] wut
[15:31:07] P stands for property though
[15:31:17] :)
[15:31:29] I can get his notice
[15:32:43] http://comments.gmane.org/gmane.org.wikimedia.wikidata/2447
[15:33:22] "I also would like to use this chance to reveal a secret. Wikidata items are identified by a Q, followed by a number, Wikidata properties by a P, followed by a number.
Whereas it is obvious that the P stands for property, some of you have asked - why Q? My answer was, that Q not only looks cool, but also makes for great identifiers, and hopefully a certain
[15:33:22] set of people will some day associate a number like Q9036 with something they can look up in Wikidata. But the true reason is that Q is the first letter of the name of the woman I love. We married last year, among all that Wikidata craziness, and I am thankful to her for the patience she had while I was discussing whether to show wiki identifiers or language
[15:33:22] keys, what bugs to prioritize when, and which calendar systems were used in Sweden."
[15:34:58] sweet
[15:35:09] Huh. I guess the decision is somewhat arbitrary.
[15:35:17] Seems to work fine either way :)
[15:35:19] https://gist.github.com/halfak/d17268a4ac150c7ccb58
[15:36:13] ^ A quick attempt at the meta-feature
[15:37:37] awesome
[15:37:42] I'll just copy-paste it, modify it a little, and then use it :)
[15:41:11] https://www.irccloud.com/pastebin/EXB0xQdF/
[15:41:26] Does it look good?
[15:41:41] halfak: ^
[15:42:31] self.value = pywikibase.ItemPage(value) ?
[15:42:58] yup
[15:43:16] oh it needs some more modifications
[15:43:45] Why do we need that line Amir1 ?
[15:44:10] because item.claims stores claims
[15:44:15] and claim objects
[15:44:57] let me fix it
[15:45:41] No worries. I'm just trying to work out how "value" is an ItemPage.
[15:46:07] Maybe we're confirming that the value is a "Q"
[15:48:30] https://www.irccloud.com/pastebin/9YH91jBn/
[15:48:38] ah hello
[15:48:53] we can send an item page object instead of 'Q5'
[15:48:58] ToAruShiroiNeko: hey :)
[15:49:54] halfak: this is an option ^
[15:50:30] Amir1, I see. So we get an "ItemPage" out of the claims? What if a value is not an ItemPage?
[15:51:01] e.g. a number
[15:51:05] Or a date
[15:51:16] I see
[15:52:58] https://www.irccloud.com/pastebin/VKOAcmSv/
[15:53:07] I think that would be the best option
[15:56:09] I see..
So we can use the target -- that could be a string or number?
[15:56:23] How would it work for a date?
[15:58:58] it works
[15:59:13] members of item.claims['P31'] are claim objects
[15:59:16] no matter what
[15:59:41] but that claim object can have different types of targets
[16:02:45] Makes sense.
[16:05:14] halfak: I pushed my changes to wb-vandalism
[16:09:10] Amir1, any new thoughts re. wikibase diffs?
[16:10:56] the API part or our part?
[16:10:57] halfak: ^
[16:11:12] Both, with a focus on our practical concerns.
[16:11:40] It's very easy to write some differs now since we have pywikibase ready
[16:11:48] not very easy
[16:12:05] but easier than writing an API module in Wikibase
[16:24:38] * YuviPanda waves weakly
[16:28:05] +1 for that Amir1. I think we might do set diffing on claims.
[16:28:13] o/ YuviPanda
[16:31:24] I'm in Rome
[16:31:47] Are you doing as the Romans do?
[16:32:07] yes, watching Full Metal Jacket and learning perl6
[16:32:11] quite Roman, I bet
[16:33:28] perl6 ?
[16:33:53] yes
[16:33:57] it's a very interesting language
[16:34:11] and seems to have some pretty great features for async / concurrent / parallel stuff
[16:34:21] reading the docs reminds me a lot of haskell and/or scala
[16:36:17] which is quite surprising
[16:36:27] since all the perl code I've had to deal with so far has given me nothing but pain
[16:39:39] +1 Perl had/has an awesome community that a lot of other languages lack.
[16:39:49] Resulted in support for really cool things.
[16:39:59] The community is a big reason I stick around python
[16:40:14] yeah
[16:40:23] I hear the Bio community uses a lot of perl
[16:49:27] Amir1 & ToAruShiroiNeko ^
[16:49:37] If you can find time, I'd appreciate turning around on this one quickly.
[16:49:47] It's a big nasty change, but it should also be awesome.
[16:49:55] sure
[16:49:57] I can look
[16:50:11] Thanks!
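The has_property_value idea (15:25:07) and the "set diffing on claims" idea (16:28:05) can be sketched together in plain Python. This is only an illustration under stated assumptions: `Claim` here is a hypothetical stand-in, not the pywikibase claim class, and `claim_pairs` and `diff_claims` are names invented for this sketch. It is not the code from the gist or the pastebins, and it does not wrap anything in a revscoring feature. The one thing taken from the discussion is that every member of `claims['P31']` is a claim object whose target may be an item ID, a number, or a date.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Set, Tuple


@dataclass(frozen=True)
class Claim:
    """Hypothetical stand-in for a claim object (NOT the pywikibase class)."""
    property_id: str
    target: Any  # item ID such as "Q5", a number, a date string, ...


# Assumed layout: property ID -> list of claim objects for that property.
Claims = Dict[str, List[Claim]]


def has_property_value(claims: Claims, property_id: str, value: Any) -> bool:
    """Return True if any claim for property_id has a target equal to value."""
    return any(c.target == value for c in claims.get(property_id, []))


def claim_pairs(claims: Claims) -> Set[Tuple[str, Any]]:
    """Flatten claims into (property, target) pairs; targets must be hashable."""
    return {(pid, c.target) for pid, cs in claims.items() for c in cs}


def diff_claims(old: Claims, new: Claims):
    """Return (added, removed) sets of (property, target) pairs."""
    before, after = claim_pairs(old), claim_pairs(new)
    return after - before, before - after


# Two made-up revisions of one item: the P569 (date of birth) target changes.
old = {"P31": [Claim("P31", "Q5")], "P569": [Claim("P569", "1952-03-11")]}
new = {"P31": [Claim("P31", "Q5")], "P569": [Claim("P569", "1952-03-12")]}

print(has_property_value(new, "P31", "Q5"))  # True: instance of human
added, removed = diff_claims(old, new)
print(sorted(added), sorted(removed))
```

Comparing on the target rather than on an ItemPage keeps the check usable for numbers and dates, which was the concern raised at 15:50:30. A changed value shows up in the diff as one removed pair plus one added pair.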
[16:50:44] https://travis-ci.org/wiki-ai/revscoring/builds/75703598 :
[16:51:25] * halfak shakes fist at scipy
[16:57:34] so I am looking at the actual change
[16:58:30] will you fix that error? or is it just travis being strange?
[16:58:56] ToAruShiroiNeko, it's travis
[16:59:02] We still don't have working travis
[16:59:18] that's a travisTY
[17:00:06] wow
[17:00:07] .say for keys( @a ∩ @b );
[17:00:09] is valid perl6
[17:00:12] it has unicode operators
[17:00:34] I see lots of stuff moved around but there is no real functionality change right?
[17:00:47] aside from what we discussed two weeks ago
[17:00:48] YuviPanda, amazingly awful
[17:01:03] ToAruShiroiNeko, nope. You heard it all two weeks ago.
[17:01:23] The main functionality change is in that RegexExtractor
[17:01:36] okay I am going to trust you on this one since most of the change is bad words being shuffled around.
[17:03:07] O.O
[17:03:12] Far less fanfare than expected :)
[17:03:31] No, we'll need to be careful because this ruins backwards compatibility.
[17:03:37] so why did I end up closing two pull requests?
[17:04:04] woooo!
[17:04:06] * YuviPanda fanfares
[17:04:09] halfak our only user is us more or less
[17:04:10] Ha. Looks like I built off of one of the pull requests that had been submitted earlier.
[17:04:19] so backwards compatibility isn't THAT vital just yet
[17:04:20] So before we cut the next version (which will be a 'minor' update) we need to address issues with ORES
[17:04:28] This will affect wb-vandalism too Amir1
[17:04:44] ToAruShiroiNeko & YuviPanda +1
[17:05:07] I don't want to make promises to others about stability until we go 1.0.0
[17:06:54] oh
[17:07:09] I am declaring the de.wikipedia attempt a failure for now
[17:07:16] Fine with me
[17:07:21] estonians have shown great interest so that was quite a turnabout
[17:07:37] I have been hearing about new ORES use-cases left and right.
[17:07:40] DE will come around
[17:09:47] when they do they will be at the end of the queue
[17:10:02] I am tempted to create a bureaucratic process to add languages just for them
[17:10:12] where they will have to ask us to create a vote >:D
[17:25:26] I was afk
[17:25:31] scrolling up
[17:28:07] halfak: Are you saying the language refactoring will affect wb-vandalism?
[17:46:43] Amir1, yes
[17:46:50] But only if you install it.
[17:47:07] Any features we use that are language specific will be affected.
[17:48:04] Amir1, I've also been working on a type library. This one is going to be the new base of the 'xml_dump' part of MediaWiki Utilities. See https://github.com/halfak/python-mwtypes
[17:48:50] let me check :)
[17:48:54] It seems this is different than pywikibase, but it's solving a somewhat similar problem.
[17:49:12] e.g. field names are different between API/Dump/DB.
[17:49:51] halfak: you can take pywikibot and use them
[17:49:59] they have a very rich library on that
[17:50:17] Yeah, I have mine too.
[17:50:25] Been using them in the XML dump processing for a while.
[17:51:22] I'm not saying you should import them, copy paste useful part
[17:51:24] *parts
[17:52:41] Yeah... I guess it will be interesting to see what else is covered.
[17:53:55] halfak I am tempted to create a revert model for every wiki we are working with or will work with
[17:54:16] ToAruShiroiNeko, I don't see a problem with that.
[17:54:44] computation power will be sucked