[00:00:07] Related: How many files contain feature defs? I noticed some but not all are in feature.diff
[00:00:32] All of the files in features/
[00:00:36] except feature.py
[00:00:38] ok
[00:00:40] and modifiers.py
[00:00:46] and __init__.py
[00:00:50] :D
[00:00:56] I've got to run. I'll ping ellery tomorrow to see if he has time to sit down with us.
[00:01:01] See you at the meetings.
[00:01:11] ok
[00:01:15] ttyl
[00:01:23] Oh yeah, BTW: https://meta.wikimedia.org/wiki/Research_talk:Revision_scoring_as_a_service#Media_coverage_for_Revscoring.2FORES
[00:01:23] o/
[00:01:48] I’m down to brainstorm how to fix the problem described in the phab ticket!
[00:02:02] :D!
[00:02:07] ellery: hello
[00:02:34] We have a hack session Saturday morning. You're welcome to join.
[00:03:05] * halfak runs away for real now.
[00:03:48] I’m hosting a brunch at my house on Saturday. Thank you for the invitation though; I would like to attend a session in the future
[00:06:15] Cool
[00:30:47] Where can I find the datasets used to generate the models that ORES is using?
[00:32:02] hi seancron!
[00:32:06] someone asked on https://github.com/wiki-ai/ores/issues/106 too
[00:32:18] I don't know the answer - halfak probably does but he's away for the day now
[00:33:13] Thanks yuvipanda
[00:33:19] Closest I've been able to find is https://github.com/wiki-ai/ores-wikimedia-config/blob/master/Makefile
[00:33:40] And http://datasets.wikimedia.org/public-datasets/enwiki/reverts/
[00:33:50] yeah I think the makefile is correct
[00:33:55] not sure exactly where teh data ist hough
[00:33:57] *though
[16:52:26] * aetilley stumbles out of bed. Inserts caffeine IV....
[16:53:27] https://www.youtube.com/watch?v=jX3iLfcMDCw
[16:59:12] o/ aetilley
[17:12:39] aetilley, BTW, I'm going to be 15 minutes late to the meeting today.
[17:12:44] Sorry for the trouble.
[17:13:08] I had another meeting dumped on me.
Usually these are recorded, but not this time, so I can't watch it later :/
[17:13:54] * yuvipanda dumps more meetings on halfak
[17:16:14] ok
[17:16:29] no problem.
[17:19:17] yuvipanda, won't work. no space. My calendar flow'th over.
[17:19:38] halfak: I shall double book and triple book you!
[17:19:38] geometric sequencing algorithms goddamn....
[17:20:06] yuvipanda, only if you book me for hack sessions :D
[17:20:18] the randomness fed into itself, via a source of randomness (e.g. some random part of the internet), and unpredictable numbers (e.g. pi, etc)
[17:20:23] halfak: +1
[17:20:26] I'm confused and lost here
[17:20:35] I need this formula...
[17:21:12] jenelizabeth__, sorry. no insights there. It sounds like a stochastic strategy. Must be a cost/fitness function somewhere?
[17:21:24] ^ \o/
[17:21:30] this is all manual diy, and yeah true...
[17:21:49] I want colonies of cells that follow rules given to them, i.e. a stability there
[17:22:11] but then I also want the next best step to be a random step from the colonies beneath them
[17:22:15] if that makes sense
[17:23:08] irrational numbers... can endlessly have a predictable pattern
[17:23:14] they're not random
[17:23:17] jenelizabeth__, sounds like a cellular automaton
[17:23:26] a bit yeah
[17:23:33] but I want organisms... look at nature
[17:23:36] what it produces
[17:23:39] cellular ottomata
[17:23:40] the random plant formations
[17:23:41] etc
[17:23:59] cellular automata don't explain these phenomena sadly
[17:24:09] Hmm...
[17:24:12] or else too much computing power is required, which is unavailable
[17:24:17] so modifying these formulas
[17:24:20] A neural network is a cellular array.
[17:24:26] I think that the pattern generalizes.
[17:24:34] I think we could accomplish natural life, the way life should evolve and develop lol
[17:24:45] I'm not asking for super smart AI, this is just structuring
[17:24:49] scaffolding
[17:24:57] * halfak runs away to meeting stuff
[17:24:57] sorry!
[17:25:21] * jenelizabeth__ needs to dye her hair a reddish color
[17:29:22] jenelizabeth__: OMG YAY FOR HAIR COLORS!
[17:29:26] * YuviPanda too has red now
[17:31:17] http://i.imgur.com/tvhsmMu.jpg
[17:31:24] I can't remember what color it was
[17:31:28] nor how I got that hair style
[17:31:38] wore half makeup for some reason, lol!
[17:31:41] o/ Amir1
[17:31:48] I'm going to be 15 minutes late for the meeting.
[17:31:51] Please start without me
[17:31:53] http://imgur.com/Tl9FzK9
[17:31:56] ToAruShiroiNeko, ^
[17:32:06] http://imgur.com/bry0cZP bit more natural
[17:32:21] jenelizabeth__: https://twitter.com/yuvipanda/status/630151634560024576 was mine
[17:33:06] nice!
[17:33:10] very red
[17:33:17] lots of red like hot sun lava red
[17:33:23] jenelizabeth__: :D
[17:33:25] yes
[17:33:28] http://i.imgur.com/8bJaBO8.jpg natural hair
[17:33:31] Amir1: Heya
[17:33:32] no makeup
[17:33:43] yay
[17:33:43] http://i.imgur.com/8Db4IXz.jpg ditto here too
[17:33:45] aetilley: o/
[17:33:46] meat ing
[17:33:50] ie throwing meat
[17:33:57] ToAruShiroiNeko: bonjour
[17:34:03] I dunno, some lesbian girl friend suggested I go for blonde hair
[17:34:05] bonsoir
[17:34:11] haha
[17:34:19] indeed
[17:34:46] Would I honestly suit blonde hair?
[17:34:53] my eyes are hazel
[17:35:14] so greenish depending on lighting, like they look naturally green depending on the lighting
[17:35:22] mostly brownish though
[18:04:54] jenelizabeth__, is there any specific question you have regarding AI?
[18:52:17] o/ ToAruShiroiNeko
[18:52:27] can you also take a look at https://phabricator.wikimedia.org/T119928 and add a to-do list?
[18:52:48] bmansurov will likely be an excellent contributor if you tell him what steps to take :)
[19:02:20] I'm back halfak
[19:02:29] shall we use telegram?
[19:05:10] yes
[19:05:18] Amir1, ^
[19:27:25] Amir1: Howdy.
[19:27:38] thanks
[19:27:58] ?
[19:28:04] :D
[19:28:12] not bad
[19:28:19] I've got to go
[19:28:27] be back in ~ten min.
[19:28:48] k
[19:44:44] halfak: Yesterday you told me to check out
[19:44:57] revscoring.languages.english.revision.words (or .content_words)
[19:45:08] So in english.py I see
[19:45:10] .. autoattribute:: revision.words
[19:45:13] .. autoattribute:: revision.content_words
[19:45:37] Is this supposed to get me the raw text of a revision somehow?
[19:46:51] Pardon my pygnorance.
[19:54:24] aetilley, no worries. So I should have said revision.content_words_list.
[19:54:37] That returns an array of the words in the content bits of the page.
[19:54:58] (Which is probably what you are looking for)
[19:55:05] * aetilley looks at this
[19:55:55] Conversely, ...revision.words_list returns a list of all word-like tokens in the page. This includes a lot of non-content things like template names: e.g. {{foo}} is tokenized to "{{", "foo", "}}" and "foo" is word-like.
[19:56:12] I'm back
[19:56:18] aetilley: let's work
[19:57:58] halfak: I don't see anything like that in english.py
[19:58:18] Check out the class it inherits from
[19:58:32] https://github.com/wiki-ai/revscoring/blob/master/revscoring/languages/space_delimited/revision.py
[20:02:54] it?
[20:03:26] halfak: I guess I'll have to come back to this.
[20:03:29] Amir1: hello
[20:03:43] it = "Revision"
[20:04:08] So you pointed out that Kian is in python2. Is it meant to be a stand-alone tool?
[20:04:38] it can be used as a stand-alone tool
[20:05:01] (I have to make it compatible with both versions)
[20:05:26] btw: the core "should" be compatible with 2 & 3
[20:06:33] ok, I tried running it in debian-jessie but the problem might have been related to not having MySQL
[20:06:52] (deb-jessie venv)
[20:07:03] you should not use the feature extraction part
[20:07:11] No, this was just importing Kian
[20:07:53] Let me try again and I'll paste the message.
[20:08:08] okay
[20:09:37] yeah
[20:09:48] "No module named 'MySQLdb'
[20:09:50] "
[20:10:33] Are you running?
from kian import Kian
[20:10:39] yes
[20:10:53] That's what causes that import error
[20:12:12] it shouldn't happen
[20:12:25] copy/paste the full error
[20:15:43] http://pastebin.com/yT2TWkRw
[20:18:45] aetilley: you can either install mysqldb using pip
[20:19:22] or remove "from .trained_model import TrainedModel" and "from .parser import ModelWithData"
[20:19:28] from __init__.py
[20:21:36] I recommend getting MySQLdb installed
[20:22:01] (or pymysql, since MySQLdb doesn't do py3 (last I checked))
[20:22:07] I already tried the former, and it gives me another dependency error
[20:22:20] (ConfigParser)
[20:22:35] ok, let me try pymysql
[20:22:40] http://stackoverflow.com/questions/17599830/installing-mysql-python-on-mac
[20:22:48] pymysql won't fill the requirement until Amir changes the code
[20:23:03] But yeah, pymysql is better :)
[20:23:50] yeah
[20:24:09] I'll do the python3 for kian asap
[20:24:36] Yeah, apparently I already have pymysql
[20:25:05] pymysql is part of my mediawiki-utilities suites.
[20:25:06] Sorry for the drive-by collaboration... but if anyone wants to comment on https://phabricator.wikimedia.org/T107723 it would be much appreciated.
[20:25:06] Amir1, rest your hand.
[20:25:10] :P
[20:25:25] Anyway, we can talk Kian later. I just wanted to pick your brain about where you think we should go with this bag-of-words stuff.
[20:25:26] awight, yes!
[20:25:27] +1
[20:25:30] Amir1: ^
[20:25:43] * YuviPanda provides more love for awight
[20:25:49] Sure halfak :)
[20:26:20] * awight swoons
[20:26:35] aetilley: I'll write a detailed email (using voice typing) for you
[20:26:42] send you some links
[20:27:24] Ok. I've been learning nltk. Very cool stuff, but I'm not sure exactly how to apply it in our situation.
[20:27:27] awight, commented. Thanks for pushing on that.
[20:27:41] * halfak needs to clean up and categorize our proposed work so that we can better prioritize.
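The MySQLdb import error above is triggered just by importing Kian, because its package __init__.py pulls in database-backed modules. One common pattern, shown here only as a hedged sketch and not as Kian's actual code, is to prefer MySQLdb and fall back to pymysql, which ships a real compatibility shim (pymysql.install_as_MySQLdb()) that registers it under the name MySQLdb. The helper name load_mysql_driver is hypothetical; returning None leaves it to the caller to decide whether database support is required.

```python
import importlib


def load_mysql_driver():
    """Return a MySQLdb-compatible module, or None if no driver is installed.

    Hypothetical helper: tries MySQLdb first, then falls back to pymysql,
    which can register itself under the module name "MySQLdb".
    """
    try:
        return importlib.import_module("MySQLdb")
    except ImportError:
        pass
    try:
        pymysql = importlib.import_module("pymysql")
    except ImportError:
        # No driver at all; the caller decides whether that is fatal.
        return None
    pymysql.install_as_MySQLdb()  # a later `import MySQLdb` now resolves to pymysql
    return importlib.import_module("MySQLdb")


driver = load_mysql_driver()
if driver is None:
    print("No MySQL driver found; database-backed features are disabled.")
else:
    print("Using MySQL driver:", driver.__name__)
```

The design choice here is to keep the import failure from surfacing at `from kian import Kian` time; code paths that genuinely need the database can still check for None and raise a clear error.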
[20:27:43] great, I'll peck away at it in between urgent real-time things :p
[20:27:51] real-life things I meant
[20:27:59] awight, awesome. Thank you. :D
[20:28:16] Livin' the real-time-life
[20:28:44] awight, you might also be interested in https://phabricator.wikimedia.org/T120138
[20:29:09] thanks!
[20:29:51] Random thought on the Phabricator workboard... It might be productive to create some tracking tasks and epics for the next few phases of the project, and link blocking tasks so we have more context hints.
[20:30:11] * awight pushes halfak under the Product bus :p
[20:30:56] awight, this is a good point. It's sort of my job (role?) to get that organized so that our board remains useful.
[20:31:37] Being a fancy pants public scholar this week has been seriously disruptive. Good disruptive.
[20:31:52] But still. :/
[20:32:00] * YuviPanda sends more press enquiries to halfak
[20:32:29] I need to put them on conference calls. A virtual press room where I can say "no comment".
[20:32:51] halfak: make sure they don't record the call
[20:32:59] lol
[20:34:59] halfak: And try to avoid words like skynet
[20:35:52] Ha. Everyone wants to ask me about the skynet takeover.
[20:35:53] lol
[20:36:02] :)
[20:36:11] My response -- We're not really innovating on the algorithms side of things.
[20:36:25] (yet)
[20:37:01] We're not building robotic editors. We're building information theoretic super suits that make it easy to do some things but prevent and obstruct other things.
[20:37:55] Nice.
[20:41:30] careful about https://en.wikipedia.org/wiki/Land_Warrior
[20:41:47] I hate seeing military metaphors accidentally deployed (e.g. "boots on the ground")
[20:42:35] awight, good point.
[20:42:43] super suit != cyborg war suit, but it's close.
[22:01:06] halfak: Do I need to translate stop words of Persian?
[22:01:24] Amir1: first edits made to wiki from PAWS! https://i.imgur.com/PojxutV.png
[22:01:30] Amir1, if you don't mind. I don't think it is a big rush.
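The earlier distinction between revision.words_list (every word-like token, including template names) and revision.content_words_list (words in the content bits of the page), together with the question about Persian stop words, can be illustrated with a toy tokenizer. This is not revscoring's tokenizer: the regular expressions, the treatment of anything inside {{...}} as non-content, and the tiny English stop-word set are all assumptions made for illustration. It reproduces the example from the log, where "{{foo}}" becomes "{{", "foo", "}}" and only "foo" is word-like.

```python
import re

# Hypothetical tokenizer: template delimiters first, then runs of word
# characters, then any other single non-space character.
TOKEN_RE = re.compile(r"\{\{|\}\}|\w+|[^\w\s]")
WORD_RE = re.compile(r"^\w+$")


def tokenize(text):
    return TOKEN_RE.findall(text)


def words_list(text):
    """All word-like tokens, including template names such as 'foo' in {{foo}}."""
    return [t for t in tokenize(text) if WORD_RE.match(t)]


def content_words_list(text):
    """Word-like tokens outside {{...}} templates (a rough stand-in for 'content')."""
    depth = 0
    words = []
    for token in tokenize(text):
        if token == "{{":
            depth += 1
        elif token == "}}":
            depth = max(depth - 1, 0)
        elif depth == 0 and WORD_RE.match(token):
            words.append(token)
    return words


# Tiny illustrative stop-word set; a real one would be language-specific,
# for example a translated Persian list.
STOP_WORDS = {"the", "a", "is", "of"}


def non_stop_words(words):
    return [w for w in words if w.lower() not in STOP_WORDS]


text = "{{foo}} The cat is a friend of Bob."
print(tokenize("{{foo}}"))
print(words_list(text))
print(content_words_list(text))
print(non_stop_words(content_words_list(text)))
```

In Python 3, `\w` matches Unicode letters, so the same sketch tokenizes Persian script into words; only the stop-word set would need to change.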
[22:02:03] It's a nice-to-have
[22:03:47] YuviPanda: \o/
[22:03:49] \o/
[22:04:05] I'll do more once I get rid of this
[22:04:49] \o/
[22:05:08] halfak: sure thing :)
[22:05:16] Just sent you an email
[22:18:56] Amir1, fun story. So when I "prelabel" the edits from wikidata's biased sample, we filter out nearly 50% of the reverted (and probably damaging) edits.
[22:19:09] They are reverts of edits by privileged users!
[22:19:25] And a quick spot-check suggests that they are not damaging.
[22:19:44] because people use auto tools to do edits in wd
[22:19:50] So... we have yet another filter that we can apply to "probably damaging reverted edit" detection.
[22:19:56] and sometimes they make a mess
[22:20:18] Yeah. Probably "damaging" but not "vandalism"
[22:20:18] +1
[22:20:43] it's not our job to catch that
[22:20:49] Yeah.
[22:21:18] So... maybe we're doing a bad job by filtering edits by priv'd users before Wiki labels
[22:21:36] But my sense is that we should bring this filtering to our "probably damaging reverted labeler"
[22:22:05] I think this is why we catch so many merges.
[22:22:17] Merges get reverted all the time!
[22:22:25] no
[22:22:34] merges don't get reverted
[22:22:44] it's almost impossible to revert a merge
[22:22:53] but I get your point
[22:23:23] mostly people use tools to add claims to wd
[22:23:29] like autolist
[22:23:48] and lots of times it gets reverted
[22:24:30] See https://www.wikidata.org/w/index.php?title=Q685365&diff=prev&oldid=142045049
[22:25:08] So I guess the edit produced by the merge was reverted.
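The filter proposed above (treat reverted edits as "probably damaging" only when the reverted edit was not made by a privileged user, and apply that inside the "probably damaging reverted" labeler) can be sketched as follows. The Edit record, its fields, and the set of privileged groups are hypothetical; the actual prelabeling code and group list are not shown in this log.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List

# Hypothetical set of user groups treated as "privileged" for this filter.
PRIVILEGED_GROUPS = frozenset({"sysop", "bot", "rollbacker"})


@dataclass(frozen=True)
class Edit:
    rev_id: int
    user_groups: FrozenSet[str]
    reverted: bool


def is_privileged(edit: Edit) -> bool:
    return bool(edit.user_groups & PRIVILEGED_GROUPS)


def probably_damaging(edits: Iterable[Edit]) -> List[Edit]:
    """Keep reverted edits, but skip those made by privileged users.

    Reverts of privileged users' (often tool-assisted) edits are assumed
    to be clean-ups of mistakes rather than vandalism.
    """
    return [e for e in edits if e.reverted and not is_privileged(e)]


sample = [
    Edit(1, frozenset(), reverted=True),
    Edit(2, frozenset({"bot"}), reverted=True),
    Edit(3, frozenset(), reverted=False),
    Edit(4, frozenset({"sysop", "user"}), reverted=True),
]
print([e.rev_id for e in probably_damaging(sample)])
```

Only edit 1 survives: edits 2 and 4 were reverted but made by privileged users, and edit 3 was never reverted.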
[22:26:28] it's uncommon but not happening at all
[22:26:38] *not not
[22:26:43] :D
[22:26:57] some merges are easy to revert
[22:27:05] but some of them are not
[22:27:12] anyway
[22:33:22] Gotcha
[22:48:07] So we have ~4500 edits
[23:06:21] halfak: http://ores-staging.wmflabs.org/scores/wikidatawiki/reverted/?revids=269077027|269077025|269086457|251530750|243937491|269090263|269093456|257856652|237999679|210649590|269186604|253584599|269609233|270093221|274775730|275681784
[23:06:27] last edit
[23:06:43] NotImplementedError: Failed to process : monolingualtext datatype is not supported yet.
[23:06:56] are we using the most recent version of pywikibase?
[23:07:37] https://www.wikidata.org/wiki/Wikidata:ORES/Report_mistakes#First_improvements
[23:07:39] maybe not. Can you check wb-vandalism?
[23:07:46] sure
[23:08:00] and are we using user.age?
[23:08:26] some edits got high prob. even though they were made by an old user
[23:08:38] We are. Not sure what's up?
[23:10:11] let me test
[23:11:39] halfak: I think I know
[23:11:56] we didn't release a new version of wb-vandalism
[23:12:07] not in pypi nor github
[23:12:14] Hmmm... Yeah. Good point
[23:12:15] Pypi
[23:12:23] is it affecting our work?
[23:12:29] maybe
[23:13:17] Can we release a new version and see what happens?
[23:13:31] especially on ores-staging
[23:13:48] I release it, you do the rest :D
[23:13:58] kk will do.
[23:14:05] thanks
[23:17:07] halfak: did you get the ores pages?
[23:17:10] there was some flapping
[23:17:14] yeah
[23:17:16] saw that
[23:17:24] what was it?
[23:17:44] not sure
[23:17:50] it re-appeared quickly enough
[23:17:59] I'll try to catch the next one
[23:18:03] 2 seconds, yeah
[23:19:51] halfak: Done
[23:19:56] try again plz
[23:20:05] 0.1.6
[23:20:30] gotta run. Will do that later tonight.
[23:20:32] Sorry
[23:20:57] it's okay
[23:20:59] :)
[23:21:44] does this mean I get to have Amir1 try out paws? :D
[23:22:03] sure
[23:22:11] let me try
[23:22:21] YuviPanda: can you give me a guide?
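The ORES staging request above scores a batch of revisions in one call by joining revision IDs with "|" in the revids query parameter. A small helper for building such URLs might look like the sketch below; it assumes only the /scores/<wiki>/<model>/?revids=... path shape visible in that URL, and the name build_scores_url is hypothetical. Fetching and parsing the response is left out because the response format is not shown in this log.

```python
from typing import Iterable


def build_scores_url(host: str, wiki: str, model: str, rev_ids: Iterable[int]) -> str:
    """Build an ORES-style batch scoring URL.

    Hypothetical helper modeled on the staging URL in the log:
    <host>/scores/<wiki>/<model>/?revids=<id>|<id>|...
    """
    ids = [str(int(r)) for r in rev_ids]
    if not ids:
        raise ValueError("at least one revision ID is required")
    return f"{host.rstrip('/')}/scores/{wiki}/{model}/?revids={'|'.join(ids)}"


url = build_scores_url(
    "http://ores-staging.wmflabs.org", "wikidatawiki", "reverted",
    [269077027, 269077025, 269086457],
)
print(url)
# http://ores-staging.wmflabs.org/scores/wikidatawiki/reverted/?revids=269077027|269077025|269086457
```

The "|" separator is left unescaped to match the URL in the log; an HTTP client may percent-encode it as %7C when sending the request.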
[23:22:42] Amir1: sure!
[23:22:51] Amir1: https://www.mediawiki.org/wiki/User:John_Vandenberg/GCI_walk-through :)
[23:27:34] YuviPanda: http://imgur.com/hlJFayu
[23:27:37] \o/
[23:28:13] Amir1: \o/
[23:28:24] I've some notes
[23:28:41] I'll share them with you today
[23:28:57] Let me open a speech rec. device
[23:29:16] Amir1: \o/ ok
[23:34:57] YuviPanda: do you have telegram so I can send you a voice message?
[23:35:06] Amir1: yes
[23:35:10] Amir1: 9176315499
[23:35:28] thanks