[14:11:09] o/
[14:23:33] \o_
[14:46:08] o|
[14:46:17] I don't know what that represents
[14:49:22] guillom: hey, I just wanted to let you know in metadata clean up we moved from 90% to 95% (for fawiki)
[14:49:41] Amir1: \o/ That's great! Good job :)
[14:49:50] I have deleted more than several hundred images
[14:50:14] it helped me to find badly licensed images, fair use violations, etc.
[14:50:20] Less yay… But needed I guess.
[14:50:34] so it is super helpful for us, thus thank you
[14:50:37] Emufarmers: "banging head against the wall"?
[14:51:14] :D
[14:51:20] Amir1: I'm glad my ugly experiment in Python coding ended up being useful and not horribly breaking two weeks in.
[14:51:59] yeah, it's super awesome that's very bootstrap
[14:52:08] * guillom has not really touched Python since, apart from tinkering a bit with static site generators.
[14:52:16] working for itself without us being aware of
[15:10:29] guillom: I think that would look more like o| o\ o| o\ o| o\ ō
[18:42:44] DarTar, QR stuff on line 26 https://etherpad.wikimedia.org/p/revscoring
[18:43:19] * halfak sends email so this doesn't get lost
[18:43:59] halfak: wonderful, thanks
[18:46:54] Given that wikileads is this week, I'll wait until next week to show you the really cool new ORES features.
[18:46:56] DarTar, ^
[18:47:20] Will be late next week due to my own vacation time
[19:00:46] halfak: sounds good
[20:48:51] halfak, do you know of any python libraries that combine the functionality of mwparserfromhell with word ownership? So something like node ownership? Seems like a long shot.
[20:49:06] heh no.
[20:49:16] :( I can hope
[20:49:16] But you can put arbitrary tokenizers into mwpersistence
[20:49:28] hm
[20:51:02] I was thinking of adding some functionality to Wikichatter to take in dump files so that we can have known ownership rather than inferred.
[20:51:03] I'm struggling to think about how to define node-level ownership
[20:51:24] If you modify a paragraph, do you now own it?
[20:52:04] Maybe shared ownership?
[20:52:25] :P
[20:52:41] How do you know a node is the same one after it has been changed?
[20:52:56] Why not just parse the nodes after generating regular persistence information?
[20:53:08] shared ownership! tada!
[20:53:13] ;)
[20:53:52] Once I have the persistence information, how can I reassociate it with the wikicode nodes?
[20:54:44] The wikicode nodes have offsets
[20:55:11] That could work
[20:56:52] In a lot of ways, I think inferred ownership is more desirable
[20:56:59] *apparent ownership
[20:57:12] But it'll be interesting to see what you find.
[20:58:46] That would still offer additional accuracy compared to signatures.
[20:59:05] Although there would be a heavy time trade-off
[20:59:34] Yeah. That's a worrisome component.
[20:59:48] But I know that Ellery has been generating diffs of all talk pages.
[20:59:58] (and I see his jobs running on the analytics servers)
[21:00:06] So I imagine he can save you a lot of time with that.
[21:00:13] Regretfully he's rarely online.
[21:00:32] DarTar, do you know where Ellery is storing talk page diffs?
[21:01:18] he’s working on Hadoop but I wouldn’t be able to point you to where the data is. He’s flying at the moment
[21:01:35] DarTar, he's running the diffs on stat3
[21:01:41] My old code.
[21:01:49] Is he Ellery on Wikipedia?
[21:01:50] I can see the processes
[21:01:55] Likely not.
[21:02:00] Not sure if he has a Wikipedia account
[21:02:01] * halfak looks
[21:02:06] yes
[21:02:22] if you mail him he should be able to respond
[21:02:23] https://meta.wikimedia.org/wiki/User:Ewulczyn_(WMF)
[21:02:33] ^ includes his email
[21:02:54] thanks
[22:36:02] halfak: how much work is it to work through all the revisions of a page for analysis purposes?
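The node-ownership idea discussed above (generate token-level persistence first, then map it onto wikicode nodes by character offset, which naturally yields shared ownership) can be sketched in plain Python. This is a toy stand-in under stated assumptions, not mwpersistence's API: it uses difflib.SequenceMatcher as the diff engine, takes a pluggable tokenizer, and credits inserted or rewritten tokens to the editor of the current revision. The function names track_ownership and node_ownership and the (editor, text) input format are hypothetical. Node offsets are derived by summing node-string lengths, assuming the node strings concatenate back to the page text, which should hold for str() of each node in mwparserfromhell's parse(text).nodes given its round-trip parsing.

```python
import difflib
import re
from collections import Counter


def tokenize(text):
    # Hypothetical default tokenizer: word runs, whitespace runs, and single
    # other characters. Every character is covered, so "".join(tokens) == text,
    # which keeps character offsets intact for the node mapping below.
    return re.findall(r"\w+|\s+|[^\w\s]", text)


def track_ownership(revisions, tokenize=tokenize):
    """revisions: iterable of (editor, text) pairs in chronological order.
    Returns a list of (token, owner) pairs for the last revision."""
    owned = []  # (token, owner) pairs for the previous revision
    for editor, text in revisions:
        tokens = tokenize(text)
        prev_tokens = [tok for tok, _ in owned]
        # autojunk=False so frequent tokens such as spaces are not ignored on long pages.
        matcher = difflib.SequenceMatcher(a=prev_tokens, b=tokens, autojunk=False)
        new_owned = []
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                # Tokens that persisted keep their original owner.
                new_owned.extend(owned[i1:i2])
            elif tag in ("insert", "replace"):
                # Added or rewritten tokens are credited to the current editor.
                new_owned.extend((tok, editor) for tok in tokens[j1:j2])
            # "delete": removed tokens simply drop out of the page.
        owned = new_owned
    return owned


def node_ownership(node_strings, owned):
    """node_strings: node texts whose concatenation equals the final page text.
    Returns one Counter per node mapping owner -> number of characters owned."""
    char_owner = []  # character offset -> owner
    for tok, owner in owned:
        char_owner.extend([owner] * len(tok))
    shares = []
    start = 0
    for node_text in node_strings:
        end = start + len(node_text)
        shares.append(Counter(char_owner[start:end]))
        start = end
    return shares
```

For example, feeding track_ownership the revisions [("Alice", "Hello world."), ("Bob", "Hello brave world.")] credits "brave" to Bob while "Hello" and "world" stay with Alice; splitting the final text into the nodes ["Hello", " brave world."] then gives the second node a mixed Alice/Bob count. That is one possible answer to "If you modify a paragraph, do you now own it?": you own the share you wrote.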