[00:14:25] Amir1, just finished edit to the Methods with some notes.
[00:14:35] I have to do other work for the rest of the night.
[00:14:40] reading
[00:14:45] And hang out with the wifey because it is her birthday :)
[00:14:55] But I'll be back at it tomorrow -- in about 14 hours.
[00:15:02] oh
[00:15:15] Happy birthday to her :)
[00:15:30] go and enjoy your day :)
[09:25:48] (CR) Hoo man: [C: -1] Add PopulateDatabase.php (11 comments) [extensions/ORES] - https://gerrit.wikimedia.org/r/268874 (https://phabricator.wikimedia.org/T123795) (owner: Ladsgroup)
[10:06:36] (PS14) Ladsgroup: Add PopluateDatabase.php [extensions/ORES] - https://gerrit.wikimedia.org/r/268874 (https://phabricator.wikimedia.org/T123795)
[10:07:27] (CR) jenkins-bot: [V: -1] Add PopluateDatabase.php [extensions/ORES] - https://gerrit.wikimedia.org/r/268874 (https://phabricator.wikimedia.org/T123795) (owner: Ladsgroup)
[10:07:54] (CR) Nikerabbit: Add PopluateDatabase.php (1 comment) [extensions/ORES] - https://gerrit.wikimedia.org/r/268874 (https://phabricator.wikimedia.org/T123795) (owner: Ladsgroup)
[10:09:48] (PS15) Ladsgroup: Add PopulateDatabase.php [extensions/ORES] - https://gerrit.wikimedia.org/r/268874 (https://phabricator.wikimedia.org/T123795)
[10:25:45] (CR) Hoo man: [C: -1] Add PopulateDatabase.php (4 comments) [extensions/ORES] - https://gerrit.wikimedia.org/r/268874 (https://phabricator.wikimedia.org/T123795) (owner: Ladsgroup)
[10:52:35] (CR) Ladsgroup: Add PopulateDatabase.php (2 comments) [extensions/ORES] - https://gerrit.wikimedia.org/r/268874 (https://phabricator.wikimedia.org/T123795) (owner: Ladsgroup)
[10:56:50] (CR) Hoo man: Add PopulateDatabase.php (1 comment) [extensions/ORES] - https://gerrit.wikimedia.org/r/268874 (https://phabricator.wikimedia.org/T123795) (owner: Ladsgroup)
[11:08:59] (PS16) Ladsgroup: Add PopulateDatabase.php [extensions/ORES] - https://gerrit.wikimedia.org/r/268874 (https://phabricator.wikimedia.org/T123795)
[11:29:15] (CR) Hoo man: "Can we be sure there's always at least one score per revision? Otherwise you might end up with having the same revision appear in each bat" (5 comments) [extensions/ORES] - https://gerrit.wikimedia.org/r/268874 (https://phabricator.wikimedia.org/T123795) (owner: Ladsgroup)
[11:47:40] (PS17) Ladsgroup: Add PopulateDatabase.php [extensions/ORES] - https://gerrit.wikimedia.org/r/268874 (https://phabricator.wikimedia.org/T123795)
[12:03:01] (CR) Hoo man: "Nit picks. Please also answer my cover-comment from PS16." (5 comments) [extensions/ORES] - https://gerrit.wikimedia.org/r/268874 (https://phabricator.wikimedia.org/T123795) (owner: Ladsgroup)
[12:13:52] (CR) Ladsgroup: "Oh, I haven't seen it. For all wikis except Wikidata, we have at least one score per edit but in Wikidata for non-main-ns edits. It return" [extensions/ORES] - https://gerrit.wikimedia.org/r/268874 (https://phabricator.wikimedia.org/T123795) (owner: Ladsgroup)
[12:19:55] (PS18) Ladsgroup: Add PopulateDatabase.php [extensions/ORES] - https://gerrit.wikimedia.org/r/268874 (https://phabricator.wikimedia.org/T123795)
[13:35:42] halfak: o/ If you're around. I'm trying to apply ACM format to the latex file
[13:35:52] Running into trouble?
[13:36:10] not yet
[13:36:37] I need to add address of WMDE and WMF. Adding them right now
[13:36:58] in the mean time can you check the paper draft page?
[13:37:06] I made some notes
[13:37:47] Amir1, just about to get on the train and head to the University. Will be back online in ~1 hour.
[13:38:05] ok :)
[13:40:56] (CR) Hoo man: [C: -1] Add PopulateDatabase.php (1 comment) [extensions/ORES] - https://gerrit.wikimedia.org/r/268874 (https://phabricator.wikimedia.org/T123795) (owner: Ladsgroup)
[14:13:07] ok, finished :)
[15:08:38] (PS19) Ladsgroup: Add PopulateDatabase.php [extensions/ORES] - https://gerrit.wikimedia.org/r/268874 (https://phabricator.wikimedia.org/T123795)
[15:19:12] halfak: o/
[15:19:16] I finished mostly
[15:19:21] just categorization
[15:19:34] I'll use other papers for that
[15:22:18] (CR) Hoo man: [C: -1] "Some style stuff left, didn't test. I think the performance should be ok as well now, given we also use rc_id to shard." (5 comments) [extensions/ORES] - https://gerrit.wikimedia.org/r/268874 (https://phabricator.wikimedia.org/T123795) (owner: Ladsgroup)
[15:29:00] (PS20) Ladsgroup: Add PopulateDatabase.php [extensions/ORES] - https://gerrit.wikimedia.org/r/268874 (https://phabricator.wikimedia.org/T123795)
[15:33:20] (CR) Hoo man: [C: 2] "Should work for the use case of populating for a few (ten) thousand revisions. Manually tested by Ladsgroup." [extensions/ORES] - https://gerrit.wikimedia.org/r/268874 (https://phabricator.wikimedia.org/T123795) (owner: Ladsgroup)
[15:34:22] (Merged) jenkins-bot: Add PopulateDatabase.php [extensions/ORES] - https://gerrit.wikimedia.org/r/268874 (https://phabricator.wikimedia.org/T123795) (owner: Ladsgroup)
[15:36:54] halfak: around?
[16:05:09] I go to get some rest
[16:05:14] ~10 min.
[16:05:24] halfak: ping me when you're around
[16:15:52] o/
[16:16:07] Sorry to be late. Got to the university and had to meet with some people right away.
[16:16:21] Will be reviewing the draft shortly.
[16:16:46] thanks
[16:16:55] I'm here if anything needed
[16:17:13] I'll be generally working on paper until 0100 UTC.
[16:17:27] woha
[16:17:35] that's really good
[16:17:45] after that I do the rest
[16:17:53] submitting etc.
[16:17:59] Amir1, can you write up a list of to-dos on what the paper needs?
[16:18:14] In the mean time I do other things
[16:18:25] E.g. "Intro is OK, but could use copy editing, Related work needs expansion about OSM paper" etc.
[16:18:38] * halfak makes up to-dos
[16:18:46] sure
[16:18:56] I was thinking of etherpad
[16:20:18] Sounds good to me.
[16:20:49] halfak: https://etherpad.wikimedia.org/p/KDD_to-do
[16:20:57] Just started
[16:21:40] checkout names ;)
[16:21:44] in the etherpad
[16:29:51] halfak: listed
[16:30:13] btw. one of the most important patches just got merged
[16:30:18] re the extension
[16:30:28] we are still waiting for an input from Ops
[16:47:38] halfak: Which category do you think suits better? 1 - "CCS → Information systems → Information systems applications → Decision support systems → Data analytics" 2- "CCS → Information systems → Information systems applications → Decision support systems → Online analytical processing" 3 - "CCS → Information systems → Data
[16:47:38] management systems → Database administration → Database utilities and tools"
[16:49:28] 1 - http://dl.acm.org/ccs/ccs.cfm?id=10003244&lid=0.10002951.10003227.10003241.10003244&CFID=747541115&CFTOKEN=65442149 2- http://dl.acm.org/ccs/ccs.cfm?id=10010843&lid=0.10002951.10003227.10003241.10010843&CFID=747541115&CFTOKEN=65442149 3- http://dl.acm.org/ccs/ccs.cfm?id=10003213&lid=0.10002951.10002952.10003212.10003213&CFID=747541115&CFTOKEN=65442149
[16:50:43] Not database management systems
[16:51:02] I think Online analytical processing as a primary
[16:51:11] And secondary "Data analytics"
[16:51:15] okay sure
[16:51:20] thanks
[16:51:26] have you seen the etherpad
[16:52:48] Yup.
[16:53:05] Will get to that shortly. It looks good. I'll start working through my items.
[16:53:24] Also I got DarTar to sign on for reviewing at 2300 UTC
[16:53:26] thanks
[16:53:34] awesome
[16:53:41] this is really good
[16:53:50] So, be prepared to rewrite sections based on his feedback.
[16:59:22] okay :)
[17:16:06] halfak: I'm done with categorization, right now I'm trying to convert svgs to EPS
[17:16:28] Amir1, could also pull high res PNGs from commons.
[17:16:33] EPS is more desirable though.
[17:16:44] I can re-generate the figures if it proves to be painful.
[17:16:52] no, it's easy
[17:16:57] Great :)
[17:17:01] converting from svg is easier
[17:17:08] using inkscape
[17:17:17] Gotcha.
[17:45:25] halfak: all of files are changed to eps, I used one of them and it is okay
[17:46:23] what else should I do now?
[18:18:09] I got to go, be back in ten min.
[18:32:03] back
[18:32:07] Hey!
[18:32:36] hey, please check the above messages
[18:33:03] I'm done with eps files, I want to do something
[18:35:07] I think a summary of the OSM paper for the related work. We should have it.
[18:35:13] We can critique it too.
[18:35:26] ok, I'm on it
[18:35:46] anything else you need for the paper?
[18:35:55] for now
[18:35:59] Oh! Also the features table is now incomplete since I needed to adapt the feature sets to revscoring 1.0
[18:37:00] what features are missing now?
[18:37:08] See https://github.com/wiki-ai/wb-vandalism/tree/sample_subsets/wb_vandalism/feature_lists
[18:37:25] I think it is just the features in wikibase.user_rights
[18:37:29] https://github.com/wiki-ai/wb-vandalism/blob/sample_subsets/wb_vandalism/feature_lists/wikibase.py
[18:37:56] ok. Doing them too
[18:39:10] halfak: please review the whole draft
[18:39:16] and add/remove anything you want
[18:39:37] change
[18:39:44] Working on abstract
[18:40:00] awesome
[18:40:02] awesome
[18:40:09] thanks
[18:45:46] Just added abstract.
[18:46:39] greta
[18:46:43] *great
[18:46:44] checking it
[18:51:16] it's amazing
[18:51:37] Working on tables in methods for the corpus now.
[18:54:45] reading the OSM paper
[19:03:20] halfak: I want to summarize the paper to you before I got to writing: the OSM paper collected data of blocked users and modifications they made and based on that they got into a rule-based system that consumes too much resource and still human review after flagging is "preferable"
[19:03:53] what do you think should be added too?
[19:14:32] Bah! Wikipedia is down
[19:14:35] Can't edit.
[19:14:36] >:(
[19:14:56] same happened to me here
[19:15:22] I'm working on the table
[19:15:28] Re. OSM paper, make sure you make it clear how it is related. E.g. OSM, like Wikidata, is an open structured database.
[19:15:44] okay
[19:15:44] But unlike our work, they did not draw from the substantial history of vandalism detection in wikipedia.
[19:16:14] they did not use machine learning. They barely describe their methods so that their work is not reproducible.
[19:16:31] okay
[19:16:35] great
[19:17:12] In our work, we draw extensively from past work building high fitness vandalism detection models for Wikipedia, we use replicable training and testing strategies, and we use standard and intuitive evaluation metrics.
[19:18:31] I might just copy paste that
[19:20:48] Amir1, sounds good.
[19:20:59] * halfak just funnels paper-ish words at Amir1
[19:21:12] :D
[19:24:23] halfak: I think the features are too much for a table
[19:24:24] it should be a list
[19:24:26] and sublists
[19:34:40] halfak: I updated the features list
[19:34:56] I need to give them a better tone, but everything is okay beside that
[19:35:54] Sounds good to me.
[19:36:06] Will be a trick to get them in latex format
[19:36:21] Might do something like "Statements (added/removed/changed)"
[19:37:14] I don't think that would be too hard
[19:37:56] I already did that for definitions: \begin{description}
[19:37:56] \item[Labels] A name for the item (unique per language)
[19:48:31] meta is sooooo slow
[19:51:55] Amir1, just finished https://meta.wikimedia.org/wiki/Research:Building_automated_vandalism_detection_tool_for_Wikidata#Building_a_corpus
[19:51:57] BRB
[19:52:29] awesome
[19:52:30] thanks
[20:07:02] Amir1, FYI I'll be working on the quality in open production section next
[20:07:14] Need to hop offline quick. Heading to a coffee shop
[20:07:31] ok
[20:07:42] I work on something in the mean time
[20:39:35] o/
[20:46:37] halfak: I modified features table so it has a slightly better tone. Also added https://meta.wikimedia.org/wiki/Research:Building_automated_vandalism_detection_tool_for_Wikidata#Conclusion
[20:46:52] which is basically empty know
[20:46:54] *now
[20:57:47] Still working on that intro sub-section
[20:58:03] Amir1, are there any source-adding/checking wikidata game/tools that we should talk about?
[20:58:10] Re. quality and process in Wikipedia.
[20:59:04] one of them just got introduced
[20:59:34] also we used kian to expose possible difference between wikipedia and Wikidata
[21:01:00] another one is "constraint violation reports" which gives people list of strange values of statements. e.g. one is a cat mentioned as husband of a human
[21:01:15] *e.g. if a
[21:01:19] halfak: ^
[21:01:49] Amir1, will save my edits in a moment and let you take a pass on it.
[21:01:59] sure
[21:02:02] It would be great to have a summary of these ways that Wikidata editors manage quality in practice.
[21:02:52] have you written it? if not. I do
[21:07:43] https://meta.wikimedia.org/wiki/Research:Building_automated_vandalism_detection_tool_for_Wikidata#Quality_in_open_production
[21:07:47] Ready for your contribs
[21:07:49] Amir1, ^
[21:07:56] amazing
[21:08:10] I pulled in a lot of related work -- pushing us well past the 20 reference rule-of-thumb
[21:08:12] On it
[21:08:29] awesome
[21:12:12] halfak: there are simple questions I asked in the article from you. check them plz
[21:18:16] halfak: https://meta.wikimedia.org/wiki/Research:Building_automated_vandalism_detection_tool_for_Wikidata#Quality_in_open_production
[21:18:24] a very basic paragraph
[21:18:30] I think it's enough
[21:19:05] please review, I barely read it
[21:20:02] Amir1, these questions are inside the draft>
[21:20:03] ?
[21:20:14] yeah
[21:20:24] for one of them just search halfak
[21:21:35] halfak: the other is "We should probably mention SPO, subject-predicate-object triple. It's a common practice in knowledge bases" (search it)
[21:22:15] last one is in last sentence of feature engineering
[21:22:25] +1. Can you write that bit. I think it deserves a couple of sentences.
[21:22:54] sure
[21:22:58] Wikidata is roughly equivalent to common SPO/triplet stores, but extends upon them by having qualifications, references, etc.
[21:23:18] what about first question? levee article
[21:23:29] I removed the note and left it :)
[21:23:36] I think it is a fine citation.
[21:23:57] We might also include the Banning of a Vandal cite there too.
[21:24:00] I'll add it.
[21:24:56] awesome thank
[21:25:04] thanks
[21:26:56] I'm going to work in the results section for a bit and move the big red block of text. OK?
[21:27:55] Amir1, ^
[21:28:03] Just making sure you aren't working in that section
[21:28:14] * halfak wishes that the wiki did merges better
[21:28:47] OK
[22:09:59] Amir1, see https://meta.wikimedia.org/wiki/Research:Building_automated_vandalism_detection_tool_for_Wikidata#Limitations
[22:10:02] :)
[22:10:04] I feel better after writing that.
[22:10:25] I saw it, it's f**king awesome
[22:10:38] sorry for the language but WoW
[22:10:50] :D
[22:11:17] OK. There are some links in the context as reference. Like link to Wikisource
[22:11:35] should it stay like this or we change to footnote
[22:12:11] Amir1, generally I think that links like that should be footnotes.
[22:12:25] Anything that we're not referencing as a written work.
[22:12:44] okay, I will fix them
[22:12:53] probably in the latex
[22:12:53] Are you already converting to latex?
[22:13:10] I have converted lots of them to latex
[22:13:20] Oh! Just the refs you mean.
[22:13:29] I was worried that you were converting the body copy.
[22:13:31] but I'm waiting to finish the paper and then I start
[22:13:36] +1 :)
[22:13:52] I converted references but only papers
[22:14:11] I want to refactor the section order. OK to do now?
[22:14:18] Will cause a conflict for the whole page.
[22:14:24] sure
[22:14:28] but one thing
[22:14:30] https://meta.wikimedia.org/w/index.php?title=Research:Building_automated_vandalism_detection_tool_for_Wikidata#Related_works
[22:14:55] the part related to Wikipedia is incomplete, the section in intro is much biggers
[22:14:58] *bigger
[22:15:02] what should we do?
[22:15:09] Yeah. that's one of the things I want to refactor
[22:15:11] :)
[22:15:16] yay
[22:15:18] awesome
[22:15:30] I do some stuff related to latex while you're doing it
[22:17:46] Amir1, do we have a length limit for KDD?
[22:18:11] 10 pages
[22:18:25] Does that include references?
[22:18:48] I think so
[22:18:53] let me dig
[22:20:26] It seems it's for research track
[22:20:56] "Papers are limited to 10 pages, including references, diagrams, and appendices, if any."
[22:21:20] but this is for research track, there is exactly one section in applied data track but without this sentence
[22:22:13] I'm dumb. It's also applied data track
[22:22:21] so 10 pages refs. included
[22:24:06] Gotcha. We're probably pushing the limit.
[22:24:21] When it comes to submission time, you'll likely need to make some hard decisions about what to cut.
[22:25:08] okay
[22:25:17] I try
[22:25:41] one of the things that seems important but not in good shape is "conclusion"
[22:26:35] their format is smaller than usual (9pt) so we have more
[22:27:30] we have ten hours right now :)
[22:33:27] I'll look at conclusions next.
[22:33:37] A lot of the related work needs to be turned into prose.
[22:35:39] okay
[22:35:58] let's do it once you're done
[22:45:17] Amir1, just finished moving content
[22:46:34] yay
[22:46:36] let me check
[22:47:35] re. related works. That's much better now
[22:48:45] halfak: what should I do now?
[22:52:09] right now I'm adding refs to the refs.bib
[22:59:30] Take an editing pass. Pick a section
[23:00:26] halfak: before I go on: "pick a vandalism paper" is a real ref?
[23:00:36] (search)
[23:03:12] I'd go with related works section. Tell me when you're ready
[23:03:14] halfak: ^
[23:05:18] rock on
[23:07:15] sorry partially afk for a bit
[23:07:49] it's ok
[23:18:52] halfak: https://meta.wikimedia.org/w/index.php?title=Research:Building_automated_vandalism_detection_tool_for_Wikidata&diff=15344826&oldid=15344773
[23:20:07] also, halfak should reference to Kolbe article in signpost as ref.?
[23:20:12] It seems
[23:20:33] Yup. I think so
[23:20:59] ok
[23:21:03] On it
[23:31:17] done
[23:34:18] o/ Just got back to properly being here
[23:34:39] I'm editing the corpus section halfak
[23:34:42] Was working with one of my other collaborators (kjshiroo)
[23:34:45] please check my edit
[23:34:57] Will do
[23:35:01] cool
[23:38:41] halfak: ok, I made another edit
[23:38:50] on building a corpus
[23:39:25] note "English Wikipedia" is red, please add ref to it
[23:39:51] the ref is non-existent
[23:43:38] ok. I'm afk for ten min.
[23:51:39] Working on related work
[23:52:04] Amir1, when you get back, can you help me find the 2008 paper on wikipedia vandalism?
[23:52:21] Nevermind. Got it.
[23:56:01] ok, back