[03:36:27] 10Scoring-platform-team, 10Wikilabels, 10editquality-modeling, 10artificial-intelligence: Change "yes/no" in damaging_goodfaith form to "damaging/good" and "good-faith/bad-faith" - https://phabricator.wikimedia.org/T171493#3466879 (10He7d3r) See also https://github.com/wiki-ai/wikilabels/issues/71. [03:40:37] 10Scoring-platform-team, 10Wikilabels, 10editquality-modeling, 10artificial-intelligence: Change "yes/no" in damaging_goodfaith form to "damaging/good" and "good-faith/bad-faith" - https://phabricator.wikimedia.org/T171493#3499752 (10Zppix) @He7d3r Closed, thanks [10:11:27] 10Scoring-platform-team, 10Wikilabels, 10User-Zppix: Wikilabels should authenticate on the right wiki - https://phabricator.wikimedia.org/T166472#3500441 (10Tgr) How many ORES users have ever visited Meta and set their preferred user language there? A tiny fraction, I'd guess. [10:14:40] 10Scoring-platform-team, 10User-Zppix: Early Aug 2017 Wikilabels Deployment - https://phabricator.wikimedia.org/T172332#3500446 (10Tgr) [10:14:42] 10Scoring-platform-team, 10Wikilabels, 10User-Zppix: Wikilabels should authenticate on the right wiki - https://phabricator.wikimedia.org/T166472#3500444 (10Tgr) 05Resolved>03declined Changing status to declined which better reflects the outcome. [13:54:35] o/ [13:54:50] * halfak is in the great north of MN today. [13:54:59] Working from my mom's living room. [14:03:19] the great north of MN, in the great north of the US [14:03:23] it's very northy. [14:33:03] harej halfak you can't get any norther than northampton heh (lol) [14:33:09] that's a joke :) [15:05:12] 10Scoring-platform-team, 10Analytics, 10EventBus, 10ORES, and 3 others: Emit revision-score event to EventBus and expose in EventStreams - https://phabricator.wikimedia.org/T167180#3501364 (10Ottomata) Hey yall, I just bike shed revision-score event schema with @halfak for a while, we had some insights. F... [15:05:23] You know, I think Northampton *is* more north than all of MN. [15:05:36] When I think of northness, I usually think about coldness in the winter. [15:05:50] * halfak puts on his -40C badge [15:06:15] Actually, I've seen -60C, but -40 is the coldest it gets in a typical winter here. [15:06:40] Sorry, that was -60F, which is only -51C [15:11:11] Yeah, MN is somewhat surprisingly far south compared to Europe :) [15:19:18] halfak uk [15:19:27] not the one in the us :) [15:19:38] There's one in the US? [15:20:11] yes [15:20:22] halfak most uk towns and cities are in the us [15:20:28] most are right near new york [15:20:33] Thieves! [15:20:44] Quit taking our cities [15:20:50] Wait... probably the other way around [15:21:46] lol [15:21:47] haha [15:21:52] it's the other way around [15:22:23] halfak there's a boston here too [15:22:24] halfak and washington [15:22:25] and new york [15:23:02] https://en.wikipedia.org/wiki/New_Amsterdam [15:26:25] heh [15:27:30] “Old New York was once New Amsterdam… why they changed it I can’t say, maybe they liked it better that way?” :D [15:27:40] apparently I’m only here for comedic relief today, sorry!
[15:28:44] (They Might Be Giants’ “Istanbul (Not Constantinople)” btw) [15:30:32] lol [15:33:47] there's about to be some noise from icinga due to puppet being disabled in labs [15:33:53] for the pending move to the new puppet master [15:34:41] PROBLEM - puppet on ores-redis-01 is WARNING: WARNING: Puppet is currently disabled, message: disabled during transition to new puppet master 2017-08-04, last run 29 minutes ago with 0 failures [15:34:42] PROBLEM - puppet on ores-web-03 is WARNING: WARNING: Puppet is currently disabled, message: disabled during transition to new puppet master 2017-08-04, last run 17 minutes ago with 0 failures [15:35:00] PROBLEM - puppet on ores-worker-05 is WARNING: WARNING: Puppet is currently disabled, message: disabled during transition to new puppet master 2017-08-04, last run 2 minutes ago with 0 failures [15:35:25] PROBLEM - puppet on ores-web-05 is WARNING: WARNING: Puppet is currently disabled, message: disabled during transition to new puppet master 2017-08-04, last run 10 minutes ago with 0 failures [15:36:20] PROBLEM - puppet on ores-redis-02 is WARNING: WARNING: Puppet is currently disabled, message: disabled during transition to new puppet master 2017-08-04, last run 4 minutes ago with 0 failures [15:36:20] PROBLEM - puppet on ores-lb-02 is WARNING: WARNING: Puppet is currently disabled, message: disabled during transition to new puppet master 2017-08-04, last run 12 minutes ago with 0 failures [15:37:02] PROBLEM - puppet on ores-worker-07 is WARNING: WARNING: Puppet is currently disabled, message: disabled during transition to new puppet master 2017-08-04, last run 17 minutes ago with 0 failures [15:37:10] PROBLEM - puppet on ores-worker-06 is WARNING: WARNING: Puppet is currently disabled, message: disabled during transition to new puppet master 2017-08-04, last run 25 minutes ago with 0 failures [15:37:23] PROBLEM - puppet on ores-worker-09 is WARNING: WARNING: Puppet is currently disabled, message: disabled during transition to new puppet master 2017-08-04, last run 30 minutes ago with 0 failures [15:37:32] PROBLEM - puppet on ores-worker-08 is WARNING: WARNING: Puppet is currently disabled, message: disabled during transition to new puppet master 2017-08-04, last run 23 minutes ago with 0 failures [15:37:43] PROBLEM - puppet on ores-worker-10 is WARNING: WARNING: Puppet is currently disabled, message: disabled during transition to new puppet master 2017-08-04, last run 2 minutes ago with 0 failures [15:45:00] 10Scoring-platform-team-Backlog, 10Wikimania-Hackathon-2017: How can I get ORES in my wiki? - https://phabricator.wikimedia.org/T170015#3501454 (10Halfak) [15:45:19] 10Scoring-platform-team-Backlog, 10Wikimania-Hackathon-2017: [Workshop] How can I get ORES in my wiki? - https://phabricator.wikimedia.org/T170015#3416777 (10Halfak) [15:45:41] 10Scoring-platform-team-Backlog, 10Wikimania-Hackathon-2017: [Workshop] How can I get ORES in my wiki? - https://phabricator.wikimedia.org/T170015#3416777 (10Halfak) a:03awight [15:46:18] 10Scoring-platform-team, 10Wikimania-Hackathon-2017: [Workshop] How can I get ORES in my wiki? 
- https://phabricator.wikimedia.org/T170015#3416777 (10Halfak) [15:46:53] 10Scoring-platform-team-Backlog, 10Wikimania-Hackathon-2017, 10Documentation: [Wikimania doc sprint] docs on how to install ORES - https://phabricator.wikimedia.org/T170506#3433863 (10Halfak) [15:48:09] RECOVERY - puppet on ores-lb-02 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [15:48:47] halfak: I’m curious about endorsements. It’s just a comment on a judgment? [15:49:38] awight, right. I think that when we think about this relationally, we don't want to duplicate "judgement" for every user who agrees. [15:49:38] Why doesn’t the second reviewer just provide a fresh judgment, with or without the quantitative data? [15:49:52] And right now, we're spec'ing this relationally [15:50:29] I made a note about this before looking too hard, starts “In what circumstances..." [15:50:46] * halfak got Nettrom [15:50:53] 's joke [15:51:01] But just didn't see it as I was phab'ing some stuff [15:51:11] awight, link handy [15:51:12] ? [15:51:17] https://etherpad.wikimedia.org/p/meta_ores_schema [15:51:30] hehe yeah I believe I lived in New Amsterdam for a minute [15:51:55] RECOVERY - puppet on ores-worker-06 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [15:52:03] RECOVERY - puppet on ores-worker-07 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [15:52:06] RECOVERY - puppet on ores-worker-08 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [15:52:07] So my thought about endorsements is that they’re identical to judgments [15:52:21] RECOVERY - puppet on ores-worker-09 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [15:52:28] & we’ll often want to view all of the comments as a timeline [15:52:34] seems most natural to use one table [15:52:47] RECOVERY - puppet on ores-worker-10 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [15:53:16] we could just have a judgment_id nullable column [15:53:33] also, I was ruminating on the artifact.onwiki_id foreign key if you’re interested. [15:53:44] 10Scoring-platform-team, 10Wikimania-Hackathon-2017: [Workshop] How can I get ORES in my wiki? - https://phabricator.wikimedia.org/T170015#3501478 (10Halfak) [15:53:52] :D [15:54:20] 10Scoring-platform-team, 10Wikimania-Hackathon-2017: [Workshop] How can I get ORES in my wiki? - https://phabricator.wikimedia.org/T170015#3416777 (10Halfak) [15:55:03] tl;dr about onwiki_id, although a graph might be fun, I think the practicality of directly joining to wiki tables will be a win for any real analysis we want to do. [15:55:04] awight lol :) [15:55:20] when i search up news for northampton (snow) it always has to bring up the us one [15:55:21] heh [15:55:42] Small queries are probably * get JaDE by (wiki, type, onwiki_id), and * get recent JaDE judgments [15:56:04] 10Scoring-platform-team-Backlog: Build mid-level WikiProject category training set - https://phabricator.wikimedia.org/T172321#3501480 (10Halfak) [15:56:24] but big queries would be like, intersect articles in wikiproject purview against judgments, etc... 
[15:56:24] 10Scoring-platform-team-Backlog, 10Research Ideas: Create machine-readable version of the WikiProject Directory - https://phabricator.wikimedia.org/T172326#3501482 (10Halfak) [15:56:45] 10Scoring-platform-team-Backlog: Efficient method for mapping a WikiProject template to the WikiProject Directory - https://phabricator.wikimedia.org/T172325#3501486 (10Halfak) [15:57:14] 10Scoring-platform-team-Backlog, 10editquality-modeling, 10revscoring, 10artificial-intelligence: Get signal from adding/removing images - https://phabricator.wikimedia.org/T172049#3501487 (10Halfak) [15:57:29] halfak: Why would “artificial-intelligence” be removed? Is that just a triage tag? [15:57:54] (if so, maybe we should mention that on the tag description page? https://phabricator.wikimedia.org/project/profile/2454/) [15:57:56] Oh! because the parent task already appears in that list [15:58:06] Just cleaning up the AI wishlist [15:58:15] which is what I like to use the AI tag for [15:58:16] RECOVERY - puppet on ores-redis-01 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [15:58:19] * halfak reads scrollback [15:58:22] RECOVERY - puppet on ores-redis-02 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [15:58:23] ah [15:58:47] lol #arty [15:59:13] RECOVERY - puppet on ores-web-03 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [15:59:23] RECOVERY - puppet on ores-web-05 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [15:59:31] RECOVERY - puppet on ores-worker-05 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [15:59:35] I made a tiny tweak to the tag description. [15:59:46] awight, what is this "onwiki_id"? [15:59:52] a bad name [16:00:03] Also, yeah, duplication of data is generally not done in relational representations. [16:00:10] I don't think it helps us to denormalize at this point [16:00:19] ^ re judgement vs. endorsement [16:00:40] but I wanted to distinguish between artifact.id (primary key, autoint) and the entity’s id on wiki [16:01:07] Right... I think that's already distinguished isn't it? [16:02:33] how? [16:02:44] there was no primary id, so I wanted to make it explicit [16:02:51] I don't think I know what you are talking about [16:02:55] What's an onwiki [16:02:58] ? [16:03:10] there’s one id which is autoincrement, primary [16:03:17] that’s the one that we link to from judgment [16:03:43] the onwiki_id is e.g. the page_id or revision_id [16:03:46] What is it though? [16:03:57] Oh! the identifier [16:04:00] That was there wasn't it? [16:04:07] nah, must have gotten deleted [16:04:25] Oh... Hmm... maybe "id" was it. [16:04:58] Wait... no actually that was right. [16:05:13] hehe mande? [16:05:24] I see now. I think we should have 'artifact' be an 'artifact_type' [16:05:30] E.g. "enwiki", "revision" [16:05:47] show me? [16:05:52] The judgement will have an 'artifact_type_id' and an 'artifact_id' [16:06:09] KShit. late to meeting [16:06:10] brb [16:06:44] ah gotcha, that makes sense [16:07:07] rather than a freeform string or exotic enum column [16:08:16] Meanwhile, I’m not sure I follow the denormalization point. denormalizing refers to collapsing endorsement and judgment, I think? [16:08:48] That strikes me as the opposite of duplicating data, though, so there’s something I’m missing. [16:10:16] ha. back. 
Was no big deal [16:10:49] awight, collapsing endorsement and judgement would be denormalization [16:11:08] Because then judgements that contain the same -- er. judgement would have duplicate data in multiple rows. [16:11:33] By having endorsements, we would keep the judgement data in judgement and have the many-to-one relationship in "endorsement" [16:12:10] E.g. if I think this edit is "good" and you think this edit is "good" [16:12:54] It's better to have judgement([1, "good"]), endorsement(["awight", 1], ["halfak", 1]) than ... [16:13:13] endorsement(["awight", "good"], ["halfak", "good"]) [16:13:35] Especially if we'll have other information about a judgement such as its preferred status or something like that [16:14:05] From a user-interface perspective, the user would see the same thing. [16:14:31] because they are logically equivalent [16:19:10] Hey fajne [16:19:26] aha I see where you’re going [16:19:40] I don’t think endorsement should work like that [16:19:59] e.g. I want to be able to comment on someone else’s judgment w/o endorsing [16:20:06] pure critique [16:24:59] I mean, we can try anything, I assume this first iteration will be to show users a proof-of-concept, and we can refine the whole “voting” schema [16:25:39] I think I’m starting to appreciate what you’re suggesting, that people could “+1” and nothing more [16:26:25] but IMO that’s not a vote for anything. and +1’s are sort of useless in the real world, no? [16:27:05] Perhaps we should encourage either * casting your own judgment, or * saying something, which is not quantitative. [16:28:18] Doesn’t feel like we’re duplicating data to have people make the same judgment as someone else. With a good-faith scale, for example, it’s just a boolean value, which doesn’t need normalization. [16:28:50] mmm I’m realizing that people might already want to make judgments on multiple scales at once. [16:29:06] e.g. wikilabels would be use the [damaging, good-faith] scales [16:29:18] s/be/ [16:29:21] / [16:29:23] sigh [16:32:09] awight, commenting on someone else's endorsement is meta-meta-ORES and I don't think our schema supports that either way. [16:32:32] This is a schema that is commonly used for tags. [16:32:57] E.g. you might have 500 people who tag a movie with "horror", but you only store the string "horror" in the database once. [16:33:06] But for every user, it looks like they are adding a new tag. [16:35:03] awight, I think you're thinking about this by the use-case that the data model seems to imply and not the use-cases that the data model affords. [16:36:36] I'm just normalizing, not making an assertion about how people should submit judgements. [16:36:38] :) [16:41:04] For tags like “horror”, that’s an alternative to having an enum. By that argument, we would actually keep quantitative judgment blobs in a separate normalized table, and would reference each unique value from judgments about multiple artifacts. I don’t think either of us like that? [16:41:11] *likes [16:41:42] & in the boolean case, we surely don’t need to normalize [16:41:54] I wouldn’t even normalize if it were a street address [16:42:51] err ignore that last line, it’s not relevant [16:43:29] but the conclusion of this normalization talk is that we have two judgment values true + false and all judgments refer to that…. :( [16:44:19] awight, that's right. We could normalize out a "judgement_value" table.
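A minimal sketch (as a throwaway sqlite3 script) of the normalized shape being discussed above. The table and column names here, artifact_type, artifact, judgment, endorsement and onwiki_id, are assumptions pulled from the conversation, not the actual JaDE schema.

# Rough sketch of the normalized layout halfak describes above: one judgment row,
# many endorsement rows, and artifacts typed via an artifact_type table.
# All names are taken from the chat and are assumptions, not the real JaDE schema.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE artifact_type (
    id   INTEGER PRIMARY KEY,
    wiki TEXT NOT NULL,   -- e.g. 'enwiki'
    type TEXT NOT NULL    -- e.g. 'revision', 'page'
);
CREATE TABLE artifact (
    id               INTEGER PRIMARY KEY,                    -- internal id
    artifact_type_id INTEGER REFERENCES artifact_type (id),
    onwiki_id        INTEGER NOT NULL                        -- e.g. rev_id or page_id on the wiki
);
CREATE TABLE judgment (
    id          INTEGER PRIMARY KEY,
    artifact_id INTEGER REFERENCES artifact (id),
    value       TEXT NOT NULL                                -- e.g. 'good', or a damaging/goodfaith blob
);
CREATE TABLE endorsement (
    judgment_id INTEGER REFERENCES judgment (id),
    user        TEXT NOT NULL
);
""")

# halfak's example: one judgment ("good") endorsed by two users, rather than
# storing the value "good" once per user.
db.execute("INSERT INTO artifact_type VALUES (1, 'enwiki', 'revision')")
db.execute("INSERT INTO artifact VALUES (1, 1, 644933637)")
db.execute("INSERT INTO judgment VALUES (1, 1, 'good')")
db.executemany("INSERT INTO endorsement VALUES (?, ?)",
               [(1, "awight"), (1, "halfak")])

# awight's "small queries" map onto this shape directly: look an artifact up by
# (wiki, type, onwiki_id) and count who endorsed each judgment of it.
row = db.execute("""
    SELECT j.value, COUNT(e.user)
    FROM artifact_type t
    JOIN artifact a ON a.artifact_type_id = t.id
    JOIN judgment j ON j.artifact_id = a.id
    LEFT JOIN endorsement e ON e.judgment_id = j.id
    WHERE t.wiki = 'enwiki' AND t.type = 'revision' AND a.onwiki_id = 644933637
    GROUP BY j.id
""").fetchone()
print(row)  # ('good', 2)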
[16:44:34] In this case, I don't see how that would give us any additional flexibility [16:44:48] yeah I don’t like it [16:45:07] Whereas, my example with “preference” would benefit from flexibility with regards to “judgment” being separate from “endorsement” [16:45:33] Where did “preference” come from? [16:46:56] I’d love to understand endorsement more—my suspicion is that you’re drawing that from real wiki practices, where a vote includes comments on one another’s opinions? [16:47:14] If so, i want to point out that those are probably more often contradicting the vote than endorsing. [16:50:40] lol I found an example of onwiki voting, https://en.wikipedia.org/w/index.php?title=Wikipedia:Requests_for_adminship/Gadget850&diff=prev&oldid=195949771 [16:51:03] biab, walking four- and two- legged animals [16:53:50] lol [16:53:51] kk [16:53:59] lol [16:54:10] what's the two legged animal? [16:54:20] "preference" == "preferred" as in the way that Wikidata will highlight the best of many statements for an item property. [16:55:00] hm [16:55:02] E.g. Chelsea Manning is more a "Female" than a "Male" and that preferred fact was the result of a consensus discussion. [16:55:29] seems like that should be its own judgment object [16:55:38] rather than just randomly picking an existing judgment [16:55:40] trixy [16:55:53] k will think on the jungle gym 8D [16:56:13] paladox: There are two dangerous primates living with me. [16:56:18] lol [16:56:21] a dog [16:56:21] I can’t believe it’s legal to keep them indoors [16:56:27] monkey? [16:56:36] 4 dogs, 2 little humans [16:56:40] = madhouse [16:56:40] 04Error: Command “madhouse” not recognized. Please review and correct what you’ve written. [16:56:41] lol [16:56:45] O_o [16:56:51] AsimovBot: {{done}} [16:56:51] How efficient, awight! [16:57:32] aww asimov doesn’t respond to PMs [16:57:39] lol [16:58:21] * paladox will be going around the uk in a week's time heh [16:58:27] * paladox will have mobile internet :) [16:59:00] like last year i was on it in Scotland's mountains. Now there i lost a lot of mobile signal heh [17:00:44] paladox: That sounds like a hoot! [17:00:54] yep [17:01:52] loooonch! [17:01:56] back in a bit [17:04:53] halfak: when you have a minute, could you tell me how i can aggregate another testing set for, say, enwiki's editquality, consisting only of manually labeled and not defaulted to goodfaith or not damaging edits... The idea is to check the validity of the editquality classifier [17:14:35] fajne, check out a row in the dataset. It should have indications that it was automatically labeled. [17:15:43] Specifically, '"auto_labeled": false' [17:15:53] As opposed to '"auto_labeled": true' [17:15:58] * halfak|Lunch goes back to food [17:23:31] 10Scoring-platform-team, 10Analytics, 10EventBus, 10ORES, and 3 others: Emit revision-score event to EventBus and expose in EventStreams - https://phabricator.wikimedia.org/T167180#3501748 (10mobrovac) Hm, ok, so we are back at discussing work-arounds for not being able to use `oneOf` and friends. I am str... [17:25:04] 10Scoring-platform-team, 10Analytics, 10EventBus, 10ORES, and 3 others: Emit revision-score event to EventBus and expose in EventStreams - https://phabricator.wikimedia.org/T167180#3501750 (10Pchelolo) > Hm, ok, so we are back at discussing work-arounds for not being able to use oneOf and friends. Why exa...
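For fajne's question above, a short sketch of the filtering halfak suggests: keep only the rows where "auto_labeled" is false in the labeled_revisions JSON-lines file. The file name matches the one quoted elsewhere in the log; everything else is illustrative.

# Minimal sketch: pull the truly human-labeled rows out of a labeled_revisions
# JSON-lines file, following halfak's "auto_labeled": false suggestion above.
import json

human_labeled = []
with open("enwiki.labeled_revisions.20k_2015.json") as f:
    for line in f:
        obs = json.loads(line)
        if not obs.get("auto_labeled", False):
            human_labeled.append(obs)

print(len(human_labeled), "observations were labeled by humans")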
[17:30:27] 10Scoring-platform-team, 10Analytics, 10EventBus, 10ORES, and 3 others: Emit revision-score event to EventBus and expose in EventStreams - https://phabricator.wikimedia.org/T167180#3501754 (10Ottomata) This goes beyond what we can validate in jsonschema. If a field can have multiple types, we can't easily... [17:41:30] 10Scoring-platform-team, 10Analytics, 10EventBus, 10ORES, and 3 others: Emit revision-score event to EventBus and expose in EventStreams - https://phabricator.wikimedia.org/T167180#3501806 (10He7d3r) [17:54:20] o/ [18:13:59] halfak: this is a line from relevant_label_docs (source: enwiki.human_labeled_revisions.20k_2015.json): {'data': {'damaging': False, 'goodfaith': True, 'automatic': 'advanced-rights', 'unsure': True}, 'timestamp': 1431793049.98557, 'user_id': 41948920} Why is 'unsure' true? [18:14:37] Oh! Unsure is true because there's an 'automatic' note in there. [18:14:54] what? [18:14:57] If you look at "labeled_revisions", you'll see that normalized to "auto_labeled": true [18:16:08] {"damaging": false, "rev_id": 644933637, "auto_labeled": false, "goodfaith": true, "autolabel": {}} [18:17:46] halfak: so, what is 'unsure'? i thought it was this checkbox in wiki labels... [18:18:08] It is. When auto-labeling, we assume "unsure" for the resulting dataset [18:18:23] Arguably, automatically assuming labels is a very "unsure" process. [18:18:36] )) [18:18:37] In this case, we ran a bot against wikilabels to label a bunch of edits. [18:18:45] So we were literally submitting the form. [18:19:21] why is the file called "human_labeled_revisions" i wonder [18:29:15] halfak: Can’t fajne just use the human-labeled data set, pre-autolabeling? [18:29:36] maybe I misunderstood the question [18:31:08] Adam, i think i am using it [18:31:30] cool—it should be clear from the Makefile [18:31:36] isn't it enwiki.human_labeled_revisions.20k_2015.json ? [18:31:49] that sounds right from the title—lemme check the Makefile [18:32:32] https://github.com/wiki-ai/editquality/blob/master/Makefile#L508 [18:32:33] yup [18:33:15] so, another problem is that even human labeled revisions have some autolabeling in it [18:33:45] for example, when we assume goodfaith if there is no label in place [18:34:18] oh really! How does that work? [18:34:18] I think I saw some autolabeled things being fed into WikiLabels to be human-labeled [18:34:43] not getting it [18:34:46] Unfortunately, that’s the one step that isn’t represented in the Makefile [18:36:12] K sounds like halfak’s answer is what you were looking for, then…. I’ll just note that we should be explicit about what data gets used to create the Wiki Labels campaigns. [18:36:17] you saw autolabeled things being relabeled by humans? [18:36:44] awight, this is one of the few cases where this is not explicit [18:36:52] because we didn't have an autolabeler script back then [18:36:56] And the original sample was lost. [18:37:07] So we just draw the sample directly from wikilabels. [18:37:34] I welcome any work on wikilabels campaign management [18:38:02] Right now, it is done manually and there is not a formal log [18:38:38] If RoanKattouw is able to pick that up next quarter, it would be rad if the upload were an idempotent step in the Makefile. [18:38:42] i feel completely lost in this labeling-relabeling-lost-samples thing [18:39:12] I’m starting to tread water :D. but it would be nice to document on wiki [18:40:26] halfak: I’ll do some scribbling about what I think is happening with autolabeling, unless there’s already a page somewhere?
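A guess at the normalization halfak describes above, for illustration only: the old human_labeled_revisions rows carry an 'automatic' note (plus 'unsure'), and labeled_revisions exposes that as a plain auto_labeled boolean. This is not the editquality code itself, just a sketch of the mapping.

# Rough illustration of the normalization described above: an 'automatic' note
# (e.g. "advanced-rights") means a bot submitted the Wikilabels form, so the
# normalized row gets auto_labeled = True.  Field handling is an assumption.

def normalize(observation):
    data = observation["data"]
    return {
        "rev_id": observation.get("rev_id"),
        "damaging": data["damaging"],
        "goodfaith": data["goodfaith"],
        "auto_labeled": bool(data.get("automatic")),
    }

old_row = {"data": {"damaging": False, "goodfaith": True,
                    "automatic": "advanced-rights", "unsure": True},
           "timestamp": 1431793049.98557, "user_id": 41948920}
print(normalize(old_row))  # auto_labeled comes out True for this row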
[18:41:21] awight: please! [18:41:41] fajne, I don't understand what's missing? [18:42:07] I think the best documentation for autolabeling is the doc string and the Makefile. [18:43:03] fajne, as I said earlier, the human labeled dataset for enwiki (and a few other early wikis) includes autolabeling because we used a bot to label them in Wikilabels. [18:43:17] Now we just don't load up the observations that don't need human labels. [18:43:57] If you arrive at "labeled_revisions", everything will be normalized along the way. [18:45:30] Maybe the makefile would be a better place to document this. [18:45:50] halfak: Thanks for pointing me at the docstring, it’s a good description of what the tool does, but not why nor the context. [18:46:31] Plus, a wiki page could include images to illustrate the processes [18:47:17] I think this is the docstring? https://github.com/wiki-ai/editquality/blob/master/editquality/utilities/autolabel.py [18:47:39] the docstring of what? fetch_labels? [18:47:54] oh, ok [18:49:28] * halfak needs to not talk about technical stuff while reviewing a terrible paper with resistant authors. [18:49:38] My grouchiness is bleeding out [18:50:37] loving docopt so far. [18:51:48] I want to add standard deviation (or better stat for disagreement) to the output from fetch_labels.aggregate_labels. [18:52:23] awight: ?? [18:53:18] fajne_: we summarize labels by taking their mean, but another statistic we would want to analyze and use later is the disagreement within labelers. [18:53:48] awight: approx how many labelers per label btw? [18:54:17] halfak: This can totally start as an expanded introduction to the Makefile. [18:54:30] codezee: I think 1 right now :D. good point, maybe YAGNI yet. [18:56:39] I don’t know what the ideal number of labelers is, if we could afford the labor, but I would imagine it should be enough opinions to get confidence that your σ reflects agreement. [18:57:04] codezee, awight, yeah, we can hardly finish labeling campaigns with one label per edit as it is. [18:57:13] Even with our aggressive autolabeling [18:57:30] So what I want to do is get re-labels for anything that looks sketchy. [18:57:38] E.g. what fajne examined in her analysis [18:57:50] What do we know about the number of reverts caught by autolabeling? [18:57:59] awight, what? [18:58:26] Of all the edits which are later reverted, what proportion are we catching with our revert search window? [18:59:26] what is a revert search window? [19:01:58] sorry to derail the other conversations. I guess I’m talking about the mwreverts api called from https://github.com/wiki-ai/editquality/blob/master/editquality/utilities/autolabel.py#L219 [19:03:02] halfak: with regard to what you just said about the autolabeling campaign and my previous question: so there is no way to aggregate true human labeled edits? bc some of the “human” labeled are in fact bot labeled, and they are impossible to filter out? I am sorry that i am so confused now.. [19:03:24] fajne, please follow the instructions I gave earlier. [19:03:36] Use the labeled_revisions file and look for "auto_labeled": false [19:04:58] ok [19:05:55] https://etherpad.wikimedia.org/p/autolabeling [19:06:24] fajne_: cat enwiki.human_labeled_revisions.20k_2015.json|grep '"auto_labeled": false'>>human_labels [19:07:59] codezee: righteous!
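On awight's 18:51 note about adding a disagreement statistic to the output of fetch_labels.aggregate_labels: the real function's interface isn't shown in the log, so this sketch only demonstrates the statistic on hypothetical per-revision labels.

# Sketch of the "disagreement" statistic discussed above, computed alongside the
# mean.  The per-revision labels here are hypothetical; aggregate_labels itself
# is not reproduced.
from statistics import mean, pstdev

labels_by_rev = {               # hypothetical: several labelers per revision
    644933637: [True, True, False],
    644933638: [False, False, False],
}

for rev_id, labels in labels_by_rev.items():
    votes = [int(label) for label in labels]
    print(rev_id, "mean:", round(mean(votes), 2),
          "stdev:", round(pstdev(votes), 2))   # 0.0 means full agreement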
“>>” might be dangerous for the record though cos not idempotent [19:08:10] mebbe “>” [19:08:10] codezee: aaron said labeled_revisions, not human_labeled [19:08:25] +1 [19:09:05] awight: yes, slipped there :P [19:15:33] Sorry I have more questions than answers in that etherpad. [19:17:54] awight: btw comparing recent reverted edits with their ORES score might provide a good degree of implicit feedback to the models if used, which I think doesn't happen now? [19:19:02] cool—how do we harness implicit feedback? Maybe a secondary set of models which employ backpropagation? [19:19:24] that’s a rad idea. [19:19:26] like if yesterday some edit is reverted, but its corresponding ORES score was indicating "not damaging", we could learn where it made a mistake by aggregating such cases which shouldn't be hard [19:19:39] write a ticket! [19:20:04] codezee, ores affects people's judgement [19:20:08] learning exactly what the mistake was sounds tricky [19:20:23] halfak: the feedback loop would run in both directions! [19:20:23] I'd be much more accepting of using good examples based on people's behaviors than bad. [19:20:43] But manually examining "good"/reverted "bad"/not-reverted would be insightful at least [19:20:57] hmm, trusted users’ edits would shape the model in a weird way [19:21:14] If you sound really officious, you make good edits :) [19:21:24] i'm not so sure of current stats but it might be worth looking into "how much" the users are using ORES scores to base their decisions? [19:21:50] awight, agreed that they would [19:21:50] like all ores damaging are reverted, or are there additional ones which are reverted which ORES misses [19:23:16] i thought i examined this manually... [19:23:43] fajne_: can you tell the gist of your observations? [19:24:06] if you mean this $ cat enwiki.labeled_revisions.20k_2015.json | grep '"damaging": false' | grep '"reverted_for_damage": true' | json2tsv rev_id | sed -r "s#([0-9]+)#https://en.wikipedia.org/wiki/?diff=\0 (Reverted, Not Damaging)#" | shuf -n 50 [19:24:32] What’s it called to overfit just a subset of your data? [19:24:52] or you wanted to compare reverted model against goodfaith model? [19:25:22] oh sh*t, fajne_ just upped the cmdline ante [19:26:12] * awight puts the sed expression in a jacket pocket [19:26:29] this is halfak's cmdline [19:26:42] nice one [19:27:57] codezee: my observations here https://meta.wikimedia.org/wiki/Research_talk:Automated_classification_of_edit_quality/Work_log/2017-07-24#Labels.27_validity_test the very last paragraph [19:28:36] thanks! [19:28:48] and the link in the last line is the etherpad with all 50 edits [19:28:57] described [19:31:50] halfak: if I watch https://meta.wikimedia.org/wiki/Research_talk:Automated_classification_of_edit_quality will I get notified of new work logs? [19:32:05] Regretfully not. [19:32:11] argh! [19:32:15] Not unless one posts on the talk page to talk about them [19:32:17] yeah :( [19:32:19] /phabricator add [19:32:24] Can you watch all sub-pages? [19:32:40] I’m dreaming in RSS [19:32:54] https://www.mediawiki.org/wiki/Extension:WatchSubpages [19:34:30] D: https://meta.wikimedia.org/wiki/Special:Version [19:37:39] To be clear, I did see fajne_’s through and entertainingly-written blog last week, so announcing it did work. I’m just trying to click “subscribe” here. [19:40:04] *thorough [19:52:46] do we even use work logs anymore? i rarely see any done [20:10:13] Zppix: I’ve been seeing them (and wrote a couple) on that Research_talk: page above.
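A sketch of codezee's idea above: line up edits that were later reverted against the damaging score ORES gave them, and pull out the ones ORES scored as "not damaging". The input tuples and the 0.5 threshold are hypothetical; fetching real scores and revert data is left out.

# Hypothetical (rev_id, was_reverted, damaging_probability) tuples stand in for
# real ORES scores and revert data; the goal is just to show the aggregation.

THRESHOLD = 0.5   # assumed cutoff for "ORES thinks this is damaging"

scored_edits = [
    (644933637, True,  0.12),   # reverted, but ORES scored it as unlikely damaging
    (644933638, True,  0.91),
    (644933639, False, 0.08),
]

misses = [(rev_id, p) for rev_id, reverted, p in scored_edits
          if reverted and p < THRESHOLD]

for rev_id, p in misses:
    print(f"https://en.wikipedia.org/wiki/?diff={rev_id} "
          f"(reverted, ORES damaging={p:.2f})")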
It’s a fun practice, if a little clunky! [20:11:12] * halfak used to write them all the time, but now I'm doing a ton of management and writing so I've been out of the work log practice for a little while [20:11:21] Nettrom's been doing them for a while [20:18:08] 10Scoring-platform-team, 10Analytics, 10EventBus, 10ORES, and 3 others: Emit revision-score event to EventBus and expose in EventStreams - https://phabricator.wikimedia.org/T167180#3502152 (10Ottomata) Talked with Marko and Petr, we agreed to do Option A. @Halfak, let me know when you have the 'model-spec... [20:24:08] ah ok [20:37:41] 10Scoring-platform-team-Backlog, 10ORES, 10revscoring, 10artificial-intelligence: Include label-specific schemas with model_info - https://phabricator.wikimedia.org/T172566#3502272 (10Halfak) [20:37:57] 10Scoring-platform-team, 10Analytics, 10EventBus, 10ORES, and 3 others: Emit revision-score event to EventBus and expose in EventStreams - https://phabricator.wikimedia.org/T167180#3502287 (10Halfak) OK! {T172566} [20:38:06] 10Scoring-platform-team-Backlog, 10ORES, 10revscoring, 10artificial-intelligence: Include label-specific schemas with model_info - https://phabricator.wikimedia.org/T172566#3502272 (10Halfak) [20:38:08] 10Scoring-platform-team, 10Analytics, 10EventBus, 10ORES, and 3 others: Emit revision-score event to EventBus and expose in EventStreams - https://phabricator.wikimedia.org/T167180#3502289 (10Halfak) [20:38:58] Props to codezee for remembering that :)