[13:33:46] o/
[13:33:48] I'm back
[13:33:54] Working on email :|
[13:43:07] Scoring-platform-team, Bad-Words-Detection-System, revscoring, artificial-intelligence: Add language support for galician - https://phabricator.wikimedia.org/T201142 (Halfak) @Ladsgroup, can you run the BWDS script on Galician?
[13:51:19] Scoring-platform-team, Wikilabels, articlequality-modeling, artificial-intelligence: Build article quality model for Galician Wikipedia - https://phabricator.wikimedia.org/T201146 (Halfak) Ahh yes. It looks like we'll need to sample and generate a labeled dataset like we did for euwiki. I've...
[14:07:51] Scoring-platform-team (Current), DBA, JADE, Operations, TechCom-RFC: Introduce a new namespace for collaborative judgments about wiki entities - https://phabricator.wikimedia.org/T200297 (Halfak) I think that querying by within-judgement content should be very limited (and probably within the...
[14:13:27] Scoring-platform-team, ORES, Services (designing): Merge ORES precaching with ORESFetchScoreJob - https://phabricator.wikimedia.org/T201868 (Halfak) Keying on page title doesn't work because we store scores for revisions historically. Thus revision IDs are necessary. Also, it is important to note t...
[14:55:43] Scoring-platform-team, ORES, Operations, vm-requests: Site: 4 VM request for ORES poolcounter - https://phabricator.wikimedia.org/T203465 (akosiaris)
[14:55:55] Scoring-platform-team, ORES, Operations, vm-requests: Site: 4 VM request for ORES poolcounter - https://phabricator.wikimedia.org/T203465 (akosiaris) p:Triage>Normal
[15:11:47] Scoring-platform-team, articlequality-modeling, artificial-intelligence: Update monthly article quality datasets - https://phabricator.wikimedia.org/T203468 (Halfak)
[15:18:49] halfak: welcome back!
[15:19:04] Thanks.
Pounding through back emails
[15:21:23] halfak: I want ORES for CI: https://www.openstack.org/summit/berlin-2018/summit-schedule/events/22388/reduce-your-log-noise-using-machine-learning
[15:21:36] :)
[15:21:44] haha. Certainly possible!
[15:22:23] If you can formalize the problem, I can advise on how to proceed :D
[15:22:42] ORES could be a good platform for this.
[15:41:01] One formalization that could work is "given a log line, how likely is it to be *interesting*?"
[15:52:26] Uh oh. I need to update tech-management on what we have been up to.
[15:52:43] harej, Amir1: What's the status of things?
[15:52:50] * halfak prepares to take notes.
[15:53:45] halfak: hmm, the pool counter is almost there, tested and works on labs, akosiaris is making vms for prod now, the puppet patches in prod also deployed
[15:53:46] That's a very general question. I've been working generally on AI Community and specifically on finding people to talk to about JADE. Trying to use my usual strategies to find people. In retrospect I should've written down questions I had for you.
[15:54:03] Amir1, can you link me to the task?
[15:54:17] We had an RFC meeting about the JADE concept and Adam is considering different implementation options. I'm not sure there was a definitive outcome of that meeting, but it gave us some ideas.
[15:54:51] the change itself is not deployed yet due to being blocked on this (RelEng): https://phabricator.wikimedia.org/T203246
[15:55:07] the main ticket is: https://phabricator.wikimedia.org/T160692
[15:55:19] git lfs for models is already deployed in prod
[15:55:34] wp10 models are being renamed to articlequality
[15:56:56] harej, can you link me to the RFC task?
[15:57:10] Amir1, tasks for git lfs and the wp10 rename?
[15:57:17] Thanks for helping me get this together :)
[15:58:11] halfak: https://phabricator.wikimedia.org/T200297
[15:58:33] wp10 https://phabricator.wikimedia.org/T196240
[15:58:51] LFS https://phabricator.wikimedia.org/T197097
[16:00:26] Awesome
[16:02:16] halfak: for JADE deployment strategy I'm thinking of what initial wikis I want to work with, as well as different workflows that could be integrated. Later I'll be talking with Joe M who's done lots of research on anti-vandalism tools; I'm thinking that will be one of our earlier focuses since it's probably the bigger use case for ORES.
[16:02:50] harej, I'd like to talk to Petr (Huggle) too
[16:02:57] Seems interested in working with us.
[16:02:59] See #huggle
[16:04:57] Yes, working with him is a high priority for me
[16:06:23] Great. Is there a signup list or anything for the people you have recruited for the focus group so far?
[16:07:23] harej, ^
[16:12:29] There's a spreadsheet. Not so much people who have actively signed up so much as people I want to talk to.
[16:12:46] It's still under development. https://docs.google.com/spreadsheets/d/1rWWUdBEasPzbbVch18rvjjW5APlfdy9MKFOoeE-c56Q
[16:16:35] halfak: do we have a list of communities that have hand-written equivalents to JADE?
[16:17:46] So far I've found Finnish Wikipedia and Wikidata; I know Italian Wikipedia does this too but I don't know where their page is
[16:17:46] https://www.wikidata.org/wiki/Q29348528
[16:17:50] Needs expansion
[16:18:32] One idea I've been thinking of is having an ORES page on every wiki that has ORES. We should continue using MW.org for documentation but each wiki should have at least a stub page saying "by the way, this technology is used here."
[16:18:54] Seems to be up to that community, but I like the idea of providing a template.
[16:20:13] https://fi.wikipedia.org/wiki/Wikipedia:ORES looks pretty great
[16:20:41] How much interaction have you had with the Finnish community?
[16:21:05] A substantial amount.
Worked with some folks in Austria at the hackathon there.
[16:21:52] Ohhhhhh.
[16:22:03] Well, that explains everything :)
[16:32:42] o/ awight
[16:32:46] * halfak is in meeting mode.
[16:32:52] Just wanted to say, "Hi"
[16:32:54] :)
[16:33:09] halfak: this is my spreadsheet where I keep track of what wikis have what features, as a way to narrow down candidates for initial deployment https://docs.google.com/spreadsheets/d/1WH-srlQulQMT_5BHq4BTfiYoQ1k4mZqTarjFe08V4zw/edit#gid=0
[16:33:26] deployment is being thought of in terms of wikis *plus* integrations
[16:33:37] or, wikis *times* integration?
[16:35:23] halfak: welcome back!
[17:02:28] halfak: I wanted to get your input on a schema change, when you're out of meeting mode.
[17:03:09] I am but I'm just about to run to lunch
[17:03:21] Eh. Let's see it :)
[17:04:07] awight, ^
[17:04:47] sorry, notifications still not dialed in
[17:04:48] pasting
[17:04:58] https://phabricator.wikimedia.org/P7511
[17:05:08] This is mostly about the duplication of judgment.notes
[17:05:28] I got overly annoyed comparing our UI prototype against the normalized schema
[17:06:10] so want to change it to a list of heterogeneous judgments, rather than a list per-schema
[17:06:12] IM
[17:06:29] IMO it simplifies the structure by a lot.
[17:06:42] Seems confusing.
[17:07:01] Why is heterogeneity good?
[17:07:15] Or what does homogeneity complicate?
[17:07:24] that wasn't the goal, just a description of what I did.
[17:07:27] the goal is
[17:07:39] Oh I see.
[17:07:40] to make the data easy to deal with in the common use cases
[17:07:47] yah sorry iz confusing over text
[17:08:12] So, in our tool a user can submit a damaging judgment and a good-faith judgment at the same time, with one notes field.
[17:08:12] Maybe it would be easier to observe two full schemas side-by-side
[17:08:24] schema or example application of the schema.
[17:09:13] awight, seems like the dual submission should be possible regardless, shouldn't it?
[17:09:21] Even if it takes two edits.
[17:09:26] I think that's how wikidata works.
[17:09:42] E.g. if you provide multiple aliases in different languages and click "save" once it makes multiple edits.
[17:09:55] /o\
[17:10:00] Doesn't it make the logic extra complicated if the UI and the data have totally different shapes?
[17:10:29] so with the judgment.notes field, we're causing a serious car wreck if we have two judgments (damaging and goodfaith) with duplicated notes.
[17:10:39] standard DRY issues
[17:10:41] awight, I'm not sure that is the case.
[17:11:03] But one option would be to have an "editquality" schema with "damaging" and "goodfaith" fields.
[17:11:10] If it makes more sense to have separate notes for each judgment, then the original schema is fine and we need to tweak the UI
[17:11:15] So that you have one "notes" field for the combined judgement.
[17:11:27] It'd be weird for anyone who doesn't want to judge both at the same time though.
[17:11:37] u can leave one empty
[17:11:42] Fair point.
[17:11:51] I wouldn't mind enforcing this though.
[17:11:56] so the original API is still possible
[17:12:29] One thing I hadn't considered is that we might still want to group certain data within judgment of an entity type.
[17:13:12] I was imagining, any judgment of a Diff is going to be interrelated and we might as well make it a single judgment, but your point above about "editquality" makes me question that.
[17:13:40] awight, edittype and editquality are not really interrelated.
[17:14:18] Vandalism that adds a new factual statement: "Abraham Lincoln is gay."
[17:14:21] halfak: that's helpful, thanks. So we would have separate workflows for making those judgments.
[17:14:48] awight, when I imagine the wikidata UI, I see them as a dynamic document with different fields.
[17:14:55] Not sure if that lines up with the current mockup.
[17:15:33] If we wanted a dynamic document with different fields then that would be a case for a flat array of judgment objects
[17:15:37] Since that's what it looks like in the UI
[17:15:50] harej, I don't think that's what it looks like in wikidata?
[17:16:03] Unless I am misunderstanding what you mean by "flat array"
[17:16:20] I see a mapping with lists under each key.
[17:16:31] Homogeneous within each key
[17:16:38] Heterogeneous based on key.
[17:16:38] Hmm, I think that's right
[17:16:51] Yeah, the property number is the key.
[17:17:05] Right. So in our case, a schema name would be a key
[17:17:16] And you'd have a list of judgments beneath each key.
[17:17:29] On a slightly different thread, it seems like a flat list of judgments has some nice properties for handling. For example, section editing and merging becomes easy.
[17:17:55] awight, I don't think we're talking about the same thing.
[17:18:07] Seems section editing would work better if we group by schema name.
[17:18:07] halfak: so, i'm in the UI, and I mark an edit as non-damaging/good faith, and write one comment to describe my decision. How does that get saved?
[17:18:16] Maybe "flat array" == "group by schema name"?
[17:19:02] harej, I think we have options. One I suggested was that "editquality" is a schema with "damaging" and "goodfaith" fields and one "notes" field.
[17:19:21] another would be that "damaging" and "goodfaith" are their own schemas with independent notes fields.
[17:19:43] And that a tool dev would decide whether a user was allowed to provide separate notes or not.
[17:19:57] halfak: By flat list I meant the schema I proposed today. Section editing doesn't work if a person wants to change both the damaging and goodfaith data. It does work well with the in-between schema you proposed a minute ago.
[17:19:58] You might imagine two text fields -- one for "damaging" and another for "goodfaith"
[17:20:19] awight, aha! I see.
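The three shapes being compared up to this point can be sketched roughly as follows. This is only an illustration of the discussion, not the contents of P7511: every field name ("schema", "data", "notes") and every helper function is hypothetical, and the "flat list" here is just one possible reading of that proposal.

```python
# All field names and helpers below are hypothetical; the real JADE
# schema (see P7511) may look quite different.

# Shape 1: mapping keyed by schema name, with a homogeneous list of
# judgments under each key. Judging both at once duplicates the notes.
grouped = {
    "damaging": [{"data": False, "notes": "Fixes a typo."}],
    "goodfaith": [{"data": True, "notes": "Fixes a typo."}],
}


def flatten(grouped_judgments):
    """Shape 2: one flat, heterogeneous list; each judgment names its schema."""
    return [
        {"schema": schema, **judgment}
        for schema, judgments in grouped_judgments.items()
        for judgment in judgments
    ]


def regroup(flat_judgments):
    """Inverse of flatten(): rebuild the mapping keyed by schema name."""
    out = {}
    for judgment in flat_judgments:
        rest = {k: v for k, v in judgment.items() if k != "schema"}
        out.setdefault(judgment["schema"], []).append(rest)
    return out


def combine_editquality(damaging, goodfaith, notes):
    """Shape 3: a single "editquality" judgment with one shared notes field."""
    return {
        "editquality": [
            {"data": {"damaging": damaging, "goodfaith": goodfaith}, "notes": notes}
        ]
    }


flat = flatten(grouped)
combined = combine_editquality(False, True, "Fixes a typo.")
```

Shapes 1 and 2 carry the same information (the round trip through flatten() and regroup() is lossless), so the choice between them is about ease of handling. Only shape 3 removes the duplicated notes, at the cost of tying the two fields into one judgment.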
[17:20:42] I don't think adding two notes fields to our proof-of-concept UI is a very good idea.
[17:20:43] So, section editing may not apply nicely to structured JSON. I don't think we have that.
[17:21:03] "Editquality" seems like a good option.
[17:21:04] +1 ^ not a requirement, just a nice property
[17:21:11] But you might imagine the dynamic document listing "damaging" and "goodfaith" far apart making it difficult to work with them at the same time.
[17:21:42] harej, when I label things, very often I'll make a note about "goodfaith" that doesn't really apply to "damaging"
[17:21:49] Grouping by something larger than model but smaller than entity type does seem like a good alternative, but it also seems like the most complex data structure.
[17:22:02] But on the other hand, I feel like if I am looking at both at the same time, I can make that note make sense.
[17:22:20] awight, it would match what we do in wikilabels.
[17:22:31] I think it really just moves our complexity from one level to another.
[17:22:55] Doesn't exactly add something. But it does add a logical grouping which might be nice for other judgements.
[17:23:06] We did make schemas arbitrary JSON for a good reason.
[17:23:17] halfak: I agree with your train of thought here, I like the idea that damaging and goodfaith judgments have distinct rationales, but it seems unpleasant to expose that in the UI.
[17:23:19] Are there other neat parallels where 2+ features pair up and can be described in a combined way? Like how editquality = damaging + goodfaith?
[17:23:42] harej, good question.
[17:23:45] * halfak thinks.
[17:24:01] we should have a name for that conceptual granularity, too :-)
[17:24:14] On the subject of notes, are we planning on doing any sentiment analysis or similar machine analysis? This would determine whether it is important for each note field to parallel each feature or if notes are just a convenience for humans.
[17:24:21] are there other modeling repos where we have multiple types of models?
[17:24:48] articlequality(1), edittypes(1), draftquality(1), editquality(2)
[17:25:01] page_level / articlequality / itemquality ?
[17:25:21] Hmm. Fair point, but they won't overlap.
[17:25:34] I'm wondering if we should have renamed wp10 to "contentquality" :|
[17:25:38] on that note, I've been thinking we should rename all of those articlequality
[17:25:41] ^ ah yeah that.
[17:25:59] halfak: don't let perfect be the enemy of the good!
[17:26:09] (we can make that breaking change later ;)
[17:26:20] lol Maybe we should just break it once instead of twice :P
[17:26:23] OK. I have to go grab food. Starting to feel icky.
[17:26:29] godspeed!
[17:27:09] I still don't think I understand the "flat array" proposal. But I do like the "editquality" group proposal and it seems we all see the value in that. I'll think more while I track down food.
[17:28:22] Simple idea: "models" and "model groups"
[17:28:25] editquality is a model group
[17:28:33] damaging and goodfaith are the individual models
[17:31:16] sold.
[17:33:20] So, will the API allow submission of all data values in a model group plus notes as one transaction, and those will be stored together in the schema?
[17:33:50] But data for separate model groups requires different APIs or at least separate API calls.
[17:44:58] Scoring-platform-team (Current), MediaWiki-extensions-ORES, ORES-Support-Checklist, Patch-For-Review, User-Ladsgroup: Change mentions of wp10 to articlequality in products - https://phabricator.wikimedia.org/T203080 (Catrope) Wouldn't articlequality need to be switched on before your patch is...
[18:30:01] o/
[18:30:37] awight, harej: JADE != Models
[18:30:52] JADE can contain judgments we have no intention of ever modeling.
[18:31:28] The concept of a model should be external to JADE
[18:40:27] Makes sense, so "model group" would be an ORES concept if anything.
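A minimal sketch of the "models" and "model groups" idea as an ORES-side lookup, kept outside of JADE as suggested above. The group names and counts follow the list given earlier (articlequality(1), edittypes(1), draftquality(1), editquality(2)); the single-model names, the MODEL_GROUPS table, and group_of() are assumptions made for illustration and are not an existing ORES API.

```python
# Hypothetical ORES-side table; not part of JADE and not an existing ORES API.
# Group names come from the conversation; the single-model names are assumed.
MODEL_GROUPS = {
    "editquality": ["damaging", "goodfaith"],
    "articlequality": ["articlequality"],
    "draftquality": ["draftquality"],
    "edittypes": ["edittype"],
}


def group_of(model_name):
    """Return the model group that contains model_name, or None.

    None is a normal result: JADE can hold judgments that were never
    meant to be modeled, so they belong to no model group.
    """
    for group, models in MODEL_GROUPS.items():
        if model_name in models:
            return group
    return None


damaging_group = group_of("damaging")
unmodeled_group = group_of("some-judgment-we-never-model")
```

Keeping the table on the ORES side means JADE would only need schema names, and tools that care about models could map those names to groups themselves.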
[18:54:22] (CR) Catrope: "recheck" [extensions/ORES] - https://gerrit.wikimedia.org/r/433956 (owner: Catrope)
[18:54:24] awight, right. I think so.
[18:55:23] I should mention that I'm interested in getting this schema right, but it's actually secondary to what I need to do at the moment.
[18:55:32] I'll explain in case it makes the conversation easier.
[18:56:53] I want to iterate on feedback from the RFC IRC session, by implementing the tables and indexes which would be used for joins and e.g. RC pager queries.
[18:57:15] I'll also bring the API in line, and implement the hooks to maintain these indexes.
[18:58:52] (CR) jerkins-bot: [V: -1] Add special page with model statistics [extensions/ORES] - https://gerrit.wikimedia.org/r/433956 (owner: Catrope)
[18:58:58] awight, when you say "Joins for RC pages", what do you have in mind?
[18:59:16] (PS14) Catrope: Add special page with model statistics [extensions/ORES] - https://gerrit.wikimedia.org/r/433956
[18:59:22] showing "3 jade judgments" in the history or RC feed
[19:03:41] halfak: that is certainly a good point but it means we need a parallel concept for JADE. “Concept against which I’m manually scoring.”
[19:31:25] test
[19:50:11] harej: FYI, halfak and I just talked about the content schema a bit and he had some useful things to add. I'm going to try to put those in writing, and then I'd like to expand the first two lines in our use cases doc to get much more detailed, and see how various schemas will support that.
[19:50:44] I think our focus when writing that was on the other, secondary schema concerns FWIW, so we didn't cover cases like "what I do if I disagree with a judgment"
[19:51:11] (CR) Legoktm: Add special page with model statistics (1 comment) [extensions/ORES] - https://gerrit.wikimedia.org/r/433956 (owner: Catrope)
[19:57:41] (CR) Acamicamacaraca: [C: 1] Add special page with model statistics [extensions/ORES] - https://gerrit.wikimedia.org/r/433956 (owner: Catrope)
[20:10:00] Scoring-platform-team, ORES, Documentation: Feedback on ORES threshold optimization docs - https://phabricator.wikimedia.org/T203505 (awight)
[20:54:06] (CR) Ladsgroup: [C: -1] "Except the ORESServices thingy, everything else is nitpick and optional." (5 comments) [extensions/ORES] - https://gerrit.wikimedia.org/r/433956 (owner: Catrope)
[21:06:04] I'm heading out. Have a good evening, folks!
[21:08:27] o/
[21:08:55] o//
[21:09:08] I have to stay for a little longer for a SWAT window :|
[21:10:01] always a good time ;-)
[21:11:39] not really, at 01:00 my time I'm better watching a CSI episode in bed :P
[21:12:10] we can make a CSI cyber remake investigating issues in ORES, Jade and Puppet - lol
[21:14:08] It would be much less interesting.
[21:17:10] make ORES make judgements about the characters!
[21:17:22] "likely a vandal"
[21:18:38] I'm alarmed at where this is headed :-)
[23:53:41] (CR) Catrope: Add special page with model statistics (3 comments) [extensions/ORES] - https://gerrit.wikimedia.org/r/433956 (owner: Catrope)
[23:53:50] (PS15) Catrope: Add special page with model statistics [extensions/ORES] - https://gerrit.wikimedia.org/r/433956
[23:59:50] (CR) Legoktm: Add special page with model statistics (1 comment) [extensions/ORES] - https://gerrit.wikimedia.org/r/433956 (owner: Catrope)
[23:59:55] (PS16) Catrope: Add special page with model statistics [extensions/ORES] - https://gerrit.wikimedia.org/r/433956