[00:40:03] Who should I consider the lead engineer on JADE?
[00:40:14] I guess more generally, who is writing code for JADE?
[01:28:41] Scoring-platform-team, Collaboration-Team-Triage (Collab-Team-This-Quarter): Enable srwiki edit quality filters in RecentChanges - https://phabricator.wikimedia.org/T197012 (Catrope) >>! In T197012#4319768, @Acamicamacaraca wrote: > @Catrope How's it going? xD Sorry for the delay, this caught me right i...
[08:33:40] I am finding article titles of non-latin character in the enwiki dump.
[08:33:52] Kind of off-topic, but weird.
[08:55:38] RoanKattouw: Thanks for thinking about this! There's already a FetchScoreJob, or were you thinking an api.php endpoint?
[08:56:55] In fact, that job is run on all recent changes, so once you enable draftquality in config it might cache everything you need by default?
[09:15:36] * awight_mob mopes around waiting for CR in my inbox
[09:15:39] Amir1: ^
[09:21:53] awight_mob: commuting, will answer soon
[09:26:42] Thanks!
[09:55:51] okay, In office now
[10:06:32] Amir1: On a completely different topic, I'm starting to document the internal MediaWiki edit conflict algorithm, and making a bloody mess of the diagram: https://docs.google.com/drawings/d/1xjsACtch4cM0igmio1CXiv58BdgbTt6UMqDp3nT5js8/edit
[10:06:59] Wondering what kind of drawing people normally use...
[10:07:05] Jesus Christ
[10:07:50] hehe
[10:08:01] It's not a happy place.
[10:08:38] There are parameters, configuration, state, intermediate vars, logic, actions...
[10:09:30] I suppose it could be captured in a process diagram, plus a glossary of functions and their behavior
[10:09:35] :-x
[10:12:45] (CR) Ladsgroup: [C: -1] "I want to disagree about this. My point is about times that we switch a model from aggregated to non-aggregated.
Then we'll have both sco" [extensions/ORES] - https://gerrit.wikimedia.org/r/443641 (owner: Sbisson)
[10:15:22] Scoring-platform-team (Current), JADE, Operations, User-Joe: Extension:JADE scalability concerns due to creating a page per revision - https://phabricator.wikimedia.org/T196547 (jcrespo) Sorry, you didn't understood what I meant- for ORES, it was: T159753 and for translation, T183485, both as sum...
[10:16:08] grabbing a lunch, see you in 30 min.
[10:23:40] Scoring-platform-team (Current), JADE, Operations, User-Joe: Extension:JADE scalability concerns due to creating a page per revision - https://phabricator.wikimedia.org/T196547 (awight) @jcrespo I see, well in this case content storage is exactly what we're planning to use. Is there anything sp...
[10:24:09] ok lunch really
[10:25:40] (CR) Sbisson: "> I want to disagree about this. My point is about times that we" [extensions/ORES] - https://gerrit.wikimedia.org/r/443641 (owner: Sbisson)
[10:27:03] Scoring-platform-team (Current), JADE, Operations, User-Joe: Extension:JADE scalability concerns due to creating a page per revision - https://phabricator.wikimedia.org/T196547 (jcrespo) Ok, that is much better, but I guess it still would double the revision table (or the 5 new tables that are to...
[11:28:17] (CR) Ladsgroup: [C: -1] "> Patch Set 1:" [extensions/ORES] - https://gerrit.wikimedia.org/r/443641 (owner: Sbisson)
[11:58:15] Amir1: halfak created the scoring-internal mailing list, right?
[11:58:38] awight_mob: yup and he should have the password (I hate mailman from bottom of my heart)
[11:59:17] +1 down with that horrific codebase
[11:59:33] awight_mob: who is responsible for JADE implementation that I should be working through as the product manager?
[11:59:38] :D
[11:59:40] presente.
[12:00:25] harej: Might want to help me keep the civil hat well-fastened, btw...
T196547
[12:00:25] T196547: Extension:JADE scalability concerns due to creating a page per revision - https://phabricator.wikimedia.org/T196547
[12:00:34] * awight_mob eyes clock suspiciously.
[12:00:45] harej: ^ you sure you're not in a time bubble?
[12:01:18] * awight_mob is picturing a lot of coffee in that bubble
[12:02:51] i went to bed early and woke up especially early, and will probably fall back asleep at some point, particularly since it's a holiday.
[12:03:28] anyways, I was wondering if there was a way you wanted to handle backlog grooming for JADE specifically
[12:10:01] RoanKattouw: I just sent you an email answering the questions.
[12:10:49] harej: Sounds great, I stare at that backlog occasionally and wonder the same myself. Want to start with a chat on a non-holiday?
[12:11:17] Currently, the only process is that I randomly decide to drag things in and out of the Scoring Platform current work board.
[12:14:00] That's another thing we'll need to figure out -- how JADE backlog grooming fits into broader Scoring Platform backlog grooming
[12:15:03] Also, what will be the relationship with Wikilabels? Will Wikilabels be a UI on top of JADE>
[12:15:04] ?
[12:15:13] That's one possible outcome
[12:15:46] Probably the best way to go, but maybe not a high priority.
[12:16:39] But at least a theoretically correct option?
[12:16:58] Exactly, it's a natural place to store that data.
[12:17:15] Basically what I am trying to map out mentally is if Wikilabels and JADE are meant to be fundamentally different things, or if JADE is a re-imagining of the original Wikilabels concept and Wikilabels is "historical."
[12:17:19] Then, having it onwiki allows collaborating auditing
[12:17:59] interesting. I think your first formula was right, that Wiki labels is a workflow and a UI, and JADE is a structured storage for the same type of data that wiki labels deals in.
[12:18:33] wiki labels won't go away because it's a specific workflow, and has been proven to mostly do the job we need.
[12:19:05] Though I imagine it won't get much in terms of new investments.
[12:19:17] patrolling is a similar argument, we won't do anything to the workflow at first, but may integrate the backend to silently write JADE labels.
[12:19:45] In the long run, we might be able to add a "notes" field to patrolling decisions, which would take advantage of the JADE judgment schema.
[12:21:02] I'm not even sure that my workflow integration concept is valid btw.
[13:29:57] Amir1: do you have a moment?
[13:30:34] stephanebisson: sure
[13:31:45] Amir1: Thanks for your thoughts about the aggregated scores. What you said makes sense when you consider the shape of the data (all probabilities adding up to 1, and only 1 above .5), which I didn't do initially.
[13:33:11] Would it make sense that the same idea works without multiplying? For instance: [0 to 1/6[ is Stub, ]1/6 to 2/6] is Start, etc
[13:33:44] stephanebisson: yeah, that would work as well. I was thinking about it as well.
[13:35:15] Do you anticipate any db issues with querying like that in PageTriage?
Note that the users will be allowed to select multiple classes (Stub OR Start OR C) and even disconnected classes (C OR FA)
[13:36:26] We would end up with queries like: WHERE ( oresc_score >= 0 AND oresc_score < 1/6 ) OR ( oresc_score >= 5/6 AND oresc_score < 1 )
[13:36:57] stephanebisson: hmm, if we don't store scores for all revisions and keep only the first one, it'll be no big deal (at the most it'll be 30K rows read)
[13:38:08] but if we score all revisions of pages or pages created in the last month, it'll be hard and we need to index ores_probability which won't help in case of disconnected classes
[13:39:17] We're thinking latest revision only (with the "cleanup parent" option or something) for wp10
[13:40:19] oh yeah, you're talking about wp10 not draft quality
[13:40:27] I haven't looked at draftquality yet but it doesn't seem to be aggregated. Do you think it'll have to be based on the volume?
[13:40:38] that makes things a little bit different because we store the scores forever
[13:41:06] stephanebisson: no, it should not because the classes don't follow an order
[13:41:15] unlike wp10
[13:42:11] stephanebisson: in case of wp10, if you determine revision base line (e.g. oresc_rev > ####) it won't make much problem
[13:43:18] Yeah, in the context of page triage we can do that. we're dealing with a small subset of all revisions.
[13:44:14] Actually, we would probably be joining with the revs in pagetriage tables.
[13:45:47] Amir1: Last thing, did you guys consider keeping only the rows where oresc_predicted = true instead of aggregating the probabilities to save space? When else can be learned from the aggregated score?
[14:03:58] stephanebisson: this was also an option but we ditched it because aggregated score can be used as a simple quantity measure
[14:04:31] so e.g.
you can order by that number in a category and get the worst and best articles
[14:04:42] I see
[14:07:18] Scoring-platform-team (Current), Analytics, EventBus, MediaWiki-JobQueue, and 4 others: ORESFetchScoreJob fails quite a lot - https://phabricator.wikimedia.org/T196076 (Ladsgroup) hmm, yeah. I think we should make another phabricator ticket because that's about the ores service and not the extens...
[14:43:58] Daily annoying nudge: https://gerrit.wikimedia.org/r/#/projects/mediawiki/extensions/JADE,dashboards/default
[14:44:10] It's fun, I promise...
[14:59:24] (Abandoned) Sbisson: Store prediction for aggregated scores [extensions/ORES] - https://gerrit.wikimedia.org/r/443641 (owner: Sbisson)
[15:01:51] It's been exciting, but I'm taking a little break. Back in a few hours.
[15:28:40] (I lied, I'm still in the library)
[21:00:57] Scoring-platform-team, Wikilabels, User-Ladsgroup: Fix stats display to show how many tasks are complete - https://phabricator.wikimedia.org/T106861 (Ladsgroup) Open>Resolved a:Ladsgroup This is not needed anymore. Let's just close it.
[21:15:30] Scoring-platform-team, ORES: ORES returns 200 for timed out scores - https://phabricator.wikimedia.org/T198819 (Ladsgroup)
[21:15:44] Scoring-platform-team (Current), Analytics, EventBus, MediaWiki-JobQueue, and 4 others: ORESFetchScoreJob fails quite a lot - https://phabricator.wikimedia.org/T196076 (Ladsgroup) Made {T198819}
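
A rough sketch of the wp10 range idea discussed around 13:33 to 13:36, for anyone reading the log later: each class gets an equal slice of the 0..1 aggregated score, and the selected classes become an OR of range checks. This is only an illustration, not what the ORES extension or PageTriage actually does. The full class list (B and GA are not named in the log), the helper names, the half-open boundaries, and the closed upper bound on the last slice are all assumptions; the column name oresc_score is taken from the example query in the chat.

```python
from fractions import Fraction

# Assumed wp10 class order, lowest to highest quality.
# Only Stub, Start, C and FA are named in the log; B and GA are assumed.
WP10_CLASSES = ["Stub", "Start", "C", "B", "GA", "FA"]


def class_range(name):
    """Return the [low, high) slice of the 0..1 aggregated score for a class."""
    i = WP10_CLASSES.index(name)
    n = len(WP10_CLASSES)
    return Fraction(i, n), Fraction(i + 1, n)


def merged_ranges(selected):
    """Merge adjacent selected classes into contiguous ranges."""
    merged = []
    for low, high in sorted(class_range(name) for name in set(selected)):
        if merged and merged[-1][1] == low:
            # This class starts where the previous range ends: extend it.
            merged[-1] = (merged[-1][0], high)
        else:
            merged.append((low, high))
    return merged


def where_clause(selected, column="oresc_score"):
    """Build the OR-of-ranges condition for the selected classes."""
    parts = []
    for low, high in merged_ranges(selected):
        # Assumption: the top slice includes a score of exactly 1.
        upper_op = "<=" if high == 1 else "<"
        parts.append(
            f"( {column} >= {float(low):.4f} AND {column} {upper_op} {float(high):.4f} )"
        )
    return " OR ".join(parts)


print(where_clause(["Stub", "Start", "C"]))
# ( oresc_score >= 0.0000 AND oresc_score < 0.5000 )
print(where_clause(["C", "FA"]))
# ( oresc_score >= 0.3333 AND oresc_score < 0.5000 ) OR ( oresc_score >= 0.8333 AND oresc_score <= 1.0000 )
```

Adjacent classes collapse into a single range, so only disconnected selections such as C OR FA produce more than one branch, which is the case flagged at 13:38 as not being helped by an index.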