[01:48:27] (CR) Jforrester: [C: 2] "CI failures unrelated, good to merge whenever." [extensions/ORES] - https://gerrit.wikimedia.org/r/408207 (owner: Esanders)
[01:57:56] (CR) jerkins-bot: [V: -1] build: Update linters [extensions/ORES] - https://gerrit.wikimedia.org/r/408207 (owner: Esanders)
[11:51:04] (CR) Sbisson: "recheck" [extensions/ORES] - https://gerrit.wikimedia.org/r/406597 (https://phabricator.wikimedia.org/T179718) (owner: Sbisson)
[11:54:07] (CR) jerkins-bot: [V: -1] Enable ORES filters on RecentChangesLinked [extensions/ORES] - https://gerrit.wikimedia.org/r/406597 (https://phabricator.wikimedia.org/T179718) (owner: Sbisson)
[15:54:35] halAFK: I forgot to mention that ACL has strict rules to preserve anonymity and putting a link to the drafttopic repo probably compromises that?
[15:55:13] right, so we could just create a shallow copy of the repo.
[15:55:24] I've done this before for another conference.
[15:58:11] oh, great!
[16:12:16] halfak: I was thinking "carving out/refactoring revscoring as an independent ML library" is an interesting hackathon project :D
[16:14:22] Heh. That'd be pretty complex.
[16:14:55] at least for a start, or POC
[16:15:16] Scoring-platform-team (Current), ORES, Operations, Patch-For-Review: Reimage ores* hosts with Debian Stretch - https://phabricator.wikimedia.org/T171851#3946492 (Halfak) Works for me. Thanks :)
[16:16:26] codezee, would you create a page for "Research:Automated classification of draft topic"
[16:16:27] ?
[16:16:36] I thought we had one but I can't find it
[16:17:57] halfak: https://meta.wikimedia.org/wiki/Research:Automatic_new_article_topics_suggestion
[16:18:45] Ahh great
[16:18:45] which reminds me I need to sit down and update it to reflect latest developments
[16:19:16] i captured any statistics that we might lose here - https://meta.wikimedia.org/wiki/Research:Automatic_new_article_topics_suggestion/Statistics
[16:20:30] +1 thank you
[16:58:42] Just finished https://office.wikimedia.org/wiki/Tech_program_proposals/Scoring_Platform_FY2019#Program_outline
[16:58:48] Well, my first pass anyway.
[17:23:04] halfak: You mentioned something about killing endorsements, but according to our discussion at https://www.mediawiki.org/wiki/Topic:Tzw0uv2bucrdprm4, we’re keeping them, right?
[17:23:24] awight, I don't think they fit well into the MW page/revision model.
[17:23:55] JADE:Revision/12345/Endorsements/54321 ?
[17:24:59] JADE_talk:Revision/12345/Endorsements/54321 would be the optional associated comment… yeah it’s gross
[17:25:09] Yeah...
[17:25:43] I think I'm convinced that we can drop that complexity. I think that, if we're going to do something that looks like page/revisions, cramming endorsements on top of that would be difficult conceptually.
[17:26:01] It totally works for a talk-page discussion but there's no other formalization in MW land.
[17:26:14] Maybe some day Flow/Structured discussions can solve for that.
[17:26:30] oh, sorry. I remember now that we did deprecate endorsements. You wanted to replace them with BRD, right?
[17:26:43] Right yeah.
[17:26:48] I think that makes sense.
[17:26:57] Sad from a design perspective, but happy from an engineering one :)
[17:27:18] So maybe the less structured way to keep the intention is that anyone may comment on Jade_talk:Revision/12345, pro or con
[17:27:25] +1
[17:27:29] kk thanks
[17:27:32] :)
[17:27:51] halfak: Want to take a quick look at https://etherpad.wikimedia.org/p/JADE_API_changes
[17:28:33] * halfak clicks
[17:28:45] Good timing. I just finished up some annual planning stuff.
[17:29:00] Ooh while I'm looking at this, will you look at https://office.wikimedia.org/wiki/Tech_program_proposals/Scoring_Platform_FY2019#Program_outline ?
[17:29:11] hehe I was about to musical chairs right into that
[17:29:19] I don't think there is anything there that'll be contentious, but please make sure it matches what you expect.
[17:29:50] I was thinking of s/Goals for FY2019/Alternate goal structures for FY2019
[17:29:58] something to grab attention
[17:30:09] *WE ARE UNABLE TO DO OUR WORK*
[17:30:10] ;-)
[17:30:38] Oh. If you're looking at anything before "Program outline" do that in the MediaWiki page.
[17:30:45] The Program Outline is what Finance wants.
[17:30:51] * awight double-takes
[17:30:52] ok I see
[17:31:08] But I am hoping you'll look at the Program outline too :)
[17:33:57] awight, re. EventBus and consuming from MW. I'm imagining a loop of sorts
[17:34:20] E.g. sometimes a suppression event will come directly from JADE. When that happens, it will get consumed by MW.
[17:34:20] +1 it’s awkward but I think it works
[17:34:33] Shouldn’t be an infinite loop, at least.
[17:34:39] Other times, a suppression event comes from MW, gets consumed by JADE and then JADE emits its own suppression event.
[17:35:00] Assuming that MW can dedupe that, no loop.
[17:35:20] OK it sounds like we're on the same page there.
[17:35:21] +1 both sides will perform operations idempotently.
[17:35:27] I want to put this in front of someone else.
[17:35:40] Maybe ottomata would be a good sounding board.
[17:36:00] We could even have an audit flag in the JADE DB that verifies that we were recorded by MW
[17:36:11] I’d love ottomata’s opinion, I can ping him.
[17:36:42] :) Thanks! I'm thinking of making a quick diagram of a couple example loops.
[17:37:06] *ouch* I just found a problem
[17:37:17] JSON data in the JADE namespace isn’t going to be easy to query in Quarry.
[17:38:44] awight, right. I think we'll need to work that out some other way.
[17:39:41] Extension:JADE might mirror the data as a proper set of tables? :(
[17:39:57] e.g. ottomata has been working on some basic strategies for columnar-izing JSON data
[17:40:17] Each schema would have a columnarizer definition that goes with it, maybe?
[17:40:24] I just know they use these in Hive.
[17:40:28] PostgreSQL is great for that
[17:40:41] Oh yeah. Wish we could do that :|
[17:42:15] Well fwiw, the events that we’ve discussed can be used to create a high-quality mirror in any data store.
[17:42:28] +1
[17:42:30] “high-fi” is more what I was trying to say
[17:42:33] but yeah
[17:42:58] awight, what was our status re. identifiers and having a local state for validation?
[17:43:03] One thing I haven’t thought enough about is how E:JADE will subscribe to the JADE events. I’m not sure anyone else does that yet.
[17:43:04] Inside of JADE that is.
[17:43:24] What's E:JADE?
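
[Editor's sketch] A minimal illustration of the per-schema "columnarizer definition" idea floated above, i.e. flattening the JSON judgment blobs into flat rows so they could be queried from Quarry or loaded into Hive. The judgment fields and the dotted-path column definitions below are invented for illustration and are not the real JADE schema.

```python
def columnarize(judgment, column_paths):
    """Flatten one JSON judgment into a flat row using a per-schema
    "columnarizer definition": a mapping of column name -> dotted path."""
    row = {}
    for column, path in column_paths.items():
        value = judgment
        for key in path.split("."):
            value = value.get(key) if isinstance(value, dict) else None
        row[column] = value
    return row


# Hypothetical judgment blob and columnarizer definition (illustrative only).
judgment = {"rev_id": 12345, "score": {"damaging": True}, "notes": "looks fine"}
columns = {"rev_id": "rev_id", "damaging": "score.damaging"}
print(columnarize(judgment, columns))
# {'rev_id': 12345, 'damaging': True}
```
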
[17:43:38] Extension:JADE, sorry bad place to take shortcuts
[17:43:52] halfak: I think that my weird event design gets rid of the need for identifiers
[17:44:18] hmm
[17:44:26] * awight looks at actions
[17:44:39] Do we have a schema that lacks identifiers?
[17:44:48] there are identifiers
[17:45:08] I’m confused about my previous thoughts
[17:45:58] re. describing events, what sort of event "description" does E:JADE need?
[17:46:16] Good question!
[17:46:36] I know EventLogging uses metawiki:Schema, but don’t know what EventBus uses.
[17:47:33] Oh gotcha.
[17:47:44] Is it okay if new_judgment doesn’t return an ID? I don’t know.
[17:48:10] How will a suppression event look?
[17:48:20] If judgement doesn't have an identifier.
[17:48:25] IMO those only come from MediaWiki
[17:48:59] Judgments do have IDs in MediaWiki, but I didn’t figure out how they’re created.
[17:49:33] There can be two steps to create_judgment
[17:50:00] one event from the API to the JADE DB, where it’s created and the primary ID is stuffed into a new create_judgment event, which is propagated to MW
[17:51:26] In my drawing, that would mean moving the arrow from “JADE API -> sync judgments” to “JADE DB service -> sync judgments”
[17:55:49] I’m still bothered by not getting an ID back from the new_judgment API
[17:56:03] We could get an event ID back, which could be used to look up the primary ID
[17:56:05] :-/
[18:00:23] With changes: https://docs.google.com/drawings/d/1Lagl0BJWVWHNvHLy5y6RNNKvl0C1tdVrE5YniwgqFJY/edit
[18:01:19] (not red-green colorblind-friendly)
[18:03:31] If JADE creates a unique ID on Judgement Creation, that solves everything.
[18:03:36] I had the proto-event pattern.
[18:03:47] It looks like you're thinking the same with "create judgement raw"
[18:04:19] Sure, we can use unique IDs, but then the URL is even more crazy in MediaWiki
[18:04:38] Na. I think we can have a minor bit of the Ext to convert JADE ids to revision IDs
[18:04:42] Or something like that.
[18:05:15] Straw dogging it—why would the judgment creator want to receive an ID response?
[18:05:52] Answering my own question, I think the creator might want to immediately jump to the MW page for the judgment
[18:07:36] I don't know if the creator wants it, but a consumer does
[18:10:53] good point!
[18:10:57] E.g. if I'm consuming suppression events. I want an ID that will help me know what is being suppressed.
[18:11:10] Rather than having to dig for a judgement based on timestamp and entity IDs
[18:11:11] I’m bugging ottomata in -analytics btw
[18:11:19] Also, you’ll be happy to see, https://wikitech.wikimedia.org/wiki/EventBus#Event_Schemas
[18:13:48] awight, https://docs.google.com/drawings/d/1bteR1vacXOdgeoKJY0AjAyC80N3moXG0TP-rIPbSZig/edit
[18:14:29] In this diagram, we'll need a way to convert rev_id to judgement_id :|
[18:14:50] Unless we have a custom event come from E:JADE
[18:15:43] Also a good point. I think we’ll have that information already stored in the JADE DB?
[18:15:57] Only if we do a synchronous call to MW
[18:18:39] "event bus loop (create judgement)" https://docs.google.com/drawings/d/1OJItNMcC4Me0gF02rt6xh_5Npy8CAylg_ztR5h9P-E8/edit
[18:18:51] In this case, there's no matching IDs on JADE's side.
[18:18:58] Hi halfak :) question for when you have time - following a conversation with J-Mo. How easy would it be to give a little feedback to wikilabels users after completing a task? For example regarding the accuracy of their labels? Or regarding how much they agree with other users labeling the same item?
[18:18:58] JADE doesn't know about MediaWiki.
[18:19:25] miriam, how will you make judgements of this and how would you like to deliver the feedback?
[18:21:38] awight, I'm stuck on this ID thing. Getting identifiers without sync'ing is a hard problem.
[18:21:50] halfak: I see what you mean by revision ID. The revision ID of a specific judgment to suppress.
[18:23:23] I only see two ways to build that mapping. One is to have a sync call from “JADE DB service” to the extension’s “sync judgments” service. This synchronous behavior is okay since it’s not blocking the API response.
[18:24:20] The other way would be for the JADE DB service to make a sync API call upon receiving “mediawiki.revision-create” from changeprop. The call would retrieve the content of the revision, which includes the judgment’s UUID.
[18:24:37] Not sure why blocking the API is bad.
[18:25:17] In fact that might be really good. We can use it for validation
[18:25:37] halfak: one way to possibly do this would be to match users' decisions with a sort of ground-truth. If, for example, a user judges a sentence from a featured article as needing citation, this label is 'correct' if the sentence in the original article actually contains an inline citation. Just thinking of ways of gamifying the labeling task. Ways to deliver feedback could be: tracking correctly/wrongly labeled sentences throughout the batch of tasks, or just a simple pop-up at the end of each task.
[18:25:44] E.g. if a user can't safely save a revision to a page (via OAuth and an external call from the JADE service), it's good that it got blocked.
[18:25:51] It’s fine, but unnecessary. I think we decided validation was unnecessary aside from user status checks?
[18:26:19] miriam, this is confusing, but no there's currently no infrastructure for providing such functionality.
[18:26:34] miriam, I don't think you'd want to anyway since WikiLabels allows people to change their labels.
[18:26:57] awight, I always thought that full validation was absolutely necessary
[18:27:04] We don't want an event in the stream unless it is valid.
[18:27:45] E.g. let's say a user saves a judgement and then is immediately banned. We don't want an edit to happen in MW *after* they were banned.
[18:28:03] We'd like that action to fail at validation.
[18:29:33] halfak: ok, thanks!
[18:30:04] We can’t do much about the race conditions, there will be some short period of time in which future-invalid things happen.
[18:31:09] awight, if we make a synchronous call to MW, we can get a rev_id and then use that rev_id as the judgement ID.
[18:31:29] Or at least mint our own ID, and associate the two together both in MW and in JADE.
[18:32:08] miriam, you could write a bot that posts a score from each workset as a user completes them.
[18:32:37] +1 I’m okay with the sync call, especially since it doesn’t need to block our API
[18:32:45] halfak: do you think it'd be good to run an experiment with the embeddings shilad was working on? bec i can do it this week
[18:33:09] awight, where does the concern about blocking our API come from?
[18:33:35] codezee, interesting. I think I'd rather see more work towards getting the current model in ORES.
[18:33:36] It’s just a nice-to-have.
[18:33:59] halfak: My original obsession with decoupling the API was actually to get rid of the distributed transaction.
[18:34:12] awight, roger. Let's re-consider it after we work out this ID stuff. I'm sure there'll either be a clear pain in the ass or an obvious way we can not block the API.
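
[Editor's sketch] A rough illustration of the two options halfak mentions just above: mint our own ID, and/or make a synchronous call to MediaWiki and use the resulting rev_id. The service mints a UUID for the event's meta.id, saves the judgment page through the MediaWiki action API, and reads the new revision ID out of the edit response. The wiki URL, the page-title scheme, and the auth handling are placeholders; error handling and retries are omitted.

```python
import uuid

import requests

API_URL = "https://test.wikipedia.org/w/api.php"  # placeholder wiki


def save_judgment(session: requests.Session, title: str, wikitext: str):
    """Mint our own event ID, save the judgment page synchronously, and
    return (event_id, rev_id) so the two can be associated in both stores.
    `session` is assumed to already carry OAuth credentials/cookies."""
    event_id = str(uuid.uuid1())  # producer-minted meta.id

    # action=edit needs a CSRF token; the edit response includes newrevid.
    token = session.get(API_URL, params={
        "action": "query", "meta": "tokens", "format": "json",
    }).json()["query"]["tokens"]["csrftoken"]

    result = session.post(API_URL, data={
        "action": "edit", "title": title, "text": wikitext,
        "token": token, "format": "json",
    }).json()

    rev_id = result["edit"]["newrevid"]
    return event_id, rev_id


# e.g. save_judgment(session, "Jade:Revision/12345", '{"damaging": true}')
```
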
[18:34:42] The ID is fine: ottomata informed me that event producers are responsible for setting meta.id, which can be a UUID-1 for example.
[18:34:58] halfak: regarding ORES integration, once OneVsRest merges, getting word2vec in drafttopic mainline should be the final thing...i'll devote some time to that task
[18:35:11] on how we can load word embeddings
[18:35:14] That gives us a perfectly acceptable, permanent ID, immediately upon submission to eventbus.
[18:35:42] awight, how would we match it to a rev_id?
[18:37:08] Your suggestion is good, to make a sync call to the MW API. We can decouple that from the API though, just to be fancy.
[18:37:14] Lemme update the drawing for your review.
[18:38:52] there.
[18:39:55] I don't see the update.
[18:40:03] Oh the DB service has a sync call?
[18:40:07] yes
[18:40:27] Not sure why there's such a gap between the API and the JADE DB
[18:40:49] It also turns out that we don’t want to be the first ones to implement a Kafka consumer in a MW extension
[18:40:57] You know… that’s a good point.
[18:41:10] let me check for dreaded distributed transactions, then make that sync
[18:41:17] I've got to get some food in me.
[18:41:24] I'ma be back in a bit.
[18:43:02] BTW, I love having this conversation over a drawing rather than 100s of lines of code :)
[18:43:12] +1000 :)
[18:43:51] Also nice to cough up some of the Kafka kool-aid
[18:44:08] Of *course* we can make sync calls to our own DB service, we’re doing it for “get” APIs
[18:44:28] I’ll take my lunch break now, too.
[18:55:14] did you all ever come up with a solution for GPU training? I'm toying with a relatively shallow (3 layer) NN for predicting search relevance from clickthrough logs. The problem is my training takes 30s per epoch on a toy dataset of 4MB. I could maybe train 10x faster on a stats machine with more cores, but i also have at least 1000x more data. which gives a naive approximation of 50 minutes per epoch...
[18:57:47] may I naively ask, are you training with cpus right now? i assume the stats boxes don't have fancy gpus in them?
[18:58:11] apergos: right, training on laptop cpu ATM, estimating training on a stats machine would be maybe 10x faster with more cores
[18:58:48] but still cpu, i know we looked into it before and the problem is there are no good open source GPU drivers for NN training
[18:59:03] (one of the stats machines has an AMD gpu, but we don't have the drivers + software to make it run)
[18:59:08] not so much, you are stuck with cuda whazzit I guess
[18:59:14] and nvidia
[18:59:33] right, and your team has decided that binary kernel drivers are not going into prod machines :)
[18:59:46] which sure makes sense
[18:59:49] yup
[18:59:54] being at the mercy of the vendors for prod is no good
[18:59:58] ugh
[19:00:25] otoh we can't really ask everyone to build their own desktop with gpu :-/
[19:01:05] even just shifting the data around would be painful, my input data as a compressed sparse matrix is ~4GB
[19:01:09] kinda wish we could use our ever closer relationship with google to weasel some google cloud time out of them for free
[19:01:27] mm ouch
[19:02:10] yea if i could just submit data + models to google cloud ml platform would be amazing :) This particular dataset is PII free so it would be plausible to do off-site. I don't know if all datasets would be though
[19:02:47] I am sure a bunch would not
[19:03:19] but eg image stuff might be ok if needed, though how one gets a few T of images over there quickly is another thing
[19:05:22] well the google cloud "always free" stuff says no gpus, so we'd have to negotiate something special with them
[19:06:24] yea sounds about right
[19:06:46] in the meantime nvidia doesn't want folks to use the GTX gpus in data centers *except for mining coins* and that's not a joke.
[19:07:19] sheesh, i just double checked. my 4GB compressed sparse input, when converted to dense arrays fed into the network, is 5.5TB :P
[19:07:35] so even if we said "let's have a tiny little cluster of 3 boxes with gpus in them for testing only etc" it would have to be the much more expensive gpus
[19:07:35] yea they want everyone on their $3k datacenter gpus
[19:07:40] which is complete crap
[19:07:58] 5.5t! owie
[19:08:21] what you really want is google's TPUs anyways :P TPUs have 64GB memory vs the 12 you see on GPUs. But TPUs aren't something you can even buy yet :P
[19:08:30] nope
[19:08:37] and that brings us back to finagling a special deal :-P
[19:08:41] :)
[19:08:46] who's the partner team again? uh
[19:10:28] dan foy would be a starting point maybe
[19:10:54] and you can tell him I mentioned you in case he looks at you funny :-P
[19:11:02] lol, can't hurt to send an email at least :)
[19:11:15] it might be good to see who else in the org would benefit from something like this
[19:11:27] maybe drop a line by the research team
[19:11:36] * apergos pokes miriam gently
[19:36:40] ebernhardson: there was a discussion about GPU at our joint offsite with Analytics, I’d ping them in the channel if they are not aware of these ideas
[19:43:49] o/
[19:44:03] Hi all. I'm trying to access WikiLabels and, after oauth access, the authorization popup turns into a loop and I can't connect. wiki labels is already authorized in my connected apps at wikimedia
[19:44:27] Hi DebianTUX!
[19:44:32] I'll check it out quick.
[19:44:39] Can you walk through the steps with me?
[19:45:00] halfak: hi! i'm working with Henrique Crang and he told me you are the wiki label man
[19:45:06] DebianTUX, First start with http://labels.wmflabs.org/auth/logout/
[19:45:11] halfak: no problem, i'm all yours
[19:45:21] That will make sure you can fully remove session stuff.
[19:45:27] done
[19:45:39] { "success": true }
[19:45:44] Perfect
[19:45:46] Then go to http://labels.wmflabs.org/ui/ptwiki/
[19:45:58] It should forward you to start the OAuth dance.
[19:47:28] ok, i'm at the allow/cancel popup
[19:47:39] "In order to complete your request, Wiki labels needs permission to access information on all projects of this site on your behalf. No changes will be made with your account."
[19:47:42] Click allow and tell me what happens next.
[19:48:25] quickly opens http://labels.wmflabs.org/ui/ptwiki/ , a message for checking authorization and then back to oauth
[19:48:42] What's your username on-wiki?
[19:48:58] Felipeferreira
[19:49:44] OK one more thing to try. Can you log into quarry.wmflabs.org?
[19:49:50] It should use OAuth all the same.
[19:50:13] let me check
[19:50:32] yes! i'm logged in
[19:51:15] Weird.
[19:51:23] ;(
[19:51:34] i don't deserve to use wikilabels P
[19:51:35] :P
[19:51:52] https://labels.wmflabs.org/auth/whoami/
[19:51:54] ^ what do you get there?
[19:52:06] { "user": { "id": 52833167 } }
[19:52:16] Oh!
[19:52:19] You are logged in.
[19:52:20] Hmm.
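
[Editor's sketch] Back on the GPU thread: a back-of-the-envelope check of the ~4 GB sparse vs. ~5.5 TB dense figures mentioned above. The matrix shape and density below are invented purely to show the arithmetic (the real clickthrough matrix dimensions weren't given), and CSR storage is approximated as one float32 plus one int32 per stored value, plus row pointers.

```python
# Hypothetical shape/density chosen only to illustrate the arithmetic.
n_rows, n_cols, density = 20_000_000, 75_000, 0.00035

nnz = int(n_rows * n_cols * density)
# CSR (compressed sparse row): data + column indices + row pointers.
csr_bytes = nnz * (4 + 4) + (n_rows + 1) * 4
# Dense float32 array fed to the network.
dense_bytes = n_rows * n_cols * 4

print(f"CSR   ~ {csr_bytes / 2**30:.1f} GiB")    # ~4.0 GiB
print(f"dense ~ {dense_bytes / 2**40:.1f} TiB")  # ~5.5 TiB
```
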
[19:52:47] i promise you i am :P
[19:52:59] Thor would say: "not worthy"
[19:53:13] Oh! I know what it is.
[19:53:16] This is a problem.
[19:53:20] man
[19:53:29] now it works O.o
[19:53:30] You have HTTPS everywhere or some related browser extension don't you?
[19:53:58] worked, under http
[19:54:10] right.
[19:54:22] * halfak files a bug for https not working after oAuth call.
[19:55:06] cool, seeing en campaigns
[19:55:32] halfak: thanks a lot, bro
[19:56:38] Thanks for asking and making us aware of this bug!
[19:56:41] DebianTUX, ^
[19:57:04] Scoring-platform-team, Wikilabels: WikiLabels OAuth handshake doesn't work with HTTPS - https://phabricator.wikimedia.org/T186557#3947294 (Halfak)
[19:57:44] :D
[19:58:12] Scoring-platform-team, Wikilabels: WikiLabels OAuth handshake doesn't work with HTTPS - https://phabricator.wikimedia.org/T186557#3947309 (Halfak) It looks like there is something going on here: https://github.com/wiki-ai/wikilabels/blob/master/wikilabels/wsgi/routes/auth.py#L64
[19:58:12] i'm showing wikilabel labeling tool to my boss here. we are planning to use the tool for labeling here
[19:58:40] Oh cool. :)
[19:59:02] * halfak looks into using mwoauth with Wikilabels so we don't have problems like this.
[20:02:28] o/ awight
[20:06:20] halfak, (sorry, I was in meeting) Re: scoring bot. Actually that's a good suggestion. Just exploring options here! Thanks!
[20:20:19] ebernhardson,apergos: definitely interested in GPU-related discussions ;) I've been using GPUs from AWS services before. Pretty efficient and reliable. And they have options to accelerate data transfer.
[21:06:21] awight|afk, coming to docs?
[21:06:44] I got stuck on a voice call for a minute, sorry
[21:25:28] Our tasks look doable now.
[21:25:52] wrong window >.<
[21:31:21] Must be a wrong window. Our tasks are not do-able here :P
[21:36:16] Our tasks are well-done
[21:37:05] Amir1: I tried running revscoring on stat1005 but already got stuck at creating a venv
[21:37:10] I get "The virtual environment was not created successfully because ensurepip is not available. On Debian/Ubuntu systems, you need to install the python3-venv package using the following command."
[21:37:35] and it seems python3-venv is installed but ensurepip is not
[21:37:53] tgr: Use "virtualenv"
[21:37:54] I imagine I could compile it myself but is there an easier way?
[21:38:25] virtualenv -p python3 DIR
[21:38:37] https://gist.github.com/halfak/9f4830895496af9e9731
[21:44:11] duh, that was simple indeed
[21:44:22] \o/
[21:44:52] halfak: I think we’re back to distributed transaction hell, wrt. JADE.
[21:44:58] halfak: See lines 11 and 12 in https://etherpad.wikimedia.org/p/JADE_API_changes
[21:45:27] Now the create_judgment call will create a JADE DB entry, and make the MW API call to create a judgment.
[21:47:49] awight polygerrit is on gerrit.wikimedia.org now :)
[21:50:55] * awight takes a cautious peek
[21:51:13] awight it's fast
[21:51:18] faster than gwtui at least.
[21:51:43] I click “New UI”, right?
[21:52:31] Yep
[21:52:32] awight ^^
[21:52:48] I like it
[21:52:58] awight see my user status :)
[21:53:06] awight https://gerrit.wikimedia.org/r/q/owner:thomasmulhall410%40yahoo.com
[21:54:10] halfak: on another topic, s/Goals for FY2019/Contingency planning for FY2019/ ?
[22:02:18] augh, edited on the wrong wiki
[22:03:49] halfak: I’m trying to do something with the “increased staffing” section, IMO it should emphasize what we can do rather than what we’ve been unable to do
[22:05:58] awight, +1 to the sentiment.
[22:06:06] :)
[22:10:40] Okay, there’s a rough pass at rewording as positives.
[22:11:15] I also added a “risks and challenges” section to explain why we’re f**** at static funding levels
[22:11:34] I’ll take another pass in a minute
[22:15:22] halfak: Can I list the roles being underserved due to low staffing?
[22:15:36] hmm that’s tricky actually, there are so many.
[22:15:47] (need to relocate)
[22:27:10] halfak: It’s hard to wrap my mind around, but the jade.create_judgment_raw really does seem to avoid the distributed transaction.
[22:27:36] We get an at-least-once guarantee on both the JADE DB insert and the MW API call
[22:27:47] The consumers can put the message back if they fail.
[22:28:34] * awight checks whether that’s true
[22:30:13] Nice article about the alternatives: https://blog.pragmatists.com/retrying-consumer-architecture-in-the-apache-kafka-939ac4cb851a
[22:39:13] Another reason to emit a message for create_judgment, at some stage in the operation, is that we want the event log to be a complete record of what happened.
[22:40:37] ooof
[22:41:03] That plays into the distributed transaction, itself. We don’t want to emit an event if the judgment can’t be created.
[22:42:08] Is ottomata EST?
[22:42:27] seems so.
[22:49:20] halfak: Just noting that the current design doesn’t emit any events. I’m re-adding on the assumption that we do want an event log and want to support consumers.
[22:49:36] But we have the opportunity to change our minds :)
[22:50:46] +1 just not implemented yet
[22:51:08] sorry, by “current design” I mean the changes we just made to the drawing
[22:52:13] Oh yeah. I wasn't quite sure how to read that from your diagram.
[22:52:18] This is really gross, but seems consistent with the event-based designs we read about: the JADE API creates a jade.create_judgment_raw event, which is consumed by the DB service. If it has failures, it retries until success. Upon success, it emits a jade.created_judgment event, which is consumed by changeprop, which then POSTs to the mw-ext-JADE create_judgment endpoint
[22:52:19] just got out of meeting
[22:52:33] hmm
[22:52:45] awight, I think that should be synchronous and then that solves our problems.
[22:52:58] in backscroll, I realized that it doesn't
[22:53:04] The only events that wouldn't be synchronous originate from MW and are Truth-no-matter-what
[22:53:11] Oh. reading.
[22:53:30] I haven’t found a nice way out of the distributed transaction, it turns out.
[22:53:41] But what we can do is have a cascade of events
[22:53:56] What exactly is the distributed transaction issue?
[22:54:15] I don't see the problem with being synchronous here.
[22:54:34] it’s that we need to have either all-success or all-failure for: * create JADE DB row for the judgment, * create or edit MW DB page for it, * emit the event
[22:55:53] The trick with a distributed transaction is that you need to roll back each of these if the others fail
[22:55:58] Ahh fair point.
[22:56:25] Could we use the result of the MW edit to make sure that the all-success happens?
[22:56:44] E.g. if we get a MW edit in the JADE namespace that we don't have a judgement for, we know that we need to emit a judgement-create.
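
[Editor's sketch] A minimal sketch of the retry-until-success consumer step described at [22:52:18], written against kafka-python purely as a stand-in for the real EventBus/changeprop plumbing. Topic names follow the jade.create_judgment_raw / jade.created_judgment names used above; insert_judgment() is a placeholder for the real (idempotent) JADE DB write. Offsets are only committed after the work succeeds, which is what gives the at-least-once / "put the message back" behavior.

```python
import json

from kafka import KafkaConsumer, KafkaProducer  # stand-in client

consumer = KafkaConsumer(
    "jade.create_judgment_raw",
    bootstrap_servers="localhost:9092",   # placeholder broker
    group_id="jade-db-service",
    enable_auto_commit=False,             # commit only after success
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)


def insert_judgment(event):
    """Placeholder for the JADE DB insert.  Must be idempotent on
    event["meta"]["id"] so a redelivered event is harmless."""
    return 42  # hypothetical primary key


for message in consumer:
    event = message.value
    try:
        judgment_id = insert_judgment(event)
        # Only after the insert succeeds, emit the event of record that
        # changeprop/Extension:JADE consumes to create the wiki page.
        producer.send("jade.created_judgment",
                      {"meta": event["meta"], "judgment_id": judgment_id})
        producer.flush()
        consumer.commit()
    except Exception:
        # Not committing leaves the offset where it was, so the event is
        # redelivered later and retried ("put the message back").
        break
```
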
[22:56:48] So we can make the JADE DB insert, call the MW API, and if the MW API fails, then we have to roll back the JADE DB. If that fails, we have no way to guarantee that the JADE DB is consistent
[22:57:07] Yes I’m on the same page as your last comment.
[22:57:30] OK so it would be sync in all cases except the weird failure scenario
[22:57:36] If we have a cascade of events, and the very last step is the “event of record” create_judgment, then we can at least guarantee that the event log is correct.
[22:57:51] right.
[22:58:31] IMO we’re okay with a few sync steps in there, I’m certainly not an event purist.
[22:59:10] so the diagram would be correct, but the jade events of record aren’t emitted until we hear back from MW, so might as well be translations of the MW events.
[22:59:27] awight, I'm starting to question why we aren't just using MW as the main store.
[22:59:37] right
[22:59:56] In theory, we want this system to be generalizable outside of the MW context
[23:00:05] But this cascade does make it really weird to say that.
[23:01:17] Most of the failures we’re talking about indicate a serious system meltdown, on the flip side.
[23:02:02] The two normal cases are, * OK we got the record and inserted, thanks, and * we already got the record earlier, thanks
[23:02:50] AFAICT, most failures outside that are something like, “MediaWiki doesn’t work, hold everything”, “the extension is fubar, redeploy”
[23:03:47] Events get us most of what we want outside of MW.
[23:04:05] +1, as long as they’re valid like you’ve been saying.
[23:04:13] Like being able to use multiple stores (Fast-next-to-ORES, Analysis, etc.)
[23:04:36] Right. So what if we use the schema validator that EventLogging uses.
[23:04:49] And we build a UI with ContentHandler like Wikibase.
[23:05:03] I really like the idea that the API either * queries the DB in a read-only way, or * emits an event that something should happen
[23:05:06] * halfak tries to pull back as much as possible to check assumptions.
[23:05:10] lol
[23:05:31] I do too. I think this is a future of API's level thing we're looking at.
[23:05:52] But damn curation and current community requirements make us need to live in MW and be transactional with it.
[23:08:20] Let’s say we emit the events and consumers record that. All the API validates is that the user was allowed to do something. What could invalidate the event?
[23:08:49] If pgsql or MW DB fails, we just need to stop all business as usual until that’s fixed.
[23:08:57] I don’t see any normal use cases.
[23:09:27] Stopping event consumption and retrying with polite, exponential backoff or something is fine.
[23:10:23] awight, I guess there's some events that don't make sense depending on the order.
[23:10:41] E.g. deleting something that has already been deleted.
[23:10:58] I think that’s fine, consumers have to accept at-least-once
[23:13:03] Oh. But you can have different events from different people.
[23:13:22] This isn't an at-least-once type of issue.
[23:13:37] sorry? Each event topic is emitted by only one producer
[23:16:45] Or you’re talking about race conditions, I guess.
[23:17:17] Right.
[23:17:30] If we have a local validation strategy we can enforce order validity
[23:20:15] I can’t come up with an example where this is a problem. Event order is guaranteed by Kafka.
[23:20:47] If the API emits events, I don’t see how we can get out-of-order.
[23:20:51] I'm always allowed to delete things if I have the right
[23:20:59] It doesn't matter if it was deleted last week.
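
[Editor's sketch] To make the "double-deletion is fine" point above concrete, a tiny sketch of a consumer treating a suppression of an already-suppressed (or never-seen) judgment as a no-op rather than an error. The event shape and the dict-backed store are invented for the example.

```python
def handle_suppression(store, event):
    """Idempotently apply a suppression event.  Redelivered events and
    suppressions of already-gone judgments are harmless no-ops."""
    judgment = store.get(event["judgment_id"])
    if judgment is None or judgment.get("suppressed"):
        return "noop"            # already suppressed, or never existed
    judgment["suppressed"] = True
    return "suppressed"


store = {42: {"schema": "damaging", "suppressed": False}}
event = {"judgment_id": 42}
print(handle_suppression(store, event))                 # suppressed
print(handle_suppression(store, event))                 # noop (redelivery)
print(handle_suppression(store, {"judgment_id": 99}))   # noop (never existed)
```
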
[23:21:13] Validation is just as much about state as it is rights.
[23:21:29] Why is it a problem to have double-deletion?
[23:21:48] … if the consumer contract includes idempotent at-least-once expectations?
[23:24:01] That's not the same event though.
[23:24:10] How about if I delete something that never existed?
[23:24:38] Do we write a new log event for the deletion to the logging table?
[23:24:47] Or ignore the event because it's not valid
[23:24:56] Yes (provisionally), I thought we were saying it’s fine because the consumers don’t freak out
[23:25:33] It is a problem if we let through an event where an invalid user does something ridiculous, but deleting a non-existent event is harmless.
[23:27:18] gtg, but this is a fun detail to tie up tomorrow.
[23:28:27] I’ve provisionally changed the diagram to show the sync/async API behavior. The JADE DB service is still hitting the MW API synchronously, to give us a simple way to retrieve the rev_id where a judgment was inserted.
[23:29:31] Cool. Yeah I'm heading out too. Have a good evening!
[23:29:49] o/
[23:30:15] I’ll try to corner ottomata about this distrib transaction issue again.
[23:30:26] specifically.