[06:36:55] PROBLEM - https://grafana.wikimedia.org/dashboard/db/ores grafana alert on einsteinium is CRITICAL: CRITICAL: https://grafana.wikimedia.org/dashboard/db/ores is alerting: 5xx rate (Change prop) alert.
[06:50:55] RECOVERY - https://grafana.wikimedia.org/dashboard/db/ores grafana alert on einsteinium is OK: OK: https://grafana.wikimedia.org/dashboard/db/ores is not alerting.
[08:46:38] (CR) jenkins-bot: Localisation updates from https://translatewiki.net. [extensions/ORES] - https://gerrit.wikimedia.org/r/409267 (owner: L10n-bot)
[09:41:50] Scoring-platform-team (Current), ORES, Patch-For-Review: Make sure ORES is compatible with stretch - https://phabricator.wikimedia.org/T182799#3957803 (akosiaris) >>! In T182799#3956108, @Halfak wrote: > ORES is currently deployed in cloud VPS on Stretch machines. This is not an issue of ORES compat...
[15:16:27] o/
[15:16:30] I’m in today, fwiw
[15:26:11] hey awight
[15:30:28] awight, I was wondering -- if we go with the 3rd party JADE service option, will we need to build most/all of the MW-only functionality?
[15:31:15] It seems to me that MW-only is a subset of MW+External JADE.
[15:31:53] Is there something big that I'm missing?
[15:33:20] No, the only piece we need to build in MW is the synchronization to and fro.
[15:33:53] btw I’m bugging ottomata in -analytics ;-/
[15:34:20] stopping now!
[15:35:36] I think I need to mini-blog about the MW-external shared data abstraction.
[15:37:18] awight, seems like we'll need a lot more than simple "sync to and fro"
[15:37:34] like what?
[15:38:00] We'll need to rewire MediaWiki to not allow direct edits or to handle direct edits in a way that we can sync.
[15:38:09] We'll need to rewrite suppression and handle that.
[15:38:12] What I’m anticipating is, we need to unpack one data model into the other
[15:38:23] Essentially JADE will be listening to what happens in MW and updating as appropriate.
[15:38:26] Suppression just happens, and we receive notifications
[15:38:27] yes
[15:38:43] We don’t write anything in Extension:JADE regarding suppression, happily.
[15:38:52] So MW is acting kind of like MW-only where sometimes edits happen through the extension.
[15:39:16] awight, how will MW-only handle suppression events that happen directly against the JADE service?
[15:39:18] That’s normal, lots of edits are made through the API
[15:39:51] awight, this won't be a normal API edit most likely.
[15:39:57] I’m not sure what that means. MW-only meaning, we don’t have a JADE service?
[15:40:01] Because the user and timestamp will matter.
[15:40:05] Right.
[15:40:32] In either alternative, we aren’t providing a suppression API
[15:40:37] That’s entirely in MW, right?
[15:40:45] Well, it's entirely MW if we do MW-only.
[15:40:59] and also if we have a 3rd-party service
[15:41:00] If we have external JADE, it seems like we'd want to allow suppression there.
[15:41:29] ok, well we could do that if we wanted. It would just relay stuff to MW, I guess.
[15:42:03] FWIW, here’s a schema for what matters in a suppression event: https://github.com/wikimedia/mediawiki-event-schemas/blob/master/jsonschema/mediawiki/revision/visibility-change/1.yaml
[15:42:19] awight, right. that relay will be complicated though.
[15:42:27] Especially if we want to handle the case of a backup
[15:42:36] I’m not planning to offer a suppression API…
[15:43:16] It was part of our spec.
[15:43:27] We could, though! I think you’re right, it would just collect (user, timestamp, rev_id, text_visibility, user_visibility, comment_visibility)
[15:43:31] I can understand that you want to drop it now that it seems complicated.
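The tuple listed at 15:43:27 could be captured as a small record. This is only a minimal sketch under stated assumptions: the field names are taken from that chat message, not from the linked visibility-change schema, and `SuppressionEvent`, `apply_suppression`, and the judgment keys (`text`, `user`, `comment`) are hypothetical names invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuppressionEvent:
    """Hypothetical record; fields mirror the tuple in the chat, not a real API."""
    user: str                 # who performed the suppression
    timestamp: str            # assumed to be an ISO 8601 string
    rev_id: int               # revision whose visibility changed
    text_visibility: bool     # True means the text stays visible
    user_visibility: bool     # True means the author stays visible
    comment_visibility: bool  # True means the edit comment stays visible


def apply_suppression(judgment: dict, event: SuppressionEvent) -> dict:
    """Return a copy of a judgment with the suppressed parts blanked out."""
    redacted = dict(judgment)  # shallow copy so the input is left untouched
    if not event.text_visibility:
        redacted["text"] = None
    if not event.user_visibility:
        redacted["user"] = None
    if not event.comment_visibility:
        redacted["comment"] = None
    return redacted
```

Carrying the acting user and timestamp in the event itself reflects the point made at 15:40:01 that both matter when a suppression is relayed between systems.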
[15:43:42] We already have this spec'd :|
[15:43:44] I understood our spec as, “suppression must be possible"
[15:44:33] "User information, comments, and entire endorsements can be suppressed by changing visibility"
[15:45:00] gotcha, https://www.mediawiki.org/wiki/JADE/Schema
[15:45:07] I guess we didn't spec out the API end points.
[15:46:08] Here’s another example of something I’d like to move out of the JADE API and leave entirely within MW, “A flow topic can be associated with every artifact/schema”
[15:46:46] Iono. I’m open to lots of approaches here, but I have hardcore scope-limiting instincts after sitting in a Fundraising foxhole for 5 years ;-)
[15:47:28] Right. This is a big reason why I'm pushing back on external JADE. Just seems unnecessary and doesn't give us much if anything.
[15:47:34] Mind if I rewrite that Schema page according to my current understanding?
[15:48:01] Hehe I’d chew my arm off to avoid writing PHP/MediaWiki
[15:48:30] Did you see this, yesterday? https://martinfowler.com/bliki/CQRS.html
[15:48:38] > Despite these benefits, you should be very cautious about using CQRS.
[15:48:51] > adding CQRS to such a system can add significant complexity.
[15:49:59] It just seems that we'll need to write PHP/MediaWiki regardless.
[15:51:21] I didn't have a chance to read it yet. What's the relevance?
[15:51:32] Is MW-only CQRS?
[15:52:30] CQRS turns out to be a name for reading synchronously and writing using events, and he says it’s usually a recipe for death by complexity.
[15:52:46] That’s before even looking at multi-master sync, which is known to be hard.
[15:53:21] Do you have thoughts about the benefits of external JADE, which I listed yesterday evening?
[15:54:50] let me review.
[15:55:22] I’m going to sew that rant into our parachutes to keep the morale up :D
[15:56:20] Most of it is just very-nice-to-haves
[15:56:32] but I can’t get around how we’ll include JADE results in ORES.
[15:57:03] I don't buy the "Other people will use it argument". I just don't think we can be responsible for that on our first pass. Also, we're not building something technologically complex. I think others would like to have tight integration in the same way we want tight integration in MW.
[15:57:05] ORES would have to track and cache judgments with a one-way sync mechanism that == half of what we’re writing for external-JADE, right?
[15:57:26] Wait, ORES would have no problem reading Event Stream.
[15:57:33] To track judgements.
[15:57:36] That’s what I mean—it’s half of our sync
[15:57:52] half of what sync?
[15:57:58] I don't understand.
[15:58:09] the sync between judgments in PostgreSQL and in MediaWiki
[15:58:26] Also, what?
[15:58:27] So the only part of the sync that we’re evading by going MW-only would be the incoming API which takes create_judgment and splats that into a page. Really not hard.
[15:59:00] Also edits.
[15:59:03] And suppression
[15:59:19] Edits and suppression need to be propagated to ORES, right?
[15:59:24] Sure.
[15:59:36] But JADE != ORES-JADE backend.
[15:59:49] We'll probably want a redis state store for the ORES-JADE backend so that it can be fast.
[16:00:21] postgresql is wicked performant, but we’ll see
[16:00:50] Either way, it’s the same data model that we’ll be maintaining externally.
[16:02:33] Slightly nudging the topic forward, I’ve been thinking about where to write each piece of the data model munging, in any sync scenario. This conversation is convincing me that I should write that in MediaWiki.
[16:02:49] +1 that it will be useful in either the MW-only or external-JADE designs.
[16:03:16] In MW-only, what sync scenario are you talking about?
[16:03:26] It’s also the right place for that code to live, because the mangled MW-page representation should only live inside of MW, and external systems would want the proper data model.
[16:03:37] In MW-only,—hold on
[16:04:04] What is this "mangled MW-page representation"?
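The one-way sync discussed between 15:57 and 16:00 (ORES tracking judgments, with edits and suppression propagated to it) can be sketched as a cache that applies events in order. This is an illustration only: the event shape (`type`, `rev_id`, `judgment`) and the `JudgmentCache` class are assumptions, and a plain dict stands in for the Redis or PostgreSQL store mentioned in the chat.

```python
from typing import Dict, Optional


class JudgmentCache:
    """In-memory stand-in for the ORES-side state store (hypothetical)."""

    def __init__(self) -> None:
        self._by_rev: Dict[int, dict] = {}

    def apply(self, event: dict) -> None:
        """Apply one event from the stream; event keys are assumed names."""
        kind = event["type"]
        rev_id = event["rev_id"]
        if kind in ("create", "edit"):
            # Creates and edits both replace the cached judgment wholesale.
            self._by_rev[rev_id] = event["judgment"]
        elif kind == "suppress":
            # A suppressed judgment must no longer be served by the cache.
            self._by_rev.pop(rev_id, None)
        else:
            raise ValueError(f"unknown event type: {kind!r}")

    def get(self, rev_id: int) -> Optional[dict]:
        """Return the cached judgment for a revision, or None if absent."""
        return self._by_rev.get(rev_id)
```

Whichever design is chosen, a consumer along these lines has to handle all three event kinds, which is the point raised at 15:59:19.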
[16:04:33] Sorry, it’s the crappiest oval I’ve ever drawn, but see here: https://docs.google.com/drawings/d/1Lagl0BJWVWHNvHLy5y6RNNKvl0C1tdVrE5YniwgqFJY/edit
[16:04:50] The red oval needs to be done for either MW-only & ORES, or for JADE-external & ORES & MW-ext
[16:05:48] “mangled MW-page representation” is the hybrid thing where judgments live as revisions in MW pages.
[16:05:57] I feel like we're bouncing all over the place.
[16:06:04] I’m up for a call
[16:06:20] Sec. I need my headphones.
[16:06:38] It’s a good conversation, but I feel like we’ve been over a lot of this repeatedly and I’d like to write down some conclusions soon.
[16:07:00] The sync-to-ORES thing that just came up today seems really important.
[17:08:01] Scoring-platform-team, Collaboration-Community-Engagement, MediaWiki-extensions-ORES, Patch-For-Review, User-notice-collaboration: Deploy ORES filters to Simple Wikipedia - https://phabricator.wikimedia.org/T182012#3958867 (Halfak) I just talked to @Adotchar and it appears that this is not do...
[17:13:06] halfak: Kinda strange, https://github.com/adamwight/mw-ext-JADE/compare/example_data
[17:13:44] Everything else we wanted is included as vanilla MediaWiki metadata, AFAICT
[17:15:07] My thought was to be verbose about the artifact, because extracting that information from the Jade: is a custom calculation, and requires API calls…. Does that make sense?
[17:15:53] <awight> Also, maybe we provide a schema_uri and not the schema_id/_name
[17:19:12] <halfak> awight, maybe something like "schema": {"spec": "https://.../damaging/v1.json", "name": "damaging"}
[17:19:39] <awight> +1
[17:19:41] <halfak> Also, we should probably drop "title" because it can change.
[17:19:47] <halfak> E.g. someone might rename the "Earth" page to "Terra"
[17:19:51] <awight> nasty
[17:28:34] <awight> Wait, isn’t it worthwhile to include the title at the time the artifact revision was created, as part of the context?
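The `"schema": {"spec": ..., "name": ...}` shape proposed at 17:19:12 could be built as below. This is a sketch, not the extension's format: the helper names and the `title_at_creation` key are hypothetical, and the spec URI is passed in by the caller because the URL in the chat is elided.

```python
import json
from typing import Optional


def build_schema_ref(spec_uri: str, name: str) -> dict:
    """Build the schema reference in the shape proposed in the chat."""
    return {"spec": spec_uri, "name": name}


def build_context(spec_uri: str, name: str,
                  title_at_creation: Optional[str] = None) -> dict:
    """Assemble judgment context; the title is optional because it can change."""
    context = {"schema": build_schema_ref(spec_uri, name)}
    if title_at_creation is not None:
        # Hypothetical key: records the title as a snapshot, not a live lookup.
        context["title_at_creation"] = title_at_creation
    return context


def to_json(context: dict) -> str:
    """Serialize with sorted keys so the output is stable across runs."""
    return json.dumps(context, sort_keys=True)
```

Storing the title only as a snapshot addresses both positions in the exchange: a later rename (the "Earth" to "Terra" example) does not invalidate the record, while the original context remains available to clients.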
[17:28:44] <awight> Namespace seems helpful as well.
[17:29:30] <awight> I’m trying to save the client from making API calls to get the context, but OTOH they’ll have to make an API call to get the revision content.
[17:31:37] <awight> We could dispense with page_id too, and just have “db_id”, which is interpreted according to “type”.
[18:24:33] <awight> halfak: want to review that branch so I can merge and have a valid URI for the schemas?
[18:25:22] <halfak> the example_data one?
[18:26:21] <awight> yessir
[18:30:12] <halfak> Got 1.5 hours of meetings and then can come back to this.
[18:33:13] <awight> no rush, I can hack in file URIs for now
[19:25:04] <awight> PHP JSON schema validator is making me want to pull my own teeth
[19:25:48] <awight> Why would I expect a package with 12.M downloads have legible docs?
[19:54:14] <legoktm> awight: are you using the justinrainbow one?
[19:54:44] <awight> legoktm: /me hangs head
[19:54:45] <awight> yes
[19:54:52] <awight> Let me show you something...
[19:57:11] <awight> You probably want to stay out of these weeds, but https://github.com/adamwight/mw-ext-JADE/commit/7e51713bd11966f8beab1fe4bab467e646126812
[19:58:09] <awight> The direct JsonSchema\Validation calls work as expected, but same call inside of JudgmentContent always returns true.
[21:08:32] <awight> legoktm: lol I wasn’t serious about not getting into the weeks. Did you have some experience with justinrainbow’s lib to share?
[21:08:36] <awight> *weeds
[21:10:00] <legoktm> whoops, I forgot to look at this channel again
[21:10:06] <legoktm> yeah, we use it for extension.json validation
[21:10:49] <legoktm> https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/core/+/master/includes/registration/ExtensionJsonValidator.php
[21:13:31] <legoktm> And I think Kartographer also uses it for their geojson validation?
[21:17:58] <awight> legoktm: Right on, thanks for the hints.
[21:18:06] <halfak> o/
[21:18:08] <halfak> awight, finally done with meetings and stuff. Picking up that brach
[21:18:10] <halfak> *branch
[21:18:17] <awight> At this point, I’m imagining that something else in MediaWiki is messing with internal state of the library.
[21:18:35] <awight> halfak: Cool—I have a WIP edit validator running through unit tests
[21:19:04] <halfak> Nice.
[21:19:37] <awight> legoktm: Well, ExtensionJsonValidator calls it exactly the same way I do.
[21:20:08] <awight> Yeah AFAICT, it’s getting exactly the same input in my TestSchemas test as in TestJudgmentContent.
[21:20:28] <awight> Hopefuly it’s something embarrassing and syntactical...
[21:21:07] <legoktm> I used to run into problems with the version of the json-schema draft being used
[21:23:34] <awight> Interesting. I’m validating my schemas against the json-schema draft 07, but then I’m validating test fixtures against my custom schema.
[21:23:47] <awight> I’ll look for hidden state.
[21:44:40] <awight> legoktm: lol. $this->getData()->getValue();
[21:45:08] <awight> Turned out, Validator would happily take a Status object and tell me that it has the desired JSON structure.
[21:48:08] <awight> OK, the validation tests pass.
[21:48:33] <awight> I’ll smoke test to see if this means the editor will fail if the schema is broken.
[21:48:38] <halfak> awight, just finished my review. I added a couple of comments.
[21:48:41] <awight> It should.
[21:48:43] <awight> great!
[21:48:51] <halfak> I wish we had multi-content revisions so we could store meta-data somewhere else.
[21:49:17] <halfak> It feels weird to store "entity" in the JSON content, but I think it's the best available alternative.
[21:51:09] <legoktm> awight: ouch >.<
[21:51:16] <awight> hehe
[21:51:52] * halfak starts working on the new ORES cluster.
[21:52:02] <halfak> awight, how do you feel about merging the new ORES wheels?
[21:52:18] <awight> halfak: We don’t need to do it until we’re fully migrated, right?
[21:52:23] <awight> I don’t like having master != production
[21:52:25] <halfak> Maybe there's a way I can tell scap to deploy a branch?
[21:52:27] <awight> yes
[21:52:34] <awight> just check out that branch :)
[21:52:40] <halfak> Oh. duh.
[21:52:41] <halfak> lol
[21:52:48] <awight> Don’t forget to limit the machines you’re deploying to...
[21:52:53] <halfak> What command structure do you use for that?
[21:53:37] <awight> scap deploy -l "ores*" "(non-production) have fun”
[21:53:55] <halfak> Cool. Thanks
[21:55:12] <halfak> Strange. Looks like there are edits in the repo dir on tin
[21:55:19] <halfak> "group_size: 1"
[21:55:23] <halfak> Know anything about that?
[21:55:26] <halfak> In scap.cfg.
[21:55:39] <awight> no, that sounds like akosiaris though
[21:55:56] <awight> Out of curiosity, lemme see what that option does.
[21:56:07] <halfak> Looks like akosiaris
[21:56:34] <halfak> https://phabricator.wikimedia.org/P6677
[21:56:37] <halfak> ^ awight
[21:57:24] <awight> yikes
[21:58:24] <wikibugs> Scoring-platform-team (Current), ORES, Operations, Patch-For-Review: Reimage ores* hosts with Debian Stretch - https://phabricator.wikimedia.org/T171851#3959677 (Halfak) I found this in our deploy repo. {P6677} Not sure what is going on as this change was not submitted to gerrit AFAICT
[21:59:14] <awight> I don’t understand what group_size does. Possibly it slices a list of hosts into subgroups, so defeats the parallelism?
[21:59:20] <awight> very odd.
[22:01:00] <awight> Yes, my undereducated guess was right.
[22:01:07] <awight> https://doc.wikimedia.org/mw-tools-scap/scap3/repo_config.html#available-configuration
[22:01:14] <awight> search for group_size
[22:01:57] <awight> halfak: ^ Safe to revert
[22:05:34] <halfak> awight, gotcha.
[22:05:34] <halfak> Lots of other changes though.
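The reading of `group_size` confirmed at 22:01:00 (the host list is sliced into subgroups) can be illustrated with a few lines of chunking. This is not scap's implementation, only a sketch of the slicing; `split_into_groups` is a hypothetical helper, and it assumes the subgroups are deployed one after another, as the chat suggests.

```python
from typing import Iterator, List


def split_into_groups(hosts: List[str], group_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of the host list, each at most group_size long."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    for start in range(0, len(hosts), group_size):
        yield hosts[start:start + group_size]
```

With `group_size` set to 1, every subgroup contains a single host, so under the sequential assumption the deploy visits hosts one at a time, which matches the "defeats the parallelism" remark.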
[22:05:34] <halfak> I'll get a diff and we can look at them together :D maybe
[22:05:41] <halfak> https://phabricator.wikimedia.org/P6678
[22:06:08] <awight> halfak: What’s the base revision btw?
[22:06:17] <awight> == master?
[22:06:29] <awight> or stretch_migration?
[22:06:32] <awight> or something… else
[22:07:34] <halfak> still on master
[22:07:47] <halfak> oh wait. one saec.
[22:08:59] <awight> Those changes all look helpful.
[22:09:31] <awight> Also like cleanup that we could live without until akosiaris pushes for review.
[22:12:36] <halfak> Maybe I should stash it for him?
[22:12:46] <halfak> And just work from stretch_conversion
[22:14:38] <Vermont> Halfak
[22:14:41] <Vermont> hi
[22:15:07] <halfak> o/
[22:24:21] <wikibugs> Scoring-platform-team (Current), ORES, Operations, Patch-For-Review: Reimage ores* hosts with Debian Stretch - https://phabricator.wikimedia.org/T171851#3959737 (Halfak) OK I put all of the changes in "alex_stuff" ``` halfak@tin:/srv/deployment/ores/deploy$ git branch -l CELERY_4 STABLE S...
[22:25:10] <halfak> Vermont, what's up?
[22:25:27] <Vermont> Halfak: just saying hi
[22:25:35] <halfak> ^_^
[22:25:44] <awight> +1 stashing is solid
[22:39:09] <awight> o/
[22:45:01] <halfak> WTF >:(
[22:45:28] <halfak> I just had a scap deploy fail because the hosts were unreachable. But they are online and they work. So I try again and it goes lightning fast, but did nothing.
[22:51:53] <wikibugs> Scoring-platform-team (Current), ORES: Preliminary deployment of ORES to new cluster - https://phabricator.wikimedia.org/T185901#3959811 (Halfak) Posted this in #wikimedia-releng: ``` [16:48:53] <halfak> So, I just did a scap deploy that failed with a stream of "Timeout, server ores100*.eqiad.wmnet not...