[01:04:53] ORES, Scoring-platform-team (Current), Analytics, Patch-For-Review: Wire ORES scoring events into Hadoop - https://phabricator.wikimedia.org/T209732 (awight) Updated patches should have working DDL and HQL scripts, but I still need to refine and smoke test the job definitions. Denormalized outpu...
[01:06:36] Getting my butt kicked by Oozie.
[01:07:01] https://gerrit.wikimedia.org/r/#/c/analytics/refinery/+/482753
[01:10:04] gotta run
[04:04:17] o/
[09:24:33] Scoring-platform-team, Diffusion, GitHub-Mirrors, Release-Engineering-Team: LFS objects are not mirroring from Github through Phab to Gerrit consistently - https://phabricator.wikimedia.org/T212818 (hashar) Can you better describe the issue? It is not clear to me what is happening here.
[09:38:38] Scoring-platform-team, Gerrit, GitHub-Mirrors, Release-Engineering-Team: articlequality repo mirroring is broken - https://phabricator.wikimedia.org/T212962 (hashar) >>! In T212962#4859205, @hashar wrote: >> We have several repos which each contain large machine learning model files, and these re...
[10:08:56] Scoring-platform-team, Diffusion, GitHub-Mirrors, Release-Engineering-Team: LFS objects are not mirroring from Github through Phab to Gerrit consistently - https://phabricator.wikimedia.org/T212818 (Ladsgroup) This issue is easily fixable. Let me explain why it happens. (Probably duplicate of T21...
[10:09:25] Scoring-platform-team: Gerrit repo scoring/ores/editquality LFS broken (smudge filter lfs failed) - https://phabricator.wikimedia.org/T212544 (Ladsgroup) I don't think this is a big issue: T212818#4865068
[10:55:22] ORES, Scoring-platform-team, translatewiki.net: Are we breaking translatewiki's API? - https://phabricator.wikimedia.org/T213131 (akosiaris) No service running in production can reach out directly to services running on the internet. It needs to go via a proxy as @Bawolff pointed out above. That's a...
[15:06:17] o/
[16:07:32] halfak: Feel like doing SoS this morning? I think we have to set up a new rotation...
[16:08:00] Hmm. Not especially, but I guess I could.
[16:08:28] * halfak tries to process akosiaris' opposition to the translatewiki model.
[16:09:59] halfak: essentially it's that it should not be the job of ores.wikimedia.org to serve scores for an external wiki
[16:11:22] Right. I see that summary statement but I'm struggling to understand its foundation.
[16:11:25] halfak: One alternative is to serve that model from labs...
[16:11:38] I don't think it is well argued so I'm trying to pick it apart.
[16:11:51] awight, I've got some notes on that. I'll post them shortly.
[16:12:32] akosiaris, TL;DR: I don't see how "separation of concerns" applies so I'm trying to address other arguments against supporting translatewiki.
[16:13:09] halfak: The problem that akosiaris is describing makes sense to me, for example if TWN is taking 10s to answer requests one day. In that case, there's nobody watching TWN graphs and we're unaware of the issue. ORES will slog through the high latency and it will affect production services by keeping our worker threads busy.
[16:13:28] I'll just post what I have because we're re-hashing arguments I have addressed.
[16:13:38] ORES, Scoring-platform-team, translatewiki.net: Are we breaking translatewiki's API? - https://phabricator.wikimedia.org/T213131 (Halfak) @akosiaris, I'm confused by the source of your opposition. How exactly does this violate the separation of concerns principle? Surely the fact that we could othe...
[16:13:42] ^
[16:13:53] halfak: I'll take SoS today, no worries
[16:14:03] awight, we're already in a high latency situation with TWN
[16:14:21] Because it's fully blocked, requests use the max retries and timeout before failing.
[16:14:45] At most a single origin IP can block two uwsgi workers at a time.
[16:15:01] Due to poolcounter.
[16:15:14] Is there changeprop running for TWN?
[16:15:38] Or some other job?
[16:16:01] No changeprop. So they'd need to use the mediawiki jobqueue version.
[16:16:10] Which we're currently also doing.
[16:17:06] That could get around poolcounter... Hmm. Maybe we should just not let it get around poolcounter. I don't see an issue with translatewiki's current level of activity.
[16:17:42] Cool. Perhaps we can work around the coupling concerns by hosting the model on a TWN server?
[16:18:04] Wait... you're proposing a new ORES install?
[16:18:34] Also, I don't think we get to call this a "coupling concern" if we're continuing to use clearly defined interfaces.
[16:18:47] I think this is a "Not our Cluster" concern.
[16:19:31] For better or worse, TWN is a critical part of our production infrastructure.
[16:20:48] Maybe relevant: https://translatewiki.net/wiki/Succession_plan
[16:22:36] Moving TWN onto WMF hardware seems like the best solution all-around, but I'm assuming there is a big political landmine or it would have happened already.
[16:22:54] https://meta.wikimedia.org/wiki/Betawiki_being_hosted_by_Wikimedia
[16:23:03] Apparently Betawiki is the old name of TWN
[16:23:19] Seems like there was consensus and then nothing was done (from a quick scan)
[16:23:37] ORES, Scoring-platform-team, translatewiki.net: Are we breaking translatewiki's API? - https://phabricator.wikimedia.org/T213131 (Nemo_bis) > This sounds more like Relying on Anything that isn't Ours is Bad(TM). It seems that the concern here is that communication with non-Prod systems that we don't...
[16:24:20] ORES, Scoring-platform-team, translatewiki.net: ORES instance can't connect to translatewiki.net API - https://phabricator.wikimedia.org/T213131 (Nemo_bis)
[16:24:33] ORES, Scoring-platform-team, translatewiki.net: New ORES model relies on translatewiki.net API, which is not hosted on WMF production - https://phabricator.wikimedia.org/T213131 (awight)
[16:24:57] Maybe ping nikerabbit?
[16:26:17] ORES, Scoring-platform-team, translatewiki.net: New ORES model relies on translatewiki.net API, which is not hosted on WMF production - https://phabricator.wikimedia.org/T213131 (Halfak) This case may be a bit unusual. With Yandex, we're relying on an external service in order to serve WMF productio...
[16:36:55] <_joe_> hey, before I get into a discussion (T213131) without knowing the details of it
[16:36:56] T213131: New ORES model relies on translatewiki.net API, which is not hosted on WMF production - https://phabricator.wikimedia.org/T213131
[16:37:19] <_joe_> this is about calling translatewiki to score articles on our platform?
[16:37:55] <_joe_> so an article on itwiki gets rated, and the scoring system might need to call translatewiki.net?
[16:38:01] <_joe_> or something else?
[16:40:19] _joe_ something else.
[16:40:28] We're scoring edits to translatewiki for translatewiki.
[16:41:01] <_joe_> oh ok, via the production ORES? that's peculiar indeed. Let me try to explain why
[16:41:38] <_joe_> there are two patterns of interaction between a service and an external entity
[16:41:56] <_joe_> where I define "external" as "something I cannot control from root"
[16:42:18] <_joe_> by this definition, translatewiki is an external entity to us
[16:42:29] ^ right. Understood. TWN is weird.
[16:42:41] In that we really ought to have it in a place where you *can* control it from root.
[16:42:43] <_joe_> so, it might happen that we rely on an external entity in order to serve some content
[16:43:12] <_joe_> take the yandex case Nemo was mentioning in the ticket
[16:43:51] Right. That case seems more problematic. If Yandex goes down or otherwise stops serving us, our production tools/services suffer.
[16:44:03] <_joe_> that comes with various caveats, like the ones you underlined in https://phabricator.wikimedia.org/T213131#4866380
[16:44:24] <_joe_> if yandex goes down, our mt service is degraded, but will keep working
[16:44:29] <_joe_> but anyways
[16:44:37] Right!
[16:44:43] <_joe_> this is one model of interaction with an external entity
[16:44:48] <_joe_> a rather problematic one
[16:44:56] <_joe_> then there is another type of interaction
[16:44:57] Same for ORES except no degradation even!
[16:45:11] <_joe_> where the external entity is the client
[16:45:18] Right.
[16:45:18] <_joe_> so it asks the service for information
[16:45:37] <_joe_> this is a completely different type of interaction
[16:45:58] <_joe_> in our case, because we love logical loops when creating architectures, we would put ORES in the condition of doing both
[16:46:17] <_joe_> it would be called from translatewiki and would have to call translatewiki back too
[16:46:57] Right. Essentially, translatewiki.net doesn't know what information to provide in the first call (and it shouldn't know) so we need to ask for more information before we can respond.
[16:47:09] This is how all wikis work actually.
[16:47:18] But TWN is outside of prod.
[16:47:19] <_joe_> this kind of loop is problematic locally (when we do this kind of logical loop on a highly performant LAN)
[16:47:40] <_joe_> it seems to be very dangerous over the open internet
[16:47:41] _joe_, it's the price we pay for separating concerns
[16:47:50] _joe_, what's the specific danger?
[16:48:24] <_joe_> that you create a positive feedback system where the unavailability (or flooding) of an external entity (or our service!) renders both mostly unavailable
[16:48:53] <_joe_> having the loop only raises the possibility that a flood of requests on translatewiki brings it down
[16:48:55] I'm not sure I see what you're saying. Could you provide a more concrete example?
[16:49:02] <_joe_> sure
[16:49:14] <_joe_> let's say someone makes 200 edits per minute on translatewiki
[16:49:31] <_joe_> this will cause ores to make 200 more requests to the twn api
[16:49:50] <_joe_> this will make the flood even worse
[16:50:09] <_joe_> this will probably be fine on our side because we're much larger than twn
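The reason a flood like this only "times out a few workers" on the ORES side is the per-origin capping awight mentioned earlier ("a single origin IP can block two uwsgi workers at a time, due to poolcounter"). A minimal sketch of that behavior, assuming a local per-origin semaphore; the real implementation delegates to the PoolCounter service, and all names here are illustrative:

```python
# Illustrative only: ORES delegates this to PoolCounter, but the effect is
# roughly a small per-origin semaphore held around each slow operation.
import threading
from collections import defaultdict

MAX_WORKERS_PER_ORIGIN = 2  # "a single origin IP can block two uwsgi workers"

_semaphores = defaultdict(lambda: threading.Semaphore(MAX_WORKERS_PER_ORIGIN))

class TooManyConcurrentRequests(Exception):
    """Raised instead of queueing, so a slow upstream can't eat the pool."""

def score_with_origin_cap(origin_ip, fetch_and_score):
    # fetch_and_score is the slow part: fetching revision data from the
    # wiki's API (e.g. TWN) and running the model against it.
    sem = _semaphores[origin_ip]
    if not sem.acquire(blocking=False):
        raise TooManyConcurrentRequests(origin_ip)
    try:
        return fetch_and_score()
    finally:
        sem.release()
```

Under a cap like that, even a sustained flood from one origin parks at most two workers at the retry timeout, which is why the feedback loop mostly threatens the smaller side of the pair.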
[16:50:20] Also we have rate limiting in place in ORES.
[16:50:22] <_joe_> so ores would just time out a few workers
[16:50:29] And TWN can rate-limit us.
[16:50:29] <_joe_> and it has rate limiting, yes
[16:50:54] <_joe_> yes, but what I'm trying to say is that if we did something like
[16:51:13] <_joe_> sorry, let me get back to my original line of thinking for a second
[16:51:32] <_joe_> I think we would minimize/remove concerns in one of the following two ways
[16:52:01] <_joe_> 1 - slightly modifying the ORES api so that it can accept scoring requests that include all the data needed in the payload
[16:52:42] <_joe_> 1a - recognizing how important twn is and trying to include it in our env
[16:53:04] <_joe_> 2 - helping the twn folks to install their own ores mini-cluster
[16:53:25] We can accept scoring requests that include all of the data already. Hmm. We don't require that because it's strong coupling.
[16:53:35] <_joe_> what do you mean?
[16:53:46] <_joe_> that requests need to be synchronous then?
[16:53:53] You can send all of the data ORES needs to score something and it won't make an external request.
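A sketch of what that could look like for a client: a scoring request that carries the revision data with it, so ORES has nothing to fetch from an external API. The v3 endpoint shape is real, but the injected parameter name below is an assumption for illustration; the exact spelling is what the ORES feature-injection documentation covers:

```python
# Sketch only: the injected field name ("datasource.revision.text") is a
# hypothetical placeholder, not confirmed ORES API.
import requests

def score_with_injected_data(context, rev_id, model, rev_text):
    """Request a score while supplying the revision text up front,
    so ORES has no reason to call back to the originating wiki."""
    resp = requests.post(
        f"https://ores.wikimedia.org/v3/scores/{context}/{rev_id}/{model}",
        data={"datasource.revision.text": rev_text},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()
```

This is _joe_'s option 1. The coupling cost halfak objects to is that the caller then has to know which datasources each model consumes — workable for TWN scoring its own edits, unreasonable for most ORES clients.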
[16:54:06] <_joe_> yeah, can't twn just do that?
[16:54:13] ORES, Scoring-platform-team, translatewiki.net: New ORES model relies on translatewiki.net API, which is not hosted on WMF production - https://phabricator.wikimedia.org/T213131 (awight) @Nikerabbit You might be interested in this discussion—one of the alternatives being discussed is to finally host...
[16:54:14] Most clients aren't TWN.
[16:54:27] <_joe_> it would minimize the amount of round-trips
[16:54:39] <_joe_> most clients don't need ores to score their data :)
[16:54:48] I'm talking about ORES clients.
[16:54:48] <_joe_> but they want scores for our edits
[16:55:02] User-scripts, bots, etc.
[16:55:07] ORES, Scoring-platform-team, translatewiki.net, Security: New ORES model relies on translatewiki.net API, which is not hosted on WMF production - https://phabricator.wikimedia.org/T213131 (chasemp)
[16:55:38] <_joe_> and those don't ask us to score anything outside of our production, right?
[16:55:50] With TWN they would.
[16:56:03] I mean, they can't now because until now, we didn't score anything outside of Prod.
[16:56:11] <_joe_> oh I see
[16:56:16] <_joe_> you want to store that data
[16:56:23] <_joe_> in ores
[16:56:34] <_joe_> I thought you wanted to store the scores in translatewiki
[16:56:34] Hmm. Not sure what you mean by ORES.
[16:56:46] *mean by store in ORES
[16:56:48] What data?
[16:57:02] <_joe_> the score for revision XYZ of translatewiki.net
[16:57:12] <_joe_> where do we store it?
[16:57:31] <_joe_> I thought we'd store it in twn's database as a consequence of each edit
[16:58:15] <_joe_> if your idea is to allow ores to respond to any client for scores on arbitrary revisions on twn.net
[16:58:16] Oh yeah. We have our own internal cache. TWN would also have its own. We score things on demand and in real time to maintain an LRU.
[16:58:25] <_joe_> that won't work well, IMHO
[16:58:27] _joe_, right. That's how ORES works.
[16:58:32] It's been working well so far :)
[16:58:42] <_joe_> on twn?
[16:59:11] Well, it works on our experimental wmflabs cluster.
[16:59:19] <_joe_> I'm fine with that!
[16:59:32] <_joe_> it's not something I can get paged for on saturday
[16:59:37] Obviously it doesn't work in prod because we haven't moved forward with the proxy after akosiaris filed his opposition.
[16:59:44] I'm not sure why you asked about that.
[16:59:53] Should I get paged on a saturday then?
[17:00:03] For a wmflabs service supporting TWN?
[17:00:06] <_joe_> I pretty much agree with alex, now that I have a clearer picture of what's proposed
[17:00:43] I understand that you agree, but I don't think you appreciate the position you're asking my team to be in by adding an SLA to our WMFlabs cluster.
[17:00:48] ORES, Scoring-platform-team, translatewiki.net, Security: New ORES model relies on translatewiki.net API, which is not hosted on WMF production - https://phabricator.wikimedia.org/T213131 (awight)
[17:01:00] <_joe_> halfak: did I imply that? I just said what you proposed to do is, in my professional opinion formed in years of doing this job, potentially problematic and not something I'd like us to do
[17:01:21] Right. I understand that. FWIW, I was asked to do this as well.
[17:01:39] <_joe_> and yes, probably if twn is so fundamental for us (it is, IMHO), it should be in our production environment
[17:02:00] <_joe_> or at least receive adequate funding (and maybe be hosted not two continents away from everything else)
[17:02:41] * awight loudly types into empty team hangout
[17:02:43] <_joe_> to the extreme, I'd prefer to help maintain a small ores cluster for them on a cloud once we've moved to kubernetes
[17:02:51] <_joe_> ahha sorry halfak
[17:02:54] Agreed. So in the meantime, it looks like you'd like to say no to Security and TWN based on concerns about accessing an external server. And I'd like to say, "No" to promising to keep the model online in labs.
[17:02:56] <_joe_> seems like you have a meeting
[17:03:12] yup
[17:03:14] * halfak runs away
[17:03:26] <_joe_> halfak: I'm ok with a temporary solution, if we have a goal and a deadline for moving away.
[17:03:37] <_joe_> I can discuss this with chase too
[17:03:45] <_joe_> if the request is coming from security
[17:25:31] ORES, Scoring-platform-team, translatewiki.net, Security: New ORES model relies on translatewiki.net API, which is not hosted on WMF production - https://phabricator.wikimedia.org/T213131 (awight)
[17:25:34] ORES, Scoring-platform-team, translatewiki.net: ORES should be configurable to access some APIs through a proxy - https://phabricator.wikimedia.org/T213203 (awight) Open→Declined Actually... this is probably a bad idea.
[17:51:34] Jade, Scoring-platform-team, Design: Jade Wireframes: Entity view mode - https://phabricator.wikimedia.org/T212379 (Halfak)
[17:53:17] * awight growls at Chrome coming up with a new UI convention for -Q
[17:54:07] _joe_, thanks for chatting with us about this. I really appreciate you working through your perspective on the ORES/TWN issues.
[17:54:41] I think that, in the short term, we want to raise a minor alarm about the problematic status of TWN.
[17:54:52] And to explore supporting TWN from wmflabs.
[17:54:58] <_joe_> halfak: heh I just wanted to understand the options, now I'm off for the day, but I'll comment on the ticket
[17:55:05] With a very minimal SLA.
[17:55:14] Thanks all the same.
[17:55:16] :)
[17:55:21] <_joe_> what's the SLA of twn btw? :)
[17:55:29] No idea :)
[17:55:39] <_joe_> I'm not sure there is one. Your SLA can't be better than that
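A worked version of that bound, with invented numbers (neither service's real availability is known in this conversation): if every TWN score requires a successful round trip to the TWN API, the availabilities multiply.

```python
# Made-up figures for illustration only: ORES prod has no written SLA and
# TWN's availability is unknown. The point is just that a hard dependency
# multiplies into the composite availability.
ores = 0.999  # hypothetical ORES prod availability
twn = 0.99    # hypothetical TWN availability

composite = ores * twn  # best case when every score needs a TWN API call
minutes_per_month = 30 * 24 * 60

print(f"composite availability: {composite:.4f}")  # 0.9890
print(f"ORES alone, downtime/month: {(1 - ores) * minutes_per_month:.0f} min")   # ~43
print(f"with TWN in the path: {(1 - composite) * minutes_per_month:.0f} min")    # ~475
```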
[17:55:46] I don't think we even have one written down for ORES prod. But we act on one nonetheless.
[17:55:56] <_joe_> yeah, about that :)
[17:56:08] <_joe_> we should really agree on SLI/SLOs for the service
[17:56:13] <_joe_> when we move it to k8s
[17:56:22] * halfak looks at akosiaris for help knowing what a good SL* statement looks like.
[17:56:31] <_joe_> ahah sorry I was about to link
[17:56:47] <_joe_> https://cloud.google.com/blog/products/gcp/sre-fundamentals-slis-slas-and-slos
[17:56:49] <_joe_> :P
[17:57:03] <_joe_> I apologize for the acronym attack
[17:58:16] "Don't make your system overly reliable if you don't intend to commit to it being that reliable."
[17:58:17] lolol
[17:58:33] We need a chaos monkey for ORES. We've been up for too long.
[17:58:37] (not really)
[17:58:51] Woah. "Within Google, we implement periodic downtime in some services to prevent a service from being overly available."
[17:58:55] Wow.
[18:02:49] ORES, Scoring-platform-team, translatewiki.net, Security: New ORES model relies on translatewiki.net API, which is not hosted on WMF production - https://phabricator.wikimedia.org/T213131 (Nemo_bis)
[18:04:25] ORES, Scoring-platform-team, translatewiki.net, Security: New ORES model relies on translatewiki.net API, which is not hosted on WMF production - https://phabricator.wikimedia.org/T213131 (Halfak)
[18:05:33] Looks like no one is here for BS time so I'm taking my lunch
[18:05:55] another lengthy thread at wikimedia-l (and I'm in part responsible) :|
[18:13:10] which one?
[18:19:18] the blocks targeting minorities one
[18:20:30] Well, I wouldn't claim too much responsibility unless you're the one targeting minorities with blocks
[18:22:36] No, I am not
[18:22:49] to the contrary I desysopped the guy
[18:51:18] ORES, Scoring-platform-team, Documentation: Peer review ORES Feature Injection doc page - https://phabricator.wikimedia.org/T213207 (srodlund) Edited lightly for language.
[19:24:23] ORES, Scoring-platform-team, translatewiki.net, Security: New ORES model relies on translatewiki.net API, which is not hosted on WMF production - https://phabricator.wikimedia.org/T213131 (chasemp) I will bring this up in the weekly meeting for the security team but I wanted to respond briefly no...
[21:50:52] New entity view: https://docs.google.com/drawings/d/1BZsRNrKHcjetpMRvl66DJqeonLTrz59LA3iSbfXfiM0/edit
[21:56:05] (PS1) VolkerE: build: Update 'stylelint-config-wikimedia' to v0.5.0 [extensions/ORES] - https://gerrit.wikimedia.org/r/483292
[22:00:54] I want to do one more before I move on to other views: one where we use the single-column mobile diff.
[22:31:58] harej, ^
[22:32:19] I am trying a few different strategies to get rid of the tabs but still achieve the goals of the design.
[22:32:23] I will look at it soon
[22:32:27] It's very easy to make it very confusing ^_^
[23:53:42] (CR) Hoo man: [C: +2] Store content quality as an integer index (1 comment) [extensions/JADE] - https://gerrit.wikimedia.org/r/476994 (owner: Awight)
[23:56:00] WOOT
[23:56:18] ^!!!
[23:59:27] (Merged) jenkins-bot: Store content quality as an integer index [extensions/JADE] - https://gerrit.wikimedia.org/r/476994 (owner: Awight)