[03:10:55] TimStarling: I'll take a look at what you wrote, but I'm mostly out this week
[03:12:47] right, I see that is in your calendar
[15:44:20] anomie: around? wanted to talk about recentchanges again...
[15:58:48] SMalyshev: Yes, I'm around. I just posted some comments on your patch, too.
[16:01:03] anomie: one of the problems I've discovered is that I do need namespaces for my use case :(
[16:01:23] I could filter manually of course, but that would be suboptimal
[16:02:58] anomie: so I wonder maybe there is another way? the problem with timestamps is that they miss updates - an update can be inserted in the stream with an old timestamp, and there seems to be pretty much no limit on how far back it can go
[16:03:33] SMalyshev: To query with namespaces and not have a bad query, you'd need an index on (rc_namespace, rc_id). An index on just (rc_namespace) is supposed to be equivalent with InnoDB, but we don't have that index either. Note that the existing code is already problematic if you query with rcnamespace but don't limit on rctype and rcshow=patrolled or !patrolled, for which we have T149077 open already.
[16:03:34] T149077: Certain ApiQueryRecentChanges::run api query is too slow, slowing down dewiki - https://phabricator.wikimedia.org/T149077
[16:06:33] anomie: yeah we'd probably need an index. Not sure what to do about additional parameters. In general, it looks like we have too many combos to index them all even with timestamp...
[16:09:40] SMalyshev: You might miss updates with IDs too. Consider: Process A begins a transaction, inserts a row, then gets preempted. Process B begins a transaction, inserts a row, commits. Process C reads the table, it'll see B's row but not A's. Then A commits and that row shows up.
[16:11:16] How much of a window there is depends on how much other work is done between "inserts a row" and "commit". I think $wgTransactionalTimeLimit may limit the maximum amount of time in there.
[16:11:19] anomie: hmm... I guess this is possible. probably less drastic than timestamps but still possible. So I wonder if there's a way to get a stable set of updates that won't miss things?
[16:12:37] wgTransactionalTimeLimit is 120 secs, but I can't ask back 120 secs, that's too much data unless we give up on being current :(
[16:13:58] For going by IDs, instead of remembering "last seen ID" for your next query use "last ID seen more than 120 seconds ago". Or some variation on that idea.
[16:15:08] but I can't ask for IDs that were 120 seconds ago.... that'd be too much old data, we have tens of updates per second, 120 seconds ago it'd be thousands of old updates
[16:19:58] I have no more ideas for you then.
[16:20:33] anomie: so basically no way to get reliable data from recentchanges without missing things?
[16:20:50] There are ways, you just don't like any of them.
[16:20:52] I mean reliable, reasonably current ones, not old ones
[16:21:21] anomie: so far I heard one, use old data, older than 120 seconds. Is there another?
[16:21:44] Oh, perhaps $wgMaxUserDBWriteDuration is the transaction-limiting setting rather than $wgTransactionalTimeLimit.
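
As an aside, the "last ID seen more than N seconds ago" strategy anomie floats above can be sketched as a delayed low-water mark. This is a minimal, hypothetical sketch in Python, not anything that exists in MediaWiki: fetch_changes_since(rc_id) stands in for whatever ID-based recentchanges query the patch under discussion would add, and the re-read window should be at least the maximum transaction duration ($wgMaxUserDBWriteDuration or $wgTransactionalTimeLimit, whichever actually applies):

    import time

    # Re-read window in seconds: at least the maximum transaction duration
    # (3 if $wgMaxUserDBWriteDuration is the relevant limit, 120 if it is
    # $wgTransactionalTimeLimit).
    REREAD_WINDOW = 3.0

    def poll_changes(fetch_changes_since):
        """Poll by rc_id, but only advance the low-water mark to IDs first
        seen more than REREAD_WINDOW ago, so a row whose transaction commits
        late (with a smaller rc_id) is still picked up on a later pass."""
        watermark = 0   # highest rc_id first seen more than REREAD_WINDOW ago
        seen = {}       # rc_id -> monotonic time we first saw it (for dedup)
        while True:
            now = time.monotonic()
            for row in fetch_changes_since(watermark):   # hypothetical helper
                if row["rc_id"] not in seen:
                    seen[row["rc_id"]] = now
                    yield row                            # hand new change to the consumer
            # Only IDs that have been visible for longer than the window are
            # considered "settled" and allowed to move the watermark forward.
            settled = [i for i, t in seen.items() if now - t > REREAD_WINDOW]
            if settled:
                watermark = max(watermark, max(settled))
                for i in settled:
                    del seen[i]
            time.sleep(1)

The trade-off is the one discussed here: rows keep being re-read for REREAD_WINDOW seconds and deduplicated on the client, in exchange for not losing late-committing transactions.
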
[16:23:46] anomie: wgMaxUserDBWriteDuration set to 3 in config. But I am not sure what this means - I've certainly seen IDs that are backwards by timestamp more than 3 secs
[16:24:19] the problem is that the timestamp there is not related to when the row was written, it is when the change was done (and the row may actually be written way later)
[16:24:42] You can delay the data you consume to give transactions time to commit, or you can reconsume a window of old data in case transactions committed. I doubt you'd be able to convince anyone to allow you to make whatever kind of blocking reads on the database would be needed to ensure all old transactions commit before your read and no new transactions can start until after.
[16:25:27] anomie: I don't want to redo all the db structure. I'm trying to find a way to work with the existing one...
[16:25:39] But "last ID seen more than 3 seconds ago" is better than "last ID seen more than 120 seconds ago".
[16:27:06] anomie: ok, that means I still need to query by ID. There's no way to do it in the current service model
[16:27:24] That's what your patch is for, isn't it?
[16:28:43] anomie: yeah, and it looks like it requires new indexes and is getting more and more complicated, so I wonder if it's going to be the solution.... looks like even after dealing with all this (which probably would take nontrivial time to set up all the indexes etc.) I still don't get a reliable stream
[16:31:18] so I wonder if there's some better way I am missing
[16:40:19] isn't this use case what the services folks have been building changeprop for?
[16:40:27] * bd808 does not have the patch for context
[16:41:17] broadly, using an RDBMS as a reliable queue is a hard problem
[16:43:55] bd808: changeprop is a good base, but doesn't have a service like recentchanges. That still needs to be built
[16:46:52] there is also https://wikitech.wikimedia.org/wiki/EventStreams but both things are straying away from core and into Wikimedia specialty stuff
[16:48:06] I don't care about specialty, but EventStreams is not very useful for me as such, as it is not seekable
[16:48:34] and also has no filters
[16:48:52] *nod* I've heard about both problems
[16:49:34] the filtering debate is at T152731
[16:49:34] T152731: Implement server side filtering (if we should) - https://phabricator.wikimedia.org/T152731
[16:49:48] so I am assuming it is possible to build what I need on top of EventStreams. I am just trying to postpone the time when I have to learn node.js and implement it myself :)
[16:50:14] that seems very sane ;)
[16:50:56] you should trick Magnus or Timo into caring about this problem and wait for the magic solution to come from the 'crowd'
[16:51:59] that's my secret hope :) unfortunately it turns out recentchanges is even worse than I thought it was, so we're missing updates right now, so I'm looking for a temporary fix that I can do right now...
[16:52:12] I've been asked to switch RTRC from API-polling to EventStreams. I too feel hesitant about doing this without at least wiki-level filtering.
[16:52:42] Krinkle: exactly. I don't want to deal with the enwiki edits firehose client-side
[16:52:56] or wikidata
[16:53:00] :D
[16:53:07] Krinkle: well, I have to deal with wikidata ;)
[16:53:41] but I can appreciate when people that don't need it don't want to see it
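
To make the firehose problem concrete: consuming EventStreams today means receiving every event for every wiki and dropping the rest on the client. A minimal sketch in Python, assuming the third-party sseclient package and the public recentchange endpoint documented on the EventStreams page linked above; the wiki and namespace values are only illustrative:

    import json
    from sseclient import SSEClient as EventSource  # third-party: pip install sseclient

    # Public recentchange firehose (see the EventStreams wikitech page above).
    STREAM_URL = "https://stream.wikimedia.org/v2/stream/recentchange"

    def filtered_changes(wiki="wikidatawiki", namespaces=(0,)):
        """Receive every event for every wiki and keep only the requested
        wiki/namespaces on the client -- the filtering T152731 proposes to
        move server-side."""
        for event in EventSource(STREAM_URL):
            if event.event != "message" or not event.data:
                continue
            try:
                change = json.loads(event.data)
            except ValueError:
                continue
            if change.get("wiki") != wiki:
                continue
            if change.get("namespace") not in namespaces:
                continue
            yield change

    # e.g. print Wikidata main-namespace changes as they arrive:
    # for change in filtered_changes():
    #     print(change["type"], change["title"])

Every event dropped here still crossed the wire, which is why wiki- or namespace-level filtering on the server side (T152731) matters for clients like RTRC.
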
[16:57:02] Krinkle, SMalyshev: post your use cases on T152731. It sounds like both of you have real world reasons for filtering that will help the team understand why and what you want
[16:57:02] T152731: Implement server side filtering (if we should) - https://phabricator.wikimedia.org/T152731
[17:05:51] bd808: Yeah, will do. Though I'm fairly sure the team is aware of it already.
[17:37:24] * DanielK_WMDE wibbles
[17:37:30] somebody said wikidata?
[17:43:12] DanielK_WMDE: only as a curse word ;)
[18:26:15] anomie, bd808: I've created a task T161731 where I summarize what's currently missing for me, please feel welcome to comment
[18:26:16] T161731: Create reliable change stream for specific wiki - https://phabricator.wikimedia.org/T161731
[20:31:09] Krinkle: you can edit https://simple.wikipedia.beta.wmflabs.org/wiki/MediaWiki:Group-user.css . can you make a fix for me? please add: .skin-timeless #p-logo { position: relative; }
[20:31:24] currently the badge is all messed up on Timeless. e.g. https://simple.wikipedia.beta.wmflabs.org/wiki/Florida?useskin=timeless
[20:43:35] MatmaRex: https://simple.wikipedia.beta.wmflabs.org/w/index.php?title=MediaWiki%3AGroup-user.css&type=revision&diff=3266811&oldid=3266687
[20:47:16] thanks Reedy, you're the best
[20:47:36] +1