[11:18:13] duesen: ping :D [13:27:40] Amir1: uh, hey! [13:27:55] hey! [13:28:25] So... this all comes back to my old friend "SiteInfo". [13:28:46] The question is how to get there in small steps. [13:29:42] We can introduce it, make it a part of Site and SiteStore keep it separately, that was my idea [13:29:54] In my mind, we don't need any object. We need a service object that can map from one kind of ID to another. On top, it would be nice if it could map from ID to arbitrary "info", e.g. which cluster a wiki is on, what the content language is, etc. [13:32:28] That would be nice but I'm not sure it would accommodate my usecases. [13:32:52] e.g. https://www.wikidata.org/w/api.php?action=help&modules=wbgetentities [13:33:05] So, how about a super simple interface, that can start out supporting only a very limited set of information, but can be extended to support more in the future... Something like SiteInfo::getInfo( $siteId, $key, $default = null ). We could define a few well-known keys (e.g. API_URL or IS_MEDIAWIKI) [13:34:40] Yeah, that sounds good [13:35:49] Ah, for that use case you'll need SiteInfo::getSiteIds(), which returns all canonical IDs. If we want to support different "kinds" of IDs and aliasing, we could allow something like getSitreIds( [ 'canonical', 'alias'] ) or something like that [13:36:49] Introducing a simple/narrow interface like this will allow us to simplfy and optimize the underlying implementation, and to slowly move more code towards using it. [13:38:01] Hm... we'll also need hasSite(). I think we'll want getInfo() to throw when it is asked for info about an unknown site, but return null when asked for an info key that is not set. [13:38:31] no, that's about something else. I'm terrible at explaning [13:39:07] we could also so SiteInfo::getInfo( $siteId, $key, $default = self::REQUIRED ), so the method would throw if the key is not known and no default is given explicitly. [13:40:27] Amir1: maybe I'm terrible at listening :) What I understodd is that wbgetentities needs to list all available site IDs, for self-documentation and also for validating input. [13:40:29] Let me get back to the problem, currently the api.php to produce list of acceptable values for sites= argument loads SiteStore->getSites() [13:40:31] Is that correct? [13:40:45] that itself is not a problem [13:41:08] the problem is when it needs to load it from APCu, it gets copied again in memory [13:41:32] so you will end up with 1000 rather heavy serializied object being copied all the time [13:41:39] just to get a small list [13:42:40] Yes, that'S nasty. [13:43:01] this showed up again in load.php in another wya [13:43:30] SiteModule (a RL module) also loads this from APCu cache in every load.php request [13:43:47] 1000 sites object being deserialized in every request [13:43:59] We should change how that list is cached. But we can't to that as long as the calling code is asking for data that it doesn't need: SiteLookup::getSites() returns a list of full SiteObjects. The caller then throws that away. [13:44:12] exactly [13:44:26] So we should make a leaner interface better suited for the caller'S use case. That will allow us to optimize the implementation. [13:44:30] I see this as a huge violation of ISP [13:45:20] so I was thinking if we could make Site object have a baby Site [13:45:23] I see it as a violation of open/closed: we are exposing too much to the caller, coupling to internals rather than designing the interface based on the caller's need. [13:45:53] yeah [13:46:09] also Site object is mutable making it more nasty [13:46:14] Can we just make the relevant code stop using Site objects at all? [13:46:31] for that API call yeah [13:46:42] but for the load.php, it's a little bit more complicated [13:46:55] let me grab the code [13:47:26] https://github.com/wikimedia/Wikibase/blob/fd9a91bea924c08bfe47e48a531201a86310abea/lib/includes/Modules/SitesModule.php#L144 [13:47:41] This is being done 1000 times in every request to load.php [13:49:07] it needs type, group, global id, language code, $site->getPageUrl(),$site->getFileUrl( 'api.php' ) [13:49:23] but maybe this one needs a different solution [13:49:30] Also, site->getGroup() [13:52:00] yup [13:52:45] In the SiteInfo world, we could do this with SiteInfo::getInfo( $siteId, SiteInfo::PAGE_URL ) etc [13:53:53] In the SiteInfo implementation, we can add optimization for "lean" caching of values for specific keys. [13:54:16] So e.g. have one cache entry for all the API URLs, one for all the page URLs, etc. [13:55:19] Optimizing cache access is always tricky (entry size vs. number of requests). What is optimal depends on a lot of things. [13:56:49] The caller shouldn't have to worry about that, but we should try to design the interface in a way that allows the caller to expose useful information. E.g. we could allow multiple keys to be looked up at once, or multiple sites at once. That gives the service object to try and optimize in a smart way. [13:56:52] in this case, I can see huge value on caching this. It shows up in flamegraph a lot [13:57:08] yeah [13:58:16] Right... if we has something like getInfoBatch( $sites, $keys ), we can optimize access patterns in the service implementation. The caller asks for exactly the info it needs at once. [14:00:11] We could e.g. have logic that hits "horizontal" (per site) cache entries when the caller asks for lots of keys for a few sites, but "vertical" (per key) cache entries when the caller asks for a few keys for many sites. We can also group together caches for keys that are frequently used together, etc. [14:00:34] Funnily enough such class exists [14:00:36] SiteList [14:00:45] we call SiteList::getGlobalIdentifiers() [14:00:50] that can be sorta cached [14:01:01] but sitelist is not a service on its own [14:01:49] SiteList does too much magic, and it contains Site objects, so it always needs all info about all sites. [14:02:34] The more I think about it, the more I like getInfoBatch( $sites, $keys ). It's simple and obvious, and proves a lot of opportunities for internal optimization based on access patterns. [14:06:31] how to get info on all sites? [14:06:47] Amir1: I'd suggest to introduce a service interface that has (at least) the getInfoBatch() and getSites() methods. It can be implemented on top of SiteInfo, but add extra caching for the relevant use cases. [14:07:06] To get info on all sites, you first call getSites(). [14:07:18] (or listAllSites or whatever you cant to call it) [14:08:03] makes sense [14:08:26] (note that we probably need to variants of getSites(), using flags or separate methods: one that lists all valid site IDs, including aliases, and one that only lists canonical IDs, so each site is only listed once) [14:09:14] maybe the case that includes aliases should be called getSiteAliases, and return a map of alias to canonical ID. Yea, i think that would be nice. [20:45:29] Pchelolo: User is a property of FeaturedFeedChannel, which are cached too [20:46:50] oh right.. didn't notice that. [20:47:24] But also the use of NonSerializable trait in User was removed, so User should be still serializable. [21:10:11] that's a bit I didn't realize (or forgot after it was told to me)