[00:09:40] I wish they made the colours actually mean something in flame graphs
[00:09:46] It's wildly confusing
[00:09:58] (ik it's not a WMF thing, just a general dumb thing but heh)
[00:11:38] In speedscope I think similar colors represent classes in the same namespace
[00:27:54] It would be nice if they were colour coded by execution time though
[00:28:02] So you can immediately see what is taking the most time
[00:29:19] You can immediately see that based on their width
[00:29:24] That's the point of the graph
[00:29:27] Not really
[00:29:40] The widths are not very different
[00:29:56] ? https://cdn.discordapp.com/attachments/1006789349498699827/1439774524701937674/image.png?ex=691bbe04&is=691a6c84&hm=d027bfba7439bcb84fa355f25b44c9dfaafb279658856bfd701aca2e9fb8e1a7&
[00:30:09] They are very different
[00:30:58] Okay I mean height not width
[00:31:15] Even so it's very misleading
[00:31:51] For example Html::processTemplate is red, which screams "this takes a long time" when in reality it doesn't
[00:32:00] Comparatively
[00:32:25] I just ignore the colors and focus on the width, but i guess that's subjective
[00:32:59] I guess. But if you had half a brain you'd co-ordinate the colours with the timing
[00:33:06] Common sense is not that common ig
[00:33:23] btw do you have any idea how to fix an error like this? I'm trying to use ManageWikiDataStoreFactory at a point in time when other stuff isn't loaded yet https://cdn.discordapp.com/attachments/1006789349498699827/1439775393547550833/image.png?ex=691bbed3&is=691a6d53&hm=d84f873a35a2e94eb7437a3fcd64502201c35053901dab10569d0831fceeb905&
[00:33:37] I added the two lines with mwDataStore to DeletedWiki.php https://cdn.discordapp.com/attachments/1006789349498699827/1439775451747979324/image.png?ex=691bbee1&is=691a6d61&hm=08274b4e6144d61e48e5052272a6c086f9f8c611919676e2aa2a997d34f248b3&
[00:33:49] because the wiki cache needs to be synced at that point to fix undeletions
[00:34:19] Is this in MirahezeFunctions?
[00:34:26] You can't access MediaWikiServices in there
[00:34:27] but apparently ManageWikiDataStoreFactory depends on a lot of stuff, and I can't even find where `$wgTmpDirectory` is set in our config and according to mediawiki.org it's false by default?
[00:34:41] the code did that before already, it uses CreateWikiDataStore
[00:35:00] Let me get my laptop
[00:35:17] The error page code is called here https://cdn.discordapp.com/attachments/1006789349498699827/1439775870419079189/image.png?ex=691bbf45&is=691a6dc5&hm=ee53871e7119f52630f318a8bb2fa013a0e1d5cea95329c3404291426134d451&
[00:35:25] or rather scheduled to be called
[00:36:48] maybe the issue is that the final setup callback is called before settings are applied?
[00:37:06] yeah
[00:37:11] how do we fix that then
[00:37:19] so its called before MediaWikiServices::allowGlobalInstance() https://github.com/wikimedia/mediawiki/blob/b6339da63f721b0895fccabe699c0e92cf603f97/includes/Setup.php#L323
[00:37:29] yeah but DeletedWiki.php calls allowGlobalInstance
[00:37:42] it just doesn't call $wgSettings->apply which idk whether it is relevant
[00:37:56] but given that the error is due to `$wgTmpDirectory` being false/unset, ig it is?
[00:38:25] well if $wgTmpDirectory is not set then ig that will throw the error yeah
[00:38:48] if $wgSettings->apply isn't called then I don't think even default settings will be available?
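A minimal sketch of the ordering being discussed, assuming DeletedWiki.php runs inside Setup.php before settings have been applied; the service lookup at the end is illustrative, not necessarily the extension's real API:

```php
<?php
use MediaWiki\MediaWikiServices;

// Sketch only: defaults such as $wgTmpDirectory exist only after
// $wgSettings->apply() (the SettingsBuilder set up by Setup.php), and the
// service container is only usable after allowGlobalInstance(), so both
// have to run before any ManageWiki/CreateWiki service is resolved.
$wgSettings->apply();
MediaWikiServices::allowGlobalInstance();

$services = MediaWikiServices::getInstance();
// Only at this point would something like the following be safe:
// $dataStoreFactory = $services->get( 'ManageWikiDataStoreFactory' );
```

Moving the check into a hook, as discussed below, avoids the ordering problem entirely, since both of these steps have already happened by the time any hook runs.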
[00:39:19] i guess so
[00:39:48] maybe it should just call wgSettings->apply() as well before calling ManageWiki code?
[00:40:03] this seems like an odd way to do it
[00:40:06] yeah
[00:40:33] if $wgSettings->apply or anything else is going to be set - so that full wiki context is available - then its probably better to do the check in the onBeforeInitialize hook?
[00:40:35] thats the route I took
[00:40:44] ohh this is where wgTmpDirectory is set
[00:40:50] since MWServices will be fully available then and all default settings will be
[00:41:13] probably, unless that's going to cause any issues due to the wiki being deleted
[00:41:42] but deleted is just a config flag really isn't it
[00:41:45] until the db is dropped
[00:41:52] is that before or after SetupAfterCache?
[00:41:53] at which point its dropped from cw_wikis
[00:42:18] its called afterwards
[00:42:23] hm
[00:42:36] CreateWiki/ManageWiki do a lot of caching stuff in SetupAfterCache
[00:42:55] onSetupAfterCache may also be a place to do it
[00:44:09] if we could move DeleteWiki.php to SetupAfterCache the ManageWiki/CreateWiki issues would be fixed because they would sync their caches themselves
[00:44:33] we would just need to make sure it kills the process after SetupAfterCache
[00:44:34] my setup goes a little like:
```php
public function onBeforeInitialize( $title, $unused, $output, $user, $request, $mediaWikiEntryPoint )
{
	global $wgConfigCentreWikiFlags;
	if ( !( $wgConfigCentreWikiFlags & Wiki::FLAG_CLOSED ) ) {
		// isn't closed; let MediaWiki continue as normal
		return true;
	}
	[ ..... ]
	$context = RequestContext::getMain();
	$context->setTitle( SpecialPage::getTitleFor( 'ClosedWiki' ) );
	$context->setSkin( $skin );
	$output = $context->getOutput();
	$output->setContext( $context );
	$output->setStatusCode( 410 );
	$output->disallowUserJs();
	MediaWikiServices::getInstance()
		->getSpecialPageFactory()
		->executePath( 'ClosedWiki', $context );
	$output->output();
	// Ensure MediaWiki doesn't continue processing
	exit( 0 );
}
```
[00:44:53] which would require running code after SetupAfterCache
[00:44:55] i guess
[00:44:59] thats for closed wikis, but deleted wikis are the same heh
[00:46:21] that would be onBeforeInitialize probably
[00:48:23] Probably, but I think in either case, it makes little difference, since its calling ::allowGlobalInstance() anyway in DeletedWiki.php, its probably not saving a whole lot of performance versus letting MediaWiki finish init and then doing it in a hook
[00:49:16] i love looking at this channel
[00:50:35] wow you're blue now
[00:50:40] should work yeah - don't think it needs to have the same function signature does it? since we're not using any of the params?
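On the signature question: a handler registered from config doesn't need to declare the hook's parameters if it never uses them. A minimal sketch, assuming the check is registered via $wgHooks from config rather than through an extension's HookHandlers:

```php
<?php
// Sketch only: BeforeInitialize is passed ( $title, $unused, $output, $user,
// $request, $mediaWikiEntryPoint ), but a PHP closure may declare fewer
// parameters than it is called with, so they can simply be dropped.
$wgHooks['BeforeInitialize'][] = static function () {
	// Full wiki context, default settings and MediaWikiServices are all
	// available here, so the deleted/closed-wiki check can build its error
	// page at this point and then exit( 0 ) to stop further processing.
};
```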
[00:51:17] probably
[00:51:43] im gonna test it on beta
[00:51:45] send it ig
[00:54:01] i think i accidentally fully deleted my beta wiki
[00:54:04] time to test it on prod ig
[00:54:20] on mw183 since that has the ErrorPages change
[00:56:03] This is why I abandoned MW/CW
[00:57:55] this breaks deletion pages https://cdn.discordapp.com/attachments/1006789349498699827/1439781564375502879/image.png?ex=691bc492&is=691a7312&hm=23ca3625f10a56f634ab2116d36a0cdc090dc4233095fe2c4d9861cb02299114&
[00:58:34] ohh
[00:58:36] I know why
[00:58:44] it's still calling resetGlobalInstance a 2nd time
[00:59:04] I love that error page
[00:59:35] It needs to call exit( 0 ) to stop MediaWiki processing since it's in the hook now
[00:59:49] Else MediaWiki will add it to the output buffer
[00:59:53] it does that already
[01:00:02] it's needed I think
[01:00:17] But that's already been called
[01:00:22] By the point that this runs
[01:00:27] yes, that's why I just removed it
[01:00:30] Oh
[01:00:38] I missed the colour lol my bad
[01:01:37] this is almost dope and I almost want to see something like this exist
[01:01:45] but for something like steward closed wikis and that kind of thing
[01:02:29] I think this fixed the issue, thanks a lot @originalauthority
[01:02:42] I'm going to deploy this tomorrow because I have to go to bed
[01:08:04] Woo, thanks for powering through, enjoy rest
[01:08:42] i still wonder why that page still hasn't been changed
[01:12:34] Possible now it's been moved to a hook, since the user associated with the request will be known
[09:51:35] its that or just an easy thing that can be transplanted to a given wiki's main page and overriding the no permission on other pages, which might be easier than trying to code a special solution
[10:18:57] Could someone from the Tech department address this request about interwikis? https://meta.miraheze.org/wiki/Community_portal#Remove_the_www._prefix_from_www.github.com_in_the_global_interwiki_settings
[10:33:49] I think I can change it too since Meta's interwiki table is global IIRC. An RfF seems like overkill for something small like this: www.github.com is already on a 301 redirect so this seems uncontroversial. I'm tempted to just action it and close the RfF, but I'm too sleepy to be using admin powers right now.
[10:49:00] I have no idea why that's a vote
[10:49:10] Happy for me to do it as tech?
[10:53:01] I would if I could find how to edit the global one
[10:53:54] It seems changing meta should do it
[10:54:24] It did
[11:48:41] Deployed; it will still take one request per server per wiki for the deletion page to be gone. Since the ErrorPage is shown after the relevant bootstrapping stuff now anyway, I don't think it would have a big performance impact if we synced the ManageWiki cache before showing the deletion page so undeleted wikis work immediately, but that would probably require re-initializing the relevant variables and other data retrieved by MirahezeFunctions before the cache sync, and that would get complicated
[11:55:22] Here's a dancing banana for your work
[11:55:27] https://cdn.discordapp.com/attachments/1006789349498699827/1439947038174216282/image0.gif?ex=691c5eae&is=691b0d2e&hm=1340c5f0c006d5bc99bbc96b631b95f44f9e6d7ca179954ab84f6198b64643c1&
[11:55:31] Treasure it always
[12:04:34] hey guys, just an idea: instead of placing all the details onto a new wiki's main page, why not place the bcrat stuff onto the bcrat's talk page instead, so the main page only shows the "wiki has not been replaced yet"
[12:14:10] Thx lol
[12:48:28] Lots of people probably won't know how to find that, especially if they're new. I still don't, actually.
[12:49:12] And I'm fairly newish to Miraheze, but that's sort of the point I was making so it doesn't really matter.
[13:11:09] when does translatewiki.net usually send out the new translations?
[13:11:59] I think weekly
[13:12:19] https://tenor.com/view/dog-puppy-silly-silly-dog-eyes-gif-10708604210797057779
[13:12:35] i should finish translating the pt-br language before that
[16:52:36] @paladox did you deploy your db change?
[16:52:52] i used set global (it's dynamic)
[16:53:08] @paladox can you roll back?
[16:53:14] We've just gone down
[16:53:15] i can
[16:54:05] @reception123 @pskyechology can one of you do comms please?
[16:54:10] on kt
[16:54:11] it
[16:54:43] reverted
[16:56:13] @paladox db161 looks to be stalled
[16:56:17] i really don't know why that caused an outage (i restarted php8.2-fpm)
[16:56:29] looks like we are back up
[16:57:05] except db161 wikis
[16:58:05] @paladox can you reboot 161?
[17:00:40] i restarted mysql before you said reboot as i saw max connects. I'm not sure i should do a reboot whilst it's restarting.
[17:00:58] If it's stalled then you might need to hard kill
[17:01:05] But graceful is fine
[17:01:08] load is 500
[17:01:14] Please acknowledge if you are doing stuff though
[17:01:25] That's not fun
[17:02:00] @paladox if it doesn't cleanly restart after like 5 minutes do a hard one
[17:02:46] i'm going to depool c2 in wgDatabaseClustersMaintenance
[17:03:53] by hard you mean go into proxmox and click reboot?
[17:04:10] what's our policy on post-mortems
[17:04:24] kill -9 on the proc might work
[17:04:30] But reset in proxmox will
[17:04:38] We don't do them
[17:04:56] i'm going to do reset @rhinosf1
[17:05:02] any objections?
[17:05:31] unless we fuck up real bad?
[17:05:32] Nope
[17:05:41] We are even worse at it then
[17:06:57] db161 is back
[17:07:10] all clear on the downtime?
[17:07:48] yeh
[17:07:51] Should be
[17:09:58] Does this require me to toggle deletion status on and back off again for the wikis that got stuck halfway, or should this just naturally resolve on a new visit?
[17:10:37] Just visiting the wikis should be sufficient
[17:10:58] Fantastic stuff, glad to finally see the death of that bug.
[17:11:26] Was pretty rough having to tell undeletion requests: "It's a 50-50 shot of whether your wiki will come back"
[17:11:36] will this also fix the issue of new wikis taking on a dead one's name being shown as deleted
[17:11:52] I would assume so
[17:12:09] oh this is most joyous
[17:12:15] you're on a killing spree
[17:12:30] Tbf the fix is still not optimal ()
[17:12:59] The deletion page is shown once by each application server before it refreshes the wiki's cache
[17:14:01] There's which would fix this, but I don't really want to touch that PR because I have no idea what impact it has
[17:14:42] lgtm (lets gamble, try merging)
[17:14:52] lol
[17:14:57] i love that take on lgtm so much
[17:15:03] gonna tell my mom about it
[17:15:22] looks garbage to me
[18:46:38] This is why I suggested replacing the cache files with a Redis/memc based cache but 🤷
[18:49:00] I think we'll have to switch to a different format at some point
[18:49:01] https://issue-tracker.miraheze.org/T14409
[18:49:44] I personally would prefer one that's either properly synced across all servers or stored on a separate server (e.g. Redis)
[18:50:25] Because both the db171 load spikes and the undeletion issues were caused by the file + memcache-based cache invalidation implementation we're currently using
[18:51:58] Wouldn't that substantially increase cross-server traffic in situations where it doesn't today/introduce a new SPoF in our design? 🤔
[18:52:25] (not my area of expertise, just thinking about the shape at a high level)
[18:53:08] Ideally we would have some form of redundancy similar to SQL replication so it doesn't all depend on one server
[18:53:10] Depends how it's architected. Both Redis and Memcached can be made redundant; the Foundation has the cost to do so now.
[18:53:18] (also clearly room for improvement over what we do today)
[18:53:56] My main concern would be the performance impact caused by contacting an external service (even if it's on an adjacent server) every single request
[18:55:00] My approach personally has been to try cache -> unavailable -> fallback to database. Checking cache first is approx a 1-2ms delay (probably the same for checking a cache file) and about 5ms if you need to init the database. Both are negligible in either case as you can get a connection to either and then allow it to be reclaimed by MediaWikiServices - IE, you do not pay for that connection again.
[18:55:27] Yeah, we're no top 100 website, but we do push an appreciable amount of traffic, so worth considering
[18:55:39] > probably the same for checking a cache file
[18:55:39] I think our first step should be to evaluate the performance of the current implementation
[18:56:25] Probably - but at some point you have to weigh the marginal performance loss against the headache that comes with the current implementation.
[18:56:32] I think we have a lot of room for performance improvement in other areas as well that might be easier to solve
[18:57:13] Yeah, also the current implementation is that every x requests, the server takes a couple of seconds to rebuild all dblists
[18:57:30] That summoned some heartburn
[18:57:33] But in any case Redis and Memcached were built to handle thousands of concurrent connections at once so I don't think it would be an issue. And in any case, if necessary, a server can keep the connection open so that subsequent requests don't need a new connection if that makes sense
[18:59:42] I think if we're going to rewrite caching at some point, we should also consider what exactly we cache
[18:59:49] Some dblists are unnecessarily big
[19:00:05] Do we actually need to cache the database list? Is what we need to think about
[19:00:44] The only reason it's being used iirc is to push onto wgLocalDatabases and some wgConf stuff that could be replaced with something like this
[19:00:49] Let me find out
[19:01:10] https://github.com/Telepedia/TelepediaCore/blob/main/includes/LBFactoryMulti_TP.php
[19:01:35] In which case a wiki is only looked up when it's requested
[19:01:48] We should at some point be able to read one or more dblists for every request without degrading performance heavily, this is currently blocking (security task) because fixing it would require reading a list of all private wikis and setting an option to that list, but the option can't be cached for too long
[20:16:24] interested in this one, how can this be done? Since the files are not very small and contain a lot. E.g. configs or in the database file a lot of wiki names.
[20:16:36] what's your idea for the key with the data?
[20:18:18] You could potentially put it all in one key, but I use different keys for different things, 1 key for perms, 1 key for ext, 1 key for WG variables. That way one can be updated without updating the others.
[20:18:43] https://gitlab.com/telepedia/mw-config/-/blob/main/LoadWiki.php?ref_type=heads is what I'm currently running (waits for someone to point out a security vuln 😶‍🌫️)
[20:18:50] ^ @paladox
[20:20:25] Theoretically it would probably be easy to just put everything in one cache key since that's what MW already does (just write to the cache instead of a file); Redis supports up to 512mb per key so that would be sufficient as I doubt any one wiki would exceed 512mb per key
[20:20:25] oh, and that's quite performant?
[20:20:37] @abaddriverlol maybe something we could do ^?
[20:20:45] About 2ms for that whole file to run on cache hit, about 6-7 when we have to get a database connection
[20:21:08] yeh we don't want to stick a lot in a key, cuz you have to download it and you want it to be really performant
[20:21:48] yes, I like the idea as well
[20:22:25] When it hits the cache I've found it's almost instant, sometimes the whole file takes only 1ms to execute because the cache connection is kept open for that server until PHP cycles that worker. Most of the overhead on cache hit is parsing the config and setting it onto wgConf which is pretty quick in any case
[20:25:49] I think if you wanted to get even more into the performance things, and were inclined to use Redis rather than Memcached, we could use PRedis (or whatever Redis php ext Miraheze uses) directly to fetch the keys, since you can get multiple keys at once, but that isn't supported by ObjectCache, but probably wouldnt be required
[20:31:13] oh seems you're still using the php files in https://gitlab.com/telepedia/mw-config/-/blob/main/TelepediaFunctions.php?ref_type=heads ?
[20:40:08] Nope it's not loaded https://gitlab.com/telepedia/mw-config/-/blob/main/LocalSettings.php?ref_type=heads#L18
[20:40:21] I just didn't delete it in case I fucked up and needed to revert but it worked fine
[21:25:13] Oh
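A minimal sketch of the cache-first lookup described above, using Predis; the key layout, TTL and database fallback are illustrative assumptions, not the actual Miraheze or Telepedia implementation:

```php
<?php
use Predis\Client;

// Sketch only: resolve one wiki's settings blob from Redis, falling back to
// the database on a miss. Splitting perms/extensions/settings into separate
// keys, as described above, just means extra GETs (or a single MGET).
function loadWikiConfig( Client $redis, string $dbName, callable $loadFromDb ): array {
	$key = "wikiconf:$dbName"; // hypothetical key layout

	$cached = $redis->get( $key );
	if ( $cached !== null ) {
		// Cache hit: no database connection is needed at all.
		return json_decode( $cached, true );
	}

	// Cache miss: pay the one-off cost of a database read, then repopulate
	// so later requests (on every application server) hit the cache.
	$config = $loadFromDb( $dbName );
	$redis->setex( $key, 86400, json_encode( $config ) ); // 24h TTL, made up

	return $config;
}
```

With a shared Redis, invalidation after an (un)deletion or settings change would then amount to deleting that wiki's key once, rather than waiting for each application server to refresh its own cache file.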
[22:58:12] Free space on cp201 is down to ~2gb (out of 440). There are nearly 30G of varnish logs and about 50G of haproxy logs. Might be worth tweaking log rotation if this happens again, but for now I've just forced logrotate to run for haproxy
[23:05:13] And it ran out of space first, so I just deleted the log file
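A sketch of the kind of logrotate tightening being suggested; the path, size cap and retention below are assumptions, not the current puppet-managed config on cp201:

```
/var/log/haproxy.log {
    # Rotate daily, and additionally whenever the file exceeds 5G,
    # keeping a week of compressed history (numbers are illustrative).
    daily
    maxsize 5G
    rotate 7
    compress
    delaycompress
    missingok
    notifempty
    postrotate
        # assumes haproxy logs via rsyslog, which reopens its files on SIGHUP
        systemctl kill -s HUP rsyslog.service >/dev/null 2>&1 || true
    endscript
}
```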