[02:02:53] I'm confused, since you're just trying to allow stewards to edit any page you would just return true if they're a steward, no?
[02:15:16] Yeah you could just `if ( MediaWikiServices::getInstance()->getPermissionManager()->userHasRight( $user, 'editall' ) ) { $result = false; return false; }`
[02:15:39] If I'm getting this correctly
[02:16:51] Bizarre of them to deprecate a hook named "userCan" and recommend replacing it with a hook named "getUserPermissionsErrors" though
[02:39:08] @pixldev I stashed your changes to mw-config on test151 since I'm testing other PRs.
[04:02:19] Thank you!
[04:03:45] Actually two hooks LOL
[04:06:35] I have a preference for adding the actual rights in case other things would check rights instead of doing the full userCan, but I think I'm too attached to it maybe
[04:07:03] Not to say it's bad, just that I feel I'm too inclined to prefer that over considering alternative implementations
[13:55:39] discordapp.com/invite/TvkPacEUkR
[13:55:39] @tickingcuckooclock_offical No invite links.
[13:59:35] Then you can call PermissionManager::addTemporaryUserRights I guess, which will add it just for the request
[14:00:24] MediaWiki will automatically remove the right after the function ends (although you're still in the situation where other calls won't know about the right, since those temporary permissions are scoped to the function it's running in)
[14:09:24] I'm asking, but are there plans to pull all the changes to the Citizen skin as of right now to MH?
[14:35:03] It needs a security re-review
[15:13:17] https://issue-tracker.miraheze.org/T15192
[17:59:44] _waves at @kostajh_
[20:04:24] Any thoughts on https://github.com/miraheze/mw-config/pull/6369? I'm still not sure what the best way is to handle special pages in languages other than English, but at least we should get something going so that search engines are no longer wasting time sending requests to special pages.
[20:08:23] Google will not visit the page if you mark it as disallowed, so it won't know there's noindex,nofollow in the response
[20:09:18] Maybe you don't have to worry about that; the comment just sounds misleading
[20:11:02] Did you check robots.txt on Weird Gloop wikis? They have a lot more ignored parameters
[20:15:47] No, no and no
[20:15:56] See my task from earlier
[20:16:07] [1/2] Yeah I looked at WG's and wgg's robots.txt. I was hoping to start with something simple since the majority of issues come from crawling special pages and hitting pages such as `action=edit` and `action=history`. If it made a difference in a good direction we can continue adding more.
[20:16:07] [2/2] I'm still not sure if a comprehensive robots.txt is the best solution for this problem. Maybe I'm just too attached to the nofollow idea.
[20:16:09] Adding them to robots.txt isn't really a good idea
[20:16:37] Did you check how many requests you get from Googlebot on these pages?
[20:16:55] Setting any links towards special pages to nofollow and the special namespace to nofollow,noindex should do more
[20:17:01] Too many
[20:17:16] Well the robots.txt change can fix that at least
[20:17:28] There are something like 100M mostly special-page URLs sitting in Google's discovered pages list
[20:17:55] It stops them from doing some stuff but I want to slow down their discovery in the first place
[20:18:04] I meant how many requests you actually get daily, not how many pages it discovered
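For concreteness, the kind of robots.txt rules being discussed would look roughly like the sketch below. This is only illustrative: it assumes the usual `/wiki/` article path and `/w/` script path, and localized special-page aliases (the open question above) would each need their own line.

```
# Illustrative sketch only — exact paths depend on the wiki's article/script paths
User-agent: *
# Keep crawlers out of special pages (English alias; localized aliases need extra lines)
Disallow: /wiki/Special:
# index.php with a query string covers action=edit, action=history, oldid=, etc.
Disallow: /w/index.php?
```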
[20:18:12] Those aren't mutually exclusive options. Since Cook reported good results from their robots.txt, I think this will at least solve part of the problem.
[20:18:18] The bigger problem is them following links to stuff like Special:UserLogin and discovering a load of crap
[20:18:39] I'd rather try that first, looking at what Google's doing
[20:19:17] Blocking it from robots.txt means it won't see that it shouldn't be indexed
[20:19:21] So it won't learn as well
[20:19:33] Google will still index things in robots.txt
[20:19:39] rather annoyingly
[20:20:01] I would prefer it stopped following links to and from the special namespace and started to learn it wasn't to be indexed
[20:20:03] Sure, but I feel like those undiscovered pages don't really make anything worse
[20:20:28] It's discovering a lot by doing dumb things from links from other pages
[20:20:46] Stopping it following them in the first place would probably help a lot
[20:20:48] I think
[20:21:10] I spent some time fiddling around in GSC earlier and it's doing some dumb as fuck stuff
[20:21:55] [1/3] From https://developers.google.com/search/docs/crawling-indexing/robots/intro it suggests that
[20:21:55] [2/3] > A page that's disallowed in robots.txt can still be indexed if linked to from other sites.
[20:21:56] [3/3] So when crawling the wiki itself (which is the same site) Google should be discouraged from visiting Special:UserLogin pages and other special pages. This will save Google some time because they won't be needlessly sending a request to a special page, and we won't need to come up with rules to block Google from requesting expensive pages.
[20:22:30] Google should be discouraged from visiting by setting nofollow
[20:22:51] There's a hook I posted on Phorge that should do that for any link to a special page
[20:22:57] I think that might help the problem
[20:23:40] Yeah, nofollow accomplishes similar things to robots.txt. The former is a lot harder to accomplish though, since we need to touch stuff like edit/history buttons as well as all sorts of special pages.
[20:24:28] that's why I want to see if the hook in https://issue-tracker.miraheze.org/T15235 will apply to things like the personal tools bar
[20:24:31] and sidebar
[20:24:43] if it does, that will solve a lot of it
[20:25:13] nofollow gives a very different signal to blocking it in robots.txt, from my understanding
[20:25:53] If you disallow Google from visiting these pages in robots.txt, you get a sure way for Google to not send dumbfuck requests, even though it will mark these pages as discovered in GSC
[20:26:09] I'm still not sure why we can't have both. robots.txt shouldn't negatively affect performance since it's just an extra layer of hints which other folks are already using.
[20:26:17] You can go the nofollow route but I feel like that hook in particular might not cover all of your links
[20:26:29] Hmm, that might be a good reason to just use `nofollow` then.
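The hook itself isn't quoted in the log, but a handler along these lines is one way it could look. A minimal sketch only, assuming `HtmlPageLinkRendererEnd` is the hook in question (not necessarily the exact patch from T15235); it only affects links rendered through LinkRenderer, which is the concern about skin templates raised next.

```php
// Sketch: mark internal links to special pages as rel="nofollow".
// Registered LocalSettings / mw-config style; assumes the HtmlPageLinkRendererEnd hook.
$wgHooks['HtmlPageLinkRendererEnd'][] = static function (
	$linkRenderer, $target, $isKnown, &$text, &$attribs, &$ret
) {
	if ( $target->getNamespace() === NS_SPECIAL ) {
		// Append nofollow to any existing rel value so crawlers stop chasing these links
		$attribs['rel'] = trim( ( $attribs['rel'] ?? '' ) . ' nofollow' );
	}
	return true;
};
```

Links emitted directly by skin templates (personal tools, sidebar) may bypass LinkRenderer entirely, which is exactly the untested part.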
[20:26:29] I haven't tested that yet
[20:26:34] Especially those in skin templates; I'm not sure those fire any hook at all
[20:26:47] because they mean very different things to Google
[20:27:12] robots.txt just tells it not to actually visit the page when deciding whether to index it
[20:27:45] I think the question here is what you worry about more: a lot of pages appearing in GSC, or Googlebot sending many requests to Miraheze only to discover the page should not be indexed
[20:27:51] nofollow means don't follow this link to work out if there's anything here, or use the link between these pages within the PageRank algorithm
[20:28:17] noindex on the namespace will then tell it to take it out of the index in case it already has
[20:28:47] Googlebot's scraping levels can be controlled once we get it looking at the right things
[20:29:12] Sadly at the moment it's not discovering very much useful content and it's wasting its time following links to useless content
[20:29:52] we need to start to hint to it to stop prioritising dumb links and fix its sitemaps so it actually discovers useful stuff
[20:30:17] about 80% of what we've submitted to Google hasn't even been crawled yet
[20:30:58] That's just standard Google behavior
[20:30:58] (8M of 12M submitted pages haven't been crawled)
[20:31:38] but it's discovered 100M pages it can't index
[20:31:54] Discovered, but probably hasn't visited yet
[20:32:22] only 30K of them aren't indexed
[20:32:22] Because if it did try to visit them I'm pretty sure they would have been removed from discovered pages already
[20:32:36] discovered doesn't mean indexed
[20:32:47] it doesn't remove from the discovered list
[20:34:16] you can tell from the errors it's attempted to crawl most of them
[20:34:40] I dunno, I guess you can go the nofollow links route, but I feel like that's way more complexity for marginal difference compared to robots.txt
[20:35:01] which means it's wasting time thinking it needs to crawl pages (because we haven't set nofollow on links not to follow) rather than bothering to use our submitted list
[20:35:12] Either way I don't think it'll make a dent in the SEO space
[20:35:15] it has a different impact on the PageRank side
[20:37:51] Oh btw you should enable the sitemaps API
[20:38:28] @rhinosf1 would you like a sidequest of bashing ancient herald entries in
[20:39:09] go on
[20:39:44] we already generate sitemaps for Google and submit them via an index (although I'm not convinced they are reading it properly)
[20:39:48] [1/2] users such as joritochip and amandacath have not been around for years and likely aren't going to be
[20:39:49] [2/2] why are we still keeping their herald entries? I love email waste
[20:40:27] Yeah, but you can get rid of the job that generates them if you enable the sitemaps API and just submit /w/rest.php/site/v1/sitemap/0
[20:41:58] After a while Google starts reading your sitemaps by itself without having to submit them anyway
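As a side note, if that REST sitemap endpoint is enabled, it can also be advertised straight from robots.txt rather than submitted by hand; the domain below is illustrative.

```
# Illustrative — most crawlers pick up Sitemap: lines from robots.txt on their own
Sitemap: https://example.miraheze.org/w/rest.php/site/v1/sitemap/0
```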
[20:42:39] apparently I need to use shell for that
[20:43:14] disable rule does not work at https://issue-tracker.miraheze.org/H80 for example? wtf
[20:43:15] okay that could be a different way of doing things
[20:43:45] nope, I guess you haven't learnt about Phab and its permission model and how admins work yet
[20:43:59] does the word admin mean anything these days jfc
[20:45:08] to Phorge, basically not
[20:45:13] Admins are not all-powerful
[21:03:20] I don't think email waste is even the biggest problem with old herald rules; every time you perform certain Maniphest actions on Phorge, it will check all of the rules, so each rule makes everything slower
[21:03:44] For the WMF this is a problem, probably less so for us, but it's still good to have fewer rules in general
[21:04:17] reading this like we get DOSed by two herald rules is funny af
[21:17:48] What lmao
[22:51:08] [1/2] 59bc4aa894a02c709b55edfd
[22:51:08] [2/2] seems like something got cached during db161's little hiccup and a user now only sees a broken page. it only happens for a single page and they've purged their browser cache, I don't know what else to do about it
[22:51:54] varnish needs a kicking but I still haven't got the ritual down for that
[22:52:01] much easier if you happen to be infra
[22:58:53] if you tell me what to purge exactly I can run it via salt
[23:05:28] this was https://battlecats.miraheze.org/w/load.php?lang=en&modules=ext.gadget.core%2Cranger&only=styles&skin=citizen
[23:09:45] I used `&&` for an and condition in the varnishadm command and salt interpreted it as two separate bash commands, so now it purged all of battlecats.miraheze.org lol
[23:10:23] problem is extra solved ig
[23:10:38] `/bin/bash: line 1: req.url: command not found`
[23:11:01] well that's one way to do it lol
[23:14:54] [1/3] I think I understand what you mean now. robots.txt exempts the page from being visited but it can still be indexed: because Google cannot visit the page itself to see that the page is set to `noindex,nofollow`, it may decide that the page should still be indexed despite being in robots.txt.
[23:14:54] [2/3] However, I don't think that poses a real problem. If I search for `site:runescape.wiki intitle:RecentChanges` or `site:wiki.gg intitle:RecentChanges` (or `intitle:oldid` and many other variants) I don't see `Special:RecentChanges` being indexed, even though RuneScape Wiki/wiki.gg only use robots.txt but not nofollow.
[23:14:54] [3/3] This issue is more theoretical I think. Google doesn't even index content pages for various reasons (e.g. too short). It would require a page to be immensely popular for Google to consider indexing it without ever knowing the page content.
[23:49:45] That's what I said, yeah
[23:50:33] It isn't a real issue for public wikis because there's nobody linking to these pages externally while using followed links
[23:50:55] And when there is, it's only a couple of pages
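On the earlier `&&` mishap: an unquoted `&&` is a command separator to bash, which is why `req.url` ended up being run as its own command and the ban matched far more than intended. A rough sketch of the quoting that keeps the whole ban expression together; the host and path are illustrative, and the same quoting has to survive any wrapper (such as salt's `cmd.run`) that re-invokes a shell.

```
# Quote the '&&' (and the regex) so the shell hands them to varnishadm intact
varnishadm ban req.http.host == battlecats.miraheze.org '&&' req.url '~' '^/w/load\.php'
```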