[05:34:00] <.labster> Incidentally I just security reviewed Extension:Cloudflare and Extension:CloudflarePurge and we could use either of them IMO. The first one looks a little better assuming all of its bugs are resolved (which I think they are? I just asked on GitHub.)
[05:41:26] <.labster, replying to originalauthority> This I think? https://github.com/octfx/mediawiki-extensions-MultiPurge/blob/51b3e23b4f33f5f7d51a3208dc9c06db1014c996/includes/Hooks/PurgeHooks.php#L91
[05:42:29] <.labster> I think MultiPurge is likely better now that I read it.
[05:45:15] i love how me making https://github.com/miraheze/CreateWiki/pull/626#issuecomment-2453864854 is technically not vandalism
[05:54:54] <.labster> I technically not vandalized in reply.
[05:59:29] it'll only reply in "approve", "reject" or "revise"
[06:01:05] https://cdn.discordapp.com/attachments/1006789349498699827/1302875248462528632/image.png?ex=6729b4a1&is=67286321&hm=d93a76e0eed0146d62d11586720f2a7c7303261d5ae9eae72646a430f4c3ec0a&
[06:01:10] bad bot
[06:04:00] <.labster> Now ask it to output in CSV format
[06:04:44] MiraGPT
[06:04:48] the future is now
[06:05:37] <.labster> Can I just say that this evening has been a typical exploration of the MediaWiki ecosystem, where worse extensions are easier to find, a bunch of extensions do the same thing, and no one cooperates. And extensions used in production for 5 years are marked as "beta"
[06:06:41] wasn’t VE marked as beta for like 10 years
[06:06:48] <.labster> should still be IMO
[06:07:03] That’s the wiki spirit!
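[Editor's note: the three-label bot above ("approve", "reject" or "revise") implies some output validation on the extension's side. The chat never shows that code, so this is only a minimal Python sketch under assumptions — `parse_decision` and the "needs-human" fallback label are hypothetical names, not anything from the actual CreateWiki PR.]

```python
# Hypothetical sketch: normalize a free-form LLM reply into exactly one of
# the three decisions the bot is supposed to emit, and escalate anything
# unexpected to a human instead of guessing.

ALLOWED = {"approve", "reject", "revise"}

def parse_decision(raw_reply: str) -> str:
    """Map a raw model reply onto 'approve', 'reject' or 'revise'."""
    text = raw_reply.strip().lower().rstrip(".!")
    if text in ALLOWED:
        return text
    # Tolerate replies like "Decision: approve" but nothing fuzzier.
    for label in ALLOWED:
        if text.endswith(label):
            return label
    return "needs-human"  # anything unclassifiable goes to a person

# parse_decision("Approve")          -> "approve"
# parse_decision("Decision: revise") -> "revise"
# parse_decision("maybe?")           -> "needs-human"
```

The point of the fallback is the safety property discussed later in the log: certain requests should always end up requiring a human decision rather than a forced guess.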
[06:19:31] agentisai: ngl i kinda wanna play around w/ it
[06:36:48] other things to consider: https://www.reuters.com/article/us-amazon-com-jobs-automation-insight/amazon-scraps-secret-ai-recruiting-tool-that-showed-bias-against-women-idUSKCN1MK08G/ and https://www.dtu.dk/english/newsarchive/2024/03/researchers-surprised-by-gender-stereotypes-in-chatgpt
[06:37:16] i also read a blog post about output being biased for gender even though gender wasn't directly fed as user input, but i couldn't find it
[06:57:18] it'll be fun
[08:01:47] There's a big comment above that saying it only purges site styles
[13:23:20] Why are we using OpenAI
[13:24:45] wha
[13:25:08] there's a draft pull request by agent to use ChatGPT to auto-{approve,request more details,decline} wiki requests
[13:25:19] i don't recall any talk about it on phorge
[13:25:45] Sounds like hell
[13:25:56] https://github.com/miraheze/CreateWiki/pull/626
[13:26:04] i'd love to probe it
[13:26:32] i can't even get started tho, because agent hasn't provided the system prompt
[13:26:51] (fyi, my knowledge of the OpenAI APIs is dated, like a year ago)
[13:28:29] Hasn't been provided to the rest of tech either
[14:55:36] That is not good to hear..
[15:01:02] @reception123 @agentisai are you at all going to consider the cost of this, both what we pay and the environmental concerns of contributing to a practice that is very not good for sustainability
[15:14:10] Cost yes, and it’s been discussed with the secretary and DSRE. Environmental concerns seem way too far-fetched for us to begin to analyze when the impact of such queries seems negligible.
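[Editor's note: for context on the purge hook linked earlier — MultiPurge-style extensions ultimately call Cloudflare's v4 `purge_cache` endpoint. A minimal sketch of building (not sending) that request; the zone ID, token, and URL here are placeholders, and this is not the extension's actual code.]

```python
# Hypothetical sketch of the Cloudflare purge-by-URL call that purge
# extensions wrap. Builds the request object only; nothing is sent.
import json
import urllib.request

API = "https://api.cloudflare.com/client/v4/zones/{zone}/purge_cache"

def build_purge_request(zone_id: str, token: str, urls: list) -> urllib.request.Request:
    """Build a POST purging the given URLs from the given zone's cache."""
    body = json.dumps({"files": urls}).encode()
    return urllib.request.Request(
        API.format(zone=zone_id),
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_purge_request(
    "ZONE_ID", "API_TOKEN",
    ["https://example.wiki/load.php?modules=site.styles"],
)
# Sending would be: urllib.request.urlopen(req)
```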
[15:15:34] Tech hasn’t been briefed because it’s just a pipe dream which might not even advance
[15:16:05] Well, by tech, I mean everyone other than the D[D]SRE
[15:21:21] because it’s the best model I can think of unless you’d like to code support for a new model
[15:21:38] I think it's been proven that our in-house model hasn't been great
[15:21:48] OpenAI has been providing much better results from what I've seen
[15:21:59] We’ve already tested it and wow, it’s real good
[15:22:10] It’s been trained with all our policies and 3 years of approved wiki requests
[15:25:03] not a single mess up?
[15:25:09] no
[15:25:53] but regardless, it’ll be fine-tuned much more
[15:26:28] and we’ll probably modify it so certain requests always require human decision
[15:32:18] Funny you should mention these things; cost, privacy and ethical concerns were the first three I raised.
[15:35:11] On the cost front at least, I'm confident in the controls discussed, as there are several approaches to setting a hard limit + protecting us against arbitrary price increases
[15:35:37] funny that we are blocking OpenAI from scraping as well
[15:36:27] Yeah, it is a bit ironic, though they were scraping with DDoS levels of traffic as I recall
[15:36:39] Bad scraper is bad
[15:36:59] yeah, I wouldn't want it to scrape wikis regardless of DDoS effect tho
[15:42:01] By environmental I meant more ethical concerns
[15:42:03] So much for free software
[15:42:14] to help train it, I fed it 2TBs of revisions
[15:42:42] we tried php-ml but it was ultimately a rather slow and bad experience
[15:43:13] unless someone wants to pick that up again, but the AI job always used to use too many resources
[15:43:14] And that
[15:43:31] could you elaborate?
[15:57:19] It's really concerning that this wasn't at least discussed with A) the community or B) the rest of the tech team, at least informally. And to call it "a late night pipe dream" seems like an excuse given it's already been discussed with members of the board AND it's been fed "2TBs of revisions"
[15:57:33] It hasn't been approved for use
[15:57:34] Big L
[15:57:52] (I never said it had)
[15:57:55] They are doing an initial proof of concept; a discussion once feasibility has been determined is inbound
[15:59:10] I would have definitely preferred someone to have at least gone to Mattermost and said hey guys we got drunk last night and wanna try to feed everything to Sam Altman, anyone wanna help
[16:01:21] That's an incorrect characterization of what has happened with limited information; let's move to these precise channels you mention.
[16:01:54] Noted
[16:02:23] (With what information I am privy to)
[16:08:40] No worries, at this point not all of my concerns are fully assuaged either, which is why discussion will be important.
[16:11:09] 2TBs of revisions is obviously a joke :)
[16:12:10] so it truly was a pipe dream because really, it hasn’t been fed anything other than just our policies
[16:19:26] Before discussing further it makes sense to actually test it, to present the community with an accurate idea of how it would work
[16:20:05] It wouldn't make sense to have a vote on a vague idea where the community doesn't have any evidence/examples of how the AI works or functions
[16:58:41] well it's ChatGPT-based so we already know how it works
[16:59:06] we should never forget that AI is ultimately a bullshit generator machine
[17:01:21] also I oppose every attempt to integrate proprietary SaaS platforms into our GPL-licensed extensions
[17:02:47] would you like to help in fixing our php-ml implementation?
[17:02:59] no
[17:03:05] There's a bit more to it than a bullshit generator
[17:03:18] then I don't see any other alternative
[17:03:25] the wiki request queue is too much to handle
[20:18:22] Apple says otherwise
[20:18:44] It’s Apple, you trust them to get shit right?
[20:19:52] Apple have AI
[20:19:59] They just call it Apple Intelligence
[20:20:10] I never said they didn't
[20:20:21] Which I'm yet to try and break
[20:20:34] cause it's not in the UK yet
[20:24:14] isn't it like, just a smart word generator that can't count?
[20:38:31] I don't even know today
[23:19:47] There are some cool open source LLMs of various sizes/scopes on Ollama if there is a general dislike towards using specific models from OpenAI.
[23:22:51] I did propose using Ollama but self-hosting isn’t something we have resources for; a cheap, accurate and hosted OSS model may be good
[23:23:02] Yeah, self-hosting isn’t an option
[23:23:27] We’re already running out of resources despite our current infra having 2-3x the resources of our old servers
[23:23:46] so an LLM hosted off-premises might work if such exists
[23:23:53] otherwise, OpenAI is the only way I see
[23:27:21] I’ll do some looking at Ollama
[23:29:41] Understandable. Just thought I'd mention it in case something like that wasn't considered yet. I've been messing around with Ollama and WebUI the past couple of days so it was fresh on my mind.
[23:29:59] Do you have any idea about whether Ollama would meet our needs, given your own experiences?
[23:30:06] Did you self-host?
[23:37:03] Yeah, I was self-hosting. I was hoping to integrate the tinyllama model with a Discord bot on my Raspberry Pi, but it was too slow for what I wanted it to do. Just messing around; Llama 3 proved too much for my MacBook Pro, but llama3.2 (a smaller model) worked okay.
[23:39:27] I’m sure it would get the job done just the same as an OpenAI model would. There are lots of different models on Ollama to choose from. Smaller ones would use fewer resources but output lower-quality results.
[23:40:02] i'd be curious to see some benches on our current hardware or local, just for why not
[23:42:20] wonder who does Ollama hosting and how much
[23:42:24] gonna look into it
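[Editor's note: for the Ollama experiments mentioned above — a local Ollama daemon exposes a small HTTP API on port 11434, so benchmarking on current hardware mostly means hitting `POST /api/generate`. A minimal Python sketch of building that request; the model name (`llama3.2`, taken from the chat) and prompt are just examples, and nothing is sent unless a daemon is actually running.]

```python
# Hypothetical sketch: build a non-streaming generate request for a local
# Ollama server. Builds the request object only; sending requires a
# running daemon ("ollama serve").
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_ollama_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a POST /api/generate request with streaming disabled."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    return urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_ollama_request("llama3.2", "Summarize this wiki request in one line.")
# With a daemon running, the reply text would be:
#   json.loads(urllib.request.urlopen(req).read())["response"]
```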