[00:17:18] !log bd808@tools-bastion-12 tools.gitlab-webhooks Built new container image from 0e8b79e4 (T363114)
[00:17:22] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.gitlab-webhooks/SAL
[00:17:59] !log bd808@tools-bastion-12 tools.gitlab-webhooks Restarted to pick up proposed fixes for T363114
[00:18:01] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.gitlab-webhooks/SAL
[00:36:45] Wurgl: The new bastions are deliberately "thin" now that the grid engine is disabled. We are hoping not to have to make them "thick" again, but as stated in https://lists.wikimedia.org/hyperkitty/list/cloud-announce@lists.wikimedia.org/thread/UAMLGQ42CVHLRZ5W2CZBJDJFRNSBT4DC/ if you have a workflow that is not possible on the new bastions, please open a task explaining what you need to do and which packages you think would allow that.
[00:37:59] It is possible to script `webservice php8.2 shell -- php ...` directly from a bastion, which might help you out.
[07:13:33] !log cloudinfra taavi@cloudinfra-cloudvps-puppetserver-1:~$ sudo systemctl restart puppetserver
[07:13:35] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Cloudinfra/SAL
[10:31:57] I've migrated from gridengine to k8s and have been trying to run scheduled jobs. They don't seem to run, and when I show the job it says `Hints:        | No pods were created for this job.`. Any idea why it's not creating a pod?
[10:33:15] carlinmack76: can you share the name of the tool?
[10:33:47] tool-addletterboxdfilmidbot
[10:34:44] it should have run once yesterday; then I replaced the job with one with file logging (it did not create a file today)
[10:34:49] carlinmack76: it seems it's still waiting for the scheduled time?
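The `webservice php8.2 shell -- php ...` hint above can be wrapped in a tiny helper so a PHP script runs inside the tool's webservice container rather than on the thin bastion itself. A minimal sketch, assuming the Toolforge `webservice` CLI is on the bastion's PATH; the `run_php` name is illustrative, not an existing command:

```shell
#!/bin/sh
# Hypothetical helper for the "thin" bastions: the bastion has no PHP
# interpreter, so hand the script to the tool's php8.2 container.
run_php() {
    script="$1"; shift
    # Everything after `--` is executed inside the container.
    webservice php8.2 shell -- php "$script" "$@"
}
```

Called as e.g. `run_php maintenance.php --dry-run` from a bastion prompt under the tool account.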
[10:35:13] ah okay I will check again in 30 mins
[10:35:34] carlinmack76: you can try running the job by hand, try `toolforge jobs restart add-letterbox-ids-job`
[10:35:49] I just assumed it had already failed, given it was the same Hint as last time
[10:35:53] !log bsadowski1@tools-bastion-13 tools.stewardbots Restarted StewardBot/SULWatcher because of a connection loss
[10:35:55] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stewardbots/SAL
[10:35:56] ohai
[10:36:12] I'd rather wait for the scheduled time, will update if there's a problem with the run
[10:37:35] carlinmack76: it has `schedule: 0 11 * * *`, meaning the job will run once a day, at 11:00
[10:45:35] !log lucaswerkmeister@tools-bastion-13 tools.bridgebot Double IRC messages to other bridges
[10:45:37] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.bridgebot/SAL
[11:10:34] arturo: it ran and output the following error
[11:10:34] python: can't open file '/workspace/add-letterbox-ids.py': [Errno 2] No such file or directory
[11:11:00] you can view the source here https://gitlab.com/carlinmack/addletterboxdfilmid
[11:13:37] *the source is available here https://gitlab.com/carlinmack/addletterboxdfilmid
[11:48:17] carlinmack76: there's a typo in the run.sh script; the file name is not `add-letterbox-ids.py` but `add-letterboxd-ids.py` (`letterbox` + `d`)
[11:49:51] good spot! thanks
[11:50:39] it seems though that it will fail importing `rich`
[11:51:15] it's not pulled in by `tqdm` directly
[11:52:34] thanks again, when I rebuild the image do I need to submit the job again?
[11:52:47] no!
[11:52:56] (I don't think so)
[11:53:14] great, btw is it a pain to have a persistent file between runs?
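The job being debugged above (a daily cron at 11:00, restartable by hand) could be defined with `toolforge jobs` commands roughly like the following sketch. The job name, schedule, and `--mount=all` come from the chat; the image name and `./run.sh` command are guesses for illustration only:

```shell
#!/bin/sh
# Sketch: create a Toolforge Kubernetes job that runs once a day at
# 11:00 UTC. Image name below is illustrative, not taken from the log.
schedule_job() {
    toolforge jobs run add-letterbox-ids-job \
        --command ./run.sh \
        --image tool-addletterboxdfilmidbot/tool-addletterboxdfilmidbot:latest \
        --schedule "0 11 * * *" \
        --mount=all
}

# A scheduled job only creates a pod at its cron time ("No pods were
# created for this job" until then); restart forces an immediate run.
trigger_now() {
    toolforge jobs restart add-letterbox-ids-job
}
```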
[11:53:20] no need, no; it will pull the new image on the next run
[11:54:01] there's a file for all the invalid QIDs so that they aren't called every day
[11:54:56] you can have NFS access if you run the jobs with `--mount=all`; then from within the code you can access it by using the environment variable `$TOOL_DATA_DIR` (`os.environ["TOOL_DATA_DIR"]`)
[11:55:18] ah okay, I'm already doing mount all, so maybe it will just work?
[11:55:33] depends; if you access the file with a relative path it will not work
[11:55:40] (as the working directory is not the home of the tool)
[11:56:45] you are already using the envvar I see, so it should work, yes :)
[11:56:46] https://gitlab.com/carlinmack/addletterboxdfilmid/-/blob/main/add-letterboxd-ids.py?ref_type=heads#L13
[11:57:00] all the accesses are with — yeah I think it will work :)
[11:57:24] thanks for all the help!
[16:33:58] !log bd808@tools-bastion-12 tools.gitlab-webhooks Built new container image from 78266e21
[16:34:00] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.gitlab-webhooks/SAL
[16:34:35] !log bd808@tools-bastion-12 tools.gitlab-webhooks Restart to pick up latest container image
[16:34:36] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.gitlab-webhooks/SAL
[17:38:31] !log bsadowski1@tools-bastion-13 tools.stewardbots Restarted StewardBot/SULWatcher because of a connection loss
[17:38:34] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stewardbots/SAL
[17:38:41] :D
[18:06:50] Bsadowski1: is the problem that the bot keeps falling off of IRC?
[18:08:14] If so, I wonder if adding a bouncer would help? I built https://gitlab.wikimedia.org/toolforge-repos/wikibugs2-znc in a way that I think others could reuse if they wanted to put a ZNC bouncer into their tool.
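The persistence pattern described above (a small cache of invalid QIDs kept on NFS across runs, reached through `TOOL_DATA_DIR` rather than a relative path) can be sketched in Python. Function and file names here are illustrative, not the tool's actual code:

```python
import json
import os


def invalid_qids_path():
    """Persistent location for the invalid-QID cache.

    TOOL_DATA_DIR points at the tool's NFS directory when the job runs
    with --mount=all; a relative path would resolve inside the
    container's working directory and not survive between runs.
    """
    return os.path.join(os.environ["TOOL_DATA_DIR"], "invalid_qids.json")


def load_invalid_qids():
    """Return the cached set, or an empty set on the very first run."""
    try:
        with open(invalid_qids_path()) as f:
            return set(json.load(f))
    except FileNotFoundError:
        return set()


def save_invalid_qids(qids):
    """Write the set back so tomorrow's run can skip these QIDs."""
    with open(invalid_qids_path(), "w") as f:
        json.dump(sorted(qids), f)
```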
[18:20:28] iirc the problem is with the connection to eventstreams
[20:30:38] !log anticomposite@tools-bastion-13 tools.stewardbots ./stewardbots/StewardBot/manage.sh restart # RC reader not reading RC
[20:30:41] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stewardbots/SAL
[20:30:49] !log anticomposite@tools-bastion-13 tools.stewardbots SULWatcher/manage.sh restart # SULWatchers disconnected
[20:30:50] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stewardbots/SAL
[21:24:02] bd808: it's falling off IRC and eventstreams. I never added PingServer to the SULWatchers but probably will soon, which should fix the IRC side most of the time at least
[21:25:58] a bouncer might help but wouldn't fix the eventstreams problems, which seem to require restarting the entire container much of the time
[21:26:54] AntiComposite: is the eventstream thingy hitting a production wiki endpoint?
[21:27:43] https://stream.wikimedia.org/v2/stream/recentchange
[21:28:36] https://phabricator.wikimedia.org/T329327 is still the prevailing failure mode, but it occasionally fails in other ways
[21:29:56] this is very likely a rate limit on the wiki endpoint
[21:30:14] either in the endpoint itself, or in the CDN, or similar
[21:32:16] yes, and that would be due to many Toolforge tools using it
[21:34:42] I had similar issues using streams from Toolforge in the past and gave up on using it there. Streams seem to work fine in other VPS projects.
[22:27:15] hmmm... what does a 429 from an SSE gateway even mean?
[22:27:49] "too many concurrent connections" I guess?
[22:30:05] I've never looked at the code for eventstreams. I suppose that would be a good place to start trying to figure the 429 problem out.
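For context on what the endpoint under discussion actually serves: EventStreams speaks the text/event-stream (Server-Sent Events) wire format, so a dropped or 429'd client reconnects and resends the last `id` it saw as a `Last-Event-ID` header. A minimal, illustrative parser sketch of that wire format (real clients would use an SSE library rather than this):

```python
def parse_sse(stream_text):
    """Parse a text/event-stream payload into a list of event dicts.

    Minimal sketch: events are blocks of "field: value" lines separated
    by blank lines; multiple "data" lines in one event join with
    newlines; lines starting with ":" are keep-alive comments.
    """
    events, fields = [], {}
    for line in stream_text.splitlines():
        if not line:
            # A blank line terminates one event.
            if "data" in fields:
                events.append(fields)
            fields = {}
        elif line.startswith(":"):
            continue  # comment / keep-alive, ignore
        elif ":" in line:
            name, _, value = line.partition(":")
            value = value.lstrip(" ")
            if name == "data":
                prev = fields.get("data")
                fields["data"] = value if prev is None else prev + "\n" + value
            else:
                fields[name] = value  # e.g. "event", "id"
    return events
```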
[22:31:10] It should be reasonably possible to stand up a forwarding proxy for eventstreams too, which would offload some clients from the upstream
[22:31:45] * bd808 has made two SSE servers in the last few weeks and feels n00b confidence
[22:32:36] https://phabricator.wikimedia.org/T308931#7950582
[22:34:22] tbh I think the better fix is "production services should not be applying naive IP-based limits to Toolforge"
[22:34:40] if that setup sees the Cloud VPS SNAT and not the internal IPs, then :boom: there's the problem
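The forwarding-proxy idea mentioned above boils down to a fan-out core: one upstream SSE connection whose events are republished to any number of local subscribers, so the rate-limited upstream sees a single client instead of one per tool. A purely illustrative sketch of that core (all names are hypothetical, not an existing service):

```python
import queue
import threading


class StreamFanout:
    """Fan-out core of a hypothetical forwarding SSE proxy.

    One upstream reader thread calls publish() for each event it
    receives; each local client gets its own queue via subscribe().
    The single upstream connection is then the one place to handle
    429s and reconnects.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._subscribers = []

    def subscribe(self):
        q = queue.Queue()
        with self._lock:
            self._subscribers.append(q)
        return q

    def unsubscribe(self, q):
        with self._lock:
            self._subscribers.remove(q)

    def publish(self, event):
        # Deliver the event to every currently subscribed client.
        with self._lock:
            for q in self._subscribers:
                q.put(event)
```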