[00:17:18] !log bd808@tools-bastion-12 tools.gitlab-webhooks Built new container image from 0e8b79e4 (T363114)
[00:17:22] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.gitlab-webhooks/SAL
[00:17:59] !log bd808@tools-bastion-12 tools.gitlab-webhooks Restarted to pick up proposed fixes for T363114
[00:18:01] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.gitlab-webhooks/SAL
[00:36:45] Wurgl: The new bastions are deliberately "thin" now that the grid engine is disabled. We are hoping not to have to make them "thick" again, but as stated in https://lists.wikimedia.org/hyperkitty/list/cloud-announce@lists.wikimedia.org/thread/UAMLGQ42CVHLRZ5W2CZBJDJFRNSBT4DC/ if you have a workflow that is not possible on the new bastions, please open a task explaining what you need to do and which packages you think would allow that.
[00:37:59] It is possible to script `webservice php8.2 shell -- php ...` directly from a bastion, which might help you out.
[07:13:33] !log cloudinfra taavi@cloudinfra-cloudvps-puppetserver-1:~$ sudo systemctl restart puppetserver
[07:13:35] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Cloudinfra/SAL
[10:31:57] I've migrated from gridengine to k8s and have been trying to run scheduled jobs. They don't seem to run, and when I show the job it says `Hints:        | No pods were created for this job.`. Any idea why it's not creating a pod?
[10:33:15] carlinmack76: can you share the name of the tool?
[10:33:47] tool-addletterboxdfilmidbot
[10:34:44] it should have run once yesterday; then I replaced the job with one with file logging (it did not create a file today)
[10:34:49] carlinmack76: it seems it's still waiting for the scheduled time?
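The `webservice php8.2 shell -- php ...` hint above can be wrapped in a tiny helper so a PHP script runs inside the tool's webservice container rather than on the thin bastion itself. A minimal sketch, assuming the Toolforge `webservice` CLI is on the bastion's PATH; the `run_php` name is illustrative, not an existing command:

```shell
#!/bin/sh
# Hypothetical helper for the "thin" bastions: the bastion has no PHP
# interpreter, so hand the script to the tool's php8.2 container.
run_php() {
    script="$1"; shift
    # Everything after `--` is executed inside the container.
    webservice php8.2 shell -- php "$script" "$@"
}
```

Called as e.g. `run_php maintenance.php --dry-run` from a bastion prompt under the tool account.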
[10:35:13] ah okay I will check again in 30 mins
[10:35:34] carlinmack76: you can try running the job by hand, try `toolforge jobs restart add-letterbox-ids-job`
[10:35:49] I just assumed it had already failed, given it was the same Hint as last time
[10:35:53] !log bsadowski1@tools-bastion-13 tools.stewardbots Restarted StewardBot/SULWatcher because of a connection loss
[10:35:55] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stewardbots/SAL
[10:35:56] ohai
[10:36:12] I'd rather wait for the scheduled time, will update if there's a problem with the run
[10:37:35] carlinmack76: it has `schedule: 0 11 * * *`, meaning the job will run once a day, at 11:00
[10:45:35] !log lucaswerkmeister@tools-bastion-13 tools.bridgebot Double IRC messages to other bridges
[10:45:37] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.bridgebot/SAL
[11:10:34] arturo: it ran and output the following error
[11:10:34] python: can't open file '/workspace/add-letterbox-ids.py': [Errno 2] No such file or directory
[11:11:00] you can view the source here https://gitlab.com/carlinmack/addletterboxdfilmid
[11:13:37] *the source is available here https://gitlab.com/carlinmack/addletterboxdfilmid
[11:48:17] carlinmack76: there's a typo in the run.sh script; the file name is not `add-letterbox-ids.py` but `add-letterboxd-ids.py` (`letterbox` + `d`)
[11:49:51] good spot! thanks
[11:50:39] it seems though that it will fail importing `rich`
[11:51:15] it's not pulled in by `tqdm` directly
[11:52:34] thanks again, when I rebuild the image do I need to submit the job again?
[11:52:47] no!
[11:52:56] (I don't think so)
[11:53:14] great, btw is it a pain to have a persistent file between runs?
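The job being debugged above (a daily cron at 11:00, restartable by hand) could be defined with `toolforge jobs` commands roughly like the following sketch. The job name, schedule, and `--mount=all` come from the chat; the image name and `./run.sh` command are guesses for illustration only:

```shell
#!/bin/sh
# Sketch: create a Toolforge Kubernetes job that runs once a day at
# 11:00 UTC. Image name below is illustrative, not taken from the log.
schedule_job() {
    toolforge jobs run add-letterbox-ids-job \
        --command ./run.sh \
        --image tool-addletterboxdfilmidbot/tool-addletterboxdfilmidbot:latest \
        --schedule "0 11 * * *" \
        --mount=all
}

# A scheduled job only creates a pod at its cron time ("No pods were
# created for this job" until then); restart forces an immediate run.
trigger_now() {
    toolforge jobs restart add-letterbox-ids-job
}
```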
[11:53:20] no need, no; it will pull the new image on the next run
[11:54:01] there's a file for all the invalid QIDs so that they aren't called every day
[11:54:56] you can have NFS access if you run the jobs with `--mount=all`; then from within the code you can access it by using the environment variable `$TOOL_DATA_DIR` (`os.environ["TOOL_DATA_DIR"]`)
[11:55:18] ah okay, I'm already doing mount all, so maybe it will just work?
[11:55:33] depends; if you access the file with a relative path it will not work
[11:55:40] (as the working directory is not the home of the tool)
[11:56:45] you are already using the envvar I see, so it should work, yes :)
[11:56:46] https://gitlab.com/carlinmack/addletterboxdfilmid/-/blob/main/add-letterboxd-ids.py?ref_type=heads#L13
[11:57:00] all the accesses are with — yeah I think it will work :)
[11:57:24] thanks for all the help!
[16:33:58] !log bd808@tools-bastion-12 tools.gitlab-webhooks Built new container image from 78266e21
[16:34:00] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.gitlab-webhooks/SAL
[16:34:35] !log bd808@tools-bastion-12 tools.gitlab-webhooks Restart to pick up latest container image
[16:34:36] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.gitlab-webhooks/SAL
[17:38:31] !log bsadowski1@tools-bastion-13 tools.stewardbots Restarted StewardBot/SULWatcher because of a connection loss
[17:38:34] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stewardbots/SAL
[17:38:41] :D
[18:06:50] Bsadowski1: is the problem that the bot keeps falling off of IRC?
[18:08:14] If so, I wonder if adding a bouncer would help? I built https://gitlab.wikimedia.org/toolforge-repos/wikibugs2-znc in a way that I think others could reuse if they wanted to put a ZNC bouncer into their tool.
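The persistence pattern described above (a small cache of invalid QIDs kept on NFS across runs, reached through `TOOL_DATA_DIR` rather than a relative path) can be sketched in Python. Function and file names here are illustrative, not the tool's actual code:

```python
import json
import os


def invalid_qids_path():
    """Persistent location for the invalid-QID cache.

    TOOL_DATA_DIR points at the tool's NFS directory when the job runs
    with --mount=all; a relative path would resolve inside the
    container's working directory and not survive between runs.
    """
    return os.path.join(os.environ["TOOL_DATA_DIR"], "invalid_qids.json")


def load_invalid_qids():
    """Return the cached set, or an empty set on the very first run."""
    try:
        with open(invalid_qids_path()) as f:
            return set(json.load(f))
    except FileNotFoundError:
        return set()


def save_invalid_qids(qids):
    """Write the set back so tomorrow's run can skip these QIDs."""
    with open(invalid_qids_path(), "w") as f:
        json.dump(sorted(qids), f)
```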
[18:20:28] iirc the problem is with the connection to eventstreams
[20:30:38] !log anticomposite@tools-bastion-13 tools.stewardbots ./stewardbots/StewardBot/manage.sh restart # RC reader not reading RC
[20:30:41] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stewardbots/SAL
[20:30:49] !log anticomposite@tools-bastion-13 tools.stewardbots SULWatcher/manage.sh restart # SULWatchers disconnected
[20:30:50] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stewardbots/SAL
[21:24:02] bd808: it's falling off IRC and eventstreams. I never added PingServer to the SULWatchers but probably will soon, which should fix the IRC side most of the time at least
[21:25:58] a bouncer might help but wouldn't fix the eventstreams problems, which seem to require restarting the entire container much of the time
[21:26:54] AntiComposite: is the eventstream thingy hitting a production wiki endpoint?
[21:27:43] https://stream.wikimedia.org/v2/stream/recentchange
[21:28:36] https://phabricator.wikimedia.org/T329327 is still the prevailing failure mode, but it occasionally fails in other ways
[21:29:56] this is very likely a rate limit on the wiki endpoint
[21:30:14] either in the endpoint itself, or in the CDN, or similar
[21:32:16] yes, and that would be due to many Toolforge tools using it
[21:34:42] I had similar issues using streams from Toolforge in the past and gave up on using it there. Streams seem to work fine in other VPS projects.
[22:27:15] hmmm... what does a 429 from an SSE gateway even mean?
[22:27:49] "too many concurrent connections" I guess?
[22:30:05] I've never looked at the code for eventstreams. I suppose that would be a good place to start trying to figure the 429 problem out.
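For context on what the endpoint under discussion actually serves: EventStreams speaks the text/event-stream (Server-Sent Events) wire format, so a dropped or 429'd client reconnects and resends the last `id` it saw as a `Last-Event-ID` header. A minimal, illustrative parser sketch of that wire format (real clients would use an SSE library rather than this):

```python
def parse_sse(stream_text):
    """Parse a text/event-stream payload into a list of event dicts.

    Minimal sketch: events are blocks of "field: value" lines separated
    by blank lines; multiple "data" lines in one event join with
    newlines; lines starting with ":" are keep-alive comments.
    """
    events, fields = [], {}
    for line in stream_text.splitlines():
        if not line:
            # A blank line terminates one event.
            if "data" in fields:
                events.append(fields)
            fields = {}
        elif line.startswith(":"):
            continue  # comment / keep-alive, ignore
        elif ":" in line:
            name, _, value = line.partition(":")
            value = value.lstrip(" ")
            if name == "data":
                prev = fields.get("data")
                fields["data"] = value if prev is None else prev + "\n" + value
            else:
                fields[name] = value  # e.g. "event", "id"
    return events
```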
[22:31:10] It should be reasonably possible to stand up a forwarding proxy for eventstreams too, which would offload some clients from the upstream
[22:31:45] * bd808 has made two SSE servers in the last few weeks and feels n00b confidence
[22:32:36] https://phabricator.wikimedia.org/T308931#7950582
[22:34:22] tbh I think the better fix is "production services should not be applying naive IP-based limits to Toolforge"
[22:34:40] if that setup sees the Cloud VPS SNAT and not the internal IPs, then :boom: there's the problem
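The forwarding-proxy idea mentioned above boils down to a fan-out core: one upstream SSE connection whose events are republished to any number of local subscribers, so the rate-limited upstream sees a single client instead of one per tool. A purely illustrative sketch of that core (all names are hypothetical, not an existing service):

```python
import queue
import threading


class StreamFanout:
    """Fan-out core of a hypothetical forwarding SSE proxy.

    One upstream reader thread calls publish() for each event it
    receives; each local client gets its own queue via subscribe().
    The single upstream connection is then the one place to handle
    429s and reconnects.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._subscribers = []

    def subscribe(self):
        q = queue.Queue()
        with self._lock:
            self._subscribers.append(q)
        return q

    def unsubscribe(self, q):
        with self._lock:
            self._subscribers.remove(q)

    def publish(self, event):
        # Deliver the event to every currently subscribed client.
        with self._lock:
            for q in self._subscribers:
                q.put(event)
```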