[07:18:15] <Alien333> I'm probably being stupid again, but I can't ssh to toolforge since yesterday. All I get is "Connection closed by 185.15.56.62 port 22"
[07:19:27] <Alien333> can someone tell me what exactly that means?
[07:31:33] <Alien333> to be precise, the complete log ends with
[07:31:34] <Alien333> debug1: Server accepts key: /home/a/.ssh/id_ed25519 ED25519 SHA256:[...] agent
[07:31:34] <Alien333> debug3: sign_and_send_pubkey: using publickey-hostbound-v00@openssh.com with ED25519 SHA256:[...]
[07:31:35] <Alien333> debug3: sign_and_send_pubkey: signing using ssh-ed25519 SHA256:[...]
[07:31:35] <Alien333> debug3: send packet: type 50
[07:31:36] <Alien333> Connection closed by 185.15.56.62 port 22
[11:50:19] <lucaswerkmeister> (FTR, the above messages were also reported at T393829 which was then marked as a duplicate of T393732)
[11:50:20] <stashbot> T393829: Ssh to toolforge failing with "Connection closed by 185.15.56.62 port 22" - https://phabricator.wikimedia.org/T393829
[11:50:20] <stashbot> T393732: Toolforge bastion sssd/LDAP flakiness (May 2025) - https://phabricator.wikimedia.org/T393732
[11:52:20] <lucaswerkmeister> !log root@tools-bastion-13:~# systemctl restart sssd-pam{,{,-priv}.socket} # all three failed with start-limit-hit / Start request repeated too quickly; T393732?
[11:52:22] <stashbot> lucaswerkmeister: Unknown project "root@tools-bastion-13:~#"
[11:52:38] <lucaswerkmeister> !log tools root@tools-bastion-13:~# systemctl restart sssd-pam{,{,-priv}.socket} # all three failed with start-limit-hit / Start request repeated too quickly; T393732?
[11:52:40] <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[11:53:16] <lucaswerkmeister> !log tools T393732 note: restart of sssd-pam.service actually failed, “may be requested by dependency only”; overall it still seems to have worked though (so next time restarting the sockets is probably sufficient)
[11:53:19] <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[11:57:57] <lucaswerkmeister> right now it seems to be working for me again
[14:07:39] <kanashimi> Hi, I can't run `become <mytool>`. Have we changed the way to login?
[14:08:18] <kanashimi> It says "sudo: a password is required"
[14:08:22] <wm-bb> <nokibsarkar> try `dev.toolforge.org` instead of `login.toolforge.org`;
[14:08:56] <wm-bb> <nokibsarkar> @kanashimi
[14:10:29] <lucaswerkmeister> !log tools root@tools-bastion-13:~# systemctl restart sssd-sudo.socket # service-start-limit-hit, T393732?
[14:10:33] <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[14:10:33] <stashbot> T393732: Toolforge bastion sssd/LDAP flakiness (May 2025) - https://phabricator.wikimedia.org/T393732
[14:10:57] <kanashimi> Thank you. It works. What happens to login.toolforge.org?
[14:11:12] <lucaswerkmeister> it’s been having issues for a few days (see the task above) :(
[14:13:38] <taavi> lucaswerkmeister: sigh, so my fix attempt from earlier clearly didn't work?
[14:13:47] <kanashimi> OK I see
[14:13:49] <lucaswerkmeister> seems so, yeah :(
[14:14:07] <lucaswerkmeister> I’ve just been blindly restarting the stuff in `systemctl --failed` in the hope that it helps at least temporarily
[14:14:28] <lucaswerkmeister> (though this time I actually wasn’t able to reproduce the error, `become` still worked for me. maybe it was cached somewhere)
[14:32:52] <taavi> https://phabricator.wikimedia.org/T393732#10809252, unfortunately any of those will probably have to wait until business hours on Monday
[16:20:46] <wm-bb> <marufhasan24> I'm getting Connection closed by 185.15.56.62 port 22
[16:22:02] <lucaswerkmeister> !log tools systemctl restart sssd-{pam{,-priv},sudo}.socket # service-start-limit-hit, T393732?
[16:22:06] <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[16:22:06] <stashbot> T393732: Toolforge bastion sssd/LDAP flakiness (May 2025) - https://phabricator.wikimedia.org/T393732
[16:22:48] <lucaswerkmeister> went right down again :(
[16:23:17] <wm-bb> <lucaswerkmeister> you can try dev.toolforge.org instead, that might work better at the moment (re @marufhasan24: I'm getting Connection closed by 185.15.56.62 port 22)
[17:33:58] <lucaswerkmeister> !log tools root@tools-bastion-13:~# systemctl reset-failed sssd-{pam,sudo}.service && systemctl restart sssd-pam{,-priv}.socket # try to reset the rate limits this way (T393732)
[17:34:02] <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[17:34:02] <stashbot> T393732: Toolforge bastion sssd/LDAP flakiness (May 2025) - https://phabricator.wikimedia.org/T393732
[17:35:53] <lucaswerkmeister> !log root@tools-bastion-13:~# systemctl restart sssd-sudo{,.socket} # looks like the reset-failed didn’t work properly, systemd didn’t even try to start the service again afaict (T393732)
[17:35:53] <stashbot> lucaswerkmeister: Unknown project "root@tools-bastion-13:~#"
[17:35:56] <lucaswerkmeister> !log tools root@tools-bastion-13:~# systemctl restart sssd-sudo{,.socket} # looks like the reset-failed didn’t work properly, systemd didn’t even try to start the service again afaict (T393732)
[17:35:59] <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[17:43:01] <lucaswerkmeister> (FTR, these restarts didn’t work, I left a comment on the task)
[17:44:14] <lucaswerkmeister> !status login.toolforge.org bastion unstable, dev.toolforge.org may work better
[17:47:15] <taavi> lucaswerkmeister: you need to start/stop the socket unit (sssd-sudo.socket), doing anything on the service itself is going to do nothing useful
[17:47:42] <lucaswerkmeister> I had restarted both (brace expansion)
[17:48:15] <lucaswerkmeister> but restarting just the socket didn’t seem to bring the service out of the rate-limited state
[18:48:58] <Guest85> Looking for a FOSS website analytics solution that doesn't need a lot of storage space on the database, is easy to set up on Toolforge and that can be used with static webpages. Any recommendations?
[18:56:52] <Guest85> or maybe there are statistics for 'tools-static' published somewhere already?
[19:29:17] <LD> Hi, I might need some help. I changed my SSH on toolforge but I fail to connect to the server.
[19:30:43] <lucaswerkmeister> the main bastion server is having some issues at the moment; dev.toolforge.org might work better
[19:31:11] <LD> oh that might explain
[19:31:18] <LD> what about login.toolforge.org?
[19:31:56] <lucaswerkmeister> that’s the one with issues
[19:32:32] <LD> any phab ticket related to it?
[19:33:27] <lucaswerkmeister> yes, T393732
[19:33:27] <stashbot> T393732: Toolforge bastion sssd/LDAP flakiness (May 2025) - https://phabricator.wikimedia.org/T393732
[19:34:14] <LD> thanks
[19:46:52] <LD> by any chance, can we webrestart or even stop letaxobot.toolforge.org?
[19:58:30] <lucaswerkmeister> !log tools.letaxobot webservice restart (per request on behalf of tool maintainer, as the bastion is having issues atm)
[19:58:32] <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.letaxobot/SAL
[19:58:47] <lucaswerkmeister> seems to be responding again, yay
[19:59:36] <LD> thanks lucaswerkmeister ; btw something is weird about this tool, like I have to webrestart it times to times or it fails at some point
[20:00:47] <lucaswerkmeister> hm, there’s a bunch of noise in error.log that doesn’t look related
[20:01:05] <lucaswerkmeister> possibly there was something else in the pod logs that is now gone due to the restart :S
[20:02:18] <lucaswerkmeister> a health check (https://wikitech.wikimedia.org/wiki/Help:Toolforge/Web#Health_checks) *might* help? but I’ve never tried them on a PHP / lighttpd webservice
[20:02:29] <lucaswerkmeister> (that’s a suggestion for later, once the bastion works again, not now ^^)
[20:02:58] <LD> indeed, thanks I'll try later