[12:45:09] Hi! I'm seeing errors connecting to gitlab from the digitalocean gitlab runners
[12:45:12] like https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-api/-/jobs/363466
[12:45:26] fatal: unable to access 'https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-api.git/': Failed to connect to gitlab.wikimedia.org port 443 after 4 ms: Could not connect to server
[12:45:37] is there a known issue going on?
[12:47:05] oh, even some jobs that seemed to work did fail to contact gitlab
[12:47:06] https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-api/-/jobs/363459
[12:47:14] ERROR: Uploading artifacts as "dotenv" to coordinator... error error=couldn't execute POST against https://gitlab.wikimedia.org/api/v4/jobs/363459/artifacts?artifact_format=gzip&artifact_type=dotenv: Post "https://gitlab.wikimedia.org/api/v4/jobs/363459/artifacts?artifact_format=gzip&artifact_type=dotenv": dial tcp 208.80.153.8:443: connect: connection refused id=363459 token=glcbt-64
[12:51:08] maybe you know something? jelto ^ (it's breaking all my ci builds)
[12:54:15] oh, some do still pass https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-api/-/jobs/363487
[12:55:01] hmm, this one is on the same runner, at almost the same time, and failing https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-api/-/jobs/363488
[12:56:09] now that one passed, but the next one failed, so it's not failing all the time, it's just flaky enough that a full pipeline won't get through
[13:03:04] if I retry enough times I might get the pipeline to finish xd, but it takes many tries (the issue seems to happen ~75% of the time)
[14:26:18] oh, things seem to be running again :)
[14:28:53] oh no, I was just lucky twice in a row xd
[14:28:54] https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-api/-/jobs/363560
[14:28:59] ^ still failing :/
[14:33:42] dcaro: would it be an option for you to use our own infra, the runners in cloud, instead of digital ocean?
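(Editor's note: since retrying eventually got pipelines through, GitLab's per-job `retry` keyword can paper over this kind of runner flakiness while it lasts. A minimal `.gitlab-ci.yml` sketch — the job name and script are illustrative, not from the project's actual config:)

```yaml
# Hypothetical job: retry automatically, but only on
# infrastructure-style failures, not on genuine test failures.
build:
  script:
    - go build ./...
  retry:
    max: 2                     # GitLab caps automatic retries at 2
    when:
      - runner_system_failure  # e.g. the clone/connect failures above
      - api_failure            # e.g. the failed artifact POST to the coordinator
```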
[14:33:59] afaik those are experimental
[14:34:25] the current issue should be gone if you avoid DO for now
[14:37:21] that would be an option, yep. we need the big-mem ones for the golang stuff though, do we have those on cloud too?
[14:38:39] dcaro: glad to hear that. I'm sorry, I don't know that off hand. I can try to find out
[14:38:59] np, let me know if I can help in any way
[14:39:45] dcaro: the main problem is that the DO runners open too many connections at once
[14:39:58] it's probably timing, more than one user being on them at the same time
[14:40:12] aaaahhh
[14:40:18] one fix would be to somehow slow them down
[14:40:46] you would not be limited on our own runners
[21:19:43] production gitlab server needs to be upgraded. please expect short downtime in a few minutes.
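(Editor's note: moving a project off the DO runners, as suggested above, is normally done with runner tags in `.gitlab-ci.yml`. A sketch — the tag name `cloud` is an assumption here; the real tag on the Wikimedia cloud runners may differ:)

```yaml
# Pin the job to runners carrying a specific tag instead of the
# shared DigitalOcean pool; the "cloud" tag name is hypothetical.
test:
  tags:
    - cloud
  script:
    - go test ./...
```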