[07:46:56] morning
[08:00:09] guten tag
[08:00:55] I would need a review here https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/550 has been a long time since last time I modified a toolforge quota
[08:02:45] thanks
[08:34:15] interesting question in slack, sounds like a potential use case for cloud vps https://wikimedia.slack.com/archives/C05H0JYT85V/p1728374605256499
[08:36:39] dhinus: see https://wikitech.wikimedia.org/wiki/Help:Cloud_VPS_project#Reviews_of_Cloud_VPS_Project_requests "laptop in the cloud" section
[08:41:01] see also T329750
[08:41:02] T329750: Allow volunteer developers to create MediaWiki development environments in CloudVPS - https://phabricator.wikimedia.org/T329750
[08:42:09] hmmm I think it's a fine line, we do have staff members using cloudvps for testing things that are not exposed to the public
[08:43:43] they pointed to this phab with some previous discussion: T345340
[08:43:44] T345340: Setup Wiki Family on CX / SX staging - https://phabricator.wikimedia.org/T345340
[08:47:29] ok, so they already have a project
[08:47:54] so I guess the "laptop in the cloud" problem is now irrelevant
[08:48:53] they can have a dedicated `.project.wmcloud.org` DNS zone, no problem with that
[08:49:15] that bit is already considered in https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/DNS#*.wmcloud.org
[08:49:29] they might need to set up a reverse proxy in the VM, to route the traffic to the correct MW instance
[08:49:36] but that shouldn't be too hard
[08:50:39] do they have a project though? what's the name?
[08:51:57] mmm
[08:52:34] https://openstack-browser.toolforge.org/project/language this one
[08:52:56] right
[08:52:59] they already have the zone `language.wmflabs.org.`
[08:53:30] so I don't see any problem with them having `language.wmcloud.org`, so they can create arbitrary records like `abc.language.wmcloud.org` or whatever they need
[08:57:42] yeah that's what we did for catalyst
[08:57:50] I replied here https://phabricator.wikimedia.org/T345340#10209716
[08:58:13] thanks, I'll suggest to bring the conversation in slack back on phab
[08:58:27] ok
[10:00:57] dhinus: I noticed this T376705 and this T376704
[10:00:58] T376705: tofu-infra: update to openstack provider to 3.x - https://phabricator.wikimedia.org/T376705
[10:00:58] T376704: tofu-infra: update to opentofu 1.8.x - https://phabricator.wikimedia.org/T376704
[10:09:46] arturo: yep I also noticed 3.0 was out, thanks for the task
[10:40:16] * dcaro lunch
[10:44:42] dhinus: I just found the ultimate reason why the openstack provider is vendored: cloudcontrols don't have access to the internet to download it over the network
[10:46:25] arturo: ahhhh that makes sense!
[10:46:46] but could we upload it to our internal tofu repo?
[10:47:13] I guess, yeah
[11:02:28] I created https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/OpenTofu#Vendored_openstack_provider
[12:13:37] the deploy worked, but I'm getting 499s from the builds-api, maybe it's too slow? (the logs on the python side say it takes >10s to get the builds of a tool)
[12:20:19] hmm, that's the default timeout for the api client :/
[12:20:34] I'll increase it for now, and then investigate why it became so much slower
[12:21:57] yep, that gets us unstuck
[12:25:28] manually increased it for now (so people can do stuff), sending patch
[12:29:58] quick review? https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/59
[12:36:58] dcaro: done
[12:37:05] thanks!
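(editor's note on the 12:13 to 12:25 exchange) A client-side timeout shorter than the server's response time makes the client hang up first, which an nginx-style proxy records as a 499. The sketch below reproduces that against a local stand-in server. It is not the toolforge-weld code: `fetch`, `SlowHandler`, `start_slow_server` and the 0.5s delay are all hypothetical names and values chosen for illustration, and only the 10s default comes from the conversation.

```python
import http.client
import http.server
import socket
import threading
import time
import urllib.parse

DEFAULT_TIMEOUT_S = 10.0  # assumption: matches the 10s client default mentioned in the chat


def fetch(url, timeout_s=DEFAULT_TIMEOUT_S):
    """Return the response body, or None if the server took longer than timeout_s."""
    parts = urllib.parse.urlsplit(url)
    conn = http.client.HTTPConnection(parts.hostname, parts.port, timeout=timeout_s)
    try:
        conn.request("GET", parts.path or "/")
        return conn.getresponse().read()
    except socket.timeout:
        # The client gives up here; the server only notices when it tries to reply.
        return None
    finally:
        conn.close()


class SlowHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical stand-in for a slow API endpoint."""

    DELAY_S = 0.5  # illustrative delay, much shorter than the real >10s case

    def do_GET(self):
        time.sleep(self.DELAY_S)
        body = b"[]"
        try:
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        except OSError:
            pass  # client already closed the connection

    def log_message(self, *args):
        pass  # keep the demo quiet


def start_slow_server():
    """Start the stand-in server on a free local port and return (server, url)."""
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), SlowHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, f"http://127.0.0.1:{server.server_port}/"
```

With the stand-in server running, `fetch(url, timeout_s=0.1)` gives up before the reply arrives, while a larger timeout gets the body. Raising the timeout only hides the slowness, which is why the chat follows up with "investigate why it became so much slower".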
[12:54:34] dcaro: yay btw, that update was a hairy one \o/
[12:54:57] it went quite nicely, for what it could have been xd
[13:55:03] do you know if we have prometheus-based alerts if a kernel panic happens on a server?
[13:56:55] I have not seen anything like that no
[13:57:06] is that info in prometheus at all?
[13:58:17] I don't know
[13:58:40] I could write a single line exporter doing something like this
[13:58:42] sudo journalctl -k -b -0 | egrep "\[ cut here \]"\|WARNING
[14:05:17] I did not find any metric with `panic` that looked promising on prod grafana/prometheus :/
[14:14:32] something like this
[14:14:32] https://gerrit.wikimedia.org/r/c/operations/puppet/+/1078684
[14:42:11] reviews requested ^^^
[15:20:37] * arturo offline
[15:47:58] Working on:
[15:48:00] * toolforge k8s upgrade, reviewing and testing the patches for the 1.28 upgrade (T362867)
[15:48:00] T362867: [infra,k8s] Upgrade Toolforge Kubernetes to version 1.28 - https://phabricator.wikimedia.org/T362867
[15:48:02] * tofu-infra, refactoring the "project" module (T375283)
[15:48:02] T375283: tofu-infra: refactor repo structure - https://phabricator.wikimedia.org/T375283
[16:44:28] * dcaro off
[17:11:29] ^ my messages above were supposed to go in #wikimedia-cloud-daily :P
[22:39:24] T376220 is still waiting for someone to pick it up.
[22:39:24] T376220: Labslogbot needs new SUL OAuth credentials after Wikitech authn changes - https://phabricator.wikimedia.org/T376220
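(editor's note on the 13:58 exporter idea) The `journalctl | egrep` one-liner could be turned into a Prometheus metric roughly as below. This is only a sketch and not the content of the gerrit patch linked at 14:14: the metric name `node_kernel_trouble_lines` and all function names are hypothetical, and it assumes the output would be written to a `.prom` file picked up by the node exporter textfile collector.

```python
import re
import subprocess

# Same two patterns as the egrep in the chat: "[ cut here ]" or "WARNING".
KERNEL_TROUBLE = re.compile(r"\[ cut here \]|WARNING")


def count_kernel_trouble(lines):
    """Count kernel log lines that match either pattern."""
    return sum(1 for line in lines if KERNEL_TROUBLE.search(line))


def render_metric(count, name="node_kernel_trouble_lines"):
    """Render the count in the Prometheus text exposition format.

    The metric name is a made-up placeholder, not an existing metric.
    """
    return (
        f"# HELP {name} Kernel log lines matching '[ cut here ]' or WARNING since boot.\n"
        f"# TYPE {name} gauge\n"
        f"{name} {count}\n"
    )


def read_kernel_log():
    """Read kernel messages for the current boot (may need root, as in the chat)."""
    result = subprocess.run(
        ["journalctl", "-k", "-b", "-0", "--no-pager"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    print(render_metric(count_kernel_trouble(read_kernel_log())), end="")
```

An alert could then fire whenever the gauge is above zero. Matching on `WARNING` also catches non-fatal kernel warnings, so this reports "kernel complained" rather than strictly "kernel panicked".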