[06:57:45] greetings
[08:11:41] morning
[11:03:27] ok, I set May 6th and 12th as the dates for https://etherpad.wikimedia.org/p/T420565; please LMK if something needs changing and I'll send it out later today
[11:05:10] LGTM!
[11:07:46] nice
[11:13:25] thank you
[11:18:59] * volans lunch
[12:19:18] re: tf-infra-test failures, I've updated the repo on tf-bastion with the latest commit
[12:20:06] the failure was "datastore version xyz is not active" when talking to trove, which makes sense
[12:30:46] I'm processing https://phabricator.wikimedia.org/T419525. I think I've done all the steps correctly, but if I hit the new domain it redirects to the "welcome to cloud service" page. I think that's because they need to set up the web proxy in horizon. But if I go to horizon in their project and look at the form to create a web proxy, it doesn't let me pick a different domain, only wmcloud and
[12:30:52] wmflabs are shown. Did I miss something?
[12:31:04] just to make sure I give the proper directions to the user on the task
[12:41:04] wish I could help, though I'm not sure how things are supposed to work
[12:41:14] volans: I'm not sure what's missing, but I checked another project with a custom domain (wlmaz) and it does show the domain in the horizon web proxy dropdown
[12:42:58] is it possible it needs the project id instead of the project name in puppet (proxy profile)?
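The hypothesis raised here (Horizon only offers a custom domain when the "proxy" puppet entry's `project` matches the project ID, not its name) can be modelled roughly as below. This is a minimal sketch, not Horizon's or the puppet module's actual code: the `ProxyDomainEntry` shape, the `dropdown_domains` helper and all sample values are hypothetical, and `wmcloud.org`/`wmflabs.org` stand in for the two default domains mentioned in the log.

```python
from dataclasses import dataclass

# Hypothetical model of the "proxy" puppet entries; real key names may differ.
DEFAULT_DOMAINS = ["wmcloud.org", "wmflabs.org"]  # always offered (illustrative)


@dataclass(frozen=True)
class ProxyDomainEntry:
    domain: str   # custom domain delegated to a project
    project: str  # assumed to hold the OpenStack project ID, not its name


def dropdown_domains(entries, project_id):
    """Domains a web proxy form would offer for project_id (sketch).

    Entries are matched on the project ID only, so an entry written with the
    project *name* is silently ignored, unless name == ID, which is the case
    for some older projects.
    """
    custom = [e.domain for e in entries if e.project == project_id]
    return DEFAULT_DOMAINS + custom


# Usage with made-up values:
by_name = [ProxyDomainEntry("example.org", "myproject")]
by_id = [ProxyDomainEntry("example.org", "0a1b2c3d4e5f")]
print(dropdown_domains(by_name, "0a1b2c3d4e5f"))  # ['wmcloud.org', 'wmflabs.org']
print(dropdown_domains(by_id, "0a1b2c3d4e5f"))    # ['wmcloud.org', 'wmflabs.org', 'example.org']
```

Under this model, an entry keyed by name looks exactly like "only wmcloud and wmflabs are shown", which matches what was observed before switching to the ID.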
[12:43:17] s/proxy profile/"proxy" puppet prefix/
[12:43:41] the other settings have names; I didn't add the id field, as per the docs, but I can track it down and add it given that the other entries in the list have it
[12:44:15] some have ids, some have names :) probably the ones with ids are old projects where id=name
[12:44:43] ok, let me use the id
[12:45:29] I'm talking about "project: {id}" and not "id: {id}" (to add more id confusion LOL)
[12:45:55] that was clear
[12:45:56] :D
[12:46:19] but I'm also wondering if after the first run I should find the id too and add it
[12:47:42] yep, that was it, thx
[13:15:03] go nu
[13:39:34] if anyone wants to do a one-liner review: https://github.com/toolforge/paws/pull/511
[13:43:23] volans: approved
[13:44:42] <3
[14:13:06] does anybody remember how this gitlab pipeline gets run? https://gitlab.wikimedia.org/repos/cloud/cicd/gitlab-ci/-/blob/main/toolforge-cd/create_update_poetry_mrs.yaml
[14:13:20] I see a mention of "schedules", but I'm not finding a repo where scheduled jobs are set
[15:31:33] topranks, cloud team, just wanted to confirm that 208.80.153.128/27 (https://netbox.wikimedia.org/ipam/prefixes/37/) is not used (and not planned to be). I think it used to be reserved or used before we started using 185.15.57.0/24
[15:35:13] XioNoX: that does not sound familiar to me, nor are there any hits in codesearch in our control repos
[15:36:51] XioNoX: yes, that's my understanding too; it's not routed on our network and I don't believe it was ever used
[15:37:01] it was before my time, but I think the 185 range got used instead, yep
[15:39:38] cool
[15:50:32] yeah, actual public IPs in dallas are 185.x
[17:17:09] dhinus: have you deployed PAWS recently enough to remember how long it takes?
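To double-check a claim like the one about 208.80.153.128/27 (for example when auditing configs for leftover references), Python's standard `ipaddress` module can confirm that the two prefixes mentioned are disjoint and tell which one a given address falls in. A small sketch; the `prefix_for` helper and the sample addresses are illustrative only.

```python
import ipaddress

# Prefixes taken from the discussion above.
OLD_PREFIX = ipaddress.ip_network("208.80.153.128/27")   # reportedly never used
CURRENT_PREFIX = ipaddress.ip_network("185.15.57.0/24")  # range used instead


def prefix_for(address, prefixes=(OLD_PREFIX, CURRENT_PREFIX)):
    """Return the first prefix that contains address, or None."""
    ip = ipaddress.ip_address(address)
    for prefix in prefixes:
        if ip in prefix:
            return prefix
    return None


print(OLD_PREFIX.num_addresses, CURRENT_PREFIX.num_addresses)  # 32 256
print(OLD_PREFIX.overlaps(CURRENT_PREFIX))                     # False
print(prefix_for("208.80.153.130"))                            # 208.80.153.128/27
print(prefix_for("185.15.57.10"))                              # 185.15.57.0/24
```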
[17:21:06] andrewbogott: if it's just running the deploy script without recreating the cluster, I think less than 1 min
[17:21:40] I'm running `bash deploy.sh eqiad1` and it's been stuck at TASK [Deploy paws] for ~13m
[17:24:08] that's not great
[17:24:20] I saw the same in codfw1dev yesterday but assumed it was because my k8s cluster was messed up
[17:24:51] from ps it's at:
[17:24:54] /usr/local/bin/helm upgrade -i --reset-values --timeout 50m --create-namespace --values=../paws/secrets.yaml --values=../paws/production.yaml paws ../paws
[17:25:07] so it might time out in 50m, sigh
[17:25:08] that might be some weird firewall thing? I'd just let it run and hope that there's a useful error message when it gives up
[17:25:22] hm, if that includes downloading a helm chart, it could be that the repo is down or similar
[17:26:16] I need to eat something but will return shortly
[17:26:26] strace has ETIMEDOUT (Connection timed out) and EAGAIN (Resource temporarily unavailable)
[17:29:39] that seems like a repo being down
[17:29:47] I'm debugging
[17:29:52] so maybe just try again later as your first troubleshooting step :)
[17:30:53] an external one?
[17:31:15] isn't paws deployed via github actions? I assumed it had built the release already
[17:46:25] I'm not 100% sure, it might be all internal or it might have a stray external dependency
[17:46:34] are you able to get any info about what timed out?
[17:48:52] if I'm not mistaken, it's trying to read from a socket connected to paws-127c-uwce57bvcgrt-master-0.paws.eqiad1.wikimedia.cloud.
[17:49:27] where I can't ssh
[17:54:11] oh, you can, hang on...
[17:55:13] I'm not saying it's not possible, I'm just saying that with my standard key or my root key I can't, with my current .ssh/config :D
[17:56:26] oops, you're right, there doesn't seem to be a key installed on that host. Thought I fixed that...
[17:57:22] It ought to have 'paws-magnum-vm' installed on it, but maybe it's been that long since this cluster was re-deployed
[17:57:57] k
[17:58:03] I guess that's an argument in favor of doing a blue/green soon or right now
[17:58:25] I can do it tomorrow
[17:58:31] (btw, https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/VM_access is where the docs /should/ be for this)
[17:58:54] I think I'll let this one time out for today, given the time
[17:59:29] seems fine, it's just an upgrade
[18:00:05] timed out just now: * timed out waiting for the condition
[18:01:04] I can confirm my notebook still works, let it be a tomorrow problem then :D
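When `strace` on a stuck `helm upgrade` only shows ETIMEDOUT/EAGAIN against a socket to the cluster's master node, a quick TCP probe of that endpoint can help separate silently dropped packets (for example a firewall) from a refused connection or a DNS failure. The sketch below uses only the standard `socket` module; the `probe` helper is hypothetical, and the port suggested afterwards (6443, the usual Kubernetes API server port) is an assumption about this Magnum cluster, not something stated in the log.

```python
import socket


def probe(host, port, timeout=5.0):
    """Classify a TCP connection attempt to host:port.

    Returns "open", "refused" (host answered but nothing is listening),
    "timeout" (no answer at all, often a firewall silently dropping packets)
    or "error" (anything else, e.g. a DNS resolution failure).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except (socket.timeout, TimeoutError):
        # socket.timeout is an alias of TimeoutError since Python 3.10.
        return "timeout"
    except OSError:
        return "error"
```

Run from a host that should be able to reach the cluster, e.g. `probe("paws-127c-uwce57bvcgrt-master-0.paws.eqiad1.wikimedia.cloud", 6443)`: a `"timeout"` result would support the firewall theory, while `"open"` suggests helm is waiting on something other than basic connectivity.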