[09:07:20] morning
[09:46:33] couple minor cookbook fixes starting from https://gerrit.wikimedia.org/r/c/cloud/wmcs-cookbooks/+/1223146
[10:06:09] morning, and happy new year! I'm working today, out tomorrow (public holiday)
[10:10:08] taavi: thanks for the fixes, all +1d
[10:10:19] ty
[12:14:10] could use a second opinion on this: https://toolsadmin.wikimedia.org/tools/membership/status/2086
[12:55:54] dhinus: (seeing your comments in there) does T413510 still need more work?
[12:55:54] T413510: [bug] server won't launch - https://phabricator.wikimedia.org/T413510
[13:05:47] Happy new year and good morning!
[13:06:35] morning!
[13:07:45] dhinus (optional others) how would you feel about me moving our daily checkin 90 minutes earlier for a few weeks? I'm one TZ closer to Europe for a while but trying to avoid having meetings at the same time as Jenna while we're in this small space.
[13:13:27] taavi: re that toolforge request, I don't love the idea of enabling another scraper but I guess I would ask for feedback like "how will this differ from existing systems like the action API?" and see if they persist. They might be solving a real problem somehow...
[13:15:07] taavi: I want to double check that error message about NFS mounts, but I haven't looked yet. leave it with me and I'll resolve/update it later.
[13:16:37] andrewbogott: that's ok but not today, you can move it from wed onwards. we can probably skip today and I'll see you wed!
[13:16:47] sounds good, thank you!
[13:34:32] Could I get a quick look at https://gerrit.wikimedia.org/r/c/operations/dns/+/1218333?
(moving wikitech-static dns to AWS)
[13:35:21] andrewbogott: that does not seem to work:
[13:35:21] taavi@runko:~ $ curl --connect-to ::3.126.27.158 https://wikitech-static.wikimedia.org
[13:35:21] curl: (7) Failed to connect to 3.126.27.158 port 443 after 28 ms: Could not connect to server
[13:35:33] it's http only
[13:35:42] should work if you remove the s
[13:36:09] it won't, wikimedia.org is HSTS preloaded meaning browsers will refuse to use plaintext connections
[13:36:26] oh, good point.
[13:36:28] Hm
[13:36:54] I wonder if that means it should just have a non wikimedia tld
[13:37:41] but doing a non-TLS web service in today's era is just a bad idea in general IMO
[13:38:29] I agree and also I envision a cert expiring 5 minutes before the outage that this site is meant to help with
[13:38:48] Maybe AWS provides some kind of automation for cert renewal, if so I haven't found it yet but I should search some more.
[13:39:30] that's not been a problem with the current wikitech-static, why do you think it's going to become one?
[13:39:39] also, why no IPv6?
[13:40:48] we have never needed/relied on wikitech-static, so technically /nothing/ has been a problem with the current implementation :)
[13:42:47] The answer to most 'why haven't you... yet' questions is "I'm trying to build a minimum viable version of this so it can survive until an actual team actually owns this who isn't me"
[13:56:40] taavi: would you 1) terminate ssl with the nginx running in the container or 2) run an ssl-terminating proxy outside the container that proxies to the container?
[13:56:54] My concern with 1 is that I also want people to be able to run the container locally
[13:57:38] i don't know/remember the details about the new setup, but if you want to make it re-usable then running a separate tls-terminating proxy on the host or in an another container seems nicer
[13:58:18] yep, ok.
[14:00:10] (the answer to 'can AWS automate this' seems to be 'yes, poorly and expensively')
[15:24:44] *insert xkcd compiling meme but with 'rebuilding lima-kilo' here*
[15:38:27] since when lima-kilo has persistent caching outside the VM? I've been trying to debug why deleting and re-creating the VM doesn't actually seem to re-create what I'd expect it to re-create
[15:44:56] taavi: you mean the "limactl disk" cache? it was added a long time ago to help with slow network connections, but it started misbehaving recently (sometime about december IIRC)
[15:45:44] dhinus: by 'misbehaving' you mean T411208 or something else?
[15:45:44] T411208: [lima-kilo] error mounting docker cache - https://phabricator.wikimedia.org/T411208
[15:46:10] taavi: that's the error I was referring to yes
[15:46:34] ah, I thought I introduced that when upgrading lima-vm
[15:46:51] I think that started when we upgrade the vm image to trixie
[15:47:03] *upgraded
[15:47:47] ack, I worked around that already, I'll split the short-term fix to a separate patch then. the thing I've been dealing for the past ~hour is that the cache disk was just full and I couldn't figure out how to empty it
[15:48:05] ah ok that's a different one! haven't seen it before
[15:49:04] unrelated, I opened T413786 about the PAWS nfs mount error, maybe andrewbogott remembers something about those mounts?
[15:49:04] T413786: paws-nfs-1 attempts invalid NFS mounts - https://phabricator.wikimedia.org/T413786
[15:49:20] I found some puppet patches removing similar mounts, but they're still present in that vm
[15:49:56] I don't remember them specifically but they're surely leftovers from a previous setup
[15:53:13] I'm trying to figure out if they are puppetized
[15:53:28] maybe they were removed but not absent-ed?
[15:53:35] remove them and see if puppet adds them back?
[15:54:50] taavi: makes sense :)
[15:58:16] yeah, I bet that they're just leftovers from former puppet
[16:03:25] seems to have worked, puppet is not recreating them :)
[16:32:40] overlay2 fix, and then kubernetes 1.31 upgrade lima-kilo patches, if someone has time: https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/300
[16:32:54] tested that all the functional tests pass with 1.31 plus poked a bit more manually
[16:40:27] now with CI passing too, I think
[16:45:15] +1d, I'm also rebuilding my lima-kilo vm with your changes, and it looks fine
[16:46:30] any objections to me upgrading toolsbeta to 1.31 tomorrow?
[16:47:37] sgtm!
[17:36:44] andrewbogott: do you have the powers to "create next milestone" here? https://phabricator.wikimedia.org/project/subprojects/2773/
[17:37:07] I do, give me a second
[17:37:17] seems I don't
[17:38:07] dhinus: https://phabricator.wikimedia.org/project/view/8405/
[17:38:14] thanks taavi!
[17:40:37] sigh T413801
[17:40:38] T413801: "Additional Hashtags" field is inconsistently placed - https://phabricator.wikimedia.org/T413801
[18:47:35] taavi: do you have experience with certbot? The docs are telling me to make a cron for renewal but the debian certbot package seems to already install a timer for that.
[18:47:45] oh, oops, you're probably afk sorry
[19:22:18] I have moved a few tasks from the old Q1-Q2 board to the new Q3-Q4 board, resolved some tasks that looked completed, and moved some that did not seem currently active to the "Inbox" or to "Watching"
[19:23:57] there are only a handful left in the old board that I was unsure about, I added a note to the etherpad so we can discuss them in the next team meeting
[19:24:38] I'm off for today, and off tomorrow (public holiday). see you on wednesday!
[20:13:29] see you later!