[07:44:52] <XioNoX>	 can I get an (hopefully) easy +1 https://gerrit.wikimedia.org/r/c/operations/puppet/+/1180724
[07:47:50] <fabfur>	 XioNoX: could be useful to add what (broadly) the test vm would be used for? 
[07:48:03] <Emperor>	 XioNoX: yes
[07:48:22] <fabfur>	 ah sorry, I just saw T396864 in the comment
[07:48:22] <stashbot>	 T396864: Routed Ganeti: same node DHCP limitation - https://phabricator.wikimedia.org/T396864
[07:48:24] <fabfur>	 nv
[07:48:24] <Emperor>	 sorry, fabfur, didn't see your comment here (though I'd say the link to the task was sufficient) 
[07:48:50] <fabfur>	 Emperor: yes, sorry, I didn't see that before, +1 as it was non-blocking anyway
[08:03:34] <XioNoX>	 thx!
[08:10:07] <moritzm>	 FYI, I'm installing Java 17 security updates on the Puppet servers. These need an immediate restart of the Puppet server, so there will be a few Puppet failures (usually 10-20 fleet-wide). I'll spread these out to minimise impact
[14:35:29] <Krinkle>	 nemo-yiannis: I propose 1) deploy restbase change, 2) figure out how to get mobileapps responsive, 3) implement ats plugin for beta that is basically "rest-gateway light" which is similar to what production does already but mapping to mobileapps directly instead of through another layer.
[14:35:53] <Krinkle>	 I can do #3 but need some help with #1 and #2.
[14:35:59] <Krinkle>	 Is the config.labs.yaml file in the repo actually used?
[14:37:50] <Krinkle>	 maintenance would be less that way, since we'd either keep beta versions of 3 things (2 ats plugins + rest-gateway) and keep the service running, configured and updated, vs keep a beta version of 1 thing (1 ats plugin).
[14:44:02] <nemo-yiannis>	 I am not sure if i understand, did we change the domains for beta? Because other than that, mobileapps should respond properly
[14:44:28] <Krinkle>	 I added a comment on the task. the backend service doesn't seem to work. your example works because it is returning an old cached revision restbase has locally
[14:44:35] <nemo-yiannis>	 ah got it
[14:44:39] <nemo-yiannis>	 i can check the mobileapps on beta
[14:44:50] <Krinkle>	 but yes we did change domains. wmflabs.org>wmcloud.org
[14:45:11] <nemo-yiannis>	 ok got it
[14:45:11] <Krinkle>	 Gergo's patch might fix that, I'm not familiar with where that is decided/maintained.
[14:45:35] <Krinkle>	 but that will, I suspect, only reveal the next issue which is that mobileapps is not responding since the domain restrict feature was added
[14:46:17] <Krinkle>	 once the mobileapps service is working, I'm happy to implement the ats plugin for beta that routes /api/rest to mobileapps as-needed
[14:49:17] <Krinkle>	 ref T402206
[14:49:18] <stashbot>	 T402206: HyperSwitch/errors/not found (404) on beta cluster: There was an issue displaying this preview - https://phabricator.wikimedia.org/T402206
[14:49:24] <nemo-yiannis>	 ok i think (?) that i can allow all domains on beta
[14:50:41] <Krinkle>	 I assume that mechanism is defense in-depth given that it is not publicly exposed in prod or beta, I think?
[14:51:14] <Krinkle>	 although I do see `https://mobileapps.wmflabs.org` defined.
[14:51:18] <Krinkle>	 not sure what that's used for 
[14:52:10] <Krinkle>	 in prod we use `mobile_html_rest_api_base_uri: "//{{host}}/api/v1/"`
[14:52:15] <Krinkle>	 seems like that should work in beta too
[15:00:11] <Krinkle>	 hm.. looks like we already do that the same way in beta on the actual server. maybe the config.labs.yaml file isn't used?
[15:38:25] <Krinkle>	 it's also referenced here: https://gerrit.wikimedia.org/g/mediawiki/services/restbase/deploy/+/1586262e70251e81a12ea0f01482b7e45e2b683c/scap/vars.yaml#26
[15:38:54] <Krinkle>	 which appears to be a prod config (new domains are regularly added), but I assume prod restbase doesn't call beta.
[15:41:47] <Krinkle>	 looks like puppet writes a different config file:
[15:41:48] <Krinkle>	 https://gerrit.wikimedia.org/g/operations/puppet/+/9aaf502587ab8dc389bc6c0c0d3a283e346fa4d1/modules/service/manifests/node/config/scap3.pp#72
[15:42:05] <Krinkle>	 https://gerrit.wikimedia.org/g/operations/puppet/+/9aaf502587ab8dc389bc6c0c0d3a283e346fa4d1/modules/profile/manifests/restbase.pp#141
[15:42:28] <Krinkle>	 so I'm guessing somewhere both are read, and we don't want the underlying layer to be empty/incomplete?
[16:40:53] <taavi>	 moritzm: hi, seems like you upgraded openssh-server fleet wide earlier today? seems like that uninstalled systemd-timesyncd and some other packages at least on cloudnet1005 for some reason
[16:42:57] <volans>	 AFAICT from debmonitor systemd-timesyncd is missing only on 34 hosts
[16:43:27] <andrewbogott>	 that's interesting -- what do they have in common? weird apt sources?
[16:47:29] <volans>	 cloudnet[1005-1006].eqiad.wmnet and dns* hosts AFAICT
[16:47:54] <taavi>	 for DNS that is expected I think, as they run 'proper' NTP daemons
[16:48:13] <andrewbogott>	 volans: on the host we're looking at (cloudnet1005) systemd-timesyncd is installed, but puppet wants to upgrade it and can't
[16:48:28] <andrewbogott>	 So the symptom in that case isn't a missing package but a puppet failure
[16:48:54] <volans>	 it shows as rc  systemd-timesyncd
[16:48:59] <andrewbogott>	 same on cloudnet1006 but /not/ on cloudnet2005-dev which should have identical packages 
[16:49:06] <andrewbogott>	 oh you're right, nm
[16:49:16] <taavi>	 it is not installed. it was, but the openssh upgrade made apt solver think removing that was the correct choice
[16:49:44] <volans>	 so, the good news is that it doesn't seem to be a widespread issue
[16:49:51] <andrewbogott>	 yeah
[16:50:06] <taavi>	 apparently these hosts have a /etc/apt/preferences.d/systemd.pref with `release n=bookworm`??
[16:50:32] <taavi>	 that could explain this, now that the packages are coming from the security repo
[16:51:00] <volans>	  /var/log/apt/history.log is clear on what happened, less on why
[16:52:22] <taavi>	 andrewbogott: I think the systemd apt pin in openstack::serverpackages::epoxy::bookworm is to blame, it's excluding the systemd package in bookworm-security
[16:52:54] <andrewbogott>	 hm, why not happening on other epoxy/openstack hosts then?
[16:52:56] * andrewbogott makes sure it isn't
[16:54:42] <andrewbogott>	 it isn't
[17:02:51] <andrewbogott>	 cloudnet1006 is the standby, I tried removing the epoxy pins and retried, still can't install timesyncd
[17:03:05] <andrewbogott>	 so I don't think it's related to the osbpo
[17:05:52] <taavi>	 did you try removing the systemd pin?
[17:09:13] <andrewbogott>	 that seems to be doing something. That pin is related to the epoxy osbpo somehow?
[17:14:23] <taavi>	 that pin is defined in the server packages class. the comment references T247013 from 2020.
[17:14:23] <stashbot>	 T247013: cloudservices1003/1004: Warning: NTP not enabled! - https://phabricator.wikimedia.org/T247013
[17:15:09] <taavi>	 the pins seem odd to me (in general these hosts should require an explicit opt-in for backports anyway), but they are also buggy as they exclude the -updates and -security repos as I said earlier
[17:15:39] <taavi>	 if they're really still needed then they could be reversed to set a low/negative priority for -backports, but I suspect they could be dropped entirely
[17:16:40] <andrewbogott>	 yes, the reasons described in that thread seem very no-longer-applicable
[17:36:27] <moritzm>	 not sure what led apt to uninstall these, but the systemd pref indeed seems to be the root cause
[17:37:46] <moritzm>	 it's probably from a time when the openstack repo included systemd as well?
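
Editor's note on the pin discussion above: a minimal Python sketch of why a pin written as `release n=bookworm` would not cover `bookworm-security` or `bookworm-updates`. It assumes the codename is compared as an exact string and that unpinned archives fall back to apt's default priority of 500. The `Pin` class, the `effective_priority` helper and the value 1001 are hypothetical illustrations; the real priority in systemd.pref is not quoted in the log, and this is not how apt itself is implemented.

```python
from dataclasses import dataclass
from typing import Iterable

DEFAULT_PRIORITY = 500  # apt's default priority for ordinary archives
PIN_PRIORITY = 1001     # assumed value; the real pin priority is not quoted in the log


@dataclass(frozen=True)
class Pin:
    """Hypothetical model of a 'Pin: release n=<codename>' stanza."""
    codename: str
    priority: int


def effective_priority(repo_codename: str, pins: Iterable[Pin]) -> int:
    """Return the priority of the first pin whose codename matches exactly."""
    for pin in pins:
        # Exact comparison: "bookworm" does not match "bookworm-security".
        if pin.codename == repo_codename:
            return pin.priority
    return DEFAULT_PRIORITY


pins = [Pin("bookworm", PIN_PRIORITY)]
priorities = {
    name: effective_priority(name, pins)
    for name in ("bookworm", "bookworm-security", "bookworm-updates")
}
print(priorities)
```

Under these assumptions only the base suite receives the pinned priority, so a systemd package published through the security repo is ranked below the base-suite version. That is consistent with taavi's reading, though the log does not establish the exact solver behaviour.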
[21:26:56] <inflatador_>	 heads-up that we're depooling eqiad search again.  We've suppressed alerts for the next 2 hrs as well, but do reach out if you see anything
[21:50:24] <inflatador_>	 OK, eqiad is repooled and everything is looking good