[08:45:11] <elukey>	 hey folks, just a heads up - Riccardo and I reworked the tox settings for spicerack a little, it should be faster now both locally and in CI. All tracked in https://phabricator.wikimedia.org/T420475, I suggest cleaning up your .tox spicerack dir the next time you work on it. We are still working on reducing the timing further, we'll keep you posted :)
[08:45:26] <fabfur>	 👍
[08:47:45] <XioNoX>	 nice!
[09:10:08] <bjensen>	 hey folks, i'm trying to understand the difference between 'critical' and 'warning' alerts. is the difference primarily how those alerts are displayed on alerts.wikimedia.org? or is there more to it?
[09:13:50] <volans>	 icinga or alertmanager? in general warnings don't notify on IRC 
[09:14:01] <bjensen>	 alertmanager
[09:14:22] <bjensen>	 hm, i see
[09:16:40] <volans>	 but you can also have alerts that create phab tasks for example
[09:17:16] <volans>	 so depends what you want to achieve, but I'll 301 to o11y for the details as I'm not too familiar with the details
[09:32:33] <bjensen>	 thanks!
[10:17:31] <XioNoX>	 bjensen: it depends on how the routing is configured in https://github.com/wikimedia/operations-puppet/blob/production/modules/alertmanager/templates/alertmanager.yml.erb - each team has its own policies, kind of
[10:32:04] <bjensen>	 ah, okay, so not a standard, makes sense
[10:32:21] <Emperor>	 elukey: I've updated T423286, but: I tried ms-be2069 without any firmware upgrades today, and it's the same failure mode - installer works, reboots fine, but after the initial puppet run it's unbootable, hanging at "GRUB " forever :(
[10:32:21] <stashbot>	 T423286: Initial puppet run makes ms-be2068 unbootable - https://phabricator.wikimedia.org/T423286
[10:35:19] <elukey>	 Emperor: very weird, at this point I'd try to target a different os to see the difference. IIUC you are targeting bullseye but we'll have to move away from it during the next 2/3 months anyway, I am wondering if bookworm and/or trixie make any difference (namely, I am trying to exclude variables like you did with the firmware upgrade)
[10:38:28] <Emperor>	 elukey: I can try another OS, but we realistically can't move swift off bullseye in the near future (constructing a test cluster to even test the process is a goal for this quarter)
[10:38:45] <Emperor>	 I'll give trixie a go on ms-be2069
[14:07:51] <James_F>	 In testing the Wikifunctions k8s staging services, curl is saying the certificates have expired. Is this a known thing? Should I file a task?
[14:10:31] <claime>	 elukey: Might this be related to your work on cert-manager?
[14:18:33] <elukey>	 James_F: yes yes my fault sorry, I was testing something that seemed to be working :D I am going to revert later on, sorry for the trouble
[14:18:44] <elukey>	 for the moment you can just accept those expired certs
[14:18:57] <James_F>	 How do I do that? curl -k still just reports the error.
[14:21:54] <cdanis>	 is curl saying the certs have expired or is curl proxying an upstream reverse proxy telling another reverse proxy that its certs have expired 😅
[14:22:02] <James_F>	 Probably the latter.
[14:22:18] <cdanis>	 can you `curl -v` ?
[14:23:03] <cdanis>	 you can `|& phaste` if you don't wanna read all that
[14:23:20] <James_F>	 https://phabricator.wikimedia.org/P90793
[14:23:44] <cdanis>	 ah ok so it is curl directly, while talking to what must be the staging ingress
[14:24:00] <cdanis>	 and then you ignore that, and then you get an openssl error from envoy
[14:25:07] <James_F>	 Yes, if I don't pass in -k I get roughly the same error but formatted differently, presumably from… whatever is between me and the staging ingress.
[14:31:52] <elukey>	 James_F: will revert in ~30 mins if it is ok for you
[14:32:06] <James_F>	 elukey: No worries, I've reverted my attempted deploy.
[14:57:28] <elukey>	 James_F: should be fixed now or in a few mins, cert-manager is up and certs should renew now
[15:33:06] <James_F>	 elukey: Confirming it's fixed, thanks!
[16:21:17] <Krinkle>	 mutante: could use some help/ideas around what to do with locks at T421147
[16:21:18] <stashbot>	 T421147: Codesearch stuck at Feb 12th? - https://phabricator.wikimedia.org/T421147
[16:21:42] <Krinkle>	 do you know of any cases where a git lock is not safe to delete – if we take as given there are no running git processes?
[19:25:12] <andrewbogott>	 One of my favorites is when the syslog of a crashed server just says "^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@"  -- does that /mean/ something?  Why that character in particular?
[19:25:37] <cdanis>	 andrewbogott: that's often how NUL (0) renders in a terminal
[19:26:05] <andrewbogott>	 yeah. So it really is just the server going  'ummmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm'
[19:26:26] <andrewbogott>	 not a useful diagnostic :(
[19:38:26] <inflatador>	 I've seen that quite a few times after a disk failure
[19:40:16] <andrewbogott>	 disk failure would certainly explain the lack of log messages