[04:32:13] _joe_: updated on your behalf at https://phabricator.wikimedia.org/T266055#6782828, as Aaron was wondering. hope I got that right, more or less.
[08:19:11] In 40 minutes I will restart the m5 (wikitech) database primary master; wikitech will be unavailable for around 1 or 2 minutes
[08:54:37] In 5 minutes I will restart the m5 (wikitech) database primary master; wikitech will be unavailable for around 1 or 2 minutes
[09:10:39] <_joe_> vgutierrez: I'll upgrade pybal on the ulsfo backup lvs now
[09:10:50] cool, go ahead
[09:44:27] FYI, I'm enabling CAS for debmonitor in ~10 minutes
[09:45:22] ack, thx
[09:46:56] <_joe_> will that affect docker-report?
[09:47:10] <_joe_> I'm guessing it won't, given it doesn't use password auth
[09:47:53] yeah, this only affects the web UI; the internal endpoint for image submission will remain unchanged
[10:09:22] <_joe_> hnowlan: I can't find a DNS PTR for the IPs of the sockpuppet service, nor the A record, for what it's worth
[10:09:50] <_joe_> which means, given the service is in state:production, that I'm not sure how icinga might monitor it
[10:10:02] <_joe_> oh you have the discovery record, I see
[10:25:38] elukey: I commented on the archiva/nginx config. Thanks for jumping into it! I am now wondering whether the slowness could be caused by Nginx itself, since it seems to be buffering proxied requests. I have commented on https://gerrit.wikimedia.org/r/c/operations/puppet/+/608812 :]
[10:36:10] hashar: thanks for the review, I commented as well :)
[10:36:31] it feels more like Jetty/Archiva is the culprit, and send_file in nginx should help a lot
[10:36:55] but I am open to trying different settings
[10:54:15] elukey: yeah, your change (bypassing jetty/archiva) kind of makes sense
[10:54:27] but that left me wondering how the perf could be THAT crippled :]
[10:55:16] last night I looked at the Archiva doc a bit; it seems to indicate it has an in-memory buffer for artifacts. So assuming that cache is enabled on our setup, I would expect the files to be served straight from memory after a second maven run
[10:55:19] _joe_: true, I have a CR open for that: https://gerrit.wikimedia.org/r/c/operations/dns/+/658976
[10:55:56] <_joe_> hashar: it's java(TM)
[10:56:06] <_joe_> hnowlan: ack :)
[10:56:33] hashar: yes, I agree that default settings + jvm + jetty could be the main culprit, but also remember that archiva1002 is a relatively tiny ganeti vm, so we cannot cache a ton of things in memory (but we can expand it if needed)
[11:09:31] elukey: yeah, duly noted :]
[13:15:02] elukey: disabling nginx buffering made it faster (from 25 seconds to 17 seconds) (maven central is 7 seconds)
[13:15:11] but yeah, that got improved
[13:15:31] it still spends 1.1 seconds idling before the transfer starts though
[14:36:33] This is pretty cool: https://blog.seekwell.io/gpt3
[14:50:04] hashar: thanks, great!
[14:52:44] can we get a GPT-3 that writes OKRs for us? :)
[15:10:52] <_joe_> bblack: who told you someone doesn't already have it?
[16:02:22] can someone tell me if this homer diff is OK to merge?
[16:02:25] https://www.irccloud.com/pastebin/M0cJza1d/
[16:02:53] nevermind, that wasn't the device I was looking for
[16:03:27] volans and I have been puzzling over that diff for a day or two now
[16:03:53] arturo: please let me know when your homer run is completed
[16:03:59] cdanis: done
[16:04:03] ty
[16:04:10] https://www.irccloud.com/pastebin/S9XrZB3F/
[16:04:13] for the record
[17:18:47] Hi SRE folks. Can someone +2 https://gerrit.wikimedia.org/r/c/operations/puppet/+/659335 ?
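
[Editor's aside] The archiva slowness thread above comes down to two nginx knobs: serving on-disk artifacts directly (sendfile) instead of proxying everything through Jetty/Archiva, and not buffering the proxied responses. Below is a minimal sketch of what such a site config could look like; the server name, repository path, and backend address are made-up placeholders, and this is not the actual puppet-managed config behind the Gerrit changes discussed above.

    # Illustrative sketch only, not the production archiva nginx config.
    # archiva.example.org, /srv/archiva and 127.0.0.1:8080 are assumptions.
    server {
        listen 80;
        server_name archiva.example.org;

        # let the kernel copy static artifact files straight from disk
        sendfile on;
        tcp_nopush on;

        location /repository/ {
            root /srv/archiva;        # hypothetical on-disk artifact root
            try_files $uri @archiva;  # fall through to the app if not on disk
        }

        location @archiva {
            proxy_pass http://127.0.0.1:8080;   # assumed Jetty/Archiva backend
            proxy_set_header Host $host;
            # stream responses instead of buffering them in nginx first,
            # the change that took the test download from ~25s to ~17s above
            proxy_buffering off;
        }
    }

How much each knob matters depends on whether the artifacts are already on local disk; the remaining 1.1 seconds of idle time noted at 13:15:31 would sit on the Jetty/Archiva side either way.
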
[17:20:40] I can
[17:23:20] Thanks effie!
[17:24:49] :D
[18:07:40] <_joe_> dancy: btw, I did make some progress en route to restarting php-fpm at every deployment
[18:07:55] I saw that!
[18:08:03] Happy to see the progress. Looks promising
[18:08:05] <_joe_> there is one detail - how scap will handle failures from the restart script
[18:08:22] Nod. I believe Tyler answered that question well.
[18:08:23] <_joe_> I need to understand that, basically, then I think we can start testing
[18:08:28] (the answer is, it keeps going)
[18:08:31] <_joe_> oh ok, I missed it :)
[18:09:16] <_joe_> heh yeah, I might want to do some more tweaks to the way we restart the service, then
[18:09:48] <_joe_> right now the script is "defensive": if something goes wrong, it exits with a non-zero exit code, expecting someone with experience to take a look
[18:10:08] <_joe_> we might want to instead just restart unconditionally if something goes wrong with depooling/repooling
[18:10:57] Sounds reasonable to me.
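
[Editor's aside] The php-fpm restart discussion above boils down to a depool -> restart -> repool sequence and the question of what to do when a step fails. Below is a hedged bash sketch of that flow; the depool/pool wrapper names and the service unit are assumptions for illustration, not the actual restart script _joe_ refers to.

    #!/bin/bash
    # Sketch of the depool -> restart -> repool flow discussed above.
    # Assumptions: `depool`/`pool` wrappers exist on the host and the
    # php-fpm unit is called php7.2-fpm; this is not the production script.
    set -u

    SERVICE="php7.2-fpm"
    DEFENSIVE=true   # current behaviour: stop and let a human investigate

    if ! depool; then
        if "$DEFENSIVE"; then
            echo "depool failed; leaving the host for someone to inspect" >&2
            exit 1   # non-zero exit: scap records the failure and keeps going
        fi
        # alternative floated above: restart unconditionally anyway
        echo "depool failed; restarting php-fpm regardless" >&2
    fi

    if ! systemctl restart "$SERVICE"; then
        echo "restart of $SERVICE failed" >&2
        exit 1
    fi

    if ! pool; then
        echo "repool failed; host stays depooled" >&2
        exit 1
    fi

Either way, scap treating a non-zero exit as "log it and keep going" (as noted at 18:08:28) means a failed restart on one host does not abort the whole deployment.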