[07:28:34] Hi! I'd like to deploy a follow-up to yesterday's deployment today. I already put it into the calendar for 10:00 UTC.
[07:28:41] Will any of you be around?
[08:17:19] I will be around
[09:32:10] duesen: btw, is it ok for redioscope to run from one DC only? context is that we are upgrading the k8s cluster (cc elukey)
[09:59:11] effie: redioscope should actually only be running in one DC... there should only be one instance of it.
[09:59:55] a second instance isn't a problem as such, we'd just have to make sure the grafana dashboards aren't adding up the numbers from prometheus, otherwise we'll double-count everything
[10:01:53] it looks like it is running on both DCs atm
[10:03:58] it sounds like it is ok for luca to proceed with the codfw upgrade, and potentially not redeploy redioscope on codfw once the k8s cluster is up?
[10:05:39] effie: hold on, let me double-check something... maybe I'm not remembering the setup correctly...
[10:07:58] Ok, I was indeed wrong. We should have one instance per DC. Redioscope is generating additional stats for the REST gateway rate limits by talking directly to redis. So on a DC where there's no traffic on the gateway, it doesn't matter if redioscope is running. And in any case, if redioscope is down for a while, nothing breaks. All it does is generate metrics.
[10:08:31] So, go ahead with the upgrade, but please re-deploy on both DCs
[10:09:04] effie: I was going to hit +2 on my first patch and start deployment, is that ok with you?
[10:10:23] duesen: go ahead
[10:18:53] applying to staging and running tests now
[10:26:47] tests are a bit flaky, running again...
[10:30:50] ok, looks good. pushing to eqiad
[10:32:15] duesen: we have some elevated error rate on eqiad, we are looking into it
[10:32:22] but it is not related to your deployment
[10:35:31] ok, thanks for letting me know
[10:35:57] I was about to apply to codfw and merge the second patch. is that ok?
[10:37:31] oh right, I see the jump in 500 errors at 10:00 UTC... yeah, I didn't apply my change until 30 minutes later.
[10:50:43] testing the second patch on staging didn't work... I was expecting issues there; it's likely a problem with the test setup, since it requires a different host header. I'm still pretty confident. I'll fiddle with the tests for a bit
[11:11:01] something is off...
[11:12:25] Raine, claime: could it be that www.wikifunctions.org isn't routed through the gateway (but abstract.wikipedia.org is)? I'm not seeing wikifunctions in the hosts list... But I did get ratelimits for calls from abstractwiki to wikifunctions... I think?
[11:12:42] When I set host:www.wikifunctions.org for a request to staging, I get a 404
[11:13:38] It's not a *huge* deal, but it does prevent proper testing. I'd still like to get the ratelimit policy for the wikifunctions and abstractwiki endpoints out. As far as I can tell, they exist on both domains.
[11:25:57] effie: about 15 minutes ago, request times went up and error rates as well (again). I also see a substantial increase in api requests classified as "anon browser". Could be related?
[11:26:10] yes it could
[11:28:50] ok... I'll leave the latest patch merged but undeployed for now, until I hear from Raine or Clement. I'll revert later today if I don't hear back from them.
[12:14:38] FYI both Raine and Clement are OOO until Monday
[14:23:21] matthieulec: uh... darn... perhaps hnowlan can help, once the current emergency is over. Otherwise I'll have to revert.
[17:12:59] I reverted the patch and applied the revert to staging.
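The Host-header test described at 11:12 can be reproduced with a few lines of Python. This is a minimal sketch only: the staging URL and request path below are illustrative assumptions, not the actual endpoints used in the conversation.

```python
# Minimal sketch of the Host-header test from 11:12 UTC.
# The staging URL and path are hypothetical placeholders.
import requests

STAGING_URL = "https://staging.example.wikimedia.org/wiki/Main_Page"  # assumed endpoint

# Send the request with an overridden Host header so it is routed as if it
# had been made to www.wikifunctions.org. If that domain is not in the
# gateway's host list, a 404 is the expected result (as reported above).
resp = requests.get(
    STAGING_URL,
    headers={"Host": "www.wikifunctions.org"},
    timeout=10,
)
print(resp.status_code)  # 404 here matches the behaviour seen on staging
```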
[17:42:12] duesen: apologies, missed this while troubleshooting something else: yes, you're correct that API requests to wikifunctions.org are not routed through the gateway.
[17:42:42] https://gerrit.wikimedia.org/g/operations/puppet/+/0f92e9b968d397ba2980d818e5906b5f51258eb2/hieradata/common/profile/trafficserver/backend.yaml#458 <- no gateway-check.lua in the plugin stack
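A quick way to spot-check from the outside whether a domain's API traffic passes through the gateway, without reading the Traffic Server config, is to look for rate-limit headers that a gateway typically adds to responses. This is only a sketch under that assumption; the exact header names and the probed URLs below are guesses, and an absence of such headers is suggestive rather than conclusive.

```python
# Hedged sketch: probe two domains and report any rate-limit-style headers.
# Header names and URLs are assumptions, not confirmed gateway behaviour.
import requests

for host in ("www.wikifunctions.org", "api.wikimedia.org"):
    resp = requests.get(f"https://{host}/", timeout=10)
    rl = {k: v for k, v in resp.headers.items()
          if k.lower().startswith(("ratelimit", "x-ratelimit"))}
    status = ("has ratelimit headers" if rl
              else "no ratelimit headers (possibly not behind the gateway)")
    print(f"{host}: HTTP {resp.status_code}, {status}")
```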