[12:24:15] General heads up to SREs: I'm about to reapply a change to the Zotero service which caused performance issues last time. It includes a patch that hopefully fixes the problem, but I'm not 100% confident it fixes it entirely, so there's a chance it'll make zotero alert again.
[12:36:25] Ok, it's all looking pretty good to me, phew! I'll stick around a bit more though.
[13:29:39] thanks for the heads up and for keeping an eye on it, Mvolz
[13:57:35] I'm upgrading Java on the puppetservers, which requires an immediate restart due to the jruby JIT. There'll be a few failing puppet runs, but I'll splay these out so the rate remains low
[14:01:14] thanks
[15:09:42] Existing bastion hosts seem to be on 1G networking. Is there any reason for us to move new ones to 10G? That's a question for basically anyone who has a reason, but mostly for moritzm
[15:10:13] is there a reason not to? :)
[15:11:03] only that it's currently on 1G and we'll have to move it
[15:11:25] or, more accurately, john will have to move it
[15:14:11] move what from where? if you're talking about new hosts?
[15:14:32] from what I can tell all new servers we buy have a 10G NIC, don't they? then it's mostly a matter of having a free 10G port. we won't really need the full capacity, but if there's a free port let's use it
[15:14:56] do we still have 1G-only switch ports?
[15:15:37] um... jclark-ctr ^ ?
[15:17:22] is this for https://phabricator.wikimedia.org/T416254#11578273 ? not sure why this was assigned to you, happy to take over since SRE IF owns these anyway
[15:19:49] works for me! jclark-ctr just pm'd me about it, I think because it's in a flood of other servers he was asking me about.
[15:19:55] I just added you to the (very boring) site.pp patch
[15:22:20] The majority of A and B rows are 1G
[15:22:27] @taavi:
[15:24:07] Power-wise it would be more beneficial between racks to stay on 1G @andrewbogott @taavi
[15:25:26] I never use bastions for anything other than ssh so bandwidth never ever matters to me. I was asking in the channel in case someone wanted to say "omg please move to 10G I use bastions to rsync full videos daily"
[15:25:32] although that would raise other interesting questions :)
[15:26:19] 10G bastions is good for tunnelencabulator usage growth ;)
[16:04:45] _joe_: Here is the HN thread about the IPIDEA shutdown by Google, with the very surprising commentary I mentioned on the call yesterday: https://news.ycombinator.com/item?id=46802748
[16:05:24] cdanis: ^^ if interested.
[16:14:30] `Deployment zotero-production in zotero at codfw has persistently unavailable replicas`. Looks like zotero is struggling a bit, starting at around 12:30 UTC: https://grafana.wikimedia.org/goto/yidzosNDR?orgId=1
[16:14:30] readiness probes are flapping, and there's a pretty sizable uptick in CPU usage: https://grafana.wikimedia.org/goto/SLtmosHDg?orgId=1
[16:14:59] swfrench-wmf: maybe related to Mvolz's deploy this morning?
[16:15:12] yeah, that's what I'm wondering
[16:15:38] <_joe_> Firefishy: ugh I will read it with spite and hatred :D thanks
[16:16:17] Mvolz: ^ not sure exactly which signals you're using to evaluate the new zotero release, but FYI it might be struggling a bit (not exactly clear why / how, or whether this actually corresponds to badness for clients).
[16:20:23] Firefishy: reading these comments is giving me a stomachache
[16:30:34] yeah, I had to close the window after a bit and go for a walk... ;-)
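(Aside, not part of the log: a minimal Python sketch of how the replica-availability and readiness signals in the 16:14 alert above could be inspected with the official `kubernetes` client. The `zotero-production` deployment and `zotero` namespace come straight from the alert text; the `app=zotero` label selector and kubeconfig access to the relevant cluster are assumptions. In practice the linked Grafana dashboards carry the same information.)

```python
# Sketch only: assumes kubeconfig access to the relevant cluster and an
# "app=zotero" label selector, neither of which is confirmed by the log.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() when running in-cluster

apps = client.AppsV1Api()
core = client.CoreV1Api()

# Deployment-level view: desired vs. ready vs. unavailable replicas.
dep = apps.read_namespaced_deployment(name="zotero-production", namespace="zotero")
print(f"desired={dep.spec.replicas} ready={dep.status.ready_replicas} "
      f"unavailable={dep.status.unavailable_replicas}")

# Pod-level view: flapping readiness probes show up as Ready=False conditions.
pods = core.list_namespaced_pod(namespace="zotero", label_selector="app=zotero")
for pod in pods.items:
    ready = next((c.status for c in pod.status.conditions or [] if c.type == "Ready"), "Unknown")
    print(pod.metadata.name, ready)
```

Roughly the same view is available from `kubectl get deploy,pods -n zotero`; the script form just makes it easy to poll while watching a rollout.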
[16:40:01] zotero connection length is notably way up, along with a few similar metrics https://grafana.wikimedia.org/goto/IBa0TyNDR?orgId=1
[17:15:37] For the oncallers: as an FYI, I am dropping old Tegola buckets on Thanos swift via s3cmd for https://phabricator.wikimedia.org/T396584. If you see anything weird, I have a tmux session running on stat1010, feel free to kill it.
[17:20:11] <_joe_> poor thanos
[17:20:22] <_joe_> dropping a bucket of tegole onto it
[18:00:58] elukey: I used to have to do bulk Swift container deletes for Rackspace swift back in the day, you might try https://gholt.github.io/swiftly/2.06/# if you need to delete based on a pattern
[18:01:24] I have actually used it successfully on Thanos, although it's been a few years
[18:03:52] https://wikitech.wikimedia.org/wiki/Swift/How_To#Fine-grained_object_deletions_with_Swiftly
[18:47:57] thanks for the heads up elukey
[18:52:52] inflatador: thanks! So far s3cmd is really nice, with a single command it can iteratively remove all buckets in batches
[18:53:11] I used the swift API/command and it was way worse and less streamlined
[19:01:38] elukey: cool, `swiftly` is different from the `swift` CLI, but it would not shock me to learn that s3cmd is way better. If you have to do that a lot, LMK; I've been looking for an excuse to write an Airflow DAG to delete all our Flink buckets on a schedule
[19:12:07] I am back! *sigh*
[19:12:37] So the effect on clients would show up more like this: https://grafana.wikimedia.org/d/NJkCVermz/citoid?orgId=1&refresh=5m&from=now-12h&to=now&timezone=utc&var-dc=000000026&viewPanel=panel-46 and it was looking okay, but I can see that at 6 UTC there does look to be an effect on clients...
[19:12:54] Not nearly as dramatic as last week, but I think we should probably revert.
[19:12:54] sigh
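(Aside, not part of the log: a rough sketch of what the scheduled bucket cleanup floated at 19:01 could look like as an Airflow DAG, using the same S3-style bulk delete that s3cmd is being driven through by hand at 17:15. Everything specific here is an assumption: a recent Airflow (2.4+) plus boto3, the `flink-` bucket prefix, the placeholder endpoint URL, the weekly schedule, and credentials coming from boto3's default chain. Nothing like this is actually deployed.)

```python
# Hypothetical sketch only: the DAG name, bucket prefix, schedule, and endpoint
# URL are all placeholders, not real configuration.
from datetime import datetime

import boto3
from airflow import DAG
from airflow.operators.python import PythonOperator


def delete_buckets_with_prefix(prefix: str, endpoint_url: str) -> None:
    """Empty and delete every bucket whose name starts with `prefix`."""
    s3 = boto3.resource("s3", endpoint_url=endpoint_url)  # credentials via boto3's default chain
    for bucket in s3.buckets.all():
        if bucket.name.startswith(prefix):
            bucket.objects.all().delete()  # a bucket must be empty before it can be removed
            bucket.delete()


with DAG(
    dag_id="cleanup_flink_buckets",  # hypothetical name
    start_date=datetime(2026, 1, 1),
    schedule="@weekly",
    catchup=False,
) as dag:
    PythonOperator(
        task_id="delete_flink_buckets",
        python_callable=delete_buckets_with_prefix,
        op_kwargs={
            "prefix": "flink-",  # assumed naming convention for the Flink buckets
            "endpoint_url": "https://thanos-swift.example.invalid",  # placeholder, not the real endpoint
        },
    )
```

Keeping listing and deletion in one task keeps the sketch short; a real DAG would probably split them so retries don't re-list everything, and would guard against touching buckets outside the intended prefix.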