[12:24:15] General heads up to SREs: I'm about to reapply a change to the Zotero service which caused performance issues last time. It includes a patch that hopefully fixes the problem, but I'm not 100% confident it fixes it entirely, so there's a chance it'll make zotero alert again.
[12:36:25] Ok, it's all looking pretty good to me, phew! I'll stick around a bit more though.
[13:29:39] thanks for the heads up and for keeping an eye on it, Mvolz
[13:57:35] I'm upgrading Java on the puppetservers, which requires an immediate restart due to the jruby JIT. There'll be a few failing puppet runs, but I'll splay these out so the rate remains low
[14:01:14] thanks
[15:09:42] Existing bastion hosts seem to be on 1G networking. Is there any reason for us to move new ones to 10G? That's a question for basically anyone who has a reason, but mostly for moritzm
[15:10:13] is there a reason not to? :)
[15:11:03] only that it's currently on 1G and we'll have to move it
[15:11:25] or, more accurately, john will have to move it
[15:14:11] move what from where? if you're talking about new hosts?
[15:14:32] from what I can tell all new servers we buy have a 10G NIC, don't they? then it's mostly a matter of having a free 10G port. we won't really need the full capacity, but if there's a free port let's use it
[15:14:56] do we still have 1G-only switch ports?
[15:15:37] um... jclark-ctr ^ ?
[15:17:22] is this for https://phabricator.wikimedia.org/T416254#11578273 ? not sure why this was assigned to you, happy to take over since SRE IF owns these anyway
[15:19:49] works for me! jclark-ctr just pm'd me about it, I think because it's in a flood of other servers he was asking me about.
[15:19:55] I just added you to the (very boring) site.pp patch
[15:22:20] The majority of A and B rows are 1G
[15:22:27] @taavi:
[15:24:07] Power-wise it would be more beneficial between racks to stay on 1G @andrewbogott @taavi
[15:25:26] I never use bastions for anything other than ssh so bandwidth never ever matters to me. I was asking in the channel in case someone wanted to say "omg please move to 10G I use bastions to rsync full videos daily"
[15:25:32] although that would raise other interesting questions :)
[15:26:19] 10G bastions is good for tunnelencabulator usage growth ;)
[16:04:45] _joe_: Here is the HN thread about the IPIDEA shutdown by Google, with the very surprising commentary I mentioned on the call yesterday: https://news.ycombinator.com/item?id=46802748
[16:05:24] cdanis: ^^ if interested.
[16:14:30] `Deployment zotero-production in zotero at codfw has persistently unavailable replicas`. Looks like zotero is struggling a bit, starting at around 12:30 UTC: https://grafana.wikimedia.org/goto/yidzosNDR?orgId=1
[16:14:30] readiness probes are flapping, and there's a pretty sizable uptick in CPU usage: https://grafana.wikimedia.org/goto/SLtmosHDg?orgId=1
[16:14:59] swfrench-wmf: maybe related to Mvolz's deploy this morning?
[16:15:12] yeah, that's what I'm wondering
[16:15:38] <_joe_> Firefishy: ugh I will read it with spite and hatred :D thanks
[16:16:17] Mvolz: ^ not sure exactly which signals you're using to evaluate the new zotero release, but FYI it might be struggling a bit (not exactly clear why / how, or whether this actually corresponds to badness for clients).
[16:20:23] Firefishy: reading these comments is giving me a stomachache
[16:30:34] yeah, I had to close the window after a bit and go for a walk... ;-)
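(Aside, not part of the log: a minimal Python sketch of how the replica-availability and readiness signals in the 16:14 alert above could be inspected with the official `kubernetes` client. The `zotero-production` deployment and `zotero` namespace come straight from the alert text; the `app=zotero` label selector and kubeconfig access to the relevant cluster are assumptions. In practice the linked Grafana dashboards carry the same information.)

```python
# Sketch only: assumes kubeconfig access to the relevant cluster and an
# "app=zotero" label selector, neither of which is confirmed by the log.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() when running in-cluster

apps = client.AppsV1Api()
core = client.CoreV1Api()

# Deployment-level view: desired vs. ready vs. unavailable replicas.
dep = apps.read_namespaced_deployment(name="zotero-production", namespace="zotero")
print(f"desired={dep.spec.replicas} ready={dep.status.ready_replicas} "
      f"unavailable={dep.status.unavailable_replicas}")

# Pod-level view: flapping readiness probes show up as Ready=False conditions.
pods = core.list_namespaced_pod(namespace="zotero", label_selector="app=zotero")
for pod in pods.items:
    ready = next((c.status for c in pod.status.conditions or [] if c.type == "Ready"), "Unknown")
    print(pod.metadata.name, ready)
```

Roughly the same view is available from `kubectl get deploy,pods -n zotero`; the script form just makes it easy to poll while watching a rollout.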
[16:40:01] zotero connection length is notably way up, along with a few similar metrics https://grafana.wikimedia.org/goto/IBa0TyNDR?orgId=1
[17:15:37] For the oncallers: as an FYI, I am dropping old Tegola buckets on Thanos swift via s3cmd for https://phabricator.wikimedia.org/T396584. If you see anything weird, I have a tmux session running on stat1010, feel free to kill it.
[17:20:11] <_joe_> poor thanos
[17:20:22] <_joe_> dropping a bucket of tegole onto it
[18:00:58] elukey: I used to have to do bulk Swift container deletes for Rackspace swift back in the day, you might try https://gholt.github.io/swiftly/2.06/# if you need to delete based on a pattern
[18:01:24] I have actually used it successfully on Thanos, although it's been a few years
[18:03:52] https://wikitech.wikimedia.org/wiki/Swift/How_To#Fine-grained_object_deletions_with_Swiftly
[18:47:57] thanks for the heads up elukey
[18:52:52] inflatador: thanks! So far s3cmd is really nice, with a single command it can iteratively remove all buckets in batches
[18:53:11] I used the swift API/command and it was way worse and less streamlined
[19:01:38] elukey: cool, `swiftly` is different from the `swift` CLI, but it would not shock me to learn that s3cmd is way better. If you have to do that a lot, LMK; I've been looking for an excuse to write an Airflow DAG to delete all our Flink buckets on a schedule
[19:12:07] I am back! *sigh*
[19:12:37] So the effect on clients would show up more like this: https://grafana.wikimedia.org/d/NJkCVermz/citoid?orgId=1&refresh=5m&from=now-12h&to=now&timezone=utc&var-dc=000000026&viewPanel=panel-46 and it was looking okay, but I can see that at 6 UTC there does look to be an effect on clients...
[19:12:54] Not nearly as dramatic as last week, but I think we should probably revert.
[19:12:54] sigh
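(Aside, not part of the log: a rough sketch of what the scheduled bucket cleanup floated at 19:01 could look like as an Airflow DAG, using the same S3-style bulk delete that s3cmd is being driven through by hand at 17:15. Everything specific here is an assumption: a recent Airflow (2.4+) plus boto3, the `flink-` bucket prefix, the placeholder endpoint URL, the weekly schedule, and credentials coming from boto3's default chain. Nothing like this is actually deployed.)

```python
# Hypothetical sketch only: the DAG name, bucket prefix, schedule, and endpoint
# URL are all placeholders, not real configuration.
from datetime import datetime

import boto3
from airflow import DAG
from airflow.operators.python import PythonOperator


def delete_buckets_with_prefix(prefix: str, endpoint_url: str) -> None:
    """Empty and delete every bucket whose name starts with `prefix`."""
    s3 = boto3.resource("s3", endpoint_url=endpoint_url)  # credentials via boto3's default chain
    for bucket in s3.buckets.all():
        if bucket.name.startswith(prefix):
            bucket.objects.all().delete()  # a bucket must be empty before it can be removed
            bucket.delete()


with DAG(
    dag_id="cleanup_flink_buckets",  # hypothetical name
    start_date=datetime(2026, 1, 1),
    schedule="@weekly",
    catchup=False,
) as dag:
    PythonOperator(
        task_id="delete_flink_buckets",
        python_callable=delete_buckets_with_prefix,
        op_kwargs={
            "prefix": "flink-",  # assumed naming convention for the Flink buckets
            "endpoint_url": "https://thanos-swift.example.invalid",  # placeholder, not the real endpoint
        },
    )
```

Keeping listing and deletion in one task keeps the sketch short; a real DAG would probably split them so retries don't re-list everything, and would guard against touching buckets outside the intended prefix.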