[05:53:42] <_joe_> Amir1: https://gitlab.wikimedia.org/repos/sre/conftool/-/merge_requests/37
[09:33:37] _joe_: thanks!
[10:11:34] hey folks! If you run "tox" locally for the cookbooks repo please rm the .tox dir since we just merged https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/1078427 (to fix some pylint issues).
[10:12:04] I know it is not ideal and annoying so if you have anything against it please blame me and Riccardo
[10:12:13] rm -rf .tox/py*-prospector "is enough" :D
[10:12:24] (signed: your dear automation folks)
[10:26:03] <_joe_> let me repeat again
[10:26:10] <_joe_> please freeze the prospector version
[10:26:18] <_joe_> both in cookbooks and spicerack
[10:26:20] <_joe_> don'
[10:26:26] <_joe_> t make me do it for you :)
[10:26:40] <_joe_> and upgrades should only happen explicitly
[10:26:50] <_joe_> signed: everyone in SRE not named volans :)
[10:27:06] it was frozen yesterday and has been upgraded explicitly today
[10:27:18] and due to tox inability to manage the venv it requires the rm
[10:32:12] _joe_ I pulled the trigger on this change, sometimes we just need to upgrade, hence the msg :) The rest is ok don't worry :)
[10:33:31] and I plan to get rid of it soon~ish
[13:53:25] do you know if we have prometheus-based alerts if a kernel panic happens on a server?
[14:02:22] jhathaway, brett: calm morning, I only documented https://phabricator.wikimedia.org/T303534#10209698
[14:02:39] sorry, that was for swfrench-wmf
[14:03:09] no worries, and thank you :)
[14:15:08] thanks jynus
[14:47:18] !log deployment-prep: `sgimeno@deployment-mwmaint03:~$ foreachwiki userOptions.php --delete --old=1 growthexperiments-tour-newimpact-discovery` (T376461)
[14:47:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:47:21] T376461: Remove unused user property growthexperiments-tour-newimpact-discovery - https://phabricator.wikimedia.org/T376461
[16:29:15] puppetserver1001 needs to be taken down for hardware maintenance, starting in 10 minutes, I would disable Puppet fleet-wide and for 15-20 mins reimages and decoms would fail
[16:29:32] unless it's bad time for anyone, then please let me know
[16:29:55] moritzm: I think swfrench-wmf is still doing some LVS restarts
[16:30:18] ah, good timing :)
[16:30:36] moritzm: I have not touched anything yet, so please proceed
[16:30:47] and thanks for flagging, cdanis
[16:30:50] ok, thanks. I'll ping you when we're done
[16:31:02] (but would still wait another 10mins for anyone to object)
[16:57:02] puppetserver1001 maintenance is complete and Puppet is re-enabled fleet-wide
[16:57:22] swfrench-wmf: you can proceed with pybal restart
[16:58:01] and for general context: we have been updating the RAM on puppet servers in eqiad (codfw to follow) to 128G
[16:58:39] mortizm: ack, and thanks for the heads-up - that was quick :)
[16:58:53] sacrifices to the Java gods essentially
[16:59:01] moritzm: ^ (apologies for typos, heh)
[16:59:30] np at all, I've seen every permutation of my name by now :-)
[17:00:44] and for comparison: the old puppet 5 masters (mod_passenger and Ruby) had 16G and we barely used half of it even when all servers were on Puppet 5
[17:01:06] and now jruby/clojure
[20:49:56] mutante: either you can merge mine, or let me know if it's OK to merge yours :)
[20:50:18] urandom: yea, I just noticed the same thing. it's ok if you type "multiple"
[20:50:25] sometimes it can separate them and sometimes not
[20:50:38] mutante: done.
[20:50:44] I am checking my part isnt breaking anything. thank you
[22:32:05] current number of active critical alerts: 76 (are they really that critical if we can ignore them?)
[22:32:30] maybe a good chunk should actually just be warnings