[07:26:36] moritzm: thanks for fixing my brainfart in admin/data :)
[07:29:02] <_joe_> kormat: if only we had type checks on the content of admin.yaml :P
[07:29:11] <_joe_> and some spec tests for the admin module
[07:29:31] how soon will we have puppet rules to write puppet data? :)
[07:31:22] <_joe_> kormat: the type system in puppet is basically a data validation system :)
[07:32:22] <_joe_> with some added awesomeness because implicit casting between types is sometimes allowed, sometimes not
[07:33:25] <_joe_> as usual, brandon's first law of puppet holds https://bash.toolforge.org/quip/AVfTAUmefIH_7EDsriqu
[07:33:51] oh god :)
[09:17:51] <_joe_> rzl: I love httpbb every day more
[09:18:16] <_joe_> everyone should know and use it, I'm thinking it's useful even from my computer for testing externally.
[13:52:02] you gotta appreciate the RAID utility literally saying "Status: �"
[13:54:25] _joe_: \o/
[13:56:50] <_joe_> rzl: oh, that means you will need to make a proper debian package 😱
[13:57:01] sorry, you broke up for a minute, I couldn't quite hear you
[13:57:11] kormat: btw puppet rules to write puppet data have been proposed before, see also https://wikitech.wikimedia.org/wiki/Cergen#Future_work
[13:57:34] <_joe_> oh the Cergen topic
[13:57:52] hah :)
[13:57:55] <_joe_> I used cfssl for a docker compose thing to test envoy this week
[13:58:01] rzl, if you need help and guidance for making a Debian package, there are a bunch of people capable of doing that, so do ask
[13:58:04] <_joe_> it's so much better than cergen :P
[13:58:13] liw: thanks!
[14:01:53] kormat: wow you've used dbctl already? don't look too closely at that sausage
[14:03:06] don't worry, he won't notice because he will be staring at the code using dbctl, which is much worse!
[14:03:24] :-D
[14:03:30] 0:-)
[14:21:20] cdanis: fortunately i live somewhere that has a deep appreciation for sausage
[14:23:27] kormat: okay well then I hope this is tasty https://gerrit.wikimedia.org/g/operations/mediawiki-config/+/77eccc165e73dd2d137ca91f408ed295c2d87d3f/wmf-config/etcd.php#45
[14:24:06] it's written in php? 😮
[14:25:11] no, dbctl (and conftool) are in python, in another repo
[14:25:31] this is the 'config' code that glues its output into the internal datastructures used by mediawiki
[14:27:47] dbctl code is here, if you are morbidly curious https://gerrit.wikimedia.org/r/plugins/gitiles/operations/software/conftool/+/master/conftool/extensions/dbconfig
[14:28:37] ahh
[14:36:02] in my mind, in the future dbctl should orchestrate a proxy service, not php code, but that's how it works now :-D
[14:40:07] yeah, that seems better :)
[14:40:49] which is exactly how conftool/pybal works, but those have the privilege of being on a top layer :-D
[14:41:04] dbs being on a lower one
[14:43:06] cdanis: it is all planned on T119626, 3 or 4 days of work at most :-)
[14:43:06] T119626: Eliminate SPOF at the main database infrastructure - https://phabricator.wikimedia.org/T119626
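For a rough sense of what that glue layer does, here is a minimal Python sketch of turning dbctl-style section data into a MediaWiki-style per-section load mapping. The structure, hostnames, and weights are simplified placeholders, not the actual schema dbctl writes to etcd nor the real shape of the MediaWiki configuration; the linked wmf-config/etcd.php does the equivalent in PHP.

```python
import json

# Hypothetical dbctl-style section data (placeholder hosts and weights; the
# real structure stored in etcd by dbctl/conftool differs).
DBCTL_BLOB = """
{
  "s1": {"master": "db-master-s1", "replicas": {"db-replica-1": 200, "db-replica-2": 300}},
  "s8": {"master": "db-master-s8", "replicas": {"db-replica-3": 400}}
}
"""

def to_section_loads(blob: str) -> dict:
    """Glue step: build a sectionLoads-like mapping per section, listing the
    master first with zero read weight (a common setup where the master takes
    writes but little or no replica read traffic)."""
    sections = json.loads(blob)
    return {
        name: {cfg["master"]: 0, **cfg["replicas"]}
        for name, cfg in sections.items()
    }

if __name__ == "__main__":
    print(json.dumps(to_section_loads(DBCTL_BLOB), indent=2))
```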
[15:36:22] o/
[16:44:28] elukey: do I recall correctly that the thing blocking upgrading memcache hosts past jessie was having either the gutter pool live or doing the DC switchover?
[16:44:52] yeah, I think that's correct
[16:44:56] like, we can reimage memcache hosts in the secondary datacenter, but not in the primary, unless we have gutter pool?
[16:45:41] cdanis: here's the perfect solution, failover memcache to codfw while keeping mw on eqiad :-P
[16:45:44] * volans hides
[16:46:00] cmon volans that'd never work
[16:46:05] yeah, but the reimage approach is a bit of a gamble if one only learns during the DC failover whether the new setup works fine (given new OS and new memcached release)
[16:46:10] what we have to do is move the mc2* hosts into eqiad, for latency reasons
[16:46:23] lol
[16:46:34] moritzm: I mean, not having a testbed environment, or additional hardware, are each their own problem, yes
[16:47:06] yeah, totally
[16:50:22] what was the status of gutter pool testing? all I remember was that ef.fie was working on it, some initial results, but ofc she isn't now
[16:51:36] most of the way there, and handed over to elukey to finish up I believe, not sure of the latest
[16:52:28] (of course that was the beginning of march, in the Before Times)
[17:28:22] <_joe_> cdanis: no that's not a very accurate characterization of the problem
[17:28:33] <_joe_> of either of them actually
[17:28:41] <_joe_> so.
[17:29:06] <_joe_> Upgrading memcached was postponed waiting for redis to be dismissed there, as we really wanted not to have to manage a transition to redis 5
[17:29:39] ohh, that's right
[17:29:57] <_joe_> at the same time, having the gutter pool on buster allows us a relatively low-risk way of getting accustomed to tuning a new memcached version, with different tunables and slab allocation algorithms, to our reality
[17:30:06] <_joe_> we've had to do that in the past too
[17:30:44] <_joe_> now, that can be easier using mcrouter, which allows you to shadow a % of the traffic to a secondary pool
[17:30:59] <_joe_> we might want to do that, but that's the kind of testbed you want
[17:31:48] <_joe_> as for the gutter pool, I honestly have no idea what the current situation is. There is a puppet patch, some test results reported on tasks, but I'd have to go and read through them to get a clearer picture of where we are
[17:32:15] <_joe_> I'm not a huge fan of that puppet patch, and I agree with most of the observations made by aaron in CR
[17:32:45] <_joe_> so I think next week or the one after that, I'll pick up that work, redo testing for the parts where I'm not sure what the status is, and deploy to production
[17:40:47] sorry, dumb question, what does "waiting for redis to be dismissed there" mean?
[17:41:34] "that'll do redis, that'll do"
[17:43:17] <_joe_> cdanis: finish the migration to sessionstore, then reassess if there is any remnant use of it
[17:43:23] ack
[17:43:43] <_joe_> we've paused as we deemed it a bit dangerous for this period
[17:44:05] <_joe_> but if the situation persists, we might decide to take the risk
[17:44:17] <_joe_> we == serviceops and core platform
[17:45:08] makes sense
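For reference on that shadowing idea: mcrouter can mirror a slice of the key space from the main pool onto a second pool. The sketch below (a Python dict dumped as JSON) is only an illustration of that kind of setup, with made-up server addresses, pool names, and fractions; it is not the production mcrouter configuration, and the exact schema should be checked against mcrouter's shadowing documentation.

```python
import json

# Rough sketch of an mcrouter route that shadows ~10% of the key space from
# the main pool onto a test pool (all values here are placeholders).
shadow_config = {
    "pools": {
        "main": {"servers": ["10.0.0.1:11211", "10.0.0.2:11211"]},
        "test": {"servers": ["10.0.0.9:11211"]},
    },
    "route": {
        "type": "PoolRoute",
        "pool": "main",
        "shadows": [
            {
                "index_range": [0, 1],             # shadow both main-pool hosts
                "key_fraction_range": [0.0, 0.1],  # ...but only ~10% of keys
                "target": {"type": "PoolRoute", "pool": "test"},
            }
        ],
    },
}

print(json.dumps(shadow_config, indent=2))
```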
[18:17:36] elukey: _joe_: https://w.wiki/MvZ I literally just deployed this, lol
[18:18:04] I think we'll find that our memcache hosts have micro-bursts all the time
[18:20:21] nice graph!
[18:21:01] so long as they're very micro, in theory the upper-layer protocols can smooth that out and handle it without too much hiccup
[18:21:10] but yeah they're not a great sign
[18:21:37] bblack: well, AIUI mediawiki has rather short RPC timeouts to memcached
[18:22:47] they're all 1G except for the gutter set?
[18:22:55] that link didn't give me a graph, only the list of dashboards
[18:23:06] apergos: log in
[18:23:12] hm am I not?
woops
[18:23:31] maybe i should file a FR upstream for a non-logged-in link to /explore to give an error message :)
[18:23:52] bblack: that's right, and the gutter set isn't being used anywhere real yet (see just above)
[19:08:54] cdanis: with the addition of the new host in wikidata today we're doing a lot better now
[19:08:59] considering that db1092 is still depooled
[19:09:04] I was just checking the graphs
[19:09:06] marostegui: yeah I had meant to check
[19:09:12] but was enjoying not seeing alerts :)
[19:09:40] cdanis: still quite nice to see 10.4 (db1114 and db1111) performing better than 10.1 (db1126) even with more weight
[19:09:47] also good news :)
[19:25:02] +1
[19:49:35] awesome!
[19:50:04] preliminary results say we have several memcached microbursts an hour
[19:56:05] interesting
[19:56:16] (much more interesting once logged in :-P)
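Back-of-the-envelope on why microbursts matter on a 1G memcached NIC: a burst that briefly exceeds line rate has to queue, and the time to drain that queue eats into whatever RPC timeout the client uses. The numbers in this sketch are illustrative only, not measurements from the dashboards above, and the timeout value is a hypothetical stand-in.

```python
# Rough arithmetic on a microburst hitting a 1GbE memcached host.
# All values are illustrative, not measured.
LINK_BPS = 1e9            # 1 Gbit/s NIC
BURST_EXCESS_BYTES = 2e6  # e.g. 2 MB arriving faster than the link can drain
TIMEOUT_S = 0.25          # hypothetical short client-side memcached timeout

drain_s = BURST_EXCESS_BYTES * 8 / LINK_BPS
print(f"queue drain time: {drain_s * 1000:.0f} ms")                               # 16 ms
print(f"share of a {TIMEOUT_S * 1000:.0f} ms timeout: {drain_s / TIMEOUT_S:.0%}")  # 6%

# A few ms of queueing gets absorbed by TCP and client retries ("upper-layer
# protocols can smooth that out"); bursts long enough to approach the timeout,
# or deep enough to overflow buffers and cause drops, become client-visible errors.
```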