[05:31:26] 10DBA, 10Operations, 10netops, 10ops-eqiad, 10Patch-For-Review: db1114 connection issues - https://phabricator.wikimedia.org/T191996#4141772 (10Marostegui) So this is almost confirmed related to atop. I killed it yesterday at around 14:30 and it was remained stopped till 00:00 (where it started automatic... [05:35:32] 10DBA, 10Operations, 10netops, 10ops-eqiad, 10Patch-For-Review: db1114 connection issues - https://phabricator.wikimedia.org/T191996#4141775 (10Marostegui) RX buffers reverted ``` root@db1114:~# ethtool -g eno1 Ring parameters for eno1: Pre-set maximums: RX: 2047 RX Mini: 0 RX Jumbo: 0 TX: 511 Current... [08:01:16] 10DBA: Framework to transfer files over the LAN - https://phabricator.wikimedia.org/T156462#4141903 (10Rduran) a:03Rduran [08:59:44] <_joe_> let me know when everyone's around, and we can start [09:00:04] * volans \o/ [09:00:50] I'm here [09:01:05] I don't think manuel will make it [09:01:16] <_joe_> ok [09:02:07] <_joe_> so, I started thinking about the "databases pooling/depooling on etcd" problem [09:02:29] <_joe_> and I wanted first and foremost confirmation from you that what we're interested in is what follows: [09:02:51] <_joe_> - generate 'sectionLoads' (https://gerrit.wikimedia.org/r/plugins/gitiles/operations/mediawiki-config/+/master/wmf-config/db-eqiad.php#104) from etcd data [09:03:23] <_joe_> - generate 'groupLoadsBySection' (https://gerrit.wikimedia.org/r/plugins/gitiles/operations/mediawiki-config/+/master/wmf-config/db-eqiad.php#259) from etcd [09:04:16] <_joe_> jynus: is that correct? I extracted those as the two parts of that file that you and marostegui modify most [09:04:26] so yes and no [09:04:37] <_joe_> please do tell :) [09:04:48] personally, what I would want is something else, speciall on interface [09:05:12] whether we have to translate into that, I guess will depend? [09:05:29] what we *need* [09:05:50] is a way to pool and depool fully individual servers [09:06:05] and to set which of the pooled servers is the master [09:06:15] <_joe_> set the master from etcd? [09:06:23] yes [09:06:33] <_joe_> ok, I thought you specifically didn't want that [09:06:38] technically what you mention already did that [09:06:45] jynus: let me clarify one thing, when joe means 'modify' it doesn't mean modify those arrays in conftool as they are now [09:06:46] sectionLoads sets the mediawiki master [09:07:02] as the one being defined the first one [09:07:27] whatever we have in etcd will be re-constructed into those structures becaue MW wants that, but we can have more or less what we want in the etcd side [09:07:28] <_joe_> jynus: when you say "depool" you mean "remove from the configuration", right? [09:07:32] no [09:07:43] there is 2 states- added into the config [09:07:47] <_joe_> you mean weight=0 ? [09:07:52] (lines at the end) [09:07:58] and pooled (with load) [09:08:08] actually weight 0 does not depool a server [09:08:08] <_joe_> how do you currently depool a server? [09:08:27] we commend all lines on sectionLoads and groupLoadsBySection [09:08:35] <_joe_> ok, as I suspected [09:08:50] <_joe_> are you interested in being able to change the weights as well? 
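For context on the two arrays linked above: in 'sectionLoads' the first entry of each section is what MediaWiki treats as the master (as jynus clarifies below), and 'groupLoadsBySection' holds the extra per-group weights. A minimal Python mirror of their shape, with invented host names and weights; this is an illustration only, not the real db-eqiad.php content.

```
# Hypothetical illustration of the shape of the two arrays discussed above.
sectionLoads = {
    's1': {
        'db1111': 0,      # first entry == the section master
        'db1112': 200,    # replicas and their main-traffic weights
        'db1113': 500,
    },
}

# groupLoadsBySection adds per-group weights ('vslow', 'dump', 'api',
# 'recentchanges', ...) on top of the main traffic for each section.
groupLoadsBySection = {
    's1': {
        'vslow': {'db1112': 1},
        'api':   {'db1112': 100, 'db1113': 300},
    },
}
```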
[09:08:53] setting weight to 0 means it is still pooled by the load balancer [09:08:55] <_joe_> via etcd I mean [09:09:02] yes [09:09:15] we can have weight -1 that removes it and weight >= 0 that sets the weight [09:09:16] let me put a priority to all of that [09:09:26] <_joe_> ok [09:09:30] if that is helpful [09:09:41] <_joe_> this is all doable, but yes please [09:10:09] pool/depool > weights per server > master definition > read-only [09:10:28] <_joe_> ok [09:10:30] the problem as it is now, is that pooling/depooling is overly complicated [09:10:37] (aside from static) [09:10:54] each server can be pooled on 6 different "services" [09:10:58] actually more [09:11:03] <_joe_> jynus: so this would all be solved (minus master definition) by the strawman I prepared [09:11:19] 09:10 < jynus> pool/depool > weights per server > master definition > read-only --- agreed! [09:11:20] <_joe_> but I see one problem - you would need to visualize the state of clusters [09:11:28] I don't care how mediawiki does it [09:11:42] but we need a ./depool in the end [09:11:47] for easy orchestration [09:11:58] (e.g. rolling restart/schema changes) [09:12:00] <_joe_> you need a way to see - immediately - what is configured for a specific section [09:12:06] yes [09:12:11] <_joe_> ok [09:12:35] one common error nowadays FYI [09:12:46] is to depool a server from a service [09:12:51] Ideally we should be able to say: ./depool dbXXXX vslow or something like that. But for now a general depool/pool would be cool for me [09:12:55] and probably a safety check like pybal to not depool if there are too few pooled [09:12:55] but forget other kinds of traffic [09:13:00] Will simplify things a lot already [09:13:17] volans: correct, there should always be 1 server for each type of traffic [09:13:26] with exceptions [09:13:39] <_joe_> please let's remain focused :) [09:13:51] <_joe_> the discussion on safety checks can come later [09:13:54] <_joe_> and will happen [09:13:57] _joe_: my point is we may need deep mediawiki changes too [09:14:05] <_joe_> jynus: I don't think so [09:14:07] not just moving variables to etcd [09:14:16] for what? [09:14:30] <_joe_> but let's first concentrate on the desired workflow for you people? [09:14:52] _joe_: I can tell you how we work for those patterns [09:15:02] as "common things we do" [09:15:09] is that helpful?
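One way to picture the distinction made above ("present in the config" vs "pooled with a load", with weight 0 explicitly not meaning depooled) is an explicit pooled flag next to per-group weights; the alternative floated in the chat is a weight of -1 meaning removed. A sketch with hypothetical field names, not the real conftool schema.

```
# Hypothetical per-instance record: 'pooled' is independent of the weights,
# so weight 0 keeps the host pooled (as noted above), and depooling removes
# it from every kind of traffic at once.
instance = {
    'name': 'db1111',
    'pooled': True,
    'weights': {          # one weight per type of traffic ("group")
        'main': 200,
        'api': 100,
        'vslow': 0,       # weight 0: still pooled, just unweighted
    },
}

def serves(inst, group):
    """True if the instance should receive traffic for the given group."""
    return inst['pooled'] and group in inst['weights']
```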
[09:15:15] <_joe_> jynus: yes [09:15:20] <_joe_> very useful [09:15:22] <_joe_> :) [09:15:27] so, roughly in order of frequency [09:15:54] server is in a bad state or will be for maintenance, depool server - no longer send any traffic to it [09:16:08] server is back and maintenance finished, re-pool with low weight [09:16:09] +1 [09:16:31] server is ok, pool with original traffic [09:16:51] (a server may take 1h to 12h to be in perfect condition because of cache) [09:17:24] Q: in those cases, let's say the server is also in the vslow section, and is the only one [09:17:26] a server is overloaded, spread traffic among other hosts (change weights for several types of traffic) [09:17:41] <_joe_> volans: wait please [09:18:01] I have other kinds of things, I am thinking which is more likely [09:18:11] but on a lesser immediate need: [09:18:28] - switchover/failover the master to another host for maintenance, upgrade or failure [09:18:46] - for the previous, read only has to be set for the section temporarily [09:19:15] things like adding new servers or removing them can be fully static (git) [09:19:43] <_joe_> yes, adding servers will happen via commits to (for now) both mediawiki-config and conftool-data [09:19:47] that is ok [09:19:55] also for decommissioning [09:20:13] our biggest pain is that a schema change or other maintenance (upgrade) [09:20:23] requires a rolling pool/depool [09:20:28] <_joe_> so, can I focus on the first set of actions? [09:20:32] and now that is very very painful [09:20:34] please do [09:20:39] <_joe_> depool / warmup pool / full pool [09:20:49] <_joe_> there is a major UX change at hand [09:21:15] Yeah, I agree depool/warm up pool/full pool will alleviate A LOT already [09:21:22] my thoughts were to change the interface from section-based [09:21:24] <_joe_> right now, to make sure you don't screw anything up, you are relying on static arrays you can inspect in your editor, and on code-review [09:21:27] to server-based [09:21:40] at least for our side [09:21:54] <_joe_> once data is in etcd, you don't have that luxury anymore [09:21:55] so instead of saying sectionX has server X, Y and Z [09:22:12] <_joe_> jynus: we'll get to implementation later, bear with me [09:22:16] ok [09:22:28] <_joe_> so what I envision as a workflow is: [09:23:34] <_joe_> - dbconfig get dbXXX[:PORT] gets you all the current configuration of a mysql instance (so either host:port or host only if it's the default port) [09:24:12] <_joe_> - dbconfig list s1 shows what is the current configuration for s1 [09:24:47] <_joe_> - dbconfig depool dbXXX:PORT removes the database from all configurations [09:25:12] (small correction, we should just use label, which is normally host:port, but could be arbitrary, labels are defined statically in code) [09:26:08] https://gerrit.wikimedia.org/r/plugins/gitiles/operations/mediawiki-config/+/master/wmf-config/db-eqiad.php#516 [09:26:09] <_joe_> - dbconfig pool LABEL repools the server at the weights defined in its configuration [09:26:47] <_joe_> - dbconfig edit LABEL allows you to edit the details of the configuration, including weights [09:27:25] can the configuration be multivalued? [09:27:41] <_joe_> - dbconfig warmup LABEL 0.1 pools a depooled database with weights set to ceil(weights*0.1) [09:27:53] <_joe_> jynus: what do you mean "multivalued"? [09:28:10] a server has a weight for each type of traffic [09:28:14] <_joe_> yes [09:28:35] in other words, it has multiple weights, one for each of 'main', 'vslow', etc.
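The warmup step in the workflow above is the only one with any arithmetic: repool at ceil(weight * factor) for every group. A small sketch of just that calculation; dbconfig itself did not exist at this point, so the helper below is purely illustrative.

```
import math

def warmup_weights(configured, factor=0.1):
    """Weights to use while a freshly repooled instance warms its caches,
    per the 'dbconfig warmup LABEL 0.1' idea above: ceil(weight * factor)."""
    return {group: math.ceil(weight * factor) for group, weight in configured.items()}

# e.g. warmup_weights({'main': 200, 'api': 100, 'vslow': 1}, 0.1)
# -> {'main': 20, 'api': 10, 'vslow': 1}
```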
[09:28:40] <_joe_> so my original strawman for the etcd schema was https://gerrit.wikimedia.org/r/#/c/422373/ [09:29:01] now, given this status, I mentioned that we would lose the cluster view of things, which I think is important when moving things around [09:29:22] <_joe_> volans: yes, that's why I think we should decide safety checks [09:29:35] <_joe_> there is another thing in my proposal that I wanted confirmed [09:29:52] <_joe_> a single LABEL can refer to multiple sections, right? [09:30:19] let's call it instance, label is not really the official name [09:30:33] <_joe_> ok [09:30:41] <_joe_> can one instance serve multiple sections? [09:31:00] not normally, but it can happen [09:31:08] <_joe_> if so, I guess we need "depool/pool" to be able to act on all sections or just one [09:31:17] but it would be ok [09:31:24] to depool only 1 section in that case [09:31:31] that is a very special case [09:31:40] <_joe_> ok [09:31:45] like when moving a wiki from one to the other [09:31:56] if we go towards multi-instances a host:port should belong to only one section AFAIUI [09:32:03] or creating a new section [09:32:10] volans: yes, but there are cases where it can happen [09:32:22] when we created s8 by splitting s5 [09:32:33] hosts were at the same time on s8 and s5 [09:32:36] <_joe_> anyways, let me list the safety checks I would like to add, and tell me if more are needed: [09:32:38] sure [09:32:49] volans: it should not be the norm [09:33:13] <_joe_> Any action we take should guarantee that the following conditions still hold valid: [09:33:16] _joe_: how easy is it to edit the schema afterwards [09:33:21] ? [09:33:37] <_joe_> jynus: not very hard, just need care :) [09:33:42] e.g. new sections can be added [09:33:53] one extra clarification [09:33:57] before you go on [09:33:57] <_joe_> sections can be added easily [09:34:13] there are additional sections that are hidden [09:34:45] <_joe_> ?
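Putting the constraints above together (label is normally host:port, an instance usually serves one section but can temporarily serve two, as in the s5/s8 split), an etcd object per instance could look roughly like the sketch below. This is not the actual strawman from the Gerrit change linked above; all keys are assumptions.

```
# Hypothetical etcd value for one instance (label db1111:3306).
instance_record = {
    'host': 'db1111',
    'port': 3306,
    'sections': {
        's5': {'pooled': True, 'weights': {'main': 200, 'api': 100}},
        's8': {'pooled': True, 'weights': {'main': 200}},  # rare overlap, e.g. during a split
    },
}

# A depool could then act either on a single section
# ("dbconfig depool db1111:3306 s5") or on every section the instance serves.
```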
[09:34:47] https://gerrit.wikimedia.org/r/plugins/gitiles/operations/mediawiki-config/+/master/wmf-config/db-eqiad.php#666 [09:35:07] https://gerrit.wikimedia.org/r/plugins/gitiles/operations/mediawiki-config/+/master/wmf-config/db-eqiad.php#10 [09:35:24] those are sections like the others from our point of view, but are defined on other arrays [09:35:26] <_joe_> ok, those sections are separated from the main databases [09:35:35] <_joe_> for now [09:35:40] es1, es2, es3 and pc1,2,3 [09:35:47] <_joe_> we can convert them later I guess [09:35:51] just saying that those eventually will have to be handled [09:35:58] but will be more of the same [09:36:05] <_joe_> yeah, we'll bend those use-case to the general one I guess [09:36:24] <_joe_> so, back to safety checks, I would say that after an edit we have to ensure: [09:36:30] <_joe_> - a section has a master [09:36:39] <_joe_> - a section has at least N instances [09:36:51] <_joe_> - every group has at least 1 instance per section [09:37:04] <_joe_> group as in vslow, etc [09:37:20] let me thing about that [09:37:33] as that is trivially true for normal cases [09:37:42] I am thinking bad states now [09:37:42] <_joe_> if any of these conditions is not met (we can add/remove any) we refuse an edit [09:37:51] _joe_: I would say also groups should have N configurable minimum instances [09:37:56] <_joe_> unless the user passes the magic --bblack command-line swithc [09:38:02] and whether if we should enforce on schema [09:38:12] or just give a warning on edit [09:38:38] <_joe_> jynus: I would say we add a --force switch to the command line to let you shoot yourself in the foot in case of need [09:38:43] <_joe_> bypassing all safety checks [09:38:49] In particular, "- every group has at least 1 instance per section" is not true right now [09:38:55] and that is as a normal thing [09:38:55] <_joe_> also, you can use confctl directly, which bypasses all that [09:39:09] <_joe_> jynus: ok so we can remove that safety check [09:39:19] for example, s3 (small wikis) do not really need different groups because how small wikis are [09:39:25] so we consolidate on just 4 servers [09:39:28] <_joe_> ok [09:40:03] of course, that could be changed to have an arbitrary host, or all of them [09:40:07] <_joe_> I have enough material to make you a complete and concrete proposal, I think [09:40:10] for the sake of normalization [09:40:16] that's why I think groups should have a configurable N of hosts, maybe tomorrow for s1 we need at least 3 recentchanges, who knows [09:40:35] <_joe_> volans: yeah we can do that adding an object per section [09:40:54] _joe_: I think checks should be configurable outside of the intrinsic schema [09:41:08] so it is easy to add exceptions without touching it [09:41:53] _joe_: also outside of etcd, interfaces to see the state [09:41:54] <_joe_> jynus: yes, my idea was you have a "section s1" object where you define things like: which instance is master, minimum number of general instances, minimum number for each group [09:42:11] checks can be done on 2 levels, basic ones with json schema to enforce that the schema is formally correct, and then with code for more complex logic checks (joe correct me if I'm wrong) [09:42:13] <_joe_> jynus: yes, that's relatively easy I think [09:42:17] basically, https://noc.wikimedia.org/conf/highlight.php?file=db-codfw.php [09:42:28] <_joe_> volans: yes that's what I was talking about [09:42:38] (important, remember there is 2 configuration, one per datacenter at the time) [09:43:00] I think that is 
already in place for mediawiki [09:43:02] <_joe_> jynus: basically the output of "dbconfig show all" :) [09:43:07] so probably it is trivial to do [09:43:08] yeah [09:43:27] I would also seek the feedback from aarong (I can ask him) [09:43:30] <_joe_> so my first strawman for mediawiki was https://gerrit.wikimedia.org/r/#/c/422374/ [09:43:34] as he is the receiver of all of that [09:43:45] <_joe_> we will have a more refined version of that in some time [09:43:53] as in, his loadbalancer is the one that takes that and uses it [09:43:57] <_joe_> I'll write a ticket with all the info I gathered now [09:44:13] jynus: as of now the proposal is to have MW see exactly the same structure [09:44:23] just that the data comes from etcd instead of the static file [09:44:23] _joe_: regarding workflow [09:44:24] <_joe_> yes, mediawiki will not change at all [09:44:37] should we test on a single, small section first? [09:44:49] <_joe_> jynus: in beta, even [09:44:53] of course [09:44:56] <_joe_> and then on a small section, yes [09:44:56] I mean after that [09:45:01] on production testing [09:45:12] so normally we go for s6 and s2 [09:45:19] <_joe_> we can even switch everything on the debug servers only [09:45:27] that would be cool [09:45:41] as this is the kind of thing where we find issues after a long time [09:45:53] <_joe_> yeah [09:45:59] then go and test maintenance cycles with it [09:46:03] <_joe_> you want to play with a sandbox [09:46:05] automation, etc. [09:46:06] <_joe_> makes sense [09:46:27] do you have 5 minutes to review one thing [09:46:33] that is not technically related to it [09:46:38] but in a way it is [09:46:50] (unless you want to ask more questions) [09:46:55] and please ask for help [09:46:56] Q: I know you're designating some hosts to be delegated masters in the sense that they should be the first host to look at in case of a master failure and have STATEMENT binlogs, is that information that would be useful to have in the dbconfig output from etcd? [09:47:09] volans: I don't think so [09:47:16] these are comments for us [09:47:28] so we pre-select and prepare in case of an emergency [09:47:31] <_joe_> jynus: no I have all the info I needed [09:47:40] it would be nice to maybe have comments on etcd? [09:47:43] mark, jynus I have changed the meeting for tomorrow same time. [09:47:52] <_joe_> it looks like volans and I have some coding to do, but this is generally already in the direction we had [09:48:20] as we lose the "# broken, do not repool"? [09:48:29] maybe a comment on the host entry? [09:48:36] that make sense [09:48:39] *makes [09:48:40] or something on the text configuration version? [09:48:41] <_joe_> jynus: heh, fair enough [09:48:53] I assume that is trivial [09:49:09] and not be used by mediawiki [09:49:37] <_joe_> yes [09:49:45] <_joe_> mediawiki will just get the data structures [09:49:51] so I wanted to talk a bit at a high level [09:50:03] about how we do database configuration [09:50:15] etcd is part of it, but not the only part [09:50:19] Q: from the operational point of view is it ok to just have a way to see the cluster status and then edit a single db object and then re-check the cluster status or do you think you need some sort of cluster-diff before saving the change? [09:50:41] <_joe_> volans: MVP please [09:50:44] <_joe_> :P [09:50:59] _joe_: I didn't say right now, but to understand the mid-term plan ;) [09:51:03] <_joe_> yes that would be nice for sure, but maybe 2nd iteration?
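The safety checks agreed on a bit further up (a section has a master, a section has at least N instances, per-group minimums configurable per section, everything bypassable with --force or by using confctl directly) translate into a handful of assertions over the section object. A sketch under those assumptions, reusing the hypothetical shapes from the earlier sketches; not a real implementation.

```
def validate_section(section, instances):
    """Return the list of violated constraints; an empty list means the edit
    can be saved. 'section' is expected to carry 'master', 'min_instances'
    and 'min_per_group'; 'instances' are records with 'name', 'pooled' and
    per-group 'weights' (all hypothetical shapes)."""
    errors = []
    names = {inst['name'] for inst in instances}
    pooled = [inst for inst in instances if inst['pooled']]

    if section.get('master') not in names:
        errors.append('no master defined for this section')
    if len(pooled) < section.get('min_instances', 1):
        errors.append('too few pooled instances')
    for group, minimum in section.get('min_per_group', {}).items():
        serving = [inst for inst in pooled if inst['weights'].get(group, 0) > 0]
        if len(serving) < minimum:
            errors.append("group '%s' has %d instances, needs at least %d"
                          % (group, len(serving), minimum))
    return errors

# A --force flag (or editing with confctl directly) would skip these checks
# entirely, as discussed above; s3-style sections simply configure no
# per-group minimums.
```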
[09:51:24] <_joe_> ack [09:51:26] So I created P6953 [09:51:44] https://phabricator.wikimedia.org/P6953 [09:52:03] <_joe_> oh that's juicy [09:52:13] and want to know your opinion on it [09:52:22] the title is more alarming than it really is [09:52:27] so I can catch your attention [09:52:34] (which apparently, I did) [09:52:42] <_joe_> I generally agree with the thesis - puppet sucks for configuring databases [09:53:18] the main issue right now is that when we reimage and change, e.g., a section [09:53:29] I have to update 5 things [09:53:49] 2 on puppet, 1 on internal lists, 1 on mediawiki, and 1 on internal monitoring [09:54:31] we also have moved to a model where there is an arbitrary number of instances per server [09:54:47] and we do manage ganeti instances or kubernetes on puppet [09:54:57] I want a paradigm shift for dynamic things like that [09:55:03] *we don't [09:55:25] where things are managed, but not statically [09:55:30] <_joe_> yes, we are in the middle of a paradigm shift I tried to start 3 years ago :) [09:56:19] accounts are hell, everybody adding them without tracking where [09:56:29] <_joe_> so, from a quick skim of what you wrote, it seems like you want to automate procedures, right? [09:56:33] yes [09:56:45] and for that, I need to move (some) away from puppet [09:56:47] <_joe_> so for accounts, I'm not sure that's "dynamic configuration", to be honest [09:57:11] it may not be, but it is the same issue [09:57:18] <_joe_> the problem is that puppet (and other tools like it) suck at managing those [09:57:25] puppet sucks ... exactly [09:57:35] <_joe_> in theory, you'd like db grants and users to be declared in say a yaml file [09:57:41] I mentioned a related issue with tracking relationships between roles [09:57:50] X needs an account on service Y [09:58:03] (it may also be our way of using puppet) [09:58:04] <_joe_> and to have a tool to ensure they're like that [09:58:19] yes, that would work [09:58:26] but also track ips of origin servers [09:58:31] and hosts serving those services [09:58:37] both of which are constantly changing [09:58:52] e.g. mediawiki servers need access to core databases [09:58:59] <_joe_> origin servers == applications?
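The "grants and users declared in say a yaml file, plus a tool to ensure they're like that" idea above maps naturally onto a diff between a desired set of grants and the per-account output of SHOW GRANTS. A minimal sketch of only the diff part; the account data is invented and the collection of the live grants is left to whatever client is used.

```
# Desired state, e.g. loaded from a YAML file:
# {(user, host_pattern): set of GRANT statements}. Accounts below are made up.
desired = {
    ('wikiuser', '10.64.%'): {
        "GRANT SELECT, INSERT, UPDATE, DELETE ON `enwiki`.* TO 'wikiuser'@'10.64.%'",
    },
    ('dump', '10.64.%'): {
        "GRANT SELECT ON `enwiki`.* TO 'dump'@'10.64.%'",
    },
}

def grant_diff(desired, actual):
    """Compare desired grants with the live ones (per-account SHOW GRANTS
    output, normalized the same way). Returns (missing, extra)."""
    missing, extra = {}, {}
    for account, grants in desired.items():
        current = actual.get(account, set())
        if grants - current:
            missing[account] = grants - current
        if current - grants:
            extra[account] = current - grants
    for account in set(actual) - set(desired):
        extra[account] = set(actual[account])   # accounts nobody declared
    return missing, extra
```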
[09:59:19] but there is not a good way right now to track those sets of servers and create the relationship [09:59:26] this is only an example [09:59:27] * volans has to jump on another meeting but will read backlog later [09:59:41] and I do not have a firm proposal [09:59:46] except to tool it separately [09:59:52] or [09:59:57] <_joe_> well, actually puppet would help for that [10:00:04] add the functionality to puppet [10:00:11] for accounts it is feasible [10:00:19] for others, it isn't (topology changes) [10:00:20] <_joe_> but we'd need to invest significant time in building what is needed [10:00:40] <_joe_> the one thing you can't do via puppet is what needs coordination between nodes [10:00:40] you cannot track master-slave sensibly on puppet [10:00:47] <_joe_> puppet is not designed for that [10:00:47] I agree [10:00:58] that is why I added account handling to the issue [10:01:23] <_joe_> what you can do via puppet is gathering the list of ips of all nodes that run a specific service, for instance [10:01:24] as a DBA, my initial thought is to create a database + monitoring, but I am biased [10:01:55] <_joe_> but that's just a puppetdb query away from any tool, too [10:02:29] let me give you a concrete example [10:02:37] very quickly [10:02:59] I want to build an inventory app to track table schema status [10:03:11] and set up rolling schema changes based on that cached state [10:03:38] <_joe_> ok that definitely cannot be managed via puppet [10:03:40] so a database (which is the state of all production tables) + a monitoring (retrieving that regularly) [10:04:06] same for topology changes, that can only be seen by checking the current state of the instances [10:04:28] but doing that, means that we no longer track mariadb::master [10:04:46] but mariadb (with state master, sometimes replica) [10:04:51] <_joe_> good candidates for management outside of puppet are things that have the following features: [10:04:58] puppet keeps installing packages, config, etc. [10:05:02] <_joe_> - changes are internal to mysql [10:05:13] <_joe_> - they require cross-node coordination [10:05:18] but read only state and replication is handled and tracked outside of puppet [10:05:36] <_joe_> yeah replication topology checks both conditions [10:05:42] <_joe_> schema changes, too [10:05:44] same for provisioning [10:06:06] is the host empty? clone it from the backups server! [10:06:23] <_joe_> if we had the switchdc spinoff, most of those things would be almost trivial... [10:06:35] <_joe_> I have the same issues with etcd, by the way [10:07:04] <_joe_> I want to manage failovers, rolling restarts, full backup/recovery of the cluster from a disaster [10:07:10] <_joe_> switching replica direction [10:07:25] <_joe_> all things that I could mostly solve if I had that spinoff [10:07:26] basically make our bare metal install a bit more cloud-like [10:07:28] <_joe_> :) [10:08:13] "these servers are databases", but exactly the role they are doing is dynamic based on needs [10:08:29] <_joe_> yes, I get it.
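The "inventory app to track table schema status" idea above boils down to caching a fingerprint of every table definition per instance and driving the rolling schema change from that cache. A sketch of the two core helpers; the actual collection (running SHOW CREATE TABLE everywhere and refreshing the cache) is left out, and the data shapes are assumptions.

```
import hashlib

def schema_fingerprint(create_table_sql):
    """Stable fingerprint of one SHOW CREATE TABLE output."""
    return hashlib.sha256(create_table_sql.strip().encode('utf-8')).hexdigest()

def pending_instances(inventory, table, target_fingerprint):
    """inventory: {instance_label: {table_name: fingerprint}}, refreshed
    regularly by a collector and cached in a database. Returns the instances
    where the table does not yet match the target definition, i.e. the ones
    a rolling schema change still has to visit."""
    return sorted(label for label, tables in inventory.items()
                  if tables.get(table) != target_fingerprint)
```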
For the provision part, that can be solved by having a systemd timer set OnBootSec that runs a script that does that check :) [10:08:33] so in a nutshell that is my biggest issue with puppet roles [10:08:36] <_joe_> you mean the sections installed as well [10:08:45] not that the style is wrong [10:09:06] is that things that are roles right now, shouldn't be on puppet [10:09:11] <_joe_> your problem is you want to change and mix the way machines are configured on the fly [10:09:18] not fully [10:09:41] just content and a small subset of the config/state [10:09:54] <_joe_> you consider those servers as the PaaS for your mysql services, let's say [10:10:07] again, not the full way [10:10:14] but to some extent [10:10:30] e.g. not need full configurability [10:10:31] <_joe_> yeah, I got it, or I would've proposed to use kubernetes and statefulsets :D [10:10:52] just here is a database server, most of the config on start is the same [10:11:12] but how mediawiki uses it, and if it is a master or a replica, it is dynamic [10:11:50] <_joe_> so it's ok for the list of instances for a server to be static [10:12:09] that is the part I am not 100% sure [10:12:19] in most cases, yes [10:12:22] <_joe_> and the correspondance instance <=> section too, I guess? [10:12:44] there may be some cases where moving instances around, long term, may be needed [10:12:55] specially now that we are going multi-instance [10:13:07] an backups with autoprovisioning [10:13:17] but that is not an immediate need [10:13:28] for now, the number if instances is fixed, but different [10:13:40] <_joe_> so, I think I get your issue with how we define "roles" in puppet [10:13:42] so I put them on hiera, but I don't like that [10:13:54] and I am not following the style [10:14:18] (eg. dbstore2001 and dbstore2002 have an arbitrary number of instances) [10:14:42] the more immediate change [10:14:46] <_joe_> in this scenario, you basically want a generic profile::mariadb::multiinstance applied to your servers, say, and have hiera define which ones, per host [10:15:05] would be all hosts are "core" (for mediawiki) [10:15:19] and master/replica is handled somewhere else [10:15:27] <_joe_> I don't think it's bad per se, you are managing your infrastructure under different premises than the rest of it [10:15:47] as I said, like kubernetes or VMs state [10:15:56] with many buts [10:16:09] and simplifying things [10:16:18] <_joe_> and you could keep the role/profile structure (which is good imho), and just use hiera on a per-host basis, or via (ugh) regex.yaml [10:16:29] well, [10:16:50] <_joe_> this is quite reasonable overall, and we can remove things from puppet's preying hands if we need to [10:17:55] I don't think you will like this: https://phabricator.wikimedia.org/source/operations-puppet/browse/production/hieradata/hosts/dbstore2001.yaml [10:17:58] <_joe_> to be honest, I never got before what your vision was, hence the confusion on the topic [10:18:48] (forget about the actual structure, I mean having one of those per host) [10:18:50] <_joe_> jynus: no I think with the help of puppet4's magic we can find a way for you to define a single hash of properties, basically, to feed to a common-purpose profile [10:19:06] <_joe_> jynus: I think it's ok if an host is unique [10:19:19] that is another possibility, not sure if the best option either [10:19:21] <_joe_> and it's logically not tied to others [10:20:15] so at least I think I was able to trasmit my needs and make you understand my vision [10:20:24] no need to seach for a solution 
now [10:20:31] but now you have the background [10:20:39] <_joe_> yes, so I would ideally see a puppet structure for this: [10:21:06] and that is the part that puppet can do [10:21:14] <_joe_> - role::mariadb (includes standard, profile::firewall, ..., profile::mariadb::multiinstance) [10:21:20] I believe there are others that will have to be handled separately [10:21:39] (account management and tracking, topology) [10:21:55] e.g. etcd may be helpful not only for mediawiki [10:22:09] <_joe_> - profile::mariadb::multiinstance (includes firewalling, monitoring for all instances defined via hiera, and defines all the corresponding mysql::instance entries) [10:22:12] but also as the reference for many other things, even if they are triggered by puppet [10:22:29] <_joe_> s/mysql/mariadb) [10:22:31] _joe_: that is almost done [10:22:44] of course, there is a lot of pending migration [10:23:12] so many roles that have to be moved to profiles [10:23:24] duplication between core and core_multiinstance [10:23:28] <_joe_> and for provisioning, you can really solve it by adding a script that runs OnBootSec and runs after the corresponding mysql instance is up [10:23:30] artifacts of the migration [10:23:33] yes [10:23:40] I am ok with puppet triggering changes [10:23:42] <_joe_> and that checks if the db is empty and provisions it [10:24:00] but then they have to check on script logic beyond puppetdb [10:24:19] <_joe_> the script is local to the machine, puppet doesn't trigger it [10:24:22] (e.g. etcd - what should this instance contain?) [10:24:34] <_joe_> but it can configure it [10:24:35] the script can be provisioned by puppet [10:24:40] exactly [10:24:47] <_joe_> both the script and its configuration [10:24:50] I think you get my vision [10:25:06] <_joe_> yes, I think most of this is doable with a decent amount of effort [10:25:08] the exact details depend [10:25:19] there are things like provisioning that can wait for a puppet run [10:25:39] other things like monitoring, in most cases, should be in sync with the actual configuration [10:26:11] <_joe_> oh yeah, devil is in the details and all that [10:26:26] so with that I hope you better understand my complaints [10:26:41] which you seem to agree with once I express myself better [10:27:31] <_joe_> I just didn't understand the basis of that train of thought [10:27:45] <_joe_> because I was looking at what we have today [10:28:00] yeah, I also see that if you are doing a lot of apache work [10:28:01] <_joe_> and that could fit easily a different scheme [10:28:13] This has probably already been looked at and I missed it… but labsdb1011 is terribly lagged [10:28:15] (apache here means application servers) [10:28:22] hoo: yes [10:28:49] <_joe_> jynus: yes, I'm trying to standardize those machines further [10:28:58] you have roles, I guess, like api, imagescalers, etc.
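The boot-time provisioning idea described above (a script run by a systemd timer with OnBootSec, after the local mysqld instance is up: if the datadir is empty, recover it from the backups server) could be as small as the sketch below. The paths, the section value and the recovery command are placeholders, not existing tooling.

```
import os
import subprocess

SYSTEM_SCHEMAS = {'mysql', 'sys', 'performance_schema', 'information_schema'}

def is_unprovisioned(datadir):
    """Treat the instance as empty if no database directories exist besides
    the system schemas."""
    databases = {entry for entry in os.listdir(datadir)
                 if os.path.isdir(os.path.join(datadir, entry))}
    return not (databases - SYSTEM_SCHEMAS)

def provision_if_empty(datadir='/srv/sqldata', section='s1'):
    """What the OnBootSec script could do. The section (what this instance
    should contain) would come from configuration, possibly etcd as suggested
    above; the command below is a made-up placeholder for a restore from the
    backups/provisioning host."""
    if not is_unprovisioned(datadir):
        return False
    subprocess.check_call(['/usr/local/sbin/recover-section', section])
    return True
```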
[10:29:01] Ok :) [10:29:16] which are "roles" different than "s1 for vslow" [10:29:50] here I am talking about the general concept of role [10:29:55] not what we use for puppet [10:30:24] so we reuse the same word, and confusion starts [10:31:05] but yes, I am lately using MySQLAsAService really meaning it [10:31:41] so things like "misc servers" in the future are really hardware resources for many different services [10:32:46] <_joe_> which is basically what we do with kubernetes [10:33:02] but less hardcore [10:33:03] <_joe_> only difference is, you get no benefit from being able to spin up 100 instances in 1 minute [10:33:10] we do not need that [10:33:14] <_joe_> and you get a huge penalty for running inside a container [10:33:22] just be less static than we are now [10:33:28] just a bit [10:33:38] <_joe_> we don't need that also because how long does it take to load a dump into a db? [10:33:45] a lot [10:33:59] but the opposite extreme [10:34:11] is one hw server == instance, forever [10:34:17] <_joe_> yes [10:34:22] we need a bit more flexibility there [10:34:24] <_joe_> which is where we're coming from more or less [10:34:41] <_joe_> if we ordered hardware by the 100s every quarter [10:34:47] <_joe_> it could be a handy abstraction [10:35:35] so I just want the automation to do that, and we are actually building it right now [10:37:41] https://phabricator.wikimedia.org/diffusion/OSMD/browse/master/wmfmariadbpy/ is starting to get interesting [10:39:13] let's call it a good meeting and we will keep in touch [10:39:20] <_joe_> yes! [11:04:06] 10DBA, 10Cloud-Services: Prepare storage layer for euwikisource - https://phabricator.wikimedia.org/T189466#4142395 (10Jayprakash12345) 05stalled>03Open [11:05:23] * volans back and read backlog [11:05:57] 10DBA, 10Cloud-Services, 10User-Urbanecm: Prepare storage layer for lfnwiki - https://phabricator.wikimedia.org/T183566#4142403 (10Jayprakash12345) 05stalled>03Open [11:06:06] <_joe_> I'm taking a break now [11:06:21] re:spinoff, I've tried (failing) to get it into the last 2 or 3 quarterly goals, let's see if by any chance I'm able to get debmonitor out early enough this Q and get some time for it [11:06:43] what do you mean with spinoff? [11:07:23] switchdc spinoff to easily code orchestration tasks [11:07:30] that joe mentioned earlier in the backlog [11:07:41] ok, not a huge deal right now [11:07:49] until everything else can be automated [11:08:09] volans: quick question [11:08:28] it is in general though, many others need that ;) [11:08:29] sure [11:08:39] would it be feasible to create a cumin sql transport? and where should we look? [11:09:08] would it need a lot of changes, as it really is not a remote execution thing? [11:09:19] not that easy in the way you want it (having back sql client objects) [11:09:44] should we go in the direction of its own thing? [11:09:56] there are basically 2 ways of doing it, and in both cases we need to manage the parallelization ourselves [11:10:10] one is multithreads, easier but with more limitations [11:10:13] ah, because clusterssh does that for you? [11:10:30] the other is multiprocess, harder to get back python objects [11:10:56] well, compare.py already does multiple calls at the same time [11:11:02] yes it does the multiprocess stuff and async io but parses strings, not python objects from the client [11:11:18] why would I need python objects?
[11:11:25] a third way is if there is an async mysql client ofc [11:11:33] you told me that you don't want to parse mysql output [11:11:40] but have mysql client objects to play with [11:12:04] I don't understand what I meant, that probably was my fault [11:12:22] but I guess I meant I could need to maintain a connection [11:12:46] what about sharing code for the querying? [11:13:05] the puppet part to get the list of servers? [11:13:19] then do the rest on its own? [11:13:40] I guess that also doesn't make much sense as we want instances, not hosts [11:14:17] I think I would use cumin to do remote calls and that should be enough [11:14:19] I'd like to add a mysql transport to cumin, let's just design it properly, as I'd like to add one for conftool, that will give you instances for example [11:14:29] 10DBA, 10Operations, 10netops, 10ops-eqiad, 10Patch-For-Review: db1114 connection issues - https://phabricator.wikimedia.org/T191996#4142413 (10Marostegui) No more errors for the last 6 hours after killing atop. Also no drops or connections errors running the RX original buffers after reverting them as c... [11:14:57] so you would centralize the knowledge on etcd? [11:15:12] 10DBA, 10Cloud-Services, 10User-Urbanecm: Prepare storage layer for lfnwiki - https://phabricator.wikimedia.org/T183566#4142428 (10Urbanecm) The wiki was created. [11:15:22] I'd like to be able to query for hosts with role MW that are pooled for example [11:15:23] not sure about that - if we should only do it for mediawiki databases [11:15:43] we use conftool for many things, and manage the live state of pooled/depooled there [11:15:55] being able to query for that will be useful in many cases I think [11:16:05] oh, query it, yes [11:16:14] yes as a backend [11:16:14] I don't disagree with that [11:16:26] nore sure if it should be the only backend, or the direct one [11:16:30] *not [11:16:45] e.g. something else that caches etcd [11:16:45] not the only, cumin can mix queries from multiple backends [11:17:12] I will have to think more about what I want to do [11:17:24] and then come back with a proposal [11:17:27] for your doubt from before, there are 2 ways [11:17:40] either we do ssh + mysql and get a string output [11:17:53] nah, I can do that already [11:17:54] or we do mysql client directly and manage python objects [11:17:57] and I don't like it [11:18:02] I understood you wanted the latter [11:18:03] and I do not like that either [11:18:07] lol [11:18:14] 10DBA, 10Cloud-Services, 10User-Urbanecm: Prepare storage layer for lfnwiki - https://phabricator.wikimedia.org/T183566#4142430 (10Marostegui) a:03Marostegui This needs to be filtered. Assigning it to myself to indicate it is blocked on me before this can be handed over to #cloud-services-team [11:18:28] the thing is, the use case is a bit different [11:18:34] what's wrong in the latter? [11:18:39] it is not only for "remote execution" [11:18:46] I may also want monitoring [11:19:04] which I may do by caching other sources of truth [11:19:15] and automating on top of that [11:19:24] including puppetdb/cumin [11:19:48] e.g. I check what hosts there are, I discover the instances using cumin [11:19:54] but then I query those directly [11:20:13] ok, but if you think of a mysql transport for cumin, I guess you want cumin to open multiple mysql connections and allow you to perform stuff over them, is that correct?
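On the transport question above: whichever of the two ways is chosen (ssh + mysql with string output, or a real client returning Python objects), the parallelization has to be handled by the tool itself, since clustershell will not provide it for this case. Below is a thread-pool sketch of the "easier but more limited" option, using pymysql purely as a stand-in client and a local ~/.my.cnf for credentials; it is an illustration, not the proposed cumin transport.

```
import os
from concurrent.futures import ThreadPoolExecutor

import pymysql

def run_on_instance(label, sql):
    """Run one query on one instance; the label is host or host:port."""
    host, _, port = label.partition(':')
    conn = pymysql.connect(host=host, port=int(port or 3306),
                           read_default_file=os.path.expanduser('~/.my.cnf'))
    try:
        with conn.cursor() as cursor:
            cursor.execute(sql)
            return label, cursor.fetchall()
    finally:
        conn.close()

def run_on_all(labels, sql, concurrency=20):
    """Returns {label: rows}. With fewer than ~200 instances plain threads are
    enough; an async client (e.g. aiomysql, mentioned below) would be the
    third option."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(run_on_instance, label, sql) for label in labels]
        return dict(future.result() for future in futures)

# e.g. run_on_all(['db1111:3306', 'db1111:3311'], 'SELECT @@read_only, @@hostname')
```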
so one tool on top of the other [11:20:37] yes, but you say it is not trivial [11:20:55] or an ugly hack (ssh + mysql) [11:21:15] I prefer to do the ugly hack and have, e.g., local tools [11:21:20] it's just that we need to handle the parallelization ourselves, we don't get it for free from clustershell [11:21:42] e.g. cumin runs check_health.py locally [11:21:46] have you tried aiomysql by any chance? [11:21:50] that's an option too [11:22:24] note that unlike remote execution [11:22:35] we don't need to do fancy things [11:23:19] and I don't have a problem with multiple threads, given we have less than 200 hosts [11:23:27] 200 instances, actually [11:23:32] 10DBA, 10Cloud-Services, 10User-Urbanecm: Prepare storage layer for lfnwiki - https://phabricator.wikimedia.org/T183566#4142439 (10Marostegui) This has been filtered. I see a new created user (myself) that has all the critical data redacted on labs hosts. I am going to run a check private data to make comple... [11:23:50] the issue is the inventory [11:25:22] anyway, I need to think more about this [11:25:35] me too! [11:26:01] if the needs are too dissimilar, I think it is ok to be independent [11:26:24] and just share the "knowledge" (e.g. etcd information) [11:27:07] you can always import cumin and do a query with 2 lines of python to get the hosts, although right now you will not have the instances [11:27:34] also for things like etcd, I think it should be the place for mediawiki reference data [11:28:47] but it is common to have databases outside of mediawiki with their own handling (infrastructure information vs. application information) [11:29:25] e.g. mediawiki should now know about backup hosts [11:29:32] *not [11:29:37] etcd is our live-state key-value store, I don't see why those couldn't be there in different object types [11:29:43] independent from mw [11:29:50] but the infrastructure level should know where to search for those [11:30:05] sure, but those are not critical to be dynamic [11:30:27] but between puppet and etcd I guess you'll get all of them [11:30:46] I call puppet "infrastructure information" :-) [11:30:46] either they are on puppet or on etcd, I guess we'll not introduce a third way [11:30:51] :) [11:31:31] actually my conversation with joe was to have some of that somewhere else [11:31:38] where, I don't know [11:32:22] I see it a bit differently, I think that what you need is an orchestrator that applies those changes in a programmatic and safe way across a fleet that needs coordination [11:32:34] oh, I agree [11:32:37] the source of truth can still be provided by puppet or etcd [11:32:45] but that orchestrator needs a state [11:32:46] based on whether they are static or dynamic data [11:33:05] does it? or can it just enquire the fleet to get it? [11:33:05] I am not on board (yet) with using etcd for everything [11:33:18] e.g.
we have prometheus with state [11:33:29] we can query prometheus to see if a host is up [11:33:51] there are things beyond configuration [11:33:58] (mostly monitoring) [11:34:00] sure [11:34:15] btw that is another backend I'd like to add ;) [11:34:24] give me instances based on a prometheus query [11:34:27] I was talking before of "applying schema changes automatically" [11:34:54] that will unlikely be controlled by any of the 3 mentioned [11:35:17] or topology changes [11:35:28] yes, this adds quite some complexity that probably needs its own thing that uses the other tools [11:35:37] I guess topology could be on etcd [11:35:47] but you get the idea, the backend is the least problem [11:35:51] sure [11:35:54] how to store it is [11:36:13] as long as things are documented and interoperable [11:36:25] e.g. cumin can query it [11:37:01] sure [11:37:45] I think when possible we should have stateless things, you have a required configuration/topology/etc. and you check the current live state in the fleet [11:37:54] without keeping a state of the current status [11:38:04] yes, that is possible [11:38:12] for schema changes this is not possible of course and you'll need to keep the state somewhere during the process [11:38:19] but in some cases a cache is most likely required [11:38:25] exactly [11:38:31] and as long as it is understood it is cached [11:38:35] and not the real thing [11:38:36] yep [11:38:38] I think it is ok [11:38:47] it is just a "tool" cache [11:38:53] not a source of truth [11:39:11] I think we are in sync now [11:39:25] agree [11:39:30] same happens with etcd, it will be ok to cache it [11:39:37] for non-vital operations [11:40:07] cache the output of etcd? [11:40:28] yes, mediawiki does it, for example [11:40:39] for 10s [11:40:41] other tooling could do it [11:40:50] but I would advise against it [11:40:59] better to watch the keys [11:41:03] again, depends on the context [11:41:04] and be notified when they change [11:41:10] not everything requires real time state [11:41:23] and remember you were pushing to extend etcd usage to other things [11:41:34] you were the one doing that [11:41:57] sure, what I mean is that to have an up to date thing from etcd you don't need to query all the time, just query once and then watch [11:42:04] for modifications [11:42:04] I was the one suggesting to use e.g. prometheus, which has 1 minute granularity for e.g. alerts on high load [11:42:27] I think we have in mind different applications [11:43:43] "show a tree of db topology on a web" can be heavily cached [11:43:56] "perform a master failover" cannot :-) [11:44:11] ofc :D [11:47:42] I'm going to get something for lunch, to be continued, lot of interesting ideas and applications [12:40:26] 10DBA, 10Operations, 10netops, 10ops-eqiad, 10Patch-For-Review: db1114 connection issues - https://phabricator.wikimedia.org/T191996#4127027 (10BBlack) >>! In T191996#4139205, @Marostegui wrote: > For the record, the irq for eno1 is balanced across CPUs, so I don't think it is the bottleneck here: > ```
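The "tool cache, not a source of truth" distinction above (MediaWiki caches the etcd data for about 10 seconds; watching the keys is better for anything that must react quickly) can be captured with a tiny time-bounded wrapper around whatever fetches the live state. A sketch with the fetcher left as a placeholder; not existing tooling.

```
import time

class ToolCache:
    """Time-bounded cache for non-vital reads (e.g. drawing a db topology
    tree on a web page); never appropriate for something like a master
    failover, as noted above."""

    def __init__(self, fetch, ttl=10.0):
        self._fetch = fetch        # e.g. a function reading from etcd or prometheus
        self._ttl = ttl
        self._value = None
        self._expires = 0.0

    def get(self):
        now = time.monotonic()
        if now >= self._expires:
            self._value = self._fetch()
            self._expires = now + self._ttl
        return self._value

# topology = ToolCache(lambda: fetch_topology_from_etcd(), ttl=60)  # fetcher is hypothetical
```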
[12:48:02] No matter what I do, deleteAutoPatrol logs in commonswiki gives me slow timer (on read, write is fast) [12:48:03] [Thu Apr 19 12:46:56 2018] [hphp] [23185:7f5766835200:0:000005] [] SlowTimer [21390ms] at runtime/ext_mysql: slow query: SELECT /* DeleteAutoPatrolLogs::getRows */ log_id FROM `logging` WHERE log_type = 'patrol' AND log_action = 'autopatrol' AND (log_id > '156606681') AND (log_timestamp < '20180223210426') LIMIT 1000 [12:48:14] Even turned batches to 100, still the same time [12:49:43] that is ok [12:50:41] oh okay, also it will hopefully gets reduced as the table gets smaller (fingers crossed) [13:33:53] 10DBA, 10Operations, 10netops, 10ops-eqiad, 10Patch-For-Review: db1114 connection issues - https://phabricator.wikimedia.org/T191996#4142669 (10Marostegui) >>! In T191996#4142547, @BBlack wrote: > > Not that it's probably the issue here, but this probably isn't ideal. If you look at `grep eno1 /proc/in... [13:40:32] 10DBA, 10Operations, 10netops, 10ops-eqiad, 10Patch-For-Review: db1114 connection issues - https://phabricator.wikimedia.org/T191996#4142677 (10Marostegui) 05Open>03Resolved a:03Marostegui So, as soon as I started atop, errors came back and packets dropped. So the culprit is clearly `atop`. I am go... [13:43:12] 10DBA, 10Operations, 10netops, 10ops-eqiad, 10Patch-For-Review: db1114 connection issues - https://phabricator.wikimedia.org/T191996#4142681 (10Marostegui) [14:01:07] 10DBA, 10Operations, 10netops, 10ops-eqiad, 10Patch-For-Review: db1114 connection issues - https://phabricator.wikimedia.org/T191996#4129002 (10Marostegui) [15:09:05] 10DBA, 10Cloud-Services, 10cloud-services-team, 10User-Urbanecm: Prepare storage layer for lfnwiki - https://phabricator.wikimedia.org/T183566#4142939 (10Marostegui) a:05Marostegui>03None Everything looks redacted on labs hosts, so all good! This is now ready for #cloud-services-team to create the views. [15:41:19] 10DBA, 10Data-Services, 10Dumps-Generation, 10MediaWiki-Platform-Team, 10Patch-For-Review: Configure Toolforge replica views and dumps for the new MCR tables - https://phabricator.wikimedia.org/T184446#4143067 (10Anomie) Someone needs to run https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/core/+... [15:45:51] 10DBA, 10Data-Services, 10Dumps-Generation, 10MediaWiki-Platform-Team, 10Patch-For-Review: Configure Toolforge replica views and dumps for the new MCR tables - https://phabricator.wikimedia.org/T184446#3883142 (10Marostegui) >>! In T184446#4143067, @Anomie wrote: > Someone needs to run https://gerrit.wik... [18:35:56] 10DBA, 10Collaboration-Team-Triage, 10StructuredDiscussions, 10Patch-For-Review, 10Schema-change: Drop flow_subscription table - https://phabricator.wikimedia.org/T149936#4143759 (10Catrope) >>! In T149936#4132732, @Marostegui wrote: > @Catrope let me know about the one in s3 (it has no writes since 2015... [19:02:25] 10DBA, 10Data-Services, 10Dumps-Generation, 10MediaWiki-Platform-Team, 10Patch-For-Review: Configure Toolforge replica views and dumps for the new MCR tables - https://phabricator.wikimedia.org/T184446#4143845 (10Anomie) I can do it myself too, if @Bstorm doesn't want to. 
[20:28:10] 10DBA, 10Cloud-Services, 10User-Urbanecm, 10cloud-services-team (Kanban): Prepare storage layer for lfnwiki - https://phabricator.wikimedia.org/T183566#4144141 (10bd808) Ready for the steps described at https://wikitech.wikimedia.org/wiki/Add_a_wiki#Cloud_Services [23:38:10] 10DBA, 10Data-Services, 10Dumps-Generation, 10MediaWiki-Platform-Team, 10Patch-For-Review: Configure Toolforge replica views and dumps for the new MCR tables - https://phabricator.wikimedia.org/T184446#4144780 (10Bstorm) I think I'm supposed to hang back in cloud-land rather than pushing out production c...