[09:44:01] Blocked-on-schema-change, DBA, Operations, Revision-Scoring-As-A-Service, and 3 others: Remove oresc_rev index - https://phabricator.wikimedia.org/T140803#2479251 (Gehel) p:Triage>Normal
[11:48:00] Blocked-on-schema-change, DBA, Operations, Revision-Scoring-As-A-Service, and 3 others: Remove oresc_rev index - https://phabricator.wikimedia.org/T140803#2479599 (Ladsgroup) Here's one of queries: ``` mysql> explain SELECT /* SpecialRecentChanges::doMainQuery Someone */ rc_id,rc_timestamp,rc_us...
[13:04:25] DBA, Wikidata: Migrate wb_items_per_site to using prefixed entity IDs instead of numeric IDs - https://phabricator.wikimedia.org/T114904#2479869 (daniel)
[13:04:36] DBA, Wikidata: Migrate wb_terms to using prefixed entity IDs instead of numeric IDs - https://phabricator.wikimedia.org/T114903#2479871 (daniel)
[14:24:51] hi
[14:25:02] hi
[14:25:08] this tends to be quieter
[14:25:18] so first question:
[14:25:20] yeah operations is botfest
[14:25:30] (first answer to your question)
[14:25:43] we have both single instances with multi-source replication
[14:25:54] and multiple instances per host
[14:26:02] the usual identifier we use
[14:26:05] is host:port
[14:26:16] 99% are on 3306
[14:26:24] but we have a few that are not
[14:26:35] and this is not legacy, we will probably increase that
[14:26:45] have https://tendril.wikimedia.org/tree handy
[14:27:07] ok!
[14:27:15] as you can see on that tree
[14:27:29] we have the shards on top
[14:27:44] but a single host may appear multiple times
[14:27:59] e.g. db1047 is on s1, s2 and m4
[14:28:23] but only has a single instance
[14:28:38] db1069 has 7 instances on the same host
[14:28:45] ports 3311-7
[14:29:09] I will probably repeat this pattern soon, in cases where isolation > performance
[14:29:24] so a mysql instance is hostname:port
[14:29:49] ah ok, so in the case of multiple mysql instances per machine we'd run multiple mysqld exporters, one per instance
[14:29:51] which is exactly the identifier we use for it https://tendril.wikimedia.org/host/view/db1057.eqiad.wmnet/3306
[14:30:00] having said that
[14:30:15] right now, core production is all one instance per host
[14:30:34] so it would not be a huge issue if our monitoring didn't support that
[14:30:55] but allow me to present *ALL* cards beforehand, ok?
[14:31:03] sure, thanks, that's helpful!
[14:31:24] the second question
[14:31:35] which is more accurate? it is difficult to say
[14:32:02] mediawiki is ok, because if it is wrong in mediawiki, we get errors
[14:32:13] but there are many hosts that are not in mediawiki
[14:32:22] because they are non-core
[14:32:26] labs
[14:32:29] analytics
[14:32:30] misc
[14:32:42] or just because they are temporarily down or depooled
[14:32:56] so it is the source of truth for pooled instances
[14:33:14] but it does not know anything about unpooled ones
[14:33:20] then we have puppet
[14:33:42] all puppet mysql hosts have a variable "$shard"
[14:33:59] however, puppet is initial config, it is not "live data"
[14:34:22] e.g. right now I have some hosts with puppet running but mysql is not
[14:34:35] probably puppet is the right place
[14:34:47] if mysql is not running, we do not care if monitoring doesn't work?
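Going back to the multi-instance point above (host:port as the instance identifier, db1069 running instances on ports 3311-3317, one mysqld_exporter per instance): a minimal sketch of how that enumeration could look. The host list, the listen-port scheme (9104 plus an offset), and the helper names are assumptions for illustration, not the actual setup.

```
# Sketch: one mysqld_exporter per mysqld instance, keyed by host:port.
# The listen-port scheme (9104 + offset) and the host list are hypothetical;
# only the host:port identifier convention comes from the discussion above.

# instance host -> mysqld ports
INSTANCES = {
    "db1057.eqiad.wmnet": [3306],                   # typical: single instance
    "db1069.eqiad.wmnet": list(range(3311, 3318)),  # multi-instance host, ports 3311-3317
}

BASE_EXPORTER_PORT = 9104  # mysqld_exporter's usual default listen port


def exporter_targets(instances):
    """Yield (instance_id, exporter_address) pairs, one exporter per mysqld."""
    for host, ports in sorted(instances.items()):
        for i, port in enumerate(sorted(ports)):
            instance_id = f"{host}:{port}"  # the identifier used in tendril
            exporter_addr = f"{host}:{BASE_EXPORTER_PORT + i}"
            yield instance_id, exporter_addr


if __name__ == "__main__":
    for instance_id, addr in exporter_targets(INSTANCES):
        print(f"{instance_id} -> scrape {addr}")
```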
[14:35:03] I guess we care if mysql isn't running but should be
[14:35:15] hence the last bit
[14:35:19] finally we have tendril: tendril does not have bogus data
[14:35:34] it has all live data - it generates the replication trees dynamically
[14:35:48] so even if for some reason puppet or mediawiki is wrong
[14:36:09] tendril doesn't care, it shows the reality: X is a slave of this shard
[14:36:22] ah, interesting, so it isn't configured statically
[14:36:28] very useful for current status
[14:36:39] think orchestration without the orchestration part :-)
[14:36:53] hehehe indeed
[14:37:07] now, the question is, what do we want?
[14:37:12] but there's no way to know if what tendril shows is correct or not, as in if it should be like that
[14:37:42] no, for example, when changing the master
[14:37:56] of a shard, tendril may be outdated until I manually set the new master
[14:38:11] mediawiki is the "reality" there
[14:38:28] but the first thing we are going to do in the future
[14:38:40] is to move pooling/depooling status to etcd
[14:38:56] in any case, do we care about master/slave?
[14:39:13] because shards do not change easily, it mostly requires a reimage
[14:39:33] which, as you may understand, is like starting with a new server
[14:40:03] I think I have never changed the shard of a host (I may do it now for the first time, after the latest server purchase)
[14:40:17] mhh yeah, I wanted to ask you about the case of multiple shards on a single mysql instance, what should the monitoring report in that case? if we take what puppet says it'll only ever be one shard
[14:40:26] If it is a mere "grouping" à la ganglia
[14:40:37] we should just use puppet
[14:40:51] godog, in the ideal world
[14:41:12] the percentage of the monitoring that depends on the databases of the shard
[14:41:21] in reality, maybe nothing
[14:41:28] as in, maybe the grouping should be
[14:41:37] "core production"
[14:41:47] and not include multi-source
[14:41:59] and the "others" separately
[14:42:04] they have different roles
[14:42:10] so it is easy to separate them
[14:42:25] after all
[14:42:30] look, if I check monitoring
[14:42:37] I want to see the load on core production
[14:42:51] I do not care if labs replicas have more inserts from production
[14:43:21] that is me as a user
[14:43:25] I see, though for example for alerting different shards
[14:43:31] might have different patterns
[14:43:37] yes
[14:43:43] if there is lag on analytics
[14:43:52] what I mean is that it is easy to split things and aggregate them later, much harder to do the opposite
[14:43:57] we do not care (it happens due to the long-running queries)
[14:44:00] it doesn't page
[14:44:12] but core production does, it affects users directly
[14:44:28] look at https://tendril.wikimedia.org/activity?research=0&labsusers=0
[14:44:44] there are 4 filters: wikiuser
[14:44:50] "regular application activity"
[14:45:01] wikiadmin: administration and long-running queries
[14:45:08] labs (labs queries)
[14:45:16] and analytics (analytics queries)
[14:45:23] so rather than shards
[14:45:33] maybe the groups should be something like:
[14:45:41] core production-s1
[14:45:43] core production-s2
[14:45:47] *etc
[14:45:54] labs
[14:45:58] analytics
[14:46:02] misc-m1
[14:46:06] misc-m7
[14:46:17] (sorry, we only have 5 of those)
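A sketch of how the grouping just proposed (core-sN, labs, analytics, misc-mN) could end up as Prometheus file-based service discovery targets. The JSON layout is the standard file_sd format; the host names, the "group" label name, and the hard-coded mapping are illustrative only — in practice the mapping would come from the puppet roles discussed next.

```
import json

# Hypothetical host -> group mapping, following the naming proposed above
# (core-sN, labs, analytics, misc-mN). In practice this would be derived
# from the puppet roles rather than hard-coded here.
HOST_GROUPS = {
    "db2016.codfw.wmnet": "core-s1",
    "db2017.codfw.wmnet": "core-s2",
    "dbstore2001.codfw.wmnet": "analytics",
    "db2011.codfw.wmnet": "misc-m1",
}

EXPORTER_PORT = 9104  # mysqld_exporter default


def file_sd(host_groups):
    """Build the list-of-target-groups structure used by Prometheus file_sd."""
    groups = {}
    for host, group in sorted(host_groups.items()):
        groups.setdefault(group, []).append(f"{host}:{EXPORTER_PORT}")
    return [
        {"targets": targets, "labels": {"group": group}}
        for group, targets in sorted(groups.items())
    ]


if __name__ == "__main__":
    print(json.dumps(file_sd(HOST_GROUPS), indent=2))
```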
[14:46:34] I see, is that in puppet?
[14:46:36] those are roles
[14:46:43] mariadb::core
[14:46:52] $shard='s1'
[14:47:25] we can add exports to those
[14:47:31] at will
[14:47:56] I think that is the easiest
[14:49:16] yup that seems the easiest, one more question, that would still leave e.g. the db1047 case with only one shard
[14:49:32] db1047 - analytics
[14:49:32] even though ATM it is in two, maybe that doesn't matter given what you just said
[14:49:53] we do not care about shards there too much, it is not production
[14:50:02] it may get more complex in the future
[14:50:13] because I am probably going to add multi-instances
[14:50:21] in production
[14:50:26] but forget about that for now
[14:50:29] not a thing now
[14:50:36] db1069 is the exception
[14:50:56] yeah multiple mysql instances on a machine isn't a big deal
[14:51:04] from monitoring that
[14:51:06] that is
[14:51:20] let's say that non-core production
[14:51:34] doesn't really need too fine-grained replication
[14:51:47] enwiki master and slaves do
[14:52:19] (a third of the total traffic goes through it)
[14:52:44] in fact, I would not bother with non-core for now
[14:53:07] let's focus on mariadb::core hosts, which all have their $shard
[14:53:27] which make up more than 50% of the hosts
[14:53:52] those are all on 3306 and all have 1 instance
[14:54:08] (just keep in mind that will not always be true for the rest)
[14:54:59] ok that seems like a good starting point, so for the ::core hosts we can generate the config and it'll always be one host <-> one shard
[14:55:07] err, one mysql instance rather than one host
[14:55:37] all that is relatively clean on: https://phabricator.wikimedia.org/diffusion/OPUP/browse/production/manifests/site.pp;d1afdf03a0977127a945dda635eb0717841ab4f0$421
[14:56:03] so we'd be aggregating per-shard, and see aggregates for all ::core
[14:56:04] so if you need to export anything, we just need to change one role (role::mariadb::core)
[14:56:14] that is ok
[14:56:25] so, if there is an issue with one shard
[14:56:31] it will be seen on the aggregates
[14:56:37] independently of which hosts are pooled
[14:56:43] that is my use case #1
[14:56:58] "too many inserts on s4"
[14:57:19] "selects have gone wild on s5"
[14:57:30] (always only considering core)
[14:58:22] so, next steps
[14:58:24] yup, I was asking about a single instance in multiple shards because that'd mean that when aggregating for core with all its shards, there would be double counting
[14:58:40] yes
[14:58:51] do we care for now?
[14:59:14] or will we just not use shards for the multi-source ones?
[14:59:50] e.g. labs will just be all labs hosts together
[14:59:51] ah I see what you mean, heh for data from puppet I don't think there will be a problem because one host is in only one shard
[15:00:02] yes, for core yes
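To make the double-counting concern concrete, a toy aggregation with made-up numbers: a multi-source instance listed under several shards contributes once per shard, while a mariadb::core host with a single $shard is counted exactly once. The instance names and values below are invented for illustration.

```
from collections import defaultdict

# Toy data: (instance, shards it is listed under, some per-instance counter).
# Core hosts carry exactly one shard; a multi-source host like dbstore1001
# would appear under several. Numbers are invented for illustration.
SAMPLES = [
    ("db1052:3306", ["s1"], 1000),
    ("db1053:3306", ["s1"], 900),
    ("db1018:3306", ["s2"], 400),
    ("dbstore1001:3306", ["s1", "s2"], 300),  # multi-source
]


def sum_by_shard(samples):
    totals = defaultdict(int)
    for _instance, shards, value in samples:
        for shard in shards:  # one contribution per shard listed
            totals[shard] += value
    return dict(totals)


if __name__ == "__main__":
    print(sum_by_shard(SAMPLES))
    # {'s1': 2200, 's2': 700} -- dbstore1001's 300 is counted under both s1
    # and s2, so summing the shards back together would see it twice.
    # Restricting the shard label to ::core hosts (one shard each) avoids this.
```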
[15:00:16] dbstore, analytics, etc.
[15:00:22] is when things get weird
[15:00:28] so we have to have it present
[15:00:41] if you go back to https://tendril.wikimedia.org/tree
[15:01:00] yup
[15:01:07] you will see dbstore1001 in almost all shards, but it is not "core"
[15:01:35] it would be nice to have per-shard stats, though
[15:01:46] but that will probably need package changes
[15:01:57] to support mariadb multi-source replication
[15:02:35] it is on puppet, though
[15:02:50] see: https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?search_string=dbstore1001
[15:03:06] so all this chaos, it is ordered
[15:03:17] nothing is manual
[15:03:34] but yes, those are 42 icinga checks for a single host
[15:03:38] :-)
[15:03:55] hehehe
[15:04:20] so, do we export the shards for mariadb core?
[15:04:32] anything else that would be needed for a first phase?
[15:04:46] I would start by installing the agent on only a few hosts
[15:05:01] yeah core with shard information is good for the goal I think
[15:05:16] so, we do not need that for the goal
[15:05:22] the review for mysqld_exporter is https://gerrit.wikimedia.org/r/#/c/296385/ btw
[15:05:23] but I want to have that present
[15:05:28] so that, at a later time,
[15:05:36] it is possible
[15:05:54] so as not to work too much on something we cannot do in limited time
[15:06:04] but let's design with that in mind
[15:06:26] so, what I will do is apply the new account to all core servers only
[15:06:32] in case we change the grants later
[15:06:35] yeah, I think the distinction core/labs/analytics etc is good, each optionally with shard info
[15:06:53] we do not even need like 2 levels
[15:07:04] it could even be core-s1, core-s2, labs
[15:07:25] whatever is easier
[15:07:50] then include it on what, all codfw hosts?
[15:08:04] yeah all codfw seems a good start
[15:08:08] how do you see the deployment, does it need to be very gradual?
[15:08:22] do we need to tune the options on a single host first?
[15:08:39] you may have more experience with the tests you did
[15:08:52] please share any thoughts
[15:09:08] sure, on the mysql server side I don't think we've seen much impact from the tests
[15:09:15] well
[15:09:24] I am more concerned about overloading the backend :-)
[15:10:15] let me count production core dbs on codfw
[15:10:15] heheh yeah that we'll have to test, we can do a gradual rollout by generating the config only for a subset of hosts, but have the agent deployed everywhere already
[15:10:36] ah, so we can install it disabled, maybe?
[15:10:40] how?
[15:10:54] if you install it but nothing asks for metrics over http it isn't going to do anything
[15:11:08] so it is a poll model?
[15:11:08] i.e. it runs queries only when asked
[15:11:13] great
[15:11:25] then we can roll less carefully
[15:11:35] allow me to roll it in codfw only first
[15:11:38] anyway
[15:11:50] and we can start polling it as needed
[15:11:55] yup, since it is all polling; that's why I wanted to start right away with max user connections
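On the poll model just mentioned: the exporter only touches MySQL when something scrapes it over HTTP, so installing it everywhere while only polling a subset is safe. A stripped-down illustration of that pull model, assuming nothing about the real mysqld_exporter internals — the status collection is a stub and the port and metric names are placeholders.

```
from http.server import BaseHTTPRequestHandler, HTTPServer


def collect_status():
    """Stand-in for querying the mysqld instance (e.g. SHOW GLOBAL STATUS).

    In a real exporter this is where connections and queries happen, which is
    why nothing hits the database until a scrape arrives.
    """
    return {"mysql_up": 1, "mysql_global_status_threads_connected": 3}


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        # The query work happens here, per request -- the poll model.
        body = "".join(f"{name} {value}\n" for name, value in collect_status().items())
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(body.encode())


if __name__ == "__main__":
    # Listen on a placeholder port; no queries run until a scraper polls it.
    HTTPServer(("", 9104), MetricsHandler).serve_forever()
```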
[15:12:37] how does authentication work?
[15:12:49] there is no authentication?
[15:13:09] if yes, why not limit the firewall to a single ip?
[15:13:22] or a couple, whatever works
[15:13:44] yup we could do that too, I believe there's already a list of monitoring hosts
[15:14:12] to answer your question, no, by default there's no auth to ask for metrics, though it is all http so relatively straightforward to add
[15:14:13] the last thing I want is exhausting the number of connections
[15:14:24] which is why I now understand your worry about that
[15:14:34] I will definitely add it before deploying it
[15:15:03] ok! yeah better safe than sorry
[15:15:14] so, what I need is like a plan now
[15:15:29] now that I finally have time to work on this
[15:15:54] what I am going to do now is finish the account thing for core only (for now)
[15:16:11] then we roll out the agent
[15:16:22] but in a secure way, with the firewall up
[15:16:29] as limited as possible
[15:16:50] then we have to finish deploying the server
[15:17:00] have you tested the script?
[15:17:17] which script in particular?
[15:17:27] the one generating the list of hosts
[15:18:03] yup, I did test it, it isn't pretty heh but it works
[15:18:10] it is ok
[15:18:32] but maybe it needs tuning to limit the list to codfw or a smaller list for now?
[15:18:47] so we do not start pooling all hosts?
[15:19:29] *polling
[15:19:29] what am I forgetting?
[15:20:15] how do we handle the groups?
[15:20:33] which groups?
[15:20:44] the shards
[15:20:53] or do we start with basic monitoring first?
[15:21:05] I do not know, I am asking: what do you want to do?
[15:22:15] ah you mean basic monitoring for the host-level metrics?
[15:23:16] my question is, what do you want me to focus on immediately to help the goal?
[15:24:30] I think the thing to focus on is getting the mysql agent onto db hosts in codfw
[15:24:51] and next export shard info from ::core and generate the config from that
[15:25:01] ok, that's easy
[15:25:26] I will do that, and let's stay in contact
[15:25:35] while we're at it, to also help with the goal, it'd be nice to have the host agent going too
[15:25:42] essentially the same thing I did in https://gerrit.wikimedia.org/r/#/c/299970/
[15:26:04] what is the name of the hosts?
[15:26:31] of the prometheus hosts? prometheus*
[15:26:35] ok
[15:26:37] thanks
[15:26:53] np, I'll provision the hosts in eqiad too shortly, they are only in codfw atm
[15:27:19] that is good to know
[15:28:56] indeed, thanks for taking the time, jynus! feel free to send reviews my way
[15:29:06] same
[15:39:35] jynus: for example, I used the socket in https://gerrit.wikimedia.org/r/#/c/296385 but a better idea might be host:port, considering also multiple instances in the future
[15:40:44] no
[15:40:49] socket is preferred
[15:40:56] especially if you disable ssl
[15:41:14] every instance will have its own socket, in case there are several instances
[15:41:24] it just has to be configurable in some way
[15:42:01] ah, that's ok already then, it defaults to /tmp/mysql.sock
[18:06:15] DBA, Operations, ops-eqiad: dbstore1002 disk errors - https://phabricator.wikimedia.org/T140337#2481228 (Cmjohnson) Rebuild is extremely slow.... Rebuild Progress on Device at Enclosure 32, Slot 6 Completed 45% in 391 Minutes.
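Picking up the socket exchange above (socket preferred over host:port, one socket per instance, /tmp/mysql.sock as the default): a sketch of deriving a per-instance connection string. The Go-driver-style unix-socket DSN is the shape DATA_SOURCE_NAME usually takes for mysqld_exporter, but the multi-instance socket paths and the account name here are assumptions.

```
# Sketch: build a DATA_SOURCE_NAME per instance, preferring the unix socket.
# Multi-instance socket paths and the account name are hypothetical; only the
# /tmp/mysql.sock default comes from the discussion above.

DEFAULT_SOCKET = "/tmp/mysql.sock"

# instance port -> socket path; multi-instance hosts get one socket each
SOCKETS = {
    3306: DEFAULT_SOCKET,
    3311: "/run/mysqld/mysqld.s1.sock",  # hypothetical per-instance layout
    3312: "/run/mysqld/mysqld.s2.sock",
}


def data_source_name(port, user="prometheus", password="secret"):
    """Go mysql driver style DSN over a unix socket, as commonly used by mysqld_exporter."""
    socket = SOCKETS.get(port, DEFAULT_SOCKET)
    return f"{user}:{password}@unix({socket})/"


if __name__ == "__main__":
    for port in sorted(SOCKETS):
        print(port, data_source_name(port))
```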