[07:44:06] morning [07:47:43] hi [07:51:06] there has been high load on s6 for a while [07:52:06] I'm missing a server there [07:52:37] ok, I see also something happened to db1034 tonight (s7 watchlist & co) around 22 and midnight, some lag and spikes in wfLogDBError [07:53:46] that is normally a deployment [07:54:22] on one slave only? [07:55:10] I would bet 1034 is a watchlist slave [07:55:37] yes, see above ;) [08:00:48] SpecialRecentChangesLinked::doMainQuery [08:01:03] and ApiQueryContributions::execute [08:01:41] also ApiQueryRevisions::run seems, but it's hard to tell which one is the cause and which one the effect [08:02:03] SpecialRecentChangesLinked is a relativelly new functionality [08:02:22] the others are bad, but bad known [08:03:50] and it is mostly a msn bot scaning those, taking 30-40 seconds [08:10:04] did you see my replies on https://gerrit.wikimedia.org/r/#/c/287144 ? [08:17:06] yes, I thought I had +1 that, I must have get it confused with another ticket [08:18:32] what are the procedures for restarting the postgres instances running on labsdb100[67]? [08:18:56] I think I didn't because we actually have misc on both datacenters [08:19:19] so we actually can already setup the same thing than core [08:19:40] ok, but we switchover them based on $mw-primary? cannot happen that we need or want to switchover them alone? [08:20:17] then the whole thing is wrong [08:20:46] and master should not depend on that [08:21:04] talk to giussepe [08:21:16] and expose the issue [08:21:31] what happens if we want to failover only s1? [08:21:37] or not misc [08:21:41] or only misc [08:23:18] agree but s1 is part of MW and used by MW, the various misc are used by different softwares unrelated to MW AFAIK [08:23:58] then the TODO is wrong [08:25:05] I talked to faid*n yesteday, we should have a point of entry [08:25:31] and then run pt-hearbeat and all clients based on that [08:25:54] why the TODO is wrong? assuming we have misc-foo that is used by software bar and baz, as long as we have some variable that decide on both DB and bar/baz software config who is the master should be ok [08:25:59] what do you mean point of entry? [08:26:04] that is actually mostly setup (proxies) [08:26:35] which means pt-heartbeat should be outside of the node [08:26:41] in those cases [08:26:52] and update using the proxy [08:27:44] so, not run pt-heartbeat localy, but remotely against the proxy [08:27:54] could also just be DNS [08:27:59] no [08:28:11] we do not do load balancing using dns, ever [08:28:16] or failover [08:28:50] DNS actually points statically to the proxys right now [08:29:08] eg. ssh m1-master.eqiad.wmnet [08:29:50] my suggestion, change the todos to "move pt-heartbeat outside of just one host" [08:29:52] and commit [08:30:31] ok [08:31:13] there is something else [08:31:34] read_only would need changing too [08:31:44] add also a comment there [08:32:11] and let's bring the question to the ops session [08:32:33] how to handle failover for microservices like those [08:32:42] a variable per server? [08:32:47] a varaible per service? [08:33:02] (imagine we want to failover etherpad but not puppet) [08:34:16] it will be acceptable for low traffic services that for a short period of time the software runs on eqiad but the DB is in codfw? [08:34:36] I do not know [08:34:55] I asked to discuss that, but for obvious reasons, it is low priority [08:35:06] but I made sure in the last session to mention it [08:35:14] (i think you where there?) [08:35:27] what happens with blog, puppet, etc (misc) [08:35:39] (2 days ago not, I was unable to make it unfortunately) [08:35:45] it is ok [08:35:54] question is probably nobody has answers [08:36:02] yet [08:36:20] in some cases there is not even application servers for those on codfw [08:36:33] so let's leave it at least as clean as possible [08:36:52] ok [08:36:59] I wonder about read only [08:37:22] should we leave it as it is now on core, read only on? [08:37:32] and manually change it? [08:38:06] what is the least dangerous option: forgetting to setting it on, or leave it on by accident? [08:38:33] (this was not thought before because there use to not be 10 masters) [08:39:05] from one side given that puppet is not real-time should not handle who is the master, from the other side on disk config should reflect runtime one [08:39:19] yeah :-) [08:39:24] same issue as always [08:39:41] we also don't start mysql on crash/reboot, hence in case of a restart a manual intervention is needed anyway [08:40:01] having read_only=1 might cause some additional errors if left by mistake on a master [08:40:33] my consideration: you chose, but leave a comment on read only [08:40:51] and explicitly use it on mariadb::config [08:40:58] so at least has visibility [08:41:15] I think is more dangerous having it RW by mistake, means data corruption [08:41:18] ok [08:41:23] "e.g. start always in read only mode" [08:41:24] or [08:41:46] "read only mode off if it is the master of the primary datacenter" [08:42:09] both on misc and core [08:42:21] in mariadb::config is a variable, so I should add the comment where it's called [08:42:36] anywhere it is ok [08:43:02] but set it explicitly still on mariadb.pp [08:43:21] :-) [08:43:56] rebase also because "$replication_is_critical = ($::mw_primary == $::site)" disappeared [08:44:04] you may find conflicts [08:45:20] already rebased, so far no conflicts, I'll check it [08:47:16] 👍 [08:59:02] moritzm, did you ask alex, he rebooted some postgres instances recently, so they may not be needed again [09:01:26] ok, thanks. I'll contact him [09:20:12] so the $read_only is used only for misc and others, because for coredb in production.my.cnf.erb we have hardcoded read_only = on with a comment already [09:21:43] great! :-) [09:22:05] that is also funny: # Master selection will be handled by cluster control. [09:22:08] :D [09:22:11] yeah [09:22:17] aka a DBA manually [09:22:28] I wonder, did they have CC at some point? [09:22:48] or it is a in-house software I have never heard about? [09:23:25] or was it copied from somewhere else? [09:24:15] git log | grep -i "cluster control" doesn't return anything :) [09:25:45] in any case, if GTID is deployed, it will make all slave issues disappear [09:26:03] no more scripts for slave failover [09:26:13] just "change master" [09:27:33] problem is we need to enforce avoiding out-of-band changes, or those happen on a different domain [10:08:31] CR updated, I had conflicts only on site.pp for the decommissioned es200* [10:15:38] for https://gerrit.wikimedia.org/r/#/c/287394/ instead I have 2 options: add a generic $master to mariadb::config parameters or keep being specific adding something like $quick_thread_pool [10:26:13] anything is ok to me, I just didn't want too much parameters on site.pp [10:26:30] in fact, I want to get rid of p_s and some other soon [10:26:45] make it default only depending on the type of node [10:26:53] s/too much/too many/ [10:27:21] once they are black boxes, the implementation details do not worry me much [10:28:01] I think the SSL one , which was only temporary until all nodes were restarted [10:28:48] so feel free to ignore me, and just commit, I can be annoying at times [10:29:07] (it is your fault for asking!) :-P [10:29:11] lol [10:29:15] I always nitpick [10:29:19] but it is not productive [10:29:41] let's get s*** done, assuming it is better than what we had before [10:30:36] Hi, I searching the wikipedia MySQL statistics (Queries, Threads and users), this [10:30:46] is for calculate the scalability of the site :) [10:31:09] Nux__, sadly, we do not have a lot public (yet) [10:31:24] but I created a summary some months ago [10:32:20] Nux__, https://www.mediawiki.org/wiki/File:MySQL_at_Wikipedia.pdf [10:32:53] for more accurate stats about users, requests, you should relay on Analytics stats [10:34:45] I think there is a #wikimedia-analytics for that [10:35:40] There is a more or less live QPS per server on https://dbtree.wikimedia.org/ [10:42:03] jynus, thanks for your help and sorry for my late response I am not very good in english :), I have seen this PDF he helped me to see all database technologies used in wikipédia and to see graphical measures in ganglia interface. [10:44:51] we will have better publicly available stats soon [12:26:05] jynus, volans: FYI: I'm setting kernel.unprivileged_bpf_disabled=1 on the Linux 4.4 systems (and there's now a dozen jessie db servers like that with the recent reimaging) as a hardening measure (https://lwn.net/Articles/660331/ for background, triggered by the recent https://bugs.chromium.org/p/project-zero/issues/detail?id=808). this will not have any impact on mariadb (the feature is very new in the Linux kernel and I doublechecked that [12:26:06] the mariadb codebase doesn't use bpf() [12:42:10] I do not even know what bpf is [12:42:59] is it for network? [12:43:03] https://en.wikipedia.org/wiki/Berkeley_Packet_Filter [12:43:06] ;) [12:45:04] moritz, go ahead and enable it live (I suppose you have already puppetized it on boot) [12:45:32] ok, will do. I'll puppetise it afterwards, so that it gets activated on all boots [12:46:15] unrelated, I have some issue with the firewall, but I need to debug to report it [12:50:16] ok, ping me if I should have a look [12:50:51] not yet, low priority because it does not affect production mysql, I will get back to you at some point [12:51:00] ok [12:51:08] do you have a one liner to enable /disable login? [12:51:13] *logging [12:52:02] a four-liner: [12:52:04] I can check it, just I suppose you have iptables command line more fresh and trottling, etc [12:52:13] iptables -N LOGGING [12:52:14] iptables -A INPUT -j LOGGING [12:52:16] iptables -A LOGGING -m limit --limit 2/min -j LOG --log-prefix "iptables-dropped: " --log-level 4 [12:52:18] iptables -A LOGGING -j DROP [12:52:24] great! thanks [12:52:37] dropped packets are then logged to syslog with the prefix above [12:52:47] much appreciated [12:52:57] and also thanks for the heads up [12:54:14] np [12:54:22] hey jynus, got a minute to talk about db refresh? [12:55:05] dbstore refresh, specifically :) [12:55:37] actually, 1 minute ues, but I have an interview in 5 minutes [12:55:40] :-) [12:55:49] heh, ok, we can catch up in an hour [12:56:19] paravoid, to clarify, I like to talk a lot [12:56:25] just get the last line [12:56:30] ignore the rest [12:56:37] and probably we do not have to talk anything [12:56:41] am I right? [12:57:03] lol [12:57:16] volans can confirm I like to talk/write too much [12:57:24] I'd like to discuss things a little bit nevertheless [12:57:29] maybe we can do better than that [12:57:30] that is ok [12:57:41] than the 3 hosts you requested [12:57:55] btw, you may have misunderstood my 7-10T comment, I think [12:58:09] to be fair, I expected your answer, I literally said [12:58:16] "I need the coredbs" [12:58:23] "I want the dbstores" [12:58:24] I didn't challenge that number, I just quoted you [12:58:42] this quote is for 25.6T raw capacity per server [12:58:51] that is like 12 TB [12:58:54] yup [12:58:56] (sorry, on call) [12:59:16] go, we can catch up later [14:09:12] back [14:11:34] great work, volans [14:12:03] thank you for slowly fixing those technical debts [14:12:22] I hope we can get rid soon of the latest 5.5 nodes [14:12:35] and finally clean up puppet [14:12:40] hopefully :) [14:12:53] some cleanup coming soon, I was just doing it [14:12:54] I will have to ask if phab can run on 10 [14:13:07] as well as other services on misc [14:13:36] I predict that may take a bit because I may have to negotiate with a lot of end users [14:13:54] paravoid, I am available now, but ping me at any time [14:36:35] proposal https://gerrit.wikimedia.org/r/#/c/288195 - I'd like some nitpick ;) [see the comment below too] [14:40:38] actually, I like the idea [14:41:39] * volans sad, no nitpick :-) [14:42:47] because we may want slave plugin on masters [14:43:00] and other combinations [14:44:47] my worry is that, at some point, we will need to redo the configuration class [14:45:00] to be both clear AND flexible [14:45:08] and I do not know if that is possible [14:46:32] jynus: hey [14:46:37] not easy [14:47:00] paravoid, let's talk if you want [14:48:05] yup [14:48:11] ok, so [14:48:12] volans, can you have a look if we can reasonably drop p_s or any other parameter [14:48:13] first point [14:48:28] so it has a production default for core? [14:48:29] jynus: p_s already dropped, see last one [14:48:32] you said 7-10T needed reasonably [14:48:37] yes [14:48:39] that's 6T for current usage, plus growth, right? [14:49:17] it is difficult to say, because we do not have such a space, and we are (mis)using compression, using 4 TB [14:49:35] right, because we're using toku now [14:49:41] got it [14:49:42] 7-10 is my calculation going back to the working InnoDB/InnoDB compressed [14:49:50] so compression, but less [14:49:56] ok [14:50:02] growth for how long? [14:50:14] (hard to estimate, I realize) [14:50:23] but augmenting operational speed of copying data between servers [14:50:35] I had a look at current labs usage [14:50:40] in the case of labs [14:50:51] so that is labsdb1001 [14:51:49] and that is 5TB compressed [14:52:32] how's the rate of change? iow, how long would e.g. 10T lasts us for? [14:52:35] rough estimate [14:52:47] so that makes ~7 + 1 (for user dbs) [14:53:08] 6TB for production is assuming InnoDB compression works [14:53:15] now, the rate of growing [14:54:51] is 1TB/year [14:55:14] but, there are some schema changes pending that worries me [14:55:35] I have also some ideas to reduce the storage, but I cannot put a date to them in less than 3 years [14:56:01] so 4 years of life estimated with 12 productive TB [14:56:21] productive means after RAID + overhead [14:56:55] I agree with dbstore in general, and I can get by if I can reuse some old out-of warranty servers [14:57:03] how did we go from 7-10 to 12? :) [14:57:15] 7 + 1 = 8 [14:57:29] 8 + 4 = 12 in 4 years [14:57:39] the positive thing is that there are a lot of free slots where we could add drives later, but this requires to rebuild the RAID so quite time consuming [14:57:50] that is transparent [14:57:56] I would not worry about that [14:58:18] what I mean is, I want that capacity (I do not care about 10 or 12 TB) [14:58:25] but I do not need it now [14:58:31] that is why I want to wait [14:58:56] we haven't budgeted that for next FY either though, right? [14:59:05] the other option is to buy just the disks [14:59:12] wait, for which systems? [14:59:27] depends on the budget [14:59:47] dbstore1001/1002 are not ready to be replaced, these are fairly new systems [14:59:47] but you told me not to worry about budget, that it was your or mark's problem [14:59:58] (but we can talk about upgrading them with more space) [15:00:07] the Ciscos need to go [15:00:11] that is exactly what I want [15:00:18] I said read just the last line [15:00:24] but 3 new labs [15:00:32] and disks for dbstore, as much as we can [15:00:50] labs with how many space? 16x1.6T? [15:00:51] but I need to get rid of tokudb, and need the extra space [15:01:28] it depends on how much you want to future proof [15:02:02] with 8.5 TB we would have a 90% disk [15:02:16] and as I said, I predict 1TB growth per year [15:02:24] ok, that's useful info [15:02:43] that is just looking at graphana, it is not new [15:03:03] well, and counting each separate InnoDB host, which I did beforehand [15:03:26] what about analytics/dumps? [15:03:37] let me tell you about dumps [15:03:39] you say "will have to wait", but I'm not sure what you mean by that [15:04:05] dumps do not have issues (Ariel will probably want faster machines) [15:04:12] I'm also not sure if these are budgeted for next FY (I'd have to check), which may mean replacing them the earliest in 14 months from now [15:04:14] but the issue is that we are consolidating [15:04:23] so buying less machines [15:04:28] but more powerful [15:04:35] I hope you are not against that [15:04:53] right now, we have 7 machines alsmost enterily dedicated to dumps [15:04:55] why would I be? :) [15:05:13] well, I did not present you my plan, you just may not be aware [15:05:24] I did told mostly to rob and mark only [15:05:53] so buying 7 db-grade hosts (or 14, if you have into account codfw) [15:06:01] was a no go for me from the starty [15:06:20] so I want to concentrate dumps to only 2 machines, but obviouly more expensive [15:06:34] more expensive per server, I assume [15:06:38] yes [15:06:39] is it overall more expensive? [15:06:43] I'd assume no [15:06:49] I almost sure it isn't [15:06:55] but has different disk needs [15:06:59] alright [15:07:03] more disk, basically [15:07:10] so technically, I am "saving money" [15:07:20] just 7 -> 2 hosts [15:07:22] yeah [15:07:29] analyics [15:07:35] is like dying [15:07:51] db1047 has a broken RAID (battery) [15:08:08] basically, they never had dedicated machines, they were given spares [15:08:22] db1046 is not too much better [15:08:25] dbstore is ok [15:08:33] agree with you [15:08:40] db01046/1047 are slated for a replacement, yeah [15:08:44] so we need to fit them in this budget I think [15:09:20] question is as it is not a 1:1 replacement, it is not clear the path, as I am concetrating some services [15:09:38] which ones are the dumps mysqls? [15:09:41] the final goal is to have only 3 hosts* per shard [15:10:07] they do not have special naming, they are 7 of the on mediawiki, they are called "dump" role [15:10:49] https://noc.wikimedia.org/conf/highlight.php?file=db-eqiad.php [15:10:54] yup, got it [15:10:59] db1053 [15:11:12] which I will not decom, just repurpose it for other role [15:11:19] db1021 [15:11:21] etc. [15:11:45] so db1050-db1073 will still be active [15:11:59] but we have to substitute ~50 servers [15:12:18] for aprox 7x3=21 + dumps + analytics [15:12:28] show around half the number of nodes [15:12:49] but less expensive overaly, AFAIK [15:13:17] actually, the first 22 have already been bought [15:13:42] it is dumps analytics labs and (what else?) that were missing [15:14:10] I think you missed the background (understandably) [15:15:00] maybe I will let you think and we can comment it on ticksts? [15:15:44] but there are hidden costs that I wanted to bring out [15:15:57] tokudb maintenance, which I want it to disappear [15:16:16] and less servers == less operational overhead [15:17:13] I'm not disputing or challenging any of that fwiw [15:17:27] I'm trying to fit as much as I can given our budget constraints [15:17:31] sure [15:17:37] the exercise is just shuffling $ around [15:17:40] I just wanted also to present the needs [15:17:43] given the constraints given to me last week :) [15:17:55] and of course I understand there are constraints [15:18:09] I was happy already we could bought the core servers [15:18:21] we had space issues before [15:18:30] now there is time to refect [15:18:43] so I am happy [15:19:20] I know I am also being in some cases pesimistic [15:19:53] but it wouldn't be the first time someone deploys half a terabyte of redundant data on preoduction in less than a week :-) [15:21:52] performance-wise, I do not care much, it is disk that only worries me, it is the last think I want to think about [15:26:03] so how much extra disk space do we need for dbstores? [15:26:15] dbstores, let me see [15:26:24] the base is 7.5, as labs [15:26:38] but let me see the growth there [15:27:03] ah, you mean like buying disks in addition, or new disks in total? [15:27:10] I suppose it doesn't matter [15:27:25] let me see a number [15:28:27] growth is 200 GB per shard [15:28:31] and per year [15:28:41] but give me a minute to confirm that [15:29:11] you said to buy new disks for them, how many is what I'm asking :) [15:29:31] 300 GB for s3 [15:29:40] so that makes, in average [15:30:25] mmm, 1.5 TB per year, more than I initially expected [15:30:54] 7.5 now + 1.5TB per year, how many years, 4? [15:31:32] that is 15, lets lower it back to the 12 with compression [15:32:21] for this depends how old are the servers, if you'll replace them in 2y not worth to calculate 4y growth probably [15:32:38] 2 years, lol, volans [15:32:52] well he's right, dbstores are slated to be replaced in Feb 2018 [15:33:21] that will be part of FY 17-18 budget [15:33:54] I didn't have into account the backups, but we could always use the decommed es2 hosts [15:34:09] for what? [15:34:22] oh, you mean all these figures do not include backups, ok [15:34:45] dbstores are also the hosts where backups are being held until bacula retrieves them [15:34:55] so they need headroom [15:35:06] ah [15:35:06] and that is why they are at times at 90% usage [15:35:35] but backups are just 750GB [15:35:39] "just" [15:35:58] I could move them IF i could reuse some decommed machines [15:36:08] so, not having those into account [15:36:26] it would be moving from 6TB -> 12 TB [15:36:31] I could work with that [15:36:52] that's usable, right? [15:36:55] yes [15:37:10] not having into account RAID, etc. [15:37:12] so 8x1.6T per dbstore? [15:37:58] yes, there is like a 1TB of overhead in RAID, llvm, filesystem root fs, etc. [15:38:06] mostly actually round errors [15:38:18] TB vs. TiB [15:38:55] however, I am unsure about compatibility and slots [15:39:04] that should be checked [15:40:06] please do [15:40:21] right now they have 1 TB drives [15:40:41] how are you going to expand the space? [15:40:45] dbstore1001 looks like has 12 SFF 1TB drives [15:40:48] 12 of those [15:40:49] just expanding the RAID? [15:41:10] well, I did not think yet about it, as it was not my first option [15:41:24] they are also SAS drives [15:41:57] and they are under warranty [15:41:57] Note: Spanned virtual disks such as RAID 10, 50, and 60 cannot be reconfigured. [15:42:00] so that is a complication [15:42:19] from the vendor's raid controller [15:42:22] website [15:42:53] paravoid, my wish is to prepare an alternative plan with the help of robh and chris [15:43:01] and I can go back to you [15:43:01]