[09:26:18] I have not upgraded mysql, I don't want the master to be running a higher version than the slaves, just restarted mysql
[09:28:33] DBA, Cloud-Services, Patch-For-Review: cloudvps: eqiad1: move nova db to m5-master - https://phabricator.wikimedia.org/T202549 (jcrespo) For the future, this information should be on the request summary on top- as it is vital, not on a hidden comment that will be easily lost. Clear request will be pr...
[09:28:55] oh
[09:29:48] that is a vim fail
[09:38:50] there were connection errors on labswiki at 8:53
[09:40:57] network issues maybe?
[09:41:51] https://grafana.wikimedia.org/dashboard/db/mysql?panelId=10&fullscreen&orgId=1&var-dc=eqiad%20prometheus%2Fops&var-server=db1073&var-port=9104&from=1535100106220&to=1535103706220
[09:42:04] I don't know, there are also stable errors^
[09:42:26] like one every minute or so
[09:44:32] https://grafana.wikimedia.org/dashboard/db/mysql?panelId=10&fullscreen&orgId=1&var-dc=eqiad%20prometheus%2Fops&var-server=db1073&var-port=9104&from=now-7d&to=now
[09:45:18] yes, but the base errors shouldn't happen either
[09:45:29] yeah, I know
[09:45:44] we should limit each account to 50 connections
[09:45:44] Just saying it is not something that has happened just now, more that it is something on-going
[09:45:58] MAX_USER_CONNECTIONS
[09:46:04] it will not fix the issue
[09:46:16] but it will make wikitech not fail, etc.
[09:46:29] yeah, it is always wikiuser as per the logs
[09:46:43] (making it the nova maintainers' problem, not ours)
[10:00:33] I want to focus now on the backups, I need to give them some love
[10:01:04] give them love!
[11:09:16] DBA, Cloud-Services, Patch-For-Review: cloudvps: eqiad1: move nova db to m5-master - https://phabricator.wikimedia.org/T202549 (aborrero)
[11:10:08] DBA, Cloud-Services, Patch-For-Review: cloudvps: eqiad1: move nova db to m5-master - https://phabricator.wikimedia.org/T202549 (aborrero)
[11:10:46] DBA, Cloud-Services, Patch-For-Review: cloudvps: eqiad1: move nova db to m5-master - https://phabricator.wikimedia.org/T202549 (aborrero)
[12:01:00] DBA, Schema-change: Drop externallinks.el_from_namespace on wmf databases - https://phabricator.wikimedia.org/T114117 (Bstorm) Done!
[14:21:09] fyi, I just responded to T199501 -- besides the cost considerations, there are new issues identified with the H740P that you may want to be aware of
[14:21:38] checking
[14:22:12] TL;DR is that Linux 4.9 doesn't support this controller, so we need to either backport a newer version of the megasas driver to 4.9, or use a newer kernel
[14:22:24] moritzm has been working on the former, see https://phabricator.wikimedia.org/T199125#4529755
[14:22:47] Oh wow
[14:22:48] the quote for those servers includes this controller, and I'd rather not order it until we've figured that out, but I could be convinced otherwise
[14:22:52] I wasn't aware of that
[14:23:08] are we talking raid controller or what controller?
[14:23:08] yeah it's a new development, just found out a couple of days ago
[14:23:13] raid controller, yes
[14:23:19] the Dell H740P
[14:23:21] ok, so worth waiting indeed
[14:23:27] totally
[14:23:49] we can risk it and order it and hope that we will have figured it out by the time it lands at the data center
[14:24:21] but... yeah, I'd like to avoid that risk if possible
[14:24:39] we can ask other providers the same questions
[14:24:55] we asked for quotes, but didn't go too deep into them
[14:25:29] other hardware vendors you mean?
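Picking up the MAX_USER_CONNECTIONS idea from the 09:45 exchange above, a minimal sketch of what such a per-account cap could look like in MariaDB; the account definition and the limit of 50 are illustrative, taken from the discussion rather than from any grant actually in use:

    -- Cap how many simultaneous connections a single account may hold.
    -- 'wikiuser'@'%' is a placeholder; adjust to the real host pattern of the account.
    GRANT USAGE ON *.* TO 'wikiuser'@'%' WITH MAX_USER_CONNECTIONS 50;

    -- Alternatively, a server-wide default for accounts without an explicit limit:
    SET GLOBAL max_user_connections = 50;

As noted in the log, this would not fix the underlying error source; it only keeps one misbehaving account from exhausting connections for everything else (e.g. wikitech).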
[14:25:34] yes
[14:26:06] HPE is the alternative right now, I'd like us to not introduce a third vendor until we have the time to be more thoughtful about it
[14:26:16] no, I meant HP
[14:26:22] I guess he means HP
[14:26:24] that
[14:26:30] if you want to go for HPE, sure, I don't mind
[14:26:41] I want to ask them the same questions
[14:26:49] which questions?
[14:27:26] why the higher pricing
[14:27:48] let me give you context
[14:28:11] paravoid: https://phabricator.wikimedia.org/T199501#4462956
[14:28:54] aha
[14:29:25] part of the price bump is that the processor mapping isn't 1:1
[14:29:27] so maybe not time to introduce a 3rd vendor (yet), but we should ask similar questions to the ones you suggest
[14:29:37] Xeon Golds are equivalent to E7, not E5
[14:29:51] so either change the quote
[14:29:57] E5 is basically Xeon Silver, but Xeon Silvers don't come at those kinds of frequencies
[14:29:59] ask for a quote with different disks, etc.
[14:30:23] (or with different cpus)
[14:30:28] So basically ask HP for the same changes we asked the other one, SSDs and a detailed list of what has changed
[14:30:31] no?
[14:31:00] to be fair, cpu is not that important for databases, even using compression
[14:31:32] the one we have right now is top of the line
[14:31:55] it makes sense to not be the worse to scale with memory
[14:31:57] fastest quad-core on the market
[14:32:05] *the worst
[14:32:09] but if you see regular usage
[14:32:19] Probably we can live with a middle-range one, not top, not bottom
[14:32:21] http://blog.exxactcorp.com/intel-broadwell-dp-skylake-sp-cpu-cheat-sheet/ is a good resource
[14:32:44] * marostegui bookmarks that
[14:33:14] https://grafana.wikimedia.org/dashboard/db/prometheus-cluster-breakdown?orgId=1&var-datasource=eqiad%20prometheus%2Fops&var-cluster=mysql&var-instance=All&from=1535110386504&to=1535121186506
[14:33:44] we asked in the past for fast processors with limited cores, as concurrency is limited
[14:33:49] (limited by io)
[14:33:55] nod
[14:34:29] (some of those are peculiar because they are unused for misc reasons)
[14:34:52] but db1083 should be our heaviest db right now
[14:35:24] and cpu usage only spikes to 40%
[14:35:38] https://grafana.wikimedia.org/dashboard/db/prometheus-cluster-breakdown?panelId=656&fullscreen&orgId=1&var-datasource=eqiad%20prometheus%2Fops&var-cluster=mysql&var-instance=All&from=1534516531824&to=1535121331824
[14:36:02] we could buy the machines with just 1 processor and we would be ok
[14:36:46] and ask for $1000-$2000 in savings, according to their interpretation
[14:37:27] I am not saying we have to do that, I am saying we have a lot of options
[14:37:39] Yeah, I think we can try to go for lower CPU specs
[14:37:59] ok
[14:38:21] it's a bit orthogonal to whether we're being ripped off or not, but anything we can do to reduce cost is always welcome :)
[14:38:28] to either, if they were right, lower the cost, or, as it looks, call their bluff
[14:38:41] ^you see my line of thinking
[14:38:48] that said, don't take unnecessary risks in reducing specs, it's not worth your time dealing with any potential fallout
[14:39:08] yes, but if hw, on top of that, has issues
[14:39:12] that is a no-brainer
[14:39:30] Yeah, the only option would be to go for HP to avoid the RAID controller issue
[14:39:38] I would like to stop talking to them and start talking to hp
[14:39:41] that's how it looks right now :(
[14:39:47] But I think we can definitely reduce specs within the CPU field
[14:39:54] We don't need the top CPU
[14:40:00] jynus: I don't mind that, but note that we can do both
[14:40:06] of course
[14:40:13] I've told Rob that this is top priority, whatever you need
[14:40:17] again, I am just opening up options
[14:40:49] would you like to respond to the task to ask Rob to send an email to HPE as well?
[14:40:55] with a few pointed questions?
[14:41:28] yes, but not today
[14:41:48] this is the load of db1083: https://grafana.wikimedia.org/dashboard/db/prometheus-cluster-breakdown?panelId=606&fullscreen&orgId=1&var-datasource=eqiad%20prometheus%2Fops&var-cluster=mysql&var-instance=All&from=1534516895818&to=1535121695818
[14:41:59] 5 queued processes
[14:42:20] I think it is worth trying to save money on the CPU, but either way, we are blocked on the controller, so we'd need to email HP as well
[14:44:12] paravoid: I can take care of commenting on the task about what we discussed here
[14:44:20] ok, thank you :)
[14:44:50] please refine my comment https://phabricator.wikimedia.org/T199501#4530219
[14:44:52] manuel
[14:44:53] Oh, Jaime did already!
[14:44:56] will do
[14:45:01] it may not be 100% clear
[14:46:20] I will now try to focus on programming
[14:47:35] thank you both!
[15:05:41] DBA, Schema-change: Drop externallinks.el_from_namespace on wmf databases - https://phabricator.wikimedia.org/T114117 (Marostegui)
[15:05:51] Blocked-on-schema-change, DBA, Patch-For-Review, Schema-change: Dropping rc_moved_to_title/rc_moved_to_ns on wmf databases - https://phabricator.wikimedia.org/T51191 (Marostegui)
[15:06:01] Blocked-on-schema-change, DBA, Patch-For-Review, Schema-change: Dropping rc_cur_time on wmf databases - https://phabricator.wikimedia.org/T67448 (Marostegui)
[15:06:56] Blocked-on-schema-change, DBA, Patch-For-Review, Schema-change: Dropping rc_cur_time on wmf databases - https://phabricator.wikimedia.org/T67448 (Marostegui) s8 eqiad progress [] labsdb1011 [] labsdb1010 [] labsdb1009 [] dbstore1002 [] db1124 [] db1109 [] db1104 [] db1101 [] db1099 [] db1092 []...
[15:06:59] Blocked-on-schema-change, DBA, Patch-For-Review, Schema-change: Dropping rc_moved_to_title/rc_moved_to_ns on wmf databases - https://phabricator.wikimedia.org/T51191 (Marostegui) s8 eqiad progress [] labsdb1011 [] labsdb1010 [] labsdb1009 [] dbstore1002 [] db1124 [] db1109 [] db1104 [] db1101 []...
[15:07:01] DBA, Schema-change: Drop externallinks.el_from_namespace on wmf databases - https://phabricator.wikimedia.org/T114117 (Marostegui) s8 eqiad progress [] labsdb1011 [] labsdb1010 [] labsdb1009 [] dbstore1002 [] db1124 [] db1109 [] db1104 [] db1101 [] db1099 [] db1092 [] db1087 [] db1071
[15:08:06] DBA, Schema-change: Drop externallinks.el_from_namespace on wmf databases - https://phabricator.wikimedia.org/T114117 (Marostegui)
[15:08:23] Blocked-on-schema-change, DBA, Patch-For-Review, Schema-change: Dropping rc_moved_to_title/rc_moved_to_ns on wmf databases - https://phabricator.wikimedia.org/T51191 (Marostegui)
[15:08:35] Blocked-on-schema-change, DBA, Patch-For-Review, Schema-change: Dropping rc_cur_time on wmf databases - https://phabricator.wikimedia.org/T67448 (Marostegui)
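The schema-change tasks closing out the log each amount to dropping a column on every host in the per-section checklists. A minimal sketch of the statements those drops imply, assuming the rc_* columns sit on the recentchanges table as in standard MediaWiki schemas; the per-host progress lists above suggest each ALTER is applied host by host rather than once through replication, so treat this as illustrative only:

    -- T114117: drop the unused namespace denormalization from externallinks
    ALTER TABLE externallinks DROP COLUMN el_from_namespace;

    -- T51191 / T67448: drop the long-deprecated recentchanges columns
    ALTER TABLE recentchanges
      DROP COLUMN rc_moved_to_ns,
      DROP COLUMN rc_moved_to_title,
      DROP COLUMN rc_cur_time;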