[09:26:18] I have not upgraded mysql, I don't want the master to be running a higher version than the slaves, just restarted mysql
[09:28:33] DBA, Cloud-Services, Patch-For-Review: cloudvps: eqiad1: move nova db to m5-master - https://phabricator.wikimedia.org/T202549 (jcrespo) For the future, this information should be on the request summary on top- as it is vital, not on a hidden comment that will be easily lost. Clear request will be pr...
[09:28:55] oh
[09:29:48] that is a vim fail
[09:38:50] there were connection errors on labswiki at 8:53
[09:40:57] network issues maybe?
[09:41:51] https://grafana.wikimedia.org/dashboard/db/mysql?panelId=10&fullscreen&orgId=1&var-dc=eqiad%20prometheus%2Fops&var-server=db1073&var-port=9104&from=1535100106220&to=1535103706220
[09:42:04] I don't know, there are also stable errors^
[09:42:26] like one every minute or so
[09:44:32] https://grafana.wikimedia.org/dashboard/db/mysql?panelId=10&fullscreen&orgId=1&var-dc=eqiad%20prometheus%2Fops&var-server=db1073&var-port=9104&from=now-7d&to=now
[09:45:18] yes, but the base errors shouldn't happen either
[09:45:29] yeah, I know
[09:45:44] we should limit each account to 50 connections
[09:45:44] Just saying it is not something that has happened just now, more that it is something on-going
[09:45:58] MAX_USER_CONNECTIONS
[09:46:04] it will not fix the issue
[09:46:16] but it will make wikitech not fail, etc.
[09:46:29] yeah, it is always wikiuser as per the logs
[09:46:43] (making it the nova maintainers' problem, not ours)
[10:00:33] I want to focus now on the backups, I need to give them some love
[10:01:04] give them love!
[11:09:16] DBA, Cloud-Services, Patch-For-Review: cloudvps: eqiad1: move nova db to m5-master - https://phabricator.wikimedia.org/T202549 (aborrero)
[11:10:08] DBA, Cloud-Services, Patch-For-Review: cloudvps: eqiad1: move nova db to m5-master - https://phabricator.wikimedia.org/T202549 (aborrero)
[11:10:46] DBA, Cloud-Services, Patch-For-Review: cloudvps: eqiad1: move nova db to m5-master - https://phabricator.wikimedia.org/T202549 (aborrero)
[12:01:00] DBA, Schema-change: Drop externallinks.el_from_namespace on wmf databases - https://phabricator.wikimedia.org/T114117 (Bstorm) Done!
[14:21:09] fyi, I just responded to T199501 -- besides the cost considerations, there are new issues identified with the H740P that you may want to be aware of
[14:21:38] checking
[14:22:12] TL;DR is that Linux 4.9 doesn't support this controller, so we need to either backport a newer version of the megasas driver to 4.9, or use a newer kernel
[14:22:24] moritzm has been working on the former, see https://phabricator.wikimedia.org/T199125#4529755
[14:22:47] Oh wow
[14:22:48] the quote for those servers includes this controller, and I'd rather not order it until we've figured that out, but I could be convinced otherwise
[14:22:52] I wasn't aware of that
[14:23:08] are we talking raid controller or what controller?
[14:23:08] yeah it's a new development, just found out a couple of days ago
[14:23:13] raid controller, yes
[14:23:19] the Dell H740P
[14:23:21] ok, so worth waiting indeed
[14:23:27] totally
[14:23:49] we can risk it and order it and hope that we will have figured it out by the time it lands at the data center
[14:24:21] but... yeah, I'd like to avoid that risk if possible
[14:24:39] we can ask other providers the same questions
[14:24:55] we asked for quotes, but didn't go too deep into them
[14:25:29] other hardware vendors you mean?
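Picking up the MAX_USER_CONNECTIONS idea from the 09:45 exchange above, a minimal sketch of what such a per-account cap could look like in MariaDB; the account definition and the limit of 50 are illustrative, taken from the discussion rather than from any grant actually in use:

    -- Cap how many simultaneous connections a single account may hold.
    -- 'wikiuser'@'%' is a placeholder; adjust to the real host pattern of the account.
    GRANT USAGE ON *.* TO 'wikiuser'@'%' WITH MAX_USER_CONNECTIONS 50;

    -- Alternatively, a server-wide default for accounts without an explicit limit:
    SET GLOBAL max_user_connections = 50;

As noted in the log, this would not fix the underlying error source; it only keeps one misbehaving account from exhausting connections for everything else (e.g. wikitech).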
[14:25:34] yes
[14:26:06] HPE is the alternative right now, I'd like us to not introduce a third vendor until we have the time to be more thoughtful about it
[14:26:16] no, I meant HP
[14:26:22] I guess he means HP
[14:26:24] that
[14:26:30] if you want to go for HPE, sure, I don't mind
[14:26:41] I want to ask them the same questions
[14:26:49] which questions?
[14:27:26] why the higher pricing
[14:27:48] let me give you context
[14:28:11] paravoid: https://phabricator.wikimedia.org/T199501#4462956
[14:28:54] aha
[14:29:25] part of the price bump is that the processor mapping isn't 1:1
[14:29:27] so maybe not time to introduce a 3rd vendor (yet), but we should ask similar questions to the ones you suggest
[14:29:37] Xeon Golds are equivalent to E7, not E5
[14:29:51] so either change the quote
[14:29:57] E5 is basically Xeon Silver, but Xeon Silvers don't come at those kinds of frequencies
[14:29:59] ask for a quote with different disks, etc.
[14:30:23] (or with different cpus)
[14:30:28] So basically ask HP for the same changes we asked the other one, SSDs and a detailed list of what has changed
[14:30:31] no?
[14:31:00] to be fair, cpu is not that important for databases, even using compression
[14:31:32] the one we have right now is top of the line
[14:31:55] it makes sense to not be the worse to scale with memory
[14:31:57] fastest quad-core on the market
[14:32:05] *the worst
[14:32:09] but if you see regular usage
[14:32:19] Probably we can live with a middle-range one, not top, not bottom
[14:32:21] http://blog.exxactcorp.com/intel-broadwell-dp-skylake-sp-cpu-cheat-sheet/ is a good resource
[14:32:44] * marostegui bookmarks that
[14:33:14] https://grafana.wikimedia.org/dashboard/db/prometheus-cluster-breakdown?orgId=1&var-datasource=eqiad%20prometheus%2Fops&var-cluster=mysql&var-instance=All&from=1535110386504&to=1535121186506
[14:33:44] we asked in the past for fast processors with limited cores, as concurrency is limited
[14:33:49] (limited by io)
[14:33:55] nod
[14:34:29] (some of those are peculiar because they are unused for misc reasons)
[14:34:52] but db1083 should be our heaviest db right now
[14:35:24] and cpu usage only spikes to 40%
[14:35:38] https://grafana.wikimedia.org/dashboard/db/prometheus-cluster-breakdown?panelId=656&fullscreen&orgId=1&var-datasource=eqiad%20prometheus%2Fops&var-cluster=mysql&var-instance=All&from=1534516531824&to=1535121331824
[14:36:02] we could buy the machines with just 1 processor and we would be ok
[14:36:46] and ask for $1000-$2000 in savings, according to their interpretation
[14:37:27] I am not saying we have to do that, I am saying we have a lot of options
[14:37:39] Yeah, I think we can try to go for lower CPU specs
[14:37:59] ok
[14:38:21] it's a bit orthogonal to whether we're being ripped off or not, but anything we can do to reduce cost is always welcome :)
[14:38:28] to either, if they were right, lower the cost, or, as it looks, call their bluff
[14:38:41] ^you see my line of thinking
[14:38:48] that said, don't take unnecessary risks in reducing specs, it's not worth your time dealing with any potential fallout
[14:39:08] yes, but if hw, on top of that, has issues
[14:39:12] that is a no-brainer
[14:39:30] Yeah, the only option would be to go for HP to avoid the RAID controller issue
[14:39:38] I would like to stop talking to them and start talking to hp
[14:39:41] that's how it looks right now :(
[14:39:47] But I think we can definitely reduce specs within the CPU field
[14:39:54] We don't need the top CPU
[14:40:00] jynus: I don't mind that, but note that we can do both
[14:40:06] of course
[14:40:13] I've told Rob that this is top priority, whatever you need
[14:40:17] again, I am just opening up options
[14:40:49] would you like to respond to the task to ask Rob to send an email to HPE as well?
[14:40:55] with a few pointed questions?
[14:41:28] yes, but not today
[14:41:48] this is the load of db1083: https://grafana.wikimedia.org/dashboard/db/prometheus-cluster-breakdown?panelId=606&fullscreen&orgId=1&var-datasource=eqiad%20prometheus%2Fops&var-cluster=mysql&var-instance=All&from=1534516895818&to=1535121695818
[14:41:59] 5 queued processes
[14:42:20] I think it is worth trying to save money on the CPU, but either way, we are blocked on the controller, so we'd need to email HP as well
[14:44:12] paravoid: I can take care of commenting on the task about what we discussed here
[14:44:20] ok, thank you :)
[14:44:50] please refine my comment https://phabricator.wikimedia.org/T199501#4530219
[14:44:52] manuel
[14:44:53] Oh, Jaime did already!
[14:44:56] will do
[14:45:01] it may not be 100% clear
[14:46:20] I will now try to focus on programming
[14:47:35] thank you both!
[15:05:41] DBA, Schema-change: Drop externallinks.el_from_namespace on wmf databases - https://phabricator.wikimedia.org/T114117 (Marostegui)
[15:05:51] Blocked-on-schema-change, DBA, Patch-For-Review, Schema-change: Dropping rc_moved_to_title/rc_moved_to_ns on wmf databases - https://phabricator.wikimedia.org/T51191 (Marostegui)
[15:06:01] Blocked-on-schema-change, DBA, Patch-For-Review, Schema-change: Dropping rc_cur_time on wmf databases - https://phabricator.wikimedia.org/T67448 (Marostegui)
[15:06:56] Blocked-on-schema-change, DBA, Patch-For-Review, Schema-change: Dropping rc_cur_time on wmf databases - https://phabricator.wikimedia.org/T67448 (Marostegui) s8 eqiad progress [] labsdb1011 [] labsdb1010 [] labsdb1009 [] dbstore1002 [] db1124 [] db1109 [] db1104 [] db1101 [] db1099 [] db1092 []...
[15:06:59] Blocked-on-schema-change, DBA, Patch-For-Review, Schema-change: Dropping rc_moved_to_title/rc_moved_to_ns on wmf databases - https://phabricator.wikimedia.org/T51191 (Marostegui) s8 eqiad progress [] labsdb1011 [] labsdb1010 [] labsdb1009 [] dbstore1002 [] db1124 [] db1109 [] db1104 [] db1101 []...
[15:07:01] DBA, Schema-change: Drop externallinks.el_from_namespace on wmf databases - https://phabricator.wikimedia.org/T114117 (Marostegui) s8 eqiad progress [] labsdb1011 [] labsdb1010 [] labsdb1009 [] dbstore1002 [] db1124 [] db1109 [] db1104 [] db1101 [] db1099 [] db1092 [] db1087 [] db1071
[15:08:06] DBA, Schema-change: Drop externallinks.el_from_namespace on wmf databases - https://phabricator.wikimedia.org/T114117 (Marostegui)
[15:08:23] Blocked-on-schema-change, DBA, Patch-For-Review, Schema-change: Dropping rc_moved_to_title/rc_moved_to_ns on wmf databases - https://phabricator.wikimedia.org/T51191 (Marostegui)
[15:08:35] Blocked-on-schema-change, DBA, Patch-For-Review, Schema-change: Dropping rc_cur_time on wmf databases - https://phabricator.wikimedia.org/T67448 (Marostegui)
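The schema-change tasks closing out the log each amount to dropping a column on every host in the per-section checklists. A minimal sketch of the statements those drops imply, assuming the rc_* columns sit on the recentchanges table as in standard MediaWiki schemas; the per-host progress lists above suggest each ALTER is applied host by host rather than once through replication, so treat this as illustrative only:

    -- T114117: drop the unused namespace denormalization from externallinks
    ALTER TABLE externallinks DROP COLUMN el_from_namespace;

    -- T51191 / T67448: drop the long-deprecated recentchanges columns
    ALTER TABLE recentchanges
      DROP COLUMN rc_moved_to_ns,
      DROP COLUMN rc_moved_to_title,
      DROP COLUMN rc_cur_time;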