[04:52:49] DBA, Operations, ops-codfw: pc2010 possibly broken memory - https://phabricator.wikimedia.org/T227552 (Marostegui) Open→Resolved No OS or idrac errors since the memory was replaced, so I am closing this as resolved. If it happens again, I will re-open. Thanks @Papaul!
[05:29:33] DBA, Goal, Patch-For-Review: Productionize db21[21-30] - https://phabricator.wikimedia.org/T228969 (Marostegui)
[05:58:42] DBA: Update rack information on zarcillo.servers - https://phabricator.wikimedia.org/T229683 (Marostegui) Open→Resolved This is now fixed: ` root@db1115.eqiad.wmnet[zarcillo]> select fqdn,rack from servers where rack is NULL and fqdn like 'db%'; Empty set (0.00 sec) `
[08:22:03] marostegui: hey, I deployed a change on June 27th that reduced a crazy number of "INSERT IGNORE"s that are not needed (those rows were there already). Don't they show up in the rows-written metric? Because I couldn't find any big difference in the graphs while it dropped tens of thousands of writes like that per second :(
[08:22:46] Amir1: if the rows were there already, I don't think they will show up in any metric
[08:23:01] your only hope could be the INSERT metric on the master-specific dashboard
[08:23:20] like: https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-dc=eqiad%20prometheus%2Fops&var-server=db1067&var-port=9104&from=now-24h&to=now&refresh=5s&panelId=2&fullscreen
[08:23:22] that is s1 master
[08:23:23] it should reduce the deadlocks
[08:25:07] marostegui: yeah, it's actually visible. The deployment was around 20:00 https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-dc=eqiad%20prometheus%2Fops&var-server=db1067&var-port=9104&from=1561623823497&to=1561710238504&panelId=2&fullscreen
[08:31:47] Amir1: yeah, only noticeable if you know about it, it would not have been noticeable to me
[08:32:21] true :(
[08:32:36] there are too many ups and downs for me to notice :)
[08:34:08] Yeah, I'll just stick to the reduction in deadlocks for now :P
[08:34:50] that's already awesome :)
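An aside on the pattern behind the change discussed above: instead of unconditionally issuing INSERT IGNORE for rows that almost always already exist, the application reads first and only writes the rows that are actually missing, trading a cheap indexed read for a write and its row locks. A minimal sketch of that pattern, using a hypothetical table and columns rather than the real MediaWiki/Wikibase schema:

```sql
-- Sketch only: the "check before INSERT IGNORE" pattern discussed above.
-- Table and column names are hypothetical, not the actual schema.

-- Before: every request writes unconditionally, even when the row exists.
INSERT IGNORE INTO term_cache (term_id, term_text)
VALUES (12345, 'example');

-- After: read first, and only insert when the row is genuinely missing.
-- The SELECT is a cheap primary-key lookup and takes no row locks.
SELECT term_id FROM term_cache WHERE term_id = 12345;
-- (the application issues the INSERT IGNORE only when the SELECT
--  returned no row)
```

The cost is an extra SELECT per request; since the rows are nearly always present, the net effect is the drop in writes and deadlocks described in the conversation above.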
[08:39:12] DBA, Operations, decommission: Decommission db2035 - https://phabricator.wikimedia.org/T229784 (Marostegui)
[08:39:47] DBA, Operations, decommission: Decommission db2035 - https://phabricator.wikimedia.org/T229784 (Marostegui) p: Triage→Normal
[08:40:09] DBA, DC-Ops, Operations, decommission, and 2 others: Decommission db2035 - https://phabricator.wikimedia.org/T229784 (Marostegui)
[08:44:08] DBA, DC-Ops, Operations, decommission, and 2 others: Decommission db2035 - https://phabricator.wikimedia.org/T229784 (Marostegui)
[08:45:50] DBA, Patch-For-Review: Decommission old coredb machines (<=db2042) - https://phabricator.wikimedia.org/T221533 (Marostegui)
[09:13:05] DBA, Operations, ops-eqiad: Upgrade db1100 firmware and BIOS - https://phabricator.wikimedia.org/T228732 (Marostegui) @Cmjohnson are we still good for tomorrow at 14:00 UTC? I will have the host depooled and off for you before 14:00 UTC
[10:22:30] DBA, Operations: Remove sarin and neodymium GRANTs from all the databases - https://phabricator.wikimedia.org/T229796 (Marostegui)
[10:22:48] DBA, Operations: Remove sarin and neodymium GRANTs from all the databases - https://phabricator.wikimedia.org/T229796 (Marostegui) p: Triage→Normal
[10:24:03] DBA, Operations: Remove sarin and neodymium GRANTs from all the databases - https://phabricator.wikimedia.org/T229796 (Marostegui)
[10:24:21] DBA, Operations: Remove sarin and neodymium GRANTs from all the databases - https://phabricator.wikimedia.org/T229796 (Marostegui)
[10:24:36] DBA, Operations: Remove sarin and neodymium GRANTs from all the databases - https://phabricator.wikimedia.org/T229796 (Marostegui)
[10:42:02] DBA, Operations, ops-eqiad: Upgrade db1100 firmware and BIOS - https://phabricator.wikimedia.org/T228732 (Cmjohnson) @marostegui yes, still good for tomorrow at 14:00 UTC
[10:42:50] DBA, Operations, ops-eqiad: Upgrade db1100 firmware and BIOS - https://phabricator.wikimedia.org/T228732 (Marostegui) Excellent - thank you!
[11:21:08] DBA, Math: Remove table `math` from the database - https://phabricator.wikimedia.org/T196055 (Marostegui) @Physikerwelt there have been no errors or writes to the tables after T196055#5352527 and T196055#5384177 - I am thinking about starting to drop it everywhere this week.
[11:45:05] marostegui: Actually, bytes received had a dip when it got deployed: https://grafana.wikimedia.org/d/000000278/mysql-aggregated?panelId=2&fullscreen&orgId=1&from=1561624886431&to=1561722086432
[11:45:14] (Around 20:00)
[11:45:24] Amir1: and why did it recover again?
[11:45:51] It didn't recover
[11:46:26] sorry, my eyes went to the 3:00 dip
[11:46:27] haha
[11:46:31] I guess a natural reaction :)
[11:46:32] it was around 90 MB/s and after the deployment it went to 80 MB/s :D
[11:46:39] haha, it's fine
[11:46:58] indeed, it never went back to previous values
[11:46:59] nice
[11:56:31] DBA, Math: Remove table `math` from the database - https://phabricator.wikimedia.org/T196055 (Physikerwelt) @Marostegui that's good news. Go ahead. BTW. I checked the code again and found some leftovers https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/Math/+/524592/ if you have some spare time may...
[12:00:41] I measured and it seems it decreased traffic to the DB by 4%, or 3 MB/s
[12:02:19] marostegui: I was planning to investigate the spikes but they are gone now: https://grafana.wikimedia.org/d/000000548/wikibase-wb_terms?refresh=30s&orgId=1&from=now-7d&to=now
[12:02:41] Amir1: do you feel confident enabling the change from last week again?
[12:03:07] yeah, especially since it's Monday
[12:03:23] sure, the patterns are quite clear and, as we saw, they don't take long to show up
[12:03:39] These external requests should have been addressed by blocking or throttling them
[12:04:35] marostegui: so let's do it then?
[12:04:42] Amir1: sure
[12:04:58] Amir1: if we see them again, shall we revert, or do you want to give it some hours to try to investigate?
[12:05:27] A couple of hours would be great
[12:05:35] sure
[12:06:28] Revert "Revert "Revert "Revert "Switch property terms migration to WRITE_NEW on production wikidata"""" :D
[12:06:37] hahahaha
[12:31:46] DBA, Math: Remove table `math` from the database - https://phabricator.wikimedia.org/T196055 (Marostegui) Thanks for double checking! Regarding that patch, I can take a look but I am not familiar with the code, so I am afraid I won't be too helpful there :(
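On the `math` table removal (T196055): the task amounts to confirming the table is no longer read or written, then dropping it on every wiki. The sketch below shows one conservative per-wiki approach; the `enwiki` database name is a placeholder and the rename-before-drop step is a generic safety pattern, not necessarily the exact procedure used here:

```sql
-- Sketch only: a conservative way to retire an unused table on one wiki.
-- 'enwiki' is a placeholder database; the rename step is a generic safety
-- pattern, not necessarily what was done for T196055.

-- 1. Rough sanity check that nothing has touched the table recently.
--    (update_time can be NULL or stale for InnoDB, so this complements the
--    error/write monitoring mentioned in the task, it does not replace it.)
SELECT table_rows, update_time
  FROM information_schema.tables
 WHERE table_schema = 'enwiki' AND table_name = 'math';

-- 2. Rename it out of the way first, so any leftover code path fails
--    loudly instead of silently reading stale data.
RENAME TABLE enwiki.math TO enwiki.math_T196055_to_drop;

-- 3. After a quiet period with no new errors, drop it for good.
DROP TABLE enwiki.math_T196055_to_drop;
```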
[14:04:40] marostegui: so the graphs look okay
[14:04:57] After I enable reading new for clients, it should get a little bit better
[14:08:05] Amir1: yeah, no spikes on db1104 :)
[14:08:38] we got the first spikes and so far it looks fine
[14:08:42] let's give it some more hours
[14:09:00] it increased, but I think that's sorta okay because 1) we are reading more rows 2) we are in the middle of the migration; when we stop writing to and reading from the old system, things should get better
[14:09:23] yeah, I was referring to the spikes we saw last week
[14:09:43] it did cause locks, but it recovered
[14:09:46] I will enable it for clients tomorrow and, if there are no issues, set it to read_new a week after
[14:10:31] Amir1: and that will stop reading from the old system?
[14:10:47] marostegui: for properties only
[14:11:05] right
[14:11:15] very small amount of data, huge amount of reads
[14:11:25] so we should see a decrease?
[14:12:10] probably
[14:12:16] nice
[14:14:09] The most important part is migrating items. I started the process in the beta cluster already. It seems okay, as the code is virtually the same
[14:15:24] How long do you think that could take?
[14:16:37] I hope we will start in two weeks, but I guess running the migration script would take months
[14:16:45] at least a month
[14:17:25] yeah, that's what I had in mind, that it will take quiiiite long
[14:17:44] but the good thing is that we can basically switch things off gradually, e.g. after 10% is done, we can stop writing to wb_terms for those 10% of items
[14:17:59] that would stop the table from growing
[16:23:37] DBA, Operations, wikitech.wikimedia.org, cloud-services-team (Kanban): Switchover m5 primary master: db1073 to db1133 - https://phabricator.wikimedia.org/T229657 (Marostegui) As per the sync on the SRE meeting, @JHedden will be online from WMCS. I will handle the announcement for wikitech, could...
[16:26:16] DBA, Goal: Address Database infrastructure blockers on datacenter switchover & multi-dc deployment - https://phabricator.wikimedia.org/T220170 (Papaul)
[16:26:54] DBA, Operations, ops-codfw: (2019-08-31) rack/setup/install db2131.codfw.wmnet - https://phabricator.wikimedia.org/T229251 (Papaul)
[16:28:49] DBA, Operations, wikitech.wikimedia.org, cloud-services-team (Kanban): Switchover m5 primary master: db1073 to db1133 - https://phabricator.wikimedia.org/T229657 (JHedden) >>! In T229657#5393428, @Marostegui wrote: > As per the sync on the SRE meeting, @JHedden will be online from WMCS. > I will... 
[16:29:12] DBA, Operations, wikitech.wikimedia.org, cloud-services-team (Kanban): Switchover m5 primary master: db1073 to db1133 - https://phabricator.wikimedia.org/T229657 (Marostegui) Thanks!
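For the m5 switchover (T229657), the database side boils down to promoting db1133 and repointing replication away from db1073; the production process is scripted and also covers proxy, DNS and configuration changes. A rough sketch of the underlying generic MariaDB steps only, with binlog coordinates as placeholders:

```sql
-- Sketch only: generic MariaDB steps behind a primary switchover such as
-- db1073 -> db1133. The real process is scripted and also handles
-- proxy/DNS/puppet changes; this is not the exact production runbook.

-- On the old primary (db1073): stop accepting writes and record position.
SET GLOBAL read_only = 1;
SHOW MASTER STATUS;

-- On the new primary (db1133): wait until replication has caught up
-- (Seconds_Behind_Master = 0), then detach it and open it for writes.
SHOW SLAVE STATUS\G
STOP SLAVE;
RESET SLAVE ALL;
SET GLOBAL read_only = 0;

-- On each remaining replica: repoint it at the new primary, using the
-- coordinates captured above (placeholders here).
STOP SLAVE;
CHANGE MASTER TO
  MASTER_HOST = 'db1133.eqiad.wmnet',
  MASTER_LOG_FILE = 'db1133-bin.000001',   -- placeholder
  MASTER_LOG_POS  = 4;                     -- placeholder
START SLAVE;
```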