[04:56:42] 10DBA, 10Operations: Decommission db2046.codfw.wmnet - https://phabricator.wikimedia.org/T231767 (10Marostegui)
[04:57:02] 10DBA, 10Operations: Decommission db2046.codfw.wmnet - https://phabricator.wikimedia.org/T231767 (10Marostegui) p:05Triage→03Normal
[05:03:23] 10DBA, 10Operations, 10Patch-For-Review: Decommission db2046.codfw.wmnet - https://phabricator.wikimedia.org/T231767 (10Marostegui)
[05:13:36] 10DBA: db1115 replicating from db2048 - https://phabricator.wikimedia.org/T231768 (10Marostegui)
[08:04:55] jynus: I want to restart MySQL on db1115 to gather some data for T231769
[08:04:56] T231769: Investigate possible memory leak on db1115 - https://phabricator.wikimedia.org/T231769
[08:06:44] I will wait until we don't have any backups in ongoing status
[08:11:14] there are a couple finishing
[08:11:20] s3 and s6, I think
[08:11:26] should be done soon
[08:11:53] yep, I can wait, no rush
[08:12:41] s3 has been shaky lately
[08:12:50] yeah
[08:12:53] at least 2 failures in the last 2 weeks
[08:12:55] always codfw? or also eqiad?
[08:12:58] eqiad only
[08:13:20] tendril stability may have played some role
[08:13:36] maybe, yeah
[08:13:41] as it is the one with the most metadata
[08:17:35] if you look at db1095's stats, we are wasting resources: https://grafana.wikimedia.org/d/000000274/prometheus-machine-stats?orgId=1&var-server=db1095&var-datasource=eqiad%20prometheus%2Fops
[08:17:57] backup source?
[08:18:04] between backups the machine is idle (only replicating)
[08:25:07] the reason for that is that processing takes more time than taking the backups, so they have to wait or we would have a large queue
[08:26:00] processing as in packing and all that?
[08:26:35] preparing and compressing them, mostly
[08:26:50] which has to be done with them uncompressed
[08:27:02] so we only do 2 at a time at most
[11:50:56] FYI, upgrading apache on tendril/debmonitor hosts in a bit
[11:56:04] completed
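The concurrency limit discussed around 08:25-08:27 (post-processing is slower than taking backups, so at most 2 prepare/compress jobs run at once) can be sketched with a semaphore. This is a minimal illustrative sketch, not the actual Wikimedia backup tooling: the function and variable names are hypothetical, and the real prepare/compress step is replaced by a placeholder sleep.

```python
import threading
import time

# Hypothetical sketch: snapshots queue up faster than they can be
# prepared and compressed, so a semaphore caps concurrent
# post-processing at 2 ("we only do 2 at a time at most").
MAX_CONCURRENT_PREPARE = 2
prepare_slots = threading.Semaphore(MAX_CONCURRENT_PREPARE)

active = 0   # jobs currently post-processing
peak = 0     # highest observed concurrency
lock = threading.Lock()

def prepare_and_compress(snapshot: str) -> None:
    """Placeholder for the prepare + compress step of one backup."""
    global active, peak
    with prepare_slots:              # blocks if 2 jobs are already running
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.05)             # stand-in for the real, slow work
        with lock:
            active -= 1

# Five queued snapshots (illustrative section names).
threads = [threading.Thread(target=prepare_and_compress, args=(f"s{i}",))
           for i in range(1, 6)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(peak)  # never exceeds MAX_CONCURRENT_PREPARE
```

The semaphore is why the queue stays bounded: excess snapshots simply wait for a free slot instead of all being processed at once.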