[04:35:01] so it is expected? as per the last answer :/
[04:44:46] DBA, Operations, ops-codfw: pc2010 possibly broken memory - https://phabricator.wikimedia.org/T227552 (Marostegui) Thanks @Papaul I have started MySQL again, let's monitor the host for a few days
[04:54:34] DBA, Operations: db2058: Broken storage - https://phabricator.wikimedia.org/T229449 (Marostegui)
[04:55:11] DBA, Operations: db2058: Broken storage - https://phabricator.wikimedia.org/T229449 (Marostegui) Open→Declined I am going to close this as this host will be decommissioned {T228258}
[04:59:19] DBA, Operations, decommission: Decommission db2058.codfw.wmnet - https://phabricator.wikimedia.org/T229543 (Marostegui)
[04:59:49] DBA, Operations, decommission: Decommission db2058.codfw.wmnet - https://phabricator.wikimedia.org/T229543 (Marostegui) p:Triage→Normal
[04:59:59] DBA, Operations: Decommission db2043-db2069 - https://phabricator.wikimedia.org/T228258 (Marostegui)
[05:01:38] DBA, AbuseFilter: Drop abuse_filter_log.afl_log_id in production - https://phabricator.wikimedia.org/T226851 (Marostegui)
[05:15:24] DBA, AbuseFilter, Patch-For-Review: Drop abuse_filter_log.afl_log_id in production - https://phabricator.wikimedia.org/T226851 (Marostegui)
[05:15:31] DBA, AbuseFilter, Patch-For-Review: Drop abuse_filter_log.afl_log_id in production - https://phabricator.wikimedia.org/T226851 (Marostegui) Open→Resolved All done!
[05:25:31] DBA, Goal, Patch-For-Review: Productionize db21[21-30} - https://phabricator.wikimedia.org/T228969 (Marostegui)
[05:29:05] DBA, Goal, Patch-For-Review: Productionize db21[21-30} - https://phabricator.wikimedia.org/T228969 (Marostegui)
[09:21:14] DBA, Operations: db2058: Broken storage - https://phabricator.wikimedia.org/T229449 (Marostegui) I rebooted the server and this is the boot message: ` Slot 0 HP Smart Array P420i Controller (1 GB, v6.00) 1 Logical Drive 1719-Slot 0 Drive Array - A controller failure event occurred prior to this...
[09:25:22] marostegui: yeah but it's fixable at least. I know what they are doing
[09:25:32] :)
[09:27:06] marostegui: I'm going live with wb_terms new thing again. Please take a look and tell me if I should revert it or not, I leave the decision to our beloved DBA
[09:27:27] yeah, let's give it some more minutes this time
[10:24:57] Amir1: are we live?
[10:27:51] not yet
[10:27:59] oki
[10:47:45] DBA, Operations, decommission, Patch-For-Review: Decommission db2058.codfw.wmnet - https://phabricator.wikimedia.org/T229543 (Marostegui)
[10:49:46] marostegui: Scheduled for around 40 minutes
[10:49:59] Amir1: roger!
[12:19:38] DBA, Math: Remove table `math` from the database - https://phabricator.wikimedia.org/T196055 (Marostegui) Renamed on db1089 on enwiki: ` root@db1089.eqiad.wmnet[enwiki]> show tables like 'TO_DRO%'; +----------------------------+ | Tables_in_enwiki (TO_DRO%) | +----------------------------+ | TO_DROP_mat...
[12:20:10] DBA, Math: Remove table `math` from the database - https://phabricator.wikimedia.org/T196055 (Marostegui)
[13:21:05] DBA: Decommission old coredb machines (<=db2042) - https://phabricator.wikimedia.org/T221533 (Papaul)
[13:35:54] DBA, Operations, ops-codfw: pc2010 possibly broken memory - https://phabricator.wikimedia.org/T227552 (Papaul) I checked IDRAC logs this morning, all looks good so far
[13:48:45] DBA, Operations, ops-codfw: pc2010 possibly broken memory - https://phabricator.wikimedia.org/T227552 (Marostegui) Hehe, yeah, I checked too. Let's give it till Monday Cross your fingers!
[14:03:58] DBA, Gerrit, Operations, Release-Engineering-Team-TODO, Release-Engineering-Team (Development services): Gerrit is failing to connect to db on gerrit2001 thus preventing systemd from working - https://phabricator.wikimedia.org/T176532 (Marostegui) I have set up the proxy for m2 in codfw. I kn...
[15:55:53] DBA, Operations, serviceops-radar, Performance-Team (Radar): phased rollout of dbctl, etcd-backed database configuration in Mediawiki - https://phabricator.wikimedia.org/T229070 (Krinkle) >>! In T229070#5367389, @gerritbot wrote: > Change 525684 had a related patch set uploaded (by CDanis; owner:...
[16:27:11] marostegui: it's happening again, it seems it's a TTL expiry, I want to know more on what queries are going up. Is there a way to get that?
[16:28:42] Amir1: hard to get them as the spikes are not predictable I guess
[16:28:53] and enabling the query logging for long is basically impossible :)
[16:29:51] it actually happens almost every two hours
[16:29:57] that's fishy
[19:25:47] Amir1: It happened again, I think we need to re-evaluate this tomorrow
[19:25:56] not sure I want to leave this happening during the weekend
[19:26:22] marostegui: sure
[19:27:35] whilst it doesn't seem to be increasing query latency, it is a well defined pattern that needs some investigation and from the graphs it does cause connection issues, so some users might be experiencing regressions
[19:31:05] marostegui: yeah. I think we should prioritize the caching bit. Alaa is already on it
[19:31:59] Amir1: good to hear! should we revert tomorrow to make sure the weekend goes without anything unexpected?
[19:32:54] Sure. The only thing so far is that we are not sure we should go with memcached or apcu
[19:33:18] Probably memcached. Properties are small but not that small
[19:45:47] yeah, maybe memcached if they are not that small
[22:18:29] marostegui: ahhh thank you for all you added to the dbctl page on wikitech, very helpful :)