[02:10:12] DBA: db1093 went down - depooled - https://phabricator.wikimedia.org/T222127 (Dzahn)
[02:12:08] DBA: db1093 went down - depooled - https://phabricator.wikimedia.org/T222127 (Dzahn) also about a minute later: 22:10 <+icinga-wm> PROBLEM - HP RAID on db1093 is CRITICAL: CRITICAL: Slot 1: OK: 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 2I:2:1, 2I:2:2 - Controller: OK - Battery count: 0...
[02:12:53] DBA: db1093 went down - depooled - https://phabricator.wikimedia.org/T222127 (Dzahn) could confirm the changes appeared on https://noc.wikimedia.org/conf/highlight.php?file=db-eqiad.php
[02:16:40] DBA, Operations, ops-eqiad: Degraded RAID on db1093 - https://phabricator.wikimedia.org/T222128 (Dzahn)
[05:04:51] DBA, Operations, ops-eqiad: Degraded RAID on db1093 - https://phabricator.wikimedia.org/T222128 (Marostegui) Per that output, looks like the BBU is gone, let's follow the investigation at {T222127}
[05:05:06] DBA, Operations, ops-eqiad: Degraded RAID on db1093 - https://phabricator.wikimedia.org/T222128 (Marostegui)
[05:05:10] DBA, Patch-For-Review: db1093 went down - depooled - https://phabricator.wikimedia.org/T222127 (Marostegui)
[05:05:33] DBA, Patch-For-Review: db1093 went down - depooled - https://phabricator.wikimedia.org/T222127 (Marostegui)
[05:05:48] DBA, Patch-For-Review: db1093 went down - depooled - https://phabricator.wikimedia.org/T222127 (Marostegui)
[05:06:20] DBA, Patch-For-Review: db1093 went down - depooled - https://phabricator.wikimedia.org/T222127 (Marostegui) Thanks a lot @Dzahn and @Tgr for taking care of this - we will take it from here Much appreciated
[05:16:31] DBA, Patch-For-Review: db1093 (s6 candidate master) went down - broken BBU - https://phabricator.wikimedia.org/T222127 (Marostegui)
[05:17:02] DBA, Patch-For-Review: db1093 (s6 candidate master) went down - broken BBU - https://phabricator.wikimedia.org/T222127 (Marostegui) a:Cmjohnson The BBU looks broken: ` /system1/log1/record13 Targets Properties number=13 severity=Caution date=04/29/2019 time=23:19 description=...
[05:17:57] DBA, Patch-For-Review: db1093 (s6 candidate master) went down - broken BBU - https://phabricator.wikimedia.org/T222127 (Marostegui) p:Triage→High
[05:18:11] DBA, Operations, ops-eqiad, Patch-For-Review: db1093 (s6 candidate master) went down - broken BBU - https://phabricator.wikimedia.org/T222127 (Marostegui)
[05:30:50] DBA, MediaWiki-API, Patch-For-Review: Slow query "ApiQueryLogEvents::execute" after actor rollout - https://phabricator.wikimedia.org/T220999 (Xqt)
[05:31:40] DBA, Operations, ops-eqiad, Patch-For-Review: db1093 (s6 candidate master) went down - broken BBU - https://phabricator.wikimedia.org/T222127 (Marostegui) I have started MySQL which started correctly. As it started fine, I have started replication too, once it has caught up, I am going to do a da...
[06:57:15] DBA, Operations, ops-eqiad, Patch-For-Review: db1093 (s6 candidate master) went down - broken BBU - https://phabricator.wikimedia.org/T222127 (Marostegui) For what is worth, the LB looks like it worked fine. The time line is: 23:24: db1093 goes down 23:24-23:30: Spike of errors and then some res...
[08:11:48] let's talk when you can about budget
[08:12:00] let's do it
[08:12:09] check line 168
[08:12:12] of our etherpad
[08:12:15] that is the notes I added so far
[08:12:29] I think I may had duplicated some things on 163
[08:12:43] let's merge
[09:02:25] DBA, Operations, ops-eqiad, Patch-For-Review: db1093 (s6 candidate master) went down - broken BBU - https://phabricator.wikimedia.org/T222127 (Marostegui) The following tables have been checked against multiple hosts and reported no differences: ` archive logging page revision text user change_ta...
[09:16:11] DBA, MediaWiki-API, Patch-For-Review: Slow query "ApiQueryLogEvents::execute" after actor rollout - https://phabricator.wikimedia.org/T220999 (Dvorapa)
[09:16:19] DBA, MediaWiki-API, Patch-For-Review: Slow query "ApiQueryLogEvents::execute" after actor rollout - https://phabricator.wikimedia.org/T220999 (Dvorapa)
[12:36:55] Why are you rebasing daily_snapshot so much? :)
[12:50:49] https://mysqlserverteam.com/mysql-8-0-16-mysql_upgrade-is-going-away/
[13:00:00] so
[13:00:03] i am confused again
[13:00:09] "production db proxies"
[13:00:24] that's in the procurement list to buy now
[13:00:28] and we have quotes for those, I believe
[13:00:39] is what we have in the budget request for NEXT fiscal the same or not?
[13:00:49] and if not, how should we call them to avoid this confusion?
[13:02:00] I am confused now too :)
[13:02:14] So I think the one we "decelerated" were the ones for misc?
[13:02:22] i have no idea anymore
[13:03:20] we have space to buy them right now, as some esams purchases will be delayed
[13:03:33] but i'd like to make sense of it all first
[13:04:00] So there is a comment from jaime on line 78 that seems to indicate that those are the ones we delayed in favour of codfw but, I am not 100% sure
[13:04:10] So let's wait for jynus to clarify if that comment is that
[13:04:36] yeah
[13:05:51] Because: https://phabricator.wikimedia.org/T213765#5146539 indicates 2 more, so might be that
[13:06:02] But let's see what jaime meant exactly
[14:57:58] DBA, Wikimedia-Site-requests: Global rename of Shadowxfox → Milenioscuro: supervision needed - https://phabricator.wikimedia.org/T222184 (1997kB)
[14:59:48] DBA, Wikimedia-Site-requests: Global rename of Shadowxfox → Milenioscuro: supervision needed - https://phabricator.wikimedia.org/T222184 (Marostegui) when do you want to do this?
[15:02:33] DBA, Wikimedia-Site-requests: Global rename of Shadowxfox → Milenioscuro: supervision needed - https://phabricator.wikimedia.org/T222184 (1997kB) we can do right now?
[15:03:39] DBA, Wikimedia-Site-requests: Global rename of Shadowxfox → Milenioscuro: supervision needed - https://phabricator.wikimedia.org/T222184 (Marostegui) I would prefer if we can schedule it for next week as tomorrow is a public holiday and it is getting late on EU timezone
[15:04:44] DBA, Wikimedia-Site-requests: Global rename of Shadowxfox → Milenioscuro: supervision needed - https://phabricator.wikimedia.org/T222184 (1997kB) sure no problem...
[15:05:11] DBA, Wikimedia-Site-requests: Global rename of Shadowxfox → Milenioscuro: supervision needed - https://phabricator.wikimedia.org/T222184 (Marostegui) Ping me on Monday if you want
[15:06:55] DBA, Wikimedia-Site-requests: Global rename of Shadowxfox → Milenioscuro: supervision needed - https://phabricator.wikimedia.org/T222184 (1997kB) Thanks, I'll.. and have a Happy Holiday!
[15:18:31] DBA, Patch-For-Review: Productionize eqiad and codfw source backup hosts & codfw backup test host - https://phabricator.wikimedia.org/T220572 (jcrespo)
[19:51:18] DBA, Operations, ops-codfw, Goal, Patch-For-Review: rack/setup/install db2[103-120].codfw.wmnet (18 hosts) - https://phabricator.wikimedia.org/T221532 (Papaul)
[19:51:59] DBA, Patch-For-Review: Productionize eqiad and codfw source backup hosts & codfw backup test host - https://phabricator.wikimedia.org/T220572 (jcrespo)
[19:52:01] DBA, Operations, ops-codfw, Goal, Patch-For-Review: rack/setup/install db2[103-120].codfw.wmnet (18 hosts) - https://phabricator.wikimedia.org/T221532 (Papaul)