[09:43:06] DBA, Patch-For-Review: Productionize eqiad and codfw source backup hosts & codfw backup test host - https://phabricator.wikimedia.org/T220572 (jcrespo) db2102 is setup, pending loading data, which being done now while testing at the same time the latest recover_dump.py version and generated backup.
[11:25:14] DBA, Operations, ops-eqiad, Patch-For-Review: rack/setup/install db1139|db1140.eqiad.wmnet (2 dump slaves) - https://phabricator.wikimedia.org/T218985 (jcrespo) Either dns, remote ipmi or password may not be configured properly: ` Error: Unable to establish IPMI v2 / RMCP+ session 11:23:36 | Unab...
[12:04:54] FYI
[12:04:57] 14:02
[12:04:57] <+icinga-wm> PROBLEM - puppet last run on db1117 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[12:04:57] 14:03
[12:04:57] <+icinga-wm> PROBLEM - puppet last run on db1073 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[12:05:09] I'm fixing it, I reallocate some hiera keys
[12:14:15] DBA, Operations, ops-eqiad, Patch-For-Review: rack/setup/install db1139|db1140.eqiad.wmnet (2 dump slaves) - https://phabricator.wikimedia.org/T218985 (ops-monitoring-bot) Script wmf-auto-reimage was launched by jynus on cumin1001.eqiad.wmnet for hosts: ` ['db1139.eqiad.wmnet', 'db1140.eqiad.wmne...
[12:17:00] same in db2037
[12:17:10] I'm merging https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/507770
[12:17:58] aaand, it's solved
[12:22:56] DBA, Operations, ops-eqiad, Patch-For-Review: rack/setup/install db1139|db1140.eqiad.wmnet (2 dump slaves) - https://phabricator.wikimedia.org/T218985 (jcrespo) @Cmjohnson In case this is useful for you, I have documented how to enable ipmi on ilo5 from the web interface here: https://wikitech.wi...
[12:23:22] DBA, Operations, ops-eqiad, Patch-For-Review: rack/setup/install db1139|db1140.eqiad.wmnet (2 dump slaves) - https://phabricator.wikimedia.org/T218985 (jcrespo)
[12:28:24] thanks
[12:28:44] note that traffic to those hosts may have been interrupted during the change
[12:32:32] arturo: dbproxy1005 is failing too, and while that is not urgent because it is passive, that is wmcs' misc proxy
[12:33:02] looking
[12:33:31] again, not time sensitive, but likely related to the other m5 changes
[12:33:52] DBA, Operations, ops-eqiad, Patch-For-Review: rack/setup/install db1139|db1140.eqiad.wmnet (2 dump slaves) - https://phabricator.wikimedia.org/T218985 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['db1140.eqiad.wmnet', 'db1139.eqiad.wmnet'] ` and were **ALL** successful.
[12:34:00] dbproxy1005 works now
[12:34:10] cool
[12:34:17] thanks
[12:34:25] thanks to you
[12:37:53] DBA, Operations, ops-eqiad, Patch-For-Review: rack/setup/install db1139|db1140.eqiad.wmnet (2 dump slaves) - https://phabricator.wikimedia.org/T218985 (jcrespo)
[12:38:44] DBA, Goal: Decommission dbstore1001, dbstore2001, dbstore2002 and es2001-4 hosts* - https://phabricator.wikimedia.org/T220002 (jcrespo)
[12:38:46] DBA, Goal: Purchase and setup remaining hosts for database backups - https://phabricator.wikimedia.org/T213406 (jcrespo)
[12:38:55] DBA, Operations, ops-eqiad, Patch-For-Review: rack/setup/install db1139|db1140.eqiad.wmnet (2 dump slaves) - https://phabricator.wikimedia.org/T218985 (jcrespo) Open→Resolved a:jcrespo→Cmjohnson installed, implementation (provisioning) will be handled at T220572.
[14:44:17] DBA, MediaWiki-Database, MediaWiki-Logging, Operations, and 5 others: Special:Log on commons -- entire web request took longer than 60 seconds and timed out - https://phabricator.wikimedia.org/T221458 (WDoranWMF) Open→Resolved a:WDoranWMF
[14:50:11] DBA, Operations, ops-eqiad, Goal, and 2 others: rack/setup/install db11[26-38].eqiad.wmnet - https://phabricator.wikimedia.org/T211613 (Cmjohnson)
[15:22:22] DBA, Operations, ops-eqiad, Goal, and 2 others: rack/setup/install db11[26-38].eqiad.wmnet - https://phabricator.wikimedia.org/T211613 (Cmjohnson)
[15:24:46] arturo: dbproxy1005 just failed over m5
[15:28:08] ^ Reedy (dbproxy is the proxy of wikitech too)
[15:35:56] not sure what that means
[15:47:57] https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=dbproxy1005&service=haproxy+failover
[15:48:23] that means either db1073 had the thing we suffered in the past (reaching max_connections)
[15:48:29] or some other problem happened
[15:48:42] db1073 is the db for striker, wikitech and others cloud services
[15:49:00] I checked and didn't see an obvious problem
[15:49:07] maybe network?
[16:01:08] arturo: if you don't see any wmcs services with problems, I will restart the proxy
[16:02:08] DBA, Operations, ops-eqiad, Goal, and 2 others: rack/setup/install db11[26-38].eqiad.wmnet - https://phabricator.wikimedia.org/T211613 (Cmjohnson)
[16:02:18] DBA, cloud-services-team: dbproxy1005 reports database failover - https://phabricator.wikimedia.org/T207901 (jcrespo) This happened again, restarting proxy, as I don't see a clear connection with max_connections. Network instability?
[16:06:04] I don't see any bold problem with our cloud right now
[16:08:30] DBA, cloud-services-team: dbproxy1005 reports database failover - https://phabricator.wikimedia.org/T207901 (jcrespo) ` -- Logs begin at Sat 2019-04-20 15:06:53 UTC, end at Thu 2019-05-02 16:07:12 UTC.
-- May 02 14:53:39 dbproxy1005 haproxy[14940]: Backup Server mariadb/db1117:3325 is DOWN. 1 active and...
[17:42:48] DBA, Operations, ops-eqiad, Goal, and 2 others: rack/setup/install db11[26-38].eqiad.wmnet - https://phabricator.wikimedia.org/T211613 (Cmjohnson)
[17:43:46] DBA, Operations, ops-eqiad, Goal, and 2 others: rack/setup/install db11[26-38].eqiad.wmnet - https://phabricator.wikimedia.org/T211613 (Cmjohnson) a:Cmjohnson→RobH @robh all the servers are racked and on-site work has been completed. Some are off and some are in a state that just needs...
[19:29:24] DBA, Patch-For-Review: Productionize eqiad and codfw source backup hosts & codfw backup test host - https://phabricator.wikimedia.org/T220572 (jcrespo)
[19:33:13] DBA, Patch-For-Review: Productionize eqiad and codfw source backup hosts & codfw backup test host - https://phabricator.wikimedia.org/T220572 (jcrespo) eqiad is complete too, also pending only possible recompressions to save space, like most of the codfw servers here. I didn't realize how slow myloader...
[21:39:54] DBA, Goal, Patch-For-Review: Address Database infrastructure blockers on datacenter switchover & multi-dc deployment - https://phabricator.wikimedia.org/T220170 (Papaul)
[22:59:13] DBA, Operations, ops-codfw, Goal, Patch-For-Review: rack/setup/install db2[103-120].codfw.wmnet (18 hosts) - https://phabricator.wikimedia.org/T221532 (Papaul)