[03:53:21] Morning, I don't know if it's useful but it's worth mentioning here: https://github.com/box/Anemometer
[04:57:08] Amir1: Nice, I will add it to my "to take a look" (endless) list!
[04:57:46] :D
[05:10:22] DBA, Analytics, Analytics-EventLogging, Operations, ops-eqiad: db1107 (eventlogging db master) possibly memory issues - https://phabricator.wikimedia.org/T222050 (Marostegui)
[05:10:44] DBA, Analytics, Analytics-EventLogging, Operations, ops-eqiad: db1107 (eventlogging db master) possibly memory issues - https://phabricator.wikimedia.org/T222050 (Marostegui) p:Triage→Normal
[05:11:14] DBA, Analytics, Analytics-EventLogging, Operations, ops-eqiad: db1107 (eventlogging db master) possibly memory issues - https://phabricator.wikimedia.org/T222050 (Marostegui) I've acked the alert for now.
[05:28:30] In general I'd love to replace tendril with something used in outside world so you don't need to maintain yet another codebase
[05:43:47] DBA, Patch-For-Review: Decommission 2 codfw x1 hosts db2033 and db2034 - https://phabricator.wikimedia.org/T219493 (ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts: ` ['db2045.codfw.wmnet'] ` The log can be found in `/var/log/wmf-auto-reimage/2019...
[05:51:20] DBA, Analytics, Analytics-EventLogging, Operations, ops-eqiad: db1107 (eventlogging db master) possibly memory issues - https://phabricator.wikimedia.org/T222050 (elukey) Thanks a lot @Marostegui! We can shutdown the host without any problem, it only needs a ~10m heads up to properly stop eve...
[06:44:26] DBA, Patch-For-Review: Decommission 2 codfw x1 hosts db2033 and db2034 - https://phabricator.wikimedia.org/T219493 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['db2045.codfw.wmnet'] ` Of which those **FAILED**: ` ['db2045.codfw.wmnet'] `
[07:15:58] DBA, Patch-For-Review: Decommission 2 codfw x1 hosts db2033 and db2034 - https://phabricator.wikimedia.org/T219493 (ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts: ` ['db2045.codfw.wmnet'] ` The log can be found in `/var/log/wmf-auto-reimage/2019...
[07:41:57] DBA, Patch-For-Review: Decommission 2 codfw x1 hosts db2033 and db2034 - https://phabricator.wikimedia.org/T219493 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['db2045.codfw.wmnet'] ` and were **ALL** successful.
[07:45:46] DBA, Analytics, Analytics-EventLogging, Operations, ops-eqiad: db1107 (eventlogging db master) possibly memory issues - https://phabricator.wikimedia.org/T222050 (Marostegui) a:Cmjohnson Thanks - not sure how to proceed as the `dmesg` entries show that the issue is fixed but icinga is sti...
[08:40:12] copying data away from dbstore2* is super-slow, they are so low on iops even sequential reads get affected
[08:40:44] I also was able to recompress s2 around 300 GB down
[08:43:18] how many instances dbstore2* have now?
[08:44:30] I have not deleted or stopped any so far, just in case
[08:44:38] just duplicated
[08:44:49] oh, sorry, dbstore, I read dbprov hehe
[09:31:51] db2099 has alerts disabled, but it will generate new checks when puppet runs on icinga FYI
[09:32:01] thanks
[09:32:12] will ack them when I see them (taking a break)
[13:23:03] DBA: Decommission old coredb machines (<=db2042) - https://phabricator.wikimedia.org/T221533 (Marostegui)
[13:28:44] DBA: Decommission old coredb machines (<=db2042) - https://phabricator.wikimedia.org/T221533 (Marostegui)
[13:28:57] DBA: Decommission old coredb machines (<=db2042) - https://phabricator.wikimedia.org/T221533 (Marostegui)
[16:57:02] DBA, Operations, ops-eqiad, Patch-For-Review: rack/setup/install db1139|db1140.eqiad.wmnet (2 dump slaves) - https://phabricator.wikimedia.org/T218985 (RobH) a:RobH→None IRC Update: This is ready for installation by the #DBA team, one of them will steal this task later this week.
[16:57:30] DBA, Operations, ops-eqiad, Patch-For-Review: rack/setup/install db1139|db1140.eqiad.wmnet (2 dump slaves) - https://phabricator.wikimedia.org/T218985 (jcrespo) a:jcrespo
[17:04:39] DBA, Operations, ops-codfw, Goal, Patch-For-Review: rack/setup/install db2[103-120].codfw.wmnet (18 hosts) - https://phabricator.wikimedia.org/T221532 (Papaul)
[17:13:01] DBA, Operations, ops-codfw, Goal, Patch-For-Review: rack/setup/install db2[103-120].codfw.wmnet (18 hosts) - https://phabricator.wikimedia.org/T221532 (Papaul)
[17:32:54] DBA, Patch-For-Review: Productionize eqiad and codfw source backup hosts & codfw backup test host - https://phabricator.wikimedia.org/T220572 (jcrespo)
[17:33:05] DBA, Patch-For-Review: Productionize eqiad and codfw source backup hosts & codfw backup test host - https://phabricator.wikimedia.org/T220572 (jcrespo) 98 and 99 done, although they need recompression (especially s3).