[07:06:56] DBA, Patch-For-Review: Reimage dbstore2001 as jessie - https://phabricator.wikimedia.org/T146261#2755745 (Marostegui) revision finished its compression (it took 1 day and hours). It went from: 375G to 139G The whole dataset is now compressed and it is 485G. As db1073 is having some delay issues, I am g...
[09:27:01] DBA: db2034: investigate its crash and reimage - https://phabricator.wikimedia.org/T149553#2755877 (Marostegui)
[09:56:29] DBA: db2034: investigate its crash and reimage - https://phabricator.wikimedia.org/T149553#2755916 (Marostegui) From the logs: ``` /system1/log1/record14 Targets Properties number=14 severity=Critical date=10/31/2016 time=08:18 description=System Power Fault Detected (XR: 14 00 MID:...
[10:15:41] DBA, Patch-For-Review: db2034: investigate its crash and reimage - https://phabricator.wikimedia.org/T149553#2755947 (Marostegui) @Papaul is there anything from your side that you can check to see what that error means and if the server and the power supply are both fine? Thanks!
[11:53:33] DBA, Patch-For-Review: db2034: investigate its crash and reimage - https://phabricator.wikimedia.org/T149553#2755877 (jcrespo) Probably related: T137084
[12:03:58] DBA, Operations, ops-codfw: Several es20XX servers keep crashing (es2017, es2019, es2015, es2014) since 23 March - https://phabricator.wikimedia.org/T130702#2756149 (jcrespo)
[12:04:12] DBA, Operations, ops-codfw: Several es20XX servers keep crashing (es2017, es2019, es2015, es2014) since 23 March - https://phabricator.wikimedia.org/T130702#2213475 (jcrespo) Resolved>Open
[12:39:30] DBA, Operations, ops-eqiad: Degraded RAID on db1050 - https://phabricator.wikimedia.org/T149509#2756223 (Marostegui)
[12:45:11] DBA, Operations, ops-codfw: Degraded RAID on db2052 - https://phabricator.wikimedia.org/T149377#2756230 (Marostegui)
[12:46:39] DBA, Operations, ops-eqiad: Degraded RAID on db1050 - https://phabricator.wikimedia.org/T149509#2754729 (Marostegui) This is indeed correct. Disk in slot 3 is broken ``` Enclosure Device ID: 32 Slot Number: 3 Drive's position: DiskGroup: 0, Span: 1, Arm: 1 Enclosure position: N/A Device Id: 3 WWN: 5...
[14:53:04] DBA, Labs, Labs-Infrastructure, Operations, Patch-For-Review: Migrate labsdb1005/1006/1007 to jessie - https://phabricator.wikimedia.org/T123731#2756785 (chasemp) We need to schedule a downtime to do this move from labsdb1005 to labsdb1004. This should be a very short window of actual outage....
[15:17:47] DBA, Labs, Labs-Infrastructure: Move dbproxy1010 and dbproxy1011 to labs-support network, rename them to labsdbproxy1001 and labsdbproxy1002 - https://phabricator.wikimedia.org/T149170#2756903 (jcrespo) @chasemp - as we talked on the last meeting we need to sort out some architecture decisions with t...
[15:18:15] jynus: https://gerrit.wikimedia.org/r/#/c/318892/ I am going to deploy this in a bit
[15:18:34] ok
[15:18:47] should I create the users for labs?
[15:19:24] sure, if you can
[15:20:12] there is some deployment going on
[15:20:23] careful with that
[15:20:43] yes
[15:24:24] DBA, Labs, Labs-Infrastructure, Operations: Create maintain-views user for labsdb1001 and labsdb1003 - https://phabricator.wikimedia.org/T148560#2756912 (jcrespo) a:jcrespo So, 'maintainviews' will be the user used to create the view (you will connect to mysql using that user). viewmaster wi...
[15:35:09] DBA, Labs, Labs-Infrastructure, Operations, Patch-For-Review: Migrate labsdb1005/1006/1007 to jessie - https://phabricator.wikimedia.org/T123731#1936600 (yuvipanda) If we settle on a date and announce on labs-announce...
[15:43:33] DBA, Operations, ops-eqiad: Degraded RAID on db1050 - https://phabricator.wikimedia.org/T149509#2754729 (Cmjohnson) Disks swapped...waiting on rebuild.
[15:57:18] Blocked-on-schema-change, DBA: Apply change_tag and tag_summary primary key schema change to Wikimedia wikis - https://phabricator.wikimedia.org/T147166#2757120 (Marostegui) While attempting to do this in db2016 (codfw master) it broke replication in all the slaves ``` Error 'Duplicate entry '16890654'...
[16:09:28] DBA, Labs, Labs-Infrastructure, Operations: Create maintain-views user for labsdb1001 and labsdb1003 - https://phabricator.wikimedia.org/T148560#2757157 (chasemp) thank you @jcrespo! fyi this is maintained here atm (both user and pass are set in private) https://phabricator.wikimedia.org/diffus...
[16:20:33] DBA, Labs, Labs-Infrastructure, Operations: Create maintain-views user for labsdb1001 and labsdb1003 - https://phabricator.wikimedia.org/T148560#2757223 (jcrespo) I feel there is another misunderstanding, there is $::passwords::mysql::maintain_views and $::passwords::labsdb::maintainviews. I wil...
[16:22:07] DBA, Labs, Labs-Infrastructure, Operations, Patch-For-Review: Migrate labsdb1005/1006/1007 to jessie - https://phabricator.wikimedia.org/T123731#2757233 (chasemp) >>! In T123731#2757001, @yuvipanda wrote: > If we settle on a date and announce on labs-announce... @yuvipanda I think the asks h...
[16:23:34] DBA, Labs, Labs-Infrastructure: Move dbproxy1010 and dbproxy1011 to labs-support network, rename them to labsdbproxy1001 and labsdbproxy1002 - https://phabricator.wikimedia.org/T149170#2757249 (chasemp) @jynus great thanks :) fyi reminder on a pro/con task for discussion re: proxysql vs haproxy :)
[16:24:13] DBA, Operations, ops-codfw: install new disks into dbstore2001 - https://phabricator.wikimedia.org/T149457#2757251 (Marostegui) I am fine with that. What I want to do: - Move the snapshot from dbstore2001 to dbstore2002 and labsdb1008 (needs coordination with Chase). - Build dbstore2002 from there (...
[16:37:37] DBA, Operations, ops-codfw: Several es20XX servers keep crashing (es2017, es2019, es2015, es2014) since 23 March - https://phabricator.wikimedia.org/T130702#2213575 (Marostegui) ** Number of crashes es2019: 23rd March & 22nd April & 30th Oct ** Number of crashes es2017: 26th May 30th May, ** Number...
[16:42:18] DBA, Operations, ops-codfw: Several es20XX servers keep crashing (es2017, es2019, es2015, es2014) since 23 March - https://phabricator.wikimedia.org/T130702#2225024 (RobH) I'll review all the past and linked ticket histories. We'll need to generate a list of each system, and the overall errors and m...
[16:48:28] DBA, Labs, Labs-Infrastructure, Operations: Create maintain-views user for labsdb1001 and labsdb1003 - https://phabricator.wikimedia.org/T148560#2757344 (chasemp) ok thanks, `$::passwords::labsdb::maintainviews` works for me
[16:54:26] DBA, Operations, ops-eqiad: Degraded RAID on db1050 - https://phabricator.wikimedia.org/T149509#2757360 (Marostegui) It got rebuilt ``` ˜/icinga-wm 17:50> RECOVERY - MegaRAID on db1050 is OK: OK: optimal, 1 logical, 2 physical Number of Virtual Disks: 1 Virtual Drive: 0 (Target Id: 0) Name...
[16:54:46] DBA, Operations, ops-eqiad: Degraded RAID on db1050 - https://phabricator.wikimedia.org/T149509#2757361 (Marostegui) Open>Resolved
[19:21:40] DBA, Labs, Labs-Infrastructure, Operations: Create maintain-views user for labsdb1001 and labsdb1003 - https://phabricator.wikimedia.org/T148560#2758017 (jcrespo) a:jcrespo>None So these are the privileges created on all labsdbs (not yet on 9/10/11), but on 8 and the existing labs dbs: {P...
[21:35:41] jynus, marostegui I've taken a look at db1065 because it paged
[21:36:36] many of its checks have notifications disabled, is that intended?
from mediawiki-config looks in production as usual with load of 50 and role API
[21:36:39] https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?host=db1065
[21:50:47] DBA, Operations: db1065 paged for NRPE timeout - https://phabricator.wikimedia.org/T149633#2758508 (Volans)
[21:51:31] more details in the task ^^^ all looks good for now
[23:09:49] DBA, MediaWiki-Database, Patch-For-Review, Schema-change: Some tables lack unique or primary keys, may allow confusing duplicate data - https://phabricator.wikimedia.org/T17441#2758720 (jcrespo)
[23:22:19] DBA, Operations: Review Icinga alarms with disabled notifications - https://phabricator.wikimedia.org/T149643#2758740 (Volans)
[23:22:34] DBA, Operations: Review Icinga alarms with disabled notifications - https://phabricator.wikimedia.org/T149643#2758752 (Volans)