[05:19:03] DBA, Operations: db1082 crashed - https://phabricator.wikimedia.org/T258336 (Marostegui) I have started to repool db1082
[05:21:18] DBA, Patch-For-Review: Compress enwiki InnoDB tables - https://phabricator.wikimedia.org/T254462 (Marostegui)
[05:23:21] DBA, Operations, ops-eqiad, Patch-For-Review: Degraded RAID on db1131 - https://phabricator.wikimedia.org/T257253 (Marostegui) Thank you both!
[05:57:36] DBA, Operations: Refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (Marostegui)
[05:57:39] DBA, Operations: db1082 crashed - https://phabricator.wikimedia.org/T258336 (Marostegui)
[05:58:18] DBA, Operations: db1082 crashed - https://phabricator.wikimedia.org/T258336 (Marostegui) Open→Resolved a:Marostegui db1082 is fully repooled. I am going to consider this resolved. There is not much else we can do really - this host will be replaced and refreshed in Q2 (T258336), it is quite o...
[05:58:21] DBA, Operations: Refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (Marostegui)
[07:20:59] DBA, Gerrit, Patch-For-Review: Make sure both `reviewdb-test` (used for gerrit upgrade testing) and `reviewdb` (formerly production) databases get torn down - https://phabricator.wikimedia.org/T255715 (Marostegui) Open→Resolved Dropped `reviewdb` after double checking nothing wrote to it aga...
[08:04:36] jynus: i'm looking at the icinga alerts defined for db2093: https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?search_string=db2093
[08:04:52] there's a ton of them that are backup-related, but i can't find how they are defined in puppet
[08:06:40] one second
[08:14:24] kormat: include ::profile::mariadb::backup::check
[08:18:52] do you have it located?
[08:20:18] i'm looking at it, yeah
[08:20:40] i'll send a CR
[08:22:27] once i figure out what the correct thing to do is :)
[08:30:54] <_joe_> so, the cumin aliases for db-section-s10 and db-section-s11 include no hosts
[08:31:26] <_joe_> do we plan on using those?
[08:31:27] s10 should be db1133
[08:31:35] i believe it is actually in use
[08:31:54] <_joe_> ok, so something deeper is wrong
[08:31:57] s11 labtestwiki and that lives on a cloud host
[08:32:02] so it is not really used
[08:32:07] s10 is wikitech
[08:32:08] <_joe_> I'll open a task
[08:32:37] kormat: weren't you working with cumin aliases recently?
[08:32:43] I cannot recall
[08:32:44] yes. that's my stuff.
[08:33:03] but what is the right procedure when codfw is primary?
[08:33:07] _joe_: db-section-s11 will never contain hosts, db-section-s10 maybe should. i can have a look.
[08:33:27] jynus: yeah, that is the discussion we had when s10 was created and it was a: we'll see XD
[08:33:41] as wikitech will be never active in codfw, cause it lives on m5
[08:33:54] speaking of which, I wanted to bring that subject up in today's meeting
[08:34:08] should we attempt to migrate wikitech to s3 or s5 or wherever during the next DC switchover?
[08:34:10] <_joe_> kormat: let me open a task
[08:34:24] _joe_: i wouldn't dream of stopping you :)
[08:37:42] DBA, Operations: Database cumin aliases without a matching host - https://phabricator.wikimedia.org/T258376 (Joe)
[08:38:50] DBA, Operations: Database cumin aliases without a matching host - https://phabricator.wikimedia.org/T258376 (Marostegui) p:Triage→Medium @jbond Is db-section-idp-test really needed?
[08:40:20] DBA, Operations: Database cumin aliases without a matching host - https://phabricator.wikimedia.org/T258376 (Kormat) The aliases are auto-generated from the list of valid sections defined in `modules/profile/types/mariadb/valid_section.pp`. We could maybe special-case ones which are always going to be em...
[08:40:42] DBA, Operations, User-Kormat: Database cumin aliases without a matching host - https://phabricator.wikimedia.org/T258376 (Kormat) a:Kormat
[08:41:20] _joe_: does the aliases check have the option to ignore a set of aliases?
[08:42:59] mm, looks like no
[08:43:09] <_joe_> kormat: nope
[08:43:21] <_joe_> but it's ok if you just say it should stay that way on the task
[08:44:13] marostegui: how do i see where the wikitech db is hosted?
[08:46:24] kormat: https://noc.wikimedia.org/dbconfig/eqiad.json you can look for s10 there
[08:46:44] And you can check what s10 has by checking: mediawiki-config/dblists/s10.dblist
[08:47:28] hum
[08:47:39] so db1133 is in section m5 _and_ s10?
[08:47:50] db1133 hosts wikitech yep
[08:48:06] https://phabricator.wikimedia.org/T167973
[08:48:29] s10.dblist says `labswiki`
[08:48:41] which is not especially illuminating :)
[08:50:58] hmm, ok. so 's10' is a 'virtual' section from the perspective of puppet. i.e. mediawiki knows about it, but puppet does not
[08:53:34] kormat: yeah, labswiki is the database for wikitech
[08:58:28] DBA, Operations, User-Kormat: Database cumin aliases without a matching host - https://phabricator.wikimedia.org/T258376 (Kormat) >>! In T258376#6318266, @Marostegui wrote: > @jbond Is db-section-idp-test really needed? It's there because of `profile::mariadb::misc::idp_test`, but it looks like cur...
[09:00:58] DBA, Operations, User-Kormat: Database cumin aliases without a matching host - https://phabricator.wikimedia.org/T258376 (jcrespo) Maybe 608639 wasn't properly reverted?
[09:26:41] DBA, Operations, Patch-For-Review, User-Kormat: Database cumin aliases without a matching host - https://phabricator.wikimedia.org/T258376 (jbond) Open→Resolved This has been removed now
[09:28:02] DBA, Operations, Patch-For-Review, User-Kormat: Database cumin aliases without a matching host - https://phabricator.wikimedia.org/T258376 (Kormat) Resolved→Open Re-opening until the s10/s11 issue has been resolved.
[10:00:34] kormat: we can adjust the check ofc
[10:00:41] [re: cumin aliases]
[10:33:08] DBA, wikitech.wikimedia.org: Move database for wikitech (labswiki) to a main cluster section - https://phabricator.wikimedia.org/T167973 (Marostegui) We have picked up this topic in our weekly and we're going to see how and if it is possible to do this without having to depend on the DC switchover....
[10:49:54] DBA, Operations: db1080-95 batch possibly suffering BBU issues - https://phabricator.wikimedia.org/T258386 (Marostegui)
[10:50:11] DBA, Operations: db1080-95 batch possibly suffering BBU issues - https://phabricator.wikimedia.org/T258386 (Marostegui) p:Triage→Medium
[10:51:46] DBA, Operations: db1080-95 batch possibly suffering BBU issues - https://phabricator.wikimedia.org/T258386 (Marostegui)
[10:51:49] DBA, Operations: db1085 crashed - https://phabricator.wikimedia.org/T258360 (Marostegui)
[10:51:52] DBA, Operations: db1082 crashed - https://phabricator.wikimedia.org/T258336 (Marostegui)
[10:52:37] DBA, Operations: db1080-95 batch possibly suffering BBU issues - https://phabricator.wikimedia.org/T258386 (Marostegui)
[10:52:40] DBA, Operations: Refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (Marostegui)
[11:37:32] DBA, Operations: db1082 crashed - https://phabricator.wikimedia.org/T258336 (Marostegui) Just for the record, looks like this host has had a history of HW crashes before: T178460 T158188 T145533 T145607
[11:37:50] DBA, Operations: db1080-95 batch possibly suffering BBU issues - https://phabricator.wikimedia.org/T258386 (Marostegui)
[12:18:15] DBA, Data-Services, cloud-services-team (Kanban): Prepare and check storage layer for arywiki - https://phabricator.wikimedia.org/T257725 (Urbanecm) @Marostegui The database was just created.
[12:36:45] DBA, Data-Services, cloud-services-team (Kanban): Prepare and check storage layer for lijwikisource - https://phabricator.wikimedia.org/T258389 (Urbanecm)
[12:40:45] DBA, Data-Services, cloud-services-team (Kanban): Prepare and check storage layer for avkwiki - https://phabricator.wikimedia.org/T258077 (Urbanecm) Thanks @bstorm!
[12:47:30] DBA, Cloud-Services, User-Kormat: Prepare and check storage layer for sysop_itwiki - https://phabricator.wikimedia.org/T257125 (Urbanecm) @Kormat The database was just created.
[13:04:39] DBA, Data-Services, cloud-services-team (Kanban): Prepare and check storage layer for arywiki - https://phabricator.wikimedia.org/T257725 (Marostegui) a:Marostegui Thanks - I am going to sanitize it
[13:15:48] DBA, OTRS, Operations, serviceops: Create a parallel OTRS database with a frozen snapshot of the production one - https://phabricator.wikimedia.org/T257928 (akosiaris) Many thanks!
[13:22:32] DBA, Cloud-Services, User-Kormat: Prepare and check storage layer for sysop_itwiki - https://phabricator.wikimedia.org/T257125 (Kormat) Open→Resolved a:Kormat I can confirm the database does not get replicated to labsdb. This also means no views need to be created, so resolving this now.
[13:27:30] DBA, Data-Services, cloud-services-team (Kanban): Prepare and check storage layer for lijwikisource - https://phabricator.wikimedia.org/T258389 (Marostegui) a:Marostegui This needs sanitization
[14:07:20] DBA, Cloud-Services, User-Kormat: Prepare and check storage layer for sysop_itwiki - https://phabricator.wikimedia.org/T257125 (Urbanecm) Thanks!
[15:00:54] DBA, wikitech.wikimedia.org: Move database for wikitech (labswiki) to a main cluster section - https://phabricator.wikimedia.org/T167973 (Andrew) Thank you for looking at this! The only issue I can think of with extended read-only time on wikitech is that it will break the SAL; any other edits can easil...
[18:18:38] DBA, Parsoid, Parsoid-Tests: mysqldump of testreduce_vd database on scandium - https://phabricator.wikimedia.org/T258429 (ssastry)
[19:35:19] DBA, Operations, ops-eqiad: db1145 crashed - memory issues - https://phabricator.wikimedia.org/T258249 (Cmjohnson) {F31942312}. @Jclark-ctr TSR report is attached
[22:15:13] DBA, Operations, ops-eqiad: db1145 crashed - memory issues - https://phabricator.wikimedia.org/T258249 (Jclark-ctr) Confirmed: Service Request 1030121866 was successfully submitted.