[05:18:07] DBA: db2019 has performance issues, replace disk or switchover s4 master elsewhere - https://phabricator.wikimedia.org/T170351#3429001 (Marostegui) It might be easier just to replace the disk even if this host will go away at some point. @Papaul do you have spare disks?
[05:20:08] DBA, Analytics, Analytics-EventLogging: dbstore1002 crashed - https://phabricator.wikimedia.org/T170308#3429003 (Marostegui) I can confirm that only x1 broke
[05:57:47] DBA, Analytics, Analytics-EventLogging: dbstore1002 crashed - https://phabricator.wikimedia.org/T170308#3429036 (Marostegui) Open>Resolved a:Marostegui I have fixed x1 and replication has caught up again ``` Seconds_Behind_Master: 0 ```
[06:18:53] DBA, Operations, Patch-For-Review: eqiad rack/setup 11 new DB servers - https://phabricator.wikimedia.org/T162233#3429080 (Marostegui)
[06:26:58] DBA, Operations, Patch-For-Review: eqiad rack/setup 11 new DB servers - https://phabricator.wikimedia.org/T162233#3429081 (Marostegui)
[06:27:01] DBA, Operations, Patch-For-Review: eqiad rack/setup 11 new DB servers - https://phabricator.wikimedia.org/T162233#3156297 (Marostegui) Thanks @jcrespo and @Cmjohnson for advancing a lot on this task! The only pending host now to be able to resolve this task is db1106 which looks like it doesn't have...
[06:29:25] Blocked-on-schema-change, DBA, Patch-For-Review: Convert unique keys into primary keys for some wiki tables on s1 - https://phabricator.wikimedia.org/T166204#3429098 (Marostegui)
[06:29:29] dbstore2001:s1 is not replicating back to catch up
[06:30:08] it was slowly, but now there is a heavy IO operation on that host
[06:30:12] So it is going back again
[06:31:49] Blocked-on-schema-change, DBA, Patch-For-Review: Convert unique keys into primary keys for some wiki tables on s1 - https://phabricator.wikimedia.org/T166204#3429099 (Marostegui) labsdb1011 is done for the tables that exist there: ``` categorylinks PRIMARY KEY (`cl_from`,`cl_to`), templatelinks PR...
[06:43:47] did install work for you?
[06:44:00] didn't puppet fail?
[06:44:17] it failed because of the same thing that failed for you yesterday, the ipmi tools and I did faidon's workaround
[06:44:24] installing the package and then running puppet
[07:05:10] yeah, I'll ask alex to fix today
[07:05:41] I can fix too, but this is a bit delicate, we had to revert this once alrady
[07:05:44] *already
[07:28:22] DBA: Drop localisation and localisation_file_hash tables, l10nwiki databases too - https://phabricator.wikimedia.org/T119811#3429137 (Marostegui) Dropped from s1 (enwiki)
[07:30:49] DBA: Drop localisation and localisation_file_hash tables, l10nwiki databases too - https://phabricator.wikimedia.org/T119811#3429139 (Marostegui) Open>Resolved Dropped from s3
[07:30:52] DBA, Epic, Tracking: Database tables to be dropped on Wikimedia wikis and other WMF databases (tracking) - https://phabricator.wikimedia.org/T54921#3429141 (Marostegui)
[07:49:16] what is the status of db1102?
[07:49:56] It is replicating s6 (although stopped for now), replicating s2 and replicating (s7 - which I will add to tendril in a bit)
[07:49:59] why?
[07:50:36] I wanted to upgrade it to stretch and apply the new role to it
[07:50:57] you can do that if you like :-)
[07:51:08] maybe upgrade all of labs
[07:51:16] new
[07:51:31] labsdb1009 and 1010 are importing data now
[07:51:48] I will wait then
[07:51:57] what about dbstore2002?
[07:52:16] I was about to import x1 now, but it doesn't need to be done now
[07:52:20] So you can do that one if you like
[07:52:46] why is puppet disabled there?
[07:53:18] because it is running multi-instance but set up in a manual way
[07:53:37] ywa
[07:53:50] but what is puppet preventing?
[07:53:59] overwriting my.cnf?
[07:54:00] removing all the stuff from my.cnf
[07:54:00] yeah
[07:54:06] so I think it is a good candidate
[07:54:09] for your tests
[09:14:27] Blocked-on-schema-change, DBA, Patch-For-Review: Convert unique keys into primary keys for some wiki tables on s1 - https://phabricator.wikimedia.org/T166204#3429346 (Marostegui) dbstore1002 crashed yesterday (T170308 ) while altering templatelinks table (119G). I have altered all the small pending t...
[09:15:05] DBA, Patch-For-Review: Setup dbstore2002 with 2 new mysql instances from production and enable GTID - https://phabricator.wikimedia.org/T169510#3429348 (Marostegui) x1 has been imported on dbstore2002 and it is up and replicating.
[09:43:29] give a look at https://gerrit.wikimedia.org/r/364681 when you can
[09:44:31] looking
[09:49:16] DBA, Analytics-Kanban, Patch-For-Review, User-Elukey: Create a user for the eventlogging_cleaner script on the analytics slaves - https://phabricator.wikimedia.org/T170118#3429464 (elukey) a:elukey
[09:52:12] Blocked-on-schema-change, DBA, Patch-For-Review: Convert unique keys into primary keys for some wiki tables on s1 - https://phabricator.wikimedia.org/T166204#3429482 (Marostegui) db1066 is done: ``` root@neodymium:/home/marostegui# for i in `cat s1_tables`; do echo $i; mysql --skip-ssl -hdb1066 enwik...
[09:52:43] Blocked-on-schema-change, DBA, Patch-For-Review: Convert unique keys into primary keys for some wiki tables on s1 - https://phabricator.wikimedia.org/T166204#3429483 (Marostegui)
[11:42:47] https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?host=dbstore2002
[11:46:50] DBA, Patch-For-Review: Refactor puppet mariadb class to support multi-instance hosts - https://phabricator.wikimedia.org/T169514#3430104 (jcrespo) I brought down db1096 and applied it to dbstore2002: ``` ├─mariadb.service │ └─12082 /opt/wmf-mariadb101//bin/mysqld --datadir=/sr...
[11:48:34] https://grafana-admin.wikimedia.org/dashboard/db/mysql?orgId=1&from=now-24h&to=now&var-dc=codfw%20prometheus%2Fops&var-server=dbstore2002
[11:48:54] right now s2 is being monitored, which is wrong - that and many other things pending to fix
[11:49:56] there are a couple of tips I will have to tell you about 10.1 upgrade that I hit every time, in case you don't already know them
[11:55:57] go ahead
[11:56:02] :)
[12:41:33] DBA, Operations, Wikimedia-Site-requests, Patch-For-Review: Create CoC committee private wiki - https://phabricator.wikimedia.org/T165977#3430212 (Dereckson) Wiki creation is postponed. Next step is to merge Apache configuration - https://gerrit.wikimedia.org/r/354959
[12:51:26] Blocked-on-schema-change, DBA, Patch-For-Review: Convert unique keys into primary keys for some wiki tables on s1 - https://phabricator.wikimedia.org/T166204#3430262 (Marostegui) So dbstore1002 is "done", as I mentioned in: T166204#3429346 templatelinks and pagelinks tables will be skipped and will r...
[12:52:56] Blocked-on-schema-change, DBA, Patch-For-Review: Convert unique keys into primary keys for some wiki tables on s1 - https://phabricator.wikimedia.org/T166204#3430269 (Marostegui)
[13:53:22] DBA, Cloud-Services, User-Urbanecm: Prepare and check storage layer for dinwiki - https://phabricator.wikimedia.org/T169193#3430531 (Marostegui) ``` root@neodymium:/home/marostegui# mysql --skip-ssl -hdb1075 dinwiki -e "show tables;" | wc -l 78 ``` Is this all done in production and we should go ahe...
[13:53:56] DBA, Cloud-Services, User-Urbanecm: Prepare and check storage layer for maiwikimedia - https://phabricator.wikimedia.org/T168788#3430534 (Marostegui) ``` root@neodymium:/home/marostegui# mysql --skip-ssl -hdb1075 maiwiki -e "show tables;" | wc -l 83 ``` Is this all done in production and we should g...
[13:58:10] DBA, Cloud-Services, User-Urbanecm: Prepare and check storage layer for maiwikimedia - https://phabricator.wikimedia.org/T168788#3430551 (Urbanecm) Ping @Dereckson. As I watched -operations, it seems like we are waiting for Apache config being merged by ops (see the main task for details) and everyth...
[13:58:35] DBA, Cloud-Services, User-Urbanecm: Prepare and check storage layer for dinwiki - https://phabricator.wikimedia.org/T169193#3430558 (Urbanecm) >>! In T169193#3430531, @Marostegui wrote: > ``` > root@neodymium:/home/marostegui# mysql --skip-ssl -hdb1075 dinwiki -e "show tables;" | wc -l > 78 > ``` >...
[13:59:01] DBA, Cloud-Services, User-Urbanecm: Prepare and check storage layer for maiwikimedia - https://phabricator.wikimedia.org/T168788#3430560 (Marostegui) Would that be a blocker for the table sanitization?
[13:59:14] DBA, Cloud-Services, User-Urbanecm: Prepare and check storage layer for dinwiki - https://phabricator.wikimedia.org/T169193#3430561 (Marostegui) ok - I will go ahead then
[13:59:20] DBA, Cloud-Services, User-Urbanecm: Prepare and check storage layer for dinwiki - https://phabricator.wikimedia.org/T169193#3430562 (Marostegui) a:Marostegui
[13:59:26] DBA, Cloud-Services, User-Urbanecm: Prepare and check storage layer for maiwikimedia - https://phabricator.wikimedia.org/T168788#3430563 (Urbanecm) I don't think so - database is created.
[14:06:41] just wanted to let you know that the eventlogging_cleaner.py is deployed on db1047 (+ mysql grants) and it connects to the db correctly via the unix socket
[14:07:38] the only weirdness with configparser is
[14:07:39] configparser.DuplicateOptionError: While reading from '/etc/my.cnf' [line 71]: option 'plugin-load' in section 'mysqld' already exists
[14:07:56] because two plugins are loaded :D
[14:09:22] jynus: Hey :) Sorry I'm late, I'm here
[14:10:02] np
[14:12:27] so just reboot with no failover?
[14:12:42] jynus: I sent out a note to mailing list.
[14:12:43] yes
[14:13:19] elukey: configparser expects unique keys per section IIRC, I don't think you can get around that
[14:21:18] marostegui- protip there is 83 tables https://phabricator.wikimedia.org/T168788#3430534
[14:21:23] *82
[14:21:37] -BN to get rid of headers
[14:22:29] hehe yeah thanks
[14:22:31] I always forget :)
[14:24:21] it wasn't important, just in case a script is used
[14:34:49] volans: I managed to make it work with my.cnf with configparser.ConfigParser(strict=False, allow_no_value=True)
[14:35:46] oh good! Are those options new? I remembered it had problems in the past for duplicate keys
[14:38:45] volans: strict seems to be present from 3.2 (Changed in version 3.2: In previous versions of configparser behaviour matched strict=False.)
[14:39:48] * volans curious to know what happens if you try to get plugin-load from the parsed config ;)
[14:40:51] I am trying it :)
[14:41:28] volans: gets the last one
[14:41:37] that makes sense
[14:41:42] DBA, Patch-For-Review: Setup dbstore2002 with 2 new mysql instances from production and enable GTID - https://phabricator.wikimedia.org/T169510#3430740 (jcrespo) So right now the separate instances are puppetized, the main multi-source one isn't. To handle it: ``` systemctl set-environment MYSQLD_OPTS="...
[14:41:46] ok, so "works"
[14:44:16] we can maybe separate the client config into a separate file in the future
[14:51:09] volans- 2 same keys is not only a valid configuration
[14:51:37] in some cases, like replication filters, it is compulsory, and trying to define comma-separated filters will fail
[14:51:53] I was referring to the python parsing of it, not the mysql side ;)
[14:52:26] it "works" in the sense that it is able to parse the file, failing to return all the items with multiple keys but only the last one
[14:52:37] it will get more interesting
[14:52:47] when persistent configuration starts playing its role
[14:52:59] oh that for sure
[14:53:07] so 3 possible values, on file, on memory, or on temporary persistence internal storage
[14:53:58] DBA, Cloud-Services, User-Urbanecm: Prepare and check storage layer for maiwikimedia - https://phabricator.wikimedia.org/T168788#3430762 (Marostegui) a:Marostegui
[14:55:35] DBA, Patch-For-Review: Setup dbstore2002 with 2 new mysql instances from production and enable GTID - https://phabricator.wikimedia.org/T169510#3430770 (Marostegui) >>! In T169510#3430740, @jcrespo wrote: > So right now the separate instances are puppetized, the main multi-source one isn't. To handle it:...
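Note on the configparser exchange (14:07 to 14:52): a minimal sketch of the behaviour described there, assuming a made-up my.cnf fragment. The socket path and plugin file names are hypothetical and are not taken from the real /etc/my.cnf on db1047. It shows the default strict parser rejecting a repeated `plugin-load`, and the `strict=False, allow_no_value=True` parser accepting the file while keeping only the last value for the repeated key.

```python
import configparser

# Hypothetical my.cnf fragment, for illustration only: the socket path and
# the plugin names are assumptions, not the contents of the real file.
SAMPLE_MY_CNF = """\
[client]
socket = /run/mysqld/mysqld.sock

[mysqld]
skip-name-resolve
plugin-load = first_plugin.so
plugin-load = second_plugin.so
"""


def parse(text, strict):
    """Parse my.cnf-style text; allow_no_value accepts bare flags."""
    parser = configparser.ConfigParser(strict=strict, allow_no_value=True)
    parser.read_string(text)
    return parser


# Default behaviour since Python 3.2 (strict=True): a repeated option in the
# same section raises DuplicateOptionError.
try:
    parse(SAMPLE_MY_CNF, strict=True)
    strict_error = None
except configparser.DuplicateOptionError as exc:
    strict_error = exc

# strict=False parses the file, but a repeated option silently keeps only
# the last value, so the first plugin-load entry is lost.
lenient = parse(SAMPLE_MY_CNF, strict=False)
plugin_load = lenient.get("mysqld", "plugin-load")

# Options written without a value come back as None.
skip_name_resolve = lenient.get("mysqld", "skip-name-resolve")

print("strict error:", strict_error)
print("plugin-load:", plugin_load)
print("skip-name-resolve:", skip_name_resolve)
```

This is fine for reading client settings such as the socket path, but a script that needs every value of a repeated key (for example replication filters, as mentioned above) would probably need a different parser or a separate client config file.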
[14:56:33] DBA, Patch-For-Review: Setup dbstore2002 with 2 new mysql instances from production and enable GTID - https://phabricator.wikimedia.org/T169510#3430795 (Marostegui) ^ that is not for this ticket, sorry!
[15:09:55] DBA, Cloud-Services, User-Urbanecm: Prepare and check storage layer for maiwikimedia - https://phabricator.wikimedia.org/T168788#3430862 (Marostegui) I have sanitized sanitarium and sanitarium2 - but before creating the views I am running a check_private_data to make sure everything has been sanitized.
[15:10:08] DBA, Cloud-Services, User-Urbanecm: Prepare and check storage layer for dinwiki - https://phabricator.wikimedia.org/T169193#3430867 (Marostegui) I have sanitized sanitarium and sanitarium2 - but before creating the views I am running a check_private_data to make sure everything has been sanitized.
[15:56:36] DBA, Operations, Scoring-platform-team, cloud-services-team: Labsdb* servers need to be rebooted - https://phabricator.wikimedia.org/T168584#3431085 (madhuvishy)
[16:06:40] DBA, Operations, ops-eqiad: Degraded RAID on db1066 - https://phabricator.wikimedia.org/T170433#3431134 (Volans) p:Triage>Normal
[16:08:28] DBA, Operations, ops-eqiad: Degraded RAID on db1066 - https://phabricator.wikimedia.org/T170433#3431142 (Marostegui) Open>declined There was already a ticket for that host: T169448
[16:09:01] DBA, Operations, ops-eqiad: Degraded RAID on db1066 - https://phabricator.wikimedia.org/T169448#3431147 (jcrespo)
[16:09:04] DBA, Operations, ops-eqiad: Degraded RAID on db1066 - https://phabricator.wikimedia.org/T170433#3431149 (jcrespo)
[16:11:35] just talked to chris, we need to order more disk spares, he'll talk to rob
[16:11:49] I am going to logoff now :-)
[16:11:52] See you tomorrow!
[16:16:17] bye
[18:17:39] DBA, Cloud-Services, User-Urbanecm: Prepare and check storage layer for maiwikimedia - https://phabricator.wikimedia.org/T168788#3431922 (Dereckson) Thanks (indeed it wasn't a blocker, as the db is in ready state).
[18:18:16] DBA, Cloud-Services, User-Urbanecm: Prepare and check storage layer for dinwiki - https://phabricator.wikimedia.org/T169193#3390099 (Dereckson) This one is public.
[19:20:17] DBA, Analytics-Kanban, Patch-For-Review, User-Elukey: Create a user for the eventlogging_cleaner script on the analytics slaves - https://phabricator.wikimedia.org/T170118#3432244 (Nuria) Open>Resolved
[20:03:11] DBA, Operations, Patch-For-Review: eqiad rack/setup 11 new DB servers - https://phabricator.wikimedia.org/T162233#3432484 (Cmjohnson) @jcrespo please remind me how you would like the raid setup... Raid10?
[20:08:33] DBA, Operations, Patch-For-Review: eqiad rack/setup 11 new DB servers - https://phabricator.wikimedia.org/T162233#3432538 (Marostegui) Yep, RAID10 please: https://wikitech.wikimedia.org/wiki/Raid_and_MegaCli#Raid_setup_at_Wikimedia Thanks!
[20:08:56] DBA, Operations, ops-eqiad: Degraded RAID on db1066 - https://phabricator.wikimedia.org/T169448#3432540 (Cmjohnson) New disks need to be ordered. A task has been created and escalated to @faidon T170446
[20:09:48] DBA, Operations, ops-eqiad: Degraded RAID on db1066 - https://phabricator.wikimedia.org/T169448#3432547 (Marostegui) Thanks a lot @Cmjohnson
[20:25:50] DBA, Discovery, GeoData, Interactive-Sprint: Removal of {{#coordinates:}} leaves DB entries behind - https://phabricator.wikimedia.org/T143366#3432665 (mpopov) Tagging #dba here because the geo_tag table grows whenever someone adds coordinates but does not shrink when coordinates are removed on-w...