[05:07:30] Blocked-on-schema-change, DBA, Anti-Harassment, Patch-For-Review: Execute the schema change for Partial Blocks - https://phabricator.wikimedia.org/T204006 (Marostegui) I am going to do one more compatibility test on s6, just to be completely sure [] labsdb1011 [] labsdb1010 [] labsdb1009 [] dbst...
[05:40:41] Blocked-on-schema-change, DBA, Anti-Harassment, Patch-For-Review: Execute the schema change for Partial Blocks - https://phabricator.wikimedia.org/T204006 (Marostegui)
[06:58:36] Blocked-on-schema-change, MediaWiki-Change-tagging, MediaWiki-Database, Wikidata-Campsite, User-Ladsgroup: Schema change for adding indexes of ct_tag_id - https://phabricator.wikimedia.org/T203709 (Marostegui)
[06:58:47] Blocked-on-schema-change, DBA, Anti-Harassment, Patch-For-Review: Execute the schema change for Partial Blocks - https://phabricator.wikimedia.org/T204006 (Marostegui)
[07:18:42] Blocked-on-schema-change, DBA, Anti-Harassment, Patch-For-Review: Execute the schema change for Partial Blocks - https://phabricator.wikimedia.org/T204006 (Marostegui)
[07:23:38] DBA, Epic, User-Elukey: Meta ticket: Migrate multi-source database hosts to multi-instance - https://phabricator.wikimedia.org/T159423 (Marostegui) @elukey the only way to use some sort of password-less auth for MySQL would be to indeed use unix socket authentication - however, users would need to be...
[07:26:13] DBA, Epic, User-Elukey: Meta ticket: Migrate multi-source database hosts to multi-instance - https://phabricator.wikimedia.org/T159423 (elukey) This is a good question, I don't know a precise number, but I'd say between 10 and 20? (very quick guess) It shouldn't be a huge issue to ask users to have...
[07:28:22] DBA, Epic, User-Elukey: Meta ticket: Migrate multi-source database hosts to multi-instance - https://phabricator.wikimedia.org/T159423 (Marostegui) You might want to talk to @bd808 @chasemp or @Bstorm to get some more details about that script they use to handle those users.
[07:34:45] Blocked-on-schema-change, MediaWiki-Change-tagging, MediaWiki-Database, Wikidata-Campsite, User-Ladsgroup: Schema change for adding indexes of ct_tag_id - https://phabricator.wikimedia.org/T203709 (Marostegui) >>! In T203709#4605397, @Marostegui wrote: >>>! In T203709#4600563, @Marostegui wro...
[08:11:50] DBA, Epic, User-Elukey: Meta ticket: Migrate multi-source database hosts to multi-instance - https://phabricator.wikimedia.org/T159423 (jcrespo) If you mean "Can analytics prepare and setup a different authentication backend/system for the analytics MySQL Server? Will you help us do that?" the answer...
[09:45:31] DBA: BBU problems dbstore2002 & db2042 - https://phabricator.wikimedia.org/T205257 (Banyek)
[09:46:04] DBA: BBU problems dbstore2002 - https://phabricator.wikimedia.org/T205257 (Marostegui)
[09:48:38] DBA: BBU problems dbstore2002 - https://phabricator.wikimedia.org/T205257 (Marostegui) If the BBUs are compatible maybe we can: - Use db2064's BBU for dbstore2002 - The new x1 host that has been ordered (T199501#4603837) will replace db2033/or db2069 in x1. - Move either db2033/db2069 to become the new m3 m...
[10:01:22] jynus, banyek meeting?
[11:01:12] DBA, Epic, User-Elukey: Meta ticket: Migrate multi-source database hosts to multi-instance - https://phabricator.wikimedia.org/T159423 (elukey) >>! In T159423#4609785, @jcrespo wrote: > If you mean "Can analytics prepare and setup a different authentication backend/system for the analytics MySQL Serv...
[11:18:47] I did s1 and s2 master
[11:18:59] will take care of es hosts later
[11:30:33] DBA, Patch-For-Review: dbstore2002 s2 crashed - https://phabricator.wikimedia.org/T204593 (Banyek) The compression stopped and the replication was resumed. The compression will continue after tomorrow's backup is done.
[11:30:54] DBA, Patch-For-Review, User-Banyek: dbstore2002 s2 crashed - https://phabricator.wikimedia.org/T204593 (Banyek)
[12:14:12] dbstore2002 is catching up really slowly - imho it can be because of the slow, cache-less raid
[12:44:56] could be, although the other threads are not delayed, so it looks like it is not having a massive impact on replication
[12:51:06] DBA, Patch-For-Review, User-Banyek: dbstore2002 s2 crashed - https://phabricator.wikimedia.org/T204593 (Banyek) The disk performs pretty slowly, and the reason might be T205257. Until replication catches up I enabled the write cache with ``` hpssacli ctrl slot=0 modify dwc=enable ```
[13:23:01] Blocked-on-schema-change, MediaWiki-Change-tagging, MediaWiki-Database, Wikidata-Campsite, User-Ladsgroup: Schema change for adding indexes of ct_tag_id - https://phabricator.wikimedia.org/T203709 (Marostegui) Everything looks fine with db2088, db2062 and db2070 query patterns. I am going to...
[13:23:35] Blocked-on-schema-change, MediaWiki-Change-tagging, MediaWiki-Database, Wikidata-Campsite, User-Ladsgroup: Schema change for adding indexes of ct_tag_id - https://phabricator.wikimedia.org/T203709 (Marostegui) s5 progress: [] labsdb1011 [] labsdb1010 [] labsdb1009 [] dbstore2001 [] dbstore100...
[13:24:18] Blocked-on-schema-change, MediaWiki-Change-tagging, MediaWiki-Database, Wikidata-Campsite, User-Ladsgroup: Schema change for adding indexes of ct_tag_id - https://phabricator.wikimedia.org/T203709 (Marostegui)
[13:44:07] going soon for es2 restart
[13:44:37] +1
[13:44:42] es2?
[13:44:45] or s2?
[13:44:45] yes
[13:44:57] external store, cluster #2
[13:45:01] roger
[13:45:02] thanks!
[13:45:07] I already did s1 and s2
[13:45:14] ah great! :-)
[13:45:19] so all s* and x* hosts are restarted
[13:45:27] <3
[13:45:28] not sure ic pc* too
[13:45:31] *if
[13:45:35] I did pc1xxx
[13:45:38] All of them
[13:45:39] ok
[13:45:50] I may not do es1 ones
[13:45:54] as there is no rush for those
[13:46:18] only es2 and es3
[13:46:36] yeah, sounds like a plan!
[14:13:47] there are some light but constant errors on es1 hosts on codfw
[14:13:59] will try to debug that after es3 restart
[14:15:03] marostegui: there seems to be high QPS on codfw s8 multiinstance
[14:15:48] since 13:22
[14:25:06] DBA, User-Banyek: Maintenance M4 cluster - https://phabricator.wikimedia.org/T205288 (Banyek)
[14:28:14] I haven't touched s8 today
[14:31:20] will have a look, could be user-created
[14:31:50] I am taking a look
[14:33:44] https://grafana.wikimedia.org/dashboard/db/mysql-aggregated?orgId=1&var-dc=codfw%20prometheus%2Fops&var-group=core&var-shard=s8&var-role=All&from=1537778012832&to=1537799612832
[14:35:00] flapping up and down
[14:36:56] I am not seeing that reflected on the rc slaves
[14:37:19] the high qps or the lowering now of load
[14:37:24] ?
[14:37:50] DBA, User-Banyek: Maintenance M4 cluster - https://phabricator.wikimedia.org/T205288 (Banyek) **db1107**: The host acts as master in m4 cluster. The maintenance itself can be done as: Agree in a usable maintenance window Stop eventlog1002 to push data Disable eventlogging_cleaner on both hosts Make sur...
[14:37:52] the high qps
[14:38:29] Icinga, which checks punctual qps (queries in a single second period), shows 20018.52 QPS
[14:39:05] on db2083
[14:39:28] followed by 15K
[14:39:34] it is flapping up and down
[14:39:45] Ah, I was checking rc slaves (as you said multiinstance)
[14:40:03] DBA, User-Banyek: Maintenance M4 cluster - https://phabricator.wikimedia.org/T205288 (Banyek) **db1108** - Agree in a usable maintenance window - Disable `eventogging_sync.sh` on **db1108** - Disable `eventlogging_cleaner` - Do the maintenance itself as https://phabricator.wikimedia.org/P7496 - Enable `...
[14:40:34] you can see here it is not a normal state: https://grafana.wikimedia.org/dashboard/db/mysql-aggregated?orgId=1&var-dc=codfw%20prometheus%2Fops&var-group=core&var-shard=s8&var-role=All&from=1537660800000&to=1538265599999
[14:40:40] maybe it is not rc?
[14:40:46] I am not sure
[14:40:56] That is what I was seeing, that on the rc slaves I was not able to correlate that graph
[14:41:03] So I was like: where is it!
[14:41:28] now down to 10K
[14:41:41] it doubles QPS and triples traffic when happening
[14:42:04] DBA, User-Banyek: Maintenance M4 cluster - https://phabricator.wikimedia.org/T205288 (Banyek) Notes about the performable operations can be found here: https://docs.google.com/document/d/1JVwK2TvaLJ7cs5-qnx_536LYM_3G-aAum66FtoFjhFE/
[14:50:56] is it worth for me to keep an eye on it, I just finished my reboots
[14:51:00] ?
[14:56:29] DBA, Patch-For-Review, User-Banyek: dbstore2002 s2 crashed - https://phabricator.wikimedia.org/T204593 (jcrespo) p:Normal>Low This certainly now has a lower priority as most of the compression needed to not run out of space was solved and the instance was rebuilt. You can also close it as re...
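The "punctual qps" figure mentioned at 14:38:29 (queries in a single one-second period) can be illustrated with a minimal sketch. It assumes the rate is derived from two samples of a cumulative query counter, which is a common approach but not necessarily how the Icinga check in this log works. The function name and the counter values are hypothetical.

```python
def punctual_qps(count_start: int, count_end: int, interval_s: float = 1.0) -> float:
    """Queries per second between two samples of a cumulative query counter."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    if count_end < count_start:
        # A cumulative counter normally only decreases after a server restart.
        raise ValueError("counter decreased between samples")
    return (count_end - count_start) / interval_s


# Made-up counter samples, one second apart, chosen only to mirror the
# rough magnitudes quoted in the conversation (about 20K, 15K, 10K).
samples = [(5_000_000, 5_020_000), (5_020_000, 5_035_000), (5_035_000, 5_045_000)]
rates = [punctual_qps(start, end) for start, end in samples]
print(rates)
```

A one-second window makes the value jumpy by design, which fits the "flapping up and down" readings; a longer `interval_s` would smooth it out.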
[15:01:41] Sorry jynus I was in a meeting
[15:01:48] I'm leaving for today, see you at the SRE meeting
[15:02:10] DBA, Operations, Research, Services (designing): Storage of data for recommendation API - https://phabricator.wikimedia.org/T203039 (bmansurov) @jcrespo anything else blocking us from importing data to the database? Any documentation on connecting to the database from the services? @Pchelolo whe...
[15:05:59] DBA, Patch-For-Review, User-Banyek: dbstore2002 s2 crashed - https://phabricator.wikimedia.org/T204593 (Marostegui) >>! In T204593#4610981, @jcrespo wrote: > This certainly now has a lower priority as most of the compression needed to not run out of space was solved and the instance was rebuilt. You...
[15:09:53] things seem mostly normal now, so not taking any action
[15:13:22] DBA, Operations, Research, Services (designing): Storage of data for recommendation API - https://phabricator.wikimedia.org/T203039 (jcrespo) > anything else blocking us from importing data to the database? There is no formal request yet, you need to create a ticket to #DBAs to ask to create a...
[15:31:40] DBA, Research: Request to create database and account for recommendation API - https://phabricator.wikimedia.org/T205294 (bmansurov)
[15:32:35] DBA, Research: Request to create database and account for recommendation API - https://phabricator.wikimedia.org/T205294 (bmansurov)
[15:37:46] DBA, Research: Request to create database and account for recommendation API - https://phabricator.wikimedia.org/T205294 (jcrespo) Are you sure you want to call your database and user with '-' signs? It is completely allowed, but you may have to annoyingly escape it in certain contexts: ``` mysql> SELEC...
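The escaping concern in the last message can be sketched as follows. This is only an illustration, not the truncated example from the ticket: it assumes standard MySQL/MariaDB identifier quoting (backticks, with embedded backticks doubled), and both the helper and the database name `example-db` are hypothetical.

```python
def quote_identifier(name: str) -> str:
    """Backtick-quote a MySQL/MariaDB identifier, doubling embedded backticks."""
    if not name:
        raise ValueError("identifier must not be empty")
    return "`" + name.replace("`", "``") + "`"


# Hypothetical database name containing a '-' sign, for illustration only.
db_name = "example-db"

# Without the backticks the parser would see a '-' operator in the middle
# of the name, so the identifier has to be quoted wherever it appears.
statement = f"CREATE DATABASE {quote_identifier(db_name)};"
print(statement)
```

A name without '-' (for example with an underscore instead) avoids the quoting altogether, which is the point being made in the ticket.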
[15:50:13] Blocked-on-schema-change, DBA, Anti-Harassment, Patch-For-Review: Execute the schema change for Partial Blocks - https://phabricator.wikimedia.org/T204006 (Marostegui)
[15:52:27] DBA, Research: Request to create database and account for recommendation API - https://phabricator.wikimedia.org/T205294 (bmansurov)
[15:52:55] DBA, Research: Request to create database and account for recommendation API - https://phabricator.wikimedia.org/T205294 (bmansurov) @jcrespo, good call. I've updated the task description.
[15:53:44] ^ jynus for that I will just back it up every week as we do now, it is tiny, and I don't think we have the ability now to say: this DB only once a month
[15:54:47] well, I mostly asked because it is apparently imported
[15:54:54] so maybe backups are not needed at all
[15:55:03] or are backed up in a different way
[15:59:14] sure, but as they said 1 a month…
[15:59:18] we could as well do it once a week
[16:01:38] ok, I didn't see the answer yet
[16:09:05] DBA, Research: Request to create database and account for recommendation API - https://phabricator.wikimedia.org/T205294 (bmansurov)
[16:23:17] DBA, Operations, Research, Services (designing): Storage of data for recommendation API - https://phabricator.wikimedia.org/T203039 (Pchelolo) > @Pchelolo where would database settings live? Would it be the service codebase itself or do we have a separate repository for that? Usually the source...
[16:36:44] DBA, Patch-For-Review, User-Banyek: dbstore2002 s2 crashed - https://phabricator.wikimedia.org/T204593 (Banyek) sync_binlog was already disabled, and I didn't find innodb_flatc
[16:37:53] banyek|away: I meant innodb_flush_log_at_trx_commit, sorry
[16:38:13] that makes sense :)
[16:40:27] but that one is disabled as well
[16:41:24] banyek|away jynus any reason why notifications are disabled on dbstore2002 for s1,s3,s4 and x1?
[16:41:31] s2 I get why, but the others?
[16:43:46] lag is disabled because it is annoying while backups are running
[16:43:55] but only lag should be disabled
[16:43:58] yep
[16:44:01] not other services
[16:44:01] only that is
[16:44:04] good! thanks :)
[16:44:11] it is annoying, we can add a comment
[16:44:17] or puppetize it
[16:44:35] I will add a comment for now
[16:45:09] or can we add it to the backup script itself? (so set & mute when it starts, and enable when it finishes)
[16:48:51] banyek|away: patches welcome! :-D
[16:49:04] yay
[16:51:30] DBA, Continuous-Integration-Infrastructure, MediaWiki-Database, Quibble, Release-Engineering-Team (Kanban): Enable MariaDB/MySQL strict mode on CI db hosts - https://phabricator.wikimedia.org/T119371 (hashar) a:hashar>None For now CI has `sql_mode = 'TRADITIONAL'` unfortunately I lack...
[16:52:19] DBA, Patch-For-Review, User-Banyek: dbstore2002 s2 crashed - https://phabricator.wikimedia.org/T204593 (Marostegui) >>! In T204593#4610981, @jcrespo wrote: > You can also close it as resolved and create a lower priority ticket to review missing compression on dbstore200X tables. > +1 to close this...
[21:49:58] DBA, Wikimedia-Extension-setup, foundation.wikimedia.org: Enable Translate extension on Governance wiki - https://phabricator.wikimedia.org/T205349 (MarcoAurelio) This is pending #dba review. Per the #datacenter-switchover-2018 guidelines we were given via wikitech-l, we're advised //not// to deploy...
[21:57:50] marostegui: ^^ -- don't shoot me
[21:58:00] although I fear Jaime more
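The idea floated at 16:45:09 (mute lag notifications when the backup starts and re-enable them when it finishes) could look roughly like the sketch below. It is not the actual backup script: `lag_alerts_muted` and the `disable`/`enable` callables are hypothetical stand-ins for whatever mechanism toggles the monitoring notifications, and only the lag check is assumed to be affected.

```python
from contextlib import contextmanager


@contextmanager
def lag_alerts_muted(host, disable, enable):
    """Mute lag notifications for `host` while the wrapped block runs.

    `disable` and `enable` are hypothetical callables taking the host name.
    The `finally` clause re-enables notifications even if the backup fails,
    so a crashed backup cannot leave the alerts muted.
    """
    disable(host)
    try:
        yield
    finally:
        enable(host)


# Demo with stand-in callables that only record what happened.
events = []
with lag_alerts_muted(
    "dbstore2002",
    disable=lambda h: events.append(("mute", h)),
    enable=lambda h: events.append(("unmute", h)),
):
    events.append(("backup", "dbstore2002"))  # placeholder for the real backup

print(events)
```

Tying the mute to the backup's lifetime would also make a manual comment on the disabled check unnecessary, since the alerts would never stay off longer than the backup itself.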