[07:48:07] backup check finally went to warning (only 3 hosts without backups)
[07:48:18] and hopefully next to green
[07:51:07] dbprovs are part of the bottleneck (but not the only part); everybody will benefit from being on a separate pool
[09:43:27] DBA: dbprov2002 slower to generate snapshots - https://phabricator.wikimedia.org/T236924 (jcrespo) Open→Resolved dbprov2* will always be slower due to the daily backups happening there. The extreme slowdown, however, seems fixed. Backups are up to date.
[09:43:29] DBA, Epic: Improve regular production database backups handling - https://phabricator.wikimedia.org/T138562 (jcrespo)
[09:49:42] DBA, Operations, Puppet, User-jbond: Document all uses of the puppetCA certificate - https://phabricator.wikimedia.org/T237259 (Joe) As far as etcd is concerned, a rolling restart should be enough to ensure the new CA is picked up. I will take care of that.
[09:59:39] DBA, Operations, Puppet, User-jbond: Document all uses of the puppetCA certificate - https://phabricator.wikimedia.org/T237259 (jbond) >>! In T237259#5634722, @Joe wrote: > As far as etcd is concerned, a rolling restart should be enough to ensure the new CA is picked up. I will take care of that...
[10:00:04] DBA, Operations, Puppet, User-jbond: Document all uses of the puppetCA certificate - https://phabricator.wikimedia.org/T237259 (jbond)
[10:39:20] DBA, Operations, serviceops, Goal, Patch-For-Review: Switchover backup director service from helium to backup1001 - https://phabricator.wikimedia.org/T236406 (jcrespo)
[10:39:49] DBA, Operations, serviceops, Patch-For-Review: Backups on buster hosts fail to run - https://phabricator.wikimedia.org/T235838 (jcrespo) Open→Resolved
[10:39:54] DBA, Operations, serviceops, Goal, Patch-For-Review: Strengthen backup infrastructure and support - https://phabricator.wikimedia.org/T229209 (jcrespo)
[11:21:35] DBA, Data-Services, cloud-services-team (Kanban): Prepare and check storage layer for szywiki - https://phabricator.wikimedia.org/T237373 (jhsoby)
[14:39:15] is it interesting if I explain on this channel what "reducing consistency temporarily on db1114" means?
[14:39:34] I just did "set global sync_binlog = 0;"
[14:39:45] and "set global innodb_flush_log_at_trx_commit=0;"
[14:40:52] that meant iops took a dive:
[14:40:55] https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-dc=eqiad%20prometheus%2Fops&var-server=db1114&var-port=9104&from=1572954039954&to=1572964839954&panelId=34&fullscreen
[14:41:10] but if db1114 crashes, I will have to start from scratch
[14:41:37] a nice trick in a lag emergency
[14:43:00] I could increase throughput further by using parallel replication, but I do not trust that in a mixed MariaDB/MySQL environment (or it wouldn't work)
[15:16:48] Blocked-on-schema-change, DBA, Product-Infrastructure-Team-Backlog, Wikipedia-Android-App-Backlog (Android-app-release-v2.7.29x-N-Nanaimo-Bar): Schema change for T234955 - add column wetc_revert_count to wikimedia_editor_tasks_counts - https://phabricator.wikimedia.org/T237264 (Charlotte) p:Tr...
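The 14:39 trick on db1114 can be sketched as a small Python helper that relaxes the two durability settings on a lagging replica and always puts them back afterwards. This is a hypothetical sketch, not an existing WMF tool: the helper names (`relaxed_durability`, `set_global_statements`), the `execute` callable (for example a DB-API cursor's `execute` method) and the restore values of 1 for both variables are assumptions, so check what the host's configuration actually sets before restoring. With `innodb_flush_log_at_trx_commit = 0` InnoDB writes and flushes its redo log roughly once per second instead of at every commit, and with `sync_binlog = 0` flushing the binary log is left to the OS; that is where the IOPS saving comes from, and also why a crash while relaxed can lose transactions and leave the replica untrustworthy.

```python
from contextlib import contextmanager
from typing import Callable, Dict, Iterator, List

# Settings used on db1114 in the log: fewer fsyncs, no crash safety.
RELAXED: Dict[str, int] = {"sync_binlog": 0, "innodb_flush_log_at_trx_commit": 0}

# ASSUMPTION: the replica normally runs fully durable (both set to 1).
# Check the host's configuration before relying on these values.
DURABLE: Dict[str, int] = {"sync_binlog": 1, "innodb_flush_log_at_trx_commit": 1}


def set_global_statements(settings: Dict[str, int]) -> List[str]:
    """Render one SET GLOBAL statement per {variable: value} entry."""
    return [f"SET GLOBAL {name} = {value};" for name, value in settings.items()]


@contextmanager
def relaxed_durability(
    execute: Callable[[str], object],
    restore: Dict[str, int] = DURABLE,
) -> Iterator[None]:
    """Relax durability while the block runs, then restore it.

    `execute` runs a single SQL statement (hypothetically, a DB-API
    cursor's `execute` method). The restore step runs even if the
    block raises, so the host is not left crash-unsafe by accident.
    """
    for stmt in set_global_statements(RELAXED):
        execute(stmt)
    try:
        yield
    finally:
        for stmt in set_global_statements(restore):
            execute(stmt)
```

Typical use would be `with relaxed_durability(cursor.execute):` around whatever waits for replication lag to recover. Note that `SET GLOBAL` changes are not written to the configuration file, so a restart also brings back the configured values; the `finally` block only guards against forgetting to restore them on a running host.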
[15:21:29] Blocked-on-schema-change, DBA, Product-Infrastructure-Team-Backlog, Wikipedia-Android-App-Backlog (Android-app-release-v2.7.29x-N-Nanaimo-Bar): Schema change for T234955 - add column wetc_revert_count to wikimedia_editor_tasks_counts - https://phabricator.wikimedia.org/T237264 (jcrespo) Unless I...
[15:22:29] Blocked-on-schema-change, DBA, Product-Infrastructure-Team-Backlog, Wikipedia-Android-App-Backlog (Android-app-release-v2.7.29x-N-Nanaimo-Bar): Schema change for T234955 - add column wetc_revert_count to wikimedia_editor_tasks_counts - https://phabricator.wikimedia.org/T237264 (Mholloway)
[15:23:01] Blocked-on-schema-change, DBA, Product-Infrastructure-Team-Backlog, Wikipedia-Android-App-Backlog (Android-app-release-v2.7.29x-N-Nanaimo-Bar): Schema change for T234955 - add column wetc_revert_count to wikimedia_editor_tasks_counts - https://phabricator.wikimedia.org/T237264 (Mholloway) Thanks...
[15:24:02] Blocked-on-schema-change, DBA, Product-Infrastructure-Team-Backlog, Wikipedia-Android-App-Backlog (Android-app-release-v2.7.29x-N-Nanaimo-Bar): Schema change for T234955 - add column wetc_revert_count to wikimedia_editor_tasks_counts - https://phabricator.wikimedia.org/T237264 (jcrespo) @Mhollowa...
[15:25:50] Blocked-on-schema-change, DBA, Product-Infrastructure-Team-Backlog, Wikipedia-Android-App-Backlog (Android-app-release-v2.7.29x-N-Nanaimo-Bar): Schema change for T234955 - add column wetc_revert_count to wikimedia_editor_tasks_counts - https://phabricator.wikimedia.org/T237264 (jcrespo) a:jcre...
[15:47:38] Blocked-on-schema-change, DBA, Product-Infrastructure-Team-Backlog, Wikipedia-Android-App-Backlog (Android-app-release-v2.7.29x-N-Nanaimo-Bar): Schema change for T234955 - add column wetc_revert_count to wikimedia_editor_tasks_counts - https://phabricator.wikimedia.org/T237264 (jcrespo) ` root@cu...
[15:53:22] DBA, Data-Services, cloud-services-team (Kanban): Prepare and check storage layer for szywiki - https://phabricator.wikimedia.org/T237373 (jcrespo) Thanks for the heads up, as a public wiki it has no blockers on us, we will just require a ping as soon as the database is deployed into production for...
[15:54:04] DBA, CPT Initiatives (Core REST API in PHP), Core Platform Team Workboards (Green): Compose query for minor edit count - https://phabricator.wikimedia.org/T235572 (eprodromou) Open→Resolved @jcrespo so, that sounds about right. Since we've implemented the "don't run for long edit histories" c...
[16:00:14] DBA, CPT Initiatives (Core REST API in PHP), Core Platform Team Workboards (Green): Compose query for minor edit count - https://phabricator.wikimedia.org/T235572 (eprodromou) I started T237430 to do the follow-on work per Tim. We also have T237043 for keeping the counts in a separate table, updated...
[17:28:50] Blocked-on-schema-change, DBA, Product-Infrastructure-Team-Backlog, Wikipedia-Android-App-Backlog (Android-app-release-v2.7.29x-N-Nanaimo-Bar): Schema change for T234955 - add column wetc_revert_count to wikimedia_editor_tasks_counts - https://phabricator.wikimedia.org/T237264 (Mholloway) LGTM!...
[17:29:03] Blocked-on-schema-change, DBA, Product-Infrastructure-Team-Backlog, Wikipedia-Android-App-Backlog (Android-app-release-v2.7.29x-N-Nanaimo-Bar): Schema change for T234955 - add column wetc_revert_count to wikimedia_editor_tasks_counts - https://phabricator.wikimedia.org/T237264 (Mholloway) Open...
[21:00:51] DBA, Goal: Implement database binary backups into the production infrastructure - https://phabricator.wikimedia.org/T206203 (Papaul)
[22:02:37] DBA, Operations: Decommission db1061-db1073 - https://phabricator.wikimedia.org/T217396 (Papaul)
[22:03:38] DBA, Operations: Decommission db1061-db1073 - https://phabricator.wikimedia.org/T217396 (Papaul)
[22:06:06] DBA, Operations: Decommission db1061-db1073 - https://phabricator.wikimedia.org/T217396 (Papaul)
[22:07:11] DBA, Operations: Decommission db1061-db1073 - https://phabricator.wikimedia.org/T217396 (Papaul)
[22:09:09] DBA, Operations: Decommission db1061-db1073 - https://phabricator.wikimedia.org/T217396 (Papaul)