[03:34:13] DBA, DiscussionTools, Performance-Team, Editing-team (FY2020-21 Kanban Board), and 2 others: Reduce parser cache retention temporarily for DiscussionTools - https://phabricator.wikimedia.org/T280605 (Krinkle)
[03:35:10] DBA, DiscussionTools, Performance-Team, Editing-team (FY2020-21 Kanban Board), and 2 others: Reduce parser cache retention temporarily for DiscussionTools - https://phabricator.wikimedia.org/T280605 (Krinkle) > * Perform the reclaim sequence on the parser cache database servers. The DBAs have be...
[05:11:40] DBA, Data-Services, decommission-hardware, cloud-services-team (Kanban): decommission labsdb1011.eqiad.wmnet - https://phabricator.wikimedia.org/T282524 (Marostegui) Stalled→Open
[05:23:08] DBA, Data-Services, decommission-hardware, Patch-For-Review, cloud-services-team (Kanban): decommission labsdb1011.eqiad.wmnet - https://phabricator.wikimedia.org/T282524 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by marostegui@cumin1001 for hosts: `labsdb1011.eqiad.wmnet`...
[05:23:50] DBA, decommission-hardware, ops-eqiad: decommission labsdb1011.eqiad.wmnet - https://phabricator.wikimedia.org/T282524 (Marostegui) This is ready for #dc-ops
[05:23:57] DBA, decommission-hardware, ops-eqiad: decommission labsdb1011.eqiad.wmnet - https://phabricator.wikimedia.org/T282524 (Marostegui) a:Marostegui→wiki_willy
[08:43:12] DBA, MediaWiki-Parser, Parsoid, Performance-Team: purgeParserCache.php should not take over 24 hours for its daily run - https://phabricator.wikimedia.org/T282761 (Kormat) Optimize run started against pc1010.
[09:52:32] DBA: Deploy wmfmariadbpy 0.7 - https://phabricator.wikimedia.org/T283228 (Kormat)
[09:56:25] DBA: Deploy wmfmariadbpy 0.7 - https://phabricator.wikimedia.org/T283228 (Kormat)
[10:00:12] DBA: Deploy wmfmariadbpy 0.7 - https://phabricator.wikimedia.org/T283228 (Kormat)
[10:01:25] DBA: Deploy wmfmariadbpy 0.7 - https://phabricator.wikimedia.org/T283228 (Kormat)
[10:02:53] DBA: Deploy wmfmariadbpy 0.7 - https://phabricator.wikimedia.org/T283228 (Kormat)
[10:13:12] Blocked-on-schema-change, DBA: Schema change to turn user_last_timestamp.user_newtalk to binary(14) - https://phabricator.wikimedia.org/T266486 (Marostegui)
[10:13:20] Blocked-on-schema-change, DBA: Schema change for watchlist.wl_notificationtimestamp going binary(14) from varbinary(14) - https://phabricator.wikimedia.org/T268392 (Marostegui)
[10:13:30] Blocked-on-schema-change, DBA: Schema change for dropping default of img_timestamp and making it binary(14) - https://phabricator.wikimedia.org/T273360 (Marostegui)
[10:14:56] DBA: Deploy wmfmariadbpy 0.7 - https://phabricator.wikimedia.org/T283228 (Kormat)
[10:16:21] DBA: Deploy wmfmariadbpy 0.7 - https://phabricator.wikimedia.org/T283228 (Marostegui)
[10:19:16] DBA: Deploy wmfmariadbpy 0.7 - https://phabricator.wikimedia.org/T283228 (Kormat)
[10:26:11] DBA: Deploy wmfmariadbpy 0.7 - https://phabricator.wikimedia.org/T283228 (Kormat)
[10:26:28] Blocked-on-schema-change, DBA: Schema change to turn user_last_timestamp.user_newtalk to binary(14) - https://phabricator.wikimedia.org/T266486 (Marostegui) s3 is done, only the master pending, to be done once the switchover is completed (T283131)
[10:26:42] Blocked-on-schema-change, DBA: Schema change for watchlist.wl_notificationtimestamp going binary(14) from varbinary(14) - https://phabricator.wikimedia.org/T268392 (Marostegui) s3 is done, only the master pending, to be done once the switchover is completed (T283131)
[10:26:44] Blocked-on-schema-change, DBA: Schema change for dropping default of img_timestamp and making it binary(14) - https://phabricator.wikimedia.org/T273360 (Marostegui) s3 is done, only the master pending, to be done once the switchover is completed (T283131)
[11:11:44] DBA: Upgrade s5 to Debian Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T283235 (Marostegui)
[11:12:39] DBA: Upgrade s5 to Debian Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T283235 (Marostegui) p:Triage→Medium a:Marostegui This should be stalled for now and only to be done once we are happy with s6's performance/stability in around 3-4 weeks or so.
[11:12:47] DBA: Upgrade s5 to Debian Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T283235 (Marostegui) Open→Stalled
[11:18:36] DBA: Upgrade s5 to Debian Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T283235 (jcrespo)
[11:19:01] DBA: Upgrade s5 to Debian Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T283235 (Marostegui)
[11:23:17] DBA, Patch-For-Review: Upgrade s5 to Debian Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T283235 (jcrespo)
[12:32:26] DBA: Deploy wmfmariadbpy 0.7 - https://phabricator.wikimedia.org/T283228 (Kormat)
[12:37:00] DBA: Deploy wmfmariadbpy 0.7 - https://phabricator.wikimedia.org/T283228 (Kormat)
[12:39:47] DBA: Deploy wmfmariadbpy 0.7 - https://phabricator.wikimedia.org/T283228 (Kormat)
[12:44:07] DBA: db-replication-tree doesn't support circular replication - https://phabricator.wikimedia.org/T283239 (Kormat)
[12:44:16] DBA: Deploy wmfmariadbpy 0.7 - https://phabricator.wikimedia.org/T283228 (Kormat) p:Triage→Medium
[12:44:27] DBA: db-replication-tree doesn't support circular replication - https://phabricator.wikimedia.org/T283239 (Kormat) p:Triage→Medium
[12:49:26] DBA: Deploy wmfmariadbpy 0.7 - https://phabricator.wikimedia.org/T283228 (Kormat)
[12:58:32] DBA: Deploy wmfmariadbpy 0.7 - https://phabricator.wikimedia.org/T283228 (Kormat) pt-heartbeat-wikimedia fails to start on db2093 with: ` DBD::mysql::st execute failed: Cannot execute statement: impossible to write to binary log since BINLOG_FORMAT = STATEMENT and at least one table uses a storage engine lim...
[13:00:04] DBA: Deploy wmfmariadbpy 0.7 - https://phabricator.wikimedia.org/T283228 (Kormat)
[13:01:24] DBA: Deploy wmfmariadbpy 0.7 - https://phabricator.wikimedia.org/T283228 (Kormat)
[13:02:07] DBA: Deploy wmfmariadbpy 0.7 - https://phabricator.wikimedia.org/T283228 (Kormat)
[13:26:34] Data-Persistence-Backup, SRE, SRE-swift-storage, Epic, Goal: WMF media storage must be adequately backed up - https://phabricator.wikimedia.org/T262668 (LSobanski)
[13:35:35] DBA: Investigate intermittent replica lag alarms - https://phabricator.wikimedia.org/T274513 (LSobanski) Open→Declined Having seen this in a while and it's not super clear what the original problem was. Resolving.
[13:38:45] Blocked-on-schema-change, DBA: Schema change to turn user_last_timestamp.user_newtalk to binary(14) - https://phabricator.wikimedia.org/T266486 (Marostegui)
[13:38:50] Blocked-on-schema-change, DBA: Schema change for watchlist.wl_notificationtimestamp going binary(14) from varbinary(14) - https://phabricator.wikimedia.org/T268392 (Marostegui)
[13:38:57] Blocked-on-schema-change, DBA: Schema change for dropping default of img_timestamp and making it binary(14) - https://phabricator.wikimedia.org/T273360 (Marostegui)
[13:48:53] DBA: Deploy wmfmariadbpy 0.7 - https://phabricator.wikimedia.org/T283228 (Kormat) Heartbeat restarted on all primaries.
[13:51:05] DBA: Deploy wmfmariadbpy 0.7 - https://phabricator.wikimedia.org/T283228 (Marostegui) Nice work!!
[13:51:43] Blocked-on-schema-change, DBA: Schema change to turn user_last_timestamp.user_newtalk to binary(14) - https://phabricator.wikimedia.org/T266486 (Marostegui) s1 eqiad [x] dbstore1003 [] db1184 [] db1169 [] db1164 [] db1163 [] db1154 [x] db1140 [x] db1139 [] db1135 [] db1134 [] db1133 [] db1119 [] db1118...
[13:51:50] Blocked-on-schema-change, DBA: Schema change for watchlist.wl_notificationtimestamp going binary(14) from varbinary(14) - https://phabricator.wikimedia.org/T268392 (Marostegui) s1 eqiad [x] dbstore1003 [] db1184 [] db1169 [] db1164 [] db1163 [] db1154 [x] db1140 [x] db1139 [] db1135 [] db1134 [] db1133...
[13:51:52] Blocked-on-schema-change, DBA: Schema change for dropping default of img_timestamp and making it binary(14) - https://phabricator.wikimedia.org/T273360 (Marostegui) s1 eqiad [x] dbstore1003 [] db1184 [] db1169 [] db1164 [] db1163 [] db1154 [x] db1140 [x] db1139 [] db1135 [] db1134 [] db1133 [] db1119 []...
[14:09:46] DBA: Deploy wmfmariadbpy 0.7 - https://phabricator.wikimedia.org/T283228 (Kormat) >>! In T283228#7100571, @Kormat wrote: > pt-heartbeat-wikimedia fails to start on db2093 with: https://gerrit.wikimedia.org/r/c/operations/puppet/+/693162 merged to fix this for both dbinventory hosts.
[14:31:54] DBA: Deploy wmfmariadbpy 0.7 - https://phabricator.wikimedia.org/T283228 (Kormat)
[14:32:24] DBA: Deploy wmfmariadbpy 0.7 - https://phabricator.wikimedia.org/T283228 (Kormat) Open→Resolved Deployment complete: ` kormat@cumin1001:~(0:0)$ sudo debdeploy deploy -u 2021-05-20-wmfmariadbpy.yaml -Q C:wmfmariadbpy Rolling out wmfmariadbpy: Non-daemon update, no service restart needed The update sp...
[15:08:55] DBA, SRE: wmf-auto-reinstall fails on hosts that run pt-heartbeat - https://phabricator.wikimedia.org/T252528 (Kormat) Open→Resolved a:Kormat This is now fixed. Puppet will no longer start/stop heartbeat. That is managed by `db-switchover` when changing masters. This does mean that `pt-heartb...
[15:24:32] Data-Persistence-Backup, SRE, netops: Understand (and mitigate) the backup speed differences between backup1002->backup2002 and backup2002->backup1002 - https://phabricator.wikimedia.org/T274234 (faidon) p:Triage→High Given a) this was linked during budgeting in the context of of our cross-DC...
[15:33:28] Data-Persistence-Backup, SRE, netops: Understand (and mitigate) the backup speed differences between backup1002->backup2002 and backup2002->backup1002 - https://phabricator.wikimedia.org/T274234 (jcrespo) One of the things I raised to my manager is that this limitation means that, in the event of a c...
[17:04:08] DBA, Data-Persistence-Backup, Patch-For-Review: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (Cmjohnson)
[17:04:10] DBA, SRE, Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (Cmjohnson)
[17:04:32] DBA, Data-Persistence-Backup, Patch-For-Review: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (Cmjohnson)
[17:04:34] DBA, SRE, Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (Cmjohnson)
[17:06:39] DBA, Data-Persistence-Backup, Patch-For-Review: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (Cmjohnson)
[17:06:44] DBA, SRE, Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (Cmjohnson)
[17:07:48] DBA: Switchover s1 from db1083 to db1163 - https://phabricator.wikimedia.org/T278214 (Cmjohnson)
[17:07:53] DBA, SRE, Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (Cmjohnson)
[17:08:06] Data-Persistence-Backup, Analytics-Clusters: Define priorities for HDFS data to be backed up - https://phabricator.wikimedia.org/T283261 (JAllemandou)
[17:12:12] Data-Persistence-Backup, Analytics-Clusters: Define priorities for HDFS data to be backed up - https://phabricator.wikimedia.org/T283261 (LSobanski) p:Triage→Medium My understanding based on recent conversations was that a decision is yet to be made about what approach to take with HDFS backup /...
[17:14:20] Data-Persistence-Backup, Analytics-Clusters: Define priorities for HDFS data to be backed up - https://phabricator.wikimedia.org/T283261 (JAllemandou) @LSobanski You're absolutely right, this task is about documenting on our end the priorities and sizes of datasets to be backed up so that we can be bette...
[17:18:59] Data-Persistence-Backup, Analytics-Clusters: Define priorities for HDFS data to be backed up - https://phabricator.wikimedia.org/T283261 (elukey) @JAllemandou I'd remove Data Persistence from this task, with the team's choices we'll not get any storage for the next fiscal year (so this task may be confus...
[17:20:03] Data-Persistence-Backup, Analytics-Clusters: Define priorities for HDFS data to be backed up - https://phabricator.wikimedia.org/T283261 (JAllemandou) ACk - doing so - thanks @elukey
[17:21:48] DBA, Analytics-Clusters, Analytics-Kanban, Patch-For-Review: dbstore1004 85% disk space used. - https://phabricator.wikimedia.org/T283125 (odimitrijevic)
[17:25:50] Data-Persistence-Backup, Analytics-Clusters: Evaluate possible solutions to backup Analytics Hadoop's HDFS data - https://phabricator.wikimedia.org/T277015 (LSobanski) p:Triage→Low Blocked at least until we get a clear picture from T283261.
[18:45:18] DBA, Analytics-Clusters, Analytics-Kanban: dbstore1004 85% disk space used. - https://phabricator.wikimedia.org/T283125 (ops-monitoring-bot) Script wmf-auto-reimage was launched by razzi on cumin1001.eqiad.wmnet for hosts: ` dbstore1006.eqiad.wmnet ` The log can be found in `/var/log/wmf-auto-reimage...
[19:39:00] leaving this channel, off to libera
[19:45:30] DBA, Analytics-Clusters, Analytics-Kanban: dbstore1004 85% disk space used. - https://phabricator.wikimedia.org/T283125 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['dbstore1006.eqiad.wmnet'] ` Of which those **FAILED**: ` ['dbstore1006.eqiad.wmnet'] `
[19:45:53] DBA, Analytics-Clusters, Analytics-Kanban: dbstore1004 85% disk space used. - https://phabricator.wikimedia.org/T283125 (ops-monitoring-bot) Script wmf-auto-reimage was launched by razzi on cumin1001.eqiad.wmnet for hosts: ` dbstore1006.eqiad.wmnet ` The log can be found in `/var/log/wmf-auto-reimage...
[20:51:44] DBA, Analytics-Clusters, Analytics-Kanban: dbstore1004 85% disk space used. - https://phabricator.wikimedia.org/T283125 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['dbstore1006.eqiad.wmnet'] ` Of which those **FAILED**: ` ['dbstore1006.eqiad.wmnet'] `
[22:19:11] i guess this channel is no longer in use, cya on libera!