[00:00:21] DBA, DC-Ops, SRE, ops-codfw, Patch-For-Review: (Need By: TBD) rack/setup/install db21[45-52] - https://phabricator.wikimedia.org/T273568 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['db2145.codfw.wmnet'] ` Of which those **FAILED**: ` ['db2145.codfw.wmnet'] `
[00:05:33] DBA, DC-Ops, SRE, ops-codfw, Patch-For-Review: (Need By: TBD) rack/setup/install db21[45-52] - https://phabricator.wikimedia.org/T273568 (Papaul) Why db2145 failed below ` Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Evaluation Error: Error while ev...
[01:52:22] DBA, DC-Ops, SRE, ops-codfw, Patch-For-Review: (Need By: TBD) rack/setup/install db21[45-52] - https://phabricator.wikimedia.org/T273568 (ops-monitoring-bot) Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts: ` db2145.codfw.wmnet ` The log can be found in `/var...
[02:16:19] DBA, DC-Ops, SRE, ops-codfw: (Need By: TBD) rack/setup/install db21[45-52] - https://phabricator.wikimedia.org/T273568 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['db2145.codfw.wmnet'] ` and were **ALL** successful.
[02:17:18] DBA, DC-Ops, SRE, ops-codfw: (Need By: TBD) rack/setup/install db21[45-52] - https://phabricator.wikimedia.org/T273568 (Papaul)
[02:18:16] DBA, DC-Ops, SRE, ops-codfw: (Need By: TBD) rack/setup/install db21[45-52] - https://phabricator.wikimedia.org/T273568 (ops-monitoring-bot) Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts: ` db2146.codfw.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/...
[02:22:15] DBA, DC-Ops, SRE, ops-codfw: (Need By: TBD) rack/setup/install db21[45-52] - https://phabricator.wikimedia.org/T273568 (ops-monitoring-bot) Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts: ` ['db2147.codfw.wmnet', 'db2148.codfw.wmnet'] ` The log can be found in `...
[02:41:36] DBA, DC-Ops, SRE, ops-codfw: (Need By: TBD) rack/setup/install db21[45-52] - https://phabricator.wikimedia.org/T273568 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['db2146.codfw.wmnet'] ` and were **ALL** successful.
[02:43:07] DBA, DC-Ops, SRE, ops-codfw: (Need By: TBD) rack/setup/install db21[45-52] - https://phabricator.wikimedia.org/T273568 (Papaul)
[02:58:53] DBA, DC-Ops, SRE, ops-codfw: (Need By: TBD) rack/setup/install db21[45-52] - https://phabricator.wikimedia.org/T273568 (ops-monitoring-bot) Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts: ` db2149.codfw.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/...
[03:11:35] DBA, DC-Ops, SRE, ops-codfw: (Need By: TBD) rack/setup/install db21[45-52] - https://phabricator.wikimedia.org/T273568 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['db2147.codfw.wmnet', 'db2148.codfw.wmnet'] ` and were **ALL** successful.
[03:16:18] DBA, DC-Ops, SRE, ops-codfw: (Need By: TBD) rack/setup/install db21[45-52] - https://phabricator.wikimedia.org/T273568 (Papaul)
[06:00:55] DBA, SRE, ops-codfw: Degraded RAID on db2146 - https://phabricator.wikimedia.org/T275590 (Marostegui) Open→Invalid This can be ignored, as this is a new host, the raid was still being initializing. It is all good. ` root@db2146:~# megacli -LDInfo -Lall -aALL Adapter 0 -- Virtual Drive Infor...
[06:01:37] DBA, SRE, ops-eqiad: Degraded RAID on db1103 - https://phabricator.wikimedia.org/T275266 (Marostegui) Thank you for the fast response. Confirming this is all good now: ` root@db1103:~# megacli -LDInfo -Lall -aALL Adapter 0 -- Virtual Drive Information: Virtual Drive: 0 (Target Id: 0) Name...
[06:14:35] DBA, DC-Ops, SRE, ops-codfw: (Need By: TBD) rack/setup/install db21[45-52] - https://phabricator.wikimedia.org/T273568 (Marostegui) >>! In T273568#6855111, @Papaul wrote: > Why db2145 failed below > ` > Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Evalu...
[06:24:32] DBA, decommission-hardware, Patch-For-Review: decommission db1090.eqiad.wmnet - https://phabricator.wikimedia.org/T274333 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by marostegui@cumin1001 for hosts: `db1090.eqiad.wmnet` - db1090.eqiad.wmnet (**PASS**) - Downtimed host on Icinga...
[06:28:05] DBA, DC-Ops, decommission-hardware, ops-eqiad, Patch-For-Review: decommission db1090.eqiad.wmnet - https://phabricator.wikimedia.org/T274333 (Marostegui) a:Marostegui→wiki_willy This is ready for #dc-ops
[06:31:10] DBA, SRE, Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (Marostegui)
[11:37:58] DBA, mariadb-optimizer-bug: Investigate possible optimizer regression on 10.4.17 with DELETE statements - https://phabricator.wikimedia.org/T268457 (Marostegui) For the record 10.4.18 has been released
[13:28:18] DBA, Data-Services: Prepare and check storage layer for mniwiktionary - https://phabricator.wikimedia.org/T273459 (Kormat) a:Kormat Sanitization is in place, complete private data check running now.
[13:36:15] DBA, Data-Services: Prepare and check storage layer for mniwiki - https://phabricator.wikimedia.org/T273465 (Kormat) a:Kormat Sanitization is in place, complete private data check running now.
[13:36:20] DBA, Data-Services: Prepare and check storage layer for altwiki - https://phabricator.wikimedia.org/T271982 (Kormat) a:Kormat Sanitization is in place, complete private data check running now.
[13:44:11] DBA: Reimage db1134 to Buster and repool it - https://phabricator.wikimedia.org/T275343 (Marostegui)
[13:49:34] DBA, DC-Ops, SRE, ops-codfw: (Need By: TBD) rack/setup/install db21[45-52] - https://phabricator.wikimedia.org/T273568 (ops-monitoring-bot) Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts: ` db2150.codfw.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/...
[13:58:30] DBA, DC-Ops, SRE, ops-codfw: (Need By: TBD) rack/setup/install db21[45-52] - https://phabricator.wikimedia.org/T273568 (ops-monitoring-bot) Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts: ` db2151.codfw.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/...
[14:13:56] DBA, DC-Ops, SRE, ops-codfw: (Need By: TBD) rack/setup/install db21[45-52] - https://phabricator.wikimedia.org/T273568 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['db2150.codfw.wmnet'] ` and were **ALL** successful.
[14:14:17] DBA, DC-Ops, SRE, ops-codfw: (Need By: TBD) rack/setup/install db21[45-52] - https://phabricator.wikimedia.org/T273568 (ops-monitoring-bot) Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts: ` db2152.codfw.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/...
[14:15:15] DBA: mariadb: Replication lag monitoring does not support circular replication - https://phabricator.wikimedia.org/T275497 (Kormat) A related issue is that when we switch over to codfw as primary DC, we do _not_ switch the misc sections, so puppet code which depends on mw_primary == section primary is then w...
[14:22:06] DBA, DC-Ops, SRE, ops-codfw: (Need By: TBD) rack/setup/install db21[45-52] - https://phabricator.wikimedia.org/T273568 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['db2151.codfw.wmnet'] ` and were **ALL** successful.
[14:23:22] DBA, DC-Ops, SRE, ops-codfw: (Need By: TBD) rack/setup/install db21[45-52] - https://phabricator.wikimedia.org/T273568 (Papaul)
[14:36:30] DBA: mariadb: Replication lag monitoring does not support circular replication - https://phabricator.wikimedia.org/T275497 (Kormat) **Proposal** For every section, define: - writeable DC: mwprimary/eqiad/codfw/both - replication type: none/unidirectional/circular This will allow correct monitoring for inte...
[14:39:04] DBA, DC-Ops, SRE, ops-codfw: (Need By: TBD) rack/setup/install db21[45-52] - https://phabricator.wikimedia.org/T273568 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['db2152.codfw.wmnet'] ` and were **ALL** successful.
[14:41:37] DBA, DC-Ops, SRE, ops-codfw: (Need By: TBD) rack/setup/install db21[45-52] - https://phabricator.wikimedia.org/T273568 (Papaul)
[14:42:10] DBA, DC-Ops, SRE, ops-codfw: (Need By: TBD) rack/setup/install db21[45-52] - https://phabricator.wikimedia.org/T273568 (Papaul) Open→Resolved @Marostegui all yours. Have fun
[14:49:21] DBA, DC-Ops, SRE, ops-codfw: (Need By: TBD) rack/setup/install db21[45-52] - https://phabricator.wikimedia.org/T273568 (Marostegui) Thank you Papaul - we'll take it from here They look good: ` [14:48:47] marostegui@cumin1001:~$ sudo cumin 'db21[45-52].codfw.wmnet' 'free -g ; echo ; df -hT /srv; e...
[14:49:58] DBA, Data-Services, cloud-services-team (Kanban): Prepare and check storage layer for mniwiktionary - https://phabricator.wikimedia.org/T273459 (Kormat) a:Kormat→None Private data check was clean. View db was created, and is ready for #cloud-services-team.
[14:50:31] DBA, Data-Services, cloud-services-team (Kanban): Prepare and check storage layer for mniwiki - https://phabricator.wikimedia.org/T273465 (Kormat) a:Kormat→None Private data check was clean. View db was created, and is ready for #cloud-services-team.
[14:51:04] DBA, Data-Services, cloud-services-team (Kanban): Prepare and check storage layer for altwiki - https://phabricator.wikimedia.org/T271982 (Kormat) a:Kormat→None Private data check was clean. View db was created, and is ready for #cloud-services-team.
[14:51:42] DBA: Productionize db21[45-52] and db11[76-84] - https://phabricator.wikimedia.org/T275633 (Marostegui)
[14:51:56] DBA: Productionize db21[45-52] and db11[76-84] - https://phabricator.wikimedia.org/T275633 (Marostegui) p:Triage→Medium
[14:52:11] DBA: Productionize db21[45-52] and db11[76-84] - https://phabricator.wikimedia.org/T275633 (Marostegui)
[14:52:14] DBA, DC-Ops, SRE, ops-eqiad: (Need By: TBD) rack/setup/install db11[76-84] - https://phabricator.wikimedia.org/T273566 (Marostegui)
[15:08:18] DBA: mariadb: Replication lag monitoring does not support circular replication - https://phabricator.wikimedia.org/T275497 (Marostegui) As we spoke on IRC, we'd need to switch those flags as pre-steps on the DC switchover, as for XX days before and after the switchover we do enable circular replication on sX...
[16:05:47] DBA, Analytics-Clusters, Patch-For-Review: Convert labsdb1012 from multi-source to multi-instance - https://phabricator.wikimedia.org/T269211 (Marostegui)
[20:08:45] DBA, Data-Services, cloud-services-team (Kanban): Prepare and check storage layer for altwiki - https://phabricator.wikimedia.org/T271982 (Andrew) Open→Resolved a:Andrew Should be all set.
[20:36:53] DBA, Data-Services, cloud-services-team (Kanban): Prepare and check storage layer for mniwiktionary - https://phabricator.wikimedia.org/T273459 (Andrew) Open→Resolved a:Andrew
[21:35:37] DBA, Data-Services, cloud-services-team (Kanban): Prepare and check storage layer for mniwiki - https://phabricator.wikimedia.org/T273465 (Andrew) Open→Resolved a:Andrew
[23:38:05] DBA, GlobalUsage, Platform Engineering Roadmap Decision Making, StructuredDataOnCommons: Normalize globalimagelinks table - https://phabricator.wikimedia.org/T241053 (CCicalese_WMF)
[23:48:16] DBA, DC-Ops, SRE, ops-eqiad: (Need By: TBD) rack/setup/install db11[76-84] - https://phabricator.wikimedia.org/T273566 (Jclark-ctr)
[23:48:43] DBA, DC-Ops, SRE, ops-eqiad: (Need By: TBD) rack/setup/install db11[76-84] - https://phabricator.wikimedia.org/T273566 (Jclark-ctr) Racked and cabled 1 host Db1176 A1 u6 p14 id1751