[00:18:27] DBA, Analytics-Clusters, Analytics-Kanban, Patch-For-Review: dbstore1004 85% disk space used. - https://phabricator.wikimedia.org/T283125 (razzi) Hmm, the machine has been renamed and is almost operational, but doesn't have ssh keys so I can't log in. I'm not sure what to do at this point, I trie...
[04:32:55] DBA, Analytics-Clusters, Analytics-Kanban, Patch-For-Review: dbstore1004 85% disk space used. - https://phabricator.wikimedia.org/T283125 (Marostegui) @razzi I have fixed the issue and the host is now accessible. The certificate wasn't signed by puppet, so I have done so manually - did you get a...
[04:51:09] DBA, decommission-hardware, Patch-For-Review: decommission db1087.eqiad.wmnet - https://phabricator.wikimedia.org/T282093 (Marostegui)
[04:59:36] DBA, decommission-hardware, Patch-For-Review: decommission db1087.eqiad.wmnet - https://phabricator.wikimedia.org/T282093 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by marostegui@cumin1001 for hosts: `db1087.eqiad.wmnet` - db1087.eqiad.wmnet (**PASS**) - Downtimed host on Icinga...
[04:59:50] DBA, decommission-hardware: decommission db1087.eqiad.wmnet - https://phabricator.wikimedia.org/T282093 (Marostegui) a:Marostegui→wiki_willy
[04:59:58] DBA, decommission-hardware: decommission db1087.eqiad.wmnet - https://phabricator.wikimedia.org/T282093 (Marostegui) This is ready for #dc-ops
[05:00:36] DBA, SRE, Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (Marostegui)
[05:05:20] DBA, Analytics-Clusters, Analytics-Kanban, Patch-For-Review: dbstore1004 85% disk space used. - https://phabricator.wikimedia.org/T283125 (ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts: ` ['dbstore1006.eqiad.wmnet'] ` The log can be foun...
[05:25:16] DBA, Analytics-Clusters, Analytics-Kanban, Patch-For-Review: dbstore1004 85% disk space used. - https://phabricator.wikimedia.org/T283125 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['dbstore1006.eqiad.wmnet'] ` and were **ALL** successful.
[05:29:23] DBA, Analytics-Clusters, Analytics-Kanban, Patch-For-Review: dbstore1004 85% disk space used. - https://phabricator.wikimedia.org/T283125 (Marostegui) @razzi all went fine this time, do you remember if you used the `--new` thing? Maybe that was the issue. Anyways, I have executed: ` root@dbstore1...
[06:17:42] Blocked-on-schema-change, DBA: Schema change for dropping default of user_touched - https://phabricator.wikimedia.org/T282373 (Marostegui) a:Marostegui
[06:17:44] Blocked-on-schema-change, DBA: Schema change for dropping default of page_touched - https://phabricator.wikimedia.org/T282372 (Marostegui) a:Marostegui
[06:17:46] Blocked-on-schema-change, DBA: Schema change for dropping default of ar_timestamp - https://phabricator.wikimedia.org/T282371 (Marostegui) a:Marostegui
[06:19:10] Blocked-on-schema-change, DBA: Schema change for dropping default of user_touched - https://phabricator.wikimedia.org/T282373 (Marostegui)
[06:19:21] Blocked-on-schema-change, DBA: Schema change for dropping default of page_touched - https://phabricator.wikimedia.org/T282372 (Marostegui)
[06:19:31] Blocked-on-schema-change, DBA: Schema change for dropping default of ar_timestamp - https://phabricator.wikimedia.org/T282371 (Marostegui)
[06:28:00] Blocked-on-schema-change, DBA: Schema change for dropping default of user_touched - https://phabricator.wikimedia.org/T282373 (Marostegui) Altered db2089:3316 to leave it for a few days to make sure all is good s6 progress [] dbstore1005 [] db2141 [] db2129 [] db2124 [] db2117 [] db2114 [] db2095 [x] d...
[06:28:02] Blocked-on-schema-change, DBA: Schema change for dropping default of page_touched - https://phabricator.wikimedia.org/T282372 (Marostegui) Altered db2089:3316 to leave it for a few days to make sure all is good s6 progress [] dbstore1005 [] db2141 [] db2129 [] db2124 [] db2117 [] db2114 [] db2095 [x]...
[06:28:04] Blocked-on-schema-change, DBA: Schema change for dropping default of ar_timestamp - https://phabricator.wikimedia.org/T282371 (Marostegui) Altered db2089:3316 to leave it for a few days to make sure all is good s6 progress [] dbstore1005 [] db2141 [] db2129 [] db2124 [] db2117 [] db2114 [] db2095 [x] d...
[06:29:01] DBA, Platform Engineering Roadmap Decision Making, SRE, Performance-Team (Radar), User-Kormat: Remove groups from db configs - https://phabricator.wikimedia.org/T263127 (Marostegui) I want to start working on this next week on s6.
[06:50:15] DBA: Fix db-switchover update zarcillo part - https://phabricator.wikimedia.org/T272954 (Marostegui) @Kormat with yesterday's release, this is good to be closed?
[08:26:43] DBA, Orchestrator: Investigate moving replicas around with Orchestrator doesn't result on skipped transactions - https://phabricator.wikimedia.org/T267133 (Marostegui) Open→Resolved So using `orchestrator -c relocate ` is the safest option (or the dashboard on `Smart mode` - which is the one we u...
[08:48:15] DBA: Fix db-switchover update zarcillo part - https://phabricator.wikimedia.org/T272954 (Kormat) Open→Resolved Yes indeed 🎉
[08:48:17] DBA: Switchover s4 (commonswiki) from db1081 to db1138 - https://phabricator.wikimedia.org/T271427 (Kormat)
[08:53:21] DBA, MediaWiki-Parser, Parsoid, Performance-Team: purgeParserCache.php should not take over 24 hours for its daily run - https://phabricator.wikimedia.org/T282761 (Kormat) Optimize of pc1010 finished. Disk space usage went from 3.72TB to 2.44TB.
{F34462617}
[10:55:05] DBA, SRE: wmf-auto-reinstall fails on hosts that run pt-heartbeat - https://phabricator.wikimedia.org/T252528 (LSobanski) >>! In T252528#7100826, @Kormat wrote: > This does mean that `pt-heartbeat-wikimedia` needs to be started manually after a boot, however. @Kormat is this captured somewhere in docume...
[11:04:10] DBA, SRE-tools: Improve database master switchover script - https://phabricator.wikimedia.org/T200306 (jcrespo) Stevie: I believe you have implemented bullet point #5 at https://gerrit.wikimedia.org/r/665324 (thank you!). Check the other things that were pending (some years ago) just FYI and feel free t...
[11:42:10] DBA, Data-Persistence-Backup, Patch-For-Review: Productionize db21[45-52] and db11[76-84] - https://phabricator.wikimedia.org/T275633 (jcrespo)
[11:43:22] DBA, Data-Persistence-Backup, Patch-For-Review: Productionize db21[45-52] and db11[76-84] - https://phabricator.wikimedia.org/T275633 (jcrespo) Open→Resolved a:jcrespo→Marostegui db1176 and db2151 puppetized and added to tendril and zarcillo as misc:mediabackupstemp hosts
[11:43:24] DBA, DC-Ops, SRE, ops-codfw: (Need By: TBD) rack/setup/install db21[45-52] - https://phabricator.wikimedia.org/T273568 (jcrespo)
[11:43:27] DBA, DC-Ops, SRE, ops-eqiad, Patch-For-Review: (Need By: TBD) rack/setup/install db11[76-84] - https://phabricator.wikimedia.org/T273566 (jcrespo)
[11:56:04] DBA, SRE: wmf-auto-reinstall fails on hosts that run pt-heartbeat - https://phabricator.wikimedia.org/T252528 (Kormat) >>! In T252528#7103046, @LSobanski wrote: >>>! In T252528#7100826, @Kormat wrote: >> This does mean that `pt-heartbeat-wikimedia` needs to be started manually after a boot, however. > >...
[12:00:20] DBA: Deploy wmfmariadbpy 0.7 - https://phabricator.wikimedia.org/T283228 (Kormat) For posterity, here's the script i used for the heartbeat changes: {P16127}
[12:00:42] DBA, Data-Persistence-Backup, Patch-For-Review: Productionize db21[45-52] and db11[76-84] - https://phabricator.wikimedia.org/T275633 (Marostegui) Thanks!
[12:51:04] Blocked-on-schema-change, DBA: Schema change for making cuc_id in cu_changes unsigned - https://phabricator.wikimedia.org/T283093 (LSobanski) p:Triage→Medium a:Kormat Assigning to Stevie to confirm if this can go into Ready.
[12:52:38] DBA: db-replication-tree doesn't support circular replication - https://phabricator.wikimedia.org/T283239 (LSobanski) @Kormat How do you see the urgency of this (and the probability we'll work on it anytime soon)?
[14:12:34] Data-Persistence-Backup, SRE, netops: Understand (and mitigate) the backup speed differences between backup1002->backup2002 and backup2002->backup1002 - https://phabricator.wikimedia.org/T274234 (cmooney) I've been looking into this issue a little, and propose to do some tests Monday/Tuesday AM (Europ...
[15:01:50] Data-Persistence-Backup, SRE, netops: Understand (and mitigate) the backup speed differences between backup1002->backup2002 and backup2002->backup1002 - https://phabricator.wikimedia.org/T274234 (jcrespo) Thank you very much for your comments. > Due to this I would like to test the performance, usin...
[15:06:20] Data-Persistence-Backup, SRE, netops: Understand (and mitigate) the backup speed differences between backup1002->backup2002 and backup2002->backup1002 - https://phabricator.wikimedia.org/T274234 (cmooney) Ok @jcrespo sounds like a plan. And thanks for the extra info, indeed it does seem to rule out...
[15:12:18] Data-Persistence-Backup, SRE, netops: Understand (and mitigate) the backup speed differences between backup1002->backup2002 and backup2002->backup1002 - https://phabricator.wikimedia.org/T274234 (jcrespo) >>! In T274234#7103617, @cmooney wrote: > In terms of the WAN links I wouldn't rule them out 100...
[20:58:56] DBA, Analytics-Clusters, Analytics-Kanban, Patch-For-Review: dbstore1004 85% disk space used. - https://phabricator.wikimedia.org/T283125 (razzi) @Marostegui glad you were able to figure that out and that it worked on a new reimage. My last attempt timed out, and I was troubleshooting some networ...