[05:42:49] <wikibugs>	 DBA, Phabricator: Restart m3 (phabricator) database master db1132 - https://phabricator.wikimedia.org/T272596 (Marostegui) Pre restart steps done
[06:06:11] <wikibugs>	 DBA, Phabricator: Restart m3 (phabricator) database master db1132 - https://phabricator.wikimedia.org/T272596 (Marostegui) This was done. RO time for phabricator was around:  06:01:53 ON 06:03:12 OFF  Thanks @mmodell for the help!  ` root@db1132.eqiad.wmnet[(none)]> select @@report_host; +---------------...
[06:06:13] <wikibugs>	 DBA, Orchestrator: Add m* sections to Orchestrator - https://phabricator.wikimedia.org/T272568 (Marostegui)
[06:06:15] <wikibugs>	 DBA, Orchestrator, User-Kormat: Enable report_host for mariadb - https://phabricator.wikimedia.org/T266483 (Marostegui)
[06:06:35] <wikibugs>	 DBA, Phabricator: Restart m3 (phabricator) database master db1132 - https://phabricator.wikimedia.org/T272596 (Marostegui) Open→Resolved a:Marostegui
[06:06:57] <wikibugs>	 DBA, Orchestrator, User-Kormat: Enable report_host for mariadb - https://phabricator.wikimedia.org/T266483 (Marostegui)
[06:10:35] <wikibugs>	 DBA, Orchestrator: Add m* sections to Orchestrator - https://phabricator.wikimedia.org/T272568 (Marostegui) m3 is now in orchestrator
[06:10:42] <wikibugs>	 DBA, Orchestrator: Add m* sections to Orchestrator - https://phabricator.wikimedia.org/T272568 (Marostegui)
[06:36:07] <wikibugs>	 DBA, Orchestrator: Cleanup heartbeat.heartbeat on all production instances - https://phabricator.wikimedia.org/T268336 (Marostegui) m5 cleaned
[06:38:32] <wikibugs>	 DBA, wikitech.wikimedia.org, User-notice, cloud-services-team (Kanban): Restart m5 master (db1128) - https://phabricator.wikimedia.org/T272388 (Marostegui) Procedure:    Pre restart [] Silence m5 hosts [] buffer pool dump + disablement in advance to make the restart faster  Restart [] `!log m5 ma...
[06:48:46] <wikibugs>	 DBA, Platform Engineering Roadmap Decision Making, SRE, Performance-Team (Radar), User-Kormat: Remove groups from db configs - https://phabricator.wikimedia.org/T263127 (Marostegui) Thanks @Krinkle - I will probably start first with s6 codfw (frwiki,jawiki,ruwiki), and using wikimediadebug to...
[07:16:41] <wikibugs>	 DBA, cloud-services-team (Kanban): Move wikireplicas under the new sanitarium hosts (db1154, db1155) - https://phabricator.wikimedia.org/T272008 (Marostegui)
[07:16:52] <wikibugs>	 DBA, cloud-services-team (Kanban): Move wikireplicas under the new sanitarium hosts (db1154, db1155) - https://phabricator.wikimedia.org/T272008 (Marostegui) clouddb1019:3316 moved under db1155:3316
[07:34:03] <wikibugs>	 DBA, SRE, Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (Marostegui)
[08:12:09] <wikibugs>	 DBA, cloud-services-team (Kanban): Move wikireplicas under the new sanitarium hosts (db1154, db1155) - https://phabricator.wikimedia.org/T272008 (Marostegui)
[08:13:50] <wikibugs>	 DBA, cloud-services-team (Kanban): Move wikireplicas under the new sanitarium hosts (db1154, db1155) - https://phabricator.wikimedia.org/T272008 (Marostegui) clouddb1019:3314 moved under db1155:3314  All the new clouddb hosts are moved under the new 10.4 sanitariums. This task is now stalled - waiting on...
[10:17:55] <wikibugs>	 DBA, SRE, Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (Marostegui)
[10:23:42] <wikibugs>	 DBA, decommission-hardware: decommission db1081.eqiad.wmnet - https://phabricator.wikimedia.org/T273040 (Marostegui)
[10:24:09] <wikibugs>	 DBA, decommission-hardware: decommission db1081.eqiad.wmnet - https://phabricator.wikimedia.org/T273040 (Marostegui) Not ready until monday
[10:24:23] <wikibugs>	 DBA, decommission-hardware: decommission db1081.eqiad.wmnet - https://phabricator.wikimedia.org/T273040 (Marostegui)
[10:24:26] <wikibugs>	 DBA, SRE, Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (Marostegui)
[10:24:42] <wikibugs>	 DBA, SRE, Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (Marostegui)
[10:39:33] <jynus>	 hey, kormat any strong thoughts about https://gerrit.wikimedia.org/r/c/operations/puppet/+/657820 and https://gerrit.wikimedia.org/r/c/operations/puppet/+/657801 to vote keep or remove?
[10:40:16] <kormat>	 re: 657820, nuke away
[10:42:00] <kormat>	 re: 657801, seems useful
[10:43:59] <jynus>	 so I don't feel strongly about it either, but there were 2 reasons: if we go back to using lvm backups, we should implement them as part of wmfbackups, and we cannot really implement that without modifying our partitioning
[10:44:27] <jynus>	 so more of a model than a refactoring issue
[10:44:47] <jynus>	 I will wait anyway, comment on the patch with any suggestions
[10:45:35] <kormat>	 i'm by default in favour of removing stuff from puppet that we're not using. it's always available in the git history if we need to dig it up again
[10:45:49] <jynus>	 yeah, that was manuel's thought too
[10:46:31] <jynus>	 I think my (not very strong) compass here was "how likely are we to use it again"
[10:51:50] <kormat>	 (ah - i think you linked to the wrong CR above. https://gerrit.wikimedia.org/r/c/operations/puppet/+/657821 is the one for removing mylvmbackup. i've +1'd it)
[10:52:01] <jynus>	 oh, sorry
[10:52:06] <jynus>	 what did I link?
[10:52:13] <kormat>	 the backup grants
[10:52:22] <jynus>	 oh, sorry
[10:52:26] <jynus>	 indeed I meant the other
[11:00:55] <jynus>	 I found another obsolete thing on the mariadb package, sending patch soon
[11:07:00] <jynus>	 I think today is going to be cleanup day, first this, now the bacula one
[11:12:11] <wikibugs>	 Data-Persistence-Backup, DC-Ops, SRE, Patch-For-Review: decom helium and heze - https://phabricator.wikimedia.org/T260717 (jcrespo) a:jcrespo
[11:58:46] <wikibugs>	 Data-Persistence-Backup, decommission-hardware: decommission helium.eqiad.wmnet - https://phabricator.wikimedia.org/T273049 (jcrespo)
[11:59:16] <wikibugs>	 Data-Persistence-Backup, decommission-hardware: decommission helium.eqiad.wmnet - https://phabricator.wikimedia.org/T273049 (jcrespo)
[11:59:18] <wikibugs>	 Data-Persistence-Backup, DC-Ops, SRE, Patch-For-Review: decom helium and heze - https://phabricator.wikimedia.org/T260717 (jcrespo)
[12:00:14] <wikibugs>	 Data-Persistence-Backup, decommission-hardware: decommission helium.eqiad.wmnet and helium-array - https://phabricator.wikimedia.org/T273049 (jcrespo)
[12:01:16] <wikibugs>	 Data-Persistence-Backup, decommission-hardware: decommission helium.eqiad.wmnet and helium-array - https://phabricator.wikimedia.org/T273049 (jcrespo) @robh This is not yet ready for dc-ops processing, but do we need a separate checklist for the system and the attached array, or one is enough?
[12:01:32] <wikibugs>	 Data-Persistence-Backup, decommission-hardware: decommission helium.eqiad.wmnet and helium-array - https://phabricator.wikimedia.org/T273049 (jcrespo)
[12:05:38] <wikibugs>	 Data-Persistence-Backup, decommission-hardware: decommission heze and its attached array - https://phabricator.wikimedia.org/T273051 (jcrespo)
[12:06:09] <wikibugs>	 Data-Persistence-Backup, decommission-hardware: decommission heze and its attached array - https://phabricator.wikimedia.org/T273051 (jcrespo)
[12:06:12] <wikibugs>	 Data-Persistence-Backup, DC-Ops, SRE, Patch-For-Review: decom helium and heze - https://phabricator.wikimedia.org/T260717 (jcrespo)
[12:09:42] <wikibugs>	 Data-Persistence-Backup, decommission-hardware: decommission helium.eqiad.wmnet and helium-array - https://phabricator.wikimedia.org/T273049 (jcrespo) a:jcrespo
[12:10:20] <wikibugs>	 Data-Persistence-Backup, decommission-hardware: decommission heze and heze-array1 - https://phabricator.wikimedia.org/T273051 (jcrespo)
[12:46:11] <wikibugs>	 DBA: Investigate using PMM (Percona Monitoring and Management) for slow-query analysis - https://phabricator.wikimedia.org/T273054 (Kormat)
[12:46:33] <kormat>	 marostegui: so we don't forget about it ^
[12:48:37] <wikibugs>	 DBA: Investigate using PMM (Percona Monitoring and Management) for slow-query analysis - https://phabricator.wikimedia.org/T273054 (Marostegui) p:Triage→Medium
[13:00:39] <wikibugs>	 DBA: Investigate using PMM (Percona Monitoring and Management) for slow-query analysis - https://phabricator.wikimedia.org/T273054 (jcrespo) {icon thumbs-up color=green} For history, we enabled a similar solution to this (through grafana + prometheus_mysqld_exporter- not sure if PMM uses that for queries or...
[13:02:17] <wikibugs>	 DBA, observability, Epic: Improve database alerting (tracking) - https://phabricator.wikimedia.org/T172492 (jcrespo)
[13:05:59] <wikibugs>	 DBA, SRE, User-Kormat: Add monitoring to ensure consistency between tendril and zarcillo - https://phabricator.wikimedia.org/T257822 (jcrespo) This looks very related to T242571, but not merging because it is a topic very likely to evolve.
[13:08:04] <jynus>	 kormat, are T256845 and T257822 the same or are they just similar?
[13:08:05] <stashbot>	 T256845: Add monitoring to ensure that puppet/tendril/zarcillo all agree on the set of sections that exist - https://phabricator.wikimedia.org/T256845
[13:08:05] <stashbot>	 T257822: Add monitoring to ensure consistency between tendril and zarcillo - https://phabricator.wikimedia.org/T257822
[13:09:02] <jynus>	 and T257814 and T242571 also seem very similar
[13:09:05] <stashbot>	 T242571: Automatically populate tendril/zarcillo with the list of databases in the infrastructure - https://phabricator.wikimedia.org/T242571
[13:09:06] <stashbot>	 T257814: Use zarcillo as an authoritative inventory of db instances/roles - https://phabricator.wikimedia.org/T257814
[13:11:53] <jynus>	 not sure if there is a tendril-related epic, but we could use one to track all issues
[13:13:42] <wikibugs>	 DBA: Investigate using PMM (Percona Monitoring and Management) for slow-query analysis - https://phabricator.wikimedia.org/T273054 (jcrespo)
[13:13:46] <wikibugs>	 DBA, SRE, observability, Patch-For-Review: MySQL metrics monitoring - https://phabricator.wikimedia.org/T143896 (jcrespo)
[13:15:13] <wikibugs>	 DBA: Investigate using PMM (Percona Monitoring and Management) for slow-query analysis - https://phabricator.wikimedia.org/T273054 (jcrespo) Adding T143896 epic, even if one can argue that "query monitoring" is metrics or not, but to link it to an epic where this need was mentioned.
[13:18:56] <wikibugs>	 DBA, SRE, observability, Patch-For-Review: MySQL metrics monitoring - https://phabricator.wikimedia.org/T143896 (jcrespo)
[13:20:20] <wikibugs>	 DBA, SRE, User-Kormat: Add monitoring to ensure consistency between tendril and zarcillo - https://phabricator.wikimedia.org/T257822 (Kormat) Open→Resolved a:Kormat Resolving this as tendril is going away.
[13:20:26] <wikibugs>	 DBA, SRE, Epic, User-Kormat: Use zarcillo as an authoritative inventory of db instances/roles - https://phabricator.wikimedia.org/T257814 (Kormat)
[13:23:52] <kormat>	 feh. phab's approach for marking a task as depending on another leads to really confusing hierarchies.
[13:24:01] <jynus>	 +100000
[13:24:10] <jynus>	 it is both a dependency AND a subtask
[13:24:20] <jynus>	 not clear at all
[13:24:43] <wikibugs>	 DBA, SRE, Epic, User-Kormat: Use zarcillo as an authoritative inventory of db instances/roles - https://phabricator.wikimedia.org/T257814 (Kormat)
[13:25:01] <jynus>	 I am trying to generate a few epics for organization
[13:25:17] <jynus>	 T143896 for metrics monitoring related tasks
[13:25:17] <kormat>	 jynus: can you not, please? :)
[13:25:19] <stashbot>	 T143896: MySQL metrics monitoring - https://phabricator.wikimedia.org/T143896
[13:25:21] <jynus>	 ok
[13:25:34] <jynus>	 not new ones
[13:25:36] <jynus>	 existing ones
[13:25:50] <kormat>	 i mean, for your own area, by all means, go with whatever suits you
[13:26:03] <kormat>	 but i prefer to keep epics fairly constrained for my stuff
[13:26:10] <jynus>	 ok
[13:26:16] <kormat>	 e.g. https://phabricator.wikimedia.org/T257814 is an epic, with a small number of subtasks
[13:27:40] <jynus>	 I just pointed out that T257821 and T257822 looked very similar
[13:27:40] <stashbot>	 T257821: Add monitoring to ensure consistency between puppet and zarcillo - https://phabricator.wikimedia.org/T257821
[13:27:40] <stashbot>	 T257822: Add monitoring to ensure consistency between tendril and zarcillo - https://phabricator.wikimedia.org/T257822
[13:28:23] <kormat>	 those particular 2 were intended to be like that
[13:28:28] <jynus>	 ok
[13:28:35] <kormat>	 i filed them at the same time
[15:59:37] <wikibugs>	 Data-Persistence-Backup, decommission-hardware, Patch-For-Review: decommission helium.eqiad.wmnet and helium-array - https://phabricator.wikimedia.org/T273049 (RobH) >>! In T273049#6780019, @jcrespo wrote: > @robh This is not yet ready for dc-ops processing, but do we need a separate checklist for th...
[16:00:30] <wikibugs>	 Data-Persistence-Backup, decommission-hardware, Patch-For-Review: decommission helium.eqiad.wmnet and helium-array - https://phabricator.wikimedia.org/T273049 (RobH)
[16:00:45] <wikibugs>	 Data-Persistence-Backup, decommission-hardware, ops-eqiad, Patch-For-Review: decommission helium.eqiad.wmnet and helium-array - https://phabricator.wikimedia.org/T273049 (RobH)
[16:01:49] <wikibugs>	 Data-Persistence-Backup, decommission-hardware, Patch-For-Review: decommission heze and heze-array1 - https://phabricator.wikimedia.org/T273051 (jcrespo)
[16:02:04] <wikibugs>	 Data-Persistence-Backup, decommission-hardware, ops-codfw, Patch-For-Review: decommission heze and heze-array1 - https://phabricator.wikimedia.org/T273051 (jcrespo)
[18:18:55] <wikibugs>	 Data-Persistence-Backup, SRE: print a list of backed up directories in the MOTD of production servers - https://phabricator.wikimedia.org/T272686 (jcrespo) Apparently, there is the following code on backup::set:  ` $motd_content = "#!/bin/sh\necho \"Backed up on this host: ${name}\""         @motd::scrip...
[21:58:11] <wikibugs>	 DBA, MediaWiki-Cache: insert ignore into objectcache ignores stuff bigger than mediumblob - https://phabricator.wikimedia.org/T273117 (Physikerwelt)