[06:07:47] <wikibugs>	 10DBA, 10Operations, 10ops-eqiad: Degraded RAID on db1077 - https://phabricator.wikimedia.org/T256939 (10Majavah)
[07:34:50] <wikibugs>	 10DBA, 10Performance-Team: Database for XHGui profiles - https://phabricator.wikimedia.org/T254795 (10jcrespo) > I think this ticket is done now.  Where database backups requested/needed?
[07:47:12] <jynus>	 big fail: https://gerrit.wikimedia.org/r/c/operations/puppet/+/609101
[07:50:02] <kormat>	 jynus: ahh
[07:50:08] <jynus>	 kormat: if you can check what's the issue with db1077?
[07:50:34] <jynus>	 T256939
[07:50:35] <stashbot>	 T256939: Degraded RAID on db1077 - https://phabricator.wikimedia.org/T256939
[07:50:41] <kormat>	 sure thing
[07:50:55] <jynus>	 it is a test host, so not big deal, but we should understand what is going on
[08:16:10] <wikibugs>	 10DBA, 10Operations, 10ops-eqiad: Degraded RAID on db1077 - https://phabricator.wikimedia.org/T256939 (10Kormat) I'm at a loss here.  There's nothing recent in the ilo log. This is the last entry: ` /system1/log1/record36   Targets   Properties     number=36     severity=Caution     date=06/18/2020     time=...
[08:17:33] <wikibugs>	 10DBA, 10Operations, 10ops-eqiad: Degraded RAID on db1077 - https://phabricator.wikimedia.org/T256939 (10jcrespo) It says: Battery count: 0. Did the battery die or did it come back after it failed (or detected failed)?
[08:20:28] <wikibugs>	 10DBA, 10Operations, 10ops-eqiad: Degraded RAID on db1077 - https://phabricator.wikimedia.org/T256939 (10Kormat) The battery has been dead for a long time {T225391}, it's a [[ https://github.com/wikimedia/puppet/blob/4d037cb8debde241870cee57a7ecb39e2b718f25/hieradata/hosts/db1077.yaml#L2 | known issue ]].
[08:21:10] <wikibugs>	 10DBA, 10Operations, 10ops-eqiad: Degraded RAID on db1077 - https://phabricator.wikimedia.org/T256939 (10jcrespo) Apparently this is known issue: T226519 Maybe then auto-icinga-ack should be disabled for this host?
[08:24:16] <kormat>	 jynus: i wonder. icinga hasn't opened a task for db1077's raid since last june. maybe that service was downtimed for ~a year?
[08:25:49] <jynus>	 maybe, manel would know
[08:26:03] <jynus>	 so maybe downtime for another year?
[08:38:39] <kormat>	 hmm, no, i don't think that's it. we've reimaged db1077 a few times, and that nukes any old downtimes
[08:39:06] <kormat>	 unless m.arostegui has been manually re-downtiming db1077:RAID every time
[08:39:18] <jynus>	 could be
[08:39:44] <jynus>	 feel free to solve in any way you think is reasonable
[08:39:51] <jynus>	 then report next week to manuel
[08:40:10] <kormat>	 +1
[08:42:43] <jynus>	 I think there is no cache in use for that host
[08:42:57] <jynus>	 if that was mw production, we would have lots of errors due to performance
[08:43:06] <jynus>	 but being a test host, we will live with it until replaced
[08:43:27] <kormat>	 as the alert is acked, i think i can leave it as-is, and poke mar.ostegui about it next week
[08:43:28] <jynus>	 manuel will know when it is scheduled, or maybe you can find in the budget scheduled, kormat
[08:43:29] <kormat>	 yeah
[08:43:42] <jynus>	 *scheduled to be decommission
[08:44:18] <wikibugs>	 10DBA, 10Operations, 10ops-eqiad, 10User-Kormat: Degraded RAID on db1077 - https://phabricator.wikimedia.org/T256939 (10Kormat) a:03Marostegui
[09:05:15] <wikibugs>	 10DBA, 10Operations, 10CAS-SSO, 10Patch-For-Review, 10User-jbond: Request new database for idp-test.wikimedia.org - https://phabricator.wikimedia.org/T256120 (10jbond) 05Open→03Resolved Thanks @jcrespo Thanks for helping this is all set up and ready for the ticket to close however this database will...
[09:09:51] <wikibugs>	 10DBA, 10Operations, 10Patch-For-Review, 10User-Kormat: Add mysql_role and section profiles to remaining mariadb roles - https://phabricator.wikimedia.org/T256866 (10Kormat)
[09:10:53] <kormat>	 jynus: you asked on ^ if dbstore and general misc roles have been covered. if you can give me an example of a machine for each, i can triple-check
[09:11:32] <kormat>	 just to make sure we're talking about the same things
[09:14:52] <jynus>	 https://github.com/wikimedia/puppet/blob/production/manifests/site.pp#L679
[09:15:58] <jynus>	 https://github.com/wikimedia/puppet/blob/production/manifests/site.pp#L574
[09:16:44] <jynus>	 https://github.com/wikimedia/puppet/blob/production/manifests/site.pp#L652
[09:17:08] <kormat>	 ok, cool. yes, i can confirm they are already covered by the previous CR(s). `sudo cumin "A:db-section-s2"` and `sudo cumin "A:db-section-m1"` contain those hosts as appropriate
[09:17:20] <jynus>	 ok
[09:17:29] <jynus>	 it wasn't part of the patch
[09:17:34] <jynus>	 and it wasn't core
[09:17:40] <jynus>	 so I asked about them
[09:17:56] <kormat>	 yep, no problem, thanks for checking :)
[09:18:16] <jynus>	 one thing that may be confusing
[09:18:41] <jynus>	 you will see mentions of shard and slave on previous code
[09:18:59] <jynus>	 with very few exceptions those terms are not used anymore
[09:19:14] <jynus>	 we use section (which may be shards or not) for replica groups
[09:19:16] <kormat>	 shard => section. what's the replacement for 'slave'?
[09:19:18] <jynus>	 and replica
[09:19:21] <jynus>	 for slave
[09:19:24] <kormat>	 ahh, ok.
[09:19:25] <jynus>	 but because dependencies
[09:19:33] <jynus>	 it is difficult to change it all at the same time
[09:19:38] <jynus>	 I mention in case of new code
[09:19:39] <kormat>	 yeah understood
[09:19:47] <jynus>	 to use the latest terminology
[09:19:50] <kormat>	 i'll file a task for the renaming of slave to replica
[09:19:53] <kormat>	 to keep track of it
[09:19:56] <jynus>	 I think mediawiki uses master
[09:20:01] <jynus>	 but mysql now uses source
[09:20:07] <jynus>	 we can discuss which one we prefer
[09:23:10] <jynus>	 so pc1 and es1 are real shards, but s1 is not
[09:25:24] <kormat>	 oh? how so?
[09:34:35] <jynus>	 sX are not horizontal partitions https://en.wikipedia.org/wiki/Shard_(database_architecture) they are just tenant's of a multi-tenant setup
[09:34:59] <jynus>	 pcs and es* are real partitions 
[09:35:09] <kormat>	 ahh. right, yes
[09:35:11] <jynus>	 by hash or id
[09:35:31] <jynus>	 they all are replica sets "groups of servers that hold the same data"
[09:35:52] <jynus>	 but I think sections, which mw uses, are the simplest way to clasify them
[09:36:36] <jynus>	 sections for either sharding or multi-tenancy
[09:37:32] * kormat nods
[09:37:52] <jynus>	 so I don't care too much about the actual name used
[09:38:06] <jynus>	 but I would like to slowly unify towards a single name for better communication
[09:38:22] <jynus>	 so it is not called one thing on code and another thing on another code
[09:57:38] <wikibugs>	 10DBA: Choosing a wrong host with transfer.py produces an "ERROR: The specified source path X doesn't exist on Y" - https://phabricator.wikimedia.org/T256951 (10jcrespo)
[09:58:13] <wikibugs>	 10DBA: Choosing a wrong host with transfer.py produces an "ERROR: The specified source path X doesn't exist on Y" - https://phabricator.wikimedia.org/T256951 (10jcrespo)
[09:58:17] <wikibugs>	 10DBA, 10Google-Summer-of-Code (2020), 10Patch-For-Review: GSoC 2020 Proposal: Improve the framework to transfer files over the LAN - https://phabricator.wikimedia.org/T248256 (10jcrespo)
[09:59:57] <wikibugs>	 10DBA: transferpy package does not depend on python3-yaml - https://phabricator.wikimedia.org/T256604 (10jcrespo) 05Open→03Invalid transfer.py doesn't depend on python3-yaml, the wmf database backup system does. My fault.
[10:00:56] <jynus>	 can I give you a suggestion regarding task creation, kormat?
[10:03:07] <kormat>	 please :)
[10:04:24] <jynus>	 by experince, thing like T256879 will age badly
[10:04:25] <stashbot>	 T256879: Remove unused parameters from profile::mariadb::monitor::prometheus - https://phabricator.wikimedia.org/T256879
[10:04:51] <jynus>	 please comment properly what is the planned work so future kormat knows what you meant months ago :-D
[10:05:07] <jynus>	 reference existing code with urls, etc.
[10:05:26] <jynus>	 this is important for you, but also for manager that will want to know that you are doing
[10:05:31] <jynus>	 *managers
[10:06:31] <kormat>	 ack. the reason i didn't in that case was i need to do some investigation into how it is used, and what _are_ the unused parameters :) but yeah, i should do that
[10:06:33] <jynus>	 "There use to be this (link to code) after this (link to patch) this is no longer needed because (link to code) is being used"
[10:06:45] <jynus>	 not worried about the particular ticket
[10:07:15] <jynus>	 but I have been working on this with the GSoC student and wanted to mention it to you too
[10:07:49] <jynus>	 let me tell you that I have done the same thing in the past
[10:08:10] <jynus>	 but after having 500 ticketes and not remembering what I meant, I regretted not doing it :-D
[10:08:56] <kormat>	 understood
[10:09:36] <jynus>	 I literally said to myself, wtf did I meant here with "fix"?
[10:09:44] <jynus>	 :-D
[10:10:36] <wikibugs>	 10DBA: Choosing a wrong host with transfer.py produces an "ERROR: The specified source path X doesn't exist on Y" - https://phabricator.wikimedia.org/T256951 (10Privacybatm) Parsing cumin output seems to be a better idea, let me check the output of cumin in this kind of cases.
[10:11:06] <jynus>	 did phab mysql just went down?
[11:45:27] <jynus>	 kormat: not sure if you understood my comment on https://gerrit.wikimedia.org/r/c/operations/puppet/+/608874
[11:45:32] <jynus>	 the patch itself is ok
[11:45:54] <jynus>	 but there is no general explanation on commit msg
[11:45:59] <jynus>	 and no text on T256866
[11:45:59] <stashbot>	 T256866: Add mysql_role and section profiles to remaining mariadb roles - https://phabricator.wikimedia.org/T256866
[11:46:13] <jynus>	 I am asking to commit a context (a couple of lines max)
[11:56:23] <jynus>	 s8 on dbstore1005 got stopped, but I don't know why
[11:57:10] <jynus>	 we need to know if maintenance is happening there
[11:59:19] <jynus>	 uh, no, it crashed
[11:59:26] <jynus>	 creating a ticket
[12:03:49] <wikibugs>	 10DBA, 10Analytics: dbstore1005 s8 mariadb instance crashed - https://phabricator.wikimedia.org/T256966 (10jcrespo)
[12:04:53] <wikibugs>	 10DBA, 10Analytics: dbstore1005 s8 mariadb instance crashed - https://phabricator.wikimedia.org/T256966 (10jcrespo) On replication start, instance crashed again- probably there is data/fs corruption.
[12:07:39] <wikibugs>	 10DBA, 10Analytics: dbstore1005 s8 mariadb instance crashed - https://phabricator.wikimedia.org/T256966 (10jcrespo) Same issues as T249188 ?
[12:08:50] <kormat>	 jynus: ahh, sorry. will do :)
[12:09:14] <jynus>	 more worried about T256966 now :-(
[12:09:15] <stashbot>	 T256966: dbstore1005 s8 mariadb instance crashed - https://phabricator.wikimedia.org/T256966
[12:10:08] <kormat>	 oh ouch
[12:10:57] <jynus>	 I am going to luch, poke if you have time, and start remembering how to setup an instance
[12:11:02] <jynus>	 *lunch
[12:27:34] <wikibugs>	 10DBA, 10Analytics: dbstore1005 s8 mariadb instance crashed - https://phabricator.wikimedia.org/T256966 (10Kormat) This host was reimaged to buster recently (2020-06-22) as part of T254870, and the symptoms do sound very like https://jira.mariadb.org/browse/MDEV-22373, with the significant difference that this...
[12:37:37] <wikibugs>	 10DBA, 10Operations, 10Patch-For-Review, 10User-Kormat: Add mysql_role and section profiles to remaining mariadb roles - https://phabricator.wikimedia.org/T256866 (10Kormat)
[13:01:38] <wikibugs>	 10DBA, 10Operations, 10User-Kormat, 10User-jbond: Refactor mariadb puppet code - https://phabricator.wikimedia.org/T256972 (10Kormat)
[13:03:31] <wikibugs>	 10DBA, 10Operations, 10User-Kormat, 10User-jbond: Refactor mariadb puppet code - https://phabricator.wikimedia.org/T256972 (10Kormat) p:05Triage→03Medium
[13:06:27] <wikibugs>	 10DBA, 10Operations, 10User-Kormat, 10User-jbond: Refactor mariadb puppet code - https://phabricator.wikimedia.org/T256972 (10Kormat)
[13:08:16] <wikibugs>	 10DBA, 10Operations, 10User-Kormat, 10User-jbond: Refactor mariadb puppet code - https://phabricator.wikimedia.org/T256972 (10Kormat)
[13:09:09] <wikibugs>	 10DBA, 10Operations, 10User-Kormat, 10User-jbond: Refactor mariadb puppet code - https://phabricator.wikimedia.org/T256972 (10Kormat)
[13:27:21] <wikibugs>	 10DBA, 10Operations, 10User-Kormat: Remove unused parameters from profile::mariadb::monitor::prometheus - https://phabricator.wikimedia.org/T256879 (10Kormat) p:05Triage→03Medium
[13:27:51] <wikibugs>	 10DBA, 10Operations, 10User-Kormat, 10User-jbond: Refactor mariadb puppet code - https://phabricator.wikimedia.org/T256972 (10Kormat)
[13:27:56] <wikibugs>	 10DBA, 10Operations, 10User-Kormat: Remove unused parameters from profile::mariadb::monitor::prometheus - https://phabricator.wikimedia.org/T256879 (10Kormat)
[13:41:28] <jynus>	 s8 snapshots just finished
[13:41:42] <jynus>	 should we try to recover it to dbstore1005?
[13:42:41] <kormat>	 it's worth a shot
[13:43:41] <kormat>	 jynus: i'm happy to take care of it
[13:43:47] <kormat>	 as a learning experience
[13:48:37] <jynus>	 sure, ping luca or otto
[13:48:44] <jynus>	 transfer while mysql is running
[13:48:46] <kormat>	 jynus: https://phabricator.wikimedia.org/P11726 is the proposed procedure
[13:49:04] <jynus>	 and make sure you have their ok before putting the old instance down
[13:49:44] <jynus>	 no, transfer first so there is no extensive downtime
[13:50:01] <jynus>	 or you have analytics ok
[13:50:05] <jynus>	 one of the 2
[13:50:33] <kormat>	 huh, ok. as replication is stopped, i didn't think it was being used. will do!
[13:50:55] <jynus>	 they normally not sensitive to replication down for a few minutes
[13:50:57] <jynus>	 *Are
[13:51:01] <jynus>	 but better ask
[13:52:17] <kormat>	 +1
[13:52:45] <jynus>	 if you get their ok/awareness you can do it an any way you want :-D
[13:53:28] <kormat>	 got a go-ahead from elukey
[13:53:32] <kormat>	 does the rest of the procedure look ok?
[13:53:35] <jynus>	 yep
[13:53:44] <kormat>	 great, thanks :)
[13:53:49] <jynus>	 dump if you can the grants
[13:53:56] <jynus>	 before putting it down
[13:53:58] <jynus>	 to have a copy
[13:54:05] <kormat>	 they're not in the backup?
[13:54:15] <kormat>	 ohh. it's analytics,
[13:54:16] <jynus>	 the backup has core-like grants
[13:54:19] <kormat>	 they could have different grants
[13:54:21] <kormat>	 ok
[13:54:22] <jynus>	 exactly
[13:54:27] <jynus>	 we can load them from other instance
[13:54:28] <kormat>	 how does one dump grants? :)
[13:54:30] <jynus>	 but this way you can dump
[13:54:37] <jynus>	 and just execute blindly
[13:55:01] <jynus>	 run pt-show-grants with the right socket
[13:55:05] <jynus>	 let me see
[13:55:47] <jynus>	 pt-show-grants S=/run/mysqld/mysqld.s8.sock
[13:55:47] <kormat>	 the output of `pt-show-grants -S /run/mysqld/mysqld.s8.sock` looks legit
[13:55:59] <jynus>	 dump that to a file
[13:56:17] <jynus>	 the load with mysql -S /run/mysqld/mysqld.s8.sock < grants.file
[13:56:29] <jynus>	 that I think is the easiest/most portable
[13:56:37] <jynus>	 note we only have 10.1 backups yet
[13:56:45] <jynus>	 so you will eat the upgrade
[13:56:54] <kormat>	 updated procedure
[13:57:00] <jynus>	 that another reason to keep the original grants
[13:58:00] <jynus>	 I think there is only 1 important user there, the one for researchers
[13:58:09] <jynus>	 but doesn't hurt to backup all :-D
[13:58:39] <jynus>	 do you see why we need a centalized grant management :-D
[13:59:12] <kormat>	 i do, indeed :)
[14:00:06] <jynus>	 so downtime is not important, othewise it would be in high availability configuration
[14:00:23] <jynus>	 but if the owner is aware they can answer user that get an error
[14:00:51] <jynus>	 so we should try to communicate if there is going to be a downtime
[14:01:24] <jynus>	 *users
[14:02:41] <jynus>	 log when you put it down, too
[14:03:19] <kormat>	 +1
[14:04:34] <kormat>	 hah. i just ran into the transfer.py bug where it reports a dir doesn't exist when the hostname doesn't resolve
[14:04:53] <jynus>	 he
[14:04:57] <jynus>	 there is a ticket for that
[14:05:19] <jynus>	 T256951
[14:05:19] <stashbot>	 T256951: Choosing a wrong host with transfer.py produces an "ERROR: The specified source path X doesn't exist on Y" - https://phabricator.wikimedia.org/T256951
[14:05:41] <jynus>	 I will speed up review if the patch looks good
[14:07:49] <jynus>	 <kormat> ^ that's me - Ha ha, not following your own guide? XD
[14:08:01] <kormat>	 i thought i had already done it. i was wrong. :)
[14:09:12] <jynus>	 I don't want to bother but I hope you see with example some of the decision regarding things like "not automatically autostart replication after a restart"
[14:09:35] <jynus>	 those decisions can be challenged as we get better with monitoring/automation
[14:09:44] <jynus>	 they are not set on stone
[14:09:50] * kormat nods
[14:10:34] <wikibugs>	 10DBA, 10Analytics: dbstore1005 s8 mariadb instance crashed - https://phabricator.wikimedia.org/T256966 (10Kormat) p:05Triage→03High a:03Kormat
[14:10:42] <wikibugs>	 10DBA, 10Analytics, 10User-Kormat: dbstore1005 s8 mariadb instance crashed - https://phabricator.wikimedia.org/T256966 (10Kormat)
[14:30:38] <wikibugs>	 10DBA, 10Patch-For-Review: Use logging package instead of print statements in transferpy package - https://phabricator.wikimedia.org/T255999 (10jcrespo) Most of this is done, but let's keep this open, even with lower priority, to see if we can add some extra logging at a later time from new features.
[14:30:50] <wikibugs>	 10DBA, 10Patch-For-Review: Use logging package instead of print statements in transferpy package - https://phabricator.wikimedia.org/T255999 (10jcrespo) p:05Triage→03Low
[14:52:23] <wikibugs>	 10DBA, 10Operations, 10Performance-Team (Radar), 10Services (watching), 10Sustainability (MediaWiki-MultiDC): Make apache/maintenance hosts TLS connections to mariadb work - https://phabricator.wikimedia.org/T175672 (10Krinkle)
[14:52:48] <elukey>	 I am super ignorant about setting up master replica, so I started reading https://mariadb.com/kb/en/setting-up-replication/
[14:53:09] <elukey>	 and added the status of meta/matomo dbs (the masters to replicate) in https://phabricator.wikimedia.org/T234826#6274823
[14:55:22] <elukey>	 the first n00b questions are - do I need server_id set in both masters? (default seems 1, so possibly only on replicas?) and also, is binlog ROW ok or should I use something different?
[15:39:04] <jynus>	 sorry, was at a meeting
[15:39:35] <jynus>	 we can take care of the details, but please be patient as with manuel away we may be busier than normal
[15:39:46] <jynus>	 kormat: did the transfer work?
[15:40:43] <wikibugs>	 10DBA, 10Performance-Team: Database for XHGui profiles - https://phabricator.wikimedia.org/T254795 (10Dzahn) @dpifke ^ Do you want  backups?
[15:48:25] <jynus>	 ah, it is transferring still
[15:48:42] <jynus>	 not much pending
[15:58:29] <kormat>	 transfer has finished
[16:02:07] <jynus>	 let me know how it goes, if you are going to do the recovery now
[16:02:30] <kormat>	 ERROR 1959 (OP000) at line 43: Invalid role specification `research_role`
[16:02:34] <kormat>	 when i tried to load the grants
[16:03:00] <jynus>	 ah, that was the role issue with labs remember?
[16:03:16] <jynus>	 that some roles dissapeared on upgrade or something?
[16:03:29] <kormat>	 not a clue
[16:03:51] <jynus>	 do the rest of the upgrade process so as to start replicaiton
[16:04:07] <jynus>	 and we can check the grants when we confirm things work ok
[16:04:12] <kormat>	 ok
[16:04:41] <jynus>	 can you tell me where you had kept the grant file, so I can check?
[16:04:56] <kormat>	 /root/s8.grant on dbstore1005
[16:06:27] <jynus>	 not the difference between sudo -s and sudo -i (efective uid)
[16:06:46] <jynus>	 (it was on /home/kormat)
[16:07:30] <jynus>	 nothing that cannot be fixed later
[16:07:39] <jynus>	 let's hope the recovery works first
[16:07:40] <kormat>	 ah, oops, right :)
[16:07:45] <kormat>	 replication started
[16:08:08] <jynus>	 Got fatal error 1236 from master when reading data from binary log
[16:08:54] <jynus>	 are you sure you did the right change master?
[16:08:54] <kormat>	 oh, crud. i used the wrong master
[16:09:08] <jynus>	 db1109
[16:09:40] <jynus>	 reset slave all; to be sure after stop
[16:09:48] <kormat>	 ah
[16:09:52] <kormat>	 i did stop, change master, start
[16:09:59] <jynus>	 if it worked ,ok
[16:10:06] <kormat>	 yeah it looks ok
[16:10:06] <jynus>	 I think it didn't replciate anything
[16:10:10] <kormat>	 phew
[16:10:11] <jynus>	 so it should work
[16:10:23] <jynus>	 if it had, we should have deleted the binlogs and relay logs with taht command
[16:10:35] <jynus>	 it's ok, don't worry
[16:10:44] <kormat>	 gotcha
[16:10:46] <jynus>	 we configured the gtids preciselly to avoid those issue
[16:11:09] <jynus>	 replication looks good
[16:11:28] <jynus>	 let's see that grant issue
[16:11:45] <jynus>	 did you try to apply the grant again and see if it works?
[16:12:29] <kormat>	 i can try again
[16:12:40] <kormat>	 same error
[16:13:02] <jynus>	 interesting, I see the same grants for the user
[16:15:18] <jynus>	 it could be a limitation of pt-show grants and mariadb roles
[16:15:53] <jynus>	 I am going to use create role on s8
[16:16:06] <jynus>	 do not touch it now, ok?
[16:16:25] <kormat>	 +1
[16:16:45] <jynus>	 yeah, I think it fails to handle roles
[16:18:34] <jynus>	 I think it worked now if I do CREATE ROLE 
[16:18:39] <jynus>	 run the file again?
[16:18:46] <jynus>	 although it may stop at the first error
[16:19:04] <jynus>	 so we may want to run everything after line 43
[16:19:04] <kormat>	 no error this time
[16:19:32] <jynus>	 now the ultimate test diff < file < (pt-show-grant ...) :-D
[16:19:42] <jynus>	 so this is new info to me
[16:19:48] <jynus>	 pt-show grant doesn't work for roles
[16:20:05] <jynus>	 and we really don't have an alternative
[16:20:15] <jynus>	 specially in an independent format
[16:20:23] <kormat>	 there's a bunch of differences
[16:20:36] <jynus>	 send me the oneliner
[16:20:39] <jynus>	 so I can see them
[16:20:57] <jynus>	 if it is extra stuff from the backups it is not a big deal
[16:21:03] <jynus>	 of may be it is an ordering issue
[16:21:05] <jynus>	 or both
[16:21:06] <kormat>	 `diff -u ~kormat/s8.grants ~kormat/s8.grants.post-recover`
[16:21:47] <jynus>	 yeah, it is all extra stuff from the backup
[16:22:01] <jynus>	 we can put a note to clean up for tomorrow
[16:22:11] <jynus>	 but everyhing that should be there is there
[16:22:27] <jynus>	 so I think the immediate issue is solver, thanks!
[16:22:32] <jynus>	 unless it breaks tomorrow again...
[16:22:46] <jynus>	 let's keep the ticket open and monitor tomorrow the state
[16:22:57] <jynus>	 good work, kormat
[16:23:35] <kormat>	 thanks for the help :)
[16:23:45] <jynus>	 is this your first large outage solved on your own?
[16:24:05] <jynus>	 I mean at wmf, or course
[16:24:13] <kormat>	 yep
[16:24:21] <jynus>	 good work, kormat
[16:24:26] <kormat>	 thanks :)
[16:25:43] <jynus>	 update ticket, and take a well-desrved rest/beer
[16:26:16] <jynus>	 more will come :-D
[16:31:06] <wikibugs>	 10DBA, 10Analytics, 10User-Kormat: dbstore1005 s8 mariadb instance crashed - https://phabricator.wikimedia.org/T256966 (10Kormat) Data restored from backup, machine is now catching up on s8 replication. There are some extra grants from the backup that should be cleaned up, but otherwise things are in a good...
[21:16:35] <wikibugs>	 10DBA, 10Product-Infrastructure-Team-Backlog, 10Push-Notification-Service, 10Patch-For-Review, 10User-Marostegui: DBA review for push notifications tables - https://phabricator.wikimedia.org/T246716 (10Mholloway) This is ready for DBA review.  To elaborate on what we're doing:  We plan to create two new...
[22:06:22] <wikibugs>	 10DBA, 10Gerrit, 10Patch-For-Review: Make sure both `reviewdb-test` (used forgerrit upgrade testing) and `reviewdb` (formerly production) databases get torn down - https://phabricator.wikimedia.org/T255715 (10Dzahn) db_pass was removed from private hieradata, from private passwords module, from labs/private....
[22:14:36] <wikibugs>	 10DBA, 10Gerrit, 10Patch-For-Review: Make sure both `reviewdb-test` (used forgerrit upgrade testing) and `reviewdb` (formerly production) databases get torn down - https://phabricator.wikimedia.org/T255715 (10QChris) I went over the possible scenarios with @dzahn.  How long do we keep DB backups?  If we can...
[22:40:03] <wikibugs>	 10DBA, 10Gerrit, 10Patch-For-Review: Make sure both `reviewdb-test` (used forgerrit upgrade testing) and `reviewdb` (formerly production) databases get torn down - https://phabricator.wikimedia.org/T255715 (10Dzahn) >>! In T255715#6275884, @QChris wrote: > Does removing them need sign-off from releng?  cc: @...
[23:32:36] <wikibugs>	 10DBA, 10Core Platform Team, 10MediaWiki-Page-derived-data, 10TechCom-RFC, and 2 others: RFC: Normalize MediaWiki link tables - https://phabricator.wikimedia.org/T222224 (10Krinkle) p:05Triage→03Medium
[23:33:18] <wikibugs>	 10DBA, 10Core Platform Team, 10MediaWiki-Page-derived-data, 10TechCom-RFC, and 2 others: RFC: Normalize MediaWiki link tables - https://phabricator.wikimedia.org/T222224 (10Krinkle)