[01:52:09] <wikibugs>	 DBA, Patch-For-Review: Upgrade m1 to Buster and Mariadb 10.4 - https://phabricator.wikimedia.org/T254556 (Johan) Since next Tech News won't go out until Monday anyway (and I don't think we need to be too concerned about a few seconds of Etherpad read-only) – do re-instate if the update for some reason go...
[04:44:20] <wikibugs>	 DBA, Performance-Team: Database for XHGui profiles - https://phabricator.wikimedia.org/T254795 (Marostegui) Thanks @Dzahn  There is still a question from T254795#6204661 that needs answering: - Which grants do we need this user to have?  Also, to confirm, connections will come from: mwdebug1001, mwdebug...
[04:47:24] <wikibugs>	 DBA: Relocate "old" s4 hosts - https://phabricator.wikimedia.org/T253217 (Marostegui)
[04:57:59] <wikibugs>	 DBA, Operations: db1088 crashed - https://phabricator.wikimedia.org/T255927 (Marostegui) Anything else left here after the 100% repool or we can close this? Thank you!
[05:02:42] <wikibugs>	 DBA: Relocate "old" s4 hosts - https://phabricator.wikimedia.org/T253217 (Marostegui)
[05:05:21] <wikibugs>	 DBA: Relocate "old" s4 hosts - https://phabricator.wikimedia.org/T253217 (Marostegui)
[06:24:09] <wikibugs>	 DBA, Performance-Team: Database for XHGui profiles - https://phabricator.wikimedia.org/T254795 (Dzahn) @dpifke Doesn't it have to connect from xhgui1001/xhgui2001 (but would that be in addition to mwdebug and webperf* ?)
[07:39:53] <wikibugs>	 DBA, Operations: db1088 crashed - https://phabricator.wikimedia.org/T255927 (Kormat) Open→Resolved Nope, all done!
[07:51:00] <jynus>	 Jun 25 07:50:29 backup1001 systemd[1]: Stopped Bacula Director Daemon service.
[07:51:12] <marostegui>	 \o/
[07:51:37] <jynus>	 remind me the ticket, sorry, if you have it handy?
[07:51:49] <marostegui>	 https://phabricator.wikimedia.org/T254556
[08:09:24] <wikibugs>	 DBA, Operations, ops-codfw: Degraded RAID on pc2007 - https://phabricator.wikimedia.org/T255904 (Kormat) Open→Resolved Array rebuild has completed, and is back in "optimal" state.
[08:40:14] <wikibugs>	 DBA, Operations, CAS-SSO, Patch-For-Review, User-jbond: Request new database for idp-test.wikimedia.org - https://phabricator.wikimedia.org/T256120 (jbond) >>! In T256120#6252975, @Marostegui wrote: > Should be fixed now.  Thanks although I'm now getting  "Error message: CREATE command denied...
[08:40:23] <marostegui>	 only db2133 was in core
[08:41:57] <wikibugs>	 DBA, Operations, CAS-SSO, Patch-For-Review, User-jbond: Request new database for idp-test.wikimedia.org - https://phabricator.wikimedia.org/T256120 (Marostegui) Fixed
[08:42:21] <jynus>	 if we get rid of multi-source hosts we could have a simpler schema
[08:42:38] <kormat>	 did someone say nuke labsdb*? ;)
[08:42:58] <jynus>	 but because of multi-source, the group is on the instance, and the section is on the replication table (section_instances)
[08:44:35] <wikibugs>	 DBA, Patch-For-Review: Upgrade m1 to Buster and Mariadb 10.4 - https://phabricator.wikimedia.org/T254556 (Marostegui) This is done.  I am going to leave db1135 replicating for 24h (so we can also see if basic 10.4 -> 10.1 replication works) and then I will move db1135 somewhere else.
[08:44:42] <wikibugs>	 DBA, Epic: Upgrade WMF database-and-backup-related hosts to buster - https://phabricator.wikimedia.org/T250666 (Marostegui)
[08:44:44] <wikibugs>	 DBA, Patch-For-Review: Upgrade m1 to Buster and Mariadb 10.4 - https://phabricator.wikimedia.org/T254556 (Marostegui) Open→Resolved
[08:46:42] <jynus>	 I've run a couple of backups successfully already
[08:46:47] <marostegui>	 sweet
[08:47:48] <jynus>	 also if etherpad worked on 10.4, it will work with anything :-D
[08:47:59] <jynus>	 *10.4 will
[08:48:46] <marostegui>	 hahahaha
[08:48:47] <marostegui>	 yeah
[08:48:52] <marostegui>	 I thought the same
[09:10:21] <jynus>	 while checking prometheus I saw metrics gathering is failing for db1077
[09:10:41] <jynus>	 server seems to be up, should I take a look or is someone working on it/setting it up?
[09:11:16] <jynus>	 it may just need a prometheus restart
[09:12:54] <marostegui>	 you can ignore it, it is the testing one
[09:13:04] <marostegui>	 probably grants missing or something
[09:13:05] <jynus>	 ah, ok
[09:13:36] <jynus>	 going back to my transfers and backups :-D
[10:49:15] <wikibugs>	 DBA, Patch-For-Review: Switchover es5 master from es1023 to es1024 - https://phabricator.wikimedia.org/T255755 (Marostegui) Moved this to Tuesday 7th July at 05:00 AM UTC as I will be off the 1st of July, and I want to keep an eye after the switchover and the following days.
[11:34:01] <wikibugs>	 DBA, Patch-For-Review: Switchover es5 master from es1023 to es1024 - https://phabricator.wikimedia.org/T255755 (Marostegui)
[11:34:23] <wikibugs>	 DBA, Cloud-Services, CPT Initiatives (MCR Schema Migration), Core Platform Team Workboards (Clinic Duty Team), and 2 others: Apply updates for MCR, actor migration, and content migration, to production wikis. - https://phabricator.wikimedia.org/T238966 (Marostegui)
[11:59:42] <wikibugs>	 DBA: Relocate "old" s4 hosts - https://phabricator.wikimedia.org/T253217 (Marostegui)
[12:43:35] <wikibugs>	 DBA, Cloud-Services, CPT Initiatives (MCR Schema Migration), Core Platform Team Workboards (Clinic Duty Team), and 2 others: Apply updates for MCR, actor migration, and content migration, to production wikis. - https://phabricator.wikimedia.org/T238966 (Marostegui) s2 eqiad progress  [] labsdb101...
[12:43:59] <wikibugs>	 DBA, Cloud-Services, CPT Initiatives (MCR Schema Migration), Core Platform Team Workboards (Clinic Duty Team), and 2 others: Apply updates for MCR, actor migration, and content migration, to production wikis. - https://phabricator.wikimedia.org/T238966 (Marostegui)
[13:36:47] <wikibugs>	 DBA, DC-Ops, Operations, Sustainability (Incident Prevention): PXE Boot defaults to automatically reimaging (normally destroying os and all filesystemdata) on all servers - https://phabricator.wikimedia.org/T251416 (Marostegui) Can this task be closed? By default hosts reimage now but they do kee...
[14:03:30] <wikibugs>	 DBA, DC-Ops, Operations, Sustainability (Incident Prevention): PXE Boot defaults to automatically reimaging (normally destroying os and all filesystemdata) on all servers - https://phabricator.wikimedia.org/T251416 (jcrespo) a:Kormat
[14:03:55] * kormat shakes his fist at jynus 
[14:04:46] <jynus>	 it is a decision-making assignment, not a doing assignment, eh!
[14:05:18] <jynus>	 decide and then unassign if kept open
[14:06:08] <jynus>	 I could also assign it to faidon, who I think was the person that requested it, so it is sent to foundations
[14:06:12] <jynus>	 up to you, really
[14:06:25] <paravoid>	 what?
[14:06:43] <jynus>	 should T251416 be open?
[14:06:47] <stashbot>	 T251416: PXE Boot defaults to automatically reimaging (normally destroying os and all filesystemdata) on all servers - https://phabricator.wikimedia.org/T251416
[14:06:55] <jynus>	 not sure if a dba decision or a foundations decision
[14:07:01] <paravoid>	 I did not request that :)
[14:07:16] <jynus>	 you asked me to open a ticket to understand the issue, but maybe I am wrong
[14:07:21] <paravoid>	 ah maybe!
[14:07:32] <paravoid>	 it rings a bell now
[14:07:38] <jynus>	 I just want to make sure nobody is waiting on me
[14:07:54] <paravoid>	 I was envisioning it as something to be discussed across multiple SREs, not a task for me specifically to decide
[14:07:58] <jynus>	 but as the dba issue has been solved, we can send it to your ream for long term
[14:08:09] <paravoid>	 although I'm happy to make that decision if no one else has any opinions on that :)
[14:08:10] <jynus>	 s/for you/for your team/
[14:08:33] <jynus>	 please talk to kormat, I am not really up to date with latest advances there, ok?
[14:08:41] <paravoid>	 ok
[14:08:47] <kormat>	 hii
[14:08:50] <paravoid>	 hii :)
[14:09:05] <jynus>	 so I assign it to him to mean "please don't want on me" if that makes sense?
[14:09:12] <jynus>	 *wait
[14:10:07] <jynus>	 paravoid: it would help if there was a foundations tag 0:-D
[14:11:12] <paravoid>	 there is SRE-tools, but that may or may not be the best fit here -- I'll defer to volans
[14:11:14] <jynus>	 so I don't have ln -s foundations faidon :-D
[14:11:24] <jynus>	 ;-)
[14:13:36] * volans reading backlog
[14:14:39] <paravoid>	 volans: tl;dr for the part I pinged you about: how can a partman task be tagged; is SRE-tools appropriate for that, and if not, do we have an alternative to offer (besides #operations)
[14:15:34] <volans>	 the long term solution is clearly SRE-tools for the PXE menu and all that work, although we know it will not happen right now
[14:15:46] <jynus>	 yeah, that is known
[14:15:54] <jynus>	 but maybe closing it was not the right action
[14:15:54] <kormat>	 volans: is there a task for that?
[14:15:59] <volans>	 we can add SRE-tools and leave it in the backlog for that
[14:16:23] <volans>	 kormat: various
[14:16:26] <jynus>	 so the change is (kormat, please correct me) that we have a way forward for dbs/backup hosts?
[14:16:33] <volans>	 tracking is T116063
[14:16:34] <stashbot>	 T116063: Hardware Automation Workflow - Overall Tracking - https://phabricator.wikimedia.org/T116063
[14:16:38] <volans>	 and then a plethora of subtasks
[14:16:44] <jynus>	 but technically the issue is still ongoing in general?
[14:16:50] <kormat>	 jynus: correct
[14:16:58] <volans>	 note the date, it's 5y ago, before I started
[14:17:04] <jynus>	 I don't know the details despite me writing the initial task
[14:17:12] <kormat>	 ok, i'm going to update the ticket, removing dba, adding sre-tools
[14:17:17] <kormat>	 (and unassigning myself :P)
[14:17:24] <jynus>	 yep, all cool to me
[14:17:52] <jynus>	 maybe let's add the one line summary as the last comment
[14:18:00] <jynus>	 if the helps 
[14:18:03] <jynus>	 *that
[14:18:16] <volans>	 yes please
[14:18:20] <kormat>	 yes. i've already written that bit
[14:18:46] <jynus>	 let me find the operations task to add SRE-tools
[14:18:51] <jynus>	 *project
[14:19:18] <wikibugs>	 DBA, DC-Ops, Operations, Sustainability (Incident Prevention): PXE Boot defaults to automatically reimaging (normally destroying os and all filesystemdata) on all servers - https://phabricator.wikimedia.org/T251416 (Kormat) From the perspective of #dba, this issue is mostly resolved. Most DB mach...
[14:19:36] <jynus>	 oh, it is there
[14:19:43] <jynus>	 but let me add observability team
[14:20:45] <jynus>	 let me know if it looks correct: https://phabricator.wikimedia.org/project/manage/1025/
[14:21:18] <jynus>	 should I create a #backups project?
[14:21:29] <marostegui>	 there was one no?
[14:21:38] <kormat>	 marostegui: maybe it didn't get backed up ;)
[14:21:49] <marostegui>	 I thought we had a backup tag or something
[14:21:51] <akosiaris>	 lol
[14:21:52] <jynus>	 nope
[14:21:57] <marostegui>	 At least we discussed it some time ago
[14:22:02] <marostegui>	 But I don't remember what the conclusion was
[14:22:07] <jynus>	 yeah, the issue is it could be misleading
[14:22:15] <jynus>	 because "production sre backups"
[14:22:24] <jynus>	 vs "I am backing up my tool project"
[14:22:36] <jynus>	 plus having a separate project board?
[14:22:40] <akosiaris>	 https://www.youtube.com/watch?v=MgxgYL5P4z4
[14:22:41] <jynus>	 not something to discuss here
[14:23:05] <jynus>	 marostegui: let's discuss on next meeting, ok?
[14:23:16] <akosiaris>	 that ^ should be on the first page of the backup tag project :-)
[14:23:21] <jynus>	 could make DBA a "data persistence" team tag
[14:23:27] <jynus>	 and have a yellow backup tag
[14:23:55] <jynus>	 don't know
[14:54:15] <wikibugs>	 DBA, Operations, SRE-tools, Patch-For-Review: Audit all cumin queries in switchdc scripts - https://phabricator.wikimedia.org/T243935 (Kormat)
[14:56:04] <wikibugs>	 DBA: Create reuse recipes for tendril/zarcillo/dbprov/backup hosts - https://phabricator.wikimedia.org/T255768 (Kormat)
[15:19:20] <Amir1>	 marostegui: 1100 drifts. Mostly the MCR stuff
[15:19:39] <Amir1>	 I'll try to make it foldable so we can ignore those for now
[17:24:12] <Amir1>	 marostegui: These are the drifts excluding MCR ones: https://phabricator.wikimedia.org/P11667
[17:24:25] <Amir1>	 (in total, around 100-ish)
[17:24:56] <Amir1>	 This is all of them: https://phabricator.wikimedia.org/P11668
[17:27:36] <Amir1>	 btw MCR schema changes caused around 10% in size reduction in s6: https://grafana.wikimedia.org/d/000000377/host-overview?panelId=28&fullscreen&orgId=1&var-server=db1131&var-datasource=thanos&var-cluster=mysql&from=1592944447567&to=1593103141444
[17:27:48] <Amir1>	 in s1 and s8 it'll be massive
[17:53:08] <wikibugs>	 DBA, CheckUser, Trust-and-Safety, WMF-Legal, and 2 others: Configure WMF wikis to log login attempts in CheckUser - https://phabricator.wikimedia.org/T253802 (Huji) @DannyS712 when you get the chance, can I ask you to please review https://gerrit.wikimedia.org/r/605301/ ?  I am going to follow up...