[02:26:24] PROBLEM - MariaDB sustained replica lag on db1089 is CRITICAL: 2.4 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1089&var-port=9104
[02:28:00] RECOVERY - MariaDB sustained replica lag on db1089 is OK: (C)2 ge (W)1 ge 0.6 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1089&var-port=9104
[07:07:01] DBA, netbox: Grants not working with DB hosts with to ipv6 - https://phabricator.wikimedia.org/T270101 (Marostegui) p:High→Medium I have deleted ipv6 for the affected x2 hosts, so this is triaged with Volans workaround. ` db2142.codfw.wmnet has address 10.192.0.14 db2143.codfw.wmnet has address 1...
[07:53:08] DBA, decommission-hardware: decommission es1019.eqiad.wmnet - https://phabricator.wikimedia.org/T270159 (Marostegui)
[07:54:44] DBA: Productionize x2 databases - https://phabricator.wikimedia.org/T269324 (Marostegui) a:Marostegui
[07:56:52] DBA, Community-Tech, Expiring-Watchlist-Items: Monitor the growth of watchlist table at wikidata and wikicommons - https://phabricator.wikimedia.org/T268096 (Marostegui)
[08:02:50] DBA, CheckUser: Monitor the growth of CheckUser tables at enwiki and few other very large wikis - https://phabricator.wikimedia.org/T267275 (Marostegui)
[08:03:32] DBA, CheckUser: Monitor the growth of CheckUser tables at enwiki and few other very large wikis - https://phabricator.wikimedia.org/T267275 (Marostegui) Going to add another week for checking enwiki, as it grew a bit too much over past week
[08:04:48] DBA, CheckUser: Monitor the growth of CheckUser tables at enwiki and few other very large wikis - https://phabricator.wikimedia.org/T267275 (Marostegui)
[08:04:57] DBA, CheckUser: Monitor the growth of CheckUser tables at enwiki and few other very large wikis - https://phabricator.wikimedia.org/T267275 (Marostegui)
[08:10:42] DBA, CheckUser: Monitor the growth of CheckUser tables at enwiki and few other very large wikis - https://phabricator.wikimedia.org/T267275 (Marostegui)
[08:54:35] DBA, Community-Tech, Expiring-Watchlist-Items: Monitor the growth of watchlist table at wikidata and wikicommons - https://phabricator.wikimedia.org/T268096 (Marostegui)
[08:55:07] DBA, Community-Tech, Expiring-Watchlist-Items: Watchlist Expiry: Release plan [rough schedule] - https://phabricator.wikimedia.org/T261005 (Marostegui)
[08:55:35] DBA, Community-Tech, Expiring-Watchlist-Items: Monitor the growth of watchlist table at wikidata and wikicommons - https://phabricator.wikimedia.org/T268096 (Marostegui) Open→Resolved This looks good enough, all the growth seems under control. Closing this as fixed!
[09:14:01] DBA: Test upgrading sanitarium hosts to Buster + 10.4 - https://phabricator.wikimedia.org/T268742 (Marostegui) Removed ipv6 dns from these hosts (see T270101)
[09:21:21] Blocked-on-schema-change, DBA: Schema change for timestamp fields of jobs table - https://phabricator.wikimedia.org/T268391 (Marostegui)
[09:22:34] Blocked-on-schema-change, DBA: Schema change for timestamp fields of jobs table - https://phabricator.wikimedia.org/T268391 (Marostegui)
[09:26:20] Blocked-on-schema-change, DBA: Schema change for timestamp fields of jobs table - https://phabricator.wikimedia.org/T268391 (Marostegui)
[09:28:40] DBA, decommission-hardware: decommission es1019.eqiad.wmnet - https://phabricator.wikimedia.org/T270159 (Marostegui)
[09:29:20] DBA, decommission-hardware: decommission es1019.eqiad.wmnet - https://phabricator.wikimedia.org/T270159 (Marostegui)
[10:05:19] jynus: quick one, for the media backup design are we looking for solutions for a "live" backup from where we can restore directly or other forms of more offline/long-term backups can be considered? Do we have any SLO on the time for a full restore? (that will ofc affect the previous question)
[10:06:12] volans: both
[10:06:45] volans: please read the analysis doc for more background, that one is complete
[10:07:34] I've just read it
[10:07:49] the design or the analysis?
[10:07:55] design
[10:08:06] "Media Backups Design"
[10:08:07] so there is an analysis doc with the requirements
[10:08:12] it is a different one
[10:08:18] ah ok
[10:08:23] looking for it
[10:08:42] the design is not complete yet, please hold of your pitchforks! :-)
[10:08:48] *off
[10:09:25] we can do an rfc when it is written, plus it will likely evolve
[10:10:13] but the summary is the focus is not right now a full restore because that needs different considerations (e.g. thumbnails)
[10:10:56] ok, thx
[10:11:17] so first data coverage, later focus on "service backup"
[10:11:56] there is also other issues beyond backup recovery, which is mediawiki dynamic nature
[10:12:21] it is the same than with mysql backup. "recovering the last backup available" is not always a possible strategy
[10:13:15] yep, I know
[10:13:57] so in the analysis doc we mention that first milestone: full coverage aka "being able to recover any (but just one at a time) image"
[10:14:25] but we are thinking of course further at the same time
[10:15:40] that's fair
[10:15:44] it's a huge project
[10:15:47] yeah
[10:15:53] and not only backups
[10:16:14] I mentioned on the doc both dump and analytics could benefit from the metadata
[10:16:46] there is also more considerations- a very fast backup process may be not ideal, if there is a high rate of deleted images just after upload
[10:18:07] so I want to have a complete concrete proposal first, because otherwise we are just postponing having something forever, even imperfect
[10:18:29] I can loop you in for feedback explicitly if you want
[10:19:21] note I didn't work on this on my own- my manager, other SREs (ariel, filippo, mark) are also involved, I can add you to the group too
[10:22:23] no worries, I'm mostly curious and will follow along the progress, I'll read the analysis and will comment if I have any suggestion
[10:23:10] you are being helpful, just wanted to tame expectations :-), specially on first iteration
[10:23:29] here design == what we did for proof of concept
[10:24:13] my biggest issue is lack of experience working with cloud vendors
[10:24:18] I would love more help on that
[10:26:49] eh, never used them a lot, I have been on the other side though :-P
[10:38:49] Blocked-on-schema-change, DBA: Schema change for timestamp fields of jobs table - https://phabricator.wikimedia.org/T268391 (Marostegui)
[10:46:19] Blocked-on-schema-change, DBA: Schema change for timestamp fields of jobs table - https://phabricator.wikimedia.org/T268391 (Marostegui)
[11:11:36] DBA, Data-Services, Patch-For-Review, cloud-services-team (Kanban): Productionize clouddb10[13-20] - https://phabricator.wikimedia.org/T267090 (Marostegui)
[11:12:27] DBA, Data-Services, Patch-For-Review, cloud-services-team (Kanban): Productionize clouddb10[13-20] - https://phabricator.wikimedia.org/T267090 (Marostegui) All this is pretty much done. The last thing I am testing is that all the hosts would properly send an email if there's private data detected...
[13:44:20] DBA, Data-Services: Prepare and check storage layer for madwiki - https://phabricator.wikimedia.org/T269440 (Urbanecm) >>! In T269440#6672500, @LSobanski wrote: > Thanks, let us know when the database is created, so we can sanitize it. I've created the database.
[13:44:49] DBA, Data-Services: Prepare and check storage layer for madwiki - https://phabricator.wikimedia.org/T269440 (Marostegui) a:Marostegui I will sanitize it
[14:05:06] DBA, Data-Services: Prepare and check storage layer for madwiki - https://phabricator.wikimedia.org/T269440 (Marostegui) This has been sanitized. I have tested the triggers creating my user. I am now running a check data on labsdb1009, 1010, 1011, 1012 as well as clouddb1020:3315 and clouddb1016:3315 Af...
[14:07:25] marostegui: fyi, I'm currently in a process of creating like four more databases
[14:07:43] (just fyi, in case something can be done for multiple wikis at once)
[14:07:47] wawikisource just created
[14:07:59] Urbanecm: ah ok, then I'll wait for all of them created indeed
[14:08:04] thanks for the heads up
[14:08:15] okay, so I'll ping you once all of them are created :)
[14:08:19] thank you
[14:45:34] DBA: Productionize x2 databases - https://phabricator.wikimedia.org/T269324 (Marostegui)
[15:00:21] DBA, Data-Services: Prepare and check storage layer for wawikisource - https://phabricator.wikimedia.org/T269432 (Urbanecm) >>! In T269432#6672496, @LSobanski wrote: > Thanks, let us know when the database is created, so we can sanitize it. DB just got created.
[15:00:26] DBA, Data-Services: Prepare and check storage layer for eowikivoyage - https://phabricator.wikimedia.org/T269427 (Urbanecm) >>! In T269427#6672493, @LSobanski wrote: > Thanks, let us know when the database is created, so we can sanitize it. DB just got created.
[15:01:15] DBA, Data-Services: Prepare and check storage layer for skrwiki - https://phabricator.wikimedia.org/T268412 (Urbanecm) >>! In T268412#6640176, @LSobanski wrote: > Thanks, let us know when the database is created, so we can sanitize it. DB got created.
[15:01:19] Blocked-on-schema-change, DBA: Schema change for renaming user_properties_property index - https://phabricator.wikimedia.org/T270187 (Ladsgroup)
[15:02:00] marostegui: please don't kill me, I have a couple more being added once we migrate the rest of tables :D
[15:03:29] Amir1: hahaha
[15:03:40] Amir1: You know I wouldn't be able to kill you!
[15:03:45] Just because we live far away from each other!
[15:04:10] * Amir1 prays Manuel never visits Poland or Germany
[15:04:49] hahahaha
[15:08:27] DBA, Data-Services: Prepare and check storage layer for skrwiki - https://phabricator.wikimedia.org/T268412 (Marostegui) a:Marostegui I will sanitize it
[15:21:28] marostegui: I'm done with creating databases for now :)
[15:21:52] DBA, Data-Services: Prepare and check storage layer for skrwiktionary - https://phabricator.wikimedia.org/T268458 (Urbanecm) >>! In T268458#6644801, @LSobanski wrote: > Thanks, let us know when the database is created, so we can sanitize it. And this one got created as well.
[15:21:52] cool, I will take care of them tomorrow
[15:21:53] thanks
[15:22:22] okay, sounds good :)
[17:04:49] DBA, netbox: Grants not working with DB hosts with to ipv6 - https://phabricator.wikimedia.org/T270101 (wiki_willy) Ack @Marostegui >>! In T270101#6691088, @Marostegui wrote: > > @wiki_willy could you talk to your team to make sure the rest of hosts at {T267043} do not get installed with ipv6 once the...
[17:12:30] DBA, DC-Ops, Operations, ops-eqiad: (Need By: 2020-11-29) rack/setup/install db11[51-76] - https://phabricator.wikimedia.org/T267043 (wiki_willy) @Cmjohnson and @RobH - per our conversation on IRC, just a heads up to avoid installing the remaining db hosts with IPV6. (reference T270101 for the r...