[04:36:28] DBA, DiscussionTools, Editing-team, Performance-Team, Patch-For-Review: Reduce parser cache retention temporarily for DiscussionTools - https://phabricator.wikimedia.org/T280605 (Marostegui)
[04:51:42] DBA, Data-Persistence-Backup, Patch-For-Review: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (Marostegui)
[04:53:08] DBA, Patch-For-Review: Productionize db21[45-52] and db11[76-84] - https://phabricator.wikimedia.org/T275633 (Marostegui) db1178 is clean
[04:58:27] DBA, DiscussionTools, Editing-team, Performance-Team, Patch-For-Review: Reduce parser cache retention temporarily for DiscussionTools - https://phabricator.wikimedia.org/T280605 (Marostegui)
[05:11:20] DBA, SRE, ops-codfw: codfw: Relocate servers in 10G racks - https://phabricator.wikimedia.org/T281135 (Marostegui) @jcrespo can coordinate better the dbprov downtimes, I am swapping names there :)
[05:11:34] DBA, SRE, ops-codfw: codfw: Relocate servers in 10G racks - https://phabricator.wikimedia.org/T281135 (Marostegui)
[05:15:53] DBA, Cognate, ContentTranslation, Growth-Team, and 9 others: Restart x1 database master (db1103) - https://phabricator.wikimedia.org/T281212 (Marostegui) All hosts silenced. Master binary's upgraded, waiting now to perform the restart at 06:00 AM UTC
[05:51:54] DBA, DiscussionTools, Editing-team, Performance-Team (Radar): Post-deployment: (partly) ramp parser cache retention back up - https://phabricator.wikimedia.org/T280604 (Marostegui)
[05:52:20] DBA, DiscussionTools, Editing-team, Performance-Team (Radar): Post-deployment: (partly) ramp parser cache retention back up - https://phabricator.wikimedia.org/T280604 (Marostegui) a:Marostegui→None Not assigning it to me specifically, as anyone could pick this up after the mitigation
[06:02:18] DBA: Upgrade 10.4.13 hosts to a higher version - https://phabricator.wikimedia.org/T279281 (Marostegui)
[06:02:38] DBA: Upgrade 10.4.13 hosts to a higher version - https://phabricator.wikimedia.org/T279281 (Marostegui)
[06:03:08] DBA, SRE, Wikimedia-Mailing-lists: db2135 crashed - https://phabricator.wikimedia.org/T278408 (Marostegui)
[06:03:14] DBA: Upgrade 10.4.13 hosts to a higher version - https://phabricator.wikimedia.org/T279281 (Marostegui) Open→Resolved All hosts have been upgraded
[06:03:51] DBA, Cognate, ContentTranslation, Growth-Team, and 9 others: Restart x1 database master (db1103) - https://phabricator.wikimedia.org/T281212 (Marostegui) Open→Resolved This was done. RO starts: 06:00:15 RO stops: 06:00:46 Total RO time: 31 seconds
[06:18:05] DBA, AbuseFilter, mariadb-optimizer-bug: Check whether `FORCE INDEX page_timestamp` is still needed in LazyVariableComputer.php - https://phabricator.wikimedia.org/T281579 (Marostegui) Open→Stalled This query is still filesorting on 10.1 and takes around 30 seconds to complete. ` root@PRODUCT...
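For context on the T281579 check above: whether a FORCE INDEX hint is still needed is usually decided by comparing the plan with and without it on the MariaDB version being tested. A minimal sketch, assuming a hypothetical query shape against the revision table and a placeholder rev_page value (the real query lives in AbuseFilter's LazyVariableComputer.php and is truncated in the comment above):

EXPLAIN SELECT rev_id FROM revision FORCE INDEX (page_timestamp)
  WHERE rev_page = 12345 ORDER BY rev_timestamp DESC LIMIT 100;  -- plan with the hint
EXPLAIN SELECT rev_id FROM revision
  WHERE rev_page = 12345 ORDER BY rev_timestamp DESC LIMIT 100;  -- plan without it

If the second plan already picks page_timestamp and shows no "Using filesort" on the newer version, the hint can likely be dropped; per the comment above, 10.1 still filesorts.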
[06:38:08] DBA, Patch-For-Review: Productionize db21[45-52] and db11[76-84] - https://phabricator.wikimedia.org/T275633 (Marostegui) db1178 is slowly being pooled into s8
[06:38:17] DBA, Patch-For-Review: Productionize db21[45-52] and db11[76-84] - https://phabricator.wikimedia.org/T275633 (Marostegui)
[06:49:24] DBA, Data-Persistence-Backup, Patch-For-Review: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (Marostegui) s2 sanitarium master db1074 has been replaced by db1156
[06:50:24] DBA, Data-Persistence-Backup, Patch-For-Review: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (Marostegui)
[06:53:49] DBA, decommission-hardware: decommission db1074.eqiad.wmnet - https://phabricator.wikimedia.org/T281959 (Marostegui)
[06:54:07] DBA, decommission-hardware: decommission db1074.eqiad.wmnet - https://phabricator.wikimedia.org/T281959 (Marostegui) Wait a few days to make sure its replacement (db1156) works fine.
[06:54:43] DBA, decommission-hardware: decommission db1074.eqiad.wmnet - https://phabricator.wikimedia.org/T281959 (Marostegui)
[06:54:46] DBA, Data-Persistence-Backup, Patch-For-Review: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492 (Marostegui)
[06:54:48] DBA, SRE, Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (Marostegui)
[06:55:18] DBA, SRE, Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (Marostegui)
[07:13:32] DBA, decommission-hardware, Patch-For-Review: decommission db1082.eqiad.wmnet - https://phabricator.wikimedia.org/T281794 (Marostegui)
[07:13:42] DBA, decommission-hardware, Patch-For-Review: decommission db1082.eqiad.wmnet - https://phabricator.wikimedia.org/T281794 (Marostegui) I have depooled this host
[08:05:30] I'm installing a new cumin host with buster and the role includes profile::mariadb::packages_client, which selects mariadb variants based on the distro
[08:05:53] what about bullseye, should I rebuild wmf-mariadb104-client for it?
[08:06:41] moritzm: yeah, let's do that
[08:06:52] moritzm: I can try to do it too, but definitely not this week
[08:07:08] not sure if jaime has done it already (i believe he was trying bullseye before)
[08:10:39] I'll check if anything is up on deneb, otherwise I'll build/import it
[08:11:24] moritzm: sure, I will try to get my bullseye environment ready and try it next week
[08:11:38] I will also ping jaime once he gets online, to see if he maybe already got a package
[08:25:28] jynus: I was chatting with moritzm earlier, and I was wondering if you ever built the wmf104 client package for bullseye, or was it 10.5?
[08:27:58] I don't think I ended up building any package, just downloading and testing it locally
[08:28:07] ah ok, good
[08:28:09] thanks :)
[08:28:22] do you want me to?
[08:29:01] no no, no worries
[08:37:27] DBA, SRE, ops-codfw: codfw: Relocate servers in 10G racks - https://phabricator.wikimedia.org/T281135 (jcrespo) @Papaul dbprov2002 should be shut down carefully to make sure data is kept intact (I'd prefer to do so). Otherwise, it can be down for e.g. 1 day. Will it need IP changes done beforehand?...
[08:38:27] DBA, SRE, ops-codfw: codfw: Relocate servers in 10G racks - https://phabricator.wikimedia.org/T281135 (jcrespo)
[08:46:44] were there any es restarts recently?
[08:47:29] from what I see from tendril, no
[08:48:13] nope
[08:49:04] all es backups failed yesterday
[08:49:13] both eqiad and codfw
[08:49:40] wow
[08:49:46] MySQL server has gone away
[08:49:47] definitely no restarts in either dc
[08:50:05] I wonder if we have found why transfer fails?
[08:50:13] unstable network
[08:50:42] mysql uptime also looks fine, so no mysql restarts either (ie: no crash)
[08:51:09] yeah, it would be very weird if it happened on all 4 servers
[08:51:56] let me give you the timestamps to rule out maintenance (which would be a good thing)
[08:52:13] if it fails for a good reason, I am not worried
[08:52:23] There was no maintenance on es servers that I know of
[08:52:45] codfw es4: 2021-05-04 07:58:21
[08:53:00] codfw es5: 2021-05-04 07:58:21
[08:53:10] the fact it is the same time there would point to network
[08:53:40] yeah
[08:53:52] jynus: do backups stop trying after one failure?
[08:53:53] can you check if there was any weirdness on the es hosts at that time? I am thinking it is possibly network on the backup side, but I want to rule out the mysql side
[08:54:51] kormat, yes and no
[08:55:09] yes because we don't have the space to store multiple temporary backups
[08:55:21] temporary as in "in generation"
[08:55:40] no because the next schedule will attempt backups again
[08:56:29] I am going to check networking on the generating hosts
[08:59:39] jynus: any specific host you want me to check?
[09:00:10] es2022 and es2025
[09:00:35] ok, going to take a look
[09:01:20] at 2021-05-04 07:58:21
[09:02:02] DBA, SRE, ops-codfw: codfw: Relocate servers in 10G racks - https://phabricator.wikimedia.org/T281135 (elukey)
[09:04:13] I don't see anything weird, no tcp errors, no network at 0 bytes, it just stopped backing up
[09:04:24] other resources were not saturated
[09:04:37] jynus: they both look clean
[09:04:41] mysql-wise
[09:05:28] no kills?
[09:05:37] nope
[09:05:44] last log entry is from 13 april
[09:05:46] for kills
[09:07:24] there was a spike of lag at 8:04?
[09:08:11] and a spike of aborted clients - which means "the client stopped responding"
[09:08:15] I don't see that on es2022 or es2025
[09:08:59] please recheck: https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=es2022&var-port=9104&from=1620114490073&to=1620115805893
[09:09:07] maybe I am looking at it wrong
[09:09:56] the thing is, "aborted clients" doesn't tell us more than the backups log told us
[09:10:17] "Lost connection to MySQL server during query", which we already knew
[09:10:22] it doesn't tell us why
[09:10:31] are we sure that lag is real?
[09:10:42] even the graph itself says 0 seconds
[09:10:45] so maybe a graphing issue?
[09:11:02] real is likely, but it is a bit far from the timestamp we are interested in
[09:12:16] what do you mean? It says "2 seconds" for probably a single collection
[09:13:09] it just happens >5 minutes after the issue, so probably not related
[09:13:19] DBA, SRE, ops-codfw, serviceops: codfw: Relocate servers in 10G racks - https://phabricator.wikimedia.org/T281135 (jijiki)
[09:13:44] the fact that it happened on 2 servers at the same time indicates, to me, network or the app at the client, not the server
[09:14:12] es2025 has no errors on its network iface
[09:14:19] so yeah, the servers themselves look ok
[09:14:50] which would be ok, but it happened on both datacenters - too frequently
[09:15:34] maybe we can check the switch graphs to see if the port got saturated or something?
[09:16:52] let me check eqiad, to see if there is something else there
[09:18:33] backup1002 failed at 2021-05-04 00:01:49
[09:18:39] again both dumps at the same time
[09:20:21] I will continue investigating on my own, will report here if I find something
[09:20:34] ok!
[09:20:52] thanks for checking mysql, I am now sure it is not mysql
[09:21:30] yeah, both hosts at the same time is very unlikely, it must be something on the client
[09:21:44] I see rsyslog failing at the same time
[09:22:35] "omkafka: action will suspended due to kafka error -195: Local: Broker transport failure"
[09:22:53] are both hosts connected to the same switch?
[09:23:11] this is on the same host
[09:26:08] I found nothing, but I came up with a theory for the random transfer.py failures
[09:26:54] there is a service that cleans up temporary files on buster - maybe that is interacting badly with the temporary files (locks and md5sum) created by transfer
[09:27:01] something to check at a later time
[09:27:46] (but only on long-running transfers)
[09:29:06] does it use /tmp?
[09:29:09] I mean transfer
[09:29:27] it uses some path, let me check
[09:30:38] yeah, it uses /tmp
[09:30:58] and maybe the behaviour changed from "delete on every reboot"
[09:31:07] to delete with the service every X hours
[09:31:10] maybe move it to /var/run ?
[09:31:15] to test, I mean
[09:32:44] yeah, that would be super easy
[09:33:07] I will add a note to the ticket as a potential trigger of the issue
[09:33:29] as I just happened to run into an unrelated log entry saying "running clean up of /tmp, etc."
[09:34:55] regarding this issue, I don't see ethernet, kernel or other relevant system logs
[09:35:07] other than rsyslog failures
[09:36:51] and that happens very frequently, so not really a clue
[09:47:19] yeah, all the logs look clean
[09:47:21] on both codfw hosts
[10:18:58] DBA, Cognate, ContentTranslation, Growth-Team, and 9 others: Restart x1 database master (db1103) - https://phabricator.wikimedia.org/T281212 (Trizek-WMF)
[10:19:08] DBA: transfer.py fails when copying data between es hosts - https://phabricator.wikimedia.org/T262388 (jcrespo) Another potential reason for errors is the service that cleans up temporary files (systemd-tmpfiles-clean.timer). transfer.py uses /tmp for a couple of reasons (locking, checksumming, xtrabackup te...
[10:51:01] DBA, Orchestrator: Cleanup heartbeat.heartbeat on s2 - https://phabricator.wikimedia.org/T281826 (Marostegui) Open→Resolved a:Marostegui This is all clean. Of course, once we switch the master we'll need to remove the old server_id for db1122 (171978786) before adding s2 to orchestrator
[10:51:03] DBA, Orchestrator: Cleanup heartbeat.heartbeat on all production instances - https://phabricator.wikimedia.org/T268336 (Marostegui)
[10:51:16] DBA, Orchestrator: Cleanup heartbeat.heartbeat on all production instances - https://phabricator.wikimedia.org/T268336 (Marostegui)
[11:08:23] DBA, wikitech.wikimedia.org: Move database for wikitech (labswiki) to a main cluster section - https://phabricator.wikimedia.org/T167973 (Marostegui) I am working on the migration document, making it a lot more detailed and with actual commands. Once that looks good, I will try the procedure on our testi...
[11:09:27] DBA, wikitech.wikimedia.org: Move database for wikitech (labswiki) to a main cluster section - https://phabricator.wikimedia.org/T167973 (Marostegui) a:Marostegui
[11:18:41] I have a transferpy package ready for bullseye, but it will depend on python3-wmfmariadbpy-remote
[11:24:39] same for the wmfbackups ones - I have uploaded them for now to apt1001:/home/jynus/bullseye CC moritzm
[12:13:44] jynus: hey. i'm looking at reimaging the candidate master for s6 in codfw to buster. i see that there is both a stretch and a buster backup source there. what coordination is required before i proceed? does https://gerrit.wikimedia.org/r/c/operations/puppet/+/681621 get merged before, or after?
[12:14:12] mm. well. it can't be before, i think. my guess is that it gets merged when the s6 _master_ (not candidate) master gets reimaged
[12:14:12] at any time, really
[12:14:45] ah, ok, nevermind me then :)
[12:14:46] it will mean that backups start to be taken from the buster hosts
[12:14:54] there is no hard dependency
[12:15:02] more like, whenever you think that's adequate
[12:15:16] that's the good thing about having the choice :-)
[12:15:56] but probably "around the same time of the switchover"
[12:16:21] from s6 master to s6 candidate master?
[12:16:25] or do you mean the dc switchover?
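For context on the es backup triage above (no restarts, no kills, a spike of aborted clients): those server-side checks map to standard status queries. A minimal sketch, assuming direct access to the generating es hosts:

SHOW GLOBAL STATUS LIKE 'Uptime';            -- a large value confirms no recent restart or crash
SHOW GLOBAL STATUS LIKE 'Aborted_clients';   -- increments when a client stops responding mid-connection
SHOW GLOBAL STATUS LIKE 'Aborted_connects';  -- failed connection attempts, for completeness
SHOW FULL PROCESSLIST;                       -- any dump connection still open would show up here

As noted in the conversation, Aborted_clients only restates what the backup log already said ("Lost connection to MySQL server during query") without explaining why the connection dropped.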
[12:16:44] sorry I wasn't clear, the master upgrade, I meant
[12:16:50] gotcha
[12:16:59] but again, it is not a hard dependency
[12:17:02] ok, in that case i'll go ahead with the candidate master upgrade today
[12:17:10] the idea is that, from the moment it is deployed
[12:17:20] we will generate primarily buster backups
[12:17:44] (the logical ones will still work for any os/version)
[12:20:58] Data-Persistence-Backup, Goal: Upgrade pending stretch backup hosts to buster - https://phabricator.wikimedia.org/T280979 (jcrespo)
[12:22:44] there will be one further step on my side (in case you are documenting the list) which is getting rid of the -then- unused stretch instance
[12:23:11] and that can be at the end of all other steps - when we are 100% sure we will not revert anything (cleanup)
[12:24:06] I will add it to T280751
[12:24:08] T280751: Upgrade s6 to Debian Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T280751
[12:28:48] DBA, Patch-For-Review: Upgrade s6 to Debian Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T280751 (jcrespo)
[12:29:07] ^ feel free to improve on that, that is my best take for now
[12:29:16] DBA, Patch-For-Review: Upgrade s6 to Debian Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T280751 (Kormat)
[12:29:36] DBA, Patch-For-Review: Upgrade s6 to Debian Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T280751 (Kormat) Turns out the candidate master for s6/codfw (db2114) is already running buster/10.4.
[12:29:44] lol
[12:30:11] you are welcome!
[12:30:23] DBA, Patch-For-Review: Upgrade s6 to Debian Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T280751 (jcrespo)
[13:15:41] DBA, Patch-For-Review: Upgrade s6 to Debian Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T280751 (ops-monitoring-bot) Script wmf-auto-reimage was launched by kormat on cumin2001.codfw.wmnet for hosts: ` ['db2129.codfw.wmnet'] ` The log can be found in `/var/log/wmf-auto-reimage/20210505131...
[13:57:09] Blocked-on-schema-change, DBA: Schema change to turn user_last_timestamp.user_newtalk to binary(14) - https://phabricator.wikimedia.org/T266486 (Marostegui) s6 is done, pending the master. It will be finished once we've completed the migration to 10.4 on T280751
[13:57:19] Blocked-on-schema-change, DBA: Schema change for dropping default of img_timestamp and making it binary(14) - https://phabricator.wikimedia.org/T273360 (Marostegui) s6 is done, pending the master. It will be finished once we've completed the migration to 10.4 on T280751
[13:57:23] Blocked-on-schema-change, DBA: Schema change for watchlist.wl_notificationtimestamp going binary(14) from varbinary(14) - https://phabricator.wikimedia.org/T268392 (Marostegui) s6 is done, pending the master. It will be finished once we've completed the migration to 10.4 on T280751
[13:57:31] Blocked-on-schema-change, DBA: Schema change to turn user_last_timestamp.user_newtalk to binary(14) - https://phabricator.wikimedia.org/T266486 (Marostegui)
[13:57:38] Blocked-on-schema-change, DBA: Schema change for dropping default of img_timestamp and making it binary(14) - https://phabricator.wikimedia.org/T273360 (Marostegui)
[13:57:42] Blocked-on-schema-change, DBA: Schema change for watchlist.wl_notificationtimestamp going binary(14) from varbinary(14) - https://phabricator.wikimedia.org/T268392 (Marostegui)
[14:05:44] Blocked-on-schema-change, DBA: Schema change for watchlist.wl_notificationtimestamp going binary(14) from varbinary(14) - https://phabricator.wikimedia.org/T268392 (Marostegui)
[14:05:47] Blocked-on-schema-change, DBA: Schema change for dropping default of img_timestamp and making it binary(14) - https://phabricator.wikimedia.org/T273360 (Marostegui)
[14:05:50] Blocked-on-schema-change, DBA: Schema change to turn user_last_timestamp.user_newtalk to binary(14) - https://phabricator.wikimedia.org/T266486 (Marostegui)
[14:06:35] Blocked-on-schema-change, DBA: Schema change for watchlist.wl_notificationtimestamp going binary(14) from varbinary(14) - https://phabricator.wikimedia.org/T268392 (Marostegui)
[14:06:37] Blocked-on-schema-change, DBA: Schema change for dropping default of img_timestamp and making it binary(14) - https://phabricator.wikimedia.org/T273360 (Marostegui)
[14:06:41] Blocked-on-schema-change, DBA: Schema change to turn user_last_timestamp.user_newtalk to binary(14) - https://phabricator.wikimedia.org/T266486 (Marostegui)
[14:19:01] DBA, Orchestrator, User-Kormat: Enable report_host for mariadb - https://phabricator.wikimedia.org/T266483 (Marostegui) s8 is fully done apart from the master (db1104)
[14:35:58] DBA, Orchestrator: Cleanup heartbeat.heartbeat on s5 - https://phabricator.wikimedia.org/T281828 (Marostegui) Open→Resolved a:Marostegui This is all clean. Of course, once we switch the master we'll need to remove the old server_id for db1100 (171974853) before adding s5 to orchestrator
[14:36:00] DBA, Orchestrator: Cleanup heartbeat.heartbeat on all production instances - https://phabricator.wikimedia.org/T268336 (Marostegui)
[14:36:07] DBA, Orchestrator: Cleanup heartbeat.heartbeat on all production instances - https://phabricator.wikimedia.org/T268336 (Marostegui)
[14:49:31] DBA, Patch-For-Review: Upgrade s6 to Debian Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T280751 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['db2129.codfw.wmnet'] ` and were **ALL** successful.
[14:57:26] DBA, Patch-For-Review: Upgrade s6 to Debian Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T280751 (Kormat)
[14:58:05] jynus: i think https://gerrit.wikimedia.org/r/c/operations/puppet/+/681621 can be merged now. the s6 master in codfw is now buster.
[14:58:22] cool, then, thanks for your work!
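For context on the heartbeat.heartbeat cleanups above (T281826, T281828): once a host is no longer a section master, its stale pt-heartbeat row has to be removed so orchestrator does not pick up the old entry. A minimal sketch, assuming the standard pt-heartbeat table layout and using the old server_id quoted in the s5 comment above; the DELETE would only run after the master switch:

SELECT server_id, ts, file, position FROM heartbeat.heartbeat ORDER BY ts;  -- inspect which rows are stale first
DELETE FROM heartbeat.heartbeat WHERE server_id = 171974853;                -- old db1100 row, per T281828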
[14:58:30] doing now
[14:59:36] marostegui: db2095:s6 is broken
[14:59:43] yep
[14:59:47] it is me cleaning up heartbeat
[14:59:55] ah ok, grand :)
[15:03:02] it is now fixed
[15:03:55] it is such a pain to clean up the table
[15:04:00] sooo easy to screw it up
[15:04:26] and it's hard to blame me when you do
[15:04:29] inconvenient, i'm sure
[15:08:16] so by reviewing s5 on codfw, you can see that technically there is still a 10.1 instance (db2097), but that is just pending for me to remove (it is passive)
[15:08:19] sorry
[15:08:21] I meant s6
[15:09:29] One thing I could probably improve at some point is not having to manually indicate where to take backups from, but make the software discover/decide smartly
[15:10:56] some advanced machine learning algorithm such as "while (stretch master) {backup from stretch} else {backup from buster}"
[15:12:56] DBA, Patch-For-Review: Upgrade s6 to Debian Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T280751 (jcrespo)
[15:19:38] DBA, Patch-For-Review: Upgrade s6 to Debian Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T280751 (jcrespo)
[15:20:04] DBA, Patch-For-Review: Upgrade s6 to Debian Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T280751 (jcrespo) ^I've prepared the backup failover for eqiad :-)
[15:37:44] I tried to build a wmf-mariadb104-client package for Bullseye
[15:38:56] it seems to work, but the base path is a little different
[15:39:42] it's /opt/mariadb-10.4.18-linux-systemd-x86_64 for me, while the postinst seems to expect the alternative without the systemd part
[15:40:04] or should I have downloaded the sysvinit variant?
[15:40:12] mmm
[15:41:08] it's simple to fix, just wondering if I'm on the wrong path
[15:41:10] I don't remember there being 2 versions before, just 1 with 2 compilation options
[15:42:20] the current https://mariadb.org/download/ lets me pick the init system and select between systemd and sysvinit
[15:43:03] that's new
[15:43:20] or it is the mariadb.com stuff, not the foundation one, not sure
[15:43:37] oh, I see
[15:43:49] I think you are bundling a compiled version?
[15:44:03] we use the source one
[15:44:47] DBA, Orchestrator: Cleanup heartbeat.heartbeat on all production instances - https://phabricator.wikimedia.org/T268336 (Marostegui)
[15:44:49] DBA, Orchestrator: Cleanup heartbeat.heartbeat on s6 - https://phabricator.wikimedia.org/T281829 (Marostegui) Open→Resolved a:Marostegui This is all clean. Of course, once we switch the master we'll need to remove the old server_id for db1131 (171974662) before adding s6 to orchestrator
[15:45:06] DBA, Orchestrator: Cleanup heartbeat.heartbeat on all production instances - https://phabricator.wikimedia.org/T268336 (Marostegui)
[15:45:17] we don't really patch much, but we disable a lot of stuff (extra engines)
[15:45:46] for the client I don't think it matters much
[15:47:11] btw, ruwikinews watchlist is now 1% of its original size, it needs shrinking if you want to, it'll free up a couple of gigabytes on every host (the table was among the largest watchlist tables across the fleet https://phabricator.wikimedia.org/P15523)
[15:59:32] on the not-so-bright side: with abstracting the user table we now have +4K drifts reported. With revision it'll go higher
[16:10:02] I hope we can clean them up during the dc switchover
[16:35:29] moritzm, I've left some completely untested 10.5 packages on apt1001:/home/jynus/bullseye
[16:46:02] jynus: <3
[16:46:56] will send patches, but those are not intended for deploy or to replace your work, just some ongoing testing that didn't take much time away from the backup testing I was already doing
[16:48:48] there are some changes I don't know about on the server - column store, etc.
[16:55:24] ah I see, 10.5 for bullseye, not 10.4 as moritzm was testing
[16:56:04] yeah, I was working with 10.5 as you said it was ok
[16:56:26] sure sure :)
[16:56:38] but as I said you can install what you want on cumin
[16:57:08] we should probably go for 10.4 in bullseye for now I think, as we haven't even finished migrating to it :(
[16:57:13] sure
[16:57:49] although now, without multisource, migrations should be less painful
[16:57:59] as multisource was a source of unknowns
[17:14:53] also the reason was that I thought the ongoing work on the bullseye cumin host was mostly for testing reasons
[19:14:59] PROBLEM - MariaDB sustained replica lag on pc2009 is CRITICAL: 4.8 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2009&var-port=9104
[19:17:25] RECOVERY - MariaDB sustained replica lag on pc2009 is OK: (C)2 ge (W)1 ge 0.6 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2009&var-port=9104
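For context on the ruwikinews watchlist shrink mentioned at 15:47: deleting 99% of the rows does not return the space to the filesystem until the InnoDB table is rebuilt. A minimal sketch of that rebuild, assuming file-per-table tablespaces and a host-by-host approach on depooled replicas rather than replicating the rebuild (the method and timing are an assumption, not something stated in the conversation):

USE ruwikinews;
SET SESSION sql_log_bin = 0;     -- keep the rebuild local to this replica
SELECT COUNT(*) FROM watchlist;  -- sanity check after the purge
OPTIMIZE TABLE watchlist;        -- InnoDB maps this to a recreate + analyze, shrinking the .ibd file
-- equivalent form: ALTER TABLE watchlist ENGINE=InnoDB, FORCE;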