[02:03:34] 10DBA, 10DC-Ops, 10Operations, 10ops-codfw: (Need By: 31st May) rack/setup/install db213[6-9] and db2140 - https://phabricator.wikimedia.org/T251639 (10Papaul)
[04:21:30] 10DBA, 10cloud-services-team (Kanban): Reimage labsdb1011 to Buster and 10.4 - https://phabricator.wikimedia.org/T249188 (10Marostegui) Positions and binlogs for labsdb1011 to replicate from once ready: {P11161}
[04:47:59] 10DBA, 10Operations: Upgrade and restart s3 and s7 primary DB master: Thu 7th May - https://phabricator.wikimedia.org/T251158 (10Ovedc) Hello! can you please allow the register users dismiss the notice, once they read it? Thanks!
[04:49:28] 10DBA, 10Operations: Upgrade and restart s3 and s7 primary DB master: Thu 7th May - https://phabricator.wikimedia.org/T251158 (10Marostegui) >>! In T251158#6115189, @Ovedc wrote: > Hello! can you please allow the register users dismiss the notice, once they read it? Thanks! Thanks for the message. However we...
[05:14:47] 10DBA, 10Operations: Upgrade and restart s3 and s7 primary DB master: Thu 7th May - https://phabricator.wikimedia.org/T251158 (10Marostegui) 05Open→03Resolved This is done. RO started: 05:00:47 RO finished: 05:04:19
[05:14:50] 10DBA, 10Operations, 10Puppet, 10User-jbond: DB: perform rolling restart of mariadb daemons to pick up CA changes - https://phabricator.wikimedia.org/T239791 (10Marostegui)
[05:15:57] 10DBA, 10Operations, 10Puppet, 10User-jbond: DB: perform rolling restart of mariadb daemons to pick up CA changes - https://phabricator.wikimedia.org/T239791 (10Marostegui)
[05:17:25] 10DBA, 10Operations: Upgrade and restart s3 and s7 primary DB master: Thu 7th May - https://phabricator.wikimedia.org/T251158 (10Marostegui)
[05:24:50] 10DBA, 10Epic: Upgrade WMF database-and-backup-related hosts to buster - https://phabricator.wikimedia.org/T250666 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts: ` ['db2078.codfw.wmnet'] ` The log can be found in `/var/log/wmf-auto-reimage/20200507...
[05:25:19] haproxy failover on codfw
[05:25:23] expected
[05:25:28] ok
[05:26:06] ruwiki ongoing on labsdb1011
[05:26:17] yes
[05:53:28] 10DBA, 10Epic: Upgrade WMF database-and-backup-related hosts to buster - https://phabricator.wikimedia.org/T250666 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['db2078.codfw.wmnet'] ` and were **ALL** successful.
[06:02:18] 10DBA: inverse_timestamp column exists in text table, it shouldn't - https://phabricator.wikimedia.org/T250063 (10Marostegui) 05Open→03Stalled Stalling per the above comment
[06:02:26] 10DBA, 10Datasets-General-or-Unknown, 10Patch-For-Review, 10Sustainability (Incident Prevention), 10WorkType-NewFunctionality: Automate the check and fix of object, schema and data drifts between mediawiki HEAD, production masters and slaves - https://phabricator.wikimedia.org/T104459 (10Marostegui)
[06:27:35] I would like to move forward with https://gerrit.wikimedia.org/r/c/operations/puppet/+/593527
[06:28:06] although not sure with what exact combination of parameters
[06:29:27] maybe I should deploy it first non-paging
[06:29:46] but what about codfw? read only or read write?
[06:31:03] did you see my comment there?
[06:31:05] my vote, I think, would be to keep codfw rw, but not replicate from it
[06:31:14] yeah, but that didn't involve codfw
[06:31:23] Yeah, we should leave codfw rw (as it is now)
[06:31:34] oh, it is?
[06:31:38] yes
[06:31:42] it's always been
[06:31:43] I thought krinkle said it wasn't
[06:31:49] that is why I am confused
[06:31:52] No, I mean, it is not being written
[06:31:55] But read_only is OFF
[06:31:59] ok
[06:32:13] yeah, no ignoring the comment
[06:32:22] but that I wanted to discuss later :-D
[06:32:50] I think we could enable paging for masters on a separate patch
[06:32:59] for the primary dc
[06:33:04] yeah, that is ok too
[06:33:05] but separate patch
[06:33:11] sounds sane yeah
[06:33:15] ok, this is my proposal
[06:33:27] I enable rw checking on ALL parsercaches, with no paging
[06:33:40] We've had that ticket pending so maybe it is a good moment to do it now (in the next few days on a separate patch) to get it over with
[06:33:43] and if it works well, I enable paging on all masters and pcs on primary
[06:33:49] that sounds good
[06:33:59] thanks, wanted your sanity check and ok
[06:34:58] I think that check has to be enabled on some misc hosts, too, but that's another story
[06:35:08] then I guess we need to set is_critical to false on your patch for now?
[06:35:16] yep, will amend
[06:35:21] coolio
[06:39:30] last version (for now): https://gerrit.wikimedia.org/r/c/operations/puppet/+/593527/2/modules/profile/manifests/mariadb/parsercache.pp
[06:40:39] PCC looks good?
[06:42:33] ongoing...
[06:45:21] Looking good: https://puppet-compiler.wmflabs.org/compiler1003/22376/
[07:03:46] 10DBA, 10observability, 10Epic: Improve database alerting (tracking) - https://phabricator.wikimedia.org/T172492 (10jcrespo)
[07:03:50] 10DBA, 10observability, 10Patch-For-Review, 10Sustainability (Incident Prevention): Monitor read_only on all databases, make it page on masters - https://phabricator.wikimedia.org/T172489 (10jcrespo) 05Stalled→03Open a:03jcrespo
[07:05:46] this is the answer to your patch comment :-D https://gerrit.wikimedia.org/r/c/operations/puppet/+/594885
[07:05:51] will wait a bit more to enable it
[07:06:20] it is actually wrong
[07:06:30] it should be is_critical && is_master
[07:11:59] haha that would have been a nice storm of sms
[07:13:44] actually not
[07:14:29] it only controls the paging or not - and it rarely alerts
[07:15:16] I made a bit of refactoring https://gerrit.wikimedia.org/r/c/operations/puppet/+/594885/3/modules/role/manifests/mariadb/core.pp
[07:15:29] I think it is clearer with meaningful variables
[07:15:38] but ofc will require more testing
[07:15:53] I always find somewhat amusing the contact groups dba, admins XD
[07:16:02] It always takes a few seconds to realise what that actually means
[07:17:26] yeah, I am not sure that works anymore
[07:17:45] however, I think we could send pages only to us?
[07:17:59] not sure with the new workflow
[07:18:07] I think it is critical enough to page everyone
[07:18:21] I would not touch that in this patch, would ask later
[07:23:35] 10DBA, 10Epic: Upgrade WMF database-and-backup-related hosts to buster - https://phabricator.wikimedia.org/T250666 (10Marostegui) @jcrespo db2078 has been upgraded. I believe this will be the first 10.4 host we'll be taking data dumps from, if you notice something weird, let me know!
[07:25:53] ^did you do the revoke from the right user?
[07:26:02] yeo
[07:26:04] yep
[07:26:05] thanks
[07:26:22] it will be tested next week :-D
[07:28:01] https://puppet-compiler.wmflabs.org/compiler1001/22377/
[07:28:27] https://puppet-compiler.wmflabs.org/compiler1001/22377/db1089.eqiad.wmnet/index.html that means we'll page for slaves too?
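The `is_critical && is_master` condition discussed above boils down to a small predicate: the read_only check exists on every instance, but only a master that is unexpectedly read_only should ever page. A minimal Python sketch of that logic follows; the names are hypothetical and this is not the actual Puppet code from change 594885:

```python
# Illustrative only: hypothetical names, not the real Puppet/Icinga config.
# The read_only check is defined for every instance, but it should only page
# when the instance is a master (replicas are *expected* to be read_only)
# and paging is enabled for that environment (e.g. the primary datacenter).

def expected_read_only(is_master: bool) -> bool:
    """Masters must be writable (read_only=0); replicas must be read_only=1."""
    return not is_master

def should_page(is_master: bool, paging_enabled: bool) -> bool:
    """Rough equivalent of the 'is_critical && is_master' condition above."""
    return paging_enabled and is_master

# A replica that is read_only is healthy and never pages:
assert expected_read_only(is_master=False) and not should_page(False, True)
# A master alert pages only where paging is enabled (e.g. not in codfw for now):
assert should_page(is_master=True, paging_enabled=True)
assert not should_page(is_master=True, paging_enabled=False)
```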
[07:34:05] nope, that just changes the group, as per your request
[07:34:14] it didn't change the is_critical part
[07:34:25] as per your request, which I agree with :-D
[07:34:52] yeah, it makes it consistent with the monitor_replication check
[07:35:11] so maybe we can stop puppet on the masters and merge? and test on one master?
[07:36:04] wait a bit
[07:36:10] we are still deploying the pc change
[07:36:18] oh, I thought it was done!
[07:36:38] pc1008 MariaDB read only pc2 CRITICAL 2020-05-07 07:34:25 0d 0h 1m 14s 1/3 Could not connect to localhost:3306
[07:36:50] mm
[07:37:02] only that one?
[07:37:24] well, everything is very asynchronous, so not sure
[07:37:32] check_mariadb works fwiw
[07:37:37] that is a 10.4 host
[07:37:52] so maybe grants issue?
[07:38:03] I think that was the one that was totally reimaged as we re-created the raid
[07:38:04] maybe a client library
[07:38:06] let me check grants
[07:38:10] ?
[07:38:31] but strange because it worked in other 10.4 upgrades
[07:38:41] grants look the same as the ones on pc1007
[07:39:28] if I run check_mariadb.py, it works as expected
[07:39:36] yep
[07:39:40] root@pc1008:/etc# /usr/bin/check_mariadb.py --port=3306 --icinga --check_read_only=false --process
[07:39:40] Version 10.4.12-MariaDB-log, Uptime 2337280s, read_only: False, 3834.62 QPS, connection latency: 0.002539s, query latency: 0.000605s
[07:40:06] yeah, that's weird
[07:40:22] ideas?
[07:40:55] let me check other hosts
[07:41:31] maybe the icinga user?
[07:42:10] yeah, it is grants
[07:42:12] root@pc1008:/etc# sudo -u nagios /usr/bin/check_mariadb.py --port=3306 --icinga --check_read_only=false --process
[07:42:12] Could not connect to localhost:3306
[07:42:15] those hosts
[07:42:28] were never migrated to unix_socket authentication
[07:42:36] on pc?
[07:42:48] yeah, all have to be updated to passwordless access
[07:43:18] ok, let me fix that on pc1008
[07:44:15] sudo -u nagios /usr/bin/check_mariadb.py --port=3306 --icinga --check_read_only=false --process
[07:44:16] Version 10.4.12-MariaDB-log, Uptime 2337558s, read_only: False, 4200.44 QPS, connection latency: 0.003152s, query latency: 0.000869s
[07:44:18] done
[07:44:33] set sql_log_bin=0; grant usage on *.* to 'nagios'@'localhost' IDENTIFIED VIA unix_socket;
[07:44:47] uh?
[07:44:48] I am guessing you did that^
[07:45:03] yes
[07:45:08] I pasted that it works now?
[07:45:55] you do it everywhere, or I do?
[07:46:28] everywhere == all pc hosts
[07:48:54] based on recoveries, I think you are doing it now
[07:51:50] https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?search_string=MariaDB+read+only+pc
[07:54:09] so krinkle was right
[07:54:25] pc2007 and pc2009 were in read only
[07:54:51] pc2008 and pc2010 were in read_only=off
[07:55:40] I will set them in read-write
[07:55:58] after making sure they are not replicating to anything
[08:00:59] nice irc split
[08:01:05] jynus: I said that I fixed grants on all the hosts
[08:01:15] No idea why pc2007 is alerting, it should have read_only=0
[08:01:16] but it has 1
[08:01:30] maybe we manually forced it to 1?
[08:01:59] oh, you are back
[08:02:07] marostegui: it just happens
[08:02:10] without the check
[08:02:12] a reboot
[08:02:14] or something
[08:02:15] yeah, I wrote a bunch of things from 09:46 XDDD
[08:02:17] but never arrived XD
[08:02:27] see my only thing
[08:03:01] bot was on my side: http://bots.wmflabs.org/logs/%23wikimedia-databases/20200507.txt
[08:03:08] lucky!
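The grant above works because MariaDB's unix_socket plugin authenticates the connecting OS user over the local socket, so the nagios user needs no password. A rough sketch of such a passwordless read_only probe, using pymysql purely for illustration (the real check_mariadb.py may well be implemented differently, and the socket path is an assumption):

```python
#!/usr/bin/env python3
# Minimal illustration of passwordless unix_socket access for the nagios
# user. The socket path and output format are assumptions, not the real check.
import sys

import pymysql

SOCKET = "/run/mysqld/mysqld.sock"  # assumed path

def main() -> int:
    try:
        # No password: with "IDENTIFIED VIA unix_socket" the server trusts
        # the OS user on the other end of the local socket.
        conn = pymysql.connect(unix_socket=SOCKET, user="nagios")
    except pymysql.err.OperationalError as exc:
        print(f"CRITICAL: could not connect: {exc}")
        return 2
    with conn.cursor() as cur:
        cur.execute("SELECT @@global.read_only, @@version")
        read_only, version = cur.fetchone()
    conn.close()
    print(f"OK: version {version}, read_only: {bool(read_only)}")
    return 0

if __name__ == "__main__":
    sys.exit(main())
```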
[08:03:10] reading
[08:04:10] ok, read it
[08:04:27] interestingly, when timo suggested it, I actually checked pc2008 which is read_only off
[08:04:32] heh
[08:04:41] will set it now
[08:04:45] excellent
[08:04:48] after I checked there was no replication
[08:04:55] could you check puppet was ok?
[08:05:02] yeah, and my.cnf says read_only=0
[08:05:04] puppet/my.cnf
[08:05:05] ok
[08:05:11] so as I said, this happens
[08:05:16] it happened a lot on misc hosts
[08:05:20] when we first enabled
[08:05:30] that is why this was very valuable
[08:05:48] and wanted to move it forward
[08:06:11] https://phabricator.wikimedia.org/P11168
[08:06:54] logged it
[08:07:06] cool
[08:07:09] ta
[08:08:13] I need to get ready to go downstairs
[08:08:19] Will be back hopefully in around 1h
[08:08:24] and virus-free
[08:08:27] yeah, np
[08:08:35] I am not going to do any paging deployment
[08:08:40] just more testing
[08:08:49] this was good
[08:08:57] we detected an issue and corrected it :-D
[08:09:05] that is something to celebrate
[08:09:20] enjoy your time in civilization!
[08:53:44] hmm. it looks like there's a `partman/early_command`. i wonder if we can hook something into that
[10:24:07] jynus: marostegui: moritzm: updated T252027 a bit
[10:24:07] T252027: debian-installer: partman doesn't allow lvm LVs to be reused when reimaging - https://phabricator.wikimedia.org/T252027
[10:24:17] and i now know far too much about the internals of partman
[10:24:32] @meeting will check when finished
[10:24:55] I'll have a look later in the day, currently in the middle of things
[10:25:25] np :)
[11:24:46] I have updated zarcillo.masters as it was still showing db2065 as m3 codfw master, which was outdated
[11:29:26] thanks
[12:01:33] 10DBA, 10Core Platform Team: text table still has old_* fields and indexes on some hosts - https://phabricator.wikimedia.org/T250066 (10Marostegui) @Ladsgroup did you get any response?
[12:32:01] 10DBA, 10Epic: Upgrade WMF database-and-backup-related hosts to buster - https://phabricator.wikimedia.org/T250666 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts: ` ['pc2010.codfw.wmnet'] ` The log can be found in `/var/log/wmf-auto-reimage/20200507...
[13:00:38] 10DBA, 10Epic: Upgrade WMF database-and-backup-related hosts to buster - https://phabricator.wikimedia.org/T250666 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['pc2010.codfw.wmnet'] ` and were **ALL** successful.
[13:01:10] do I disable puppet on all db hosts?
[13:01:23] or maybe the masters only?
[13:01:34] but yeah, on all hosts also works
[13:01:36] depends on your trust of the patch :-D
[13:01:52] it doesn't hurt to disable it everywhere
[13:02:12] in theory it is safe
[13:02:27] but this way we do a more controlled deployment
[13:02:55] yeah, we need to test 1 master in eqiad, 1 master in codfw, 1 slave in eqiad and codfw
[13:04:16] I'm on it
[13:04:21] :*
[13:05:51] done it on all es, db and pc hosts
[13:05:55] not touching the others
[13:05:58] sweeeeet
[13:06:03] deploy now?
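The pc2007 situation above (my.cnf saying read_only=0 while the running server had it ON) is exactly the configuration-versus-runtime drift the new check is meant to catch. As a rough illustration only, a hypothetical comparison of the two values; the paths, the section name, the simplistic config parsing and the pymysql usage are all assumptions, not the WMF tooling:

```python
# Hypothetical sketch: compare the configured read_only value in my.cnf
# with the live value on the server. Real my.cnf files with !include
# directives would need a more tolerant parser than this.
import configparser

import pymysql

def configured_read_only(path: str = "/etc/my.cnf") -> bool:
    cfg = configparser.ConfigParser(allow_no_value=True, strict=False)
    cfg.read(path)
    # e.g. "read_only = 0" under [mysqld]; missing is treated as writable here.
    return cfg.get("mysqld", "read_only", fallback="0").strip() == "1"

def live_read_only(socket: str = "/run/mysqld/mysqld.sock") -> bool:
    conn = pymysql.connect(unix_socket=socket, user="root")
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT @@global.read_only")
            (value,) = cur.fetchone()
        return bool(value)
    finally:
        conn.close()

if __name__ == "__main__":
    want, have = configured_read_only(), live_read_only()
    if want != have:
        # The pc2007 case: config said writable, the running server was not.
        print(f"DRIFT: my.cnf read_only={int(want)} but server has {int(have)}")
    else:
        print("OK: configured and live read_only match")
```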
[13:06:10] yeah
[13:06:55] Notice: Skipping run of Puppet configuration client; administratively disabled (Reason: 'make masters page T172489')
[13:06:55] T172489: Monitor read_only on all databases, make it page on masters - https://phabricator.wikimedia.org/T172489
[13:07:00] was checking it worked
[13:08:26] I think we should test the following with RW: codfw master, codfw slave, eqiad slave
[13:08:35] give me suggestions :-D
[13:08:48] db2112 - codfw master
[13:08:51] db1089 eqiad slave
[13:09:04] db2071 codfw slave
[13:09:08] should test multi-instance too?
[13:09:41] running on db2112
[13:09:43] I am going to be very sad once we refresh db1089, it is my preferred slave for some reason :(
[13:09:46] those were not touched
[13:09:58] but no harm on testing
[13:10:17] ok: db2088 codfw multi-instance
[13:10:24] db1099 eqiad multi instance
[13:11:42] running puppet on icinga after the first one
[13:11:50] cool
[13:14:12] I think I saw no change for db2112
[13:14:49] which is expected in codfw, no?
[13:14:54] yeah
[13:15:00] also on the next one, which is db1089
[13:15:21] nice
[13:15:35] not checked yet
[13:15:40] I mean that we expect no change
[13:15:42] it is running now
[13:19:18] no change
[13:19:29] coool
[13:19:36] going for master eqiad now
[13:19:47] master eqiad: db1083 for instance is enwiki
[13:25:07] + service_description MariaDB read only s1 #page
[13:25:15] nice!
[13:25:21] thanks for the page jynus :-P
[13:25:31] eh?
[13:26:03] -operations should be monitored, not this one
[13:26:23] 🚨 📟 🚨 🔥 🚨
[13:26:26] :D
[13:27:22] marostegui: I am going to leave it like that for some minutes, then re-enable but not force a run of puppet
[13:27:30] sounds good
[13:28:16] I am a bit worried about that creating positive effects during dc switchover
[13:28:16] I am happy to see this deployed \o/
[13:28:27] positive effects?
[13:28:38] false positives, sorry
[13:28:59] but I think that is a very very edge case "we are moving a datacenter"
[13:29:01] we because of a race condition?
[13:29:10] and forgot to downtime the check
[13:29:14] we can also downtime the masters
[13:29:22] yes, as we do with replication
[13:29:33] this is all theoretical
[13:29:46] I think we never had one of those in the last 3 years
[13:29:59] but we had servers going down and coming up in read only
[13:30:05] at least once or twice
[13:30:11] so I think it is a net positive
[13:30:12] yep, s5 I think it was like 2-3 years ago
[13:30:38] the check_mariadb.py was supposed to also get other checks added, like uptime and connections
[13:30:51] but we'll see about that, too much paging is not good
[13:31:45] MariaDB read only s1 (hashtag)page View Extra Service Notes OK 2020-05-07 13:30:09 0d 0h 3m 0s 1/3 Version 10.1.39-MariaDB, Uptime 15217877s, read_only: False, 1549.61 QPS, connection latency: 0.003741s, query latency: 0.001035s
[13:31:55] also remember to ack this one too for reboots :-D
[13:32:11] for reboots?
[13:32:15] master reboots
[13:32:20] I think there are some left?
[13:32:26] ah, I downtime them entirely
[13:32:29] yes
[13:32:40] ah, all services?
[13:32:42] I see
[13:32:43] yeah
[13:32:46] for 1h only
[13:33:02] but the checks I do include if all the slaves are connected and if the master is RW
[13:33:05] normally if I only touch mysql and not the server, I only do replication, processes and read only
[13:33:06] before enabling RW on MW
[13:33:11] maybe systemd
[13:33:27] yeah, it is a lot of stuff
[13:34:16] everything seems normal
[13:34:29] should we page on pcs being read only?
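Downtiming a master "entirely" for a reboot, as described above, ultimately means scheduling downtime for all of the host's services in Icinga. One generic way to do that is to write an external command to Icinga's command pipe; the pipe path below is an assumption and this is plain Icinga/Nagios usage, not the actual WMF workflow:

```python
# Generic Icinga/Nagios external-command example (assumed pipe path),
# scheduling fixed downtime for every service on a host.
import time

ICINGA_CMD_PIPE = "/var/lib/icinga/rw/icinga.cmd"  # path is an assumption

def downtime_all_services(host: str, hours: int, author: str, comment: str) -> None:
    start = int(time.time())
    duration = hours * 3600
    end = start + duration
    # SCHEDULE_HOST_SVC_DOWNTIME;<host>;<start>;<end>;<fixed>;<trigger_id>;<duration>;<author>;<comment>
    cmd = (f"[{start}] SCHEDULE_HOST_SVC_DOWNTIME;{host};{start};{end};"
           f"1;0;{duration};{author};{comment}")
    with open(ICINGA_CMD_PIPE, "w") as pipe:
        pipe.write(cmd + "\n")

# e.g. downtime_all_services("db1083", 1, "marostegui", "master reboot")
```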
[13:34:41] Probably yes
[13:34:54] I would treat them as a normal master
[13:34:57] only core (es and db mw masters) and pcs, right?
[13:35:06] yeah I think so
[13:35:09] ok
[13:35:15] I didn't include pcs here
[13:35:17] let's consider core all the same
[13:35:18] I can do it now
[13:35:24] +1
[13:35:37] and that way we can close the ticket
[13:35:47] * marostegui cries
[13:36:03] we can do the other details (misc non paging, replication on misc at a later time)
[13:36:16] yeah, that also has separate tasks
[13:36:33] oh, is there a task for that, can you find it?
[13:36:37] while I do the patch
[13:36:41] yeah, let me check
[13:36:54] one thing that maybe kormat could help us with
[13:36:59] is the cleaning up of puppet
[13:37:00] T237927
[13:37:01] T237927: Add replication lag (and other checks) to misc all hosts - https://phabricator.wikimedia.org/T237927
[13:37:16] cool thanks, I forgot about that
[13:37:24] jynus: i've got an `rm -rf` sitting right here
[13:37:29] ha
[13:37:40] lots of things have been delayed
[13:37:49] but now we may have the workforce to work a bit on cleaning up
[13:37:57] not necessarily perfection :D
[13:38:08] I have also thought of giving kormat the upgrade of pc1 and pc3 entirely to buster and 10.4, that involves MW deployments and extra care with the active masters :)
[13:38:12] just some cleaning, remove redundant code
[13:38:33] sure, not saying that should happen now or that he should do it
[13:38:38] marostegui: welll, it probably still hurts less than partman :)
[13:38:41] but that with 3, we could do it
[13:38:44] yeah, we have plenty of stuff for kormat :p
[13:38:48] step by step
[13:39:03] especially abstraction-wise, these are in a really poor state
[13:39:17] and we are half-migrated to the code style guide
[13:39:31] jynus: ping when you have the pc patch ready for review
[13:39:36] on it
[13:39:46] I was checking the other task
[13:39:54] sure, no rush, just ping me here
[13:40:38] TIL partman does rpc in bash
[13:40:51] 10DBA: Add replication lag (and other checks) to misc all hosts - https://phabricator.wikimedia.org/T237927 (10jcrespo)
[13:41:10] oh wow
[13:41:33] Isn't bash great?
[13:41:51] it runs a `parted` daemon, and uses 2 fifos to interact with it
[13:41:55] it's... amazing.
[13:43:56] oh, I am stupid
[13:44:07] the patch I was about to create is already merged
[13:44:16] haha
[13:44:33] and I was confused why editing a file with the same text as before didn't give me a diff
[13:44:47] either I am stupid or I had something not appropriately rebased
[13:44:53] probably the 1st
[13:45:47] you merged this https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/593527/
[13:45:53] and it is already on production: pc1009: MariaDB read only pc3 (hashtag) page
[13:46:08] but that says is_critical = false
[13:46:11] so it won't page
[13:46:15] no?
[13:46:26] yes, but then I merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/594885/7
[13:46:36] which included pc on the same patch too
[13:46:39] aaaaah right
[13:46:44] I thought it was a separate one, or yet to do
[13:46:55] everything is as it should, I just cannot remember what I did
[13:46:59] XDDDDD
[13:47:18] note you also voted +1 to that
[13:47:21] Sergei Golubchik closed MDEV-21794.
[13:47:21] -----------------------------------
[13:47:21] Fix Version/s: 10.4.13
[13:47:22] 10.5.3
[13:47:22] (was: 10.4)
[13:47:22] (was: 10.5)
[13:47:22] Resolution: Fixed
[13:47:23] yaaaay
[13:48:21] cool
[13:48:33] I am going to reenable puppet everywhere
[13:48:43] ok
[13:48:48] and then close the ticket after it applies
[13:48:53] will continue working on the other one
[13:49:23] will move https://gerrit.wikimedia.org/r/c/operations/puppet/+/594905 to T237927
[13:49:24] T237927: Add replication lag (and other checks) to misc all hosts - https://phabricator.wikimedia.org/T237927
[13:49:31] 10DBA, 10Upstream: Possibly disable optimizer flag: rowid_filter on 10.4 - https://phabricator.wikimedia.org/T245489 (10Marostegui) ` Sergei Golubchik closed MDEV-21794. ----------------------------------- Fix Version/s: 10.4.13 10.5.3 (was: 10.4)...
[13:50:18] we need a bot that changes MDEV links and #[:digit:]* to mariadb and mysql bugs links
[13:51:12] They have included it here https://jira.mariadb.org/projects/MDEV/versions/24223 so that's good
[13:53:02] this is interesting https://jira.mariadb.org/browse/MDEV-20257
[13:53:35] yeah I saw that one when checking the release bugs
[13:53:44] But this is very unlikely to happen I think
[13:53:49] run CREATE USER foo@localhost;
[13:53:50] SIGKILL the server;
[13:54:10] sure I know, but...
[13:54:24] wonder which server has thousands of grant lines :-D
[13:54:27] ?
[13:54:33] hehe
[13:55:07] as in, that could be a key difference between production and cloud dbs in terms of upgrade
[13:55:18] although yeah, not that one specifically
[13:55:31] yeah, very specific case
[13:55:34] I am going to go offline
[13:55:37] Thanks for working on the pages
[13:55:42] Glad to see that already done!
[13:55:49] is that ticket from 2018 or so?
[13:56:26] Aug 4 2017, 10:30
[13:56:27] haha
[13:57:16] Going offline o/
[13:57:43] \o/
[14:04:48] 10DBA, 10observability, 10Epic: Improve database alerting (tracking) - https://phabricator.wikimedia.org/T172492 (10jcrespo)
[14:04:48] 10DBA, 10observability, 10Patch-For-Review, 10Sustainability (Incident Prevention): Monitor read_only on all databases, make it page on masters - https://phabricator.wikimedia.org/T172489 (10jcrespo) 05Open→03Resolved This has been now fullfilled within the scope of the title. If a master server now cr...
[14:09:14] 10DBA, 10Privacy Engineering, 10Security-Team, 10Patch-For-Review: Drop (and archive?) aft_feedback - https://phabricator.wikimedia.org/T250715 (10ArielGlenn) No internal links @Reedy, sorry ;-) The above can go as soon as someone gives the final thumbs up.
[14:48:33] 10DBA, 10Operations: Upgrade and restart s3 and s7 primary DB master: Thu 7th May - https://phabricator.wikimedia.org/T251158 (10Agusbou2015)
[15:32:10] 10DBA, 10Core Platform Team: text table still has old_* fields and indexes on some hosts - https://phabricator.wikimedia.org/T250066 (10Ladsgroup) >>! In T250066#6115758, @Marostegui wrote: > @Ladsgroup did you get any response? I didn't get anything. Maybe pinging again?
[15:35:24] 10DBA, 10Core Platform Team: text table still has old_* fields and indexes on some hosts - https://phabricator.wikimedia.org/T250066 (10daniel) >>! In T250066#6116341, @Ladsgroup wrote: >>>! In T250066#6115758, @Marostegui wrote: >> @Ladsgroup did you get any response? > > I didn't get anything. Maybe pinging...
[15:36:02] marostegui: is this on your radar?
[15:36:05] "templatelinks tl_from index-mismatch-prod-extra": {
[15:36:05] "s6": [
[15:36:05] "db1098.eqiad.wmnet",
[15:36:05] "db1096.eqiad.wmnet"
[15:36:05] ]
[15:36:06] },
[15:38:04] Amir1: if there's not a task probably not :)
[15:38:08] Amir1: mind creating one?
[15:38:15] sure
[15:38:21] thank you <3
[15:38:43] Thank you for cleaning up messes
[15:40:46] Leftover from T174509
[15:40:46] T174509: Drop now redundant indexes from pagelinks and templatelinks - https://phabricator.wikimedia.org/T174509
[15:41:24] thank you!
[15:43:25] 10DBA: tl_from index on templatelinks is lingering in production - https://phabricator.wikimedia.org/T252126 (10Ladsgroup)
[15:43:35] Done ^
[15:43:41] <3
[15:57:21] 10DBA, 10DC-Ops, 10Operations, 10ops-codfw: (Need By: 31st May) rack/setup/install db213[6-9] and db2140 - https://phabricator.wikimedia.org/T251639 (10Papaul)
[16:01:44] 10DBA: Add replication lag (and other checks) to misc all hosts - https://phabricator.wikimedia.org/T237927 (10jcrespo)
[16:01:46] 10DBA, 10observability, 10Epic: Improve database alerting (tracking) - https://phabricator.wikimedia.org/T172492 (10jcrespo)
[18:12:39] 10DBA, 10CPT Initiatives (MCR Schema Migration), 10Core Platform Team Workboards (Clinic Duty Team), 10Schema-change: Apply updates for MCR, actor migration, and content migration, to production wikis. - https://phabricator.wikimedia.org/T238966 (10daniel)
[18:13:29] 10DBA, 10CPT Initiatives (MCR Schema Migration), 10Core Platform Team Workboards (Clinic Duty Team), 10Schema-change: Apply updates for MCR, actor migration, and content migration, to production wikis. - https://phabricator.wikimedia.org/T238966 (10daniel) Pinging @Bstorm for the labs replication question.
[18:15:36] 10DBA, 10Cloud-Services, 10CPT Initiatives (MCR Schema Migration), 10Core Platform Team Workboards (Clinic Duty Team), 10Schema-change: Apply updates for MCR, actor migration, and content migration, to production wikis. - https://phabricator.wikimedia.org/T238966 (10daniel)
[18:47:41] 10DBA, 10DC-Ops, 10Operations, 10ops-codfw: (Need By: 31st May) rack/setup/install db213[6-9] and db2140 - https://phabricator.wikimedia.org/T251639 (10Papaul)
[18:48:46] 10DBA, 10DC-Ops, 10Operations, 10ops-codfw: (Need By: 31st May) rack/setup/install db213[6-9] and db2140 - https://phabricator.wikimedia.org/T251639 (10Papaul) @Marostegui thanks for updating the operations/puppet update portion.
[19:57:18] 10DBA, 10Cloud-Services, 10CPT Initiatives (MCR Schema Migration), 10Core Platform Team Workboards (Clinic Duty Team), 10Schema-change: Apply updates for MCR, actor migration, and content migration, to production wikis. - https://phabricator.wikimedia.org/T238966 (10bd808) >>! In T238966#6116998, @daniel...
[21:20:28] 10DBA: Automate the detection of netcat listen port in transfer.py - https://phabricator.wikimedia.org/T252171 (10Privacybatm)
[21:27:21] 10DBA: Refactor transfer.py - https://phabricator.wikimedia.org/T252172 (10Privacybatm)
[22:17:57] 10DBA, 10Cloud-Services, 10CPT Initiatives (MCR Schema Migration), 10Core Platform Team Workboards (Clinic Duty Team), 10Schema-change: Apply updates for MCR, actor migration, and content migration, to production wikis. - https://phabricator.wikimedia.org/T238966 (10daniel) > I see a bunch of revision_co...
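T252171 above ("Automate the detection of netcat listen port in transfer.py") is the classic find-a-free-port problem. A common approach, sketched below, is to bind to port 0 and let the kernel pick; this is only an illustration and does not imply transfer.py was or will be implemented this way:

```python
# Generic "find a free TCP port" sketch: bind to port 0, let the kernel
# choose, read the port back, then close the socket so the real listener
# (e.g. netcat) can take it. There is an inherent small race between
# closing the probe socket and starting the listener.
import socket

def find_free_port(bind_addr: str = "0.0.0.0") -> int:
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        s.bind((bind_addr, 0))
        return s.getsockname()[1]

# e.g. port = find_free_port(), then start the receiving netcat on that port
```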
[22:42:28] 10DBA: transfer.py fails to run 2 commands - https://phabricator.wikimedia.org/T252175 (10Privacybatm)
[22:47:00] 10DBA: transfer.py fails to run 2 commands - https://phabricator.wikimedia.org/T252175 (10Privacybatm)
[23:14:28] 10DBA, 10Core Platform Team: text table still has old_* fields and indexes on some hosts - https://phabricator.wikimedia.org/T250066 (10tstarling) I don't know what those fields are doing there. ` MariaDB [frwiki]> show create table text\G *************************** 1. row *************************** CREATE...
[23:34:20] 10DBA, 10Cloud-Services, 10CPT Initiatives (MCR Schema Migration), 10Core Platform Team Workboards (Clinic Duty Team), 10Schema-change: Apply updates for MCR, actor migration, and content migration, to production wikis. - https://phabricator.wikimedia.org/T238966 (10Bstorm) From my reading, we have no su...
[23:44:18] 10DBA, 10Cloud-Services, 10CPT Initiatives (MCR Schema Migration), 10Core Platform Team Workboards (Clinic Duty Team), 10Schema-change: Apply updates for MCR, actor migration, and content migration, to production wikis. - https://phabricator.wikimedia.org/T238966 (10Bstorm) So! Unless I'm missing somethi...
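The leftover-columns question in T250066/T250063 above can be probed per host with a simple information_schema query. The sketch below is illustrative only: it assumes a pymysql-style cursor and checks just the inverse_timestamp column explicitly named in the discussion (other legacy names could be added); the broader drift-detection work is what T104459 tracks:

```python
# Illustrative drift probe: does a wiki's `text` table still carry legacy
# columns that should have been dropped long ago? Not the real drift tooling.
LEGACY_COLUMNS = {"inverse_timestamp"}  # extend with other known legacy names

def leftover_text_columns(cursor, schema: str) -> set:
    """Return the legacy columns still present on <schema>.text.

    `cursor` is assumed to be a pymysql-style DB-API cursor already
    connected to the host being checked.
    """
    cursor.execute(
        "SELECT column_name FROM information_schema.columns "
        "WHERE table_schema = %s AND table_name = 'text'",
        (schema,),
    )
    present = {row[0] for row in cursor.fetchall()}
    return present & LEGACY_COLUMNS

# e.g. leftover_text_columns(cur, "frwiki") -> {'inverse_timestamp'} on a
# host with the old column still in place, set() on a clean one.
```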