[05:28:13] 10Blocked-on-schema-change, 10DBA: Schema change to make rc_id unsigned - https://phabricator.wikimedia.org/T276150 (10Marostegui)
[05:29:02] 10Blocked-on-schema-change, 10DBA: Drop default of rc_timestamp - https://phabricator.wikimedia.org/T276156 (10Marostegui)
[05:31:47] 10Blocked-on-schema-change: Schema change for dropping default of img_timestamp and making it binary(14) - https://phabricator.wikimedia.org/T273360 (10Marostegui)
[05:32:00] 10Blocked-on-schema-change, 10DBA: Schema change for dropping default of img_timestamp and making it binary(14) - https://phabricator.wikimedia.org/T273360 (10Marostegui)
[05:39:24] 10DBA, 10Patch-For-Review: Productionize db21[45-52] and db11[76-84] - https://phabricator.wikimedia.org/T275633 (10Marostegui)
[05:39:36] 10DBA, 10Patch-For-Review: Productionize db21[45-52] and db11[76-84] - https://phabricator.wikimedia.org/T275633 (10Marostegui) db2152 is now "pooled" as vslow in s8.
[05:41:11] 10DBA, 10Patch-For-Review: Productionize db21[45-52] and db11[76-84] - https://phabricator.wikimedia.org/T275633 (10Marostegui) Once the eqiad dump and vslow hosts are done, I will come up with a plan for the rest of the hosts.
[06:07:08] 10DBA, 10SRE, 10Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (10Marostegui)
[06:07:33] 10DBA, 10SRE, 10Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (10Marostegui) Once db1162 is back (T275309) I will reimage it and repopulate it.
[08:21:58] marostegui: for the rc table we have this schema change as well, but it is not yet merged: https://gerrit.wikimedia.org/r/c/mediawiki/core/+/667358
[08:23:18] ugh, we can't start it until the change gets to production, since puppet runs the maintenance script using the index. https://gerrit.wikimedia.org/g/operations/puppet/+/eabe27c5df7c47a20f53264804c4d5a595d9f74f/modules/snapshot/files/cron/dumpcategoriesrdf-daily.sh
[09:32:13] how do you want to go about T274809?
[09:32:13] T274809: Drop unused database "bacula" from m1 - https://phabricator.wikimedia.org/T274809
[10:07:30] let's do it a bit later, I have an interview in a bit :(
[10:09:49] ok
[10:42:41] 10DBA, 10DC-Ops, 10SRE, 10ops-codfw: (Need By: TBD) rack/setup/install db21[45-52] - https://phabricator.wikimedia.org/T273568 (10LSobanski) @Papaul Thanks. I'm guessing https://phabricator.wikimedia.org/maniphest/task/edit/form/66/ would be the best place for this. Unfortunately I cannot edit the form. Wo...
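As a rough illustration of what the schema-change tasks above (T276150, T276156, T273360) boil down to; this is a hedged sketch only, since the authoritative column definitions live in MediaWiki core's schema and the widths/attributes shown here are assumptions:

    -- Sketch only: column widths and attributes are assumed, not copied from MediaWiki's schema.
    ALTER TABLE recentchanges
      MODIFY rc_id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT;   -- T276150: make rc_id unsigned
    ALTER TABLE recentchanges
      ALTER COLUMN rc_timestamp DROP DEFAULT;                  -- T276156: drop the default
    ALTER TABLE image
      MODIFY img_timestamp BINARY(14) NOT NULL;                -- T273360: drop default, make binary(14)

In production these ALTERs are rolled out host by host across the replicas of each section, which is why the tasks track per-host progress checklists.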
[12:53:23] 10Data-Persistence-Backup: Discovery of external vendor data requiring backup - https://phabricator.wikimedia.org/T276219 (10LSobanski)
[12:53:39] 10Data-Persistence-Backup: Discovery of external vendor data requiring backup - https://phabricator.wikimedia.org/T276219 (10LSobanski) p:05Triage→03Low
[12:56:44] 10Data-Persistence-Backup: Internal APT repository backup - https://phabricator.wikimedia.org/T276220 (10LSobanski)
[12:56:58] 10Data-Persistence-Backup: Internal APT repository backup - https://phabricator.wikimedia.org/T276220 (10LSobanski) p:05Triage→03Low
[12:57:34] PROBLEM - MariaDB sustained replica lag on db1160 is CRITICAL: 17.8 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1160&var-port=9104
[12:58:44] RECOVERY - MariaDB sustained replica lag on db1160 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1160&var-port=9104
[13:08:16] 10Blocked-on-schema-change, 10DBA: Drop default of oldimage.oi_timestamp - https://phabricator.wikimedia.org/T272511 (10Marostegui)
[13:29:11] 10Blocked-on-schema-change, 10DBA: Drop default of oldimage.oi_timestamp - https://phabricator.wikimedia.org/T272511 (10Marostegui)
[13:30:38] 10Blocked-on-schema-change, 10DBA: Drop default of oldimage.oi_timestamp - https://phabricator.wikimedia.org/T272511 (10Marostegui) s5 progress [] labsdb1012 [] labsdb1011 [] labsdb1010 [] labsdb1009 [x] dbstore1003 [] db1154 [x] db1150 [x] db1145 [] db1144 [] db1130 [] db1124 [x] db1113 [] db1110 [] db1100 [...
[13:31:24] 10Blocked-on-schema-change, 10DBA: Drop default of oldimage.oi_timestamp - https://phabricator.wikimedia.org/T272511 (10Marostegui)
[14:03:58] 10DBA, 10SRE, 10Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (10Marostegui) db1164 is now replicating in s1 (running 10.4.18) Will start pooling after 24h
[14:04:52] jynus: I am ready for the drop!
[14:04:59] ok
[14:05:13] you want to do it, or I do?
[14:05:18] you can go!
[14:05:37] I will drop it on the master, ok?
[14:05:41] yep!
[14:05:54] 💧
[14:08:51] now we need to cleanup grants
[14:09:39] the application ones and the backup ones
[14:09:47] yep
[14:09:54] and from puppet tracking file
[14:10:14] do you know where the grants on puppet are?
[14:10:42] I would assume ./modules/role/templates/mariadb/grants/production-m1.sql.erb
[14:10:45] let me check if they are there
[14:11:00] yeah, we have the bacula and bacula9 ones there
[14:16:16] https://gerrit.wikimedia.org/r/c/operations/puppet/+/667870
[14:16:20] checking
[14:19:15] also hhttps://gerrit.wikimedia.org/r/c/operations/puppet/+/657801/7..8
[14:19:23] https://gerrit.wikimedia.org/r/c/operations/puppet/+/657801/7..8
[14:19:29] with just one h
[14:20:18] ah nice
[14:20:29] check that last edit
[14:20:40] Hyper Hypertext Transfer Protocol - sounds like a nerdcore cover of a Scooter song
[14:20:40] if you are ok with both I will remove it from production
[14:20:45] yeah, just did
[14:20:52] go for it, I +1ed them
[14:20:52] and the last thing I will do
[14:21:05] is update documentation, after deploy
[14:21:08] +1
[14:21:11] it should not take me long
[14:21:12] thanks
[14:21:24] so ir you are still around you could maybe take a look on the wiki after I do it
[14:21:30] *if
[14:21:35] sure, send me the link once you've got it
[14:22:23] last question, should I commit 657801, even if unused?
[14:22:43] to have at least a way to track the grants until a better solution is thought of?
[14:22:44] yeah, I think it is useful to have it in git
[14:22:46] yeah
[14:23:03] as in "better something bad that nothing" 0:-)
[14:23:07] *than
[14:24:33] I mean, worst case scenario, we just commit its deletion
[14:27:34] I cannot remember now if drop user if exists is available in mariadb and in this version
[14:27:49] "The IF EXISTS clause was added in MariaDB 10.1.3"
[14:27:55] so it should be ok, I think
[14:28:00] it is available
[14:28:32] I will use it because of misc active-passive, as it will be a very small risk
[14:30:52] I will also try to use orchestrator instead of tendril for reference
[14:31:13] +1000
[14:42:04] doc update: https://wikitech.wikimedia.org/w/index.php?title=MariaDB%2Fmisc&type=revision&diff=1901142&oldid=1890317
[14:42:32] tell me if I forget something, otherwise I will resolve the ticket
[14:43:12] I am going to do a grep on puppet meanwhile
[14:47:07] looks good!
[14:47:08] thanks
[14:47:36] ok, closing, thanks for the assistance!
[14:47:44] thank you!
[14:49:56] 10Data-Persistence-Backup, 10SRE, 10Goal, 10Patch-For-Review: Followup to backup1001 bacula switchover (misc pending tasks) - https://phabricator.wikimedia.org/T238048 (10jcrespo)
[14:50:09] 10DBA, 10Data-Persistence-Backup, 10Data-Persistence, 10Patch-For-Review: Drop unused database "bacula" from m1 - https://phabricator.wikimedia.org/T274809 (10jcrespo) 05Open→03Resolved a:03jcrespo Db dropped (and a backup was previously generated), grants removed, documentation [[ https://wikitech....
[14:52:49] 10DBA, 10DC-Ops, 10SRE, 10ops-codfw: (Need By: TBD) rack/setup/install db21[45-52] - https://phabricator.wikimedia.org/T273568 (10Papaul) @LSobanski Please reach out to Rob. Thanks
[15:19:19] 10Data-Persistence-Backup, 10Epic, 10Patch-For-Review: Improve regular production database backups handling - https://phabricator.wikimedia.org/T138562 (10jcrespo)
[15:19:40] 10DBA, 10SRE: Puppetize grants for mysql hosts that are the source of recovery (dbstore, passive misc) - https://phabricator.wikimedia.org/T111929 (10jcrespo) 05Open→03Resolved a:03jcrespo This is technically done with the merging of the previous patch. This is not a great solution, but it is "a" solutio...
[15:21:04] ^ what? first resolving a 10 yo bug and now a 5+ yo one? what's next, attending bugs in less than a year???
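A minimal sketch of the cleanup walked through above (drop the unused database on the m1 master, then remove its accounts). The account names come from the chat, but the host patterns are assumptions, and the real grant definitions live in modules/role/templates/mariadb/grants/production-m1.sql.erb:

    -- Host pattern '10.%' is an assumption; the actual grants are tracked in puppet.
    DROP DATABASE IF EXISTS bacula;
    -- DROP USER IF EXISTS (available since MariaDB 10.1.3, as quoted above) keeps this
    -- low-risk on misc active-passive hosts where the account may already be absent.
    DROP USER IF EXISTS 'bacula'@'10.%';
    DROP USER IF EXISTS 'bacula9'@'10.%';

Statements would then replicate from the m1 master to its replicas, which is why only the master needs to be touched.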
[15:22:22] Nice jynus
[15:23:09] jynus: cumin2001 is due for HW renewal, if you have any requirement or want to check the specs are ok for the usage in backups please see T275580
[15:24:01] volans, it is a mere orchestrator- it works on any hw- the only thing I thought is if it would make sense to separate it (have 2 cumin instances per dc, one dedicated to dbs/backups)
[15:24:41] what do dbs and backups have in common?
[15:24:47] sorry
[15:24:54] not used for backups in general
[15:25:03] just for database backups orchestration
[15:25:23] let me check the role
[15:25:50] if you want that, feel free to ask for the hardware via normal channels, this is just for the renewal of older hardware
[15:26:35] well, I am asking if you would want that to happen, or see no issue with the current setup, as it would influence that decision
[15:26:50] I don't need it
[15:27:51] the 2 relevant profiles would be "profile::mariadb::wmf_root_client" and "include profile::dbbackups::transfer"
[15:29:04] if you see no issue with the current setup (including that) the answer would be "anything will work"
[15:29:31] including the quote you mention
[15:29:45] I currently don't see any problem, not sure if moritzm has a different idea
[15:30:13] also all the plans for non-root cumin will be based on a different host, so cumin* will for the foreseeable future remain a global-root only host
[15:30:31] mm, that is new info
[15:30:34] I didn't know
[15:30:48] I think backups should work as non root
[15:30:56] T244840
[15:30:57] T244840: Evaluate options for non-root operations with cumin and spicerack cookbooks - https://phabricator.wikimedia.org/T244840
[15:31:07] yeah, cumin* is explicitly root-only for many things, so not sure if we gain much by splitting to an additional host
[15:31:49] ok, so let's keep the arch as it is now, unless an issue appears
[15:32:01] and if we set up a non-root cumin, we may work to migrate it there
[15:32:05] at a later time
[15:34:22] in fact, the first thing the db backups script does is sudo as the backup generating user
[15:35:27] depends if the backup user has secrets that should not be read by non-root
[15:36:19] we would have to change some things indeed (owner of passwords, etc.)
[15:37:52] but to stress your original question, we use very few resources right now on cumin
[15:38:03] (at least in theory, unless there is a bug)
[15:38:28] ok then, no special requirement, basic specs it is :)
[15:38:48] it should just run a dozen cumin commands every day, most resources are taken on the source backups host and the dbprov ones
[15:38:58] volans, correct
[15:39:25] unless marostegui or kormat has some additional input for mariadb management
[15:46:32] volans: I will get back to you tomorrow, sorry
[15:47:35] np
[16:07:58] 10DBA, 10SRE, 10ops-eqiad: eqiad: move db1111 to rack A8 - https://phabricator.wikimedia.org/T273982 (10elukey)
[19:21:14] PROBLEM - MariaDB sustained replica lag on db1121 is CRITICAL: 11 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1121&var-port=9104
[19:22:24] RECOVERY - MariaDB sustained replica lag on db1121 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1121&var-port=9104
[20:37:11] PROBLEM - MariaDB sustained replica lag on db1121 is CRITICAL: 22.2 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1121&var-port=9104
[20:39:01] RECOVERY - MariaDB sustained replica lag on db1121 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1121&var-port=9104
[21:12:37] 10DBA, 10Performance-Team, 10SRE, 10Sustainability (MediaWiki-MultiDC): Apache <=> mariadb SSL/TLS for cross-datacenter writes - https://phabricator.wikimedia.org/T134809 (10Krinkle)
[21:13:36] 10DBA, 10SRE, 10Performance-Team (Radar), 10Sustainability (MediaWiki-MultiDC): Apache <=> mariadb SSL/TLS for cross-datacenter writes - https://phabricator.wikimedia.org/T134809 (10Krinkle)
[23:50:16] 10Blocked-on-schema-change: Schema change for renaming new_name_timestamp to rc_new_name_timestamp in recentchanges - https://phabricator.wikimedia.org/T276292 (10Ladsgroup)
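When one of the "MariaDB sustained replica lag" alerts above fires (critical at 2 seconds, warning at 1), a manual spot check on the affected replica looks roughly like this; the production alert itself is computed from exported metrics, so this is only a hand-run approximation:

    -- Run on the lagging replica with the mysql client; \G is the client's vertical-output terminator.
    SHOW SLAVE STATUS\G
    -- Seconds_Behind_Master and Slave_SQL_Running are the fields of interest.
    -- On MariaDB 10.x (including multi-source setups) this variant also works:
    SHOW ALL SLAVES STATUS\G

Short flaps like the db1121 ones above typically recover on their own once the replica catches up, as the matching RECOVERY lines show.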