[04:46:28] 10DBA: Compress enwiki InnoDB tables - https://phabricator.wikimedia.org/T254462 (10Marostegui) [05:38:04] 10DBA, 10Epic: Upgrade WMF database-and-backup-related hosts to buster - https://phabricator.wikimedia.org/T250666 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on cumin2001.codfw.wmnet for hosts: ` ['db1118.eqiad.wmnet'] ` The log can be found in `/var/log/wmf-auto-reimage/20200622... [05:46:19] 10DBA, 10Operations, 10decommission-hardware, 10ops-eqiad: decommission dbproxy1008.eqiad.wmnet - https://phabricator.wikimedia.org/T255406 (10Marostegui) [05:55:34] 10DBA, 10Epic: Upgrade WMF database-and-backup-related hosts to buster - https://phabricator.wikimedia.org/T250666 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['db1118.eqiad.wmnet'] ` and were **ALL** successful. [06:11:57] 10DBA, 10Analytics: Upgrade analytics dbstore databases to Buster and Mariadb 10.4 - https://phabricator.wikimedia.org/T254870 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on cumin2001.codfw.wmnet for hosts: ` ['dbstore1005.eqiad.wmnet'] ` The log can be found in `/var/log/wmf-auto... [06:30:34] 10DBA, 10Analytics: Upgrade analytics dbstore databases to Buster and Mariadb 10.4 - https://phabricator.wikimedia.org/T254870 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['dbstore1005.eqiad.wmnet'] ` and were **ALL** successful. [07:01:00] 10DBA, 10Analytics, 10Patch-For-Review: Upgrade analytics dbstore databases to Buster and Mariadb 10.4 - https://phabricator.wikimedia.org/T254870 (10Marostegui) [07:01:40] 10DBA, 10Epic: Upgrade WMF database-and-backup-related hosts to buster - https://phabricator.wikimedia.org/T250666 (10Marostegui) [07:01:40] 10DBA, 10Analytics, 10Patch-For-Review: Upgrade analytics dbstore databases to Buster and Mariadb 10.4 - https://phabricator.wikimedia.org/T254870 (10Marostegui) 05Open→03Resolved a:03Marostegui All done! [07:19:23] 10DBA: Upgrade m1 to Buster and Mariadb 10.4 - https://phabricator.wikimedia.org/T254556 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on cumin2001.codfw.wmnet for hosts: ` ['db1117.eqiad.wmnet'] ` The log can be found in `/var/log/wmf-auto-reimage/202006220719_marostegui_6426.log`. [07:36:55] 10DBA: Upgrade m1 to Buster and Mariadb 10.4 - https://phabricator.wikimedia.org/T254556 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['db1117.eqiad.wmnet'] ` and were **ALL** successful. [07:43:11] 10DBA, 10Operations, 10ops-codfw: Degraded RAID on pc2007 - https://phabricator.wikimedia.org/T255904 (10Kormat) a:03Papaul [07:43:39] 10DBA, 10Operations, 10ops-codfw: Degraded RAID on pc2007 - https://phabricator.wikimedia.org/T255904 (10Kormat) Hi @Papaul, can we get this disk replaced please? It should still be under warranty with dell. [07:43:53] 10DBA, 10Operations, 10ops-codfw: Degraded RAID on pc2007 - https://phabricator.wikimedia.org/T255904 (10Kormat) idrac logs are here: https://phabricator.wikimedia.org/P11619 [07:47:56] 10DBA, 10Patch-For-Review: Upgrade m1 to Buster and Mariadb 10.4 - https://phabricator.wikimedia.org/T254556 (10Marostegui) [07:52:41] 10DBA, 10Patch-For-Review: Upgrade m1 to Buster and Mariadb 10.4 - https://phabricator.wikimedia.org/T254556 (10Marostegui) @jcrespo @akosiaris @ayounsi I would like to switchover the master to the new master that runs Buster and MariaDB 10.4. m1 holds: bacula librenms (which in previous switchovers has sho... 
[07:55:01] 10DBA, 10Patch-For-Review: Upgrade m1 to Buster and Mariadb 10.4 - https://phabricator.wikimedia.org/T254556 (10akosiaris) >>! In T254556#6243442, @Marostegui wrote: > @jcrespo @akosiaris - I would like to do this maybe Thursday 25th at 08:00 AM UTC. Is that ok? Let me know if you prefer any other day/time S... [07:56:47] 10DBA, 10Patch-For-Review: Upgrade m1 to Buster and Mariadb 10.4 - https://phabricator.wikimedia.org/T254556 (10Marostegui) Thank you, I have sent a calendar invite to you and to @jcrespo [08:19:15] 10DBA, 10Patch-For-Review: Upgrade m1 to Buster and Mariadb 10.4 - https://phabricator.wikimedia.org/T254556 (10jcrespo) ok. [08:30:57] 10DBA, 10Operations, 10ops-eqiad: db1088 crashed - https://phabricator.wikimedia.org/T255927 (10Kormat) [08:32:50] 10DBA, 10Operations, 10ops-eqiad: db1088 crashed - https://phabricator.wikimedia.org/T255927 (10Kormat) DCOps: The BBU on this machine has failed. Do you have a spare BBU in the DC, or if not, can we please order a replacement? Cheers. ` /system1/log1/record7 Targets Properties number=7 severity... [08:36:45] 10DBA, 10Operations, 10ops-eqiad: db1088 crashed - https://phabricator.wikimedia.org/T255927 (10Kormat) Mysql is started and catching up on replication. Once that's completed we'll perform a data consistency check. [09:08:48] 10DBA, 10Operations, 10ops-codfw: Degraded RAID on pc2007 - https://phabricator.wikimedia.org/T255904 (10Marostegui) p:05Triage→03Medium [09:43:28] 10DBA, 10Epic: Upgrade WMF database-and-backup-related hosts to buster - https://phabricator.wikimedia.org/T250666 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on cumin2001.codfw.wmnet for hosts: ` ['db1094.eqiad.wmnet'] ` The log can be found in `/var/log/wmf-auto-reimage/20200622... [10:04:48] 10DBA, 10Epic: Upgrade WMF database-and-backup-related hosts to buster - https://phabricator.wikimedia.org/T250666 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['db1094.eqiad.wmnet'] ` and were **ALL** successful. 
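The recovery procedure described at 08:36 above (restart MySQL, let replication catch up, then run a data consistency check against another host in the section) is normally done with dedicated chunked-comparison tooling. As a rough, simplified illustration only, the sketch below compares `CHECKSUM TABLE` output between the recovered replica and a reference replica; the reference host, credentials and table list are placeholders, and `CHECKSUM TABLE` read-locks the table, so this is not how a large production table would actually be checked.

```python
# Simplified sketch of a post-crash consistency check: compare CHECKSUM TABLE
# results between the recovered replica and a reference host in the same section.
# Host names, credentials and tables are placeholders; real checks use chunked
# comparison tooling instead of a full-table CHECKSUM (which takes a read lock).
import pymysql

TABLES = ["somewiki.page", "somewiki.revision"]  # hypothetical examples
HOSTS = {
    "recovered": "db1088.eqiad.wmnet",       # the host that crashed (T255927)
    "reference": "reference-host.example",    # placeholder peer replica
}

def table_checksum(host, table):
    conn = pymysql.connect(host=host, user="check", password="...", port=3306)
    try:
        with conn.cursor() as cur:
            cur.execute(f"CHECKSUM TABLE {table}")
            return cur.fetchone()[1]  # row is (table_name, checksum)
    finally:
        conn.close()

for table in TABLES:
    sums = {name: table_checksum(host, table) for name, host in HOSTS.items()}
    status = "OK" if sums["recovered"] == sums["reference"] else "MISMATCH"
    print(f"{table}: {status} {sums}")
```

In practice the comparison is done per chunk of rows so it can be throttled and focused on the data written around the time of the crash.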
[10:19:56] PROBLEM - 5-minute average replication lag is over 2s on db1149 is CRITICAL: bad_data: parse error at char 91: unclosed left parenthesis https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1149&var-port=9104&var-dc=eqiad+prometheus/ops [10:19:56] PROBLEM - 5-minute average replication lag is over 2s on db1106 is CRITICAL: bad_data: parse error at char 91: unclosed left parenthesis https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1106&var-port=9104&var-dc=eqiad+prometheus/ops [10:19:56] PROBLEM - 5-minute average replication lag is over 2s on db1123 is CRITICAL: bad_data: parse error at char 91: unclosed left parenthesis https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1123&var-port=9104&var-dc=eqiad+prometheus/ops [10:19:56] PROBLEM - 5-minute average replication lag is over 2s on db2076 is CRITICAL: bad_data: parse error at char 91: unclosed left parenthesis https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2076&var-port=9104&var-dc=codfw+prometheus/ops [10:20:42] ^we are testing this [10:23:48] PROBLEM - 5-minute average replication lag is over 2s on db1137 is CRITICAL: bad_data: parse error at char 91: unclosed left parenthesis https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1137&var-port=9104&var-dc=eqiad+prometheus/ops [10:23:50] PROBLEM - 5-minute average replication lag is over 2s on db2111 is CRITICAL: bad_data: parse error at char 91: unclosed left parenthesis https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2111&var-port=9104&var-dc=codfw+prometheus/ops [10:23:52] PROBLEM - 5-minute average replication lag is over 2s on es1014 is CRITICAL: bad_data: parse error at char 91: unclosed left parenthesis https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=es1014&var-port=9104&var-dc=eqiad+prometheus/ops [10:23:54] PROBLEM - 5-minute average replication lag is over 2s on es2025 is CRITICAL: bad_data: parse error at char 91: unclosed left parenthesis https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=es2025&var-port=9104&var-dc=codfw+prometheus/ops [10:24:08] PROBLEM - 5-minute average replication lag is over 2s on es2019 is CRITICAL: bad_data: parse error at char 91: unclosed left parenthesis https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=es2019&var-port=9104&var-dc=codfw+prometheus/ops [10:26:13] PROBLEM - 5-minute average replication lag is over 2s on db1075 is CRITICAL: bad_data: parse error at char 91: unclosed left parenthesis https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1075&var-port=9104&var-dc=eqiad+prometheus/ops [10:26:13] PROBLEM - 5-minute average replication lag is over 2s on db1082 is CRITICAL: bad_data: 
parse error at char 91: unclosed left parenthesis https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1082&var-port=9104&var-dc=eqiad+prometheus/ops [10:26:13] PROBLEM - 5-minute average replication lag is over 2s on db1096 is CRITICAL: bad_data: parse error at char 92: unclosed left parenthesis https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1096&var-port=13316&var-dc=eqiad+prometheus/ops [10:26:13] PROBLEM - 5-minute average replication lag is over 2s on db1101 is CRITICAL: bad_data: parse error at char 92: unclosed left parenthesis https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1101&var-port=13318&var-dc=eqiad+prometheus/ops [10:26:13] PROBLEM - 5-minute average replication lag is over 2s on db1124 is CRITICAL: bad_data: parse error at char 92: unclosed left parenthesis https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1124&var-port=13318&var-dc=eqiad+prometheus/ops [10:26:23] PROBLEM - 5-minute average replication lag is over 2s on db2107 is CRITICAL: bad_data: parse error at char 91: unclosed left parenthesis https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2107&var-port=9104&var-dc=codfw+prometheus/ops [10:26:30] PROBLEM - 5-minute average replication lag is over 2s on db1093 is CRITICAL: bad_data: parse error at char 91: unclosed left parenthesis https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1093&var-port=9104&var-dc=eqiad+prometheus/ops [10:26:30] PROBLEM - 5-minute average replication lag is over 2s on db1143 is CRITICAL: bad_data: parse error at char 91: unclosed left parenthesis https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1143&var-port=9104&var-dc=eqiad+prometheus/ops [10:26:30] PROBLEM - 5-minute average replication lag is over 2s on db2120 is CRITICAL: bad_data: parse error at char 91: unclosed left parenthesis https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2120&var-port=9104&var-dc=codfw+prometheus/ops [10:26:32] PROBLEM - 5-minute average replication lag is over 2s on es1016 is CRITICAL: bad_data: parse error at char 91: unclosed left parenthesis https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=es1016&var-port=9104&var-dc=eqiad+prometheus/ops [11:03:08] and this concludes our channel bandwidth test for the week. [11:04:29] lol [11:05:24] :D [12:09:48] 10DBA, 10Cloud-Services, 10CPT Initiatives (MCR Schema Migration), 10Core Platform Team Workboards (Clinic Duty Team), and 2 others: Apply updates for MCR, actor migration, and content migration, to production wikis. - https://phabricator.wikimedia.org/T238966 (10Marostegui) I deployed this schema change o... 
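All of the CRITICAL alerts above carry the same `bad_data: parse error ... unclosed left parenthesis` payload: the check evaluates a PromQL expression through the Prometheus HTTP API, and Prometheus rejects an expression with unbalanced parentheses before running it, so the check surfaces the API error instead of a lag value (the cause, a single missing `)`, is confirmed at 14:07 below). A minimal sketch of that failure mode against the standard `/api/v1/query` endpoint; the server URL and the query text are illustrative guesses, not the actual check definition.

```python
# Minimal sketch: a PromQL expression with an unbalanced parenthesis is rejected
# by the Prometheus query API with errorType "bad_data" before evaluation, which
# is what the replication-lag check then prints as its CRITICAL output.
# The Prometheus URL and the query are illustrative, not the real check.
import requests

PROMETHEUS = "http://prometheus.example:9090"

# Note the missing closing ")" for avg_over_time(...).
broken_query = (
    'avg_over_time(mysql_slave_status_seconds_behind_master'
    '{instance="db1149:9104"}[5m]'
)

resp = requests.get(f"{PROMETHEUS}/api/v1/query", params={"query": broken_query})
body = resp.json()
print(resp.status_code)        # typically 400 for a malformed query
print(body.get("status"))      # "error"
print(body.get("errorType"))   # "bad_data"
print(body.get("error"))       # "... parse error ... unclosed left parenthesis"
```

With the parenthesis balanced, the same request returns `status: success` and the per-instance lag values that the check compares against its 2s threshold.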
[13:29:21] 10DBA: Use logging package instead of print statements in transferpy package - https://phabricator.wikimedia.org/T255999 (10Privacybatm)
[14:05:52] remember when I said that core_multiinstance was used on backup sources?
[14:05:58] it is not true
[14:06:08] we use dbstore_multiinstance for those already
[14:06:52] kormat: were you able to spot the parsing error in the end?
[14:07:08] yeah. a simple missing `)`
[14:07:11] cool
[14:07:23] so dbstore and core I think no longer have a meaning
[14:07:32] they are only there for historical naming
[14:08:17] I won't change them, but something like mariadb::mw and mariadb::mw::backup_source may be better?
[14:08:28] after we refactor?
[14:08:53] ah hah
[14:08:59] yeah makes sense
[14:09:13] not sure, feel free to provide better suggestions
[14:09:48] should we also reconsider not having core and core_multiinstance?
[14:10:05] everything is multiinstance, but 1 instance is allowed?
[14:10:38] the port handling gets a bit weird i think
[14:10:45] yeah
[14:11:03] but if instances were not handled on puppet...
[14:11:16] but we were closer to a mysql as a service model...
[14:11:34] one can dream
[14:17:49] one question, did x1 get upgraded fully to 10.4?
[14:18:16] no
[14:18:17] 10DBA, 10Patch-For-Review: Use logging package instead of print statements in transferpy package - https://phabricator.wikimedia.org/T255999 (10Privacybatm) I don't think we will be able to incorporate data transfer progress information. The RemoteExecution works in such a way that it will send the full data a...
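T255999 above tracks replacing `print()` calls in transferpy with the standard `logging` module. A minimal sketch of what that kind of change usually looks like, with made-up function and message names rather than transferpy's actual code: each module gets a named logger, exceptions are logged with their traceback, and verbosity is configured once at the entry point instead of at every call site.

```python
# Illustrative before/after for a T255999-style change: a module-level logger
# instead of print(), with verbosity configured once at the entry point.
# Names, messages and arguments are made up; this is not transferpy's code.
import logging

logger = logging.getLogger(__name__)

def transfer_file(source, target):
    # before: print("Starting transfer of %s to %s" % (source, target))
    logger.info("Starting transfer of %s to %s", source, target)
    try:
        ...  # do the actual copy
    except OSError:
        # before: print("ERROR: transfer failed")
        logger.exception("Transfer of %s to %s failed", source, target)
        raise
    logger.debug("Transfer of %s finished", source)

if __name__ == "__main__":
    # Configured once; a --verbose flag could map to DEBUG, default to INFO.
    logging.basicConfig(level=logging.INFO,
                        format="%(asctime)s %(name)s %(levelname)s: %(message)s")
    transfer_file("source.example:/srv/backups/dump", "target.example:/srv/backups")
```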
[14:18:22] waiting on what to do with the backup sources
[14:18:30] I was about to ask about that
[14:18:49] https://phabricator.wikimedia.org/T254871#6209872
[14:18:55] should we move things around? upgrade already in bulk?
[14:19:14] as usual, we have questions but not answers :-DDD
[14:19:41] let me see because I think we have some duplication
[14:19:45] I was about to answer that
[14:19:50] my suggestion was to upgrade many hosts in sections where db1095 lives
[14:19:53] so it is worth upgrading it
[14:19:57] (and the codfw one)
[14:21:16] here is the status
[14:21:38] we have db1139 with 10.1 and s1 and s6
[14:22:01] and db1140 with 10.4 and s1 and s6
[14:22:07] both are working fine
[14:22:18] I also requested one more host per dc
[14:22:41] we can play with that
[14:22:57] the issue is the --prepare, which has to happen on a version higher or equal
[14:23:15] jynus: keep in mind that you'd still have this: https://phabricator.wikimedia.org/T252512#6210863
[14:23:16] although we have backup[12]002
[14:23:27] so that's another host we have to play with for a few days
[14:23:34] (as db1102 will be relocated somewhere else)
[14:23:39] so that is some extra room
[14:23:49] and I still owe you a host for https://phabricator.wikimedia.org/T253217
[14:23:57] I didn't see that comment, I may have been away
[14:24:40] should we wait until the extra dbprov and source hosts are bought or should we work with what we have?
[14:24:40] no worries at all
[14:24:44] can we wait?
[14:25:17] so db1102 needs to be replaced by that host, that host will be a backup source anyways
[14:25:17] should we wait until more upgrades happen overall?
[14:25:21] so you can provision it anytime
[14:25:24] yes
[14:25:39] but that would only give us 4 extra sections duplicated
[14:25:45] not 9
[14:25:51] You lost me
[14:26:02] we have s1 and s6 on both 10.4 and 10.1
[14:26:07] we can have 2 extra ones
[14:26:16] e.g. x1 + other
[14:26:27] what does db1095 hold at the moment?
[14:26:30] x1 and?
[14:26:42] s2 and s3
[14:26:57] right, so x1 is fully upgraded (apart from the masters)
[14:27:04] oh
[14:27:07] including source?
[14:27:09] should I upgrade more hosts in s2 and s3?
[14:27:13] no no, not the backup sources
[14:27:15] ah
[14:27:29] so I can move s2 and s3 to the new host
[14:27:37] and go for full x1 on 10.4
[14:27:37] the question I asked earlier was: is it worth upgrading db1095 already given that lots of s2, s3 and x1 hosts are running 10.4 already?
[14:27:52] or should I keep upgrading till it is worth upgrading dbprov hosts too?
[14:28:06] and do a master failover (x1)?
[14:28:10] yep
[14:28:17] or do you plan to do another first?
[14:28:27] I am going to do m1, es5 and I wanted to do x1 too
[14:28:37] > "should I keep upgrading till it is worth upgrading dbprov hosts too?" ---> don't know
[14:28:46] we can play with backup1002 for some time
[14:28:54] but not for a long time
[14:28:58] right now the following roles cannot be upgraded: primary masters, sanitarium masters, sanitarium and labs
[14:29:03] as we are blocked on labs running 10.1
[14:29:10] ah, that is another blocker
[14:29:18] and of course candidate masters cannot be upgraded
[14:29:44] I think labs should be "solved" first
[14:29:57] I think it makes no sense to discuss sources and others until that is fixed
[14:30:12] well, x1
[14:30:15] but not upgrading the sources makes it impossible to upgrade x1
[14:30:16] and misc
[14:30:26] but that is easy to solve
[14:30:40] I move s2 and s3 elsewhere
[14:30:44] m1 is almost fully upgraded (only pending the master)
[14:30:49] and backup to backup1002
[14:30:54] then we do a full upgrade
[14:31:11] we can do that now
[14:31:17] db1117 is also upgraded
[14:32:01] so let's do that
[14:32:04] upgrade m1
[14:32:08] then x1
[14:32:14] then blocked on labsdb
[14:32:19] for the most part
[14:32:25] no, upgrade m1, then es5 and then x1
[14:32:29] ok
[14:32:35] so you have more days
[14:32:39] anything except x*
[14:32:41] to play around with moving the instances and all that
[14:32:42] s*
[14:33:00] we haven't tested switchover on 10.4 in production
[14:33:17] I tested it with the testing hosts
[14:33:24] and no issue?
[14:33:35] nope
[14:33:49] ok, give me that extra host and I will prepare x1
[14:34:00] sure, that host is waiting for you
[14:34:02] then we will rethink the others when we have more hw
[14:34:11] I had totally missed that
[14:34:21] keep in mind that db1145 does need to go at least to s4
[14:34:26] as it was purchased for SDC
[14:34:38] so ideally it needs to replace whatever db1102 has
[14:34:50] now I got lost
[14:35:01] I thought the new ones were for s4
[14:35:05] right, so db1145 is a new host bought for SDC
[14:35:07] not the old ones
[14:35:09] oh
[14:35:17] what about the ones being replaced?
[14:35:31] db1102 is the one being replaced by db1145
[14:35:31] can I get one instead?
[14:35:42] yes, db1102 is the one we have to give you back too
[14:35:45] so you can keep it
[14:35:57] ok
[14:36:08] let me update this task: https://phabricator.wikimedia.org/T253217
[14:36:09] so that is the same, we just move the assignment
[14:36:14] of the sections
[14:36:16] the usage is the same
[14:36:17] yes, but db1102 has smaller disks
[14:36:25] that is why I was saying that db1145 must go to s4
[14:36:30] (and wherever else)
[14:36:34] ok, gotcha
[14:36:36] we can do that
[14:36:47] 10DBA: Relocate "old" s4 hosts - https://phabricator.wikimedia.org/T253217 (10Marostegui)
[14:36:49] db1102 s4 -> db1145
[14:37:21] so the easier thing would be to mimic db1145 with whatever db1102 has and then use db1102 for whatever you want
[14:37:37] yes
[14:38:27] then I upgrade db1102
[14:38:34] and put x1 there
[14:38:53] and backup snapshots of it to backup1002
[14:39:00] what about codfw?
[14:39:17] we don't touch it yet?
[14:39:20] same problem :)
[14:39:40] x1 codfw is all done except the backup source
[14:39:43] but we will have a host?
[14:40:07] an extra one? no, we don't
[14:40:20] we didn't have to buy as many there as those were kinda new and had bigger disks
[14:40:50] if you need one, I can probably give you one from https://phabricator.wikimedia.org/T253217
[14:41:19] but I'd rather not and maybe use db2102 for that
[14:41:23] (core test)
[14:41:57] or was that the backup testing host?
[14:42:39] db2102 is the codfw backup testing host
[14:43:19] db2101 MariaDB Replica Lag: x1
[14:43:24] but x1 is already on its own
[14:43:29] the backup source
[14:43:30] ah true
[14:43:31] yes
[14:43:36] so we can just upgrade it
[14:43:48] excellent
[14:43:56] you take care of it?
[14:43:58] the only place we are blocked
[14:44:01] or you want me to?
[14:44:04] I can do it
[14:44:08] <3
[14:44:10] is space for snapshots
[14:44:24] we have backup1002 and backup2002
[14:44:35] but we really would need those extra dbprovs
[14:44:43] it is not ok long term
[14:45:07] for which Q of next FY have you scheduled them
[14:45:08] ?
[14:45:14] Q1
[14:45:24] sweet
[14:45:35] they would be on 10.4 from the start
[14:45:47] and we would have the 1 extra server room we would need
[14:45:57] good good
[14:46:13] could we land the labs db unblocking by then, you think?
[14:46:18] I know it doesn't depend on you...
[14:46:33] in q1?
[14:46:36] yes
[14:46:41] I don't think so
[14:46:58] I mean, from our side, setting up the multi-instance hosts might take a Q
[14:47:03] so we would be blocked by that, but at least we would not use backup1002
[14:47:08] but from wmcs side it might take a little while
[14:47:31] but that would put us beyond Q2 for a full upgrade
[14:47:39] well, more than that
[14:47:46] Yep
[14:47:49] to start working on it in Q2
[14:48:04] I would be surprised if in Q1 we have deprecated the old labsdb hosts
[14:48:26] Again, from our side I think it is "fast"
[14:48:36] should we test 10.4 -> 10.1 replication?
[14:48:44] outside of labsdb?
[14:49:01] because I predict that to be a huge pain
[14:49:06] we could, but I would still feel a bit scared
[14:49:18] I mean
[14:49:24] especially cause we have replication filters and triggers involved, and if they fail that's a big thing
[14:49:27] once the alternative is working
[14:49:33] ah
[14:49:40] once the "new" labsdb are up?
[14:49:55] because otherwise we would be blocked on that forever
[14:50:25] yes, this blocks all the final upgrades
[14:50:28] e.g. do you see the current servers going away in less than a year?
[14:50:30] and it has been communicated
[14:50:36] jynus: Yes
[14:50:40] not replacing them
[14:50:44] I mean removing the current ones
[14:50:46] We are not replacing them
[14:50:54] yeah, I hope so
[14:50:58] buf
[14:51:03] should we bet on it :-D
[14:51:07] ?
[14:51:27] anyway
[14:51:28] In 1 year maybe they won't even be able to replicate :)
[14:51:41] I would test the replication just in case on a test host
[14:51:48] not now
[14:51:52] but at some point
[14:51:57] sure
[14:52:03] ok
[14:52:04] I doubt it will work
[14:52:12] so let's go over the plan
[14:52:17] especially with RBR, triggers and all that :)
[14:52:25] please review so we are all ok and it makes sense
[14:53:14] 1) I upgrade db2101 (x1) to 10.4
[14:53:24] and send snapshots to backup2002
[14:53:46] (I will keep dumps on the same dbprov)
[14:54:52] 2) I move db1102 (s4) to db1145
[14:55:17] do I move s5, which is also on db1102, to it or somewhere else?
[14:55:19] would you also move the other section db1102 holds?
[14:55:23] he he
[14:55:33] I would move it to db1145 just to make things easier
[14:55:37] ok
[14:55:39] and then use db1102 wherever you like
[14:55:48] 2) I move db1102 (s4, s5) to db1145
[14:56:24] db1145 will be on stretch
[14:56:31] sounds good
[14:56:44] 3) I put x1 on db1102 (buster)
[14:57:13] (I then can remove it from db1095)
[14:57:33] 4) I backup x1 from db1102 to backup1002
[14:57:46] something else?
[14:57:57] I think that's all yeah
[14:58:02] in Q1
[14:58:10] we will get dbprov[12]003
[14:58:25] and will be on buster directly and return snapshots from buster to it
[14:58:42] plus we will get an extra backup source also on buster for extra flexibility
[14:58:57] and at that point backups will not be a blocker
[14:59:23] I will copy that as an answer to the ticket
[14:59:31] and will get to it
[14:59:36] cool!
[14:59:38] thank you
[15:02:20] 10DBA: Upgrade x1 databases to Buster and Mariadb 10.4 - https://phabricator.wikimedia.org/T254871 (10jcrespo) This is the plan after a conversation on IRC: ` 1) I upgrade db2101 (x1) to 10.4 and send snapshots to backup2002 2) I move db1102 (s4, s5) to db1145 (stretch) (I will ke...
[15:03:04] 10DBA: Productionize db114[1-9] - https://phabricator.wikimedia.org/T252512 (10jcrespo) Sorry, I hadn't seen this question, I will take care of this following discussion at T254871#6244800
[15:03:12] sorry I didn't see that
[15:03:18] no worries
[15:03:30] if you ping me while i am offline and see that I don't respond
[15:03:50] it is likely it got buried below a pile of other mails/ticket updates
[15:04:00] haha don't worry
[15:04:04] *I don't respond after I come back
[15:04:19] I know the feeling! :)
[15:04:31] I wasn't in a rush, so I didn't ping you after it
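The plan recorded on T254871 above is shaped by the constraint mentioned at 14:22:57: `mariabackup --prepare` has to run with a MariaDB version at least as high as the server that produced the backup, which is why snapshot preparation moves to the 10.4 hosts before the remaining sources are upgraded. A rough sketch of such a guard, assuming the backup directory contains the `xtrabackup_info` file that mariabackup writes (with a `server_version` line); the paths, the banner parsing and the invocation are assumptions, not WMF's actual backup tooling.

```python
# Rough sketch of the "--prepare needs a version >= the source server" guard
# discussed at 14:22:57. Paths and parsing are assumptions; the real backup
# pipeline uses its own tooling rather than a script like this.
import re
import subprocess

def version_tuple(text):
    """Extract the first x.y.z version found in a string, e.g. '10.4.13'."""
    m = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if not m:
        raise ValueError(f"no version found in: {text!r}")
    return tuple(int(part) for part in m.groups())

def backup_server_version(backup_dir):
    # mariabackup records the source server version in xtrabackup_info.
    with open(f"{backup_dir}/xtrabackup_info") as f:
        for line in f:
            if line.startswith("server_version"):
                return version_tuple(line)
    raise ValueError("server_version not found in xtrabackup_info")

def local_mariabackup_version():
    # The version banner may go to stderr, so capture both streams.
    out = subprocess.run(["mariabackup", "--version"],
                         capture_output=True, text=True)
    return version_tuple(out.stdout + out.stderr)

def prepare(backup_dir):
    if local_mariabackup_version() < backup_server_version(backup_dir):
        raise RuntimeError("mariabackup is older than the backup's source server; "
                           "run --prepare on a host with a version >= the source")
    subprocess.run(["mariabackup", "--prepare", f"--target-dir={backup_dir}"],
                   check=True)

prepare("/srv/backups/snapshots/latest.dir")  # hypothetical path
```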
[15:05:31] one last thing
[15:05:56] oh, I see the confusion
[15:06:06] db1102 is not backup testing, it is a backup source
[15:06:32] what about backup testing, is that coming back (I don't need it now)?
[15:07:22] let me update it
[15:07:23] 10DBA: Relocate "old" s4 hosts - https://phabricator.wikimedia.org/T253217 (10jcrespo)
[15:07:28] I did that one
[15:07:32] ah
[15:07:32] 10DBA: Relocate "old" s4 hosts - https://phabricator.wikimedia.org/T253217 (10Marostegui)
[15:07:32] hehe
[15:07:41] 10DBA, 10Data-Services, 10User-Ladsgroup, 10cloud-services-team (Kanban): Prepare and check storage layer for shnwiktionary - https://phabricator.wikimedia.org/T256010 (10Nintendofan885)
[15:07:44] but I am asking about one host that got pooled into s8
[15:07:51] it was on section test-s1
[15:08:02] jynus: yep, let's take one of those from the list
[15:08:07] not yet sure which one
[15:08:16] ok, so it will be one of those, right?
[15:08:20] kormat: want to handle that new wiki creation? ^
[15:08:22] jynus: yep
[15:08:31] ok, no need for now, but just good to know
[15:09:02] marostegui: yeah sure. it would beat dealing with puppet/pcc.
[15:09:43] 10DBA: Relocate "old" s4 hosts - https://phabricator.wikimedia.org/T253217 (10jcrespo)
[15:09:50] I've put db1081 (back to backup testing test-s1?)
[15:09:57] but I won't touch any of them for now
[15:10:02] but I may need one at some point
[15:10:30] to improve/test the recovery
[15:11:08] sounds good
[15:11:27] I will be able to give you db1084 before
[15:11:44] or actually now that I think about it
[15:11:49] On Thursday I will give you db1135
[15:11:50] :)
[15:11:52] up to you
[15:11:57] as that one will go away from m1
[15:11:57] db1135?
[15:11:59] I will ping you
[15:12:08] is that "mine"?
[15:12:13] it will be :)
[15:12:19] or do you just mean it will be free?
[15:12:19] don't know
[15:12:26] because I just need one :-D
[15:12:33] Yeah, either db1084 or db1135
[15:12:41] ok, will put it on db1084
[15:12:44] but yeah, there will be one
[15:12:44] sure
[15:12:46] I chose the first one thinking
[15:12:54] it will be the worst one
[15:12:59] as I don't need performance
[15:13:03] let me change it
[15:13:55] I just checked, you can take db1084 now if you want
[15:14:01] 10DBA: Relocate "old" s4 hosts - https://phabricator.wikimedia.org/T253217 (10jcrespo)
[15:14:21] I've marked it, will ping you when I repuppetize it
[15:14:41] 10DBA: Relocate "old" s4 hosts - https://phabricator.wikimedia.org/T253217 (10Marostegui)
[15:14:43] cool
[15:14:53] you'd need to depool it
[15:15:07] as it is serving traffic at the moment
[15:24:12] 10DBA: Upgrade x1 databases to Buster and Mariadb 10.4 - https://phabricator.wikimedia.org/T254871 (10jcrespo) a:03jcrespo
[15:27:06] 10DBA, 10Operations, 10ops-eqiad: db1088 crashed - https://phabricator.wikimedia.org/T255927 (10wiki_willy) a:03Jclark-ctr @Jclark-ctr - I think there are some bbu's leftover from the last time you requested some spares to be ordered, but let me know if not. Thanks, Willy
[15:29:42] kormat: thanks! so far it doesn't require any action as we are waiting for them to create the DB, but if you can let them know to ping us once it is created, move it to the blocked column etc that'd be nice - thank you!
[15:30:09] kormat: we also need to know if it is a public one or not
[15:30:17] the name indicates so, but let's ask
[15:30:57] 10DBA: Compress enwiki InnoDB tables - https://phabricator.wikimedia.org/T254462 (10Marostegui)
[15:33:28] 10DBA, 10Data-Services, 10User-Ladsgroup, 10cloud-services-team (Kanban): Prepare and check storage layer for shnwiktionary - https://phabricator.wikimedia.org/T256010 (10Kormat) Hey. Let me know when the DB has been created. Also, can you confirm this will be a public wiki?
[15:33:41] marostegui: done
[15:33:46] thank you! :)
[15:33:52] you're almost welcome
[15:34:08] I think I should log off! it's been enough for today!
[15:34:10] byeee
[15:34:31] hehe o7
[15:36:33] 10DBA, 10Data-Services, 10User-Ladsgroup, 10cloud-services-team (Kanban): Prepare and check storage layer for shnwiktionary - https://phabricator.wikimedia.org/T256010 (10RhinosF1) >>! In T256010#6244973, @Kormat wrote: > Hey. Let me know when the DB has been created. The wiki was created today >Also, can...
[15:53:44] 10DBA, 10Data-Services, 10User-Ladsgroup, 10cloud-services-team (Kanban): Prepare and check storage layer for shnwiktionary - https://phabricator.wikimedia.org/T256010 (10Kormat) Sanitization is in place, complete private data check running now.
[18:03:47] if you agree I'd say to start Wed. directly with the new system, so that when DCOps wakes up they can start using it already; I can help if there is any issue, and Cas will take over helping for the second part of the day
[18:04:00] oops, wrong window, sorry
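The last storage-layer step for shnwiktionary mentioned at 15:53:44 is the private data check on the sanitized hosts: once the replication filters and triggers are in place, sensitive columns such as `user.user_password`, `user.user_email` and `user.user_token` must be empty or NULL on the hosts that feed the cloud replicas. The sketch below is only a simplified illustration of that idea, not the actual WMF check script; the host name, credentials and column list are assumptions.

```python
# Simplified illustration of a "private data" check for a newly created wiki:
# on a sanitized host, sensitive user columns must be blank or NULL. This is
# not the actual WMF check script; host, credentials and columns are assumed.
import pymysql

SENSITIVE_COLUMNS = ["user_password", "user_newpassword", "user_email", "user_token"]

def leaked_rows(host, wiki_db):
    conn = pymysql.connect(host=host, user="check", password="...", database=wiki_db)
    try:
        with conn.cursor() as cur:
            conditions = " OR ".join(
                f"({col} IS NOT NULL AND {col} <> '')" for col in SENSITIVE_COLUMNS
            )
            cur.execute(f"SELECT COUNT(*) FROM user WHERE {conditions}")
            return cur.fetchone()[0]
    finally:
        conn.close()

count = leaked_rows("sanitized-host.example", "shnwiktionary")
print("OK: no private data found" if count == 0
      else f"FAIL: {count} rows with private data")
```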