[00:31:47] PROBLEM - MariaDB sustained replica lag on pc2008 is CRITICAL: 2.2 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2008&var-port=9104
[00:36:29] RECOVERY - MariaDB sustained replica lag on pc2008 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2008&var-port=9104
[01:22:47] PROBLEM - MariaDB sustained replica lag on pc2008 is CRITICAL: 2.8 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2008&var-port=9104
[01:25:09] RECOVERY - MariaDB sustained replica lag on pc2008 is OK: (C)2 ge (W)1 ge 0.2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2008&var-port=9104
[04:46:33] Urbanecm: I was off, thanks for the ping. With the comments on that task for that specific table I didn't get the impression that table would be much of a problem with the expected writes and reads you mentioned, of course, if you could give more precise numbers I can evaluate again :)
[05:06:00] DBA: fa_deleted_timestamp and fa_timestamp are binary(14) in code but varbinary(14) in production - https://phabricator.wikimedia.org/T277116 (Marostegui) Come up with the alter table statement and identify which hosts really need it in all the sections apart from s1 (as far as I remember your script only ch...
[05:06:20] DBA: iw_url in interwiki is varbinary(127) in production but blob in code - https://phabricator.wikimedia.org/T277118 (Marostegui) Come up with the alter table statement and identify which hosts really need it in all the sections apart from s1 (as far as I remember your script only checked certain hosts but...
[05:08:31] DBA, SRE, Wikimedia-Mailing-lists: Create production databases for mailman3 - https://phabricator.wikimedia.org/T278614 (Marostegui) >>! In T278614#6985029, @Ladsgroup wrote: > With the wikitech-l imported my last offer is now: 34GB. This is a pretty decent size and 4GB per year is also ok with the...
[05:24:28] DBA, Platform Engineering, SRE, Performance-Team (Radar), Sustainability (Incident Followup): Appservers latency spike / parser cache growth 2021-03-28 - https://phabricator.wikimedia.org/T278655 (Marostegui) Thanks @Krinkle! Yes, we are well aware of the trends parsercache is having lately a...
[05:32:34] DBA, Platform Engineering, SRE, Performance-Team (Radar), Sustainability (Incident Followup): Appservers latency spike / parser cache growth 2021-03-28 - https://phabricator.wikimedia.org/T278655 (Krinkle) Open→Resolved 👍
[05:47:40] DBA, SRE, Wikimedia-Mailing-lists: Create production databases for mailman3 - https://phabricator.wikimedia.org/T278614 (Ladsgroup) Thanks. We will likely bother you in two or three weeks. Most of the work is done.
[05:53:42] DBA: fa_deleted_timestamp and fa_timestamp are binary(14) in code but varbinary(14) in production - https://phabricator.wikimedia.org/T277116 (Ladsgroup) >>! In T277116#6992885, @Marostegui wrote: > Come up with the alter table statement The alter table is similar to {T268392}: `lang=sql ALTER TABLE /*_*/fi...
[05:56:22] DBA: iw_url in interwiki is varbinary(127) in production but blob in code - https://phabricator.wikimedia.org/T277118 (Ladsgroup) The alter table would be: ` ALTER TABLE /*_*/interwiki MODIFY iw_url BLOB NOT NULL; ` the sections would be s1,s2,s3,s4,s6,s7
[07:03:13] DBA, Data-Services: labsdb1009:s2, replication broken - https://phabricator.wikimedia.org/T279848 (Marostegui) a:Marostegui It's been a bit of a pain to fix these drifts. It was a huge transaction involving `recentchanges` and `ores_classification`.
5 rows were missing on `recentchanges` and 27 on `or...
[07:04:26] DBA: iw_url in interwiki is varbinary(127) in production but blob in code - https://phabricator.wikimedia.org/T277118 (Marostegui)
[07:04:51] DBA, SRE, Wikimedia-Mailing-lists: Create production databases for mailman3 - https://phabricator.wikimedia.org/T278614 (Marostegui) Excellent!
[07:08:28] DBA: fa_deleted_timestamp and fa_timestamp are binary(14) in code but varbinary(14) in production - https://phabricator.wikimedia.org/T277116 (Marostegui)
[07:43:45] marostegui: hi! I'm going to be working on figuring out a date for the DC switchover soon and wanted to check in with you if there were any requirements/preferences on scheduling from the DBA side
[07:44:30] marostegui: thanks for the response. For arabic wikipedia, the number of writes would be ~13k writes per month. For number of reads, that's more tough to estimate. It's going to be cached in memcached for a day, and needed on every pageview by every newcomer. A really ballpark estimate: ~26k reads per month for arwiki?
[07:51:43] legoktm: we'll talk about it today in our weekly and will come back to you!
[07:52:03] great, thanks :)
[07:52:09] Urbanecm: 26k rows or statements? :)
[07:52:58] marostegui: both, the read queries will mostly select rows by primary key.
[07:56:54] (note that both reads and writes will probably double in foreseeable future [~3 months])
[07:57:23] Urbanecm: so around 26k rows per month with perhaps 50k?
[07:57:34] yup
[07:57:44] (for arwiki)
[07:58:07] Urbanecm: how many wikis will it have in the end?
[07:59:09] legoktm: to confirm, this is MW switchover right not misc services (ie: gerrit, phab...)
[07:59:27] marostegui: right now, we have the features deployed on 36 wikis, but we're deploying the features to more and more wikis
[07:59:44] (with the ultimate goal being the features available everywhere, but that's a long run)
[08:00:12] marostegui: ummm, I think it's switching over as many services as possible? At least what we did in the last switchover.
[08:00:13] Urbanecm: Got it, so that 26k is only for arwiki, so the rest of the wikis will have a similar number?
[08:00:26] legoktm: last switch didn't involve misc stuff, only MW related
[08:01:20] Ack, then I assume that's the plan but I will also ask in our meeting to verify :)
[08:01:44] legoktm: I would assume the same yeah, but let's double check. Thank you!
[08:02:48] marostegui: when estimating the reads, i doubled the number of new accounts registered on arwiki (one of the biggest wikis we target right now), see https://stats.wikimedia.org/#/ar.wikipedia.org/contributing/new-registered-users/normal|bar|2-year|~total|monthly.
[08:03:41] Urbanecm: I am not too worried about the reads if they are selecting by PK...how many rows will they return?
[08:03:45] (more or less)
[08:03:54] one or two
[08:04:14] Blocked-on-schema-change, DBA: Drop default of rc_timestamp - https://phabricator.wikimedia.org/T276156 (Marostegui)
[08:04:16] Urbanecm: ah, then that's fine
[08:04:29] Blocked-on-schema-change, DBA: Drop default of rc_timestamp - https://phabricator.wikimedia.org/T276156 (Marostegui) Open→Resolved This is all done
[08:05:00] Blocked-on-schema-change, DBA: Schema change to make rc_id unsigned and rc_timestamp BINARY - https://phabricator.wikimedia.org/T276150 (Marostegui)
[08:05:39] DBA: Switchover s1 from db1083 to db1163 - https://phabricator.wikimedia.org/T278214 (Marostegui)
[08:05:41] DBA: Switchover s7 from db1086 to db1136 - https://phabricator.wikimedia.org/T274336 (Marostegui)
[08:05:45] Blocked-on-schema-change, DBA: Schema change to make rc_id unsigned and rc_timestamp BINARY - https://phabricator.wikimedia.org/T276150 (Marostegui) Open→Stalled This is all done and only waiting for the masters to be done, either master switchover or DC switchover.
[08:06:08] marostegui: okay, good. There will be a small number of reads (100-200 per month perhaps?) that will select rows using the other index (gemm_mentor_mentee), in case that changes something.
[08:06:20] (*there will also be)
[08:06:24] Urbanecm: that's no problem yeah
[08:06:45] okay, cool. So I'll create it on x1 when the code arrives to production. Thanks a lot!
[08:07:09] Urbanecm: I have a code review pending related to a private table, is this related? (I haven't looked at it yet)
[08:08:25] marostegui: no, https://gerrit.wikimedia.org/r/c/operations/puppet/+/675185 (the table we were discussing now) is already merged.
There's https://gerrit.wikimedia.org/r/c/operations/puppet/+/677653 pending, but that's for another table
[08:09:05] ah cool, so the first one is all done, including restarting sanitarium hosts
[08:09:09] I will look at the second one today
[08:09:44] thanks :)
[08:15:55] DBA, GrowthExperiments-MentorDashboard, Growth-Team (Current Sprint), MW-1.37-notes (1.37.0-wmf.1; 2021-04-13), and 2 others: Create database table to cache data about mentees - https://phabricator.wikimedia.org/T279587 (Marostegui) >>! In T279587#6984196, @Urbanecm_WMF wrote: > Sure, here you go...
[08:17:58] DBA, Add-Link, Growth-Team (Current Sprint): Grant ALTER privileges to adminlinkrecommendation user on m2 - https://phabricator.wikimedia.org/T279053 (kostajh) Stalled→Open >>! In T279053#6971659, @kostajh wrote: > @Marostegui and now that I am thinking about it some more, let me regenerate t...
[08:25:30] DBA, GrowthExperiments-MentorDashboard, Growth-Team (Current Sprint), MW-1.37-notes (1.37.0-wmf.1; 2021-04-13), and 2 others: Create database table to cache data about mentees - https://phabricator.wikimedia.org/T279587 (Marostegui) All sanitarium hosts have been restarted
[08:26:07] DBA, Add-Link, Growth-Team (Current Sprint): Grant ALTER privileges to adminlinkrecommendation user on m2 - https://phabricator.wikimedia.org/T279053 (Marostegui) a:Marostegui
[08:28:19] DBA, GrowthExperiments-MentorDashboard, Growth-Team (Current Sprint), MW-1.37-notes (1.37.0-wmf.1; 2021-04-13), and 2 others: Create database table to cache data about mentees - https://phabricator.wikimedia.org/T279587 (Urbanecm_WMF) Thanks!
[08:29:53] marostegui: as a heads up, we are currently getting login failure for the adminlinkrecommendation user. Is it possible the user is blocked?
If not I can look at our application code + deployment chart to see if we messed something up
[08:30:47] kostajh: I've not made any changes to the user yet
[08:30:59] kostajh: Which error do you get?
[08:31:19] MySQLdb._exceptions.OperationalError: (1044, "Access denied for user
[08:31:41] kostajh: when was the last time it worked fine?
[08:31:48] We've not made changes to that user as far as I know
[08:31:57] I didn't know if the repeated failed attempts at using ALTER might have invoked some mechanism to suspend login for the user
[08:32:08] kostajh: nope, it doesn't
[08:32:18] Probably about a week ago. It's most likely a change we made to the MySQL connection code, will have a look
[08:32:27] Can't reproduce it locally ofc :)
[08:32:30] kostajh: same host it normally runs too?
[08:33:54] marostegui: https://logstash.wikimedia.org/app/discover#/doc/logstash-*/logstash-syslog-2021.04.13?id=6XM_yngBfVMx58vqDLyJ
[08:35:21] GRANT USAGE ON *.* TO `adminlinkrecommendation`@`10.64.0.135` IDENTIFIED BY PASSWORD 'xxxx'
[08:35:24] so that host is indeed allowed
[08:35:41] GRANT SELECT, INSERT, DELETE, CREATE, DROP ON `mwaddlink`.* TO `adminlinkrecommendation`@`10.64.0.135`
[08:36:18] DBA, Patch-For-Review: Failover m1 master: db1080 -> db1159 - https://phabricator.wikimedia.org/T276448 (jcrespo) >>! In T276448#6982943, @Marostegui wrote: >>>! In T276448#6982841, @Marostegui wrote: >> @jcrespo I would like to do this Wednesday 14th April - is this a good day or will it mess up with th...
[08:36:48] DBA, Patch-For-Review: Failover m1 master: db1080 -> db1159 - https://phabricator.wikimedia.org/T276448 (Marostegui) >>! In T276448#6993282, @jcrespo wrote: >>>! In T276448#6982943, @Marostegui wrote: >>>>! In T276448#6982841, @Marostegui wrote: >>> @jcrespo I would like to do this Wednesday 14th April -...
[08:40:02] DBA, Add-Link, Growth-Team (Current Sprint), Patch-For-Review: Grant ALTER privileges to adminlinkrecommendation user on m2 - https://phabricator.wikimedia.org/T279053 (Marostegui) We are debugging some connection issues, I won't merge the above patch until that is figured out, to avoid adding mo...
[08:40:21] DBA, Patch-For-Review: Failover m1 master: db1080 -> db1159 - https://phabricator.wikimedia.org/T276448 (jcrespo) As long as it is not too early in the morning, 14 will be ok. We may want to do it late in the morning so etherpad and other owners are around? So it should be ok as long we we merge the patc...
[08:41:36] DBA, Patch-For-Review: Failover m1 master: db1080 -> db1159 - https://phabricator.wikimedia.org/T276448 (Marostegui) What about 10UTC? Would that work for backups? I will ping other owners if this works for you
[08:55:53] DBA, Patch-For-Review: Failover m1 master: db1080 -> db1159 - https://phabricator.wikimedia.org/T276448 (jcrespo) >>! In T276448#6993299, @Marostegui wrote: > What about 10UTC? Would that work for backups? I will ping other owners if this works for you Sure.
[08:59:02] DBA, Patch-For-Review: Failover m1 master: db1080 -> db1159 - https://phabricator.wikimedia.org/T276448 (Marostegui) Thank you Jaime. @akosiaris would be available tomorrow at around 10 AM UTC in case we need to restart etherpad? @jbond @MoritzMuehlenhoff ok to restart mysql from `cas` and `pki` point o...
[09:00:27] DBA, Patch-For-Review: Failover m1 master: db1080 -> db1159 - https://phabricator.wikimedia.org/T276448 (ayounsi) Affirm.
[09:00:34] DBA, Patch-For-Review: Failover m1 master: db1080 -> db1159 - https://phabricator.wikimedia.org/T276448 (akosiaris) >>! In T276448#6993361, @Marostegui wrote: > Thank you Jaime. > > @akosiaris would be available tomorrow at around 10 AM UTC tomorrow 14th April in case we need to restart etherpad?
Yes
[09:00:38] DBA, Patch-For-Review: Failover m1 master: db1080 -> db1159 - https://phabricator.wikimedia.org/T276448 (MoritzMuehlenhoff) >>! In T276448#6993361, @Marostegui wrote: > @jbond @MoritzMuehlenhoff ok to restart mysql from `cas` and `pki` point of view tomorrow 14th April? Sounds good
[09:00:59] DBA, Patch-For-Review: Failover m1 master: db1080 -> db1159 - https://phabricator.wikimedia.org/T276448 (jbond) No problem for Cas and ok I
[09:01:15] DBA, Patch-For-Review: Failover m1 master: db1080 -> db1159 - https://phabricator.wikimedia.org/T276448 (Marostegui) Thank you all! @ayounsi XDDDDD
[09:01:37] DBA, Patch-For-Review: Failover m1 master: db1080 -> db1159 Wed 14th April at 10 AM UTC - https://phabricator.wikimedia.org/T276448 (Marostegui)
[09:28:48] DBA, Patch-For-Review: Failover m1 master: db1080 -> db1159 Wed 14th April at 10 AM UTC - https://phabricator.wikimedia.org/T276448 (Kormat)
[09:30:09] DBA, Patch-For-Review: Failover m1 master: db1080 -> db1159 Wed 14th April at 10 AM UTC - https://phabricator.wikimedia.org/T276448 (Marostegui)
[09:40:09] jayme: marostegui was wondering if you could run this command in the cronjob container https://phabricator.wikimedia.org/P15279
[09:54:37] kostajh: it's unfortunately not completely trivial to run commands in a cronjob container those are not long-living. I'll basically need to create something that looks like one of your cronjob containers to do so.
[09:55:38] means: will take some minutes
[09:55:43] jayme: basically, we don't know whether it is a mysql issue or the script itself. The user/password works fine from cumin1001 when connecting to m2-master-eqiad.wmnet, so we cannot see why or how it can be given access denied from that specific host (if it uses m2-master.eqiad.wmnet, which it should be using)
[09:56:06] the user hasn't been touched since it was created
[09:56:16] same for the database behind m2-master.eqiad.wmnet
[09:57:19] Okay.
I'll take a closer look in a minute. Do you have host restrictions on that user or is that purely firewall based (so hitting that should not lead to access denied)?
[09:57:36] if it is giving access denied, it means it is reaching port 3306 fine
[09:58:07] the restriction is based on hosts, so that's why it needs to connect to m2-master.eqiad.wmnet which is a dbproxy that has access to the database behind it
[09:58:25] connecting to the database itself (ie, connecting to db1107 and not via dbproxy) will result in access denied
[10:01:46] ack, thanks
[10:18:34] marostegui: kostajh: I'm connected with "mysql --ssl-verify-server-cert=false -u${DB_USER} -p -h${DB_HOST}" from within the pod
[10:19:13] jayme: meeting (but it works?)
[10:20:21] marostegui: yep.. I'm also able to "use mwaddlink;"
[10:21:31] kostajh: so maybe we can narrow it to the script itself?
[10:21:51] if the CLI connection works from what jayme is saying
[10:26:27] DBA, Patch-For-Review: Productionize db21[45-52] and db11[76-84] - https://phabricator.wikimedia.org/T275633 (Marostegui)
[10:26:37] DBA, Patch-For-Review: Productionize db21[45-52] and db11[76-84] - https://phabricator.wikimedia.org/T275633 (Marostegui) db1180 is automatically being pooled in s6
[10:33:15] marostegui: kostajh: could it be that the user is just not allowed to alter the database?
[10:33:30] first thing it does is "ALTER DATABASE mwaddlink CHARACTER SET utf8mb4 COLLATE utf8mb4_bin"
[10:34:00] jayme: As far as I understood they're not being able to connect, no? I haven't added the ALTER grant yet as I thought the problem wasn't there but simply to connect (for now)
[10:34:11] I have left the ALTER grant aside for now
[10:34:23] marostegui: nono, they get an access denied
[10:34:26] MySQLdb._exceptions.OperationalError: (1044, "Access denied for user 'adminlinkrecommendation'@'10.64.0.135' to database 'mwaddlink'")
[10:34:48] jayme: yes, but is that cause they are trying to alter or simply connecting?
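The question being asked here (failed login versus missing privilege) can usually be narrowed down from the numeric server error code alone. A minimal sketch, assuming the exception carries the code as its first argument the way the MySQLdb tracebacks quoted in this log do; `classify_db_error` and the category names are hypothetical and not part of load-datasets.py. Codes 1044 and 1290 are the ones seen in this log; 1045 is included only for contrast.

```python
# MariaDB/MySQL server error codes (1044 and 1290 appear in this log).
ER_DBACCESS_DENIED = 1044            # "Access denied for user ... to database ..."
ER_ACCESS_DENIED = 1045              # "Access denied for user ... (using password: ...)"
ER_OPTION_PREVENTS_STATEMENT = 1290  # e.g. server running with --read-only


def classify_db_error(exc):
    """Map a DB-API style exception (code in args[0]) to a rough category.

    Hypothetical helper for illustration only.
    """
    code = exc.args[0] if exc.args else None
    if code == ER_ACCESS_DENIED:
        # Authentication itself failed: wrong password or host not allowed.
        return "authentication"
    if code == ER_DBACCESS_DENIED:
        # The account was accepted but lacks a privilege on this database.
        return "privilege"
    if code == ER_OPTION_PREVENTS_STATEMENT:
        # The statement was refused by a server option such as read-only.
        return "read-only"
    return "other"
```

With this reading, the 1044 in the traceback above points at a database-level privilege rather than a rejected password, which would be consistent with the CLI connection from the pod succeeding.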
[10:35:16] https://phabricator.wikimedia.org/T279053#6993297
[10:35:23] jayme: *
[10:35:24] ^
[10:35:25] marostegui: that's from the ALTER DATABASE
[10:35:33] according to the python traceback
[10:35:38] jayme: then yes, the grant isn't there :)
[10:35:57] jayme: I thought the issue was the connection, and hence I didn't add the grant yet to avoid more variables
[10:35:59] Let me add the grant
[10:37:19] jayme: I didn't think about that; I don't have the output unfortunately but we were getting a different error message earlier due to the lack of ALTER permissions
[10:37:37] here is the error for the record https://phabricator.wikimedia.org/P15279#80932
[10:38:00] ALTER added
[10:38:16] kostajh: where did you get a different error?
[10:38:56] in kubectl logs for the load-datasets container, it was showing something specifically mentioning ALTER. I don't remember, unfortunately, and it looks like I didn't save it to the task. I'll have a look in logstash
[10:39:09] We can also wait 20 minutes to see if this works on the next cron run now that ALTER permissions are there
[10:41:24] DBA, Add-Link, Growth-Team (Current Sprint), Patch-For-Review: Grant ALTER privileges to adminlinkrecommendation user on m2 - https://phabricator.wikimedia.org/T279053 (Marostegui) ALTER grant added: ` root@db1107.eqiad.wmnet[mysql]> show grants for 'adminlinkrecommendation'@'10.192.16.9'; | GRAN...
[10:42:05] DBA, Add-Link, Growth-Team (Current Sprint), Patch-For-Review: Grant ALTER privileges to adminlinkrecommendation user on m2 - https://phabricator.wikimedia.org/T279053 (kostajh) >>! In T279053#6993650, @Marostegui wrote: > ALTER grant added: > ` > root@db1107.eqiad.wmnet[mysql]> show grants for '...
[10:42:25] marostegui: I think you mentioned you wanted to make adjustments to monitoring while the initial ALTER statements are executed?
That will be in ~18 minutes from now, assuming the connection works
[10:42:48] yeah, I am going to silence the slaves
[10:42:51] thanks for the reminder!
[10:44:06] kostajh: the failed job pods will not immediately be removed. So all the logs should still be there
[10:44:32] I do only see the above error in eqiad and "MySQLdb._exceptions.OperationalError: (1290, 'The MariaDB server is running with the --read-only option so it cannot execute this statement')" in codfw though
[10:44:52] why would that try to connect to codfw?
[10:45:15] marostegui: it's the job running in codfw
[10:45:17] I think I asked the same question on ticket
[10:45:23] jayme: ah ok :)
[10:45:31] because if it does cross-dc connections, it should use TLS
[10:45:52] I assume it tries to connect to codfw and if it fails it doesn't fallback to eqiad
[10:46:31] the (cron)job actually runs in both datacenters as we keep the deployments to k8s identical
[10:46:49] jayme: but codfw doesn't try to write to eqiad, right?
[10:46:49] if it only tries to connect to the local dc, that is ok
[10:47:03] so confusing to have jayme and jaime XD
[10:47:25] I am not jayme, I am jynus!
[10:47:32] jynus: you should rename your nick to j4n1s to make it even MORE confusing
[10:48:11] the codfw job tries to write to m2-master.codfw.wmnet actually...but from the perspective of linkrecommendation it's fine if it fails AFAIK
[10:48:21] marostegui: even more confusing: you rename yourself to that
[10:48:21] jayime: good then, thanks
[10:48:28] hrhr
[10:48:29] haha
[10:48:48] i should rename my nick to jaime
[10:48:58] I tried, it is in use
[10:49:02] :_(
[10:49:16] I swear I don't block it :)
[10:55:49] DBA, Add-Link, Growth-Team (Current Sprint), Patch-For-Review: Grant ALTER privileges to adminlinkrecommendation user on m2 - https://phabricator.wikimedia.org/T279053 (kostajh) Open→Resolved
[10:56:42] the thing I offered Re: recommendation project is to help them design a better model, one that fits better their needs than a misc host
[10:56:56] e.g. maybe they need multi-dc
[10:57:14] and high availability so it doesn't stop working on every import, etc.
[10:57:38] sit with them and see if they need to buy extra hw or whatever
[11:00:52] PROBLEM - MariaDB sustained replica lag on pc2008 is CRITICAL: 2 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2008&var-port=9104
[11:01:14] marostegui jayme: it's working ...!
[11:01:24] nice
[11:01:40] cool
[11:01:42] now to see if T278719 is actually resolved
[11:01:42] T278719: load-datasets.py: Lock wait timeout exceeded; try restarting transaction - https://phabricator.wikimedia.org/T278719
[11:01:45] kostajh: ok to close https://phabricator.wikimedia.org/T279053?
[11:02:14] marostegui: yes I resolved that one a few minutes ago
[11:02:18] RECOVERY - MariaDB sustained replica lag on pc2008 is OK: (C)2 ge (W)1 ge 0.4 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2008&var-port=9104
[11:02:29] ah thanks!
[11:05:59] DBA, GrowthExperiments-MentorDashboard, Growth-Team (Current Sprint), MW-1.37-notes (1.37.0-wmf.1; 2021-04-13), User-Urbanecm_WMF (Engineering): Create database table to cache data about mentees - https://phabricator.wikimedia.org/T279587 (Urbanecm_WMF)
[11:06:21] DBA, GrowthExperiments-MentorDashboard, Growth-Team (Current Sprint), MW-1.37-notes (1.37.0-wmf.1; 2021-04-13), User-Urbanecm_WMF (Engineering): Create database table to cache data about mentees - https://phabricator.wikimedia.org/T279587 (Urbanecm_WMF)
[11:07:22] DBA, decommission-hardware, Patch-For-Review: decommission db1076.eqiad.wmnet - https://phabricator.wikimedia.org/T274752 (Marostegui)
[11:09:32] Data-Persistence-Backup, Upstream: DB backup restore skip empty databases - https://phabricator.wikimedia.org/T200035 (jcrespo)
[11:11:09] Data-Persistence-Backup, Upstream: DB backup restore skip empty databases - https://phabricator.wikimedia.org/T200035 (jcrespo) Open→Stalled Marking as stalled, as this is blocked on getting a patch from upstream.
[11:20:06] Data-Persistence-Backup: Internal APT repository backup - https://phabricator.wikimedia.org/T276220 (jcrespo) Hey, @Jbond @MoritzMuehlenhoff sorry to ping you, but this is something that you may know how to do properly- as you are involved with operational security (please correct me if wrong, and add the r...
[11:50:01] DBA, Data-Services, MW-1.36-notes (1.36.0-wmf.34; 2021-03-09), Patch-For-Review, and 2 others: Create Wikipedia Kari Seediq - https://phabricator.wikimedia.org/T276246 (Ladsgroup)
[12:49:34] DBA, SRE: Rename be_x_oldwiki database to be_taraskwiki - https://phabricator.wikimedia.org/T127570 (LSobanski) Stalled→Resolved a:LSobanski Since T83609 was declined, I don't think there is much value in keeping this task open. Please reopen and / or message me if you think otherwise.
[12:55:18] DBA, Patch-For-Review: Productionize db21[45-52] and db11[76-84] - https://phabricator.wikimedia.org/T275633 (Marostegui) Pooled db1184 into s1 with minimal weight, if all goes fine, I will start to slowly pool it automatically
[13:27:37] DBA, Platform Engineering Roadmap Decision Making, SRE, Performance-Team (Radar), User-Kormat: Remove groups from db configs - https://phabricator.wikimedia.org/T263127 (Marostegui) a:nnikkhoui→Marostegui
[13:56:55] DBA, Data-Services: labsdb1009:s2, replication broken - https://phabricator.wikimedia.org/T279848 (Marostegui) Open→Resolved labsdb1009:s2 caught up: ` # mysql.py -hlabsdb1009 -e "show slave 's2' status\G" | grep Seconds Seconds_Behind_Master: 3 ` Closing this for now
[14:29:09] DBA, Data-Services: labsdb1009:s2, replication broken - https://phabricator.wikimedia.org/T279848 (nskaggs) @Marostegui Thanks for jumping on this quickly. Hopefully this means this collection of hosts will be in a healthy state for the migration. To answer your earlier question, I would say yes. If ther...
[14:50:29] DBA, Tracking-Neverending, Wikimedia-database-error: Duplicate key errors (tracking) - https://phabricator.wikimedia.org/T106854 (LSobanski) Open→Resolved a:LSobanski As I'm told, "Tracking-Neverending" tasks are a thing of the past so I'm resolving this one. Please reach out to me if you...
[15:29:09] DBA, Epic: [META ticket] Automation for our DBs tracking task - https://phabricator.wikimedia.org/T156461 (LSobanski) Open→Resolved a:LSobanski Closing as this task is no longer used for its original purpose.
[15:38:11] DBA, Patch-For-Review: Audit MySQL configurations - https://phabricator.wikimedia.org/T133333 (LSobanski) Open→Resolved a:LSobanski Closing as this is likely to be out of sync with the current configuration (last update was almost 5 years ago).
[15:42:17] Data-Persistence-Backup: Internal APT repository backup - https://phabricator.wikimedia.org/T276220 (jbond) As far as i can tell all the necessary data is in `/srv/wikimedia` which is already being backed up via [[ https://github.com/wikimedia/puppet/blob/6f813489ba71d58287c1487587177ed5f6b7ff0c/modules/prof...
[15:46:53] Data-Persistence-Backup: Internal APT repository backup - https://phabricator.wikimedia.org/T276220 (jcrespo) a:LSobanski Thanks for the information, maybe I understood wrongly the task to do here. Assigning to @LSobanski.
[15:52:26] DBA: New database request: image_matching - https://phabricator.wikimedia.org/T280042 (gmodena)
[16:29:24] Data-Persistence-Backup, GitLab (Initialization), Patch-For-Review, User-brennen: Backups for GitLab - https://phabricator.wikimedia.org/T274463 (jcrespo)
[16:52:24] DBA, Data-Services, Toolforge, Tracking-Neverending: Certain tools users create multiple long running queries that take all memory and/or CPU from labsdb hosts, slowing it down and potentially crashing (tracking) - https://phabricator.wikimedia.org/T119601 (LSobanski) Open→Resolved a:L...
[16:53:03] DBA, observability: Move paging from individual databases to database service "groups" - https://phabricator.wikimedia.org/T252679 (LSobanski)
[17:39:03] marostegui: so the switchover will include some misc services, but I still need to come up with a full list after discussing with the relevant service owners
[17:42:19] DBA, SRE: Rename be_x_oldwiki database to be_taraskwiki - https://phabricator.wikimedia.org/T127570 (Krenair) Resolved→Declined
[17:53:56] DBA: New database request: image_matching - https://phabricator.wikimedia.org/T280042 (LSobanski) p:Triage→Medium a:Marostegui
[18:14:12] Data-Persistence-Backup, GitLab (Initialization), Patch-For-Review, User-brennen: Backups for GitLab - https://phabricator.wikimedia.org/T274463 (brennen) > E.g. if the backups will require a large amount of space we may need to provision more hardware. Long term, the amount of data we currently...
[18:32:28] Data-Persistence-Backup, GitLab (Initialization), Patch-For-Review, User-brennen: Backups for GitLab - https://phabricator.wikimedia.org/T274463 (jcrespo) > Long term, the amount of data we currently have in Gerrit is our baseline. Not sure what is the scope of this particular task, but if the a...
[19:11:33] DBA, Tracking-Neverending, Wikimedia-database-error: Duplicate key errors (tracking) - https://phabricator.wikimedia.org/T106854 (Krinkle)
[22:10:48] DBA, GrowthExperiments-MentorDashboard, GrowthExperiments-Mentorship, Growth-Team (Current Sprint), and 2 others: Create growthexperiments_mentor_mentee database table on extension1 for wikis in growthexperiments.dblist - https://phabricator.wikimedia.org/T278573 (Urbanecm_WMF) Open→Resolv...
[22:38:38] DBA, GrowthExperiments-MentorDashboard, GrowthExperiments-Mentorship, Growth-Team (Current Sprint), and 2 others: Create growthexperiments_mentor_mentee database table on extension1 for wikis in growthexperiments.dblist - https://phabricator.wikimedia.org/T278573 (Urbanecm_WMF) >>! In T278573#699...