[00:57:23] 10DBA, 10CheckUser: Monitor the growth of CheckUser tables at enwiki and few other very large wikis - https://phabricator.wikimedia.org/T267275 (10Urbanecm) >>! In T267275#6660053, @Marostegui wrote: > I want to monitor enwiki size two more weeks, as there was a big increase from one week to another. Let's see...
[03:06:41] 10DBA, 10Community-Tech, 10Expiring-Watchlist-Items: Watchlist Expiry: Release plan [rough schedule] - https://phabricator.wikimedia.org/T261005 (10dmaza) >>! In T261005#6687650, @Jdforrester-WMF wrote: > Is there a task for enabling this in MediaWiki by default (and possibly even dropping the feature flag)?...
[06:17:59] 10DBA, 10GrowthExperiments, 10Growth-Team (Current Sprint), 10Patch-For-Review, and 2 others: Slow load times for Special:Homepage on cswiki - https://phabricator.wikimedia.org/T267216 (10Marostegui) As sort of expected, fixing this optimizer bug isn't an easy thing: https://jira.mariadb.org/browse/MDEV-24...
[06:32:45] 10DBA, 10Community-Tech, 10Expiring-Watchlist-Items: Watchlist Expiry: Release plan [rough schedule] - https://phabricator.wikimedia.org/T261005 (10Marostegui) >>! In T261005#6682514, @ifried wrote: > @Marostegui As the feature has been enabled on all wikis for a week now, shall we close this ticket? Thanks...
[06:34:38] 10DBA, 10CheckUser: Monitor the growth of CheckUser tables at enwiki and few other very large wikis - https://phabricator.wikimedia.org/T267275 (10Marostegui) >>! In T267275#6687758, @Urbanecm wrote: >>>! In T267275#6660053, @Marostegui wrote: >> I want to monitor enwiki size two more weeks, as there was a big...
[06:37:36] 10DBA, 10CheckUser: Monitor the growth of CheckUser tables at enwiki and few other very large wikis - https://phabricator.wikimedia.org/T267275 (10Marostegui)
[06:44:00] 10DBA, 10CheckUser: Monitor the growth of CheckUser tables at enwiki and few other very large wikis - https://phabricator.wikimedia.org/T267275 (10Marostegui)
[06:49:05] 10DBA, 10Community-Tech, 10Expiring-Watchlist-Items: Monitor the growth of watchlist table at wikidata and wikicommons - https://phabricator.wikimedia.org/T268096 (10Marostegui)
[06:53:56] 10DBA, 10Operations, 10ops-eqiad: Degraded RAID on es1023 - https://phabricator.wikimedia.org/T268796 (10Marostegui) RAID back to optimal ` root@es1023:~# megacli -LDPDInfo -aAll Adapter #0 Number of Virtual Disks: 1 Virtual Drive: 0 (Target Id: 0) Name : RAID Level : Primary-1, Sec...
[08:04:21] 10DBA, 10Data-Services, 10Patch-For-Review, 10cloud-services-team (Kanban): Productionize clouddb10[13-20] - https://phabricator.wikimedia.org/T267090 (10Marostegui)
[08:05:49] 10DBA, 10Data-Services, 10Patch-For-Review, 10cloud-services-team (Kanban): Productionize clouddb10[13-20] - https://phabricator.wikimedia.org/T267090 (10Marostegui)
[08:42:22] 10DBA, 10MediaWiki-General, 10MW-1.35-notes (1.35.0-wmf.34; 2020-05-26), 10Patch-For-Review, and 2 others: Normalise MW Core database language fields length - https://phabricator.wikimedia.org/T253276 (10Marostegui) a:03Marostegui Thanks @Ladsgroup - good catch. It was just 2 wikis, (2 of the ones that w...
[08:42:43] 10DBA, 10MediaWiki-General, 10MW-1.35-notes (1.35.0-wmf.34; 2020-05-26), 10Patch-For-Review, and 2 others: Normalise MW Core database language fields length - https://phabricator.wikimedia.org/T253276 (10Marostegui) To make it more interesting, it only happens on some hosts
[08:52:46] 10DBA, 10MediaWiki-General, 10MW-1.35-notes (1.35.0-wmf.34; 2020-05-26), 10Patch-For-Review, and 2 others: Normalise MW Core database language fields length - https://phabricator.wikimedia.org/T253276 (10Marostegui) Hosts: [] All codfw [] dbstore1003:3315 [] clouddb1016.eqiad.wmnet:3315 [] clouddb1020.eqi...
[09:17:47] 10DBA, 10MediaWiki-General, 10MW-1.35-notes (1.35.0-wmf.34; 2020-05-26), 10Patch-For-Review, and 2 others: Normalise MW Core database language fields length - https://phabricator.wikimedia.org/T253276 (10Marostegui) 05Open→03Resolved All fixed, thanks for reporting!
[09:46:40] 10DBA, 10Patch-For-Review: Productionize x2 databases - https://phabricator.wikimedia.org/T269324 (10Marostegui)
[10:09:06] marostegui: you around?
[10:09:11] arturo: yes
[10:09:18] quick question
[10:10:32] in this diagram: https://phabricator.wikimedia.org/F33918352 how heavy is the mysql replication traffic between the replica DB and sanitarium boxes?
[10:10:53] arturo: heavy in terms of?
[10:10:53] heavy in terms of network usage
[10:11:21] I guess it is replicating the whole DB data (without private info)
[10:11:50] I'm trying to figure out whether that specific traffic link should be considered impactful for users
[10:12:17] if the throughput in that link is something that might impact customers somehow
[10:12:53] arturo: This is a sanitarium host: https://grafana.wikimedia.org/d/000000377/host-overview?viewPanel=8&orgId=1&refresh=5m&from=now-24h&to=now&var-server=db1125&var-datasource=thanos&var-cluster=mysql
[10:13:42] arturo: And this is a labsdb host with just replication traffic https://grafana.wikimedia.org/d/000000273/mysql?viewPanel=5&orgId=1&from=now-6h&to=now&var-server=labsdb1012&var-port=9104
[10:13:46] the max in 90d is 120MB/s
[10:14:03] arturo: yes, replication traffic itself isn't massive
[10:14:25] ok
[10:15:20] user traffic is a bit higher for those hosts that send data back https://grafana.wikimedia.org/d/000000377/host-overview?viewPanel=8&orgId=1&refresh=5m&from=now-24h&to=now&var-server=labsdb1011&var-datasource=thanos&var-cluster=mysql
[10:16:08] data back? back to the sanitarium hosts?
[10:16:13] no, to the users
[10:16:24] oh I get it
[10:16:37] like someone doing a select count(*) on the revision table :)
[10:16:45] so labsdb1012 that you mentioned before *just* does replication to the sanitarium?
[10:16:58] yeah, normally that host only receives user traffic the first week of the month
[10:17:07] which is when analytics run their reports
[10:17:13] labsdb1009, 1010 and 1011 do receive queries 24x7
[10:17:36] but all of them sync with the sanitarium hosts, no?
[10:17:44] correct
[10:18:03] arturo: I mentioned labsdb1012 because you mentioned "replication" traffic
[10:18:20] I remember jaime commenting something regarding which host starts the TCP connection when replication happens
[10:18:43] So it is easier to see just replication traffic on labsdb1012, as it normally has no user traffic, apart from the first week of the month
[10:18:55] ok
[10:19:08] while the other labsdb hosts have replication traffic + the user queries sending results somewhere
[10:20:54] is the statement "sanitarium sync traffic is not realtime-customer-facing" accurate?
[10:21:15] arturo: not sure what you mean by that
[10:21:23] ok, not accurate :-)
[10:21:44] sanitarium hosts receive no user traffic, they just replicate from production to the wiki replicas
[10:21:55] I want to express in a statement that the sanitarium sync traffic is not directly related to what users are querying in the replicas
[10:22:39] (sanitarium sync <-> replicas dbs) not (sanitarium sync <-> prod dbs)
[10:22:40] arturo: Yes, sanitarium hosts are fully transparent to users, they are there just to sanitize the data that arrives at the wikireplicas
[10:23:10] arturo: That is not entirely true, sanitarium hosts have a master, which is a production database
[10:23:17] 10Blocked-on-schema-change, 10DBA: Schema change for timestamp field of uploadstash - https://phabricator.wikimedia.org/T270055 (10LSobanski) p:05Triage→03Medium
[10:23:19] Maybe this will help: https://tendril.wikimedia.org/tree
[10:23:27] 10Blocked-on-schema-change, 10DBA: Increase size of slot_roles.role_id - https://phabricator.wikimedia.org/T270054 (10LSobanski) p:05Triage→03Medium
[10:23:34] arturo: locate db1124
[10:23:39] 10Blocked-on-schema-change, 10DBA: Increase size of content_models.model_id - https://phabricator.wikimedia.org/T270053 (10LSobanski) p:05Triage→03Medium
[10:23:46] yes
[10:23:48] As you can see on s1, db1124 has a master (db1106)
[10:23:53] db1106 is a production database
[10:24:07] yes
[10:24:16] db1124 is sanitarium?
[10:24:19] yep
[10:24:41] so the master for the wikireplicas is always a sanitarium host, as those remove the PII that comes from production (db1106)
[10:25:08] arturo: if it is easier to explain over a call, let me know and I am happy to jump in
[10:25:26] I think I'm getting closer to what I need, no need for a call, thanks!
[10:25:49] basically I'm trying to see if there are other ways to arrange realms in here: https://phabricator.wikimedia.org/F33918352
[10:26:14] to see if there is a way to put everything inside the cloud realm up until the sanitarium stuff
[10:27:02] so the traffic that crosses the cloud network edge is just replication instead of customer queries
[10:27:20] yes, user queries only happen on wikireplicas
[10:27:24] does that make sense?
[10:27:52] replication needs to flow from production to the wikireplicas, and it is done via sanitarium as it is the one in charge of removing PII
[10:28:18] yes
[10:30:29] how does each replica DB know which port to connect to on the sanitarium hosts? is that hardcoded in puppet or something?
[10:30:55] it is configured when replication is set up
[10:31:05] so within mysql you specify the ip and port you want it to connect to
[10:31:41] but in general we follow this rule: s1: 3311, s2: 3312, s3: 3313.... where s1, s2, s3 etc are MW sections
[10:31:44] we have from s1 to s8
[10:31:52] ok
[10:34:36] thanks for your time!!
[10:34:52] yw
[10:35:39] arturo: if it helps, there is a list of port-to-service mappings both in puppet and on the filesystem via wmfmariadbpy
[10:36:40] ack
[10:37:19] https://phabricator.wikimedia.org/source/operations-puppet/browse/production/hieradata/common/profile/mariadb.yaml
[10:37:43] if you need to locate a service or list the available sections, that would be the way
[11:01:23] I am google authenticating...
[12:02:15] jynus: ^ this was the last message before google exploded, are you sure you didn't cause all this?
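
To recap the topology and port convention from the conversation above, here is a minimal Python sketch, assuming only what was stated in the chat. The s1: 3311 ... s8: 3318 rule and the s1 chain (db1106 -> db1124 -> the labsdb hosts) come straight from the discussion; the dictionary and function names below are illustrative only, not the real puppet or wmfmariadbpy structures. The authoritative mapping lives in hieradata/common/profile/mariadb.yaml, as linked above.

# Port convention stated in the conversation: 3310 + section number (s1..s8).
SECTION_PORTS = {f"s{n}": 3310 + n for n in range(1, 9)}

# Example replication chain for s1, as described in the chat:
# production master -> sanitarium -> wikireplica hosts.
S1_CHAIN = {
    "production_master": "db1106",
    "sanitarium": "db1124",
    "wikireplicas": ["labsdb1009", "labsdb1010", "labsdb1011", "labsdb1012"],
}


def replication_port(section: str) -> int:
    """Return the MariaDB port a replica connects to for a given section."""
    try:
        return SECTION_PORTS[section]
    except KeyError:
        raise ValueError(f"unknown section: {section}") from None


if __name__ == "__main__":
    # An s1 wikireplica points its replication at the sanitarium host on port 3311.
    print(S1_CHAIN["sanitarium"], replication_port("s1"))

Running it prints "db1124 3311", i.e. the sanitarium host and port an s1 wikireplica would replicate from under the stated convention.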
[12:03:37] maybe
[12:23:35] 10Blocked-on-schema-change, 10DBA: Schema change for timestamp fields of jobs table - https://phabricator.wikimedia.org/T268391 (10Marostegui)
[12:27:19] lol
[12:28:29] 10Blocked-on-schema-change, 10DBA: Schema change for timestamp fields of jobs table - https://phabricator.wikimedia.org/T268391 (10Marostegui)
[13:20:11] 10DBA, 10netbox: Grants not working with DB hosts with to ipv6 - https://phabricator.wikimedia.org/T270101 (10Marostegui)
[13:21:48] 10DBA, 10netbox: Grants not working with DB hosts with to ipv6 - https://phabricator.wikimedia.org/T270101 (10Marostegui)
[13:21:50] 10DBA: Productionize x2 databases - https://phabricator.wikimedia.org/T269324 (10Marostegui)
[13:22:21] 10DBA, 10netbox: Grants not working with DB hosts with to ipv6 - https://phabricator.wikimedia.org/T270101 (10Marostegui) p:05Triage→03High
[13:23:15] 10DBA, 10netbox: Grants not working with DB hosts with to ipv6 - https://phabricator.wikimedia.org/T270101 (10Marostegui) Setting this to high as, even if it is not blocking the setup of x2 in general, we might want to reach an agreement on how to proceed before putting these hosts (and the other 22 hosts we'll...
[13:34:14] 10DBA: Productionize x2 databases - https://phabricator.wikimedia.org/T269324 (10Marostegui)
[13:51:18] 10DBA: Productionize x2 databases - https://phabricator.wikimedia.org/T269324 (10Marostegui)
[13:59:50] 10DBA: Productionize x2 databases - https://phabricator.wikimedia.org/T269324 (10Marostegui) @CDanis we need to add (not yet) x2 as a new section on `dbctl`. Can you help us deploy this new section eventually? (hosts are being set up, and they are not yet in `dbctl` anyway). Thank you
[14:00:55] 10DBA: Productionize x2 databases - https://phabricator.wikimedia.org/T269324 (10CDanis) Sure, always happy to help :)
[14:36:20] 10DBA, 10netbox: Grants not working with DB hosts with to ipv6 - https://phabricator.wikimedia.org/T270101 (10Volans) AFAIK databases are still in the list of clusters that do not support IPv6 as listed in T253173. As such the Netbox script to [[ https://netbox.wikimedia.org/extras/scripts/interface_automation...
[14:44:04] 10DBA, 10netbox: Grants not working with DB hosts with to ipv6 - https://phabricator.wikimedia.org/T270101 (10Marostegui) Thanks @Volans for the detailed info! I will remove them from those hosts, but I am worried about the other 22 we still need to install. Who normally runs those provisioning scripts? Is...
[14:54:15] 10DBA, 10netbox: Grants not working with DB hosts with to ipv6 - https://phabricator.wikimedia.org/T270101 (10Volans) Yes, it is usually run by DCOps, who should ask the service owner whether they need it or not.
[16:26:48] I would like to restart dbstore1004 again to avoid memory issues during the rest of December, let's find a date, CC elukey?
[16:27:11] should take no more than 5-10 minutes
[16:27:12] jynus: anytime is fine!
[16:27:29] (thanks a lot)
[16:27:30] let me send you an invite
[16:29:57] I will create a ticket for long-term root cause analysis
[16:35:09] 10DBA, 10Analytics: mariadb on dbstore hosts, and specifically dbstore1004, possible memory leaking - https://phabricator.wikimedia.org/T270112 (10jcrespo)
[16:37:54] 10DBA, 10Analytics: mariadb on dbstore hosts, and specifically dbstore1004, possible memory leaking - https://phabricator.wikimedia.org/T270112 (10jcrespo) p:05Triage→03Low This is not a huge concern since we have memory monitoring T172490, but adding it here for tracking, so we can research at a later tim...
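
On the IPv6 grants issue tracked above in T270101: MariaDB matches an account against the address the client actually connects from, so a host that starts connecting over IPv6 will no longer match accounts that were created only for its IPv4 address. The Python sketch below is a rough, hypothetical model of that failure mode; the account name and both addresses are invented for illustration and are not the real grants.

# Hypothetical model of host-based grant matching; not the real grant tables.
GRANTS = {
    # (user, account host part) -> privileges, as they might exist today (IPv4 only)
    ("repl", "10.64.0.15"): {"REPLICATION SLAVE"},
}


def can_authenticate(user: str, connecting_addr: str) -> bool:
    """Very rough model: the connecting address must match a grant entry."""
    return (user, connecting_addr) in GRANTS


# IPv4 connection from the replica: matches the existing grant.
assert can_authenticate("repl", "10.64.0.15")

# The same host connecting over a (made-up) IPv6 address: no grant matches,
# which is the kind of symptom T270101 describes.
assert not can_authenticate("repl", "2620:0:861:101:10:64:0:15")

Removing the AAAA records from those hosts, as mentioned in the task, would make clients connect over IPv4 again, so the existing grants match.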
[16:38:13] 10DBA, 10Analytics: mariadb on dbstore hosts, and specifically dbstore1004, possible memory leaking - https://phabricator.wikimedia.org/T270112 (10jcrespo)
[18:54:40] 10DBA, 10GrowthExperiments, 10Growth-Team (Current Sprint), 10Patch-For-Review, and 2 others: Slow load times for Special:Homepage on cswiki - https://phabricator.wikimedia.org/T267216 (10MMiller_WMF)
[18:55:49] 10DBA, 10GrowthExperiments, 10Growth-Team (Current Sprint), 10Patch-For-Review, and 2 others: Slow load times for Special:Homepage on cswiki - https://phabricator.wikimedia.org/T267216 (10MMiller_WMF) A part of this work, which was a collaboration with the Search team, has been merged. We will wait until...
[21:06:09] 10DBA, 10Community-Tech, 10Expiring-Watchlist-Items: Watchlist Expiry: Release plan [rough schedule] - https://phabricator.wikimedia.org/T261005 (10ifried) 05Open→03Resolved a:03ifried Fantastic, @Marostegui ! Thank you for your help & collaboration throughout the course of the release process. It's re...
[23:03:11] 10DBA: Research changes on prometheus-mysqld-exporter after buster/mariadb upgrade - https://phabricator.wikimedia.org/T252761 (10jcrespo) A workaround was created because, after the prometheus upgrade for buster, the prometheus exporter demanded a password for its mysql user. This was reported to debian at: https://bugs...