[08:41:17] a couple of programmers and bot authors of plwiki started to collaboratively write some SQL queries that run on plwiki_p replica. I think it would be really cool if we could create our own views, that could be, in turn, shared among different tools we have. How do you think this could be done, suppose the tools have access to ToolsDB and the replicas already. [08:48:55] are they really specific to pl? or could be useful elsewhere? (re @wmtelegram_bot: a couple of programmers and bot authors of plwiki started to collaboratively write some SQL queries that run on plwiki_p...) [08:50:29] jeremy_b: not really specific that is sure (activity tracking, streak analysis, etc.). we start small, with minimal requirements and building up, there is no "grand plan" or architecture. [08:53:15] saper: if you're thinking something like SQL replication from replicas to toolsdb, that's not possible. but you could create a toolforge job that syncs data from replicas to a public toolsdb table that can be read by multiple tools [08:55:47] dhinus: I am open to everything, right now my mind was thinking about something like DB link. The obvious disadv to the sync job/replication is that it might be quite a lot of data, but maybe that could be managed. [08:58:13] the tools we have might access to join with other replica tables so we could run into having inconsistent data (not probably business critical, but some reports we make could be hard to explain) [08:58:41] yes depends on how big the data is... but might be worth exploringt [08:58:53] but we could live with it probably or work it around (like running the reports "daily" and not on demand) [08:59:17] dhinus: yeah one day we could replace quarry with CREATE VIEW :) [09:00:36] DB link between replicas and toolsdb, is not something I would recommend, a better solution would be to have something like a dedicated db in Trove, but in any case opening a SQL replication link from replicas to Trove is something that would require more discussion and planning [09:01:03] that does not seem feasible at all to me [09:01:30] https://gitlab.wikimedia.org/toolforge-repos/nowi/-/blob/html/sql/102749?ref_type=heads is our starting point, that's really simple, but I already have a couple of CTEs on top of this. [09:01:42] taavi: you mean because of private data? or other concerns? [09:01:55] dhinus: no idea what Trove is, will check [09:02:12] dhinus: both [09:05:47] I think the idea of having a db with a custom set of views dedicated to one or more tools is intriguing, but right now I don't see any feasible way apart from a job that copies data out of replicas into toolsdb/trove [09:06:04] private data is indeed a hard blocker for using sql replication [09:06:47] opening that door though would easily risk to require massive amounts of data to be transferred, I'm not sure how feasible that will be [09:28:17] dhinus: do I get it right that give we work on some limited set of revisions data (say X months back), a simple "create table myreplica from select ... " could be a starting point to explore? [09:31:28] saper: you can't do "create table" on wikireplica dbs, they are read-only [09:32:08] you would have to extract from wikireplicas in python or other languages, and then separately store in toolsdb/trove [09:35:49] yes yes yes 'select from ... replica_db ' | psql on our infra :) [09:40:34] saper: that would work :) [09:41:05] cool, thanks, let's see how it goes [12:43:28] !log admin upgrade spicerack on cloudcumin [12:43:34] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL [12:52:17] volans: I am fully aware that small incremental improvement ("saper wants to have custom views") might in the end require a giant leap ("new architecture for the replica system") in the worst case [13:01:30] saper: there's a phab for that :P T215858 [13:01:32] T215858: Plan a replacement for wiki replicas that is better suited to typical OLAP use cases than the MediaWiki OLTP schema - https://phabricator.wikimedia.org/T215858 [13:04:20] <3 This is even bigger that I would have imaged and will never probably done. Create a subtask linked to this? yay/nay I don't need a new schema upfront, but custom views can be some kind of journey towards it [13:04:36] "there is a phab for that" ™ is almost always true :D [13:07:33] I think having a subtask documenting your use case will not hurt, it could be a good hub for future discussion, IRC chats have a tendency to get lost :) [13:09:37] "here's one I made earlier" [13:09:56] * dhinus wonders if AI video generation is good enough to produce a copy of Apple's "there's an app for that", but with Phab :) [14:04:39] T424524 is out there, thank you very much. I have linked this chat there as well. [14:04:40] T424524: Ability to create new views for tools and analytic purposes - https://phabricator.wikimedia.org/T424524 [14:07:39] saper: thank you for the detailed write-up <3 [14:27:33] detailed write ups make good AI prompts or so I heard [14:35:17] LOL [17:57:33] !lag [19:28:10] saper: https://replag.toolforge.org/ may be what you hoped !lag would point you towards. [19:36:47] bd808: thanks, figured it out. Frustrating day today, ran into at least 3 various bugs today at various places... [19:38:40] saper: computers were a mistake. nobody should expect electric rocks to do anything of value. ;)