[06:42:16] cdanis: so I made some progress with the raw data, and I can get the spans that I want, however struggling to figure out how to link that back to the request (that I would see without jaeger)
[12:23:56] addshore: tell me more? from any span you should be able to get the traceid of the root
[12:33:40] so, i have a traceID of fffdc497f1a2eab1457ec7f917aca754 for example for span d87cff8645119f25 but i dont understand how to actually get the trace from the id in opensearch query workbench?
[12:33:56] (Basically I want to be able to click the download link there and get the data)
[12:39:41] cdanis: essentially going from https://phabricator.wikimedia.org/P80971 on https://logstash.wikimedia.org/app/opensearch-query-workbench to also having the trace data (starting url)
[12:42:09] addshore: ah, interestingly that trace is missing a bunch of its parents
[12:42:24] https://trace.wikimedia.org/trace/fffdc497f1a2eab1457ec7f917aca754
[12:43:15] wow, that is some deep WANObjectCache::getWithSetCallback calling
[12:44:03] it was an mw-api-ext call against cebwiki, that much is for sure
[12:44:50] addshore: those WANObjectCache spans all report the key(s) in question in their tags
[12:45:22] so you could maybe figure out what flavor of request it was from those heh
[12:45:39] oooo
[12:45:40] not sure why the parents got dropped, we don't have any envoy spans nor any of the mediawiki entrypoint spans
[12:46:40] addshore: the Database spans have a lot of details in their tags, including the calling function (since that was already part of their logging instrumentation) https://trace.wikimedia.org/trace/fffdc497f1a2eab1457ec7f917aca754?uiFind=6dc38765b0779581
[12:46:54] code.function: Wikimedia\Rdbms\Database::beginIfImplied (MediaWiki\Api\ApiPageSet::initFromTitles)
[12:47:47] yes, I was going to keep that too
[12:48:38] I think the Query Workbench UI isn't the best :D
[12:49:14] Looking at https://phabricator.wikimedia.org/P80973 for example, all 100 have no parent span id?
[12:49:54] Adding `AND parentSpanID IS NOT NULL` returns 0 results
[12:50:29] I also noticed there are tables like `jaeger-service-2025.07.22` and I see the URLs there, but dont see how to connect / look them up :D
[12:51:01] addshore: yeah that field might be in the schema but not actually used -- looking at https://trace.wikimedia.org/trace/fff85fdd86dba374f4c097c343753363 from one of the results on the first page, that looks like a complete trace
[12:52:59] 👍 so `SELECT * FROM `jaeger-span-2025.08.06` WHERE spanID = "8a095816d1843aba" LIMIT 100` is from that trace and indeed doesnt have parent, but it does have a traceID, just not sure what to do with that to get the URL
[12:53:19] https://trace.wm.o/trace/{traceId}
[12:53:49] sorry, not trace url, rather lookup in opensearch what the URL is that started the trace
[12:53:54] ohhh
[12:54:37] so, mw-web: GET https://www.wikidata.org/w/index.php?.......
[12:56:49] addshore: something like this https://phabricator.wikimedia.org/P80974
[12:59:31] it's way easier to see the actual format of the objects in the discover view, for whatever reason
[12:59:35] https://logstash.wikimedia.org/app/discover#/doc/b4bc7080-db27-11ee-b301-7f687f301a4e/jaeger-span-2025.08.07?id=JqKZhJgBMirbfnq1JciP
[13:05:06] cdanis: right, however If I start at the span level, and have https://logstash.wikimedia.org/app/discover#/doc/b4bc7080-db27-11ee-b301-7f687f301a4e/jaeger-span-2025.08.06?id=W5euf5gBMirbfnq1jYWQ I still then dont know how to get to that URL?
[13:05:44] but I now realize joining to the service table, via _id might be the one? using the traceID, assuming thats what it means
[13:08:05] noop
[13:11:19] addshore: I'm trying nested queries but they aren't working and the error message isn't helpful
[13:11:29] despite it being supported maybe https://docs.opensearch.org/latest/search-plugins/sql/sql/complex/#example-1-table-subquery
[13:11:56] yeah, the errors are less than useless :D
[13:12:46] im confused by doing `select * from jaeger-service-2025.07.22 limit 10` and seeing no IDs there at all, i assume there are IDs on the document, but not displayed?
[13:13:07] the SQL view here is ... weird
[13:13:20] oh, that's the service table
[13:13:29] hmm, and I can't explore jaeger-service-* ?
[13:13:29] I don't know what the service table is for 😅
[13:13:36] I was joining the spans back on themselves, basically
[13:13:38] it seems to have the info I want xD
[13:13:49] aaah right, and trying to go up the tree to the one with the url?
[13:13:49] SELECT DISTINCT(`tag.http@url`)
[13:13:51] FROM `jaeger-span-2025.08.06`
[13:13:53] WHERE traceID IN (select traceID from `jaeger-span-2025.08.06` where spanId = "3da6f92cb98539f8")
[13:13:55] is what I was trying
[13:14:16] if it worked, that inner select could be hypothetically anything that matched spans you were interested in
[13:16:30] once c.white gets online he might be a good resource for this I think
[13:16:37] yeah, im going to keep poking it, as you said the docs say it should work :D
[13:22:19] i think maybe our version of opensearch, or something about the setup, doesnt allow the jobs
[13:22:21] joins
[13:22:25] or sub queries
[13:23:12] you could do it as two queries yourself I suppose :)
[13:23:20] yeah ;_;
[13:23:44] I've had it in the back of my mind for a while to write a little tool (probably with a web UI) to do this kind of thing -- find me all the traces with spans like X and Y
[13:24:03] jaeger's built-in search is very limited by its original storage backend being cassandra
[13:24:05] yeah, essentially i just want a data dump so that I can do some local munging of stuff
[13:24:53] can I ask, what's the bigger picture here? are you interested in seeing what kinds of calls involve the most wikidata db lookups, or something?
[13:26:12] yes, so im trying to look at selects of the wikidata revision table, durations of that query, and then the code causing the query, and project / site that triggered it (/ full url)
[13:26:51] however, I'm only really trying to do this with the trace data as I feel like it is basically all there, otherwise i likely would have ignored half of that data and just looked at the db performance stuff
[13:26:58] right
[13:27:17] is there a common code path for the wikidata revision table lookups in mediawiki itself?
[13:28:45] so, I did some analysis on a 15 min section that I managed to download from the jaeger UI yesterday, and found there to be 20 code points that make the queries during that time period
[13:29:08] and that worked really well to be honest, I was just a bit had about having to use the UI to make a query, only being able to get 1000 traces to download at once etc
[13:30:33] So yeah, the dream was just be able to download more of the same data, run the same code over it, and get a much clearer picture, rather than relying on downloads of 1000 traces at once from the jaeger UI.
[13:30:46] Also, as I only get about the 1 DB query happening in the whole trace, perhaps I could only get that too
[13:32:44] hmm, you can get everything you want except the project/site/full URL from just the one span
[13:33:39] yeah indeed
[13:33:51] so maybe (as the sub query doesnt work) i give up on caring about the URL.
[13:33:57] / fill it in later or something
[13:34:26] one option: we could write a mediawiki patch that includes the project name as part of those database spans
[13:34:27] so, if the open search was accessible for me to query directly somehow, i also figured i could then very easily script it
[13:34:47] cdanis: if thats easy that would be cool for cross wiki queries,
[13:35:09] i think it would make looking at this db related data very useful
[13:35:09] I don't know if it's easy or not tbh, I'm not actually a mediawiki dev 😅
[13:35:23] :D I am, but have not looked at any of the jaeger tracing stuff at all
[13:35:33] the tracing stuff is easy, it's all the dependency injection that's hard
[13:36:20] let me go look :)
[13:36:29] addshore: https://gerrit.wikimedia.org/g/mediawiki/core/+/4e61091d9c3142094ee83dc2c35e7dca6eb3322d/includes/libs/rdbms/database/Database.php#798
[13:52:20] amazing, thanks!
[14:01:11] btw addshore some pointers and advice at https://wikitech.wikimedia.org/wiki/Distributed_tracing/Tutorial/Instrumenting_your_own_application
[14:03:32] ty, I think i'll put a pin in this for now and revisit it another day, (may also write a phab task)
[18:06:41] I got c.danis' subquery example to run, but it hit the memory circuit breaker (not really surprising). I wouldn't rely on subqueries, though. Since OpenSearch isn't an RDBMS, two queries is the way.
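The "two queries" approach suggested in the log could look roughly like the Python sketch below: first collect the traceIDs of the spans of interest, then fetch the `tag.http@url` values for those traces. This is only a sketch under assumptions. The field names `traceID`, `spanID` and `tag.http@url` come from the queries quoted in the log; the `search` callable, the helper names, the chunk size, and the assumption that `_source` stores the tag as a nested `tag` object with an `http@url` key are all hypothetical and would need checking against the real index.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set

# Hypothetical transport: any callable taking (index, body) and returning an
# OpenSearch-style search response dict. It is not part of any real library.
Search = Callable[[str, dict], dict]


def trace_ids_for_spans(search: Search, index: str, span_query: dict,
                        size: int = 1000) -> List[str]:
    """Query 1: return the distinct traceIDs of spans matching span_query."""
    body = {"size": size, "_source": ["traceID"], "query": span_query}
    hits = search(index, body)["hits"]["hits"]
    return sorted({hit["_source"]["traceID"] for hit in hits})


def urls_for_traces(search: Search, index: str, trace_ids: List[str],
                    chunk: int = 500) -> Dict[str, Set[str]]:
    """Query 2: map each traceID to the URLs found on any span in that trace."""
    urls: Dict[str, Set[str]] = defaultdict(set)
    # Send the IDs in batches so a single terms query stays small.
    for start in range(0, len(trace_ids), chunk):
        batch = trace_ids[start:start + chunk]
        body = {
            "size": 10000,  # assumption: within the index's result window
            "_source": ["traceID", "tag.http@url"],
            "query": {"bool": {"filter": [
                {"terms": {"traceID": batch}},
                {"exists": {"field": "tag.http@url"}},
            ]}},
        }
        for hit in search(index, body)["hits"]["hits"]:
            src = hit["_source"]
            # Assumption: the tag is stored as {"tag": {"http@url": ...}}.
            url = src.get("tag", {}).get("http@url")
            if url:
                urls[src["traceID"]].add(url)
    return dict(urls)
```

The join happens client-side, so the cluster never has to hold the subquery result in memory. `search` could wrap something like the `search(index=..., body=...)` method of the opensearch-py client, assuming direct access to the cluster is available, and `span_query` could be as simple as `{"term": {"spanID": "3da6f92cb98539f8"}}`.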