[17:19:47] bblack: do you know offhand how difficult it is to plumb some bits from the traffic layer all the way through to the things that Analytics cares about? it could be interesting to have visible in the turnilo/Hive webrequest data some notion of "the filters/data we have on the Traffic layer believe this request originated from a public cloud IP"
[17:19:57] maybe this is a better question for Analytics but I'm starting with you
[18:01:11] cdanis:
[18:02:15] TL;DR is it's not fundamentally hard. modules/varnish/templates/analytics.inc.vcl.erb -> "sub analytics_deliver" is where we're declaring some arbitrary fields to them based on the edge layer's interpretation of things.
[18:02:56] I don't know the present answer to whether and how to negotiate the two ends of that pipeline - I think we have to talk to them first about declaring a new field there, so they can prepare to ingest it, before we commit new stuff on our end
[18:03:49] in general, some kinds of things like GeoIP data, we haven't sent over via this kind of mechanism, because they're doing that level of analysis in a different/better way on their end anyways.
[18:04:58] but I think this seems like the right kind of case for us to tag it - we don't want to have to sync analytics+traffic views of a list of cloud nets or ASNs, etc and have them diverge in the time domain as well. What we really want to know is "did the cache categorize this request as cloud_nets at the time it processed it?"
[18:25:34] cool, thanks for the pointers!
[18:25:47] and yeah, exactly what I was thinking wrt the last point
[18:27:01] I'll file a task, seems useful
[18:41:39] hmm, cdanis if you just want it in webrequest, i think you can add it in the X-Analytics header
[18:41:47] is that magical? :)
[18:41:50] that gets turned into a map
[18:41:52] yeah
[18:42:01] will that work just in Hive or also in Turnilo?
[18:42:03] it would be better if we made it a map type in the first place...but this was before we had that stuff
[18:42:17] in turnilo i think it would have to be included in the druid load
[18:42:20] not totally sure how that works
[18:42:29] but the original data is the same
[18:42:34] okay!
[18:42:36] so probably just a config
[18:43:03] in druid though, the dimensions have to have low cardinality, so a field with millions of possible values isn't quite right
[18:44:37] https://wikitech.wikimedia.org/wiki/X-Analytics
[18:44:43] yeah
[18:44:45] this would just be a boolean
[20:04:12] filed https://phabricator.wikimedia.org/T279380 :)
[20:04:28] happy to do the VCL side of things myself, should be pretty straightforward
[21:37:06] Traffic, SRE, Wikipedia-iOS-App-Backlog, iOS-app-Bugs: Wikipedia iOS apps sending harmful bursts of traffic synchronized to the top of the hour, especially at 22:00 UTC - https://phabricator.wikimedia.org/T264881 (JMinor)
[21:39:53] Traffic, SRE, GitLab (Initialization), Patch-For-Review, and 2 others: open firewall ports on gitlab1001.wikimedia.org (was: Port map of how Gitlab is accessed) - https://phabricator.wikimedia.org/T276144 (Dzahn)
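Note on the "that gets turned into a map" exchange at 18:41: a minimal Python sketch of what such a conversion could look like, assuming the X-Analytics header carries `key=value` pairs separated by `;` (the layout described on the wikitech page linked at 18:44:37). The function name `parse_x_analytics` and the key `public_cloud` are hypothetical, made up for illustration. No field name was agreed in this log, and the real conversion happens in the Analytics pipeline, not in code like this.

```python
from typing import Dict


def parse_x_analytics(header: str) -> Dict[str, str]:
    """Split a 'k1=v1;k2=v2' header value into a dict.

    Entries without '=' or with an empty key are skipped.
    The ';' and '=' separators are an assumption about the header layout.
    """
    fields: Dict[str, str] = {}
    for pair in header.split(";"):
        key, sep, value = pair.strip().partition("=")
        if sep and key:
            fields[key] = value
    return fields


# "public_cloud" is a hypothetical key standing in for the boolean flag
# discussed above; the other keys are only sample data.
example = "ns=0;https=1;public_cloud=1"
parsed = parse_x_analytics(example)
is_cloud = parsed.get("public_cloud") == "1"
```

A boolean flag like this has only two possible values, which fits the low-cardinality constraint on Druid dimensions mentioned at 18:43:03.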