[00:01:01] Analytics-Kanban: Restart Pentaho - https://phabricator.wikimedia.org/T105107#1436463 (kevinator) NEW
[00:12:20] :)
[01:34:48] Analytics-Kanban, Research-and-Data: Validate Uniques using Last Access cookie {bear} - https://phabricator.wikimedia.org/T101465#1436543 (leila) @madhuvishy, Thanks for this. How hard is it in terms of engineering resources and otherwise to count uniques for a short period of time, say 24 hours, and comp...
[06:07:33] Analytics-Backlog, Wikimania-Hackathon-2015: Dockerize Hadoop Cluster, Druid, and Samza + Load Test - https://phabricator.wikimedia.org/T102980#1436726 (Qgil) a:Milimetric OK, thank you. We are just aiming to have all the confirmed sessions assigned to someone, in order to make it easier for anybody to...
[07:28:52] Thanks jgage for the kafka cluster repair :)
[10:46:33] Analytics, Engineering-Community, Research-and-Data, ECT-July-2015: Metrics about the use of the Wikimedia web APIs - https://phabricator.wikimedia.org/T102079#1437191 (Qgil) @anomie @bd808 @tgr, is it ok to add this task to the #Reading-infrastructure-team backlog? Not only because this small and b...
[11:14:46] Analytics, Analytics-Backlog, Performance-Team, Patch-For-Review: Collect HTTP statistics about load.php requests - https://phabricator.wikimedia.org/T104277#1437265 (Krinkle) Open→Resolved >>! In T104277#1432643, @ori wrote: > What issues? Mainly two things observed: * A (high) presence of H...
[12:06:50] (CR) Joal: [C: 2 V: 2] "Looks good to me :)" [analytics/aggregator] - https://gerrit.wikimedia.org/r/223031 (https://phabricator.wikimedia.org/T95339) (owner: Mforns)
[12:41:09] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL 20.00% of data above the critical threshold [30.0]
[12:43:10] RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK Less than 15.00% above the threshold [20.0]
[14:21:03] hey milimetric
[14:21:30] hey joal
[14:21:39] About puppet on wikimetrics, any news ?
[14:21:58] joal: no, haven't touched it, I just made a task
[14:22:10] k cool
[14:22:11] I was going to ping marcel to see if he can take a look
[14:22:22] it's in progress now so we can't forget about it anymore
[14:22:28] but I've been fighting this insane date logic
[14:22:41] date logic ?
[14:22:45] I'm almost done this time, with the 3rd round of unexpected complexity
[14:22:56] the wikimetrics local / zoned / global / local default dance
[14:23:21] one of the more complicated pieces of code I ever wrote, with almost 0 motivation to write it because it's such low value
[14:23:28] like, it saves 10 people 2 clicks here and there...
[14:23:37] :(
[14:23:44] yeah... not a good use of time...
[14:23:50] but we promised we'd do it and we're a team of our word
[14:23:53] But, when it's done ... :)
[14:25:25] Thanks for the info
[14:54:54] Analytics, Engineering-Community, Research-and-Data, ECT-July-2015: Metrics about the use of the Wikimedia web APIs - https://phabricator.wikimedia.org/T102079#1437769 (bd808) >>! In T102079#1437191, @Qgil wrote: > @anomie @bd808 @tgr, is it ok to add this task to the #Reading-infrastructure-team ba...
[15:02:06] joal, hi! you around? I can't find anywhere how to deploy the aggregator, or if it is already deployed?
[15:09:39] Analytics-Kanban: Troubleshoot EventLogging validation alerts - https://phabricator.wikimedia.org/T105167#1437818 (mforns) NEW a:mforns
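An aside on the PROBLEM/RECOVERY lines above: the check compares the raw and validated EventLogging message rates in graphite and fires when too many recent datapoints diverge. A minimal sketch of that comparison, assuming icinga-style "% of data above threshold" semantics — the raw.rate metric name appears later in this log, but the validated metric name and the check internals are guesses, not the production check:

```python
# Sketch of the alert logic behind the PROBLEM/RECOVERY lines above,
# assuming the check works like icinga's "percent of datapoints above
# threshold" style: alert when more than N% of recent datapoints show too
# large a raw-vs-validated gap. eventlogging.overall.raw.rate appears in
# this log; the validated metric name and the math are assumptions.

def percent_over_threshold(raw, validated, threshold):
    """Return the percentage of datapoints whose relative difference
    between raw and validated rates exceeds `threshold` (in percent)."""
    points = list(zip(raw, validated))
    over = sum(1 for r, v in points if r > 0 and (r - v) / r * 100 > threshold)
    return 100.0 * over / max(len(points), 1)

# Example mirroring the 12:41 alert: critical threshold 30, and the check
# fired because 20% of the datapoints were above it.
raw_rate = [100, 100, 100, 100, 100]    # eventlogging.overall.raw.rate
valid_rate = [95, 96, 60, 94, 97]       # validated rate (name assumed)
print(percent_over_threshold(raw_rate, valid_rate, threshold=30))  # 20.0
```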
[15:16:03] hey mforns :)
[15:16:18] joal, hey!
[15:16:28] Normally aggregator gets deployed automagically by puppet
[15:16:32] joal, ok
[15:16:34] cool
[15:16:51] joal, where does it live?
[15:16:54] I wanted to double check that now, and maybe have a manual run to generate the all.csv dataset for historical data
[15:17:10] We can do it together if you want ?
[15:17:16] joal, we still need to add the --all-projects flag
[15:18:16] joal, can it be after standup? I'd like to do something before standup
[15:18:29] mforns_brb: sure :)
[15:18:32] ok
[15:22:33] (PS11) Milimetric: Add global default report fields [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/217857 (https://phabricator.wikimedia.org/T74117)
[15:43:51] joal, are you done for today?
[15:44:07] nope :)
[15:44:28] let's double check deploy if you want
[15:44:57] joal, sure! but only if you have time
[15:45:05] Definitely ok :)
[15:45:08] batcave ?
[15:45:10] batcave?
[15:45:12] :]
[15:45:14] :)
[15:52:11] jgage: Hi !
[15:56:12] goin to lunch yall, bbl
[15:56:42] enjoy milimetric
[15:57:45] oh madhuvishy do you wanna go to scrum of scrums today?
[15:57:55] or anyone else for that matter
[15:58:07] it's in 1.5 hours and I think I'm going to miss it
[15:59:42] just in case, if someone goes, I put "nothing to report" in http://etherpad.wikimedia.org/p/Scrum-of-Scrums because I couldn't think of anything to report that's relevant for other teams right now
[16:00:26] joal: hi :)
[16:00:42] not sure if we have a meeting today or if we're skipping, i'm fine with either
[16:00:57] andrew being off, do we cancel ops-analytics meeting ?
[16:01:18] jgage: --^
[16:01:20] i don't really have anything to discuss. kafka 18 & 21 were out yesterday, i triggered a reelection and that was that
[16:02:10] yeah, thx for that
[16:03:04] sure, let's cancel it. i'm not sure if we'll have one next week either, i think otto will be at wikimania
[16:05:01] yup, he will, and so will I
[16:05:05] will you jgage ?
[16:05:42] unfortunately no
[16:05:49] i want to go to the one in italy next year :D
[16:05:57] I understand that :)
[16:06:40] ok, i will see you in our meeting on july 22 then, unless something explodes
[16:06:42] jgage: we are gonna make a change in puppet, do you mind reviewing ?
[16:06:49] sure
[16:06:52] jgage: right
[16:07:01] I'll let you know when ready
[16:07:11] k, just add me as reviewer
[16:07:14] :)
[16:07:17] awesome, thx
[16:08:29] joal: vital signs is updated again
[16:08:40] milimetric: you ROCK :)
[16:08:42] puppet should run fine but I'll leave the bug in "code review" just so we can check on it again tomorrow
[16:08:48] I did nothing, just the basics here: https://wikitech.wikimedia.org/wiki/Help:Self-hosted_puppetmaster
[16:09:56] Analytics-Backlog, Wikimania-Hackathon-2015: Dockerize Hadoop Cluster, Druid, and Samza + Load Test - https://phabricator.wikimedia.org/T102980#1438045 (Milimetric) Oh I'm happy to be the point of contact, but if anyone is reading this, grab anyone on the list above if you can't find me.
[16:10:45] Analytics-Kanban: Bug: puppet not running on wikimetrics1 instance, Vital Signs stale {musk} [5 pts] - https://phabricator.wikimedia.org/T105047#1438048 (Milimetric) followed directions here [1] and puppet ran cleanly again. I will leave this in code review so we can monitor and make sure it runs periodicall...
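On joal's idea above of a manual run to generate the all.csv dataset: conceptually, an --all-projects aggregation is a per-date sum across every project's counts. A rough sketch under that assumption — the file layout and column order are hypothetical, and the real analytics/aggregator code surely differs:

```python
import csv
from collections import defaultdict

# Hypothetical sketch of an all-projects aggregation: sum each day's
# counts across every per-project CSV into a single all.csv. File layout
# and column order are assumptions, not the real aggregator's behavior.

def aggregate_all(project_csvs, out_path="all.csv"):
    totals = defaultdict(int)  # date -> summed count across projects
    for path in project_csvs:
        with open(path) as f:
            for date, count in csv.reader(f):
                totals[date] += int(count)
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        for date in sorted(totals):
            writer.writerow([date, totals[date]])

# e.g. aggregate_all(["enwiki.csv", "dewiki.csv", "frwiki.csv"])
```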
[16:11:20] milimetric: you had to re-setup everything ?
[16:12:05] joal: nono, just the update part. I basically git pull --rebase in both the /var/log/git/operations/puppet repo and the private one
[16:12:14] and then re-ran puppet to make sure it's all good
[16:12:23] mforns: before you review the "test data for pageview API" task, I am gonna provide a small analysis on how much k=100 impacts
[16:12:23] now it looks enabled so it should run every 30 min. but you never know :)
[16:12:51] milimetric: that's great :)
[16:12:58] joal, ok, let me know when I can start :]
[16:13:09] I'll double check tomorrow
[16:13:34] * joal thanks milimetric for the beautiful updated chart :)
[16:14:05] joal: do you happen to know why kafka and jmxtrans are listening on a few random high ports?
[16:14:08] gage@analytics1012:~$ sudo lsof -i -n -P | grep LISTEN | grep java | awk '{print $9, $3}' | sort -n -k 1.3
[16:14:11] *:2101 jmxtrans
[16:14:14] *:9092 kafka
[16:14:16] *:9999 kafka
[16:14:19] *:47400 kafka
[16:14:21] *:51771 jmxtrans
[16:14:24] *:55286 jmxtrans
[16:14:26] *:60924 kafka
[16:15:25] my question is related to this changeset to create firewall rules for those hosts: https://gerrit.wikimedia.org/r/#/c/223534/
[16:23:00] ok, really going for lunch now. I'll be gone for a while 'cause i have an errand to run
[16:48:39] (PS1) Mforns: Add clarifying comment on --all-projects behavior [analytics/aggregator] - https://gerrit.wikimedia.org/r/223577 (https://phabricator.wikimedia.org/T95339)
[16:48:46] (CR) jenkins-bot: [V: -1] Add clarifying comment on --all-projects behavior [analytics/aggregator] - https://gerrit.wikimedia.org/r/223577 (https://phabricator.wikimedia.org/T95339) (owner: Mforns)
[16:53:09] (PS2) Mforns: Add clarifying comment on --all-projects behavior [analytics/aggregator] - https://gerrit.wikimedia.org/r/223577 (https://phabricator.wikimedia.org/T95339)
[16:53:25] jgage: sorry for missing the pings
[16:53:37] jgage: I am not good at jmxtrans conf :(
[16:53:41] can't help really
[17:18:13] Analytics, ContentTranslation-Analytics, MediaWiki-extensions-ContentTranslation, ContentTranslation-Release6: find how much are the articles that are created using ContentTranslation read - https://phabricator.wikimedia.org/T105194#1438288 (Amire80) NEW
[17:26:13] Analytics, Engineering-Community, Research-and-Data, ECT-July-2015: Metrics about the use of the Wikimedia web APIs - https://phabricator.wikimedia.org/T102079#1438360 (Tgr) IMO we should not reinvent any wheels - API requests are normal web requests, and there is already extensive machinery in plac...
[17:32:19] (CR) Joal: Add clarifying comment on --all-projects behavior (1 comment) [analytics/aggregator] - https://gerrit.wikimedia.org/r/223577 (https://phabricator.wikimedia.org/T95339) (owner: Mforns)
[17:33:23] milimetric: when you're back we can do the code review for the wikimetrics task. let me know :)
[17:34:27] (PS3) Mforns: Add clarifying comment on --all-projects behavior [analytics/aggregator] - https://gerrit.wikimedia.org/r/223577 (https://phabricator.wikimedia.org/T95339)
[17:34:42] (CR) Mforns: Add clarifying comment on --all-projects behavior (1 comment) [analytics/aggregator] - https://gerrit.wikimedia.org/r/223577 (https://phabricator.wikimedia.org/T95339) (owner: Mforns)
[17:35:10] mforns: I have data for analysis of the K=100 thing, but didn't write the comment in phabricator yet
[17:35:18] joal, ok
[17:35:19] mforns: I'll do that tomorrow morning :)
[17:35:27] joal, cool!
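The "K=100 thing" referenced above is presumably a minimum-count threshold on the pageview API test data, and joal's promised analysis is about how much such a cutoff discards. A sketch of that measurement, assuming k means "drop rows with fewer than k views" — the semantics are a guess, not what joal actually computed:

```python
# Hypothetical sketch of measuring how much a k=100 cutoff impacts a
# dataset of (page, view_count) rows: what fraction of rows and of total
# views fall below the threshold and would be dropped. The meaning of
# k=100 here is an assumption.

def k_threshold_impact(rows, k=100):
    dropped = [(p, c) for p, c in rows if c < k]
    total_views = sum(c for _, c in rows) or 1
    return {
        "rows_dropped_pct": 100.0 * len(dropped) / max(len(rows), 1),
        "views_dropped_pct": 100.0 * sum(c for _, c in dropped) / total_views,
    }

sample = [("Main_Page", 500000), ("Obscure_Page", 12), ("Mid_Page", 340)]
print(k_threshold_impact(sample))
# rows_dropped_pct ~33.3, views_dropped_pct ~0.0024: many rows, few views
```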
[17:35:48] joal, I still have plenty of things to do :]
[17:35:52] Thanks for the patches, and sorry for forgetting to tell you about the comment
[17:36:02] mforns: I have no doubt ;)
[17:36:07] Analytics, Engineering-Community, Research-and-Data, ECT-July-2015: Metrics about the use of the Wikimedia web APIs - https://phabricator.wikimedia.org/T102079#1438403 (Halfak) @tgr if you can find a way to extract an OAuth ConsumerId from our "extensive machinery" around web requests, I'd love to h...
[17:36:26] joal, np
[17:40:10] (CR) Joal: [C: 2 V: 2] "Thanks Marcel :)" [analytics/aggregator] - https://gerrit.wikimedia.org/r/223577 (https://phabricator.wikimedia.org/T95339) (owner: Mforns)
[17:40:41] joal, thank *you
[17:40:52] No mforns, Thank you !
[17:40:59] :)
[17:44:53] Guys, unless any of you need help from me, I'll leave :)
[17:45:17] see you tomorrow!
[18:09:28] Analytics, Engineering-Community, Research-and-Data, ECT-July-2015: Metrics about the use of the Wikimedia web APIs - https://phabricator.wikimedia.org/T102079#1438547 (Tgr) >>! In T102079#1438403, @Halfak wrote: > @tgr if you can find a way to extract an OAuth ConsumerId from our "extensive machine...
[18:15:14] analytics1021: kafka.server.BrokerTopicMetrics.AllTopicsMessagesInPerSec.FifteenMinuteRate CRITICAL: 3.8467799701e-12
[18:15:23] since 8h 13m
[18:15:48] is the bot outputting it here ?
[18:15:51] it should
[18:18:18] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL 26.67% of data above the critical threshold [30.0]
[18:19:24] Analytics-Kanban, Research-and-Data: Validate Uniques using Last Access cookie {bear} - https://phabricator.wikimedia.org/T101465#1438589 (ggellerman) @leila Thanks for exploring this possibility
[18:21:19] well, i see the bot works, so i guess i'm duplicating things
[18:21:42] assuming people read the bot messages
[18:22:00] because it happens like every day
[18:22:31] jgage is fixing, thanks!
[18:22:46] i've never gotten an answer about those eventlogging alerts, and they go to my phone. i vote for turning them off.
[18:23:13] oh you pasted the analytics1021 alert, that's different and actually valuable
[18:24:27] we've just received new hadoop nodes; once those are added to the cluster we'll convert one of the oldest-gen workers into a kafka broker and ditch problematic analytics1021
[18:37:47] Analytics, Engineering-Community, Research-and-Data, ECT-July-2015: Metrics about the use of the Wikimedia web APIs - https://phabricator.wikimedia.org/T102079#1438689 (Halfak) > Parse the Authorization header Solid idea. I wonder if we can also get the user_id/global_id using this type of strateg...
[18:42:48] Analytics, Editing-Department: Investigate drop-off in global edit save rate starting 27 June 2015 - https://phabricator.wikimedia.org/T105215#1438716 (Neil_P._Quinn_WMF)
[18:48:20] ACKNOWLEDGEMENT - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL 100.00% of data above the critical threshold [30.0] daniel_zahn https://phabricator.wikimedia.org/T105216
[19:30:05] madhuvishy: I'm free anytime you're free
[19:39:28] RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK Less than 15.00% above the threshold [20.0]
[19:50:59] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL 33.33% of data above the critical threshold [30.0]
[20:09:03] milimetric: just got back from lunch. 2 minutes
[20:15:20] Analytics, Engineering-Community, Research-and-Data, ECT-July-2015: Metrics about the use of the Wikimedia web APIs - https://phabricator.wikimedia.org/T102079#1439000 (Tgr) >>! In T102079#1438689, @Halfak wrote: > Solid idea. I wonder if we can also get the user_id/global_id using this type of str...
[20:16:38] milimetric: batcave?
[20:16:52] omw
[20:21:39] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL 26.67% of data above the critical threshold [30.0]
[20:49:42] jgage: were you and mutante talking about EL above? I couldn't decipher the opsy talk :)
[20:50:03] I'm confused about this alert and the fact that dzahn claimed it and put it under some weird Cassandra umbrella
[21:04:41] milimetric, are you looking at EL too?
[21:05:13] mforns: I just looked at the graphs and realized it's on the varnish side most likely
[21:05:23] nothing looks wrong with any of the EL workers
[21:05:25] aha
[21:05:33] so I figure just not enough events are getting through
[21:05:50] but that seems like an ops thing, and I don't really know who to talk to without otto around
[21:06:45] mmm
[21:13:29] Analytics-Kanban: Restart Pentaho - https://phabricator.wikimedia.org/T105107#1439195 (Milimetric) pentaho is inaccessible via ssh due to stale puppet probably. I didn't want to waste Yuvi's time with it so I didn't ask. I rebooted the instance but that didn't solve anything, the service didn't come up. We...
[21:16:52] leila: I added to sheets to the spreadsheet on mobile apps uniques. One for Android, and one for iOS
[21:16:58] its pretty interesting
[21:17:08] two sheets*
[21:17:19] thanks, madhuvishy. I'll look into it this afternoon. (in a meeting now)
[21:17:28] leila: sure :)
[21:21:49] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL 20.00% of data above the critical threshold [30.0]
[21:24:02] Analytics-EventLogging, Analytics-Kanban: Can Search up sampling to 5%? {oryx} - https://phabricator.wikimedia.org/T103186#1439244 (Milimetric) FYI this is still blocked on @Ironholds responding to my comment above (https://phabricator.wikimedia.org/T103186#1412355). Apologies for not pinging him correct...
[21:25:35] milimetric: In this task - https://phabricator.wikimedia.org/T101465#1436543.. Do you have some idea on leila's last comment? (It's about running a short duration unique token based test for validation)
[21:27:09] madhuvishy: I think that's a question for brandon black. From our end it's easy to write some varnish code, but it's up to him if he'd be ok with deploying it for a single day and then removing it
[21:27:39] milimetric: alright, I will ask him then
[21:28:13] so for us it would just be - set a one time cookie with a 1 day expiration date and a unique id. hm... well, that might be tricky in varnish too i'm not sure
[21:28:21] milimetric: I don't think we need to do global uniques - cos last access doesn't give us global counts - so may be even doing one domain would do
[21:28:25] but the code complexity isn't high
[21:28:41] ok, cool, you should mention that 'cause it'll keep the exposure low
[21:30:10] Analytics-Kanban, Analytics-Visualization: Update Vital Signs UX for aggregations {musk} [13 pts] - https://phabricator.wikimedia.org/T95340#1439266 (Milimetric) a:Milimetric
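The short-duration uniques test madhuvishy and milimetric discuss above boils down to: varnish sets a cookie with a unique id and a one-day expiry, and uniques for that day are the count of distinct ids in the request logs. A toy version of the counting half, with hypothetical field names (the varnish/VCL half is the part that would need brandon black's sign-off and isn't sketched here):

```python
from uuid import uuid4

# Toy sketch of the counting half of the 24-hour unique-token idea:
# varnish would set a cookie like uniq=<random id> with a 1-day expiry,
# and uniques for the day are the number of distinct ids seen in the
# request logs. The 'uniq_cookie' field name is hypothetical.

def count_daily_uniques(requests):
    """requests: iterable of dicts with a 'uniq_cookie' key (or None)."""
    return len({r["uniq_cookie"] for r in requests if r["uniq_cookie"]})

# Simulate 3 clients making several requests each within one day.
clients = [str(uuid4()) for _ in range(3)]
logs = [{"uniq_cookie": c} for c in clients for _ in range(5)]
print(count_daily_uniques(logs))  # 3
```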
[21:41:23] Analytics-EventLogging, Analytics-Kanban: Can Search up sampling to 5%? {oryx} - https://phabricator.wikimedia.org/T103186#1439313 (Ironholds) It was probably more to do with the fact that, as my OOO email made clear, I've been afk for a week and a half. I'll try to address this tomorrow (just clearing my...
[21:43:16] EventLogging graph is going back to normal...
[21:48:34] milimetric: thanks, sent email
[21:51:05] mforns: yeah, i just saw that, did you do anything?
[21:51:10] milimetric, no...
[21:51:26] :) great
[21:51:33] self-healing
[21:51:44] I was just checking the db, the inserts were normal during all this time, it seems like a graphite problem to me
[21:51:50] hehe, yea
[21:51:58] maybe the reporter then...
[21:52:43] milimetric, I checked the graphite-consumer in hafnium and it was fine
[21:54:06] did you check the eventlogging logs there?
[21:54:09] i'll take a look
[21:56:33] milimetric, yes, the graphite consumer log just contains the init log
[21:57:14] the /var/log/upstart/eventlogging_* logs haven't been updated in forever
[21:57:20] but it seems strange to me that in eventlog1001, the reporter logs stop like 2 days ago
[21:57:37] I can see the wrapped up logs, but not the current log
[21:57:52] yeah, but the zipped logs are from weeks ago
[21:58:03] you mean in hafnium?
[21:58:06] yea
[21:58:09] aha
[21:58:29] RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK Less than 15.00% above the threshold [20.0]
[21:58:29] just weird - i think we have no visibility in this. Obviously something bad happened in opsy world
[21:58:32] and in eventlog1001 the same happens with the reporter logs
[21:58:36] yea
[21:58:55] I asked on the ticket that this got added to, we'll see if they say anything
[21:59:30] milimetric, I assumed that in hafnium the graphite consumer log does not log anything but the initialization
[21:59:54] oh, that would make sense
[22:00:02] i figured it'd restart more often but maybe not
[22:00:05] so if the process does not restart for a long time, no logs are written, and the only thing that remains is old gzipped logs
[22:00:11] this would make sense to me
[22:00:30] but the reporter logs do print stuff other than the init
[22:16:20] Analytics-Backlog, Compact-Personal-Bar-(Beta): Delete all data from EventLogging:PersonalBar schema - https://phabricator.wikimedia.org/T105065#1439406 (gpaumier)
[22:17:41] milimetric, somehow I can not match any of the alerts with any anomaly in graphite graphs...
[22:32:11] mforns: they seemed to happen at the same time as the big drop in the overall.raw.rate
[22:32:14] did you look at that one?
[22:32:47] milimetric, yes, I could not find any matching drop
[22:33:07] lemme look at the alerts again
[22:33:09] there are lots of drops, but could not match them with the alert timestamp
[22:34:01] milimetric, if you want to look at Jul 5th 20:54:49 - 20:58:30 UTC
[22:34:14] I found some missing events in the db for this period
[22:34:38] the graphite charts are intact however...
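A hole like the Jul 5th 20:54-20:58 one mforns found is easiest to surface with a per-minute count over the window; since GROUP BY returns no row for an empty minute, the gaps are the *missing* keys. A sketch, where the table name and the 14-digit timestamp format are assumptions about the EventLogging database, not verified schema:

```python
from datetime import datetime, timedelta

# Sketch of finding holes like the Jul 5th 20:54-20:58 one: count events
# per minute in a window, then report minutes that returned no rows at
# all. Table name and the 14-digit timestamp format are assumptions.

GAP_QUERY = """
SELECT LEFT(timestamp, 12) AS minute, COUNT(*) AS events
FROM log.SomeSchema_12345678
WHERE timestamp BETWEEN %s AND %s
GROUP BY minute ORDER BY minute
"""

def missing_minutes(rows, start, end):
    """rows: (minute, count) pairs from GAP_QUERY; start/end: datetimes."""
    seen = {m for m, _ in rows}
    gaps, t = [], start
    while t < end:
        key = t.strftime("%Y%m%d%H%M")  # matches LEFT(timestamp, 12)
        if key not in seen:
            gaps.append(key)
        t += timedelta(minutes=1)
    return gaps

# e.g. missing_minutes(cursor.fetchall(),
#                      datetime(2015, 7, 5, 20, 54),
#                      datetime(2015, 7, 5, 20, 59))
```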
[22:37:32] mforns: http://graphite.wikimedia.org/render?from=16%3A00_20150708&until=22%3A00_20150708&width=400&height=250&target=eventlogging.overall.raw.rate&_uniq=0.9958936583716422&title=eventlogging.overall.raw.rate
[22:37:49] that drop corresponds with the icinga emails
[22:38:23] a little bigger: http://graphite.wikimedia.org/render?from=16%3A00_20150708&until=22%3A00_20150708&width=1200&height=650&target=eventlogging.overall.raw.rate&_uniq=0.9958936583716422&title=eventlogging.overall.raw.rate
[22:38:51] milimetric, wait but this drop is the one that just happened a while ago
[22:39:15] a while ago? It's today
[22:39:15] I think this is of a different nature than the others, that lasted for 2 minutes
[22:39:19] oh yea
[22:39:34] you're looking for the 2 minute ones?
[22:39:35] yes, I meant just now
[22:39:38] yes
[22:39:45] oh there's a new 2 minute one?
[22:40:09] I'm not seeing one, yea
[22:40:22] xD, no no. With "a while ago" I meant today like 40 mins ago
[22:40:35] I'm looking at Jul 5th 20:54:49 - 20:58:30 UTC
[22:40:47] this one is the biggest one, 4 minutes
[22:41:12] and I found a hole in the database (missing events) in those minutes
[22:41:19] yeah, the alarm seems wrong, this part looks ok to me: http://graphite.wikimedia.org/render?from=22%3A00_20150708&until=23%3A00_20150708&width=1200&height=650&target=eventlogging.overall.*.rate&_uniq=0.9958936583716422&title=eventlogging.overall.raw.rate
[22:41:23] but graphite chart is intact
[22:42:20] mforns: wait i'm not seeing a problem alert 40 minutes ago
[22:42:25] just the recovery message from icinga
[22:42:46] the last problem alert I have is 21:21
[22:42:59] sorry - july 5th
[22:43:43] oh i get the confusion, i thought you said you meant "just now" for the 2 minute alerts
[22:43:58] oh, no no
[22:44:00] bah, heh :) looking back in time now
[22:45:16] mforns: this is the one I see close to that time: http://graphite.wikimedia.org/render?from=19%3A00_20150705&until=20%3A00_20150705&width=1200&height=650&target=eventlogging.overall.*.rate&_uniq=0.9958936583716422&title=eventlogging.overall.raw.rate
[22:45:38] looks like 19:31 to 19:33
[22:46:01] looks like some sort of hiccup
[22:46:03] milimetric, ok, but that's like 1.5 hours before the alert
[22:46:09] yea
[22:46:16] so you're right, no match
[22:46:22] icinga was confused somehow
[22:46:49] milimetric, do you know from where exactly icinga gets the metrics?
[22:46:57] graphite
[22:47:00] mmm
[22:47:28] so theoretically if you plug in the numbers you get from that graph into whatever math icinga does, it should say yes throw an alert
[22:47:35] but the data might have changed
[22:47:41] remember how graphite sometimes updates late and weird
[22:47:45] aha
[22:47:48] this makes sense
[22:47:51] that could've happened here plugging up the hole
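The render URLs milimetric pastes above can also be fetched as JSON, which makes it possible to replay the alert window and recompute the check after the fact. If graphite backfilled late datapoints since icinga sampled it, the replay won't reproduce the alert — the "plugged up hole" theory. A sketch; the render parameters mirror the pasted URLs, while the threshold arithmetic is an assumption about the check, not its actual implementation:

```python
import requests

# Refetch an alert window from graphite as JSON and recompute how many
# datapoints sit above a threshold. If graphite backfilled late data
# since the alert fired, the recomputed figure won't match what icinga
# saw at the time. The threshold logic here is assumed, not the real
# check's implementation.

def pct_above(target, frm, until, threshold,
              base="http://graphite.wikimedia.org/render"):
    resp = requests.get(base, params={
        "target": target, "from": frm, "until": until, "format": "json",
    })
    points = [v for v, _ts in resp.json()[0]["datapoints"] if v is not None]
    return 100.0 * sum(v > threshold for v in points) / max(len(points), 1)

# e.g. pct_above("eventlogging.overall.raw.rate",
#                "19:00_20150705", "20:00_20150705", threshold=30)
```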
[23:02:19] milimetric: this balanced consumer stuff is working great!
[23:02:27] no load tests yet but i love it
[23:03:11] that's cool. Yeah, I think adding something to Event Logging should feel like hitting the perfect sweet spot in tennis
[23:03:13] milimetric: http://i.imgur.com/o7MFD4e.png
[23:03:13] simple and powerful
[23:03:26] :) pretty - i love kafka, so smart
[23:03:43] https://www.irccloud.com/pastebin/FRiq2DOE/
[23:03:50] I pushed 200 events
[23:03:59] through the forwarder
[23:04:07] milimetric: :) yeah
[23:04:58] that's surprisingly good balancing for such a low number of events
[23:05:20] i'd expect it to be good on average as the number of events gets higher, but this is great
[23:10:21] milimetric: yup :)
[23:10:50] Analytics, Analytics-Backlog, Performance-Team, Patch-For-Review: Collect HTTP statistics about load.php requests - https://phabricator.wikimedia.org/T104277#1439579 (Catrope) As requested by @ori and @Krinkle, here's my wishlist of statistics to collect: 1. Collect # of requests/responses for each...
[23:23:38] Analytics, Analytics-Backlog, Performance-Team, Patch-For-Review: Collect HTTP statistics about load.php requests - https://phabricator.wikimedia.org/T104277#1439658 (Krinkle) For @Catrope's #1 wishlist item I'd recommend we fragment the existing metrics by an additional layer. I don't think we need...
[23:53:41] Analytics, Analytics-Backlog, Performance-Team, Patch-For-Review: Collect HTTP statistics about load.php requests - https://phabricator.wikimedia.org/T104277#1439760 (ori) Ok. Can you formulate a plan for how you'd act on this data? i.e., if the data showed X, we'd do Y; if it instead showed P, we'd...
[23:55:08] Analytics, Analytics-Backlog, Performance-Team, Patch-For-Review: Collect HTTP statistics about load.php requests - https://phabricator.wikimedia.org/T104277#1439765 (ori) (And also: can you say what the goals are, in terms of cache hit rates, or some other metric? Identify what ought to be possible...
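On the balanced-consumer exchange above: the near-even spread of jgage's 200 events is what Kafka consumer groups give for free, since a topic's partitions are divided among the group's members. A minimal sketch with kafka-python — the topic, group, and broker names are placeholders, and this is not the actual EventLogging consumer code:

```python
from kafka import KafkaConsumer

# Minimal consumer-group sketch: run this script twice with the same
# group_id and Kafka splits the topic's partitions between the two
# processes, which is the balancing effect jgage observed with his 200
# events. Topic, group, and broker names are placeholders.

consumer = KafkaConsumer(
    "eventlogging-valid-mixed",          # placeholder topic name
    group_id="eventlogging-demo",        # same group => balanced partitions
    bootstrap_servers=["localhost:9092"],
    auto_offset_reset="earliest",
)

for message in consumer:
    print(message.partition, message.offset, message.value[:80])
```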