[02:55:37] anyone up for reviewing https://gerrit.wikimedia.org/r/#/c/243834/ ?
[03:04:18] ori: I left a few comments
[03:06:21] http://docs.hhvm.com/manual/en/hack.attributes.memoize.php looks interesting, but hack only :/
[03:09:51] legoktm: thanks!
[14:55:31] https://github.com/hmlb/phpunit-vw
[15:43:17] ori: wonderful :)
[15:57:51] csteipp: "we should not make correlating the webrequest/pageviews dataset with mediawiki logs easier, since mediawiki logs often identify the user." -- have you read T102079? That's exactly what I'm being asked to implement.
[15:58:56] bd808: I hadn't read that yet. Let me do that.
[15:59:43] Multiple managers very very much want unique user tracking right now :/
[16:02:09] api.log already does correlate ip, username and action api request
[16:02:28] Yeah, but not reading history
[16:02:58] well... unless you are reading via the native apps
[16:03:49] Those go via api? /me deletes app
[16:04:03] yeah they are all api calls
[16:04:44] "Do you want to uninstall this app?"
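The `<<__Memoize>>` attribute linked above is Hack-only, but the underlying trick is plain argument-keyed result caching and is available in most languages. A minimal Python sketch of the same idea (the function and values are made up for illustration):

```python
import functools

@functools.lru_cache(maxsize=None)
def sum_of_squares(n: int) -> int:
    # Computed once per distinct n; repeat calls are served from the
    # cache, which is what Hack's <<__Memoize>> attribute automates.
    return sum(i * i for i in range(n))

print(sum_of_squares(10))  # -> 285
print(sum_of_squares(10))  # -> 285 (served from the cache this time)
```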
Yes
[16:05:18] bd808: in multiple engineering management meetings, chris & others warned that given the lack of bandwidth / availability of security and privacy engineers, managers should be careful not to push on doing things that may compromise either
[16:05:29] toby should know better
[16:07:20] Our Q1 review deck will be published soon and you can see the things that are asked
[16:07:40] that said, having an api-driven reading experience seems sensible for unrelated reasons (primarily the separation of form and content), so if reading via the API is problematic, it's the API logging that should be fixed, not the apps
[16:07:46] I can't speak to exactly who is making the asks but I think it is way beyond Toby
[16:08:46] I'm being asked to make the api logging more robust so that Quim, Partnerships and Reading can all have tracking dashboards
[16:10:05] Originally they wanted it all in EL but when we told them the volume Analytics said it would melt everything
[16:10:54] bd808: Btw, thanks for pointing this out. I'll definitely start a conversation with Toby. I'm really not trying to shoot the messenger :)
[16:11:09] I have tough skin :)
[16:11:20] and I think you know my feelings on tracking
[16:11:39] * Reedy tracks bd808s current location
[16:12:14] Oh, in that case... WHAT ARE YOU THINKING!?!?!!!! bd808 is TEH EVIL TRACKER!!!!
[16:12:44] $DAYJOB-1 -- http://www.kount.com/
[16:13:00] "Study: For optimal heart health, Americans should double or even quadruple the amount of exercising they’re doing. The findings challenge the notion of a 30-minutes-a-day magic number for exercise."
[16:13:00] So, just so I'm getting this
[16:13:06] where am i going to find four minutes a day?!?!
[16:14:01] You guys want to send off api traffic to webrequests, so that you can run one query to get all of what was in api.log + restbase logs in one place?
[16:14:34] I.e., if this was just about the api... (what are we calling it now, action api?)
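The "run one query to get all of what was in api.log + restbase logs in one place" idea above can be sketched with toy in-memory records. The field names, the shared per-request id, and all values are hypothetical stand-ins for the real Hadoop datasets:

```python
# Hypothetical rows; the real data lives in the webrequest dataset and
# api.log. A shared per-request id is what makes the join possible.
webrequest = [
    {"req_id": "a1", "uri": "/w/api.php", "cache_status": "hit"},
    {"req_id": "a2", "uri": "/w/api.php", "cache_status": "miss"},
]
api_log = [
    {"req_id": "a2", "action": "query"},  # only cache misses reach MediaWiki
]

# Join the two datasets on the shared id ("one query" over both).
by_id = {r["req_id"]: r for r in api_log}
joined = [{**w, **by_id.get(w["req_id"], {})} for w in webrequest]

# Rows present in webrequest but absent from api.log were served
# straight from the Varnish cache.
cached_only = [w for w in webrequest if w["req_id"] not in by_id]
print(len(cached_only))  # -> 1
```

The same join key also supports the dedup question raised later in the log: counting how many cached hits never appear in the api dataset.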
[16:15:03] we could send that to its own place in hadoop for reporting, and not correlate with webrequests
[16:15:15] After adding in User Agent
[16:15:21] The first desire is T108618 (plus they want OAuth id and CA uid if available as well)
[16:16:07] the correlation id for log events across multiple apps (varnish, mw, rb, parsoid, ...) is sort of orthogonal
[16:16:45] bd808: What's the ask behind the hadoop request though? No one should actually care about the tech.
[16:17:00] T102079?
[16:23:00] T113817 came to my attention when we were talking about if and how we could correlate Varnish hits for api content with what is actually cached
[16:23:25] That was one of the things discussed earlier in T108618
[16:28:19] bd808: Right. So understanding that yes, research people would love to correlate across all of the hadoop datasets. And I can see why you want Restbase + MW api correlated. But is there a reason you need to correlate hits to the varnish cache (usually web page hits) and api hits to achieve T102079?
[16:29:07] some currently unknown percentage of the total api traffic is cached data from Varnish
[16:29:25] We started caching api hits?
[16:29:50] API modules can mark their responses as publicly cacheable
[16:29:59] they can be if they are GETs and include parameters that ask for caching
[16:29:59] I missed that.. cool
[16:30:18] there isn't any caching done automatically at this point
[16:31:12] the request has to include maxage/smaxage parameters to enable caching
[16:33:10] So is varnish already reporting hits to api.php into webrequests?
[16:33:23] I believe it is, yes
[16:33:52] but since a large number of api.php requests are POST that data doesn't tell us much about what is actually being requested
[16:33:58] So you've already got the requests there, you just want it structured now?
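As described above, API caching is opt-in: nothing is cached automatically, and only a GET carrying maxage/smaxage parameters yields a publicly cacheable response. A rough sketch of that decision; the function and the exact header strings are illustrative, not MediaWiki's actual code:

```python
def cache_control(params: dict, method: str = "GET") -> str:
    # Illustrative only: responses stay private unless the request is a
    # GET that explicitly asks for caching via maxage/smaxage (opt-in,
    # as discussed above -- nothing is cached automatically).
    if method != "GET":
        return "private, must-revalidate, max-age=0"
    smaxage = int(params.get("smaxage", 0))
    maxage = int(params.get("maxage", 0))
    if smaxage > 0 or maxage > 0:
        return f"public, s-maxage={smaxage}, max-age={maxage}"
    return "private, must-revalidate, max-age=0"

print(cache_control({"action": "query", "smaxage": "300"}))
# -> public, s-maxage=300, max-age=0
print(cache_control({"action": "edit"}, method="POST"))
# -> private, must-revalidate, max-age=0
```

An `s-maxage` greater than zero is what would let a shared cache like Varnish hold the response, which is why some api.php hits never reach MediaWiki's api.log.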
[16:34:07] and even the GETs are hard to analyze
[16:34:15] *nod*
[16:34:32] and enrich with data normally found in api.log and not in Varnish
[16:39:45] So putting words into your mouth: If structured api data is being written into its own hadoop dataset, you want a unique request id in both the webrequest dataset and the api dataset, so you can dedup the datasets and figure out how many cached hits aren't in your api dataset?
[16:49:08] csteipp: yeah I think that is the current point of talking about an X-Request-Id header
[17:28:23] anomie: hit "join" in your hangout window ;)
[17:28:33] bd808: You know me too well
[17:30:10] csteipp: The webrequests dataset tracks sessions?
[17:30:48] anomie: Essentially -- IP + UA
[17:31:32] csteipp: Even without a request-id, we (want to) have IP + UA in the API log itself.
[17:31:50] (we already have IP, and UA is what's requested)
[17:35:09] anomie: Yes. And I'm also trying to get us to stop storing IP + UA and only store a session id.
[17:37:31] Well, IP is useful if ops needs to ban someone being stupid at the router, and UA answers questions that random people want answered.
[21:07:57] How do we decide on names for repos?
[21:08:04] Trying to work out what to call this ORM library
[21:10:06] Reedy, TheThingThatShouldNotBe?
[21:10:18] lol
[21:10:21] wikimedia/bad/evil/not-really-orm/orm
[21:10:40] JeroenORM
[21:10:44] => JORM
[21:11:01] we're moving it outta the core?
[21:11:04] Yeah
[21:11:15] NotReallyORM
[21:11:47] If it was easier to remove from Wikibase, I'd just move it straight to EducationProgram
[21:12:24] Reedy: I started ripping it out
[21:12:31] but it's tough
[21:12:45] Reedy: I don't think it can be in a separate library. I did a quick look and it creates its own DBErrorExceptions and stuff
[21:12:45] Guess I could get it done tomorrow... but there's that review thing
[21:13:08] also ORMTable depends upon ApiBase??
wtf
[21:13:14] This week if people are ok with me throwing away all the crappy tests that depend on it in Wikibase
[21:13:18] hoo, just tell me and I'll +2 it w/o looking
[21:13:55] https://gerrit.wikimedia.org/r/243933
[21:13:59] that's the start of it
[21:14:06] it just adds regression tests
[21:14:15] legoktm: eh?
[21:14:30] It's got an api module base thing
[21:15:52] Hmm.. It throws MWExceptions
[21:16:39] Can we quickly remove all usages, slap @deprecated onto it and kill it with 1.28?
[21:16:59] hoo: Not really
[21:17:10] Unless you want to fix EducationProgram
[21:17:21] Move it there
[21:17:32] Not sure I'm serious...
[21:17:38] When Wikibase doesn't use it, we can do that
[21:19:20] That sounds like the least bad thing
[21:20:35] Let's plan to do that then
[21:21:30] https://www.mediawiki.org/wiki/Manual:ORMTable :P
[21:21:59] It's not used in any of the WMF hosted extensions
[21:22:02] Hmm, it says UploadWizard uses it
[21:22:06] This seems out of date
[23:06:05] legoktm: are you up for +2ing https://gerrit.wikimedia.org/r/#/c/243834/ ? you reviewed it once and I took your suggestions. it comes with a pretty extensive suite of tests.
[23:09:45] (or anyone else for that matter, i ain't choosy)
[23:35:25] ori: {{done}}
[23:35:31] bd808: thanks!!