[20:04:17] * Ironholds waves at the halfak
[20:04:55] Hey Ironholds. How's your Sunday?
[20:05:24] I discovered more profound wtf in our pageviews process :/
[20:05:32] and now I have to document it and come up with something better.
[20:05:37] Pop quiz: how do we identify spiders?
[20:06:15] Case sensitive pattern match "Spider" in their user agent?
[20:06:19] nope
[20:06:25] we aggregate requests by IP
[20:06:26] * halfak thinks of a solid, dumb strategy
[20:06:30] and any IP with >N requests, we discard
[20:06:32] * Ironholds throws hands up
[20:06:34] What's N?
[20:06:47] I don't know, but when Erik was listing possible false positives one of his examples was cyber cafes
[20:06:54] ...so I'm guessing it's not monumentally high.
[20:07:36] there's also the fact that all the XFFed traffic will be coming from a small number of proxies, which...would presumably display that way and risk being identified as spiders.
[20:08:19] which includes...a lot of mobile traffic. 48,912 requests in one day of sampled logs, from 16 IPs, because they've got HTTPs enabled and so those are our IPs.
[20:08:50] ..wait, oh thank god. Okay, he's amended his email: apparently that applies to edits. For views we're fine.
[20:09:37] heh
[20:09:39] we use UA Parsing. We're safe.
[20:09:44] * Ironholds almost had a stroke
[20:20:02] I've actually gotten to the point where I've put an assert False in my code in order to debug it.
[20:20:32] * halfak 's life in weird, intermittent error land.
[20:20:39] heh
[20:20:47] I wrote that namespace checking code, btw
[20:20:50] just finishing up documenting it
[20:21:00] namespace checking?
[20:21:26] give it a list of namespace IDs and a dbname, it'll return the localised namespace names for each ID
[20:21:33] give it a list of names and a dbname, it'll return the IDs
[20:21:41] Gotcha
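
The namespace lookup described at the end of the log could look roughly like the sketch below. This is not the code being discussed: the function names `namespace_names` and `namespace_ids` and the hard-coded `NAMESPACES` table are hypothetical, and a real version would presumably load the per-wiki namespace table from the wiki itself rather than from a literal.

```python
# Hypothetical sample data for illustration only. A real implementation
# would load the namespace table for each dbname instead of hard-coding it.
NAMESPACES = {
    "enwiki": {0: "", 1: "Talk", 2: "User"},
    "dewiki": {0: "", 1: "Diskussion", 2: "Benutzer"},
}


def namespace_names(ids, dbname):
    """Return the localised namespace name for each ID, in input order."""
    table = NAMESPACES[dbname]
    return [table[ns_id] for ns_id in ids]


def namespace_ids(names, dbname):
    """Return the namespace ID for each localised name, in input order."""
    # Invert the ID -> name table for this wiki so names can be looked up.
    reverse = {name: ns_id for ns_id, name in NAMESPACES[dbname].items()}
    return [reverse[name] for name in names]
```

Both functions raise `KeyError` for an unknown dbname, ID, or name, which seems preferable to silently returning a placeholder when the goal is checking namespaces.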