[16:20:55] morning
[16:25:07] o/ Ironholds
[16:29:33] Ping Ironholds
[16:29:41] Joining call?
[16:43:48] oh no, my self-deprecating comment was interpreted as an expression of affront to all wikipedia researchers :)
[16:44:17] "Please, community of Wikipedia researchers, accept my apologizes."
[16:44:26] i assume valerio is not here?
[16:56:38] perhaps a representative of the cabal can let him know there's no problem?
[17:00:21] mako, I accept your apology (which is totally not necessary) :P
[17:00:51] Thanks for engaging in that conversation. :)
[17:03:04] halfak: well, i apologize too
[17:03:08] halfak: but that was a quote from Valerio
[17:03:47] Oh. Herp derp.
[17:03:49] :)
[17:04:39] halfak: i think there's a real benefit to building a page somewhere (maybe one exists i don't know about) that is very clear about what is and is not in the page view data
[17:05:05] in terms of redirects, hits to pages that do not exist, etc.
[17:05:19] +1
[17:05:39] Related: We're waist deep in an improved definition of page views https://meta.wikimedia.org/wiki/Research:Page_view
[17:05:42] * Ironholds heard about pageviews, rises from the stygian depths
[17:05:51] page moves
[17:05:58] I bet Erik Z has some docs somewhere about web stats collector.
[17:06:08] he has some, but not a tremendous amount
[17:06:19] better documentation is part of the improved definition planning
[17:06:38] like, Reid Priedhorsky is doing a ton of stuff on views now and he pointed me to this https://bugzilla.wikimedia.org/show_bug.cgi?id=35045
[17:07:02] turns out, it doesn't have an impact on the way they are recorded but if you're relying on what's in your browser URL bar, it sure looks like it could
[17:07:27] although i think it's a useful fix, it's going to complicate perception of page views in relationship to redirects
[17:08:02] * halfak runs to the next meeting
[17:08:46] halfak: that's useful
[17:10:07] so, valerio is specifically interested in hits to /w/index.php, which AFAICT is somehow just outside of the definition altogether, no?
[17:10:26] mako, it's outside the current definition yes, which requires /wiki/
[17:10:54] but it's clearly still useful and essential for answering certain questions (his questions seem to be how often people access old versions)
[17:10:56] this was actually the bit of the current definition I have the biggest problem with because it's culturally biased
[17:11:00] the new one allows for index.php?title= and index.php?curid=
[17:11:11] it also allows for all the dialect-specific serbian and chinese projects
[17:11:39] how about the fun oldid= that doesn't match the title= :)
[17:11:45] however I don't know whether it will be included in the "pageX got Y pageviews" data as well as the "WM sites got Y pageviews" data
[17:11:56] i actually do that all the time
[17:12:11] well, see above. I don't know if that'll be a problem ;p
[17:14:10] Ironholds: i hadn't thought about that in regards to chinese & co. that's a big problem
[17:15:31] also, i have this vague impression that the pagecount data does not (or maybe did not?) include logged in users?
[17:15:51] Ironholds: is that true?
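To make the /wiki/ versus /w/index.php distinction above concrete, here is a minimal sketch of how a URL filter might differ between the two definitions. It is not the actual definition on meta: the function names are hypothetical, and the list of dialect-variant path prefixes for the Chinese and Serbian projects is an assumption for illustration only.

```python
from urllib.parse import urlsplit, parse_qs

# ASSUMPTION: illustrative dialect-variant path prefixes (e.g. zh.wikipedia.org/zh-hans/Title).
# The real proposed definition may use a different or longer list.
VARIANT_PREFIXES = ("/zh-hans/", "/zh-hant/", "/zh-cn/", "/zh-tw/", "/sr-ec/", "/sr-el/")


def is_pageview_current(url: str) -> bool:
    """Current definition as described in the chat: only /wiki/ paths count."""
    return urlsplit(url).path.startswith("/wiki/")


def is_pageview_proposed(url: str) -> bool:
    """Sketch of the proposed definition: /wiki/ paths, dialect-variant paths,
    or /w/index.php requests that carry a title= or curid= parameter."""
    parts = urlsplit(url)
    path = parts.path
    # str.startswith accepts a tuple, so this checks every variant prefix at once.
    if path.startswith("/wiki/") or path.startswith(VARIANT_PREFIXES):
        return True
    if path == "/w/index.php":
        query = parse_qs(parts.query)
        return "title" in query or "curid" in query
    return False


if __name__ == "__main__":
    old_version = "https://en.wikipedia.org/w/index.php?title=Foo&oldid=123"
    print(is_pageview_current(old_version), is_pageview_proposed(old_version))
```

Under this sketch, a request for an old revision via index.php is dropped by the current rule but kept by the proposed one. As raised in the chat, oldid= need not match title=, so attributing such a hit to the title is itself a modelling choice that this sketch does not address.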
[17:17:05] mako, I..can't see how we'd even distinguish
[17:17:15] I mean, there's no indication in the data as to whether someone is logged-in or not
[17:17:24] so it should include logged-in users happily
[17:17:58] and yeah, that's a big problem: I'm fine with us being off somewhat but when that somewhat is culturally biased we can has problems.
[17:18:09] well, sometimes viewership data are cache hits
[17:18:12] Short of some project having a massive botnet problem we should adhere for accuracy across all projects
[17:18:32] totally, and a request from a logged-in user pulls a non-cached version
[17:18:34] and logged in users see slightly different pages
[17:18:45] but it should still hit the varnish caches (I think?) which are then told to retrieve a fresh version
[17:18:54] I'll check with the devs, mind
[17:19:27] so, i have a vague recollection that it might have been counted. but even if that's a true recollection, it would have been since before the caches were even varnish :)
[17:19:35] like, when they were all squid
[17:19:56] so i have a hazy memory of out-of-date information :)
[17:20:00] heh
[17:20:11] well, I've poked the devs and will find out
[17:20:15] Ironholds: i would really like to know, this is the kind of thing i'd like to put on a page somewhere
[17:20:21] * DarTar waves at mako
[17:20:30] yes, it's going to be on a page somewhere
[17:20:31] DarTar: ciao!
[17:20:49] at the moment my speed at developing the pageviews definition is somewhat slowed, however, and the documentation accompanies that
[17:21:01] DarTar: cool, then if we also put some stuff about redirects and page moves and the /w/index.php, i think that would be a real service to anyone using those data
[17:21:08] wait, that was for Ironholds
[17:21:14] :)
[17:21:24] i mean, public channel, you are all allowed to read it
[17:21:40] yeah, I know
[17:21:46] it will have aaaall of the caveats
[17:21:59] Ironholds: where should this go?
[17:22:10] it'll be up on meta with the new definition once that's done
[17:22:15] Ironholds: i'm happy to do my own little braindump
[17:22:31] feel free, but my energy for integrating it with anything is currently nil
[17:22:41] mostly because my energy is nill
[17:22:49] *nil. and NUL, in the ASCII sense.
[17:23:22] Ironholds: that's fine, i'll make a mess now and you can clean it up later ;)
[17:42:57] Helder, so I'm not seeing a tremendous amount of traffic on that day, unless they're using really extensive URL encoding
[17:45:14] I'm going to download the raw dumps and see if I can parse through those
[17:46:46] weird...
[17:48:46] Helder, well, I suspect there's some URL encoding business going on
[17:50:33] Woo active channel today. :)
[18:03:41] halfak: very preliminary outline is up in IdeaLab
[18:12:43] Thanks Pine! Can you link me to it?
[18:13:06] halfak: just a minute, I'm about to send out a security bulletin
[18:43:26] Does anyone know how to set up cyberduck or filezilla to connect to stat100X?
[18:44:56] ewulczyn, I used to know how for filezilla
[18:45:24] http://www.digitalinternals.com/network/ssh-port-forwarding-for-ftp/208/ maybe
[18:57:12] Ironholds: can you forward the security notice that I just sent to Wikimedia-l to the staff list?
[18:57:57] I'm not on Wikimedia-l because I like being sane
[18:58:01] * Ironholds goes to read
[18:58:06] we know. Notice already went out internally.
[18:58:10] it was already sent to the staff list a while back.
[18:58:47] ok
[18:59:03] It would have been nice if they'd sent this to the public lists
[18:59:12] * Pine looks for someone to trout
[19:09:46] Let's see, where was I
[19:14:49] halfak: https://meta.wikimedia.org/wiki/Grants:IdeaLab/Editor_Interaction_Visualizer
[19:56:52] Helder, and you're sure it was 20-09-14?
[23:36:00] Ironholds: ewulczyn re: cyberduck, you should set it up to access over sftp. Should be much easier than ssh tunneling with ftp
[23:36:39] the hell
[23:36:45] Skeleton does not like italic text.
[23:38:17] it wants em or strong? idiots!
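The low traffic Helder was seeing might come from views of one title being split across several URL encodings in the raw dumps. Here is a minimal sketch of summing views for a single title from an hourly pagecounts dump while folding those encodings together. It assumes the space-separated pagecounts-raw line format (project, page title, view count, bytes) and gzip-compressed files. The function names are hypothetical, and the sketch decodes percent-escapes only once, so doubly encoded titles and other oddities in real dumps are not handled.

```python
import gzip
from urllib.parse import unquote


def normalize_title(raw: str) -> str:
    """Decode percent-escapes once (e.g. 'Caf%C3%A9' -> 'Café') and
    treat spaces and underscores as the same character."""
    return unquote(raw).replace(" ", "_")


def count_views(lines, project: str, title: str) -> int:
    """Sum view counts for one title across all of its encodings.

    ASSUMPTION: each line looks like 'project page_title views bytes'.
    """
    target = normalize_title(title)
    total = 0
    for line in lines:
        fields = line.rstrip("\n").split(" ")
        # Skip malformed lines and rows belonging to other projects.
        if len(fields) != 4 or fields[0] != project or not fields[2].isdigit():
            continue
        if normalize_title(fields[1]) == target:
            total += int(fields[2])
    return total


def count_views_in_dump(path: str, project: str, title: str) -> int:
    """Open one gzip-compressed hourly dump and count views for a title."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        return count_views(fh, project, title)
```

With this approach, "Caf%C3%A9" and "Café" on the same project end up in one total instead of two separate rows.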