[00:00:05] Ironholds: There's only so much idiocy I can do with a YAML file. ;-)
[00:00:11] Ironholds: But no worries.
[00:00:57] Hmm. That said, http://edit-reportcard.wmflabs.org/ doesn't look like it's updated itself for the day. Did the cron fail (or not run)?
[00:01:09] * James_F files a Phabricator ticket, given it's a Friday afternoon.
[00:02:08] I'd use more parens, but that's just me
[00:02:28] Ironholds: :-)
[00:03:01] the SQL looks sane but I don't know what the base DBs look like
[00:03:08] and having spent all day writing C++, my brain is toast.
[00:03:26] well, more accurately: having spent all day writing C++, my brain is fine. Having spent part of the day writing MAKEFILES means my brain is toast.
[00:03:31] VisualEditor, Analytics-EventLogging, Analytics: http://edit-reportcard.wmflabs.org/ doesn't seem to be getting updates (cron job not running yet?) - https://phabricator.wikimedia.org/T76921#822918 (Jdforrester-WMF)
[00:03:38] Ironholds: My sympathies.
[00:03:52] James_F, you kidding? It's awesome.
[00:03:59] Ironholds: Makefiles are not awesome.
[00:04:00] you give me 7m events from 1.2m users
[00:04:10] guess how long it takes my code to tokenise them into sessions?
[00:04:30] I don't have any feel for what "good" would look like.
[00:04:36] two seconds
[00:04:38] but fair
[00:04:43] You could tell me an hour and I'd be impressed or shocked depending on your steer.
[00:04:50] That sounds fast, true.
[00:05:04] I can also now geolocate 300k IPs in 800ms.
[00:05:13] I love you, C++. You make coding fun again.
[00:05:42] In comparison, the slowest of my queries you just looked at takes 5.13s right now and is filtering a lot less than that much.
[00:06:03] still, SQL is not the fastest
[00:06:06] Filtering ~400k rows.
[00:06:17] With triple-nested SELECTs, whee.
[00:06:31] But, I guess, It Works™ so…
[00:06:37] * James_F nods.
[00:11:16] I'd like to start fixing some small bugs on wikimetrics. I'm looking at https://www.mediawiki.org/wiki/Analytics/Wikimetrics/Help, but am not clear on what I need to get started. I already have a Labs account, though, so https://www.mediawiki.org/wiki/Developer_access doesn't seem to be applicable? Where do I start?
[00:13:50] FINCH, Analytics-Engineering: Define user segments in a way that Product and Analytics can actually use in database queries - https://phabricator.wikimedia.org/T76908#822937 (Jaredzimmerman-WMF)
[00:16:15] fhocutt, good question! milimetric?
[00:16:27] (also, re-welcome :D)
[00:16:33] thanks, Ironholds.
[00:17:15] is the mediawiki-vagrant set-up at all useful for wikimetrics work, or is it its own thing?
[00:17:32] FINCH, Analytics-Engineering: Define user segments in a way that Product and Analytics can actually use in database queries - https://phabricator.wikimedia.org/T76908#822944 (Jaredzimmerman-WMF)
[00:17:49] * fhocutt has been corralling vagrant yaks today, and is sort of hoping that it's not...
[00:20:19] fhocutt: we have a lot of volunteers trying to help out with wikimetrics right now :)
[00:20:24] it's a bit challenging to coordinate
[00:20:30] what's your area of expertise?
[00:21:10] re: vagrant - yes, mediawiki vagrant has a wikimetrics role. If you enable it and run vagrant provision, you should be able to browse to localhost:5000 and see wikimetrics running
[00:21:28] milimetric, python and back-end-type stuff
[00:22:27] ok, cool. I'll make sure our product owner knows that even more people are interested in contributing. Did you take a look at the tasks here: https://phabricator.wikimedia.org/tag/analytics-wikimetrics/ ?
[00:23:27] some of them, yes
[00:24:50] so what's interesting to you to work on?
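[Editor's sketch: milimetric's setup steps above, written out as shell commands. The role name `wikimetrics` and the localhost:5000 URL are from the log; the exact MediaWiki-Vagrant `roles` subcommands are an assumption and may differ by version.]

```shell
# Inside a mediawiki-vagrant checkout; assumes the
# MediaWiki-Vagrant `roles` plugin is available.
vagrant roles enable wikimetrics   # turn on the wikimetrics role
vagrant provision                  # apply the role to the VM
# Then browse to http://localhost:5000 to see wikimetrics running.
```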
[00:25:15] most of the tasks are small cleanup things
[00:25:58] oh, just looked at those issues, none of them are very interesting at all
[00:26:50] yeah, am looking through, but I'm happy to start on some small cleanup bits to learn my way around
[00:27:10] fhocutt: well, get it set up in vagrant and take a look at the code
[00:27:24] most of it is fairly friendly, but one thing we could use help with in some places is documentation
[00:27:59] sure
[00:28:13] that's always a problem, it seems
[00:28:34] well - more documentation is always good :)
[00:28:36] ping me if you need any help, i'll leave IRC up
[00:29:06] cool, thanks. Vagrant probably won't be set up till tomorrow, if that--my internet's pretty patchy and I've been running into snags
[00:32:06] bahaha
[00:32:21] James_F, now that I've sessionised, guess how long it takes me to work out session length for each one?
[00:32:46] 2.6m sessions. 204 nanoseconds.
[00:34:12] and that's on a single core. Goodbye parallelisation!
[00:39:22] VisualEditor, Analytics-EventLogging, Analytics: http://edit-reportcard.wmflabs.org/ doesn't seem to be getting updates (cron job not running yet?) - https://phabricator.wikimedia.org/T76921#822988 (Liuxinyu970226)
[01:02:40] fhocutt: if you need help with setting up wikimetrics in vagrant let me know, i will be around for a little while
[01:29:11] Analytics-Refinery: Kraken data flow monitoring not working properly - https://phabricator.wikimedia.org/T52195#823028 (Milimetric) Open>declined a:Milimetric kraken is phased out, this is no longer relevant
[01:40:08] VisualEditor, Analytics-EventLogging, Analytics: http://edit-reportcard.wmflabs.org/ doesn't seem to be getting updates (cron job not running yet?) - https://phabricator.wikimedia.org/T76921#823036 (Jdforrester-WMF)
[01:40:22] Ironholds: Nice.
[01:43:12] The future is compiled.
[01:43:14] The future is orange.
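[Editor's sketch: Ironholds never shows the C++ itself, but the two steps he benchmarks above (tokenising a user's timestamped events into sessions, then computing each session's length) can be sketched in Python. The 30-minute inactivity cutoff and the function names are illustrative assumptions, not his actual values.]

```python
INACTIVITY_CUTOFF = 30 * 60  # seconds; the threshold is an assumption, not Ironholds' value


def sessionise(timestamps, cutoff=INACTIVITY_CUTOFF):
    """Split one user's sorted event timestamps (seconds) into sessions.

    A new session starts whenever the gap between consecutive events
    exceeds the inactivity cutoff.
    """
    sessions = []
    current = []
    for ts in timestamps:
        if current and ts - current[-1] > cutoff:
            sessions.append(current)
            current = []
        current.append(ts)
    if current:
        sessions.append(current)
    return sessions


def session_length(session):
    """Session length = time between a session's first and last event."""
    return session[-1] - session[0]
```

For example, events at t = 0, 100, 5000, 5100 with a 1800-second cutoff split into two sessions of length 100 each.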
[01:43:21] (The future should probably spend less time in tanning salons)
[02:24:28] hi nuria__, still around?
[02:24:46] I haven't been able to get vagrant up yet
[02:25:27] it's giving me port forwarding errors
[03:11:57] (CR) Unicodesnowman: "Thank you! I'm happy to, could you give some more info on how it can be improved?" [analytics/dashiki] - https://gerrit.wikimedia.org/r/177487 (owner: Unicodesnowman)
[03:43:43] advice to all of you:
[03:43:47] never build a puppet manifest parser.
[03:45:09] wait
[03:45:13] ori, you glorious person you!
[03:45:23] what?
[03:45:32] I just discovered your Puppet-manifest-to-JSON ruby parser.
[03:45:47] oh, lord.
[03:45:50] we have a need, in analytics, for something to turn puppet manifests into something useful.
[03:46:01] http://thedoomthatcametopuppet.tumblr.com/
[03:46:03] this will..drastically simplify my life
[03:46:09] "Posts generated by a Markov chain trained on the Puppet documentation and the assorted works of H. P. Lovecraft"
[03:46:10] haha
[03:46:20] there's a reason why they go together so well
[03:46:25] the puppet source code is insane
[03:46:31] what are you trying to do?
[03:48:04] extract the IPs of the SSL terminators
[03:48:17] ...yes, this is silly, but also necessary.
[03:48:20] * ori takes a look
[03:48:23] doesn't sound silly
[03:48:43] See aaron's notes at http://etherpad.wikimedia.org/p/ssl_terminators for the steps (or at least, the steps Ops worked out for him)
[03:49:13] I'm going to do it in C++ over the weekend if my frantic googling does not work, but I'm hoping that ruby script will :D
[03:49:56] role::cache::* != SSL terminators, though
[03:50:48] i forget, is SSL termination done on the varnishes now? I think that's only the case for ULSFO
[03:51:06] ...oh damn.
[03:51:14] Okay, I'm sending myself a note, to drag you into a thread about this :D
[03:52:08] what are you trying to do? :)
[03:52:44] this is classic http://mywiki.wooledge.org/XyProblem
[03:59:39] heh
[04:00:15] Identify SSL terminator IP addresses in the request logs, so we have a way of (a) not factoring terminators into our heuristics for botnet detection and (b) knowing where we should use x_forwarded_fors and not rely on IP addresses for fingerprinting/geolocation
[04:10:45] Ironholds: so, really, you need a good way to tell whether an IP is associated with a Wikimedia server, which is a much easier task, because you can just check if it's within our address blocks
[04:12:13] and that's super easy: 91.198.174.0/24 208.80.152.0/22 2620:0:860::/46 198.35.26.0/23 185.15.56.0/22 2a02:ec80::/32 10.0.0.0/8
[07:46:29] ori, yep, except not always. In the sense that some of those requests are going to be automated, internally-generated traffic (which we want to exclude), and some are going to be SSL, and we can't distinguish by using the ranges :(
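[Editor's sketch: ori's suggested check, "is this IP within our address blocks", is easy to express with Python's stdlib `ipaddress` module using exactly the CIDR blocks he lists; `is_wikimedia_ip` is a hypothetical helper name, not code from either participant.]

```python
import ipaddress

# The Wikimedia address blocks listed in the log above.
WMF_BLOCKS = [ipaddress.ip_network(block) for block in (
    "91.198.174.0/24", "208.80.152.0/22", "2620:0:860::/46",
    "198.35.26.0/23", "185.15.56.0/22", "2a02:ec80::/32", "10.0.0.0/8",
)]


def is_wikimedia_ip(ip):
    """True if the address falls inside any of the listed blocks.

    `addr in network` is False for mismatched IP versions, so IPv4 and
    IPv6 blocks can live in the same list.
    """
    addr = ipaddress.ip_address(ip)
    return any(addr in block for block in WMF_BLOCKS)
```

As Ironholds points out in the final message, this only answers "is this a Wikimedia address?"; it cannot separate SSL terminators from other internal traffic within those same ranges.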