[12:29:10] Hey, wondering if anyone knows anything about this; let's say I'm building something a bit like a search engine, I've got a bunch of things I have to rank and a load of different factors I want to rank them by, what patterns can I use to manage all my factors, i.e. make it all modular and normalised? Presumably big search engines have hundreds of factors and
[12:29:10] a distributed team working on each part, how do they go about assigning relative importance between factors etc?
[12:31:43] I guess they would have some optimisation loop, but gets hard when you have a very high number of factors because the space gets large
[12:51:25] Anyone here worked on Cirrus?
[12:55:21] https://en.wikipedia.org/wiki/User:NEverett_(WMF) apparently
[13:34:47] mornin' Ironholds.
[16:26:59] hey leila :)
[16:27:00] yo DarTar
[16:28:08] hey both
[16:28:26] so, the referer-tracking has handled 90 days of logs so far
[16:28:29] do we have a quorum?
[16:28:38] ha, wow
[16:28:43] errored out on 3, but I wrote in awesome exception handling (aw yiss) so it's still going
[16:28:58] DarTar, and that's 90 days after I started it running at 3am this morning
[16:29:13] alright, cool
[16:29:32] well, we’re not in a hurry for this data, but thanks for the heads up
[16:29:49] I'm worried it'll run out of memory at some point, so I'll see when it dies and do some work dumping it straight to file
[16:30:01] we're not? moth- you told me this was the most important task!
[16:30:06] I've done nothing but work on this since Monday!
[16:30:13] not urgent != not important
[16:30:18] ;)
[16:30:21] ...darnit. Should ask for more clarity in the future
[16:30:29] I wrote parallelised code! Until 7am! /sulks
[16:30:35] okay, will work on apps alongside
[16:30:39] brb
[16:31:13] (flex)
[16:33:05] Ironholds: if this turns out to be a major blocker, let’s discuss – I don’t want you to be fighting with referrers for a long time
[16:33:18] DarTar, naw, it's going to be good. I've got it working.
[16:33:21] there are two outcomes
[16:33:27] and on that note, did you see the work Mako and Aaron presented on redirects?
[16:33:27] 1, it does not run out of memory. Cool! big dataset
[16:33:34] DarTar, let's chat?
[16:33:36] 2, it does run out of memory and so I have to do it in chunks, appending to a TSV
[16:33:40] cool! big dataset ;p
[16:33:41] I'm in the RG hangout
[16:33:51] the second one looks like 'putting a for loop around something'. That's it.
[16:33:58] hey leila, sure – joining. Ironholds: RG?
[16:34:07] I'll see if my internet can handle it.
[16:34:40] thanks!
[16:35:05] mornin' tnegrin. :-)
[16:35:25] hi leila
[16:35:54] I was having trouble im'ing you y'day
[16:35:54] thanks for the help with Ellery -- it's all good
[16:36:45] np. :-) happy to hear that
[16:39:39] Ironholds: link to mako’s work on redirects http://networkcollectiv.es/wiki-redirects/
[17:16:22] DarTar, cool! Thanks :)
[17:17:35] Ironholds, so: to wrap up on edit attempt data,
[17:17:56] other than discussing instrumentation with James and halfak tomorrow, we need to sort out the legal status of this data
[17:18:11] as it’s neither reader behavior nor public revision data
[17:18:15] but something in between
[17:18:28] and I have no idea what licensing/privacy terms would apply to it
[17:18:43] DarTar, makes sense
[17:19:13] but I totally agree, this is one of the key pieces of data we should be generating
[17:19:22] also one that lila has been asking for
[17:19:43] she's going to like my Circ 2.0 stuff then
[17:20:57] wait, what’s the link with edit attempts?
[17:21:42] well, we have commonality between groups of editors and when they do things, and disparity between that and read actions overall
[17:21:51] this means one of two things: 1, editors really are special snowflakes
[17:22:01] slash, edit actions only make sense at certain points in the day
[17:22:15] or 2, for some unknown reason it's more difficult to edit at particular points and succeed
[17:22:24] so I want to see how well edit attempt timings match up with edit timings.
[17:22:36] see if 2 holds any water
[17:23:00] it’s going to be really hard to answer that question until we have data of decent quality
[17:23:05] we do!
[17:23:09] the requestlogs
[17:23:11] meh
[17:23:22] what's wrong with the RL data?
[17:23:53] that assumes that all edit attempts are recorded in a consistent and clean way across devices-methods
[17:24:32] ah, but they are
[17:24:48] RL data also won’t allow you to do any segmentation by anons vs registered
[17:24:48] which is probably the most important one
[17:25:31] yeah, that's fair
[17:32:54] DarTar, 101 days, no explosions, btw
[17:47:27] Ironholds: nice, keeping all my fingers/toes crossed
[17:48:56] Ironholds, leila: check out the thread on staff meeting with lila in case you missed it
[17:48:59] starting in 10
[17:49:06] and remote friendly
[17:49:26] doh! good that you said it DarTar!
[17:49:28] thanks!
[17:54:30] Ironholds, yt?
[17:54:47] leila, no!
[17:54:55] * Ironholds waits to see what the outcome of that paradox is
[17:55:00] What's up?
[20:49:49] J-Mo: I fixed the bug that caused things to be stuck in 'waiting'
[20:49:51] (I think)
[20:50:35] sweet. I'll be using quarry tomorrow morning for some stuff, so I'll look out for it
[20:50:59] J-Mo: cool. even if it does show up, hitting 'submit' again *always* fixes it
[20:51:11] awesome
[20:51:47] DarTar, can we kick our 1:1 down the road by 15 minutes? not eaten yet
[20:52:22] J-Mo: yw!
[20:52:43] J-Mo: I also set up an easy way to publish ipython notebooks on toollabs, btw: http://tools.wmflabs.org/notebooks/yuvipanda/test/test
[20:53:02] ohh… maybe I'll finally start working in iPython now :)
[20:53:48] J-Mo: I also have a way to run ipython notebooks on toollabs easily, and publish them trivially as well. I'm unsure if it is useful, tho. You get easy access to API, dumps and the db replicas with it
[20:54:15] J-Mo: so workflow is: 1. run commandline, 2. it automatically opens up a browser running ipython notebook on toollabs for you, 3. create new notebook, 4. it is automagically published from the start
[20:54:23] J-Mo: very similar to quarry except needs a toollabs account
[20:54:28] Ironholds: np
[20:54:32] ta!
[20:54:51] J-Mo: I'm not sure how useful it will be, since halfak told me publishing is more important so I put it on ice for a moment
[20:54:52] also, we’ve been talking a lot the last 2 days, so nbd if you prefer to push it
[20:55:42] interesting. I may follow up with you on that next week or the week after, YuviPanda. I'd like to learn my way around iPython.
[20:56:05] gtg for now YuviPanda. Thanks again!
[20:56:06] J-Mo: :D ok. it currently works but is not fully secure, which is why I haven't publicized it. Need to publicize quarry first.
[21:01:52] YuviPanda, example app UA please
[21:01:59] again? :P
[21:02:01] Ironholds: moment
[21:02:16] ta
[21:02:54] Ironholds: WikipediaApp/2.0-r2014-08-24 (Android 4.4; Phone)
[21:03:10] danke!
[21:04:05] Ironholds: running late here
[21:05:15] my IRC client tells me tnegrin has quit
[21:05:21] * YuviPanda nominates Ironholds for director of analytics
[21:09:33] Ironholds: I also found out that one of my close friends is a Hive core dev
[21:12:26] DarTar, kk
[21:12:30] lemme know when you're around
[21:12:31] YuviPanda, cool!
[21:29:22] all: there is still time to contribute to the august wikimedia research newsletter (publication is likely on saturday or sunday)
[21:29:28] todo list with lots of interesting new papers: https://etherpad.wikimedia.org/p/WRN201408
[21:30:16] Ironholds: in 5
[21:30:28] leila, DarTar: still lacks a mention of the last research showcase too ;)
[21:30:38] DarTar, cool
[21:31:34] HaeB: leila is working on wrapping up a project before she goes on vacation so I don’t think she’ll have much spare time to help with the newsletter
[21:33:39] DarTar: ok - just following up on last week's suggestion by aaron
[21:37:36] Ironholds: sorry, I need to find a room
[21:37:43] np :)
[21:37:48] (my laptop keeps dying on hangouts)
[21:39:10] YongLe is the only room available, yay
[21:45:24] Ironholds: just sent you an invite
[21:45:31] ta
[21:45:39] * Ironholds grabs phone
[21:47:11] bah
[21:47:18] I can't see it. can you email me the link?
[21:47:25] stupid google
[21:49:21] got it..
[22:26:51] DarTar, call dropped. bah. Finish via /query?
[22:29:00] hey
[22:29:04] sure, IRC
[22:29:26] last item was basically if the referrer stuff was dragging you into a blackhole
[22:32:42] * Ironholds snots
[22:32:43] *
[22:32:45] *r. bah