[00:50:17] halfak: works now, thanks and isn’t my spellchecker amazing?
[00:51:29] EL reply will have to wait 1-2 more days I’m afraid (but I read the thread and yes we owe it some distinctions, let me handle this)
[00:51:53] Awesome. Thanks, dude. :)
[00:52:05] BTW, I have some really good news re. revscoring.
[00:52:11] aarg, wait
[00:52:26] I still can only *view* the letter
[00:53:17] and I’ll have to remove language that indicates “commitment for collaboration” I’m afraid
[00:54:09] No worries. Fixing
[00:54:47] Fixed.
[00:54:51] Sorry for the lame.
[00:55:17] So yeah. Revscoring. We pushed our models up to 90 AUC for *all namespaces* for enwiki and ptwiki with the data we got from the wikilabels campaign.
[00:55:28] That's substantially better than before.
[00:55:37] :D
[00:55:50] And it's probably going to work better for a lot of other reasons.
[00:56:56] w00t
[00:57:09] so new data from wikilabels?
[00:57:21] are you going to do a deploy soon?
[00:57:22] as in: you guys changed the dimensions?
[00:57:39] * yuvipanda is happy with our current state of deploys, but should find time to make it all the way to debs
[00:58:32] * DarTar needs to get the ball rolling with Yan before leaving, but I want to hear about this
[00:58:59] halfak: never mind editing the draft, I’ll take it from here
[01:00:31] It's the data we have been gathering using the "Edit quality" wikilabels campaign that we have been running for a while. The enwiki one just finished and the ptwiki one finished a few weeks ago, so it was about time that I built the models.
[01:02:42] I suspected that the models would fit way better because there's a whole bunch of different stuff captured in "reverts". "Damage" is a small subset of that.
[01:03:20] It turns out that the good-faith model is a little bit better than the damage model in general. I'm sure they are highly correlated. It'll be fun to look at the exceptions.
[01:04:37] I'm trying to juggle that and some content persistence data that I'm surprisingly rich in since I moved back to running my computations on stat1003.
[01:07:47] yuvipanda|maybe, missed your Q.
[01:08:01] No deploy for at least 48 hours
[01:08:16] Dunno how I'll schedule in the time to test out the new models :S
[01:13:11] I'm off. Have a good night! o/
[14:57:53] o/
[15:20:41] o/ Ironholds halfak guillom good morning!
[15:20:48] Good morning! :)
[15:23:42] Moin bearloga & halfak
[15:40:30] * halfak tries to squeeze all he can out of his free-level figshare account.
[16:30:37] o/ yuvipanda
[16:32:25] Woops. Never mind
[16:35:36] halfak: can you PM me links to the documents?
[18:16:19] hey halfak
[18:16:29] Hey dude.
[18:16:39] re: figshare, when you first reported it I thought it was a glitch on their end, it sounds like we’ve reached a quota?
[18:16:57] does it help to drop a line to Mark and ask him to bump it?
[18:17:08] Yeah. I can't start another "project" without paying and then it's $8 per month for 3 projects. I already have two.
[18:17:15] It doesn't seem to make sense to pay.
[18:17:19] agreed
[18:17:25] Anyway, I'm trying to see if I can upload the files anyway.
[18:17:29] I have to make them smaller.
[18:17:36] 558MB compressed
[18:17:41] kk let me drop him a line anyway, what’s your user account?
[18:18:07] http://figshare.com/authors/Aaron_Halfaker/96516
[18:18:43] Goddamn. Just the enwiki part of that is 252 MB. Just above the quota. Arg!
[18:19:35] Quota goes up to 500MB per file if a paid account
[18:20:12] I wonder if they just delete our datasets if we stop paying the monthly fee.
[18:20:27] no idea, I’ll add that to my email
[18:21:38] mail sent, cc’ed you
[18:22:28] Thanks.
[18:34:06] yuvipanda and I settled on a task. We should have a draft by EOD.
[18:35:05] * halfak gathers a 5 minute recording of rcstream
[19:00:51] DarTar: https://github.com/halfak/research-engineering-task
[19:01:54] thanks, in a meeting now, will check it later
[19:02:26] kk
[19:25:25] J-Mo1: You're welcome. Also, I've said it before, but I'm a fan of your Phabricator username :)
[19:26:02] thanks guillom. we are legion
[19:37:46] I just finished my solution to the task. Took me 32 minutes
[19:37:56] Stupid hairy corners in celery.
[19:50:06] halfak: do we want the solution to the task to be in a repo?
[19:50:20] wording makes it sound like you want the code mailed to you
[19:50:27] other than that, I like the idea
[19:50:32] Yeah. I think they should mail it to me.
[19:50:47] I think a PR might encourage a bit of cheating :)
[19:51:13] right, so I assume you’ll take care of sharing with Yuvi on your end
[19:51:27] and other folks (like Nuria) will not be involved in the evaluation
[19:53:00] Yup. That's right.
[19:53:20] BTW, got Nuria's feedback. Generally positive and helped me clean up the language so that it is clear what we are looking for.
[19:53:27] great
[19:53:41] i think task is a nice one, gave halfak feedback mostly about 1) noting we can accept some data loss (makes parallelization easier) and 2) being open to candidates using third party components (in some of our tasks we disallowed these)
[19:53:49] ah, sorry #metooslow
[19:53:57] :D
[19:54:29] be back in a bit
[22:02:11] J-Mo1, how many observations in that experiment?
[22:58:44] Started reading y'all's Research Software Engineer task and had a good chuckle. "One of these wizards processes edits to pages in realtime. Edits are saved roughly 1.2 times per second, but the wizard can take anywhere" Thank you, https://chrome.google.com/webstore/detail/algorithm-to-wizard/paheefgkiimahbclagfhcnehdiengfkb?hl=en-US
[22:59:53] Ha!
[23:00:13] Took me forever to figure out what that extension was