[00:03:37] Quarry, AutoWikiBrowser, WorkType-Maintenance: Quarry run result in AWB make list - https://phabricator.wikimedia.org/T134141#2255546 (IKhitron)
[13:35:17] o/
[13:43:25] I feel like I vaguely understand *what* machine learning is, at least, now!
[13:43:40] fundamentally, given a set of known inputs and outputs, predict the function!
[13:44:48] YuviPanda, !!! been doing some reading recently?
[13:45:38] halfak: in keeping with my 'become really good at one thing but also always be total newbie and learning in another thing', I've decided I've become fairly decent at kubernetes / opsish-things, and should learn ML on the side.
[13:45:49] very, very slowly, until I kill GridEngine, but it's slowly starting.
[13:45:54] :)
[13:46:04] and this understanding feels like a significant personal step into de-magicking it
[13:46:11] If I may suggest, I think a really good place to dig into ML is evaluation metrics.
[13:46:18] oh?
[13:46:26] like onwiki evaluation metrics?!
[13:46:43] Yeah. the fun thing about evaluation metrics is that you don't need to know how the prediction model works.
[13:47:28] So, let's say that you needed a prediction model, but you didn't trust yourself to do it Right(TM). Well, Right(TM) doesn't matter if you have a high fitness prediction
[13:48:03] So, I think of evaluation metrics of prediction models as a nice basic understanding. I can only be doing things so wrong so long as I know how to evaluate.
[13:48:39] Sometimes these metrics take on a life of their own. See ROC-AUC for basic classifiers and RMSE for recommendation systems.
[13:49:15] I see. so this is just a method of 'how close to the real function is my predictor function', and that is ultimately what matters
[13:49:30] Yeah. Exactly. How useful is this?
[13:49:41] But I find it fascinating to talk to people who specialize in evaluation of prediction models that support human-computer interfaces.
[13:50:07] Because it turns out that, wrapped up in evaluation, is all sorts of power dynamics, psychology, and theory of consciousness.
[13:50:41] > evaluation of prediction models that support human-computer interfaces.
[13:51:08] funnel completion rates, conversion rates, etc?
[13:51:33] Na. Still ROC-AUC or RMSE (root-mean-squared-error)
[13:51:45] But the implications are interesting depending on what is being predicted.
[13:52:01] E.g. in recommender systems, they are trying to predict how you'll rate a movie on a scale.
[13:52:27] oh yeah, I meant those two as in things that are being predicted. 'people with gmail.com addresses are much more likely to convert to paying customers over people with a riseup.net address' as a naive example
[13:52:55] * halfak barfs at ecommerce example.
[13:52:58] But yeah
[13:53:17] yeah, sorry about that. I'm wiki'd out from the two day workshop I just conducted :D
[13:53:29] yeah, that seems incredibly interesting, since the 'function being predicted' here is 'human behavior'
[13:53:55] Or human preference and other subjective things.
[13:54:02] right
[13:54:25] http://events.rice.edu/index.cfm?EventRecord=27870
[13:54:38] Oh. Seminar.
[13:54:41] * halfak looks for a paper.
[13:55:20] AHh yes. This looks like a good summary: https://cihr.eu/eoa2015web/
[13:55:45] I think over the last couple of years I've convinced myself that the argument that 'programs implement algorithms, algorithms are just math, math has no politics, and hence programs have no politics' is bogus, and once that clicked all of this seem very very fascinating but also important
[13:55:49] * YuviPanda clicks
[13:56:12] your blog post / talking about including 'isAnon' as a feature was very helpful in thinking it through
[13:56:21] :)
[13:56:39] I'm just digging in to writing a CSCW paper about ORES, anons, and subjective algs. generally.
[13:56:43] _o/
[13:57:27] devouring above
[13:57:43] halfak: awesome!
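The two metrics named in the discussion above, RMSE and ROC-AUC, can both be computed without knowing anything about how the prediction model works, which is the point being made. A minimal sketch in plain Python follows; the function names `rmse` and `roc_auc` and the sample ratings, labels, and scores are illustrative assumptions, not anything taken from ORES or from the conversation.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-squared error between known outputs and predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def roc_auc(labels, scores):
    """ROC-AUC as the chance that a random positive example is scored
    higher than a random negative one (ties count as half a win)."""
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical movie ratings (recommender case): actual vs. predicted.
actual_ratings = [4, 3, 5, 1]
predicted_ratings = [3.5, 3, 4, 2]
print(rmse(actual_ratings, predicted_ratings))  # 0.75

# Hypothetical binary labels (classifier case) with model scores.
labels = [1, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.1]
print(roc_auc(labels, scores))  # 0.75
```

The pairwise ROC-AUC loop is quadratic in the number of examples, which is fine for illustration; for real data a library routine such as scikit-learn's `roc_auc_score` would be the usual choice.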
[13:57:48] halfak: I hope to make it to CSCW next year too!
[13:57:53] it was a mind-expanding event
[13:58:21] I'm still pissed at the protectionism (almost) in those communities
[13:59:09] halfak: the 'evaluation metric' was a useful bit of information! Thanks :D
[13:59:15] I'm reading the link about ethics now
[14:01:11] YuviPanda: http://machinelearningmastery.com/start-here/ is a good resource. Jason has a way of making you optimistic about succeeding in ML. And he teaches top down instead of bottom-up.
[14:01:41] sabya_text: woah, that looks awesome!
[14:01:58] Quarry, AutoWikiBrowser, WorkType-Maintenance: Quarry run result in AWB make list - https://phabricator.wikimedia.org/T134141#2256680 (Reedy) I suspect column number isn't the most user friendly, specifying the actual could be more useful So from https://quarry.wmflabs.org/run/84084/output/0/json-li...
[14:03:02] if you subscribe to his newsletter, you will get little paragraphs of lesson everyday
[14:03:21] but starting with his blogs is a great start.
[14:03:36] Quarry, AutoWikiBrowser, WorkType-Maintenance: Quarry run result in AWB make list - https://phabricator.wikimedia.org/T134141#2255546 (yuvipanda) You can also get the latest output for any query with the following URL format: `/query//result/latest//`
[14:04:49] Sorry. Got side-tracked working on a weekly update for ORES
[14:05:05] sabya_text: yeah! I will do that (do the blog)
[14:08:21] bye
[14:10:39] Quarry, AutoWikiBrowser, WorkType-Maintenance: Quarry run result in AWB make list - https://phabricator.wikimedia.org/T134141#2256705 (IKhitron) Thank you that you are interested. About column name - I thought about this, but sometimes it's not a good choice, because column name can be something as...
[14:23:07] Quarry, AutoWikiBrowser, WorkType-Maintenance: Quarry run result in AWB make list - https://phabricator.wikimedia.org/T134141#2256735 (Reedy) Teaching a little SQL seems easier... ``` concat((case page_namespace when 0 then "" when 4 then "Project" when 14 then ":Category" when 118 then "Draft" when...
[14:25:04] Quarry, AutoWikiBrowser, WorkType-Maintenance: Quarry run result in AWB make list - https://phabricator.wikimedia.org/T134141#2256736 (IKhitron) I knew about AS, of course. I just wanted make it possible usage of existing queries without any need to change them.
[16:09:06] Quarry, AutoWikiBrowser, WorkType-Maintenance: Quarry run result in AWB make list - https://phabricator.wikimedia.org/T134141#2257063 (IKhitron) To explain you, what did I mean about encoding. When I run Quarry, I never can just use the results. I open the CSV download in Notepad++, convert it to UTF...