[00:20:03] o/ bearloga
[00:20:07] Still around?
[00:26:17] Oh well. Was hoping to answer your questions re. p(desirable|scores)
[00:26:53] o/ ewulczyn
[00:27:01] halfak: was at meeting
[00:27:03] wassup
[00:27:08] Oh cool.
[00:27:33] * halfak gets code for generating likelihoods.
[00:27:58] halfak: actually what are the parameters for those beta distributions?
[00:28:42] halfak: if i know the shape params I can just find out p(score|desirable) and p(score|undesirable) for any score
[00:29:29] bearloga, see https://bitbucket.org/grouplens/snuggle/src/d9d17d592986bf693f3bf1992bf3a512db3c243b/snuggle/util/desirability.py?at=default&fileviewer=file-view-default
[00:29:36] They are vars right at the top
[00:30:25] halfak: whats up?
[00:30:42] ewulczyn, was hoping to convince you to join us in #wikimedia-ai :D
[00:31:09] It's weird getting into conversations about the ai in wikimedia stuffs and not be able to pull you in to conversations.
[00:31:22] There isn't one going on now, but it's somewhat regular.
[00:31:41] * halfak has been meaning to poke leila too
[00:34:11] thanks for the invite. i'm trying to be on IRC more regularly and am happy to be pulled in!
[00:34:53] Woot! :)
[00:35:38] halfak: you should sell him on IRCcloud too
[00:38:56] YuviPanda, still haven't made the transition myself.
[00:45:12] halfak: so based on those parameters
[00:48:42] halfak: the smallest score that will yield a desirability ratio of <2 is 0.3525. The largest score that will yield a desirability ratio of >10 is 0.1298. So if my STiki score is less than, say, 0.13 then that desirability ratio goes up A LOT.
[00:50:02] halfak: and it quickly goes to 0 basically after a STiki score of 0.712
[00:50:29] Yeah. That sounds like what I'm experiencing -- and maybe that's how it *should* be.
[00:50:40] Anyway, thanks for taking a look at it.
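The desirability ratio discussed above is a likelihood ratio, p(score|desirable) / p(score|undesirable), with each likelihood modeled as a beta distribution. The following is a minimal sketch that assumes made-up shape parameters. The real ones are the variables at the top of the linked desirability.py and are not reproduced here, so the thresholds this sketch produces will not match the 0.3525 and 0.1298 quoted in the chat. All helper names are hypothetical.

```python
import math

# HYPOTHETICAL shape parameters for illustration only; the real values
# live at the top of snuggle/util/desirability.py and are NOT reproduced here.
DESIRABLE_ALPHA, DESIRABLE_BETA = 1.5, 8.0
UNDESIRABLE_ALPHA, UNDESIRABLE_BETA = 3.0, 2.0


def beta_pdf(x: float, a: float, b: float) -> float:
    """Density of a Beta(a, b) distribution at x in the open interval (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


def desirability_ratio(score: float) -> float:
    """Likelihood ratio p(score | desirable) / p(score | undesirable)."""
    return (beta_pdf(score, DESIRABLE_ALPHA, DESIRABLE_BETA)
            / beta_pdf(score, UNDESIRABLE_ALPHA, UNDESIRABLE_BETA))


def score_where_ratio_equals(target: float,
                             lo: float = 1e-6, hi: float = 1 - 1e-6) -> float:
    """Bisection for the score at which the ratio crosses `target`.

    Assumes the ratio decreases monotonically in the score. That holds for the
    hypothetical parameters above but should be checked for the real ones.
    """
    for _ in range(100):
        mid = (lo + hi) / 2
        if desirability_ratio(mid) > target:
            lo = mid  # ratio still above target: crossing lies at higher scores
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    for target in (10, 2):
        print(f"ratio crosses {target} at score ~{score_where_ratio_equals(target):.4f}")
```

With the real shape parameters substituted, `score_where_ratio_equals(2)` and `score_where_ratio_equals(10)` would give the kind of cut-offs reported above, provided the ratio really is monotone in the STiki score.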
[00:50:55] I'll be coming back to this soon, so I hope I can bug you again for protips :)
[00:51:03] halfak: sorry I wasn't any more help
[00:51:12] halfak: but yes, feel free to bug me :)
[00:51:15] Oh no. As much help as I had hoped for :)
[00:52:40] OK. Time to transition to evening-halfak. Have a good one!
[00:53:02] Have a good one!
[13:47:57] o/
[16:06:02] o/ tarrow
[16:06:09] hi!
[16:06:15] I don't know if you noticed, but I uploaded an updated mwcites dataset
[16:06:16] I think I found it!
[16:06:19] woot!
[16:06:47] brb must throw ball for dog
[17:07:34] o/
[17:35:07] o/ lzia
[17:35:10] g'morning :)
[17:36:06] Any chance I can get you to auto-join #wikimedia-ai? There's been some discussions about the article recommender there recently. :)
[18:06:05] o/ guillom
[18:06:12] Looking at https://phabricator.wikimedia.org/T115119
[18:06:31] We're trying to figure out where to set a threshold on the length of URLs that appear in tags.
[18:06:40] I know you were working on building a dataset of such URLs.
[18:20:26] halfak: Looking at phabricator now. Will follow up there.
[18:21:02] guillom, sorry, might have been a false alarm. Your insights are valuable regardless :)
[18:22:09] halfak: yeah, what I was interested in was domains, not URLs, so I trimmed everything after the first slash
[18:22:16] So I have no insights on length :/
[18:22:18] Gotcha
[18:22:25] Thanks anyway :)
[18:22:34] sure
[18:22:58] * guillom should cook lunch before the Research showcase.
[18:23:19] * halfak forgot about the showcase!
[18:23:28] Didn't we used to get meeting invites to it?
[18:24:05] Hmm... Maybe not.
[18:24:09] * halfak copies to calendar
[18:35:10] 1234 o-clock in CST
[18:35:15] Nooo. I missed it
[18:35:59] UTC is the one true timezone :P
[19:28:14] Hi everyone. We're starting the showcase in 2 min or so. :-)
[19:28:38] halfak and myself will be collecting questions during the presentation.
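For the earlier question about a length threshold for URLs in tags, a dataset that records both each URL's length and its domain would cover halfak's and guillom's interests at once. The following is a sketch using Python's standard `urllib.parse`. It parses the host rather than splitting on a slash, because on a full URL the first slash is part of the scheme's `//`. The helper names and sample URLs are hypothetical.

```python
from urllib.parse import urlsplit


def domain_of(url: str) -> str:
    """Return the lower-cased host of a URL, without port or credentials.

    URLs without a scheme and '//' (e.g. 'example.org/path') have no
    parsed host, so they yield an empty string.
    """
    return (urlsplit(url).hostname or "").lower()


def length_summary(urls):
    """Pair each URL's character length with its domain."""
    return [(len(u), domain_of(u)) for u in urls]


if __name__ == "__main__":
    sample = [  # hypothetical example URLs
        "https://en.wikipedia.org/wiki/Hoax",
        "http://Example.com:8080/some/very/long/path?query=1",
    ]
    for length, domain in length_summary(sample):
        print(length, domain)
```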
[19:30:31] Here is the link to the video: http://www.youtube.com/watch?v=kXCI6whgdUA
[19:31:01] * guillom waves.
[19:31:05] hi guillom
[19:31:49] o/ guillom
[19:32:18] Can someone let us know when the stream starts, just to be sure? :)
[19:32:20] Welcome everyone. We're just getting started.
[19:32:24] hah!
[19:32:28] Stream should start in seconds.
[19:32:43] Stream is live now!
[19:32:47] gogoogog
[19:32:57] ok, streaming is good.
[19:33:00] thanks :)
[19:34:31] o/ gilles
[19:34:44] BTW, ping me "halfak" with questions.
[19:34:48] The stream is live
[19:35:15] * guillom chuckles @ Abraham Lincoln.
[19:35:16] hey all, ping me if you are having issues with the stream
[19:35:42] guillom: love the quote. :D
[19:36:13] brendan_campbell: Thanks. There are a few drops in audio from time to time but it may be from the source.
[19:36:44] yeah, guillom. I'm wondering if we should stop the presentation and ask Srijan to do something (if anything can be done).
[19:37:10] guillom: unfortunately it sounds like a network problem on Srijan's end
[19:37:13] leila: I think it's mostly ok; not sure anything can be done.
[19:37:16] I'm skeptical. I'm getting 99% of the words
[19:37:41] right, same here halfak. agreed guillom.
[19:37:44] It's still OK but long drops make it tough to follow
[19:37:59] I'd wait a bit
[19:38:09] o/ GLCiampagliaaaa :)
[19:38:17] Is Srijan talking about https://en.wikipedia.org/wiki/Wikipedia_Seigenthaler_biography_incident ?
[19:38:22] ooki, let's reassess at the end of part 1 (impact of hoaxes)
[19:38:30] halfak: he was briefly, yes
[19:38:40] +1 leila
[19:39:51] brendan_campbell: can you mute SF?
[19:39:53] maybe it helps?
[19:40:46] leila: it certainly wouldnt hurt. dont know if it will alleviate the dropouts.
[19:40:50] 21K hoax articles! That's crazy
[19:41:01] sf muted
[19:41:27] 21K over 15 years ~= 4 hoax articles a day.
[19:42:04] Obviously there are more now than in 2001, but still not unimaginable.
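The back-of-the-envelope rate above checks out. The sketch below uses the rounded figures quoted in the chat, ignores leap days, and spreads the total evenly across all years. That last assumption is exactly what the "more now than in 2001" remark cautions against.

```python
# Figures as quoted in the chat: ~21K hoax-labeled, deleted articles over ~15 years.
HOAX_ARTICLES = 21_000
YEARS = 15

# Uniform average, ignoring leap days and any growth in article creation over time.
per_day = HOAX_ARTICLES / (YEARS * 365)
print(f"{per_day:.2f} hoax articles per day")  # prints "3.84 hoax articles per day"
```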
[19:42:13] hoax-labeled-deleted-articles
[19:42:20] GLCiampagliaaaa: crazy high or crazy low?
[19:42:38] brendan_campbell: it may be better, but there is definitely an internet problem on Srijan's end that we can't fix. Thanks for testing it.
[19:42:38] I bet. "Jimmy is presented of the US" is also flagged as a hoax, but would be obviously caught in patrolling.
[19:43:24] *president
[19:43:25] lol
[19:45:34] i don't quite understand the definition of "inlinks, accessed through search engines and wiki"
[19:45:59] Perhaps you guys could ask him to check if he's on a wireless connection and see if he could switch to wired?
[19:46:24] Right now it's still more or less OK though
[19:47:25] HaeB, I think that srijan was saying that inlinks suggest some traffic comes from the wiki, but we can assume that some also comes from search engines.
[19:47:32] But i'll add it to the question set :)
[19:48:36] Presumably, adding inbound links to a hoax article so it's not orphan is a common practice among hoaxers.
[19:49:59] so you mean on-wiki inlinks... ok, could be, but IIRC it was in the section referring to jimmy's remark about hoaxes being cited by media, so i thought it could also refer to external inlinks
[19:50:23] Yeah. Will make sure to ask.
[19:50:29] Actually, are you in the room HaeB?
[19:50:54] I think I see you behind J-Mo :)
[19:51:35] halfak: damn, my cover is blown ;)
[19:51:45] Interesting chart.
[19:52:17] agreed, guillom. especially the legit curve.
[19:52:35] This graph would be nicer as a density IMO
[19:52:49] It's hard to look at in reverse cumulative order
[19:54:58] It's just a matter of taking the derivative Aaron ;)
[19:56:35] GLCiampagliaaaa, my brain only does derivatives when you throw things at me in environments with substantial gravity.
[19:57:06] LOL!
[19:57:17] Man. Hangouts is going crazy today
[19:57:20] brendan_campbell: does it help if we stop the SF video?
[19:57:45] leila: just muted sf camera
[19:57:52] thanks brendan_campbell.
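The "just take the derivative" remark can be made concrete. If a chart plots a reverse cumulative curve S(x), the share of articles with a value of at least x, then the density is f(x) = -dS/dx. For binned data, that is the drop in S across each bin divided by the bin width. Below is a minimal sketch on made-up values, not the data from the talk. The function name is hypothetical.

```python
def density_from_reverse_cumulative(xs, survival):
    """Approximate f(x) = -dS/dx from samples of S(x) = P(X >= x).

    `xs` must be strictly increasing. Returns the interval midpoints and one
    density value per interval (the drop in S divided by the interval width).
    """
    mids, dens = [], []
    for i in range(len(xs) - 1):
        width = xs[i + 1] - xs[i]
        mids.append((xs[i] + xs[i + 1]) / 2)
        dens.append((survival[i] - survival[i + 1]) / width)  # negative slope of S
    return mids, dens


if __name__ == "__main__":
    # Hypothetical reverse cumulative curve, not the chart from the talk.
    xs = [0, 1, 2, 3]
    survival = [1.0, 0.6, 0.2, 0.0]
    for mid, d in zip(*density_from_reverse_cumulative(xs, survival)):
        print(f"x={mid:.1f}  density={d:.2f}")
```

Because the densities times the bin widths sum to S(first) - S(last), the plain density carries the same information as the reverse cumulative chart and is easier to read bin by bin.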
[19:58:15] halfak, if you have camera on, could you please turn it off for now? (I'm not seeing you in Hangout so not sure if it's on or off)
[19:58:15] did srijan say that it is surprising that non-hoaxes are mentioned more frequently elsewhere before the article's creation?
[19:58:30] ...i don't understand why that should be surprising
[19:59:27] Not sure I caught that in my refresh of hangouts.
[20:00:20] I think he said it was surprising that a small fraction of hoaxes _are_ mentioned before the article's creation
[20:00:25] Really hard to follow now...
[20:00:58] GLCiampagliaaaa: I'm counting on those watching it in YouTube. If it's hard to follow, signal us and we interrupt and see if we can fix it.
[20:01:36] Yes we are watching on YT -- the last few minutes where very choppy
[20:01:39] *were
[20:01:48] yes, the choppiness translates to youtube
[20:02:37] Any IRC questions. We're doing those now
[20:02:44] While srijan considers his wifi
[20:03:27] The only Q I have is from HaeB and he's in SF and asking them himself.
[20:03:31] Any that I missed?
[20:04:03] I didn't notice any more, halfak.
[20:04:08] kk
[20:04:12] I have a couple too :)
[20:04:39] was the 21k articles determined to be entirely hoaxes, or just ones tagged for having some hoax content?
[20:04:40] and I think the sound is okay now, brendan_campbell.
[20:05:16] Emufarmers, got it
[20:05:44] leila: sound on YouTube is much better now, thanks!!!
[20:05:50] leila: i havent heard any drops since srijan made whatever change he made
[20:05:53] indeed
[20:06:04] New rule for future presentations: Use ethernet! :p
[20:06:24] guillom: :D
[20:06:49] Yeah. Want a plain density
[20:06:50] Yikes
[20:06:57] seriously though, it's a detail worth noting (ethernet). i will add it to my "remote presenter" one-sheet
[20:07:23] yeah, it's worth adding to a checklist. Thanks, brendan_campbell.
[20:12:18] leila: waiting for a good time to reconnect this speakerphone...the artifacty distortion noise is on our end, not translating to youtube
[20:12:42] it's not affecting the broadcast in other words, just annoying for us
[20:12:44] brendan_campbell: we may have to live with it?
[20:13:13] we don't have that much time I think so if it doesn't go to YouTube, I'll ask people in the room to be patient. :-)
[20:13:26] leila: got it
[20:17:16] 72% AUC is pretty low. 98% AUC is essentially perfect.
[20:18:14] do we know what are support features?
[20:18:24] Wait. Is that middle prediction for whether it will get flagged or not or whether or not it is a hoax?
[20:18:33] Is the 98% AUC for the task of deciding whether an article should be flagged -- regardless of whether it is an hoax or not?
[20:18:58] Yes my same question Aaron! :)
[20:20:31] Aaron a Q for later: Are the two sets (legitimate vs succ. hoaxes) balanced?
[20:21:20] Got them both
[20:21:26] (In the slide where he shows the feature selection)
[20:21:27] Thanks!
[20:25:33] halfak, we're close to the end.
[20:25:44] we have 5 min, please jump in with questions as soon as srijan is done. :-)
[20:25:48] Questions questions questions!
[20:25:49] Will do.
[20:25:50] :D
[20:26:04] I've got two from GLCiampagliaaaa and two from me. :)
[20:26:17] and 4 min, perfect. :D
[20:26:28] (we can stay few more min)
[20:26:48] i have a question about impact aspect... will ask myself
[20:26:55] ooki, HaeB
[20:27:09] GLCiampagliaaaa, first
[20:28:20] ah no previous slide! :)
[20:28:33] But the other question (balanced sets) is on this slide
[20:30:23] I have one more, but I'll hold it!
[20:31:00] Thanks Aaron!
[20:31:06] And Leila :)
[20:31:12] :D
[20:31:24] thanks, GLCiampagliaaaa, for your questions.
[20:31:43] halfak: you shouldn't have paused. ;-)
[20:32:03] Bob, have you guys tried to run the same classification on the full set btw? Is that balanced too or is it unbalanced?
[20:32:06] I paused purposefully. HaeB should get his Q :)
[20:32:40] got it.
[20:33:12] halfak: thanks :)
[20:34:05] HaeB: Thank you for making that point about all hoaxes not all existing at the same time. I was peeling a mandarine and couldn't bring it up myself :)
[20:36:18] super-handicapped-human
[20:38:07] * halfak comes back from Valhalla
[20:39:13] brb doggie ball throw time
[20:48:36] exercise session complete.
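For reading the 72% vs 98% AUC figures and the balanced-sets question: ROC AUC is the probability that a randomly chosen positive example is scored above a randomly chosen negative one, with ties counting half. A value of 0.5 is chance and 1.0 is perfect separation. Because it compares positives against negatives pairwise, the class ratio does not enter the formula directly, whereas accuracy at a fixed threshold does depend on it. Below is a minimal pairwise (Mann-Whitney) sketch on made-up scores. It is not the evaluation code from the talk, and the names are hypothetical.

```python
from itertools import product


def roc_auc(pos_scores, neg_scores):
    """Pairwise ROC AUC: P(score_pos > score_neg), with ties counting 0.5."""
    wins = 0.0
    for p, n in product(pos_scores, neg_scores):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    hoaxes = [0.9, 0.8, 0.35, 0.6]      # hypothetical classifier scores
    legit = [0.2, 0.4, 0.1, 0.7, 0.3]   # hypothetical classifier scores
    print(roc_auc(hoaxes, legit))
    # Tripling the negatives changes the class ratio but not the AUC.
    print(roc_auc(hoaxes, legit * 3))
```

Duplicating the negatives moves the class ratio from 4:5 to 4:15 while leaving the AUC unchanged. Accuracy at a fixed cut-off can shift under the same change, which is one reason AUC is commonly reported when the evaluation sets are balanced differently from the full population.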