[06:10:35] Hi apergos thanks :] hi miriam :] sorry I got disconnected yesterday.
[09:27:35] hi aikoChou! I hope you are doing well :)
[09:28:01] thanks a lot for attending the research showcase! I hope you enjoyed it!
[09:30:20] so, about re-use. You are absolutely right. Not only could monitoring content re-use help detect statements needing citations, but calculating sentence similarity within an article could also help us get rid of "false positives". Sometimes a sentence is marked as "needing citation" but it practically doesn't need one, because the fact in the sentence has already been referenced elsewhere in the article. Being able to detect similar sentences could help us exclude those cases. Makes sense?
[12:00:56] The idea is cool! I didn't think of that. =D So it seems the Citation Needed model cannot avoid this kind of false positive for now, right? Although the negative instances in the training data include the reasons "Main Section" or "Already Cited", the model has no way to know which sentences have been referenced elsewhere, or to calculate sentence similarity.
[12:04:15] It also reminds me of the second task we did before: retrieve individual sentences from an article and run them through the model to classify them. Maybe we could add a "filter" that calculates the similarity between already-cited and uncited sentences within an article, and filters out the similar ones before feeding sentences into the model?
[13:26:50] aikoChou, that would be amazing if it were possible! It should be fairly feasible with the word vectors that are calculated as input to the RNN. Also, the model does have a notion of "section": it takes the section title as input, thus inferring associations between section and citation need.
Most sentences in the main section don't strictly require a citation, and the model is implicitly capturing this rule.
[15:34:02] miriam_, the simplest way I can think of to measure the similarity of two sentences is cosine similarity between their average word vectors. And yeah, I remember "section" is a strong indicator for the model in the paper, so it shows the model implicitly learned this rule. =D
[23:42:45] it would be amazing to at least be competent enough to understand a word of that convo
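A minimal sketch of the filter discussed above, assuming a toy word-vector lookup (the table of vectors, the function names, and the 0.95 threshold here are all hypothetical, for illustration only; in practice the vectors would come from the same pretrained embeddings that feed the RNN):

```python
import numpy as np

# Hypothetical 3-d word vectors standing in for real pretrained embeddings.
WORD_VECS = {
    "cats":    np.array([0.90, 0.10, 0.00]),
    "felines": np.array([0.85, 0.12, 0.05]),
    "are":     np.array([0.10, 0.80, 0.10]),
    "mammals": np.array([0.70, 0.20, 0.30]),
    "barking": np.array([0.00, 0.10, 0.90]),
}

def sentence_vector(sentence):
    """Average the word vectors of the tokens we have embeddings for."""
    vecs = [WORD_VECS[w] for w in sentence.lower().split() if w in WORD_VECS]
    return np.mean(vecs, axis=0) if vecs else np.zeros(3)

def cosine_similarity(a, b):
    """Cosine similarity between two vectors (0.0 for zero vectors)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom else 0.0

def filter_already_covered(uncited, cited, threshold=0.95):
    """Drop uncited sentences that closely match an already-cited sentence,
    so they are never sent to the citation-need classifier."""
    cited_vecs = [sentence_vector(s) for s in cited]
    return [s for s in uncited
            if all(cosine_similarity(sentence_vector(s), cv) < threshold
                   for cv in cited_vecs)]
```

With this sketch, an uncited sentence like "felines are mammals" would be filtered out when "cats are mammals" is already cited in the article, while an unrelated sentence survives and goes on to the model.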