[09:20:36] Three distinct datasets are used to train the CITATION NEEDED classifier. Each dataset consists of positives and negatives. Positives are sentences with an inline citation or with an inline citation needed tag. Negatives are sentences which require no inline citations.
[09:21:14] Can I safely assume that negatives are all sentences without an inline citation needed tag?
[09:23:49] In that case, such an assumption will also include sentences that do not actually require a citation, like "The Sun rises in the East".
[09:24:46] which is common knowledge.
[09:26:56] Wouldn't the classifier then try to tag similar common knowledge sentences as requiring citations?
[09:31:37] I'm concerned that the current scope of negatives makes it difficult to identify common knowledge sentences.
[09:42:32] hello again pgadige!
[09:43:14] I guess that the negatives are sentences without such a tag; you could verify that by looking at the datasets directly.
[09:44:18] I don't know how the 'common knowledge' sentences are handled; are such statements generally added to Wikipedia articles?
[09:44:40] maybe 'the sun rises in the east' ought to lead to an article about what that means (I mean, technically speaking, the earth rotates and etc)
[09:44:46] I've no idea
[09:45:14] if the authors don't discuss this in their paper, you might want to contact them directly
[09:45:49] miriam: see the above :-)
[09:46:21] (I think miriam is in a different timezone than we are but you can also try email)
[09:46:49] hi apergos
[09:46:55] and pgadige
[09:47:01] oh you're here!
[09:47:24] yes sorry!
[09:47:54] if you have a few minutes, pgadige is interested in the project to produce dumps from the CITATION NEEDED classifier
[09:48:06] and has some questions :-0
[09:48:08] er, :-)
[09:49:21] so pgadige thanks for your question - there are a few reasons why statements won't need citations, one of them is because they are about "common knowledge", but there are more, see here: https://meta.wikimedia.org/wiki/Research_talk:Identification_of_Unsourced_Statements/Labeling_Pilot
[09:51:02] so in the training data, we take all sentences from featured articles that don't have a citation. Featured articles are the most well sourced and well constructed articles in Wikipedia, so we can trust that what doesn't have a citation there actually doesn't need one. Does it make sense pgadige?
[09:54:06] miriam: ah! yes! So the clue I missed is Featured Articles, which are well sourced articles. Thank you for the clarification. I appreciate it.
[09:54:39] pgadige! no problem, thanks for your interest in the project! And thanks apergos for helping out :)
[09:54:42] apergos: hello!
[09:54:49] :-)
[10:31:32] A sentence requires no citation if it appears in the lead section of an article, and its content is referenced elsewhere in the article. Is the lead section (or main section) of a Wikipedia article the first paragraph that appears before the Contents box in the article? I tried looking up the parts of a Wikipedia article to understand its structure, if there is one.
[11:03:33] https://en.wikipedia.org/wiki/Wikipedia:Manual_of_Style#Section_organization here's all you need to know and more about article format
[11:03:51] (TL;DR: yes, the lead section is the first paragraph, i.e. the intro)
[12:16:58] apergos: ah! It's an interesting read. Now I clearly know what the parts/components of a Wikipedia article are.
[12:17:36] great!
[20:22:09] hey groceryheist: do you have a dataset for the Reading Time project?