[09:27:24] can anyone recommend beginner friendly resources (like blog posts, short video tutorials, learn-by-doing exercises etc.) to get started with the fundamentals of Natural Language Processing? I'm trying to understand word embeddings of a sentence, and I could only find university classroom lecture videos, which are too long and time consuming.
[09:59:45] you could try something like this https://www.shanelynn.ie/get-busy-with-word-embeddings-introduction/ (really don't know what level you're looking for) and there are a few links on the bottom that might be useful for more info
[10:39:09] apergos: thank you for sharing it. I'm a newbie to NLP, and I'm looking for resources that cater to beginner level audience with intermediate programming experience. I like this article you shared. It explains clearly, the basics and more details.
[11:25:23] In the context of citation need classifier (https://wikimediafoundation.org/news/2019/04/03/can-machine-learning-uncover-wikipedias-missing-citation-needed-tags/), as I understand the model classifier was trained twice. The first time, the model was trained to classify whether any sentence needs a citation or not based on a group of words it considered important in the sentence. The second
[11:25:29] time, the model was trained to classify an unverified sentence into one of 8 citation reason categories.
[11:27:50] why is it that the model's accuracy has fallen to 69% after training it the second time? What happened to the model after feeding it the knowledge of citation reason categories?
[11:29:51] apologies.
The retrained model predicted citation reasons with an accuracy of 62%
[11:33:56] so first, there is a comparison in the paper between the pretrained and not pretrained models
[11:34:06] and the pretrained one does better for citation reasons in all cases
[11:34:23] second, from the paper again: " It is important to note that due to the small number of statements in the Citation Reason dataset and additionally the number of classes, the prediction outcomes are not optimal."
[11:35:12] if you have the time for it I would suggest you look at the paper some, you don't have to understand every detail but you can get the general picture https://arxiv.org/pdf/1902.11116.pdf
[11:37:26] we're talking about only 4k sentences for the 'citation reason' dataset if I am reading this correctly
[11:39:32] apergos: yes as the sample size for citation reason model is specified as 4K sentences.
[11:39:36] the three datasets used for 'citation needed' are here: https://github.com/mirrys/citation-needed-paper/blob/master/training_data/training.txt
[11:39:59] and we're talking 20k so that's a big difference
[11:41:06] aha! 20K is the size of the dataset used for training the model the first time.
[11:41:12] yep
[11:41:42] it looks like for all three of those datasets but you could download them and check yourself
[11:42:36] apergos: Thank you for sharing the references to look up for further reading. I shall explore the training data provided in the git repository.
[11:43:03] ok! have fun, and hope it helps!
[11:43:49] ah you are working on the dumps project aren't you! cool
[11:49:21] apergos: yes! I'm an Outreachy applicant, and I'm trying to read up on things I do not understand before starting to work on the first onboarding task.
[12:02:40] great! then we'll be seeing you around some. happy reading!
[17:25:27] Hello everyone, My name is Anuradha, I wish to contribute to the project. Can somebody please guide me on the next steps to follow. Thank you very much for the help.
[17:42:03] hello anuradha!
[17:42:42] hey apergos
[17:43:27] have you found a task in phabricator that you are interested in working on?
[17:44:17] Yes I just read about the contribution process, i am selecting a micro task. Thank you very much.
[17:45:39] ah great! when you have decided on one, or think you are interested in one, you can link it here
[17:46:26] then people can check it to see whether they can help to get you oriented etc
[17:53:48] Sure, Thank you very much.