Please use this identifier to cite or link to this item: https://zone.biblio.laurentian.ca/handle/10219/3847
Full metadata record
DC Field | Value | Language
dc.contributor.author | Shah, Aanan | -
dc.date.accessioned | 2022-03-30T14:03:43Z | -
dc.date.available | 2022-03-30T14:03:43Z | -
dc.date.issued | 2021-03-30 | -
dc.identifier.uri | https://zone.biblio.laurentian.ca/handle/10219/3847 | -
dc.description.abstract | News articles influence people's beliefs and views about various circumstances. In this regard, some news publishers with a political or ideological bias try to spread news that is distorted or entirely false. This thesis intends to develop a machine learning model that distinguishes fake news from real news with the help of natural language processing. Natural language processing was used to preprocess the text. General features such as the number of words, sentences, stopwords, non-alphabetic words, verbs, nouns, and adjectives were identified. The stopwords and hyperlinks were removed to clean the text data. In the preprocessing step, after cleaning the data and removing the stopwords, the position of each word was concatenated with the word itself. This procedure helps in distinguishing between uses of a word as a noun, a pronoun, an adjective, or a verb in the sentences. After preprocessing, feature extraction methods were used to convert the text of the news into analyzable data. The frequency of the words in each article was used to filter out non-informative words. Three feature extraction methods were used in this study: count vectorizer, Term Frequency-Inverse Document Frequency (TF-IDF) vectorizer, and word2vec embedding. The results obtained with the TF-IDF feature extraction method were superior to those of the other two methods. After feature extraction, several machine learning models were trained, namely Naive Bayes, Logistic Regression, Random Forest, K-nearest neighbors (KNN), and Support Vector Machine (SVM). A Recurrent Neural Network (RNN) was also used as a deep learning model. The models were successfully tested on two datasets. On the first dataset, SVM achieved an accuracy of 98.5% and RNN achieved an accuracy of 98.03%, a substantial improvement over the best results of Agarwalla et al., 2019 (83.16% accuracy). On the second dataset, SVM achieved an accuracy of 97.76%, RNN achieved 97.1%, and Logistic Regression achieved 97.50%, an improvement over the best results of Vijayraghavan et al., 2020 (94.88% accuracy). | en_US
dc.language.iso | en | en_US
dc.subject | twitter | en_US
dc.subject | fake news | en_US
dc.subject | real news | en_US
dc.subject | data | en_US
dc.subject | machine learning techniques | en_US
dc.title | Distinguishing fake and real news of twitter data with the help of machine learning techniques | en_US
dc.type | Thesis | en_US
dc.description.degree | MSc Computational Sciences | en_US
dc.publisher.grantor | Laurentian University of Sudbury | en_US
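
The preprocessing and TF-IDF steps described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation. It assumes that the "position" concatenated with each word is a part-of-speech tag, and it uses a hypothetical toy stopword list and tag lookup (`STOPWORDS`, `TOY_POS`) in place of a real tagger. The weighting is the textbook form tf x log(N / df); library vectorizers typically add smoothing and normalization.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list and part-of-speech lookup, for illustration only.
# A real pipeline would use a full stopword list and a trained POS tagger.
STOPWORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}
TOY_POS = {
    "senator": "NOUN", "claim": "NOUN", "school": "NOUN",
    "denies": "VERB", "visits": "VERB", "shocking": "ADJ",
}
URL_RE = re.compile(r"https?://\S+")


def preprocess(text):
    """Strip hyperlinks and stopwords, then append a tag to each remaining word."""
    text = URL_RE.sub(" ", text.lower())
    words = re.findall(r"[a-z]+", text)
    # Words missing from the toy lookup get the placeholder tag "X".
    return [f"{w}_{TOY_POS.get(w, 'X')}" for w in words if w not in STOPWORDS]


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Uses tf = count / document length and idf = log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors
```

A term that occurs in every document receives a weight of zero, which is how inverse document frequency downweights non-informative words. The resulting vectors would then be passed to a classifier such as an SVM.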
Appears in Collections: Computational Sciences - Master's theses

Files in This Item:
File | Description | Size | Format
Thesis FINAL - Aanan Shah - 03-Mar-2021.pdf | | 1.49 MB | Adobe PDF


Items in LU|ZONE|UL are protected by copyright, with all rights reserved, unless otherwise indicated.