Distinguishing fake and real news of twitter data with the help of machine learning techniques

Shah, Aanan

Please use this identifier to cite or link to this item: https://zone.biblio.laurentian.ca/handle/10219/3847

Full metadata record

DC Field	Value	Language
dc.contributor.author	Shah, Aanan	-
dc.date.accessioned	2022-03-30T14:03:43Z	-
dc.date.available	2022-03-30T14:03:43Z	-
dc.date.issued	2021-03-30	-
dc.identifier.uri	https://zone.biblio.laurentian.ca/handle/10219/3847	-
dc.description.abstract	News articles influence people's beliefs and views about various circumstances. In this regard, some news publishers with a political or ideological bias try to spread news that is distorted or entirely false. This thesis develops a machine learning model that distinguishes fake news from real news with the aid of natural language processing. Natural language processing was used to preprocess the text. General features such as the number of words, sentences, stopwords, non-alphabetic words, verbs, nouns, and adjectives were identified. Stopwords and hyperlinks were removed to clean the text data. In the preprocessing step, after the data were cleaned and the stopwords removed, the position of each word was concatenated with the word itself. This procedure helps distinguish whether a word is used as a noun, a pronoun, an adjective, or a verb in a sentence. After preprocessing, feature extraction methods were used to convert the news text into analyzable data. The frequency of the words in each article was used to filter out non-informative words. Three feature extraction methods were used in this study: count vectorizer, Term Frequency-Inverse Document Frequency (TF-IDF) vectorizer, and word2vec embedding. The TF-IDF feature extraction method gave better results than the other two methods. After feature extraction, several machine learning models were trained: Naive Bayes, Logistic Regression, Random Forest, K-nearest neighbors (KNN), and Support Vector Machine (SVM). A Recurrent Neural Network (RNN) was also used as a deep learning model. The models were successfully tested on two datasets. On the first dataset, SVM achieved an accuracy of 98.5% and RNN achieved an accuracy of 98.03%, a substantial improvement over the best results of Agarwalla et al., 2019 (83.16% accuracy). 
On the second dataset, SVM achieved an accuracy of 97.76%, RNN achieved 97.1%, and Logistic Regression achieved 97.50%, an improvement over the best results of Vijayraghavan et al., 2020 (94.88% accuracy).	en_US
dc.language.iso	en	en_US
dc.subject	twitter	en_US
dc.subject	fake news	en_US
dc.subject	real news	en_US
dc.subject	data	en_US
dc.subject	machine learning techniques	en_US
dc.title	Distinguishing fake and real news of twitter data with the help of machine learning techniques	en_US
dc.type	Thesis	en_US
dc.description.degree	MSc Computational Sciences	en_US
dc.publisher.grantor	Laurentian University of Sudbury	en_US
Appears in Collections:	Computational Sciences - Master's theses
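
The abstract describes cleaning the text (removing hyperlinks and stopwords) and then weighting words with TF-IDF. The following is a minimal, illustrative sketch of those two steps using only the Python standard library. It is not the thesis code: the stopword list, the sample tweets, the function names `clean` and `tfidf`, and the particular smoothed idf formula are all assumptions chosen for illustration, and the classifier stage (SVM, RNN, and so on) is not shown.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; the thesis does not list its stopwords).
STOPWORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "and", "on"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")  # matches hyperlinks
TOKEN_RE = re.compile(r"[a-z]+")               # matches alphabetic tokens only


def clean(text):
    """Lowercase, drop hyperlinks, keep alphabetic tokens, remove stopwords."""
    text = URL_RE.sub(" ", text.lower())
    return [t for t in TOKEN_RE.findall(text) if t not in STOPWORDS]


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Weight = raw term count * (ln((1 + N) / (1 + df)) + 1), a common
    smoothed idf variant (assumed here; no normalization is applied).
    """
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        )
    return weights


# Hypothetical sample tweets, invented for this example.
tweets = [
    "Breaking: the moon is made of cheese https://example.com/moon",
    "Scientists confirm the moon is made of rock",
]
tokens = [clean(t) for t in tweets]
vectors = tfidf(tokens)
```

In this sketch, a word that appears in every tweet (such as "moon") receives a lower weight than a word that appears in only one (such as "cheese"), which is the property that lets TF-IDF down-weight non-informative words before the features are passed to a classifier.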

Files in This Item:

File	Description	Size	Format
Thesis FINAL - Aanan Shah - 03-Mar-2021.pdf		1.49 MB	Adobe PDF
