Opinion mining of online users’ comments using Natural Language Processing and machine learning

Pazooki, Anahita (Elham)

Please use this identifier to cite or link to this item: https://zone.biblio.laurentian.ca/handle/10219/3684

Full metadata record

DC Field	Value	Language
dc.contributor.author	Pazooki, Anahita (Elham)	-
dc.date.accessioned	2021-06-03T19:31:56Z	-
dc.date.available	2021-06-03T19:31:56Z	-
dc.date.issued	2020-08-28	-
dc.identifier.uri	https://zone.biblio.laurentian.ca/handle/10219/3684	-
dc.description.abstract	With the widespread popularity of the World Wide Web, an increasing number of people use social media and websites to post their opinions about products or events, or to make decisions based on the opinions and experiences of others. These online opinions are structured or unstructured textual data containing both significant and insignificant information, and researchers have become interested in extracting knowledge from them. Opinion mining and Natural Language Processing (NLP) techniques help to find information in the large volume of reviews written as unstructured comments. This research proposes a model for classifying online users’ feedback and opinions, with the aim of improving accuracy and precision over existing research on the same dataset. More precisely, NLP techniques and several supervised machine learning techniques are used to classify users’ opinions, and all classifiers are evaluated to find the best performer. The dataset contains 689 comments from Amazon.com users, collected and annotated by Minqing Hu and Bing Liu. The selected comments are about the product “Speakers” on Amazon.com. Each comment is written by one user and carries a label reflecting the author's opinion: "positive", "negative" or "neutral". The data is provided as an XML file, a semi-structured format. The opinions are preprocessed by removing punctuation, URLs, numbers, extra spaces, and stop-words; features are then extracted using word tokenization, stemming, bag of words, bag of n-grams, and Term Frequency-Inverse Document Frequency (TF-IDF). 
The proposed method was implemented in Python using the Natural Language Toolkit (NLTK) and other Python libraries. The proposed model reached a peak precision of 88% using Random Forest with 140 trees and a bigram feature space. Random Forest, Gradient Boosting, Artificial Neural Network, and SVM each gave 87% precision with a trigram feature space.	en_US
dc.language.iso	en	en_US
dc.subject	Data mining	en_US
dc.subject	opinion mining	en_US
dc.subject	Natural Language Processing (NLP)	en_US
dc.subject	data pre-processing	en_US
dc.subject	word tokenization	en_US
dc.subject	stemming	en_US
dc.subject	term frequency-inverse document frequency	en_US
dc.subject	supervised machine learning	en_US
dc.subject	random forest	en_US
dc.subject	gradient boosting	en_US
dc.subject	decision trees	en_US
dc.subject	SVMs	en_US
dc.subject	gini-index	en_US
dc.subject	artificial neural network	en_US
dc.title	Opinion mining of online users’ comments using Natural Language Processing and machine learning	en_US
dc.type	Thesis	en_US
dc.description.degree	Master of Science (MSc.) in Computational Sciences	en_US
dc.publisher.grantor	Laurentian University of Sudbury	en_US
Appears in Collections:	Computational Sciences - Master's theses; Master's Theses
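
The preprocessing and feature-extraction steps named in the abstract (removing URLs, numbers, punctuation, extra spaces and stop-words, then building a bag of n-grams weighted by TF-IDF) can be sketched roughly as follows. This is a minimal standard-library illustration, not the thesis code: the function names, the tiny stop-word list, the sample comments, and the unsmoothed weighting tf * log(N / df) are all assumptions, and stemming is left out because it would require NLTK.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; NLTK ships a much longer one).
STOP_WORDS = {"the", "a", "an", "is", "are", "it", "this", "and", "to", "of", "for"}


def preprocess(text):
    """Lowercase, strip URLs, numbers and punctuation, tokenize, drop stop-words."""
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # remove URLs
    text = re.sub(r"\d+", " ", text)                     # remove numbers
    text = re.sub(r"[^\w\s]|_", " ", text)               # remove punctuation
    tokens = text.split()                                # also collapses extra spaces
    return [t for t in tokens if t not in STOP_WORDS]


def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tf_idf(docs, n=2):
    """Return one {n-gram: weight} dict per document.

    Weight = (count / n-grams in the document) * log(N / document frequency).
    This unsmoothed variant is an assumption; libraries often smooth the IDF.
    """
    bags = [Counter(ngrams(preprocess(d), n)) for d in docs]
    doc_freq = Counter(term for bag in bags for term in bag)
    n_docs = len(docs)
    vectors = []
    for bag in bags:
        total = sum(bag.values())
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in bag.items()
        })
    return vectors


# Hypothetical comments, not taken from the thesis dataset.
comments = [
    "The bass is great! See http://example.com for 2 reviews.",
    "The bass is weak and the sound is poor.",
]
bigram_vectors = tf_idf(comments, n=2)
```

Each resulting dictionary is a sparse bigram feature vector for one comment; in a pipeline like the one described, such vectors would then be passed to a supervised classifier such as a random forest. An n-gram that occurs in every comment receives weight zero under this weighting, which is why smoothed IDF variants are common in practice.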

Files in This Item:

File	Description	Size	Format
Anahita(Elham) Pazooki_FINAL Thesis_4Sep2020.pdf		2.55 MB	Adobe PDF
