Please use this identifier to cite or link to this item: https://zone.biblio.laurentian.ca/handle/10219/3684
Full metadata record
DC Field | Value | Language
dc.contributor.author | Pazooki, Anahita (Elham) | -
dc.date.accessioned | 2021-06-03T19:31:56Z | -
dc.date.available | 2021-06-03T19:31:56Z | -
dc.date.issued | 2020-08-28 | -
dc.identifier.uri | https://zone.biblio.laurentian.ca/handle/10219/3684 | -
dc.description.abstract | With the widespread popularity of the World Wide Web, an increasing number of people are active on social media and websites, posting their opinions about products or special events, or making decisions based on the opinions and experiences of other people on social media. These online opinions are structured or unstructured textual data containing both significant and insignificant information, which has attracted the attention of researchers seeking to extract knowledge from such data. Opinion mining and Natural Language Processing (NLP) techniques help to find information in the huge number of reviews that take the form of unstructured comments. In this research, a model is proposed for classifying online users’ feedback and opinions, with the aim of improving the accuracy and precision of the classification in comparison to existing research on the same dataset. More precisely, NLP techniques and various supervised machine learning techniques are used to classify users’ opinions, and the performance of all the classifiers is evaluated to find the best one. The dataset contains 689 comments extracted from users' comments on Amazon.com, collected and annotated by Minqing Hu and Bing Liu. The selected comments are about the product “Speakers” on Amazon.com. Each comment is written by one user and carries a label that reflects the author's opinion: "positive", "negative" or "neutral". The data is provided as an XML file, a semi-structured format. The opinions are pre-processed using NLP techniques, for instance by removing punctuation, URLs, numbers, spaces, and stop-words, and their features are extracted using techniques such as word tokenization, stemming, Bag of Words, Bag of N-grams, and Term Frequency-Inverse Document Frequency (TF-IDF). The proposed method was implemented in the Python programming language using the Natural Language Toolkit (NLTK) and other Python libraries. The proposed model reached a peak precision of 88% with Random Forest using 140 trees and a bigram feature space. Random Forest, Gradient Boosting, Artificial Neural Network, and SVM each gave 87% precision for the trigram feature space. | en_US
dc.language.iso | en | en_US
dc.subject | Data mining | en_US
dc.subject | opinion mining | en_US
dc.subject | Natural Language Processing (NLP) | en_US
dc.subject | data pre-processing | en_US
dc.subject | word tokenization | en_US
dc.subject | stemming | en_US
dc.subject | term frequency-inverse document frequency | en_US
dc.subject | supervised machine learning | en_US
dc.subject | random forest | en_US
dc.subject | gradient boosting | en_US
dc.subject | decision trees | en_US
dc.subject | SVMs | en_US
dc.subject | gini-index | en_US
dc.subject | artificial neural network | en_US
dc.title | Opinion mining of online users’ comments using Natural Language Processing and machine learning | en_US
dc.type | Thesis | en_US
dc.description.degree | Master of Science (MSc.) in Computational Sciences | en_US
dc.publisher.grantor | Laurentian University of Sudbury | en_US
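
Illustrative note: the abstract names a pre-processing and feature-extraction pipeline (removing punctuation, URLs, numbers and stop-words, then tokenization, N-grams and TF-IDF). The following is a minimal sketch of those steps, not the thesis code. It assumes only the Python standard library rather than NLTK, omits stemming, and uses a hypothetical stop-word list, invented example comments, and one common TF-IDF variant (term frequency times log(N / document frequency)), none of which are taken from the record.

```python
import math
import re
import string
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; the record does not say which list was used.
STOP_WORDS = {"the", "a", "an", "is", "are", "and", "or", "to", "of", "it", "this"}


def preprocess(text):
    """Lowercase, strip URLs, numbers and punctuation, then drop stop-words."""
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # remove URLs
    text = re.sub(r"\d+", " ", text)                     # remove numbers
    text = text.translate(str.maketrans("", "", string.punctuation))  # remove punctuation
    tokens = text.split()  # whitespace tokenization; also discards repeated spaces
    return [t for t in tokens if t not in STOP_WORDS]


def ngrams(tokens, n):
    """Return the list of n-grams (joined by a space) in token order."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf(docs_terms):
    """Return one {term: weight} dict per document, weight = tf * log(N / df)."""
    n_docs = len(docs_terms)
    df = Counter(term for terms in docs_terms for term in set(terms))
    vectors = []
    for terms in docs_terms:
        counts = Counter(terms)
        total = len(terms) or 1  # avoid division by zero for empty documents
        vectors.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return vectors


# Invented example comments, for illustration only.
comments = [
    "The speakers sound great, see http://example.com for 2 photos!",
    "The speakers stopped working after 3 days.",
]
bigram_docs = [ngrams(preprocess(c), 2) for c in comments]
weights = tfidf(bigram_docs)
```

The resulting per-comment weight dictionaries could then be turned into fixed-length vectors and passed to a classifier such as a Random Forest; that step is left out here because it would depend on libraries the record does not name.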
Appears in Collections: Computational Sciences - Master's theses
Master's Theses

Files in This Item:
File | Description | Size | Format
Anahita(Elham) Pazooki_FINAL Thesis_4Sep2020.pdf | | 2.55 MB | Adobe PDF


Items in LU|ZONE|UL are protected by copyright, with all rights reserved, unless otherwise indicated.