Title: A rule based sentiment analysis of whatsapp reviews in Telugu language
Authors: Kalakala, Sujay
Keywords: Sentiment;analysis;positive;negative;NLP;Telugu;python
Issue Date: 30-Sep-2021
Abstract: Sentiment analysis is one of the major research areas in natural language processing. The data is usually some form of review or feedback, so that the emotion and main sentiment behind it can be assessed using machine learning techniques. This research takes a similar approach: WhatsApp reviews written by customers in Telugu were analysed, and the sentiment polarity was calculated using a rule-based approach. Telugu is spoken in the southern part of India and is written in its own script rather than the Latin character set. The strings are nevertheless treated as they would be in any other machine learning process, since the meaning behind them is captured through the patterns that emerge from the text. All text processing is carried out as in most NLP scenarios. To find the overall sentiment of a review collected from the internet, a manual rule-based algorithm was developed that applies a set of rules to a sentence to determine its polarity, which can be positive, negative, or neutral. These rules check for the presence of major negative and major positive words, as well as auxiliary verbs and their position relative to those words. The output of this rule-based approach was then used to train machine learning models with a few classifiers: K-nearest neighbours (KNN), XGBoost and support vector machines (SVM). The classifiers achieved reasonable accuracies of 81%, 82% and 78% respectively, which indicates that the rule-based approach performs well, with error values of 0.296, 0.288 and 0.252 with TF-IDF and 0.285, 0.285 and 0.234 with Bag of Words. Throughout the process, manual observation was also used to compare the assigned sentiments against the sentences and find errors in the method.
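The rule-based idea described above can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: the word lists, the function name `rule_based_polarity`, and the rule that a negating auxiliary directly after a sentiment word flips its polarity are all assumptions made for illustration, since the abstract does not list the actual rules or lexicons.

```python
# Hypothetical, tiny lexicons for illustration only; the thesis's real
# word lists are not given in the abstract.
POSITIVE = {"మంచి", "బాగుంది"}   # roughly "good", "is good"
NEGATIVE = {"చెడు"}               # roughly "bad"
NEGATORS = {"లేదు", "కాదు"}       # negating auxiliaries, roughly "is not"


def rule_based_polarity(sentence: str) -> str:
    """Return 'positive', 'negative' or 'neutral' for a whitespace-tokenised sentence."""
    tokens = sentence.split()
    score = 0
    for i, token in enumerate(tokens):
        if token in POSITIVE:
            value = 1
        elif token in NEGATIVE:
            value = -1
        else:
            continue
        # Assumed rule: a negating auxiliary immediately after a sentiment
        # word reverses that word's polarity.
        if i + 1 < len(tokens) and tokens[i + 1] in NEGATORS:
            value = -value
        score += value
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

A sentence with no lexicon words is labelled neutral, and a positive word followed directly by a negator is labelled negative. A real system would need a far larger lexicon and more careful tokenisation of Telugu text.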
The best results were given by the SVM classifier, which returned an F1 score of 79% and the lowest error, 0.23, among all the classifiers. The metrics used to judge the classifiers were precision, recall, F1 score and mean squared error.
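For readers unfamiliar with the metrics named above, the following Python sketch shows how per-class precision, recall and F1, and a mean squared error, can be computed from true and predicted labels. The numeric encoding of the labels (-1 negative, 0 neutral, 1 positive) and the function names are assumptions; the abstract does not say how the thesis encoded labels or averaged scores across classes.

```python
def precision_recall_f1(y_true, y_pred, label):
    """Precision, recall and F1 for one class, treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


def mean_squared_error(y_true, y_pred):
    """Mean of squared differences between numerically encoded labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Assumed encoding: -1 = negative, 0 = neutral, 1 = positive.
y_true = [1, 1, 0, -1]
y_pred = [1, 0, 0, -1]
precision, recall, f1 = precision_recall_f1(y_true, y_pred, label=1)
mse = mean_squared_error(y_true, y_pred)
```

With this encoding, confusing positive with negative is penalised more heavily by the mean squared error than confusing either with neutral.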
Appears in Collections: Computational Sciences - Master's theses

Files in This Item:
File: Thesis Final - Sujay Kalakala - 04-Oct-2021.pdf
Size: 1.75 MB
Format: Adobe PDF

Items in LU|ZONE|UL are protected by copyright, with all rights reserved, unless otherwise indicated.