A rule based sentiment analysis of whatsapp reviews in Telugu language

Kalakala, Sujay

Please use this identifier to cite or link to this item: https://zone.biblio.laurentian.ca/handle/10219/3948

Full metadata record

DC Field	Value	Language
dc.contributor.author	Kalakala, Sujay	-
dc.date.accessioned	2022-10-05T20:21:25Z	-
dc.date.available	2022-10-05T20:21:25Z	-
dc.date.issued	2021-09-30	-
dc.identifier.uri	https://zone.biblio.laurentian.ca/handle/10219/3948	-
dc.description.abstract	Sentiment analysis is a major area of research in natural language processing. The data is typically some form of review or feedback, so that the emotion and overall sentiment behind it can be assessed using machine learning techniques. This research follows a similar approach: customers' WhatsApp reviews in Telugu were analysed and their sentiment polarity was calculated using a rule-based approach. Telugu is a language from the southern part of India and is written in its own script rather than the Latin alphabet. The strings are nevertheless treated as they would be in any other machine learning process, since the meaning behind them is captured through the patterns that emerge from the text, and all text processing is carried out as in most NLP scenarios. To determine the overall sentiment of a review collected from the internet, a manual rule-based algorithm was developed that applies sets of rules to a sentence to check its polarity, which can be positive, negative, or neutral. These rules check for the presence of major negative and major positive words, as well as auxiliary verbs and their position relative to the negative and positive words. This rule-based approach was then used to train machine learning models with K-nearest neighbours (KNN), XGBoost and support vector machine (SVM) classifiers. The classifiers reached accuracies of 81%, 82% and 78% respectively, which indicates that the rule-based approach performs well, with errors of 0.296, 0.288 and 0.252 using TF-IDF and 0.285, 0.285 and 0.234 using Bag of Words. Throughout the process, manual observation was also used to compare the assigned sentiments against the sentences and find errors in the method.
The best results were given by the SVM classifier, which returned an f1 score of 79% and the lowest error, 0.23, among all the classifiers. The metrics used to judge the classifiers were precision, recall, f1 score and mean squared error.	en_US
dc.language.iso	en	en_US
dc.subject	Sentiment	en_US
dc.subject	analysis	en_US
dc.subject	positive	en_US
dc.subject	negative	en_US
dc.subject	NLP	en_US
dc.subject	Telugu	en_US
dc.subject	python	en_US
dc.title	A rule based sentiment analysis of whatsapp reviews in Telugu language	en_US
dc.type	Thesis	en_US
dc.description.degree	M.Sc. in Computational Sciences.	en_US
dc.publisher.grantor	Laurentian University of Sudbury	en_US
Appears in Collections:	Computational Sciences - Master's theses
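
The abstract describes rules that look for major positive and negative words and for other words positioned next to them. The following is only a minimal sketch of that general idea, not the thesis's algorithm: the word lists are hypothetical English placeholders (a real lexicon would hold Telugu words), the function name `polarity` is invented for illustration, and treating a negation marker directly before or after a polar word as flipping it is an assumption.

```python
# Hypothetical placeholder lexicons; the thesis's actual Telugu word lists are not given here.
POSITIVE = {"good", "great", "useful"}
NEGATIVE = {"bad", "slow", "useless"}
NEGATORS = {"not", "never"}  # assumed stand-ins for negation markers


def polarity(sentence: str) -> str:
    """Return 'positive', 'negative' or 'neutral' for a whitespace-tokenised sentence."""
    tokens = sentence.lower().split()
    score = 0
    for i, tok in enumerate(tokens):
        if tok in POSITIVE:
            value = 1
        elif tok in NEGATIVE:
            value = -1
        else:
            continue
        # Positional rule (assumption): a negator immediately before or
        # after a polar word flips that word's contribution.
        neighbours = tokens[max(i - 1, 0):i] + tokens[i + 1:i + 2]
        if any(n in NEGATORS for n in neighbours):
            value = -value
        score += value
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for text in ["the app is good", "the app is not good", "i installed it yesterday"]:
        print(text, "->", polarity(text))
```

Labels produced this way could then serve as targets for classifiers trained on TF-IDF or Bag of Words features, which is the kind of pipeline the abstract reports.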

Files in This Item:

File	Description	Size	Format
Thesis Final - Sujay Kalakala - 04-Oct-2021.pdf		1.75 MB	Adobe PDF
