|Title:||Sentiment analysis on Twitter data using machine learning|
|Keywords:||Natural Language Processing (NLP);data pre-processing;word tokenization;word stemming and lemmatizing;POS tagging;NER;machine learning;naïve bayes;SVM;maximum entropy;WEKA|
|Abstract:||On social media, people respond readily to products and to events as they occur. These responses take the form of raw, semi-structured text in different languages and vocabularies; they contain noise as well as critical information from which analysts can discover knowledge and patterns. Such knowledge supports decision making and strategic planning for future markets. Natural Language Processing (NLP) and data mining are the techniques most commonly applied in research on extracting this information from linguistic data for sentiment analysis. This thesis analyzes Twitter data with machine learning techniques to detect the sentiment of people throughout the world. The data set consists of tweets about the 2014 soccer World Cup, held in Brazil, during which many people expressed their opinions, emotions and attitudes about the games, promotions and players. The data are filtered and analyzed using natural language processing techniques, and sentiment polarity is calculated from the emotion words detected in users' tweets. The data set is normalized for use by machine learning algorithms and prepared using natural language processing techniques such as word tokenization, stemming and lemmatization, POS (part-of-speech) tagging, NER (named entity recognition) and parsing, in order to extract emotions from the text of each tweet. The approach is implemented in the Python programming language with the Natural Language Toolkit (NLTK), which is openly available for academic and research purposes. The derived algorithm extracts emotion words using WordNet together with the POS of each word, so that the word's meaning in its current context is used, and assigns sentiment polarity using the ‘SentiWordNet’ dictionary or a lexicon-based method.
The resulting polarity is further analyzed using the Naïve Bayes and SVM (Support Vector Machine) machine learning algorithms, and the data are visualized on the WEKA platform. Finally, the goal is to compare the results of the two implementations and determine the best approach for sentiment analysis of semi-structured social media data.|
|Appears in Collections:||Master's theses|
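The two stages described in the abstract (lexicon-based polarity assignment followed by a machine learning classifier) can be illustrated with a minimal sketch. This is not the thesis's implementation. It uses only the Python standard library, replaces SentiWordNet with a hypothetical six-word toy lexicon (`TOY_LEXICON`), uses made-up example tweets, and omits stemming, lemmatization, POS tagging, NER, the SVM comparison and WEKA. The function and class names are illustrative assumptions.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy lexicon standing in for SentiWordNet scores (assumption).
TOY_LEXICON = {"great": 1.0, "love": 1.0, "win": 0.5,
               "awful": -1.0, "hate": -1.0, "lose": -0.5}


def tokenize(tweet):
    """Lowercase a tweet, drop URLs and @mentions, keep alphabetic tokens."""
    text = re.sub(r"https?://\S+|@\w+", " ", tweet.lower())
    return re.findall(r"[a-z]+", text)


def lexicon_polarity(tokens):
    """Sum the lexicon scores of the tokens and map the sum to a label."""
    score = sum(TOY_LEXICON.get(token, 0.0) for token in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_logp = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            logp = math.log(n_docs / total_docs)  # class prior
            for token in tokens:
                if token in self.vocab:  # ignore words never seen in training
                    logp += math.log((counts[token] + 1) / denom)
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label


# Made-up example tweets, labeled by the lexicon and then used for training.
tweets = [
    "Great goal, love this team! http://t.co/x",
    "What a great win tonight",
    "Awful defending, hate watching this",
    "We lose again, awful match @coach",
]
docs = [tokenize(t) for t in tweets]
labels = [lexicon_polarity(d) for d in docs]
model = NaiveBayes().fit(docs, labels)
prediction = model.predict(tokenize("love this great match"))
print(labels, prediction)
```

In this sketch the lexicon supplies the training labels, so the classifier can only generalize the lexicon's judgments to words that co-occur with lexicon entries. A fuller version would presumably draw scores from SentiWordNet through NLTK and compare the Naive Bayes output against an SVM, as the abstract describes.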
Files in This Item:
|Ravi Patel_Thesis_Final.pdf||1.58 MB||Adobe PDF|
Items in LU|ZONE|UL are protected by copyright, with all rights reserved, unless otherwise indicated.