Please use this identifier to cite or link to this item: https://zone.biblio.laurentian.ca/handle/10219/4022
Title: Social media hate speech detection using explainable AI
Authors: Mehta, Harshkumar
Keywords: Explainable artificial intelligence;hate speech detection;offensive languages;LIME;BERT;neural networks
Issue Date: 25-May-2022
Abstract: Artificial Intelligence has found applications in many fields, including science, education, finance, business, and social media. At present, however, AI is largely limited to its subset, machine learning, and has not yet realized its full potential. In machine learning, in contrast to traditional programming where the algorithm is written explicitly, an algorithm learns patterns from a given dataset and builds a predictive model, so that the computer learns the mapping between inputs and outputs. A key impediment of current AI-based systems is that they often lack transparency: they have adopted a black-box nature that allows powerful predictions, but those predictions cannot be explained directly. To gain human trust and increase the transparency of AI-based systems, many researchers see Explainable AI as the way forward. In today’s era, an enormous part of human communication takes place over digital platforms such as social media, and so does hate speech, which is dangerous for individuals as well as society. Automated hate speech detection is now built into social media platforms such as Twitter and Facebook using machine learning approaches. Deep learning models attain high performance but have low transparency because of their complexity, which leads to a trade-off between performance and explainability. Explainable Artificial Intelligence (XAI) is used to make black-box approaches interpretable without giving up performance. These XAI methods provide explanations that can be understood by humans without in-depth knowledge of deep learning models. XAI has flexible and multifaceted potential in hate speech detection with deep learning models, and it provides a strong link between an individual moderator and the hate speech detection framework, a pivotal point for research in interactive machine learning. In the case of Twitter, top-level tweets are screened for hate speech, but retweets and replies are not, as there is no tool for detecting hate speech in ongoing conversations. The aim of this research is to interpret and explain the decisions made by complex AI models in order to understand their decision-making process. While machine learning models are being developed to detect hate speech on social media, these models lack interpretability and transparency in the decisions they make; traditional machine learning models achieve high performance at the cost of interpretability and explainable model decisions. The main objectives of this research are to review and compare various techniques used in Explainable Artificial Intelligence (XAI), to present a novel approach for hate speech classification using XAI, and to achieve a good trade-off between precision and recall for the proposed method. Explainable AI models for hate speech detection will help social media moderators and other users of these models not only to see but also to study and understand how decisions are made and how inputs are mapped to outputs. As part of this research study, two datasets were used to demonstrate hate speech detection using Explainable Artificial Intelligence (XAI).
Data preprocessing was performed to remove bias, clean the data of inconsistencies, clean the text of the tweets, and tokenize and lemmatize the text. Categorical variables were also simplified in order to generate a clean dataset for training. Exploratory data analysis was performed on the datasets to uncover patterns and insights. Various existing models, such as Decision Trees, K-Nearest Neighbours, Multinomial Naïve Bayes, Random Forest, Logistic Regression, and Long Short-Term Memory (LSTM), were applied to the Google Jigsaw dataset; of these, LSTM achieved an accuracy of 97.6%, an improvement over the results of Risch et al. (2020). An explainability method, LIME (Local Interpretable Model-Agnostic Explanations), was applied to the HateXplain dataset. Variants of the BERT (Bidirectional Encoder Representations from Transformers) model, namely BERT + ANN (Artificial Neural Network) and BERT + MLP (Multilayer Perceptron), were created to achieve good performance in terms of explainability under the ERASER (Evaluating Rationales And Simple English Reasoning) benchmark by DeYoung et al. (2019), with BERT + ANN achieving better explainability performance than reported in the study by Mathew et al. (2020).
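To illustrate the kind of explanation the abstract refers to, the sketch below shows how LIME can be applied to a simple text classifier. This is a minimal, hypothetical example: the pipeline, training data, and class names are assumptions for illustration, not the thesis code, which uses LSTM and BERT-based models.

from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from lime.lime_text import LimeTextExplainer

# Hypothetical training data: tweet texts with 0 = normal, 1 = hate/offensive.
train_texts = ["have a great day everyone", "some hateful example tweet"]
train_labels = [0, 1]

# A simple stand-in classifier; any model exposing predict_proba would work.
pipeline = make_pipeline(TfidfVectorizer(), LogisticRegression())
pipeline.fit(train_texts, train_labels)

# LIME perturbs the input text and fits a local surrogate model to estimate
# how much each word pushed the prediction toward each class.
explainer = LimeTextExplainer(class_names=["normal", "hate"])
explanation = explainer.explain_instance(
    "tweet to be explained goes here",  # instance to explain
    pipeline.predict_proba,             # black-box prediction function
    num_features=10,                    # number of top contributing words
)
print(explanation.as_list())            # (word, weight) pairs

The resulting word weights are what a moderator would inspect to understand why a tweet was flagged, which is the interpretability goal the thesis evaluates with the ERASER benchmark.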
URI: https://zone.biblio.laurentian.ca/handle/10219/4022
Appears in Collections:Computational Sciences - Master's theses

Files in This Item:
File: Harshkumar Mehta - FINAL THESIS 01-JUNE-2022.pdf
Size: 1.47 MB
Format: Adobe PDF


Items in LU|ZONE|UL are protected by copyright, with all rights reserved, unless otherwise indicated.