Abstract:
Hate speech is currently of broad interest in the domain of social media.
The anonymity and flexibility afforded by the Internet have made it easy for users to
communicate aggressively. As the amount of online hate speech keeps
increasing, methods that detect hate speech automatically are urgently needed.
This problem has also attracted considerable attention from the Natural Language Processing
and Machine Learning communities. The goal of this project is therefore to examine
how Machine Learning can be applied to detecting hate speech. In addition, this paper
applies current techniques from this field to a dataset.
Two Machine Learning models are introduced, namely the Support Vector Machine (SVM) and Naïve
Bayes. Each classifier assigns every comment in the dataset to one of two categories:
positive or negative. The performance of the models was evaluated using
accuracy, as well as precision, recall and F-measure. The final SVM model
achieved an accuracy of 72%, a precision of 96%, a recall of 71% and an F-measure of 82%.
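The reported figures can be related to one another with the standard definitions of the metrics. The minimal sketch below is illustrative, not the authors' evaluation code. It checks that an F-measure of about 82% follows from a precision of 96% and a recall of 71%, since the F-measure is their harmonic mean. It then computes per-class precision and recall from a confusion matrix. The counts in `cm` are hypothetical and not taken from the paper; they only show how a high overall score can hide a weaker class. The helper names `f_score`, `accuracy` and `per_class` are likewise hypothetical.

```python
from typing import Dict, Tuple

# Confusion counts keyed by (true_label, predicted_label).
Confusion = Dict[Tuple[str, str], int]


def f_score(p: float, r: float, beta: float = 1.0) -> float:
    """F-beta score; beta=1 gives the F-measure, the harmonic mean of p and r."""
    if p == 0.0 and r == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * p * r / (b2 * p + r)


def accuracy(cm: Confusion) -> float:
    """Share of comments whose predicted label equals the true label."""
    total = sum(cm.values())
    correct = sum(n for (t, p), n in cm.items() if t == p)
    return correct / total if total else 0.0


def per_class(cm: Confusion, label: str) -> Tuple[float, float, float]:
    """Precision, recall and F-measure when `label` is treated as the positive class."""
    tp = cm.get((label, label), 0)
    fp = sum(n for (t, p), n in cm.items() if p == label and t != label)
    fn = sum(n for (t, p), n in cm.items() if t == label and p != label)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return prec, rec, f_score(prec, rec)


# Check the reported SVM figures: F-measure from precision 0.96 and recall 0.71.
print(f"F-measure: {f_score(0.96, 0.71):.3f}")  # F-measure: 0.816

# Hypothetical counts (not from the paper) showing how one class can lag behind.
cm = {("positive", "positive"): 90, ("positive", "negative"): 30,
      ("negative", "positive"): 10, ("negative", "negative"): 20}
print(f"accuracy: {accuracy(cm):.3f}")
for label in ("positive", "negative"):
    p, r, f = per_class(cm, label)
    print(f"{label}: precision={p:.2f} recall={r:.2f} F={f:.2f}")
```

With these made-up counts, the "positive" class scores a precision of 0.90 and a recall of 0.75. The "negative" class drops to a precision of 0.40 and a recall of about 0.67. This is why the abstract recommends inspecting each class separately rather than relying on a single aggregate score.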
However, when each class is examined separately, it is apparent that many hate
comments have been misclassified. Further analysis of the predictions and errors is
therefore recommended, in order to gain more insight into the misclassification