Abstract:
In the age of the Internet and electronic media, social media platforms are among the most
frequently used means of communication. The use of social media has grown
tremendously, and people rely on it to connect with each other and the world. These
sites have made our lives easier, but we cannot turn a blind eye to their negative effects.
Some people use these sites for harmful purposes, and among these
“cyberbullying” is especially common. Cyberbullying is a form of bullying carried out through electronic
means to insult or harm others. Many researchers have proposed solutions and
strategies to counter this menace, but sarcasm is one aspect of it that has yet to be adequately addressed.
This study reviews previous research and proposes an approach to detect
cyberbullying that also accounts for the element of sarcasm. For this purpose, several machine
learning algorithms, including SVM, Random Forest, Naïve Bayes, Logistic Regression, and an
ensemble approach, were applied to different social media datasets, and for each algorithm the
results are reported in terms of accuracy and other metrics. On the
unbalanced datasets, SVM performed best with an average accuracy of 79%, while after
balancing the datasets, Random Forest achieved the highest average accuracy
of 85%. ROC curves and confusion matrices were also produced for each algorithm on each
dataset, and Random Forest again took the lead.
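
As a rough illustration of why results are reported separately for unbalanced and balanced datasets, the following minimal Python sketch balances a toy dataset and derives accuracy from a confusion matrix. The abstract does not state how the datasets were balanced; random oversampling is used here only as an assumption, and the function names (oversample, confusion_matrix, accuracy) and the toy data are hypothetical.

```python
import random
from collections import Counter


def oversample(texts, labels, seed=0):
    """Randomly duplicate minority-class items until every class has equal size.

    Assumption: random oversampling stands in for whatever balancing
    method the study actually used.
    """
    rng = random.Random(seed)
    by_class = {}
    for text, label in zip(texts, labels):
        by_class.setdefault(label, []).append(text)
    target = max(len(items) for items in by_class.values())
    out_texts, out_labels = [], []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


def confusion_matrix(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    cm = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            cm["tp" if truth == positive else "fp"] += 1
        else:
            cm["fn" if truth == positive else "tn"] += 1
    return cm


def accuracy(cm):
    """Fraction of correct predictions; returns 0.0 for an empty matrix."""
    total = sum(cm.values())
    return (cm["tp"] + cm["tn"]) / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical toy data: 1 = bullying, 0 = not bullying.
    texts = ["post a", "post b", "post c", "post d", "post e"]
    labels = [0, 0, 0, 0, 1]

    # A trivial model that always predicts the majority class still
    # scores well on accuracy while missing every bullying post.
    always_negative = [0] * len(labels)
    cm = confusion_matrix(labels, always_negative)
    print("confusion matrix:", cm, "accuracy:", accuracy(cm))

    balanced_texts, balanced_labels = oversample(texts, labels)
    print("class counts after balancing:", Counter(balanced_labels))
```

On the unbalanced toy data, a model that never flags bullying still looks accurate, which is one reason confusion matrices and ROC curves are useful alongside accuracy.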