A Framework to Detect Hate Speech in the Pashto Language from Social Media

Aftab Alam Janisar, 01-241171-002

dc.contributor.author	Aftab Alam Janisar, 01-241171-002
dc.date.accessioned	2023-02-23T10:17:00Z
dc.date.available	2023-02-23T10:17:00Z
dc.date.issued	2019
dc.identifier.uri	http://hdl.handle.net/123456789/14961
dc.description	Supervised by Dr. Hammad Afzal	en_US
dc.description.abstract	In recent years, researchers have been drawn to sentiment analysis, and especially to hate speech detection, because the proliferation of hate speech on social media has received considerable attention in other languages. Hate speech has a great impact on society; using hateful words harms the dignity of others. Detecting hate speech is important for preventing hateful words from escalating into crimes. In this research, we developed a framework for hate speech detection in the Pashto language. Because no related data was available, we created a corpus from data collected on Twitter. Most research in this domain addresses other languages, where hate speech detection is quite mature; for morphologically rich languages, and Pashto in particular, little work has been done. We targeted and collected tweets related to ethnicity and religion. The collected data was annotated manually, and each item was categorized as hate or not hate by comparing it with offensive content. To examine the impact of different features on hate speech detection, we ran experiments with existing classifiers: SVM, Naïve Bayes, Decision Tree, and KNN. On a dataset of 500, SVM produced the highest result among all classifiers, 74%. On a dataset of 1500, KNN and Decision Tree produced the same result, 65.0%. On a dataset of 2800, Decision Tree produced the highest result, 72%, and SVM produced 71.9%.	en_US
dc.language.iso	en	en_US
dc.publisher	Software Engineering, Bahria University Engineering School Islamabad	en_US
dc.relation.ispartofseries	MS-SE;T-2053
dc.subject	Software Engineering	en_US
dc.title	A Framework to Detect Hate Speech in the Pashto Language from Social Media	en_US
dc.type	MS Thesis	en_US