A Framework to Detect Hate Speech in the Pashto Language from Social Media

dc.contributor.author Aftab Alam Janisar, 01-241171-002
dc.date.accessioned 2023-02-23T10:17:00Z
dc.date.available 2023-02-23T10:17:00Z
dc.date.issued 2019
dc.identifier.uri http://hdl.handle.net/123456789/14961
dc.description Supervised by Dr. Hammad Afzal en_US
dc.description.abstract In recent years, sentiment analysis, and hate speech detection in particular, has attracted considerable research attention, because the spread of hate speech on social media has become a significant concern in many languages. Hate speech has a great impact on society: hateful words harm the dignity of others. Detecting hate speech is important to stop hateful words from turning into crimes. In this research, we have developed a framework for hate speech detection in the Pashto language. Because no related data was available, we created a corpus from data collected from Twitter. Most research in this domain has been done for other languages, where hate speech detection is very mature. For morphologically rich languages, however, little work has been done, especially for Pashto. We collected tweets related to ethnicity and religion. The collected data was annotated manually, and each tweet was categorized as hate or not hate by comparing it with offensive content. To examine the impact of different features/attributes on hate speech detection, we performed experiments with existing classifiers: SVM, Naïve Bayes, Decision Tree, and KNN. On the dataset of 500, SVM produced the highest result among all the classifiers, 74%. On the dataset of 1500, KNN and Decision Tree produced the same result, 65.0%. On the dataset of 2800, Decision Tree produced the highest result, 72%, and SVM produced 71.9%. en_US
dc.language.iso en en_US
dc.publisher Software Engineering, Bahria University Engineering School Islamabad en_US
dc.relation.ispartofseries MS-SE;T-2053
dc.subject Software Engineering en_US
dc.title A Framework to Detect Hate Speech in the Pashto Language from Social Media en_US
dc.type MS Thesis en_US

