MULTICLASS IMBALANCED CLASSIFICATION OF QURANIC VERSES USING DEEP LEARNING APPROACH

AQSA NOOR, 01-241191-003

dc.contributor.author	AQSA NOOR, 01-241191-003
dc.date.accessioned	2022-12-22T11:42:26Z
dc.date.available	2022-12-22T11:42:26Z
dc.date.issued	2021
dc.identifier.uri	http://hdl.handle.net/123456789/14527
dc.description	Supervised by Dr. Ahmad Ali	en_US
dc.description.abstract	The Quran is the sacred book of Muslims and a source of guidance for them. It covers topics related to both religion and worldly affairs, and these topics can be used to categorize its verses. Some topics are discussed extensively in the Quran while others are mentioned only briefly, so the resulting dataset is imbalanced. The data is therefore first balanced by creating synthetic samples of the minority classes. Several verses use the same words to convey different concepts, whereas others use different words to convey similar meanings, so it is important to classify verses on the basis of their context. Previous work has used Tafsir and Hadith data to better capture this context, which makes the classification of Quranic verses dependent on additional corpora. Other approaches have used Word2Vec and GloVe word embeddings, which have the limitation of ignoring rare words and word positions during classification. This study classifies verses according to their topics while taking the context of words into account, using Bidirectional Encoder Representations from Transformers (BERT). When creating the representation of a word, BERT reads all of its neighboring words and assigns the representation accordingly. It produces a three-dimensional embedding tensor in which each token is represented by a 768-dimensional vector. Furthermore, to help the classifier retain the most important parts of the input sequence, deep learning classifiers with Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) layers are used for classification. BERT cased and uncased word embeddings of the text data are created and used with three Neural Network (NN) models: an NN with LSTM, which achieved F1-scores of 0.87 with uncased and 0.86 with cased embeddings; an NN with GRU, which achieved F1-scores of 0.91 with uncased and 0.90 with cased embeddings; and a fine-tuned BERT model, which achieved an F1-score of 0.93 with both BERT-base-uncased and BERT-base-cased.	en_US
dc.language.iso	en	en_US
dc.publisher	Software Engineering, Bahria University Engineering School Islamabad	en_US
dc.relation.ispartofseries	MS-SE;T-1847
dc.subject	Software Engineering	en_US
dc.title	MULTICLASS IMBALANCED CLASSIFICATION OF QURANIC VERSES USING DEEP LEARNING APPROACH	en_US
dc.type	MS Thesis	en_US
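The abstract says the imbalanced data is balanced "by creating synthetic samples of minority class" but does not name the method. The sketch below assumes a SMOTE-style scheme: each synthetic point is placed at a random position on the line segment between a minority sample and one of its k nearest same-class neighbours. It operates on numeric feature vectors (for example, sentence embeddings); whether the thesis oversampled text, embeddings, or something else is not stated, and the function names `smote_like_oversample` and `balance` are hypothetical.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic vectors by interpolating between a minority
    sample and one of its k nearest minority-class neighbours.

    Assumption: SMOTE-style interpolation is used for illustration only;
    the thesis abstract does not name its oversampling method.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Rank the other samples of this class by Euclidean distance to x.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(x, minority[j]),
        )[:k]
        nb = minority[rng.choice(neighbours)]
        gap = rng.random()  # uniform in [0, 1): position along the segment
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic


def balance(X, y, seed=0):
    """Oversample every class up to the size of the largest class."""
    by_class = {}
    for features, label in zip(X, y):
        by_class.setdefault(label, []).append(features)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = list(X), list(y)
    for label, rows in by_class.items():
        deficit = target - len(rows)
        if deficit > 0:
            X_out.extend(smote_like_oversample(rows, deficit, seed=seed))
            y_out.extend([label] * deficit)
    return X_out, y_out


X = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
y = ["a", "a", "a", "b", "b"]
X_bal, y_bal = balance(X, y)
print(y_bal.count("a"), y_bal.count("b"))  # both classes now have 3 samples
```

In a real experiment, oversampling of this kind is normally applied to the training split only, so that synthetic points derived from test samples cannot leak into evaluation.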
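The abstract reports a single F1-score per model but does not say how scores were averaged across topics. On an imbalanced multiclass dataset, macro averaging (every class counts equally) and support-weighted averaging (larger classes count more) can give noticeably different numbers. The sketch below computes per-class, macro, and weighted F1 from true and predicted labels; the function name `f1_scores` is hypothetical, and which average the thesis used is not known. Its results should agree with scikit-learn's `f1_score` using `average="macro"` and `average="weighted"`.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (per-class F1 dict, macro-averaged F1, support-weighted F1)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # class p was predicted, but wrongly
            fn[t] += 1  # true class t was missed
    per_class = {}
    for c in labels:
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        denom = precision + recall
        per_class[c] = 2 * precision * recall / denom if denom else 0.0
    macro = sum(per_class.values()) / len(labels)
    support = Counter(y_true)  # number of true samples per class
    weighted = sum(per_class[c] * support[c] for c in labels) / len(y_true)
    return per_class, macro, weighted


per_class, macro, weighted = f1_scores(["a", "a", "a", "b"], ["a", "a", "b", "b"])
print(per_class, round(macro, 4), round(weighted, 4))
```

Here the majority class "a" has F1 0.8 and the minority class "b" has F1 of about 0.667, so the macro average (about 0.733) sits below the weighted average (about 0.767); reporting which average was used makes such scores comparable.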