MULTICLASS IMBALANCED CLASSIFICATION OF QURANIC VERSES USING DEEP LEARNING APPROACH

dc.contributor.author AQSA NOOR, 01-241191-003
dc.date.accessioned 2022-12-22T11:42:26Z
dc.date.available 2022-12-22T11:42:26Z
dc.date.issued 2021
dc.identifier.uri http://hdl.handle.net/123456789/14527
dc.description Supervised by Dr. Ahmad Ali en_US
dc.description.abstract The Quran is the sacred book of Muslims and a source of guidance for them. It discusses topics related to religion and worldly affairs, by which Quranic verses can be categorized. Some topics are discussed at length in the Quran whereas others are mentioned only rarely, so the dataset is imbalanced. The data is therefore first balanced by creating synthetic samples of the minority classes. Several verses use the same words to express different concepts, whereas others use different words to express similar meanings, so it is important to classify verses on the basis of their context. Previous work has used Tafsir and Hadith data to better understand the context, which makes the classification of Quranic verses dependent on additional corpora. Other techniques such as Word2Vec and GloVe word embeddings have also been used, but they have the limitation of ignoring rare words and the position of words during classification. This study aims to classify the verses according to their topics by considering the context of words using Bidirectional Encoder Representations from Transformers (BERT). When creating the representation of a word, BERT reads all of its neighboring words and assigns the representation accordingly. It creates a 3-dimensional word embedding in which each token is represented by 768 values. Furthermore, to ensure that the classifier remembers the most important part of the input sequence, deep learning classifiers with Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) layers are used for classification. Once the BERT cased and uncased word embeddings of the text data are created, they are fed to 3 Neural Network (NN) models: an NN with LSTM, which achieved F1-scores of 0.87 for uncased and 0.86 for cased embeddings; an NN with GRU, which achieved F1-scores of 0.91 for uncased and 0.90 for cased embeddings; and a fine-tuned BERT model, which achieved F1-scores of 0.93 for both base-uncased and base-cased. en_US
dc.language.iso en en_US
dc.publisher Software Engineering, Bahria University Engineering School Islamabad en_US
dc.relation.ispartofseries MS-SE;T-1847
dc.subject Software Engineering en_US
dc.title MULTICLASS IMBALANCED CLASSIFICATION OF QURANIC VERSES USING DEEP LEARNING APPROACH en_US
dc.type MS Thesis en_US
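
The abstract states that the data is balanced by creating synthetic samples of the minority class but does not name the technique used. As an illustration only, the following is a minimal sketch of one common approach: interpolating between two feature vectors of the same class, in the spirit of SMOTE but without its nearest-neighbor search. The function name `oversample_minority`, the toy vectors, and the topic labels "faith" and "law" are hypothetical and are not taken from the thesis.

```python
import random


def oversample_minority(samples, labels, seed=0):
    """Balance classes by adding synthetic points to every smaller class.

    Each synthetic point lies on the line segment between two randomly
    chosen vectors of the same class (an assumption for illustration;
    the thesis does not specify its balancing method).
    """
    rng = random.Random(seed)

    # Group the original feature vectors by class label.
    by_class = {}
    for vec, lab in zip(samples, labels):
        by_class.setdefault(lab, []).append(vec)

    # Every class is grown to the size of the largest class.
    target = max(len(vecs) for vecs in by_class.values())

    out_x, out_y = list(samples), list(labels)
    for lab, vecs in by_class.items():
        for _ in range(target - len(vecs)):
            a = rng.choice(vecs)
            b = rng.choice(vecs)
            t = rng.random()  # interpolation weight in [0, 1)
            synthetic = [x + t * (y - x) for x, y in zip(a, b)]
            out_x.append(synthetic)
            out_y.append(lab)
    return out_x, out_y


# Toy data: three vectors for one topic, one vector for another.
X = [[0.0, 0.0], [1.0, 1.0], [0.2, 0.1], [5.0, 5.0]]
y = ["faith", "faith", "faith", "law"]
Xb, yb = oversample_minority(X, y)
print(len(Xb), yb.count("faith"), yb.count("law"))
```

In a pipeline like the one described, such balancing would presumably be applied to numeric features (for example, embedding vectors) rather than to raw text, and only to the training split so that evaluation scores are not inflated by synthetic samples.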

