| dc.contributor.author | AQSA NOOR, 01-241191-003 | |
| dc.date.accessioned | 2022-12-22T11:42:26Z | |
| dc.date.available | 2022-12-22T11:42:26Z | |
| dc.date.issued | 2021 | |
| dc.identifier.uri | http://hdl.handle.net/123456789/14527 | |
| dc.description | Supervised by Dr. Ahmad Ali | en_US |
| dc.description.abstract | The Quran is the sacred book of Muslims and a source of guidance for them. It discusses topics related to religion and worldly affairs, by which Quranic verses can be categorized. Some topics are discussed at length in the Quran whereas others are mentioned only rarely, so the dataset is imbalanced. Hence, the data is first balanced by creating synthetic samples of the minority classes. Several verses use the same words to depict different concepts, whereas other verses use different words to convey a similar meaning. It is therefore important to classify verses on the basis of their context. Previously, Tafsir and Hadith data have been used to better understand the context, which makes the classification of Quranic verses dependent on additional corpora. Other techniques such as Word2Vec and GloVe word embeddings have also been used, but they have the limitation of ignoring rare words and the position of words during classification. This study aims to classify the verses according to their topics by considering the context of words using Bidirectional Encoder Representations from Transformers (BERT). When creating the representation of a word, BERT reads all of its neighboring words and assigns the representation accordingly. It creates a 3-dimensional word embedding in which each token is represented by 768 values. Furthermore, to ensure that the classifier remembers the most important part of the input sequence, deep learning classifiers with Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) layers are used for classification. Once the BERT cased and uncased word embeddings of the text data are created, they are fed to three Neural Network (NN) models: an NN with LSTM, which achieved F1-scores of 0.87 for the uncased and 0.86 for the cased embedding; an NN with GRU, which achieved F1-scores of 0.91 for the uncased and 0.90 for the cased embedding; and a fine-tuned BERT model, which achieved F1-scores of 0.93 for both base-uncased and base-cased. | en_US |
| dc.language.iso | en | en_US |
| dc.publisher | Software Engineering, Bahria University Engineering School Islamabad | en_US |
| dc.relation.ispartofseries | MS-SE;T-1847 | |
| dc.subject | Software Engineering | en_US |
| dc.title | MULTICLASS IMBALANCED CLASSIFICATION OF QURANIC VERSES USING DEEP LEARNING APPROACH | en_US |
| dc.type | MS Thesis | en_US |
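
The abstract mentions balancing the dataset by creating synthetic samples of the minority class but does not name the method used. The following is only a minimal sketch of one common approach (SMOTE-style interpolation between feature vectors of the same class), written under the assumption that each verse has already been turned into a fixed-length numeric vector. The function name `oversample_minority` and the toy data are hypothetical and are not taken from the thesis; real SMOTE also restricts interpolation to nearest neighbors, which this simplified version does not.

```python
import random


def oversample_minority(samples, labels, seed=0):
    """Balance classes by adding synthetic points to the smaller classes.

    Simplified SMOTE-like sketch (an assumption, not the thesis's method):
    each synthetic point is a random interpolation between two original
    feature vectors of the same class. Every class is grown until it has
    as many samples as the largest class.
    """
    rng = random.Random(seed)

    # Group the original feature vectors by class label.
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)

    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = list(samples), list(labels)

    for y, xs in by_class.items():
        for _ in range(target - len(xs)):
            a = rng.choice(xs)
            b = rng.choice(xs)
            t = rng.random()  # interpolation weight in [0, 1)
            synthetic = [ai + t * (bi - ai) for ai, bi in zip(a, b)]
            out_x.append(synthetic)
            out_y.append(y)

    return out_x, out_y


# Toy usage: class "a" has three vectors, class "b" has two.
X = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [10.0, 10.0], [12.0, 12.0]]
y = ["a", "a", "a", "b", "b"]
X_bal, y_bal = oversample_minority(X, y)
```

After the call, both classes have the same number of samples, and the added point for class "b" lies on the segment between its two original vectors. In practice the interpolation would be applied to embedding vectors rather than raw text, since interpolating between strings is not meaningful.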