Welcome to the Bahria University DSpace digital repository.
dc.contributor.author | Laraib Kaleem, 01-249212-005 | |
dc.date.accessioned | 2023-12-18T11:05:24Z | |
dc.date.available | 2023-12-18T11:05:24Z | |
dc.date.issued | 2023 | |
dc.identifier.uri | http://hdl.handle.net/123456789/16832 | |
dc.description | Supervised by Dr. Arif-Ur-Rahman | en_US |
dc.description.abstract | Recent research has shown that many languages have been written in Roman script for generations. To address the challenge this poses, we work on Abstractive Text Summarization (ATS) for Roman Urdu (RU). The Roman Urdu data is gathered from news articles. Ground-truth summaries for Roman Urdu are limited, so we followed two approaches. The first was a manual approach: we transliterated the dataset into Roman Urdu using tools, used Google Bard to generate baseline summaries, and then evaluated the outcomes. The second approach fine-tunes the pretrained transformer-based models T5-small and BERT-base-uncased as State-of-the-Art (SOTA) summarization models. For performance evaluation, we explored three methods, including computing similarity to generate baseline results and using the Term Frequency-Inverse Document Frequency (TF-IDF) feature extraction technique to measure performance. For the Natural Language Processing (NLP) phases, we apply tokenization, then punctuation handling, and then convert loanwords into the format required by the models. However, accuracy alone is not the best way to evaluate a predictive model, so we also carry out intrinsic and extrinsic evaluations to assess the predicted results and report the model's training and testing losses. Keywords: Baseline, Roman Urdu (RU), Natural Language Processing (NLP), Abstractive Text Summarization (ATS), State-of-the-Art (SOTA). | en_US |
dc.language.iso | en | en_US |
dc.publisher | Computer Sciences | en_US |
dc.relation.ispartofseries | MS (DS);T-02067 | |
dc.subject | Text | en_US |
dc.subject | Summarization | en_US |
dc.subject | Roman Urdu | en_US |
dc.title | Text Summarization for Roman Urdu | en_US |
dc.type | Thesis | en_US |
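
The abstract mentions scoring summaries by similarity using TF-IDF features. The following is a minimal sketch of that idea, not the thesis's actual implementation: it builds TF-IDF vectors for a candidate summary and a baseline summary and compares them with cosine similarity. The function names, the smoothed IDF formula, the simple tokenizer, and the Roman Urdu example sentences are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumed preprocessing: lowercase, replace punctuation with spaces,
    # then split on whitespace.
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text.lower())
    return cleaned.split()


def tfidf_vectors(docs):
    # Build one sparse TF-IDF vector (term -> weight) per document.
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF (an assumption): log((1 + N) / (1 + df)) + 1.
    idf = {term: math.log((1 + n_docs) / (1 + df)) + 1.0 for term, df in doc_freq.items()}
    vectors = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        vectors.append({term: count * idf[term] for term, count in term_freq.items()})
    return vectors


def cosine(a, b):
    # Cosine similarity between two sparse vectors; 0.0 if either is empty.
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def summary_similarity(candidate, baseline):
    # Score a candidate summary against a baseline summary.
    vec_candidate, vec_baseline = tfidf_vectors([candidate, baseline])
    return cosine(vec_candidate, vec_baseline)


if __name__ == "__main__":
    # Hypothetical Roman Urdu sentences, invented for this example.
    baseline = "Hukumat ne naya budget pesh kiya."
    candidate = "Hukumat ne budget pesh kar diya."
    print(round(summary_similarity(candidate, baseline), 3))
```

A score near 1 means the two summaries share most of their weighted vocabulary, and a score of 0 means they share no terms. Because abstractive summaries can paraphrase freely, a lexical score like this is only a rough signal and would normally sit alongside other evaluations.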