Urdu Text Analysis (UTA)

dc.contributor.author Alishba Muhammad, 01-135211-012
dc.contributor.author Eman Fatima, 01-135211-026
dc.date.accessioned 2025-07-08T03:58:34Z
dc.date.available 2025-07-08T03:58:34Z
dc.date.issued 2024
dc.identifier.uri http://hdl.handle.net/123456789/19768
dc.description Supervised by Ms. Maryam Aslam en_US
dc.description.abstract In response to the growing need for efficient language processing tools among Urdu speakers, this project builds a framework for Urdu text analysis that covers sentiment analysis, significant word extraction, text summarization, and text classification. Because Urdu has intricate morphology and sparse linguistic resources, existing algorithms often struggle to analyze it effectively, so solutions tailored to the language's linguistic properties are needed to support more perceptive analysis and interpretation of Urdu text across a range of domains. The first module uses logistic regression for sentiment analysis. The second module applies TF-IDF vectorization and chi-square feature selection to extract significant words from the Urdu corpus. The third module applies a frequency-based extractive method for text summarization, condensing input text while retaining essential information. These methods supported model training and evaluation on a dataset of 50,000 Urdu movie reviews. The fourth module uses logistic regression for text classification: the workflow loads a dataset, handles missing values, transforms headlines into TF-IDF features, splits the data into training and testing sets, then trains a logistic regression model and evaluates it on accuracy. The model and vectorizer are saved for future deployment. The system's robustness and reliability were confirmed through functional and non-functional tests covering accuracy, efficiency, and security. Module-level component testing and real-world user scenarios further validated performance and usability and guided refinement. Each module was tested with inputs from various Urdu sources; the system accurately determined sentiment, extracted meaningful words, and provided concise summaries. en_US
dc.language.iso en en_US
dc.publisher Computer Sciences en_US
dc.relation.ispartofseries BS(IT);P-02337
dc.subject Urdu en_US
dc.subject Text en_US
dc.subject Analysis en_US
dc.title Urdu Text Analysis (UTA) en_US
dc.type Project Reports en_US
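
The abstract describes the third module as a frequency-based extractive summarizer but gives no implementation details. A minimal sketch of that general technique, in plain Python, might look like the following. The function names, the tiny stop-word list, the sentence-splitting rule, and the length-normalised scoring are all assumptions made for illustration and are not taken from the project report.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration only;
# a real system would need a much fuller Urdu list.
STOP_WORDS = {"اور", "کی", "کے", "کا", "ہے", "میں", "سے", "کو", "یہ", "بھی", "تھی", "تھا", "ہیں", "پر"}

# Punctuation stripped from the edges of each token (assumed set).
PUNCTUATION = "\u060c\u061b:\"'()"  # Arabic comma, Arabic semicolon, colon, quotes, parentheses


def split_sentences(text):
    """Split on the Urdu full stop (U+06D4), the Arabic question mark (U+061F), '?' and '!'."""
    parts = re.split(r"[\u06d4\u061f?!]+", text)
    return [p.strip() for p in parts if p.strip()]


def tokenize(sentence):
    """Whitespace tokenization with edge punctuation and stop words removed."""
    words = (w.strip(PUNCTUATION) for w in sentence.split())
    return [w for w in words if w and w not in STOP_WORDS]


def summarize(text, max_sentences=2):
    """Keep the sentences whose words are, on average, most frequent in the text."""
    sentences = split_sentences(text)
    if not sentences:
        return ""
    # Word frequencies over the whole input.
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        words = tokenize(sentence)
        # Average frequency, so long sentences are not favoured just for length.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # Rank sentence indices by score (stable sort keeps earlier sentences first on ties).
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    # Restore original order so the summary reads naturally.
    keep = sorted(ranked[:max_sentences])
    return "\u06d4 ".join(sentences[i] for i in keep) + "\u06d4"


if __name__ == "__main__":
    sample = "فلم بہت اچھی تھی۔ کہانی کمزور تھی۔ فلم کی اداکاری اور فلم کی موسیقی بہت اچھی تھی۔"
    print(summarize(sample, max_sentences=1))
```

Scoring by average rather than total word frequency is one design choice among several; summing raw frequencies would instead favour longer sentences.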

