Welcome to the Bahria University DSpace digital repository.
dc.contributor.author | Alishba Muhammad, 01-135211-012 | |
dc.contributor.author | Eman Fatima, 01-135211-026 | |
dc.date.accessioned | 2025-07-08T03:58:34Z | |
dc.date.available | 2025-07-08T03:58:34Z | |
dc.date.issued | 2024 | |
dc.identifier.uri | http://hdl.handle.net/123456789/19768 | |
dc.description | Supervised by Ms. Maryam Aslam | en_US |
dc.description.abstract | In response to the growing need for efficient language processing tools among Urdu speakers, this project aims to create a comprehensive framework for Urdu text analysis with four modules: sentiment analysis, significant word extraction, text summarization, and text classification. Because of Urdu’s intricate morphology and sparse linguistic resources, existing algorithms frequently have difficulty analyzing Urdu text effectively. There is therefore an urgent need for solutions designed around the specific linguistic properties of Urdu, which would enable more perceptive analysis and interpretation of Urdu text across a range of domains. The first module used logistic regression for sentiment analysis. The second module employed TF-IDF vectorization and chi-square feature selection to extract significant words from the Urdu corpus. The third module applied a frequency-based extractive method for text summarization, condensing the input text while retaining essential information. These methodologies facilitated comprehensive model training and evaluation on a dataset comprising 50,000 Urdu movie reviews. The fourth module used logistic regression for text classification. Its workflow began with loading a dataset and handling missing values; headlines were then transformed into TF-IDF features and the data was split into training and testing sets. A logistic regression model was trained and evaluated using accuracy. Finally, the model and vectorizer were saved for future deployment, providing a streamlined approach to text classification. The system’s robustness and reliability were confirmed through extensive functional and non-functional tests, including accuracy, efficiency, and security assessments. Module-level component testing and real-world user scenarios further validated the system’s performance and usability, guiding refinement and optimization efforts. Inputs from various Urdu sources were used to validate each module’s performance and to ensure that the system functions across diverse inputs. The system accurately determined sentiment, extracted meaningful words, and provided concise summaries. | en_US |
dc.language.iso | en | en_US |
dc.publisher | Computer Sciences | en_US |
dc.relation.ispartofseries | BS(IT);P-02337 | |
dc.subject | Urdu | en_US |
dc.subject | Text | en_US |
dc.subject | Analysis | en_US |
dc.title | Urdu Text Analysis (UTA) | en_US |
dc.type | Project Reports | en_US |
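
The abstract describes the third module only as a "frequency-based extractive method". The following is a minimal sketch of one common form of that idea, not the authors' implementation: each sentence is scored by the average corpus frequency of its words, and the top-scoring sentences are returned in their original order. The function names, the `max_sentences` and `stopwords` parameters, and the regex-based tokenizer are all assumptions made for illustration.

```python
import re
from collections import Counter

# Assumption: sentences end with the Urdu full stop "۔" (U+06D4), the Arabic
# question mark "؟" (U+061F), or Latin ".", "!" or "?", followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[۔؟!?.])\s+")


def split_sentences(text):
    """Split text into sentences, keeping the terminating punctuation."""
    return [s.strip() for s in SENTENCE_END.split(text) if s.strip()]


def tokenize(sentence):
    """Return the words of a sentence.

    \\w matches Unicode letters, so Urdu words are kept and punctuation is
    dropped. This is a simplification: combining diacritics and zero-width
    joiners would split a word into several tokens.
    """
    return re.findall(r"\w+", sentence)


def summarize(text, max_sentences=2, stopwords=frozenset()):
    """Return the highest-scoring sentences, in their original order."""
    sentences = split_sentences(text)

    # Word frequencies over the whole input, ignoring stopwords.
    freq = Counter(
        w for s in sentences for w in tokenize(s) if w not in stopwords
    )

    def score(sentence):
        # Average frequency, so long sentences are not favoured by length alone.
        words = [w for w in tokenize(sentence) if w not in stopwords]
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # Rank sentence indices by score, then restore document order.
    ranked = sorted(
        range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True
    )
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)
```

Calling `summarize(review_text, max_sentences=3)` on a review would return its three highest-scoring sentences. Passing a set of common Urdu function words as `stopwords` is likely to improve the ranking, since otherwise very frequent words such as "ہے" dominate the scores.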