Urdu Text Analysis (UTA)

Alishba Muhammad, 01-135211-012; Eman Fatima, 01-135211-026

dc.contributor.author	Alishba Muhammad, 01-135211-012
dc.contributor.author	Eman Fatima, 01-135211-026
dc.date.accessioned	2025-07-08T03:58:34Z
dc.date.available	2025-07-08T03:58:34Z
dc.date.issued	2024
dc.identifier.uri	http://hdl.handle.net/123456789/19768
dc.description	Supervised by Ms. Maryam Aslam	en_US
dc.description.abstract	In response to the growing need for efficient language processing tools among Urdu speakers, this project develops a framework for Urdu text analysis comprising four modules: sentiment analysis, significant word extraction, text summarization, and text classification. Because of Urdu’s intricate morphology and sparse linguistic resources, existing algorithms often struggle to analyze Urdu text effectively, so solutions designed around the language’s specific linguistic properties are needed. Such solutions enable more perceptive analysis and interpretation of Urdu text across a range of domains. The first module used logistic regression for sentiment analysis. The second module applied TF-IDF vectorization and chi-square feature selection to extract significant words from the Urdu corpus. The third module applied a frequency-based extractive method for text summarization, condensing the input text while retaining essential information. These methods supported model training and evaluation on a dataset of 50,000 Urdu movie reviews. The fourth module used logistic regression for text classification: the workflow loaded a dataset, handled missing values, transformed headlines into TF-IDF features, and split the data into training and testing sets. A logistic regression model was then trained and evaluated on accuracy, and the model and vectorizer were saved for future deployment. The system’s robustness and reliability were confirmed through functional and non-functional tests, including accuracy, efficiency, and security assessments.
Module-level component testing and real-world user scenarios further validated the system’s performance and usability and guided refinement and optimization. Each module was tested with inputs from a variety of Urdu sources to check that the system functions across diverse inputs. The system accurately determined sentiment, extracted meaningful words, and produced concise summaries.	en_US
dc.language.iso	en	en_US
dc.publisher	Computer Sciences	en_US
dc.relation.ispartofseries	BS (IT);P-2687
dc.subject	Urdu	en_US
dc.subject	Text	en_US
dc.subject	Analysis	en_US
dc.title	Urdu Text Analysis (UTA)	en_US
dc.type	Project Reports	en_US
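
The abstract describes the third module only as a "frequency-based extractive method". As an illustration of that general technique, and not the project's actual code, the following minimal Python sketch scores each sentence by the average corpus frequency of its words and keeps the top-scoring sentences in their original order. The function names, the `max_sentences` and `stopwords` parameters, the averaging rule, and the naive regex tokenizer are all assumptions made for this sketch; a real Urdu pipeline would likely need proper tokenization and a curated stopword list.

```python
import re
from collections import Counter

# Sentence terminators: Urdu full stop (U+06D4), Arabic question mark (U+061F),
# plus the ASCII terminators. This choice is an assumption of the sketch.
_SENTENCE_END = re.compile(r"[\u06D4\u061F.!?]+")
# Naive word pattern; it does not handle diacritics or zero-width joiners.
_WORD = re.compile(r"\w+")


def split_sentences(text):
    """Split text into non-empty, stripped sentences."""
    return [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]


def summarize(text, max_sentences=2, stopwords=frozenset()):
    """Return up to max_sentences sentences, ranked by average word frequency.

    Hypothetical helper: the scoring rule is one common variant of
    frequency-based extraction, not necessarily the one used in the report.
    """
    sentences = split_sentences(text)
    tokenized = [
        [w for w in _WORD.findall(s) if w not in stopwords] for s in sentences
    ]
    # Word frequencies over the whole input text.
    freq = Counter(w for words in tokenized for w in words)
    # Average frequency per sentence, so long sentences are not favored.
    scores = [
        sum(freq[w] for w in words) / len(words) if words else 0.0
        for words in tokenized
    ]
    # Rank by score (descending); ties go to the earlier sentence.
    ranked = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))
    # Restore original order among the selected sentences.
    keep = sorted(ranked[:max_sentences])
    return [sentences[i] for i in keep]
```

Calling `summarize(text, max_sentences=1)` on a passage returns a one-element list holding the sentence whose words are, on average, the most frequent in that passage. Passing a set of common function words as `stopwords` keeps them from dominating the scores.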