UDOC QA-Urdu Document-Based Question Answering

Misbah Zafar, 01-249212-006

dc.contributor.author	Misbah Zafar, 01-249212-006
dc.date.accessioned	2023-12-18T10:43:17Z
dc.date.available	2023-12-18T10:43:17Z
dc.date.issued	2023
dc.identifier.uri	http://hdl.handle.net/123456789/16831
dc.description	Supervised by Dr. Arif ur Rahman	en_US
dc.description.abstract	In today’s data-driven world, Document AI and Machine Reading Comprehension (MRC) have emerged as pivotal technologies. This work examines their impact and the reasons they are needed in contemporary applications. It reviews the historical context, noting the field's reliance on specific models and techniques and the fact that low-resource languages such as Urdu have remained largely unexplored in question answering. Traditionally, Document AI and MRC research has relied predominantly on state-of-the-art models and techniques, often leaving low-resource languages underrepresented and underserved. To address this gap in Urdu question answering, we created a dedicated Urdu dataset by translating the well-established MLQA dataset. Our study introduces two methodologies for improving Urdu question answering performance: the first combines feature extraction with a predictive model, and the second fine-tunes state-of-the-art models on the newly created dataset. Through a comparative analysis, we assessed which approach yields better results for Urdu question answering. Our findings indicate that the feature extraction methodology outperforms the fine-tuned state-of-the-art models on our Urdu dataset. This result highlights the potential of innovative techniques for low-resource languages in Document AI and MRC, and the value of such work in bridging linguistic and technological gaps.	en_US
dc.language.iso	en	en_US
dc.publisher	Computer Sciences	en_US
dc.relation.ispartofseries	MS (DS);T-1107
dc.subject	UDOC	en_US
dc.subject	Urdu Document-Based	en_US
dc.subject	Question Answering	en_US
dc.title	UDOC QA-Urdu Document-Based Question Answering	en_US
dc.type	Thesis	en_US