| dc.description.abstract |
In today’s data-driven world, Document AI and Machine Reading Comprehension (MRC) have become pivotal technologies. Research in these fields has relied predominantly on state-of-the-art models and techniques, often leaving low-resource languages underrepresented and underserved. Urdu in particular remains largely unexplored in question answering. To address this gap, we created a dedicated Urdu dataset by translating the well-established MLQA dataset. We introduce two methodologies for improving Urdu question answering performance: the first combines feature extraction with a predictive model, and the second fine-tunes state-of-the-art models on our newly created dataset. A comparative analysis of the two approaches indicates that the feature extraction methodology outperforms fine-tuning of state-of-the-art models on our Urdu dataset. This finding highlights the potential of innovative techniques for low-resource language applications in Document AI and MRC, and the value of such work in bridging linguistic and technological gaps. |
en_US |