DSpace Repository

Document Visual Question Answering


dc.contributor.author Eiza Batool, 01-249202-002
dc.date.accessioned 2022-12-21T10:29:22Z
dc.date.available 2022-12-21T10:29:22Z
dc.date.issued 2022
dc.identifier.uri http://hdl.handle.net/123456789/14476
dc.description Supervised by Dr. Imran Ahmed Siddiqi en_US
dc.description.abstract Visual Question Answering (VQA) is an emerging problem at the intersection of computer vision and natural language processing (NLP). It is the task of answering a question about an image by combining the visual elements of the image with inferences drawn from the textual question. In most cases, VQA models consider only visual features and ignore the textual content present in a given scene or image. For VQA on document images, however, both visual and textual information play a key role in finding appropriate answers to the posed questions. This research targets the problem of VQA in document images by exploiting both visual and textual information, leveraging recent advances in deep learning. The focus of this study is to answer questions defined on a document image. We propose a method that uses textual features alongside visual features to predict an answer. We use the DocVQA dataset, which includes 50k questions and answers over 12k+ document images. In our system, the model takes the question, the Optical Character Recognition (OCR) output, and the image as input, and a deep learning model processes this input to generate an answer. We use a pre-trained Inception v3 network to represent the image and Gated Recurrent Units (GRUs) to represent the question and the OCR text. Experimental results from our predictive models are evaluated using metrics such as the Average Normalized Levenshtein Similarity (ANLS) score. The Inception v3 model with OCR and attention performs well compared to the other models; the OCR and attention components play a vital role in enhancing performance. en_US
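The ANLS metric mentioned in the abstract is the standard evaluation measure for DocVQA: for each question, the predicted answer is scored against each reference answer by one minus the normalized Levenshtein distance, scores below a threshold (commonly 0.5) are zeroed, the best score per question is kept, and scores are averaged over questions. A minimal sketch of this computation (pure Python, no external libraries; function names are illustrative, not from the thesis):

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def anls(predictions, ground_truths, tau=0.5):
    """Average Normalized Levenshtein Similarity.

    predictions: one predicted answer string per question.
    ground_truths: a list of acceptable reference answers per question.
    tau: similarity threshold below which a score is zeroed (0.5 in DocVQA).
    """
    total = 0.0
    for pred, refs in zip(predictions, ground_truths):
        best = 0.0
        for ref in refs:
            p, r = pred.strip().lower(), ref.strip().lower()
            nl = levenshtein(p, r) / max(len(p), len(r), 1)
            score = 1.0 - nl if nl < tau else 0.0
            best = max(best, score)
        total += best
    return total / len(predictions)
```

For example, an exact match scores 1.0, a one-character slip scores slightly less, and an answer sharing too few characters with every reference scores 0.0.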
dc.language.iso en en_US
dc.publisher Computer Sciences en_US
dc.relation.ispartofseries MS (DS);T-1130
dc.subject Optical Character Recognition en_US
dc.subject Natural Language Processing en_US
dc.title Document Visual Question Answering en_US
dc.type MS Thesis en_US

