Welcome to the Bahria University DSpace digital repository. DSpace is a digital service that collects, preserves, and distributes digital material. Repositories are important tools for preserving an organization's legacy; they facilitate digital preservation and scholarly communication.
dc.contributor.author | Ali Hamza, 01-134181-009 | |
dc.contributor.author | Raja Muneer, 01-134181-056 | |
dc.date.accessioned | 2022-06-17T07:14:55Z | |
dc.date.available | 2022-06-17T07:14:55Z | |
dc.date.issued | 2022 | |
dc.identifier.uri | http://hdl.handle.net/123456789/12851 | |
dc.description | Supervised by Dr. Imran Siddiqui | en_US |
dc.description.abstract | The problem of answering questions about an image is commonly known as Visual Question Answering (VQA), a well-established problem in computer vision. The VQA task requires understanding of both text and vision: given an image and a question in natural language, a VQA system tries to find the correct answer using visual elements of the image and inferences drawn from the textual question. Such models are especially helpful for visually impaired people, who can use them to obtain information about their surroundings or about a given set of images. In recent years, many VQA systems have been developed using techniques from computer vision, natural language processing, and deep learning. Most of these systems are built on scene images, and very few models exploit the textual features of images, even though learning textual features plays an important role in predicting an answer. Computer vision experts can develop AI systems for blind or visually impaired people, so that they can ask questions about a particular scene or environment, and using textual features in the model can help predict more accurate answers. A large amount of textual data is present in images and can be used for useful predictions; however, most VQA methods do not utilize the text often present in images. These "texts in images" provide additional useful cues and facilitate a better understanding of the visual content. In this project we develop a VQA system that uses textual features alongside visual features to predict an answer. We use a book-cover dataset that contains 207k book-cover images and more than one million question-answer pairs. | en_US |
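The record itself contains no code, but the fusion idea the abstract describes (combining visual features of the image, an embedding of the question, and an embedding of text detected in the image to score candidate answers) can be sketched in miniature. Everything below is an illustrative assumption: the feature dimensions, the single linear layer, and the function names are not taken from the project, which per the subject fields uses a pre-trained CNN and deep learning rather than this toy classifier.

```python
import math
import random

random.seed(0)

# Toy dimensions -- illustrative assumptions, not the project's actual sizes.
IMG_DIM, QST_DIM, TXT_DIM, NUM_ANSWERS = 8, 4, 4, 5


def fuse_and_classify(img_feat, qst_feat, txt_feat, weights, bias):
    """Concatenate visual, question, and in-image text features,
    then score each candidate answer with a linear layer + softmax."""
    fused = img_feat + qst_feat + txt_feat  # list concatenation = feature fusion
    logits = [sum(w * x for w, x in zip(row, fused)) + b
              for row, b in zip(weights, bias)]
    m = max(logits)                          # subtract max for numerical stability
    exp = [math.exp(l - m) for l in logits]
    s = sum(exp)
    return [e / s for e in exp]              # probability per candidate answer


# Stand-ins for real features: CNN pooled image features, a question
# embedding, and an embedding of OCR tokens read off the book cover.
img = [random.gauss(0, 1) for _ in range(IMG_DIM)]
qst = [random.gauss(0, 1) for _ in range(QST_DIM)]
txt = [random.gauss(0, 1) for _ in range(TXT_DIM)]
W = [[random.gauss(0, 0.1) for _ in range(IMG_DIM + QST_DIM + TXT_DIM)]
     for _ in range(NUM_ANSWERS)]
b = [0.0] * NUM_ANSWERS

probs = fuse_and_classify(img, qst, txt, W, b)
answer = max(range(NUM_ANSWERS), key=lambda i: probs[i])
```

In a real system each stand-in vector would come from a trained encoder (e.g. a pre-trained CNN for the image, a text model for the question and OCR tokens), and the classifier would be learned end to end; the point here is only the concatenate-then-score structure.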
dc.language.iso | en | en_US |
dc.publisher | Computer Sciences BUIC | en_US |
dc.relation.ispartofseries | BS (CS);MFN-P 10481 | |
dc.subject | Deep Learning | en_US |
dc.subject | Pre-Trained CNN | en_US |
dc.title | Visual Question Answering Using Deep Learning. | en_US |
dc.type | Project Reports | en_US |