Visual Question Answering Using Deep Learning.

dc.contributor.author Ali Hamza, 01-134181-009
dc.contributor.author Raja Muneer, 01-134181-056
dc.date.accessioned 2022-06-17T07:14:55Z
dc.date.available 2022-06-17T07:14:55Z
dc.date.issued 2022
dc.identifier.uri http://hdl.handle.net/123456789/12851
dc.description Supervised by Dr. Imran Siddiqui en_US
dc.description.abstract The problem of answering questions about an image is commonly known as Visual Question Answering (VQA), a well-established problem in computer vision. The VQA task requires understanding of both text and vision: given an image and a question in natural language, a VQA system tries to find the correct answer using visual elements of the image together with inferences drawn from the question. Such models are especially helpful for blind or visually impaired people, who can ask questions about a particular scene, environment, or set of images. In recent years many VQA systems have been developed using techniques from computer vision, natural language processing, and deep learning. Most of these systems are built on scene images, and very few models exploit the textual features of images, even though learning textual features plays an important role in predicting an answer. Images often contain a great deal of text, yet most VQA methods do not make use of it; this "text in images" provides additional useful cues and facilitates a better understanding of the visual content, helping the model predict more accurate answers. In this project we develop a VQA system that uses textual features along with visual features to predict an answer (see the sketch after this record). We use a book-cover dataset containing 207k book-cover images and more than 1 million question-answer pairs. en_US
dc.language.iso en en_US
dc.publisher Computer Sciences BUIC en_US
dc.relation.ispartofseries BS (CS);MFN-P 10481
dc.subject Deep Learning en_US
dc.subject Pre-Trained CNN en_US
dc.title Visual Question Answering Using Deep Learning. en_US
dc.type Project Reports en_US
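
The abstract describes fusing visual features from a pre-trained CNN with embeddings of the question and of text read off the image (for a book cover, the title and author extracted by OCR). Below is a minimal PyTorch sketch of one such fusion model; the ResNet-18 backbone, LSTM encoders, layer sizes, and concatenation fusion are illustrative assumptions, not the architecture used in this report.

```python
# Minimal sketch (hypothetical sizes and names) of a text-aware VQA model:
# a frozen pre-trained CNN encodes the image, two LSTMs encode the question
# and the OCR'd image text, and the three features are fused for answering.
import torch
import torch.nn as nn
import torchvision.models as models

class TextAwareVQA(nn.Module):
    def __init__(self, vocab_size, num_answers, embed_dim=300, hidden_dim=512):
        super().__init__()
        # Pre-trained ResNet-18 as a frozen visual feature extractor (512-d).
        cnn = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
        self.cnn = nn.Sequential(*list(cnn.children())[:-1])  # drop the final FC
        for p in self.cnn.parameters():
            p.requires_grad = False
        # One word embedding shared by the question and the image text.
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.q_lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.t_lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        # Classify the concatenated features over a fixed answer vocabulary.
        self.classifier = nn.Sequential(
            nn.Linear(512 + 2 * hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_answers),
        )

    def forward(self, image, question_ids, ocr_ids):
        v = self.cnn(image).flatten(1)                 # (B, 512) visual features
        _, (q, _) = self.q_lstm(self.embed(question_ids))
        _, (t, _) = self.t_lstm(self.embed(ocr_ids))
        fused = torch.cat([v, q[-1], t[-1]], dim=1)    # simple concatenation fusion
        return self.classifier(fused)                  # logits over the answers

# Smoke test with random inputs (batch of 2).
model = TextAwareVQA(vocab_size=10_000, num_answers=1_000)
image = torch.randn(2, 3, 224, 224)
question = torch.randint(1, 10_000, (2, 12))           # tokenized question ids
ocr_text = torch.randint(1, 10_000, (2, 8))            # tokenized OCR text ids
print(model(image, question, ocr_text).shape)          # torch.Size([2, 1000])
```

Concatenation is the simplest possible fusion choice; the same skeleton can be extended with attention over the OCR tokens or a stronger question encoder.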

