Welcome to the Bahria University DSpace digital repository. DSpace is a digital service that collects, preserves, and distributes digital material. Repositories are important tools for preserving an organization's legacy; they facilitate digital preservation and scholarly communication.
dc.contributor.author | Ali Hamza, 01-134181-009 | |
dc.contributor.author | Raja Muneer, 01-134181-056 | |
dc.date.accessioned | 2022-06-17T07:14:55Z | |
dc.date.available | 2022-06-17T07:14:55Z | |
dc.date.issued | 2022 | |
dc.identifier.uri | http://hdl.handle.net/123456789/12851 | |
dc.description | Supervised by Dr. Imran Siddiqui | en_US |
dc.description.abstract | The problem of answering questions about an image is commonly known as Visual Question Answering (VQA), a well-established problem in computer vision. The VQA task requires understanding of both text and vision: given an image and a question in natural language, a VQA system tries to find the correct answer using visual elements of the image and inferences drawn from the textual question. Such models are especially helpful for visually impaired people, who can use them to obtain information about their surroundings or about a given set of images. In recent years, many VQA systems have been developed using techniques from computer vision, natural language processing, and deep learning. Most of these systems are built on scene images, and very few models exploit the textual features of images, even though learning textual features plays an important role in predicting an answer. Computer vision experts can develop AI systems for blind or visually impaired people, so that they can ask questions about a particular scene or environment, and using textual features in the model can help predict more accurate answers. A large amount of textual data is present in images and can be used for useful predictions; however, most VQA methods do not utilize the text often present in images. These "texts in images" provide additional useful cues and facilitate a better understanding of the visual content. In this project we develop a VQA system that uses textual features alongside visual features to predict an answer. We use a book-cover dataset that contains 207k book-cover images and more than one million question-answer pairs. | en_US |
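The record itself contains no code, but the fusion idea the abstract describes (combining visual features of the image, an embedding of the question, and an embedding of text detected in the image to score candidate answers) can be sketched in miniature. Everything below is an illustrative assumption: the feature dimensions, the single linear layer, and the function names are not taken from the project, which per the subject fields uses a pre-trained CNN and deep learning rather than this toy classifier.

```python
import math
import random

random.seed(0)

# Toy dimensions -- illustrative assumptions, not the project's actual sizes.
IMG_DIM, QST_DIM, TXT_DIM, NUM_ANSWERS = 8, 4, 4, 5


def fuse_and_classify(img_feat, qst_feat, txt_feat, weights, bias):
    """Concatenate visual, question, and in-image text features,
    then score each candidate answer with a linear layer + softmax."""
    fused = img_feat + qst_feat + txt_feat  # list concatenation = feature fusion
    logits = [sum(w * x for w, x in zip(row, fused)) + b
              for row, b in zip(weights, bias)]
    m = max(logits)                          # subtract max for numerical stability
    exp = [math.exp(l - m) for l in logits]
    s = sum(exp)
    return [e / s for e in exp]              # probability per candidate answer


# Stand-ins for real features: CNN pooled image features, a question
# embedding, and an embedding of OCR tokens read off the book cover.
img = [random.gauss(0, 1) for _ in range(IMG_DIM)]
qst = [random.gauss(0, 1) for _ in range(QST_DIM)]
txt = [random.gauss(0, 1) for _ in range(TXT_DIM)]
W = [[random.gauss(0, 0.1) for _ in range(IMG_DIM + QST_DIM + TXT_DIM)]
     for _ in range(NUM_ANSWERS)]
b = [0.0] * NUM_ANSWERS

probs = fuse_and_classify(img, qst, txt, W, b)
answer = max(range(NUM_ANSWERS), key=lambda i: probs[i])
```

In a real system each stand-in vector would come from a trained encoder (e.g. a pre-trained CNN for the image, a text model for the question and OCR tokens), and the classifier would be learned end to end; the point here is only the concatenate-then-score structure.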
dc.language.iso | en | en_US |
dc.publisher | Computer Sciences BUIC | en_US |
dc.relation.ispartofseries | BS (CS);MFN-P 10481 | |
dc.subject | Deep Learning | en_US |
dc.subject | Pre-Trained CNN | en_US |
dc.title | Visual Question Answering Using Deep Learning. | en_US |
dc.type | Project Reports | en_US |