Real-time deep visual semantic alignment for automatic image captioning (P-0007) (MFN 8646)

dc.contributor.author Hamna Akram, 01-132152-008
dc.contributor.author Sidra Akram, 01-132152-039
dc.contributor.author Talha Wajid, 01-132152-043
dc.date.accessioned 2020-08-06T11:39:34Z
dc.date.available 2020-08-06T11:39:34Z
dc.date.issued 2019
dc.identifier.uri http://hdl.handle.net/123456789/9822
dc.description Supervised by Mr. Ammar Ajmal en_US
dc.description.abstract Humans can easily describe their surroundings because of their cognitive abilities, but it is difficult for machines to interpret the visual world around them. By combining concepts from computer vision, Natural Language Processing (NLP), and deep learning, we implement an embedded system that understands the spatial relationships of objects in images and describes them in natural language. The proposed system can be used in many scenarios, mainly natural human-robot interaction, navigation for the blind, image retrieval, social media content creation, and early childhood development. Cross-modal retrieval is performed to generate semantically correct image descriptions. A Convolutional Neural Network (CNN) is used for image pre-processing, and a Long Short-Term Memory (LSTM) module is used for text pre-processing, followed by word embeddings. Both outputs are then provided as input vectors to a Feed-Forward Neural Network (FFNN) to create a semantically and syntactically correct sentence. The algorithms are trained on the Flickr8k dataset and, after model selection, the system is implemented in hardware as a stand-alone device. The system's performance is evaluated with the NLP metrics BLEU, METEOR, and CIDEr. BLEU is precision-oriented, METEOR combines precision and recall, and CIDEr measures consensus with the reference captions. en_US
dc.language.iso en en_US
dc.publisher Computer Engineering, Bahria University Engineering School Islamabad en_US
dc.relation.ispartofseries BCE;P-0007
dc.subject Computer Engineering en_US
dc.title Real-time deep visual semantic alignment for automatic image captioning (P-0007) (MFN 8646) en_US
dc.type Project Report en_US
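
The abstract names BLEU as one of the evaluation metrics. As a rough illustration of what that score is built on, the sketch below computes clipped (modified) n-gram precision for a candidate caption against reference captions. It is not the project's evaluation code: the function name `modified_precision` and the example captions are hypothetical, and full BLEU additionally combines several n-gram orders with a geometric mean and applies a brevity penalty, neither of which is shown here.

```python
from collections import Counter


def modified_precision(candidate, references, n=1):
    """Clipped n-gram precision, the building block of BLEU.

    candidate: list of tokens; references: list of token lists.
    Hypothetical helper for illustration only.
    """
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand_counts = ngrams(candidate)
    if not cand_counts:
        return 0.0

    # For each n-gram, record the highest count seen in any single reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref).items():
            max_ref[gram] = max(max_ref[gram], count)

    # Clip each candidate count so repeated words cannot inflate the score.
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


# Made-up captions, not taken from Flickr8k.
candidate = "a dog runs on the grass".split()
references = [
    "a dog is running on the grass".split(),
    "the dog runs across a field".split(),
]
unigram_score = modified_precision(candidate, references, n=1)
bigram_score = modified_precision(candidate, references, n=2)
```

Clipping is what distinguishes this from plain precision: a caption that repeats one matching word many times is credited only as often as that word appears in the best-matching reference.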

