dc.contributor.author | Hamna Akram, 01-132152-008 | |
dc.contributor.author | Sidra Akram, 01-132152-039 | |
dc.contributor.author | Talha Wajid, 01-132152-043 | |
dc.date.accessioned | 2020-08-06T11:39:34Z | |
dc.date.available | 2020-08-06T11:39:34Z | |
dc.date.issued | 2019 | |
dc.identifier.uri | http://hdl.handle.net/123456789/9822 | |
dc.description | Supervised by Mr. Ammar Ajmal | en_US |
dc.description.abstract | Humans can easily describe their surroundings because they possess cognitive abilities, but it is difficult for machines to interpret the visual world around them. By blending concepts from computer vision, Natural Language Processing (NLP), and deep learning, we implement an embedded system that understands the spatial relationships of objects in images and describes them in natural language. The proposed system can be used in many scenarios, mainly natural human-robot interaction, navigation for the blind, image retrieval, social media content creation, and early childhood development. Cross-modal retrieval is performed to generate semantically correct image descriptions. A Convolutional Neural Network (CNN) is used for image pre-processing, and a Long Short-Term Memory (LSTM) module is used for text pre-processing, followed by word embeddings. Both outputs are then provided as input vectors to a Feed-Forward Neural Network (FFNN) to create a semantically and syntactically correct sentence. The algorithms are trained on the Flickr8k dataset, and after model selection the system is implemented on hardware as a stand-alone device. The system's performance is evaluated using the NLP metrics BLEU, METEOR, and CIDEr. These scores measure precision, recall and accuracy, respectively. | en_US |
dc.language.iso | en | en_US |
dc.publisher | Computer Engineering, Bahria University Engineering School Islamabad | en_US |
dc.relation.ispartofseries | BCE;P-0007 | |
dc.subject | Computer Engineering | en_US |
dc.title | Real-time deep visual semantic alignment for automatic image captioning (P-0007) (MFN 8646) | en_US |
dc.type | Project Report | en_US |
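
The abstract describes BLEU as the precision-oriented score used to evaluate generated captions. As a rough illustration only, the sketch below computes the clipped (modified) n-gram precision that BLEU is built on. It is not the authors' evaluation code and it is not full BLEU, which also combines several n-gram orders and applies a brevity penalty. The function names `ngrams` and `modified_precision` and the example captions are hypothetical, chosen for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, references, n=1):
    """Clipped n-gram precision of one candidate caption against references.

    Each candidate n-gram is credited at most as many times as it occurs
    in the single reference where it is most frequent. This is only the
    precision component of BLEU (assumption: whitespace-tokenized input).
    """
    cand_counts = Counter(ngrams(candidate, n))
    if not cand_counts:
        return 0.0

    # Highest count of each n-gram across all reference captions.
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)

    clipped = sum(min(count, max_ref_counts[gram])
                  for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


# Hypothetical captions, not taken from Flickr8k.
candidate = "a dog runs on the grass".split()
references = [
    "a dog is running on the grass".split(),
    "the dog runs across a field".split(),
]

unigram_p = modified_precision(candidate, references, n=1)
bigram_p = modified_precision(candidate, references, n=2)
print(unigram_p, bigram_p)
```

Clipping is what keeps a degenerate caption such as a single word repeated many times from scoring well: the repeated word is only credited as often as it appears in a reference.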