Real-time deep visual semantic alignment for automatic image captioning (P-0007) (MFN 8646)

Hamna Akram, 01-132152-008; Sidra Akram, 01-132152-039; Talha Wajid, 01-132152-043

dc.contributor.author	Hamna Akram, 01-132152-008
dc.contributor.author	Sidra Akram, 01-132152-039
dc.contributor.author	Talha Wajid, 01-132152-043
dc.date.accessioned	2020-08-06T11:39:34Z
dc.date.available	2020-08-06T11:39:34Z
dc.date.issued	2019
dc.identifier.uri	http://hdl.handle.net/123456789/9822
dc.description	Supervised by Mr. Ammar Ajmal	en_US
dc.description.abstract	Humans can easily describe the environment they are in because they possess cognitive abilities. However, it is difficult for machines to infer the visual world around them. By blending the concepts of computer vision, Natural Language Processing (NLP) and deep learning, we implement an embedded system that understands the spatial relationships of objects in images and describes them in natural language. The proposed system can be utilized in many different scenarios, mainly natural human-robot interaction, navigation for the blind, image retrieval, social media content creation and early childhood development. Cross-modal retrieval is performed to generate semantically correct image descriptions. A Convolutional Neural Network (CNN) is used for image pre-processing, and a Long Short-Term Memory (LSTM) module is used for text pre-processing, followed by word embeddings. Both outputs are then provided as input vectors to a Feed Forward Neural Network (FFNN), which generates semantically and syntactically correct sentences. The algorithms are trained on the Flickr8k dataset and, after model selection, the system is implemented on hardware as a stand-alone device. The system's performance is evaluated with the NLP metrics BLEU, METEOR and CIDEr: BLEU is based on n-gram precision, METEOR combines precision and recall (weighting recall more heavily), and CIDEr measures consensus between the generated caption and the human reference captions.	en_US
dc.language.iso	en	en_US
dc.publisher	Computer Engineering, Bahria University Engineering School Islamabad	en_US
dc.relation.ispartofseries	BCE;P-0007
dc.subject	Computer Engineering	en_US
dc.title	Real-time deep visual semantic alignment for automatic image captioning (P-0007) (MFN 8646)	en_US
dc.type	Project Report	en_US
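The abstract evaluates captions with BLEU, which is built on clipped n-gram precision. As a rough illustration of what that score computes, here is a minimal sketch of sentence-level BLEU, assuming uniform weights over 1- to 4-grams, no smoothing, whitespace tokenization and a brevity penalty against the closest reference length. The function names `ngrams` and `sentence_bleu` are hypothetical; this is not the project's evaluation code, and published results are normally computed with an established implementation such as NLTK's `nltk.translate.bleu_score`.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, references, max_n=4):
    """Hypothetical sentence-level BLEU sketch: uniform weights, no smoothing.

    candidate: list of tokens; references: list of token lists.
    Returns 0.0 if any n-gram order has no clipped matches.
    """
    if not candidate:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        total = sum(cand_counts.values())
        if total == 0:  # candidate shorter than n tokens
            return 0.0
        # Clip each candidate n-gram count by its maximum count in any single reference.
        max_ref_counts = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref_counts[gram] = max(max_ref_counts[gram], count)
        clipped = sum(min(count, max_ref_counts[gram]) for gram, count in cand_counts.items())
        if clipped == 0:
            return 0.0
        log_precision_sum += math.log(clipped / total)
    # Brevity penalty against the reference length closest to the candidate length
    # (ties go to the shorter reference).
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    # Geometric mean of the modified precisions, scaled by the brevity penalty.
    return bp * math.exp(log_precision_sum / max_n)


if __name__ == "__main__":
    refs = ["a dog runs on the grass".split(), "a brown dog is running outside".split()]
    print(sentence_bleu("a dog runs on the grass".split(), refs))  # identical to a reference
    print(sentence_bleu("a dog runs on the".split(), refs))        # shorter, so penalized
```

Because BLEU only counts matching n-grams, a caption that is correct but worded differently from every reference can still score low. This is one reason captioning work also reports recall-oriented METEOR and consensus-based CIDEr.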