Semantic video retrieval using natural language queries

Ahmed Hassan, 01-249182-002

URI: http://hdl.handle.net/123456789/10541

Date: 2020

Abstract:

Video retrieval is the task of searching for and retrieving videos that are relevant to a user-defined query. It is one of the most challenging and novel problems in multimedia search as well as in real-life applications. This research work focuses on employing deep learning and natural language processing to solve the video retrieval problem. Deep learning enables us to build an end-to-end trainable system and to avoid the complexity of the image and video processing techniques present in traditional systems. We propose a semantic video retrieval system in which the actual content of the video is explored: persons in the video are recognized, and descriptions of the video frames are generated using an image captioning technique, which helps to understand the contents of the video. Combining the person recognition and captioning models gives us both person-related information and a description of the video frames. In the retrieval phase, we employ a word embedding technique to find words similar to those appearing in the query text, which helps to retrieve the videos most relevant to the query. This helps to reduce the semantic gap, so that the desired videos are expected to be retrieved.

We considered 20 key individuals in our study. There are three key components: face recognition, caption generation, and a query similarity measure. To recognize the faces of persons appearing in a video we use the FaceNet model, and to generate a description of the scene in the video frames we employ an image captioning model. The output of these two models, along with the frame and video information, is saved in a database. In the retrieval phase, a natural language query is provided to the system, and we use a word embedding model to find the top five words most similar to the words of the query. Videos matching these words and the queried individuals are then returned by the system. We conducted experiments on a collection of 100 videos, and promising results are reported.
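
The retrieval phase described above can be illustrated with a minimal Python sketch. It is not the thesis implementation: the abstract does not name the embedding model or the database schema, so the toy three-dimensional vectors, the `INDEX` rows, the person labels, and the helper names `similar_words` and `retrieve` are all hypothetical. The sketch only shows the general idea of expanding each query word with its nearest neighbours by cosine similarity and then filtering stored frame captions by the expanded terms and the queried person.

```python
import math

# Toy word vectors (hypothetical values for illustration only; a real
# system would load a trained word-embedding model instead).
EMBEDDINGS = {
    "speech":  [0.9, 0.1, 0.0],
    "talk":    [0.8, 0.2, 0.1],
    "address": [0.85, 0.15, 0.05],
    "cricket": [0.0, 0.9, 0.1],
    "match":   [0.1, 0.8, 0.2],
    "car":     [0.0, 0.1, 0.9],
}

# Hypothetical stored rows: (video id, frame number, recognized persons,
# generated caption). The schema is an assumption, not taken from the thesis.
INDEX = [
    ("v1", 10, {"person_a"}, "a man giving a talk at a podium"),
    ("v2", 42, {"person_b"}, "players in a cricket match"),
    ("v3", 7, {"person_a"}, "a car parked on a street"),
]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similar_words(word, k=5):
    """Return up to k vocabulary words closest to `word`, best first."""
    if word not in EMBEDDINGS:
        return []
    target = EMBEDDINGS[word]
    scored = [
        (cosine(target, vec), other)
        for other, vec in EMBEDDINGS.items()
        if other != word
    ]
    scored.sort(reverse=True)
    return [other for _, other in scored[:k]]


def retrieve(query_words, person=None, k=5):
    """Return ids of videos whose captions contain a query word or one of
    its k nearest words, optionally restricted to a recognized person."""
    terms = set(query_words)
    for word in query_words:
        terms.update(similar_words(word, k))
    hits = []
    for video_id, _frame_no, persons, caption in INDEX:
        if person is not None and person not in persons:
            continue
        if terms & set(caption.split()) and video_id not in hits:
            hits.append(video_id)
    return hits
```

The default `k=5` mirrors the top five similar words mentioned in the abstract. With a vocabulary this small, a call such as `retrieve(["speech"], person="person_a", k=2)` is more instructive, because five neighbours would cover every other word in the toy vocabulary.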