Abstract:
Visual speech recognition is a relatively new research field in which speech is recognized from the movement pattern of the lips. The technique applies in scenarios where audio is unavailable or unclear, or where the user cannot or prefers not to use their voice. The aim of this project is to recognize the word spoken by a person in a video without the aid of audio. The developed application first reads a video and converts it to frames. For each frame, it detects the face, then the facial landmarks, and then extracts the lip region. These lip-region frames are pre-processed before being passed to a deep learning model for classification, and the result is displayed on the screen.
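
The pipeline described above can be illustrated with a minimal sketch. It is not the project's actual implementation: the names `crop_lip_region`, `normalize`, and `recognize_word` are hypothetical, the face and landmark detector and the deep learning model are passed in as placeholder callables (`find_lip_landmarks`, `classify`), and scaling pixels to the range 0 to 1 is only an assumed stand-in for the pre-processing step, which the abstract does not specify. Frames are modelled as plain nested lists of grayscale values so that the example runs without any image library.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Frame = List[List[int]]   # grayscale image: rows of 0-255 pixel values
Point = Tuple[int, int]   # (x, y) landmark coordinate


def crop_lip_region(frame: Frame, lip_points: Sequence[Point], margin: int = 1) -> Frame:
    """Crop the bounding box of the lip landmarks, padded by `margin` pixels."""
    height, width = len(frame), len(frame[0])
    xs = [x for x, _ in lip_points]
    ys = [y for _, y in lip_points]
    # Clamp the padded box so it never leaves the image.
    left = max(min(xs) - margin, 0)
    right = min(max(xs) + margin, width - 1)
    top = max(min(ys) - margin, 0)
    bottom = min(max(ys) + margin, height - 1)
    return [row[left:right + 1] for row in frame[top:bottom + 1]]


def normalize(crop: Frame) -> List[List[float]]:
    """Assumed pre-processing step: scale pixel values to [0, 1]."""
    return [[pixel / 255.0 for pixel in row] for row in crop]


def recognize_word(
    frames: Sequence[Frame],
    find_lip_landmarks: Callable[[Frame], Sequence[Point]],  # hypothetical detector
    classify: Callable[[List[List[List[float]]]], str],      # hypothetical model
    margin: int = 1,
) -> Optional[str]:
    """Run detection, cropping, pre-processing, and classification on a clip."""
    sequence = []
    for frame in frames:
        lip_points = find_lip_landmarks(frame)
        if not lip_points:
            continue  # no face or lips found in this frame; skip it
        sequence.append(normalize(crop_lip_region(frame, lip_points, margin)))
    if not sequence:
        return None  # nothing to classify
    return classify(sequence)
```

In a real system, the frames would typically come from a video decoder, the landmark callable would wrap a face and landmark detector, and the crops would usually be resized to a fixed shape before reaching the model; those choices are left open here because the abstract does not name them.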