REAL TIME SPEECH DRIVEN FACE ANIMATION SYSTEM

Kayani, Izhar us Salam Reg # 36567; Asim, Muhammad Ahmer Reg # 36581; Abbassi, Hassan Shahab Reg # 36563; Bhimani, Rakesh Kumar Reg # 36598

URI: http://hdl.handle.net/123456789/10444

Date: 2018

Abstract:

We chose the paper “LIPNET: END-TO-END SENTENCE-LEVEL LIPREADING” as the base paper for our Final Year Project. Lip-reading is the task of decoding text from the movement of a speaker’s mouth. Traditional approaches separated the problem into two stages: designing or learning visual features, and prediction. More recent deep lip-reading approaches are end-to-end trainable (Wand et al., 2016; Chung & Zisserman, 2016a). However, existing models trained end-to-end perform only word classification rather than sentence-level sequence prediction. Studies have shown that human lip-reading performance improves for longer words (Easton & Basala, 1982), indicating the importance of features that capture temporal context in an ambiguous communication channel. Motivated by this observation, our project presents a model that maps a sequence of video frames to text, making use of spatiotemporal convolutions, a recurrent neural network, and the connectionist temporal classification (CTC) loss, trained entirely end-to-end. The result is an end-to-end sentence-level lip-reading model that simultaneously learns spatiotemporal visual features and a sequence model.
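To illustrate how per-frame predictions become text under connectionist temporal classification, the following is a minimal sketch of greedy (best-path) CTC decoding. It is not taken from the project: the function name `ctc_greedy_decode`, the blank symbol `_`, the toy alphabet, and the scores are all assumptions made for illustration, and a real system may use beam search instead.

```python
from itertools import groupby

BLANK = "_"  # hypothetical blank symbol, chosen only for this sketch


def ctc_greedy_decode(frame_scores, alphabet, blank=BLANK):
    """Greedy CTC decoding: best symbol per frame, collapse repeats, drop blanks."""
    # 1. Pick the highest-scoring symbol for every video frame.
    best_path = [
        alphabet[max(range(len(scores)), key=scores.__getitem__)]
        for scores in frame_scores
    ]
    # 2. Collapse runs of the same symbol into a single symbol.
    collapsed = [symbol for symbol, _ in groupby(best_path)]
    # 3. Remove blanks, which only separate genuinely repeated characters.
    return "".join(symbol for symbol in collapsed if symbol != blank)


# Toy example: six frames scored over the alphabet (blank, "a", "b").
alphabet = [BLANK, "a", "b"]
frames = [
    [0.10, 0.80, 0.10],  # a
    [0.10, 0.70, 0.20],  # a (repeat, collapsed with the previous frame)
    [0.90, 0.05, 0.05],  # blank (separates the two "a" characters)
    [0.20, 0.70, 0.10],  # a
    [0.10, 0.20, 0.70],  # b
    [0.10, 0.10, 0.80],  # b (repeat, collapsed)
]
decoded = ctc_greedy_decode(frames, alphabet)
print(decoded)  # aab
```

The blank symbol is what lets the output be shorter than the number of frames while still allowing doubled letters: without the blank frame in the middle, the two "a" runs would collapse into one.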