| dc.contributor.author | Kayani, Izhar us Salam Reg # 36567 | |
| dc.contributor.author | Asim, Muhammad Ahmer Reg # 36581 | |
| dc.contributor.author | Abbassi, Hassan Shahab Reg # 36563 | |
| dc.contributor.author | Bhimani, Rakesh Kumar Reg # 36598 | |
| dc.date.accessioned | 2020-12-12T00:57:29Z | |
| dc.date.available | 2020-12-12T00:57:29Z | |
| dc.date.issued | 2018 | |
| dc.identifier.uri | http://hdl.handle.net/123456789/10444 | |
| dc.description | Supervised by Asia Samreen | en_US |
| dc.description.abstract | We have chosen the paper “LIPNET: END-TO-END SENTENCE-LEVEL LIPREADING” as the base paper for our Final Year Project. Lip-reading is the task of decoding text from the movement of a speaker’s mouth. Traditional approaches separated the problem into two stages: designing or learning visual features, and prediction. Newer deep lip-reading approaches are end-to-end trainable (Wand et al., 2016; Chung & Zisserman, 2016a). However, existing models trained end-to-end perform only word classification rather than sentence-level sequence prediction. Studies have shown that human lip-reading performance improves for longer words (Easton & Basala, 1982), indicating the importance of features that capture temporal context in an ambiguous communication channel. Motivated by this observation, our project presents a model that maps a sequence of video frames to text, making use of spatiotemporal convolutions, a recurrent neural network, and the connectionist temporal classification (CTC) loss, trained entirely end-to-end. The result is an end-to-end sentence-level lip-reading model that simultaneously learns spatiotemporal visual features and a sequence model (a minimal illustrative sketch follows this record). | en_US |
| dc.language.iso | en_US | en_US |
| dc.publisher | Bahria University Karachi Campus | en_US |
| dc.relation.ispartofseries | BS CS;MFN BSCS 109 | |
| dc.title | REAL TIME SPEECH DRIVEN FACE ANIMATION SYSTEM | en_US |
| dc.type | Thesis | en_US |
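As a rough illustration of the pipeline the abstract describes (spatiotemporal convolutions, a recurrent network, and CTC loss, trained end-to-end), here is a minimal PyTorch sketch. All layer sizes, the GRU choice, the `LipReader` name, and the 27-character vocabulary are illustrative assumptions, not the thesis's actual implementation.

```python
import torch
import torch.nn as nn

class LipReader(nn.Module):
    """Hypothetical LipNet-style model: 3D convs -> GRU -> per-frame logits."""

    def __init__(self, vocab_size: int, hidden: int = 256):
        super().__init__()
        # Spatiotemporal feature extractor: 3D convolutions over (time, H, W).
        self.frontend = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=(3, 5, 5), padding=(1, 2, 2)),
            nn.ReLU(),
            nn.MaxPool3d(kernel_size=(1, 2, 2)),
            nn.Conv3d(32, 64, kernel_size=(3, 5, 5), padding=(1, 2, 2)),
            nn.ReLU(),
            nn.MaxPool3d(kernel_size=(1, 2, 2)),
        )
        # Bidirectional GRU captures temporal context across frames.
        self.gru = nn.GRU(input_size=64, hidden_size=hidden,
                          bidirectional=True, batch_first=True)
        # Per-frame character logits; index 0 is reserved for the CTC blank.
        self.head = nn.Linear(2 * hidden, vocab_size + 1)

    def forward(self, video: torch.Tensor) -> torch.Tensor:
        # video: (batch, channels=3, frames, height, width)
        feats = self.frontend(video)        # (B, C, T, H', W')
        feats = feats.mean(dim=(3, 4))      # global spatial pooling -> (B, C, T)
        feats = feats.transpose(1, 2)       # (B, T, C) for the GRU
        out, _ = self.gru(feats)
        return self.head(out)               # (B, T, vocab+1)

# CTC aligns frame-level predictions with the target character sequence
# without requiring per-frame labels, which is what enables sentence-level
# end-to-end training. Shapes and targets below are dummy values.
model = LipReader(vocab_size=27)                  # e.g. a-z plus space (assumed)
video = torch.randn(2, 3, 75, 50, 100)            # 2 clips of 75 frames each
logits = model(video).log_softmax(-1)             # (B, T, vocab+1)
targets = torch.randint(1, 28, (2, 20))           # dummy character indices
input_lens = torch.full((2,), logits.size(1), dtype=torch.long)
target_lens = torch.full((2,), 20, dtype=torch.long)
loss = nn.CTCLoss(blank=0)(logits.transpose(0, 1), targets, input_lens, target_lens)
loss.backward()   # gradients flow end-to-end through GRU and convolutions
```

The single backward pass through the CTC loss updates both the convolutional frontend and the sequence model at once, which is the sense in which the abstract's model "simultaneously learns spatiotemporal visual features and a sequence model."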