Abstract:
Lip reading is a technique used to understand or interpret speech without
hearing it; it is especially useful for people who face hearing difficulties. The
ability to communicate easily with everyone is a blessing that people with hearing
impairment do not have; they depend completely on vision and still face difficulties
in lip reading. For this reason, our research is another step toward a solution to this
problem. The ability to lip read enables a person with a hearing impairment to
communicate with others and to engage in social activities, which otherwise would
be difficult. Recent advances in the fields of computer vision, pattern recognition,
and signal processing have led to a growing interest in automating this challenging
task of lip reading. Indeed, automating the human ability to lip read, a process
referred to as visual speech recognition, could open the door for other novel
applications. This report investigates various issues faced by a research-oriented
speech recognition system based on recognizing the word that is spoken.
The research targets Urdu alphabets, which are recognized through video
analysis and motion estimation: the system detects lip movements that
resemble utterances and then converts them to readable characters. The algorithm on
which we are working divides the video into n frames and
generates n-1 difference images, each produced by taking the difference between
consecutive frames. Then, video features are extracted to be used by a
function which provided recognition of approximately. The traditional approaches to
automatic lip reading are based on lip patterns (mouth shapes or appearances, or
sequences of lip dynamics required to generate a phoneme in the visual
domain). However, several problems arise in recognizing lip patterns, such as
different styles of pronunciation and gender-related differences in pronouncing
Urdu alphabets. These problems contribute to the poor performance of the traditional
approaches, so we decided not to work with the Hidden Markov model and chose visemes instead.
The proposed approach consists of two major stages: the first is the training
subsystem, which is used by an administrator (trainer); the second is the recognition
subsystem, which can be used by any user. For this purpose we collected a database (a
video database, recorded using a personal digital assistant camera, that contains a
number of video clips of utterances by 12 subjects, both male and female, under indoor/
lighting conditions).
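
The frame-differencing step described in this abstract (n frames yielding n-1 difference images) can be sketched as follows. This is a minimal illustration under stated assumptions, not the system's actual implementation: frames are assumed to be small grayscale images stored as nested lists, the difference is assumed to be the absolute per-pixel difference, and `motion_energy` is a hypothetical feature introduced only to show how a difference image might be reduced to a number.

```python
from typing import List

# Assumption: a frame is a grayscale image stored as rows of pixel intensities (0-255).
Frame = List[List[int]]


def frame_differences(frames: List[Frame]) -> List[Frame]:
    """Return the n-1 difference images between consecutive frames.

    Assumption: the per-pixel difference is taken as an absolute value.
    """
    diffs = []
    for prev, curr in zip(frames, frames[1:]):
        diff = [
            [abs(c - p) for p, c in zip(prev_row, curr_row)]
            for prev_row, curr_row in zip(prev, curr)
        ]
        diffs.append(diff)
    return diffs


def motion_energy(diff: Frame) -> int:
    """Hypothetical feature: total intensity change in one difference image."""
    return sum(sum(row) for row in diff)


# Three tiny 2x2 "frames" standing in for cropped mouth regions.
frames = [
    [[10, 10], [10, 10]],
    [[10, 12], [10, 10]],
    [[10, 20], [15, 10]],
]
diffs = frame_differences(frames)              # 3 frames give 2 difference images
features = [motion_energy(d) for d in diffs]   # one value per difference image
```

In practice the frames would be read from a video file and cropped to the lip region before differencing; both steps are outside the scope of this sketch.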