Abstract:
The tremendous growth in the amount of multimedia data, especially videos, has increased
the need for efficient indexing and retrieval techniques. In addition to the audio-visual content
itself, a powerful tool that can be employed for the indexing of videos is the caption text appearing in them.
An important component of textual content based video indexing and retrieval systems is the
detection and extraction of text from video frames. Most existing text extraction systems
target textual occurrences in a particular script or language. We propose a generic
multilingual text extraction system that relies on a combination of unsupervised and supervised
techniques. The unsupervised approach is based on the application of image analysis
techniques that exploit the contrast, alignment, and geometrical properties of text to identify
candidate text regions in an image. Potential text regions are then validated by an Artificial Neural
Network (ANN) using a set of features computed from Gray Level Co-occurrence Matrices
(GLCM). Detected text regions are then binarized to segment text from the background. The
script of the extracted text is finally identified using texture based features based on Local Binary
Patterns (LBP). The proposed system was evaluated on video images containing textual
occurrences in five different languages including English, Urdu, Hindi, Chinese and Arabic. The
promising results of the experimental evaluation validate the effectiveness of the proposed
system for text extraction and script identification.