Abstract:
Textual content in videos contains rich information that can be exploited for semantic indexing and retrieval, as well as for the development of video analytics solutions. The key modules of a text-based video retrieval system are the detection (localization) of text and its subsequent recognition; the latter is the subject of this study. More specifically, this research presents a caption text recognition system targeting Urdu text. The technique follows a holistic approach, using ligatures as the units of recognition. Data-driven feature extraction is performed using a number of pre-trained deep convolutional neural networks. The networks are employed both as fixed feature extractors and after fine-tuning on the ligature dataset under study, and they achieve high ligature recognition rates.