DSpace Repository

Detection and Recognition of Artificial Urdu Text in Videos

dc.contributor.author Ghulam Ali Mirza, 01-284151-002
dc.date.accessioned 2022-01-17T10:20:36Z
dc.date.available 2022-01-17T10:20:36Z
dc.date.issued 2021
dc.identifier.uri http://hdl.handle.net/123456789/11645
dc.description Supervised by Dr. Imran Siddiqi en_US
dc.description.abstract Textual content appearing in videos represents an interesting index for semantic retrieval of videos (from archives), generation of alerts (from live streams), and high-level applications such as opinion mining and content summarization. Key components of a text-based retrieval system include detection (localization) of text regions and recognition of text through Video Optical Character Recognition (V-OCR) systems. While mature detection and recognition systems are available for text in non-cursive scripts, research on cursive scripts (such as Urdu) is fairly limited and marked by many challenges, including complex and overlapping ligatures, context-dependent shape variations, and the presence of a large number of dots and diacritics. This research aims at the detection and recognition of artificial (caption) Urdu text appearing in video frames, primarily targeting local News channels. Leveraging recent advancements in deep neural networks (DNNs), we propose robust techniques to detect and recognize Urdu caption text from frames with bilingual (English and Urdu) textual content, the most common scenario on the majority of local News channels. Detection of textual content relies on adapting deep convolutional neural network (CNN) based object detectors for text localization. To cater for multiple scripts, text detection and script identification are combined in a single end-to-end trainable system. For recognition, we employ an implicit segmentation based analytical technique that relies on a combination of a CNN and a recurrent neural network (RNN) with a connectionist temporal classification (CTC) layer. Images of text lines extracted from video frames, along with ground-truth transcriptions, are fed to the CNN for feature extraction. The extracted feature sequences are then employed by the recurrent part of the network to predict the likely sequence of characters. Finally, the CTC layer converts the raw predictions into meaningful Urdu text. en_US
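The recognition pipeline described in the abstract (CNN feature extraction over a text-line image, a recurrent network over the resulting feature sequence, and a CTC layer mapping raw predictions to text) can be sketched in PyTorch as follows. All architectural details here — filter counts, LSTM hidden size, the alphabet size of 180 classes, and the input height of 32 pixels — are illustrative assumptions, not the thesis's actual configuration.

```python
import torch
import torch.nn as nn

class CRNN(nn.Module):
    """Sketch of a CNN + BiLSTM + CTC text-line recognizer (illustrative sizes)."""
    def __init__(self, num_classes, img_height=32):
        super().__init__()
        # CNN: turns a (1 x H x W) text-line image into a feature map.
        # The final pooling halves height only, preserving width resolution
        # so the CTC layer sees a sufficiently long time axis.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1), (2, 1)),
        )
        feat_h = img_height // 8  # feature-map height after the three pools
        # BiLSTM consumes the width axis as a time sequence.
        self.rnn = nn.LSTM(256 * feat_h, 256, bidirectional=True)
        self.fc = nn.Linear(512, num_classes)  # num_classes includes the CTC blank

    def forward(self, x):                       # x: (batch, 1, H, W)
        f = self.cnn(x)                         # (batch, C, H', W')
        b, c, h, w = f.size()
        f = f.permute(3, 0, 1, 2).reshape(w, b, c * h)  # (time, batch, features)
        out, _ = self.rnn(f)
        return self.fc(out).log_softmax(2)      # per-timestep class log-probs

# Training pairs each line image with its unsegmented transcription via CTC loss.
model = CRNN(num_classes=180)       # assumed Urdu character set size + blank
imgs = torch.randn(4, 1, 32, 128)   # batch of 4 text-line images
log_probs = model(imgs)             # (T, batch, classes); here T = 32
targets = torch.randint(1, 180, (4, 10))            # label indices (0 = blank)
input_lens = torch.full((4,), log_probs.size(0), dtype=torch.long)
target_lens = torch.full((4,), 10, dtype=torch.long)
loss = nn.CTCLoss(blank=0)(log_probs, targets, input_lens, target_lens)
```

The key design point, consistent with the abstract, is implicit segmentation: no per-character boundaries are ever supplied — the CTC layer aligns the per-timestep predictions to the ground-truth transcription during training, and at inference a greedy or beam-search decode over the blank-augmented alphabet yields the final Urdu string.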
dc.language.iso en en_US
dc.publisher Computer Sciences BUIC en_US
dc.relation.ispartofseries PHD (CS);T-1421
dc.subject Computer Science en_US
dc.subject Detection and Recognition en_US
dc.subject Artificial Urdu Text in Videos en_US
dc.title Detection and Recognition of Artificial Urdu Text in Videos en_US
dc.type PhD Thesis en_US

