DSpace Repository

Detection and Recognition of Artificial Urdu Text in Videos

dc.contributor.author Ghulam Ali Mirza, 01-284151-002
dc.date.accessioned 2022-01-17T10:20:36Z
dc.date.available 2022-01-17T10:20:36Z
dc.date.issued 2021
dc.identifier.uri http://hdl.handle.net/123456789/11645
dc.description Supervised by Dr. Imran Siddiqi en_US
dc.description.abstract Textual content appearing in videos represents an interesting index for semantic retrieval of videos (from archives), generation of alerts (from live streams), and high-level applications such as opinion mining and content summarization. Key components of a text-based retrieval system include detection (localization) of text regions and recognition of text through Video Optical Character Recognition (V-OCR) systems. While mature detection and recognition systems are available for text in non-cursive scripts, research on cursive scripts (such as Urdu) is fairly limited and marked by many challenges, including complex and overlapping ligatures, context-dependent shape variations, and the presence of a large number of dots and diacritics. This research aims at the detection and recognition of artificial (caption) Urdu text appearing in video frames, primarily targeting local News channels. Leveraging recent advancements in deep neural networks (DNNs), we propose robust techniques to detect and recognize Urdu caption text from frames with bilingual (English and Urdu) textual content, the most common scenario on the majority of local News channels. Detection of textual content relies on adapting deep convolutional neural network (CNN) based object detectors for text localization. To cater for multiple scripts, text detection and script identification are combined in a single end-to-end trainable system. For recognition, we employ an implicit segmentation based analytical technique that relies on a combination of a CNN and a recurrent neural network (RNN) with a connectionist temporal classification (CTC) layer. Images of text lines extracted from video frames, along with ground-truth transcriptions, are fed to the CNN for feature extraction. The extracted feature sequences are then employed by the recurrent part of the network to predict the likely sequence of characters. Finally, the CTC layer converts the raw predictions into meaningful Urdu text. en_US
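The recognition pipeline described in the abstract (CNN feature extraction over a text-line image, a recurrent network over the resulting feature sequence, and a CTC layer mapping raw predictions to text) can be sketched in PyTorch as follows. All architectural details here — filter counts, LSTM hidden size, the alphabet size of 180 classes, and the input height of 32 pixels — are illustrative assumptions, not the thesis's actual configuration.

```python
import torch
import torch.nn as nn

class CRNN(nn.Module):
    """Sketch of a CNN + BiLSTM + CTC text-line recognizer (illustrative sizes)."""
    def __init__(self, num_classes, img_height=32):
        super().__init__()
        # CNN: turns a (1 x H x W) text-line image into a feature map.
        # The final pooling halves height only, preserving width resolution
        # so the CTC layer sees a sufficiently long time axis.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1), (2, 1)),
        )
        feat_h = img_height // 8  # feature-map height after the three pools
        # BiLSTM consumes the width axis as a time sequence.
        self.rnn = nn.LSTM(256 * feat_h, 256, bidirectional=True)
        self.fc = nn.Linear(512, num_classes)  # num_classes includes the CTC blank

    def forward(self, x):                       # x: (batch, 1, H, W)
        f = self.cnn(x)                         # (batch, C, H', W')
        b, c, h, w = f.size()
        f = f.permute(3, 0, 1, 2).reshape(w, b, c * h)  # (time, batch, features)
        out, _ = self.rnn(f)
        return self.fc(out).log_softmax(2)      # per-timestep class log-probs

# Training pairs each line image with its unsegmented transcription via CTC loss.
model = CRNN(num_classes=180)       # assumed Urdu character set size + blank
imgs = torch.randn(4, 1, 32, 128)   # batch of 4 text-line images
log_probs = model(imgs)             # (T, batch, classes); here T = 32
targets = torch.randint(1, 180, (4, 10))            # label indices (0 = blank)
input_lens = torch.full((4,), log_probs.size(0), dtype=torch.long)
target_lens = torch.full((4,), 10, dtype=torch.long)
loss = nn.CTCLoss(blank=0)(log_probs, targets, input_lens, target_lens)
```

The key design point, consistent with the abstract, is implicit segmentation: no per-character boundaries are ever supplied — the CTC layer aligns the per-timestep predictions to the ground-truth transcription during training, and at inference a greedy or beam-search decode over the blank-augmented alphabet yields the final Urdu string.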
dc.language.iso en en_US
dc.publisher Computer Sciences BUIC en_US
dc.relation.ispartofseries PHD (CS);T-1421
dc.subject Computer Science en_US
dc.subject Detection and Recognition en_US
dc.subject Artificial Urdu Text in Videos en_US
dc.title Detection and Recognition of Artificial Urdu Text in Videos en_US
dc.type PhD Thesis en_US

