Segmentation-free optical character recognition for printed Urdu text

Welcome to DSpace BU Repository

Welcome to the Bahria University DSpace digital repository. DSpace is a digital service that collects, preserves, and distributes digital material. Repositories are important tools for preserving an organization's legacy; they facilitate digital preservation and scholarly communication.

Show simple item record

dc.contributor.author Israr Ud Din
dc.contributor.author Imran Siddiqi
dc.contributor.author Shehzad Khalid
dc.contributor.author Tahir Azam
dc.date.accessioned 2018-09-26T09:38:37Z
dc.date.available 2018-09-26T09:38:37Z
dc.date.issued 2017
dc.identifier.uri http://hdl.handle.net/123456789/7504
dc.description.abstract This paper presents a segmentation-free optical character recognition system for printed Urdu Nastaliq font using ligatures as units of recognition. The proposed technique relies on statistical features and employs Hidden Markov Models for classification. A total of 1525 unique high-frequency Urdu ligatures from the standard Urdu Printed Text Images (UPTI) database are considered in our study. Ligatures extracted from text lines are first split into primary (main body) and secondary (dots and diacritics) ligatures and multiple instances of the same ligature are grouped into clusters using a sequential clustering algorithm. Hidden Markov Models are trained separately for each ligature using the examples in the respective cluster by sliding right-to-left the overlapped windows and extracting a set of statistical features. Given the query text, the primary and secondary ligatures are separately recognized and later associated together using a set of heuristics to recognize the complete ligature. The system evaluated on the standard UPTI Urdu database reported a ligature recognition rate of 92% on more than 6000 query ligatures. en_US
dc.language.iso en en_US
dc.publisher Bahria University Islamabad Campus en_US
dc.relation.ispartofseries ;10.1186/s13640-017-0208-z
dc.subject Department of Computer Science CS en_US
dc.title Segmentation-free optical character recognition for printed Urdu text en_US
dc.type Article en_US


Files in this item

This item appears in the following Collection(s)

Show simple item record

Search DSpace


Advanced Search

Browse

My Account