Offline Optical Character Recognition for Urdu Script (T-0681) (MFN 4238)

dc.contributor.author Ayesha Rafiq, 01-244121-002
dc.date.accessioned 2017-07-20T07:21:24Z
dc.date.available 2017-07-20T07:21:24Z
dc.date.issued 2014
dc.identifier.uri http://hdl.handle.net/123456789/2859
dc.description Supervised by Dr. Shehzad Khalid en_US
dc.description.abstract Developing an OCR system for the Urdu language has been a challenging task for researchers over the last few years, primarily because of the complexity of the Urdu writing system. Unlike English, Urdu text images are difficult to interpret and process: retrieving text, sorting out diacritics, and similar operations are nearly impossible without adequate domain knowledge. In view of these limitations, the proposed work presents a segmentation-free approach based on ligature recognition for various font sizes and writing styles of Urdu. A binary image of Urdu text is first separated into individual lines. Connected component labeling is then applied to the segmented lines to extract ligatures along with their diacritics. After extraction, the diacritics are associated with their respective ligatures, and these associated ligatures are treated as the basic recognition unit. A total of 2017 clusters are used in this research; half serve as training data and the remainder as test data. Feature vectors for the data set are extracted using the Discrete Fourier Transform (DFT), and K-Nearest Neighbor is used to find the closest node to a query ligature. The proposed system handles five types of diacritics: dots in different numbers and positions, hamza (ء), toay (ط), and diacritics connected with haey (ہا) and gaaf (گ). The system was evaluated on the 70595 most commonly used ligatures of Urdu script and recognized Urdu ligatures with an accuracy of 98.6%. en_US
dc.language.iso en en_US
dc.publisher Software Engineering, Bahria University Engineering School Islamabad en_US
dc.relation.ispartofseries MS SE;T-0681
dc.subject Software Engineering en_US
dc.title Offline Optical Character Recognition for Urdu Script (T-0681) (MFN 4238) en_US
dc.type MS Thesis en_US
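
The pipeline outlined in the abstract (connected component labeling, DFT feature vectors, nearest-neighbor matching) can be illustrated with a minimal sketch. This is not the thesis implementation: the record does not say which signal the DFT is applied to, so the column-projection profile used here is an assumption, as are all function names, the 8-connectivity choice, and the parameters `n_coeffs` and `k`. Line segmentation and the association of diacritics with ligatures are omitted.

```python
import cmath
import math
from collections import Counter, deque


def label_components(img):
    """Group foreground pixels (value 1) into 8-connected components.

    `img` is a list of equal-length rows of 0/1 values. Returns a list of
    components, each a list of (row, col) pixel coordinates.
    """
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for r in range(h):
        for c in range(w):
            if not img[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            pixels = []
            while queue:  # breadth-first flood fill from the seed pixel
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


def column_profile(img):
    """Count foreground pixels per column (assumed input signal for the DFT)."""
    return [sum(col) for col in zip(*img)]


def dft_features(signal, n_coeffs=4):
    """Return magnitudes of the first `n_coeffs` DFT coefficients of `signal`."""
    n = len(signal)
    features = []
    for k in range(n_coeffs):
        coeff = sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n))
        features.append(abs(coeff))
    return features


def knn_predict(query, train, k=1):
    """Majority label among the `k` training vectors closest to `query`.

    `train` is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    ranked = sorted(train, key=lambda item: math.dist(query, item[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

In use, each labeled training ligature image would be converted with `dft_features(column_profile(image))`, and a query ligature would be classified by passing its feature vector to `knn_predict`. Coefficient magnitudes are used because they do not change when the profile is circularly shifted, which is one plausible reason to pick a DFT-based descriptor; the record itself does not state the motivation.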

