dc.contributor.author | Ayesha Rafiq, 01-244121-002 | |
dc.date.accessioned | 2017-07-20T07:21:24Z | |
dc.date.available | 2017-07-20T07:21:24Z | |
dc.date.issued | 2014 | |
dc.identifier.uri | http://hdl.handle.net/123456789/2859 | |
dc.description | Supervised by Dr. Shehzad Khalid | en_US |
dc.description.abstract | Developing an OCR system for the Urdu language has been a challenging task for researchers over the last few years. The highly complex behavior of the Urdu writing system is one of the prime reasons. Unlike English text, Urdu text images are difficult to interpret and manipulate properly. Retrieving text, sorting out diacritics, and similar operations become almost impossible without satisfactory domain knowledge of the field. In view of the limitations of existing research, the proposed work presents a segmentation-free approach using ligature-based recognition for various font sizes and different writing styles of Urdu. A binary image of Urdu text is separated into individual lines. Connected component labeling is applied to the segmented lines to extract ligatures along with their diacritics. After extraction, the diacritics are associated with their respective ligatures, and these associated ligatures are treated as the basic recognition unit. A total of 2017 clusters are used in our research; half of them serve as training data and the remainder as test data. The Discrete Fourier Transform (DFT) is used to extract feature vectors for the data set. K-Nearest Neighbor is used to find the node closest to the query ligature. Our proposed system handles five types of diacritics, i.e. different numbers and positions of dots, hamza (ء), toay (ط), diacritics connected with haey (ہا) and gaaf (گ). The proposed system was evaluated on the 70595 most commonly used ligatures of Urdu script and was found to recognize Urdu ligatures with an accuracy of 98.6%. | en_US |
dc.language.iso | en | en_US |
dc.publisher | Software Engineering, Bahria University Engineering School Islamabad | en_US |
dc.relation.ispartofseries | MS SE;T-0681 | |
dc.subject | Software Engineering | en_US |
dc.title | Offline Optical Character Recognition for Urdu Script (T-0681) (MFN 4238) | en_US |
dc.type | MS Thesis | en_US |
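
The pipeline summarized in the abstract (connected component labeling, DFT feature vectors, K-Nearest Neighbor lookup) can be illustrated with a minimal Python sketch. This is not the thesis implementation. The function names, the use of 8-connectivity, the choice of a column projection profile as the signal passed to the DFT, and the number of coefficients kept are all assumptions made for illustration; the abstract does not specify them. Line segmentation and the association of diacritics with ligatures are omitted.

```python
import cmath
from collections import Counter, deque


def label_components(image):
    """Group foreground pixels (value 1) into 8-connected components.

    Returns a list of components, each a list of (row, col) pixels.
    8-connectivity is an assumption; the abstract does not state it.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            queue, pixels = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


def column_profile(image):
    """Foreground pixel count per column (hypothetical 1-D shape signature)."""
    return [sum(row[c] for row in image) for c in range(len(image[0]))]


def dft_features(signal, n_coeffs=4):
    """Magnitudes of the first n_coeffs DFT coefficients, divided by length."""
    n = len(signal)
    features = []
    for k in range(n_coeffs):
        coeff = sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n))
        features.append(abs(coeff) / n)
    return features


def knn_predict(train, query, k=1):
    """Majority label among the k training vectors nearest to query.

    train is a list of (feature_vector, label) pairs; distance is
    squared Euclidean.
    """
    ranked = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

In such a sketch, each labeled ligature image would be reduced to a feature vector with `dft_features(column_profile(image))`, stored with its label, and a query ligature would be classified by passing its own feature vector to `knn_predict`.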