Abstract:
The idea is to develop an OCR system for the Urdu language. No such tool is available
on the market to help Urdu publishers digitize existing Urdu books for Urdu readers.
Urdu OCR is an ambitious goal that demands a lot of effort. The available research
literature helped us find our direction. As a team, we studied the available research
papers and discussed possible ways to implement the idea. After these discussions, we
decided to try every possibility. We used the ACCORD Library for image
processing. These trial implementations led us to a decent solution. We focused on
pixels and extracted the black pixels from the image. We then processed these details
for learning (acquiring new characters and combinations) and compared them
with the learned data. Provided images are first processed to resolve image quality
issues; each separate connected ligature is then segmented and stored with a unique
code. These segmented parts are learned, saved to a defined location, and used to
recognize words and symbols. After this process, the verified words are presented in an
Urdu text editor. Finally, we would like to invite the attention and help of interested
people regarding future work to make our solution more capable.
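
The segmentation step described above (extracting black pixels and separating each connected ligature under a unique code) can be illustrated with a minimal sketch. It is not the project's actual implementation, which used the ACCORD Library; it is plain Python written under several assumptions: the image is already binarized, a pixel value of 1 means black ink, a "connected ligature" is approximated by an 8-connected group of black pixels, and the unique code is simply a sequential integer. The names `segment_components` and `bounding_box` are hypothetical helpers introduced only for this illustration.

```python
from collections import deque


def segment_components(image, black=1):
    """Group black pixels into 8-connected components.

    image: list of equal-length rows; a pixel equal to `black` counts as ink
           (assumption: the image has already been binarized).
    Returns a dict mapping a sequential integer code (an illustrative stand-in
    for the "unique code") to a sorted list of (row, col) pixels.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    components = {}
    next_code = 1
    for r in range(height):
        for c in range(width):
            if image[r][c] != black or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited black pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and not seen[ny][nx]
                                and image[ny][nx] == black):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components[next_code] = sorted(pixels)
            next_code += 1
    return components


def bounding_box(pixels):
    """Return (top, left, bottom, right) for a list of (row, col) pixels."""
    rows = [p[0] for p in pixels]
    cols = [p[1] for p in pixels]
    return min(rows), min(cols), max(rows), max(cols)


# Tiny hand-made binary image containing three separate ink blobs.
sample = [
    [1, 1, 0, 0, 0],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
]
parts = segment_components(sample)
for code, pixels in parts.items():
    print(code, bounding_box(pixels))
```

In a fuller pipeline, each component's bounding box would be cropped and stored under its code so that it can later be compared with the learned data. Dots and diacritics form their own components under this simplification and would need to be re-attached to their base shapes, which this sketch does not attempt.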