Abstract:
This paper presents a technique for segmenting printed Urdu text images into lines and ligatures, a key pre-processing step in Urdu Optical Character Recognition (OCR) systems. Unlike classical projection-profile-based line segmentation methods, the proposed scheme successfully segments overlapping and touching lines. Once the lines are segmented, ligatures are extracted from each text line by associating the secondary ligatures with their respective primary ligatures. Evaluated on 30 printed Urdu documents containing 310 text lines and 7,364 ligatures, the system achieved promising results on both line and ligature segmentation.
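
To make the ligature extraction step concrete, the following is a minimal illustrative sketch of how secondary components (dots and diacritics) might be attached to primary ligature bodies within one segmented text line. It is not the method of this paper: the `Box` type, the `group_ligatures` function, the area-based primary/secondary split, and the `area_ratio` threshold are all assumptions introduced for illustration, and the association rule (largest horizontal overlap, then nearest horizontal center) is one plausible heuristic among several.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    """Bounding box of one connected component (hypothetical representation)."""
    x: int
    y: int
    w: int
    h: int

    @property
    def area(self) -> int:
        return self.w * self.h

    @property
    def cx(self) -> float:
        # Horizontal center of the box.
        return self.x + self.w / 2


def h_overlap(a: Box, b: Box) -> int:
    """Width in pixels of the horizontal overlap between two boxes (0 if none)."""
    return max(0, min(a.x + a.w, b.x + b.w) - max(a.x, b.x))


def group_ligatures(boxes: List[Box], area_ratio: float = 0.25) -> List[List[Box]]:
    """Group the components of one text line into ligatures.

    Assumption: a component is 'primary' if its area is at least
    `area_ratio` times the largest component's area; all others are
    'secondary'. Each secondary is attached to the primary with the
    largest horizontal overlap, with ties broken by the nearest
    horizontal center.
    """
    if not boxes:
        return []
    largest = max(b.area for b in boxes)
    primaries = [b for b in boxes if b.area >= area_ratio * largest]
    secondaries = [b for b in boxes if b.area < area_ratio * largest]

    # Each group starts with its primary component.
    groups = [[p] for p in primaries]
    for s in secondaries:
        best = max(
            range(len(primaries)),
            key=lambda i: (h_overlap(s, primaries[i]), -abs(s.cx - primaries[i].cx)),
        )
        groups[best].append(s)
    return groups


if __name__ == "__main__":
    line = [Box(0, 10, 40, 30), Box(10, 0, 5, 5), Box(50, 10, 40, 30), Box(60, 45, 5, 5)]
    for ligature in group_ligatures(line):
        print(ligature)
```

In practice the bounding boxes would come from connected-component labeling of a binarized text line, and a real system would likely need additional rules for secondary components that lie between two primaries.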