Classification of Demographic Traits based on Urdu Handwriting using Deep Learning

Huma Rehman, 01-134142-030; Syed Ghulam Mustufa, 01-134142-072

URI: http://hdl.handle.net/123456789/8348

Date: 2018

Abstract:

Machine learning has been used to solve classification problems for some time, and researchers have built effective models for many image classification tasks. Researchers have also shown that a person's handwriting contains information that can be used to recognize that person's demographic traits. This project presents experimental results on the demographic trait recognition problem. The demographic traits considered are gender, age, handedness, education, province and occupation. The first requirement of the project was an Urdu handwriting data set covering at least 1000 individuals, together with their demographic information. Since no available data set met these requirements, we created one from scratch, which involved several steps: sample generation, data collection, ground truth and labeling. The resulting data set contains images of Urdu handwriting and the demographic information of 1000 individuals. The image data was then pre-processed to prepare it for classification. To address the problem, we chose state-of-the-art deep learning models for image classification; the models used in the experiments included AlexNet, VGG-16, ResNet-50, ResNet-101 and ResNet-152. Transfer learning and fine-tuning are the common techniques when applying a pre-trained model to a new data set, and they usually provide better results. While applying transfer learning to our data set, we encountered problems with both resources and results, since deep learning models require a great deal of processing power, which is typically provided by high-performance GPUs. The machine-related issues were solved by using Google Cloud.
The next part was to obtain good results for the problem, which relied on trial and error. Some models gave satisfactory accuracy and some did not. Additional data processing techniques, such as patching of images and data augmentation, provided some improvement in the classification results. Another requirement of the project was an application that lets a user create a custom neural network and train and test it on any data set provided. The application was built as a Windows Forms application using the .NET Framework in C#, with a machine learning library named “SharpLearning”. It offers the user various options, such as loading training and testing data sets as CSV or image files. The user can build a network by adding the different types of layers used in a simple artificial neural network or a convolutional neural network. The user can also add, delete and edit layers, save the trained model, and load a previously saved model to get test results faster. More options are available in the application. This project is a contribution to the research area of document image analysis, and further contributions will be made in this area.
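
The abstract mentions patching of images as one of the data processing steps but does not say how it was done. The following is only a minimal sketch of one common approach, assuming non-overlapping fixed-size patches cut from a grayscale image, with each patch inheriting the writer's label. The function names `extract_patches` and `label_patches`, the patch sizes, and the nested-list image representation are illustrative assumptions, not the project's actual implementation.

```python
from typing import List, Tuple

# Assumption: a grayscale image is a list of rows, each row a list of pixel values.
Image = List[List[int]]


def extract_patches(image: Image, patch_h: int, patch_w: int) -> List[Image]:
    """Split an image into non-overlapping patches of size patch_h x patch_w.

    Partial patches at the right and bottom edges are dropped.
    """
    if patch_h <= 0 or patch_w <= 0:
        raise ValueError("patch dimensions must be positive")
    height = len(image)
    width = len(image[0]) if height else 0
    patches: List[Image] = []
    for top in range(0, height - patch_h + 1, patch_h):
        for left in range(0, width - patch_w + 1, patch_w):
            # Copy the rows and columns that fall inside this patch.
            patch = [row[left:left + patch_w] for row in image[top:top + patch_h]]
            patches.append(patch)
    return patches


def label_patches(image: Image, label: str,
                  patch_h: int, patch_w: int) -> List[Tuple[Image, str]]:
    """Pair every patch with the label of the writer who produced the page."""
    return [(patch, label) for patch in extract_patches(image, patch_h, patch_w)]


if __name__ == "__main__":
    # Hypothetical 4x6 "image" whose pixel values are just running indices.
    page = [[r * 6 + c for c in range(6)] for r in range(4)]
    samples = label_patches(page, "left-handed", patch_h=2, patch_w=3)
    print(len(samples), "patches, first:", samples[0])
```

Cutting one page into several patches increases the number of training samples per writer, which is one plausible reason such a step could help when only 1000 individuals are available.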