Abstract:
Computational analysis of ancient historical documents has been an active area of research in the pattern recognition community for many decades. Identifying documents based on the structural similarity between their features is a major challenge that limits author identification. The focus of this study is to classify the scribes of ancient manuscripts based on structural similarity in the handwriting. The examined documents were written on papyrus and are badly damaged, so identifying writer-specific features from these images is a difficult problem. In our study, the documents are binarized using a deep learning-based model. Small title blocks are then extracted from the binarized documents and fed to a Siamese neural network built on various pre-trained models fine-tuned on a user-defined dataset. In contrast to the classical recognition framework, where the model is expected to learn class labels, a Siamese network learns similarities between samples of the same class and differences between samples drawn from different classes. We formulate the writer identification task as a similarity learning problem and use a contrastive loss function. Models are trained with positive and negative pairs, and among the baseline models examined, DenseNet-121 performed best, with an overall accuracy of 75%. This result is promising given the complexity of the problem.
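To make the similarity learning formulation concrete, the following is a minimal sketch of a contrastive loss on a single pair of embeddings. It is an illustration under stated assumptions, not the exact formulation used in this study: the function names, the margin value, the Euclidean distance, and the label convention (True for a positive pair, i.e. the same writer) are all hypothetical choices, and the embeddings stand in for the outputs of the two Siamese branches.

```python
import math


def euclidean_distance(emb_a, emb_b):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(emb_a, emb_b)))


def contrastive_loss(emb_a, emb_b, same_writer, margin=1.0):
    """Contrastive loss for one pair of embeddings.

    Assumed convention: same_writer=True marks a positive pair.
    Positive pairs are penalized by their squared distance, so they are
    pulled together. Negative pairs are penalized only when they lie
    closer than `margin`, so they are pushed apart up to that margin.
    """
    d = euclidean_distance(emb_a, emb_b)
    if same_writer:
        return d ** 2
    return max(0.0, margin - d) ** 2


# Illustrative embeddings (made-up values, not data from the study).
positive_pair_loss = contrastive_loss([0.0, 0.0], [3.0, 4.0], same_writer=True)
negative_pair_loss = contrastive_loss([0.0, 0.0], [3.0, 4.0], same_writer=False)
```

In this sketch, a negative pair that is already farther apart than the margin contributes no loss, so training effort concentrates on pairs of different writers whose embeddings are still too close.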