This paper presents a character-based approach for improving writer retrieval on Greek papyri. Our contribution is the introduction of character-level annotations for frequently occurring characters in Greek texts: the trigram kai and four additional letters (epsilon, kappa, mu, and omega). We use a state-of-the-art writer retrieval approach based on NetVLAD and compare feature aggregation over these annotated characters against the current default baseline, which builds page descriptors from small patches located at SIFT keypoints. We show that using only about 15 characters per page improves performance by up to 4% mAP (an 11% relative improvement) on the GRK-120 dataset. In addition, our qualitative analysis offers insights into the similarity scores of SIFT patches and of specific characters. We publish the dataset with the character-level annotations, including a quality label, together with our binarized images to support further research.
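To make the evaluation pipeline concrete, the sketch below aggregates a page's local descriptors (for example, one descriptor per annotated character crop) into a single page descriptor and scores leave-one-out writer retrieval with mAP. It is a minimal illustration under stated assumptions, not the authors' implementation. It uses classical hard-assignment VLAD as a non-learned stand-in for the NetVLAD layer, which instead uses soft assignment and trained cluster centers. It also assumes local descriptors have already been extracted. The function names (`vlad`, `average_precision`, `mean_average_precision`), the centroids, and the toy descriptor values are all hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def _l2_normalize(v: Sequence[float]) -> Vector:
    norm = math.sqrt(_dot(v, v))
    return [x / norm for x in v] if norm > 0 else list(v)


def vlad(descriptors: Sequence[Sequence[float]],
         centroids: Sequence[Sequence[float]]) -> Vector:
    """Hard-assignment VLAD (hypothetical stand-in for NetVLAD)."""
    k, d = len(centroids), len(centroids[0])
    residuals = [[0.0] * d for _ in range(k)]
    for x in descriptors:
        # Assign each local descriptor (e.g. one character crop) to its nearest centroid.
        nearest = min(range(k),
                      key=lambda j: sum((a - b) ** 2 for a, b in zip(x, centroids[j])))
        for i in range(d):
            residuals[nearest][i] += x[i] - centroids[nearest][i]
    flat = [v for block in residuals for v in block]
    # Signed square root (power normalization) damps bursty, repeated descriptors.
    flat = [math.copysign(math.sqrt(abs(v)), v) for v in flat]
    return _l2_normalize(flat)


def average_precision(ranked_labels: Sequence[str], query_label: str) -> float:
    """AP of one ranked list: mean precision at each rank holding a relevant item."""
    hits, precisions = 0, []
    for rank, label in enumerate(ranked_labels, start=1):
        if label == query_label:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / hits if hits else 0.0


def mean_average_precision(page_vectors: Dict[str, Vector],
                           writer_of: Dict[str, str]) -> float:
    """Leave-one-out retrieval: each page queries all other pages."""
    aps = []
    for query, qv in page_vectors.items():
        others = [p for p in page_vectors if p != query]
        if not any(writer_of[p] == writer_of[query] for p in others):
            continue  # no other page by this writer, so the query cannot be scored
        # Descriptors are L2-normalized, so the dot product is cosine similarity.
        others.sort(key=lambda p: -_dot(qv, page_vectors[p]))
        aps.append(average_precision([writer_of[p] for p in others], writer_of[query]))
    return sum(aps) / len(aps) if aps else 0.0


if __name__ == "__main__":
    # Toy 2-D "character descriptors" for four pages by two writers (made-up values).
    centroids = [[0.0, 0.0], [1.0, 1.0]]
    pages = {
        "p1": [[0.1, 0.1], [1.1, 1.1]],
        "p2": [[0.2, 0.1], [1.1, 1.2]],
        "p3": [[-0.1, -0.1], [0.9, 0.9]],
        "p4": [[-0.2, -0.1], [0.9, 0.8]],
    }
    writers = {"p1": "A", "p2": "A", "p3": "B", "p4": "B"}
    vectors = {p: vlad(d, centroids) for p, d in pages.items()}
    print(f"mAP = {mean_average_precision(vectors, writers):.3f}")  # prints: mAP = 1.000
```

Swapping the input from SIFT-keypoint patches to character crops changes only which descriptors are passed to `vlad`. The aggregation and the mAP evaluation stay the same, which is what makes the two settings directly comparable.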