Historical manuscript processing poses challenges such as limited annotated training data and the emergence of novel classes. To address these, we propose a One-shot learning-based Text Spotting (OTS) approach that accurately and reliably spots novel characters with just one annotated support sample. Drawing inspiration from cognitive research, we introduce a spatial alignment module that finds, focuses on, and learns the most discriminative spatial regions in the query image based on one support image. In particular, because low-resource spotting tasks often suffer from example imbalance, we propose a novel loss function, called torus loss, that makes the embedding space of the distance metric more discriminative. Our approach is highly efficient, requiring only a few training samples, and exhibits a remarkable ability to handle novel characters and symbols. To enhance dataset diversity, we create a new manuscript dataset that contains ancient Dongba hieroglyphics (DBH). We conduct experiments on the publicly available VML-HD, TKH, and NC datasets and on the newly proposed DBH dataset. The experimental results demonstrate that OTS outperforms state-of-the-art methods in one-shot text spotting. Overall, the proposed method offers promising applications for text spotting in historical manuscripts.