OTS: A One-shot Learning Approach for Text Spotting in Historical Manuscripts

Processing historical manuscripts poses challenges such as scarce annotated training data and the emergence of novel classes. To address these, we propose a One-shot learning-based Text Spotting (OTS) approach that accurately and reliably spots novel characters from a single annotated support sample. Drawing inspiration from cognitive research, we introduce a spatial alignment module that finds, focuses on, and learns the most discriminative spatial regions in the query image based on one support image. Because low-resource spotting tasks often suffer from example imbalance, we also propose a novel loss function, torus loss, which makes the embedding space of the distance metric more discriminative. Our approach is highly efficient, requires only a few training samples, and handles novel characters and symbols. To enhance dataset diversity, we create a new manuscript dataset of ancient Dongba hieroglyphics (DBH). We conduct experiments on the publicly available VML-HD, TKH, and NC datasets and on the newly proposed DBH dataset. The experimental results demonstrate that OTS outperforms state-of-the-art methods in one-shot text spotting. Overall, the proposed method shows promise for text spotting in historical manuscripts.
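To make the one-shot setting concrete, the following minimal sketch spots occurrences of a single support sample in a query image by comparing embeddings with a distance metric. It is an illustration under stated assumptions, not the OTS method: the `embed` function is a hypothetical stand-in that simply flattens pixels where a trained network would be used, the names `spot`, `cosine`, and `threshold` are ours, and neither the spatial alignment module nor the torus loss is reproduced here.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def embed(patch):
    """Hypothetical embedding: flatten the patch row by row.

    Assumption for illustration only; a learned feature extractor would go here.
    """
    return [float(v) for row in patch for v in row]


def spot(query, support, threshold=0.95):
    """Slide a support-sized window over the query image.

    Returns a list of (row, col, score) for every window whose embedding
    has cosine similarity >= threshold with the support embedding.
    """
    h, w = len(support), len(support[0])
    target = embed(support)
    hits = []
    for r in range(len(query) - h + 1):
        for c in range(len(query[0]) - w + 1):
            window = [row[c:c + w] for row in query[r:r + h]]
            score = cosine(embed(window), target)
            if score >= threshold:
                hits.append((r, c, score))
    return hits


# One annotated support sample (a tiny 2x2 "character").
support = [[1, 0],
           [0, 1]]

# Query "page" containing one occurrence of that character.
query = [[0, 0, 0, 0],
         [0, 1, 0, 0],
         [0, 0, 1, 0],
         [0, 0, 0, 0]]

hits = spot(query, support)
```

In this toy example the only window that clears the threshold is the one whose top-left corner is at row 1, column 1. A real system would replace the pixel-flattening `embed` with learned features, which is where a loss that shapes the embedding space would matter.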

