Historical manuscript processing poses challenges such as limited annotated training data and the emergence of novel classes. To address these challenges, we propose a novel One-shot learning-based Text Spotting (OTS) approach that accurately and reliably spots novel characters with just one annotated support sample. Drawing inspiration from cognitive research, we introduce a spatial alignment module that finds, focuses on, and learns the most discriminative spatial regions in the query image based on one support image. In particular, because the low-resource spotting task often suffers from example imbalance, we propose a novel loss function, called torus loss, which makes the embedding space of the distance metric more discriminative. Our approach is highly efficient, requires only a few training samples, and handles novel characters and symbols well. To enhance dataset diversity, we create a new manuscript dataset that contains ancient Dongba hieroglyphics (DBH). We conduct experiments on the publicly available VML-HD, TKH, and NC datasets and on the newly proposed DBH dataset. The experimental results demonstrate that OTS outperforms state-of-the-art methods in one-shot text spotting. Overall, the proposed method shows promise for text spotting in historical manuscripts.
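To make the one-shot spotting setting concrete, the following minimal sketch compares a single support image against every window of a query image in an embedding space and keeps the windows whose similarity exceeds a threshold. It is an illustration only, not the OTS model: the `embed` function is a hypothetical stand-in (mean-centred, L2-normalised pixels) for a learned network, and the names `spot` and `threshold` are assumptions introduced here. The spatial alignment module and the torus loss are not reproduced.

```python
import numpy as np

def embed(patch):
    """Hypothetical stand-in embedding: flatten, mean-centre, L2-normalise.
    A learned embedding network would take the place of this function."""
    v = patch.astype(np.float64).ravel()
    v = v - v.mean()
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

def spot(query, support, threshold=0.9):
    """Slide a support-sized window over the query and return (row, col, score)
    for each window whose cosine similarity to the support is >= threshold."""
    h, w = support.shape
    s = embed(support)
    hits = []
    for r in range(query.shape[0] - h + 1):
        for c in range(query.shape[1] - w + 1):
            score = float(embed(query[r:r + h, c:c + w]) @ s)
            if score >= threshold:
                hits.append((r, c, score))
    return hits

# One annotated support sample and a query page containing one instance of it.
support = np.array([[1, 0, 1],
                    [0, 1, 0],
                    [1, 0, 1]])
query = np.zeros((8, 8))
query[2:5, 3:6] = support
hits = spot(query, support)
```

In this toy setting the only window that passes the threshold is the one where the support pattern was placed. A practical system would replace the exhaustive window scan and the pixel-based embedding with learned components.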