We present a generative, document-specific approach to character analysis and recognition in text lines. Our main idea is to build on unsupervised multi-object segmentation methods, in particular those that reconstruct images from a limited number of visual elements, called sprites. Taking as input a set of text lines with similar font or handwriting, our approach can learn a large number of different characters and leverage line-level annotations when available. Our contribution is twofold. First, we provide the first adaptation and evaluation of a deep unsupervised multi-object segmentation approach for text line analysis. Since these methods have mainly been evaluated on synthetic data in a completely unsupervised setting, demonstrating that they can be adapted to and quantitatively evaluated on real images of text, and that they can be trained using weak supervision, is significant progress. Second, we show the potential of our method for new applications, more specifically in paleography, which studies the history and variations of handwriting, and in cipher analysis. We demonstrate our approach on three very different datasets: a printed volume of the Google1000 dataset, the Copiale cipher, and historical handwritten charters from the 12th and early 13th centuries.