We present a generative, document-specific approach to character analysis and recognition in text lines. Our main idea is to build on unsupervised multi-object segmentation methods, in particular those that reconstruct images from a limited number of visual elements, called sprites. Our approach can learn a large number of different characters and leverage line-level annotations when available. Our contribution is twofold. First, we provide the first adaptation and evaluation of a deep unsupervised multi-object segmentation approach for text line analysis. Since these methods have mainly been evaluated on synthetic data in a completely unsupervised setting, demonstrating that they can be adapted to and quantitatively evaluated on real text images, and that they can be trained using weak supervision, represents significant progress. Second, we demonstrate the potential of our method for new applications, specifically in paleography, which studies the history and variations of handwriting, and in cipher analysis. We evaluate our approach on three very different datasets: a printed volume of the Google1000 dataset, the Copiale cipher, and historical handwritten charters from the 12th and early 13th centuries.