We present a generative, document-specific approach to character analysis and recognition in text lines. Our main idea is to build on unsupervised multi-object segmentation methods, in particular those that reconstruct images from a limited number of visual elements, called sprites. Our approach can learn a large number of different characters and leverage line-level annotations when available. Our contribution is twofold. First, we provide the first adaptation and evaluation of a deep unsupervised multi-object segmentation approach for text line analysis. Since these methods have mainly been evaluated on synthetic data in a completely unsupervised setting, demonstrating that they can be adapted to and quantitatively evaluated on real text images, and that they can be trained with weak supervision, is significant progress. Second, we demonstrate the potential of our method for new applications, specifically in paleography, which studies the history and variations of handwriting, and in cipher analysis. We evaluate our approach on three very different datasets: a printed volume of the Google1000 dataset, the Copiale cipher, and historical handwritten charters from the 12th and early 13th centuries.