Is it an i or an l: Test-time Adaptation of Text Line Recognition Models

Recognizing text lines from images is a challenging problem, especially for handwritten documents, because writing styles vary widely. Although text line recognition models are generally trained on large corpora of real and synthetic data, they can still make frequent mistakes when the handwriting is hard to read or when image acquisition introduces corruptions such as noise, blur, or compression artifacts. An individual's writing style is generally quite consistent, and this consistency can be leveraged to correct such mistakes. Motivated by this, we introduce the problem of adapting text line recognition models at test time. We focus on a challenging and realistic setting: given only a single test image consisting of multiple text lines, and no labels, the task is to adapt the model so that it performs better on that image. We propose an iterative self-training approach that uses feedback from the language model to update the optical model, using confident self-labels in each iteration. The confidence measure is based on an augmentation mechanism that evaluates the divergence of the model's predictions in a local region. We rigorously evaluate our method on several benchmark datasets as well as their corrupted versions. Experimental results on multiple datasets spanning multiple scripts show that the proposed adaptation method offers an absolute improvement of up to 8% in character error rate with just a few iterations of self-training at test time.
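The self-training loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the divergence measure, the threshold, or any interfaces, so this sketch assumes a mean normalized edit distance between the prediction on the original line and the predictions on augmented copies. The callables `recognize`, `lm_correct`, and `fine_tune`, and the parameters `threshold` and `iterations`, are hypothetical placeholders.

```python
from typing import Any, Callable, List, Sequence, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def divergence(reference: str, augmented_preds: Sequence[str]) -> float:
    """Assumed confidence proxy: mean normalized edit distance between the
    prediction on the original line and predictions on augmented copies."""
    if not augmented_preds:
        return 0.0
    total = 0.0
    for pred in augmented_preds:
        total += edit_distance(reference, pred) / max(len(reference), len(pred), 1)
    return total / len(augmented_preds)


def select_confident(
    lines: Sequence[Any],
    recognize: Callable[[Any], str],
    augmentations: Sequence[Callable[[Any], Any]],
    lm_correct: Callable[[str], str],
    threshold: float = 0.1,
) -> List[Tuple[Any, str]]:
    """Keep lines whose predictions are stable under augmentation and pair
    each with a language-model-corrected self-label."""
    selected = []
    for line in lines:
        pred = recognize(line)
        aug_preds = [recognize(aug(line)) for aug in augmentations]
        if divergence(pred, aug_preds) <= threshold:
            selected.append((line, lm_correct(pred)))
    return selected


def adapt(
    lines: Sequence[Any],
    recognize: Callable[[Any], str],
    augmentations: Sequence[Callable[[Any], Any]],
    lm_correct: Callable[[str], str],
    fine_tune: Callable[[Callable[[Any], str], List[Tuple[Any, str]]], Callable[[Any], str]],
    iterations: int = 3,
    threshold: float = 0.1,
) -> Callable[[Any], str]:
    """Run a few self-training rounds on the lines of one test image.
    `fine_tune` (hypothetical) returns an updated recognizer."""
    for _ in range(iterations):
        pseudo_labels = select_confident(
            lines, recognize, augmentations, lm_correct, threshold
        )
        if not pseudo_labels:
            break  # nothing confident enough to learn from
        recognize = fine_tune(recognize, pseudo_labels)
    return recognize
```

In this sketch a line contributes a self-label only when small perturbations of its image leave the prediction nearly unchanged, which is one plausible reading of "divergence of the prediction in a local region". In practice `fine_tune` would take a gradient step on the optical model, and the threshold would need tuning per dataset.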

