PyLaia is one of the most popular open-source software libraries for Automatic Text Recognition (ATR), delivering strong performance in terms of both speed and accuracy. In this paper, we outline our recent contributions to the PyLaia library, focusing on the incorporation of reliable confidence scores and the integration of statistical language modeling during decoding. Our implementation provides an easy way to combine PyLaia with n-gram language models at different levels. One of the highlights of this work is that the language models are completely auto-tuned: they can be built and used easily without any expert knowledge and without requiring any additional data. To demonstrate the significance of our contribution, we evaluate PyLaia's performance on twelve datasets, both with and without language modeling. The results show that decoding with small language models improves the Word Error Rate by 13% and the Character Error Rate by 12% on average. Additionally, we analyze the confidence scores and highlight the importance of calibration techniques. Our implementation is publicly available in the official PyLaia repository at https://gitlab.teklia.com/atr/pylaia, and twelve open-source models are released on Hugging Face.