Real-world documents may suffer various forms of degradation, often resulting in lower accuracy in optical character recognition (OCR) systems. A preprocessing step that removes noise while preserving text and key document features is therefore essential. In this paper, we propose NAF-DPM, a novel generative framework based on a diffusion probabilistic model (DPM) designed to restore the original quality of degraded documents. While DPMs are recognized for their high-quality generated images, they are also known for their long inference time. To mitigate this problem, we provide the DPM with an efficient nonlinear activation-free (NAF) network and employ as a sampler a fast ordinary differential equation solver, which can converge in a few iterations. To better preserve text characters, we introduce an additional differentiable module based on convolutional recurrent neural networks, which simulates the behavior of an OCR system during training. Experiments conducted on various datasets show the superiority of our approach, which achieves state-of-the-art performance in terms of pixel-level and perceptual similarity metrics. Furthermore, the results demonstrate a notable reduction in the character errors made by OCR systems when transcribing real-world document images enhanced by our framework. Code and pre-trained models are available at https://github.com/ispamm/NAF-DPM.
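As an illustration of the character error measure referred to above, the following is a minimal sketch assuming the common definition of character error rate (CER): the Levenshtein edit distance between the OCR transcription and the reference text, divided by the reference length. The function names `levenshtein` and `character_error_rate` are hypothetical and are not taken from the NAF-DPM repository; the exact evaluation protocol used in the experiments may differ.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn `ref` into `hyp`."""
    # prev[j] holds the distance between the processed prefix of ref
    # and the first j characters of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty string
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from ref
                curr[j - 1] + 1,     # insert h into ref
                prev[j - 1] + cost,  # substitute r with h (or match)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """CER under the assumed definition: edit distance / reference length."""
    if not ref:
        raise ValueError("reference text must be non-empty")
    return levenshtein(ref, hyp) / len(ref)
```

Under this definition, the effect of enhancement on an OCR system could be gauged by comparing `character_error_rate(reference, ocr_on_degraded)` with `character_error_rate(reference, ocr_on_enhanced)`, where both transcriptions come from the same OCR engine.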