NAF-DPM: A Nonlinear Activation-Free Diffusion Probabilistic Model for Document Enhancement

Real-world documents may suffer various forms of degradation, often resulting in lower accuracy in optical character recognition (OCR) systems. Therefore, a crucial preprocessing step is essential to eliminate noise while preserving text and key features of documents. In this paper, we propose NAF-DPM, a novel generative framework based on a diffusion probabilistic model (DPM) designed to restore the original quality of degraded documents. While DPMs are recognized for their high-quality generated images, they are also known for their large inference time. To mitigate this problem we provide the DPM with an efficient nonlinear activation-free (NAF) network and we employ as a sampler a fast solver of ordinary differential equations, which can converge in a few iterations. To better preserve text characters, we introduce an additional differentiable module based on convolutional recurrent neural networks, simulating the behavior of an OCR system during training. Experiments conducted on various datasets showcase the superiority of our approach, achieving state-of-the-art performance in terms of pixel-level and perceptual similarity metrics. Furthermore, the results demonstrate a notable character error reduction made by OCR systems when transcribing real-world document images enhanced by our framework. Code and pre-trained models are available at https://github.com/ispamm/NAF-DPM.

翻译：现实世界中的文档可能遭受多种形式的退化，常导致光学字符识别（OCR）系统的准确率降低。因此，在消除噪声的同时保留文本及文档关键特征的预处理步骤至关重要。本文提出NAF-DPM，一种基于扩散概率模型（DPM）的新型生成框架，旨在恢复退化文档的原始质量。尽管DPM以其生成的高质量图像而著称，但其推理时间过长的问题同样广为人知。为解决该问题，我们为DPM配备高效的无非线性激活（NAF）网络，并采用快速常微分方程求解器作为采样器，使其能在少量迭代内收敛。为更好地保留文本字符，我们引入基于卷积循环神经网络的额外可微模块，在训练过程中模拟OCR系统的行为。在多个数据集上进行的实验展示了我们方法的优越性，在像素级和感知相似性指标上均达到最优性能。此外，实验结果也表明，经我们框架增强的真实世界文档图像转录后，OCR系统的字符错误率显著降低。代码与预训练模型已开源至https://github.com/ispamm/NAF-DPM。

相关内容

MoDELS

关注 45

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

【CVPR 2022】一个完全无监督的框架，从噪声和部分测量中学习图像，Robust Equivariant Imaging: a fully unsupervised framework for learning to image

专知会员服务

25+阅读 · 2022年3月3日

生成性对抗网络:理论模型、评估指标和最近发展的概述，Generative Adversarial Networks (GANs): An Overview of Theoretical Model, Evaluation Metrics, and Recent Developments

专知会员服务

42+阅读 · 2020年5月30日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

34+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日