DECDM: Document Enhancement using Cycle-Consistent Diffusion Models

The performance of optical character recognition (OCR) heavily relies on document image quality, which is crucial for automatic document processing and document intelligence. However, most existing document enhancement methods require supervised data pairs, which raises concerns about data separation and privacy protection, and makes it challenging to adapt these methods to new domain pairs. To address these issues, we propose DECDM, an end-to-end document-level image translation method inspired by recent advances in diffusion models. Our method overcomes the limitations of paired training by independently training the source (noisy input) and target (clean output) models, making it possible to apply domain-specific diffusion models to other pairs. DECDM trains on one dataset at a time, eliminating the need to scan both datasets concurrently, and effectively preserving data privacy from the source or target domain. We also introduce simple data augmentation strategies to improve character-glyph conservation during translation. We compare DECDM with state-of-the-art methods on multiple synthetic data and benchmark datasets, such as document denoising and {\color{black}shadow} removal, and demonstrate the superiority of performance quantitatively and qualitatively.

翻译：光学字符识别（OCR）的性能严重依赖于文档图像质量，而这对于自动文档处理和文档智能至关重要。然而，现有的大多数文档增强方法需要监督数据对，这引发了关于数据分离和隐私保护的担忧，并使得这些方法难以适应新的领域对。为了解决这些问题，我们受扩散模型最新进展的启发，提出了DECDM，一种端到端的文档级图像翻译方法。我们的方法通过独立训练源（噪声输入）和目标（干净输出）模型，克服了配对训练的限制，从而能够将特定领域的扩散模型应用于其他配对。DECDM每次仅在一个数据集上训练，无需同时扫描两个数据集，并有效保护了源域或目标域的数据隐私。我们还引入了简单的数据增强策略，以改善翻译过程中字符字形的保持。我们在多个合成数据和基准数据集（例如文档去噪和阴影去除）上将DECDM与最先进的方法进行了比较，并在定量和定性上展示了其性能的优越性。

相关内容

MoDELS

关注 45

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

【CVPR 2022】一个完全无监督的框架，从噪声和部分测量中学习图像，Robust Equivariant Imaging: a fully unsupervised framework for learning to image

专知会员服务

25+阅读 · 2022年3月3日

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

34+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日