The performance of optical character recognition (OCR) relies heavily on document image quality, which is crucial for automatic document processing and document intelligence. However, most existing document enhancement methods require supervised data pairs, which raises concerns about data separation and privacy protection and makes these methods difficult to adapt to new domain pairs. To address these issues, we propose DECDM, an end-to-end document-level image translation method inspired by recent advances in diffusion models. Our method overcomes the limitations of paired training by training the source (noisy input) and target (clean output) models independently, so that domain-specific diffusion models can be applied to other pairs. DECDM trains on one dataset at a time, which eliminates the need to scan both datasets concurrently and effectively preserves the data privacy of the source or target domain. We also introduce simple data augmentation strategies to improve character-glyph conservation during translation. We compare DECDM with state-of-the-art methods on multiple synthetic and benchmark datasets, covering tasks such as document denoising and shadow removal, and demonstrate superior performance both quantitatively and qualitatively.