Document image restoration is a crucial component of Document AI systems, because the quality of document images significantly influences overall system performance. Prevailing methods address each restoration task independently, which leads to intricate systems and an inability to exploit the potential synergies of multi-task learning. To overcome this limitation, we propose DocRes, a generalist model that unifies five document image restoration tasks: dewarping, deshadowing, appearance enhancement, deblurring, and binarization. To instruct DocRes to perform the different restoration tasks, we propose a novel visual prompt approach called Dynamic Task-Specific Prompt (DTSPrompt). The DTSPrompt for each task comprises distinct prior features, which are additional characteristics extracted from the input image. Beyond serving as a cue for task-specific execution, DTSPrompt also provides supplementary information that enhances the model's performance. Moreover, DTSPrompt is more flexible than prior visual prompt approaches, as it can be seamlessly applied and adapted to inputs with high and variable resolutions. Experimental results demonstrate that DocRes achieves competitive or superior performance compared to existing state-of-the-art task-specific models. This underscores the potential of DocRes across a broader spectrum of document image restoration tasks. The source code is publicly available at https://github.com/ZZZHANG-jx/DocRes
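To make the idea of a prompt built from prior features of the input image more concrete, the following is a minimal sketch and not the DocRes implementation. The specific priors (`gradient_prior`, `threshold_prior`), the `PRIORS` task mapping, the three-channel prompt layout, and the channel-wise concatenation with the input are all assumptions chosen for illustration; the abstract states only that each task's prompt consists of distinct prior features extracted from the input. The sketch shows one property the abstract does claim: because the prompt is computed from the input itself, it matches the input's resolution, whatever that resolution is.

```python
import numpy as np


def gradient_prior(gray):
    """Gradient magnitude of a grayscale image, scaled to [0, 1]."""
    gy, gx = np.gradient(gray.astype(np.float32))
    mag = np.hypot(gx, gy)
    peak = mag.max()
    return mag / peak if peak > 0 else mag


def threshold_prior(gray):
    """Global mean-threshold map (a simple stand-in for a real binarization prior)."""
    return (gray > gray.mean()).astype(np.float32)


def intensity_prior(gray):
    """The grayscale image itself, as float32."""
    return gray.astype(np.float32)


# Hypothetical mapping from task name to three prior extractors.
# The real model's priors per task are not specified in the abstract.
PRIORS = {
    "binarization": (threshold_prior, gradient_prior, intensity_prior),
    "deblurring": (gradient_prior, gradient_prior, gradient_prior),
}


def build_prompt(image, task):
    """Build an H x W x 3 prompt from an H x W x 3 image with values in [0, 1]."""
    gray = image.mean(axis=2)
    return np.stack([extract(gray) for extract in PRIORS[task]], axis=2)


def model_input(image, task):
    """Concatenate image and prompt along channels (assumed fusion scheme)."""
    prompt = build_prompt(image, task)
    return np.concatenate([image.astype(np.float32), prompt], axis=2)
```

Under these assumptions, the same restoration network would receive a different prompt for each task, and calling `model_input` on a 48×64 image or a 100×30 image yields an input of matching spatial size with six channels in both cases.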