Models pre-trained on large-scale data, such as CLIP and Stable Diffusion, have demonstrated remarkable performance on various high-level computer vision tasks, such as image understanding and image generation from language descriptions. Yet their potential for low-level tasks such as image restoration remains relatively unexplored. In this paper, we explore how such models can be used to enhance image restoration. Because off-the-shelf features (OSF) from pre-trained models do not directly serve image restoration, we propose to learn an additional lightweight module, the Pre-Train-Guided Refinement Module (PTG-RM), which uses OSF to refine the restoration results of a target restoration network. PTG-RM consists of two components: Pre-Train-Guided Spatial-Varying Enhancement (PTG-SVE) and Pre-Train-Guided Channel-Spatial Attention (PTG-CSA). PTG-SVE enables optimal short- and long-range neural operations, while PTG-CSA enhances spatial-channel attention for restoration-related learning. Extensive experiments demonstrate that PTG-RM, with its compact size (fewer than 1M parameters), effectively enhances the restoration performance of various models across different tasks, including low-light enhancement, deraining, deblurring, and denoising.
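The abstract describes PTG-RM only at a high level. As a rough illustration of the general idea of letting off-the-shelf features steer channel and spatial attention over a restoration network's features, the NumPy sketch below computes a channel gate and a per-pixel spatial gate from OSF, applies them residually to the target network's features, and adds a projected residual to the restored image. All names (`osf_guided_channel_spatial_attention`, `refine_restoration`, `w_channel`, `w_spatial`, `w_out`), the tensor shapes, and the assumption that OSF are already resized to the feature resolution are hypothetical. This is a generic attention sketch, not the paper's PTG-CSA or PTG-SVE design.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def osf_guided_channel_spatial_attention(feat, osf, w_channel, w_spatial):
    """Reweight restoration features `feat` using off-the-shelf features `osf`.

    Hypothetical sketch (not the paper's PTG-CSA):
      feat:      (C, H, W)  features from the target restoration network
      osf:       (Cg, H, W) pre-trained features, assumed already resized to H x W
      w_channel: (C, Cg)    assumed learned projection for the channel gate
      w_spatial: (Cg,)      assumed learned projection for the spatial gate
    """
    # Channel gate: global-average-pool the OSF, project to C channels, squash to (0, 1).
    pooled = osf.mean(axis=(1, 2))                              # (Cg,)
    channel_w = sigmoid(w_channel @ pooled)                     # (C,)
    # Spatial gate: collapse OSF channels into one (0, 1) weight per pixel.
    spatial_w = sigmoid(np.tensordot(w_spatial, osf, axes=1))   # (H, W)
    # Residual reweighting: keep the original features and add the gated copy.
    return feat + feat * channel_w[:, None, None] * spatial_w[None, :, :]


def refine_restoration(restored, feat, osf, w_channel, w_spatial, w_out):
    """Add an OSF-guided residual to the output of a (frozen) target network.

    restored: (3, H, W) image produced by the target network
    w_out:    (3, C)    assumed 1x1 projection from refined features to RGB
    """
    refined = osf_guided_channel_spatial_attention(feat, osf, w_channel, w_spatial)
    residual = np.tensordot(w_out, refined, axes=1)             # (3, H, W)
    return restored + residual


# Toy usage with random tensors standing in for real network outputs.
rng = np.random.default_rng(0)
C, Cg, H, W = 8, 16, 4, 4
restored = rng.random((3, H, W))
feat = rng.standard_normal((C, H, W))
osf = rng.standard_normal((Cg, H, W))
w_channel = rng.standard_normal((C, Cg)) * 0.1
w_spatial = rng.standard_normal(Cg) * 0.1
w_out = np.zeros((3, C))  # zero-initialized output head

out = refine_restoration(restored, feat, osf, w_channel, w_spatial, w_out)
print(out.shape)  # (3, 4, 4)
```

Zero-initializing the output projection `w_out` makes the module start as an identity mapping on the target network's output, a common choice for residual add-on modules so that training can only move away from the baseline result gradually; the abstract does not say whether PTG-RM is initialized this way.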