Despite substantial progress, all-in-one image restoration (IR) still struggles with intricate real-world degradations. This paper introduces MPerceiver, a novel multimodal prompt learning approach that harnesses Stable Diffusion (SD) priors to enhance adaptiveness, generalizability, and fidelity for all-in-one image restoration. Specifically, we develop a dual-branch module that learns two types of SD prompts: textual prompts for holistic representation and visual prompts for multiscale detail representation. Both prompts are dynamically adjusted by degradation predictions from the CLIP image encoder, enabling adaptive responses to diverse unknown degradations. Moreover, a plug-in detail refinement module improves restoration fidelity via direct encoder-to-decoder information transformation. To assess our method, we train MPerceiver on 9 tasks for all-in-one IR, where it outperforms state-of-the-art task-specific methods on most tasks. After multitask pre-training, MPerceiver attains a generalized representation in low-level vision and exhibits remarkable zero-shot and few-shot capabilities on unseen tasks. Extensive experiments on 16 IR tasks and 26 benchmarks underscore the superiority of MPerceiver in terms of adaptiveness, generalizability, and fidelity.
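The idea of prompts that are "dynamically adjusted by degradation predictions" can be illustrated with a toy sketch. The abstract does not specify the mechanism, so everything below is an assumption made for illustration: a softmax over per-degradation scores is used to weight a small pool of prompt vectors, and the names `softmax`, `mix_prompts`, `degradation_logits`, and `prompt_pool` are hypothetical. This is not the MPerceiver implementation, and no CLIP or Stable Diffusion API is called.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mix_prompts(degradation_logits, prompt_pool):
    """Blend a pool of prompt vectors using predicted degradation probabilities.

    degradation_logits: one score per degradation type (assumed here to come
        from a classifier head on image features; hypothetical).
    prompt_pool: one prompt vector per degradation type, all of equal length
        (assumed learnable in a real system; fixed numbers here).
    Returns the probabilities and the probability-weighted prompt vector.
    """
    weights = softmax(degradation_logits)
    dim = len(prompt_pool[0])
    mixed = [
        sum(w * prompt[i] for w, prompt in zip(weights, prompt_pool))
        for i in range(dim)
    ]
    return weights, mixed


# Toy example: three degradation types, two-dimensional prompts.
logits = [2.0, 0.0, 0.0]  # the first degradation type scores highest
pool = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
weights, mixed = mix_prompts(logits, pool)
print(weights, mixed)
```

In this sketch an input with a confident degradation prediction yields a prompt close to the matching pool entry, while an ambiguous prediction yields a blend of several entries. That is one simple way to obtain an adaptive response to an unknown degradation, under the assumptions stated above.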