Pre-trained diffusion models have enabled significant advancements in All-in-One Restoration (AiOR), offering improved perceptual quality and generalization. However, diffusion-based restoration methods primarily rely on fine-tuning or ControlNet-style modules to leverage the pre-trained diffusion model's priors for AiOR. In this work, we show that these pre-trained diffusion models inherently possess restoration behavior, which can be unlocked by directly learning prompt embeddings at the output of the text encoder. Interestingly, this behavior is largely inaccessible through text prompts or text-token embedding optimization. Furthermore, we observe that naive prompt learning is unstable because the forward noising process, when applied to degraded images, is misaligned with the reverse sampling trajectory. To resolve this, we train prompts within a diffusion bridge formulation that aligns training and inference dynamics, enforcing a coherent denoising path from noisy degraded states to clean images. Building on these insights, we apply lightweight learned prompts to the pre-trained WAN video model and FLUX image models, converting them into high-performing restoration models. Extensive experiments demonstrate that our approach achieves competitive performance and generalization across diverse degradations, while avoiding fine-tuning and restoration-specific control modules.
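To make the idea of a path "from noisy degraded states to clean images" concrete, the following is a minimal toy sketch and not the formulation used in this work. It assumes a straight-line bridge between a clean signal `x0` at t = 0 and a noisy degraded endpoint `y + sigma * eps` at t = 1. The function names, the linear interpolation, and the oracle velocity are all illustrative assumptions. In the actual setting, a frozen pre-trained model conditioned on the learned prompt embeddings would take the place of the oracle.

```python
import random

def bridge_state(x0, y, t, sigma, eps):
    """Assumed straight-line bridge: clean x0 at t=0, noisy degraded y + sigma*eps at t=1."""
    return [(1.0 - t) * a + t * (b + sigma * e) for a, b, e in zip(x0, y, eps)]

def bridge_velocity(x0, y, sigma, eps):
    """Time derivative of the assumed bridge path; constant because the path is linear in t."""
    return [(b + sigma * e) - a for a, b, e in zip(x0, y, eps)]

def euler_reverse(x1, velocity_fn, steps):
    """Integrate from t=1 down to t=0 with Euler steps x <- x - dt * v(x, t)."""
    x = list(x1)
    dt = 1.0 / steps
    for i in range(steps):
        t = 1.0 - i * dt
        v = velocity_fn(x, t)
        x = [xi - dt * vi for xi, vi in zip(x, v)]
    return x

# Toy 1-D "image": clean signal and a hypothetical degraded observation of it.
rng = random.Random(0)
x0 = [0.2, 0.8, 0.5, 0.1]
y = [0.5 * v + 0.1 for v in x0]      # illustrative degradation, not from the paper
sigma = 0.3
eps = [rng.gauss(0.0, 1.0) for _ in x0]

# Training-time state at t=1 equals the state inference starts from,
# so training and sampling follow the same trajectory.
x1 = bridge_state(x0, y, 1.0, sigma, eps)

# Oracle velocity stands in for a frozen model conditioned on learned prompts.
oracle = lambda x, t: bridge_velocity(x0, y, sigma, eps)
restored = euler_reverse(x1, oracle, steps=10)
```

With the oracle velocity, the reverse integration starting from the noisy degraded endpoint returns to `x0` up to floating-point error, which is the alignment property the bridge is meant to provide. A learned model would only approximate this velocity.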