Generalization has long been a central challenge in real-world image restoration. While recent diffusion-based restoration methods, which leverage generative priors from text-to-image models, have made progress in recovering realistic details, they still suffer from "generative capability deactivation" when applied to out-of-distribution real-world data. To address this, we propose using text as an auxiliary invariant representation to reactivate the generative capabilities of these models. We begin by identifying two key properties of text input, richness and relevance, and examine their respective influence on model performance. Building on these insights, we introduce Res-Captioner, a module that generates enhanced textual descriptions tailored to image content and degradation levels, effectively mitigating response failures. Additionally, we present RealIR, a new benchmark designed to capture diverse real-world scenarios. Extensive experiments demonstrate that Res-Captioner significantly enhances the generalization abilities of diffusion-based restoration models, while remaining fully plug-and-play.