Universal image restoration (UIR) aims to recover clean images from diverse and unknown degradations using a single unified model. Existing UIR methods focus primarily on pixel reconstruction and often lack explicit diagnostic reasoning about degradation composition, severity, and scene semantics before restoration. We propose Reason and Restore (R\&R), a framework that integrates structured Chain-of-Thought (CoT) reasoning into the image restoration pipeline. R\&R introduces an explicit reasoner, implemented by fine-tuning Qwen3-VL, that diagnoses degradation types, quantifies degradation severity, infers key degradation-related factors, and describes relevant scene and object semantics. The resulting structured reasoning gives the restorer interpretable, fine-grained diagnostic priors. To further improve restoration quality, the degradation severity quantified by the reasoner is used as a reinforcement learning (RL) signal to guide and strengthen the restorer. Unlike existing multimodal LLM-based agentic systems, which decouple reasoning from low-level vision tasks, R\&R tightly couples semantic diagnostic reasoning with pixel-level restoration in a unified framework. Extensive experiments across diverse UIR benchmarks show that R\&R achieves state-of-the-art performance while providing interpretability of the restoration process.
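To make the reasoner-to-restorer interface concrete, the following is a minimal Python sketch of how a structured diagnosis could be represented and how quantified severity could be turned into a scalar RL signal. Everything here is an illustrative assumption rather than the method described above: the `Diagnosis` class, its field names, the [0, 1] severity scale, and the `severity_reward` definition (mean drop in severity between the degraded input and the restored output) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Diagnosis:
    """Hypothetical container for the reasoner's structured output."""
    # Degradation type -> severity, assumed to lie in [0, 1].
    degradations: Dict[str, float]
    # Free-text degradation-related factors (assumed field).
    factors: List[str] = field(default_factory=list)
    # Short scene/object description (assumed field).
    scene: str = ""


def to_prior_text(d: Diagnosis) -> str:
    """Serialize a diagnosis into text that a restorer could be conditioned on."""
    parts = [f"{name} (severity {sev:.2f})"
             for name, sev in sorted(d.degradations.items())]
    text = "Degradations: " + ", ".join(parts)
    if d.factors:
        text += ". Factors: " + ", ".join(d.factors)
    if d.scene:
        text += ". Scene: " + d.scene
    return text


def severity_reward(before: Diagnosis, after: Diagnosis) -> float:
    """Assumed reward: mean severity drop over degradations found in the input.

    A degradation absent from `after` is treated as fully removed (severity 0).
    """
    if not before.degradations:
        return 0.0
    drops = [sev - after.degradations.get(name, 0.0)
             for name, sev in before.degradations.items()]
    return sum(drops) / len(drops)


# Usage: diagnose the degraded input and the restored output, then score.
before = Diagnosis({"haze": 0.8, "noise": 0.4}, ["dense fog"], "street at dusk")
after = Diagnosis({"haze": 0.2})
prior = to_prior_text(before)
reward = severity_reward(before, after)
```

In this sketch the same diagnosis serves two roles: its text form acts as the diagnostic prior, and its severity scores, compared before and after restoration, act as the scalar signal. A real system would need to decide how to handle degradations that appear only in the restored output, which this sketch ignores.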