We present MoE-DiffIR, a novel universal compressed image restoration (CIR) method built on task-customized diffusion priors. It addresses two pivotal challenges in existing CIR methods: (i) a lack of adaptability and universality across different image codecs, e.g., JPEG and WebP; and (ii) poor texture generation, particularly at low bitrates. Specifically, MoE-DiffIR develops a powerful mixture-of-experts (MoE) prompt module in which a set of basic prompts cooperate to excavate task-customized diffusion priors from Stable Diffusion (SD) for each compression task. A degradation-aware routing mechanism is further proposed to enable flexible assignment of the basic prompts. To activate and reuse the cross-modality generation prior of SD, we design a visual-to-text adapter for MoE-DiffIR that maps the embedding of a low-quality image from the visual domain to the textual domain, where it serves as textual guidance for SD and enables more consistent and plausible texture generation. We also construct a comprehensive benchmark dataset for universal CIR, covering 21 degradation types from 7 popular traditional and learned codecs. Extensive experiments on universal CIR demonstrate the excellent robustness and texture restoration capability of the proposed MoE-DiffIR. The project can be found at https://renyulin-f.github.io/MoE-DiffIR.github.io/.
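To make the MoE prompt idea concrete, the sketch below shows one plausible reading of degradation-aware routing: a router maps a degradation embedding to mixture weights over a small bank of basic prompt embeddings, and the weighted combination acts as the task-customized prompt. All names, shapes, and the linear router are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

class MoEPromptRouter:
    """Hypothetical sketch: degradation-aware routing over basic prompts."""

    def __init__(self, num_prompts=4, prompt_dim=8, degrad_dim=6, seed=0):
        rng = np.random.default_rng(seed)
        # Bank of learnable "basic prompt" embeddings (random here for illustration).
        self.prompts = rng.normal(size=(num_prompts, prompt_dim))
        # Linear router: degradation embedding -> logits over the prompt bank.
        self.router_w = rng.normal(size=(degrad_dim, num_prompts))

    def __call__(self, degrad_embed):
        # Mixture weights over basic prompts, conditioned on the degradation.
        weights = softmax(degrad_embed @ self.router_w)
        # Weighted combination = task-customized prompt for this compression task.
        return weights, weights @ self.prompts

router = MoEPromptRouter()
weights, task_prompt = router(np.ones(6))
print(weights.sum(), task_prompt.shape)
```

In a real system the prompt bank and router would be trained jointly with the restoration network, and the resulting prompt would condition the frozen SD backbone rather than be used directly.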