Enforcing alignment between the internal representations of diffusion or flow-based generative models and those of pretrained self-supervised encoders has recently been shown to provide a powerful inductive bias, improving both convergence and sample quality. In this work, we extend this idea to inverse problems, where pretrained generative models are employed as priors. We propose applying representation alignment (REPA) between diffusion or flow-based models and a DINOv2 visual encoder to guide the reconstruction process at inference time. Although ground-truth signals are unavailable in inverse problems, we empirically show that aligning model representations with approximate target features can substantially enhance reconstruction quality and perceptual realism. We provide theoretical results showing (a) that REPA regularization can be viewed as a variational approach for minimizing a divergence measure in the DINOv2 embedding space, and (b) how, under certain regularity assumptions, REPA updates steer the latent diffusion states toward those of the clean image. These results offer insights into the role of REPA in improving perceptual fidelity. Finally, we demonstrate the generality of our approach by integrating REPA into multiple state-of-the-art inverse problem solvers, and provide extensive experiments on super-resolution, box inpainting, Gaussian deblurring, and motion deblurring, confirming that our method consistently improves reconstruction quality while also providing efficiency gains by reducing the number of required discretization steps.
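As a rough illustration of what an inference-time alignment update could look like, the sketch below computes a cosine-similarity alignment loss between two sets of patch features and takes one gradient step that lowers it. This is a toy under stated assumptions, not the method described above: the cosine form of the loss is assumed rather than taken from the text, the features are plain Python lists standing in for projected model hidden states and DINOv2 features of an approximate target, the gradient is estimated by finite differences instead of backpropagation, and all names (`alignment_loss`, `alignment_step`, `step_size`) are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def alignment_loss(model_feats, target_feats):
    """Mean (1 - cosine similarity) over patch tokens.

    Assumption: model_feats stands in for projected hidden states of the
    generative model, target_feats for encoder features of an approximate
    target. Both are lists of per-patch feature vectors.
    """
    sims = [cosine_similarity(h, y) for h, y in zip(model_feats, target_feats)]
    return 1.0 - sum(sims) / len(sims)


def finite_difference_grad(loss_fn, feats, eps=1e-5):
    """Central-difference gradient of loss_fn with respect to feats."""
    grad = []
    for i, row in enumerate(feats):
        grad_row = []
        for j in range(len(row)):
            up = [list(r) for r in feats]
            down = [list(r) for r in feats]
            up[i][j] += eps
            down[i][j] -= eps
            grad_row.append((loss_fn(up) - loss_fn(down)) / (2 * eps))
        grad.append(grad_row)
    return grad


def alignment_step(feats, target_feats, step_size=0.5):
    """One gradient-descent step on the alignment loss (hypothetical update)."""
    grad = finite_difference_grad(lambda f: alignment_loss(f, target_feats), feats)
    return [
        [x - step_size * g for x, g in zip(row, grad_row)]
        for row, grad_row in zip(feats, grad)
    ]


# Toy example: two patches with 2-dimensional features.
feats = [[1.0, 0.0], [0.0, 1.0]]
target = [[1.0, 1.0], [1.0, 1.0]]
loss_before = alignment_loss(feats, target)
loss_after = alignment_loss(alignment_step(feats, target), target)
```

In this toy the update acts directly on the features. In a solver, the same kind of gradient would presumably be taken with respect to the latent state through the model, alongside the data-consistency term of the inverse problem.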