We propose a new method for solving imaging inverse problems using text-to-image latent diffusion models as general priors. Existing methods using latent diffusion models for inverse problems typically rely on simple null text prompts, which can lead to suboptimal performance. To address this limitation, we introduce a method for prompt tuning, which jointly optimizes the text embedding on-the-fly while running the reverse diffusion process. This allows us to generate images that are more faithful to the diffusion prior. In addition, we propose a method to keep the evolution of latent variables within the range space of the encoder, by projection. This helps to reduce image artifacts, a major problem when using latent diffusion models instead of pixel-based diffusion models. Our combined method, called P2L, outperforms both image- and latent-diffusion model-based inverse problem solvers on a variety of tasks, such as super-resolution, deblurring, and inpainting.
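The two ingredients described in the abstract, tuning the text embedding against the measurement and projecting the latent back into the range of the encoder, can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the P2L implementation: the decoder `D`, encoder `E`, forward operator `A`, and the linear "denoiser" weights `W_z` and `W_c` are hypothetical random matrices standing in for the networks of a real latent diffusion model, and only a single alternation is shown rather than a full reverse diffusion run.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy sizes (all hypothetical): pixel, latent, encoder-range, measurement, embedding.
n, k, r, m, d = 16, 8, 5, 6, 4

D = rng.standard_normal((n, k))                    # toy linear decoder
Q, _ = np.linalg.qr(rng.standard_normal((k, r)))   # orthonormal basis of the toy encoder range
P = Q @ Q.T                                        # orthogonal projector onto that range
E = P @ np.linalg.pinv(D)                          # toy linear encoder, built so that E @ D == P
A = rng.standard_normal((m, n))                    # toy forward (measurement) operator
W_z = rng.standard_normal((k, k)) / np.sqrt(k)     # toy denoiser weights on the latent
W_c = rng.standard_normal((k, d)) / np.sqrt(d)     # toy denoiser weights on the text embedding


def predict_clean_latent(z, c):
    """Toy stand-in for the clean-latent estimate produced by the diffusion model."""
    return W_z @ z + W_c @ c


def measurement_loss(z, c, y):
    """Squared error between the measurement and the decoded, degraded estimate."""
    residual = y - A @ D @ predict_clean_latent(z, c)
    return float(residual @ residual)


def tune_embedding(z, c, y, steps=5):
    """Gradient descent on the embedding c with the latent z held fixed."""
    M = A @ D @ W_c
    # Step 1/L, where L = 2 * ||M||_2^2 is the Lipschitz constant of the gradient.
    step = 1.0 / (2.0 * np.linalg.norm(M, 2) ** 2)
    for _ in range(steps):
        residual = y - A @ D @ predict_clean_latent(z, c)
        c = c - step * (-2.0 * M.T @ residual)
    return c


def project_to_encoder_range(z):
    """Decode then re-encode, which maps z into the range of the toy encoder."""
    return E @ (D @ z)


y = A @ rng.standard_normal(n)   # synthetic measurement
z = rng.standard_normal(k)       # current latent
c = np.zeros(d)                  # zero vector playing the role of a null prompt embedding

loss_before = measurement_loss(z, c, y)
c = tune_embedding(z, c, y)
loss_after = measurement_loss(z, c, y)
z_proj = project_to_encoder_range(z)
```

In this linear toy the re-encoding step is an exact orthogonal projection because `E @ D` equals `P` by construction. With a learned nonlinear autoencoder, decoding and re-encoding would only approximate such a projection.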