In this paper, we introduce the problem of zero-shot, text-guided exploration of the solutions to open-domain image super-resolution. Our goal is to let users explore diverse, semantically accurate reconstructions that remain consistent with the low-resolution inputs across a range of large downsampling factors, without explicitly training for these specific degradations. We propose two approaches for zero-shot text-guided super-resolution: (i) modifying the generative process of text-to-image (T2I) diffusion models to promote consistency with low-resolution inputs, and (ii) incorporating language guidance into zero-shot diffusion-based restoration methods. We show that the proposed approaches produce diverse solutions that match the semantic meaning of the text prompt while preserving data consistency with the degraded inputs. We evaluate the proposed baselines on the task of extreme super-resolution and demonstrate advantages in restoration quality, diversity, and explorability of solutions.
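To make "data consistency with the low-resolution inputs" concrete, the following is a minimal sketch of one common way such a constraint can be enforced. It is an illustration under stated assumptions, not the method of this paper: the degradation is assumed to be average pooling by an integer factor `s`, images are plain 2D lists of floats with dimensions divisible by `s`, and the function names `downsample` and `project_consistent` are hypothetical. Under that assumption, adding the block-wise residual back to an estimate yields an image whose downsampled version matches the low-resolution input.

```python
def downsample(img, s):
    """Average-pool a 2D image (list of rows) by an integer factor s.

    Assumed degradation model; height and width must be divisible by s.
    """
    h, w = len(img), len(img[0])
    return [
        [
            sum(img[i * s + di][j * s + dj] for di in range(s) for dj in range(s))
            / (s * s)
            for j in range(w // s)
        ]
        for i in range(h // s)
    ]


def project_consistent(x, y, s):
    """Shift every s-by-s block of x so that its mean equals the matching pixel of y.

    This is x + A_pinv(y - A x) for average pooling A, whose pseudoinverse
    replicates each low-resolution pixel over its s-by-s block.
    """
    pooled = downsample(x, s)
    return [
        [
            x[i][j] + (y[i // s][j // s] - pooled[i // s][j // s])
            for j in range(len(x[0]))
        ]
        for i in range(len(x))
    ]


# Hypothetical 4x4 estimate (e.g. an intermediate sample) and 2x2 observation.
x = [
    [0.1, 0.4, 0.3, 0.9],
    [0.2, 0.5, 0.7, 0.1],
    [0.6, 0.6, 0.2, 0.8],
    [0.0, 0.3, 0.5, 0.4],
]
y = [[0.5, 0.25], [0.75, 1.0]]

x_proj = project_consistent(x, y, 2)
```

In this sketch the projection changes only the mean of each block, so the within-block detail of the estimate is left untouched; that detail is the part a text prompt could steer without violating the constraint.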