Most existing image restoration methods use neural networks to learn strong image-level priors from large-scale data in order to estimate the lost information. However, these methods still struggle when images suffer from severe information deficits. Introducing external priors or using reference images to supply the missing information is also limited in its application domain. In contrast, text input is more readily available and provides information with higher flexibility. In this work, we design an effective framework that allows the user to control the restoration process of degraded images with text descriptions. We use the text-image feature compatibility of CLIP to alleviate the difficulty of fusing text and image features. Our framework can be used for various image restoration tasks, including image inpainting, image super-resolution, and image colorization. Extensive experiments demonstrate the effectiveness of our method.