Image colorization has attracted research interest for decades. However, existing methods still struggle to produce satisfactory colorizations of grayscale images because they lack a human-like global understanding of color. Recently, large-scale Text-to-Image (T2I) models have been used to transfer semantic information from text prompts to the image domain, where the text provides global control over the semantic objects in the image. In this work, we introduce a colorization model that piggybacks on an existing, powerful T2I diffusion model. Our key idea is to exploit the color prior knowledge in the pre-trained T2I diffusion model for realistic and diverse colorization. A diffusion guider is designed to incorporate the pre-trained weights of the latent diffusion model and to output a latent color prior that conforms to the visual semantics of the grayscale input. A lightness-aware VQVAE then generates the colorized result with pixel-perfect alignment to the given grayscale image. Our model can also perform conditional colorization with additional inputs (e.g., user hints and text). Extensive experiments show that our method achieves state-of-the-art performance in terms of perceptual quality.
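To make "pixel-perfect alignment to the given grayscale image" concrete, the following is a minimal sketch of one simple way to keep an output's lightness tied to the input: take the chroma from a color guess and the luma from the grayscale pixel. This is not the lightness-aware VQVAE described above. It is a plain color-space substitution using BT.601 luma weights, and the function names (`rgb_to_ycbcr`, `ycbcr_to_rgb`, `impose_lightness`) are hypothetical names chosen for illustration.

```python
# Illustrative only: a luma-substitution trick, NOT the paper's lightness-aware VQVAE.
# Assumes RGB values are floats in [0, 1] and uses BT.601 luma weights.

KR, KG, KB = 0.299, 0.587, 0.114  # BT.601 luma coefficients (sum to 1)


def rgb_to_ycbcr(r, g, b):
    """Split one RGB pixel into luma y and zero-centered chroma (cb, cr)."""
    y = KR * r + KG * g + KB * b
    cb = (b - y) / 1.772
    cr = (r - y) / 1.402
    return y, cb, cr


def ycbcr_to_rgb(y, cb, cr):
    """Inverse of rgb_to_ycbcr (no clipping, so the round trip is exact)."""
    r = y + 1.402 * cr
    b = y + 1.772 * cb
    g = (y - KR * r - KB * b) / KG
    return r, g, b


def impose_lightness(gray, color_guess):
    """Keep the chroma of color_guess, but take the luma from gray.

    gray: list of rows of floats; color_guess: list of rows of (r, g, b) tuples.
    """
    out = []
    for gray_row, color_row in zip(gray, color_guess):
        row = []
        for y_in, (r, g, b) in zip(gray_row, color_row):
            _, cb, cr = rgb_to_ycbcr(r, g, b)
            row.append(ycbcr_to_rgb(y_in, cb, cr))
        out.append(row)
    return out


if __name__ == "__main__":
    gray = [[0.2, 0.8]]
    guess = [[(0.9, 0.1, 0.1), (0.1, 0.2, 0.9)]]
    for pixel in impose_lightness(gray, guess)[0]:
        print(pixel, "luma:", rgb_to_ycbcr(*pixel)[0])
```

By construction, the luma of every output pixel equals the corresponding grayscale value, so the result stays aligned with the input at the pixel level. Values can fall outside [0, 1] for saturated colors, and clipping them for display would slightly change the luma, which is one reason a learned decoder may be preferable to this kind of post-hoc substitution.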