Colorizing grayscale images offers an engaging visual experience. Existing automatic colorization methods often fail to produce satisfactory results because they assign semantically incorrect colors or yield unsaturated colors. In this work, we propose an automatic colorization pipeline that addresses both problems. We leverage the strong generative ability of the diffusion prior to synthesize colors with plausible semantics. To suppress the artifacts introduced by the diffusion prior, we apply luminance-conditional guidance. Moreover, we adopt multimodal high-level semantic priors to help the model understand the image content and deliver saturated colors. In addition, we design a luminance-aware decoder to restore details and enhance overall visual quality. The proposed pipeline synthesizes saturated colors while maintaining plausible semantics. Experiments indicate that our method balances diversity and fidelity, surpassing previous methods in perceptual realism and gaining the most human preference.
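The abstract does not spell out how the luminance-conditional guidance is computed. As a minimal sketch of one plausible reading, the snippet below nudges a predicted RGB image so that its luminance moves toward the grayscale input, using a gradient step on a squared luminance error. The Rec. 601 luma weights, the function names (`luminance`, `luminance_loss`, `guidance_step`), and the `step_size` parameter are all assumptions made for illustration; they are not taken from the paper, and a real diffusion pipeline would apply such a step inside its sampling loop.

```python
import numpy as np

# Assumption: Rec. 601 luma weights. The source does not define luminance.
LUMA_WEIGHTS = np.array([0.299, 0.587, 0.114])


def luminance(rgb):
    """Per-pixel luminance of an RGB image of shape (H, W, 3) in [0, 1]."""
    return rgb @ LUMA_WEIGHTS


def luminance_loss(rgb_pred, gray):
    """Mean squared error between predicted luminance and the gray input."""
    return float(np.mean((luminance(rgb_pred) - gray) ** 2))


def guidance_step(rgb_pred, gray, step_size=0.5):
    """One gradient-descent step on the per-pixel squared luminance error.

    Hypothetical helper: it illustrates luminance-based guidance in general,
    not the specific guidance used by the proposed pipeline.
    """
    residual = luminance(rgb_pred) - gray              # shape (H, W)
    # Gradient of residual**2 with respect to each RGB channel.
    grad = 2.0 * residual[..., None] * LUMA_WEIGHTS    # shape (H, W, 3)
    return np.clip(rgb_pred - step_size * grad, 0.0, 1.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gray = luminance(rng.uniform(0.3, 0.7, size=(8, 8, 3)))  # stand-in input
    pred = rng.uniform(0.3, 0.7, size=(8, 8, 3))             # stand-in output
    guided = guidance_step(pred, gray)
    print(luminance_loss(pred, gray), luminance_loss(guided, gray))
```

Each step shrinks the luminance residual while leaving the chroma direction of the prediction largely intact, which is the general intuition behind constraining a generative prior with the known grayscale input.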