Despite the many existing colorization methods, several limitations remain, such as a lack of user interaction, inflexible local colorization, unnatural color rendering, insufficient color variation, and color overflow. To address these issues, we introduce Control Color (CtrlColor), a multi-modal colorization method that leverages the pre-trained Stable Diffusion (SD) model and offers promising capabilities for highly controllable, interactive image colorization. While several diffusion-based methods have been proposed, supporting colorization in multiple modalities remains non-trivial. In this study, we tackle both unconditional and conditional image colorization (text prompts, strokes, exemplars) and address color overflow and incorrect colors within a unified framework. Specifically, we present an effective way to encode user strokes that enables precise local color manipulation, and we employ a practical way to constrain the color distribution to resemble that of the exemplars. Together with support for text prompts as conditions, these designs add versatility to our approach. We also introduce a novel module based on self-attention and a content-guided deformable autoencoder to address the long-standing issues of color overflow and inaccurate coloring. Extensive comparisons show that our model outperforms state-of-the-art image colorization methods both qualitatively and quantitatively.