Language-based colorization produces plausible and visually pleasing colors under the guidance of user-friendly natural language descriptions. Previous methods implicitly assume that users provide comprehensive color descriptions for most of the objects in the image, which leads to suboptimal performance when the descriptions are incomplete. In this paper, we propose a unified model that performs language-based colorization with any-level descriptions. We leverage a pretrained cross-modality generative model for its robust language understanding and rich color priors to handle the inherent ambiguity of any-level descriptions. We further design modules that align with the input conditions to preserve local spatial structures and prevent the ghosting effect. With the proposed sampling strategy, our model achieves instance-aware colorization in diverse and complex scenarios. Extensive experimental results demonstrate that our model handles any-level descriptions effectively and outperforms both language-based and automatic colorization methods. The code and pretrained models are available at: https://github.com/changzheng123/L-CAD.