Automatic black-and-white image sequence colorization while preserving character and object identity (ID) is a complex task with significant market demand, such as in cartoon or comic series colorization. Despite advancements in visual colorization using large-scale generative models such as diffusion models, challenges with controllability and identity consistency persist, making current solutions unsuitable for industrial application. To address this, we propose ColorFlow, a three-stage diffusion-based framework tailored for image sequence colorization in industrial applications. Unlike existing methods that require per-ID finetuning or explicit ID embedding extraction, we propose a novel, robust, and generalizable Retrieval-Augmented Colorization pipeline for colorizing images with relevant color references. Our pipeline also features a dual-branch design: one branch for color identity extraction and the other for colorization, leveraging the strengths of diffusion models. We utilize the self-attention mechanism in diffusion models for strong in-context learning and color identity matching. To evaluate our model, we introduce ColorFlow-Bench, a comprehensive benchmark for reference-based colorization. Results show that ColorFlow outperforms existing models across multiple metrics, setting a new standard in sequential image colorization and potentially benefiting the art industry. We release our code and models on our project page: https://zhuang2002.github.io/ColorFlow/.