Automatic colorization of line drawings has been widely studied to reduce the labor cost of hand-drawn anime production. Deep learning approaches, including image/video generation and feature-based correspondence, have improved accuracy but still struggle with occlusions, pose variations, and viewpoint changes. To address these challenges, we propose DACoN, a framework that leverages foundation models to capture part-level semantics, even in line drawings. Our method fuses low-resolution semantic features from foundation models with high-resolution spatial features from CNNs for fine-grained yet robust feature extraction. In contrast to previous methods that rely on the Multiplex Transformer and support only one or two reference images, DACoN removes this constraint and accepts an arbitrary number of reference images. Quantitative and qualitative evaluations demonstrate the benefit of using multiple reference images, with DACoN achieving superior colorization performance. Our code and model are available at https://github.com/kzmngt/DACoN.
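The sketch below illustrates the two ideas the abstract highlights, not the authors' implementation: low-resolution semantic features from a foundation model are upsampled and fused with high-resolution CNN features, and a query line drawing is matched by cosine similarity against features pooled from any number of reference images. All module names, feature dimensions, and the dummy tensors standing in for real foundation-model (e.g., DINO-style) features are assumptions for illustration only.

```python
# Minimal sketch, assuming a PyTorch setup; placeholder tensors stand in for
# real foundation-model features and the CNN is a toy encoder.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ShallowCNN(nn.Module):
    """Hypothetical high-resolution spatial encoder for line drawings."""

    def __init__(self, out_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, out_dim, 3, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)  # (B, out_dim, H, W): keeps full spatial resolution


def fuse_features(fm_feat: torch.Tensor, cnn_feat: torch.Tensor) -> torch.Tensor:
    """Upsample low-res foundation-model features, concatenate with CNN features."""
    fm_up = F.interpolate(fm_feat, size=cnn_feat.shape[-2:], mode="bilinear",
                          align_corners=False)
    fused = torch.cat([fm_up, cnn_feat], dim=1)
    return F.normalize(fused, dim=1)  # unit-norm channels for cosine matching


def match_to_references(query: torch.Tensor, refs: torch.Tensor) -> torch.Tensor:
    """For each query pixel, find the most similar pixel across all references.

    query: (C, H, W) fused features of the target line drawing.
    refs:  (N, C, H, W) fused features of N reference images; N is unrestricted.
    Returns indices into the flattened (N*H*W) reference pixels, from which a
    full pipeline could copy colors.
    """
    c = query.shape[0]
    q = query.flatten(1).t()                          # (H*W, C)
    r = refs.permute(1, 0, 2, 3).reshape(c, -1).t()   # (N*H*W, C)
    sim = q @ r.t()                                   # cosine similarity (unit-norm features)
    return sim.argmax(dim=1)                          # (H*W,) best-matching reference pixel


if __name__ == "__main__":
    cnn = ShallowCNN(out_dim=64)
    line = torch.rand(1, 1, 64, 64)   # query line drawing
    refs = torch.rand(3, 1, 64, 64)   # three references (single-channel placeholders here)

    # Placeholders for low-resolution foundation-model features; a real pipeline
    # would extract patch features (e.g., ~1/14 resolution from a ViT backbone).
    fm_q = torch.rand(1, 384, 8, 8)
    fm_r = torch.rand(3, 384, 8, 8)

    fused_q = fuse_features(fm_q, cnn(line))[0]   # (C, 64, 64)
    fused_r = fuse_features(fm_r, cnn(refs))      # (3, C, 64, 64)
    idx = match_to_references(fused_q, fused_r)
    print(idx.shape)  # torch.Size([4096])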