AnimeDiffusion: Anime Face Line Drawing Colorization via Diffusion Models

Manually colorizing anime line drawings is time-consuming and tedious, yet it is an essential stage in the cartoon animation creation pipeline. Reference-based line drawing colorization is a challenging task that relies on precise cross-domain, long-range dependency modelling between the line drawing and the reference image. Existing learning-based methods still use generative adversarial networks (GANs) as a key module of their architecture. In this paper, we propose AnimeDiffusion, a novel method that uses diffusion models to colorize anime face line drawings automatically. To the best of our knowledge, this is the first diffusion model tailored for anime content creation. To address the high training cost of diffusion models, we design a hybrid training strategy: we first pre-train a diffusion model with classifier-free guidance and then fine-tune it with image reconstruction guidance. We find that after only a few fine-tuning iterations, the model achieves strong colorization performance, as illustrated in Fig. 1. To train AnimeDiffusion, we construct a benchmark dataset for anime face line drawing colorization, which contains 31,696 training samples and 579 test samples. We hope this dataset fills the gap left by the lack of a high-resolution anime face dataset for evaluating colorization methods. Using multiple quantitative metrics evaluated on our dataset, together with a user study, we demonstrate that AnimeDiffusion outperforms state-of-the-art GAN-based models for anime face line drawing colorization. We also collaborate with professional artists to test and apply AnimeDiffusion in their creative work. We release our code at https://github.com/xq-meng/AnimeDiffusion.
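
The abstract names classifier-free guidance as the pre-training mechanism but gives no details. The following is a minimal, generic sketch of that technique, not the AnimeDiffusion implementation. The function names (`maybe_drop_condition`, `guided_noise`), the `p_uncond` dropout probability, and the `guidance_scale` parameter are hypothetical, and the combination rule shown is one common convention; `cond` stands for whatever conditioning input the model receives.

```python
import random


def maybe_drop_condition(cond, p_uncond, rng, null_cond=None):
    """Training-time step (assumed form): with probability p_uncond,
    replace the conditioning input by a null placeholder so that one
    network learns both conditional and unconditional predictions."""
    return null_cond if rng.random() < p_uncond else cond


def guided_noise(eps_cond, eps_uncond, guidance_scale):
    """Sampling-time step (one common convention): move the unconditional
    noise prediction toward the conditional one, element by element:
        eps = eps_uncond + guidance_scale * (eps_cond - eps_uncond)
    """
    return [u + guidance_scale * (c - u) for c, u in zip(eps_cond, eps_uncond)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy noise predictions as flat lists; a real model would emit image tensors.
    eps_c = [1.0, 2.0]
    eps_u = [0.5, 1.0]
    print(guided_noise(eps_c, eps_u, guidance_scale=2.0))
    print(maybe_drop_condition("reference", p_uncond=0.1, rng=rng))
```

Under this convention, a scale of 0 recovers the unconditional prediction and a scale of 1 recovers the conditional one; larger values push the sample further toward the condition.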