Toon shading is a non-photorealistic rendering task in animation whose primary purpose is to render objects with a flat, stylized appearance. As diffusion models have moved to the forefront of image synthesis, this paper explores a form of toon shading based on diffusion models, aiming to directly render photorealistic videos in an anime style. Existing video stylization methods face persistent challenges, notably in maintaining consistency and achieving high visual quality. We model the toon shading problem as four subproblems: stylization, consistency enhancement, structure guidance, and colorization. To address the challenges in video stylization, we propose an effective toon shading approach called \textit{Diffutoon}. Diffutoon can render remarkably detailed, high-resolution, long-duration videos in anime style. It can also edit the content according to prompts via an additional branch. We evaluate the efficacy of Diffutoon through quantitative metrics and human evaluation. Notably, Diffutoon surpasses both open-source and closed-source baseline approaches in our experiments. We release the source code and example videos on GitHub (Project page: https://ecnu-cilab.github.io/DiffutoonProjectPage/).