The 2D animation workflow typically begins with the creation of keyframes through sketch-based drawing. Inbetweens (i.e., intermediate sketch frames) are then crafted by manual interpolation to produce smooth animation, a labor-intensive process. The prospect of automatic animation sketch interpolation is therefore highly appealing. However, existing video interpolation methods are generally hindered by two key issues when applied to sketch inbetweening: 1) the limited texture and colour details in sketches, and 2) the exaggerated alterations between two sketch keyframes. To overcome these issues, we propose a novel deep learning method, the Fine-to-Coarse Sketch Interpolation Network (FC-SIN). This approach incorporates multi-level guidance that formulates region-level correspondence, sketch-level correspondence and pixel-level dynamics. A multi-stream U-Transformer is then devised to characterize sketch inbetweening patterns using these multi-level guides through the integration of both self-attention and cross-attention mechanisms. Additionally, to facilitate future research on animation sketch inbetweening, we constructed a large-scale dataset, STD-12K, comprising 30 sketch animation series in diverse artistic styles. Comprehensive experiments on this dataset show that our proposed FC-SIN surpasses state-of-the-art interpolation methods. Our code and dataset will be publicly available.
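As a rough illustration of the two attention mechanisms named in the abstract, the sketch below applies plain scaled dot-product attention to token features from two keyframes. It is not the FC-SIN architecture, whose details the abstract does not give: the token counts, the feature width, the names `tokens_a` and `tokens_b`, and the omission of learned query/key/value projections are all assumptions made for brevity.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(queries, keys, values):
    """Scaled dot-product attention; returns (output, attention weights)."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)
    return weights @ values, weights


rng = np.random.default_rng(0)
# Hypothetical token features for the two sketch keyframes (assumed shapes).
tokens_a = rng.normal(size=(16, 32))  # 16 tokens from keyframe 0
tokens_b = rng.normal(size=(20, 32))  # 20 tokens from keyframe 1

# Self-attention: each keyframe-0 token attends to the other keyframe-0 tokens.
self_out, self_w = attention(tokens_a, tokens_a, tokens_a)

# Cross-attention: keyframe-0 tokens query keyframe-1 tokens, so each output
# row is a weighted mix of keyframe-1 features.
cross_out, cross_w = attention(tokens_a, tokens_b, tokens_b)
```

Self-attention is simply the case where queries, keys and values all come from the same frame. With cross-attention, the weight matrix has one row per keyframe-0 token and one column per keyframe-1 token, which is one way a correspondence between two frames can be expressed.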