Synthesizing realistic animations of humans, animals, and even imaginary creatures has long been a goal for artists and computer graphics professionals. Compared to the imaging domain, which is rich with large available datasets, the motion domain has few data instances, particularly for the animation of animals and exotic creatures (e.g., dragons), which have unique skeletons and motion patterns. In this work, we present a Single Motion Diffusion Model, dubbed SinMDM, a model designed to learn the internal motifs of a single motion sequence with arbitrary topology and to synthesize motions of arbitrary length that are faithful to those motifs. We harness the power of diffusion models and present a denoising network designed specifically for the task of learning from a single input motion. Our transformer-based architecture avoids overfitting by using local attention layers that narrow the receptive field, and encourages motion diversity by using relative positional embedding. SinMDM can be applied in a variety of contexts, including spatial and temporal in-betweening, motion expansion, style transfer, and crowd animation. Our results show that SinMDM outperforms existing methods in both quality and time-space efficiency. Moreover, while current approaches require additional training for different applications, our work facilitates these applications at inference time. Our code and trained models are available at https://sinmdm.github.io/SinMDM-page.
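To illustrate how a local attention layer narrows the receptive field, the following is a minimal NumPy sketch of self-attention over motion frames in which each frame attends only to neighbors inside a temporal window. It is not the SinMDM implementation: the function names, the `window` parameter, and the omission of learned query/key/value projections and of relative positional embedding are all assumptions made for illustration.

```python
import numpy as np


def local_attention_mask(num_frames: int, window: int) -> np.ndarray:
    """Boolean mask: True where frame i may attend to frame j.

    Hypothetical helper: frames are visible to each other only when
    |i - j| <= window // 2.
    """
    idx = np.arange(num_frames)
    return np.abs(idx[:, None] - idx[None, :]) <= window // 2


def local_self_attention(x: np.ndarray, window: int):
    """Single-head self-attention restricted to a temporal window.

    x has shape (num_frames, dim). Learned projections are omitted
    (an assumption for brevity), so queries, keys, and values are x itself.
    Returns the attended features and the attention weights.
    """
    num_frames, dim = x.shape
    scores = x @ x.T / np.sqrt(dim)
    mask = local_attention_mask(num_frames, window)
    # Frames outside the window get -inf, so they receive zero weight.
    scores = np.where(mask, scores, -np.inf)
    # Numerically stable softmax over each row.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ x, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    motion = rng.normal(size=(6, 4))  # 6 frames, 4 features per frame
    out, weights = local_self_attention(motion, window=3)
    print(out.shape)
    print(np.round(weights, 2))
```

With `window=3`, each frame mixes information only from itself and its immediate neighbors, so a single layer cannot memorize the global layout of the input sequence; stacking layers widens the effective receptive field gradually.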