Synthesizing realistic animations of humans, animals, and even imaginary creatures has long been a goal for artists and computer graphics professionals. Unlike the imaging domain, which is rich in large available datasets, the motion domain offers a limited number of data instances, particularly for the animation of animals and exotic creatures (e.g., dragons), which have unique skeletons and motion patterns. In this work, we present the Single Motion Diffusion Model, dubbed SinMDM, a model designed to learn the internal motifs of a single motion sequence with arbitrary topology and to synthesize motions of arbitrary length that remain faithful to those motifs. We harness the power of diffusion models and present a denoising network explicitly designed for the task of learning from a single input motion. SinMDM is a lightweight architecture that avoids overfitting by using a shallow network with local attention layers, which narrow the receptive field and encourage motion diversity. SinMDM can be applied in various contexts, including spatial and temporal in-betweening, motion expansion, style transfer, and crowd animation. Our results show that SinMDM outperforms existing methods in both quality and time-space efficiency. Moreover, while current approaches require additional training for different applications, our work facilitates these applications at inference time. Our code and trained models are available at https://sinmdm.github.io/SinMDM-page.
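The abstract attributes SinMDM's resistance to overfitting partly to local attention layers that narrow the receptive field. The sketch below illustrates what "local" attention over the temporal axis of a motion sequence can look like: each frame attends only to frames within a fixed window. It is a minimal, hypothetical NumPy illustration, not SinMDM's actual denoiser. The single attention head, the function names `local_attention_mask` and `local_self_attention`, and the `window` parameter are all assumptions made for this example.

```python
import numpy as np


def local_attention_mask(num_frames: int, window: int) -> np.ndarray:
    """Hypothetical helper: frame i may attend to frame j only if |i - j| <= window."""
    idx = np.arange(num_frames)
    return np.abs(idx[:, None] - idx[None, :]) <= window


def local_self_attention(x, w_q, w_k, w_v, window):
    """Single-head self-attention restricted to a temporal neighborhood (illustrative sketch).

    x: (num_frames, d_model) per-frame motion features.
    w_q, w_k, w_v: (d_model, d_head) projection matrices.
    Returns (output, attention_weights).
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # Block attention to frames outside the local window.
    mask = local_attention_mask(x.shape[0], window)
    scores = np.where(mask, scores, -np.inf)
    # Softmax over each row; the diagonal is always allowed, so every row max is finite.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)  # exp(-inf) == 0 for masked-out frames
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


# Usage: 16 frames of 8-dimensional features, window of 2 frames on each side.
rng = np.random.default_rng(0)
frames, d = 16, 8
x = rng.standard_normal((frames, d))
w_q, w_k, w_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
out, weights = local_self_attention(x, w_q, w_k, w_v, window=2)
print(out.shape)  # (16, 8)
```

Because each layer lets a frame see only `window` frames on either side, stacking `L` such layers gives a receptive field of at most `2 * L * window + 1` frames. A shallow stack therefore never observes the whole input sequence at once, which is one way to read the abstract's claim that narrowing the receptive field discourages memorizing the single training motion.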