Motion in-betweening, a fundamental task in character animation, consists of generating motion sequences that plausibly interpolate user-provided keyframe constraints. It has long been recognized as a labor-intensive and challenging process. We investigate the potential of diffusion models for generating diverse human motions guided by keyframes. Unlike previous in-betweening methods, we propose a simple, unified model that generates precise and diverse motions conforming both to a flexible range of user-specified spatial constraints and to text conditioning. To this end, we introduce Conditional Motion Diffusion In-betweening (CondMDI), which supports arbitrary dense or sparse keyframe placement and partial keyframe constraints while generating high-quality motions that are diverse and coherent with the given keyframes. We evaluate CondMDI on the text-conditioned HumanML3D dataset and demonstrate the versatility and efficacy of diffusion models for keyframe in-betweening. We further explore guidance- and imputation-based approaches for inference-time keyframing and compare CondMDI against these methods.
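To make "imputation-based inference-time keyframing" concrete, the following is a minimal sketch of that baseline idea, not CondMDI itself (which trains a keyframe-conditioned model). It assumes a standard DDPM reverse process with a linear noise schedule and a network that predicts the clean motion x0. The function `toy_denoiser` is a hypothetical stand-in for a trained model (it only smooths along time), and the schedule values, array shapes, and names are illustrative assumptions. The boolean `mask` marks constrained entries per frame and per feature, so it covers both full and partial keyframes. At every step, the constrained entries of the sample are overwritten with the keyframe values forward-noised to the matching noise level; at the final step they are overwritten with the exact keyframes.

```python
import numpy as np

def make_schedule(T=50, beta_start=1e-4, beta_end=0.02):
    # Assumed linear beta schedule (illustrative values).
    betas = np.linspace(beta_start, beta_end, T)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars

def toy_denoiser(x_t, t):
    # Hypothetical stand-in for a trained network that predicts clean motion x0.
    # It just averages each frame with its neighbors along the time axis.
    padded = np.pad(x_t, ((1, 1), (0, 0)), mode="edge")
    return (padded[:-2] + padded[1:-1] + padded[2:]) / 3.0

def sample_with_imputation(keyframes, mask, denoiser, T=50, seed=0):
    """DDPM sampling where masked (keyframed) entries are imputed at every step."""
    rng = np.random.default_rng(seed)
    betas, alphas, alpha_bars = make_schedule(T)
    x = rng.standard_normal(keyframes.shape)  # start from pure noise x_T
    for t in range(T - 1, -1, -1):
        x0_hat = denoiser(x, t)
        if t > 0:
            ab_t, ab_prev = alpha_bars[t], alpha_bars[t - 1]
            # Mean and variance of the posterior q(x_{t-1} | x_t, x0_hat).
            coef_x0 = np.sqrt(ab_prev) * betas[t] / (1.0 - ab_t)
            coef_xt = np.sqrt(alphas[t]) * (1.0 - ab_prev) / (1.0 - ab_t)
            mean = coef_x0 * x0_hat + coef_xt * x
            var = betas[t] * (1.0 - ab_prev) / (1.0 - ab_t)
            x = mean + np.sqrt(var) * rng.standard_normal(x.shape)
            # Imputation: replace constrained entries with keyframes noised to level t-1.
            noise = rng.standard_normal(x.shape)
            noised_key = np.sqrt(ab_prev) * keyframes + np.sqrt(1.0 - ab_prev) * noise
            x = np.where(mask, noised_key, x)
        else:
            # Final step: use the clean prediction, with exact keyframes where constrained.
            x = np.where(mask, keyframes, x0_hat)
    return x

# Usage: 60 frames, 3 features per frame (hypothetical motion representation).
n_frames, n_feats = 60, 3
keyframes = np.zeros((n_frames, n_feats))
mask = np.zeros((n_frames, n_feats), dtype=bool)
keyframes[0] = [0.0, 0.0, 0.0]
mask[0] = True                        # full keyframe at the first frame
keyframes[59] = [1.0, 2.0, 0.5]
mask[59] = True                       # full keyframe at the last frame
keyframes[30, 0] = 0.5
mask[30, 0] = True                    # partial keyframe: only feature 0 at frame 30
motion = sample_with_imputation(keyframes, mask, toy_denoiser)
```

Because the overwrite happens outside the network, the unconstrained frames are tied to the keyframes only through the denoiser's own predictions. This requires no retraining, which is what makes it an inference-time approach of the kind the abstract compares CondMDI against.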