Current approaches for 3D human motion synthesis generate high-quality animations of digital humans performing a wide variety of actions and gestures. However, a notable technological gap exists in addressing the complex dynamics of multi-human interactions within this paradigm. In this work, we present ReMoS, a denoising-diffusion-based model that synthesizes the full-body reactive motion of a person in a two-person interaction scenario. Given the motion of one person, we employ a combined spatio-temporal cross-attention mechanism to synthesize the reactive body and hand motion of the second person, thereby completing the interaction between the two. We demonstrate ReMoS across challenging two-person scenarios such as pair dancing, Ninjutsu, kickboxing, and acrobatics, where one person's movements have complex and diverse influences on the other. We also contribute the ReMoCap dataset for two-person interactions, which contains full-body and finger motions. We evaluate ReMoS through multiple quantitative metrics, qualitative visualizations, and a user study, and we show its usability in interactive motion-editing applications.