Current approaches to 3D human motion synthesis can generate high-quality animations of digital humans performing a wide variety of actions and gestures. However, a notable technological gap remains in modeling the complex dynamics of multi-human interactions within this paradigm. In this work, we present ReMoS, a denoising diffusion-based model that synthesizes the full-body reactive motion of one person in a two-person interaction scenario. Given the motion of one person, we employ a combined spatio-temporal cross-attention mechanism to synthesize the reactive body and hand motion of the second person, thereby completing the interaction between the two. We demonstrate ReMoS on challenging two-person scenarios such as pair-dancing, Ninjutsu, kickboxing, and acrobatics, where one person's movements have complex and diverse influences on the other. We also contribute ReMoCap, a dataset of two-person interactions containing full-body and finger motions. We evaluate ReMoS through multiple quantitative metrics, qualitative visualizations, and a user study, and also show its usability in interactive motion editing applications.
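The abstract names a combined spatio-temporal cross-attention mechanism but does not define it. The NumPy sketch below shows one plausible reading, included for illustration only. Queries come from the reacting person's per-joint features, and keys and values come from the given person's features. A temporal branch attends across frames for each joint, and a spatial branch attends across joints within each frame. The projection names (`Wq_t`, `Wk_s`, and so on), the `(T, J, D)` feature layout, the random initialization, and the additive residual combination of the two branches are all assumptions, not details taken from ReMoS.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(q, k, v):
    """Scaled dot-product attention over the second-to-last axis.

    q: (..., Nq, D) queries from the reacting person.
    k, v: (..., Nk, D) keys/values from the given (acting) person.
    """
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


def init_params(d, rng):
    """Hypothetical random projections; a trained model would learn these."""
    return {name: rng.standard_normal((d, d)) / np.sqrt(d)
            for name in ("Wq_t", "Wk_t", "Wv_t", "Wq_s", "Wk_s", "Wv_s")}


def spatio_temporal_cross_attention(reactor, actor, p):
    """reactor, actor: per-joint features of shape (T frames, J joints, D).

    Temporal branch: each reactor joint attends over all T frames of the
    same actor joint. Spatial branch: each reactor joint in frame t attends
    over all J actor joints in frame t. The two branches are summed with a
    residual connection (an assumed combination rule).
    """
    # Temporal branch: move joints to the batch axis -> (J, T, D).
    rq = (reactor @ p["Wq_t"]).transpose(1, 0, 2)
    ak = (actor @ p["Wk_t"]).transpose(1, 0, 2)
    av = (actor @ p["Wv_t"]).transpose(1, 0, 2)
    temporal = cross_attention(rq, ak, av).transpose(1, 0, 2)  # back to (T, J, D)

    # Spatial branch: frames are already the batch axis -> (T, J, D).
    spatial = cross_attention(reactor @ p["Wq_s"], actor @ p["Wk_s"],
                              actor @ p["Wv_s"])
    return reactor + temporal + spatial
```

As a usage example, with `rng = np.random.default_rng(0)` and random `(T, J, D)` arrays for both people, `spatio_temporal_cross_attention(reactor, actor, init_params(D, rng))` returns updated reactor features of the same shape. In a diffusion denoiser, a layer like this would presumably sit inside each block that processes the noisy reactive motion, so the given person's motion conditions every denoising step.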