Existing automatic approaches for 3D virtual character motion synthesis that support scene interactions do not generalise well to new objects outside the training distribution, even when trained on extensive motion capture datasets with diverse objects and annotated interactions. This paper addresses this limitation and shows that robustness and generalisation to novel scene objects in 3D object-aware character synthesis can be achieved by training a motion model with as few as one reference object. We leverage an implicit feature representation trained on object-only datasets, which encodes an SE(3)-equivariant descriptor field around the object. Given an unseen object and a reference pose-object pair, we optimise for the object-aware pose that is closest in the feature space to the reference pose. Finally, we use l-NSM, our motion generation model, which is trained with the proposed bidirectional pose blending scheme to transition seamlessly from locomotion to object interaction. Through comprehensive numerical comparisons with state-of-the-art methods and a user study, we demonstrate substantial improvements in the quality of 3D virtual character motion and interaction, as well as in robustness to scenarios with unseen objects. Our project page is available at https://vcai.mpi-inf.mpg.de/projects/ROAM/.
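The pose-optimisation step described above, which matches a reference pose to an unseen object in descriptor space, can be illustrated with a toy sketch. Everything in it is a hypothetical simplification, not the paper's method. The learned SE(3)-equivariant field is replaced by a hand-written Gaussian RBF field over a few object "anchor" points, which is equivariant to translation only. The pose is reduced to a handful of body keypoints, and only a rigid translation of those keypoints is optimised, with gradients taken by central finite differences. The names `descriptor_field`, `feature_distance` and `optimise_pose_translation` are illustrative.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def add(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def descriptor_field(x: Vec3, anchors: List[Vec3], scale: float = 1.0) -> List[float]:
    """Hypothetical stand-in for a learned descriptor field: one Gaussian RBF
    response per object anchor point. Translating the object (its anchors) and
    the query point together leaves the descriptor unchanged."""
    return [math.exp(-sum((x[i] - a[i]) ** 2 for i in range(3)) / scale) for a in anchors]


def feature_distance(t: Vec3, keypoints: List[Vec3], anchors_new: List[Vec3],
                     targets: List[List[float]]) -> float:
    """Squared descriptor distance between the translated pose keypoints on the
    new object and the reference pose's descriptors on the reference object."""
    total = 0.0
    for p, target in zip(keypoints, targets):
        desc = descriptor_field(add(p, t), anchors_new)
        total += sum((d - r) ** 2 for d, r in zip(desc, target))
    return total


def optimise_pose_translation(keypoints: List[Vec3], anchors_ref: List[Vec3],
                              anchors_new: List[Vec3], steps: int = 1000,
                              lr: float = 0.05, eps: float = 1e-6) -> Vec3:
    # Target descriptors: the reference pose expressed in the reference object's field.
    targets = [descriptor_field(p, anchors_ref) for p in keypoints]
    t = [0.0, 0.0, 0.0]
    for _ in range(steps):
        grad = []
        for i in range(3):
            # Central finite difference along axis i.
            plus = list(t)
            plus[i] += eps
            minus = list(t)
            minus[i] -= eps
            f_plus = feature_distance((plus[0], plus[1], plus[2]), keypoints, anchors_new, targets)
            f_minus = feature_distance((minus[0], minus[1], minus[2]), keypoints, anchors_new, targets)
            grad.append((f_plus - f_minus) / (2 * eps))
        # Plain gradient-descent step on the translation.
        t = [t[i] - lr * grad[i] for i in range(3)]
    return (t[0], t[1], t[2])


# Usage: the "unseen" object is the reference object shifted by a known offset.
anchors_ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
offset = (0.2, -0.1, 0.15)
anchors_new = [add(a, offset) for a in anchors_ref]
keypoints = [(0.5, 0.5, 0.5), (0.2, 0.2, 0.0), (0.0, 0.3, 0.6)]

t_opt = optimise_pose_translation(keypoints, anchors_ref, anchors_new)
print([round(v, 3) for v in t_opt])  # should be close to the offset (0.2, -0.1, 0.15)
```

Because the toy "unseen" object is only a translated copy of the reference object, the optimum recovers that offset. With differently shaped objects and a learned field, an exact match generally does not exist, which is why the optimisation targets the pose closest in feature space.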