We present a novel method for populating 3D indoor scenes with virtual humans that can navigate the environment and interact with objects in a realistic manner. Existing approaches rely on high-quality training sequences that capture a diverse range of human motions in 3D scenes. However, such motion data is costly and difficult to obtain, and it can never cover the full range of plausible human-scene interactions in complex indoor environments. To address these challenges, we propose a reinforcement learning-based approach that learns policy networks to predict the latent variables of a powerful generative motion model trained on a large-scale motion capture dataset (AMASS). For navigation in a 3D environment, we propose a scene-aware policy training scheme with a novel collision avoidance reward function. Combined with the generative motion model, this scheme lets us synthesize highly diverse human motions that navigate 3D indoor scenes while effectively avoiding obstacles. For detailed human-object interactions, we design interaction-aware reward functions that leverage a marker-based body representation and the signed distance field (SDF) representation of the 3D scene. Together with several important training design choices, our method can synthesize realistic and diverse human-object interactions (e.g., sitting on a chair and then getting up), even in out-of-distribution test scenarios with different object shapes, orientations, starting body positions, and poses. Experimental results demonstrate that our approach outperforms state-of-the-art human-scene interaction synthesis frameworks in terms of both motion naturalness and diversity. Video results are available on the project page: https://zkf1997.github.io/DIMOS.
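To make concrete the idea of scoring body markers against an SDF of the scene, the following is a minimal illustrative sketch, not the reward used in this work. It assumes a hypothetical scene made of a single axis-aligned box with an analytic SDF, and a hypothetical exponential penalty on the total penetration depth; the names `box_sdf`, `penetration_reward`, and `scale` are introduced here for illustration only.

```python
import numpy as np


def box_sdf(points, center, half_extents):
    """Signed distance from points (N, 3) to an axis-aligned box.

    Values are negative inside the box and positive outside.
    """
    q = np.abs(points - center) - half_extents
    outside = np.linalg.norm(np.maximum(q, 0.0), axis=1)
    inside = np.minimum(np.max(q, axis=1), 0.0)
    return outside + inside


def penetration_reward(markers, sdf_fn, scale=1.0):
    """Hypothetical collision reward in (0, 1].

    Equals 1.0 when no marker lies inside an obstacle and decays
    exponentially with the summed penetration depth of all markers.
    """
    distances = sdf_fn(markers)
    penetration = np.sum(np.maximum(-distances, 0.0))
    return float(np.exp(-scale * penetration))


# Hypothetical obstacle: a unit cube centered at the origin.
center = np.zeros(3)
half_extents = np.full(3, 0.5)


def scene_sdf(points):
    return box_sdf(points, center, half_extents)


free_markers = np.array([[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]])
colliding_markers = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])

reward_free = penetration_reward(free_markers, scene_sdf)
reward_colliding = penetration_reward(colliding_markers, scene_sdf)
```

In this sketch a policy would receive a higher reward for marker configurations that stay outside the obstacle. A real scene would typically use a precomputed SDF grid queried by interpolation rather than an analytic box.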