We address the problem of generating realistic 3D motions of humans interacting with objects in a scene. Our key idea is to create a neural interaction field attached to a specific object, which outputs the distance to the valid interaction manifold given a human pose as input. This interaction field guides the sampling of an object-conditioned human motion diffusion model, so as to encourage plausible contacts and affordance semantics. To support interactions with scarcely available data, we propose an automated synthetic data pipeline. For this, we seed a pre-trained motion model, which has priors for the basics of human movement, with interaction-specific anchor poses extracted from limited motion capture data. Using our guided diffusion model trained on generated synthetic data, we synthesize realistic motions for sitting and lifting with several objects, outperforming alternative approaches in terms of motion quality and successful action completion. We call our framework NIFTY: Neural Interaction Fields for Trajectory sYnthesis.
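The guidance idea described above can be illustrated with a minimal sketch. This is not the NIFTY implementation: the learned neural interaction field is replaced by a hypothetical analytic stand-in (`toy_field`, the distance from a point "pose" to a sphere around an object), and the diffusion denoiser is replaced by annealed Gaussian noise. All function names and parameters here are assumptions made for illustration. The only point shown is how the gradient of a distance-to-manifold field can nudge each sampling step toward valid interactions.

```python
import math
import random


def toy_field(pose, center, radius):
    """Hypothetical stand-in for a learned interaction field.

    Returns the distance from `pose` to a sphere of radius `radius` around
    `center`, which plays the role of the valid interaction manifold.
    """
    return abs(math.dist(pose, center) - radius)


def toy_field_gradient(pose, center, radius, eps=1e-8):
    """Analytic gradient of `toy_field` with respect to `pose`.

    A real neural field would obtain this gradient by automatic differentiation.
    """
    dist = math.dist(pose, center)
    sign = (dist > radius) - (dist < radius)  # +1 outside, -1 inside, 0 on the sphere
    return [sign * (p - c) / (dist + eps) for p, c in zip(pose, center)]


def guided_sampling(x_init, center, radius, steps=50,
                    guidance_scale=0.1, noise_scale=0.05, seed=0):
    """Toy guided sampling loop (assumed structure, not the paper's sampler)."""
    rng = random.Random(seed)
    x = [float(v) for v in x_init]
    for t in range(steps):
        # Placeholder for a denoising step: noise that anneals to zero.
        sigma = noise_scale * (1.0 - (t + 1) / steps)
        x = [v + sigma * rng.gauss(0.0, 1.0) for v in x]
        # Guidance: step against the field gradient to reduce the distance
        # to the valid interaction manifold.
        grad = toy_field_gradient(x, center, radius)
        x = [v - guidance_scale * g for v, g in zip(x, grad)]
    return x


if __name__ == "__main__":
    center, radius = [0.0, 0.0, 0.0], 1.0
    start = [3.0, 0.0, 0.0]
    end = guided_sampling(start, center, radius)
    print("field before:", toy_field(start, center, radius))
    print("field after: ", toy_field(end, center, radius))
```

Running the script prints the field value before and after sampling; in this toy setup the guided sample should end much closer to the manifold than it started. In an actual system the pose would be a full-body representation, the field would be a trained network conditioned on the object, and the guidance term would be combined with the diffusion model's own update.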