We focus on the human-humanoid interaction task, optionally involving an object. We propose a new task, online full-body motion reaction synthesis, which generates humanoid reactions from the human actor's motions. Previous work considers only human interaction without objects and generates body reactions without hand motion. It also does not treat the task as an online setting, in which, as in practical situations, information beyond the current moment cannot be observed. To support this task, we construct two datasets, HHI and CoChair, and propose a unified method. Specifically, we construct a social affordance representation: we first select a social affordance carrier and use SE(3)-Equivariant Neural Networks to learn a local frame for the carrier, then canonicalize the social affordance in that frame. We further propose a social affordance forecasting scheme that enables the reactor to predict based on an imagined future. Experiments demonstrate that our approach effectively generates high-quality reactions on HHI and CoChair. We also validate our method on the existing human interaction datasets Interhuman and Chi3D.
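As an illustration of what canonicalizing in a carrier-local frame buys, the following minimal sketch expresses points in a frame attached to the carrier and relies on the resulting coordinates being unchanged under a global rigid motion of the scene. It is not the method described above: the frame is built by hand-crafted Gram-Schmidt on the carrier points, used here only as a stand-in for the learned SE(3)-equivariant network, and the names `frame_from_points`, `canonicalize`, and `random_rotation` are hypothetical.

```python
import numpy as np


def frame_from_points(carrier):
    """Build a local frame (R, t) from carrier points of shape (N, 3).

    Assumption: a hand-crafted equivariant frame, standing in for a learned one.
    Requires the first two centered points to be non-collinear.
    """
    t = carrier.mean(axis=0)            # frame origin: centroid of the carrier
    a = carrier[0] - t
    b = carrier[1] - t
    x = a / np.linalg.norm(a)           # first axis
    b = b - np.dot(b, x) * x            # Gram-Schmidt: remove the x component
    y = b / np.linalg.norm(b)           # second axis
    z = np.cross(x, y)                  # third axis, right-handed
    R = np.stack([x, y, z], axis=1)     # columns are the frame axes in world coordinates
    return R, t


def canonicalize(points, R, t):
    """Express world-space points of shape (M, 3) in the local frame: R^T (p - t)."""
    return (points - t) @ R


def random_rotation(rng):
    """Sample a proper rotation (det = +1) via QR decomposition."""
    Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(Q) < 0:
        Q[:, 0] *= -1.0                 # flip one axis to turn a reflection into a rotation
    return Q
```

Because the frame rotates and translates together with the carrier, applying the same rigid transform to the carrier and to the other points should leave the canonical coordinates unchanged, which is the property that makes such a representation independent of the global pose of the scene.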