We focus on human-humanoid interaction, optionally involving an object. We propose a new task, online full-body motion reaction synthesis, which generates humanoid reactions from a human actor's motions. Previous work focuses only on human-human interaction without objects and generates body reactions without hands; moreover, it does not treat the task as online, meaning that in practical situations the reactor cannot observe information beyond the current moment. To support this task, we construct two datasets, HHI and CoChair, and propose a unified method. Specifically, we construct a social affordance representation: we first select a social affordance carrier, use SE(3)-equivariant neural networks to learn a local frame for the carrier, and then canonicalize the social affordance. We further propose a social affordance forecasting scheme that enables the reactor to predict based on an imagined future. Experiments demonstrate that our approach effectively generates high-quality reactions on HHI and CoChair. We also validate our method on the existing human interaction datasets Interhuman and Chi3D.
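To make the canonicalization step concrete, the sketch below expresses points in a carrier's local frame. The abstract learns this frame with an SE(3)-equivariant network; here, purely as a hypothetical stand-in, the frame is derived from the carrier point cloud's centroid and principal axes, so only the canonicalization arithmetic (p_local = Rᵀ(p − t)) mirrors the described pipeline.

```python
import numpy as np

def local_frame(carrier_pts):
    """Estimate a local frame (R, t) for a carrier point cloud.

    Stand-in for the paper's learned SE(3)-equivariant frame:
    centroid as origin, PCA axes as rotation (hypothetical choice).
    """
    t = carrier_pts.mean(axis=0)
    centered = carrier_pts - t
    # Right-singular vectors give orthonormal principal axes.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    r = vt.T
    # Flip one axis if needed so R is a proper rotation (det = +1).
    if np.linalg.det(r) < 0:
        r[:, -1] *= -1
    return r, t

def canonicalize(pts, r, t):
    """Express points in the local frame: p_local = R^T (p - t)."""
    return (pts - t) @ r

# Toy carrier point cloud standing in for a real affordance carrier.
carrier = np.random.default_rng(0).normal(size=(64, 3))
r, t = local_frame(carrier)
canon = canonicalize(carrier, r, t)
```

After canonicalization the carrier is centered at the origin with a pose-independent orientation, which is what lets a single reaction model generalize across where the interaction happens in the world.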