Learning 3D human-object interaction relations is pivotal to embodied AI and interaction modeling. Most existing methods approach this goal by learning to predict isolated interaction elements, e.g., human contact, object affordance, and human-object spatial relation, primarily from the perspective of either the human or the object. Such methods underexploit certain correlations between the interaction counterparts (human and object) and struggle to address the uncertainty in interactions. In fact, objects' functionalities potentially affect humans' interaction intentions, which reveals what the interaction is. Meanwhile, interacting humans and objects exhibit matching geometric structures, which reveals how to interact. In light of this, we propose harnessing these inherent correlations between interaction counterparts to mitigate the uncertainty and jointly anticipate the above interaction elements in 3D space. To achieve this, we present LEMON (LEarning 3D huMan-Object iNteraction relation), a unified model that mines the interaction intentions of the counterparts and employs curvatures to guide the extraction of geometric correlations, combining them to anticipate the interaction elements. In addition, we collect the 3D Interaction Relation dataset (3DIR) to serve as the test bed for training and evaluation. Extensive experiments demonstrate the superiority of LEMON over methods that estimate each element in isolation.