Visual affordance segmentation identifies the surfaces of an object that an agent can interact with. Common challenges in identifying affordances are the variety in the geometry and physical properties of these surfaces, as well as occlusions. In this paper, we focus on the occlusion of an object that is held and manipulated by a person. To address this challenge, we propose an affordance segmentation model that uses auxiliary branches to process the object and hand regions separately. The proposed model learns affordance features under hand occlusion by weighting the feature map with the hand and object segmentation. To train the model, we annotated the visual affordances in an existing dataset of mixed-reality, third-person (exocentric) images of hand-held containers. Experiments on both real and mixed-reality images show that our model achieves better affordance segmentation and generalisation than existing models.
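As an illustration only, the idea of weighting a feature map by hand and object segmentation can be sketched in a few lines of NumPy. The abstract does not specify the architecture, so everything concrete here is an assumption: the helper name `weight_by_masks`, the use of sigmoid probabilities as soft masks, the tensor shapes, and the fusion of the two weighted maps by channel concatenation are hypothetical choices, not the authors' implementation.

```python
import numpy as np


def sigmoid(x):
    """Map logits to probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def weight_by_masks(features, object_logits, hand_logits):
    """Weight a (C, H, W) feature map by soft object and hand masks.

    Hypothetical helper: the logits stand in for the outputs of two
    auxiliary segmentation branches, each of shape (H, W).
    Returns two (C, H, W) feature maps, one per region.
    """
    # Add a leading channel axis so the masks broadcast over all C channels.
    object_prob = sigmoid(object_logits)[None, :, :]
    hand_prob = sigmoid(hand_logits)[None, :, :]
    return features * object_prob, features * hand_prob


# Toy input: 8 channels on a 4x4 grid (shapes are assumptions).
rng = np.random.default_rng(0)
features = rng.normal(size=(8, 4, 4))

# Toy masks: object on the left half, hand on the right half.
object_logits = np.full((4, 4), -20.0)
object_logits[:, :2] = 20.0
hand_logits = np.full((4, 4), -20.0)
hand_logits[:, 2:] = 20.0

object_features, hand_features = weight_by_masks(
    features, object_logits, hand_logits
)

# Assumed fusion step: concatenate along the channel axis so a later
# affordance decoder could see both region-specific feature maps.
fused = np.concatenate([object_features, hand_features], axis=0)
```

In this toy setup the object-weighted map keeps the features in the left half of the grid and suppresses them in the right half, and the hand-weighted map does the opposite, which is the sense in which each branch attends to its own region.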