Perceiving and manipulating 3D articulated objects in diverse environments is essential for home-assistant robots. Recent studies have shown that point-level affordance provides actionable priors for downstream manipulation tasks. However, existing works primarily focus on single-object scenarios with homogeneous agents, overlooking the realistic constraints imposed by the environment and the agent's morphology, e.g., occlusions and physical limitations. In this paper, we propose an environment-aware affordance framework that incorporates both object-level actionable priors and environment constraints. Unlike object-centric affordance approaches, learning environment-aware affordance faces a combinatorial explosion, because occlusions vary in the number, geometry, position, and pose of the occluders. To address this and improve data efficiency, we introduce a novel contrastive affordance learning framework that can be trained on scenes containing a single occluder and generalize to scenes with complex occluder combinations. Experiments demonstrate the effectiveness of the proposed approach in learning affordance under environment constraints.