We introduce AffordanceGrasp-R1, a reasoning-driven affordance segmentation framework for robotic grasping that combines a chain-of-thought (CoT) cold-start strategy with reinforcement learning to enhance deduction and spatial grounding. In addition, we redesign the grasping pipeline to be more context-aware by generating grasp candidates from the global scene point cloud and subsequently filtering them using instruction-conditioned affordance masks. Extensive experiments demonstrate that AffordanceGrasp-R1 consistently outperforms state-of-the-art (SOTA) methods on benchmark datasets, and real-world robotic grasping evaluations further validate its robustness and generalization under complex language-conditioned manipulation scenarios.
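The generate-then-filter step described above can be illustrated with a minimal sketch. The abstract does not say how candidates are matched against the affordance mask, so everything here is an assumption made for illustration: the function name `filter_grasps_by_mask`, the use of grasp centers expressed in the camera frame, a pinhole intrinsics matrix `K`, a binary image-space mask, and the `top_k` cutoff are all hypothetical and are not taken from the AffordanceGrasp-R1 implementation.

```python
import numpy as np


def filter_grasps_by_mask(grasp_centers, scores, mask, K, top_k=5):
    """Keep grasp candidates whose projected center lies inside the mask.

    Hypothetical helper, not the authors' code.
    grasp_centers: (N, 3) grasp centers in the camera frame, in meters.
    scores:        (N,) quality scores from any scene-level grasp generator.
    mask:          (H, W) binary affordance mask predicted for the instruction.
    K:             (3, 3) pinhole camera intrinsics (assumed known).
    Returns indices of the kept candidates, best score first.
    """
    grasp_centers = np.asarray(grasp_centers, dtype=float)
    scores = np.asarray(scores, dtype=float)
    h, w = mask.shape

    # Only points in front of the camera can be projected meaningfully.
    z = grasp_centers[:, 2]
    in_front = z > 0
    safe_z = np.where(in_front, z, 1.0)  # avoid dividing by zero or negatives

    # Pinhole projection: u = fx * x / z + cx, v = fy * y / z + cy.
    u = np.round(K[0, 0] * grasp_centers[:, 0] / safe_z + K[0, 2]).astype(int)
    v = np.round(K[1, 1] * grasp_centers[:, 1] / safe_z + K[1, 2]).astype(int)
    in_image = in_front & (u >= 0) & (u < w) & (v >= 0) & (v < h)

    # A candidate survives only if its pixel falls on the affordance region.
    keep = np.zeros(len(grasp_centers), dtype=bool)
    keep[in_image] = mask[v[in_image], u[in_image]].astype(bool)

    # Rank the surviving candidates by score, highest first.
    idx = np.flatnonzero(keep)
    ranked = idx[np.argsort(-scores[idx])]
    return ranked[:top_k]
```

Generating candidates over the whole scene and filtering afterwards, rather than cropping the point cloud to the mask first, lets the grasp generator see surrounding geometry such as neighboring objects and the support surface. This is one plausible reading of the "context-aware" pipeline, not a confirmed design detail.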