Learning a precise robotic grasping policy is crucial for embodied agents performing complex real-world manipulation tasks. Despite significant advances, most models still struggle to accurately determine the spatial positions of the objects to be grasped. We first show that this spatial generalization challenge stems primarily from the large amount of data required for adequate spatial understanding. However, collecting such data with real robots is prohibitively expensive, and relying on simulation data often leads to visual generalization gaps upon deployment. To overcome these challenges, we focus on state-based policy generalization and present \textbf{ManiBox}, a novel bounding-box-guided manipulation method built on a simulation-based teacher-student framework. The teacher policy efficiently generates scalable simulation data using bounding boxes, which are proven to uniquely determine the objects' spatial positions. The student policy then uses these low-dimensional spatial states to enable zero-shot transfer to real robots. Comprehensive evaluations in simulated and real-world environments show that ManiBox markedly improves spatial grasping generalization and adaptability to diverse objects and backgrounds. Further, our empirical study of scaling laws for policy performance indicates that spatial volume generalization scales positively with data volume. For a fixed spatial volume, the grasping success rate empirically follows Michaelis-Menten kinetics with respect to data volume, saturating as data increases. Our videos and code are available at https://thkkk.github.io/manibox.
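The saturation claim can be made concrete with the standard Michaelis-Menten form, written with data volume $D$ in place of substrate concentration: $\mathrm{SR}(D) = \mathrm{SR}_{\max} \cdot D / (K + D)$. The Python sketch below is illustrative only: `SR_MAX` and `K_HALF` are hypothetical values, not parameters fitted or reported for ManiBox, and the function names are our own. It shows the two properties behind "saturation": the success rate reaches half its ceiling at $D = K$, and the marginal gain per additional sample decays as $1/(K + D)^2$.

```python
# Sketch of a Michaelis-Menten-style saturation curve: grasp success rate vs. data volume.
# SR_MAX and K_HALF are HYPOTHETICAL illustrative values, not numbers from the paper.

SR_MAX = 0.95   # hypothetical asymptotic success rate at a fixed spatial volume
K_HALF = 5_000  # hypothetical data volume at which the success rate reaches SR_MAX / 2


def success_rate(data_volume: float, sr_max: float = SR_MAX, k_half: float = K_HALF) -> float:
    """Michaelis-Menten form: sr_max * D / (k_half + D)."""
    if data_volume < 0:
        raise ValueError("data_volume must be non-negative")
    return sr_max * data_volume / (k_half + data_volume)


def marginal_gain(data_volume: float, sr_max: float = SR_MAX, k_half: float = K_HALF) -> float:
    """Derivative dSR/dD = sr_max * k_half / (k_half + D)**2; it shrinks as D grows."""
    return sr_max * k_half / (k_half + data_volume) ** 2


if __name__ == "__main__":
    # Each extra order of magnitude of data buys less improvement than the last.
    for d in (1_000, 5_000, 20_000, 100_000):
        print(f"D={d:>7}: SR={success_rate(d):.3f}, dSR/dD={marginal_gain(d):.2e}")
```

Fitting $\mathrm{SR}_{\max}$ and $K$ to measured success rates (for example with a nonlinear least-squares routine) would be the natural way to check such a curve against real data; the sketch only illustrates the functional form.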