RGBSQGrasp: Inferring Local Superquadric Primitives from Single RGB Image for Graspability-Aware Bin Picking

Bin picking is a challenging robotic task because occlusions and physical constraints limit the visual information available for object recognition and grasping. Existing approaches often rely on known CAD models or prior object geometries, which restricts generalization to novel or unknown objects. Other methods regress grasp poses directly from RGB-D data without object priors, but the inherent noise in depth sensing and the lack of object-level understanding make grasp synthesis and evaluation more difficult. Superquadrics (SQs) offer a compact, interpretable shape representation that captures an object's physical geometry and graspability. However, recovering them from limited viewpoints is challenging: existing methods rely on multiple perspectives to reconstruct a near-complete point cloud, which limits their effectiveness in bin picking. To address these challenges, we propose RGBSQGrasp, a grasping framework that leverages superquadric shape primitives and foundation metric depth estimation models to infer grasp poses from a monocular RGB camera, eliminating the need for depth sensors. Our framework integrates a universal, cross-platform dataset generation pipeline, a foundation-model-based object point cloud estimation module, a global-local superquadric fitting network, and an SQ-guided grasp pose sampling module. Together, these components let RGBSQGrasp reliably infer grasp poses through geometric reasoning, enhancing grasp stability and adaptability to unseen objects. Real-world robotic experiments demonstrate a 92% grasp success rate, highlighting the effectiveness of RGBSQGrasp in packed bin-picking environments.
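
To make the superquadric representation concrete, the sketch below evaluates the standard inside-outside function of a superquadric in its canonical frame. The abstract does not specify the parameterization used by RGBSQGrasp, so the function name, the argument layout, and the choice of the classical formulation (three scale parameters and two shape exponents) are assumptions made for illustration only.

```python
def superquadric_inside_outside(point, scales, exponents):
    """Evaluate the classical superquadric inside-outside function F.

    Illustrative helper (hypothetical name, not taken from the paper).
    point:     (x, y, z) expressed in the superquadric's canonical frame.
    scales:    (a1, a2, a3) semi-axis lengths, all > 0.
    exponents: (e1, e2) shape exponents, both > 0.
    Returns F, where F < 1 is inside, F == 1 is on the surface,
    and F > 1 is outside.
    """
    x, y, z = point
    a1, a2, a3 = scales
    e1, e2 = exponents
    # Combine the x and y contributions, then raise to e2 / e1.
    xy_term = (abs(x / a1) ** (2.0 / e2) + abs(y / a2) ** (2.0 / e2)) ** (e2 / e1)
    # The z contribution depends only on e1.
    z_term = abs(z / a3) ** (2.0 / e1)
    return xy_term + z_term


# With e1 = e2 = 1 the shape is an ellipsoid (here a unit sphere).
sphere_value = superquadric_inside_outside((1.0, 0.0, 0.0), (1.0, 1.0, 1.0), (1.0, 1.0))

# Small exponents give a box-like shape, so a point near the corner
# that lies outside the sphere falls inside the box-like primitive.
corner = (0.9, 0.9, 0.9)
in_sphere = superquadric_inside_outside(corner, (1.0, 1.0, 1.0), (1.0, 1.0)) < 1.0
in_box = superquadric_inside_outside(corner, (1.0, 1.0, 1.0), (0.1, 0.1)) < 1.0
```

The two exponents alone move the primitive between sphere-like and box-like shapes, which is part of why a handful of parameters can summarize object geometry compactly. A full primitive would additionally carry a rigid pose that maps world points into the canonical frame; that step is omitted here.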
