In warehouse environments, robots require robust picking capabilities to handle a wide variety of objects. Effective deployment demands minimal hardware, strong generalization to novel products, and resilience across diverse settings. Current methods often rely on depth sensors for structural information, but these suffer from high cost, complex setup, and technical limitations. Inspired by recent advances in computer vision, we propose an approach that leverages foundation models to improve suction grasping using only RGB images. Trained solely on a synthetic dataset, our method generalizes its grasp predictions to real-world robots and to a diverse range of novel objects not seen during training. Our network achieves an 82.3\% success rate in real-world experiments. The project website, with code and data, will be available at http://optigrasp.github.io.