Current robotic grasping methods often rely on estimating the pose of the target object, explicitly predicting grasp poses, or implicitly estimating grasp success probabilities. In this work, we propose a novel approach that directly maps gripper poses to their corresponding grasp success values, without considering objectness. Specifically, we leverage a Neural Radiance Field (NeRF) architecture to learn a scene representation and use it to train a grasp success estimator that maps each pose in the robot's task space to a grasp success value. We then use this learned estimator to tune its inputs, i.e., the grasp poses, by gradient-based optimization to obtain successful grasp poses. Other NeRF-based methods either enhance existing grasp pose estimation approaches by relying on NeRF's rendering capabilities or directly estimate grasp poses in a discretized space using NeRF's scene representation capabilities; in contrast, our approach sidesteps both the need for rendering and the limitation of discretization. We demonstrate the effectiveness of our approach on four simulated 3-DoF (degree-of-freedom) robotic grasping tasks and show that it can generalize to novel objects. Our best model achieves an average translation error of 3 mm from valid grasp poses. This work opens the door for future research to apply our approach to higher-DoF grasps and real-world scenarios.
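As a minimal sketch of the input-optimization idea (not the paper's implementation), the snippet below runs gradient ascent on a 3-DoF pose (x, y, yaw) against a toy analytic score. Everything named here is an assumption made for illustration: `success` stands in for the learned NeRF-based estimator, `TARGET` is a made-up valid grasp pose, the yaw term assumes a gripper that is symmetric under a half turn, and central finite differences stand in for automatic differentiation.

```python
import math

# Hypothetical stand-in for the learned grasp success estimator: a smooth
# score in [0, 1] that peaks at one "valid" 3-DoF grasp pose (x, y, yaw).
TARGET = (0.10, -0.05, 0.60)  # metres, metres, radians (made-up values)

def success(pose):
    x, y, yaw = pose
    dist2 = (x - TARGET[0]) ** 2 + (y - TARGET[1]) ** 2
    position_term = math.exp(-dist2 / (2 * 0.05 ** 2))
    # Assumption: the gripper is symmetric under a rotation by pi,
    # so the score is periodic in yaw with period pi.
    angle_term = 0.5 * (1.0 + math.cos(2.0 * (yaw - TARGET[2])))
    return position_term * angle_term

def numerical_grad(f, pose, eps=1e-5):
    """Central finite differences, standing in for automatic differentiation."""
    grad = []
    for i in range(len(pose)):
        hi = list(pose)
        lo = list(pose)
        hi[i] += eps
        lo[i] -= eps
        grad.append((f(hi) - f(lo)) / (2 * eps))
    return grad

def optimize_pose(f, pose, step_sizes=(1e-3, 1e-3, 1e-1), iters=500):
    """Gradient ascent on the estimator's input (the pose), not on its weights."""
    pose = list(pose)
    for _ in range(iters):
        g = numerical_grad(f, pose)
        pose = [p + s * gi for p, s, gi in zip(pose, step_sizes, g)]
    return pose

start = [0.06, -0.02, 0.20]
best = optimize_pose(success, start)
```

The estimator stays fixed while the pose is updated, which is the sense in which the inputs are "tuned". Because the pose is a continuous variable, no discretization of the task space is needed in this sketch; with a real network, the finite-difference gradient would normally be replaced by backpropagation to the input.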