Grasping accuracy is a critical prerequisite for precise object manipulation, often requiring careful alignment between the robot hand and object. Neural Descriptor Fields (NDF) offer a promising vision-based method to generate grasping poses that generalize across object categories. However, NDF alone can produce inaccurate poses due to imperfect camera calibration, incomplete point clouds, and object variability. Meanwhile, tactile sensing enables more precise contact, but existing approaches typically learn policies limited to simple, predefined contact geometries. In this work, we introduce NeuralTouch, a multimodal framework that integrates NDF and tactile sensing to enable accurate, generalizable grasping through gentle physical interaction. Our approach leverages NDF to implicitly represent the target contact geometry, from which a deep reinforcement learning (RL) policy is trained to refine the grasp using tactile feedback. This policy is conditioned on the neural descriptors and does not require explicit specification of contact types. We validate NeuralTouch through ablation studies in simulation and zero-shot transfer to real-world manipulation tasks, such as peg-out-in-hole and bottle lid opening, without additional fine-tuning. Results show that NeuralTouch significantly improves grasping accuracy and robustness over baseline methods, offering a general framework for precise, contact-rich robotic manipulation.
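To make the conditioning concrete, below is a minimal sketch (in PyTorch) of the kind of descriptor-conditioned refinement policy the abstract describes: a network that maps an NDF descriptor of the target contact geometry plus a tactile observation to a small corrective grasp action. The class name `DescriptorConditionedPolicy`, the feature dimensions, and the 6-DoF delta-pose action parameterization are illustrative assumptions, not details taken from the paper.

```python
# Sketch only: all names, dimensions, and the action space below are assumed,
# not the paper's implementation.
import torch
import torch.nn as nn

class DescriptorConditionedPolicy(nn.Module):
    """Maps an NDF descriptor plus tactile features to a bounded
    grasp correction (assumed here to be a 6-DoF pose delta)."""

    def __init__(self, descriptor_dim=256, tactile_dim=64, action_dim=6):
        super().__init__()
        # Encode the tactile observation (e.g., flattened contact features).
        self.tactile_encoder = nn.Sequential(
            nn.Linear(tactile_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
        )
        # Fuse the descriptor conditioning with the tactile encoding.
        self.head = nn.Sequential(
            nn.Linear(descriptor_dim + 128, 256), nn.ReLU(),
            nn.Linear(256, action_dim), nn.Tanh(),  # bounded refinement action
        )

    def forward(self, descriptor, tactile):
        z = self.tactile_encoder(tactile)
        return self.head(torch.cat([descriptor, z], dim=-1))

# Usage: one refinement step on placeholder observations.
policy = DescriptorConditionedPolicy()
descriptor = torch.randn(1, 256)  # NDF descriptor of the target contact geometry
tactile = torch.randn(1, 64)      # current tactile reading
delta_pose = policy(descriptor, tactile)  # small correction applied to the grasp
```

In such a design, the descriptor stands in for an explicit contact-type label, which is consistent with the abstract's claim that the policy needs no explicit specification of contact types; in practice this network would be trained with an RL algorithm in simulation before zero-shot transfer.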