Dexterous robotic manipulation requires perception that remains informative through pre-contact approach, contact initiation, and post-contact control. We introduce FingerEye, a sensing and learning framework that strengthens robotic dexterity through continuous vision-tactile feedback throughout interaction. On the sensing side, FingerEye integrates binocular RGB cameras with a compliant contact interface to support perception both before and after contact. Before contact, the fingertip cameras provide close-range visual cues and implicit stereo for precise approach and object localization. After contact, marker-tracked deformation of the compliant ring serves as a proxy for the contact wrench. On the learning side, we build real-and-sim infrastructure for data collection and evaluation, systematically study policy-interface designs for learning with multiple FingerEye sensors, and develop FingerEye Policy, which applies group-structured modality fusion to reduce modality shortcuts and better exploit distributed fingertip feedback. Across seven contact-sensitive task settings, FingerEye improves over a wrist-only policy by more than 30 percentage points in mean success rate, in both simulation and the real world.