Dexterous robotic manipulation requires comprehensive perception across all phases of interaction: pre-contact, contact initiation, and post-contact. Such continuous feedback allows a robot to adapt its actions throughout interaction. However, many existing tactile sensors, such as GelSight and its variants, only provide feedback after contact is established, limiting a robot's ability to precisely initiate contact. We introduce FingerEye, a compact and cost-effective sensor that provides continuous vision-tactile feedback throughout the interaction process. FingerEye integrates binocular RGB cameras to provide close-range visual perception with implicit stereo depth. Upon contact, external forces and torques deform a compliant ring structure; these deformations are captured via marker-based pose estimation and serve as a proxy for contact wrench sensing. This design enables a perception stream that smoothly transitions from pre-contact visual cues to post-contact tactile feedback. Building on this sensing capability, we develop a vision-tactile imitation learning policy that fuses signals from multiple FingerEye sensors to learn dexterous manipulation behaviors from limited real-world data. We further develop a digital twin of our sensor and robot platform to improve policy generalization. By combining real demonstrations with visually augmented simulated observations for representation learning, the learned policies become more robust to object appearance variations. Together, these design aspects enable dexterous manipulation across diverse object properties and interaction regimes, including coin standing, chip picking, letter retrieving, and syringe manipulation. The hardware design, code, appendix, and videos are available on our project website: https://nus-lins-lab.github.io/FingerEyeWeb/
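The idea of using ring deformation as a proxy for contact wrench can be illustrated with a minimal sketch. It assumes that marker-based pose estimation already yields a 4x4 homogeneous pose of the ring, that deformations are small, and that a linear stiffness model maps deformation to wrench. The function names and the stiffness values are hypothetical and chosen only for illustration. The abstract does not state that FingerEye calibrates or uses such a model, and the learned policy may consume the raw deformation directly.

```python
import numpy as np

def rotation_to_rotvec(R):
    """Convert a 3x3 rotation matrix to an axis-angle vector (small angles assumed)."""
    cos_angle = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    angle = np.arccos(cos_angle)
    if angle < 1e-8:
        return np.zeros(3)
    axis = np.array([
        R[2, 1] - R[1, 2],
        R[0, 2] - R[2, 0],
        R[1, 0] - R[0, 1],
    ]) / (2.0 * np.sin(angle))
    return angle * axis

def ring_deformation(T_rest, T_now):
    """6D deformation (translation, rotation vector) of the ring relative to its rest pose."""
    T_rel = np.linalg.inv(T_rest) @ T_now
    return np.concatenate([T_rel[:3, 3], rotation_to_rotvec(T_rel[:3, :3])])

def wrench_proxy(T_rest, T_now, stiffness):
    """Map deformation to a wrench estimate with an assumed linear 6x6 stiffness matrix."""
    return stiffness @ ring_deformation(T_rest, T_now)

# Hypothetical stiffness: N/m for the translational axes, N*m/rad for the rotational axes.
K = np.diag([500.0, 500.0, 800.0, 2.0, 2.0, 3.0])

# Example: the ring is pushed 1 mm along z and tilted 0.02 rad about x.
theta = 0.02
T_rest = np.eye(4)
T_now = np.eye(4)
T_now[:3, :3] = np.array([
    [1.0, 0.0, 0.0],
    [0.0, np.cos(theta), -np.sin(theta)],
    [0.0, np.sin(theta), np.cos(theta)],
])
T_now[2, 3] = 1e-3

wrench = wrench_proxy(T_rest, T_now, K)  # [fx, fy, fz, tx, ty, tz]
```

A diagonal stiffness matrix ignores coupling between axes. A real compliant ring would likely need a full 6x6 matrix, or a nonlinear model, fitted against a reference force-torque sensor.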