FingerEye: Continuous and Unified Vision-Tactile Sensing for Dexterous Manipulation

Dexterous robotic manipulation requires comprehensive perception across all phases of interaction: pre-contact, contact initiation, and post-contact. Such continuous feedback allows a robot to adapt its actions throughout interaction. However, many existing tactile sensors, such as GelSight and its variants, only provide feedback after contact is established, limiting a robot's ability to precisely initiate contact. We introduce FingerEye, a compact and cost-effective sensor that provides continuous vision-tactile feedback throughout the interaction process. FingerEye integrates binocular RGB cameras to provide close-range visual perception with implicit stereo depth. Upon contact, external forces and torques deform a compliant ring structure; these deformations are captured via marker-based pose estimation and serve as a proxy for contact wrench sensing. This design enables a perception stream that smoothly transitions from pre-contact visual cues to post-contact tactile feedback. Building on this sensing capability, we develop a vision-tactile imitation learning policy that fuses signals from multiple FingerEye sensors to learn dexterous manipulation behaviors from limited real-world data. We further develop a digital twin of our sensor and robot platform to improve policy generalization. By combining real demonstrations with visually augmented simulated observations for representation learning, the learned policies become more robust to object appearance variations. Together, these design aspects enable dexterous manipulation across diverse object properties and interaction regimes, including coin standing, chip picking, letter retrieving, and syringe manipulation. The hardware design, code, appendix, and videos are available on our project website: https://nus-lins-lab.github.io/FingerEyeWeb/
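As an illustration of how a marker-tracked ring pose could act as a proxy for the contact wrench, the following minimal sketch computes the 6-D deviation of the ring from its rest pose and maps it through a linear stiffness model. This is not the authors' implementation: the function names, the linear-elastic assumption, and the stiffness values are all hypothetical, and the pose estimates are assumed to come from an upstream marker tracker.

```python
import numpy as np


def rotation_vector(R):
    """Axis-angle vector of a 3x3 rotation matrix (intended for small angles, well below pi)."""
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    if theta < 1e-9:
        return np.zeros(3)
    axis = np.array([
        R[2, 1] - R[1, 2],
        R[0, 2] - R[2, 0],
        R[1, 0] - R[0, 1],
    ]) / (2.0 * np.sin(theta))
    return theta * axis


def pose_deviation(R_rest, t_rest, R_now, t_now):
    """6-D deviation [dx, dy, dz, rx, ry, rz] of the ring pose, expressed in the rest frame."""
    R_rel = R_rest.T @ R_now
    dt = R_rest.T @ (t_now - t_rest)
    return np.concatenate([dt, rotation_vector(R_rel)])


def wrench_proxy(deviation, stiffness):
    """Map a pose deviation to a wrench estimate with a (hypothetical) linear stiffness matrix."""
    return stiffness @ deviation


# Example: the ring is pushed 2 mm along -z and twisted 0.1 rad about z.
angle = 0.1
R_now = np.array([
    [np.cos(angle), -np.sin(angle), 0.0],
    [np.sin(angle),  np.cos(angle), 0.0],
    [0.0,            0.0,           1.0],
])
deviation = pose_deviation(np.eye(3), np.zeros(3), R_now, np.array([0.0, 0.0, -0.002]))

# Assumed stiffness: N/m for translation, N*m/rad for rotation (illustrative values only).
K = np.diag([1000.0, 1000.0, 2000.0, 1.0, 1.0, 1.0])
wrench = wrench_proxy(deviation, K)
```

In practice the raw pose deviation could also be fed directly to a learned policy without any stiffness calibration, which matches the abstract's description of the deformation as a proxy signal rather than a calibrated force measurement.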
