GelSight-like visual tactile (VT) sensors have gained popularity as a high-resolution tactile sensing technology for robots, capable of measuring touch geometry using a single RGB camera. However, developing multi-modal perception for VT sensors remains a challenge because they are limited to a single camera. In this paper, we propose GelSplitter, a new framework that approaches multi-modal VT sensing with synchronized multi-modal cameras and more closely resembles a human tactile receptor. Furthermore, we focus on 3D tactile reconstruction and implement a compact sensor structure that remains comparable in size to state-of-the-art VT sensors, even with the addition of a prism and a near-infrared (NIR) camera. We also design a photometric fusion stereo neural network (PFSNN), which estimates surface normals of objects and reconstructs touch geometry from both infrared and visible images. Our results demonstrate that the accuracy of RGB and NIR fusion is higher than that of RGB images alone. Additionally, the GelSplitter framework allows flexible configuration of different camera sensor combinations, such as RGB and thermal imaging.
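The abstract describes reconstructing touch geometry from estimated surface normals. As a rough illustration of that final step only, the sketch below integrates a normal map into a height map using Frankot-Chellappa (FFT-based least-squares) integration. This is a swapped-in classical technique, not the authors' PFSNN pipeline: the function names `normals_to_gradients` and `integrate_gradients` are hypothetical, the normal map is assumed to be already available as an `(H, W, 3)` array of unit vectors, and periodic boundaries are assumed.

```python
import numpy as np


def normals_to_gradients(normals):
    """Convert unit normals of shape (H, W, 3) into gradients p = dz/dx, q = dz/dy.

    Assumes the normal of a surface z(x, y) is proportional to (-p, -q, 1).
    """
    nz = np.clip(normals[..., 2], 1e-6, None)  # guard against division by zero
    p = -normals[..., 0] / nz
    q = -normals[..., 1] / nz
    return p, q


def integrate_gradients(p, q):
    """Frankot-Chellappa integration of a gradient field into a height map.

    Returns the least-squares height map (zero mean), assuming periodic
    boundaries and a pixel spacing of 1.
    """
    h, w = p.shape
    # Angular frequencies along x (columns) and y (rows), in radians per pixel.
    wx = 2.0 * np.pi * np.fft.fftfreq(w)
    wy = 2.0 * np.pi * np.fft.fftfreq(h)
    u, v = np.meshgrid(wx, wy)

    denom = u ** 2 + v ** 2
    denom[0, 0] = 1.0  # avoid 0/0 at the DC term; the mean height is unobservable

    p_hat = np.fft.fft2(p)
    q_hat = np.fft.fft2(q)
    z_hat = (-1j * u * p_hat - 1j * v * q_hat) / denom
    z_hat[0, 0] = 0.0  # fix the unknown constant offset to zero mean

    return np.real(np.fft.ifft2(z_hat))
```

Given a normal map `n` from any estimator, `integrate_gradients(*normals_to_gradients(n))` returns a relative height map. The absolute offset is not recoverable from normals alone, and real contact images are not periodic, so a practical pipeline would likely need padding or a different boundary treatment.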