We present SeeThruFinger, a soft robotic finger with in-finger vision for multi-modal perception, combining visual perception and tactile sensing, for geometrically adaptive and real-time reactive grasping. Multi-modal perception of intrinsic and extrinsic interactions is critical to building intelligent robots that learn. Rather than adding a separate sensor for each modality, a preferred solution is to integrate them into one coherent design, which is challenging. This study uses the Soft Polyhedral Network design as a robotic finger, capable of omni-directional adaptation while offering an unobstructed view of the finger's spatial deformation from the inside. By embedding a miniature camera underneath the finger, we achieve visual perception of the external environment by inpainting the finger mask using E2FGV; the inpainted view can be used for object detection in downstream grasping tasks. After the finger contacts an object, we use real-time object segmentation algorithms, such as XMem, to track the soft finger's spatial deformations. We also train a Supervised Variational Autoencoder to enable tactile sensing of 6D forces and torques for reactive grasping. As a result, we achieve multi-modal perception, including visual perception and tactile sensing, together with soft, adaptive object grasping, within a single vision-based soft finger design compatible with multi-fingered robotic grippers.
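The abstract does not state the training objective of the Supervised Variational Autoencoder. As a minimal sketch only, one common formulation adds a supervised regression term on the 6D force/torque label to the usual reconstruction and KL terms. The function names, the squared-error choices, and the weights `beta` and `gamma` below are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import Dict, Sequence


def kl_to_standard_normal(mu: Sequence[float], logvar: Sequence[float]) -> float:
    """KL divergence from a diagonal Gaussian N(mu, exp(logvar)) to N(0, I)."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def squared_error(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of squared differences between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def svae_loss(
    x: Sequence[float],            # flattened input features (e.g. image pixels)
    x_recon: Sequence[float],      # decoder reconstruction of x
    mu: Sequence[float],           # encoder mean of the latent posterior
    logvar: Sequence[float],       # encoder log-variance of the latent posterior
    wrench_pred: Sequence[float],  # predicted [Fx, Fy, Fz, Tx, Ty, Tz]
    wrench_true: Sequence[float],  # measured  [Fx, Fy, Fz, Tx, Ty, Tz]
    beta: float = 1.0,             # assumed weight on the KL term
    gamma: float = 1.0,            # assumed weight on the supervised term
) -> Dict[str, float]:
    """Per-sample loss: reconstruction + beta * KL + gamma * 6D regression."""
    if len(wrench_pred) != 6 or len(wrench_true) != 6:
        raise ValueError("force/torque vectors must have 6 components")
    recon = squared_error(x, x_recon)
    kl = kl_to_standard_normal(mu, logvar)
    supervised = squared_error(wrench_pred, wrench_true)
    total = recon + beta * kl + gamma * supervised
    return {"recon": recon, "kl": kl, "supervised": supervised, "total": total}
```

In this sketch the supervised term is what ties the latent code to physical force and torque readings; in practice the three terms are usually balanced by tuning `beta` and `gamma`.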