Tactile imaging seeks to reconstruct the internal structure of soft objects through touch sensing, with applications in medical diagnosis and robotic manipulation. Recent self-supervised learning approaches have shown promising results, but they rely on global, unstructured representations and robot-controlled sensing, which limits generalization and practical use. We propose Local Encoder for Spatial Sensing (LESS), an object-centric tactile representation that exploits the local nature of touch. The tactile scene is modeled as a grid of recurrent encoders with local receptive fields, whose states are fused to reconstruct 2D or 3D images of the internal structure. This compositional design enables strong generalization: models trained on single-inclusion phantoms accurately image objects with multiple inclusions and varying sizes. The local structure further supports spatial uncertainty estimation. In addition, we enable hand-held tactile imaging via external pose tracking and human-like palpation data, and extend tactile imaging to full 3D reconstruction.
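
The idea of a grid of recurrent encoders with local receptive fields can be illustrated with a minimal NumPy sketch. This is not the LESS implementation: the class name `LocalEncoderGrid`, the plain tanh recurrence, the shared random (untrained) weights, the per-cell linear readout standing in for state fusion, and the visit count used as a crude proxy for spatial uncertainty are all assumptions made for illustration. It only shows how a touch at a tracked position updates nearby cells and leaves the rest of the grid unchanged.

```python
import numpy as np


class LocalEncoderGrid:
    """Hypothetical sketch: a 2D grid of small recurrent encoders with shared weights."""

    def __init__(self, grid=(8, 8), extent=1.0, radius=0.2,
                 obs_dim=4, state_dim=16, seed=0):
        rng = np.random.default_rng(seed)
        self.grid = grid
        self.radius = radius          # receptive-field radius (assumed, same units as extent)
        self.state_dim = state_dim
        # Cell centres laid out uniformly over the square [0, extent]^2.
        ys = (np.arange(grid[0]) + 0.5) * extent / grid[0]
        xs = (np.arange(grid[1]) + 0.5) * extent / grid[1]
        self.centres = np.stack(np.meshgrid(ys, xs, indexing="ij"), axis=-1)  # (H, W, 2)
        # Random, untrained weights; a real model would learn these.
        self.W_in = rng.normal(0.0, 0.1, (obs_dim + 2, state_dim))
        self.W_rec = rng.normal(0.0, 0.1, (state_dim, state_dim))
        self.w_out = rng.normal(0.0, 0.1, state_dim)
        self.reset()

    def reset(self):
        self.state = np.zeros(self.grid + (self.state_dim,))
        self.visits = np.zeros(self.grid, dtype=int)

    def update(self, position, reading):
        """Update only the cells whose receptive field contains the touch position."""
        offset = np.asarray(position, dtype=float) - self.centres    # (H, W, 2)
        mask = np.linalg.norm(offset, axis=-1) <= self.radius        # (H, W) boolean
        n = int(mask.sum())
        # Each touched cell sees the reading plus its own offset to the touch.
        x = np.concatenate([offset[mask], np.tile(reading, (n, 1))], axis=1)
        self.state[mask] = np.tanh(x @ self.W_in + self.state[mask] @ self.W_rec)
        self.visits[mask] += 1
        return mask

    def decode(self):
        """Read out one value per cell; visit counts act as a rough coverage map."""
        logits = self.state @ self.w_out
        image = 1.0 / (1.0 + np.exp(-logits))
        return image, self.visits


encoder = LocalEncoderGrid()
touched = encoder.update(position=(0.1, 0.1), reading=np.ones(4))
image, visits = encoder.decode()
```

Because every cell shares the same weights and only sees touches inside its own receptive field, nothing in this sketch depends on how many inclusions the object contains, which is the intuition behind the compositional generalization claimed above. Cells that were never touched keep a zero state and a zero visit count, so they can be flagged as unobserved.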