Force and tactile sensing are indispensable in contact-rich manipulation. However, force-aware robot learning faces critical challenges because tactile and force sensors are difficult to assemble together in handheld or wearable devices. To address this limitation, we first introduce AetheRock, an arm-worn device for collecting gripper-force, vision, and tactile data. It features a modular, easily manufactured visuo-tactile sensor, GelSlim-MiniFab, at the fingertip, a resistive pressure sensor at the human finger contact region, a customized PCB module, and a wearable kit for comfortable and robust data collection. Building on this, we propose ForceVT, a representation learning framework that uses force and vision to guide fidelity-agnostic tactile learning, enabling robust inference in any tactile situation. Real-world experiments show that AetheRock achieves qualified data efficiency and that ForceVT effectively alleviates the inefficiencies that arise when visuo-tactile sensors exhibit manufacturing and utilization inconsistencies. Overall, our work mitigates the limitations of gripper-force, vision, and tactile robot learning through innovative hardware design and algorithms.