Force and tactile sensing are indispensable in contact-rich manipulation. However, force-aware robot learning faces critical challenges because tactile and force sensors are difficult to integrate together in handheld or wearable data-collection devices. To address this limitation, we first introduce AetheRock, an arm-worn device for collecting gripper-force, visual, and tactile data. It features a modular, easily manufactured visuo-tactile sensor (GelSlim-MiniFab) at the fingertip, a resistive pressure sensor at the region where the human finger makes contact, a customized PCB module, and a wearable kit for comfortable and robust data collection. Building on this device, we propose ForceVT, a representation learning framework that uses force and vision to guide fidelity-agnostic tactile learning, enabling robust inference under any tactile condition. Real-world experiments show that AetheRock achieves satisfactory data-collection efficiency and that ForceVT effectively mitigates the inefficiencies that arise when visuo-tactile sensors exhibit manufacturing and usage inconsistencies. Overall, our work mitigates the limitations of gripper-force, vision, and tactile robot learning through innovative hardware design and algorithms.