Aiming to replicate human-like dexterity, perceptual experiences, and motion patterns, we explore learning from human demonstrations using a bimanual system with multifingered hands and visuotactile data. Two significant challenges exist: the lack of an affordable and accessible teleoperation system suitable for a dual-arm setup with multifingered hands, and the scarcity of multifingered hand hardware equipped with touch sensing. To tackle the first challenge, we develop HATO, a low-cost hands-arms teleoperation system that leverages off-the-shelf electronics, complemented with a software suite that enables efficient data collection; the comprehensive software suite also supports multimodal data processing, scalable policy learning, and smooth policy deployment. To tackle the latter challenge, we introduce a novel hardware adaptation by repurposing two prosthetic hands equipped with touch sensors for research. Using visuotactile data collected from our system, we learn skills to complete long-horizon, high-precision tasks that are difficult to achieve without multifingered dexterity and touch feedback. Furthermore, we empirically investigate the effects of dataset size, sensing modality, and visual input preprocessing on policy learning. Our results mark a promising step forward in bimanual multifingered manipulation from visuotactile data. Videos, code, and datasets can be found at https://toruowo.github.io/hato/.