In imitation learning for dexterous robotic hands, the high dimensionality of the system makes learning complex manipulation tasks challenging. However, the many large-scale datasets of human hands performing diverse tasks can provide rich knowledge of human hand motion. We propose a method that leverages multiple large-scale, task-agnostic datasets to learn latent representations that effectively encode motion subtrajectories, which we incorporate into a transformer-based behavior cloning method. Our results demonstrate that employing these latent representations improves performance over conventional behavior cloning, particularly in resilience to errors and noise in perception and proprioception. Furthermore, the proposed approach relies solely on human demonstrations, eliminating the need for teleoperation and thereby accelerating data acquisition. Accurate inverse kinematics for fingertip retargeting ensures precise transfer from human hand data to the robot, facilitating effective learning and deployment of manipulation policies. Finally, the trained policies have been successfully transferred to a real-world 23-DoF robotic system.