Fine manipulation tasks, such as threading cable ties or slotting a battery, are notoriously difficult for robots because they require precision, careful coordination of contact forces, and closed-loop visual feedback. Performing these tasks typically requires high-end robots, accurate sensors, or careful calibration, all of which can be expensive and difficult to set up. Can learning enable low-cost, imprecise hardware to perform these fine manipulation tasks? We present a low-cost system that performs end-to-end imitation learning directly from real demonstrations collected with a custom teleoperation interface. Imitation learning, however, presents its own challenges, particularly in high-precision domains: errors in the policy can compound over time, and human demonstrations can be non-stationary. To address these challenges, we develop a simple yet novel algorithm, Action Chunking with Transformers (ACT), which learns a generative model over action sequences. ACT allows the robot to learn 6 difficult tasks in the real world, such as opening a translucent condiment cup and slotting a battery, with 80-90% success from only 10 minutes' worth of demonstrations. Project website: https://tonyzhaozh.github.io/aloha/
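To make the idea of predicting action sequences concrete, the following is a minimal sketch of a control loop that queries a policy once per chunk of actions rather than once per step. It is not the ACT implementation: there is no transformer or generative model here, and the abstract does not specify how predicted sequences are executed. The names `rollout_with_chunking`, `toy_policy`, and `toy_step` are hypothetical and exist only for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Observation = Sequence[float]
Action = Sequence[float]


def rollout_with_chunking(
    policy: Callable[[Observation, int], List[Action]],
    step: Callable[[Observation, Action], Observation],
    initial_obs: Observation,
    horizon: int,
    chunk_size: int,
) -> Tuple[List[Action], int]:
    """Run `horizon` control steps, querying the policy once per chunk."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    obs = initial_obs
    executed: List[Action] = []
    num_queries = 0
    while len(executed) < horizon:
        # One policy call yields the next `chunk_size` actions.
        chunk = policy(obs, chunk_size)
        if not chunk:
            raise ValueError("policy returned an empty chunk")
        num_queries += 1
        # Execute the chunk without re-querying, truncating at the horizon.
        for action in chunk[: horizon - len(executed)]:
            obs = step(obs, action)
            executed.append(action)
    return executed, num_queries


# Toy stand-ins (assumptions, not from the paper): the "policy" moves a
# 1-D state toward zero in k equal steps; the "environment" adds the action.
def toy_policy(obs: Observation, k: int) -> List[Action]:
    return [[-obs[0] / k] for _ in range(k)]


def toy_step(obs: Observation, action: Action) -> Observation:
    return [obs[0] + action[0]]


actions, queries = rollout_with_chunking(
    toy_policy, toy_step, [1.0], horizon=10, chunk_size=5
)
single_actions, single_queries = rollout_with_chunking(
    toy_policy, toy_step, [1.0], horizon=10, chunk_size=1
)
```

In this toy setting, a chunk size of 5 needs 2 policy queries to cover 10 steps, while a chunk size of 1 needs 10. Fewer decision points is one plausible way in which predicting sequences could limit the compounding of policy errors over time, though this sketch demonstrates only the bookkeeping, not that effect.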