Learning robot skills from scratch is often time-consuming, whereas reusing data promotes sustainability and improves sample efficiency. This study investigates policy transfer across different robotic platforms, focusing on a peg-in-hole task learned with reinforcement learning (RL). Policies are trained on two different robots, then transferred across platforms and evaluated under three settings: zero-shot transfer, fine-tuning, and training from scratch. Results indicate that zero-shot transfer leads to lower success rates and relatively longer task execution times, while fine-tuning significantly improves performance with fewer training time-steps. These findings suggest that policy transfer combined with adaptation techniques improves sample efficiency and generalization, reducing the need for extensive retraining and supporting sustainable robotic learning.
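As an illustration only, the two metrics named above (success rate and task execution time) could be computed per transfer setting roughly as follows. This is a minimal sketch, not the study's evaluation code: the `Episode` record, the `summarize` helper, and the choice to average time-steps over successful episodes only are all assumptions introduced here.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional, Tuple


@dataclass
class Episode:
    """Hypothetical record of one evaluation rollout."""
    success: bool  # True if the peg was inserted before the episode ended
    steps: int     # time-steps used in the episode


def summarize(episodes: List[Episode]) -> Tuple[float, Optional[float]]:
    """Return (success rate, mean time-steps over successful episodes).

    The second value is None when no episode succeeded. Averaging over
    successful episodes only is an assumption of this sketch.
    """
    if not episodes:
        raise ValueError("at least one episode is required")
    successful_steps = [e.steps for e in episodes if e.success]
    rate = len(successful_steps) / len(episodes)
    mean_steps = mean(successful_steps) if successful_steps else None
    return rate, mean_steps
```

The same helper would be applied separately to the rollouts collected for each setting (zero-shot, fine-tuned, trained from scratch), so that the settings are compared on identical metrics.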