Efficiently training control policies for robots is a major challenge, one that can benefit greatly from cross-embodiment knowledge transfer, i.e., reusing knowledge gained from training similar systems. In this work, we focus on accelerating policy training using a library-based initialization scheme that enables effective knowledge transfer across multirotor configurations. By leveraging a physics-aware neural control architecture that combines a reinforcement learning-based controller with a supervised control allocation network, we enable the reuse of previously trained policies. To this end, we employ a policy evaluation-based similarity measure that identifies suitable policies for initialization from a library. We demonstrate that this measure correlates with the reduction in environment interactions needed to reach a target performance level and is therefore well suited for selecting initializations. Extensive simulation and real-world experiments confirm that our control architecture achieves state-of-the-art control performance, and that our initialization scheme saves up to $73.5\%$ of environment interactions on average (compared to training a policy from scratch) across diverse quadrotor and hexarotor designs, paving the way for efficient cross-embodiment transfer in reinforcement learning.
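The abstract's library selection step can be illustrated with a minimal sketch: evaluate each candidate policy from the library on the target embodiment and pick the one with the highest return as the initialization. All names here (`env_step`, the toy one-dimensional dynamics, and the example policies) are hypothetical stand-ins, not the paper's actual interfaces or environments.

```python
def evaluate_policy(env_step, policy, episodes=3, horizon=50):
    """Mean return of `policy` on the target embodiment.

    `env_step(state, action) -> (next_state, reward)` is a hypothetical
    environment interface used only for this sketch.
    """
    total = 0.0
    for _ in range(episodes):
        state = 1.0  # fixed initial state for a deterministic toy example
        for _ in range(horizon):
            action = policy(state)
            state, reward = env_step(state, action)
            total += reward
    return total / episodes


def select_initialization(env_step, library):
    """Score every library policy on the target embodiment and return the
    name of the best-scoring one (the proposed initialization) plus all scores."""
    scores = {name: evaluate_policy(env_step, pi) for name, pi in library.items()}
    return max(scores, key=scores.get), scores


# Toy stabilization task: reward is highest when the next state is near zero.
def env_step(state, action):
    next_state = state + action
    return next_state, -abs(next_state)


# Two toy "library" policies: one stabilizes the state, one destabilizes it.
library = {
    "stabilizing": lambda s: -s,  # drives the state to zero
    "destabilizing": lambda s: s,  # doubles the state each step
}

best, scores = select_initialization(env_step, library)
# The stabilizing policy accumulates higher return and is selected.
```

In the paper's setting the scoring would run each pretrained multirotor policy on the new configuration, and the score serves as the similarity measure that predicts how many environment interactions the initialization will save.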