Reinforcement learning (RL) is an emerging approach to controlling dynamical systems for which classical control approaches are inapplicable or insufficient. However, the resulting policies may not generalize to variations in the system's parameters. This paper presents a simple yet powerful algorithm that facilitates collaboration between RL agents trained independently to perform the same task under different system parameters. Because the agents are independent, they can be trained in parallel on multi-core processors. Two examples demonstrate the effectiveness of the proposed technique. The main demonstration is a tracking problem for a quadrotor with a slung load, performed in a real-time experimental setup. The ensemble produced by the developed algorithm is shown to outperform the individual policies, achieving a lower root-mean-square tracking error (RMSE). The robustness of the ensemble is also verified against wind disturbance.