Industrial machines or tools that are similar or almost identical are often deployed in large numbers, at once or in quick succession. For instance, a particular model of air compressor may be installed at hundreds of customer sites. Because these machines perform distinct but highly similar tasks, it is valuable to be able to quickly produce a high-quality controller for machine $N+1$ given the controllers already produced for machines $1..N$. This matters even more when the controllers are learned through Reinforcement Learning, as training costs time, energy and other resources. In this paper, we apply Policy Intersection, a Policy Shaping method, to help a Reinforcement Learning agent learn to solve a new variant of a compressor control problem faster by transferring knowledge from several previously learned controllers. We show that our approach outperforms loading an old controller and significantly improves performance in the long run.
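To make the idea of transferring knowledge from several controllers concrete, the following is a minimal sketch of one common way to combine policies in Policy Shaping: multiply the learner's action probabilities element-wise with those of each previously learned controller (an "advisor"), then renormalize. This combination rule is an assumption for illustration, since the abstract does not spell it out. The function name `policy_intersection`, its parameters, and the fallback to the learner's own policy when the product vanishes are likewise hypothetical choices, not the paper's implementation.

```python
from typing import List, Sequence


def policy_intersection(
    learner_probs: Sequence[float],
    advisor_probs_list: Sequence[Sequence[float]],
    eps: float = 1e-8,
) -> List[float]:
    """Combine a learner's action distribution with advisors' distributions.

    Assumed rule (illustrative): element-wise product of all distributions,
    followed by renormalization so the result sums to 1.
    """
    combined = list(learner_probs)
    for advisor in advisor_probs_list:
        if len(advisor) != len(combined):
            raise ValueError("all policies must cover the same action set")
        # An action stays likely only if every policy considers it likely.
        combined = [p * q for p, q in zip(combined, advisor)]

    total = sum(combined)
    if total <= eps:
        # Hypothetical fallback: the policies share no action, so the
        # learner's own distribution is returned unchanged.
        return list(learner_probs)
    return [p / total for p in combined]


if __name__ == "__main__":
    learner = [0.5, 0.5]                  # untrained learner: uniform
    advisors = [[0.9, 0.1], [0.8, 0.2]]   # two previously learned controllers
    print(policy_intersection(learner, advisors))
```

In this sketch an untrained (uniform) learner follows the advisors' consensus from the start, while a learner that has become confident can still shift the combined distribution toward its own preferred actions.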