Reinforcement learning (RL) agents have long sought to approach the efficiency of human learning. Humans are great observers who learn by aggregating external knowledge from various sources, including by observing the policies others follow when attempting a task. Prior studies in RL have incorporated external knowledge policies to help agents improve sample efficiency. However, it remains non-trivial to arbitrarily combine and replace those policies, an essential feature for generalization and transferability. In this work, we present Knowledge-Grounded RL (KGRL), an RL paradigm that fuses multiple knowledge policies and aims for human-like efficiency and flexibility. We propose a new actor architecture for KGRL, the Knowledge-Inclusive Attention Network (KIAN), which allows free knowledge rearrangement through embedding-based attentive action prediction. Through a new design of policy distributions, KIAN also addresses entropy imbalance, a problem arising in maximum entropy KGRL that hinders an agent from efficiently exploring the environment. The experimental results demonstrate that KIAN outperforms alternative methods that incorporate external knowledge policies and achieves efficient and flexible learning. Our implementation is available at https://github.com/Pascalson/KGRL.git
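To make the idea of embedding-based attentive action prediction more concrete, the following is a minimal sketch and not the KIAN implementation from the repository. It assumes discrete actions, a state-dependent query embedding, and one key embedding per policy (the agent's own inner policy plus each external knowledge policy). The function `fuse_policies`, the scaled dot-product scoring, and all array shapes and values are hypothetical choices made for illustration only.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    z = x - np.max(x)
    e = np.exp(z)
    return e / e.sum()


def fuse_policies(query, keys, action_probs):
    """Mix per-policy action distributions with attention weights.

    query:        (d,)   hypothetical state embedding
    keys:         (n, d) one embedding per policy (assumption)
    action_probs: (n, a) each row is one policy's action distribution
    Returns the attention weights (n,) and the fused distribution (a,).
    """
    # Scaled dot-product scores between the state query and each policy key.
    scores = keys @ query / np.sqrt(query.shape[0])
    weights = softmax(scores)
    # A convex combination of distributions is itself a distribution.
    fused = weights @ action_probs
    return weights, fused


rng = np.random.default_rng(0)
d = 4
query = rng.normal(size=d)
# Row 0 stands for the inner policy, rows 1-2 for two knowledge policies.
keys = rng.normal(size=(3, d))
action_probs = np.array([
    [0.20, 0.50, 0.30],
    [0.90, 0.05, 0.05],
    [0.10, 0.10, 0.80],
])

weights, fused = fuse_policies(query, keys, action_probs)

# Removing a knowledge policy only drops its key and its action row;
# nothing else in the sketch has to change shape.
weights_small, fused_small = fuse_policies(query, keys[:2], action_probs[:2])
```

Because each policy enters the sketch only through its key and its action distribution, adding, removing, or swapping a knowledge policy does not change any fixed-size parameter. This is one plausible reading of the free knowledge rearrangement described in the abstract, stated here as an assumption.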