This paper presents a novel approach that combines inductive logic programming with reinforcement learning to improve training performance and explainability. We exploit inductive learning of answer set programs from noisy examples to learn, at each batch of experience, a set of logical rules representing an explainable approximation of the agent's policy. We then perform answer set reasoning on the learned rules to guide the exploration of the learning agent in the next batch, without requiring inefficient reward shaping and while preserving optimality through a soft bias. The entire procedure is conducted during the online execution of the reinforcement learning algorithm. We preliminarily validate the efficacy of our approach by integrating it into the Q-learning algorithm for the Pac-Man scenario on two maps of increasing complexity. Our methodology yields a significant boost in the discounted return achieved by the agent, even in the first batches of training. Moreover, inductive learning does not significantly increase the computational time required by Q-learning, and the learned rules quickly converge to an explanation of the agent's policy.