The learning process of a reinforcement learning (RL) agent remains poorly understood beyond the mathematical formulation of its learning algorithm. To address this gap, we introduce attention-oriented metrics (ATOMs) to investigate the development of an RL agent's attention during training. We tested ATOMs on three variations of a Pong game, each designed to teach the agent distinct behaviours, and complemented these tests with a behavioural assessment. Our findings reveal that ATOMs successfully delineate the attention patterns of an agent trained on each game variation, and that these differences in attention patterns translate into differences in the agent's behaviour. By monitoring ATOMs continuously during training, we observed that the agent's attention developed in phases, and that these phases were consistent across games. Finally, we noted that the agent's attention to its paddle emerged relatively late in training and coincided with a marked increase in its performance score. Overall, we believe that ATOMs could significantly enhance our understanding of RL agents' learning processes, which is essential for improving their reliability and efficiency.