Understanding the interactions of agents trained with deep reinforcement learning is crucial for deploying agents in games or the real world. In the former, unreasonable actions confuse players. In the latter, the effect is even more significant, as unexpected behavior can cause accidents with potentially grave and long-lasting consequences for the individuals involved. In this work, we propose using program synthesis to imitate reinforcement learning policies after observing a trajectory of the agent's action sequence. Programs have the advantage of being inherently interpretable and verifiable for correctness. We adapt the state-of-the-art program synthesis system DreamCoder for learning concepts in grid-based environments, specifically a navigation task and two miniature versions of the Atari games Space Invaders and Asterix. By inspecting the generated libraries, we can make inferences about the concepts the black-box agent has learned and better understand the agent's behavior. Visualizing the agent's decision-making process for the imitated sequences serves the same purpose. We evaluate our approach with three types of program synthesizers: a search-only method, a neural-guided search, and a language model fine-tuned on code.
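To make the idea of imitating a policy from an observed trajectory concrete, the following is a minimal sketch of a search-only synthesizer over a toy grid-navigation language. It is not DreamCoder and not the method evaluated in this work: the three-part program form, the condition names, the state layout `(agent_x, agent_y, goal_x, goal_y)`, and the example trajectory are all hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical toy DSL: a program is a triple
# (condition_name, action_if_true, action_if_false).
ACTIONS = ["left", "right", "up", "down"]

# Each condition maps a state (agent_x, agent_y, goal_x, goal_y) to a bool.
# The y axis is assumed to grow downward, as in most grid renderings.
CONDITIONS = {
    "goal_is_left": lambda s: s[2] < s[0],
    "goal_is_right": lambda s: s[2] > s[0],
    "goal_is_above": lambda s: s[3] < s[1],
    "goal_is_below": lambda s: s[3] > s[1],
}


def run_program(program, state):
    """Return the action the program selects in the given state."""
    cond_name, then_action, else_action = program
    return then_action if CONDITIONS[cond_name](state) else else_action


def synthesize(trajectory):
    """Enumerate all programs and return the first one that reproduces
    every observed (state, action) pair, or None if no program does."""
    for program in product(CONDITIONS, ACTIONS, ACTIONS):
        if all(run_program(program, s) == a for s, a in trajectory):
            return program
    return None


# Hypothetical trajectory of a black-box agent walking to a goal at (0, 2).
trajectory = [
    ((3, 0, 0, 2), "left"),
    ((2, 0, 0, 2), "left"),
    ((1, 0, 0, 2), "left"),
    ((0, 0, 0, 2), "down"),
    ((0, 1, 0, 2), "down"),
]

program = synthesize(trajectory)
print(program)
```

The returned triple can be read directly as a rule ("if the goal is to the left, move left, otherwise move down"), which is the sense in which a synthesized program is easier to inspect than the policy it imitates. Exhaustive enumeration like this only works for very small program spaces; neural-guided search and learned libraries are ways of coping with larger ones.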