Explaining reinforcement learning agents is challenging because policies emerge from complex reward structures and neural representations that are difficult for humans to interpret. Existing approaches often rely on curated demonstrations that expose local behaviors but provide limited insight into an agent's global strategy, leaving users to infer intent from raw observations. We propose SySLLM (Synthesized Summary using Large Language Models), a framework that reframes policy interpretation as a language-generation problem. Rather than relying on visual demonstrations, SySLLM converts spatiotemporal trajectories into structured text and prompts an LLM to generate coherent summaries describing the agent's goals, exploration style, and decision patterns. SySLLM scales to long-horizon, semantically rich environments without task-specific fine-tuning, leveraging the LLM's world knowledge and compositional reasoning to capture latent behavioral structure across policies. Expert evaluations show strong alignment with human analyses, and a large-scale user study found that 75.5% of participants preferred SySLLM summaries over state-of-the-art demonstration-based explanations. Together, these results position abstractive textual summarization as a paradigm for interpreting complex RL behavior.
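To make the trajectory-to-text step concrete, the following is a minimal Python sketch of the idea: episodes are serialized into structured text lines and folded into a single summarization prompt for an LLM. The `Step` record, the serialization format, the prompt wording, and the generic `llm` callable are illustrative assumptions for this sketch, not SySLLM's actual pipeline.

```python
# Minimal sketch of converting RL trajectories into structured text and
# prompting an LLM for an abstractive behavior summary. The field names,
# serialization format, and prompt are illustrative assumptions only.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Step:
    t: int          # timestep
    state: str      # symbolic description of the observation, e.g. "at (3, 5), key not held"
    action: str     # e.g. "move_up"
    reward: float

def trajectory_to_text(trajectory: List[Step]) -> str:
    """Serialize one spatiotemporal trajectory into structured text."""
    lines = [f"t={s.t}: state=[{s.state}] action={s.action} reward={s.reward:+.2f}"
             for s in trajectory]
    return "\n".join(lines)

def summarize_policy(trajectories: List[List[Step]],
                     llm: Callable[[str], str]) -> str:
    """Prompt a text-in/text-out LLM to summarize the agent's behavior."""
    body = "\n\n".join(f"Episode {i + 1}:\n{trajectory_to_text(tr)}"
                       for i, tr in enumerate(trajectories))
    prompt = (
        "The following are textual traces of episodes played by a reinforcement "
        "learning agent. Summarize the agent's goals, exploration style, and "
        "recurring decision patterns in a few sentences.\n\n" + body
    )
    return llm(prompt)  # `llm` is any text-in/text-out completion function
```

Because the summaries are free-form natural language, `llm` can be any completion interface (for example, a thin wrapper around a hosted chat model); no task-specific fine-tuning is assumed in this sketch.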