Text-based games (TBGs) have emerged as an important collection of NLP tasks, requiring reinforcement learning (RL) agents to combine natural language understanding with reasoning. A key challenge for agents attempting to solve such tasks is to generalize across multiple games and perform well on both seen and unseen objects. Purely deep-RL-based approaches may perform well on seen objects, but they fail to match that performance on unseen objects. Commonsense-infused deep-RL agents may work better on unseen data; unfortunately, their policies are often neither interpretable nor easily transferable. To tackle these issues, we present EXPLORER, an exploration-guided reasoning agent for textual reinforcement learning. EXPLORER is neurosymbolic in nature: it relies on a neural module for exploration and a symbolic module for exploitation. It can also learn generalized symbolic policies and perform well on unseen data. Our experiments show that EXPLORER outperforms the baseline agents on the Text-World cooking (TW-Cooking) and Text-World Commonsense (TWC) games.