Reinforcement Learning (RL) has shown a remarkable ability to learn policies for decision-making tasks. However, RL is often hindered by low sample efficiency, a lack of interpretability, and sparse supervision signals. To address these limitations, we take inspiration from the human learning process and introduce Natural Language Reinforcement Learning (NLRL), which combines RL principles with natural language representation. Specifically, NLRL redefines RL concepts such as task objectives, policy, value function, Bellman equation, and policy iteration in natural language space. We show how NLRL can be implemented in practice with recent large language models (LLMs) such as GPT-4. Initial experiments on tabular MDPs demonstrate the effectiveness, efficiency, and interpretability of the NLRL framework.
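For orientation, the classical numeric counterparts of the concepts named above (value function, Bellman equation, policy iteration) can be sketched on a tabular MDP as follows. This is a generic textbook sketch, not the NLRL method or the authors' code: the function name `policy_iteration`, the three-state chain MDP, and the discount factor 0.9 are all illustrative assumptions. NLRL, as described above, would express these quantities in natural language rather than as numeric tables.

```python
def policy_iteration(P, R, gamma=0.9, tol=1e-10):
    """Classical tabular policy iteration (illustrative sketch).

    P[s][a][s2] is the probability of moving from state s to s2 under action a.
    R[s][a] is the expected immediate reward for taking action a in state s.
    """
    n_states, n_actions = len(R), len(R[0])

    def q_value(s, a, V):
        # Bellman backup: immediate reward plus discounted expected next value.
        return R[s][a] + gamma * sum(p * V[s2] for s2, p in enumerate(P[s][a]))

    policy = [0] * n_states
    V = [0.0] * n_states
    while True:
        # Policy evaluation: sweep the Bellman expectation update to convergence.
        while True:
            delta = 0.0
            for s in range(n_states):
                v_new = q_value(s, policy[s], V)
                delta = max(delta, abs(v_new - V[s]))
                V[s] = v_new
            if delta < tol:
                break
        # Policy improvement: act greedily with respect to the evaluated values.
        new_policy = [
            max(range(n_actions), key=lambda a: q_value(s, a, V))
            for s in range(n_states)
        ]
        if new_policy == policy:
            return policy, V
        policy = new_policy


# Hypothetical 3-state chain: action 0 moves left (or stays), action 1 moves right.
# State 2 is absorbing; entering it from state 1 yields reward 1.
P = [
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]],
    [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]],
]
R = [[0.0, 0.0], [0.0, 1.0], [0.0, 0.0]]

policy, V = policy_iteration(P, R)
print(policy, V)
```

On this toy chain the greedy policy moves right from states 0 and 1, which is the behavior one would expect a language-based value description to capture as well.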