Reinforcement Learning (RL) has shown remarkable abilities in learning policies for decision-making tasks. However, RL is often hindered by low sample efficiency, a lack of interpretability, and sparse supervision signals. To tackle these limitations, we take inspiration from the human learning process and introduce Natural Language Reinforcement Learning (NLRL), which combines RL principles with natural language representation. Specifically, NLRL redefines RL concepts such as task objectives, policy, value function, the Bellman equation, and policy iteration in natural language space. We show how NLRL can be implemented in practice with recent large language models (LLMs) such as GPT-4. Initial experiments on tabular MDPs demonstrate the effectiveness, efficiency, and interpretability of the NLRL framework.