Reinforcement learning (RL) has become the de facto standard for sequential decision-making problems, improving future acting policies through feedback. However, RL algorithms may require extensive trial-and-error interaction to collect useful feedback for improvement. On the other hand, recent large language models (LLMs) have showcased impressive capabilities in language understanding and generation, yet they fall short in exploration and self-improvement for planning tasks: they lack the ability to autonomously refine their responses based on feedback. Therefore, in this paper, we study how the policy prior provided by the LLM can enhance the sample efficiency of RL algorithms. Specifically, we develop an algorithm named LINVIT that incorporates LLM guidance as a regularization factor in value-based RL. This leads to significant reductions in the amount of data needed for learning, particularly when the difference between the ideal policy and the LLM-informed policy is small; in that regime the initial policy is already close to optimal, which reduces the need for further exploration. Additionally, we present a practical algorithm, SLINVIT, that simplifies the construction of the value function and employs subgoals to reduce the search complexity. Our experiments across three interactive environments (ALFWorld, InterCode, and BlocksWorld) demonstrate that our method achieves state-of-the-art success rates and also surpasses previous RL and LLM approaches in terms of sample efficiency. Our code is available at https://github.com/agentification/Language-Integrated-VI.
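To make the idea of using LLM guidance as a regularizer in value-based RL concrete, the following is a minimal tabular sketch and not the authors' implementation. It assumes the regularizer is a KL divergence between the learned policy and a fixed prior policy (standing in for the LLM), weighted by a hypothetical coefficient `lam`; the abstract only states that LLM guidance acts as "a regularization factor", so the KL form, the toy two-state MDP, and all names below are illustrative assumptions. Under that assumption, the regularized backup has the closed form V(s) = lam * log sum_a prior(a|s) * exp(Q(s,a) / lam), with policy proportional to prior(a|s) * exp(Q(s,a) / lam).

```python
import math


def soft_backup(q_values, prior_probs, lam):
    """KL-regularized backup for one state (assumes the prior has full support).

    Maximizes sum_a pi(a) * Q(a) - lam * KL(pi || prior) over pi in closed form.
    """
    m = max(q_values)  # subtract the max for numerical stability
    weights = [p * math.exp((q - m) / lam) for q, p in zip(q_values, prior_probs)]
    z = sum(weights)
    value = m + lam * math.log(z)
    policy = [w / z for w in weights]
    return value, policy


def regularized_value_iteration(transitions, rewards, prior,
                                lam=1.0, gamma=0.9, iters=200):
    """Tabular value iteration with a prior-policy (KL) regularizer.

    transitions[s][a] is a list of (probability, next_state) pairs,
    rewards[s][a] is a scalar, and prior[s][a] is the prior's action probability.
    """
    n_states = len(transitions)
    values = [0.0] * n_states
    policy = [list(p) for p in prior]
    for _ in range(iters):
        new_values = []
        for s in range(n_states):
            q = [
                rewards[s][a]
                + gamma * sum(p * values[s2] for p, s2 in transitions[s][a])
                for a in range(len(transitions[s]))
            ]
            v, policy[s] = soft_backup(q, prior[s], lam)
            new_values.append(v)
        values = new_values
    return values, policy


# Hypothetical toy MDP: in state 0, action 1 reaches the absorbing goal
# state 1 with reward 1, while action 0 stays in state 0 with reward 0.
transitions = [
    [[(1.0, 0)], [(1.0, 1)]],
    [[(1.0, 1)], [(1.0, 1)]],
]
rewards = [[0.0, 1.0], [0.0, 0.0]]

# Two stand-ins for an LLM prior: one close to optimal, one far from it.
good_prior = [[0.1, 0.9], [0.5, 0.5]]
bad_prior = [[0.9, 0.1], [0.5, 0.5]]

V_good, pi_good = regularized_value_iteration(transitions, rewards, good_prior)
V_bad, pi_bad = regularized_value_iteration(transitions, rewards, bad_prior)

print("good prior:", V_good[0], pi_good[0])
print("bad prior: ", V_bad[0], pi_bad[0])
```

In this sketch, a prior that already favors the rewarding action yields a regularized policy that concentrates on it even more strongly, which loosely mirrors the abstract's point that less exploration is needed when the LLM-informed policy is close to the ideal one. A larger `lam` keeps the policy nearer the prior, while a smaller `lam` lets the estimated values dominate.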