Large Language Models (LLMs) have demonstrated significant success across various domains. However, applying them to complex decision-making tasks frequently requires intricate prompt engineering or fine-tuning, which generalizes poorly to unseen downstream tasks and places heavy demands on computational resources. Meanwhile, Reinforcement Learning (RL) is effective for decision-making problems but struggles in environments with sparse rewards, such as open-world games. To overcome these challenges, we introduce AdaRefiner, a novel framework designed to enhance the synergy between LLMs and RL feedback. The key component of AdaRefiner is a lightweight Adapter Language Model (LM), which automatically refines task comprehension based on feedback from RL agents. This method mitigates the need for intricate prompt engineering and intensive LLM fine-tuning while maintaining the LLMs' generalization abilities and enhancing their decision-making capabilities in downstream tasks. Empirical evaluations of AdaRefiner on 22 diverse tasks within the open-world game Crafter demonstrate its superior effectiveness, especially in guiding agents towards higher-level and common-sense skills. Our work contributes to the automatic self-refinement of LLMs with RL feedback, offering a more adaptable and efficient solution for complex decision-making problems.
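To make the interaction described above concrete, the following is a minimal sketch of the feedback loop, not the AdaRefiner implementation. It assumes a three-way split between an adapter model, a decision LLM, and an RL agent. The names `Feedback`, `refinement_loop`, the stub functions, and the three task strings are all hypothetical, chosen only for illustration. The sketch omits any training of the adapter or the agent and shows only how feedback could flow back into the prompt.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Feedback:
    """Hypothetical summary of the RL agent's recent behaviour."""
    achieved: List[str]  # tasks the agent has completed so far
    score: float         # illustrative scalar: fraction of known tasks completed


def refinement_loop(
    adapter_lm: Callable[[Feedback], str],
    decision_llm: Callable[[str], List[str]],
    run_agent: Callable[[List[str]], Feedback],
    rounds: int,
) -> List[Feedback]:
    """Run `rounds` iterations of: feedback -> refined prompt -> goals -> new feedback."""
    feedback = Feedback(achieved=[], score=0.0)
    history: List[Feedback] = []
    for _ in range(rounds):
        refined_prompt = adapter_lm(feedback)    # adapter rewrites the task description
        goals = decision_llm(refined_prompt)     # LLM proposes goals from that description
        feedback = run_agent(goals)              # agent acts and reports what it achieved
        history.append(feedback)
    return history


# --- Stubs standing in for real models (assumptions, for illustration only) ---

TASKS = ["collect wood", "place table", "make wood pickaxe"]


def stub_adapter(fb: Feedback) -> str:
    """Summarise the agent's progress as plain text for the decision LLM."""
    done = ", ".join(fb.achieved) or "nothing yet"
    return f"Agent has achieved: {done}. Suggest next goals."


def stub_llm(prompt: str) -> List[str]:
    """Suggest every known task that the prompt does not report as achieved."""
    return [task for task in TASKS if task not in prompt]


def make_stub_agent() -> Callable[[List[str]], Feedback]:
    """Return an agent that completes the first suggested goal on each call."""
    achieved: List[str] = []

    def run(goals: List[str]) -> Feedback:
        if goals:
            achieved.append(goals[0])
        return Feedback(achieved=list(achieved), score=len(achieved) / len(TASKS))

    return run


if __name__ == "__main__":
    for step, fb in enumerate(
        refinement_loop(stub_adapter, stub_llm, make_stub_agent(), rounds=3), start=1
    ):
        print(step, fb.achieved, round(fb.score, 2))
```

Passing the three components as plain callables keeps the loop independent of any particular model or environment, so a real adapter LM, LLM client, or Crafter agent could be substituted for the stubs without changing `refinement_loop`.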