Large Language Models (LLMs) have demonstrated significant success across various domains. However, applying them to complex decision-making tasks often requires intricate prompt engineering or fine-tuning, which leads to difficulties on unseen downstream tasks and places heavy demands on computational resources. Meanwhile, Reinforcement Learning (RL) is effective for decision-making problems but struggles in environments with sparse rewards, such as open-world games. To overcome these challenges, we introduce AdaRefiner, a novel framework designed to enhance the synergy between LLMs and RL feedback. The key component of AdaRefiner is a lightweight Adapter Language Model (LM), which automatically refines task comprehension based on feedback from RL agents. This method mitigates the need for intricate prompt engineering and intensive LLM fine-tuning while maintaining the LLMs' generalization abilities and enhancing their decision-making capabilities in downstream tasks. Empirical evaluations of AdaRefiner on 22 diverse tasks within the open-world game Crafter demonstrate its superior effectiveness, especially in guiding agents towards higher-level and common-sense skills. Our work contributes to the automatic self-refinement of LLMs with RL feedback, offering a more adaptable and efficient solution for complex decision-making problems.
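The feedback loop described above can be illustrated with a minimal sketch. It is not the AdaRefiner implementation: the class `RefinementLoop`, the three callables, the representation of feedback as per-task success rates, and the toy task names and numbers are all assumptions made for illustration, and the stand-in functions replace what would be a real Adapter LM, a real LLM, and a trained RL agent.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Assumption: RL feedback is summarized as a success rate per task.
Feedback = Dict[str, float]


@dataclass
class RefinementLoop:
    """Hypothetical wiring of adapter LM -> LLM -> RL agent -> feedback."""
    adapter_lm: Callable[[Feedback], str]      # feedback -> refined task comprehension
    llm: Callable[[str], List[str]]            # comprehension -> guidance (sub-goals)
    rl_agent: Callable[[List[str]], Feedback]  # guidance -> feedback after acting
    history: List[str] = field(default_factory=list)

    def step(self, feedback: Feedback) -> Feedback:
        comprehension = self.adapter_lm(feedback)  # adapter refines the task description
        self.history.append(comprehension)
        guidance = self.llm(comprehension)         # the LLM itself is never fine-tuned here
        return self.rl_agent(guidance)             # agent acts and reports new feedback

    def run(self, initial: Feedback, rounds: int) -> Feedback:
        feedback = initial
        for _ in range(rounds):
            feedback = self.step(feedback)
        return feedback


PREFIX = "Agent still struggles with: "


def toy_adapter(feedback: Feedback) -> str:
    # Stand-in for the Adapter LM: name the tasks with a low success rate.
    unsolved = sorted(task for task, rate in feedback.items() if rate < 0.5)
    return PREFIX + ", ".join(unsolved) if unsolved else "All tasks look solved."


def toy_llm(comprehension: str) -> List[str]:
    # Stand-in for the LLM: turn the comprehension text into sub-goals.
    if not comprehension.startswith(PREFIX):
        return []
    return comprehension[len(PREFIX):].split(", ")


def make_toy_agent(state: Feedback) -> Callable[[List[str]], Feedback]:
    # Stand-in for the RL agent: guided tasks improve by a fixed amount.
    def act(guidance: List[str]) -> Feedback:
        for task in guidance:
            state[task] = min(1.0, state[task] + 0.25)
        return dict(state)
    return act


# Illustrative task names and success rates, not results from the paper.
initial = {"collect_wood": 0.75, "place_table": 0.25, "make_wood_pickaxe": 0.0}
loop = RefinementLoop(toy_adapter, toy_llm, make_toy_agent(dict(initial)))
final = loop.run(initial, rounds=2)
print(loop.history)
print(final)
```

In this sketch only the adapter's output changes from round to round, which mirrors the idea that the lightweight model adapts to RL feedback while the larger LLM is left untouched.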