Large language models (LLMs) exhibit robust problem-solving capabilities across diverse tasks. However, most LLM-based agents are designed as task-specific solvers that rely on sophisticated prompt engineering, rather than as agents capable of learning and evolving through interaction. Such task solvers require manually crafted prompts to convey task rules and regulate LLM behavior, which leaves them inherently unable to handle complex, dynamic scenarios such as large interactive games. In light of this, we propose Agent-Pro: an LLM-based Agent with Policy-level Reflection and Optimization that can learn a wealth of expertise from interactive experiences and progressively improve its behavioral policy. Specifically, it involves a dynamic belief generation and reflection process for policy evolution. Rather than reflecting at the action level, Agent-Pro iteratively reflects on past trajectories and beliefs, correcting its irrational beliefs to obtain a better policy. Moreover, a depth-first search is employed for policy optimization, ensuring continual improvement in policy payoffs. Agent-Pro is evaluated on two games, Blackjack and Texas Hold'em, where it outperforms vanilla LLMs and specialized models. Our results show that Agent-Pro can learn and evolve in complex and dynamic scenarios, which also benefits numerous LLM-based applications.
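The depth-first search over policies mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `dfs_policy_search`, `toy_propose`, and `toy_payoff` are hypothetical, the policy is reduced to a single Blackjack-style "stand threshold" integer, and the payoff is a made-up deterministic function. In Agent-Pro the candidate policies would presumably come from LLM reflection and the payoffs from actual game play. The sketch only shows the control flow: a candidate is kept, and searched further, only if it improves on its parent's payoff.

```python
from typing import Callable, Iterable, Tuple, TypeVar

P = TypeVar("P")  # a policy; its representation is left open


def dfs_policy_search(
    policy: P,
    propose: Callable[[P], Iterable[P]],
    evaluate: Callable[[P], float],
    max_depth: int = 5,
) -> Tuple[P, float]:
    """Depth-first search that descends only into candidates that improve payoff."""
    best_policy, best_payoff = policy, evaluate(policy)

    def visit(current: P, current_payoff: float, depth: int) -> None:
        nonlocal best_policy, best_payoff
        if depth == max_depth:
            return
        for candidate in propose(current):
            payoff = evaluate(candidate)
            if payoff <= current_payoff:
                continue  # reject: no improvement over the parent policy
            if payoff > best_payoff:
                best_policy, best_payoff = candidate, payoff
            visit(candidate, payoff, depth + 1)  # go deeper before trying siblings

    visit(policy, best_payoff, 0)
    return best_policy, best_payoff


# Toy stand-ins (assumptions for illustration only).
def toy_propose(threshold: int) -> Iterable[int]:
    # Hypothetical "reflection" step: nudge the stand threshold down or up.
    return [threshold - 1, threshold + 1]


def toy_payoff(threshold: int) -> float:
    # Made-up payoff that peaks when the threshold is 17.
    return -((threshold - 17) ** 2)


result = dfs_policy_search(12, toy_propose, toy_payoff)
print(result)
```

Because a candidate is accepted only when its payoff exceeds that of its parent, the payoff along any accepted branch never decreases, which is one simple way to read the "continual improvement in policy payoffs" described above.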