Large language models (LLMs) have advanced remarkably, prompting significant efforts to develop them into agents capable of executing intricate multi-step decision-making tasks beyond traditional NLP applications. Existing approaches to LLM-based decision-making predominantly rely on manually designed external performance metrics to guide the decision-making process. However, relying on external performance metrics as a prior is problematic in real-world scenarios, where such a prior may be unavailable, flawed, or even erroneous. For genuinely autonomous decision-making, the agent must develop its rationality from its posterior experiences so that it can judge decisions independently. Central to the development of rationality is the construction of an internalized utility judgment capable of assigning a numerical utility to each decision. This paper proposes RadAgent (Rational Decision-Making Agent), which develops its rationality through an iterative framework of Experience Exploration and Utility Learning. Within this framework, Elo-based Utility Construction assigns Elo scores to individual decision steps, judging their utilities via pairwise comparisons. These Elo scores then guide the decision-making process toward optimal outcomes. Experimental results on the ToolBench dataset demonstrate RadAgent's superiority over baselines, with an improvement of over 10% in Pass Rate on diverse tasks. RadAgent also offers higher-quality solutions and reduces costs (ChatGPT API calls), highlighting its effectiveness and efficiency.
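To make the idea of assigning Elo scores through pairwise comparisons concrete, the following is a minimal sketch of the standard Elo update as used in chess ratings. The abstract does not state RadAgent's exact update rule, so the constants (`k`, `scale`), the function names, and the way a comparison outcome is supplied are all assumptions made for illustration, not the authors' implementation.

```python
def expected_score(r_a: float, r_b: float, scale: float = 400.0) -> float:
    """Probability that decision A is preferred over decision B
    under the standard Elo model (assumed, not taken from the paper)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / scale))


def elo_update(r_a: float, r_b: float, outcome_a: float,
               k: float = 32.0, scale: float = 400.0) -> tuple[float, float]:
    """Update two Elo scores after one pairwise comparison.

    outcome_a is 1.0 if A was judged better, 0.0 if B was judged better,
    and 0.5 for a tie. How the judgment is produced is outside this sketch.
    k and scale are hypothetical defaults borrowed from chess ratings.
    """
    e_a = expected_score(r_a, r_b, scale)
    delta = k * (outcome_a - e_a)
    # The update is zero-sum: whatever A gains, B loses.
    return r_a + delta, r_b - delta


# Two decision steps start with the same (hypothetical) initial score,
# and step A is judged better in one comparison.
score_a, score_b = elo_update(1000.0, 1000.0, outcome_a=1.0)
print(score_a, score_b)
```

Under these assumptions, repeated comparisons move the score of a consistently preferred decision step upward, and the resulting scores can serve as the numerical utilities that guide which step to explore next.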