While large language models (LLMs) exhibit impressive language understanding and in-context learning abilities, their decision making on real-world tasks still relies heavily on task-specific expert knowledge. To unleash the potential of LLMs as autonomous decision makers, this paper presents JuDec, an approach that endows LLMs with a self-judgment ability so that they can judge and explore candidate decisions autonomously. Specifically, JuDec introduces an Elo-based Self-Judgment Mechanism that assigns Elo scores to decision steps through pairwise comparisons between solutions, uses these scores to judge the value and utility of each step, and guides the decision-searching process toward the optimal solution accordingly. Experimental results on the ToolBench dataset demonstrate JuDec's superiority over baselines, with an improvement of over 10% in Pass Rate on diverse tasks. JuDec also produces higher-quality solutions at lower cost (fewer ChatGPT API calls), highlighting both its effectiveness and its efficiency.
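To make the pairwise scoring idea concrete, the following is a minimal sketch of a standard Elo update applied to two candidate decision steps. It is not JuDec's actual implementation: the abstract does not give the update rule, so the logistic scale of 400, the K-factor of 32, the initial rating of 1000, and the names `expected_score`, `elo_update`, and `pick_best` are all assumptions taken from the conventional Elo rating system. In practice the comparison outcome would presumably come from the LLM judging two solutions; here it is passed in as a number.

```python
def expected_score(rating_a, rating_b, scale=400.0):
    """Probability that A is preferred over B under the standard Elo model.

    The scale of 400 is the conventional Elo value (an assumption here).
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


def elo_update(rating_a, rating_b, outcome_a, k=32.0):
    """Update both ratings after one pairwise comparison.

    outcome_a is 1.0 if A was judged better, 0.0 if B was, 0.5 for a tie.
    The K-factor of 32 is a conventional choice, not taken from the paper.
    """
    delta = k * (outcome_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


def pick_best(ratings):
    """Return the key of the candidate with the highest Elo score."""
    return max(ratings, key=ratings.get)


# Hypothetical example: two decision steps start with equal ratings,
# and a pairwise comparison judges step_a to be the better one.
ratings = {"step_a": 1000.0, "step_b": 1000.0}
ratings["step_a"], ratings["step_b"] = elo_update(
    ratings["step_a"], ratings["step_b"], outcome_a=1.0
)
print(ratings)             # {'step_a': 1016.0, 'step_b': 984.0}
print(pick_best(ratings))  # step_a
```

Because the update is zero-sum, the winner gains exactly what the loser gives up, and an unexpected win against a higher-rated candidate moves the scores more than an expected one. A search procedure could then expand the highest-rated step first.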