This paper introduces BI-Directional DEliberation Reasoning (BIDDER), a novel reasoning approach for enhancing the decision rationality of language models. Traditional reasoning methods typically rely on historical information and employ a uni-directional (left-to-right) reasoning strategy. Without bi-directional deliberation, such methods have limited awareness of potential future outcomes and integrate historical context insufficiently, which leads to suboptimal decisions. BIDDER addresses this gap by incorporating principles of rational decision-making, specifically managing uncertainty and predicting expected utility. Our approach involves three key processes: (1) inferring hidden states from historical data to represent uncertain information in the decision-making process; (2) using these hidden states to predict potential future states and outcomes; and (3) integrating historical information (past contexts) and long-term outcomes (future contexts) to inform reasoning. By leveraging bi-directional reasoning, BIDDER explores both past and future contexts thoroughly, leading to more informed and rational decisions. We tested BIDDER's effectiveness in two well-defined scenarios: Poker (Limit Texas Hold'em) and Negotiation. Our experiments demonstrate that BIDDER significantly improves the decision-making capabilities of LLMs and LLM agents.
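The three processes named in the abstract (inferring hidden states from history, predicting outcomes from those states, and combining both to decide) can be illustrated with a toy sketch. This is not the authors' implementation: BIDDER performs these steps with a language model, whereas the sketch below substitutes a simple Bayesian update and an expected-utility comparison. All names (`infer_hidden_posterior`, `expected_utility`, `choose_action`) and all probabilities and payoffs are hypothetical values chosen for illustration.

```python
from typing import Dict, List, Tuple

def infer_hidden_posterior(
    prior: Dict[str, float],
    likelihood: Dict[str, Dict[str, float]],
    history: List[str],
) -> Dict[str, float]:
    """Past context: Bayes update P(h | history) ∝ P(h) * Π P(obs | h)."""
    posterior = dict(prior)
    for obs in history:
        for h in posterior:
            posterior[h] *= likelihood[h][obs]
    total = sum(posterior.values())
    return {h: p / total for h, p in posterior.items()}

def expected_utility(
    posterior: Dict[str, float],
    payoff: Dict[Tuple[str, str], float],
    action: str,
) -> float:
    """Future context: average the payoff of an action over hidden states."""
    return sum(p * payoff[(h, action)] for h, p in posterior.items())

def choose_action(
    posterior: Dict[str, float],
    payoff: Dict[Tuple[str, str], float],
    actions: List[str],
) -> str:
    """Decision: pick the action with the highest expected utility."""
    return max(actions, key=lambda a: expected_utility(posterior, payoff, a))

# Hypothetical poker-like example: the hidden state is the opponent's hand.
prior = {"strong": 0.5, "weak": 0.5}
likelihood = {  # assumed P(observed opponent move | hidden hand)
    "strong": {"raise": 0.8, "call": 0.2},
    "weak": {"raise": 0.3, "call": 0.7},
}
history = ["raise", "raise"]  # opponent moves observed so far
payoff = {  # assumed payoff for (hidden hand, our action)
    ("strong", "call"): -4.0,
    ("strong", "fold"): -1.0,
    ("weak", "call"): 4.0,
    ("weak", "fold"): -1.0,
}

posterior = infer_hidden_posterior(prior, likelihood, history)
best = choose_action(posterior, payoff, ["call", "fold"])
print(posterior, best)
```

With these assumed numbers, two observed raises shift most of the probability mass to the "strong" hand, so folding has the higher expected utility than calling.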