Feedback-Aware Monte Carlo Tree Search for Efficient Information Seeking in Goal-Oriented Conversations

Effective decision-making and problem-solving in conversational systems require the ability to identify and acquire missing information through targeted questioning. A key challenge is efficiently narrowing a large space of possible outcomes by posing questions that minimize uncertainty. To address this, we introduce a novel framework that uses Large Language Models (LLMs) to generate information-seeking questions and Monte Carlo Tree Search (MCTS) to strategically select the questions that maximize information gain, as part of inference-time planning. Our primary contribution is a hierarchical feedback mechanism that exploits past interaction patterns to guide future strategy. Specifically, each new problem is mapped to a cluster based on semantic similarity, and our UCT (Upper Confidence bound for Trees) formulation adds a cluster-specific bonus reward that prioritizes question trajectories that have proven effective for similar problems in the past. Extensive empirical evaluation across medical diagnosis and technical troubleshooting domains shows that our method achieves an average 12% improvement in success rate and about a 10x reduction in the number of LLM calls made for planning per conversation, compared to the state of the art. An additional 8% gain in success rate is observed on average when we start with a constrained set of possibilities. Our results underscore the efficacy of feedback-aware MCTS in enhancing information seeking in goal-oriented dialogues.
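
To make the cluster-specific bonus concrete, the following is a minimal sketch of how a UCT score could be extended with such a term. The abstract does not give the exact formulation, so everything here is an assumption for illustration: the additive form of the bonus, the names `Node`, `uct_score`, and `select_child`, the weight `lam`, the exploration constant `c`, and the representation of the bonus as a precomputed table mapping questions to past success scores for the problem's cluster.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """One candidate question in the search tree (hypothetical structure)."""
    question: str
    visits: int = 0
    total_reward: float = 0.0
    children: list = field(default_factory=list)


def uct_score(child, parent_visits, cluster_bonus, c=1.4, lam=0.5):
    """Standard UCT plus an assumed additive cluster-specific bonus.

    cluster_bonus maps a question to a score summarizing how well it worked
    on past problems in the same cluster (assumed to be precomputed).
    """
    if child.visits == 0:
        # Unvisited children are always tried first.
        return math.inf
    exploit = child.total_reward / child.visits
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    bonus = lam * cluster_bonus.get(child.question, 0.0)
    return exploit + explore + bonus


def select_child(parent, cluster_bonus, c=1.4, lam=0.5):
    """Pick the child question with the highest bonus-augmented UCT score."""
    return max(
        parent.children,
        key=lambda ch: uct_score(ch, parent.visits, cluster_bonus, c, lam),
    )


# Usage: two questions with identical search statistics; only the
# feedback table for the current cluster distinguishes them.
root = Node("root", visits=10)
root.children = [
    Node("Do you have a fever?", visits=5, total_reward=2.5),
    Node("Is the pain localized?", visits=5, total_reward=2.5),
]
feedback = {"Is the pain localized?": 1.0}
chosen = select_child(root, feedback)
```

With an empty feedback table the two children tie and the first is returned; with the table above, the question that succeeded on similar past problems is preferred. In this sketch, setting `lam` to zero recovers plain UCT.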
