Large language models (LLMs) have shown strong capabilities across diverse decision-making tasks. However, existing approaches often overlook the specialization differences among available models, treating all LLMs as uniformly applicable regardless of task characteristics. This limits their ability to adapt to varying reasoning demands and task complexities. In this work, we propose Task-Aware LLM Council (TALC), a task-adaptive decision framework that integrates a council of LLMs with Monte Carlo Tree Search (MCTS) to enable dynamic expert selection and efficient multi-step planning. Each LLM is equipped with a structured success memory profile derived from prior task trajectories, enabling semantic matching between current reasoning context and past successes. At each decision point, TALC routes control to the most contextually appropriate model and estimates node value using a dual-signal mechanism that fuses model-based evaluations with historical utility scores. These signals are adaptively weighted based on intra-node variance and used to guide MCTS selection, allowing the system to balance exploration depth with planning confidence. Experiments on WebShop, HumanEval, and the Game of 24 demonstrate that TALC achieves superior task success rates and improved search efficiency compared to strong baselines, validating the benefits of specialization-aware routing and adaptive planning.
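The dual-signal value estimate described above can be illustrated with a minimal sketch. This is a hypothetical reconstruction, not the paper's implementation: the function name, the variance-to-weight mapping, and the specific fusion rule are all assumptions made for illustration. It shows the core idea of fusing model-based evaluations with a historical utility score, with the weight adapting to intra-node variance.

```python
import statistics

def fused_node_value(model_scores, historical_utility):
    """Hypothetical sketch of TALC's dual-signal value estimate.

    model_scores: per-model evaluations of a search node (assumed in [0, 1]).
    historical_utility: utility score derived from past success trajectories.

    Assumption: when the council's evaluations disagree (high intra-node
    variance), weight shifts toward the historical utility signal.
    """
    mean_model = statistics.fmean(model_scores)
    variance = statistics.pvariance(model_scores)
    # Map variance to a weight in (0, 1]: zero variance -> trust the models fully.
    alpha = 1.0 / (1.0 + variance)
    return alpha * mean_model + (1.0 - alpha) * historical_utility
```

In this sketch, a unanimous council (`variance == 0`) yields `alpha == 1`, so the fused value is just the mean model evaluation; as disagreement grows, the historical utility score increasingly anchors the estimate before it guides MCTS selection.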