LLM-based agents are increasingly deployed for expert decision support, yet human-AI teams in high-stakes settings do not yet reliably outperform the best individual member working alone. We argue that this complementarity gap reflects a fundamental mismatch: current agents are trained as answer engines, not as partners in the collaborative sensemaking through which experts actually make decisions. Sensemaking (the ability to co-construct causal explanations, surface uncertainties, and adapt goals) is the key capability that current training pipelines neither explicitly develop nor evaluate. We propose Collaborative Causal Sensemaking (CCS) as a research agenda to develop this capability from the ground up, spanning new training environments that reward collaborative thinking, representations for shared human-AI mental models, and evaluation centred on trust and complementarity. Taken together, these directions shift multi-agent systems (MAS) research from building oracle-like answer engines to cultivating AI teammates that co-reason with their human partners over the causal structure of shared decisions, advancing the design of effective human-AI teams.