LLM-based agents are increasingly deployed for expert decision support, yet human-AI teams in high-stakes settings do not yet reliably outperform the best individual. We argue this complementarity gap reflects a fundamental mismatch: current agents are trained as answer engines, not as partners in the collaborative sensemaking through which experts actually make decisions. Sensemaking (the ability to co-construct causal explanations, surface uncertainties, and adapt goals) is the key capability that current training pipelines do not explicitly develop or evaluate. We propose Collaborative Causal Sensemaking (CCS) as a research agenda to develop this capability from the ground up, spanning new training environments that reward collaborative thinking, representations for shared human-AI mental models, and evaluation centred on trust and complementarity. Taken together, these directions shift multi-agent systems (MAS) research from building oracle-like answer engines to cultivating AI teammates that co-reason with their human partners over the causal structure of shared decisions, advancing the design of effective human-AI teams.