AI predictive systems are increasingly embedded in decision-making pipelines, shaping high-stakes choices once made solely by humans. Yet robust decisions under uncertainty still rely on capabilities that current AI lacks: domain knowledge not captured by data, long-horizon context, and reasoning grounded in the physical world. This gap has motivated growing efforts to design collaborative frameworks that combine the complementary strengths of humans and AI. This work advances that vision by identifying the fundamental principles of Human-AI collaboration within uncertainty quantification, a key component of reliable decision-making. We introduce Human-AI Collaborative Uncertainty Quantification, a framework that formalizes how an AI model can refine a human expert's proposed prediction set with two goals: avoiding counterfactual harm (the AI should not degrade correct human judgments) and complementarity (the AI should recover correct outcomes the human missed). At the population level, we show that the optimal collaborative prediction set follows an intuitive two-threshold structure over a single score function, extending a classical result in conformal prediction. Building on this insight, we develop practical offline and online calibration algorithms with provable distribution-free, finite-sample guarantees. The online method adapts to distribution shifts, including human behavior that evolves through interaction with AI, a phenomenon we call Human-to-AI Adaptation. Experiments across image classification, regression, and text-based medical decision-making show that collaborative prediction sets consistently outperform either agent alone, achieving higher coverage and smaller set sizes across various conditions.
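To make the two-threshold idea concrete, the following is a minimal sketch, not the paper's algorithm. It assumes a classification setting, a nonconformity score where lower means more plausible, and a group-conditional split-conformal calibration: one threshold for labels the human proposed and another for labels the human left out. The function names (`conformal_quantile`, `calibrate`, `collaborative_set`), the tolerances `eps` and `delta`, and the calibration recipe itself are all illustrative assumptions.

```python
import math
from typing import List, Sequence, Set, Tuple


def conformal_quantile(scores: Sequence[float], alpha: float) -> float:
    """Standard split-conformal quantile: the ceil((n + 1)(1 - alpha))-th
    smallest score, or +inf when too few calibration points are available."""
    n = len(scores)
    k = max(1, math.ceil((n + 1) * (1 - alpha)))
    if n == 0 or k > n:
        return math.inf
    return sorted(scores)[k - 1]


def calibrate(
    cal_scores: Sequence[Sequence[float]],  # cal_scores[i][y]: score of label y
    cal_labels: Sequence[int],              # true label of each calibration point
    cal_human_sets: Sequence[Set[int]],     # human-proposed set for each point
    eps: float,                             # assumed tolerance for dropping a correct human label
    delta: float,                           # assumed tolerance for failing to recover a missed label
) -> Tuple[float, float]:
    """Return (tau_in, tau_out), calibrated separately on the points where
    the human set did / did not contain the true label."""
    hit: List[float] = []
    miss: List[float] = []
    for row, y, human in zip(cal_scores, cal_labels, cal_human_sets):
        (hit if y in human else miss).append(row[y])
    return conformal_quantile(hit, eps), conformal_quantile(miss, delta)


def collaborative_set(
    scores: Sequence[float], human_set: Set[int], tau_in: float, tau_out: float
) -> Set[int]:
    """Two thresholds over one score: labels the human proposed are kept if
    score <= tau_in; labels the human omitted are added if score <= tau_out."""
    return {
        y
        for y, s in enumerate(scores)
        if s <= (tau_in if y in human_set else tau_out)
    }
```

In this sketch, `tau_in` governs how readily the AI prunes a label the human proposed (the counterfactual-harm side), while `tau_out` governs how readily it adds a label the human omitted (the complementarity side). The paper's actual offline and online procedures, and the exact form of its guarantees, may differ from this recipe.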