Conformal prediction under ambiguous ground truth

Conformal prediction (CP) enables rigorous uncertainty quantification by constructing a prediction set $C(X)$ satisfying $\mathbb{P}(Y \in C(X))\geq 1-\alpha$ for a user-chosen $\alpha \in [0,1]$, relying on calibration data $(X_1,Y_1),\ldots,(X_n,Y_n)$ drawn from $\mathbb{P}=\mathbb{P}^{X} \otimes \mathbb{P}^{Y|X}$. It is typically assumed, implicitly, that $\mathbb{P}^{Y|X}$ is the "true" posterior label distribution. In many real-world scenarios, however, the labels $Y_1,\ldots,Y_n$ are obtained by aggregating expert opinions through a voting procedure, which yields a one-hot distribution $\mathbb{P}_{vote}^{Y|X}$. For such "voted" labels, the CP guarantee therefore holds with respect to $\mathbb{P}_{vote}=\mathbb{P}^X \otimes \mathbb{P}_{vote}^{Y|X}$ rather than the true distribution $\mathbb{P}$. When the ground truth labels are unambiguous, the distinction between $\mathbb{P}_{vote}$ and $\mathbb{P}$ is irrelevant. When experts disagree because the labels are ambiguous, however, approximating $\mathbb{P}^{Y|X}$ with the one-hot distribution $\mathbb{P}_{vote}^{Y|X}$ ignores this uncertainty. In this paper, we propose to leverage the expert opinions to approximate $\mathbb{P}^{Y|X}$ with a non-degenerate distribution $\mathbb{P}_{agg}^{Y|X}$. We develop Monte Carlo CP procedures that provide guarantees with respect to $\mathbb{P}_{agg}=\mathbb{P}^X \otimes \mathbb{P}_{agg}^{Y|X}$ by sampling multiple synthetic pseudo-labels from $\mathbb{P}_{agg}^{Y|X}$ for each calibration example $X_1,\ldots,X_n$. In a case study of skin condition classification with significant disagreement among expert annotators, we show that applying CP with respect to $\mathbb{P}_{vote}$ under-covers the expert annotations: calibrated for $72\%$ coverage, it falls short by $10\%$ on average. Our Monte Carlo CP closes this gap both empirically and theoretically.
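The sampling idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract specifies neither the conformity score nor the exact quantile rule, so the sketch assumes the score is the model's predicted probability of the sampled label and that the threshold is a conservative empirical quantile over the pooled scores. The names `monte_carlo_threshold`, `prediction_set`, `probs`, and `agg` are hypothetical.

```python
import math
import random


def monte_carlo_threshold(probs, agg, alpha, m, rng):
    """Calibrate a threshold from pseudo-labels sampled from P_agg(Y|X).

    probs: per-example model probabilities (list of lists), an assumed input.
    agg:   per-example aggregated expert distributions approximating P_agg(Y|X).
    m:     number of pseudo-labels sampled per calibration example.
    """
    scores = []
    for p_model, p_agg in zip(probs, agg):
        # Draw m synthetic pseudo-labels for this calibration example.
        labels = rng.choices(range(len(p_agg)), weights=p_agg, k=m)
        # Assumed conformity score: model probability of the sampled label.
        scores.extend(p_model[y] for y in labels)
    scores.sort()
    # Conservative rank: at least a (1 - alpha) fraction of the pooled
    # scores is greater than or equal to the returned threshold.
    j = min(math.floor(alpha * (len(scores) + 1)), len(scores))
    tau = scores[j - 1] if j >= 1 else float("-inf")
    return tau, scores


def prediction_set(p_model, tau):
    """Return all labels whose model probability reaches the threshold."""
    return [y for y, p in enumerate(p_model) if p >= tau]
```

A call such as `monte_carlo_threshold(probs, agg, alpha=0.1, m=10, rng=random.Random(0))` returns the threshold together with the pooled scores. Pooling the `m` samples per example breaks exchangeability between scores, so any formal guarantee depends on the specific correction analysed in the paper, which this sketch does not reproduce.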

