Conformal Prediction (CP) enables rigorous uncertainty quantification by constructing a prediction set $C(X)$ satisfying $\mathbb{P}(Y \in C(X))\geq 1-\alpha$ for a user-chosen $\alpha \in [0,1]$, relying on calibration data $(X_1,Y_1),\ldots,(X_n,Y_n)$ drawn from $\mathbb{P}=\mathbb{P}^{X} \otimes \mathbb{P}^{Y|X}$. It is typically implicitly assumed that $\mathbb{P}^{Y|X}$ is the "true" posterior label distribution. However, in many real-world scenarios, the labels $Y_1,\ldots,Y_n$ are obtained by aggregating expert opinions using a voting procedure, resulting in a one-hot distribution $\mathbb{P}_{vote}^{Y|X}$. For such "voted" labels, CP guarantees therefore hold w.r.t. $\mathbb{P}_{vote}=\mathbb{P}^X \otimes \mathbb{P}_{vote}^{Y|X}$ rather than the true distribution $\mathbb{P}$. When ground truth labels are unambiguous, the distinction between $\mathbb{P}_{vote}$ and $\mathbb{P}$ is irrelevant. However, when experts disagree because labels are ambiguous, approximating $\mathbb{P}^{Y|X}$ with a one-hot distribution $\mathbb{P}_{vote}^{Y|X}$ ignores this uncertainty. In this paper, we propose to leverage expert opinions to approximate $\mathbb{P}^{Y|X}$ using a non-degenerate distribution $\mathbb{P}_{agg}^{Y|X}$. We develop Monte Carlo CP procedures which provide guarantees w.r.t. $\mathbb{P}_{agg}=\mathbb{P}^X \otimes \mathbb{P}_{agg}^{Y|X}$ by sampling multiple synthetic pseudo-labels from $\mathbb{P}_{agg}^{Y|X}$ for each calibration example $X_1,\ldots,X_n$. In a case study of skin condition classification with significant disagreement among expert annotators, we show that applying CP w.r.t. $\mathbb{P}_{vote}$ under-covers expert annotations: calibrated for $72\%$ coverage, it falls short by $10\%$ on average; our Monte Carlo CP closes this gap both empirically and theoretically.
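To make the sampling step concrete, the following is a minimal sketch of Monte Carlo calibration, not the paper's exact procedure. It assumes a classifier that outputs class probabilities, an aggregated expert distribution per calibration example standing in for $\mathbb{P}_{agg}^{Y|X}$, and a simple thresholding conformity score (the predicted probability of the label). The function names, the parameter `m` (pseudo-labels per example), and the choice of pooled quantile are illustrative assumptions; in particular, the quantile below is the standard split-CP rank applied to the pooled scores and does not reproduce the finite-sample correction behind the paper's guarantee.

```python
import numpy as np


def sample_pseudo_labels(agg_dists, m, rng):
    """Draw m pseudo-labels per calibration example from its aggregated
    label distribution (each row of agg_dists is assumed to sum to 1)."""
    n, num_classes = agg_dists.shape
    return np.stack(
        [rng.choice(num_classes, size=m, p=agg_dists[i]) for i in range(n)]
    )  # shape (n, m)


def monte_carlo_threshold(probs, agg_dists, alpha, m, rng):
    """Calibrate a threshold tau on the pooled n*m conformity scores.

    probs:     (n, K) predicted class probabilities on calibration inputs
    agg_dists: (n, K) aggregated expert distributions (assumed given)
    """
    n = probs.shape[0]
    labels = sample_pseudo_labels(agg_dists, m, rng)
    # Conformity score: predicted probability of each sampled pseudo-label.
    scores = np.sort(np.take_along_axis(probs, labels, axis=1).ravel())
    # Split-CP rank floor(alpha * (n + 1)), scaled by m for the pooled scores.
    # Illustrative assumption, not the paper's corrected quantile.
    k = min(int(np.floor(alpha * (n + 1))), n)
    if k < 1:
        return -np.inf  # too little data for this alpha: include every class
    return scores[k * m - 1]


def prediction_set(test_probs, tau):
    """Return the indices of all classes whose probability reaches tau."""
    return np.flatnonzero(test_probs >= tau)
```

With `m = 1` and one-hot rows in `agg_dists`, this reduces to ordinary split CP on voted labels, which is the baseline the abstract contrasts against.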