In safety-critical classification tasks, conformal prediction enables rigorous uncertainty quantification by providing confidence sets that include the true class with a user-specified probability. This generally assumes the availability of a held-out calibration set with ground truth labels. Unfortunately, in many domains such labels are difficult to obtain and are usually approximated by aggregating expert opinions. In fact, this holds true for almost all datasets, including well-known ones such as CIFAR and ImageNet. Applying conformal prediction with such labels underestimates uncertainty. Indeed, when expert opinions cannot be resolved, the labels are inherently ambiguous. That is, we do not have "crisp", definitive ground truth labels, and this uncertainty should be taken into account during calibration. In this paper, we develop a conformal prediction framework for such ambiguous ground truth settings, which relies on an approximation of the underlying posterior distribution of labels given inputs. We demonstrate our methodology on synthetic and real datasets, including a case study of skin condition classification in dermatology.
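For orientation, the conventional split conformal procedure that the abstract takes as its starting point can be sketched as follows. This is only a minimal sketch of the standard baseline with crisp calibration labels, not the framework proposed in this paper. The function names `calibrate_threshold` and `predict_sets`, the conformity score (one minus the predicted probability of the labeled class), and the array layout are illustrative assumptions.

```python
import numpy as np


def calibrate_threshold(cal_probs, cal_labels, alpha):
    """Compute a conformal threshold from a held-out calibration set.

    cal_probs:  array (n, n_classes) of predicted class probabilities.
    cal_labels: integer array (n,) of calibration labels, treated as crisp.
    alpha:      target miscoverage level, e.g. 0.1 for 90% coverage.
    """
    cal_probs = np.asarray(cal_probs, dtype=float)
    cal_labels = np.asarray(cal_labels, dtype=int)
    n = len(cal_labels)
    # Score of each calibration example: 1 - probability of its label.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        # Too few calibration points for this alpha: include every class.
        return np.inf
    return np.sort(scores)[k - 1]


def predict_sets(test_probs, threshold):
    """Return a boolean mask (n_test, n_classes); True marks included classes."""
    test_probs = np.asarray(test_probs, dtype=float)
    return (1.0 - test_probs) <= threshold
```

In this baseline the threshold depends on a single label per calibration example. When that label is itself an aggregate of disagreeing expert opinions, the calibration scores ignore the disagreement, which is the setting the abstract argues leads to underestimated uncertainty.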