An open question in \emph{Imprecise Probabilistic Machine Learning} is how to empirically derive a credal region (i.e., a closed and convex family of probabilities on the output space) from the available data, without any prior knowledge or assumptions. In classification problems, credal regions characterize the uncertainty about the distribution of the labels and can thereby provide provable guarantees under realistic assumptions. Building on previous work, we show that credal regions can be constructed directly using conformal methods. This yields a novel extension of classical conformal prediction to problems with ambiguous ground truth, that is, problems in which the labels for given inputs are not known exactly. The resulting construction enjoys desirable practical and theoretical properties: (i) conformal coverage guarantees, (ii) smaller prediction sets than classical conformal prediction regions, and (iii) disentanglement of the sources of uncertainty (epistemic and aleatoric). We empirically verify our findings on both synthetic and real datasets.