Credal sets are sets of probability distributions that are considered candidates for an imprecisely known ground-truth distribution. In machine learning, they have recently attracted attention as an appealing formalism for uncertainty representation, in particular due to their ability to represent both the aleatoric and the epistemic uncertainty in a prediction. However, the design of methods for learning credal set predictors remains a challenging problem. In this paper, we make use of conformal prediction for this purpose. More specifically, we propose a method for predicting credal sets in classification tasks, given training data labeled by probability distributions. Since our method inherits the coverage guarantees of conformal prediction, our conformal credal sets are guaranteed to be valid with high probability, without any assumptions on the model or the distribution. We demonstrate the applicability of our method to natural language inference, a highly ambiguous natural language task where it is common to obtain multiple annotations per example.
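To make the idea concrete, the following is a minimal sketch of how a credal set could be built with split conformal prediction when calibration examples are labeled by probability distributions. The abstract does not specify the nonconformity score, so the total variation distance used here is an assumption, and the function names (`tv_distance`, `calibrate_threshold`, `in_credal_set`) are hypothetical. Under this assumption, the credal set for a new input is the set of all distributions whose distance to the model's predicted distribution does not exceed the calibrated threshold.

```python
import numpy as np


def tv_distance(p, q):
    """Total variation distance between distributions along the last axis."""
    return 0.5 * np.abs(np.asarray(p, dtype=float) - np.asarray(q, dtype=float)).sum(axis=-1)


def calibrate_threshold(pred_probs, label_dists, alpha=0.1):
    """Split-conformal threshold from a held-out calibration set.

    Scores are distances between predicted and labeled distributions
    (an assumed choice of nonconformity score). The threshold is the
    ceil((n + 1) * (1 - alpha))-th smallest score.
    """
    scores = np.sort(tv_distance(pred_probs, label_dists))
    n = scores.shape[0]
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        # Too few calibration points for this alpha: the set is the whole simplex.
        return float("inf")
    return float(scores[k - 1])


def in_credal_set(candidate, pred_prob, threshold):
    """Check whether a candidate distribution lies in the credal set."""
    return bool(tv_distance(candidate, pred_prob) <= threshold)


# Toy calibration data over three classes (for NLI these could be
# entailment, neutral, and contradiction). All values are made up.
cal_pred = np.array([[1.0, 0.0, 0.0]] * 4)
cal_label = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.7, 0.3, 0.0],
])

threshold = calibrate_threshold(cal_pred, cal_label, alpha=0.25)
print("threshold:", threshold)
print(in_credal_set([0.8, 0.2, 0.0], [1.0, 0.0, 0.0], threshold))
print(in_credal_set([0.5, 0.5, 0.0], [1.0, 0.0, 0.0], threshold))
```

With a threshold calibrated this way, the standard split conformal argument gives coverage of the labeled distribution with probability at least 1 - alpha when calibration and test examples are exchangeable. Other scores, such as a divergence in place of total variation, would change the shape of the credal set but not the calibration step.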