Standard conformal prediction methods provide a marginal coverage guarantee, which means that for a random test point, the conformal prediction set contains the true label with a user-chosen probability. In many classification problems, we would like to obtain a stronger guarantee -- that for test points of a specific class, the prediction set contains the true label with the same user-chosen probability. Existing conformal prediction methods do not work well when there is a limited amount of labeled data per class, as is often the case in real applications where the number of classes is large. We propose a method called clustered conformal prediction, which clusters together classes that have "similar" conformal scores and then performs conformal prediction at the cluster level. Based on empirical evaluation across four image data sets with many (up to 1000) classes, we find that clustered conformal typically outperforms existing methods in terms of class-conditional coverage and set size metrics.
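The cluster-then-calibrate idea described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not say how "similar" scores are measured or which clustering algorithm is used, so the sketch assumes each class is summarized by a few quantiles of its calibration scores and that the classes are grouped with a small hand-written k-means. All function names (`conformal_quantile`, `kmeans`, `fit_clustered_conformal`, `predict_sets`) are hypothetical, and the data is synthetic. For brevity the same calibration data is used for both clustering and calibration; a careful implementation would likely hold out separate data for each step.

```python
import numpy as np

def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        return np.inf  # too few points: threshold admits every class
    return np.sort(scores)[k - 1]

def kmeans(X, k, n_iter=50, seed=0):
    """Tiny k-means (assumed clustering choice); returns one label per row."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def fit_clustered_conformal(cal_scores, cal_labels, n_classes,
                            alpha=0.1, n_clusters=2):
    """Cluster classes by score distribution, then calibrate per cluster.

    Assumes every class has at least one calibration point.
    """
    levels = [0.5, 0.6, 0.7, 0.8, 0.9]  # assumed embedding quantile levels
    embed = np.stack([np.quantile(cal_scores[cal_labels == y], levels)
                      for y in range(n_classes)])
    cluster_of_class = kmeans(embed, n_clusters)
    qhat = np.empty(n_clusters)
    for c in range(n_clusters):
        # Pool the calibration scores of all classes assigned to cluster c.
        members = np.flatnonzero(cluster_of_class == c)
        pooled = cal_scores[np.isin(cal_labels, members)]
        qhat[c] = conformal_quantile(pooled, alpha)
    return cluster_of_class, qhat

def predict_sets(test_scores, cluster_of_class, qhat):
    """Boolean mask (n_test, n_classes): class y is in the set when its
    score does not exceed the threshold of the cluster containing y."""
    return test_scores <= qhat[cluster_of_class][None, :]

# Synthetic demo (toy data for illustration only).
rng = np.random.default_rng(0)
n_classes, per_class = 6, 200
cal_labels = np.repeat(np.arange(n_classes), per_class)
# Classes 0-2 receive low scores, classes 3-5 receive high scores.
cal_scores = np.where(cal_labels < 3,
                      rng.uniform(0.0, 0.4, cal_labels.size),
                      rng.uniform(0.6, 1.0, cal_labels.size))
cluster_of_class, qhat = fit_clustered_conformal(
    cal_scores, cal_labels, n_classes, alpha=0.1, n_clusters=2)
test_scores = rng.uniform(0.0, 1.0, size=(5, n_classes))
sets = predict_sets(test_scores, cluster_of_class, qhat)
```

Setting `n_clusters` equal to the number of classes gives class-wise calibration, and setting it to 1 gives standard marginal calibration, so the number of clusters trades off between the two regimes the abstract contrasts.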