Standard conformal prediction methods provide a marginal coverage guarantee, which means that for a random test point, the conformal prediction set contains the true label with a user-specified probability. In many classification problems, we would like to obtain a stronger guarantee: that for test points of a specific class, the prediction set contains the true label with the same user-chosen probability. For the latter goal, existing conformal prediction methods do not work well when there is a limited amount of labeled data per class, as is often the case in real applications where the number of classes is large. We propose a method called clustered conformal prediction that clusters together classes having "similar" conformal scores and performs conformal prediction at the cluster level. Based on empirical evaluation across four image data sets with many (up to 1000) classes, we find that clustered conformal prediction typically outperforms existing methods in terms of class-conditional coverage and set size metrics.
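To make the cluster-level idea concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the reference implementation. The abstract does not say how "similar" scores are measured, so the sketch assumes each class is embedded by a few quantiles of its conformal scores and grouped with a plain k-means loop. It also assumes the labeled data is split in two: one part to form the clusters and a separate part to set the thresholds. All function names, the quantile levels, and the split are hypothetical choices made for illustration.

```python
import numpy as np

# Assumed embedding: a handful of score quantiles per class (not specified in the abstract).
QUANTILE_LEVELS = np.array([0.5, 0.6, 0.7, 0.8, 0.9])


def conformal_quantile(scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest score, or +inf if n is too small."""
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        return np.inf  # too little data: every label passes the threshold
    return float(np.sort(scores)[k - 1])


def cluster_classes(scores, labels, num_classes, num_clusters, n_iter=50, seed=0):
    """Group classes with similar score distributions (k-means on quantile embeddings)."""
    pooled = np.quantile(scores, QUANTILE_LEVELS)  # fallback for classes with no data
    emb = np.stack([
        np.quantile(scores[labels == k], QUANTILE_LEVELS) if np.any(labels == k) else pooled
        for k in range(num_classes)
    ])
    rng = np.random.default_rng(seed)
    centers = emb[rng.choice(num_classes, size=num_clusters, replace=False)]

    def nearest(cts):
        # Squared Euclidean distance from every class embedding to every center.
        return ((emb[:, None, :] - cts[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)

    for _ in range(n_iter):
        assign = nearest(centers)
        for c in range(num_clusters):
            if np.any(assign == c):  # leave the center of an empty cluster unchanged
                centers[c] = emb[assign == c].mean(axis=0)
    return nearest(centers)


def fit_class_thresholds(clu_scores, clu_labels, cal_scores, cal_labels,
                         num_classes, num_clusters, alpha):
    """Cluster classes on one split, then calibrate one threshold per cluster on the other."""
    assign = cluster_classes(clu_scores, clu_labels, num_classes, num_clusters)
    cal_cluster = assign[cal_labels]  # cluster of each calibration point's true class
    qhat = np.array([
        conformal_quantile(cal_scores[cal_cluster == c], alpha)
        for c in range(num_clusters)
    ])
    return qhat[assign], assign  # per-class thresholds, class-to-cluster map


def prediction_set(candidate_scores, class_thresholds):
    """Labels y whose score s(x, y) is at most the threshold of y's cluster."""
    return np.flatnonzero(candidate_scores <= class_thresholds)
```

Here `clu_scores` and `cal_scores` hold the conformal score of the true label for each held-out point, and `candidate_scores` holds the score of every candidate label for one test point. Pooling scores within a cluster gives each threshold more calibration points than a per-class quantile would have, which is the trade-off the abstract describes for settings with few labeled examples per class.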