Conformal prediction has emerged as a rigorous means of providing deep learning models with reliable uncertainty estimates and safety guarantees. Yet its performance is known to degrade under distribution shift and long-tailed class distributions, both of which are common in real-world applications. Here, we characterize the performance of several post-hoc and training-based conformal prediction methods under these settings, providing the first empirical evaluation on large-scale datasets and models. We show that, across numerous conformal methods and neural network families, performance degrades greatly under distribution shift, violating safety guarantees. Similarly, we show that in long-tailed settings the guarantees are frequently violated on many classes. Understanding the limitations of these methods is necessary for deployment in real-world and safety-critical applications.