Conformal Prediction (CP) is a popular uncertainty quantification method that provides distribution-free, statistically valid prediction sets, assuming that training and test data are exchangeable. In that case, CP's prediction sets are guaranteed to cover the (unknown) true test output with a user-specified probability. This guarantee is violated, however, when the data are subjected to adversarial attacks, which often result in a significant loss of coverage. Several approaches have recently been proposed to recover CP guarantees in this setting. These approaches leverage variations of randomised smoothing to produce conservative sets that account for the effect of the adversarial perturbations. They are limited, however, in that they support only $\ell^2$-bounded perturbations and classification tasks. This paper introduces VRCP (Verifiably Robust Conformal Prediction), a new framework that leverages recent neural network verification methods to recover coverage guarantees under adversarial attacks. VRCP is the first method to support perturbations bounded by arbitrary norms, including $\ell^1$, $\ell^2$, and $\ell^\infty$, as well as regression tasks. We evaluate and compare our approach on image classification tasks (CIFAR10, CIFAR100, and TinyImageNet) and on regression tasks for deep reinforcement learning environments. In every case, VRCP achieves above-nominal coverage and yields significantly more efficient and informative prediction regions than the state of the art.
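To make the coverage guarantee concrete, the following is a minimal sketch of split conformal prediction for classification, assuming the common score $s(x, y) = 1 - p(y \mid x)$ and a held-out calibration set. The function names are illustrative and do not come from the paper. The last function shows one way verified bounds could be used to obtain a conservative set: it assumes that an external verifier (not shown, hypothetical here) supplies, for each label, an upper bound on $p(y \mid x')$ over the perturbation ball around the observed input. It should be read as an illustration of the idea, not as the VRCP algorithm itself.

```python
import numpy as np


def conformal_threshold(cal_scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score."""
    scores = np.sort(np.asarray(cal_scores, dtype=float))
    n = scores.size
    k = int(np.ceil((n + 1) * (1 - alpha)))
    # With too few calibration points, only an infinite threshold is valid.
    return scores[k - 1] if k <= n else np.inf


def prediction_set(probs, threshold):
    """Labels whose score 1 - p(y | x) does not exceed the threshold."""
    scores = 1.0 - np.asarray(probs, dtype=float)
    return np.flatnonzero(scores <= threshold)


def robust_prediction_set(prob_upper, threshold):
    """Conservative set built from assumed verified upper bounds on p(y | x').

    An upper bound on the probability is a lower bound on the score, so any
    label that the clean input would include is also included here.
    """
    return prediction_set(prob_upper, threshold)
```

Because each upper bound is at least the clean probability, the conservative set always contains the standard set computed on the unperturbed input. That containment is what lets coverage carry over, at the cost of larger sets when the bounds are loose.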