As machine learning is deployed ever more widely, ensuring its safety becomes increasingly important. A key concern in this regard is the ability to determine whether a given sample comes from the training distribution or is an "Out-Of-Distribution" (OOD) sample. In addition, adversaries can manipulate OOD samples so that a classifier makes a confident prediction on them. In this study, we present a novel approach for certifying the robustness of OOD detection within an $\ell_2$-norm ball around the input, regardless of network architecture and without the need for specific components or additional training. Further, we improve current techniques for detecting adversarial attacks on OOD samples, while providing high levels of certified and adversarial robustness on in-distribution samples. Averaged over all OOD detection metrics, our approach improves on previous approaches by $\sim 13\%$ on CIFAR10 and $\sim 5\%$ on CIFAR100.