As the use of machine learning continues to expand, the importance of ensuring its safety cannot be overstated. A key concern is the ability to identify whether a given sample comes from the training distribution or is an "Out-Of-Distribution" (OOD) sample. In addition, adversaries can manipulate OOD samples in ways that lead a classifier to make a confident prediction. In this study, we present a novel approach for certifying the robustness of OOD detection within an $\ell_2$-norm ball around the input, regardless of network architecture and without the need for specific components or additional training. Further, we improve current techniques for detecting adversarial attacks on OOD samples, while providing high levels of certified and adversarial robustness on in-distribution samples. On CIFAR10/100, the average of all OOD detection metrics shows an increase of $\sim 13\% / 5\%$ relative to previous approaches.