We study the out-of-distribution (OOD) prediction behavior of neural networks when they classify images from unseen classes or corrupted images. To probe this behavior, we introduce a new measure, nearest category generalization (NCG): the fraction of OOD inputs that are classified with the same label as their nearest neighbor in the training set. Our motivation is to understand the prediction patterns of adversarially robust networks, since previous work has identified unexpected consequences of training to be robust to norm-bounded perturbations. We find that robust networks have consistently higher NCG accuracy than naturally trained networks, even when the OOD data is much farther away than the robustness radius. This implies that the local regularization of robust training has a significant impact on the network's decision regions. We replicate our findings on many datasets, comparing new and existing training methods. Overall, adversarially robust networks resemble a nearest neighbor classifier when it comes to OOD data. Code is available at https://github.com/yangarbiter/nearest-category-generalization.
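As an illustration of the NCG measure defined above, the following is a minimal sketch and not the implementation from the linked repository. It assumes Euclidean (L2) distance in flattened input space as the nearest-neighbor metric, and that the network's predictions on the OOD inputs have already been computed; the function name `ncg_accuracy` and its argument names are hypothetical.

```python
import numpy as np


def ncg_accuracy(train_x, train_y, ood_x, ood_pred):
    """Fraction of OOD inputs whose predicted label equals the label of
    their nearest training example.

    Assumption: nearest neighbors are measured with L2 distance on
    flattened inputs. Other norms would need a different distance.
    """
    train_flat = np.asarray(train_x, dtype=np.float64).reshape(len(train_x), -1)
    ood_flat = np.asarray(ood_x, dtype=np.float64).reshape(len(ood_x), -1)
    train_y = np.asarray(train_y)
    ood_pred = np.asarray(ood_pred)

    # Pairwise squared L2 distances, shape (n_ood, n_train).
    # This brute-force form is only suitable for small arrays.
    diffs = ood_flat[:, None, :] - train_flat[None, :, :]
    sq_dists = (diffs ** 2).sum(axis=2)

    # Index of the nearest training example for every OOD input.
    nn_idx = sq_dists.argmin(axis=1)

    # NCG accuracy: agreement between prediction and nearest-neighbor label.
    return float(np.mean(ood_pred == train_y[nn_idx]))


if __name__ == "__main__":
    # Toy data: two training points with labels 0 and 1.
    train_x = np.array([[0.0, 0.0], [10.0, 10.0]])
    train_y = np.array([0, 1])
    # Four OOD points; their nearest-neighbor labels are 0, 1, 0, 1.
    ood_x = np.array([[1.0, 1.0], [9.0, 9.0], [2.0, 0.0], [8.0, 10.0]])
    # Hypothetical network predictions; the third disagrees with its neighbor.
    ood_pred = np.array([0, 1, 1, 1])
    print(ncg_accuracy(train_x, train_y, ood_x, ood_pred))
```

On the toy data, three of the four predictions match the nearest training label, so the sketch reports an NCG accuracy of 0.75. For datasets of realistic size, the brute-force distance matrix would typically be replaced by a dedicated nearest-neighbor index.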