We study the understanding of deep neural networks in relation to the scope in which they are trained. Although the accuracy of these models is usually impressive at the aggregate level, they still make mistakes, sometimes on cases that appear trivial. Moreover, these models do not reliably recognize what they do not know, which leads to failures such as adversarial vulnerability and errors on out-of-distribution inputs. Here, we propose a measure that quantifies the ambiguity of an input for any given model with respect to the scope of its training. We define ambiguity in terms of the geometric arrangement of the decision boundaries and the convex hull of the training set in the feature space learned by the trained model, and we demonstrate that this single measure can detect a considerable portion of a model's mistakes on in-distribution samples, adversarial inputs, and out-of-distribution inputs. Using our ambiguity measure, a model can abstain from classification when it encounters ambiguous inputs, which improves its accuracy not only on a given test set but also on the inputs it may encounter in the world at large. In pursuit of this measure, we develop a theoretical framework that identifies the unknowns of a model in relation to its scope. We put this framework in perspective with the confidence of the model and develop formulations that identify regions of the domain that are unknown to the model, yet on which the model is guaranteed to have high confidence.
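As a rough illustration of how an abstention rule of this kind could be assembled, the sketch below estimates the distance from a feature vector to the convex hull of the training features and abstains when the input lies far from the hull or close to a decision boundary. The abstract does not state the actual formula, so everything here is an assumption made for illustration: the function names `distance_to_hull` and `should_abstain`, the use of a Frank-Wolfe iteration for the hull projection, the use of the top-two class score margin (`top2_margin`) as a stand-in for proximity to a decision boundary, and the thresholds `hull_tol` and `margin_tol`.

```python
from math import sqrt
from typing import List, Sequence

Vector = Sequence[float]


def _dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def _sub(a: Vector, b: Vector) -> List[float]:
    return [x - y for x, y in zip(a, b)]


def distance_to_hull(points: List[Vector], query: Vector, iters: int = 1000) -> float:
    """Approximate Euclidean distance from `query` to the convex hull of `points`.

    Uses Frank-Wolfe with exact line search on 0.5 * ||z - query||^2, where z
    ranges over the hull. This is an illustrative choice, not the paper's method.
    """
    # Start from the training feature closest to the query.
    current = list(min(points, key=lambda p: _dot(_sub(p, query), _sub(p, query))))
    for _ in range(iters):
        grad = _sub(current, query)  # gradient of the objective at `current`
        # Linear minimization over the hull is attained at one of the points.
        vertex = min(points, key=lambda p: _dot(grad, p))
        direction = _sub(vertex, current)
        denom = _dot(direction, direction)
        if denom == 0.0:
            break
        # Exact line search along `direction`, clipped to stay inside the hull.
        step = max(0.0, min(1.0, -_dot(grad, direction) / denom))
        if step == 0.0:
            break  # no descent direction remains: `current` is the projection
        current = [c + step * d for c, d in zip(current, direction)]
    residual = _sub(current, query)
    return sqrt(_dot(residual, residual))


def should_abstain(
    points: List[Vector],
    query: Vector,
    top2_margin: float,
    hull_tol: float = 0.1,     # hypothetical threshold on hull distance
    margin_tol: float = 0.2,   # hypothetical threshold on the score margin
) -> bool:
    """Abstain if the input is outside the training scope or near a boundary."""
    outside_scope = distance_to_hull(points, query) > hull_tol
    near_boundary = top2_margin < margin_tol
    return outside_scope or near_boundary
```

In this sketch, `points` would hold the training set's feature-space representations and `query` the representation of the input being classified. Frank-Wolfe is used only because it needs no external solver; a quadratic-programming projection would serve the same purpose.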