The generalization ability of deep neural networks (DNNs) is still not fully understood, despite numerous theoretical and empirical analyses. Recently, Allen-Zhu & Li (2023) introduced the concept of multi-views to explain the generalization ability of DNNs, but their analysis mainly targets ensemble or distilled models, and they do not discuss how to estimate the multi-views used in the prediction for a specific input. In this paper, we propose Minimal Sufficient Views (MSVs), which are similar to multi-views but can be computed efficiently for real images. MSVs are a set of minimal and distinct features in an input, each of which preserves the model's prediction for that input. We empirically show that there is a clear relationship between the number of MSVs and prediction accuracy across models, including convolutional and transformer models, suggesting that a multi-view-like perspective is also important for understanding the generalization ability of (non-ensemble or non-distilled) DNNs.
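To make the definition of MSVs concrete, the following is a minimal sketch of one possible greedy estimation procedure on a toy problem. It is an illustration under stated assumptions, not necessarily the procedure used in the paper: the input is assumed to be split into indexed features, the names `estimate_msvs` and `toy_predict` are hypothetical, and the toy predictor stands in for a real model evaluated on a masked image. Each view is minimal only with respect to a single greedy pass.

```python
from typing import Callable, FrozenSet, List

# Hypothetical interface: maps the set of kept feature indices to a class label.
Predictor = Callable[[FrozenSet[int]], int]


def estimate_msvs(num_features: int, predict: Predictor) -> List[FrozenSet[int]]:
    """Greedily collect disjoint feature sets that each preserve the prediction."""
    all_features = frozenset(range(num_features))
    target = predict(all_features)  # prediction on the unmasked input
    remaining = set(all_features)
    views: List[FrozenSet[int]] = []

    # Continue while the still-unused features suffice to keep the prediction.
    while remaining and predict(frozenset(remaining)) == target:
        view = set(remaining)
        # Single greedy pass: drop a feature whenever the prediction survives.
        for f in sorted(remaining):
            candidate = view - {f}
            if candidate and predict(frozenset(candidate)) == target:
                view = candidate
        views.append(frozenset(view))
        remaining -= view  # views are kept distinct (disjoint)
    return views


def toy_predict(kept: FrozenSet[int]) -> int:
    """Toy stand-in for a model: class 1 is supported by three separate cues."""
    if {0, 1} <= kept or {2, 3} <= kept or 4 in kept:
        return 1
    return 0


if __name__ == "__main__":
    msvs = estimate_msvs(6, toy_predict)
    print(len(msvs), [sorted(v) for v in msvs])
```

In this toy setting the procedure finds three views, one per cue that independently supports class 1, so the number of MSVs is 3. With a real model, `predict` would presumably mask out all image regions outside `kept` and return the model's top class; how the features and masking are defined is left open here.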