Despite the popularity and success of deep learning, there is limited understanding of when, how, and why neural networks generalize to unseen examples. Since learning can be seen as extracting information from data, we formally study the information captured by neural networks during training. Specifically, we begin by viewing learning in the presence of noisy labels from an information-theoretic perspective and derive a learning algorithm that limits the amount of label-noise information stored in the weights. We then define a notion of the unique information that an individual sample provides to the training of a deep network, shedding light on how neural networks behave on examples that are atypical, ambiguous, or drawn from underrepresented subpopulations. We relate example informativeness to generalization by deriving nonvacuous bounds on the generalization gap. Finally, by studying knowledge distillation, we highlight the important role that data and label complexity play in generalization. Overall, our findings contribute to a deeper understanding of the mechanisms underlying generalization in neural networks.