Neural networks are known to exploit spurious artifacts (or shortcuts) that co-occur with a target label, exhibiting heuristic memorization. Networks have also been shown to memorize individual training examples, resulting in example-level memorization. Both kinds of memorization impede the generalization of networks beyond their training distributions. Detecting such memorization can be challenging, often requiring researchers to curate tailored test sets. In this work, we hypothesize, and subsequently show, that the diversity in the activation patterns of different neurons reflects model generalization and memorization. We quantify the diversity in neural activations through information-theoretic measures and find support for our hypothesis in experiments spanning several natural language and vision tasks. Importantly, we discover that information organization points to the two forms of memorization even when neural activations are computed on unlabelled in-distribution examples. Lastly, we demonstrate the utility of our findings for the problem of model selection. The associated code and other resources for this work are available at https://rachitbansal.github.io/information-measures.
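To make the idea of quantifying activation diversity concrete, the following is a minimal sketch and not the implementation from this work. The abstract does not name the specific measures, so the sketch assumes two common ones: the entropy of each neuron's activations, and the mutual information between pairs of neurons. The equal-width binning scheme and the function names (`discretize`, `entropy`, `mutual_information`, `activation_diversity`) are hypothetical choices made for illustration. It expects a matrix of activations collected on unlabelled examples, with one row per example and one column per neuron.

```python
import numpy as np


def discretize(acts, n_bins=10):
    """Map each neuron's activations to integer bin ids (equal-width bins per neuron).

    acts: array of shape (n_examples, n_neurons). Binning is an assumption of this sketch.
    """
    lo = acts.min(axis=0, keepdims=True)
    hi = acts.max(axis=0, keepdims=True)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero for constant neurons
    scaled = (acts - lo) / span             # values in [0, 1]
    return np.minimum((scaled * n_bins).astype(int), n_bins - 1)


def entropy(codes, n_states):
    """Plug-in entropy estimate (in bits) of a 1-D array of integer codes."""
    counts = np.bincount(codes, minlength=n_states)
    p = counts[counts > 0] / codes.size
    return float(-(p * np.log2(p)).sum())


def mutual_information(x, y, n_bins):
    """I(X; Y) = H(X) + H(Y) - H(X, Y) for two discretized neurons."""
    joint = x * n_bins + y  # unique code for every (x, y) pair
    return entropy(x, n_bins) + entropy(y, n_bins) - entropy(joint, n_bins * n_bins)


def activation_diversity(acts, n_bins=10):
    """Return (mean per-neuron entropy, mean pairwise mutual information)."""
    codes = discretize(acts, n_bins)
    n_neurons = codes.shape[1]
    entropies = [entropy(codes[:, j], n_bins) for j in range(n_neurons)]
    pair_mi = [
        mutual_information(codes[:, i], codes[:, j], n_bins)
        for i in range(n_neurons)
        for j in range(i + 1, n_neurons)
    ]
    mean_mi = float(np.mean(pair_mi)) if pair_mi else 0.0
    return float(np.mean(entropies)), mean_mi


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    shared = rng.normal(size=(2000, 1))
    redundant = np.repeat(shared, 4, axis=1)  # four copies of one neuron
    diverse = rng.normal(size=(2000, 4))      # four independent neurons
    print("redundant:", activation_diversity(redundant))
    print("diverse:  ", activation_diversity(diverse))
```

Under these assumptions, a layer whose neurons all carry the same signal shows high pairwise mutual information, while independent neurons show values close to zero. How such quantities relate to the two forms of memorization is the empirical question this work addresses; the sketch only shows how the measurements could be taken without labels.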