A key capability of modern neural networks is that they can simultaneously learn underlying rules and memorize specific facts or exceptions. Yet theoretical understanding of this dual capability remains limited. We introduce the Rules-and-Facts (RAF) model, a minimal solvable setting that enables precise characterization of this phenomenon by bridging two classical lines of work in the statistical physics of learning: the teacher-student framework for generalization and Gardner-style capacity analysis for memorization. In the RAF model, a fraction $1 - \varepsilon$ of training labels is generated by a structured teacher rule, while the remaining fraction $\varepsilon$ consists of unstructured facts with random labels. We characterize when the learner can simultaneously recover the underlying rule (and thus generalize to new data) and memorize the unstructured examples. Our results quantify how overparameterization enables the simultaneous realization of these two objectives: sufficient excess capacity supports memorization, while regularization and the choice of kernel or nonlinearity control the allocation of capacity between rule learning and memorization. The RAF model provides a theoretical foundation for understanding how modern neural networks can infer structure while storing rare or non-compressible information.
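As an illustration of the data model described above, the following minimal sketch samples a training set in which a fraction $1 - \varepsilon$ of the labels follows a teacher rule and a fraction $\varepsilon$ is random. The abstract does not specify the input distribution or the form of the teacher, so the Gaussian inputs, the perceptron (sign) teacher, the exact rather than Bernoulli split, and the names `sample_raf_dataset`, `is_fact`, and `teacher` are all assumptions made for illustration only.

```python
import numpy as np


def sample_raf_dataset(n, d, eps, seed=0):
    """Sample a hypothetical RAF-style dataset.

    Assumptions (not taken from the abstract): i.i.d. standard Gaussian
    inputs, a perceptron teacher, and exactly round(eps * n) fact examples.
    """
    rng = np.random.default_rng(seed)

    # Assumed teacher: a random direction defining the rule y = sign(w . x).
    teacher = rng.standard_normal(d)
    X = rng.standard_normal((n, d))

    # Rule labels for every example (ties at exactly zero are mapped to +1).
    y = np.sign(X @ teacher / np.sqrt(d))
    y[y == 0] = 1.0

    # Pick the examples that become unstructured facts.
    n_facts = int(round(eps * n))
    is_fact = np.zeros(n, dtype=bool)
    is_fact[rng.choice(n, size=n_facts, replace=False)] = True

    # Facts receive labels drawn uniformly from {-1, +1}, independent of X.
    y[is_fact] = rng.choice([-1.0, 1.0], size=n_facts)
    return X, y, is_fact, teacher


X, y, is_fact, teacher = sample_raf_dataset(n=1000, d=50, eps=0.2)

# Agreement with the teacher rule, measured separately on the two subsets.
rule_labels = np.sign(X @ teacher)
agree_rule = np.mean(y[~is_fact] == rule_labels[~is_fact])
agree_facts = np.mean(y[is_fact] == rule_labels[is_fact])
print(f"rule examples agreeing with teacher: {agree_rule:.2f}")
print(f"fact examples agreeing with teacher: {agree_facts:.2f}")
```

On the rule examples the labels match the teacher by construction. On the fact examples the agreement is one half in expectation, so a learner can fit them only by memorization, which is the tension between generalization and storage that the RAF model is meant to capture.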