The impressive generalization performance of modern neural networks is attributed in part to their ability to implicitly memorize complex training patterns. Inspired by this, we explore a novel mechanism for improving model generalization via explicit memorization. Specifically, we propose the residual-memorization (ResMem) algorithm, a new method that augments an existing prediction model (e.g., a neural network) by fitting the model's residuals with a $k$-nearest-neighbor-based regressor. The final prediction is then the sum of the original model's prediction and that of the fitted residual regressor. By construction, ResMem can explicitly memorize the training labels. Empirically, we show that ResMem consistently improves the test set generalization of the original prediction model across various standard vision and natural language processing benchmarks. Theoretically, we formulate a stylized linear regression problem and rigorously show that ResMem results in a more favorable test risk than the base predictor.
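The residual-fitting recipe described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a least-squares linear model as the base predictor, an unweighted $k$-nearest-neighbor average computed in raw input space, and a synthetic one-dimensional regression task. All function names (`fit_linear`, `knn_residual`, `resmem_predict`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def fit_linear(X, y):
    """Fit a least-squares linear base predictor with an intercept (assumed base model)."""
    A = np.hstack([X, np.ones((X.shape[0], 1))])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w


def predict_linear(w, X):
    """Evaluate the linear base predictor on X."""
    A = np.hstack([X, np.ones((X.shape[0], 1))])
    return A @ w


def knn_residual(X_train, residuals, X_query, k):
    """Average the stored training residuals of the k nearest training points."""
    # Pairwise squared Euclidean distances, shape (n_query, n_train).
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1)
    nearest = np.argsort(d2, axis=1)[:, :k]
    return residuals[nearest].mean(axis=1)


def resmem_predict(w, X_train, residuals, X_query, k):
    """Final prediction: base model output plus the k-NN estimate of its residual."""
    return predict_linear(w, X_query) + knn_residual(X_train, residuals, X_query, k)


# Synthetic, noise-free task that a linear model cannot fit exactly (an assumption).
rng = np.random.default_rng(0)
X_train = rng.uniform(-3, 3, size=(200, 1))
y_train = np.sin(X_train[:, 0]) + 0.5 * X_train[:, 0]
X_test = rng.uniform(-3, 3, size=(100, 1))
y_test = np.sin(X_test[:, 0]) + 0.5 * X_test[:, 0]

# Step 1: fit the base predictor. Step 2: store its training residuals.
w = fit_linear(X_train, y_train)
residuals = y_train - predict_linear(w, X_train)

# With k=1, each training point is its own nearest neighbor, so the training
# labels are reproduced exactly (explicit memorization).
train_pred_k1 = resmem_predict(w, X_train, residuals, X_train, k=1)

# Compare test error of the base model alone against base + residual k-NN.
base_mse = np.mean((predict_linear(w, X_test) - y_test) ** 2)
resmem_mse = np.mean((resmem_predict(w, X_train, residuals, X_test, k=5) - y_test) ** 2)
print("base test MSE:  ", base_mse)
print("ResMem test MSE:", resmem_mse)
```

In this toy setting the residual regressor recovers structure the linear base model misses, so the combined predictor has lower test error. The abstract does not specify the distance metric, neighbor weighting, or feature space used for the nearest-neighbor step, so those choices here are placeholders.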