The impressive generalization performance of modern neural networks is attributed in part to their ability to implicitly memorize complex training patterns. Inspired by this, we explore a novel mechanism to improve model generalization via explicit memorization. Specifically, we propose the residual-memorization (ResMem) algorithm, a new method that augments an existing prediction model (e.g., a neural network) by fitting the model's residuals with a $k$-nearest-neighbor regressor. The final prediction is then the sum of the original model's output and that of the fitted residual regressor. By construction, ResMem can explicitly memorize the training labels. Empirically, we show that ResMem consistently improves the test set generalization of the original prediction model across various standard vision and natural language processing benchmarks. Theoretically, we formulate a stylized linear regression problem and rigorously show that ResMem achieves a more favorable test risk than the base predictor.
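To make the two-stage construction concrete, the following is a minimal sketch of the residual-memorization idea on a toy one-dimensional regression problem. It is an illustration under stated assumptions, not the implementation evaluated in the paper: the base model is a least-squares line standing in for a neural network, the neighbor search is an unweighted $k$-nearest-neighbor average over raw inputs, and all function names (`fit_linear`, `knn_residual`, `fit_resmem`, `mse`) and the toy data are hypothetical.

```python
import math


def fit_linear(xs, ys):
    """Least-squares line y = a*x + b; an assumed stand-in for the base model."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx if sxx > 0 else 0.0
    b = mean_y - a * mean_x
    return lambda x: a * x + b


def knn_residual(xs, residuals, k):
    """Return a regressor that averages the residuals of the k nearest training inputs."""
    def predict(x):
        nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x))[:k]
        return sum(residuals[i] for i in nearest) / len(nearest)
    return predict


def fit_resmem(xs, ys, k=1):
    """Fit the base model, then fit a kNN regressor to its training residuals."""
    base = fit_linear(xs, ys)
    residuals = [y - base(x) for x, y in zip(xs, ys)]
    memory = knn_residual(xs, residuals, k)
    # Final prediction: base output plus the memorized residual correction.
    return base, (lambda x: base(x) + memory(x))


def mse(predict, xs, ys):
    """Mean squared error of a predictor on a dataset."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def target(x):
    """Toy nonlinear target (an assumption) that a line cannot fit."""
    return math.sin(3 * x) + 0.5 * x


train_x = [i / 10 for i in range(-20, 21)]
train_y = [target(x) for x in train_x]
test_x = [x + 0.05 for x in train_x[:-1]]  # held-out midpoints
test_y = [target(x) for x in test_x]

base, resmem = fit_resmem(train_x, train_y, k=1)
print("base   test MSE:", mse(base, test_x, test_y))
print("resmem test MSE:", mse(resmem, test_x, test_y))
```

With `k=1` and distinct training inputs, the combined predictor reproduces every training label up to floating-point rounding, which is the sense in which the construction memorizes the training labels. Larger values of `k` trade exact memorization for a smoother residual correction.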