Neural networks have achieved impressive results on many technological and scientific tasks. Yet their empirical successes have outpaced our fundamental understanding of their structure and function. Identifying the mechanisms that drive the successes of neural networks can provide principled approaches for improving neural network performance and for developing simple and effective alternatives. In this work, we isolate a key mechanism driving feature learning in fully connected neural networks by connecting neural feature learning to a statistical estimator known as the average gradient outer product. We subsequently leverage this mechanism to design \textit{Recursive Feature Machines} (RFMs), which are kernel machines that learn features. We show that RFMs (1) accurately capture features learned by deep fully connected neural networks, and (2) outperform a broad spectrum of models, including neural networks, on tabular data. Furthermore, we show how RFMs shed light on recently observed deep learning phenomena, including grokking, lottery tickets, simplicity biases, and spurious features. We provide a Python implementation to make our method easily accessible [\url{https://github.com/aradha/recursive_feature_machines}].
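To make the idea of a kernel machine that learns features via the average gradient outer product (AGOP) more concrete, the following is a minimal NumPy sketch, not the implementation from the linked repository. It assumes a scalar regression target and a Gaussian kernel with a learned Mahalanobis matrix $M$, chosen here only because its gradient has a simple closed form; the abstract does not specify the kernel. The function names (`mahalanobis_gaussian_kernel`, `agop`, `fit_rfm`), the `bandwidth` and `ridge` parameters, and the trace normalization of $M$ are all illustrative assumptions. The loop alternates between fitting kernel ridge regression and replacing $M$ with the AGOP of the fitted predictor, $\frac{1}{n}\sum_{i} \nabla f(x_i)\,\nabla f(x_i)^{\top}$.

```python
import numpy as np


def mahalanobis_gaussian_kernel(X, Z, M, bandwidth):
    """K[i, j] = exp(-(x_i - z_j)^T M (x_i - z_j) / (2 * bandwidth^2)).

    Assumes M is symmetric positive semidefinite.
    """
    XM = X @ M
    sq_dists = (
        np.sum(XM * X, axis=1)[:, None]
        - 2.0 * XM @ Z.T
        + np.sum((Z @ M) * Z, axis=1)[None, :]
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def agop(X, alpha, M, bandwidth):
    """Average gradient outer product of f(x) = sum_j alpha_j K_M(x, x_j),
    evaluated at the training points X."""
    K = mahalanobis_gaussian_kernel(X, X, M, bandwidth)
    W = K * alpha[None, :]  # W[i, j] = alpha_j * K_M(x_i, x_j)
    # Row i holds sum_j W[i, j] * (x_i - x_j).
    weighted_diffs = W.sum(axis=1)[:, None] * X - W @ X
    # grad f(x_i) = -(1 / bandwidth^2) * M @ weighted_diffs[i]  (M symmetric)
    grads = -(weighted_diffs @ M) / bandwidth**2
    return grads.T @ grads / X.shape[0]


def fit_rfm(X, y, n_iters=3, bandwidth=3.0, ridge=1e-3):
    """Alternate kernel ridge regression and AGOP updates of M (a sketch)."""
    n, d = X.shape
    M = np.eye(d)
    for _ in range(n_iters):
        K = mahalanobis_gaussian_kernel(X, X, M, bandwidth)
        alpha = np.linalg.solve(K + ridge * np.eye(n), y)
        M = agop(X, alpha, M, bandwidth)
        # Assumption: rescale so trace(M) = d, keeping the bandwidth meaningful.
        trace = np.trace(M)
        if trace > 0:
            M = M * (d / trace)
    # Refit the coefficients so they match the final feature matrix.
    K = mahalanobis_gaussian_kernel(X, X, M, bandwidth)
    alpha = np.linalg.solve(K + ridge * np.eye(n), y)
    return M, alpha
```

Under these assumptions, calling `fit_rfm(X, y)` on data whose target depends on only a few input coordinates should yield a matrix `M` whose diagonal concentrates on those coordinates, which is the sense in which such a kernel machine can be said to learn features.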