In this article, we review the literature on statistical theories of neural networks from three perspectives. In the first part, we review results on the excess risk of neural networks in the nonparametric framework of regression or classification. These results rely on explicit constructions of neural networks and adopt tools from approximation theory, which leads to fast convergence rates of the excess risk. Through these constructions, the width and depth of the networks can be expressed in terms of sample size, data dimension, and function smoothness. Nonetheless, the underlying analysis applies only to the global minimizer in the highly non-convex landscape of deep neural networks. This motivates us to review the training dynamics of neural networks in the second part. Specifically, we review papers that attempt to answer ``how a neural network trained via gradient-based methods finds a solution that generalizes well on unseen data.'' In particular, two well-known paradigms are reviewed: the Neural Tangent Kernel (NTK) paradigm and the Mean-Field (MF) paradigm. In the last part, we review the most recent theoretical advancements in generative models, including Generative Adversarial Networks (GANs), diffusion models, and in-context learning (ICL) in Large Language Models (LLMs). The first two are known as the main pillars of the modern generative AI era, while ICL is the strong capability of LLMs to learn from a few examples provided in context. Finally, we conclude the paper by suggesting several promising directions for deep learning theory.