Generalization of neural networks is a central challenge in machine learning, particularly when the test distribution differs from the training distribution. Current methods, which mostly follow a data-driven paradigm (data augmentation, adversarial training, and noise injection), may generalize poorly because the learned model is non-smooth. In this paper, we study generalization from a Partial Differential Equation (PDE) perspective, aiming to improve it directly through the function underlying the neural network rather than by adjusting the input data. Specifically, we first establish a connection between neural network generalization and the smoothness of the solution to a specific PDE, the transport equation. Building on this, we propose a general framework that introduces adaptive distributional diffusion into the transport equation to enhance the smoothness of its solution, thereby improving generalization. For neural networks, we put this theoretical framework into practice as $\textbf{PDE+}$ ($\textbf{PDE}$ with $\textbf{A}$daptive $\textbf{D}$istributional $\textbf{D}$iffusion), which diffuses each sample into a distribution covering semantically similar inputs. This gives better coverage of distributions that may be unobserved during training, improving generalization beyond purely data-driven methods. Extensive experiments across multiple settings validate PDE+ and show that it outperforms state-of-the-art (SOTA) methods.