The generalization of neural networks is a central challenge in machine learning, especially with respect to performance under distributions that differ from the training distribution. Current methods, which mainly follow the data-driven paradigm (for example data augmentation, adversarial training, and noise injection), may generalize poorly because the learned model is non-smooth. In this paper, we investigate generalization from a Partial Differential Equation (PDE) perspective, aiming to enhance it directly through the underlying function of the neural network rather than by adjusting the input data. Specifically, we first establish a connection between neural network generalization and the smoothness of the solution to a specific PDE, namely the ``transport equation''. Building on this, we propose a general framework that introduces adaptive distributional diffusion into the transport equation to enhance the smoothness of its solution, thereby improving generalization. In the context of neural networks, we put this theoretical framework into practice as PDE+ (\textbf{PDE} with \textbf{A}daptive \textbf{D}istributional \textbf{D}iffusion), which diffuses each sample into a distribution covering semantically similar inputs. This enables better coverage of distributions potentially unobserved during training, thus improving generalization beyond purely data-driven methods. The effectiveness of PDE+ is validated in extensive settings, including clean samples and various corruptions, where it outperforms state-of-the-art (SOTA) methods.
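As an illustrative sketch of the PDE viewpoint (the abstract does not state the equations, so the symbols $u$, $v$, $\sigma$, $u_0$ and the constant, isotropic diffusion term below are assumptions for illustration, not the paper's formulation), a generic transport equation and a diffused counterpart can be written as
\[
\frac{\partial u}{\partial t}(x,t) + v(x,t)\cdot\nabla u(x,t) = 0, \qquad u(x,0) = u_0(x),
\]
\[
\frac{\partial u}{\partial t}(x,t) + v(x,t)\cdot\nabla u(x,t) = \frac{\sigma^{2}}{2}\,\Delta u(x,t), \qquad u(x,0) = u_0(x).
\]
In the simplest case $v \equiv 0$, the second equation reduces to the heat equation, whose solution is a Gaussian average of the initial data:
\[
u(x,t) = \mathbb{E}_{Z\sim\mathcal{N}(0,I)}\!\left[\,u_0\!\left(x + \sigma\sqrt{t}\,Z\right)\right].
\]
This shows, under these assumptions, how a diffusion term smooths the solution by replacing the value at a single input $x$ with an average over a distribution centered at $x$. An adaptive scheme would presumably replace the fixed Gaussian with a sample-dependent distribution, in the spirit of diffusing each sample into a distribution over semantically similar inputs.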