Fine-tuning Neural-Operator architectures for training and generalization

In this work, we present an analysis of the generalization of Neural Operators (NOs) and derived architectures. We propose a family of networks, which we name ${\textit{s}}{\text{NO}}+\varepsilon$, in which we modify the layout of NOs towards an architecture resembling a Transformer; mainly, we substitute the Attention module with the Integral Operator part of NOs. The resulting network preserves universality, generalizes better to unseen data, and has a similar number of parameters to NOs. On the one hand, we study generalization numerically by gradually transforming NOs into ${\textit{s}}{\text{NO}}+\varepsilon$ and verifying a reduction of the test loss on a time-harmonic wave dataset with different frequencies. We make the following changes to NOs: (a) we split the (non-local) Integral Operator and the (local) feed-forward network (MLP) into different layers, generating a {\it sequential} structure which we call the sequential Neural Operator (${\textit{s}}{\text{NO}}$), (b) we add skip connections and layer normalization to ${\textit{s}}{\text{NO}}$, and (c) we incorporate dropout and stochastic depth, which allow us to build deep networks. In each case, we observe a decrease in the test loss across a wide variety of initializations, indicating that each change improves on the NO. On the other hand, building on infinite-dimensional statistics, and in particular Dudley's theorem, we provide bounds on the Rademacher complexity of NOs and ${\textit{s}}{\text{NO}}$, and we find the following relationship: the upper bound on the Rademacher complexity of ${\textit{s}}{\text{NO}}$ is a lower bound for that of NOs; consequently, the generalization error bound of ${\textit{s}}{\text{NO}}$ is smaller than that of NOs, which further supports our numerical results.
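
The three changes (a)-(c) can be illustrated with a minimal NumPy sketch on a 1D periodic grid. It is not the authors' implementation: the Fourier-type parameterization of the Integral Operator, the pre-normalization placement of layer normalization, the ReLU activation, and every name (`no_layer`, `sno_block`, `init_params`, `p_drop`, `p_skip`) are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer_norm(v, eps=1e-5):
    """Normalize the channel vector at each grid point."""
    mean = v.mean(axis=-1, keepdims=True)
    var = v.var(axis=-1, keepdims=True)
    return (v - mean) / np.sqrt(var + eps)

def integral_operator(v, kernel):
    """Non-local part: kernel integral applied on the lowest Fourier modes.
    v: (n_grid, channels) real; kernel: (modes, channels, channels) complex.
    The Fourier parameterization is an assumption of this sketch."""
    modes = kernel.shape[0]
    v_hat = np.fft.rfft(v, axis=0)
    out_hat = np.zeros_like(v_hat)
    out_hat[:modes] = np.einsum("kc,kcd->kd", v_hat[:modes], kernel)
    return np.fft.irfft(out_hat, n=v.shape[0], axis=0)

def mlp(v, w1, w2):
    """Local part: the same two-layer network at every grid point."""
    return np.maximum(v @ w1, 0.0) @ w2

def no_layer(v, params):
    """Assumed NO-style layer: local and non-local parts share one layer."""
    local = v @ params["w"]
    return np.maximum(local + integral_operator(v, params["kernel"]), 0.0)

def residual(v, branch, train, p_drop, p_skip):
    """(b) skip connection + layer norm, (c) dropout + stochastic depth."""
    if train and rng.random() < p_skip:
        return v                              # stochastic depth: branch skipped
    out = branch(layer_norm(v))
    if train:
        keep = rng.random(out.shape) >= p_drop
        out = out * keep / (1.0 - p_drop)     # inverted dropout
        out = out / (1.0 - p_skip)            # rescale surviving branch
    return v + out

def sno_block(v, params, train=False, p_drop=0.1, p_skip=0.1):
    """(a) sequential layout: Integral Operator layer, then MLP layer."""
    v = residual(v, lambda z: integral_operator(z, params["kernel"]),
                 train, p_drop, p_skip)
    v = residual(v, lambda z: mlp(z, params["w1"], params["w2"]),
                 train, p_drop, p_skip)
    return v

def init_params(channels=8, hidden=16, modes=4):
    """Random illustrative weights; shapes and scaling are arbitrary choices."""
    s = 1.0 / channels
    shape = (modes, channels, channels)
    return {
        "kernel": s * (rng.standard_normal(shape)
                       + 1j * rng.standard_normal(shape)),
        "w": s * rng.standard_normal((channels, channels)),
        "w1": s * rng.standard_normal((channels, hidden)),
        "w2": s * rng.standard_normal((hidden, channels)),
    }

# Usage: 8 channels sampled on a 64-point periodic grid.
x = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
v0 = np.stack([np.sin((k + 1) * x) for k in range(8)], axis=-1)
params = init_params()
out_no = no_layer(v0, params)      # shape (64, 8)
out_sno = sno_block(v0, params)    # shape (64, 8)
```

In this sketch the skip connection makes each sub-layer a perturbation of the identity, so setting the kernel and the last MLP weight to zero returns the input unchanged. That property is one common motivation for residual layouts in deep stacks, though the sketch makes no claim about the paper's bounds or results.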
