Eradicating Negative Transfer in Multi-Physics Foundation Models via Sparse Mixture-of-Experts Routing

Scaling Scientific Machine Learning (SciML) toward universal foundation models is bottlenecked by negative transfer: the simultaneous co-training of disparate partial differential equation (PDE) regimes can induce gradient conflict, unstable optimization, and plasticity loss in dense neural operators. In particular, broadband open-channel fluid dynamics and boundary-dominated porous media flows impose incompatible spectral and geometric demands on a single dense parameter path. We introduce Shodh-MoE, a sparsely activated latent transformer architecture for multi-physics transport. Shodh-MoE operates on compressed 16^3 physical latents produced by a physics-informed autoencoder with an intra-tokenizer Helmholtz-style velocity parameterization, restricting decoded states to divergence-free velocity manifolds. The decoded velocity field therefore conserves mass by construction, up to numerical precision: the measured velocity divergence is ~2.8 x 10^-10 (evaluated post-hoc in FP64) on 128^3 grids. A Top-1 soft-semantic router dynamically assigns localized latent patches to expert subnetworks, enabling specialized parameter paths for distinct physical mechanisms while preserving shared experts for universal symmetries. In a 20,000-step distributed pretraining run over mixed three-dimensional physical tensors, routing telemetry shows autonomous domain bifurcation: held-out validation tokens from the open-channel domain route exclusively to Expert 0, while porous-media tokens route exclusively to Expert 1. The model converges simultaneously across both regimes, achieving latent validation MSEs of 2.46 x 10^-5 and 9.76 x 10^-6, and decoded physical MSEs of 2.48 x 10^-6 and 1.76 x 10^-6. These results support sparse expert routing as a practical architectural mechanism for mitigating multi-physics interference in universal neural operators.
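
The two mechanisms named in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation. The abstract does not specify the form of the "Helmholtz-style" parameterization or of the "soft-semantic" router, so the sketch assumes (a) a velocity field defined as the curl of a vector potential on a periodic grid with central differences, and (b) a plain linear gate followed by a softmax and an argmax. All function names, shapes, and the random inputs are hypothetical.

```python
import numpy as np


def _ddx(f, axis, h):
    """Periodic central difference of f along one axis (assumed discretization)."""
    return (np.roll(f, -1, axis=axis) - np.roll(f, 1, axis=axis)) / (2.0 * h)


def curl(A, h=1.0):
    """Velocity u = curl(A) for a vector potential A of shape (3, n, n, n)."""
    Ax, Ay, Az = A
    return np.stack([
        _ddx(Az, 1, h) - _ddx(Ay, 2, h),  # u_x = dAz/dy - dAy/dz
        _ddx(Ax, 2, h) - _ddx(Az, 0, h),  # u_y = dAx/dz - dAz/dx
        _ddx(Ay, 0, h) - _ddx(Ax, 1, h),  # u_z = dAy/dx - dAx/dy
    ])


def divergence(u, h=1.0):
    """Discrete divergence using the same difference operator as curl()."""
    return _ddx(u[0], 0, h) + _ddx(u[1], 1, h) + _ddx(u[2], 2, h)


def top1_route(tokens, w_gate):
    """Assumed Top-1 gate: softmax over linear logits, keep only the best expert.

    tokens: (num_tokens, dim), w_gate: (dim, num_experts).
    Returns the chosen expert index and its gate probability per token.
    """
    logits = tokens @ w_gate
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=-1, keepdims=True)
    expert = probs.argmax(axis=-1)
    gate = probs[np.arange(tokens.shape[0]), expert]
    return expert, gate


rng = np.random.default_rng(0)

# (a) Any potential A yields a velocity whose discrete divergence is round-off only.
A = rng.standard_normal((3, 16, 16, 16))
u = curl(A)
max_div = np.abs(divergence(u)).max()

# (b) Route 8 hypothetical latent tokens of width 4 to one of 2 experts.
tokens = rng.standard_normal((8, 4))
w_gate = rng.standard_normal((4, 2))
expert, gate = top1_route(tokens, w_gate)

print("max |div u|:", max_div)
print("expert per token:", expert)
```

In this sketch the divergence vanishes to round-off because the difference operators along different axes commute, so the constraint holds for any potential and does not depend on training. Under these assumptions, a network that outputs the potential rather than the velocity cannot produce a field that violates discrete mass conservation.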

