We study the problem of \emph{architecture selection} for deep learning models trained to solve partial differential equations (PDEs), asking when transformer-based architectures with learned attention outperform Fourier-domain neural operators. We introduce the \textbf{Multi-Scale Attention Transformer} (\msat{}), an architecture that encodes spatiotemporal solution histories as token sequences and is trained end-to-end with a composite supervised objective that optionally includes physics-informed regularization terms. We conduct a comprehensive empirical evaluation against nine baselines, including physics-informed neural networks (PINNs), neural operators (FNO, DeepONet, GNOT), and state-space models (Mamba-NO), across five benchmark problems from the PINNacle suite, using identical train/test splits and reference data for all methods. \msat{} achieves state-of-the-art generalization on complex-geometry problems ($L^2_\mathrm{rel} = 0.0101$ on Heat2D-CG, a $3.7\times$ improvement over FNO) with $34\,\mathrm{s}$ of total inference time, compared with $120{,}812\,\mathrm{s}$ for Mamba-NO. Ablation studies of the physics regularization component reveal a clear inductive-bias trade-off: physics priors reduce test error on diffusion-dominated problems but degrade generalization in chaotic and recirculating-flow regimes, which directly characterizes the boundary at which the prior becomes misspecified. Approximation error bounds expressed as a function of the domain boundary complexity $\kappa$ provide a theoretical basis for these empirical findings and yield a principled rule for architecture selection.
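The headline numbers above rest on the relative $L^2$ error and a wall-clock ratio. The following is a minimal sketch, assuming $L^2_\mathrm{rel}$ is the discrete norm ratio $\|u_\theta - u\|_2 / \|u\|_2$ over test points with uniform weights (PINNacle's exact evaluation protocol may weight points differently). The helper name `relative_l2_error` is hypothetical and not part of \msat{} or PINNacle. It also works through the inference-time arithmetic on the reported figures.

```python
import math


def relative_l2_error(pred, ref):
    """Hypothetical helper: ||pred - ref||_2 / ||ref||_2 over flattened test-point values.

    Assumes every test point carries equal weight (no quadrature weights).
    """
    if len(pred) != len(ref):
        raise ValueError("pred and ref must have the same length")
    num = math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)))
    den = math.sqrt(sum(r ** 2 for r in ref))
    if den == 0.0:
        raise ValueError("reference solution has zero norm")
    return num / den


# Toy check: ref has norm 5, the error vector has norm 0.5, so the ratio is 0.1.
toy_err = relative_l2_error([3.0, 4.5], [3.0, 4.0])

# Ratio of the reported total inference times (Mamba-NO vs. MSAT), about 3553x.
speedup = 120_812 / 34
```

Normalizing by the reference norm makes the metric scale-free, so errors can be compared across PDEs whose solutions differ in magnitude.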