The advent of foundation models in AI has significantly advanced general-purpose learning, enabling remarkable capabilities in zero-shot inference and in-context learning. However, training such models on physics data, including solutions to partial differential equations (PDEs), poses a unique challenge due to the varying dimensionalities of different systems. Traditional approaches either fix a maximum dimension or employ separate encoders for different dimensionalities, resulting in inefficiencies. To address this, we propose a dimension-agnostic neural network architecture, the Axial Neural Network (XNN), inspired by parameter-sharing structures such as Deep Sets and Graph Neural Networks. XNN generalizes across varying tensor dimensions while maintaining computational efficiency. We convert existing PDE foundation models into axial neural networks and evaluate their performance across three training scenarios: training from scratch, pretraining on multiple PDEs, and fine-tuning on a single PDE. Our experiments show that XNNs perform competitively with the original models and exhibit superior generalization to unseen dimensions, highlighting the importance of multidimensional pretraining for foundation models.
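To make the dimension-agnostic idea concrete, the sketch below shows one way a layer can share the same parameters across every spatial axis of an input tensor, so a single set of weights serves 1D, 2D, or 3D fields. This is only an illustrative assumption about the axial/parameter-sharing principle described above, not the paper's actual XNN architecture; the class name `AxialStencilLayer`, the 3-point stencil, and the periodic boundary handling are hypothetical choices made for brevity.

```python
import numpy as np

class AxialStencilLayer:
    """Illustrative sketch (not the paper's XNN): apply one shared, learned
    3-point stencil along every spatial axis of an N-dimensional field and
    average the per-axis results. Because the parameters do not depend on
    the number of axes, the same layer handles 1D, 2D, or 3D inputs."""

    def __init__(self, seed=0):
        rng = np.random.default_rng(seed)
        self.kernel = rng.standard_normal(3)  # shared across all spatial axes

    def _mix_along(self, x, ax):
        # 1D "convolution" via shifted copies along axis `ax`
        # (periodic boundary conditions, assumed for simplicity).
        left = np.roll(x, 1, axis=ax)
        right = np.roll(x, -1, axis=ax)
        return self.kernel[0] * left + self.kernel[1] * x + self.kernel[2] * right

    def __call__(self, x):
        # Last axis is treated as channels; all preceding axes are spatial.
        spatial_axes = range(x.ndim - 1)
        # Permutation-invariant aggregation over axes keeps the layer
        # agnostic to how many spatial dimensions the input has.
        return np.mean([self._mix_along(x, ax) for ax in spatial_axes], axis=0)

# The same layer instance, with the same weights, works on fields of
# different dimensionality:
layer = AxialStencilLayer()
u1d = np.random.rand(64, 4)       # 1D field, 4 channels
u2d = np.random.rand(32, 32, 4)   # 2D field, 4 channels
print(layer(u1d).shape, layer(u2d).shape)  # shapes are preserved in both cases
```

In this toy setting, the aggregation over axes plays the role that permutation-invariant pooling plays in Deep Sets and message aggregation plays in Graph Neural Networks: it is what lets one parameter set cover inputs of any dimensionality.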