Axial Neural Networks for Dimension-Free Foundation Models

The advent of foundation models in AI has significantly advanced general-purpose learning, enabling remarkable capabilities in zero-shot inference and in-context learning. However, training such models on physics data, including solutions to partial differential equations (PDEs), poses a unique challenge due to varying dimensionalities across different systems. Traditional approaches either fix a maximum dimension or employ separate encoders for different dimensionalities, resulting in inefficiencies. To address this, we propose a dimension-agnostic neural network architecture, the Axial Neural Network (XNN), inspired by parameter-sharing structures such as Deep Sets and Graph Neural Networks. XNN generalizes across varying tensor dimensions while maintaining computational efficiency. We convert existing PDE foundation models into axial neural networks and evaluate their performance across three training scenarios: training from scratch, pretraining on multiple PDEs, and fine-tuning on a single PDE. Our experiments show that XNNs perform competitively with original models and exhibit superior generalization to unseen dimensions, highlighting the importance of multidimensional pretraining for foundation models.
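To make the parameter-sharing idea concrete, here is a minimal sketch of a layer whose weights do not depend on the number of tensor axes. It is not the XNN architecture itself, which the abstract does not specify. It assumes one simple way to share parameters across axes in the spirit of Deep Sets: apply the same 1D kernel along every axis and sum the results. The names `conv1d_along_axis`, `axial_layer`, and `w_self`, and the choice of periodic padding, are hypothetical and chosen for illustration only.

```python
import numpy as np


def conv1d_along_axis(x, kernel, axis):
    """Circular 1D cross-correlation of x with kernel along a single axis.

    Periodic padding (via np.roll) is an assumption made for this sketch.
    """
    k = len(kernel)
    out = np.zeros_like(x, dtype=float)
    for i, w in enumerate(kernel):
        # Shift so that the kernel is centered on each element.
        out += w * np.roll(x, shift=(k // 2) - i, axis=axis)
    return out


def axial_layer(x, kernel, w_self=1.0):
    """Apply one shared 1D kernel along every axis of x and sum the results.

    The parameters (kernel, w_self) are independent of x.ndim, so the same
    layer accepts 1D, 2D, or 3D fields without any change.
    """
    out = w_self * x
    for axis in range(x.ndim):
        out = out + conv1d_along_axis(x, kernel, axis)
    return out


# One set of parameters, applied to inputs of different dimensionality.
kernel = np.array([1.0, -2.0, 1.0])  # hypothetical weights (a Laplacian stencil)
rng = np.random.default_rng(0)
x1 = rng.standard_normal(8)
x2 = rng.standard_normal((8, 6))
x3 = rng.standard_normal((8, 6, 4))
y1, y2, y3 = (axial_layer(x, kernel) for x in (x1, x2, x3))
```

Because the kernel is shared and the per-axis outputs are summed, permuting the axes of the input permutes the axes of the output in the same way. This is the tensor-axis analogue of the permutation equivariance that Deep Sets provides over set elements.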

