Despite the recent popularity of attention-based neural architectures in core AI fields like natural language processing (NLP) and computer vision (CV), their potential in modeling complex physical systems remains under-explored. Learning problems in physical systems are often characterized as discovering operators that map between function spaces based on a few instances of function pairs. This task frequently amounts to a severely ill-posed PDE inverse problem. In this work, we propose a novel neural operator architecture based on the attention mechanism, which we call the Nonlocal Attention Operator (NAO), and explore its capability for developing a foundation model of physical systems. In particular, we show that the attention mechanism is equivalent to a double integral operator that enables nonlocal interactions among spatial tokens, with a data-dependent kernel characterizing the inverse mapping from data to the hidden parameter field of the underlying operator. As such, the attention mechanism extracts global prior information from training data generated by multiple systems, and suggests the exploratory space in the form of a nonlinear kernel map. Consequently, NAO can address ill-posedness and rank deficiency in PDE inverse problems by encoding regularization and achieving generalizability. We empirically demonstrate the advantages of NAO over baseline neural models in terms of generalizability to unseen data resolutions and system states. Our work not only suggests a novel neural operator architecture for learning interpretable foundation models of physical systems, but also offers a new perspective on understanding the attention mechanism.
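To make the abstract's reading of attention as a nonlocal integral operator more concrete, the following is a minimal sketch and not the NAO architecture itself. It assumes a simplified, softmax-free attention on a uniform periodic grid over [0, 1), and shows only a single kernel integral of the form ∫ K(x, y) f(y) dy rather than the double integral described in the abstract. The names `sample_fields`, `kernel_attention`, `Wq`, and `Wk` are hypothetical, chosen for illustration.

```python
import numpy as np

def sample_fields(n):
    """Sample two periodic vector fields on a uniform grid over [0, 1).

    Both fields are illustrative choices, not data from the paper.
    """
    x = np.linspace(0.0, 1.0, n, endpoint=False)
    u = np.stack([np.sin(2 * np.pi * x), np.cos(2 * np.pi * x)], axis=1)
    f = np.stack([np.cos(2 * np.pi * x), np.ones_like(x)], axis=1)
    return u, f

def kernel_attention(u, f, Wq, Wk):
    """Softmax-free attention read as a quadrature rule (an assumption).

    Each grid point is a spatial token. The score matrix Q K^T plays the
    role of a data-dependent kernel K(x_i, y_j) = <Wq^T u(x_i), Wk^T u(y_j)>,
    and the product with f, scaled by the grid spacing, is a rectangle-rule
    approximation of the integral of K(x_i, y) f(y) over y.
    """
    n = u.shape[0]
    dx = 1.0 / n               # quadrature weight of the uniform grid
    Q = u @ Wq                 # queries, shape (n, d_k)
    K = u @ Wk                 # keys, shape (n, d_k)
    kernel = Q @ K.T           # kernel values, shape (n, n)
    return (kernel @ f) * dx   # nonlocal interaction across all tokens

# Hypothetical projection weights, shared across resolutions.
rng = np.random.default_rng(0)
Wq = rng.standard_normal((2, 4))
Wk = rng.standard_normal((2, 4))

out_coarse = kernel_attention(*sample_fields(64), Wq, Wk)
out_fine = kernel_attention(*sample_fields(256), Wq, Wk)

# Every 4th fine-grid point coincides with a coarse-grid point.
print(np.max(np.abs(out_fine[::4] - out_coarse)))
```

Because the weights act on pointwise features rather than on grid indices, the same `Wq` and `Wk` can be applied at any resolution, and the two outputs approximate the same underlying integral at the shared grid points. This is one plausible way to picture the resolution generalization the abstract reports, under the simplifications stated above.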