Operator learning is an emerging area of machine learning that aims to learn mappings between infinite-dimensional function spaces. Here we uncover a connection between operator learning architectures and conditioned neural fields from computer vision, providing a unified perspective for examining the differences between popular operator learning models. We find that many commonly used operator learning models can be viewed as neural fields whose conditioning mechanisms are restricted to point-wise and/or global information. Motivated by this, we propose the Continuous Vision Transformer (CViT), a novel neural operator architecture that employs a vision transformer encoder and uses cross-attention to modulate a base field constructed with a trainable grid-based positional encoding of query coordinates. Despite its simplicity, CViT achieves state-of-the-art results across challenging benchmarks in climate modeling and fluid dynamics. Our contributions can be viewed as a first step towards adapting advanced computer vision architectures for building more flexible and accurate machine learning models in the physical sciences.
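As a rough illustration of the decoding idea described above, the following NumPy sketch builds a base field from a trainable grid of embeddings evaluated at query coordinates and then modulates it with cross-attention over encoder tokens. It is a minimal sketch under stated assumptions, not the CViT implementation: the bilinear interpolation of the grid, the single attention head, the plain patch embedding standing in for the vision transformer encoder, the random untrained weights, and all function and parameter names (`patch_tokens`, `grid_positional_encoding`, `cvit_like_decoder`, `params`) are hypothetical choices made for illustration.

```python
import numpy as np

def patch_tokens(u, patch, We):
    """Stand-in for the encoder: split a (H, W) input into patches and project them."""
    H, W = u.shape
    p = patch
    patches = u.reshape(H // p, p, W // p, p).transpose(0, 2, 1, 3).reshape(-1, p * p)
    return patches @ We  # (num_patches, D)

def grid_positional_encoding(coords, grid):
    """Bilinearly interpolate a (H, W, D) grid of embeddings at coords in [0, 1]^2."""
    H, W, _ = grid.shape
    # Map normalized coordinates to continuous grid indices.
    y = coords[:, 0] * (H - 1)
    x = coords[:, 1] * (W - 1)
    y0 = np.clip(np.floor(y).astype(int), 0, H - 2)
    x0 = np.clip(np.floor(x).astype(int), 0, W - 2)
    wy = (y - y0)[:, None]
    wx = (x - x0)[:, None]
    top = (1 - wx) * grid[y0, x0] + wx * grid[y0, x0 + 1]
    bottom = (1 - wx) * grid[y0 + 1, x0] + wx * grid[y0 + 1, x0 + 1]
    return (1 - wy) * top + wy * bottom  # (N, D)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, tokens, Wq, Wk, Wv):
    """Single-head cross-attention: coordinate queries attend to encoder tokens."""
    q, k, v = queries @ Wq, tokens @ Wk, tokens @ Wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v  # (N, D)

def cvit_like_decoder(coords, tokens, params):
    """Evaluate the output field at arbitrary query coordinates."""
    base = grid_positional_encoding(coords, params["grid"])  # base field
    # Residual update: the encoder tokens modulate the base field.
    h = base + cross_attention(base, tokens, params["Wq"], params["Wk"], params["Wv"])
    return h @ params["Wout"]  # (N, out_dim)

# Hypothetical sizes and random (untrained) weights, for shape checking only.
rng = np.random.default_rng(0)
D, patch = 16, 4
params = {
    "We": rng.normal(size=(patch * patch, D)) * 0.1,
    "grid": rng.normal(size=(8, 8, D)) * 0.1,
    "Wq": rng.normal(size=(D, D)) * 0.1,
    "Wk": rng.normal(size=(D, D)) * 0.1,
    "Wv": rng.normal(size=(D, D)) * 0.1,
    "Wout": rng.normal(size=(D, 1)) * 0.1,
}
u = rng.normal(size=(32, 32))          # input function sampled on a grid
tokens = patch_tokens(u, patch, params["We"])
coords = rng.uniform(size=(5, 2))      # arbitrary query points in [0, 1]^2
out = cvit_like_decoder(coords, tokens, params)  # shape (5, 1)
```

Because the queries are continuous coordinates rather than fixed grid indices, the same set of encoder tokens can be decoded at any resolution by passing a different `coords` array.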