In-Context Operator Networks (ICONs) learn operators across different types of PDEs through a few-shot, in-context approach. Although they generalize well to various PDEs, existing methods treat each data point as a single token and suffer from computational inefficiency when processing dense data, limiting their application in higher spatial dimensions. In this work, we propose Vision In-Context Operator Networks (VICON), which incorporate a vision transformer architecture to efficiently process 2D functions through patch-wise operations. We evaluate our method on three fluid dynamics datasets, demonstrating both superior performance (reducing scaled $L^2$ error by $40\%$ and $61.6\%$ on two benchmark datasets for compressible flows, respectively) and computational efficiency (requiring only one-third of the inference time per frame) in long-term rollout predictions, compared to the current state-of-the-art sequence-to-sequence model with fixed-timestep prediction: Multiple Physics Pretraining (MPP). Unlike MPP, our method preserves the benefits of in-context operator learning, enabling flexible context formation when frame counts are insufficient or timestep values vary.