Pretraining for partial differential equation (PDE) modeling has recently shown promise in scaling neural operators across datasets to improve generalizability and performance. Despite these advances, our understanding of how pretraining affects neural operators is still limited; studies generally propose tailored architectures and datasets, making it challenging to compare or examine different pretraining frameworks. To address this, we compare various pretraining methods without optimizing architecture choices in order to characterize pretraining dynamics across different models and datasets, as well as to understand its scaling and generalization behavior. We find that pretraining is highly dependent on model and dataset choices, but that in general transfer learning or physics-based pretraining strategies work best. In addition, pretraining performance can be further improved through data augmentations. Lastly, pretraining is especially beneficial when fine-tuning in scarce-data regimes or when generalizing to downstream data similar to the pretraining distribution. By providing insights into pretraining neural operators for physics prediction, we hope to motivate future work in developing and evaluating pretraining methods for PDEs.