Pretraining for partial differential equation (PDE) modeling has recently shown promise in scaling neural operators across datasets to improve generalizability and performance. Despite these advances, our understanding of how pretraining affects neural operators is still limited: studies generally propose tailored architectures and datasets, which makes it challenging to compare or examine different pretraining frameworks. To address this, we compare various pretraining methods without optimizing architecture choices, both to characterize pretraining dynamics on different models and datasets and to understand its scaling and generalization behavior. We find that the benefit of pretraining depends strongly on model and dataset choices, but in general transfer learning or physics-based pretraining strategies work best. In addition, pretraining performance can be further improved by using data augmentations. Lastly, pretraining is beneficial when fine-tuning in scarce-data regimes or when generalizing to downstream data similar to the pretraining distribution. By providing insights into pretraining neural operators for physics prediction, we hope to motivate future work in developing and evaluating pretraining methods for PDEs.