In the growing field of scientific machine learning, in-context operator learning has shown notable potential for building foundation models. In this framework, the model is trained to learn operators and solve differential equations from prompted data at inference time, without any weight updates. However, current models rely heavily on function data and overlook valuable human insight into the operator. To address this, we transform in-context operator learning into a multi-modal paradigm. In particular, inspired by the recent success of large language models, we propose using "captions" to integrate human knowledge about the operator, expressed through natural-language descriptions and equations. We also introduce a novel approach to train a language-model-like architecture, or to directly fine-tune existing language models, for in-context operator learning. Our method outperforms the baseline on single-modal learning tasks, and we demonstrate that multi-modal learning both improves performance and reduces the amount of function data required. The proposed method not only significantly advances the in-context operator learning paradigm but also opens a new path for applying language models.