Machine-Guided Discovery of a Real-World Rogue Wave Model

Big data and large-scale machine learning have had a profound impact on science and engineering, particularly in fields focused on forecasting and prediction. Yet it is still not clear how we can use the superior pattern-matching abilities of machine learning models for scientific discovery. This is because the goals of machine learning and science are generally not aligned. In addition to being accurate, scientific theories must also be causally consistent with the underlying physical process and allow for human analysis, reasoning, and manipulation to advance the field. In this paper, we present a case study on discovering a new symbolic model for oceanic rogue waves from data using causal analysis, deep learning, parsimony-guided model selection, and symbolic regression. We train an artificial neural network on causal features from an extensive dataset of wave buoy observations, while selecting for predictive performance and causal invariance. We apply symbolic regression to distill this black-box model into a mathematical equation that retains the neural network's predictive capabilities while allowing for interpretation in the context of existing wave theory. The resulting model reproduces known behavior, generates well-calibrated probabilities, and achieves better predictive scores on unseen data than current theory. This showcases how machine learning can facilitate inductive scientific discovery and paves the way for more accurate rogue wave forecasting.
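The abstract claims the model "generates well-calibrated probabilities" and achieves better predictive scores on unseen data than current theory. As a hedged illustration of what such checks commonly involve, the sketch below bins predicted probabilities into a reliability table and compares mean binary cross-entropy (log loss) against a baseline. This is not the authors' evaluation code: the function names `reliability_table` and `log_loss`, the equal-width binning, the choice of log loss as the score, and the toy data are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def reliability_table(
    probs: Sequence[float], outcomes: Sequence[int], n_bins: int = 5
) -> List[Tuple[float, float, int]]:
    """Group predictions into equal-width probability bins (hypothetical helper).

    Returns (mean_predicted, observed_frequency, count) for each non-empty bin.
    For a well-calibrated model, mean_predicted is close to observed_frequency.
    """
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have the same length")
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        # min() places p == 1.0 in the last bin instead of an out-of-range index
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    table = []
    for members in bins:
        if not members:
            continue
        mean_p = sum(p for p, _ in members) / len(members)
        freq = sum(y for _, y in members) / len(members)
        table.append((mean_p, freq, len(members)))
    return table


def log_loss(probs: Sequence[float], outcomes: Sequence[int], eps: float = 1e-12) -> float:
    """Mean binary cross-entropy; lower is better (hypothetical scoring choice)."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)


if __name__ == "__main__":
    # Toy data (invented): 1 = rogue wave observed in a record, 0 = not observed
    outcomes = [0, 0, 1, 0, 1, 1, 0, 0]
    baseline = [0.5] * len(outcomes)  # uninformative constant forecast
    candidate = [0.1, 0.2, 0.8, 0.1, 0.7, 0.9, 0.3, 0.2]  # hypothetical model output
    print("baseline log loss:", log_loss(baseline, outcomes))
    print("candidate log loss:", log_loss(candidate, outcomes))
    for mean_p, freq, n in reliability_table(candidate, outcomes):
        print(f"predicted {mean_p:.2f}  observed {freq:.2f}  n={n}")
```

Scoring on held-out records rather than training data is what makes a comparison like this meaningful; real rogue wave events are rare, so in practice many more records and coarser or adaptive bins would likely be needed.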
