Big data and large-scale machine learning have had a profound impact on science and engineering, particularly in fields focused on forecasting and prediction. Yet, it is still not clear how we can use the superior pattern matching abilities of machine learning models for scientific discovery. This is because the goals of machine learning and science are generally not aligned. In addition to being accurate, scientific theories must also be causally consistent with the underlying physical process and allow for human analysis, reasoning, and manipulation to advance the field. In this paper, we present a case study on discovering a new symbolic model for oceanic rogue waves from data using causal analysis, deep learning, parsimony-guided model selection, and symbolic regression. We train an artificial neural network on causal features from an extensive dataset of observations from wave buoys, while selecting for predictive performance and causal invariance. We apply symbolic regression to distill this black-box model into a mathematical equation that retains the neural network's predictive capabilities, while allowing for interpretation in the context of existing wave theory. The resulting model reproduces known behavior, generates well-calibrated probabilities, and achieves better predictive scores on unseen data than current theory. This showcases how machine learning can facilitate inductive scientific discovery, and paves the way for more accurate rogue wave forecasting.