Automated scientific discovery aims to improve scientific understanding through machine learning. A central approach in this field is symbolic regression, which uses genetic programming or sparse regression to learn interpretable mathematical expressions that explain observed data. Conventionally, symbolic regression focuses on identifying ordinary differential equations, and noise is generally viewed as something that only complicates the recovery of deterministic dynamics. However, explicitly learning a symbolic function of the noise component in stochastic differential equations enhances modelling capacity, increases the knowledge gained and enables generative sampling. We introduce a method for symbolic discovery of stochastic differential equations based on genetic programming, jointly optimizing the drift and diffusion functions via maximum likelihood estimation. Our results demonstrate accurate recovery of governing equations, efficient scaling to higher-dimensional systems, robustness to sparsely sampled problems and generalization to stochastic partial differential equations. This work extends symbolic regression toward interpretable discovery of stochastic dynamical systems, contributing to the automation of science in a noisy and dynamic world.
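To make the idea of scoring drift and diffusion jointly by likelihood more concrete, the following is a minimal sketch, not the method of this work. It assumes a one-dimensional system, a fixed time step, and the Euler-Maruyama Gaussian approximation of the transition density, under which each step is distributed as N(x + f(x) dt, g(x)^2 dt); the abstract does not state which likelihood approximation is used. The function names, the Ornstein-Uhlenbeck test system and the hand-written candidate expressions are all hypothetical and stand in for expressions that a genetic programming search would propose.

```python
import numpy as np


def em_negative_log_likelihood(x, dt, drift, diffusion):
    """Negative log-likelihood of a 1-D trajectory x sampled every dt.

    Assumption of this sketch: each transition is Gaussian with mean
    x_k + drift(x_k) * dt and variance diffusion(x_k)**2 * dt
    (Euler-Maruyama approximation).
    """
    x = np.asarray(x, dtype=float)
    x_now, x_next = x[:-1], x[1:]
    mean = x_now + drift(x_now) * dt
    var = diffusion(x_now) ** 2 * dt
    return 0.5 * np.sum(np.log(2.0 * np.pi * var) + (x_next - mean) ** 2 / var)


def simulate(drift, diffusion, x0, dt, n_steps, rng):
    """Generate one Euler-Maruyama sample path (used here as toy data)."""
    x = np.empty(n_steps + 1)
    x[0] = x0
    noise = rng.standard_normal(n_steps)
    for k in range(n_steps):
        x[k + 1] = x[k] + drift(x[k]) * dt + diffusion(x[k]) * np.sqrt(dt) * noise[k]
    return x


# Hypothetical ground truth: Ornstein-Uhlenbeck process dx = -x dt + 0.5 dW.
def true_drift(x):
    return -x


def true_diffusion(x):
    return np.full_like(x, 0.5)


rng = np.random.default_rng(0)
dt = 0.01
data = simulate(true_drift, true_diffusion, x0=1.0, dt=dt, n_steps=20_000, rng=rng)

# Score three candidate (drift, diffusion) pairs; lower is better.
nll_true = em_negative_log_likelihood(data, dt, true_drift, true_diffusion)
nll_wrong_diffusion = em_negative_log_likelihood(
    data, dt, true_drift, lambda x: np.full_like(x, 1.0)
)
nll_wrong_drift = em_negative_log_likelihood(
    data, dt, lambda x: x, true_diffusion
)

print("true model:      ", nll_true)
print("wrong diffusion: ", nll_wrong_diffusion)
print("wrong drift:     ", nll_wrong_drift)
```

In a symbolic search, a score of this kind could serve as the fitness of a candidate pair of expressions, so that errors in either the drift or the diffusion term are penalized by the same objective.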