Modeling real-world systems requires accounting for noise, whether it arises from unpredictable fluctuations in financial markets, irregular rhythms in biological systems, or environmental variability in ecosystems. While the behavior of such systems can often be described by stochastic differential equations, a central challenge is understanding how noise influences the inference of system parameters and dynamics from data. Traditional symbolic regression methods can uncover governing equations but typically ignore uncertainty. Conversely, Gaussian processes provide principled uncertainty quantification but offer little insight into the underlying dynamics. In this work, we bridge this gap with a hybrid framework that joins symbolic regression with probabilistic machine learning: it recovers the symbolic form of the governing equations while simultaneously inferring uncertainty in the system parameters. The framework combines deep symbolic regression with Gaussian process-based maximum likelihood estimation to model the deterministic dynamics and the noise structure separately, without requiring prior assumptions about their functional forms. We verify the approach on numerical benchmarks, including harmonic, Duffing, and van der Pol oscillators, and validate it on an experimental system of coupled biological oscillators exhibiting synchronization, where the algorithm successfully identifies both the symbolic and stochastic components. The framework is data-efficient, requiring as few as 100 to 1000 data points, and robust to noise, demonstrating its broad potential in domains where uncertainty is intrinsic and both the structure and variability of dynamical systems must be understood.
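To make the split between deterministic dynamics and noise structure concrete, the following is a minimal sketch and not the framework described above. It simulates a damped harmonic oscillator driven by white noise with the Euler-Maruyama scheme, then fits the drift coefficients and the noise amplitude by maximizing a Gaussian transition likelihood. Unlike the framework, it assumes the linear form of the drift is already known, so no symbolic search and no Gaussian process is involved. The function names, the parameter values (`k`, `c`, `sigma`, `dt`), and the initial condition are all illustrative assumptions.

```python
import numpy as np


def simulate_oscillator(k, c, sigma, dt, n_steps, rng):
    """Euler-Maruyama integration of
        dx = v dt
        dv = (-k x - c v) dt + sigma dW
    (hypothetical benchmark; the initial condition x=1, v=0 is an assumption).
    """
    x = np.empty(n_steps + 1)
    v = np.empty(n_steps + 1)
    x[0], v[0] = 1.0, 0.0
    for i in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(dt))  # Wiener increment over one step
        x[i + 1] = x[i] + v[i] * dt
        v[i + 1] = v[i] + (-k * x[i] - c * v[i]) * dt + sigma * dW
    return x, v


def fit_gaussian_mle(x, v, dt):
    """Maximum likelihood fit under Gaussian velocity increments.

    With constant noise amplitude, maximizing the Gaussian transition
    likelihood over (k, c) reduces to least squares on the increments,
    and the noise amplitude follows from the residual variance.
    """
    dv = np.diff(v)
    # Columns multiply k and c in the assumed drift (-k x - c v) dt.
    features = np.column_stack([-x[:-1], -v[:-1]]) * dt
    theta, *_ = np.linalg.lstsq(features, dv, rcond=None)
    residuals = dv - features @ theta
    sigma_hat = np.sqrt(np.mean(residuals**2) / dt)
    return theta[0], theta[1], sigma_hat


rng = np.random.default_rng(0)
x, v = simulate_oscillator(k=4.0, c=0.2, sigma=0.1, dt=0.01, n_steps=1000, rng=rng)
k_hat, c_hat, sigma_hat = fit_gaussian_mle(x, v, dt=0.01)
print(f"k = {k_hat:.3f}, c = {c_hat:.3f}, sigma = {sigma_hat:.3f}")
```

With roughly 1000 samples the noise amplitude is typically recovered more tightly than the drift coefficients, because every increment carries information about `sigma` while `k` and `c` are only constrained by how far the trajectory explores state space. The sketch also assumes that both position and velocity are observed without measurement error.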