Symbolic regression is a powerful tool for discovering governing equations directly from data, but its sensitivity to noise limits its broader application. This paper introduces a Sequential Monte Carlo (SMC) framework for Bayesian symbolic regression that approximates the posterior distribution over symbolic expressions. This improves robustness and enables uncertainty quantification for symbolic regression on noisy data. Unlike traditional genetic programming approaches, the SMC-based algorithm combines probabilistic selection, adaptive tempering, and a normalized marginal likelihood to explore the space of symbolic expressions efficiently, yielding parsimonious expressions that generalize better. Compared with standard genetic programming baselines, the proposed method performs better on challenging, noisy benchmark datasets. Its reduced tendency to overfit and its improved ability to discover accurate, interpretable equations pave the way for more robust symbolic regression in scientific discovery and engineering design applications.
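The abstract names adaptive tempering and probabilistic selection as core ingredients. The sketch below is a rough illustration of how these two pieces typically fit together in a generic tempered SMC sampler. It is not the authors' implementation. The sampler picks the next inverse temperature by bisection so that the effective sample size (ESS) of the reweighted particles stays at or above a target fraction of the population. It then resamples particles in proportion to their incremental weights. Assumptions: each particle is a candidate expression with a precomputed log-likelihood `log_lik`. The function names `ess`, `next_temperature`, and `smc_step` and the 0.5 ESS target are hypothetical choices. The mutation (MCMC move) step that would perturb expressions and the paper's normalized marginal likelihood are not modeled here.

```python
import math
import random


def ess(log_w):
    """Effective sample size of importance weights supplied in log space."""
    m = max(log_w)  # subtract the max for numerical stability
    w = [math.exp(lw - m) for lw in log_w]
    s = sum(w)
    return s * s / sum(x * x for x in w)


def next_temperature(log_lik, beta, target_frac=0.5, tol=1e-8):
    """Pick beta' in (beta, 1] so that the incremental weights
    exp((beta' - beta) * log_lik) keep ESS >= target_frac * N.
    Uses bisection; the 0.5 default target is an assumption."""
    target = target_frac * len(log_lik)
    # If jumping straight to beta = 1 keeps enough diversity, do it.
    if ess([(1.0 - beta) * ll for ll in log_lik]) >= target:
        return 1.0
    lo, hi = beta, 1.0  # invariant: lo satisfies the ESS target, hi does not
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if ess([(mid - beta) * ll for ll in log_lik]) >= target:
            lo = mid
        else:
            hi = mid
    return lo


def smc_step(particles, log_lik, beta, rng, target_frac=0.5):
    """One hypothetical tempering + resampling step (no mutation move).
    Returns resampled particles, their log-likelihoods, and the new beta."""
    new_beta = next_temperature(log_lik, beta, target_frac)
    log_w = [(new_beta - beta) * ll for ll in log_lik]
    m = max(log_w)
    weights = [math.exp(lw - m) for lw in log_w]
    # Probabilistic selection: multinomial resampling proportional to weight.
    idx = rng.choices(range(len(particles)), weights=weights, k=len(particles))
    return [particles[i] for i in idx], [log_lik[i] for i in idx], new_beta


if __name__ == "__main__":
    # Toy population: expressions as strings with made-up log-likelihoods.
    exprs = ["x", "x**2", "sin(x)", "x + 1"]
    lls = [-10.0, -2.0, -50.0, -5.0]
    rng = random.Random(0)
    new_exprs, new_lls, beta = smc_step(exprs, lls, 0.0, rng)
    print(beta, new_exprs)
```

Tempering raises the likelihood to a gradually increasing power, so the population moves from the prior toward the posterior in steps small enough that resampling does not collapse onto a single expression. Choosing the step adaptively from the ESS, rather than on a fixed schedule, is a common way to balance speed against particle diversity.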