In equation learning, exhaustively considering all possible equations derived from a basis function dictionary is infeasible. Sparse regression and greedy algorithms have emerged as popular approaches to this challenge. However, multicollinearity poses difficulties for sparse regression techniques, and greedy steps may inadvertently exclude terms of the true equation, reducing identification accuracy. In this article, we present an approach that balances comprehensiveness and efficiency in equation learning. Inspired by stepwise regression, our approach combines the coefficient of determination, $R^2$, and the Bayesian model evidence, $p(\boldsymbol y|\mathcal M)$, in a novel way. Our procedure performs a comprehensive search with only a minor reduction of the model space at each iteration step. With two flavors of our approach and the adoption of $p(\boldsymbol y|\mathcal M)$ for bi-directional stepwise regression, we present a total of three new avenues for equation learning. In three extensive numerical experiments involving random polynomials and dynamical systems, we compare our approach against four state-of-the-art methods and two standard approaches. The results demonstrate that our comprehensive search approach surpasses all other methods in terms of identification accuracy. In particular, the second flavor of our approach establishes an efficient overfitting penalty based solely on $R^2$, which achieves the highest rates of exact equation recovery.
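To make the two difficulties named in the abstract concrete, the following minimal sketch enumerates every non-empty subset of a small basis function dictionary and scores each by $R^2$ from an ordinary least-squares fit. It is an illustration only, not the procedure proposed in this article: the five-term polynomial dictionary, the assumed true equation $y = 1 + 2x^2$, the noise level, and the helper name `r_squared` are all assumptions chosen for the example. It shows that the number of candidate models grows as $2^p - 1$ for a dictionary of $p$ terms, and that $R^2$ never decreases when terms are added to a nested model, which is why $R^2$ alone cannot select the true equation without some form of overfitting penalty.

```python
import itertools

import numpy as np


def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residual = y - X @ coef
    ss_res = float(residual @ residual)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=50)

# Hypothetical dictionary of basis functions (an assumption for this sketch).
names = ["1", "x", "x^2", "x^3", "x^4"]
dictionary = np.column_stack([x**k for k in range(len(names))])

# Assumed true equation y = 1 + 2 x^2 with small Gaussian noise.
y = 1.0 + 2.0 * x**2 + 0.05 * rng.standard_normal(x.size)

# Exhaustive search: score every non-empty subset of dictionary columns.
scores = {}
for size in range(1, len(names) + 1):
    for subset in itertools.combinations(range(len(names)), size):
        scores[subset] = r_squared(dictionary[:, list(subset)], y)

n_models = len(scores)  # 2**p - 1 candidate models for p dictionary terms

# Best-scoring subset for each model size.
best_by_size = {
    size: max((s for s in scores if len(s) == size), key=scores.get)
    for size in range(1, len(names) + 1)
}

for size, subset in best_by_size.items():
    terms = " + ".join(names[i] for i in subset)
    print(f"{size} term(s): {terms:25s} R^2 = {scores[subset]:.6f}")
```

With only five dictionary terms the loop visits 31 models; a dictionary of 30 terms would already require more than $10^9$ fits, which is what motivates searches that reduce the model space step by step.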