Symbolic regression (SR) is an area of interpretable machine learning that aims to identify mathematical expressions, often composed of simple functions, that best fit a given set of covariates $X$ and response $y$. In recent years, deep symbolic regression (DSR) has emerged as a popular method in the field, leveraging deep reinforcement learning to solve the underlying combinatorial search problem. In this work, we propose GFN-SR, an alternative deep-learning framework for SR. We model the construction of an expression tree as a traversal of a directed acyclic graph (DAG), so that a GFlowNet can learn a stochastic policy that generates such trees sequentially. Enhanced with an adaptive reward baseline, our method is capable of generating a diverse set of best-fitting expressions. Notably, we observe that GFN-SR outperforms other SR algorithms in noisy data regimes, owing to its ability to learn a distribution of rewards over a space of candidate solutions.
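To make the DAG view of tree construction concrete, here is a minimal Python sketch. It is not the GFN-SR implementation. It assumes expressions are built token by token in prefix (pre-order) order over a hypothetical token library (`add`, `mul`, `sin`, `x`, `1.0`). Each state is a partial token sequence, each action appends one token, and a state is terminal once no child slots remain open. In this encoding every state has a single parent, so the DAG is in fact a tree. A uniform weighting stands in for the learned GFlowNet forward policy, and the reward 1/(1 + NRMSE) is an assumed choice common in SR. GFlowNet training (for example a trajectory-balance loss) and the adaptive reward baseline are not shown.

```python
import math
import random

# Hypothetical token library: token -> arity (number of children).
TOKENS = {"add": 2, "mul": 2, "sin": 1, "x": 0, "1.0": 0}


def open_slots(prefix):
    """Number of unfilled child slots in a partial prefix-order sequence."""
    slots = 1  # the root slot
    for tok in prefix:
        slots += TOKENS[tok] - 1  # a token fills one slot and opens `arity` new ones
    return slots


def is_terminal(prefix):
    """A state is terminal once it encodes a complete expression tree."""
    return len(prefix) > 0 and open_slots(prefix) == 0


def allowed_actions(prefix, max_len):
    """Tokens that can be appended while still allowing completion within max_len."""
    remaining = max_len - len(prefix)
    slots = open_slots(prefix)
    # Each open slot needs at least one more token, so require slots + arity <= remaining.
    return [tok for tok, arity in TOKENS.items() if slots + arity <= remaining]


def sample_expression(policy, max_len=7, rng=random):
    """Walk the state graph from the empty sequence to a terminal state."""
    prefix = []
    while not is_terminal(prefix):
        actions = allowed_actions(prefix, max_len)
        weights = [policy(prefix, a) for a in actions]
        prefix.append(rng.choices(actions, weights=weights, k=1)[0])
    return prefix


def evaluate(prefix, x):
    """Evaluate a complete prefix-order expression at a scalar x."""
    def helper(i):
        tok = prefix[i]
        if tok == "x":
            return x, i + 1
        if tok == "1.0":
            return 1.0, i + 1
        if tok == "sin":
            value, j = helper(i + 1)
            return math.sin(value), j
        left, j = helper(i + 1)
        right, k = helper(j)
        return (left + right if tok == "add" else left * right), k

    value, _ = helper(0)
    return value


def reward(prefix, xs, ys):
    """Assumed reward: 1 / (1 + NRMSE), normalizing RMSE by the std of ys."""
    preds = [evaluate(prefix, x) for x in xs]
    mse = sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)
    mean = sum(ys) / len(ys)
    std = math.sqrt(sum((y - mean) ** 2 for y in ys) / len(ys))
    return 1.0 / (1.0 + math.sqrt(mse) / (std if std > 0 else 1.0))


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [i / 10 for i in range(-10, 11)]
    ys = [x * x + math.sin(x) for x in xs]
    uniform = lambda prefix, action: 1.0  # stand-in for a learned GFlowNet policy
    for _ in range(3):
        expr = sample_expression(uniform, rng=rng)
        print(expr, round(reward(expr, xs, ys), 3))
```

Under these assumptions, replacing the uniform `policy` with a trained network that scores `allowed_actions` turns `sample_expression` into a sequential sampler of expression trees. Drawing many trajectories then yields a diverse set of candidate expressions rather than a single best guess.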