Symbolic regression discovers mathematical formulas from data. Some methods fix a tree of operators, assign learnable weights, and train by gradient descent. The tree's structure, which determines which operators and variables appear at each position, is chosen once and applied to every target. This paper tests whether that choice affects which targets are actually recovered. Three structures are compared; all share the same operator and target language but differ in how variables enter the tree, and one is strictly more expressive. Across more than 12,700 training runs, one structure recovers a given target in 100% of runs while another recovers it in 0%, and the ranking reverses on a different target. Expressiveness guarantees that a solution exists in the search space, but not that gradient descent finds it: the most expressive structure fails on targets that a restricted alternative solves reliably. Switching the operator changes which targets are recovered; reversing its gradient profile collapses recovery entirely. Balanced (non-chain) tree shapes are never recovered. These findings show that the optimization landscape, not expressiveness alone, determines what gradient-based symbolic regression recovers.
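To make the setup concrete, the following is a minimal sketch of the general idea of a fixed operator tree with learnable weights trained by gradient descent. It is not the paper's method: the operator (`op`, plain multiplication here), the leaf set (a softmax-weighted choice between the variable `x` and the constant 1), the chain shape in `tree`, the finite-difference gradients, and the step-halving rule in `train` are all assumptions chosen to keep the example dependency-free.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a short list of logits."""
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def leaf(logits, x):
    """Soft choice between the variable x and the constant 1 (assumed leaf set)."""
    p_x, p_one = softmax(logits)
    return p_x * x + p_one * 1.0


def op(a, b):
    """Placeholder binary operator (an assumption, not the paper's operator)."""
    return a * b


def tree(params, x):
    """Fixed chain-shaped tree: op(leaf0, op(leaf1, leaf2))."""
    l0, l1, l2 = (leaf(logits, x) for logits in params)
    return op(l0, op(l1, l2))


def loss(params, xs, ys):
    """Mean squared error between the tree output and the target values."""
    return sum((tree(params, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def numeric_grad(params, xs, ys, eps=1e-5):
    """Central finite differences, used only to avoid an autodiff dependency."""
    grads = []
    for i, logits in enumerate(params):
        row = []
        for j in range(len(logits)):
            up = [list(p) for p in params]
            down = [list(p) for p in params]
            up[i][j] += eps
            down[i][j] -= eps
            row.append((loss(up, xs, ys) - loss(down, xs, ys)) / (2 * eps))
        grads.append(row)
    return grads


def train(params, xs, ys, lr=0.5, steps=300):
    """Gradient descent that accepts a step only if it lowers the loss."""
    current = loss(params, xs, ys)
    for _ in range(steps):
        grads = numeric_grad(params, xs, ys)
        trial = [[p - lr * g for p, g in zip(ps, gs)]
                 for ps, gs in zip(params, grads)]
        trial_loss = loss(trial, xs, ys)
        if trial_loss < current:
            params, current = trial, trial_loss
        else:
            lr *= 0.5  # step was too large: shrink it and try again
    return params, current


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [0.5 + 0.25 * k for k in range(9)]
    ys = [x * x for x in xs]  # target x^2 = op(x, op(x, 1))
    init = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(3)]
    fitted, final_loss = train(init, xs, ys)
    print("initial loss:", loss(init, xs, ys))
    print("final loss:  ", final_loss)
    print("leaf weights:", [softmax(p) for p in fitted])
```

Reading off the largest weight at each leaf gives a candidate discrete formula. The target in this toy setup is representable exactly (two leaves select `x`, one selects 1), yet whether a given run reaches it depends on the initialization and the loss surface, which is the distinction between expressiveness and recoverability that the abstract draws.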