Existing symbolic regression methods organize the space of candidate mathematical expressions primarily by syntactic, structural similarity. However, this approach overlooks crucial equivalences between expressions that arise from mathematical symmetries, such as the commutativity, associativity, and distributivity of arithmetic operations. Consequently, expressions with similar errors on a given data set can lie far apart in the search space. The resulting error landscape is rough, and efficient local, gradient-based methods struggle to explore it. This paper proposes and implements a measure of behavioral distance, BED, that clusters together expressions with similar errors. The experimental results show that the stochastic method for calculating BED yields consistent distance estimates with only a modest number of sampled values at which the expressions are evaluated. As a result, its computational cost is comparable to that of the tree-based syntactic distance. Our findings also reveal that BED significantly improves the smoothness of the error landscape in the search space for symbolic regression.
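To make the contrast between syntactic and behavioral similarity concrete, the following is a minimal sketch of a sampling-based behavioral distance. It is not the BED definition, which the abstract does not specify. It assumes, purely for illustration, that behavior is compared as the root-mean-square gap between two expressions' outputs on shared, uniformly sampled inputs. The function name `behavioral_distance` and its parameters `n_samples`, `low`, `high`, and `seed` are hypothetical.

```python
import math
import random


def behavioral_distance(f, g, n_samples=64, low=-5.0, high=5.0, seed=0):
    """Illustrative stand-in, NOT the paper's BED: root-mean-square gap
    between the outputs of f and g on shared random inputs."""
    rng = random.Random(seed)  # fixed seed keeps the estimate reproducible
    xs = [rng.uniform(low, high) for _ in range(n_samples)]
    squared_gaps = [(f(x) - g(x)) ** 2 for x in xs]
    return math.sqrt(sum(squared_gaps) / n_samples)


# Different expression trees, same function (distributivity).
a = lambda x: 2.0 * (x + 1.0)
b = lambda x: 2.0 * x + 2.0

# Nearly the same tree as `a` (one operator changed), different function.
c = lambda x: 2.0 * (x * 1.0)

d_ab = behavioral_distance(a, b)  # close to zero, up to rounding
d_ac = behavioral_distance(a, c)  # close to 2, since a(x) - c(x) = 2
```

In this sketch the syntactically distant pair `a`, `b` comes out behaviorally identical, while the syntactically close pair `a`, `c` does not. That mismatch is what a tree-based distance cannot capture.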