Data increasingly abound, but distilling their underlying relationships into something interpretable remains challenging. One approach is genetic programming, which `symbolically regresses' a data set into an equation. However, symbolic regression (SR) requires training from scratch for each new dataset. To generalize across datasets, deep learning techniques have been applied to SR. These networks, however, can only be trained with a symbolic objective: NN-generated and target equations are compared symbolically. This ignores the predictive power of the generated equations, which could be measured by a behavioral objective that compares a generated equation's predictions against the actual data. Here we introduce a method that combines gradient descent and evolutionary computation to yield neural networks that minimize both the symbolic and behavioral errors of the equations they generate from data. These evolved networks are shown to generate more symbolically and behaviorally accurate equations than those generated by networks trained with state-of-the-art gradient-based neural symbolic regression methods. We hope these results suggest that evolutionary algorithms, combined with gradient descent, can improve SR by yielding equations with more accurate form and function.
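The distinction between the two objectives can be sketched as follows. This is a minimal illustration with hypothetical helper names and toy error definitions (token-level mismatch for the symbolic objective, mean squared prediction error for the behavioral one); it is not the paper's actual loss formulation.

```python
def symbolic_error(candidate_tokens, target_tokens):
    """Symbolic objective: token-level mismatch between the
    generated equation and the target equation (toy definition)."""
    length = max(len(candidate_tokens), len(target_tokens))
    mismatches = sum(a != b for a, b in zip(candidate_tokens, target_tokens))
    mismatches += abs(len(candidate_tokens) - len(target_tokens))
    return mismatches / length

def behavioral_error(candidate_fn, xs, ys):
    """Behavioral objective: mean squared error of the generated
    equation's predictions against the actual data."""
    return sum((candidate_fn(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Data sampled from the (hidden) target equation y = x^2 + x.
xs = [x / 10 for x in range(-20, 21)]
ys = [x * x + x for x in xs]

target = ["x", "^", "2", "+", "x"]

# A perfect recovery: zero error under both objectives.
good = ["x", "^", "2", "+", "x"]
good_total = symbolic_error(good, target) + behavioral_error(lambda x: x * x + x, xs, ys)

# A wrong equation: penalized by both objectives.
bad = ["x", "+", "1"]
bad_total = symbolic_error(bad, target) + behavioral_error(lambda x: x + 1, xs, ys)
```

A network trained only on `symbolic_error` can rank two structurally similar equations identically even when their predictions differ wildly; combining both terms, as the method above does, penalizes equations that are wrong in form or in function.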