Many real-world systems can be described by mathematical formulas that are human-comprehensible, easy to analyze, and helpful in explaining the system's behaviour. Symbolic regression is a method that generates nonlinear models from data in the form of analytic expressions. Historically, symbolic regression has been predominantly realized using genetic programming, a method that iteratively evolves a population of candidate solutions generated by the genetic operators of crossover and mutation. This gradient-free evolutionary approach suffers from several deficiencies: it does not scale well with the number of variables and samples in the training data, models tend to grow in size and complexity without an adequate gain in accuracy, and it is hard to fine-tune the inner model coefficients using genetic operators alone. Recently, neural networks have been applied to learn the whole analytic formula, i.e., its structure as well as its coefficients, by means of gradient-based optimization algorithms. We propose a novel neural network-based symbolic regression method that constructs physically plausible models from limited training data and prior knowledge about the system. The method employs an adaptive weighting scheme to deal effectively with multiple loss function terms and an epoch-wise learning process to reduce the chance of getting stuck in poor local optima. Furthermore, we propose a parameter-free method for choosing the model with the best interpolation and extrapolation performance out of all models generated throughout the learning process. We experimentally evaluate the approach on the TurtleBot 2 mobile robot, the magnetic manipulation system, the equivalent resistance of two resistors in parallel, and the anti-lock braking system. The results clearly show the potential of the method to find sparse and accurate models that comply with the prior knowledge provided.
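To make the idea of a loss with multiple terms concrete, the following is a minimal sketch on the parallel-resistor example, not the method proposed here. It combines a data-fit term with a penalty for violating prior knowledge. The two constraints (symmetry in the arguments, and the result never exceeding the smaller resistance) are illustrative assumptions, as are all function names. A fixed `weight` stands in for the adaptive weighting scheme, whose details are not given in this abstract.

```python
import random


def parallel_resistance(r1, r2):
    """Ground-truth formula: equivalent resistance of two resistors in parallel."""
    return r1 * r2 / (r1 + r2)


def data_loss(model, samples):
    """Mean squared error of the candidate model on (r1, r2, target) samples."""
    return sum((model(r1, r2) - y) ** 2 for r1, r2, y in samples) / len(samples)


def prior_loss(model, points):
    """Penalty for violating two assumed pieces of prior knowledge.

    Assumption (illustrative only): the model should be symmetric in its
    arguments and should never exceed the smaller of the two resistances.
    """
    total = 0.0
    for r1, r2 in points:
        out = model(r1, r2)
        total += (out - model(r2, r1)) ** 2        # symmetry violation
        total += max(0.0, out - min(r1, r2)) ** 2  # upper-bound violation
    return total / len(points)


def total_loss(model, samples, points, weight=1.0):
    """Fixed-weight sum of both terms (a stand-in for adaptive weighting)."""
    return data_loss(model, samples) + weight * prior_loss(model, points)


rng = random.Random(0)

# Small training set drawn from a narrow range of resistances.
train = []
for _ in range(20):
    a, b = rng.uniform(1, 10), rng.uniform(1, 10)
    train.append((a, b, parallel_resistance(a, b)))

# Constraint points need no targets, so they can cover a wider range.
check = [(rng.uniform(1, 100), rng.uniform(1, 100)) for _ in range(50)]


def asymmetric_candidate(r1, r2):
    """A deliberately poor candidate that ignores the second resistor."""
    return 0.5 * r1


loss_true = total_loss(parallel_resistance, train, check)
loss_bad = total_loss(asymmetric_candidate, train, check)
print(loss_true, loss_bad)
```

Because the constraint points require no measured targets, they can be sampled outside the range of the training data, which is one way prior knowledge can help a model extrapolate when training data are limited.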