Many real-world systems can be described by mathematical formulas that are human-comprehensible, easy to analyze, and helpful in explaining the system's behaviour. Symbolic regression is a method that generates nonlinear models from data in the form of analytic expressions. Historically, symbolic regression has been predominantly realized using genetic programming, a method that iteratively evolves a population of candidate solutions generated by the genetic operators of crossover and mutation. This gradient-free evolutionary approach suffers from several deficiencies: it does not scale well with the number of variables and samples in the training data, models tend to grow in size and complexity without an adequate gain in accuracy, and it is hard to fine-tune the inner model coefficients using genetic operators alone. Recently, neural networks have been applied to learn the whole analytic formula, i.e., its structure as well as its coefficients, by means of gradient-based optimization algorithms. We propose a novel neural network-based symbolic regression method that constructs physically plausible models from limited training data and prior knowledge about the system. The method employs an adaptive weighting scheme to deal effectively with multiple loss function terms and an epoch-wise learning process to reduce the chance of getting stuck in poor local optima. Furthermore, we propose a parameter-free method for choosing the model with the best interpolation and extrapolation performance out of all models generated during the whole learning process. We experimentally evaluate the approach on the TurtleBot 2 mobile robot, a magnetic manipulation system, the equivalent resistance of two resistors in parallel, and an anti-lock braking system. The results show the potential of the method to find sparse and accurate models that comply with the prior knowledge provided.
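As a rough illustration of how a data-fit term and a prior-knowledge term can be combined into one loss, the following sketch uses the parallel-resistor example, where the equivalent resistance is R = R1 R2 / (R1 + R2). Everything here is an assumption made for illustration and is not taken from the method itself: the two priors (symmetry in the arguments and the bound R <= min(R1, R2)), the function names, the sampling ranges, and the fixed `weight`, which merely stands in for the adaptive weighting scheme.

```python
import random


def true_resistance(r1, r2):
    """Equivalent resistance of two resistors in parallel."""
    return r1 * r2 / (r1 + r2)


def data_loss(model, samples):
    """Mean squared error over (r1, r2, target) training samples."""
    return sum((model(r1, r2) - y) ** 2 for r1, r2, y in samples) / len(samples)


def prior_loss(model, points):
    """Mean squared violation of two illustrative priors (assumed here):
    symmetry, model(r1, r2) == model(r2, r1), and
    the upper bound model(r1, r2) <= min(r1, r2)."""
    total = 0.0
    for r1, r2 in points:
        out = model(r1, r2)
        total += (out - model(r2, r1)) ** 2
        total += max(0.0, out - min(r1, r2)) ** 2
    return total / len(points)


def total_loss(model, samples, points, weight=1.0):
    """Data term plus weighted prior term; `weight` is a fixed,
    hypothetical stand-in for an adaptive weighting scheme."""
    return data_loss(model, samples) + weight * prior_loss(model, points)


rng = random.Random(0)

# Limited training data: both resistances drawn from a narrow range.
samples = []
for _ in range(20):
    r1, r2 = rng.uniform(1.0, 2.0), rng.uniform(1.0, 2.0)
    samples.append((r1, r2, true_resistance(r1, r2)))

# Prior-knowledge points need no targets, so they can cover a wider range.
points = [(rng.uniform(0.1, 20.0), rng.uniform(0.1, 20.0)) for _ in range(200)]


def linear_candidate(r1, r2):
    """A candidate that is exact whenever r1 == r2 but breaks the bound
    when the two resistances differ strongly."""
    return 0.25 * (r1 + r2)


loss_exact = total_loss(true_resistance, samples, points)
loss_linear = total_loss(linear_candidate, samples, points)
print(f"exact model loss:      {loss_exact:.6f}")
print(f"linear candidate loss: {loss_linear:.6f}")
```

The linear candidate fits the narrow training range closely, yet the prior term penalizes it on the wider set of points. This is the kind of signal that lets prior knowledge compensate for limited training data.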