Neural Networks for Symbolic Regression

Many real-world systems can be described by mathematical formulas that are human-comprehensible, easy to analyze, and helpful in explaining the system's behaviour. Symbolic regression is a method that generates nonlinear models from data in the form of analytic expressions. Historically, symbolic regression has been predominantly realized using genetic programming, a method that iteratively evolves a population of candidate solutions produced by the genetic operators of crossover and mutation. This gradient-free evolutionary approach suffers from several deficiencies: it does not scale well with the number of variables and samples in the training data, models tend to grow in size and complexity without an adequate gain in accuracy, and it is hard to fine-tune the inner model coefficients using genetic operators alone. Recently, neural networks have been applied to learn the whole analytic formula, i.e., its structure as well as its coefficients, by means of gradient-based optimization algorithms. We propose a novel neural network-based symbolic regression method that constructs physically plausible models based on limited training data and prior knowledge about the system. The method employs an adaptive weighting scheme to effectively deal with multiple loss function terms and an epoch-wise learning process to reduce the chance of getting stuck in poor local optima. Furthermore, we propose a parameter-free method for choosing the model with the best interpolation and extrapolation performance out of all models generated throughout the learning process. We experimentally evaluate the approach on the TurtleBot 2 mobile robot, the magnetic manipulation system, the equivalent resistance of two resistors in parallel, and the anti-lock braking system. The results clearly show the potential of the method to find sparse and accurate models that comply with the prior knowledge provided.
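To make the idea of combining a data-fit term with prior knowledge concrete, the following is a minimal sketch built around the parallel-resistor benchmark named above. It is not the method proposed in the abstract: the candidate formulas, the two prior-knowledge checks (symmetry in the inputs, and an output no larger than the smaller resistor), the helper names, and the fixed weight `W_PRIOR` are all assumptions chosen for illustration. In particular, the fixed weight stands in for the adaptive weighting scheme, whose details the abstract does not give.

```python
import numpy as np


def parallel_resistance(r1, r2):
    """Known formula for two resistors in parallel: R = r1 * r2 / (r1 + r2)."""
    return r1 * r2 / (r1 + r2)


def data_loss(model, r1, r2, target):
    """Mean squared error of a candidate model on the sampled data."""
    return float(np.mean((model(r1, r2) - target) ** 2))


def prior_loss(model, r1, r2):
    """Penalty for violating two assumed pieces of prior knowledge:
    1) symmetry: model(r1, r2) == model(r2, r1)
    2) upper bound: model(r1, r2) <= min(r1, r2)
    """
    pred = model(r1, r2)
    symmetry = np.mean((pred - model(r2, r1)) ** 2)
    bound = np.mean(np.maximum(pred - np.minimum(r1, r2), 0.0) ** 2)
    return float(symmetry + bound)


# Synthetic samples (hypothetical ranges, in ohms).
rng = np.random.default_rng(0)
r1 = rng.uniform(1.0, 20.0, size=200)
r2 = rng.uniform(1.0, 20.0, size=200)
target = parallel_resistance(r1, r2)

# Hand-written candidate formulas standing in for models a search might produce.
candidates = {
    "r1*r2/(r1+r2)": parallel_resistance,
    "0.5*r1": lambda a, b: 0.5 * a,
    "(r1+r2)/4": lambda a, b: (a + b) / 4.0,
}

W_PRIOR = 1.0  # hypothetical fixed weight; the abstract describes an adaptive scheme instead

scores = {
    name: data_loss(f, r1, r2, target) + W_PRIOR * prior_loss(f, r1, r2)
    for name, f in candidates.items()
}
best = min(scores, key=scores.get)
print(best)
```

In this sketch the prior-knowledge term is evaluated on the same samples as the data term; it could equally be evaluated on points outside the training range, which is one way such a term can influence extrapolation behaviour.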

