Neural Networks for Symbolic Regression

Many real-world systems can be described by mathematical formulas that are human-comprehensible, easy to analyze, and helpful in explaining the system's behaviour. Symbolic regression is a method that generates nonlinear models from data in the form of analytic expressions. Historically, symbolic regression has been predominantly realized using genetic programming, which iteratively evolves a population of candidate solutions by applying the genetic operators crossover and mutation. This gradient-free evolutionary approach suffers from several deficiencies: it does not scale well with the number of variables and samples in the training data, models tend to grow in size and complexity without an adequate gain in accuracy, and it is hard to fine-tune the inner model coefficients using genetic operators alone. Recently, neural networks have been applied to learn the whole analytic formula, i.e., its structure as well as its coefficients, by means of gradient-based optimization algorithms. We propose a novel neural network-based symbolic regression method that constructs physically plausible models based on limited training data and prior knowledge about the system. The method employs an adaptive weighting scheme to deal effectively with multiple loss function terms and an epoch-wise learning process to reduce the chance of getting stuck in poor local optima. Furthermore, we propose a parameter-free method for choosing the model with the best interpolation and extrapolation performance out of all models generated throughout the learning process. We experimentally evaluate the approach on the TurtleBot 2 mobile robot, the magnetic manipulation system, the equivalent resistance of two resistors in parallel, and the anti-lock braking system. The results clearly show the potential of the method to find sparse and accurate models that comply with the prior knowledge provided.
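To make the idea of combining a data-fit term with prior knowledge more concrete, here is a minimal sketch built on the parallel-resistor benchmark mentioned above. It is not the method proposed in the paper: the candidate formula, the symmetry constraint f(R1, R2) = f(R2, R1), the fixed weight `lam`, and all function names are assumptions made for illustration only. In particular, the fixed `lam` stands in for the adaptive weighting scheme, whose details the abstract does not give.

```python
import numpy as np

def parallel_resistance(r1, r2):
    """Known formula for two resistors in parallel: R = r1 * r2 / (r1 + r2)."""
    return r1 * r2 / (r1 + r2)

def candidate(r1, r2, w):
    """Hypothetical candidate expression with tunable coefficients w."""
    return w[0] * r1 * r2 / (w[1] * r1 + w[2] * r2)

def combined_loss(w, r1, r2, y, r1_c, r2_c, lam=1.0):
    """Return (total, fit, sym): data-fit MSE plus a weighted symmetry penalty.

    The symmetry penalty is evaluated on separate constraint samples, which
    may lie outside the range covered by the training data.
    """
    fit = np.mean((candidate(r1, r2, w) - y) ** 2)
    sym = np.mean((candidate(r1_c, r2_c, w) - candidate(r2_c, r1_c, w)) ** 2)
    return fit + lam * sym, fit, sym

rng = np.random.default_rng(0)

# Small training set drawn from a narrow range (assumed: 1 to 10 ohms).
r1, r2 = rng.uniform(1.0, 10.0, size=(2, 20))
y = parallel_resistance(r1, r2)

# Constraint samples drawn from a wider range; no target values are needed.
r1_c, r2_c = rng.uniform(1.0, 100.0, size=(2, 200))

exact = np.array([1.0, 1.0, 1.0])   # recovers the true formula
skewed = np.array([1.0, 0.5, 1.5])  # violates the symmetry constraint

loss_exact = combined_loss(exact, r1, r2, y, r1_c, r2_c)
loss_skewed = combined_loss(skewed, r1, r2, y, r1_c, r2_c)
print("exact :", loss_exact)
print("skewed:", loss_skewed)
```

In this sketch the symmetric coefficients incur no penalty, while the skewed ones are penalized even on inputs for which no measurements exist, which is one way a constraint term can steer a model toward physically plausible behaviour outside the training range.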
