We develop faultless, fixed-depth, string-based prefix and postfix symbolic regression grammars capable of producing \emph{any} expression from a given set of operands, unary operators, and/or binary operators. Using these grammars, we outline simplified forms of five popular heuristic search strategies: Brute Force Search, Monte Carlo Tree Search, Particle Swarm Optimization, Genetic Programming, and Simulated Annealing. For each algorithm, we compare the relative performance of prefix and postfix notation on ten ground-truth expressions, with all methods implemented within a common C++/Eigen framework. Our experiments show a relatively strong correlation between the average number of nodes per layer of the ground-truth expression tree and the relative performance of prefix versus postfix. The fixed-depth grammars developed herein can enhance scientific discovery by increasing the efficiency of symbolic regression, enabling faster identification of accurate mathematical models across various disciplines.
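As an illustration only, the C++ sketch below shows why prefix and postfix token strings need no parentheses. It also shows how the validity and depth of the encoded expression tree can be recovered from the string alone. The `Token` type, its `arity` field, and the functions `postfixDepth`, `prefixDepth` and `averageNodesPerLayer` are hypothetical names introduced here. They are not the grammars or the framework described in this work. "Average number of nodes per layer" is assumed to mean the total node count divided by the tree depth.

```cpp
#include <algorithm>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical token: arity 0 = operand, 1 = unary operator, 2 = binary operator.
struct Token {
    std::string name;
    int arity;
};

// Returns the depth (number of layers) of the tree encoded by a postfix
// token sequence, or -1 if the sequence is not exactly one well-formed
// expression. A lone operand has depth 1.
int postfixDepth(const std::vector<Token>& tokens) {
    std::vector<int> stack;  // depth of each completed subtree awaiting a parent
    for (const Token& t : tokens) {
        if (static_cast<int>(stack.size()) < t.arity) {
            return -1;  // operator lacks enough operands
        }
        int childDepth = 0;
        for (int i = 0; i < t.arity; ++i) {
            childDepth = std::max(childDepth, stack.back());
            stack.pop_back();
        }
        stack.push_back(childDepth + 1);  // this token roots a new subtree
    }
    return stack.size() == 1 ? stack.back() : -1;  // exactly one root must remain
}

// Reversing a prefix string yields the postfix string of the mirrored tree,
// which has the same validity and depth, so the postfix check can be reused.
int prefixDepth(const std::vector<Token>& tokens) {
    std::vector<Token> reversed(tokens.rbegin(), tokens.rend());
    return postfixDepth(reversed);
}

// Assumed definition: total number of nodes divided by the number of layers.
double averageNodesPerLayer(const std::vector<Token>& postfix) {
    int depth = postfixDepth(postfix);
    if (depth <= 0) {
        throw std::invalid_argument("not a well-formed postfix expression");
    }
    return static_cast<double>(postfix.size()) / depth;
}
```

For example, cos(x0 + x1) is `x0 x1 + cos` in postfix and `cos + x0 x1` in prefix. Both strings encode a tree of depth 3 with four nodes, giving about 1.33 nodes per layer under the assumed definition. A truncated string such as `x0 +` is rejected because the binary operator finds only one operand on the stack.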