Symbolic regression (SR) is the problem of learning a symbolic expression from numerical data. Recently, deep neural models trained on procedurally generated synthetic datasets have shown performance competitive with more classical Genetic Programming (GP) algorithms. Unlike their GP counterparts, these neural approaches are trained to generate expressions from datasets given as context, which allows them to produce accurate expressions in a single forward pass at test time. However, they usually do not benefit from search abilities, which results in lower performance than GP on out-of-distribution datasets. In this paper, we propose a novel method that provides the best of both worlds: a Monte-Carlo Tree Search procedure guided by a context-aware neural mutation model, which is first pre-trained to learn promising mutations and then further refined online from successful experiences. The approach demonstrates state-of-the-art performance on the well-known \texttt{SRBench} benchmark.
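To make the search component concrete, the following is a minimal sketch of Monte-Carlo Tree Search over expression mutations. It is not the method of the paper: the mutation step is a uniform random subtree replacement standing in for the context-aware neural mutation model, there is no pre-training or online refinement, and the tuple-based expression encoding, the reward $1/(1+\mathrm{MSE})$, the UCT constant, and all function names (`evaluate`, `mutate`, `reward`, `mcts`) are assumptions chosen for illustration.

```python
import math
import random

import numpy as np

# Hypothetical encoding: ("add", a, b), ("mul", a, b), ("sin", a),
# the variable "x", or a float constant.
BINARY = {"add": np.add, "mul": np.multiply}
UNARY = {"sin": np.sin}


def evaluate(expr, x):
    """Evaluate an expression tree on a 1-D float array x."""
    if isinstance(expr, float):
        return np.full_like(x, expr)
    if expr == "x":
        return x
    op, *args = expr
    vals = [evaluate(a, x) for a in args]
    return BINARY[op](*vals) if op in BINARY else UNARY[op](*vals)


def random_subtree(rng, depth):
    """Sample a small random expression of bounded depth."""
    if depth == 0 or rng.random() < 0.3:
        return "x" if rng.random() < 0.5 else round(rng.uniform(-2, 2), 1)
    op = rng.choice(["add", "mul", "sin"])
    n_args = 1 if op in UNARY else 2
    return (op, *(random_subtree(rng, depth - 1) for _ in range(n_args)))


def mutate(expr, rng):
    """Random stand-in for a learned mutation model: replace one subtree."""
    if not isinstance(expr, tuple) or rng.random() < 0.3:
        return random_subtree(rng, 2)
    op, *args = expr
    i = rng.randrange(len(args))
    args[i] = mutate(args[i], rng)
    return (op, *args)


def reward(expr, x, y):
    """Map mean squared error to (0, 1]; non-finite errors get 0."""
    with np.errstate(all="ignore"):
        mse = float(np.mean((evaluate(expr, x) - y) ** 2))
    return 1.0 / (1.0 + mse) if math.isfinite(mse) else 0.0


class Node:
    def __init__(self, expr, parent=None):
        self.expr, self.parent = expr, parent
        self.children, self.visits, self.value = [], 0, 0.0


def uct(child, c=1.4):
    """Upper confidence bound used to pick a child during selection."""
    exploit = child.value / child.visits
    explore = math.sqrt(math.log(child.parent.visits) / child.visits)
    return exploit + c * explore


def mcts(x, y, iterations=500, max_children=3, seed=0):
    rng = random.Random(seed)
    root = Node("x")
    best_expr, best_r = root.expr, reward(root.expr, x, y)
    for _ in range(iterations):
        node = root
        # Selection: descend while the current node is fully expanded.
        while len(node.children) >= max_children:
            node = max(node.children, key=uct)
        # Expansion: apply one mutation to the selected expression.
        child = Node(mutate(node.expr, rng), parent=node)
        node.children.append(child)
        # Evaluation: score the mutated expression on the dataset.
        r = reward(child.expr, x, y)
        if r > best_r:
            best_expr, best_r = child.expr, r
        # Backpropagation: update statistics up to the root.
        while child is not None:
            child.visits += 1
            child.value += r
            child = child.parent
    return best_expr, best_r
```

Calling `mcts(x, y)` with `x = np.linspace(-3.0, 3.0, 50)` and `y = np.sin(x) + x` returns the best expression tree found and its reward. In a setting closer to the one described above, `mutate` would be replaced by a model conditioned on the dataset, and the mutations that improved the reward would be fed back to refine that model.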