We propose an approach to symbolic regression based on HVAE, a novel variational autoencoder for generating hierarchical structures. It combines simple atomic units with shared weights to recursively encode and decode the individual nodes in the hierarchy. Encoding is performed bottom-up and decoding top-down. We empirically show that HVAE can be trained efficiently with small corpora of mathematical expressions and can accurately encode expressions into a smooth low-dimensional latent space. This latent space can be efficiently explored with various optimization methods to address the task of symbolic regression. Indeed, random search through the latent space of HVAE performs better than random search through expressions generated by manually crafted probabilistic grammars for mathematical expressions. Finally, the EDHiE system for symbolic regression, which applies an evolutionary algorithm to the latent space of HVAE, reconstructs equations from a standard symbolic regression benchmark better than a state-of-the-art system based on a similar combination of deep learning and evolutionary algorithms.
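To make the bottom-up encoding and top-down decoding order concrete, the following is a minimal sketch on a toy expression tree. It is not the HVAE architecture: the `Node` class, the latent size `DIM`, the symbol set, and the simple tanh cell with randomly initialised (untrained) weights are all assumptions made for illustration. The only ideas taken from the text are that one set of weights is shared across all nodes, that encoding visits children before their parent, and that decoding visits a parent before its children.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical expression-tree node: a symbol with up to two children."""
    symbol: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


DIM = 4  # toy latent size (assumption)
SYMBOLS = ["+", "*", "sin", "x", "c"]  # toy symbol set (assumption)

rng = random.Random(0)
# One set of weights, reused at every node of every tree (untrained, random).
EMBED = {s: [rng.uniform(-1, 1) for _ in range(DIM)] for s in SYMBOLS}
W_LEFT = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(DIM)]
W_RIGHT = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(DIM)]


def matvec(matrix: List[List[float]], vec: List[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


def encode(node: Node) -> List[float]:
    """Bottom-up: encode the children first, then combine them at the parent."""
    zero = [0.0] * DIM
    h_left = encode(node.left) if node.left else zero
    h_right = encode(node.right) if node.right else zero
    left_part = matvec(W_LEFT, h_left)
    right_part = matvec(W_RIGHT, h_right)
    return [
        math.tanh(e + l + r)
        for e, l, r in zip(EMBED[node.symbol], left_part, right_part)
    ]


def bottom_up_order(node: Optional[Node]) -> List[str]:
    """Order in which an encoder of this kind finishes nodes (post-order)."""
    if node is None:
        return []
    return bottom_up_order(node.left) + bottom_up_order(node.right) + [node.symbol]


def top_down_order(node: Optional[Node]) -> List[str]:
    """Order in which a top-down decoder would emit nodes (pre-order)."""
    if node is None:
        return []
    return [node.symbol] + top_down_order(node.left) + top_down_order(node.right)


# sin(x) + c * x
expr = Node("+", Node("sin", Node("x")), Node("*", Node("c"), Node("x")))
code = encode(expr)
```

Here `code` is a fixed-length vector for the whole expression, regardless of tree size. In a trained model, a vector of this kind is what a search procedure (random or evolutionary) would perturb before decoding it back into an expression.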