We propose an approach to symbolic regression based on HVAE, a novel variational autoencoder for generating hierarchical structures. HVAE combines simple atomic units with shared weights to recursively encode and decode the individual nodes of the hierarchy: encoding proceeds bottom-up and decoding top-down. We show empirically that HVAE can be trained efficiently on small corpora of mathematical expressions and can accurately encode expressions into a smooth, low-dimensional latent space. This latent space can be explored efficiently with various optimization methods to address the task of symbolic regression. Indeed, random search through the latent space of HVAE performs better than random search through expressions generated by manually crafted probabilistic grammars for mathematical expressions. Finally, the EDHiE system for symbolic regression, which applies an evolutionary algorithm to the latent space of HVAE, reconstructs equations from a standard symbolic regression benchmark better than a state-of-the-art system based on a similar combination of deep learning and evolutionary algorithms.
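To make the idea of "atomic units with shared weights" concrete, here is a minimal, untrained sketch of recursive tree encoding and decoding. It is not HVAE's actual architecture. The vocabulary (`SYMBOLS`), the hidden size `DIM`, the single tanh layer standing in for each atomic unit, and the names `ToyTreeCodec`, `encode`, `sample_latent` and `decode` are all hypothetical illustrations. What it does show is the general pattern: the same encoder weights are applied at every node, children are encoded before their parent (bottom-up), a latent vector is sampled with the usual VAE reparameterization, and the decoder expands the tree from the root down (top-down).

```python
import math
import random
from dataclasses import dataclass
from typing import Optional

# Hypothetical toy vocabulary and arities (not the paper's grammar).
SYMBOLS = ["+", "-", "*", "/", "sin", "cos", "x", "c"]
ARITY = {"+": 2, "-": 2, "*": 2, "/": 2, "sin": 1, "cos": 1, "x": 0, "c": 0}
DIM = 4  # hypothetical hidden/latent size


@dataclass
class Node:
    symbol: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class SharedCell:
    """One weight matrix reused at every node (a stand-in 'atomic unit')."""

    def __init__(self, in_dim, out_dim, rng):
        self.w = rand_matrix(out_dim, in_dim, rng)

    def __call__(self, v):
        return [math.tanh(x) for x in matvec(self.w, v)]


def one_hot(symbol):
    return [1.0 if s == symbol else 0.0 for s in SYMBOLS]


class ToyTreeCodec:
    def __init__(self, seed=0):
        rng = random.Random(seed)
        # Encoder cell: [one-hot symbol | left code | right code] -> code.
        self.enc = SharedCell(len(SYMBOLS) + 2 * DIM, DIM, rng)
        # VAE heads: code -> mean and log-variance of the latent vector.
        self.w_mu = rand_matrix(DIM, DIM, rng)
        self.w_logvar = rand_matrix(DIM, DIM, rng)
        # Decoder: code -> symbol scores, and code -> left/right child codes.
        self.sym = rand_matrix(len(SYMBOLS), DIM, rng)
        self.to_left = SharedCell(DIM, DIM, rng)
        self.to_right = SharedCell(DIM, DIM, rng)

    def encode(self, node):
        """Bottom-up: children are encoded before their parent."""
        if node is None:
            return [0.0] * DIM
        left = self.encode(node.left)
        right = self.encode(node.right)
        return self.enc(one_hot(node.symbol) + left + right)

    def sample_latent(self, h, rng):
        """Reparameterization: z = mu + exp(logvar / 2) * eps, eps ~ N(0, 1)."""
        mu = matvec(self.w_mu, h)
        logvar = matvec(self.w_logvar, h)
        return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
                for m, lv in zip(mu, logvar)]

    def decode(self, z, depth=0, max_depth=3):
        """Top-down: pick the root symbol, then expand its children."""
        scores = matvec(self.sym, z)
        # At max_depth only leaves are allowed, so decoding always terminates.
        allowed = [s for s in SYMBOLS if depth < max_depth or ARITY[s] == 0]
        symbol = max(allowed, key=lambda s: scores[SYMBOLS.index(s)])
        node = Node(symbol)
        if ARITY[symbol] >= 1:
            node.left = self.decode(self.to_left(z), depth + 1, max_depth)
        if ARITY[symbol] == 2:
            node.right = self.decode(self.to_right(z), depth + 1, max_depth)
        return node


def to_infix(node):
    a = ARITY[node.symbol]
    if a == 0:
        return node.symbol
    if a == 1:
        return f"{node.symbol}({to_infix(node.left)})"
    return f"({to_infix(node.left)} {node.symbol} {to_infix(node.right)})"


def height(node):
    kids = [c for c in (node.left, node.right) if c is not None]
    return 0 if not kids else 1 + max(height(c) for c in kids)
```

For example, encoding the tree for `sin(x) + c` with `ToyTreeCodec().encode(...)` yields a fixed-size vector of length `DIM` regardless of the tree's size, and `decode` maps any such vector back to a syntactically valid tree. Because the weights here are random, the decoded tree will generally differ from the input; learning weights that make the round trip accurate is what training does.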