Symbolic regression searches for analytic expressions that accurately describe the phenomena under study. The main attraction of this approach is that it returns an interpretable model that can give users insight into the problem. Historically, most symbolic regression algorithms have been based on evolutionary algorithms. However, there has been a recent surge of new proposals that instead use approaches such as enumeration algorithms, mixed-integer linear programming, neural networks, and Bayesian optimization. To assess how well these new approaches handle challenges commonly found in real-world data, we hosted a competition at the 2022 Genetic and Evolutionary Computation Conference. The competition consisted of several synthetic and real-world datasets that were hidden from the entrants. For the real-world track, we assessed interpretability realistically by having a domain expert judge the trustworthiness of the candidate models. We present an in-depth analysis of the results obtained in this competition, discuss current challenges of symbolic regression algorithms, and highlight possible improvements for future competitions.