Symbolic regression (SR) aims to discover the underlying mathematical expressions that explain observed data, holding promise both for gaining scientific insight and for producing inherently interpretable, generalizable models for tabular data. Deep learning-based SR has recently become competitive with genetic programming approaches, but the role of scale has remained largely unexplored. Inspired by scaling laws in language modeling, we present the first systematic investigation of scaling in SR, using a scalable end-to-end transformer pipeline and carefully generated training data. Across five model sizes and three orders of magnitude in compute, we find that both validation loss and solved rate follow clear power-law trends with compute. We further identify compute-optimal hyperparameter scaling: optimal batch size and learning rate grow with model size, and a token-to-parameter ratio of $\approx$15 is optimal in our regime, with a slight upward trend as compute increases. These results demonstrate that SR performance is largely predictable from compute and offer important insights for training the next generation of SR models.
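For context, the power-law trends referred to above are typically fit in the form below; this is a generic sketch of such scaling-law fits, where $a$ and $\alpha$ are placeholder fit parameters and not values reported in this work:
\[
L(C) \approx a \, C^{-\alpha},
\]
with $L$ the validation loss and $C$ the training compute.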