PySR is an open-source library for practical symbolic regression, a type of machine learning that aims to discover human-interpretable symbolic models. PySR was developed to democratize and popularize symbolic regression for the sciences. It is built on a high-performance distributed backend and a flexible search algorithm, and it interfaces with several deep learning packages. PySR's internal search algorithm is a multi-population evolutionary algorithm, which consists of a unique evolve-simplify-optimize loop, designed for optimization of unknown scalar constants in newly-discovered empirical expressions. PySR's backend is the extremely optimized Julia library SymbolicRegression.jl, which can be used directly from Julia. It is capable of fusing user-defined operators into SIMD kernels at runtime, performing automatic differentiation, and distributing populations of expressions to thousands of cores across a cluster. In describing this software, we also introduce a new benchmark, "EmpiricalBench," to quantify the applicability of symbolic regression algorithms in science. This benchmark measures recovery of historical empirical equations from original and synthetic datasets.
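To make the evolve-simplify-optimize idea concrete, the following is a minimal toy sketch in pure Python. It is not PySR's or SymbolicRegression.jl's implementation: every name here (`evaluate`, `mutate`, `simplify`, `optimize`, `search`, `score`) is hypothetical, the operator set is limited to `+` and `*`, random hill climbing stands in for a real constant optimizer, and the migration and replacement rules are assumptions chosen for brevity.

```python
import random

# A tree is "x", a float constant, or a tuple (op, left, right).
OPS = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}

def evaluate(tree, x):
    if isinstance(tree, tuple):
        return OPS[tree[0]](evaluate(tree[1], x), evaluate(tree[2], x))
    return x if tree == "x" else tree

def size(tree):
    return 1 + size(tree[1]) + size(tree[2]) if isinstance(tree, tuple) else 1

def loss(tree, data):
    total = 0.0
    for x, y in data:
        residual = evaluate(tree, x) - y
        total += residual * residual
    return total / len(data)

def score(tree, data, parsimony=0.01):
    # Mean squared error plus a small complexity penalty (an assumed form).
    return loss(tree, data) + parsimony * size(tree)

def random_leaf(rng):
    return "x" if rng.random() < 0.5 else rng.uniform(-2.0, 2.0)

def mutate(tree, rng):
    """Evolve: replace a random node with a leaf, or grow it by one operator."""
    if not isinstance(tree, tuple) or rng.random() < 0.3:
        if rng.random() < 0.5:
            return random_leaf(rng)
        return (rng.choice("+*"), tree, random_leaf(rng))
    op, left, right = tree
    if rng.random() < 0.5:
        return (op, mutate(left, rng), right)
    return (op, left, mutate(right, rng))

def simplify(tree):
    """Simplify: fold any subtree made only of constants into one constant."""
    if not isinstance(tree, tuple):
        return tree
    op, left, right = tree[0], simplify(tree[1]), simplify(tree[2])
    if isinstance(left, float) and isinstance(right, float):
        return OPS[op](left, right)
    return (op, left, right)

def constants(tree):
    if isinstance(tree, tuple):
        return constants(tree[1]) + constants(tree[2])
    return [tree] if isinstance(tree, float) else []

def with_constants(tree, values):
    """Rebuild the tree, taking constants from the iterator left to right."""
    if isinstance(tree, tuple):
        left = with_constants(tree[1], values)
        right = with_constants(tree[2], values)
        return (tree[0], left, right)
    return next(values) if isinstance(tree, float) else tree

def optimize(tree, data, rng, steps=30):
    """Optimize: random hill climbing on the constants; never increases loss."""
    best, best_loss = constants(tree), loss(tree, data)
    for _ in range(steps if best else 0):
        trial = [c + rng.gauss(0.0, 0.3) for c in best]
        trial_loss = loss(with_constants(tree, iter(trial)), data)
        if trial_loss < best_loss:
            best, best_loss = trial, trial_loss
    return with_constants(tree, iter(best))

def search(data, n_populations=4, pop_size=20, iterations=40, max_size=9, seed=0):
    rng = random.Random(seed)
    key = lambda t: score(t, data)
    pops = [[random_leaf(rng) for _ in range(pop_size)] for _ in range(n_populations)]
    best = min((t for pop in pops for t in pop), key=key)
    for _ in range(iterations):
        for pop in pops:  # each population evolves independently
            for _ in range(pop_size):
                parent = min(rng.sample(pop, 3), key=key)  # tournament selection
                child = simplify(mutate(parent, rng))
                if size(child) <= max_size:
                    pop.pop(0)  # drop the oldest member, keep the size fixed
                    pop.append(child)
            i = min(range(len(pop)), key=lambda j: key(pop[j]))
            pop[i] = optimize(pop[i], data, rng)
        # Migration: copy the current champion into every population.
        champion = min((t for pop in pops for t in pop), key=key)
        for pop in pops:
            pop[rng.randrange(len(pop))] = champion
        best = min(best, champion, key=key)
    return best

# Synthetic data from y = 2x + 1, used only for illustration.
data = [(x / 2.0, 2.0 * (x / 2.0) + 1.0) for x in range(-4, 5)]
best = search(data)
print(best, loss(best, data))
```

The three stages are kept as separate functions so that each can be swapped independently, for example replacing the hill climber in `optimize` with a gradient-based method. The parsimony term in `score` is one simple way to trade accuracy against expression size.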