Discovering equations from data lies at the heart of physics and of many other areas of research, including mathematical ecology and epidemiology. Recently, machine learning methods known as symbolic regression have emerged as a way to automate this task. This study presents an overview of the current literature on symbolic regression and compares the efficiency of five state-of-the-art methods in recovering the governing equations of nine processes, including chaotic dynamics and epidemic models. The benchmark results identify PySR as the most suitable method for inferring equations, with some of its estimates being indistinguishable from the original analytical forms. These results highlight the potential of symbolic regression as a robust tool for inferring and modeling real-world phenomena.