AlphaEval: A Comprehensive and Efficient Evaluation Framework for Formula Alpha Mining

Formula alpha mining, which generates predictive signals from financial data, is critical for quantitative investment. Although algorithmic approaches such as genetic programming, reinforcement learning, and large language models have significantly expanded the capacity for alpha discovery, systematic evaluation remains a key challenge. Existing evaluation relies mainly on backtesting and correlation-based metrics. Backtesting is computationally intensive, inherently sequential, and sensitive to specific strategy parameters. Correlation-based metrics, though efficient, assess only predictive ability and overlook other crucial properties such as temporal stability, robustness, diversity, and interpretability. In addition, most existing alpha mining models are closed-source, which hinders reproducibility and slows progress in the field. To address these issues, we propose AlphaEval, a unified, parallelizable, and backtest-free evaluation framework for automated alpha mining models. AlphaEval assesses the overall quality of generated alphas along five complementary dimensions: predictive power, stability, robustness to market perturbations, financial logic, and diversity. Extensive experiments across representative alpha mining algorithms show that AlphaEval achieves evaluation consistency comparable to comprehensive backtesting while providing richer insights and higher efficiency. Furthermore, AlphaEval identifies better alphas than traditional single-metric screening approaches. All implementations and evaluation tools are open-sourced to promote reproducibility and community engagement.
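To make the contrast between a single correlation score and a stability-aware view concrete, the following is a minimal sketch of a correlation-based metric. It is not AlphaEval's implementation. It assumes that an alpha and the forward returns are given as arrays of shape (days, assets), and the function names `rank_rows`, `daily_rank_ic`, and `ic_summary` are hypothetical. It computes a per-day cross-sectional rank correlation (often called rank IC) and then summarizes both its mean and its mean-to-standard-deviation ratio as one simple proxy for temporal stability.

```python
import numpy as np


def rank_rows(x):
    """Rank each row (one day) across assets. Ties are not averaged in this sketch."""
    return np.argsort(np.argsort(x, axis=1), axis=1).astype(float)


def daily_rank_ic(alpha, fwd_ret):
    """Per-day Spearman correlation between alpha values and forward returns.

    Both inputs are assumed to have shape (n_days, n_assets).
    """
    ra, rr = rank_rows(alpha), rank_rows(fwd_ret)
    # Center the ranks within each day before correlating them.
    ra -= ra.mean(axis=1, keepdims=True)
    rr -= rr.mean(axis=1, keepdims=True)
    num = (ra * rr).sum(axis=1)
    den = np.sqrt((ra ** 2).sum(axis=1) * (rr ** 2).sum(axis=1))
    return num / den


def ic_summary(alpha, fwd_ret):
    """Mean rank IC (predictive power) and mean/std ratio (a stability proxy)."""
    ic = daily_rank_ic(alpha, fwd_ret)
    return {"mean_ic": ic.mean(), "ic_ir": ic.mean() / ic.std(ddof=1)}


# Synthetic data for illustration only: a weak signal buried in noise.
rng = np.random.default_rng(0)
ret = rng.normal(size=(250, 50))
alpha = 0.1 * ret + rng.normal(size=(250, 50))
summary = ic_summary(alpha, ret)
print(summary)
```

Each day's correlation is computed independently of every other day, so the calculation vectorizes and parallelizes easily, unlike a backtest that must step through time in order. Two alphas with the same mean rank IC can still differ widely in the ratio, which is the kind of information a single correlation score leaves out.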
