AlphaPROBE: Alpha Mining via Principled Retrieval and On-graph Biased Evolution

Extracting signals through alpha factor mining is a fundamental challenge in quantitative finance. Existing automated methods primarily follow two paradigms: Decoupled Factor Generation, which treats each factor discovery as an isolated event, and Iterative Factor Evolution, which focuses on local parent-child refinements. Both paradigms lack a global structural view: they treat the factor pool as an unstructured collection or as fragmented chains, which leads to redundant search and limited diversity. To address these limitations, we introduce AlphaPROBE (Alpha Mining via Principled Retrieval and On-graph Biased Evolution), a framework that reframes alpha mining as the strategic navigation of a Directed Acyclic Graph (DAG). By modeling factors as nodes and evolutionary links as edges, AlphaPROBE treats the factor pool as a dynamic, interconnected ecosystem. The framework consists of two core components: a Bayesian Factor Retriever, which identifies high-potential seeds by balancing exploitation and exploration through a posterior probability model, and a DAG-aware Factor Generator, which leverages the full ancestral trace of each factor to produce context-aware, nonredundant optimizations. Extensive experiments on three major Chinese stock market datasets against eight competitive baselines demonstrate that AlphaPROBE achieves significantly better predictive accuracy, return stability, and training efficiency. Our results confirm that leveraging global evolutionary topology is essential for efficient and robust automated alpha discovery. We have open-sourced our implementation at https://github.com/gta0804/AlphaPROBE.
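
To make the two components concrete, the following is a minimal illustrative sketch, not the AlphaPROBE implementation. It assumes a factor pool stored as a DAG in which each node records its parents, and it substitutes a simple Beta-Bernoulli Thompson sampler for the posterior probability model, whose actual form is not specified in this abstract. The names `FactorNode`, `FactorDAG`, `ancestral_trace`, and `retrieve_seed`, as well as the factor expressions and scores, are hypothetical.

```python
import random
from dataclasses import dataclass, field


@dataclass
class FactorNode:
    expr: str                 # factor expression (illustrative string only)
    score: float              # hypothetical quality metric, higher is better
    parents: list = field(default_factory=list)
    wins: int = 0             # children that scored higher than this node
    trials: int = 0           # children generated from this node


class FactorDAG:
    """Factor pool as a DAG: nodes are factors, edges are evolutionary links."""

    def __init__(self):
        self.nodes = {}

    def add(self, name, expr, score, parents=()):
        # Parents must already exist, so no cycle can ever be created.
        for p in parents:
            if p not in self.nodes:
                raise KeyError(f"unknown parent: {p}")
        self.nodes[name] = FactorNode(expr, score, list(parents))
        # Update each parent's evidence (assumed success criterion: the child
        # scores higher than the parent).
        for p in parents:
            parent = self.nodes[p]
            parent.trials += 1
            parent.wins += int(score > parent.score)

    def ancestral_trace(self, name):
        """All ancestors of `name`, oldest first, ending with `name` itself."""
        seen, order = set(), []

        def visit(n):
            if n in seen:
                return
            seen.add(n)
            for p in self.nodes[n].parents:
                visit(p)
            order.append(n)

        visit(name)
        return order

    def retrieve_seed(self, rng):
        """Thompson sampling stand-in: draw from each node's Beta posterior
        over 'a child improves on me' and return the node with the top draw."""
        def draw(node):
            return rng.betavariate(1 + node.wins, 1 + node.trials - node.wins)

        return max(self.nodes, key=lambda n: draw(self.nodes[n]))


dag = FactorDAG()
dag.add("f0", "close / open", score=0.010)
dag.add("f1", "rank(close / open)", score=0.020, parents=["f0"])
dag.add("f2", "ts_mean(rank(close / open), 5)", score=0.015, parents=["f1"])
print(dag.ancestral_trace("f2"))  # ['f0', 'f1', 'f2']
print(dag.retrieve_seed(random.Random(0)))
```

In this sketch, rarely expanded nodes keep a wide posterior and are therefore still sampled occasionally (exploration), while nodes whose children have repeatedly improved are favored (exploitation). The trace returned by `ancestral_trace` is the kind of lineage context a generator could condition on to avoid re-proposing earlier variants.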
