$\text{Alpha}^2$: Discovering Logical Formulaic Alphas using Deep Reinforcement Learning

Alphas are pivotal in providing signals for quantitative trading. The industry highly values the discovery of formulaic alphas for their interpretability and ease of analysis, compared with the expressive yet overfitting-prone black-box alphas. In this work, we focus on discovering formulaic alphas. Prior studies on automatically generating a collection of formulaic alphas were mostly based on genetic programming (GP), which is known to be sensitive to the initial population, to converge to local optima, and to be computationally slow. Recent efforts employing deep reinforcement learning (DRL) for alpha discovery have not fully addressed key practical considerations such as alpha correlations and validity, which are crucial for their effectiveness. In this work, we propose a novel framework for alpha discovery using DRL by formulating the alpha discovery process as program construction. Our agent, $\text{Alpha}^2$, assembles an alpha program optimized for an evaluation metric. A search algorithm guided by DRL navigates through the search space based on value estimates for potential alpha outcomes. The evaluation metric encourages both the performance and the diversity of alphas for a better final trading strategy. Our formulation of alpha search also brings the advantage of pre-calculation dimensional analysis, which ensures the logical soundness of alphas and prunes the vast search space to a large extent. Empirical experiments on real-world stock markets demonstrate $\text{Alpha}^2$'s capability to identify a diverse set of logical and effective alphas, which significantly improves the performance of the final trading strategy. The code for our method is available at https://github.com/x35f/alpha2.

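To illustrate how dimensional analysis can reject illogical alphas before any values are computed, the following is a minimal sketch and not the implementation from the repository. The feature names, the two-exponent unit system (currency, shares), the operator set, and the names `Node` and `infer_dim` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union

# A dimension is a pair of exponents: (currency, shares).
Dim = Tuple[int, int]

# Hypothetical feature units, chosen for illustration only.
FEATURE_DIMS = {
    "close": (1, 0),   # price, in currency
    "open": (1, 0),    # price, in currency
    "volume": (0, 1),  # traded shares
}


@dataclass(frozen=True)
class Node:
    """A binary operator applied to two sub-expressions."""
    op: str
    left: "Expr"
    right: "Expr"


# An expression is either a feature name or an operator node.
Expr = Union[str, Node]


def infer_dim(expr: Expr) -> Optional[Dim]:
    """Return the dimension of expr, or None if it is dimensionally invalid."""
    if isinstance(expr, str):
        # Unknown features are treated as invalid.
        return FEATURE_DIMS.get(expr)
    left = infer_dim(expr.left)
    right = infer_dim(expr.right)
    if left is None or right is None:
        return None
    if expr.op in ("add", "sub"):
        # Addition and subtraction require matching units.
        return left if left == right else None
    if expr.op == "mul":
        return (left[0] + right[0], left[1] + right[1])
    if expr.op == "div":
        return (left[0] - right[0], left[1] - right[1])
    # Unknown operators are treated as invalid.
    return None
```

Under these assumptions, `Node("add", "close", "volume")` is rejected because it adds a price to a share count, while `Node("div", "close", "open")` is accepted as a dimensionless ratio. A search procedure could call such a check on each candidate and skip invalid branches without evaluating them on market data.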