Hubble: An LLM-Driven Agentic Framework for Safe, Diverse, and Reproducible Alpha Factor Discovery

Automated alpha discovery is difficult because the search space of formulaic factors is combinatorial, the signal-to-noise ratio in daily equity data is low, and unconstrained program generation is operationally unsafe. We present Hubble, an agentic factor-mining framework that combines large language models (LLMs) with a domain-specific operator language, an abstract syntax tree (AST) execution sandbox, a dual-channel retrieval-augmented generation (RAG) module, and a family-aware selection mechanism. Instead of treating the LLM as an unconstrained code generator, Hubble restricts generation to interpretable operator trees, evaluates every candidate through a deterministic cross-sectional pipeline, and feeds both the top formulas and structured family-level diagnostics back into subsequent rounds. The current system additionally introduces positive/negative RAG, formula-similarity penalties, standardized multi-metric scoring, dual reporting of RankIC and Pearson IC, and persistent diagnostic artifacts for post-hoc research analysis. On a U.S. equity universe of roughly 500 stocks, our main run evaluates 104 valid candidates across three rounds with zero runtime crashes and discovers a top set dominated by the range, volatility, and trend families rather than by crowded volume-only motifs. We then fix the resulting top-5 factors and validate them on a held-out period from 2025-06-01 to 2026-03-13. In this out-of-sample window, the two range factors and the two volatility factors remain positive, and several achieve HAC-significant Pearson IC and long-short evidence, whereas the weakest in-sample trend factor decays materially. These results suggest that safe LLM-guided search can be upgraded from a syntax-compliant generator into a reproducible alpha-research workflow that jointly optimizes validity, diversity, interpretability, and family-level generalization.
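To make the idea of restricting generation to operator trees concrete, the following is a minimal sketch of a whitelist-based AST check, not Hubble's actual sandbox. The operator names (`rank`, `ts_mean`, `ts_std`, `delta`), the field names, and the helpers `validate_formula` and `is_safe` are all hypothetical and chosen only for illustration; the sketch also assumes that formulas are written in Python expression syntax, which the text above does not state.

```python
import ast

# Hypothetical whitelists; the real operator language is not specified here.
ALLOWED_OPERATORS = {"rank", "ts_mean", "ts_std", "delta"}
ALLOWED_FIELDS = {"open", "high", "low", "close", "volume"}
ALLOWED_BINOPS = (ast.Add, ast.Sub, ast.Mult, ast.Div)


def _check(node: ast.AST) -> None:
    """Recursively reject any node that is not explicitly whitelisted."""
    if isinstance(node, ast.Call):
        # Only plain-name calls to whitelisted operators, positional args only.
        if not isinstance(node.func, ast.Name) or node.func.id not in ALLOWED_OPERATORS:
            raise ValueError("disallowed call")
        if node.keywords:
            raise ValueError("keyword arguments are not allowed")
        for arg in node.args:
            _check(arg)
    elif isinstance(node, ast.BinOp):
        if not isinstance(node.op, ALLOWED_BINOPS):
            raise ValueError("disallowed binary operator")
        _check(node.left)
        _check(node.right)
    elif isinstance(node, ast.UnaryOp):
        if not isinstance(node.op, ast.USub):
            raise ValueError("disallowed unary operator")
        _check(node.operand)
    elif isinstance(node, ast.Name):
        if node.id not in ALLOWED_FIELDS:
            raise ValueError(f"unknown field: {node.id}")
    elif isinstance(node, ast.Constant):
        # Numeric literals only; bool is a subclass of int, so exclude it.
        if isinstance(node.value, bool) or not isinstance(node.value, (int, float)):
            raise ValueError("only numeric constants are allowed")
    else:
        # Attributes, subscripts, lambdas, comprehensions, etc. all land here.
        raise ValueError(f"disallowed syntax: {type(node).__name__}")


def validate_formula(formula: str) -> ast.Expression:
    """Parse a candidate formula as a single expression and check every node."""
    tree = ast.parse(formula, mode="eval")  # statements raise SyntaxError
    _check(tree.body)
    return tree


def is_safe(formula: str) -> bool:
    """Return True if the formula passes the whitelist, False otherwise."""
    try:
        validate_formula(formula)
        return True
    except (ValueError, SyntaxError):
        return False
```

Under these assumptions, a range-style candidate such as `rank(ts_std(high - low, 20)) / ts_mean(volume, 5)` passes the check, while `__import__('os').system('ls')` is rejected before anything is evaluated. The check defaults to denial: any syntax not explicitly listed is refused, so validation never executes the candidate itself.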
