Retrieval algorithms like BM25 and query likelihood with Dirichlet smoothing remain strong and efficient first-stage rankers, yet improvements have mostly relied on parameter tuning and human intuition. We investigate whether a large language model, guided by an evaluator and evolutionary search, can automatically discover improved lexical retrieval algorithms. We introduce RankEvolve, a program evolution setup based on AlphaEvolve, in which candidate ranking algorithms are represented as executable code and iteratively mutated, recombined, and selected based on retrieval performance across 12 IR datasets from BEIR and BRIGHT. RankEvolve starts from two seed programs: BM25 and query likelihood with Dirichlet smoothing. The evolved algorithms are novel, effective, and show promising transfer to the full BEIR and BRIGHT benchmarks as well as TREC DL 19 and 20. Our results suggest that evaluator-guided LLM program evolution is a practical path towards automatic discovery of novel ranking algorithms.
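To make "candidate ranking algorithms represented as executable code" concrete, the following is a minimal sketch of what a BM25-style seed program could look like. It is not the seed program used by RankEvolve: the function name `bm25_scores`, the defaults `k1=1.2` and `b=0.75`, the non-negative IDF variant, and the toy corpus are all assumptions made for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against a tokenized `query`.

    Hypothetical illustration of a BM25 scorer; parameter defaults are
    common textbook choices, not values taken from the paper.
    """
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # IDF variant with +1 inside the log, so it is never negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Toy corpus (assumed for illustration only).
docs = [
    "lexical retrieval with bm25".split(),
    "neural ranking models".split(),
    "bm25 bm25 baseline for retrieval".split(),
]
scores = bm25_scores("bm25 retrieval".split(), docs)
```

In an evolution setup of the kind described above, a function like this would be the unit that gets mutated: the IDF formula, the saturation term, or the length normalization could each be rewritten by the language model and the result re-scored by the evaluator.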