When should I search more: Adaptive Complex Query Optimization with Reinforcement Learning

Query optimization is crucial to the efficacy of Retrieval-Augmented Generation (RAG) systems. While reinforcement learning (RL)-based agentic and reasoning methods have recently emerged as a promising direction for query optimization, most existing approaches focus on the expansion and abstraction of a single query. However, complex user queries are prevalent in real-world scenarios and often require multiple parallel and sequential search strategies to handle disambiguation and decomposition. Directly applying RL to these complex cases introduces significant hurdles: determining the optimal number of sub-queries and effectively re-ranking and merging the retrieved documents vastly expands the search space and complicates reward design, frequently leading to training instability. To address these challenges, we propose a novel RL framework called Adaptive Complex Query Optimization (ACQO), which adaptively determines when and how to expand the search process. It features two core components: an Adaptive Query Reformulation (AQR) module that dynamically decides when to decompose a query into multiple sub-queries, and a Rank-Score Fusion (RSF) module that ensures robust result aggregation and provides stable reward signals for the learning agent. To mitigate training instability, we adopt a Curriculum Reinforcement Learning (CRL) approach that progressively introduces more challenging queries through a two-stage strategy. Our comprehensive experiments demonstrate that ACQO achieves state-of-the-art performance on three complex query benchmarks, significantly outperforming established baselines. The framework also shows improved computational efficiency and broad compatibility with different retrieval architectures, establishing it as a powerful and generalizable solution for next-generation RAG systems.
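To make the aggregation step concrete, the sketch below shows one common way to merge the ranked lists retrieved for several sub-queries. It is not the RSF module itself, whose details the abstract does not give; it uses standard reciprocal rank fusion as a stand-in. The function name, the document IDs, and the constant k=60 are assumptions chosen for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Illustrative stand-in only: plain reciprocal rank fusion, not the
    paper's RSF module. `k` is a smoothing constant (60 is a common default).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        # Each list contributes 1 / (k + rank) for every document it contains.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results for two sub-queries derived from one complex query.
sub_query_results = [
    ["d1", "d2", "d3"],
    ["d2", "d4", "d1"],
]
fused = reciprocal_rank_fusion(sub_query_results)
print(fused)
```

Documents that appear near the top of several sub-query lists rise in the fused ranking, which is the kind of behavior an aggregation module needs when the number of sub-queries varies from one query to the next.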
