LLM-based deep research agents are largely built on the ReAct framework, whose linear, single-trajectory design makes it difficult to revisit earlier states, branch into alternative search directions, or maintain global awareness over long contexts, often leading to local optima, redundant exploration, and inefficient search. We propose Re-TRAC, an agentic framework that performs cross-trajectory exploration by generating a structured state representation after each trajectory to summarize evidence, uncertainties, failures, and future plans, and by conditioning subsequent trajectories on this state representation. This enables iterative reflection and globally informed planning, reframing research as a progressive process. Empirical results show that Re-TRAC consistently outperforms ReAct by 15–20% on BrowseComp with frontier LLMs. For smaller models, we introduce Re-TRAC-aware supervised fine-tuning, achieving state-of-the-art performance at comparable scales. Notably, Re-TRAC shows a monotonic reduction in tool calls and token usage across rounds, indicating progressively targeted exploration driven by cross-trajectory reflection rather than redundant search.
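The cross-trajectory loop described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the state fields mirror the four slots named in the abstract (evidence, uncertainties, failures, future plans), while `run_trajectory` and `summarize` are hypothetical stand-ins for LLM-driven steps (a real system would prompt the model with the serialized state and parse its output).

```python
from dataclasses import dataclass, field

@dataclass
class ResearchState:
    """Structured state carried across trajectories (fields follow the abstract)."""
    evidence: list = field(default_factory=list)
    uncertainties: list = field(default_factory=list)
    failures: list = field(default_factory=list)
    plan: str = ""

def run_trajectory(question: str, state: ResearchState) -> dict:
    # Stand-in for one ReAct rollout conditioned on the prior state.
    # A real implementation would serialize `state` into the prompt and
    # let the agent search; here we just return a dummy result.
    round_no = len(state.evidence) + 1
    return {"findings": [f"finding@round{round_no}"], "open": [], "errors": []}

def summarize(trajectory: dict, state: ResearchState) -> ResearchState:
    # Fold the finished trajectory into the structured state representation.
    state.evidence.extend(trajectory["findings"])
    state.uncertainties = trajectory["open"]
    state.failures.extend(trajectory["errors"])
    state.plan = ("resolve remaining uncertainties"
                  if state.uncertainties else "finalize answer")
    return state

def re_trac(question: str, rounds: int = 3) -> ResearchState:
    state = ResearchState()
    for _ in range(rounds):
        traj = run_trajectory(question, state)  # explore, conditioned on global state
        state = summarize(traj, state)          # reflect across trajectories
        if not state.uncertainties:             # stop once nothing remains open
            break
    return state
```

The key design point is that each round reads and rewrites a compact global state rather than the full prior context, which is what allows later rounds to plan globally while using fewer tokens.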