Repository-level software engineering tasks require large language models (LLMs) to efficiently navigate and extract information from complex codebases through multi-turn tool interactions. Existing approaches face significant limitations: training-free, in-context learning methods struggle to guide agents in using tools effectively and making decisions from environmental feedback, while training-based approaches typically rely on costly distillation from larger LLMs, introducing data compliance concerns in enterprise environments. To address these challenges, we introduce RepoSearch-R1, a novel agentic reinforcement learning framework driven by Monte Carlo Tree Search (MCTS). This approach enables agents to generate diverse, high-quality reasoning trajectories via self-training, without requiring model distillation or external supervision. Based on RepoSearch-R1, we construct RepoQA-Agent, an agent specifically designed for repository question-answering tasks. Comprehensive evaluation demonstrates that RepoSearch-R1 achieves substantial improvements in answer completeness: a 16.0% improvement over no-retrieval methods and a 19.5% improvement over iterative retrieval methods, along with a 33% increase in training efficiency compared to general agentic reinforcement learning approaches. Our cold-start training methodology eliminates data compliance concerns while maintaining robust exploration diversity and answer completeness across repository-level reasoning tasks.
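To make the MCTS-driven trajectory generation concrete, the following is a minimal, self-contained sketch of how Monte Carlo Tree Search can sample multi-turn tool-use trajectories for self-training. The action names, toy reward, and depth limit are illustrative assumptions standing in for the repository environment and answer-completeness scoring; this is not the paper's actual implementation.

```python
import math
import random
from dataclasses import dataclass, field

# Hypothetical tool actions standing in for repository-search tools;
# a trajectory ends once the agent emits "answer" or hits the depth cap.
ACTIONS = ["search_code", "read_file", "list_dir", "answer"]
MAX_DEPTH = 4

@dataclass
class Node:
    state: tuple                                   # actions taken so far
    parent: "Node" = None
    children: dict = field(default_factory=dict)   # action -> child Node
    visits: int = 0
    value: float = 0.0

def is_terminal(state):
    return len(state) >= MAX_DEPTH or (state and state[-1] == "answer")

def reward(state):
    # Toy stand-in for answer-completeness scoring: reward trajectories
    # that gather context before answering.
    gathered = sum(1 for a in state if a != "answer")
    return gathered / MAX_DEPTH if state and state[-1] == "answer" else 0.0

def uct(parent, child, c=1.4):
    # Upper Confidence bound for Trees: exploit mean value, explore rarely
    # visited children.
    if child.visits == 0:
        return float("inf")
    return child.value / child.visits + c * math.sqrt(
        math.log(parent.visits) / child.visits)

def select(node):
    # Descend via UCT until reaching a terminal or not-fully-expanded node.
    while not is_terminal(node.state) and len(node.children) == len(ACTIONS):
        node = max(node.children.values(), key=lambda ch: uct(node, ch))
    return node

def expand(node):
    untried = [a for a in ACTIONS if a not in node.children]
    action = random.choice(untried)
    child = Node(state=node.state + (action,), parent=node)
    node.children[action] = child
    return child

def rollout(state):
    # Random playout to a terminal state; in RepoSearch-R1 a policy LLM
    # would choose actions here instead.
    while not is_terminal(state):
        state = state + (random.choice(ACTIONS),)
    return reward(state)

def backpropagate(node, value):
    while node is not None:
        node.visits += 1
        node.value += value
        node = node.parent

def mcts(n_iterations=500):
    root = Node(state=())
    for _ in range(n_iterations):
        leaf = select(root)
        if not is_terminal(leaf.state):
            leaf = expand(leaf)
        backpropagate(leaf, rollout(leaf.state))
    return root

if __name__ == "__main__":
    root = mcts()
    # The most-visited path is one candidate high-quality trajectory that
    # could serve as a self-training example.
    node, path = root, []
    while node.children:
        action, node = max(node.children.items(), key=lambda kv: kv[1].visits)
        path.append(action)
    print("best trajectory:", path)
```

The key design point this illustrates is that the search tree, not an external teacher model, supplies the diversity and quality filtering over trajectories, which is what allows self-training without distillation.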