Search agents, which integrate language models (LMs) with web search, are becoming crucial for answering complex user queries. Constructing training datasets for deep research tasks, which involve multi-step retrieval and reasoning, remains challenging because existing approaches depend on expensive human annotation or cumbersome prerequisites. In this work, we introduce ORBIT, a training dataset of 20K reasoning-intensive queries with short, verifiable answers, generated with a frugal framework that does not rely on paid API services. The modular framework has four stages: seed creation, question-answer pair generation, self-verification, and external verification. ORBIT spans 15 domains; each training pair requires 4-5 reasoning steps, and its external verification requires search over the complete web. Using Qwen3-4B as the base model, we train on ORBIT with GRPO and evaluate the resulting model, ORBIT-4B, on Wikipedia question answering tasks. Extensive experimental results show that ORBIT-4B achieves strong performance among sub-4B LLMs used as search agents, demonstrating the utility of synthetic datasets. Our framework, code, and datasets are open-sourced and publicly available.
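To illustrate why short, verifiable answers suit GRPO training, the following is a minimal sketch of an exact-match reward combined with GRPO's group-relative advantage, in which each rollout's reward is normalized by the mean and standard deviation of its sampling group. The abstract does not specify the reward function or normalization details, so `normalize_answer`, `exact_match_reward`, `group_relative_advantages`, the use of the population standard deviation, and the toy rollouts are all assumptions made for illustration, not the authors' implementation.

```python
import re
import statistics
import string


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match_reward(prediction: str, gold: str) -> float:
    """Hypothetical reward: 1.0 if the normalized strings match, else 0.0."""
    return 1.0 if normalize_answer(prediction) == normalize_answer(gold) else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: (r_i - mean(r)) / (std(r) + eps) within one group.

    Population standard deviation is an assumption; implementations differ.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Toy group of four rollouts for a single query (made-up data).
rollouts = ["The Eiffel Tower", "eiffel tower.", "Louvre", "Big Ben"]
rewards = [exact_match_reward(p, "Eiffel Tower") for p in rollouts]
advantages = group_relative_advantages(rewards)
```

In this toy group, the two matching rollouts receive positive advantages and the two mismatches receive negative ones. A group in which every rollout earns the same reward yields all-zero advantages and therefore contributes no learning signal for that query.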