Search agents, which integrate language models (LMs) with web search, are becoming crucial for answering complex user queries. Constructing training datasets for deep research tasks, which involve multi-step retrieval and reasoning, remains challenging because existing approaches depend on expensive human annotation or cumbersome prerequisites. In this work, we introduce ORBIT, a training dataset of 20K reasoning-intensive queries with short, verifiable answers, generated using a frugal framework that does not rely on paid API services. The modular framework has four stages: seed creation, question--answer pair generation, self-verification, and external verification. ORBIT spans 15 domains, and each training pair requires 4--5 reasoning steps, with external verification performed by searching the complete web. We use Qwen3-4B as the base model, train it on ORBIT with GRPO, and evaluate the resulting model on Wikipedia question answering tasks. Extensive experimental results show that ORBIT-4B achieves strong performance among sub-4B LLMs as search agents, demonstrating the utility of synthetic datasets. Our framework, code, and datasets are open-sourced and publicly available.
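The four-stage structure named in the abstract can be illustrated with a minimal sketch. This is not the ORBIT implementation: the names `QAPair` and `build_dataset`, and the toy stand-ins for each stage, are hypothetical and chosen only to show how generation and the two verification stages might compose as successive filters over seeds.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class QAPair:
    question: str
    answer: str  # short, verifiable answer


def build_dataset(
    seeds: Iterable[str],
    generate: Callable[[str], Optional[QAPair]],
    self_verify: Callable[[QAPair], bool],
    external_verify: Callable[[QAPair], bool],
) -> List[QAPair]:
    """Run seeds through generation, then self and external verification.

    Each stage is passed in as a callable (an assumption of this sketch),
    so any stage can be swapped without touching the others.
    """
    kept: List[QAPair] = []
    for seed in seeds:
        pair = generate(seed)          # stage 2: question-answer generation
        if pair is None:
            continue                   # generation produced nothing usable
        if not self_verify(pair):      # stage 3: self-verification
            continue
        if not external_verify(pair):  # stage 4: external verification
            continue
        kept.append(pair)
    return kept


# Toy stand-ins (hypothetical): real stages would call an LM and web search.
known_answers = {"Alan Turing"}  # stands in for evidence found by search


def toy_generate(seed: str) -> Optional[QAPair]:
    if not seed:
        return None
    return QAPair(question=f"Which person matches the clues for: {seed}?", answer=seed)


dataset = build_dataset(
    seeds=["Marie Curie", "", "Alan Turing"],  # stage 1: seed creation
    generate=toy_generate,
    self_verify=lambda p: len(p.answer.split()) <= 3,       # answer is short
    external_verify=lambda p: p.answer in known_answers,    # answer is supported
)
print([p.answer for p in dataset])
```

Passing the stages as callables mirrors the modular framing in the abstract; how each stage is actually realized in ORBIT is not specified here.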