Expert-level scientific reasoning remains challenging for large language models, particularly on benchmarks such as Humanity's Last Exam (HLE), where rigid tool pipelines, brittle multi-agent coordination, and inefficient test-time scaling often limit performance. We introduce ReThinker, a confidence-aware agentic framework that orchestrates retrieval, tool use, and multi-agent reasoning through a stage-wise Solver-Critic-Selector architecture. Rather than following a fixed pipeline, ReThinker dynamically allocates computation based on model confidence, enabling adaptive tool invocation, guided multi-dimensional reflection, and robust confidence-weighted selection. To support scalable training without human annotation, we further propose a reverse data synthesis pipeline and an adaptive trajectory recycling strategy that transform successful reasoning traces into high-quality supervision. Experiments on HLE, GAIA, and XBench demonstrate that ReThinker consistently outperforms state-of-the-art foundation models with tools and existing deep research systems, achieving state-of-the-art results on expert-level reasoning tasks.