Human cognition operates through two complementary modes: fast intuitive thinking and slow deliberate thinking. Vanilla large language models (LLMs) predominantly follow the fast-thinking paradigm, producing immediate responses, while recent large reasoning models (LRMs) adopt slow-thinking strategies, generating detailed reasoning chains before arriving at answers. Although LRMs often achieve higher accuracy, they do so at the cost of substantially increased token usage. To address this efficiency-accuracy trade-off, we propose OThink-R1, a hybrid reasoning framework that integrates both modes within a single LRM and enables automatic mode switching based on problem characteristics. We first identify three major patterns of essential and redundant reasoning trajectories in LRMs, which guide the design of an auxiliary LLM-based judge that adaptively determines when slow thinking is necessary. Leveraging the judge's decisions, we construct a hybrid fine-tuning dataset by pruning redundant reasoning to produce fast-thinking samples and retaining complete reasoning for slow-thinking samples. This dataset is then used to fine-tune LRMs, equipping them with inherent autonomous mode-selection capabilities. Extensive experiments on mathematical and question-answering benchmarks show that OThink-R1 significantly reduces reasoning token usage while maintaining competitive accuracy. The code is available at https://github.com/AgenticIR-Lab/OThink-R1.
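To make the dataset-construction step concrete, below is a minimal Python sketch, not the authors' implementation. It assumes R1-style outputs that wrap the reasoning chain in `<think>...</think>` tags, stubs the auxiliary LLM-based judge as a plain callable, and uses hypothetical field names (`question`, `response`, `prompt`, `completion`); the exact formatting of pruned fast-thinking samples in OThink-R1 may differ.

```python
import re
from typing import Callable

# Assumption: reasoning segments are delimited by <think>...</think>,
# as in R1-style LRM outputs.
THINK_RE = re.compile(r"<think>.*?</think>\s*", flags=re.DOTALL)

def build_hybrid_dataset(
    records: list[dict],                          # each: {"question": str, "response": str}
    needs_slow_thinking: Callable[[dict], bool],  # stand-in for the LLM-based judge
) -> list[dict]:
    """Construct a hybrid fine-tuning set: retain the full reasoning
    trajectory for slow-thinking samples, prune it for fast-thinking ones."""
    dataset = []
    for rec in records:
        if needs_slow_thinking(rec):
            target = rec["response"]  # slow-thinking: keep complete reasoning
        else:
            # fast-thinking: strip the reasoning chain, keep only the answer
            target = THINK_RE.sub("", rec["response"]).strip()
        dataset.append({"prompt": rec["question"], "completion": target})
    return dataset

# Toy usage with a trivial length heuristic standing in for the judge
# (in the paper, an auxiliary LLM makes this decision):
if __name__ == "__main__":
    records = [{"question": "2+2?", "response": "<think>Add 2 and 2.</think>4"}]
    print(build_hybrid_dataset(records, lambda r: len(r["response"]) > 200))
```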