Large reasoning models typically follow a read-then-think paradigm: they observe the complete input, reason over a static context, and then produce the answer. Yet many real-world scenarios, such as audio and video streams, are inherently dynamic: information arrives continuously, and models must reason, update, and respond under partial observations. Recent streaming reasoning methods allow models to think while reading, but they largely rely on supervised imitation of pre-constructed trajectories, which limits their flexibility. In this paper, we propose AdaSR, an adaptive streaming reasoning framework that enables models to reason while the input is streaming and to perform final deliberation once the stream is complete, learning when to think and how much computation to allocate across the different stages. To optimize this hierarchical reasoning process, we introduce Hierarchical Relative Policy Optimization (HRPO), which decomposes policy optimization into a streaming reasoning phase and a deep reasoning phase, assigning advantages at a finer granularity rather than uniformly distributing a single sequence-level advantage over all tokens. HRPO integrates format, accuracy, and adaptive thinking rewards to enforce valid reasoning protocols, preserve final task performance, and encourage latency-aware computation allocation. Experiments show that AdaSR achieves a better balance among reasoning accuracy, computational efficiency, and streaming latency than a supervised fine-tuning baseline. We release our code at https://github.com/EIT-NLP/StreamingLLM/tree/main/AdaSR.
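As an illustration only, the contrast between a single sequence-level advantage and phase-wise advantage assignment can be sketched as follows. This is not the HRPO objective from the paper: the abstract does not specify how rewards are computed per phase, so the sketch assumes each rollout receives one scalar reward for its streaming phase and one for its deep reasoning phase, normalizes each within the sampled group (in the style of group-relative methods), and gives every token the advantage of the phase it belongs to. The names `group_relative`, `phase_wise_advantages`, and the `"stream"`/`"deep"` labels are hypothetical.

```python
from statistics import mean, pstdev


def group_relative(rewards, eps=1e-6):
    """Center and scale rewards within one sampled group of rollouts."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def phase_wise_advantages(phase_masks, stream_rewards, deep_rewards):
    """Give each token the group-relative advantage of its own phase.

    phase_masks[i][t] is "stream" or "deep" for token t of rollout i
    (hypothetical labels; the per-phase reward split is an assumption).
    """
    adv_stream = group_relative(stream_rewards)
    adv_deep = group_relative(deep_rewards)
    return [
        [adv_stream[i] if phase == "stream" else adv_deep[i] for phase in mask]
        for i, mask in enumerate(phase_masks)
    ]


# Two rollouts of the same prompt: the first does well while streaming but
# poorly in final deliberation, the second the reverse.
masks = [["stream", "stream", "deep"], ["stream", "deep", "deep"]]
advantages = phase_wise_advantages(masks, [1.0, 0.0], [0.0, 1.0])
```

With a single sequence-level advantage, every token in a rollout would share one value; here the streaming tokens and the deep reasoning tokens of the same rollout can receive advantages of opposite sign.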