Recursive Think-Answer Process for LLMs and VLMs

Think-Answer reasoners such as DeepSeek-R1 have made notable progress by leveraging interpretable internal reasoning. However, despite the frequent presence of self-reflective cues such as "Oops!", they remain vulnerable to output errors during single-pass inference. To address this limitation, we propose an efficient Recursive Think-Answer Process (R-TAP) that enables models to engage in iterative reasoning cycles and generate more accurate answers, going beyond conventional single-pass approaches. Central to this approach is a confidence generator that evaluates the certainty of model responses and guides subsequent improvements. By incorporating two complementary rewards, the Recursively Confidence Increase Reward and the Final Answer Confidence Reward, we show that R-TAP-enhanced models consistently outperform conventional single-pass methods for both large language models (LLMs) and vision-language models (VLMs). Moreover, by analyzing the frequency of "Oops"-like expressions in model responses, we find that R-TAP-applied models exhibit significantly fewer self-reflective patterns, resulting in more stable and faster inference-time reasoning. We hope R-TAP paves the way for efficient and elaborate methods that refine the reasoning processes of future AI.
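The recursive loop and the two rewards described above can be sketched as follows. This is a minimal, hypothetical sketch, not the authors' implementation. The abstract does not define the model interface, the confidence generator, or the reward formulas, so `think_answer`, `confidence`, `max_rounds`, `threshold`, and both reward functions (`confidence_increase_reward`, `final_answer_confidence_reward`) are illustrative assumptions. The list of "Oops"-like cues in `count_self_reflection` is also assumed.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Hypothetical interfaces (not specified in the abstract):
# think_answer(question, previous_answer) -> (thoughts, answer)
ThinkAnswerFn = Callable[[str, Optional[str]], Tuple[str, str]]
# confidence(question, answer) -> certainty score in [0, 1]
ConfidenceFn = Callable[[str, str], float]


@dataclass
class Trace:
    answers: List[str] = field(default_factory=list)
    confidences: List[float] = field(default_factory=list)


def recursive_think_answer(question: str, think_answer: ThinkAnswerFn,
                           confidence: ConfidenceFn, max_rounds: int = 3,
                           threshold: float = 0.9) -> Trace:
    """Repeat think-answer cycles, feeding back the previous answer, until the
    (assumed) confidence threshold is reached or the round budget runs out."""
    trace = Trace()
    previous: Optional[str] = None
    for _ in range(max_rounds):
        _thoughts, answer = think_answer(question, previous)
        score = confidence(question, answer)
        trace.answers.append(answer)
        trace.confidences.append(score)
        if score >= threshold:
            break  # confident enough: stop early instead of recursing again
        previous = answer  # the next cycle sees this answer and may revise it
    return trace


def confidence_increase_reward(confidences: List[float]) -> float:
    """Assumed form: fraction of consecutive rounds in which confidence rose."""
    if len(confidences) < 2:
        return 0.0
    rises = sum(1 for a, b in zip(confidences, confidences[1:]) if b > a)
    return rises / (len(confidences) - 1)


def final_answer_confidence_reward(confidences: List[float], correct: bool) -> float:
    """Assumed form: reward a confident correct final answer and
    penalize a confident wrong one."""
    final = confidences[-1]
    return final if correct else -final


# Assumed set of "Oops"-like self-reflective cues.
OOPS_PATTERN = re.compile(r"\b(oops|wait|hmm)\b", re.IGNORECASE)


def count_self_reflection(text: str) -> int:
    """Count occurrences of the assumed self-reflective cues in a response."""
    return len(OOPS_PATTERN.findall(text))


# Usage with toy stand-ins: the "model" grows more confident each round.
scores = iter([0.4, 0.7, 0.95])
trace = recursive_think_answer(
    "2 + 2 = ?",
    think_answer=lambda q, prev: ("thinking...", "4"),
    confidence=lambda q, a: next(scores),
)
print(trace.confidences)                                   # [0.4, 0.7, 0.95]
print(confidence_increase_reward(trace.confidences))       # 1.0
print(count_self_reflection("Oops! Wait, let me recheck.")) # 2
```

Feeding the previous answer back into the next think-answer cycle is what separates this loop from single-pass inference. Stopping once confidence clears a threshold is one assumed way to keep the extra cycles cheap.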