Large language models (LLMs) have shown impressive capabilities but still struggle with complex reasoning tasks that require multiple steps. While prompt-based methods such as Chain-of-Thought (CoT) can improve LLM reasoning at inference time, optimizing reasoning capabilities during training remains challenging. We introduce LaTent Reasoning Optimization (LaTRO), a principled framework that formulates reasoning as sampling from a latent distribution and optimizes it via variational approaches. LaTRO enables LLMs to concurrently improve both their reasoning process and their ability to evaluate reasoning quality, without requiring external feedback or reward models. We validate LaTRO through experiments on the GSM8K and ARC-Challenge datasets using multiple model architectures. On GSM8K, LaTRO improves zero-shot accuracy by an average of 12.5% over base models and 9.6% over supervised fine-tuning across Phi-3.5-mini, Mistral-7B, and Llama-3.1-8B. Our findings suggest that pre-trained LLMs possess latent reasoning capabilities that can be unlocked and enhanced through our proposed optimization approach in a self-improving manner. The code of LaTRO is available at \url{https://github.com/SalesforceAIResearch/LaTRO}.
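The variational formulation mentioned above can be illustrated with a standard evidence lower bound (ELBO). The decomposition below is a generic sketch of treating rationales as latent variables, not the paper's exact objective; the symbols $x$ (question), $y$ (answer), $z$ (rationale), $q$ (variational rationale distribution), and $p_\theta$ (the LLM) are our illustrative notation:
\begin{equation*}
\log p_\theta(y \mid x) \;\ge\; \mathbb{E}_{z \sim q(z \mid x)}\!\left[\log p_\theta(y \mid x, z)\right] \;-\; D_{\mathrm{KL}}\!\left(q(z \mid x)\,\Vert\,p_\theta(z \mid x)\right).
\end{equation*}
Under this view, maximizing the lower bound jointly improves the rationale-generating distribution and the model's likelihood of the correct answer given a rationale, which is consistent with the abstract's claim that the same LLM both produces reasoning and evaluates its quality without an external reward model.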