Chain-of-Thought Matters: Improving Long-Context Language Models with Reasoning Path Supervision

Recent advances in Large Language Models (LLMs) have highlighted the challenge of handling long-context tasks, where models need to reason over extensive input contexts to aggregate target information. While Chain-of-Thought (CoT) prompting has shown promise for multi-step reasoning, its effectiveness for long-context scenarios remains underexplored. Through systematic investigation across diverse tasks, we demonstrate that CoT's benefits generalize across most long-context scenarios and amplify with increasing context length. Motivated by this critical observation, we propose LongRePS, a process-supervised framework that teaches models to generate high-quality reasoning paths for enhanced long-context performance. Our framework incorporates a self-sampling mechanism to bootstrap reasoning paths and a novel quality assessment protocol specifically designed for long-context scenarios. Experimental results on various long-context benchmarks demonstrate the effectiveness of our approach, achieving significant improvements over outcome supervision baselines on both in-domain tasks (+13.6/+3.8 points for LLaMA/Qwen on MuSiQue) and cross-domain generalization (+9.3/+8.1 points on average across diverse QA tasks). Our code, data and trained models are made public to facilitate future research.
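The abstract describes the framework only at a high level: sample reasoning paths from the model itself, assess their quality, and keep the good ones as supervision. The bootstrap-and-filter loop might be sketched roughly as follows. Everything concrete here is an assumption made for illustration and is not taken from the paper: the `generate` callable, the `Answer:` marker, the exact-match answer check, and the `is_grounded` heuristic, which requires every quoted span in a path to occur in the context.

```python
import re
from typing import Callable, List, Optional

# Hypothetical convention: each sampled path ends with a line "Answer: <text>".
ANSWER_RE = re.compile(r"Answer:\s*(.+)", re.IGNORECASE)


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so comparisons are less brittle."""
    return " ".join(text.lower().split())


def extract_answer(path: str) -> Optional[str]:
    """Return the text after the first 'Answer:' marker, or None if absent."""
    match = ANSWER_RE.search(path)
    return match.group(1).strip() if match else None


def is_grounded(path: str, context: str) -> bool:
    """Assumed quality check: every double-quoted span must occur in the context."""
    quotes = re.findall(r'"([^"]+)"', path)
    norm_context = normalize(context)
    return all(normalize(q) in norm_context for q in quotes)


def bootstrap_paths(
    generate: Callable[[str, str], str],  # hypothetical sampler: (context, question) -> path
    context: str,
    question: str,
    gold_answer: str,
    n_samples: int = 4,
) -> List[str]:
    """Sample n_samples reasoning paths and keep those passing both checks."""
    kept = []
    for _ in range(n_samples):
        path = generate(context, question)
        answer = extract_answer(path)
        if answer is None:
            continue  # no parsable final answer
        if normalize(answer) != normalize(gold_answer):
            continue  # outcome check failed
        if not is_grounded(path, context):
            continue  # cites text that is not in the context
        kept.append(path)
    return kept
```

In this sketch, the surviving paths would serve as fine-tuning targets in place of the bare answer, which is the contrast with outcome supervision that the abstract draws. A real quality assessment for long contexts would likely need more than string matching.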

