Central to many self-improvement pipelines for large language models (LLMs) is the assumption that models can improve by reflecting on past mistakes. We study a phenomenon termed contextual drag: the presence of failed attempts in the context biases subsequent generations toward structurally similar errors. Across evaluations of 11 proprietary and open-weight models on 8 reasoning tasks, contextual drag induces 10-20% performance drops, and iterative self-refinement in models with severe contextual drag can collapse into self-deterioration. Structural analysis using tree edit distance reveals that subsequent reasoning trajectories inherit structurally similar error patterns from the context. We demonstrate that neither external feedback nor successful self-verification suffices to eliminate this effect. While mitigation strategies such as fallback-behavior fine-tuning and context denoising yield partial improvements, they fail to fully restore baseline performance, positioning contextual drag as a persistent failure mode in current reasoning architectures.