Multi-turn interactions with large language models typically retain the assistant's own past responses in the conversation history. In this work, we revisit this design choice by asking whether large language models benefit from conditioning on their own prior responses. Using in-the-wild, multi-turn conversations, we compare standard (full-context) prompting with a user-turn-only prompting approach that omits all previous assistant responses, across three open reasoning models and one state-of-the-art model. To our surprise, we find that removing prior assistant responses does not affect response quality on a large fraction of turns. Omitting assistant-side history can reduce cumulative context lengths by up to 10x. To explain this result, we find that multi-turn conversations consist of a substantial proportion (36.4%) of self-contained prompts, and that many follow-up prompts provide sufficient instruction to be answered using only the current user turn and prior user turns. When analyzing cases where user-turn-only prompting substantially outperforms full context, we identify instances of context pollution, in which models over-condition on their previous responses, introducing errors, hallucinations, or stylistic artifacts that propagate across turns. Motivated by these findings, we design a context-filtering approach that selectively omits assistant-side context. Our findings suggest that selectively omitting assistant history can improve response quality while reducing memory consumption.
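The user-turn-only prompting compared above can be sketched as a simple filter over the conversation history. A minimal sketch, assuming an OpenAI-style chat message format; the helper name and message schema are illustrative, not taken from the paper:

```python
# User-turn-only prompting: drop all prior assistant responses from
# the history, keeping the system prompt and every user turn.
# The {"role": ..., "content": ...} message schema is an assumption.

def user_turn_only(messages):
    """Return the prompt with every assistant turn removed."""
    return [m for m in messages if m["role"] != "assistant"]

history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the report."},
    {"role": "assistant", "content": "Here is a summary..."},
    {"role": "user", "content": "Now translate it to French."},
]

filtered = user_turn_only(history)
# Only the system turn and the two user turns remain; the assistant's
# earlier response is omitted, shrinking the cumulative context.
```

The context-filtering approach described at the end of the abstract would replace this blanket filter with a selective one that drops assistant turns only when they are judged unnecessary for the current request.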