Multi-turn conversation has become a predominant interaction paradigm for Large Language Models (LLMs). Users often issue follow-up questions to refine their intent, expecting LLMs to adapt dynamically. However, recent research reveals that LLMs suffer a substantial performance drop in multi-turn settings compared to single-turn interactions with fully specified instructions, a phenomenon termed ``Lost in Conversation'' (LiC). While prior work attributes LiC to model unreliability, we argue that its root cause is an intent alignment gap rather than an intrinsic capability deficit: LiC reflects a breakdown in the interaction between users and LLMs, not a failure of model capability. We show theoretically that scaling model size or improving training alone cannot close this gap, because it arises from structural ambiguity in the conversational context rather than from representational limitations. To address this, we propose decoupling intent understanding from task execution through a Mediator-Assistant architecture. An experience-driven Mediator rewrites user inputs into explicit, well-structured instructions based on historical interaction patterns, bridging the gap between vague user intent and the model's interpretation. Experimental results demonstrate that this approach significantly mitigates performance degradation in multi-turn conversations across diverse LLMs.
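To make the proposed decoupling concrete, the sketch below shows one way such a pipeline could be wired up. It is a minimal sketch under our own assumptions, not the paper's implementation: the names \texttt{Mediator}, \texttt{Assistant}, \texttt{explicate}, \texttt{converse}, and \texttt{MEDIATOR\_PROMPT}, and the \texttt{LLMFn} stub standing in for a chat-completion backend, are all hypothetical.

\begin{verbatim}
# Hypothetical sketch of a Mediator-Assistant pipeline; all names are
# illustrative, not the paper's actual implementation.
from dataclasses import dataclass, field
from typing import Callable

# Stand-in for any chat-completion backend; swap in a real client.
LLMFn = Callable[[str], str]

MEDIATOR_PROMPT = (
    "Rewrite the user's latest message into one explicit, self-contained "
    "instruction. Resolve references using the conversation so far and "
    "the recurring patterns listed under EXPERIENCE.\n\n"
    "EXPERIENCE:\n{experience}\n\n"
    "CONVERSATION:\n{history}\n\n"
    "LATEST USER MESSAGE:\n{message}\n\n"
    "EXPLICIT INSTRUCTION:"
)

@dataclass
class Mediator:
    """Turns underspecified follow-up turns into explicit instructions."""
    llm: LLMFn
    experience: list[str] = field(default_factory=list)  # mined patterns

    def explicate(self, history: list[str], message: str) -> str:
        prompt = MEDIATOR_PROMPT.format(
            experience="\n".join(self.experience) or "(none yet)",
            history="\n".join(history) or "(first turn)",
            message=message,
        )
        return self.llm(prompt)

@dataclass
class Assistant:
    """Executes one explicit instruction; never sees the raw vague turn."""
    llm: LLMFn

    def run(self, instruction: str) -> str:
        return self.llm(instruction)

def converse(mediator: Mediator, assistant: Assistant,
             history: list[str], user_message: str) -> str:
    # Decoupling: intent understanding (Mediator) happens before, and
    # separately from, task execution (Assistant).
    instruction = mediator.explicate(history, user_message)
    reply = assistant.run(instruction)
    history += [f"User: {user_message}", f"Assistant: {reply}"]
    return reply

if __name__ == "__main__":
    # Echo-style stub so the sketch runs without any model access.
    stub: LLMFn = lambda p: f"[model output for a {len(p)}-char prompt]"
    mediator = Mediator(stub, experience=["user prefers concise answers"])
    assistant, history = Assistant(stub), []
    print(converse(mediator, assistant, history, "Make it shorter."))
\end{verbatim}

Under this wiring, the Assistant only ever receives a single fully specified instruction, so each turn resembles the single-turn setting in which the LiC degradation does not appear.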