Modern conversational agents condition on an ever-growing dialogue history at each turn, incurring redundant attention and encoding costs that grow with conversation length. Naive truncation or summarization degrades fidelity, while existing context compressors lack cross-turn memory sharing and revision, causing information loss and compounding errors in long dialogues. We revisit context compression under conversational dynamics and empirically demonstrate its fragility. To improve both efficiency and robustness, we introduce Context-Driven Incremental Compression (C-DIC), which treats a conversation as interleaved contextual threads and stores revisable per-thread compression states in a single, compact dialogue memory. At each turn, a lightweight retrieve, revise, and write-back loop shares information across turns and updates stale memories, stabilizing long-horizon behavior. In addition, we adapt truncated backpropagation-through-time (TBPTT) to our multi-turn setting, learning cross-turn dependencies without full-history backpropagation. Extensive experiments on long-form dialogue benchmarks demonstrate the superior performance and efficiency of C-DIC; notably, C-DIC maintains stable inference latency and perplexity over hundreds of dialogue turns, supporting a scalable path to high-quality dialogue modeling.
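The per-turn retrieve, revise, and write-back loop over per-thread states can be illustrated with a toy sketch. This is not the C-DIC implementation: the abstract does not specify the encoder, the retrieval rule, or the revision operator, so everything here is an assumption made for illustration. Bag-of-words counts stand in for learned compression states, cosine similarity stands in for retrieval, and the names `DialogueMemory`, `threshold`, `decay`, and `max_terms` are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    # Toy stand-in for a learned encoder (assumption): bag-of-words counts.
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse term-weight vectors.
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class DialogueMemory:
    """Hypothetical memory holding one compact, revisable state per thread."""

    def __init__(self, threshold=0.2, decay=0.5, max_terms=8):
        self.threads = []           # one Counter per contextual thread
        self.threshold = threshold  # minimum similarity to reuse a thread
        self.decay = decay          # down-weights older content on revision
        self.max_terms = max_terms  # bounds the size of each state

    def step(self, turn):
        query = embed(turn)

        # Retrieve: find the most similar existing thread state.
        best, best_sim = None, 0.0
        for i, state in enumerate(self.threads):
            sim = cosine(query, state)
            if sim > best_sim:
                best, best_sim = i, sim

        # Open a new thread when nothing is similar enough.
        if best is None or best_sim < self.threshold:
            self.threads.append(Counter())
            best = len(self.threads) - 1

        # Revise: decay the old state, then merge in the new turn.
        revised = Counter(
            {t: w * self.decay for t, w in self.threads[best].items()}
        )
        revised.update(query)

        # Write back: keep only the heaviest terms so the state stays compact.
        self.threads[best] = Counter(dict(revised.most_common(self.max_terms)))
        return best


memory = DialogueMemory()
turns = [
    "book a flight to paris",
    "what is the weather in tokyo",
    "change the flight to paris to monday",
]
thread_ids = [memory.step(t) for t in turns]
print(thread_ids)
```

In this sketch the third turn is routed back to the first thread and revises its state rather than appending to a growing history, so the work done per turn depends on the number and size of thread states, not on the total number of turns. The TBPTT training procedure mentioned in the abstract is not covered here.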