To deliver coherent and personalized experiences in long-term conversations, existing approaches typically perform retrieval-augmented response generation by constructing memory banks from conversation history at the turn level, the session level, or through summarization techniques. In this paper, we present two key findings: (1) The granularity of memory units matters: turn-level, session-level, and summarization-based methods each exhibit limitations in both memory retrieval accuracy and the semantic quality of the retrieved content. (2) Prompt compression methods, such as LLMLingua-2, can effectively serve as a denoising mechanism, enhancing memory retrieval accuracy across different granularities. Building on these insights, we propose SeCom, a method that constructs the memory bank at the segment level by introducing a conversation segmentation model that partitions long-term conversations into topically coherent segments, and that applies compression-based denoising to the memory units to enhance memory retrieval. Experimental results show that SeCom significantly outperforms baselines on the long-term conversation benchmarks LOCOMO and Long-MT-Bench+. Additionally, the proposed conversation segmentation method achieves superior performance on dialogue segmentation datasets such as DialSeg711, TIAGE, and SuperDialSeg.
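To make the pipeline concrete, below is a minimal sketch of the segment-then-denoise-then-retrieve flow described above. It is illustrative only: `segment_conversation` is a naive fixed-window stand-in for the paper's trained conversation segmentation model (which instead finds topically coherent boundaries), the compressor uses the public `llmlingua` package for LLMLingua-2, and the dense retriever (sentence-transformers) is a generic choice, not necessarily the one used in the paper.

```python
from typing import List

from llmlingua import PromptCompressor                 # LLMLingua-2 compressor
from sentence_transformers import SentenceTransformer, util

compressor = PromptCompressor(
    model_name="microsoft/llmlingua-2-xlm-roberta-large-meetingbank",
    use_llmlingua2=True,
)
encoder = SentenceTransformer("all-MiniLM-L6-v2")


def segment_conversation(turns: List[str], max_turns: int = 6) -> List[List[str]]:
    """Naive fixed-window stand-in for the paper's segmentation model,
    which instead partitions the conversation at topical boundaries."""
    return [turns[i:i + max_turns] for i in range(0, len(turns), max_turns)]


def build_memory_bank(turns: List[str]) -> List[str]:
    # 1) Build segment-level memory units (rather than turn- or session-level).
    segments = ["\n".join(seg) for seg in segment_conversation(turns)]
    # 2) Apply compression-based denoising to each memory unit.
    return [
        compressor.compress_prompt(seg, rate=0.5)["compressed_prompt"]
        for seg in segments
    ]


def retrieve(memory_bank: List[str], query: str, k: int = 3) -> List[str]:
    # Dense retrieval over the denoised segment-level memory units.
    mem_emb = encoder.encode(memory_bank, convert_to_tensor=True)
    q_emb = encoder.encode(query, convert_to_tensor=True)
    scores = util.cos_sim(q_emb, mem_emb)[0]
    top = scores.topk(min(k, len(memory_bank))).indices.tolist()
    return [memory_bank[i] for i in top]
```

The retrieved segments would then be prepended to the generation prompt for retrieval-augmented response generation; the key design choice this sketch mirrors is that retrieval operates over topically coherent, compression-denoised segments rather than raw turns or whole sessions.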