Recent retrieval-augmented models improve on basic retrieval methods by recursively embedding, clustering, and summarizing retrieved text chunks to build a hierarchical structure over them. The most relevant information is then retrieved from both the original text and the generated summaries. However, such approaches struggle with dynamic datasets: when documents are added or removed over time, the hierarchical representations formed through clustering become difficult to update. We propose a new algorithm that efficiently maintains the recursive-abstractive tree structure on dynamic datasets without compromising performance. We also introduce a novel post-retrieval method that applies query-focused recursive abstractive processing to substantially improve context quality. This method overcomes the limitations of other approaches by functioning as a black-box post-retrieval layer that is compatible with any retrieval algorithm. Both algorithms are validated through extensive experiments on real-world datasets, which demonstrate their effectiveness in handling dynamic data and improving retrieval performance.