Most prior work on information retrieval (IR) assumes that the corpus is static, whereas in the real world documents are continually updated. In this paper, we incorporate the often-overlooked dynamic nature of knowledge into retrieval systems. Our work treats retrieval corpora not as static archives but as dynamic knowledge bases that are better aligned with real-world environments. We conduct a comprehensive evaluation of dual encoders and generative retrieval using the StreamingQA benchmark, which is designed for temporal knowledge updates. Our initial results show that while generative retrieval outperforms dual encoders in static settings, the opposite is true in dynamic settings. Surprisingly, however, when we use a parameter-efficient pre-training method to enhance the adaptability of generative retrieval to new corpora, the resulting model, Dynamic Generative Retrieval (DynamicGR), yields unexpected results. It (1) efficiently compresses new knowledge into its internal index, attaining a remarkable storage capacity due to its fully parametric architecture, and (2) outperforms dual encoders not only in static settings but also in dynamic scenarios, with a 5% margin in hit@5, while requiring 4 times less training time.
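The abstract reports its headline result in hit@5 but does not define the metric. The sketch below assumes the common definition: a query counts as a hit if at least one gold document appears among the top k retrieved results, and the score is the fraction of queries that are hits. The function names `hit_at_k` and `mean_hit_at_k` and the use of string document ids are illustrative assumptions, not taken from the paper.

```python
from typing import Sequence, Set


def hit_at_k(ranked_ids: Sequence[str], gold_ids: Set[str], k: int = 5) -> float:
    """Return 1.0 if any of the top-k retrieved ids is a gold id, else 0.0.

    Assumed definition of hit@k; the paper's exact variant may differ.
    """
    return 1.0 if any(doc_id in gold_ids for doc_id in ranked_ids[:k]) else 0.0


def mean_hit_at_k(
    all_ranked: Sequence[Sequence[str]],
    all_gold: Sequence[Set[str]],
    k: int = 5,
) -> float:
    """Average hit@k over a set of queries (0.0 if there are no queries)."""
    scores = [hit_at_k(ranked, gold, k) for ranked, gold in zip(all_ranked, all_gold)]
    return sum(scores) / len(scores) if scores else 0.0
```

Under this definition, the abstract does not say whether the reported 5% margin is an absolute difference in hit@5 (percentage points) or a relative one.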