The Differentiable Search Index (DSI) utilizes Pre-trained Language Models (PLMs) for efficient document retrieval without relying on external indexes. However, DSIs require full re-training to handle updates in dynamic corpora, causing significant computational inefficiency. We introduce PromptDSI, a rehearsal-free, prompt-based approach for instance-wise incremental learning in document retrieval. PromptDSI attaches prompts to the frozen PLM encoder of DSI, leveraging its powerful representations to efficiently index new corpora while balancing stability and plasticity. We eliminate the initial forward pass of prompt-based continual learning methods, which doubles training and inference time. Moreover, we propose a topic-aware prompt pool that employs neural topic embeddings as fixed keys. This strategy ensures diverse and effective prompt usage, addressing the parameter underutilization caused by the collapse of the query-key matching mechanism. Our empirical evaluations demonstrate that PromptDSI matches IncDSI in managing forgetting while improving recall by over 4% on new corpora.
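To make the topic-aware prompt pool concrete, the following is a minimal NumPy sketch of query-key prompt selection with fixed keys. It is a hypothetical illustration, not the authors' implementation: the dimensions, pool size, and the `select_prompts` helper are assumptions. The key idea shown is that the keys (here standing in for neural topic embeddings) stay frozen, so prompt selection cannot collapse onto a few keys through key training.

```python
import numpy as np

# Hypothetical sketch of a topic-aware prompt pool (not the paper's code).
# Keys stand in for neural topic embeddings and are FIXED (never trained);
# prompts are selected by cosine similarity between a query embedding and
# the frozen keys, then prepended to the frozen encoder's input.

rng = np.random.default_rng(0)

D, M, K, L = 768, 10, 3, 5   # embed dim, pool size, prompts picked, prompt length
topic_keys = rng.standard_normal((M, D))      # fixed keys, one per topic
prompt_pool = rng.standard_normal((M, L, D))  # M learnable prompts, L tokens each

def select_prompts(query_emb):
    """Return the K prompts whose fixed topic keys best match the query."""
    q = query_emb / np.linalg.norm(query_emb)
    k = topic_keys / np.linalg.norm(topic_keys, axis=1, keepdims=True)
    sims = k @ q                     # cosine similarity to every key
    top = np.argsort(-sims)[:K]      # indices of the K best-matching keys
    return prompt_pool[top]          # shape (K, L, D)

query = rng.standard_normal(D)
prompts = select_prompts(query)
print(prompts.shape)  # (3, 5, 768): tokens to prepend to the encoder input
```

In continual-learning use, only `prompt_pool` would receive gradients while `topic_keys` and the PLM encoder stay frozen, which is what preserves stability on previously indexed corpora.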