Neural document rerankers are extremely effective in terms of accuracy. However, the best models require dedicated hardware for serving, which is costly and often not feasible. To avoid this serving-time requirement, we present a method of capturing up to 86% of the gains of a Transformer cross-attention model with a lexicalized scoring function that requires only 10^-6% of the Transformer's FLOPs per document and can be served using commodity CPUs. When combined with a BM25 retriever, this approach matches the quality of a state-of-the-art dual encoder retriever, which still requires an accelerator for query encoding. We introduce NAIL (Non-Autoregressive Indexing with Language models), a model architecture that is compatible with recent encoder-decoder and decoder-only large language models, such as T5, GPT-3 and PaLM. This architecture can leverage existing pre-trained checkpoints and can be fine-tuned to efficiently construct document representations that do not require neural processing of queries.
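To make the serving-time idea concrete, the following is a minimal sketch of what a lexicalized scoring function over precomputed document representations could look like. It is an illustration under stated assumptions, not the model described above: the whitespace tokenizer, the additive scoring rule, the names `DOC_TOKEN_WEIGHTS`, `lexical_score` and `rerank`, and all weights are hypothetical. The only property it is meant to show is that query-time work reduces to token lookups and a sum, with no neural processing of the query.

```python
from collections import Counter

# Hypothetical precomputed index: one sparse token -> weight map per document.
# In the setting of the abstract these weights would be produced offline by a
# fine-tuned language model; here they are hard-coded, made-up values.
DOC_TOKEN_WEIGHTS = {
    "doc_a": {"neural": 1.2, "reranker": 2.0, "cpu": 0.4},
    "doc_b": {"bm25": 1.8, "retriever": 1.5, "cpu": 0.9},
}


def tokenize(text):
    """Assumed tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def lexical_score(query, token_weights):
    """Sum the stored weights of the query tokens (assumed additive scoring).

    Tokens missing from the document's map contribute 0, and a token that
    occurs n times in the query is counted n times.
    """
    counts = Counter(tokenize(query))
    return sum(n * token_weights.get(tok, 0.0) for tok, n in counts.items())


def rerank(query, candidates, index=DOC_TOKEN_WEIGHTS):
    """Score each candidate document id and sort by descending score."""
    scored = [(doc_id, lexical_score(query, index[doc_id])) for doc_id in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

In a pipeline like the one the abstract describes, `candidates` would come from a first-stage retriever such as BM25, and a call like `rerank("cpu retriever", ["doc_a", "doc_b"])` would reorder them using only dictionary lookups on a CPU.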