While Dense Retrieval Models (DRMs) have advanced Information Retrieval (IR), one limitation of these neural models is their limited generalizability and robustness. One way to address this issue is to leverage the Mixture-of-Experts (MoE) architecture. While previous IR studies have incorporated MoE architectures within the Transformer layers of DRMs, our work investigates an architecture that integrates a single MoE block (SB-MoE) after the output of the final Transformer layer. Our empirical evaluation investigates how SB-MoE compares, in terms of retrieval effectiveness, to standard fine-tuning. Specifically, we fine-tune three DRMs (TinyBERT, BERT, and Contriever) across four benchmark collections, with and without the added MoE block. Moreover, since the performance of MoE varies with its hyperparameters (i.e., the number of experts), we conduct additional experiments to investigate this aspect further. The findings show the effectiveness of SB-MoE, especially for DRMs with a low number of parameters (i.e., TinyBERT), as it consistently outperforms the fine-tuned underlying model on all four benchmarks. For DRMs with a higher number of parameters (i.e., BERT and Contriever), SB-MoE requires larger numbers of training samples to yield better retrieval performance.
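The SB-MoE idea described above can be sketched as follows: a gating network scores each expert given the final-layer embedding, and the block's output is the gate-weighted combination of the expert outputs. This is a minimal NumPy illustration with hypothetical dimensions and linear experts; the paper's actual expert architecture, expert count, and gating details may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    # Numerically stable softmax for the gating distribution.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class SBMoE:
    """Single MoE block applied after the final Transformer layer of a DRM.

    Illustrative sketch: linear experts and a linear gate; the number of
    experts and the embedding size are assumptions, not the paper's setup.
    """
    def __init__(self, dim, n_experts):
        # One weight matrix per expert, plus a gating projection.
        self.experts = [rng.standard_normal((dim, dim)) / np.sqrt(dim)
                        for _ in range(n_experts)]
        self.gate = rng.standard_normal((dim, n_experts)) / np.sqrt(dim)

    def __call__(self, h):
        # h: (batch, dim) embeddings from the last Transformer layer.
        weights = softmax(h @ self.gate)                          # (batch, n_experts)
        outs = np.stack([h @ W for W in self.experts], axis=1)    # (batch, n_experts, dim)
        return (weights[:, :, None] * outs).sum(axis=1)           # (batch, dim)

# Example: BERT-sized embeddings passed through a 6-expert SB-MoE block.
h = rng.standard_normal((4, 768))
moe = SBMoE(dim=768, n_experts=6)
z = moe(h)
assert z.shape == (4, 768)
```

In a fine-tuning setup, this block would sit between the encoder's pooled output and the similarity scoring, so only the final representation (not every Transformer layer) is routed through experts.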