Masked auto-encoder (MAE) pre-training has become a prevalent technique for initializing and enhancing dense retrieval systems. It generally uses additional Transformer decoder blocks to provide sustainable supervision signals and to compress contextual information into dense representations. However, the reasons why this pre-training technique is effective remain unclear, and the additional Transformer-based decoders incur significant computational cost. In this study, we shed light on this question by showing that MAE pre-training with enhanced decoding significantly improves the term coverage of input tokens in dense representations, compared to vanilla BERT checkpoints. Building on this observation, we propose a modification to the traditional MAE: we replace the decoder of the masked auto-encoder with a much simpler Bag-of-Word prediction task. This modification efficiently compresses lexical signals into dense representations through unsupervised pre-training. Our method achieves state-of-the-art retrieval performance on several large-scale retrieval benchmarks without requiring any additional parameters, and it provides a 67% training speed-up over standard masked auto-encoder pre-training with enhanced decoding.
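To make the idea of a Bag-of-Word prediction task concrete, the following is a minimal sketch of one plausible formulation, not the authors' implementation. It assumes the dense representation is projected onto the vocabulary by a dot product with the (tied) token embedding matrix, and that the loss is the negative log-softmax probability averaged over the distinct tokens of the input. The function names `bow_targets`, `project_to_vocab`, and `bow_loss` are hypothetical, as is the choice to exclude special tokens from the bag.

```python
import math


def bow_targets(token_ids, vocab_size, special_ids=()):
    """Build a multi-hot vector marking which vocabulary ids occur in the input.

    Assumption: special tokens (e.g. [CLS], [SEP]) are excluded from the bag.
    """
    targets = [0.0] * vocab_size
    for token_id in token_ids:
        if token_id not in special_ids:
            targets[token_id] = 1.0
    return targets


def project_to_vocab(dense_vec, embedding_matrix):
    """Score every vocabulary entry by a dot product with the dense vector.

    Assumption: the embedding matrix is reused as the output projection,
    so the projection itself adds no new parameters.
    """
    return [sum(d * e for d, e in zip(dense_vec, row)) for row in embedding_matrix]


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    peak = max(logits)
    log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_z for x in logits]


def bow_loss(vocab_logits, targets):
    """Negative log-likelihood averaged over the distinct tokens in the bag."""
    bag_size = sum(targets)
    if bag_size == 0:
        return 0.0
    log_probs = log_softmax(vocab_logits)
    return -sum(lp * t for lp, t in zip(log_probs, targets)) / bag_size


# Toy usage: vocabulary of 5 ids, 2-dimensional dense vector.
embeddings = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, -1.0]]
targets = bow_targets([0, 3, 3, 1], vocab_size=5, special_ids={0})
logits = project_to_vocab([2.0, 1.0], embeddings)
loss = bow_loss(logits, targets)
```

Under these assumptions the only trainable component is the encoder itself, which is in the spirit of the abstract's claim that no additional parameters are required; the actual projection and loss used in the paper may differ.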