Dialogue response selection aims to select an appropriate response from several candidates given the history of user and system utterances. Most existing works focus primarily on post-training and fine-tuning tailored for cross-encoders; however, no post-training method has been designed for dense encoders in dialogue response selection. We argue that when a language model such as BERT is employed as a dense encoder, it encodes the dialogue context and the response separately, making it difficult to align the two representations. We therefore propose Dial-MAE (Dialogue Contextual Masking Auto-Encoder), a straightforward yet effective post-training technique tailored for dense encoders in dialogue response selection. Dial-MAE uses an asymmetric encoder-decoder architecture to compress the dialogue semantics into dense vectors, which yields better alignment between the features of the dialogue context and the response. Our experiments demonstrate that Dial-MAE is highly effective, achieving state-of-the-art performance on two commonly evaluated benchmarks.
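The asymmetric design can be pictured as a deep context encoder paired with a deliberately shallow decoder that must reconstruct a heavily masked response from the context's single dense vector, forcing the dialogue semantics into that vector. Below is a minimal PyTorch sketch of this idea; the class name `DialMAESketch`, the one-layer decoder, the use of the [CLS] vector, and the MLM loss wiring are illustrative assumptions, not the authors' released implementation.

```python
# A minimal sketch of the asymmetric encoder-decoder idea behind Dial-MAE,
# using PyTorch and Hugging Face transformers. Layer counts, names, and the
# loss wiring are illustrative assumptions, not the paper's exact code.
import torch
import torch.nn as nn
from transformers import BertModel


class DialMAESketch(nn.Module):
    def __init__(self, model_name="bert-base-uncased", decoder_layers=1):
        super().__init__()
        # Deep encoder: produces one dense vector (the [CLS] state)
        # for the whole dialogue context.
        self.encoder = BertModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        # Shallow decoder: a deliberately weak Transformer that must rely
        # on the context vector to reconstruct the masked response.
        layer = nn.TransformerEncoderLayer(
            d_model=hidden, nhead=12, dim_feedforward=4 * hidden, batch_first=True
        )
        self.decoder = nn.TransformerEncoder(layer, num_layers=decoder_layers)
        self.mlm_head = nn.Linear(hidden, self.encoder.config.vocab_size)
        self.loss_fct = nn.CrossEntropyLoss(ignore_index=-100)

    def forward(self, ctx_ids, ctx_mask, resp_ids, resp_mask, resp_labels):
        # 1) Encode the dialogue context into a single dense vector.
        ctx_vec = self.encoder(
            ctx_ids, attention_mask=ctx_mask
        ).last_hidden_state[:, :1]
        # 2) Embed the (aggressively masked) response tokens.
        resp_emb = self.encoder.embeddings(input_ids=resp_ids)
        # 3) Prepend the context vector so the decoder must use it
        #    when reconstructing the masked response positions.
        dec_in = torch.cat([ctx_vec, resp_emb], dim=1)
        pad = torch.ones(resp_ids.size(0), 1, device=resp_ids.device)
        key_pad = torch.cat([pad, resp_mask], dim=1) == 0
        dec_out = self.decoder(dec_in, src_key_padding_mask=key_pad)[:, 1:]
        # 4) MLM loss over masked response tokens (labels are -100 elsewhere).
        logits = self.mlm_head(dec_out)
        return self.loss_fct(logits.view(-1, logits.size(-1)), resp_labels.view(-1))
```

Under this framing, the post-training gradient flows through the context's dense vector, so the encoder learns to pack response-predictive information into the same representation later used for dense retrieval, which is what "better alignment between context and response features" amounts to in practice.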