Dialogue response selection aims to select an appropriate response from several candidates, given a history of user and system utterances. Recent studies have improved the accuracy of dialogue response selection through post-training, mostly relying on naive masked language modeling. However, recently developed generative methods have shown promising text representation capabilities in the information retrieval (IR) community, which could lead to better modeling of dialogue semantics. In this paper, we therefore propose Dial-MAE (Dialogue Contextual Masking Auto-encoder), a straightforward yet effective post-training technique tailored for dialogue response selection. Dial-MAE uses an asymmetric encoder-decoder architecture that learns to better compress the semantics of the dialogue into dense dialogue vectors. In Dial-MAE, a deep encoder creates a dialogue embedding from the masked dialogue context, and a shallow decoder then uses this embedding together with the highly masked response to restore the original response. Our experiments show that Dial-MAE is highly effective, achieving state-of-the-art performance on two commonly evaluated benchmarks.
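To make the asymmetric masking concrete, the following is a minimal sketch of how one post-training example might be prepared: the dialogue context is lightly masked for the deep encoder, and the response is heavily masked for the shallow decoder, whose target is the original response. The mask ratios (0.3 and 0.75), the whitespace tokenization, the special-token names, and the helper names `mask_tokens` and `build_example` are all illustrative assumptions, not values or code taken from the paper. The sketch covers only input construction, not the encoder, the decoder, or the loss.

```python
import random

# Hypothetical special tokens; a real tokenizer would supply its own.
MASK, CLS, SEP = "[MASK]", "[CLS]", "[SEP]"
SPECIAL = {CLS, SEP}


def mask_tokens(tokens, ratio, rng):
    """Replace about `ratio` of the non-special tokens with [MASK]."""
    candidates = [i for i, t in enumerate(tokens) if t not in SPECIAL]
    n_mask = round(len(candidates) * ratio)
    chosen = set(rng.sample(candidates, n_mask))
    return [MASK if i in chosen else t for i, t in enumerate(tokens)]


def build_example(context_turns, response, enc_ratio=0.3, dec_ratio=0.75, seed=0):
    """Build encoder/decoder inputs for one (context, response) pair.

    enc_ratio and dec_ratio are assumed values chosen only to illustrate
    that the decoder side is masked much more heavily than the encoder side.
    """
    rng = random.Random(seed)
    # Concatenate the turns of the dialogue history behind a [CLS] token.
    context = [CLS]
    for turn in context_turns:
        context += turn.split() + [SEP]
    target = response.split()
    return {
        # Seen by the deep encoder, which produces the dialogue embedding.
        "encoder_input": mask_tokens(context, enc_ratio, rng),
        # Seen by the shallow decoder together with the dialogue embedding.
        "decoder_input": mask_tokens(target, dec_ratio, rng),
        # The decoder is trained to restore the original response.
        "decoder_target": target,
    }


example = build_example(
    ["how can I reset my password", "please open the settings page"],
    "click the forgot password link on the login page",
)
```

Because most of the response is hidden from the decoder and the decoder is shallow, reconstruction has to lean on the dialogue embedding, which is the intuition behind compressing the dialogue semantics into a single dense vector.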