Transformer-based pre-trained language models such as BERT have achieved remarkable results in semantic sentence matching. However, existing models still struggle to capture subtle differences between sentences: minor perturbations such as adding, deleting, or modifying a word can flip the prediction. To alleviate this problem, we propose a novel Dual Attention Enhanced BERT (DABERT), which strengthens BERT's ability to capture fine-grained differences between the sentences in a pair. DABERT comprises two modules: (1) a Dual Attention module, which measures soft word matches through a new dual-channel alignment mechanism that models affinity attention and difference attention separately; and (2) an Adaptive Fusion module, which uses attention to learn how to aggregate the difference and affinity features and generates a vector describing the matching details of the sentence pair. Extensive experiments on well-studied semantic matching and robustness test datasets demonstrate the effectiveness of the proposed method.
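The two modules can be illustrated with a small NumPy sketch. It is not the DABERT formulation. The abstract gives no equations, so several parts of the sketch are assumptions: the difference channel (attention weights from negative pairwise L2 distance, with the output defined as each token's value minus the value it is softly aligned to), the fusion step (a simple per-dimension sigmoid gate substituted for the attention-based aggregation described above), and all function and weight names (`dual_attention`, `adaptive_fusion`, `Wq`, `Wk`, `Wv`, `Wg`, `bg`), which are hypothetical. The input `X` stands in for the token representations of a sentence pair inside one BERT layer.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dual_attention(X, Wq, Wk, Wv):
    """Return (affinity, difference) features for token matrix X of shape (n, d).

    Hypothetical formulation, not the paper's equations:
    - affinity: standard scaled dot-product attention;
    - difference: weights from negative pairwise L2 distance (assumption),
      output = each token's value minus the value it is aligned to.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d = Q.shape[-1]
    # Affinity channel: similar query/key pairs receive high weight.
    aff_w = softmax(Q @ K.T / np.sqrt(d))
    affinity = aff_w @ V
    # Difference channel: subtraction-based scores, shape (n, n).
    dist = np.linalg.norm(Q[:, None, :] - K[None, :, :], axis=-1)
    diff_w = softmax(-dist / np.sqrt(d))
    # Residual that the soft alignment does not explain.
    difference = V - diff_w @ V
    return affinity, difference

def adaptive_fusion(affinity, difference, Wg, bg):
    """Hypothetical gated fusion: a per-dimension gate g in (0, 1)."""
    z = np.concatenate([affinity, difference], axis=-1) @ Wg + bg
    g = 1.0 / (1.0 + np.exp(-z))
    return g * affinity + (1.0 - g) * difference, g

# Usage with random toy inputs: 6 tokens, hidden size 8.
rng = np.random.default_rng(0)
n, d = 6, 8
X = rng.standard_normal((n, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
Wg = rng.standard_normal((2 * d, d)) * 0.1
bg = np.zeros(d)

affinity, difference = dual_attention(X, Wq, Wk, Wv)
fused, g = adaptive_fusion(affinity, difference, Wg, bg)
print(fused.shape)  # (6, 8)
```

The gate lets each hidden dimension weigh matching evidence (affinity) against mismatching evidence (difference); the fusion used in DABERT itself may differ from this simplification.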