Spoken language understanding (SLU) is a fundamental task in task-oriented dialogue systems. However, the inevitable errors from automatic speech recognition (ASR) usually impair understanding performance and lead to error propagation. Although some existing methods attempt to address this problem through contrastive learning, they (1) treat clean manual transcripts and ASR transcripts equally, without discrimination, in fine-tuning; (2) neglect the fact that semantically similar pairs are still pushed apart when contrastive learning is applied; and (3) suffer from the problem of Kullback-Leibler (KL) vanishing. In this paper, we propose Mutual Learning and Large-Margin Contrastive Learning (ML-LMCL), a novel framework for improving ASR robustness in SLU. Specifically, in fine-tuning, we apply mutual learning and train two SLU models on the manual transcripts and the ASR transcripts, respectively, so that the two models iteratively share knowledge. We also introduce a distance polarization regularizer to avoid, as far as possible, pushing apart intra-cluster pairs. Moreover, we use a cyclical annealing schedule to mitigate the KL vanishing issue. Experiments on three datasets show that ML-LMCL outperforms existing models and achieves new state-of-the-art performance.
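To make the interaction between mutual learning and cyclical annealing more concrete, the following is a minimal Python sketch, not the formulation used in ML-LMCL, which the abstract does not specify. It assumes a standard deep-mutual-learning setup in which each of the two models adds a KL term toward its peer's predicted distribution, and a common form of cyclical schedule in which the KL weight ramps linearly from 0 to 1 during the first part of each cycle and then holds at 1. The function names and the parameters `n_cycles` and `ramp_ratio` are hypothetical, and the distance polarization regularizer is omitted.

```python
import math


def cyclical_beta(step, total_steps, n_cycles=4, ramp_ratio=0.5):
    """KL weight in [0, 1]: ramps up linearly, holds at 1, then restarts each cycle.

    n_cycles and ramp_ratio are illustrative defaults, not values from the paper.
    """
    cycle_len = total_steps / n_cycles
    position = (step % cycle_len) / cycle_len  # progress within the current cycle, in [0, 1)
    return min(1.0, position / ramp_ratio)


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists of probabilities."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def mutual_learning_losses(task_loss_clean, task_loss_asr, probs_clean, probs_asr, beta):
    """Assumed mutual-learning objective: each model is pulled toward its peer.

    probs_clean: prediction of the model trained on manual transcripts.
    probs_asr:   prediction of the model trained on ASR transcripts.
    beta:        KL weight, e.g. taken from cyclical_beta.
    """
    loss_clean = task_loss_clean + beta * kl_divergence(probs_asr, probs_clean)
    loss_asr = task_loss_asr + beta * kl_divergence(probs_clean, probs_asr)
    return loss_clean, loss_asr


if __name__ == "__main__":
    total_steps = 100
    probs_clean = [0.7, 0.2, 0.1]  # made-up intent distributions for illustration
    probs_asr = [0.5, 0.3, 0.2]
    for step in (0, 5, 20, 25, 30):
        beta = cyclical_beta(step, total_steps)
        losses = mutual_learning_losses(1.0, 1.2, probs_clean, probs_asr, beta)
        print(step, round(beta, 2), [round(x, 4) for x in losses])
```

Because the weight periodically returns to 0, each model is repeatedly allowed to fit its own task loss before the KL term is reintroduced, which is the usual motivation for cyclical schedules as a remedy for KL vanishing.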