Speech enhancement (SE) has proved effective at reducing noise in noisy speech signals for downstream automatic speech recognition (ASR), where a multi-task learning strategy is employed to jointly optimize the two tasks. However, the enhanced speech learned under the SE objective does not always yield good ASR results. From an optimization view, the gradients of the SE and ASR tasks sometimes interfere with each other, which can hinder multi-task learning and ultimately lead to sub-optimal ASR performance. In this paper, we propose a simple yet effective approach called gradient remedy (GR) to resolve interference between task gradients in noise-robust speech recognition, from the perspectives of both angle and magnitude. Specifically, we first project the SE task's gradient onto a dynamic surface that is at an acute angle to the ASR gradient, in order to remove the conflict between them and assist ASR optimization. Furthermore, we adaptively rescale the magnitudes of the two gradients to prevent the dominant ASR task from being misled by the SE gradient. Experimental results show that the proposed approach resolves the gradient interference well and achieves relative word error rate (WER) reductions of 9.3% and 11.1% over the multi-task learning baseline on the RATS and CHiME-4 datasets, respectively. Our code is available at GitHub.
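To make the two steps described in the abstract concrete, the following is a minimal sketch and not the released implementation. It assumes flattened gradient vectors, treats an obtuse angle between the SE and ASR gradients as a conflict, uses a fixed target angle `theta_deg` in place of the dynamic surface, and caps only the SE gradient norm at `max_ratio` times the ASR gradient norm rather than rescaling both gradients. The function name `gradient_remedy_sketch` and both parameters are hypothetical and do not come from the paper.

```python
import numpy as np


def gradient_remedy_sketch(g_se, g_asr, theta_deg=80.0, max_ratio=1.0, eps=1e-12):
    """Illustrative angle and magnitude remedy for two task gradients.

    Assumptions (not taken from the paper): the target angle is a fixed
    acute angle `theta_deg`, and only the SE gradient is rescaled.
    """
    g_se = np.asarray(g_se, dtype=float)
    g_asr = np.asarray(g_asr, dtype=float)

    se_norm = np.linalg.norm(g_se)
    asr_norm = np.linalg.norm(g_asr)
    if se_norm < eps or asr_norm < eps:
        # Nothing to remedy when either gradient vanishes.
        return g_se.copy(), g_asr.copy()

    u = g_asr / asr_norm          # unit vector along the ASR gradient
    along = float(np.dot(g_se, u))  # signed SE component along the ASR gradient

    if along < 0.0:
        # Angle step: the gradients conflict (obtuse angle).
        perp = g_se - along * u   # SE component orthogonal to the ASR gradient
        perp_norm = np.linalg.norm(perp)
        if perp_norm < eps:
            # SE points exactly opposite to ASR; no orthogonal part to keep.
            g_se_new = np.zeros_like(g_se)
        else:
            # Keep the orthogonal part and add a positive component along u
            # so that the angle to the ASR gradient equals theta_deg.
            theta = np.deg2rad(theta_deg)
            g_se_new = perp + (perp_norm / np.tan(theta)) * u
    else:
        g_se_new = g_se.copy()

    # Magnitude step: keep the SE gradient from dominating the ASR gradient.
    new_norm = np.linalg.norm(g_se_new)
    limit = max_ratio * asr_norm
    if new_norm > limit:
        g_se_new = g_se_new * (limit / new_norm)

    return g_se_new, g_asr.copy()


if __name__ == "__main__":
    se, asr = gradient_remedy_sketch([-1.0, 1.0], [1.0, 0.0])
    print("remedied SE gradient:", se)
    print("ASR gradient:", asr)
```

In a training loop, the two returned vectors would be summed and written back to the shared parameters before the optimizer step. Setting `theta_deg` close to 90 approaches a plain orthogonal projection of the conflicting gradient, while smaller values pull the SE gradient further toward the ASR direction.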