Speech enhancement (SE) has proven effective at reducing noise in noisy speech signals for downstream automatic speech recognition (ASR), where a multi-task learning strategy is employed to optimize the two tasks jointly. However, the enhanced speech learned under the SE objective does not always yield good ASR results. From an optimization perspective, the gradients of the SE and ASR tasks sometimes interfere with each other, which can hinder multi-task learning and ultimately lead to sub-optimal ASR performance. In this paper, we propose a simple yet effective approach called gradient remedy (GR) that resolves the interference between task gradients in noise-robust speech recognition from the perspectives of both angle and magnitude. Specifically, we first project the SE task's gradient onto a dynamic surface that is at an acute angle to the ASR gradient, in order to remove the conflict between them and assist ASR optimization. Furthermore, we adaptively rescale the magnitudes of the two gradients to prevent the dominant ASR task from being misled by the SE gradient. Experimental results show that the proposed approach effectively resolves the gradient interference and achieves relative word error rate (WER) reductions of 9.3% and 11.1% over the multi-task learning baseline on the RATS and CHiME-4 datasets, respectively. Our code is available at GitHub.
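The abstract describes two operations on the task gradients: an angle correction that brings the SE gradient within an acute angle of the ASR gradient, and a magnitude correction that keeps the SE gradient from dominating. The abstract does not give the exact formulas, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a fixed maximum angle (the hypothetical parameter `max_angle_deg`) in place of GR's dynamic surface, rotates the SE gradient onto that cone while preserving its norm, and then caps the SE gradient's norm at a hypothetical ratio `max_ratio` of the ASR gradient's norm. All function names are illustrative.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def project_to_cone(g_se, g_asr, max_angle_deg=60.0):
    """Hypothetical angle fix: if g_se is more than max_angle_deg away from
    g_asr, rotate it (in the plane spanned by both) onto the cone boundary,
    keeping ||g_se|| unchanged. A fixed angle stands in for GR's dynamic surface."""
    n_se, n_asr = norm(g_se), norm(g_asr)
    if n_se == 0 or n_asr == 0:
        return list(g_se)
    u = [x / n_asr for x in g_asr]            # unit ASR direction
    along = dot(g_se, u)                      # signed component along ASR
    max_angle = math.radians(max_angle_deg)
    if along / n_se >= math.cos(max_angle):   # already within the acute cone
        return list(g_se)
    perp = [s - along * x for s, x in zip(g_se, u)]  # part orthogonal to ASR
    n_perp = norm(perp)
    if n_perp == 0:
        # g_se points exactly opposite g_asr; no unique rotation plane,
        # so this sketch simply aligns it with the ASR direction.
        return [n_se * x for x in u]
    v = [p / n_perp for p in perp]            # unit vector orthogonal to u
    c, s = math.cos(max_angle), math.sin(max_angle)
    return [n_se * (c * a + s * b) for a, b in zip(u, v)]


def rescale(g_se, g_asr, max_ratio=1.0):
    """Hypothetical magnitude fix: shrink g_se so ||g_se|| <= max_ratio * ||g_asr||."""
    n_se, n_asr = norm(g_se), norm(g_asr)
    if n_se > max_ratio * n_asr and n_se > 0:
        scale = max_ratio * n_asr / n_se
        return [scale * x for x in g_se]
    return list(g_se)


def gradient_remedy_sketch(g_asr, g_se, max_angle_deg=60.0, max_ratio=1.0):
    """Combine the ASR gradient with the corrected SE gradient."""
    g_se = project_to_cone(g_se, g_asr, max_angle_deg)
    g_se = rescale(g_se, g_asr, max_ratio)
    return [a + b for a, b in zip(g_asr, g_se)]
```

For example, with `g_asr = [1.0, 0.0]` and a conflicting `g_se = [-1.0, 1.0]` (135 degrees apart), the projection rotates `g_se` to exactly 60 degrees from `g_asr`, and the rescaling shrinks it to unit norm, so the combined update still has a positive component along the ASR gradient. A per-step adaptive angle or ratio, as the abstract suggests, would replace the fixed defaults here.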